Search (3467 results, page 1 of 174)

  • Active filter: type_ss:"a"
  1. Hotho, A.; Bloehdorn, S.: Data Mining 2004 : Text classification by boosting weak learners based on terms and concepts (2004) 0.31
    0.3140459 = sum of:
      0.092609614 = product of:
        0.27782884 = sum of:
          0.27782884 = weight(_text_:3a in 562) [ClassicSimilarity], result of:
            0.27782884 = score(doc=562,freq=2.0), product of:
              0.49434152 = queryWeight, product of:
                8.478011 = idf(docFreq=24, maxDocs=44218)
                0.058308665 = queryNorm
              0.56201804 = fieldWeight in 562, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                8.478011 = idf(docFreq=24, maxDocs=44218)
                0.046875 = fieldNorm(doc=562)
        0.33333334 = coord(1/3)
      0.22143628 = sum of:
        0.17403616 = weight(_text_:mining in 562) [ClassicSimilarity], result of:
          0.17403616 = score(doc=562,freq=4.0), product of:
            0.3290036 = queryWeight, product of:
              5.642448 = idf(docFreq=425, maxDocs=44218)
              0.058308665 = queryNorm
            0.5289795 = fieldWeight in 562, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              5.642448 = idf(docFreq=425, maxDocs=44218)
              0.046875 = fieldNorm(doc=562)
        0.047400113 = weight(_text_:22 in 562) [ClassicSimilarity], result of:
          0.047400113 = score(doc=562,freq=2.0), product of:
            0.204187 = queryWeight, product of:
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.058308665 = queryNorm
            0.23214069 = fieldWeight in 562, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.046875 = fieldNorm(doc=562)
    
    Content
     Cf.: http://www.google.de/url?sa=t&rct=j&q=&esrc=s&source=web&cd=1&cad=rja&ved=0CEAQFjAA&url=http%3A%2F%2Fciteseerx.ist.psu.edu%2Fviewdoc%2Fdownload%3Fdoi%3D10.1.1.91.4940%26rep%3Drep1%26type%3Dpdf&ei=dOXrUMeIDYHDtQahsIGACg&usg=AFQjCNHFWVh6gNPvnOrOS9R3rkrXCNVD-A&sig2=5I2F5evRfMnsttSgFF9g7Q&bvm=bv.1357316858,d.Yms.
    Date
    8. 1.2013 10:22:32
    Source
    Proceedings of the 4th IEEE International Conference on Data Mining (ICDM 2004), 1-4 November 2004, Brighton, UK
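  Note on the score displays: the indented trees attached to each hit are Lucene "explain" dumps for the ClassicSimilarity (TF-IDF) scorer. Each leaf weight is queryWeight × fieldWeight, where queryWeight = idf × queryNorm, fieldWeight = √freq × idf × fieldNorm, and idf = 1 + ln(maxDocs / (docFreq + 1)); coord(m/n) then scales a clause by the fraction of its subclauses that matched. A minimal sketch reproducing the first leaf of entry 1 from the numbers shown above (plain Python, no Lucene required):

      import math

      # Leaf "weight(_text_:3a in 562)" from the explain tree of entry 1.
      freq = 2.0
      tf = math.sqrt(freq)                  # 1.4142135 = tf(freq=2.0)
      idf = 1 + math.log(44218 / (24 + 1))  # 8.478011 = idf(docFreq=24, maxDocs=44218)
      query_norm = 0.058308665
      field_norm = 0.046875                 # fieldNorm(doc=562)

      query_weight = idf * query_norm       # 0.49434152 = queryWeight
      field_weight = tf * idf * field_norm  # 0.56201804 = fieldWeight
      weight = query_weight * field_weight  # 0.27782884
      contribution = weight * (1 / 3)       # 0.092609614 after coord(1/3)
      print(round(weight, 6), round(contribution, 6))

  Running it prints approximately 0.277829 and 0.092610, matching the weight(_text_:3a in 562) leaf and its coord(1/3)-scaled contribution to within floating-point rounding.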
  2. Chowdhury, G.G.: Template mining for information extraction from digital documents (1999) 0.26
    0.25834233 = product of:
      0.51668465 = sum of:
        0.51668465 = sum of:
          0.40608436 = weight(_text_:mining in 4577) [ClassicSimilarity], result of:
            0.40608436 = score(doc=4577,freq=4.0), product of:
              0.3290036 = queryWeight, product of:
                5.642448 = idf(docFreq=425, maxDocs=44218)
                0.058308665 = queryNorm
              1.2342855 = fieldWeight in 4577, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                5.642448 = idf(docFreq=425, maxDocs=44218)
                0.109375 = fieldNorm(doc=4577)
          0.11060026 = weight(_text_:22 in 4577) [ClassicSimilarity], result of:
            0.11060026 = score(doc=4577,freq=2.0), product of:
              0.204187 = queryWeight, product of:
                3.5018296 = idf(docFreq=3622, maxDocs=44218)
                0.058308665 = queryNorm
              0.5416616 = fieldWeight in 4577, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                3.5018296 = idf(docFreq=3622, maxDocs=44218)
                0.109375 = fieldNorm(doc=4577)
      0.5 = coord(1/2)
    
    Date
    2. 4.2000 18:01:22
    Theme
    Data Mining
  3. Kostoff, R.N.; Rio, J.A. del; Humenik, J.A.; Garcia, E.O.; Ramirez, A.M.: Citation mining : integrating text mining and bibliometrics for research user profiling (2001) 0.17
    0.17327127 = sum of:
      0.05020912 = product of:
        0.15062736 = sum of:
          0.15062736 = weight(_text_:themes in 6850) [ClassicSimilarity], result of:
            0.15062736 = score(doc=6850,freq=4.0), product of:
              0.3748681 = queryWeight, product of:
                6.429029 = idf(docFreq=193, maxDocs=44218)
                0.058308665 = queryNorm
              0.4018143 = fieldWeight in 6850, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                6.429029 = idf(docFreq=193, maxDocs=44218)
                0.03125 = fieldNorm(doc=6850)
        0.33333334 = coord(1/3)
      0.12306215 = product of:
        0.2461243 = sum of:
          0.2461243 = weight(_text_:mining in 6850) [ClassicSimilarity], result of:
            0.2461243 = score(doc=6850,freq=18.0), product of:
              0.3290036 = queryWeight, product of:
                5.642448 = idf(docFreq=425, maxDocs=44218)
                0.058308665 = queryNorm
              0.74808997 = fieldWeight in 6850, product of:
                4.2426405 = tf(freq=18.0), with freq of:
                  18.0 = termFreq=18.0
                5.642448 = idf(docFreq=425, maxDocs=44218)
                0.03125 = fieldNorm(doc=6850)
        0.5 = coord(1/2)
    
    Abstract
     Identifying the users and impact of research is important for research performers, managers, evaluators, and sponsors. It is important to know whether the audience reached is the audience desired. It is useful to understand the technical characteristics of the other research/development/applications impacted by the originating research, and to understand other characteristics (names, organizations, countries) of the users impacted by the research. Because of the many indirect pathways through which fundamental research can impact applications, identifying the user audience and the research impacts can be very complex and time-consuming. The purpose of this article is to describe a novel approach for identifying the pathways through which research can impact other research, technology development, and applications, and to identify the technical and infrastructure characteristics of the user population. A novel literature-based approach was developed to identify the user community and its characteristics. The research performed is characterized by one or more articles accessed by the Science Citation Index (SCI) database, because the SCI's citation-based structure makes it easy to perform citation studies. The user community is characterized by the articles in the SCI that cite the original research articles, and that cite the succeeding generations of these articles as well. Text mining is performed on the citing articles to identify the technical areas impacted by the research, the relationships among these technical areas, and the relationships among the technical areas and the infrastructure (authors, journals, organizations). A key component of text mining, concept clustering, was used to provide both a taxonomy of the citing articles' technical themes and further technical insights based on theme relationships arising from the grouping process. Bibliometrics is performed on the citing articles to profile the user characteristics. Citation Mining, this integration of citation bibliometrics and text mining, is applied to the 307 first-generation citing articles of a fundamental physics article on the dynamics of vibrating sand-piles. Most of the 307 citing articles were basic research whose main themes were aligned with those of the cited article. However, about 20% of the citing articles were research or development in other disciplines, or development within the same discipline. The text mining alone identified the intradiscipline applications and extradiscipline impacts and applications; this was confirmed by a detailed reading of the 307 abstracts. The combination of citation bibliometrics and text mining provides a synergy unavailable when either approach is taken independently. Furthermore, text mining is a REQUIREMENT for a feasible comprehensive research impact determination: the integrated multigeneration citation analysis required to determine the broad research impact of highly cited articles will produce thousands, or tens or hundreds of thousands, of citing-article abstracts.
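  The multigeneration citation analysis described above amounts to a breadth-first walk over a citation graph, followed by text mining of the citing articles' abstracts. A minimal sketch under assumed data structures (the citation index, paper ids, and abstracts below are hypothetical; Kostoff et al. worked from SCI records):

      from collections import Counter, deque

      # Hypothetical citation index: paper id -> ids of the papers citing it.
      cited_by = {
          "sandpile-physics-1996": ["a", "b", "c"],
          "a": ["d"], "b": [], "c": ["e"], "d": [], "e": [],
      }
      # Hypothetical citing-article abstracts for the text mining step.
      abstracts = {
          "a": "granular dynamics and avalanches",
          "b": "self-organized criticality in sand",
          "c": "traffic flow and jamming dynamics",
          "d": "", "e": "",
      }

      def citing_generations(root, depth):
          """Collect citing articles generation by generation (BFS)."""
          seen, frontier, generations = {root}, deque([root]), []
          for _ in range(depth):
              nxt = [c for p in frontier for c in cited_by.get(p, []) if c not in seen]
              seen.update(nxt)
              frontier = deque(nxt)
              generations.append(nxt)
          return generations

      gen1, gen2 = citing_generations("sandpile-physics-1996", depth=2)
      # Crude "text mining": term frequencies over first-generation abstracts.
      terms = Counter(w for p in gen1 for w in abstracts[p].split())
      print(gen1, gen2, terms.most_common(3))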
  4. Ku, L.-W.; Chen, H.-H.: Mining opinions from the Web : beyond relevance retrieval (2007) 0.17
    0.16997878 = sum of:
      0.04437901 = product of:
        0.13313703 = sum of:
          0.13313703 = weight(_text_:themes in 605) [ClassicSimilarity], result of:
            0.13313703 = score(doc=605,freq=2.0), product of:
              0.3748681 = queryWeight, product of:
                6.429029 = idf(docFreq=193, maxDocs=44218)
                0.058308665 = queryNorm
              0.35515702 = fieldWeight in 605, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                6.429029 = idf(docFreq=193, maxDocs=44218)
                0.0390625 = fieldNorm(doc=605)
        0.33333334 = coord(1/3)
      0.12559977 = product of:
        0.25119954 = sum of:
          0.25119954 = weight(_text_:mining in 605) [ClassicSimilarity], result of:
            0.25119954 = score(doc=605,freq=12.0), product of:
              0.3290036 = queryWeight, product of:
                5.642448 = idf(docFreq=425, maxDocs=44218)
                0.058308665 = queryNorm
              0.7635161 = fieldWeight in 605, product of:
                3.4641016 = tf(freq=12.0), with freq of:
                  12.0 = termFreq=12.0
                5.642448 = idf(docFreq=425, maxDocs=44218)
                0.0390625 = fieldNorm(doc=605)
        0.5 = coord(1/2)
    
    Abstract
     Documents discussing public affairs, common themes, interesting products, and so on, are reported and distributed on the Web. Positive and negative opinions embedded in documents are useful references and feedback for governments to improve their services, for companies to market their products, and for customers to purchase their objects. Web opinion mining aims to extract, summarize, and track various aspects of subjective information on the Web. Mining subjective information enables traditional information retrieval (IR) systems to retrieve more data from human viewpoints and provide information with finer granularity. Opinion extraction identifies opinion holders, extracts the relevant opinion sentences, and decides their polarities. Opinion summarization recognizes the major events embedded in documents and summarizes the supportive and the nonsupportive evidence. Opinion tracking captures subjective information from various genres and monitors the developments of opinions from spatial and temporal dimensions. To demonstrate and evaluate the proposed opinion mining algorithms, news and bloggers' articles are adopted. Documents in the evaluation corpora are tagged at different granularities, from words and sentences to documents. In the experiments, positive and negative sentiment words and their weights are mined on the basis of Chinese word structures. The f-measure is 73.18% and 63.75% for verbs and nouns, respectively. Utilizing the sentiment words mined together with topical words, we achieve an f-measure of 62.16% at the sentence level and 74.37% at the document level.
    Footnote
     Contribution to a special issue on "Mining Web resources for enhancing information retrieval"
    Theme
    Data Mining
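  At its core, the sentence-level polarity decision described in this abstract aggregates weighted sentiment words. A minimal dictionary-based sketch (the lexicon and weights below are invented; the paper mines its sentiment words and weights from Chinese word structures, which is not reproduced here):

      # Hypothetical sentiment lexicon with weights; the paper derives its
      # sentiment words and weights from Chinese word structures instead.
      sentiment = {"excellent": 1.0, "useful": 0.6, "slow": -0.5, "broken": -1.0}

      def sentence_polarity(sentence):
          """Sum word weights and map the total to a polarity label."""
          score = sum(sentiment.get(w, 0.0) for w in sentence.lower().split())
          return "positive" if score > 0 else "negative" if score < 0 else "neutral"

      print(sentence_polarity("The search is useful but the viewer is broken"))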
  5. Matson, L.D.; Bonski, D.J.: Do digital libraries need librarians? (1997) 0.15
    0.14762418 = product of:
      0.29524836 = sum of:
        0.29524836 = sum of:
          0.23204821 = weight(_text_:mining in 1737) [ClassicSimilarity], result of:
            0.23204821 = score(doc=1737,freq=4.0), product of:
              0.3290036 = queryWeight, product of:
                5.642448 = idf(docFreq=425, maxDocs=44218)
                0.058308665 = queryNorm
              0.705306 = fieldWeight in 1737, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                5.642448 = idf(docFreq=425, maxDocs=44218)
                0.0625 = fieldNorm(doc=1737)
          0.06320015 = weight(_text_:22 in 1737) [ClassicSimilarity], result of:
            0.06320015 = score(doc=1737,freq=2.0), product of:
              0.204187 = queryWeight, product of:
                3.5018296 = idf(docFreq=3622, maxDocs=44218)
                0.058308665 = queryNorm
              0.30952093 = fieldWeight in 1737, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                3.5018296 = idf(docFreq=3622, maxDocs=44218)
                0.0625 = fieldNorm(doc=1737)
      0.5 = coord(1/2)
    
    Abstract
     Defines digital libraries and discusses the effects of new technology on librarians. Examines the differing viewpoints of librarians and information technologists on digital libraries. Describes the development of a digital library at the National Drug Intelligence Center, USA, carried out in collaboration with information technology experts. The system is based on Web-enabled search technology to find information, data visualization and data mining to visualize it, and the use of SGML as an information standard to store it
    Date
    22.11.1998 18:57:22
    Theme
    Data Mining
  6. Polanco, X.; Francois, C.: Data clustering and cluster mapping or visualization in text processing and mining (2000) 0.14
    0.1402729 = sum of:
      0.05325482 = product of:
        0.15976445 = sum of:
          0.15976445 = weight(_text_:themes in 129) [ClassicSimilarity], result of:
            0.15976445 = score(doc=129,freq=2.0), product of:
              0.3748681 = queryWeight, product of:
                6.429029 = idf(docFreq=193, maxDocs=44218)
                0.058308665 = queryNorm
              0.42618844 = fieldWeight in 129, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                6.429029 = idf(docFreq=193, maxDocs=44218)
                0.046875 = fieldNorm(doc=129)
        0.33333334 = coord(1/3)
      0.08701808 = product of:
        0.17403616 = sum of:
          0.17403616 = weight(_text_:mining in 129) [ClassicSimilarity], result of:
            0.17403616 = score(doc=129,freq=4.0), product of:
              0.3290036 = queryWeight, product of:
                5.642448 = idf(docFreq=425, maxDocs=44218)
                0.058308665 = queryNorm
              0.5289795 = fieldWeight in 129, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                5.642448 = idf(docFreq=425, maxDocs=44218)
                0.046875 = fieldNorm(doc=129)
        0.5 = coord(1/2)
    
    Abstract
     The focus of this paper is on the cooperative use of text data clustering and mapping as visualization-based analysis tools. While we outline a generic approach to text processing and mining, we concentrate only on the two middle steps of the process: data clustering and cluster mapping. In the data clustering step, we use the axial k-means (AKM) algorithm: an iterative partitioning, unsupervised, winner-take-all (WTA) method producing overlapping clusters. In the cluster mapping step, we use a nonlinear multilayer perceptron (MLP) with two hidden layers. Finally, the map is proposed as an analysis device rather than a visualization device. It allows the analyst to evaluate the relative positions of clusters, which are indicators of themes induced from the data themselves.
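  The clustering step above can be approximated by k-means on unit-normalized term vectors. A rough sketch using numpy (plain spherical k-means under invented data; the paper's axial k-means additionally produces overlapping clusters, and the MLP mapping step is omitted):

      import numpy as np

      # Toy document-term matrix, rows unit-normalized (cosine geometry).
      rng = np.random.default_rng(0)
      X = rng.random((100, 8))
      X /= np.linalg.norm(X, axis=1, keepdims=True)

      k = 4
      centers = X[rng.choice(len(X), size=k, replace=False)]
      for _ in range(20):
          labels = (X @ centers.T).argmax(axis=1)   # nearest center by cosine
          centers = np.stack([
              X[labels == j].mean(axis=0) if (labels == j).any() else centers[j]
              for j in range(k)
          ])
          centers /= np.linalg.norm(centers, axis=1, keepdims=True)
      print(np.bincount(labels, minlength=k))       # cluster sizes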
  7. Keim, D.A.: Data Mining mit bloßem Auge (2002) 0.14
    0.13758767 = product of:
      0.27517533 = sum of:
        0.27517533 = product of:
          0.55035067 = sum of:
            0.55035067 = weight(_text_:mining in 1086) [ClassicSimilarity], result of:
              0.55035067 = score(doc=1086,freq=10.0), product of:
                0.3290036 = queryWeight, product of:
                  5.642448 = idf(docFreq=425, maxDocs=44218)
                  0.058308665 = queryNorm
                1.67278 = fieldWeight in 1086, product of:
                  3.1622777 = tf(freq=10.0), with freq of:
                    10.0 = termFreq=10.0
                  5.642448 = idf(docFreq=425, maxDocs=44218)
                  0.09375 = fieldNorm(doc=1086)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Abstract
     Visualization, i.e. the most instructive possible graphical presentation of data, is an essential component of data mining
    Footnote
     Part of a special issue on "Data Mining"
    Series
    Data Mining
    Theme
    Data Mining
  8. Miksa, S.D.: ¬The challenges of change : a review of cataloging and classification literature, 2003-2004 (2007) 0.13
    0.13201831 = sum of:
      0.10041824 = product of:
        0.30125472 = sum of:
          0.30125472 = weight(_text_:themes in 266) [ClassicSimilarity], result of:
            0.30125472 = score(doc=266,freq=4.0), product of:
              0.3748681 = queryWeight, product of:
                6.429029 = idf(docFreq=193, maxDocs=44218)
                0.058308665 = queryNorm
              0.8036286 = fieldWeight in 266, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                6.429029 = idf(docFreq=193, maxDocs=44218)
                0.0625 = fieldNorm(doc=266)
        0.33333334 = coord(1/3)
      0.031600077 = product of:
        0.06320015 = sum of:
          0.06320015 = weight(_text_:22 in 266) [ClassicSimilarity], result of:
            0.06320015 = score(doc=266,freq=2.0), product of:
              0.204187 = queryWeight, product of:
                3.5018296 = idf(docFreq=3622, maxDocs=44218)
                0.058308665 = queryNorm
              0.30952093 = fieldWeight in 266, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                3.5018296 = idf(docFreq=3622, maxDocs=44218)
                0.0625 = fieldNorm(doc=266)
        0.5 = coord(1/2)
    
    Abstract
     This paper reviews the enormous changes in cataloging and classification reflected in the literature of 2003 and 2004, and discusses major themes and issues. Traditional cataloging and classification tools have been revamped and new resources have emerged. The most notable themes are: the continuing influence of the Functional Requirements for Bibliographic Records (FRBR); the struggle to understand the ever-broadening concept of an "information entity"; steady developments in metadata-encoding standards; and the globalization of information systems, including multilingual challenges.
    Date
    10. 9.2000 17:38:22
  9. Gnoli, C.: Classifying phenomena : part 4: themes and rhemes (2018) 0.13
    0.1302097 = sum of:
      0.10650964 = product of:
        0.3195289 = sum of:
          0.3195289 = weight(_text_:themes in 4152) [ClassicSimilarity], result of:
            0.3195289 = score(doc=4152,freq=8.0), product of:
              0.3748681 = queryWeight, product of:
                6.429029 = idf(docFreq=193, maxDocs=44218)
                0.058308665 = queryNorm
              0.8523769 = fieldWeight in 4152, product of:
                2.828427 = tf(freq=8.0), with freq of:
                  8.0 = termFreq=8.0
                6.429029 = idf(docFreq=193, maxDocs=44218)
                0.046875 = fieldNorm(doc=4152)
        0.33333334 = coord(1/3)
      0.023700057 = product of:
        0.047400113 = sum of:
          0.047400113 = weight(_text_:22 in 4152) [ClassicSimilarity], result of:
            0.047400113 = score(doc=4152,freq=2.0), product of:
              0.204187 = queryWeight, product of:
                3.5018296 = idf(docFreq=3622, maxDocs=44218)
                0.058308665 = queryNorm
              0.23214069 = fieldWeight in 4152, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                3.5018296 = idf(docFreq=3622, maxDocs=44218)
                0.046875 = fieldNorm(doc=4152)
        0.5 = coord(1/2)
    
    Abstract
     This is the fourth in a series of papers on classification based on phenomena instead of disciplines. Together with the types, levels and facets discussed in the previous parts, themes and rhemes are further structural components of such a classification. In a statement or in a longer document, a base theme and several particular themes can be identified. The base theme should be cited first in a classmark, followed by the particular themes, each with its own facets. In some cases, rhemes can also be expressed, that is, new information provided about a theme, converting an abstract statement ("wolves, affected by cervids") into a claim that something actually occurs ("wolves are affected by cervids"). In the Integrative Levels Classification, rhemes can be expressed by special deictic classes, including those for actual specimens, anaphoras, unknown values, conjunctions and spans, the whole universe, anthropocentric favoured classes, and favoured host classes. These features, together with rules for pronunciation, make a classification of phenomena a true language that may be suitable for many uses.
    Date
    17. 2.2018 18:22:25
  10. Saz, J.T.: Perspectivas en recuperacion y explotacion de informacion electronica : el 'data mining' (1997) 0.13
    0.12559977 = product of:
      0.25119954 = sum of:
        0.25119954 = product of:
          0.5023991 = sum of:
            0.5023991 = weight(_text_:mining in 3723) [ClassicSimilarity], result of:
              0.5023991 = score(doc=3723,freq=12.0), product of:
                0.3290036 = queryWeight, product of:
                  5.642448 = idf(docFreq=425, maxDocs=44218)
                  0.058308665 = queryNorm
                1.5270323 = fieldWeight in 3723, product of:
                  3.4641016 = tf(freq=12.0), with freq of:
                    12.0 = termFreq=12.0
                  5.642448 = idf(docFreq=425, maxDocs=44218)
                  0.078125 = fieldNorm(doc=3723)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Abstract
     Presents the concept and the techniques identified by the term 'data mining'. Explains the principles and phases of developing a data mining process, and the main types of data mining tools.
    Footnote
     Translation of the title: Perspectives on the retrieval and exploitation of electronic information: data mining
    Theme
    Data Mining
  11. Wrobel, S.: Lern- und Entdeckungsverfahren (2002) 0.12
    0.12306215 = product of:
      0.2461243 = sum of:
        0.2461243 = product of:
          0.4922486 = sum of:
            0.4922486 = weight(_text_:mining in 1105) [ClassicSimilarity], result of:
              0.4922486 = score(doc=1105,freq=8.0), product of:
                0.3290036 = queryWeight, product of:
                  5.642448 = idf(docFreq=425, maxDocs=44218)
                  0.058308665 = queryNorm
                1.4961799 = fieldWeight in 1105, product of:
                  2.828427 = tf(freq=8.0), with freq of:
                    8.0 = termFreq=8.0
                  5.642448 = idf(docFreq=425, maxDocs=44218)
                  0.09375 = fieldNorm(doc=1105)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Abstract
     Spotting fraudulent credit card purchases, exceptionally capable basketball players, and environmentally conscious juice sellers: data mining methods learn the essentials on their own
    Footnote
     Part of a special issue on "Data Mining"
    Series
    Data Mining
    Theme
    Data Mining
  12. Peters, G.; Gaese, V.: ¬Das DocCat-System in der Textdokumentation von G+J (2003) 0.12
    0.11627986 = product of:
      0.23255973 = sum of:
        0.23255973 = sum of:
          0.20095965 = weight(_text_:mining in 1507) [ClassicSimilarity], result of:
            0.20095965 = score(doc=1507,freq=12.0), product of:
              0.3290036 = queryWeight, product of:
                5.642448 = idf(docFreq=425, maxDocs=44218)
                0.058308665 = queryNorm
              0.6108129 = fieldWeight in 1507, product of:
                3.4641016 = tf(freq=12.0), with freq of:
                  12.0 = termFreq=12.0
                5.642448 = idf(docFreq=425, maxDocs=44218)
                0.03125 = fieldNorm(doc=1507)
          0.031600077 = weight(_text_:22 in 1507) [ClassicSimilarity], result of:
            0.031600077 = score(doc=1507,freq=2.0), product of:
              0.204187 = queryWeight, product of:
                3.5018296 = idf(docFreq=3622, maxDocs=44218)
                0.058308665 = queryNorm
              0.15476047 = fieldWeight in 1507, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                3.5018296 = idf(docFreq=3622, maxDocs=44218)
                0.03125 = fieldNorm(doc=1507)
      0.5 = coord(1/2)
    
    Abstract
     We will first present the foundations of IBM's text mining system, and then describe our project in more breadth and detail, since that is the part we know from the inside. So there are two parts: Heidelberg and Hamburg. Once more on the technology: text mining is a technology developed by IBM, assembled for us in a particular configuration and programming. For a long time the project was called DocText Miner; for some time now, at IBM's suggestion, it has been called DocCat, an abbreviation of Document Categoriser, which is nicely descriptive. We begin with text mining as developed by IBM in Heidelberg. There, automatic indexing is understood as one instance, i.e. one part, of text mining. The problems involved are pointed out; text mining is essentially a method for structuring and searching large document collections, for extracting information and, this being the more ambitious claim, implicit relationships. Whether the latter holds remains to be seen. IBM's approach is quantitative, empirical, approximative and fast, that has to be said. The goal, and this was crucial for our project, is not to understand the text; rather, the result of these methods is what is called, in the jargon, a bundle of words or a bag of words: a set of meaning-bearing terms extracted from a text by means of algorithms, i.e. essentially by computational operations. There are quite a few preliminary linguistic studies, and a little linguistics is involved, but it is not the foundation of the whole approach. What IBM did for us is the annotation of press texts for our press database. For those who do not know it yet: Gruner + Jahr runs a text documentation department that has maintained a database since the early 1970s; it currently holds about 6.5 million documents, of which slightly over 1 million are full texts from 1993 onwards. For a long time the principle was that we indexed the documents stored in the database with descriptors, and we carried this principle forward, in a slimmed-down form, when full text was introduced. These 6.5 million documents are also accompanied by roughly 10 million facsimile pages, because we still archive the facsimiles as standard.
    Date
    22. 4.2003 11:45:36
    Theme
    Data Mining
  13. Tunbridge, N.: Semiology put to data mining (1999) 0.12
    0.11602411 = product of:
      0.23204821 = sum of:
        0.23204821 = product of:
          0.46409643 = sum of:
            0.46409643 = weight(_text_:mining in 6782) [ClassicSimilarity], result of:
              0.46409643 = score(doc=6782,freq=4.0), product of:
                0.3290036 = queryWeight, product of:
                  5.642448 = idf(docFreq=425, maxDocs=44218)
                  0.058308665 = queryNorm
                1.410612 = fieldWeight in 6782, product of:
                  2.0 = tf(freq=4.0), with freq of:
                    4.0 = termFreq=4.0
                  5.642448 = idf(docFreq=425, maxDocs=44218)
                  0.125 = fieldNorm(doc=6782)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Theme
    Data Mining
  14. Grivel, L.; Mutschke, P.; Polanco, X.: Thematic mapping on bibliographic databases by cluster analysis : a description of the SDOC environment with SOLIS (1995) 0.12
    0.11551604 = sum of:
      0.08786597 = product of:
        0.2635979 = sum of:
          0.2635979 = weight(_text_:themes in 1900) [ClassicSimilarity], result of:
            0.2635979 = score(doc=1900,freq=4.0), product of:
              0.3748681 = queryWeight, product of:
                6.429029 = idf(docFreq=193, maxDocs=44218)
                0.058308665 = queryNorm
              0.70317507 = fieldWeight in 1900, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                6.429029 = idf(docFreq=193, maxDocs=44218)
                0.0546875 = fieldNorm(doc=1900)
        0.33333334 = coord(1/3)
      0.027650066 = product of:
        0.05530013 = sum of:
          0.05530013 = weight(_text_:22 in 1900) [ClassicSimilarity], result of:
            0.05530013 = score(doc=1900,freq=2.0), product of:
              0.204187 = queryWeight, product of:
                3.5018296 = idf(docFreq=3622, maxDocs=44218)
                0.058308665 = queryNorm
              0.2708308 = fieldWeight in 1900, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                3.5018296 = idf(docFreq=3622, maxDocs=44218)
                0.0546875 = fieldNorm(doc=1900)
        0.5 = coord(1/2)
    
    Abstract
     The paper presents a coword-analysis-based system called SDOC, which is able to support the intellectual work of an end user searching for information in a bibliographic database. This is done by presenting the database's thematic structure as a map of keyword clusters (themes) on a graphical user interface. These mapping facilities are demonstrated for the research field of social history, using a set of documents from the social science literature database SOLIS. Besides the traditional way of analysing a coword map as a strategic diagram, the notion of cluster relationship analysis is introduced, which provides an adequate interpretation of the links between themes
    Source
    Knowledge organization. 22(1995) no.2, S.70-77
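  The coword analysis underlying SDOC starts from keyword co-occurrence counts, typically normalized by an association measure such as the equivalence index. A toy sketch with invented records (SDOC's actual association measure and clustering thresholds are not given in this abstract):

      from collections import Counter
      from itertools import combinations

      # Invented keyword sets for a handful of SOLIS-like records.
      docs = [
          {"social history", "labour", "migration"},
          {"social history", "migration", "family"},
          {"labour", "trade unions"},
      ]

      freq = Counter(k for kws in docs for k in kws)    # keyword frequencies
      cooc = Counter()                                  # pair co-occurrences
      for kws in docs:
          cooc.update(combinations(sorted(kws), 2))

      # Equivalence index E(a,b) = c_ab^2 / (c_a * c_b), a common coword measure.
      assoc = {(a, b): c * c / (freq[a] * freq[b]) for (a, b), c in cooc.items()}
      for pair, e in sorted(assoc.items(), key=lambda x: -x[1]):
          print(pair, round(e, 2))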
  15. Lin, X.; Li, J.; Zhou, X.: Theme creation for digital collections (2008) 0.12
    0.11551604 = sum of:
      0.08786597 = product of:
        0.2635979 = sum of:
          0.2635979 = weight(_text_:themes in 2635) [ClassicSimilarity], result of:
            0.2635979 = score(doc=2635,freq=4.0), product of:
              0.3748681 = queryWeight, product of:
                6.429029 = idf(docFreq=193, maxDocs=44218)
                0.058308665 = queryNorm
              0.70317507 = fieldWeight in 2635, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                6.429029 = idf(docFreq=193, maxDocs=44218)
                0.0546875 = fieldNorm(doc=2635)
        0.33333334 = coord(1/3)
      0.027650066 = product of:
        0.05530013 = sum of:
          0.05530013 = weight(_text_:22 in 2635) [ClassicSimilarity], result of:
            0.05530013 = score(doc=2635,freq=2.0), product of:
              0.204187 = queryWeight, product of:
                3.5018296 = idf(docFreq=3622, maxDocs=44218)
                0.058308665 = queryNorm
              0.2708308 = fieldWeight in 2635, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                3.5018296 = idf(docFreq=3622, maxDocs=44218)
                0.0546875 = fieldNorm(doc=2635)
        0.5 = coord(1/2)
    
    Abstract
     This paper presents an approach to integrating multiple sources of semantics for creating metadata. A new framework is proposed to define topics and themes with both manually and automatically generated terms. The automatically generated terms include terms from a semantic analysis of the collections and terms from previous users' queries. An interface is developed to facilitate the creation and use of such topics and themes for metadata creation. The framework and the interface promote human-computer collaboration in metadata creation. Several principles underlying this approach are also discussed.
    Source
    Metadata for semantic and social applications : proceedings of the International Conference on Dublin Core and Metadata Applications, Berlin, 22 - 26 September 2008, DC 2008: Berlin, Germany / ed. by Jane Greenberg and Wolfgang Klas
  16. Zhang, W.; Yoshida, T.; Tang, X.: ¬A comparative study of TF*IDF, LSI and multi-words for text classification (2011) 0.11
    0.114785895 = sum of:
      0.05325482 = product of:
        0.15976445 = sum of:
          0.15976445 = weight(_text_:themes in 1165) [ClassicSimilarity], result of:
            0.15976445 = score(doc=1165,freq=2.0), product of:
              0.3748681 = queryWeight, product of:
                6.429029 = idf(docFreq=193, maxDocs=44218)
                0.058308665 = queryNorm
              0.42618844 = fieldWeight in 1165, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                6.429029 = idf(docFreq=193, maxDocs=44218)
                0.046875 = fieldNorm(doc=1165)
        0.33333334 = coord(1/3)
      0.061531074 = product of:
        0.12306215 = sum of:
          0.12306215 = weight(_text_:mining in 1165) [ClassicSimilarity], result of:
            0.12306215 = score(doc=1165,freq=2.0), product of:
              0.3290036 = queryWeight, product of:
                5.642448 = idf(docFreq=425, maxDocs=44218)
                0.058308665 = queryNorm
              0.37404498 = fieldWeight in 1165, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                5.642448 = idf(docFreq=425, maxDocs=44218)
                0.046875 = fieldNorm(doc=1165)
        0.5 = coord(1/2)
    
    Abstract
     One of the main themes in text mining is text representation, which is fundamental and indispensable for text-based intelligent information processing. Generally, text representation includes two tasks: indexing and weighting. This paper comparatively studies TF*IDF, LSI and multi-words for text representation. We used a Chinese and an English document collection to evaluate the three methods in information retrieval and text categorization. Experimental results demonstrate that in text categorization, LSI performs better than the other methods on both document collections. LSI also produced the best performance in retrieving English documents. This outcome shows that LSI has both favorable semantic and statistical qualities, contradicting the claim that LSI cannot produce discriminative power for indexing.
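  A compact illustration of the TF*IDF weighting compared in this paper, on a toy corpus (illustration only; the study's experiments use full Chinese and English test collections, and the LSI and multi-word methods are not shown):

      import math
      from collections import Counter

      # Toy corpus; each document is a list of index terms.
      docs = [
          "text representation indexing weighting".split(),
          "latent semantic indexing of text".split(),
          "multi word terms for text classification".split(),
      ]
      N = len(docs)
      df = Counter(t for d in docs for t in set(d))     # document frequencies

      def tfidf(doc):
          """TF * IDF with relative term frequency and log(N / df)."""
          tf = Counter(doc)
          return {t: (n / len(doc)) * math.log(N / df[t]) for t, n in tf.items()}

      print(tfidf(docs[0]))   # "text" occurs in every document, so its weight is 0.0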
  17. Spertus, E.: ParaSite : mining structural information on the Web (1997) 0.11
    0.1136415 = product of:
      0.227283 = sum of:
        0.227283 = sum of:
          0.16408285 = weight(_text_:mining in 2740) [ClassicSimilarity], result of:
            0.16408285 = score(doc=2740,freq=2.0), product of:
              0.3290036 = queryWeight, product of:
                5.642448 = idf(docFreq=425, maxDocs=44218)
                0.058308665 = queryNorm
              0.49872664 = fieldWeight in 2740, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                5.642448 = idf(docFreq=425, maxDocs=44218)
                0.0625 = fieldNorm(doc=2740)
          0.06320015 = weight(_text_:22 in 2740) [ClassicSimilarity], result of:
            0.06320015 = score(doc=2740,freq=2.0), product of:
              0.204187 = queryWeight, product of:
                3.5018296 = idf(docFreq=3622, maxDocs=44218)
                0.058308665 = queryNorm
              0.30952093 = fieldWeight in 2740, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                3.5018296 = idf(docFreq=3622, maxDocs=44218)
                0.0625 = fieldNorm(doc=2740)
      0.5 = coord(1/2)
    
    Date
    1. 8.1996 22:08:06
  18. Amir, A.; Feldman, R.; Kashi, R.: ¬A new and versatile method for association generation (1997) 0.11
    0.1136415 = product of:
      0.227283 = sum of:
        0.227283 = sum of:
          0.16408285 = weight(_text_:mining in 1270) [ClassicSimilarity], result of:
            0.16408285 = score(doc=1270,freq=2.0), product of:
              0.3290036 = queryWeight, product of:
                5.642448 = idf(docFreq=425, maxDocs=44218)
                0.058308665 = queryNorm
              0.49872664 = fieldWeight in 1270, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                5.642448 = idf(docFreq=425, maxDocs=44218)
                0.0625 = fieldNorm(doc=1270)
          0.06320015 = weight(_text_:22 in 1270) [ClassicSimilarity], result of:
            0.06320015 = score(doc=1270,freq=2.0), product of:
              0.204187 = queryWeight, product of:
                3.5018296 = idf(docFreq=3622, maxDocs=44218)
                0.058308665 = queryNorm
              0.30952093 = fieldWeight in 1270, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                3.5018296 = idf(docFreq=3622, maxDocs=44218)
                0.0625 = fieldNorm(doc=1270)
      0.5 = coord(1/2)
    
    Source
    Information systems. 22(1997) nos.5/6, S.333-347
    Theme
    Data Mining
  19. Lawson, M.: Automatic extraction of citations from the text of English-language patents : an example of template mining (1996) 0.11
    0.11071814 = product of:
      0.22143628 = sum of:
        0.22143628 = sum of:
          0.17403616 = weight(_text_:mining in 2654) [ClassicSimilarity], result of:
            0.17403616 = score(doc=2654,freq=4.0), product of:
              0.3290036 = queryWeight, product of:
                5.642448 = idf(docFreq=425, maxDocs=44218)
                0.058308665 = queryNorm
              0.5289795 = fieldWeight in 2654, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                5.642448 = idf(docFreq=425, maxDocs=44218)
                0.046875 = fieldNorm(doc=2654)
          0.047400113 = weight(_text_:22 in 2654) [ClassicSimilarity], result of:
            0.047400113 = score(doc=2654,freq=2.0), product of:
              0.204187 = queryWeight, product of:
                3.5018296 = idf(docFreq=3622, maxDocs=44218)
                0.058308665 = queryNorm
              0.23214069 = fieldWeight in 2654, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                3.5018296 = idf(docFreq=3622, maxDocs=44218)
                0.046875 = fieldNorm(doc=2654)
      0.5 = coord(1/2)
    
    Abstract
     Describes and evaluates methods for automatically isolating and extracting bibliographic references from the full texts of patents, designed to facilitate the work of patent examiners who currently perform this task manually. These references include citations both to patents and to other bibliographic sources. Notes that patents are unusual as citing documents in that the citations occur mainly in the body of the text, rather than as footnotes or in separate sections. Describes the natural language processing technique of template mining, used to extract data directly from the text where either the data or the text surrounding the data form recognizable patterns. When text matches a template, the system extracts data according to the instructions associated with that template. Examines the sublanguages of citations and the development of templates for the extraction of citations to patents. Reports the results of running two reference extraction systems against a sample of 100 European Patent Office patent documents, with recall and precision data for patent and non-patent citations, and concludes with suggestions for future improvements
    Source
    Journal of information science. 22(1996) no.6, S.423-436
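  A template in this sense is a recognizable surface pattern plus extraction instructions, which in the simplest case reduces to a regular expression with capture groups. An illustrative sketch for one hypothetical citation pattern (the paper's template set for European Patent Office documents is not reproduced here):

      import re

      # One illustrative template: a US patent citation and its number.
      template = re.compile(r"U\.S\.\s+Pat(?:ent)?\.?\s+No\.?\s+(\d{1,3}(?:,\d{3})*)")

      body = ("... a sensor as disclosed in U.S. Pat. No. 5,146,552, and a "
              "related method shown in U.S. Patent No. 4,839,853 ...")
      for match in template.finditer(body):
          print("cited patent:", match.group(1))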
  20. Li, D.: Knowledge representation and discovery based on linguistic atoms (1998) 0.11
    0.11071814 = product of:
      0.22143628 = sum of:
        0.22143628 = sum of:
          0.17403616 = weight(_text_:mining in 3836) [ClassicSimilarity], result of:
            0.17403616 = score(doc=3836,freq=4.0), product of:
              0.3290036 = queryWeight, product of:
                5.642448 = idf(docFreq=425, maxDocs=44218)
                0.058308665 = queryNorm
              0.5289795 = fieldWeight in 3836, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                5.642448 = idf(docFreq=425, maxDocs=44218)
                0.046875 = fieldNorm(doc=3836)
          0.047400113 = weight(_text_:22 in 3836) [ClassicSimilarity], result of:
            0.047400113 = score(doc=3836,freq=2.0), product of:
              0.204187 = queryWeight, product of:
                3.5018296 = idf(docFreq=3622, maxDocs=44218)
                0.058308665 = queryNorm
              0.23214069 = fieldWeight in 3836, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                3.5018296 = idf(docFreq=3622, maxDocs=44218)
                0.046875 = fieldNorm(doc=3836)
      0.5 = coord(1/2)
    
    Abstract
     Describes a new concept of linguistic atoms with three digital characteristics: expected value Ex, entropy En, and deviation D. This mathematical description effectively integrates the fuzziness and randomness of linguistic terms in a unified way. Develops a method of knowledge representation in KDD which bridges the gap between quantitative and qualitative knowledge, making the mapping between quantities and qualities much easier and interchangeable. In order to discover generalised knowledge from a database, virtual linguistic terms and cloud transfer are used for the automatic generation of concept hierarchies for attributes. Predictive data mining with the cloud model is given for implementation. Illustrates the advantages of this linguistic model in KDD
    Footnote
     Contribution to a special issue of selected papers from the Pacific-Asia Conference on Knowledge Discovery and Data Mining (PAKDD'97), held in Singapore, 22-23 Feb 1997
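  The linguistic atom (Ex, En, D) suggests a generator in the style of the normal cloud model. A speculative sketch, assuming D plays the role of the dispersion of the entropy En (the abstract does not spell out how D enters the computation):

      import math
      import random

      def cloud_drops(ex, en, d, n=5, seed=42):
          """Generate (drop, membership) pairs for a linguistic atom (Ex, En, D)."""
          rng = random.Random(seed)
          drops = []
          for _ in range(n):
              en_i = rng.gauss(en, d)            # perturbed entropy (assumption: D disperses En)
              x = rng.gauss(ex, abs(en_i))       # one cloud drop on the number line
              mu = math.exp(-((x - ex) ** 2) / (2 * en_i ** 2))  # membership degree
              drops.append((round(x, 2), round(mu, 3)))
          return drops

      # e.g. the qualitative concept "around 20" with Ex=20, En=2, D=0.3
      print(cloud_drops(20.0, 2.0, 0.3))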
