Search (2 results, page 1 of 1)

Wu, Y.-f.B.; Li, Q.; Bot, R.S.; Chen, X.: Finding nuggets in documents : a machine learning approach (2006) 0.07
```
0.07102594 = product of:
  0.14205188 = sum of:
    0.14205188 = sum of:
      0.10255179 = weight(_text_:mining in 5290) [ClassicSimilarity], result of:
        0.10255179 = score(doc=5290,freq=2.0), product of:
          0.3290036 = queryWeight, product of:
            5.642448 = idf(docFreq=425, maxDocs=44218)
            0.058308665 = queryNorm
          0.31170416 = fieldWeight in 5290, product of:
            1.4142135 = tf(freq=2.0), with freq of:
              2.0 = termFreq=2.0
            5.642448 = idf(docFreq=425, maxDocs=44218)
            0.0390625 = fieldNorm(doc=5290)
      0.039500095 = weight(_text_:22 in 5290) [ClassicSimilarity], result of:
        0.039500095 = score(doc=5290,freq=2.0), product of:
          0.204187 = queryWeight, product of:
            3.5018296 = idf(docFreq=3622, maxDocs=44218)
            0.058308665 = queryNorm
          0.19345059 = fieldWeight in 5290, product of:
            1.4142135 = tf(freq=2.0), with freq of:
              2.0 = termFreq=2.0
            3.5018296 = idf(docFreq=3622, maxDocs=44218)
            0.0390625 = fieldNorm(doc=5290)
  0.5 = coord(1/2)
```
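The explain tree above can be reproduced by hand. The following is a minimal sketch, assuming the standard Lucene ClassicSimilarity formulas (`idf = 1 + ln(maxDocs / (docFreq + 1))`, `tf = sqrt(freq)`); the function names are illustrative and not part of the search system, and the numeric inputs are copied from the explain output.

```python
import math

def idf(doc_freq, max_docs):
    # ClassicSimilarity inverse document frequency
    return 1.0 + math.log(max_docs / (doc_freq + 1))

def term_score(freq, doc_freq, max_docs, query_norm, field_norm):
    # score = queryWeight * fieldWeight, as in the explain tree
    tf = math.sqrt(freq)
    term_idf = idf(doc_freq, max_docs)
    query_weight = term_idf * query_norm
    field_weight = tf * term_idf * field_norm
    return query_weight * field_weight

# Values copied from the explain output for doc 5290
QUERY_NORM = 0.058308665
MAX_DOCS = 44218
FIELD_NORM = 0.0390625

mining = term_score(2.0, 425, MAX_DOCS, QUERY_NORM, FIELD_NORM)
twenty_two = term_score(2.0, 3622, MAX_DOCS, QUERY_NORM, FIELD_NORM)

# coord(1/2) halves the sum of the two term weights
total = (mining + twenty_two) * 0.5
print(round(mining, 6), round(twenty_two, 6), round(total, 6))
```

Small differences in the last digits are expected, because Lucene computes these values in single precision.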
Abstract

Document keyphrases provide a concise summary of a document's content, offering semantic metadata about it. They can be used in many applications related to knowledge management and text mining, such as automatic text summarization, development of search engines, document clustering, document classification, thesaurus construction, and browsing interfaces. Because only a small portion of documents have keyphrases assigned by authors, and it is time-consuming and costly to manually assign keyphrases to documents, it is necessary to develop an algorithm to automatically generate keyphrases for documents. This paper describes a Keyphrase Identification Program (KIP), which extracts document keyphrases by using prior positive samples of human-identified phrases to assign weights to the candidate keyphrases. The logic of our algorithm is: the more keywords a candidate keyphrase contains and the more significant these keywords are, the more likely this candidate phrase is a keyphrase. KIP's learning function can enrich the glossary database by automatically adding newly identified keyphrases to the database. KIP's personalization feature will let the user build a glossary database specifically suited to his/her area of interest. The evaluation results show that KIP performs better than the systems it was compared with and that the learning function is effective.

Date

22. 7.2006 17:25:48

Li, Q.; Chen, Y.P.; Myaeng, S.-H.; Jin, Y.; Kang, B.-Y.: Concept unification of terms in different languages via web mining for Information Retrieval (2009) 0.04
```
0.036257535 = product of:
  0.07251507 = sum of:
    0.07251507 = product of:
      0.14503014 = sum of:
        0.14503014 = weight(_text_:mining in 4215) [ClassicSimilarity], result of:
          0.14503014 = score(doc=4215,freq=4.0), product of:
            0.3290036 = queryWeight, product of:
              5.642448 = idf(docFreq=425, maxDocs=44218)
              0.058308665 = queryNorm
            0.44081625 = fieldWeight in 4215, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              5.642448 = idf(docFreq=425, maxDocs=44218)
              0.0390625 = fieldNorm(doc=4215)
      0.5 = coord(1/2)
  0.5 = coord(1/2)
```
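Here only one query term matches, and the result is scaled by two nested `coord(1/2)` factors. A minimal sketch of that arithmetic, again assuming the standard ClassicSimilarity formulas, with illustrative variable names and inputs copied from the explain output:

```python
import math

# Values copied from the explain output for doc 4215
query_norm = 0.058308665
field_norm = 0.0390625
freq, doc_freq, max_docs = 4.0, 425, 44218

idf = 1.0 + math.log(max_docs / (doc_freq + 1))  # ClassicSimilarity idf
tf = math.sqrt(freq)                             # ClassicSimilarity tf

query_weight = idf * query_norm
field_weight = tf * idf * field_norm
mining_weight = query_weight * field_weight

# Two nested coord(1/2) factors: one in the inner clause, one at the top level
score = mining_weight * 0.5 * 0.5
print(round(mining_weight, 6), round(score, 6))
```

The higher term frequency (4 instead of 2) raises the `mining` weight relative to the first record, but the extra coord factor leaves the final score lower.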
Abstract

For historical and cultural reasons, English phrases, especially proper nouns and new words, frequently appear in Web pages written primarily in East Asian languages such as Chinese, Korean, and Japanese. Although such English terms and their equivalents in these East Asian languages refer to the same concept, they are often erroneously treated as independent index units in traditional Information Retrieval (IR). This paper describes the degree to which the problem arises in IR and proposes a novel technique to solve it. Our method first extracts English terms from native Web documents in an East Asian language, and then unifies the extracted terms and their equivalents in the native language as one index unit. For Cross-Language Information Retrieval (CLIR), one of the major hindrances to achieving retrieval performance at the level of Mono-Lingual Information Retrieval (MLIR) is the translation of terms in search queries that cannot be found in a bilingual dictionary. The Web mining approach proposed in this paper for concept unification of terms in different languages can also be applied to solve this well-known challenge in CLIR. Experimental results based on NTCIR and KT-Set test collections show that the high translation precision of our approach greatly improves the performance of both Mono-Lingual and Cross-Language Information Retrieval.
