Document (#31931)

Editor
Berry, M.W.
Title
Survey of text mining : clustering, classification, and retrieval
Imprint
New York, NY : Springer
Year
2004
Pages
XVII, 244 p.
Isbn
0-387-95563-1
Abstract
Extracting content from text continues to be an important research problem for information processing and management. Approaches to capture the semantics of text-based document collections may be based on Bayesian models, probability theory, vector space models, statistical models, or even graph theory. As the volume of digitized textual media continues to grow, so does the need for designing robust, scalable indexing and search strategies (software) to meet a variety of user needs. Knowledge extraction or creation from text requires systematic yet reliable processing that can be codified and adapted for changing needs and environments. This book will draw upon experts in both academia and industry to recommend practical approaches to the purification, indexing, and mining of textual information. It will address document identification, clustering and categorizing documents, cleaning text, and visualizing semantic models of text.
Theme
Data Mining
LCSH
Data mining ; Information retrieval
Data mining / Congresses (GBV)
Cluster analysis / Congresses (GBV)
Discriminant analysis / Congresses (GBV)
RSWK
Text Mining / Aufsatzsammlung
BK
54.72 / Künstliche Intelligenz
06.74 / Informationssysteme
DDC
005.741
006.3
LCC
QA76.9.D343
RVK
ST 270 Informatik / Monographien / Software und -entwicklung / Datenbanken, Datenbanksysteme, Data base management, Informationssysteme
ST 302 Informatik / Monographien / Künstliche Intelligenz / Expertensysteme; Wissensbasierte Systeme

Similar documents (content)

  1. Metadata for semantic and social applications : proceedings of the International Conference on Dublin Core and Metadata Applications, Berlin, 22 - 26 September 2008, DC 2008: Berlin, Germany (2008) 0.51
    0.5063276 = sum of:
      0.5063276 = product of:
        2.5316381 = sum of:
          0.024260096 = weight(abstract_txt:will in 4669) [ClassicSimilarity], result of:
            0.024260096 = score(doc=4669,freq=2.0), product of:
              0.07090596 = queryWeight, product of:
                1.036374 = boost
                3.8709252 = idf(docFreq=2376, maxDocs=41962)
                0.017674677 = queryNorm
              0.34214467 = fieldWeight in 4669, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                3.8709252 = idf(docFreq=2376, maxDocs=41962)
                0.0625 = fieldNorm(doc=4669)
          0.0689068 = weight(abstract_txt:scalable in 4669) [ClassicSimilarity], result of:
            0.0689068 = score(doc=4669,freq=1.0), product of:
              0.14220966 = queryWeight, product of:
                1.0378263 = boost
                7.7526994 = idf(docFreq=48, maxDocs=41962)
                0.017674677 = queryNorm
              0.4845437 = fieldWeight in 4669, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.7526994 = idf(docFreq=48, maxDocs=41962)
                0.0625 = fieldNorm(doc=4669)
          0.026076982 = weight(abstract_txt:needs in 4669) [ClassicSimilarity], result of:
            0.026076982 = score(doc=4669,freq=1.0), product of:
              0.09374237 = queryWeight, product of:
                1.1916345 = boost
                4.450834 = idf(docFreq=1330, maxDocs=41962)
                0.017674677 = queryNorm
              0.2781771 = fieldWeight in 4669, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.450834 = idf(docFreq=1330, maxDocs=41962)
                0.0625 = fieldNorm(doc=4669)
          0.062706135 = weight(abstract_txt:models in 4669) [ClassicSimilarity], result of:
            0.062706135 = score(doc=4669,freq=1.0), product of:
              0.21198952 = queryWeight, product of:
                2.5342374 = boost
                4.7327724 = idf(docFreq=1003, maxDocs=41962)
                0.017674677 = queryNorm
              0.29579827 = fieldWeight in 4669, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.7327724 = idf(docFreq=1003, maxDocs=41962)
                0.0625 = fieldNorm(doc=4669)
          2.349688 = weight(subject_txt:congresses in 4669) [ClassicSimilarity], result of:
            2.349688 = score(doc=4669,freq=2.0), product of:
              0.4677993 = queryWeight, product of:
                3.260247 = boost
                8.118159 = idf(docFreq=33, maxDocs=41962)
                0.017674677 = queryNorm
              5.022855 = fieldWeight in 4669, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                8.118159 = idf(docFreq=33, maxDocs=41962)
                0.4375 = fieldNorm(doc=4669)
        0.2 = coord(5/25)
    
  2. Emerging frameworks and methods : Proceedings of the Fourth International Conference on the Conceptions of Library and Information Science (CoLIS4), Seattle, WA, July 21 - 25, 2002 (2002) 0.30
    0.30268452 = sum of:
      0.30268452 = product of:
        1.5134225 = sum of:
          0.028862534 = weight(abstract_txt:theory in 2056) [ClassicSimilarity], result of:
            0.028862534 = score(doc=2056,freq=1.0), product of:
              0.10030457 = queryWeight, product of:
                1.2326378 = boost
                4.6039834 = idf(docFreq=1141, maxDocs=41962)
                0.017674677 = queryNorm
              0.28774896 = fieldWeight in 2056, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.6039834 = idf(docFreq=1141, maxDocs=41962)
                0.0625 = fieldNorm(doc=2056)
          0.043277428 = weight(abstract_txt:approaches in 2056) [ClassicSimilarity], result of:
            0.043277428 = score(doc=2056,freq=2.0), product of:
              0.10429464 = queryWeight, product of:
                1.2569156 = boost
                4.694663 = idf(docFreq=1042, maxDocs=41962)
                0.017674677 = queryNorm
              0.4149535 = fieldWeight in 2056, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.694663 = idf(docFreq=1042, maxDocs=41962)
                0.0625 = fieldNorm(doc=2056)
          0.035897452 = weight(abstract_txt:processing in 2056) [ClassicSimilarity], result of:
            0.035897452 = score(doc=2056,freq=1.0), product of:
              0.11600413 = queryWeight, product of:
                1.325598 = boost
                4.951196 = idf(docFreq=806, maxDocs=41962)
                0.017674677 = queryNorm
              0.30944976 = fieldWeight in 2056, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.951196 = idf(docFreq=806, maxDocs=41962)
                0.0625 = fieldNorm(doc=2056)
          0.062706135 = weight(abstract_txt:models in 2056) [ClassicSimilarity], result of:
            0.062706135 = score(doc=2056,freq=1.0), product of:
              0.21198952 = queryWeight, product of:
                2.5342374 = boost
                4.7327724 = idf(docFreq=1003, maxDocs=41962)
                0.017674677 = queryNorm
              0.29579827 = fieldWeight in 2056, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.7327724 = idf(docFreq=1003, maxDocs=41962)
                0.0625 = fieldNorm(doc=2056)
          1.3426789 = weight(subject_txt:congresses in 2056) [ClassicSimilarity], result of:
            1.3426789 = score(doc=2056,freq=2.0), product of:
              0.4677993 = queryWeight, product of:
                3.260247 = boost
                8.118159 = idf(docFreq=33, maxDocs=41962)
                0.017674677 = queryNorm
              2.8702028 = fieldWeight in 2056, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                8.118159 = idf(docFreq=33, maxDocs=41962)
                0.25 = fieldNorm(doc=2056)
        0.2 = coord(5/25)
    
  3. TREC: experiment and evaluation in information retrieval (2005) 0.19
    0.19170737 = sum of:
      0.19170737 = product of:
        1.1981711 = sum of:
          0.01715448 = weight(abstract_txt:will in 1762) [ClassicSimilarity], result of:
            0.01715448 = score(doc=1762,freq=1.0), product of:
              0.07090596 = queryWeight, product of:
                1.036374 = boost
                3.8709252 = idf(docFreq=2376, maxDocs=41962)
                0.017674677 = queryNorm
              0.24193282 = fieldWeight in 1762, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.8709252 = idf(docFreq=2376, maxDocs=41962)
                0.0625 = fieldNorm(doc=1762)
          0.035897452 = weight(abstract_txt:processing in 1762) [ClassicSimilarity], result of:
            0.035897452 = score(doc=1762,freq=1.0), product of:
              0.11600413 = queryWeight, product of:
                1.325598 = boost
                4.951196 = idf(docFreq=806, maxDocs=41962)
                0.017674677 = queryNorm
              0.30944976 = fieldWeight in 1762, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.951196 = idf(docFreq=806, maxDocs=41962)
                0.0625 = fieldNorm(doc=1762)
          1.0070091 = weight(subject_txt:congresses in 1762) [ClassicSimilarity], result of:
            1.0070091 = score(doc=1762,freq=2.0), product of:
              0.4677993 = queryWeight, product of:
                3.260247 = boost
                8.118159 = idf(docFreq=33, maxDocs=41962)
                0.017674677 = queryNorm
              2.152652 = fieldWeight in 1762, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                8.118159 = idf(docFreq=33, maxDocs=41962)
                0.1875 = fieldNorm(doc=1762)
          0.13811009 = weight(abstract_txt:text in 1762) [ClassicSimilarity], result of:
            0.13811009 = score(doc=1762,freq=4.0), product of:
              0.27242732 = queryWeight, product of:
                3.800445 = boost
                4.05569 = idf(docFreq=1975, maxDocs=41962)
                0.017674677 = queryNorm
              0.5069612 = fieldWeight in 1762, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                4.05569 = idf(docFreq=1975, maxDocs=41962)
                0.0625 = fieldNorm(doc=1762)
        0.16 = coord(4/25)
    
  4. Research and advanced technology for digital libraries : 8th European conference, ECDL 2004, Bath, UK, September 12-17, 2004 : proceedings (2004) 0.19
    0.18929368 = sum of:
      0.18929368 = product of:
        1.1830856 = sum of:
          0.03611454 = weight(abstract_txt:indexing in 4428) [ClassicSimilarity], result of:
            0.03611454 = score(doc=4428,freq=1.0), product of:
              0.08888427 = queryWeight, product of:
                1.1603462 = boost
                4.3339696 = idf(docFreq=1495, maxDocs=41962)
                0.017674677 = queryNorm
              0.40630966 = fieldWeight in 4428, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.3339696 = idf(docFreq=1495, maxDocs=41962)
                0.09375 = fieldNorm(doc=4428)
          0.045902643 = weight(abstract_txt:approaches in 4428) [ClassicSimilarity], result of:
            0.045902643 = score(doc=4428,freq=1.0), product of:
              0.10429464 = queryWeight, product of:
                1.2569156 = boost
                4.694663 = idf(docFreq=1042, maxDocs=41962)
                0.017674677 = queryNorm
              0.44012466 = fieldWeight in 4428, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.694663 = idf(docFreq=1042, maxDocs=41962)
                0.09375 = fieldNorm(doc=4428)
          0.0940592 = weight(abstract_txt:models in 4428) [ClassicSimilarity], result of:
            0.0940592 = score(doc=4428,freq=1.0), product of:
              0.21198952 = queryWeight, product of:
                2.5342374 = boost
                4.7327724 = idf(docFreq=1003, maxDocs=41962)
                0.017674677 = queryNorm
              0.4436974 = fieldWeight in 4428, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.7327724 = idf(docFreq=1003, maxDocs=41962)
                0.09375 = fieldNorm(doc=4428)
          1.0070091 = weight(subject_txt:congresses in 4428) [ClassicSimilarity], result of:
            1.0070091 = score(doc=4428,freq=2.0), product of:
              0.4677993 = queryWeight, product of:
                3.260247 = boost
                8.118159 = idf(docFreq=33, maxDocs=41962)
                0.017674677 = queryNorm
              2.152652 = fieldWeight in 4428, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                8.118159 = idf(docFreq=33, maxDocs=41962)
                0.1875 = fieldNorm(doc=4428)
        0.16 = coord(4/25)
    
  5. Zhan, J.; Loh, H.T.: Using latent semantic indexing to improve the accuracy of document clustering (2007) 0.17
    0.17299798 = sum of:
      0.17299798 = product of:
        0.72082496 = sum of:
          0.084806964 = weight(abstract_txt:recommend in 2265) [ClassicSimilarity], result of:
            0.084806964 = score(doc=2265,freq=1.0), product of:
              0.14074579 = queryWeight, product of:
                1.032471 = boost
                7.712694 = idf(docFreq=50, maxDocs=41962)
                0.017674677 = queryNorm
              0.6025542 = fieldWeight in 2265, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.712694 = idf(docFreq=50, maxDocs=41962)
                0.078125 = fieldNorm(doc=2265)
          0.05802056 = weight(abstract_txt:document in 2265) [ClassicSimilarity], result of:
            0.05802056 = score(doc=2265,freq=4.0), product of:
              0.086734585 = queryWeight, product of:
                1.1462287 = boost
                4.28124 = idf(docFreq=1576, maxDocs=41962)
                0.017674677 = queryNorm
              0.66894376 = fieldWeight in 2265, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                4.28124 = idf(docFreq=1576, maxDocs=41962)
                0.078125 = fieldNorm(doc=2265)
          0.030095447 = weight(abstract_txt:indexing in 2265) [ClassicSimilarity], result of:
            0.030095447 = score(doc=2265,freq=1.0), product of:
              0.08888427 = queryWeight, product of:
                1.1603462 = boost
                4.3339696 = idf(docFreq=1495, maxDocs=41962)
                0.017674677 = queryNorm
              0.33859137 = fieldWeight in 2265, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.3339696 = idf(docFreq=1495, maxDocs=41962)
                0.078125 = fieldNorm(doc=2265)
          0.23633377 = weight(abstract_txt:clustering in 2265) [ClassicSimilarity], result of:
            0.23633377 = score(doc=2265,freq=7.0), product of:
              0.18357304 = queryWeight, product of:
                1.6675526 = boost
                6.2284193 = idf(docFreq=224, maxDocs=41962)
                0.017674677 = queryNorm
              1.28741 = fieldWeight in 2265, product of:
                2.6457512 = tf(freq=7.0), with freq of:
                  7.0 = termFreq=7.0
                6.2284193 = idf(docFreq=224, maxDocs=41962)
                0.078125 = fieldNorm(doc=2265)
          0.086318806 = weight(abstract_txt:text in 2265) [ClassicSimilarity], result of:
            0.086318806 = score(doc=2265,freq=1.0), product of:
              0.27242732 = queryWeight, product of:
                3.800445 = boost
                4.05569 = idf(docFreq=1975, maxDocs=41962)
                0.017674677 = queryNorm
              0.31685078 = fieldWeight in 2265, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.05569 = idf(docFreq=1975, maxDocs=41962)
                0.078125 = fieldNorm(doc=2265)
          0.22524938 = weight(abstract_txt:mining in 2265) [ClassicSimilarity], result of:
            0.22524938 = score(doc=2265,freq=1.0), product of:
              0.46157977 = queryWeight, product of:
                4.1808877 = boost
                6.246357 = idf(docFreq=220, maxDocs=41962)
                0.017674677 = queryNorm
              0.48799664 = fieldWeight in 2265, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.246357 = idf(docFreq=220, maxDocs=41962)
                0.078125 = fieldNorm(doc=2265)
        0.24 = coord(6/25)
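
The score traces above all follow the same arithmetic, which can be read off the nested lines: tf is the square root of the term frequency, idf matches 1 + ln(maxDocs / (docFreq + 1)), each term score is queryWeight times fieldWeight, and the per-term sum is scaled by coord (matched terms over query terms). The following is a minimal sketch that recomputes the first result (doc=4669) from the values printed in its trace. The function names and the `terms` table are illustrative assumptions, not part of any library API, and results agree with the trace only up to rounding, since the trace appears to use single-precision floats.

```python
import math

MAX_DOCS = 41962          # maxDocs, as printed in the trace
QUERY_NORM = 0.017674677  # queryNorm, as printed in the trace


def idf(doc_freq, max_docs=MAX_DOCS):
    """Inverse document frequency as it appears in the trace."""
    return 1.0 + math.log(max_docs / (doc_freq + 1))


def tf(freq):
    """Term frequency factor: square root of the raw frequency."""
    return math.sqrt(freq)


def term_score(freq, doc_freq, boost, field_norm):
    """Score of one matching term: queryWeight * fieldWeight."""
    i = idf(doc_freq)
    query_weight = boost * i * QUERY_NORM
    field_weight = tf(freq) * i * field_norm
    return query_weight * field_weight


# Values copied from the trace of result 1 (doc=4669):
# term -> (freq, docFreq, boost, fieldNorm)
terms = {
    "will":       (2.0, 2376, 1.036374,  0.0625),
    "scalable":   (1.0, 48,   1.0378263, 0.0625),
    "needs":      (1.0, 1330, 1.1916345, 0.0625),
    "models":     (1.0, 1003, 2.5342374, 0.0625),
    "congresses": (2.0, 33,   3.260247,  0.4375),
}

scores = {name: term_score(*values) for name, values in terms.items()}
coord = len(terms) / 25   # 5 of the 25 query terms matched
total = coord * sum(scores.values())

for name, value in scores.items():
    print(f"{name:12s} {value:.6f}")
print(f"total        {total:.6f}")
```

The total comes out at about 0.5063, in line with the 0.5063276 shown for the first result. Most of that score comes from the subject term "congresses", which combines a high idf with a much larger fieldNorm (0.4375 against 0.0625 for the abstract terms).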