Document (#33498)

Author
Srinivasan, P.
Title
Text mining in biomedicine : challenges and opportunities
Source
Knowledge organization, information systems and other essays: Professor A. Neelameghan Festschrift. Ed. by K.S. Raghavan and K.N. Prasad
Imprint
New Delhi : Ess Ess Publications
Year
2006
Pages
pp. 221-236
Abstract
Text mining is about making serendipity more likely. Serendipity, the chance discovery of interesting ideas, has been responsible for many discoveries in science. Text mining systems strive to explore large text collections and to separate the potentially meaningful connections from a vast and mostly noisy background of random associations. In this paper we summarize our text mining approach and briefly illustrate some of the experiments we have conducted with it. In particular, we use a profile-based text mining method. We have used these profiles to explore the global distribution of disease research, replicate discoveries made by others, and propose new hypotheses. Text mining holds much potential that has yet to be tapped.
Theme
Data Mining
Field
Medicine

Similar documents (author)

  1. Srinivasan, P.: Expert interface to Library of Congress Subject Headings (1990/91) 5.41
    5.4077277 = sum of:
      5.4077277 = weight(author_txt:srinivasan in 2209) [ClassicSimilarity], result of:
        5.4077277 = score(doc=2209,freq=1.0), product of:
          0.99999994 = queryWeight, product of:
            8.652365 = idf(docFreq=20, maxDocs=44218)
            0.115575336 = queryNorm
          5.407728 = fieldWeight in 2209, product of:
            1.0 = tf(freq=1.0), with freq of:
              1.0 = termFreq=1.0
            8.652365 = idf(docFreq=20, maxDocs=44218)
            0.625 = fieldNorm(doc=2209)
    
  2. Srinivasan, P.: Query expansion and MEDLINE (1996) 5.41
    5.4077277 = sum of:
      5.4077277 = weight(author_txt:srinivasan in 8453) [ClassicSimilarity], result of:
        5.4077277 = score(doc=8453,freq=1.0), product of:
          0.99999994 = queryWeight, product of:
            8.652365 = idf(docFreq=20, maxDocs=44218)
            0.115575336 = queryNorm
          5.407728 = fieldWeight in 8453, product of:
            1.0 = tf(freq=1.0), with freq of:
              1.0 = termFreq=1.0
            8.652365 = idf(docFreq=20, maxDocs=44218)
            0.625 = fieldNorm(doc=8453)
    
  3. Srinivasan, P.: Intelligent information retrieval using rough set approximations (1989) 5.41
    5.4077277 = sum of:
      5.4077277 = weight(author_txt:srinivasan in 2526) [ClassicSimilarity], result of:
        5.4077277 = score(doc=2526,freq=1.0), product of:
          0.99999994 = queryWeight, product of:
            8.652365 = idf(docFreq=20, maxDocs=44218)
            0.115575336 = queryNorm
          5.407728 = fieldWeight in 2526, product of:
            1.0 = tf(freq=1.0), with freq of:
              1.0 = termFreq=1.0
            8.652365 = idf(docFreq=20, maxDocs=44218)
            0.625 = fieldNorm(doc=2526)
    
  4. Srinivasan, P.: On generalizing the Two-Poisson Model (1990) 5.41
    5.4077277 = sum of:
      5.4077277 = weight(author_txt:srinivasan in 2880) [ClassicSimilarity], result of:
        5.4077277 = score(doc=2880,freq=1.0), product of:
          0.99999994 = queryWeight, product of:
            8.652365 = idf(docFreq=20, maxDocs=44218)
            0.115575336 = queryNorm
          5.407728 = fieldWeight in 2880, product of:
            1.0 = tf(freq=1.0), with freq of:
              1.0 = termFreq=1.0
            8.652365 = idf(docFreq=20, maxDocs=44218)
            0.625 = fieldNorm(doc=2880)
    
  5. Srinivasan, P.: Optimal document-indexing vocabulary for MEDLINE (1996) 5.41
    5.4077277 = sum of:
      5.4077277 = weight(author_txt:srinivasan in 6634) [ClassicSimilarity], result of:
        5.4077277 = score(doc=6634,freq=1.0), product of:
          0.99999994 = queryWeight, product of:
            8.652365 = idf(docFreq=20, maxDocs=44218)
            0.115575336 = queryNorm
          5.407728 = fieldWeight in 6634, product of:
            1.0 = tf(freq=1.0), with freq of:
              1.0 = termFreq=1.0
            8.652365 = idf(docFreq=20, maxDocs=44218)
            0.625 = fieldNorm(doc=6634)
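
The author-match trees above all follow the same arithmetic, which can be reproduced by hand. The sketch below is a minimal illustration, assuming the classic TF-IDF formulas that the labels in the trees suggest (idf = 1 + ln(maxDocs / (docFreq + 1)), tf = sqrt(freq)). The function names are hypothetical and are not part of any library. The search engine works in single precision, so the last digits may differ slightly from the listing.

```python
import math


def idf(doc_freq: int, max_docs: int) -> float:
    """Inverse document frequency: 1 + ln(maxDocs / (docFreq + 1))."""
    return 1.0 + math.log(max_docs / (doc_freq + 1))


def tf(freq: float) -> float:
    """Term-frequency factor: square root of the raw term frequency."""
    return math.sqrt(freq)


def term_score(freq, doc_freq, max_docs, field_norm, query_norm, boost=1.0):
    """Score of one query term in one document: queryWeight * fieldWeight."""
    term_idf = idf(doc_freq, max_docs)
    query_weight = boost * term_idf * query_norm      # "queryWeight" line in the tree
    field_weight = tf(freq) * term_idf * field_norm   # "fieldWeight" line in the tree
    return query_weight * field_weight


# Values copied from the author-match trees (docFreq=20, maxDocs=44218).
author_score = term_score(
    freq=1.0,
    doc_freq=20,
    max_docs=44218,
    field_norm=0.625,
    query_norm=0.115575336,
)
print(f"idf   = {idf(20, 44218):.6f}")   # compare with 8.652365 in the listing
print(f"score = {author_score:.6f}")     # compare with 5.4077277 in the listing
```

Because all five author matches share the same term, document frequency, and field norm, they receive the same score of about 5.41.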
    

Similar documents (content)

  1. Srinivasan, P.: Text mining : generating hypotheses from MEDLINE (2004) 0.29
    0.2928115 = sum of:
      0.2928115 = product of:
        1.0457554 = sum of:
          0.05172563 = weight(abstract_txt:connections in 2225) [ClassicSimilarity], result of:
            0.05172563 = score(doc=2225,freq=1.0), product of:
              0.10306657 = queryWeight, product of:
                1.0022911 = boost
                6.4238877 = idf(docFreq=194, maxDocs=44218)
                0.016007593 = queryNorm
              0.5018662 = fieldWeight in 2225, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.4238877 = idf(docFreq=194, maxDocs=44218)
                0.078125 = fieldNorm(doc=2225)
          0.08849527 = weight(abstract_txt:profiles in 2225) [ClassicSimilarity], result of:
            0.08849527 = score(doc=2225,freq=2.0), product of:
              0.11701746 = queryWeight, product of:
                1.067973 = boost
                6.8448567 = idf(docFreq=127, maxDocs=44218)
                0.016007593 = queryNorm
              0.75625694 = fieldWeight in 2225, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                6.8448567 = idf(docFreq=127, maxDocs=44218)
                0.078125 = fieldNorm(doc=2225)
          0.067891724 = weight(abstract_txt:chance in 2225) [ClassicSimilarity], result of:
            0.067891724 = score(doc=2225,freq=1.0), product of:
              0.12355449 = queryWeight, product of:
                1.0973982 = boost
                7.033448 = idf(docFreq=105, maxDocs=44218)
                0.016007593 = queryNorm
              0.5494881 = fieldWeight in 2225, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.033448 = idf(docFreq=105, maxDocs=44218)
                0.078125 = fieldNorm(doc=2225)
          0.07111411 = weight(abstract_txt:hypotheses in 2225) [ClassicSimilarity], result of:
            0.07111411 = score(doc=2225,freq=1.0), product of:
              0.12743376 = queryWeight, product of:
                1.1144927 = boost
                7.14301 = idf(docFreq=94, maxDocs=44218)
                0.016007593 = queryNorm
              0.55804765 = fieldWeight in 2225, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.14301 = idf(docFreq=94, maxDocs=44218)
                0.078125 = fieldNorm(doc=2225)
          0.22016059 = weight(abstract_txt:discoveries in 2225) [ClassicSimilarity], result of:
            0.22016059 = score(doc=2225,freq=1.0), product of:
              0.3410492 = queryWeight, product of:
                2.5784485 = boost
                8.2629 = idf(docFreq=30, maxDocs=44218)
                0.016007593 = queryNorm
              0.6455391 = fieldWeight in 2225, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                8.2629 = idf(docFreq=30, maxDocs=44218)
                0.078125 = fieldNorm(doc=2225)
          0.1564448 = weight(abstract_txt:text in 2225) [ClassicSimilarity], result of:
            0.1564448 = score(doc=2225,freq=3.0), product of:
              0.28589967 = queryWeight, product of:
                4.416628 = boost
                4.0438666 = idf(docFreq=2106, maxDocs=44218)
                0.016007593 = queryNorm
              0.54720175 = fieldWeight in 2225, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                4.0438666 = idf(docFreq=2106, maxDocs=44218)
                0.078125 = fieldNorm(doc=2225)
          0.38992327 = weight(abstract_txt:mining in 2225) [ClassicSimilarity], result of:
            0.38992327 = score(doc=2225,freq=2.0), product of:
              0.5714881 = queryWeight, product of:
                5.781149 = boost
                6.1754265 = idf(docFreq=249, maxDocs=44218)
                0.016007593 = queryNorm
              0.68229467 = fieldWeight in 2225, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                6.1754265 = idf(docFreq=249, maxDocs=44218)
                0.078125 = fieldNorm(doc=2225)
        0.28 = coord(7/25)
    
  2. Qiu, X.Y.; Srinivasan, P.; Hu, Y.: Supervised learning models to predict firm performance with annual reports : an empirical study (2014) 0.17
    0.16536167 = sum of:
      0.16536167 = product of:
        0.82680833 = sum of:
          0.012842932 = weight(abstract_txt:have in 1205) [ClassicSimilarity], result of:
            0.012842932 = score(doc=1205,freq=1.0), product of:
              0.05129796 = queryWeight, product of:
                3.2046018 = idf(docFreq=4876, maxDocs=44218)
                0.016007593 = queryNorm
              0.2503595 = fieldWeight in 1205, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.2046018 = idf(docFreq=4876, maxDocs=44218)
                0.078125 = fieldNorm(doc=1205)
          0.12062918 = weight(abstract_txt:biomedicine in 1205) [ClassicSimilarity], result of:
            0.12062918 = score(doc=1205,freq=1.0), product of:
              0.18125175 = queryWeight, product of:
                1.3291563 = boost
                8.518833 = idf(docFreq=23, maxDocs=44218)
                0.016007593 = queryNorm
              0.66553384 = fieldWeight in 1205, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                8.518833 = idf(docFreq=23, maxDocs=44218)
                0.078125 = fieldNorm(doc=1205)
          0.059334878 = weight(abstract_txt:explore in 1205) [ClassicSimilarity], result of:
            0.059334878 = score(doc=1205,freq=1.0), product of:
              0.14229752 = queryWeight, product of:
                1.6655153 = boost
                5.337313 = idf(docFreq=577, maxDocs=44218)
                0.016007593 = queryNorm
              0.41697758 = fieldWeight in 1205, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.337313 = idf(docFreq=577, maxDocs=44218)
                0.078125 = fieldNorm(doc=1205)
          0.1564448 = weight(abstract_txt:text in 1205) [ClassicSimilarity], result of:
            0.1564448 = score(doc=1205,freq=3.0), product of:
              0.28589967 = queryWeight, product of:
                4.416628 = boost
                4.0438666 = idf(docFreq=2106, maxDocs=44218)
                0.016007593 = queryNorm
              0.54720175 = fieldWeight in 1205, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                4.0438666 = idf(docFreq=2106, maxDocs=44218)
                0.078125 = fieldNorm(doc=1205)
          0.47755653 = weight(abstract_txt:mining in 1205) [ClassicSimilarity], result of:
            0.47755653 = score(doc=1205,freq=3.0), product of:
              0.5714881 = queryWeight, product of:
                5.781149 = boost
                6.1754265 = idf(docFreq=249, maxDocs=44218)
                0.016007593 = queryNorm
              0.8356369 = fieldWeight in 1205, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                6.1754265 = idf(docFreq=249, maxDocs=44218)
                0.078125 = fieldNorm(doc=1205)
        0.2 = coord(5/25)
    
  3. Tonkin, E.L.; Tourte, G.J.L.: Working with text : tools, techniques and approaches for text mining (2016) 0.15
    0.1516746 = sum of:
      0.1516746 = product of:
        0.9479663 = sum of:
          0.010274346 = weight(abstract_txt:have in 4019) [ClassicSimilarity], result of:
            0.010274346 = score(doc=4019,freq=1.0), product of:
              0.05129796 = queryWeight, product of:
                3.2046018 = idf(docFreq=4876, maxDocs=44218)
                0.016007593 = queryNorm
              0.20028761 = fieldWeight in 4019, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.2046018 = idf(docFreq=4876, maxDocs=44218)
                0.0625 = fieldNorm(doc=4019)
          0.047467902 = weight(abstract_txt:explore in 4019) [ClassicSimilarity], result of:
            0.047467902 = score(doc=4019,freq=1.0), product of:
              0.14229752 = queryWeight, product of:
                1.6655153 = boost
                5.337313 = idf(docFreq=577, maxDocs=44218)
                0.016007593 = queryNorm
              0.33358207 = fieldWeight in 4019, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.337313 = idf(docFreq=577, maxDocs=44218)
                0.0625 = fieldNorm(doc=4019)
          0.22850226 = weight(abstract_txt:text in 4019) [ClassicSimilarity], result of:
            0.22850226 = score(doc=4019,freq=10.0), product of:
              0.28589967 = queryWeight, product of:
                4.416628 = boost
                4.0438666 = idf(docFreq=2106, maxDocs=44218)
                0.016007593 = queryNorm
              0.79923934 = fieldWeight in 4019, product of:
                3.1622777 = tf(freq=10.0), with freq of:
                  10.0 = termFreq=10.0
                4.0438666 = idf(docFreq=2106, maxDocs=44218)
                0.0625 = fieldNorm(doc=4019)
          0.66172177 = weight(abstract_txt:mining in 4019) [ClassicSimilarity], result of:
            0.66172177 = score(doc=4019,freq=9.0), product of:
              0.5714881 = queryWeight, product of:
                5.781149 = boost
                6.1754265 = idf(docFreq=249, maxDocs=44218)
                0.016007593 = queryNorm
              1.1578925 = fieldWeight in 4019, product of:
                3.0 = tf(freq=9.0), with freq of:
                  9.0 = termFreq=9.0
                6.1754265 = idf(docFreq=249, maxDocs=44218)
                0.0625 = fieldNorm(doc=4019)
        0.16 = coord(4/25)
    
  4. Björneborn, L.: Three key affordances for serendipity : toward a framework connecting environmental and personal factors in serendipitous encounters (2017) 0.14
    0.13678192 = sum of:
      0.13678192 = product of:
        0.6839096 = sum of:
          0.0077057597 = weight(abstract_txt:have in 3947) [ClassicSimilarity], result of:
            0.0077057597 = score(doc=3947,freq=1.0), product of:
              0.05129796 = queryWeight, product of:
                3.2046018 = idf(docFreq=4876, maxDocs=44218)
                0.016007593 = queryNorm
              0.15021572 = fieldWeight in 3947, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.2046018 = idf(docFreq=4876, maxDocs=44218)
                0.046875 = fieldNorm(doc=3947)
          0.053754855 = weight(abstract_txt:connections in 3947) [ClassicSimilarity], result of:
            0.053754855 = score(doc=3947,freq=3.0), product of:
              0.10306657 = queryWeight, product of:
                1.0022911 = boost
                6.4238877 = idf(docFreq=194, maxDocs=44218)
                0.016007593 = queryNorm
              0.5215547 = fieldWeight in 3947, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                6.4238877 = idf(docFreq=194, maxDocs=44218)
                0.046875 = fieldNorm(doc=3947)
          0.021307053 = weight(abstract_txt:approach in 3947) [ClassicSimilarity], result of:
            0.021307053 = score(doc=3947,freq=3.0), product of:
              0.070069924 = queryWeight, product of:
                1.1687343 = boost
                3.745328 = idf(docFreq=2839, maxDocs=44218)
                0.016007593 = queryNorm
              0.30408272 = fieldWeight in 3947, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                3.745328 = idf(docFreq=2839, maxDocs=44218)
                0.046875 = fieldNorm(doc=3947)
          0.050347313 = weight(abstract_txt:explore in 3947) [ClassicSimilarity], result of:
            0.050347313 = score(doc=3947,freq=2.0), product of:
              0.14229752 = queryWeight, product of:
                1.6655153 = boost
                5.337313 = idf(docFreq=577, maxDocs=44218)
                0.016007593 = queryNorm
              0.35381722 = fieldWeight in 3947, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                5.337313 = idf(docFreq=577, maxDocs=44218)
                0.046875 = fieldNorm(doc=3947)
          0.5507946 = weight(abstract_txt:serendipity in 3947) [ClassicSimilarity], result of:
            0.5507946 = score(doc=3947,freq=19.0), product of:
              0.33110446 = queryWeight, product of:
                2.5405777 = boost
                8.14154 = idf(docFreq=34, maxDocs=44218)
                0.016007593 = queryNorm
              1.6635071 = fieldWeight in 3947, product of:
                4.358899 = tf(freq=19.0), with freq of:
                  19.0 = termFreq=19.0
                8.14154 = idf(docFreq=34, maxDocs=44218)
                0.046875 = fieldNorm(doc=3947)
        0.2 = coord(5/25)
    
  5. Gordon, M.D.; Dumais, S.: Using latent semantic indexing for literature based discovery (1998) 0.12
    0.1215292 = sum of:
      0.1215292 = product of:
        0.607646 = sum of:
          0.015411519 = weight(abstract_txt:have in 4892) [ClassicSimilarity], result of:
            0.015411519 = score(doc=4892,freq=1.0), product of:
              0.05129796 = queryWeight, product of:
                3.2046018 = idf(docFreq=4876, maxDocs=44218)
                0.016007593 = queryNorm
              0.30043143 = fieldWeight in 4892, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.2046018 = idf(docFreq=4876, maxDocs=44218)
                0.09375 = fieldNorm(doc=4892)
          0.06207076 = weight(abstract_txt:connections in 4892) [ClassicSimilarity], result of:
            0.06207076 = score(doc=4892,freq=1.0), product of:
              0.10306657 = queryWeight, product of:
                1.0022911 = boost
                6.4238877 = idf(docFreq=194, maxDocs=44218)
                0.016007593 = queryNorm
              0.6022395 = fieldWeight in 4892, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.4238877 = idf(docFreq=194, maxDocs=44218)
                0.09375 = fieldNorm(doc=4892)
          0.08533694 = weight(abstract_txt:hypotheses in 4892) [ClassicSimilarity], result of:
            0.08533694 = score(doc=4892,freq=1.0), product of:
              0.12743376 = queryWeight, product of:
                1.1144927 = boost
                7.14301 = idf(docFreq=94, maxDocs=44218)
                0.016007593 = queryNorm
              0.66965723 = fieldWeight in 4892, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.14301 = idf(docFreq=94, maxDocs=44218)
                0.09375 = fieldNorm(doc=4892)
          0.07120185 = weight(abstract_txt:explore in 4892) [ClassicSimilarity], result of:
            0.07120185 = score(doc=4892,freq=1.0), product of:
              0.14229752 = queryWeight, product of:
                1.6655153 = boost
                5.337313 = idf(docFreq=577, maxDocs=44218)
                0.016007593 = queryNorm
              0.5003731 = fieldWeight in 4892, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.337313 = idf(docFreq=577, maxDocs=44218)
                0.09375 = fieldNorm(doc=4892)
          0.37362492 = weight(abstract_txt:discoveries in 4892) [ClassicSimilarity], result of:
            0.37362492 = score(doc=4892,freq=2.0), product of:
              0.3410492 = queryWeight, product of:
                2.5784485 = boost
                8.2629 = idf(docFreq=30, maxDocs=44218)
                0.016007593 = queryNorm
              1.0955162 = fieldWeight in 4892, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                8.2629 = idf(docFreq=30, maxDocs=44218)
                0.09375 = fieldNorm(doc=4892)
        0.2 = coord(5/25)
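
The content-match trees add one more step: the per-term scores are summed and then multiplied by a coordination factor, coord(matched/total), which rewards documents that match more of the query terms. The sketch below reworks the first content match (doc 2225) from the numbers listed in its tree. It is an illustration only: the variable and function names are hypothetical, and the per-term scores are copied from the listing rather than recomputed.

```python
def coord(matched_terms: int, query_terms: int) -> float:
    """Coordination factor: fraction of query terms found in the document."""
    return matched_terms / query_terms


# Per-term scores for doc 2225, copied from the tree of the first content match.
term_scores_doc_2225 = {
    "connections": 0.05172563,
    "profiles": 0.08849527,
    "chance": 0.067891724,
    "hypotheses": 0.07111411,
    "discoveries": 0.22016059,
    "text": 0.1564448,
    "mining": 0.38992327,
}

raw_sum = sum(term_scores_doc_2225.values())            # "sum of" line: 1.0457554
final_score = raw_sum * coord(len(term_scores_doc_2225), 25)  # 7 of 25 query terms

print(f"raw sum     = {raw_sum:.7f}")
print(f"final score = {final_score:.7f}")   # compare with 0.2928115 in the listing
```

This also explains why doc 4019 ranks below doc 1205 despite a larger raw sum (0.9479663 against 0.82680833): it matches only 4 of the 25 query terms, so its coord factor is 0.16 rather than 0.2.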