Document (#27226)

Author
Srinivasan, P.
Title
Text mining : generating hypotheses from MEDLINE
Source
Journal of the American Society for Information Science and Technology. 55(2004) no.5, pp.396-413
Year
2004
Abstract
Hypothesis generation, a crucial initial step for making scientific discoveries, relies on prior knowledge, experience, and intuition. Chance connections made between seemingly distinct subareas sometimes turn out to be fruitful. The goal in text mining is to assist in this process by automatically discovering a small set of interesting hypotheses from a suitable text collection. In this report, we present open and closed text mining algorithms that are built within the discovery framework established by Swanson and Smalheiser. Our algorithms represent topics using metadata profiles. When applied to MEDLINE, these are MeSH-based profiles. We present experiments that demonstrate the effectiveness of our algorithms. Specifically, our algorithms successfully generate ranked term lists where the key terms representing novel relationships between topics are ranked high.
Theme
Data Mining
Field
Medicine
Object
Medline
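
The abstract places the algorithms within Swanson and Smalheiser's discovery framework, in which two topics A and C are connected through intermediate B terms. The sketch below illustrates that general open/closed idea with MeSH-style topic profiles; it is not Srinivasan's algorithm. The function names closed_discovery and open_discovery, the product-based ranking, and all profile weights are hypothetical assumptions made for illustration, not MEDLINE data.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# A topic profile maps MeSH-style terms to weights (hypothetical values below).
Profile = Dict[str, float]
Ranked = List[Tuple[str, float]]

def closed_discovery(profile_a: Profile, profile_c: Profile) -> Ranked:
    """Rank candidate linking (B) terms that occur in both the A and C profiles.

    Assumed scoring: product of the term's weights in the two profiles.
    """
    shared = set(profile_a) & set(profile_c)
    scored = [(term, profile_a[term] * profile_c[term]) for term in shared]
    # Highest score first; ties broken alphabetically for stable output.
    return sorted(scored, key=lambda pair: (-pair[1], pair[0]))

def open_discovery(profile_a: Profile, b_profiles: Dict[str, Profile], top_b: int = 2) -> Ranked:
    """Start from A, follow its top_b strongest B terms, and rank C terms
    reached through them that are not already in A's profile."""
    b_terms = sorted(profile_a, key=lambda t: (-profile_a[t], t))[:top_b]
    candidates: Dict[str, float] = defaultdict(float)
    for b in b_terms:
        for c, weight in b_profiles.get(b, {}).items():
            if c not in profile_a:
                # Accumulate evidence for C over every B path that reaches it.
                candidates[c] += profile_a[b] * weight
    return sorted(candidates.items(), key=lambda pair: (-pair[1], pair[0]))

# Toy profiles loosely modeled on Swanson's Raynaud's disease / fish oil example.
# All weights are invented for illustration only.
raynaud = {"Blood Viscosity": 0.8, "Platelet Aggregation": 0.6, "Vasoconstriction": 0.5}
fish_oils = {"Blood Viscosity": 0.7, "Platelet Aggregation": 0.9, "Dietary Fats": 0.4}
b_profiles = {
    "Blood Viscosity": {"Fish Oils": 0.7, "Hematocrit": 0.3},
    "Platelet Aggregation": {"Fish Oils": 0.9, "Aspirin": 0.5},
}

print(closed_discovery(raynaud, fish_oils))
print(open_discovery(raynaud, b_profiles))
```

With these toy values, closed discovery ranks Blood Viscosity ahead of Platelet Aggregation as linking terms, and open discovery puts Fish Oils first because it is reached through both B terms. A realistic version would need profiles derived from MEDLINE MeSH assignments plus more careful weighting and filtering.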

Similar documents (author)

  1. Srinivasan, P.: Expert interface to Library of Congress Subject Headings (1990/91) 5.41
    5.4077277 = sum of:
      5.4077277 = weight(author_txt:srinivasan in 2209) [ClassicSimilarity], result of:
        5.4077277 = score(doc=2209,freq=1.0), product of:
          0.99999994 = queryWeight, product of:
            8.652365 = idf(docFreq=20, maxDocs=44218)
            0.115575336 = queryNorm
          5.407728 = fieldWeight in 2209, product of:
            1.0 = tf(freq=1.0), with freq of:
              1.0 = termFreq=1.0
            8.652365 = idf(docFreq=20, maxDocs=44218)
            0.625 = fieldNorm(doc=2209)
    
  2. Srinivasan, P.: Query expansion and MEDLINE (1996) 5.41
    5.4077277 = sum of:
      5.4077277 = weight(author_txt:srinivasan in 8453) [ClassicSimilarity], result of:
        5.4077277 = score(doc=8453,freq=1.0), product of:
          0.99999994 = queryWeight, product of:
            8.652365 = idf(docFreq=20, maxDocs=44218)
            0.115575336 = queryNorm
          5.407728 = fieldWeight in 8453, product of:
            1.0 = tf(freq=1.0), with freq of:
              1.0 = termFreq=1.0
            8.652365 = idf(docFreq=20, maxDocs=44218)
            0.625 = fieldNorm(doc=8453)
    
  3. Srinivasan, P.: Intelligent information retrieval using rough set approximations (1989) 5.41
    5.4077277 = sum of:
      5.4077277 = weight(author_txt:srinivasan in 2526) [ClassicSimilarity], result of:
        5.4077277 = score(doc=2526,freq=1.0), product of:
          0.99999994 = queryWeight, product of:
            8.652365 = idf(docFreq=20, maxDocs=44218)
            0.115575336 = queryNorm
          5.407728 = fieldWeight in 2526, product of:
            1.0 = tf(freq=1.0), with freq of:
              1.0 = termFreq=1.0
            8.652365 = idf(docFreq=20, maxDocs=44218)
            0.625 = fieldNorm(doc=2526)
    
  4. Srinivasan, P.: On generalizing the Two-Poisson Model (1990) 5.41
    5.4077277 = sum of:
      5.4077277 = weight(author_txt:srinivasan in 2880) [ClassicSimilarity], result of:
        5.4077277 = score(doc=2880,freq=1.0), product of:
          0.99999994 = queryWeight, product of:
            8.652365 = idf(docFreq=20, maxDocs=44218)
            0.115575336 = queryNorm
          5.407728 = fieldWeight in 2880, product of:
            1.0 = tf(freq=1.0), with freq of:
              1.0 = termFreq=1.0
            8.652365 = idf(docFreq=20, maxDocs=44218)
            0.625 = fieldNorm(doc=2880)
    
  5. Srinivasan, P.: Optimal document-indexing vocabulary for MEDLINE (1996) 5.41
    5.4077277 = sum of:
      5.4077277 = weight(author_txt:srinivasan in 6634) [ClassicSimilarity], result of:
        5.4077277 = score(doc=6634,freq=1.0), product of:
          0.99999994 = queryWeight, product of:
            8.652365 = idf(docFreq=20, maxDocs=44218)
            0.115575336 = queryNorm
          5.407728 = fieldWeight in 6634, product of:
            1.0 = tf(freq=1.0), with freq of:
              1.0 = termFreq=1.0
            8.652365 = idf(docFreq=20, maxDocs=44218)
            0.625 = fieldNorm(doc=6634)
    

Similar documents (content)

  1. Srinivasan, P.: Text mining in biomedicine : challenges and opportunities (2006) 0.25
    0.24724542 = sum of:
      0.24724542 = product of:
        1.0301893 = sum of:
          0.075560294 = weight(abstract_txt:chance in 1497) [ClassicSimilarity], result of:
            0.075560294 = score(doc=1497,freq=1.0), product of:
              0.13751033 = queryWeight, product of:
                1.0913684 = boost
                7.033448 = idf(docFreq=105, maxDocs=44218)
                0.017914126 = queryNorm
              0.5494881 = fieldWeight in 1497, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.033448 = idf(docFreq=105, maxDocs=44218)
                0.078125 = fieldNorm(doc=1497)
          0.17326118 = weight(abstract_txt:discoveries in 1497) [ClassicSimilarity], result of:
            0.17326118 = score(doc=1497,freq=2.0), product of:
              0.18978581 = queryWeight, product of:
                1.2821405 = boost
                8.2629 = idf(docFreq=30, maxDocs=44218)
                0.017914126 = queryNorm
              0.91293013 = fieldWeight in 1497, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                8.2629 = idf(docFreq=30, maxDocs=44218)
                0.078125 = fieldNorm(doc=1497)
          0.13928741 = weight(abstract_txt:profiles in 1497) [ClassicSimilarity], result of:
            0.13928741 = score(doc=1497,freq=1.0), product of:
              0.26046985 = queryWeight, product of:
                2.1242101 = boost
                6.8448567 = idf(docFreq=127, maxDocs=44218)
                0.017914126 = queryNorm
              0.53475446 = fieldWeight in 1497, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.8448567 = idf(docFreq=127, maxDocs=44218)
                0.078125 = fieldNorm(doc=1497)
          0.15829329 = weight(abstract_txt:hypotheses in 1497) [ClassicSimilarity], result of:
            0.15829329 = score(doc=1497,freq=1.0), product of:
              0.28365552 = queryWeight, product of:
                2.216738 = boost
                7.14301 = idf(docFreq=94, maxDocs=44218)
                0.017914126 = queryNorm
              0.55804765 = fieldWeight in 1497, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.14301 = idf(docFreq=94, maxDocs=44218)
                0.078125 = fieldNorm(doc=1497)
          0.14070672 = weight(abstract_txt:text in 1497) [ClassicSimilarity], result of:
            0.14070672 = score(doc=1497,freq=6.0), product of:
              0.18182448 = queryWeight, product of:
                2.5099201 = boost
                4.0438666 = idf(docFreq=2106, maxDocs=44218)
                0.017914126 = queryNorm
              0.77386016 = fieldWeight in 1497, product of:
                2.4494898 = tf(freq=6.0), with freq of:
                  6.0 = termFreq=6.0
                4.0438666 = idf(docFreq=2106, maxDocs=44218)
                0.078125 = fieldNorm(doc=1497)
          0.34308037 = weight(abstract_txt:mining in 1497) [ClassicSimilarity], result of:
            0.34308037 = score(doc=1497,freq=5.0), product of:
              0.3180196 = queryWeight, product of:
                2.874692 = boost
                6.1754265 = idf(docFreq=249, maxDocs=44218)
                0.017914126 = queryNorm
              1.0788026 = fieldWeight in 1497, product of:
                2.236068 = tf(freq=5.0), with freq of:
                  5.0 = termFreq=5.0
                6.1754265 = idf(docFreq=249, maxDocs=44218)
                0.078125 = fieldNorm(doc=1497)
        0.24 = coord(6/25)
    
  2. Liu, B.: Web data mining : exploring hyperlinks, contents, and usage data (2011) 0.13
    0.1258662 = sum of:
      0.1258662 = product of:
        0.7866638 = sum of:
          0.04571795 = weight(abstract_txt:topics in 354) [ClassicSimilarity], result of:
            0.04571795 = score(doc=354,freq=1.0), product of:
              0.14381827 = queryWeight, product of:
                1.5784316 = boost
                5.086191 = idf(docFreq=742, maxDocs=44218)
                0.017914126 = queryNorm
              0.31788695 = fieldWeight in 354, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.086191 = idf(docFreq=742, maxDocs=44218)
                0.0625 = fieldNorm(doc=354)
          0.091909245 = weight(abstract_txt:text in 354) [ClassicSimilarity], result of:
            0.091909245 = score(doc=354,freq=4.0), product of:
              0.18182448 = queryWeight, product of:
                2.5099201 = boost
                4.0438666 = idf(docFreq=2106, maxDocs=44218)
                0.017914126 = queryNorm
              0.5054833 = fieldWeight in 354, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                4.0438666 = idf(docFreq=2106, maxDocs=44218)
                0.0625 = fieldNorm(doc=354)
          0.4251983 = weight(abstract_txt:mining in 354) [ClassicSimilarity], result of:
            0.4251983 = score(doc=354,freq=12.0), product of:
              0.3180196 = queryWeight, product of:
                2.874692 = boost
                6.1754265 = idf(docFreq=249, maxDocs=44218)
                0.017914126 = queryNorm
              1.3370191 = fieldWeight in 354, product of:
                3.4641016 = tf(freq=12.0), with freq of:
                  12.0 = termFreq=12.0
                6.1754265 = idf(docFreq=249, maxDocs=44218)
                0.0625 = fieldNorm(doc=354)
          0.22383824 = weight(abstract_txt:algorithms in 354) [ClassicSimilarity], result of:
            0.22383824 = score(doc=354,freq=3.0), product of:
              0.36225578 = queryWeight, product of:
                3.5427573 = boost
                5.707926 = idf(docFreq=398, maxDocs=44218)
                0.017914126 = queryNorm
              0.6179011 = fieldWeight in 354, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                5.707926 = idf(docFreq=398, maxDocs=44218)
                0.0625 = fieldNorm(doc=354)
        0.16 = coord(4/25)
    
  3. Menczer, F.: Lexical and semantic clustering by Web links (2004) 0.12
    0.11758325 = sum of:
      0.11758325 = product of:
        0.58791625 = sum of:
          0.02551736 = weight(abstract_txt:between in 3090) [ClassicSimilarity], result of:
            0.02551736 = score(doc=3090,freq=2.0), product of:
              0.06668529 = queryWeight, product of:
                1.074815 = boost
                3.4633842 = idf(docFreq=3764, maxDocs=44218)
                0.017914126 = queryNorm
              0.3826535 = fieldWeight in 3090, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                3.4633842 = idf(docFreq=3764, maxDocs=44218)
                0.078125 = fieldNorm(doc=3090)
          0.03572481 = weight(abstract_txt:present in 3090) [ClassicSimilarity], result of:
            0.03572481 = score(doc=3090,freq=1.0), product of:
              0.105146825 = queryWeight, product of:
                1.3496366 = boost
                4.348943 = idf(docFreq=1552, maxDocs=44218)
                0.017914126 = queryNorm
              0.3397612 = fieldWeight in 3090, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.348943 = idf(docFreq=1552, maxDocs=44218)
                0.078125 = fieldNorm(doc=3090)
          0.08123706 = weight(abstract_txt:text in 3090) [ClassicSimilarity], result of:
            0.08123706 = score(doc=3090,freq=2.0), product of:
              0.18182448 = queryWeight, product of:
                2.5099201 = boost
                4.0438666 = idf(docFreq=2106, maxDocs=44218)
                0.017914126 = queryNorm
              0.44678837 = fieldWeight in 3090, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.0438666 = idf(docFreq=2106, maxDocs=44218)
                0.078125 = fieldNorm(doc=3090)
          0.21698308 = weight(abstract_txt:mining in 3090) [ClassicSimilarity], result of:
            0.21698308 = score(doc=3090,freq=2.0), product of:
              0.3180196 = queryWeight, product of:
                2.874692 = boost
                6.1754265 = idf(docFreq=249, maxDocs=44218)
                0.017914126 = queryNorm
              0.68229467 = fieldWeight in 3090, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                6.1754265 = idf(docFreq=249, maxDocs=44218)
                0.078125 = fieldNorm(doc=3090)
          0.22845395 = weight(abstract_txt:algorithms in 3090) [ClassicSimilarity], result of:
            0.22845395 = score(doc=3090,freq=2.0), product of:
              0.36225578 = queryWeight, product of:
                3.5427573 = boost
                5.707926 = idf(docFreq=398, maxDocs=44218)
                0.017914126 = queryNorm
              0.63064265 = fieldWeight in 3090, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                5.707926 = idf(docFreq=398, maxDocs=44218)
                0.078125 = fieldNorm(doc=3090)
        0.2 = coord(5/25)
    
  4. Weeber, M.; Klein, H.; Jong-van den Berg, L.T.W. de; Vos, R.: Using concepts in literature-based discovery : simulating Swanson's Raynaud-Fish Oil and Migraine-Magnesium discoveries (2001) 0.11
    0.11353613 = sum of:
      0.11353613 = product of:
        0.7096008 = sum of:
          0.07789654 = weight(abstract_txt:successfully in 5910) [ClassicSimilarity], result of:
            0.07789654 = score(doc=5910,freq=1.0), product of:
              0.124269396 = queryWeight, product of:
                1.0374945 = boost
                6.686252 = idf(docFreq=149, maxDocs=44218)
                0.017914126 = queryNorm
              0.6268361 = fieldWeight in 5910, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.686252 = idf(docFreq=149, maxDocs=44218)
                0.09375 = fieldNorm(doc=5910)
          0.14701699 = weight(abstract_txt:discoveries in 5910) [ClassicSimilarity], result of:
            0.14701699 = score(doc=5910,freq=1.0), product of:
              0.18978581 = queryWeight, product of:
                1.2821405 = boost
                8.2629 = idf(docFreq=30, maxDocs=44218)
                0.017914126 = queryNorm
              0.7746469 = fieldWeight in 5910, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                8.2629 = idf(docFreq=30, maxDocs=44218)
                0.09375 = fieldNorm(doc=5910)
          0.21605463 = weight(abstract_txt:swanson in 5910) [ClassicSimilarity], result of:
            0.21605463 = score(doc=5910,freq=1.0), product of:
              0.24531707 = queryWeight, product of:
                1.4576982 = boost
                9.394302 = idf(docFreq=9, maxDocs=44218)
                0.017914126 = queryNorm
              0.88071585 = fieldWeight in 5910, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                9.394302 = idf(docFreq=9, maxDocs=44218)
                0.09375 = fieldNorm(doc=5910)
          0.26863265 = weight(abstract_txt:hypotheses in 5910) [ClassicSimilarity], result of:
            0.26863265 = score(doc=5910,freq=2.0), product of:
              0.28365552 = queryWeight, product of:
                2.216738 = boost
                7.14301 = idf(docFreq=94, maxDocs=44218)
                0.017914126 = queryNorm
              0.9470383 = fieldWeight in 5910, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                7.14301 = idf(docFreq=94, maxDocs=44218)
                0.09375 = fieldNorm(doc=5910)
        0.16 = coord(4/25)
    
  5. Mining text data (2012) 0.10
    0.10491982 = sum of:
      0.10491982 = product of:
        0.6557489 = sum of:
          0.04571795 = weight(abstract_txt:topics in 362) [ClassicSimilarity], result of:
            0.04571795 = score(doc=362,freq=1.0), product of:
              0.14381827 = queryWeight, product of:
                1.5784316 = boost
                5.086191 = idf(docFreq=742, maxDocs=44218)
                0.017914126 = queryNorm
              0.31788695 = fieldWeight in 362, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.086191 = idf(docFreq=742, maxDocs=44218)
                0.0625 = fieldNorm(doc=362)
          0.11256537 = weight(abstract_txt:text in 362) [ClassicSimilarity], result of:
            0.11256537 = score(doc=362,freq=6.0), product of:
              0.18182448 = queryWeight, product of:
                2.5099201 = boost
                4.0438666 = idf(docFreq=2106, maxDocs=44218)
                0.017914126 = queryNorm
              0.6190881 = fieldWeight in 362, product of:
                2.4494898 = tf(freq=6.0), with freq of:
                  6.0 = termFreq=6.0
                4.0438666 = idf(docFreq=2106, maxDocs=44218)
                0.0625 = fieldNorm(doc=362)
          0.3682325 = weight(abstract_txt:mining in 362) [ClassicSimilarity], result of:
            0.3682325 = score(doc=362,freq=9.0), product of:
              0.3180196 = queryWeight, product of:
                2.874692 = boost
                6.1754265 = idf(docFreq=249, maxDocs=44218)
                0.017914126 = queryNorm
              1.1578925 = fieldWeight in 362, product of:
                3.0 = tf(freq=9.0), with freq of:
                  9.0 = termFreq=9.0
                6.1754265 = idf(docFreq=249, maxDocs=44218)
                0.0625 = fieldNorm(doc=362)
          0.12923308 = weight(abstract_txt:algorithms in 362) [ClassicSimilarity], result of:
            0.12923308 = score(doc=362,freq=1.0), product of:
              0.36225578 = queryWeight, product of:
                3.5427573 = boost
                5.707926 = idf(docFreq=398, maxDocs=44218)
                0.017914126 = queryNorm
              0.35674536 = fieldWeight in 362, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.707926 = idf(docFreq=398, maxDocs=44218)
                0.0625 = fieldNorm(doc=362)
        0.16 = coord(4/25)
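
The content-based scores combine several matching terms. The sketch below reproduces entry 1 (doc 1497), assuming ClassicSimilarity's per-term score = queryWeight × fieldWeight, with queryWeight = boost × idf × queryNorm and fieldWeight = tf × idf × fieldNorm, summed over the matching terms and scaled by coord = matched terms / query terms. queryNorm (0.017914126) is copied from the output because it depends on all 25 query terms, only six of which appear in the explanation; the helper names are hypothetical.

```python
import math

def classic_idf(doc_freq: int, max_docs: int) -> float:
    """ClassicSimilarity idf: 1 + ln(maxDocs / (docFreq + 1))."""
    return 1.0 + math.log(max_docs / (doc_freq + 1))

def term_score(freq, doc_freq, boost, max_docs, query_norm, field_norm):
    """queryWeight * fieldWeight for one matching term."""
    idf = classic_idf(doc_freq, max_docs)
    query_weight = boost * idf * query_norm
    field_weight = math.sqrt(freq) * idf * field_norm
    return query_weight * field_weight

def coord_score(matches, total_query_terms, max_docs, query_norm, field_norm):
    """Sum per-term scores, then scale by coord = matched / total query terms."""
    raw = sum(term_score(f, df, b, max_docs, query_norm, field_norm) for f, df, b in matches)
    return raw * len(matches) / total_query_terms

# (freq, docFreq, boost) for the six terms matched in doc 1497.
DOC_1497 = [
    (1.0, 105, 1.0913684),   # chance
    (2.0, 30, 1.2821405),    # discoveries
    (1.0, 127, 2.1242101),   # profiles
    (1.0, 94, 2.216738),     # hypotheses
    (6.0, 2106, 2.5099201),  # text
    (5.0, 249, 2.874692),    # mining
]

total = coord_score(DOC_1497, 25, 44218, 0.017914126, 0.078125)
print(f"{total:.6f}")
```

The coord factor explains why a document matching many query terms outranks one with a few strong matches: here six of 25 terms match, so the raw sum of about 1.03 is multiplied by 0.24.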