Document (#25039)

Author
Sanderson, M.
Lawrie, D.
Title
Building, testing, and applying concept hierarchies
Source
Advances in information retrieval: Recent research from the Center for Intelligent Information Retrieval. Ed.: W.B. Croft
Imprint
Boston, MA : Kluwer Academic Publ.
Year
2000
Pages
pp. 235-266
Series
The Kluwer international series on information retrieval; 7
Abstract
A means of automatically deriving a hierarchical organization of concepts from a set of documents without use of training data or standard clustering techniques is presented. A process extracts salient words and phrases from the documents, and these terms are then organized hierarchically using a type of co-occurrence known as subsumption. The resulting structure is displayed as a series of hierarchical menus. When generated from a set of retrieved documents, the menus give a browsing user an overview of the documents' content in a manner distinct from existing techniques. The methods used to build the structure are simple and appear to be effective. The formation and presentation of the hierarchy are described along with a study of some of its properties, including a preliminary experiment, which indicates that users may find the hierarchy a more efficient means of locating relevant documents than the classic method of scanning a ranked document list.
Theme
Semantic environment in indexing and retrieval
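
The abstract names subsumption but does not define it. The following is a minimal sketch, assuming the commonly cited formulation in which a term x subsumes a term y when P(x|y) >= 0.8 and P(y|x) < 1. The 0.8 threshold, the function name subsumption_pairs, and the toy documents are illustrative assumptions and do not come from this record.

```python
from itertools import permutations


def subsumption_pairs(doc_terms, threshold=0.8):
    """Return (parent, child) pairs where parent subsumes child.

    doc_terms: list of sets, one set of terms per document.
    threshold: assumed cutoff for P(parent | child); 0.8 is an assumption.
    """
    # Map each term to the set of document indices that contain it.
    postings = {}
    for i, terms in enumerate(doc_terms):
        for term in terms:
            postings.setdefault(term, set()).add(i)

    pairs = []
    for x, y in permutations(sorted(postings), 2):
        both = len(postings[x] & postings[y])
        p_x_given_y = both / len(postings[y])  # share of y's documents that also contain x
        p_y_given_x = both / len(postings[x])  # share of x's documents that also contain y
        if p_x_given_y >= threshold and p_y_given_x < 1.0:
            pairs.append((x, y))
    return pairs


# Hypothetical toy collection: "retrieval" appears wherever the narrower terms do.
docs = [
    {"retrieval", "ranking"},
    {"retrieval", "ranking"},
    {"retrieval", "indexing"},
    {"retrieval"},
]
pairs = subsumption_pairs(docs)
```

In this toy collection "retrieval" comes out as the parent of both "indexing" and "ranking", which is the kind of broader-over-narrower relation a menu hierarchy could be built from.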

Similar documents (author)

  1. Sanderson, M.: The Reuters test collection (1996) 5.39
    5.3864803 = sum of:
      5.3864803 = weight(author_txt:sanderson in 41) [ClassicSimilarity], result of:
        5.3864803 = fieldWeight in 41, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          8.618368 = idf(docFreq=20, maxDocs=42740)
          0.625 = fieldNorm(doc=41)
    
  2. Sanderson, M.: Revisiting h measured on UK LIS and IR academics (2008) 5.39
    5.3864803 = sum of:
      5.3864803 = weight(author_txt:sanderson in 3868) [ClassicSimilarity], result of:
        5.3864803 = fieldWeight in 3868, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          8.618368 = idf(docFreq=20, maxDocs=42740)
          0.625 = fieldNorm(doc=3868)
    
  3. Purves, R.S.; Sanderson, M.: A methodology to allow avalanche forecasting on an information retrieval system (1998) 4.31
    4.309184 = sum of:
      4.309184 = weight(author_txt:sanderson in 2074) [ClassicSimilarity], result of:
        4.309184 = fieldWeight in 2074, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          8.618368 = idf(docFreq=20, maxDocs=42740)
          0.5 = fieldNorm(doc=2074)
    
  4. Sanderson, M.; Ruthven, I.: Report on the Glasgow IR group (glair4) submission (1997) 4.31
    4.309184 = sum of:
      4.309184 = weight(author_txt:sanderson in 4089) [ClassicSimilarity], result of:
        4.309184 = fieldWeight in 4089, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          8.618368 = idf(docFreq=20, maxDocs=42740)
          0.5 = fieldNorm(doc=4089)
    
  5. Clough, P.; Sanderson, M.: User experiments with the Eurovision Cross-Language Image Retrieval System (2006) 4.31
    4.309184 = sum of:
      4.309184 = weight(author_txt:sanderson in 53) [ClassicSimilarity], result of:
        4.309184 = fieldWeight in 53, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          8.618368 = idf(docFreq=20, maxDocs=42740)
          0.5 = fieldNorm(doc=53)
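
The author scores above can be reproduced from the values in the listing. This is a minimal sketch, assuming the standard Lucene ClassicSimilarity definitions tf = sqrt(freq) and idf = 1 + ln(maxDocs / (docFreq + 1)); the function names are illustrative, and small differences from the listed figures are expected because Lucene computes in single precision.

```python
import math


def classic_idf(doc_freq, max_docs):
    # Assumed ClassicSimilarity idf: 1 + ln(maxDocs / (docFreq + 1)).
    return 1.0 + math.log(max_docs / (doc_freq + 1))


def classic_tf(freq):
    # Assumed ClassicSimilarity tf: square root of the term frequency.
    return math.sqrt(freq)


def field_weight(freq, doc_freq, max_docs, field_norm):
    # fieldWeight = tf * idf * fieldNorm, as in the explain trees above.
    return classic_tf(freq) * classic_idf(doc_freq, max_docs) * field_norm


# Values taken from the listing: docFreq=20, maxDocs=42740, freq=1.0.
score_doc_41 = field_weight(1.0, 20, 42740, 0.625)   # listed as 5.3864803
score_doc_2074 = field_weight(1.0, 20, 42740, 0.5)   # listed as 4.309184
```

The two groups of scores differ only through fieldNorm (0.625 versus 0.5), since tf and idf are identical for every author match shown.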
    

Similar documents (content)

  1. Gauch, S.; Chandramouli, A.; Ranganathan, S.: Training a hierarchical classifier using inter document relationships (2009) 0.17
    0.1745302 = sum of:
      0.1745302 = product of:
        0.72720915 = sum of:
          0.021869602 = weight(abstract_txt:using in 4698) [ClassicSimilarity], result of:
            0.021869602 = score(doc=4698,freq=1.0), product of:
              0.08045164 = queryWeight, product of:
                1.1001736 = boost
                3.4794931 = idf(docFreq=3580, maxDocs=42740)
                0.021016369 = queryNorm
              0.2718354 = fieldWeight in 4698, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.4794931 = idf(docFreq=3580, maxDocs=42740)
                0.078125 = fieldNorm(doc=4698)
          0.1500739 = weight(abstract_txt:hierarchically in 4698) [ClassicSimilarity], result of:
            0.1500739 = score(doc=4698,freq=1.0), product of:
              0.23058678 = queryWeight, product of:
                1.317031 = boost
                8.330686 = idf(docFreq=27, maxDocs=42740)
                0.021016369 = queryNorm
              0.6508348 = fieldWeight in 4698, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                8.330686 = idf(docFreq=27, maxDocs=42740)
                0.078125 = fieldNorm(doc=4698)
          0.031845924 = weight(abstract_txt:from in 4698) [ClassicSimilarity], result of:
            0.031845924 = score(doc=4698,freq=2.0), product of:
              0.10335787 = queryWeight, product of:
                1.7635206 = boost
                2.7887225 = idf(docFreq=7144, maxDocs=42740)
                0.021016369 = queryNorm
              0.30811322 = fieldWeight in 4698, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                2.7887225 = idf(docFreq=7144, maxDocs=42740)
                0.078125 = fieldNorm(doc=4698)
          0.17113209 = weight(abstract_txt:hierarchical in 4698) [ClassicSimilarity], result of:
            0.17113209 = score(doc=4698,freq=3.0), product of:
              0.21986437 = queryWeight, product of:
                1.8187425 = boost
                5.752094 = idf(docFreq=368, maxDocs=42740)
                0.021016369 = queryNorm
              0.778353 = fieldWeight in 4698, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                5.752094 = idf(docFreq=368, maxDocs=42740)
                0.078125 = fieldNorm(doc=4698)
          0.20754863 = weight(abstract_txt:hierarchy in 4698) [ClassicSimilarity], result of:
            0.20754863 = score(doc=4698,freq=2.0), product of:
              0.2862272 = queryWeight, product of:
                2.0751488 = boost
                6.563024 = idf(docFreq=163, maxDocs=42740)
                0.021016369 = queryNorm
              0.7251185 = fieldWeight in 4698, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                6.563024 = idf(docFreq=163, maxDocs=42740)
                0.078125 = fieldNorm(doc=4698)
          0.14473902 = weight(abstract_txt:documents in 4698) [ClassicSimilarity], result of:
            0.14473902 = score(doc=4698,freq=4.0), product of:
              0.22508922 = queryWeight, product of:
                2.6024723 = boost
                4.115389 = idf(docFreq=1895, maxDocs=42740)
                0.021016369 = queryNorm
              0.6430295 = fieldWeight in 4698, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                4.115389 = idf(docFreq=1895, maxDocs=42740)
                0.078125 = fieldNorm(doc=4698)
        0.24 = coord(6/25)
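
The content score for the first hit combines per-term weights with a coordination factor. This is a minimal sketch of that arithmetic, assuming the ClassicSimilarity definitions queryWeight = boost * idf * queryNorm and fieldWeight = sqrt(freq) * idf * fieldNorm; the boosts, document frequencies, and term frequencies are copied from the listing, while the function and variable names are illustrative.

```python
import math

MAX_DOCS = 42740
QUERY_NORM = 0.021016369
FIELD_NORM = 0.078125  # fieldNorm(doc=4698)

# (term, boost, docFreq, freq) as shown in the explain tree for doc 4698.
TERMS = [
    ("using", 1.1001736, 3580, 1.0),
    ("hierarchically", 1.317031, 27, 1.0),
    ("from", 1.7635206, 7144, 2.0),
    ("hierarchical", 1.8187425, 368, 3.0),
    ("hierarchy", 2.0751488, 163, 2.0),
    ("documents", 2.6024723, 1895, 4.0),
]


def classic_idf(doc_freq, max_docs):
    # Assumed ClassicSimilarity idf: 1 + ln(maxDocs / (docFreq + 1)).
    return 1.0 + math.log(max_docs / (doc_freq + 1))


def term_score(boost, doc_freq, freq):
    idf = classic_idf(doc_freq, MAX_DOCS)
    query_weight = boost * idf * QUERY_NORM
    field_weight = math.sqrt(freq) * idf * FIELD_NORM
    return query_weight * field_weight


# Sum of the matching term scores, then scaled by coord(6/25).
raw_sum = sum(term_score(b, df, f) for _, b, df, f in TERMS)
coord = len(TERMS) / 25
total = raw_sum * coord
```

Under these assumptions, raw_sum and total should agree with the listed 0.72720915 and 0.1745302 up to floating-point rounding. The coord factor scales the sum by the share of the 25 query terms that the document matches.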
    
  2. Allen, R.B.: Navigating and searching in digital library catalogs (1994) 0.13
    0.13370088 = sum of:
      0.13370088 = product of:
        0.66850436 = sum of:
          0.07395322 = weight(abstract_txt:displayed in 4415) [ClassicSimilarity], result of:
            0.07395322 = score(doc=4415,freq=1.0), product of:
              0.16693306 = queryWeight, product of:
                1.1205983 = boost
                7.0881796 = idf(docFreq=96, maxDocs=42740)
                0.021016369 = queryNorm
              0.44301122 = fieldWeight in 4415, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.0881796 = idf(docFreq=96, maxDocs=42740)
                0.0625 = fieldNorm(doc=4415)
          0.04910396 = weight(abstract_txt:structure in 4415) [ClassicSimilarity], result of:
            0.04910396 = score(doc=4415,freq=2.0), product of:
              0.12705214 = queryWeight, product of:
                1.3825625 = boost
                4.3725977 = idf(docFreq=1465, maxDocs=42740)
                0.021016369 = queryNorm
              0.38648668 = fieldWeight in 4415, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.3725977 = idf(docFreq=1465, maxDocs=42740)
                0.0625 = fieldNorm(doc=4415)
          0.2348145 = weight(abstract_txt:hierarchy in 4415) [ClassicSimilarity], result of:
            0.2348145 = score(doc=4415,freq=4.0), product of:
              0.2862272 = queryWeight, product of:
                2.0751488 = boost
                6.563024 = idf(docFreq=163, maxDocs=42740)
                0.021016369 = queryNorm
              0.820378 = fieldWeight in 4415, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                6.563024 = idf(docFreq=163, maxDocs=42740)
                0.0625 = fieldNorm(doc=4415)
          0.22875594 = weight(abstract_txt:menus in 4415) [ClassicSimilarity], result of:
            0.22875594 = score(doc=4415,freq=1.0), product of:
              0.44650796 = queryWeight, product of:
                2.5918412 = boost
                8.197155 = idf(docFreq=31, maxDocs=42740)
                0.021016369 = queryNorm
              0.5123222 = fieldWeight in 4415, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                8.197155 = idf(docFreq=31, maxDocs=42740)
                0.0625 = fieldNorm(doc=4415)
          0.08187675 = weight(abstract_txt:documents in 4415) [ClassicSimilarity], result of:
            0.08187675 = score(doc=4415,freq=2.0), product of:
              0.22508922 = queryWeight, product of:
                2.6024723 = boost
                4.115389 = idf(docFreq=1895, maxDocs=42740)
                0.021016369 = queryNorm
              0.36375242 = fieldWeight in 4415, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.115389 = idf(docFreq=1895, maxDocs=42740)
                0.0625 = fieldNorm(doc=4415)
        0.2 = coord(5/25)
    
  3. Yang, C.C.; Wang, F.L.: Hierarchical summarization of large documents (2008) 0.13
    0.13256384 = sum of:
      0.13256384 = product of:
        0.47344232 = sum of:
          0.017495682 = weight(abstract_txt:using in 3720) [ClassicSimilarity], result of:
            0.017495682 = score(doc=3720,freq=1.0), product of:
              0.08045164 = queryWeight, product of:
                1.1001736 = boost
                3.4794931 = idf(docFreq=3580, maxDocs=42740)
                0.021016369 = queryNorm
              0.21746832 = fieldWeight in 3720, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.4794931 = idf(docFreq=3580, maxDocs=42740)
                0.0625 = fieldNorm(doc=3720)
          0.09003063 = weight(abstract_txt:salient in 3720) [ClassicSimilarity], result of:
            0.09003063 = score(doc=3720,freq=1.0), product of:
              0.19032587 = queryWeight, product of:
                1.1965413 = boost
                7.568546 = idf(docFreq=59, maxDocs=42740)
                0.021016369 = queryNorm
              0.4730341 = fieldWeight in 3720, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.568546 = idf(docFreq=59, maxDocs=42740)
                0.0625 = fieldNorm(doc=3720)
          0.06944349 = weight(abstract_txt:structure in 3720) [ClassicSimilarity], result of:
            0.06944349 = score(doc=3720,freq=4.0), product of:
              0.12705214 = queryWeight, product of:
                1.3825625 = boost
                4.3725977 = idf(docFreq=1465, maxDocs=42740)
                0.021016369 = queryNorm
              0.5465747 = fieldWeight in 3720, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                4.3725977 = idf(docFreq=1465, maxDocs=42740)
                0.0625 = fieldNorm(doc=3720)
          0.038495935 = weight(abstract_txt:techniques in 3720) [ClassicSimilarity], result of:
            0.038495935 = score(doc=3720,freq=1.0), product of:
              0.13609982 = queryWeight, product of:
                1.4309437 = boost
                4.525612 = idf(docFreq=1257, maxDocs=42740)
                0.021016369 = queryNorm
              0.28285074 = fieldWeight in 3720, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.525612 = idf(docFreq=1257, maxDocs=42740)
                0.0625 = fieldNorm(doc=3720)
          0.018014776 = weight(abstract_txt:from in 3720) [ClassicSimilarity], result of:
            0.018014776 = score(doc=3720,freq=1.0), product of:
              0.10335787 = queryWeight, product of:
                1.7635206 = boost
                2.7887225 = idf(docFreq=7144, maxDocs=42740)
                0.021016369 = queryNorm
              0.17429516 = fieldWeight in 3720, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                2.7887225 = idf(docFreq=7144, maxDocs=42740)
                0.0625 = fieldNorm(doc=3720)
          0.15808506 = weight(abstract_txt:hierarchical in 3720) [ClassicSimilarity], result of:
            0.15808506 = score(doc=3720,freq=4.0), product of:
              0.21986437 = queryWeight, product of:
                1.8187425 = boost
                5.752094 = idf(docFreq=368, maxDocs=42740)
                0.021016369 = queryNorm
              0.7190117 = fieldWeight in 3720, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                5.752094 = idf(docFreq=368, maxDocs=42740)
                0.0625 = fieldNorm(doc=3720)
          0.08187675 = weight(abstract_txt:documents in 3720) [ClassicSimilarity], result of:
            0.08187675 = score(doc=3720,freq=2.0), product of:
              0.22508922 = queryWeight, product of:
                2.6024723 = boost
                4.115389 = idf(docFreq=1895, maxDocs=42740)
                0.021016369 = queryNorm
              0.36375242 = fieldWeight in 3720, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.115389 = idf(docFreq=1895, maxDocs=42740)
                0.0625 = fieldNorm(doc=3720)
        0.28 = coord(7/25)
    
  4. Dolin, R.; Agrawal, D.; El Abbadi, A.; Pearlman, J.: Using automated classification for summarizing and selecting heterogeneous information sources (1998) 0.12
    0.12087505 = sum of:
      0.12087505 = product of:
        0.33576402 = sum of:
          0.03338328 = weight(abstract_txt:phrases in 3254) [ClassicSimilarity], result of:
            0.03338328 = score(doc=3254,freq=1.0), product of:
              0.15593502 = queryWeight, product of:
                1.0830553 = boost
                6.850706 = idf(docFreq=122, maxDocs=42740)
                0.021016369 = queryNorm
              0.21408457 = fieldWeight in 3254, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.850706 = idf(docFreq=122, maxDocs=42740)
                0.03125 = fieldNorm(doc=3254)
          0.03747045 = weight(abstract_txt:locating in 3254) [ClassicSimilarity], result of:
            0.03747045 = score(doc=3254,freq=1.0), product of:
              0.16841608 = queryWeight, product of:
                1.1255649 = boost
                7.1195955 = idf(docFreq=93, maxDocs=42740)
                0.021016369 = queryNorm
              0.22248736 = fieldWeight in 3254, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.1195955 = idf(docFreq=93, maxDocs=42740)
                0.03125 = fieldNorm(doc=3254)
          0.060029563 = weight(abstract_txt:hierarchically in 3254) [ClassicSimilarity], result of:
            0.060029563 = score(doc=3254,freq=1.0), product of:
              0.23058678 = queryWeight, product of:
                1.317031 = boost
                8.330686 = idf(docFreq=27, maxDocs=42740)
                0.021016369 = queryNorm
              0.26033393 = fieldWeight in 3254, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                8.330686 = idf(docFreq=27, maxDocs=42740)
                0.03125 = fieldNorm(doc=3254)
          0.017360872 = weight(abstract_txt:structure in 3254) [ClassicSimilarity], result of:
            0.017360872 = score(doc=3254,freq=1.0), product of:
              0.12705214 = queryWeight, product of:
                1.3825625 = boost
                4.3725977 = idf(docFreq=1465, maxDocs=42740)
                0.021016369 = queryNorm
              0.13664368 = fieldWeight in 3254, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.3725977 = idf(docFreq=1465, maxDocs=42740)
                0.03125 = fieldNorm(doc=3254)
          0.019247968 = weight(abstract_txt:techniques in 3254) [ClassicSimilarity], result of:
            0.019247968 = score(doc=3254,freq=1.0), product of:
              0.13609982 = queryWeight, product of:
                1.4309437 = boost
                4.525612 = idf(docFreq=1257, maxDocs=42740)
                0.021016369 = queryNorm
              0.14142537 = fieldWeight in 3254, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.525612 = idf(docFreq=1257, maxDocs=42740)
                0.03125 = fieldNorm(doc=3254)
          0.012738369 = weight(abstract_txt:from in 3254) [ClassicSimilarity], result of:
            0.012738369 = score(doc=3254,freq=2.0), product of:
              0.10335787 = queryWeight, product of:
                1.7635206 = boost
                2.7887225 = idf(docFreq=7144, maxDocs=42740)
                0.021016369 = queryNorm
              0.123245284 = fieldWeight in 3254, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                2.7887225 = idf(docFreq=7144, maxDocs=42740)
                0.03125 = fieldNorm(doc=3254)
          0.05589151 = weight(abstract_txt:hierarchical in 3254) [ClassicSimilarity], result of:
            0.05589151 = score(doc=3254,freq=2.0), product of:
              0.21986437 = queryWeight, product of:
                1.8187425 = boost
                5.752094 = idf(docFreq=368, maxDocs=42740)
                0.021016369 = queryNorm
              0.25420904 = fieldWeight in 3254, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                5.752094 = idf(docFreq=368, maxDocs=42740)
                0.03125 = fieldNorm(doc=3254)
          0.058703624 = weight(abstract_txt:hierarchy in 3254) [ClassicSimilarity], result of:
            0.058703624 = score(doc=3254,freq=1.0), product of:
              0.2862272 = queryWeight, product of:
                2.0751488 = boost
                6.563024 = idf(docFreq=163, maxDocs=42740)
                0.021016369 = queryNorm
              0.2050945 = fieldWeight in 3254, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.563024 = idf(docFreq=163, maxDocs=42740)
                0.03125 = fieldNorm(doc=3254)
          0.040938374 = weight(abstract_txt:documents in 3254) [ClassicSimilarity], result of:
            0.040938374 = score(doc=3254,freq=2.0), product of:
              0.22508922 = queryWeight, product of:
                2.6024723 = boost
                4.115389 = idf(docFreq=1895, maxDocs=42740)
                0.021016369 = queryNorm
              0.18187621 = fieldWeight in 3254, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.115389 = idf(docFreq=1895, maxDocs=42740)
                0.03125 = fieldNorm(doc=3254)
        0.36 = coord(9/25)
    
  5. Hepp, M.; Bruijn, J. de: GenTax : a generic methodology for deriving OWL and RDF-S ontologies from hierarchical classifications, thesauri, and inconsistent taxonomies (2007) 0.12
    0.118580356 = sum of:
      0.118580356 = product of:
        0.59290177 = sum of:
          0.10336823 = weight(abstract_txt:deriving in 1693) [ClassicSimilarity], result of:
            0.10336823 = score(doc=1693,freq=1.0), product of:
              0.20868714 = queryWeight, product of:
                1.2529294 = boost
                7.925221 = idf(docFreq=41, maxDocs=42740)
                0.021016369 = queryNorm
              0.4953263 = fieldWeight in 1693, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.925221 = idf(docFreq=41, maxDocs=42740)
                0.0625 = fieldNorm(doc=1693)
          0.15638041 = weight(abstract_txt:subsumption in 1693) [ClassicSimilarity], result of:
            0.15638041 = score(doc=1693,freq=1.0), product of:
              0.27501678 = queryWeight, product of:
                1.4383295 = boost
                9.097941 = idf(docFreq=12, maxDocs=42740)
                0.021016369 = queryNorm
              0.56862134 = fieldWeight in 1693, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                9.097941 = idf(docFreq=12, maxDocs=42740)
                0.0625 = fieldNorm(doc=1693)
          0.018014776 = weight(abstract_txt:from in 1693) [ClassicSimilarity], result of:
            0.018014776 = score(doc=1693,freq=1.0), product of:
              0.10335787 = queryWeight, product of:
                1.7635206 = boost
                2.7887225 = idf(docFreq=7144, maxDocs=42740)
                0.021016369 = queryNorm
              0.17429516 = fieldWeight in 1693, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                2.7887225 = idf(docFreq=7144, maxDocs=42740)
                0.0625 = fieldNorm(doc=1693)
          0.11178302 = weight(abstract_txt:hierarchical in 1693) [ClassicSimilarity], result of:
            0.11178302 = score(doc=1693,freq=2.0), product of:
              0.21986437 = queryWeight, product of:
                1.8187425 = boost
                5.752094 = idf(docFreq=368, maxDocs=42740)
                0.021016369 = queryNorm
              0.5084181 = fieldWeight in 1693, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                5.752094 = idf(docFreq=368, maxDocs=42740)
                0.0625 = fieldNorm(doc=1693)
          0.20335531 = weight(abstract_txt:hierarchy in 1693) [ClassicSimilarity], result of:
            0.20335531 = score(doc=1693,freq=3.0), product of:
              0.2862272 = queryWeight, product of:
                2.0751488 = boost
                6.563024 = idf(docFreq=163, maxDocs=42740)
                0.021016369 = queryNorm
              0.7104682 = fieldWeight in 1693, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                6.563024 = idf(docFreq=163, maxDocs=42740)
                0.0625 = fieldNorm(doc=1693)
        0.2 = coord(5/25)