Document (#33170)

Author
Hagedorn, K.
Chapman, S.
Newman, D.
Title
Enhancing search and browse using automated clustering of subject metadata
Source
D-Lib Magazine. 13(2007) nos. 7/8, x pp.
Year
2007
Abstract
The Web puzzle of online information resources often hinders end-users from effective and efficient access to these resources. Clustering resources into appropriate subject-based groupings may help alleviate these difficulties, but will it work with heterogeneous material? The University of Michigan and the University of California Irvine joined forces to test automatically enhancing metadata records using the Topic Modeling algorithm on the varied OAIster corpus. We created labels for the resulting clusters of metadata records, matched the clusters to an in-house classification system, and developed a prototype that would showcase methods for search and retrieval using the enhanced records. Results indicated that while the algorithm was somewhat time-intensive to run and using a local classification scheme had its drawbacks, precise clustering of records was achieved and the prototype interface proved that faceted classification could be powerful in helping end-users find resources.
Footnote
See also: http://dlib.ukoln.ac.uk/dlib/july07/hagedorn/07hagedorn.html.
Theme
Automatic classification

Similar documents (author)

  1. Newman, N.: Search strategies and activities of BBC news interactive (2007) 2.31
    2.314644 = sum of:
      2.314644 = product of:
        4.629288 = sum of:
          4.629288 = weight(author_txt:newman in 2382) [ClassicSimilarity], result of:
            4.629288 = score(doc=2382,freq=1.0), product of:
              0.75035584 = queryWeight, product of:
                1.065422 = boost
                9.871131 = idf(docFreq=5, maxDocs=42740)
                0.07134748 = queryNorm
              6.169457 = fieldWeight in 2382, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                9.871131 = idf(docFreq=5, maxDocs=42740)
                0.625 = fieldNorm(doc=2382)
        0.5 = coord(1/2)
    
  2. Chapman, L.: How to catalogue : a practical manual using AACR2 and Library of Congress (1990) 1.91
    1.9138994 = sum of:
      1.9138994 = product of:
        3.8277988 = sum of:
          3.8277988 = weight(author_txt:chapman in 6081) [ClassicSimilarity], result of:
            3.8277988 = score(doc=6081,freq=1.0), product of:
              0.6610341 = queryWeight, product of:
                9.264996 = idf(docFreq=10, maxDocs=42740)
                0.07134748 = queryNorm
              5.790622 = fieldWeight in 6081, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                9.264996 = idf(docFreq=10, maxDocs=42740)
                0.625 = fieldNorm(doc=6081)
        0.5 = coord(1/2)
    
  3. Chapman, A.: Up to standard? : a study of the quality of records in a shared cataloguing database (1994) 1.91
    1.9138994 = sum of:
      1.9138994 = product of:
        3.8277988 = sum of:
          3.8277988 = weight(author_txt:chapman in 808) [ClassicSimilarity], result of:
            3.8277988 = score(doc=808,freq=1.0), product of:
              0.6610341 = queryWeight, product of:
                9.264996 = idf(docFreq=10, maxDocs=42740)
                0.07134748 = queryNorm
              5.790622 = fieldWeight in 808, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                9.264996 = idf(docFreq=10, maxDocs=42740)
                0.625 = fieldNorm(doc=808)
        0.5 = coord(1/2)
    
  4. Chapman, A.: Quality of bibliographic records in a shared cataloguing database : a case study using the BLCMP database (1993) 1.91
    1.9138994 = sum of:
      1.9138994 = product of:
        3.8277988 = sum of:
          3.8277988 = weight(author_txt:chapman in 809) [ClassicSimilarity], result of:
            3.8277988 = score(doc=809,freq=1.0), product of:
              0.6610341 = queryWeight, product of:
                9.264996 = idf(docFreq=10, maxDocs=42740)
                0.07134748 = queryNorm
              5.790622 = fieldWeight in 809, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                9.264996 = idf(docFreq=10, maxDocs=42740)
                0.625 = fieldNorm(doc=809)
        0.5 = coord(1/2)
    
  5. Chapman, A.: Retrospective catalogue conversion : a national study and a discussion based on selected literature (1996) 1.91
    1.9138994 = sum of:
      1.9138994 = product of:
        3.8277988 = sum of:
          3.8277988 = weight(author_txt:chapman in 6636) [ClassicSimilarity], result of:
            3.8277988 = score(doc=6636,freq=1.0), product of:
              0.6610341 = queryWeight, product of:
                9.264996 = idf(docFreq=10, maxDocs=42740)
                0.07134748 = queryNorm
              5.790622 = fieldWeight in 6636, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                9.264996 = idf(docFreq=10, maxDocs=42740)
                0.625 = fieldNorm(doc=6636)
        0.5 = coord(1/2)
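
The single-term explain trees above can be checked by hand. The following is a minimal sketch, assuming the classic Lucene TF-IDF definitions idf = 1 + ln(maxDocs / (docFreq + 1)) and tf = sqrt(freq), which are consistent with the values printed in the listing. The function names are illustrative and are not Lucene API; the numeric inputs (boost, queryNorm, fieldNorm, coord) are copied from entries 1 and 2.

```python
import math


def idf(doc_freq, max_docs):
    # Assumed classic form: 1 + ln(maxDocs / (docFreq + 1))
    return 1.0 + math.log(max_docs / (doc_freq + 1))


def tf(freq):
    # Assumed classic form: square root of the raw term frequency
    return math.sqrt(freq)


def term_score(freq, doc_freq, max_docs, query_norm, field_norm, boost=1.0):
    term_idf = idf(doc_freq, max_docs)
    query_weight = boost * term_idf * query_norm      # "queryWeight" in the listing
    field_weight = tf(freq) * term_idf * field_norm   # "fieldWeight" in the listing
    return query_weight * field_weight


# Entry 1 (author_txt:newman, doc 2382): one of two query clauses matched
newman = term_score(1.0, 5, 42740, 0.07134748, 0.625, boost=1.065422)
newman_total = newman * 0.5  # coord(1/2)

# Entries 2-5 (author_txt:chapman): no boost line is printed, so boost = 1.0
chapman = term_score(1.0, 10, 42740, 0.07134748, 0.625)
chapman_total = chapman * 0.5  # coord(1/2)

print(round(newman_total, 6), round(chapman_total, 6))
```

The results can be compared with the totals 2.314644 and 1.9138994 in the listing. Small differences in the last digits are expected, since the listing appears to use single-precision arithmetic.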
    

Similar documents (content)

  1. Golub, K.: Automatic subject indexing of text (2019) 0.15
    0.14953615 = sum of:
      0.14953615 = product of:
        0.5340577 = sum of:
          0.0468163 = weight(abstract_txt:subject in 1269) [ClassicSimilarity], result of:
            0.0468163 = score(doc=1269,freq=5.0), product of:
              0.08567248 = queryWeight, product of:
                1.054739 = boost
                3.9101257 = idf(docFreq=2327, maxDocs=42740)
                0.020773306 = queryNorm
              0.5464567 = fieldWeight in 1269, product of:
                2.236068 = tf(freq=5.0), with freq of:
                  5.0 = termFreq=5.0
                3.9101257 = idf(docFreq=2327, maxDocs=42740)
                0.0625 = fieldNorm(doc=1269)
          0.058231287 = weight(abstract_txt:classification in 1269) [ClassicSimilarity], result of:
            0.058231287 = score(doc=1269,freq=3.0), product of:
              0.13448098 = queryWeight, product of:
                1.6184541 = boost
                3.9999528 = idf(docFreq=2127, maxDocs=42740)
                0.020773306 = queryNorm
              0.4330076 = fieldWeight in 1269, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                3.9999528 = idf(docFreq=2127, maxDocs=42740)
                0.0625 = fieldNorm(doc=1269)
          0.029506512 = weight(abstract_txt:using in 1269) [ClassicSimilarity], result of:
            0.029506512 = score(doc=1269,freq=1.0), product of:
              0.13568188 = queryWeight, product of:
                1.8771554 = boost
                3.4794931 = idf(docFreq=3580, maxDocs=42740)
                0.020773306 = queryNorm
              0.21746832 = fieldWeight in 1269, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.4794931 = idf(docFreq=3580, maxDocs=42740)
                0.0625 = fieldNorm(doc=1269)
          0.10743933 = weight(abstract_txt:metadata in 1269) [ClassicSimilarity], result of:
            0.10743933 = score(doc=1269,freq=3.0), product of:
              0.20230137 = queryWeight, product of:
                1.9850404 = boost
                4.905958 = idf(docFreq=859, maxDocs=42740)
                0.020773306 = queryNorm
              0.53108555 = fieldWeight in 1269, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                4.905958 = idf(docFreq=859, maxDocs=42740)
                0.0625 = fieldNorm(doc=1269)
          0.052874047 = weight(abstract_txt:resources in 1269) [ClassicSimilarity], result of:
            0.052874047 = score(doc=1269,freq=1.0), product of:
              0.20017277 = queryWeight, product of:
                2.2800364 = boost
                4.226273 = idf(docFreq=1696, maxDocs=42740)
                0.020773306 = queryNorm
              0.26414207 = fieldWeight in 1269, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.226273 = idf(docFreq=1696, maxDocs=42740)
                0.0625 = fieldNorm(doc=1269)
          0.060370047 = weight(abstract_txt:records in 1269) [ClassicSimilarity], result of:
            0.060370047 = score(doc=1269,freq=1.0), product of:
              0.21867087 = queryWeight, product of:
                2.383059 = boost
                4.4172354 = idf(docFreq=1401, maxDocs=42740)
                0.020773306 = queryNorm
              0.2760772 = fieldWeight in 1269, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.4172354 = idf(docFreq=1401, maxDocs=42740)
                0.0625 = fieldNorm(doc=1269)
          0.17882012 = weight(abstract_txt:clustering in 1269) [ClassicSimilarity], result of:
            0.17882012 = score(doc=1269,freq=2.0), product of:
              0.32523552 = queryWeight, product of:
                2.5169172 = boost
                6.220473 = idf(docFreq=230, maxDocs=42740)
                0.020773306 = queryNorm
              0.5498173 = fieldWeight in 1269, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                6.220473 = idf(docFreq=230, maxDocs=42740)
                0.0625 = fieldNorm(doc=1269)
        0.28 = coord(7/25)
    
  2. Baker, T.: The concepts of knowledge organization systems as hubs in the Web of data (2011) 0.15
    0.14778703 = sum of:
      0.14778703 = product of:
        0.5278108 = sum of:
          0.089216895 = weight(abstract_txt:labels in 1811) [ClassicSimilarity], result of:
            0.089216895 = score(doc=1811,freq=1.0), product of:
              0.1540215 = queryWeight, product of:
                7.4143953 = idf(docFreq=69, maxDocs=42740)
                0.020773306 = queryNorm
              0.5792496 = fieldWeight in 1811, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.4143953 = idf(docFreq=69, maxDocs=42740)
                0.078125 = fieldNorm(doc=1811)
          0.09249104 = weight(abstract_txt:matched in 1811) [ClassicSimilarity], result of:
            0.09249104 = score(doc=1811,freq=1.0), product of:
              0.15776709 = queryWeight, product of:
                1.0120863 = boost
                7.5040073 = idf(docFreq=63, maxDocs=42740)
                0.020773306 = queryNorm
              0.58625054 = fieldWeight in 1811, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.5040073 = idf(docFreq=63, maxDocs=42740)
                0.078125 = fieldNorm(doc=1811)
          0.026171109 = weight(abstract_txt:subject in 1811) [ClassicSimilarity], result of:
            0.026171109 = score(doc=1811,freq=1.0), product of:
              0.08567248 = queryWeight, product of:
                1.054739 = boost
                3.9101257 = idf(docFreq=2327, maxDocs=42740)
                0.020773306 = queryNorm
              0.30547857 = fieldWeight in 1811, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.9101257 = idf(docFreq=2327, maxDocs=42740)
                0.078125 = fieldNorm(doc=1811)
          0.11204195 = weight(abstract_txt:joined in 1811) [ClassicSimilarity], result of:
            0.11204195 = score(doc=1811,freq=1.0), product of:
              0.17928214 = queryWeight, product of:
                1.0788916 = boost
                7.999329 = idf(docFreq=38, maxDocs=42740)
                0.020773306 = queryNorm
              0.6249476 = fieldWeight in 1811, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.999329 = idf(docFreq=38, maxDocs=42740)
                0.078125 = fieldNorm(doc=1811)
          0.036883138 = weight(abstract_txt:using in 1811) [ClassicSimilarity], result of:
            0.036883138 = score(doc=1811,freq=1.0), product of:
              0.13568188 = queryWeight, product of:
                1.8771554 = boost
                3.4794931 = idf(docFreq=3580, maxDocs=42740)
                0.020773306 = queryNorm
              0.2718354 = fieldWeight in 1811, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.4794931 = idf(docFreq=3580, maxDocs=42740)
                0.078125 = fieldNorm(doc=1811)
          0.07753766 = weight(abstract_txt:metadata in 1811) [ClassicSimilarity], result of:
            0.07753766 = score(doc=1811,freq=1.0), product of:
              0.20230137 = queryWeight, product of:
                1.9850404 = boost
                4.905958 = idf(docFreq=859, maxDocs=42740)
                0.020773306 = queryNorm
              0.38327798 = fieldWeight in 1811, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.905958 = idf(docFreq=859, maxDocs=42740)
                0.078125 = fieldNorm(doc=1811)
          0.093468994 = weight(abstract_txt:resources in 1811) [ClassicSimilarity], result of:
            0.093468994 = score(doc=1811,freq=2.0), product of:
              0.20017277 = queryWeight, product of:
                2.2800364 = boost
                4.226273 = idf(docFreq=1696, maxDocs=42740)
                0.020773306 = queryNorm
              0.4669416 = fieldWeight in 1811, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.226273 = idf(docFreq=1696, maxDocs=42740)
                0.078125 = fieldNorm(doc=1811)
        0.28 = coord(7/25)
    
  3. Hagedorn, K.: OAIster: a "no dead ends" OAI service provider (2003) 0.13
    0.1323083 = sum of:
      0.1323083 = product of:
        0.6615415 = sum of:
          0.10438316 = weight(abstract_txt:michigan in 777) [ClassicSimilarity], result of:
            0.10438316 = score(doc=777,freq=1.0), product of:
              0.17101607 = queryWeight, product of:
                1.0537262 = boost
                7.8127427 = idf(docFreq=46, maxDocs=42740)
                0.020773306 = queryNorm
              0.6103705 = fieldWeight in 777, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.8127427 = idf(docFreq=46, maxDocs=42740)
                0.078125 = fieldNorm(doc=777)
          0.047611427 = weight(abstract_txt:university in 777) [ClassicSimilarity], result of:
            0.047611427 = score(doc=777,freq=2.0), product of:
              0.10133452 = queryWeight, product of:
                1.1471046 = boost
                4.2525434 = idf(docFreq=1652, maxDocs=42740)
                0.020773306 = queryNorm
              0.4698441 = fieldWeight in 777, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.2525434 = idf(docFreq=1652, maxDocs=42740)
                0.078125 = fieldNorm(doc=777)
          0.28400683 = weight(abstract_txt:oaister in 777) [ClassicSimilarity], result of:
            0.28400683 = score(doc=777,freq=2.0), product of:
              0.2645407 = queryWeight, product of:
                1.3105559 = boost
                9.71698 = idf(docFreq=6, maxDocs=42740)
                0.020773306 = queryNorm
              1.0735847 = fieldWeight in 777, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                9.71698 = idf(docFreq=6, maxDocs=42740)
                0.078125 = fieldNorm(doc=777)
          0.052160636 = weight(abstract_txt:using in 777) [ClassicSimilarity], result of:
            0.052160636 = score(doc=777,freq=2.0), product of:
              0.13568188 = queryWeight, product of:
                1.8771554 = boost
                3.4794931 = idf(docFreq=3580, maxDocs=42740)
                0.020773306 = queryNorm
              0.3844333 = fieldWeight in 777, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                3.4794931 = idf(docFreq=3580, maxDocs=42740)
                0.078125 = fieldNorm(doc=777)
          0.17337948 = weight(abstract_txt:metadata in 777) [ClassicSimilarity], result of:
            0.17337948 = score(doc=777,freq=5.0), product of:
              0.20230137 = queryWeight, product of:
                1.9850404 = boost
                4.905958 = idf(docFreq=859, maxDocs=42740)
                0.020773306 = queryNorm
              0.85703564 = fieldWeight in 777, product of:
                2.236068 = tf(freq=5.0), with freq of:
                  5.0 = termFreq=5.0
                4.905958 = idf(docFreq=859, maxDocs=42740)
                0.078125 = fieldNorm(doc=777)
        0.2 = coord(5/25)
    
  4. Jun, W.: A knowledge network constructed by integrating classification, thesaurus and metadata in a digital library (2003) 0.13
    0.12947807 = sum of:
      0.12947807 = product of:
        0.4624217 = sum of:
          0.064743735 = weight(abstract_txt:matched in 2255) [ClassicSimilarity], result of:
            0.064743735 = score(doc=2255,freq=1.0), product of:
              0.15776709 = queryWeight, product of:
                1.0120863 = boost
                7.5040073 = idf(docFreq=63, maxDocs=42740)
                0.020773306 = queryNorm
              0.41037542 = fieldWeight in 2255, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.5040073 = idf(docFreq=63, maxDocs=42740)
                0.0546875 = fieldNorm(doc=2255)
          0.018319776 = weight(abstract_txt:subject in 2255) [ClassicSimilarity], result of:
            0.018319776 = score(doc=2255,freq=1.0), product of:
              0.08567248 = queryWeight, product of:
                1.054739 = boost
                3.9101257 = idf(docFreq=2327, maxDocs=42740)
                0.020773306 = queryNorm
              0.213835 = fieldWeight in 2255, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.9101257 = idf(docFreq=2327, maxDocs=42740)
                0.0546875 = fieldNorm(doc=2255)
          0.023566453 = weight(abstract_txt:university in 2255) [ClassicSimilarity], result of:
            0.023566453 = score(doc=2255,freq=1.0), product of:
              0.10133452 = queryWeight, product of:
                1.1471046 = boost
                4.2525434 = idf(docFreq=1652, maxDocs=42740)
                0.020773306 = queryNorm
              0.23256096 = fieldWeight in 2255, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.2525434 = idf(docFreq=1652, maxDocs=42740)
                0.0546875 = fieldNorm(doc=2255)
          0.072057545 = weight(abstract_txt:classification in 2255) [ClassicSimilarity], result of:
            0.072057545 = score(doc=2255,freq=6.0), product of:
              0.13448098 = queryWeight, product of:
                1.6184541 = boost
                3.9999528 = idf(docFreq=2127, maxDocs=42740)
                0.020773306 = queryNorm
              0.5358196 = fieldWeight in 2255, product of:
                2.4494898 = tf(freq=6.0), with freq of:
                  6.0 = termFreq=6.0
                3.9999528 = idf(docFreq=2127, maxDocs=42740)
                0.0546875 = fieldNorm(doc=2255)
          0.14360176 = weight(abstract_txt:metadata in 2255) [ClassicSimilarity], result of:
            0.14360176 = score(doc=2255,freq=7.0), product of:
              0.20230137 = queryWeight, product of:
                1.9850404 = boost
                4.905958 = idf(docFreq=859, maxDocs=42740)
                0.020773306 = queryNorm
              0.7098408 = fieldWeight in 2255, product of:
                2.6457512 = tf(freq=7.0), with freq of:
                  7.0 = termFreq=7.0
                4.905958 = idf(docFreq=859, maxDocs=42740)
                0.0546875 = fieldNorm(doc=2255)
          0.065428294 = weight(abstract_txt:resources in 2255) [ClassicSimilarity], result of:
            0.065428294 = score(doc=2255,freq=2.0), product of:
              0.20017277 = queryWeight, product of:
                2.2800364 = boost
                4.226273 = idf(docFreq=1696, maxDocs=42740)
                0.020773306 = queryNorm
              0.32685912 = fieldWeight in 2255, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.226273 = idf(docFreq=1696, maxDocs=42740)
                0.0546875 = fieldNorm(doc=2255)
          0.07470412 = weight(abstract_txt:records in 2255) [ClassicSimilarity], result of:
            0.07470412 = score(doc=2255,freq=2.0), product of:
              0.21867087 = queryWeight, product of:
                2.383059 = boost
                4.4172354 = idf(docFreq=1401, maxDocs=42740)
                0.020773306 = queryNorm
              0.3416281 = fieldWeight in 2255, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.4172354 = idf(docFreq=1401, maxDocs=42740)
                0.0546875 = fieldNorm(doc=2255)
        0.28 = coord(7/25)
    
  5. Frants, V.I.; Kamenoff, N.I.; Shapiro, J.: One approach to classification of users and automatic clustering of documents (1993) 0.13
    0.12937067 = sum of:
      0.12937067 = product of:
        0.8085667 = sum of:
          0.11646257 = weight(abstract_txt:classification in 4569) [ClassicSimilarity], result of:
            0.11646257 = score(doc=4569,freq=3.0), product of:
              0.13448098 = queryWeight, product of:
                1.6184541 = boost
                3.9999528 = idf(docFreq=2127, maxDocs=42740)
                0.020773306 = queryNorm
              0.8660152 = fieldWeight in 4569, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                3.9999528 = idf(docFreq=2127, maxDocs=42740)
                0.125 = fieldNorm(doc=4569)
          0.27545083 = weight(abstract_txt:clusters in 4569) [ClassicSimilarity], result of:
            0.27545083 = score(doc=4569,freq=2.0), product of:
              0.2387258 = queryWeight, product of:
                1.7606539 = boost
                6.527092 = idf(docFreq=169, maxDocs=42740)
                0.020773306 = queryNorm
              1.1538377 = fieldWeight in 4569, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                6.527092 = idf(docFreq=169, maxDocs=42740)
                0.125 = fieldNorm(doc=4569)
          0.059013024 = weight(abstract_txt:using in 4569) [ClassicSimilarity], result of:
            0.059013024 = score(doc=4569,freq=1.0), product of:
              0.13568188 = queryWeight, product of:
                1.8771554 = boost
                3.4794931 = idf(docFreq=3580, maxDocs=42740)
                0.020773306 = queryNorm
              0.43493664 = fieldWeight in 4569, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.4794931 = idf(docFreq=3580, maxDocs=42740)
                0.125 = fieldNorm(doc=4569)
          0.35764024 = weight(abstract_txt:clustering in 4569) [ClassicSimilarity], result of:
            0.35764024 = score(doc=4569,freq=2.0), product of:
              0.32523552 = queryWeight, product of:
                2.5169172 = boost
                6.220473 = idf(docFreq=230, maxDocs=42740)
                0.020773306 = queryNorm
              1.0996346 = fieldWeight in 4569, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                6.220473 = idf(docFreq=230, maxDocs=42740)
                0.125 = fieldNorm(doc=4569)
        0.16 = coord(4/25)
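
For the content-based matches, each explain tree sums the per-term scores and multiplies the sum by a coordination factor (matched query terms divided by total query terms). The sketch below recomputes entry 5 (doc 4569) under the same assumed formulas, idf = 1 + ln(maxDocs / (docFreq + 1)) and tf = sqrt(freq). The variable and function names are hypothetical; the frequencies, document frequencies, boosts, queryNorm, and fieldNorm are copied from the listing.

```python
import math

MAX_DOCS = 42740
QUERY_NORM = 0.020773306   # queryNorm shared by all abstract_txt terms
FIELD_NORM = 0.125         # fieldNorm(doc=4569)
TOTAL_QUERY_TERMS = 25     # denominator of coord(4/25)


def term_score(freq, doc_freq, boost):
    # Assumed classic TF-IDF form; not a call into Lucene itself
    idf = 1.0 + math.log(MAX_DOCS / (doc_freq + 1))
    query_weight = boost * idf * QUERY_NORM
    field_weight = math.sqrt(freq) * idf * FIELD_NORM
    return query_weight * field_weight


# term: (termFreq, docFreq, boost) as printed for doc 4569
matched_terms = {
    "classification": (3.0, 2127, 1.6184541),
    "clusters": (2.0, 169, 1.7606539),
    "using": (1.0, 3580, 1.8771554),
    "clustering": (2.0, 230, 2.5169172),
}

term_scores = {t: term_score(*args) for t, args in matched_terms.items()}
raw_sum = sum(term_scores.values())
coord = len(matched_terms) / TOTAL_QUERY_TERMS
final_score = raw_sum * coord

print(round(raw_sum, 6), coord, round(final_score, 6))
```

The results can be compared with the sum 0.8085667 and the final score 0.12937067 shown for entry 5. The coordination factor explains why a document matching only four rare terms strongly can still rank below documents that match seven terms more weakly.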