Document (#33255)

Author
Dolin, R.
Agrawal, D.
El Abbadi, A.
Pearlman, J.
Title
Using automated classification for summarizing and selecting heterogeneous information sources
Source
D-Lib magazine. 4(1998) no.1, xx pp.
Year
1998
Abstract
Information retrieval over the Internet increasingly requires the filtering of thousands of heterogeneous information sources. Important sources of information include not only traditional databases with structured data and queries, but also increasing numbers of non-traditional, semi- or unstructured collections such as Web sites, FTP archives, etc. As the number and variability of sources increase, new ways of automatically summarizing, discovering, and selecting collections relevant to a user's query are needed. One such method involves the use of classification schemes, such as the Library of Congress Classification (LCC), within which a collection may be represented based on its content, irrespective of the structure of the actual data or documents. For such a system to be useful in a large-scale distributed environment, it must be easy to use for both collection managers and users. As a result, it must be possible to classify documents automatically within a classification scheme. Furthermore, there must be a straightforward and intuitive interface with which the user may use the scheme to assist in information retrieval (IR). Our work with the Alexandria Digital Library (ADL) Project focuses on geo-referenced information, whether text, maps, aerial photographs, or satellite images. As a result, we have emphasized techniques which work with both text and non-text, such as combined textual and graphical queries, multi-dimensional indexing, and IR methods which are not solely dependent on words or phrases. Part of this work involves locating relevant online sources of information. In particular, we have designed and are currently testing aspects of an architecture, Pharos, which we believe will scale up to 1,000,000 heterogeneous sources. Pharos accommodates heterogeneity in content and format, both among multiple sources and within a single source.
That is, we consider sources to include Web sites, FTP archives, newsgroups, and full digital libraries; all of these systems can include a wide variety of content and multimedia data formats. Pharos is based on the use of hierarchical classification schemes. These include not only well-known 'subject' (or 'concept') based schemes such as the Dewey Decimal System and the LCC, but also, for example, geographic classifications, which might be constructed as layers of smaller and smaller hierarchical longitude/latitude boxes. Pharos is designed to work with sophisticated queries which utilize subjects, geographical locations, temporal specifications, and other types of information domains. The Pharos architecture requires that hierarchically structured collection metadata be extracted so that it can be partitioned in such a way as to greatly enhance scalability. Automated classification is important to Pharos because it allows information sources to extract automatically the collection metadata that must be distributed.
We are currently experimenting with newsgroups as collections. We have built an initial prototype which automatically classifies and summarizes newsgroups within the LCC. (The prototype can be tested below, and more details may be found at http://pharos.alexandria.ucsb.edu/.) The prototype uses electronic library catalog records as a 'training set' and Latent Semantic Indexing (LSI) for IR. We use the training set to build a rich set of classification terminology, and associate these terms with the relevant categories in the LCC. This association between terms and classification categories allows us to relate users' queries to nodes in the LCC so that users can select appropriate query categories. Newsgroups are similarly associated with classification categories. Pharos then matches the categories selected by users to relevant newsgroups. In principle, this approach allows users to exclude newsgroups that might have been selected based on an unintended meaning of a query term, and to include newsgroups with relevant content even though the exact query terms may not have been used. This work is extensible to other types of classification, including geographical, temporal, and image-feature classification. Before discussing the methodology of the collection summarization and selection, we first present an online demonstration below. The demonstration is not intended to be a complete end-user interface. Rather, it is intended merely to offer a view of the process to suggest the "look and feel" of the prototype. The demo works as follows. First, supply it with a few keywords of interest. The system will then use those terms to try to return to you the most relevant subject categories within the LCC. Assuming that the system recognizes any of your terms (it has over 400,000 terms indexed), it will give you a list of 15 LCC categories sorted by relevance ranking. From there, you have two choices.
The first choice, by clicking on the "News" links, is to get a list of newsgroups which the system has identified as relevant to the LCC category you select. The other choice, by clicking on the LCC ID links, is to enter the LCC hierarchy starting at the category of your choice and navigate the tree until you locate the best category for your query. From there, again, you can get a list of newsgroups by clicking on the "News" links. After having shown this demonstration to many people, we would like to suggest that you first give it easier examples before trying to break it. For example, "prostate cancer" (discussed below), "remote sensing", "investment banking", and "gershwin" all work reasonably well.
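The workflow described in the abstract (keywords are mapped to ranked classification categories, and a chosen category is mapped to newsgroups) can be illustrated with a minimal sketch. This is not the Pharos implementation: the abstract says the prototype uses Latent Semantic Indexing, whereas the sketch below substitutes plain cosine similarity over term counts. The term lists, the `example.*` newsgroup names, and all function names are hypothetical and exist only for illustration.

```python
import math
from collections import Counter

# Hypothetical toy data: terms associated with top-level LCC classes.
CATEGORY_TERMS = {
    "R (Medicine)": ["cancer", "prostate", "tumor", "therapy"],
    "G (Geography)": ["remote", "sensing", "satellite", "maps"],
    "H (Social sciences)": ["investment", "banking", "stocks"],
}

# Hypothetical newsgroups (names invented), each summarized by a few terms.
NEWSGROUP_TERMS = {
    "example.med.cancer": ["prostate", "cancer", "therapy"],
    "example.geo.imagery": ["satellite", "maps", "remote"],
    "example.invest.stocks": ["investment", "stocks"],
}


def cosine(terms_a, terms_b):
    """Cosine similarity between two bags of terms."""
    va, vb = Counter(terms_a), Counter(terms_b)
    dot = sum(va[t] * vb[t] for t in va)
    norm_a = math.sqrt(sum(c * c for c in va.values()))
    norm_b = math.sqrt(sum(c * c for c in vb.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def rank_categories(query_terms, limit=15):
    """Return up to `limit` categories with non-zero similarity, best first."""
    scored = [(cosine(query_terms, terms), cat)
              for cat, terms in CATEGORY_TERMS.items()]
    return [cat for score, cat in sorted(scored, reverse=True) if score > 0][:limit]


def newsgroups_for(category):
    """Return newsgroups whose terms overlap the category's terms, best first."""
    cat_terms = CATEGORY_TERMS[category]
    scored = [(cosine(cat_terms, terms), group)
              for group, terms in NEWSGROUP_TERMS.items()]
    return [group for score, group in sorted(scored, reverse=True) if score > 0]


if __name__ == "__main__":
    for category in rank_categories(["tumor"]):
        print(category, "->", newsgroups_for(category))
```

In this toy data the query term "tumor" does not occur in any newsgroup summary, yet it still reaches `example.med.cancer` through the category, which is the kind of indirect match the abstract says the classification step is meant to allow.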
Footnote
Cf.: http://dlib.ukoln.ac.uk/dlib/january98/dolin/01dolin.html.
Theme
Automatic classification
Object
Pharos
LCC

Similar documents (content)
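
The score breakdowns listed under each entry follow the pattern of Lucene's ClassicSimilarity (TF-IDF) explanation output. As a reading aid, the sketch below recomputes one term weight from the values printed in the listing. The formulas tf = sqrt(freq) and idf = 1 + ln(maxDocs / (docFreq + 1)) are the ClassicSimilarity definitions and agree with the printed numbers; the function names are illustrative, and `queryNorm`, `fieldNorm`, and `boost` are taken from the listing as given rather than derived.

```python
import math


def idf(doc_freq, max_docs):
    """Inverse document frequency as used by ClassicSimilarity."""
    return 1.0 + math.log(max_docs / (doc_freq + 1.0))


def tf(freq):
    """Term frequency factor: square root of the raw count."""
    return math.sqrt(freq)


def term_score(freq, doc_freq, max_docs, query_norm, field_norm, boost=1.0):
    """Score of one term: queryWeight * fieldWeight."""
    term_idf = idf(doc_freq, max_docs)
    query_weight = boost * term_idf * query_norm
    field_weight = tf(freq) * term_idf * field_norm
    return query_weight * field_weight


def document_score(term_scores, matched_terms, query_terms):
    """Sum of term scores scaled by the coordination factor coord(m/n)."""
    return sum(term_scores) * (matched_terms / query_terms)


if __name__ == "__main__":
    # Values for the term "heterogeneous" in doc 1317, copied from the listing.
    score = term_score(freq=1.0, doc_freq=189, max_docs=42740,
                       query_norm=0.021226786, field_norm=0.0625)
    print(round(score, 5))
    # Final score of the first entry: summed term weights times coord(15/25).
    print(round(document_score([0.7160008], 15, 25), 4))
```

Small differences in the last digits are expected, since Lucene computes these values in single precision while Python uses double precision.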

  1. Dolin, R.; Agrawal, D.; El Abbadi, A.; Pearlman, J.: Using automated classification for summarizing and selecting heterogeneous information sources (1998) 0.43
    0.4296005 = sum of:
      0.4296005 = product of:
        0.7160008 = sum of:
          0.05461034 = weight(abstract_txt:heterogeneous in 1317) [ClassicSimilarity], result of:
            0.05461034 = score(doc=1317,freq=1.0), product of:
              0.13618822 = queryWeight, product of:
                6.4158664 = idf(docFreq=189, maxDocs=42740)
                0.021226786 = queryNorm
              0.40099165 = fieldWeight in 1317, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.4158664 = idf(docFreq=189, maxDocs=42740)
                0.0625 = fieldNorm(doc=1317)
          0.06248239 = weight(abstract_txt:must in 1317) [ClassicSimilarity], result of:
            0.06248239 = score(doc=1317,freq=3.0), product of:
              0.11369304 = queryWeight, product of:
                1.0550342 = boost
                5.076719 = idf(docFreq=724, maxDocs=42740)
                0.021226786 = queryNorm
              0.5495709 = fieldWeight in 1317, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                5.076719 = idf(docFreq=724, maxDocs=42740)
                0.0625 = fieldNorm(doc=1317)
          0.03634117 = weight(abstract_txt:queries in 1317) [ClassicSimilarity], result of:
            0.03634117 = score(doc=1317,freq=1.0), product of:
              0.11425323 = queryWeight, product of:
                1.0576302 = boost
                5.0892105 = idf(docFreq=715, maxDocs=42740)
                0.021226786 = queryNorm
              0.31807566 = fieldWeight in 1317, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.0892105 = idf(docFreq=715, maxDocs=42740)
                0.0625 = fieldNorm(doc=1317)
          0.03656197 = weight(abstract_txt:within in 1317) [ClassicSimilarity], result of:
            0.03656197 = score(doc=1317,freq=2.0), product of:
              0.09808041 = queryWeight, product of:
                1.0955842 = boost
                4.217473 = idf(docFreq=1711, maxDocs=42740)
                0.021226786 = queryNorm
              0.37277547 = fieldWeight in 1317, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.217473 = idf(docFreq=1711, maxDocs=42740)
                0.0625 = fieldNorm(doc=1317)
          0.017804705 = weight(abstract_txt:information in 1317) [ClassicSimilarity], result of:
            0.017804705 = score(doc=1317,freq=4.0), product of:
              0.058613803 = queryWeight, product of:
                1.1362942 = boost
                2.430104 = idf(docFreq=10226, maxDocs=42740)
                0.021226786 = queryNorm
              0.303763 = fieldWeight in 1317, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                2.430104 = idf(docFreq=10226, maxDocs=42740)
                0.0625 = fieldNorm(doc=1317)
          0.06530678 = weight(abstract_txt:automatically in 1317) [ClassicSimilarity], result of:
            0.06530678 = score(doc=1317,freq=2.0), product of:
              0.13403907 = queryWeight, product of:
                1.1455534 = boost
                5.5122876 = idf(docFreq=468, maxDocs=42740)
                0.021226786 = queryNorm
              0.487222 = fieldWeight in 1317, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                5.5122876 = idf(docFreq=468, maxDocs=42740)
                0.0625 = fieldNorm(doc=1317)
          0.049373087 = weight(abstract_txt:collection in 1317) [ClassicSimilarity], result of:
            0.049373087 = score(doc=1317,freq=2.0), product of:
              0.11982738 = queryWeight, product of:
                1.2109679 = boost
                4.661645 = idf(docFreq=1097, maxDocs=42740)
                0.021226786 = queryNorm
              0.41203508 = fieldWeight in 1317, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.661645 = idf(docFreq=1097, maxDocs=42740)
                0.0625 = fieldNorm(doc=1317)
          0.036548547 = weight(abstract_txt:query in 1317) [ClassicSimilarity], result of:
            0.036548547 = score(doc=1317,freq=1.0), product of:
              0.12354333 = queryWeight, product of:
                1.2296011 = boost
                4.7333736 = idf(docFreq=1021, maxDocs=42740)
                0.021226786 = queryNorm
              0.29583585 = fieldWeight in 1317, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.7333736 = idf(docFreq=1021, maxDocs=42740)
                0.0625 = fieldNorm(doc=1317)
          0.038571496 = weight(abstract_txt:include in 1317) [ClassicSimilarity], result of:
            0.038571496 = score(doc=1317,freq=1.0), product of:
              0.128061 = queryWeight, product of:
                1.251881 = boost
                4.8191404 = idf(docFreq=937, maxDocs=42740)
                0.021226786 = queryNorm
              0.30119628 = fieldWeight in 1317, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.8191404 = idf(docFreq=937, maxDocs=42740)
                0.0625 = fieldNorm(doc=1317)
          0.0396645 = weight(abstract_txt:such in 1317) [ClassicSimilarity], result of:
            0.0396645 = score(doc=1317,freq=4.0), product of:
              0.09194537 = queryWeight, product of:
                1.2551152 = boost
                3.451136 = idf(docFreq=3683, maxDocs=42740)
                0.021226786 = queryNorm
              0.431392 = fieldWeight in 1317, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                3.451136 = idf(docFreq=3683, maxDocs=42740)
                0.0625 = fieldNorm(doc=1317)
          0.015555156 = weight(abstract_txt:with in 1317) [ClassicSimilarity], result of:
            0.015555156 = score(doc=1317,freq=2.0), product of:
              0.06990187 = queryWeight, product of:
                1.3080188 = boost
                2.5176222 = idf(docFreq=9369, maxDocs=42740)
                0.021226786 = queryNorm
              0.22252847 = fieldWeight in 1317, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                2.5176222 = idf(docFreq=9369, maxDocs=42740)
                0.0625 = fieldNorm(doc=1317)
          0.02214627 = weight(abstract_txt:which in 1317) [ClassicSimilarity], result of:
            0.02214627 = score(doc=1317,freq=2.0), product of:
              0.0854123 = queryWeight, product of:
                1.3716744 = boost
                2.9334934 = idf(docFreq=6181, maxDocs=42740)
                0.021226786 = queryNorm
              0.25928664 = fieldWeight in 1317, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                2.9334934 = idf(docFreq=6181, maxDocs=42740)
                0.0625 = fieldNorm(doc=1317)
          0.048592247 = weight(abstract_txt:relevant in 1317) [ClassicSimilarity], result of:
            0.048592247 = score(doc=1317,freq=1.0), product of:
              0.16710645 = queryWeight, product of:
                1.6920578 = boost
                4.652579 = idf(docFreq=1107, maxDocs=42740)
                0.021226786 = queryNorm
              0.29078618 = fieldWeight in 1317, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.652579 = idf(docFreq=1107, maxDocs=42740)
                0.0625 = fieldNorm(doc=1317)
          0.07640344 = weight(abstract_txt:classification in 1317) [ClassicSimilarity], result of:
            0.07640344 = score(doc=1317,freq=3.0), product of:
              0.17644827 = queryWeight, product of:
                2.0781567 = boost
                3.9999528 = idf(docFreq=2127, maxDocs=42740)
                0.021226786 = queryNorm
              0.4330076 = fieldWeight in 1317, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                3.9999528 = idf(docFreq=2127, maxDocs=42740)
                0.0625 = fieldNorm(doc=1317)
          0.11603867 = weight(abstract_txt:sources in 1317) [ClassicSimilarity], result of:
            0.11603867 = score(doc=1317,freq=3.0), product of:
              0.22509101 = queryWeight, product of:
                2.2267423 = boost
                4.76216 = idf(docFreq=992, maxDocs=42740)
                0.021226786 = queryNorm
              0.5155189 = fieldWeight in 1317, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                4.76216 = idf(docFreq=992, maxDocs=42740)
                0.0625 = fieldNorm(doc=1317)
        0.6 = coord(15/25)
    
  2. McKiernan, G.: Parallel universe : the organization of information elements and access in a World Wide Web (WWW) Virtual Library (1996) 0.20
    0.19687301 = sum of:
      0.19687301 = product of:
        0.54686946 = sum of:
          0.030244801 = weight(abstract_txt:have in 5253) [ClassicSimilarity], result of:
            0.030244801 = score(doc=5253,freq=3.0), product of:
              0.069143765 = queryWeight, product of:
                1.0076779 = boost
                3.2325633 = idf(docFreq=4583, maxDocs=42740)
                0.021226786 = queryNorm
              0.43741906 = fieldWeight in 5253, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                3.2325633 = idf(docFreq=4583, maxDocs=42740)
                0.078125 = fieldNorm(doc=5253)
          0.03231652 = weight(abstract_txt:within in 5253) [ClassicSimilarity], result of:
            0.03231652 = score(doc=5253,freq=1.0), product of:
              0.09808041 = queryWeight, product of:
                1.0955842 = boost
                4.217473 = idf(docFreq=1711, maxDocs=42740)
                0.021226786 = queryNorm
              0.32949007 = fieldWeight in 5253, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.217473 = idf(docFreq=1711, maxDocs=42740)
                0.078125 = fieldNorm(doc=5253)
          0.01112794 = weight(abstract_txt:information in 5253) [ClassicSimilarity], result of:
            0.01112794 = score(doc=5253,freq=1.0), product of:
              0.058613803 = queryWeight, product of:
                1.1362942 = boost
                2.430104 = idf(docFreq=10226, maxDocs=42740)
                0.021226786 = queryNorm
              0.18985188 = fieldWeight in 5253, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                2.430104 = idf(docFreq=10226, maxDocs=42740)
                0.078125 = fieldNorm(doc=5253)
          0.109908216 = weight(abstract_txt:demonstration in 5253) [ClassicSimilarity], result of:
            0.109908216 = score(doc=5253,freq=1.0), product of:
              0.1870839 = queryWeight, product of:
                1.1720562 = boost
                7.519756 = idf(docFreq=62, maxDocs=42740)
                0.021226786 = queryNorm
              0.5874809 = fieldWeight in 5253, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.519756 = idf(docFreq=62, maxDocs=42740)
                0.078125 = fieldNorm(doc=5253)
          0.061716355 = weight(abstract_txt:collection in 5253) [ClassicSimilarity], result of:
            0.061716355 = score(doc=5253,freq=2.0), product of:
              0.11982738 = queryWeight, product of:
                1.2109679 = boost
                4.661645 = idf(docFreq=1097, maxDocs=42740)
                0.021226786 = queryNorm
              0.51504385 = fieldWeight in 5253, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.661645 = idf(docFreq=1097, maxDocs=42740)
                0.078125 = fieldNorm(doc=5253)
          0.10030192 = weight(abstract_txt:prototype in 5253) [ClassicSimilarity], result of:
            0.10030192 = score(doc=5253,freq=2.0), product of:
              0.15376543 = queryWeight, product of:
                1.2269559 = boost
                5.903989 = idf(docFreq=316, maxDocs=42740)
                0.021226786 = queryNorm
              0.65230477 = fieldWeight in 5253, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                5.903989 = idf(docFreq=316, maxDocs=42740)
                0.078125 = fieldNorm(doc=5253)
          0.027682835 = weight(abstract_txt:which in 5253) [ClassicSimilarity], result of:
            0.027682835 = score(doc=5253,freq=2.0), product of:
              0.0854123 = queryWeight, product of:
                1.3716744 = boost
                2.9334934 = idf(docFreq=6181, maxDocs=42740)
                0.021226786 = queryNorm
              0.3241083 = fieldWeight in 5253, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                2.9334934 = idf(docFreq=6181, maxDocs=42740)
                0.078125 = fieldNorm(doc=5253)
          0.055139434 = weight(abstract_txt:classification in 5253) [ClassicSimilarity], result of:
            0.055139434 = score(doc=5253,freq=1.0), product of:
              0.17644827 = queryWeight, product of:
                2.0781567 = boost
                3.9999528 = idf(docFreq=2127, maxDocs=42740)
                0.021226786 = queryNorm
              0.3124963 = fieldWeight in 5253, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.9999528 = idf(docFreq=2127, maxDocs=42740)
                0.078125 = fieldNorm(doc=5253)
          0.11843147 = weight(abstract_txt:sources in 5253) [ClassicSimilarity], result of:
            0.11843147 = score(doc=5253,freq=2.0), product of:
              0.22509101 = queryWeight, product of:
                2.2267423 = boost
                4.76216 = idf(docFreq=992, maxDocs=42740)
                0.021226786 = queryNorm
              0.5261493 = fieldWeight in 5253, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.76216 = idf(docFreq=992, maxDocs=42740)
                0.078125 = fieldNorm(doc=5253)
        0.36 = coord(9/25)
    
  3. Layfield, C.; Azzopardi, J.; Staff, C.: Experiments with document retrieval from small text collections using Latent Semantic Analysis or term similarity with query coordination and automatic relevance feedback (2017) 0.19
    0.19182691 = sum of:
      0.19182691 = product of:
        0.43597025 = sum of:
          0.01222329 = weight(abstract_txt:have in 5479) [ClassicSimilarity], result of:
            0.01222329 = score(doc=5479,freq=1.0), product of:
              0.069143765 = queryWeight, product of:
                1.0076779 = boost
                3.2325633 = idf(docFreq=4583, maxDocs=42740)
                0.021226786 = queryNorm
              0.1767808 = fieldWeight in 5479, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.2325633 = idf(docFreq=4583, maxDocs=42740)
                0.0546875 = fieldNorm(doc=5479)
          0.06359705 = weight(abstract_txt:queries in 5479) [ClassicSimilarity], result of:
            0.06359705 = score(doc=5479,freq=4.0), product of:
              0.11425323 = queryWeight, product of:
                1.0576302 = boost
                5.0892105 = idf(docFreq=715, maxDocs=42740)
                0.021226786 = queryNorm
              0.5566324 = fieldWeight in 5479, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                5.0892105 = idf(docFreq=715, maxDocs=42740)
                0.0546875 = fieldNorm(doc=5479)
          0.040406514 = weight(abstract_txt:automatically in 5479) [ClassicSimilarity], result of:
            0.040406514 = score(doc=5479,freq=1.0), product of:
              0.13403907 = queryWeight, product of:
                1.1455534 = boost
                5.5122876 = idf(docFreq=468, maxDocs=42740)
                0.021226786 = queryNorm
              0.30145323 = fieldWeight in 5479, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.5122876 = idf(docFreq=468, maxDocs=42740)
                0.0546875 = fieldNorm(doc=5479)
          0.035128076 = weight(abstract_txt:work in 5479) [ClassicSimilarity], result of:
            0.035128076 = score(doc=5479,freq=3.0), product of:
              0.09690736 = queryWeight, product of:
                1.1929538 = boost
                3.826916 = idf(docFreq=2529, maxDocs=42740)
                0.021226786 = queryNorm
              0.3624913 = fieldWeight in 5479, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                3.826916 = idf(docFreq=2529, maxDocs=42740)
                0.0546875 = fieldNorm(doc=5479)
          0.074827105 = weight(abstract_txt:collection in 5479) [ClassicSimilarity], result of:
            0.074827105 = score(doc=5479,freq=6.0), product of:
              0.11982738 = queryWeight, product of:
                1.2109679 = boost
                4.661645 = idf(docFreq=1097, maxDocs=42740)
                0.021226786 = queryNorm
              0.6244575 = fieldWeight in 5479, product of:
                2.4494898 = tf(freq=6.0), with freq of:
                  6.0 = termFreq=6.0
                4.661645 = idf(docFreq=1097, maxDocs=42740)
                0.0546875 = fieldNorm(doc=5479)
          0.045226518 = weight(abstract_txt:query in 5479) [ClassicSimilarity], result of:
            0.045226518 = score(doc=5479,freq=2.0), product of:
              0.12354333 = queryWeight, product of:
                1.2296011 = boost
                4.7333736 = idf(docFreq=1021, maxDocs=42740)
                0.021226786 = queryNorm
              0.3660782 = fieldWeight in 5479, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.7333736 = idf(docFreq=1021, maxDocs=42740)
                0.0546875 = fieldNorm(doc=5479)
          0.033750057 = weight(abstract_txt:include in 5479) [ClassicSimilarity], result of:
            0.033750057 = score(doc=5479,freq=1.0), product of:
              0.128061 = queryWeight, product of:
                1.251881 = boost
                4.8191404 = idf(docFreq=937, maxDocs=42740)
                0.021226786 = queryNorm
              0.26354674 = fieldWeight in 5479, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.8191404 = idf(docFreq=937, maxDocs=42740)
                0.0546875 = fieldNorm(doc=5479)
          0.024217086 = weight(abstract_txt:terms in 5479) [ClassicSimilarity], result of:
            0.024217086 = score(doc=5479,freq=1.0), product of:
              0.10907089 = queryWeight, product of:
                1.2656093 = boost
                4.05999 = idf(docFreq=2003, maxDocs=42740)
                0.021226786 = queryNorm
              0.2220307 = fieldWeight in 5479, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.05999 = idf(docFreq=2003, maxDocs=42740)
                0.0546875 = fieldNorm(doc=5479)
          0.019248523 = weight(abstract_txt:with in 5479) [ClassicSimilarity], result of:
            0.019248523 = score(doc=5479,freq=4.0), product of:
              0.06990187 = queryWeight, product of:
                1.3080188 = boost
                2.5176222 = idf(docFreq=9369, maxDocs=42740)
                0.021226786 = queryNorm
              0.27536494 = fieldWeight in 5479, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                2.5176222 = idf(docFreq=9369, maxDocs=42740)
                0.0546875 = fieldNorm(doc=5479)
          0.013702305 = weight(abstract_txt:which in 5479) [ClassicSimilarity], result of:
            0.013702305 = score(doc=5479,freq=1.0), product of:
              0.0854123 = queryWeight, product of:
                1.3716744 = boost
                2.9334934 = idf(docFreq=6181, maxDocs=42740)
                0.021226786 = queryNorm
              0.16042542 = fieldWeight in 5479, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                2.9334934 = idf(docFreq=6181, maxDocs=42740)
                0.0546875 = fieldNorm(doc=5479)
          0.073643714 = weight(abstract_txt:relevant in 5479) [ClassicSimilarity], result of:
            0.073643714 = score(doc=5479,freq=3.0), product of:
              0.16710645 = queryWeight, product of:
                1.6920578 = boost
                4.652579 = idf(docFreq=1107, maxDocs=42740)
                0.021226786 = queryNorm
              0.4406994 = fieldWeight in 5479, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                4.652579 = idf(docFreq=1107, maxDocs=42740)
                0.0546875 = fieldNorm(doc=5479)
        0.44 = coord(11/25)
    
  4. Yoon, Y.; Lee, G.G.: Efficient implementation of associative classifiers for document classification (2007) 0.19
    0.1873347 = sum of:
      0.1873347 = product of:
        0.58542097 = sum of:
          0.01975582 = weight(abstract_txt:have in 2910) [ClassicSimilarity], result of:
            0.01975582 = score(doc=2910,freq=2.0), product of:
              0.069143765 = queryWeight, product of:
                1.0076779 = boost
                3.2325633 = idf(docFreq=4583, maxDocs=42740)
                0.021226786 = queryNorm
              0.2857209 = fieldWeight in 2910, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                3.2325633 = idf(docFreq=4583, maxDocs=42740)
                0.0625 = fieldNorm(doc=2910)
          0.008902352 = weight(abstract_txt:information in 2910) [ClassicSimilarity], result of:
            0.008902352 = score(doc=2910,freq=1.0), product of:
              0.058613803 = queryWeight, product of:
                1.1362942 = boost
                2.430104 = idf(docFreq=10226, maxDocs=42740)
                0.021226786 = queryNorm
              0.1518815 = fieldWeight in 2910, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                2.430104 = idf(docFreq=10226, maxDocs=42740)
                0.0625 = fieldNorm(doc=2910)
          0.034912046 = weight(abstract_txt:collection in 2910) [ClassicSimilarity], result of:
            0.034912046 = score(doc=2910,freq=1.0), product of:
              0.11982738 = queryWeight, product of:
                1.2109679 = boost
                4.661645 = idf(docFreq=1097, maxDocs=42740)
                0.021226786 = queryNorm
              0.2913528 = fieldWeight in 2910, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.661645 = idf(docFreq=1097, maxDocs=42740)
                0.0625 = fieldNorm(doc=2910)
          0.01983225 = weight(abstract_txt:such in 2910) [ClassicSimilarity], result of:
            0.01983225 = score(doc=2910,freq=1.0), product of:
              0.09194537 = queryWeight, product of:
                1.2551152 = boost
                3.451136 = idf(docFreq=3683, maxDocs=42740)
                0.021226786 = queryNorm
              0.215696 = fieldWeight in 2910, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.451136 = idf(docFreq=3683, maxDocs=42740)
                0.0625 = fieldNorm(doc=2910)
          0.010999156 = weight(abstract_txt:with in 2910) [ClassicSimilarity], result of:
            0.010999156 = score(doc=2910,freq=1.0), product of:
              0.06990187 = queryWeight, product of:
                1.3080188 = boost
                2.5176222 = idf(docFreq=9369, maxDocs=42740)
                0.021226786 = queryNorm
              0.15735139 = fieldWeight in 2910, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                2.5176222 = idf(docFreq=9369, maxDocs=42740)
                0.0625 = fieldNorm(doc=2910)
          0.015659776 = weight(abstract_txt:which in 2910) [ClassicSimilarity], result of:
            0.015659776 = score(doc=2910,freq=1.0), product of:
              0.0854123 = queryWeight, product of:
                1.3716744 = boost
                2.9334934 = idf(docFreq=6181, maxDocs=42740)
                0.021226786 = queryNorm
              0.18334334 = fieldWeight in 2910, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                2.9334934 = idf(docFreq=6181, maxDocs=42740)
                0.0625 = fieldNorm(doc=2910)
          0.11670818 = weight(abstract_txt:classification in 2910) [ClassicSimilarity], result of:
            0.11670818 = score(doc=2910,freq=7.0), product of:
              0.17644827 = queryWeight, product of:
                2.0781567 = boost
                3.9999528 = idf(docFreq=2127, maxDocs=42740)
                0.021226786 = queryNorm
              0.66143 = fieldWeight in 2910, product of:
                2.6457512 = tf(freq=7.0), with freq of:
                  7.0 = termFreq=7.0
                3.9999528 = idf(docFreq=2127, maxDocs=42740)
                0.0625 = fieldNorm(doc=2910)
          0.35865137 = weight(abstract_txt:newsgroups in 2910) [ClassicSimilarity], result of:
            0.35865137 = score(doc=2910,freq=1.0), product of:
              0.6888295 = queryWeight, product of:
                3.8953521 = boost
                8.330686 = idf(docFreq=27, maxDocs=42740)
                0.021226786 = queryNorm
              0.52066785 = fieldWeight in 2910, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                8.330686 = idf(docFreq=27, maxDocs=42740)
                0.0625 = fieldNorm(doc=2910)
        0.32 = coord(8/25)
    
  5. Tonta, Y.: Scholarly communication and the use of networked information sources (1996) 0.18
    0.18292682 = sum of:
      0.18292682 = product of:
        0.7621951 = sum of:
          0.017461844 = weight(abstract_txt:have in 6458) [ClassicSimilarity], result of:
            0.017461844 = score(doc=6458,freq=1.0), product of:
              0.069143765 = queryWeight, product of:
                1.0076779 = boost
                3.2325633 = idf(docFreq=4583, maxDocs=42740)
                0.021226786 = queryNorm
              0.25254402 = fieldWeight in 6458, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.2325633 = idf(docFreq=4583, maxDocs=42740)
                0.078125 = fieldNorm(doc=6458)
          0.027257778 = weight(abstract_txt:information in 6458) [ClassicSimilarity], result of:
            0.027257778 = score(doc=6458,freq=6.0), product of:
              0.058613803 = queryWeight, product of:
                1.1362942 = boost
                2.430104 = idf(docFreq=10226, maxDocs=42740)
                0.021226786 = queryNorm
              0.46504024 = fieldWeight in 6458, product of:
                2.4494898 = tf(freq=6.0), with freq of:
                  6.0 = termFreq=6.0
                2.430104 = idf(docFreq=10226, maxDocs=42740)
                0.078125 = fieldNorm(doc=6458)
          0.028973153 = weight(abstract_txt:work in 6458) [ClassicSimilarity], result of:
            0.028973153 = score(doc=6458,freq=1.0), product of:
              0.09690736 = queryWeight, product of:
                1.1929538 = boost
                3.826916 = idf(docFreq=2529, maxDocs=42740)
                0.021226786 = queryNorm
              0.29897782 = fieldWeight in 6458, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.826916 = idf(docFreq=2529, maxDocs=42740)
                0.078125 = fieldNorm(doc=6458)
          0.035058796 = weight(abstract_txt:such in 6458) [ClassicSimilarity], result of:
            0.035058796 = score(doc=6458,freq=2.0), product of:
              0.09194537 = queryWeight, product of:
                1.2551152 = boost
                3.451136 = idf(docFreq=3683, maxDocs=42740)
                0.021226786 = queryNorm
              0.38130027 = fieldWeight in 6458, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                3.451136 = idf(docFreq=3683, maxDocs=42740)
                0.078125 = fieldNorm(doc=6458)
          0.20512934 = weight(abstract_txt:sources in 6458) [ClassicSimilarity], result of:
            0.20512934 = score(doc=6458,freq=6.0), product of:
              0.22509101 = queryWeight, product of:
                2.2267423 = boost
                4.76216 = idf(docFreq=992, maxDocs=42740)
                0.021226786 = queryNorm
              0.91131735 = fieldWeight in 6458, product of:
                2.4494898 = tf(freq=6.0), with freq of:
                  6.0 = termFreq=6.0
                4.76216 = idf(docFreq=992, maxDocs=42740)
                0.078125 = fieldNorm(doc=6458)
          0.4483142 = weight(abstract_txt:newsgroups in 6458) [ClassicSimilarity], result of:
            0.4483142 = score(doc=6458,freq=1.0), product of:
              0.6888295 = queryWeight, product of:
                3.8953521 = boost
                8.330686 = idf(docFreq=27, maxDocs=42740)
                0.021226786 = queryNorm
              0.6508348 = fieldWeight in 6458, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                8.330686 = idf(docFreq=27, maxDocs=42740)
                0.078125 = fieldNorm(doc=6458)
        0.24 = coord(6/25)
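
The arithmetic behind the explanation for result 5 (doc 6458) can be re-derived with a short sketch. It assumes the classic TF-IDF formulas that the printed numbers are consistent with: tf = sqrt(freq), idf = 1 + ln(maxDocs / (docFreq + 1)), queryWeight = boost * idf * queryNorm, fieldWeight = tf * idf * fieldNorm, and a final multiplication by coord. The function names are illustrative and not part of any library; the boosts, document frequencies, and norms are copied from the listing above.

```python
import math

# Constants copied from the explanation for doc 6458.
MAX_DOCS = 42740
QUERY_NORM = 0.021226786
FIELD_NORM = 0.078125  # fieldNorm(doc=6458)

# (term, boost, docFreq, termFreq) for each matching term in abstract_txt.
TERMS = [
    ("have", 1.0076779, 4583, 1.0),
    ("information", 1.1362942, 10226, 6.0),
    ("work", 1.1929538, 2529, 1.0),
    ("such", 1.2551152, 3683, 2.0),
    ("sources", 2.2267423, 992, 6.0),
    ("newsgroups", 3.8953521, 27, 1.0),
]


def idf(doc_freq, max_docs=MAX_DOCS):
    # Assumed formula, inferred from the printed idf values.
    return 1.0 + math.log(max_docs / (doc_freq + 1))


def term_score(boost, doc_freq, term_freq, field_norm=FIELD_NORM):
    # score = queryWeight * fieldWeight, as in each "product of" block.
    term_idf = idf(doc_freq)
    query_weight = boost * term_idf * QUERY_NORM
    field_weight = math.sqrt(term_freq) * term_idf * field_norm
    return query_weight * field_weight


def document_score(terms, matched, total):
    # Sum of per-term scores, scaled by coord(matched/total).
    coord = matched / total
    return coord * sum(term_score(b, df, tf) for _, b, df, tf in terms)


score = document_score(TERMS, matched=6, total=25)
print(f"{score:.4f}")
```

The sketch uses double precision, while the listing shows single-precision values, so agreement with the printed 0.18292682 is expected only to about six significant digits.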