Search (183 results, page 1 of 10)

Hotho, A.; Bloehdorn, S.: Data Mining 2004 : Text classification by boosting weak learners based on terms and concepts (2004) 0.42

0.4164444 = product of:
  0.8328888 = sum of:
    0.04987225 = product of:
      0.14961675 = sum of:
        0.14961675 = weight(_text_:3a in 562) [ClassicSimilarity], result of:
          0.14961675 = score(doc=562,freq=2.0), product of:
            0.26621342 = queryWeight, product of:
              8.478011 = idf(docFreq=24, maxDocs=44218)
              0.031400457 = queryNorm
            0.56201804 = fieldWeight in 562, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              8.478011 = idf(docFreq=24, maxDocs=44218)
              0.046875 = fieldNorm(doc=562)
      0.33333334 = coord(1/3)
    0.022169823 = weight(_text_:web in 562) [ClassicSimilarity], result of:
      0.022169823 = score(doc=562,freq=2.0), product of:
        0.10247572 = queryWeight, product of:
          3.2635105 = idf(docFreq=4597, maxDocs=44218)
          0.031400457 = queryNorm
        0.21634221 = fieldWeight in 562, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          3.2635105 = idf(docFreq=4597, maxDocs=44218)
          0.046875 = fieldNorm(doc=562)
    0.14961675 = weight(_text_:2f in 562) [ClassicSimilarity], result of:
      0.14961675 = score(doc=562,freq=2.0), product of:
        0.26621342 = queryWeight, product of:
          8.478011 = idf(docFreq=24, maxDocs=44218)
          0.031400457 = queryNorm
        0.56201804 = fieldWeight in 562, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          8.478011 = idf(docFreq=24, maxDocs=44218)
          0.046875 = fieldNorm(doc=562)
    0.14961675 = weight(_text_:2f in 562) [ClassicSimilarity], result of:
      0.14961675 = score(doc=562,freq=2.0), product of:
        0.26621342 = queryWeight, product of:
          8.478011 = idf(docFreq=24, maxDocs=44218)
          0.031400457 = queryNorm
        0.56201804 = fieldWeight in 562, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          8.478011 = idf(docFreq=24, maxDocs=44218)
          0.046875 = fieldNorm(doc=562)
    0.14961675 = weight(_text_:2f in 562) [ClassicSimilarity], result of:
      0.14961675 = score(doc=562,freq=2.0), product of:
        0.26621342 = queryWeight, product of:
          8.478011 = idf(docFreq=24, maxDocs=44218)
          0.031400457 = queryNorm
        0.56201804 = fieldWeight in 562, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          8.478011 = idf(docFreq=24, maxDocs=44218)
          0.046875 = fieldNorm(doc=562)
    0.14961675 = weight(_text_:2f in 562) [ClassicSimilarity], result of:
      0.14961675 = score(doc=562,freq=2.0), product of:
        0.26621342 = queryWeight, product of:
          8.478011 = idf(docFreq=24, maxDocs=44218)
          0.031400457 = queryNorm
        0.56201804 = fieldWeight in 562, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          8.478011 = idf(docFreq=24, maxDocs=44218)
          0.046875 = fieldNorm(doc=562)
    0.14961675 = weight(_text_:2f in 562) [ClassicSimilarity], result of:
      0.14961675 = score(doc=562,freq=2.0), product of:
        0.26621342 = queryWeight, product of:
          8.478011 = idf(docFreq=24, maxDocs=44218)
          0.031400457 = queryNorm
        0.56201804 = fieldWeight in 562, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          8.478011 = idf(docFreq=24, maxDocs=44218)
          0.046875 = fieldNorm(doc=562)
    0.012762985 = product of:
      0.02552597 = sum of:
        0.02552597 = weight(_text_:22 in 562) [ClassicSimilarity], result of:
          0.02552597 = score(doc=562,freq=2.0), product of:
            0.10995905 = queryWeight, product of:
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.031400457 = queryNorm
            0.23214069 = fieldWeight in 562, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.046875 = fieldNorm(doc=562)
      0.5 = coord(1/2)
  0.5 = coord(8/16)

Content: Cf.: http://www.google.de/url?sa=t&rct=j&q=&esrc=s&source=web&cd=1&cad=rja&ved=0CEAQFjAA&url=http%3A%2F%2Fciteseerx.ist.psu.edu%2Fviewdoc%2Fdownload%3Fdoi%3D10.1.1.91.4940%26rep%3Drep1%26type%3Dpdf&ei=dOXrUMeIDYHDtQahsIGACg&usg=AFQjCNHFWVh6gNPvnOrOS9R3rkrXCNVD-A&sig2=5I2F5evRfMnsttSgFF9g7Q&bvm=bv.1357316858,d.Yms.
Date: 8. 1.2013 10:22:32
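
The explain tree above follows the classic TF-IDF pattern: each term score is queryWeight times fieldWeight. As a minimal sketch, the figures for `_text_:3a` in document 562 can be reproduced as shown below. The idf formula is an assumption inferred from the printed values (it matches them to the displayed precision), and the function names are illustrative, not Lucene API.

```python
import math

def tf(freq):
    # Term-frequency factor: square root of the raw frequency (1.4142135 for freq=2.0).
    return math.sqrt(freq)

def idf(doc_freq, max_docs):
    # Assumed formula; it reproduces 8.478011 for docFreq=24, maxDocs=44218.
    return 1.0 + math.log(max_docs / (doc_freq + 1))

def term_score(freq, doc_freq, max_docs, query_norm, field_norm):
    i = idf(doc_freq, max_docs)
    query_weight = i * query_norm            # idf * queryNorm
    field_weight = tf(freq) * i * field_norm  # tf * idf * fieldNorm
    return query_weight * field_weight

# Values copied from the explain output for term "3a" in doc 562.
score_3a = term_score(freq=2.0, doc_freq=24, max_docs=44218,
                      query_norm=0.031400457, field_norm=0.046875)
print(round(score_3a, 4))  # 0.1496
```

Because idf enters both queryWeight and fieldWeight, a rare term such as `3a` (docFreq=24) contributes far more than a common one such as `web` (docFreq=4597) at the same frequency.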

Yao, H.; Etzkorn, L.H.; Virani, S.: Automated classification and retrieval of reusable software components (2008) 0.06

0.057559423 = product of:
  0.18419015 = sum of:
    0.03405392 = weight(_text_:wide in 1382) [ClassicSimilarity], result of:
      0.03405392 = score(doc=1382,freq=2.0), product of:
        0.13912784 = queryWeight, product of:
          4.4307585 = idf(docFreq=1430, maxDocs=44218)
          0.031400457 = queryNorm
        0.24476713 = fieldWeight in 1382, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          4.4307585 = idf(docFreq=1430, maxDocs=44218)
          0.0390625 = fieldNorm(doc=1382)
    0.018474855 = weight(_text_:web in 1382) [ClassicSimilarity], result of:
      0.018474855 = score(doc=1382,freq=2.0), product of:
        0.10247572 = queryWeight, product of:
          3.2635105 = idf(docFreq=4597, maxDocs=44218)
          0.031400457 = queryNorm
        0.18028519 = fieldWeight in 1382, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          3.2635105 = idf(docFreq=4597, maxDocs=44218)
          0.0390625 = fieldNorm(doc=1382)
    0.005345665 = weight(_text_:information in 1382) [ClassicSimilarity], result of:
      0.005345665 = score(doc=1382,freq=2.0), product of:
        0.055122808 = queryWeight, product of:
          1.7554779 = idf(docFreq=20772, maxDocs=44218)
          0.031400457 = queryNorm
        0.09697737 = fieldWeight in 1382, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          1.7554779 = idf(docFreq=20772, maxDocs=44218)
          0.0390625 = fieldNorm(doc=1382)
    0.031744417 = weight(_text_:retrieval in 1382) [ClassicSimilarity], result of:
      0.031744417 = score(doc=1382,freq=8.0), product of:
        0.09498371 = queryWeight, product of:
          3.024915 = idf(docFreq=5836, maxDocs=44218)
          0.031400457 = queryNorm
        0.33420905 = fieldWeight in 1382, product of:
          2.828427 = tf(freq=8.0), with freq of:
            8.0 = termFreq=8.0
          3.024915 = idf(docFreq=5836, maxDocs=44218)
          0.0390625 = fieldNorm(doc=1382)
    0.094571285 = weight(_text_:software in 1382) [ClassicSimilarity], result of:
      0.094571285 = score(doc=1382,freq=24.0), product of:
        0.124570385 = queryWeight, product of:
          3.9671519 = idf(docFreq=2274, maxDocs=44218)
          0.031400457 = queryNorm
        0.75917953 = fieldWeight in 1382, product of:
          4.8989797 = tf(freq=24.0), with freq of:
            24.0 = termFreq=24.0
          3.9671519 = idf(docFreq=2274, maxDocs=44218)
          0.0390625 = fieldNorm(doc=1382)
  0.3125 = coord(5/16)

Abstract: The authors describe research that improves software reuse by using an automated approach to semantically search for and retrieve reusable software components in large software component repositories and on the World Wide Web (WWW). Using automation and smart (semantic) techniques, their approach speeds up the search and retrieval of reusable software components while retaining good accuracy, and therefore improves the affordability of software reuse. Program understanding of software components and natural language understanding of user queries were employed. The resulting semantic representations of the user queries were then matched against the semantic representations of the software components to find the components that best match the queries. A proof-of-concept system was developed to test the authors' approach. Its results were compared with those of human experts, and statistical analysis was performed on the collected experimental data. The experiments demonstrate that this automated, semantics-based approach to reusable software component classification and retrieval is successful when compared with the labor-intensive results from the experts, showing that the approach can significantly benefit software reuse classification and retrieval.
Source: Journal of the American Society for Information Science and Technology. 59(2008) no.4, S.613-627
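
The top level of each explain tree sums the matching clause scores and scales the sum by a coordination factor, coord(matched/total). As a rough sketch, the document score for doc 1382 can be recomputed from the five clause scores printed above; the helper name `combine` is hypothetical.

```python
def combine(clause_scores, matched, total):
    # Sum of the matching clause scores, scaled by coord(matched/total).
    return sum(clause_scores) * (matched / total)

# Clause scores for doc 1382: wide, web, information, retrieval, software.
clauses_1382 = [0.03405392, 0.018474855, 0.005345665, 0.031744417, 0.094571285]
total_1382 = combine(clauses_1382, matched=5, total=16)
print(round(total_1382, 4))  # 0.0576
```

The coord factor penalises documents that match only a few of the 16 query clauses, which is why doc 1382 ends at about 0.058 despite a clause sum of about 0.184.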

Möller, G.: Automatic classification of the World Wide Web using Universal Decimal Classification (1999) 0.06

0.05718537 = product of:
  0.18299319 = sum of:
    0.06810784 = weight(_text_:wide in 494) [ClassicSimilarity], result of:
      0.06810784 = score(doc=494,freq=2.0), product of:
        0.13912784 = queryWeight, product of:
          4.4307585 = idf(docFreq=1430, maxDocs=44218)
          0.031400457 = queryNorm
        0.48953426 = fieldWeight in 494, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          4.4307585 = idf(docFreq=1430, maxDocs=44218)
          0.078125 = fieldNorm(doc=494)
    0.03694971 = weight(_text_:web in 494) [ClassicSimilarity], result of:
      0.03694971 = score(doc=494,freq=2.0), product of:
        0.10247572 = queryWeight, product of:
          3.2635105 = idf(docFreq=4597, maxDocs=44218)
          0.031400457 = queryNorm
        0.36057037 = fieldWeight in 494, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          3.2635105 = idf(docFreq=4597, maxDocs=44218)
          0.078125 = fieldNorm(doc=494)
    0.027673291 = product of:
      0.055346582 = sum of:
        0.055346582 = weight(_text_:online in 494) [ClassicSimilarity], result of:
          0.055346582 = score(doc=494,freq=6.0), product of:
            0.09529729 = queryWeight, product of:
              3.0349014 = idf(docFreq=5778, maxDocs=44218)
              0.031400457 = queryNorm
            0.5807781 = fieldWeight in 494, product of:
              2.4494898 = tf(freq=6.0), with freq of:
                6.0 = termFreq=6.0
              3.0349014 = idf(docFreq=5778, maxDocs=44218)
              0.078125 = fieldNorm(doc=494)
      0.5 = coord(1/2)
    0.018517928 = weight(_text_:information in 494) [ClassicSimilarity], result of:
      0.018517928 = score(doc=494,freq=6.0), product of:
        0.055122808 = queryWeight, product of:
          1.7554779 = idf(docFreq=20772, maxDocs=44218)
          0.031400457 = queryNorm
        0.3359395 = fieldWeight in 494, product of:
          2.4494898 = tf(freq=6.0), with freq of:
            6.0 = termFreq=6.0
          1.7554779 = idf(docFreq=20772, maxDocs=44218)
          0.078125 = fieldNorm(doc=494)
    0.031744417 = weight(_text_:retrieval in 494) [ClassicSimilarity], result of:
      0.031744417 = score(doc=494,freq=2.0), product of:
        0.09498371 = queryWeight, product of:
          3.024915 = idf(docFreq=5836, maxDocs=44218)
          0.031400457 = queryNorm
        0.33420905 = fieldWeight in 494, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          3.024915 = idf(docFreq=5836, maxDocs=44218)
          0.078125 = fieldNorm(doc=494)
  0.3125 = coord(5/16)

Imprint: Hinksey Hill : Learned Information
Source: Online information 99: 23rd International Online Information Meeting, Proceedings, London, 7-9 December 1999. Ed.: D. Raitt et al.
Theme: Classification systems in online retrieval

Jenkins, C.: Automatic classification of Web resources using Java and Dewey Decimal Classification (1998) 0.06

0.056757666 = product of:
  0.15135378 = sum of:
    0.04767549 = weight(_text_:wide in 1673) [ClassicSimilarity], result of:
      0.04767549 = score(doc=1673,freq=2.0), product of:
        0.13912784 = queryWeight, product of:
          4.4307585 = idf(docFreq=1430, maxDocs=44218)
          0.031400457 = queryNorm
        0.342674 = fieldWeight in 1673, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          4.4307585 = idf(docFreq=1430, maxDocs=44218)
          0.0546875 = fieldNorm(doc=1673)
    0.04479914 = weight(_text_:web in 1673) [ClassicSimilarity], result of:
      0.04479914 = score(doc=1673,freq=6.0), product of:
        0.10247572 = queryWeight, product of:
          3.2635105 = idf(docFreq=4597, maxDocs=44218)
          0.031400457 = queryNorm
        0.43716836 = fieldWeight in 1673, product of:
          2.4494898 = tf(freq=6.0), with freq of:
            6.0 = termFreq=6.0
          3.2635105 = idf(docFreq=4597, maxDocs=44218)
          0.0546875 = fieldNorm(doc=1673)
    0.0111840265 = product of:
      0.022368053 = sum of:
        0.022368053 = weight(_text_:online in 1673) [ClassicSimilarity], result of:
          0.022368053 = score(doc=1673,freq=2.0), product of:
            0.09529729 = queryWeight, product of:
              3.0349014 = idf(docFreq=5778, maxDocs=44218)
              0.031400457 = queryNorm
            0.23471867 = fieldWeight in 1673, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.0349014 = idf(docFreq=5778, maxDocs=44218)
              0.0546875 = fieldNorm(doc=1673)
      0.5 = coord(1/2)
    0.010583877 = weight(_text_:information in 1673) [ClassicSimilarity], result of:
      0.010583877 = score(doc=1673,freq=4.0), product of:
        0.055122808 = queryWeight, product of:
          1.7554779 = idf(docFreq=20772, maxDocs=44218)
          0.031400457 = queryNorm
        0.1920054 = fieldWeight in 1673, product of:
          2.0 = tf(freq=4.0), with freq of:
            4.0 = termFreq=4.0
          1.7554779 = idf(docFreq=20772, maxDocs=44218)
          0.0546875 = fieldNorm(doc=1673)
    0.022221092 = weight(_text_:retrieval in 1673) [ClassicSimilarity], result of:
      0.022221092 = score(doc=1673,freq=2.0), product of:
        0.09498371 = queryWeight, product of:
          3.024915 = idf(docFreq=5836, maxDocs=44218)
          0.031400457 = queryNorm
        0.23394634 = fieldWeight in 1673, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          3.024915 = idf(docFreq=5836, maxDocs=44218)
          0.0546875 = fieldNorm(doc=1673)
    0.014890149 = product of:
      0.029780298 = sum of:
        0.029780298 = weight(_text_:22 in 1673) [ClassicSimilarity], result of:
          0.029780298 = score(doc=1673,freq=2.0), product of:
            0.10995905 = queryWeight, product of:
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.031400457 = queryNorm
            0.2708308 = fieldWeight in 1673, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.0546875 = fieldNorm(doc=1673)
      0.5 = coord(1/2)
  0.375 = coord(6/16)

Abstract: The Wolverhampton Web Library (WWLib) is a WWW search engine that provides access to UK-based information. The experimental version, developed in 1995, was a success but highlighted the need for a much higher degree of automation. An interesting feature of the experimental WWLib was that it organised information according to DDC. Discusses the advantages of classification and describes the automatic classifier that is being developed in Java as part of the new, fully automated WWLib.
Date: 1. 8.1996 22:08:06
Footnote: Contribution to a special issue devoted to the Proceedings of the 7th International World Wide Web Conference, held 14-18 April 1998, Brisbane, Australia; see also: http://www7.scu.edu.au/programme/posters/1846/com1846.htm.
Theme: Classification systems in online retrieval

Wätjen, H.-J.: Automatisches Sammeln, Klassifizieren und Indexieren von wissenschaftlich relevanten Informationsressourcen im deutschen World Wide Web : das DFG-Projekt GERHARD (1998) 0.04

0.03984928 = product of:
  0.15939713 = sum of:
    0.06810784 = weight(_text_:wide in 3066) [ClassicSimilarity], result of:
      0.06810784 = score(doc=3066,freq=2.0), product of:
        0.13912784 = queryWeight, product of:
          4.4307585 = idf(docFreq=1430, maxDocs=44218)
          0.031400457 = queryNorm
        0.48953426 = fieldWeight in 3066, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          4.4307585 = idf(docFreq=1430, maxDocs=44218)
          0.078125 = fieldNorm(doc=3066)
    0.03694971 = weight(_text_:web in 3066) [ClassicSimilarity], result of:
      0.03694971 = score(doc=3066,freq=2.0), product of:
        0.10247572 = queryWeight, product of:
          3.2635105 = idf(docFreq=4597, maxDocs=44218)
          0.031400457 = queryNorm
        0.36057037 = fieldWeight in 3066, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          3.2635105 = idf(docFreq=4597, maxDocs=44218)
          0.078125 = fieldNorm(doc=3066)
    0.022595147 = product of:
      0.045190293 = sum of:
        0.045190293 = weight(_text_:online in 3066) [ClassicSimilarity], result of:
          0.045190293 = score(doc=3066,freq=4.0), product of:
            0.09529729 = queryWeight, product of:
              3.0349014 = idf(docFreq=5778, maxDocs=44218)
              0.031400457 = queryNorm
            0.47420335 = fieldWeight in 3066, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              3.0349014 = idf(docFreq=5778, maxDocs=44218)
              0.078125 = fieldNorm(doc=3066)
      0.5 = coord(1/2)
    0.031744417 = weight(_text_:retrieval in 3066) [ClassicSimilarity], result of:
      0.031744417 = score(doc=3066,freq=2.0), product of:
        0.09498371 = queryWeight, product of:
          3.024915 = idf(docFreq=5836, maxDocs=44218)
          0.031400457 = queryNorm
        0.33420905 = fieldWeight in 3066, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          3.024915 = idf(docFreq=5836, maxDocs=44218)
          0.078125 = fieldNorm(doc=3066)
  0.25 = coord(4/16)

Footnote: Paper presented at the 20th Online-Tagung of the Deutsche Gesellschaft für Dokumentation, 5-7 May 1998. Session 3: WWW search engines
Theme: Classification systems in online retrieval

Koch, T.: Experiments with automatic classification of WAIS databases and indexing of WWW : some results from the Nordic WAIS/WWW project (1994) 0.03

0.031523272 = product of:
  0.12609309 = sum of:
    0.06742332 = weight(_text_:wide in 7209) [ClassicSimilarity], result of:
      0.06742332 = score(doc=7209,freq=4.0), product of:
        0.13912784 = queryWeight, product of:
          4.4307585 = idf(docFreq=1430, maxDocs=44218)
          0.031400457 = queryNorm
        0.4846142 = fieldWeight in 7209, product of:
          2.0 = tf(freq=4.0), with freq of:
            4.0 = termFreq=4.0
          4.4307585 = idf(docFreq=1430, maxDocs=44218)
          0.0546875 = fieldNorm(doc=7209)
    0.025864797 = weight(_text_:web in 7209) [ClassicSimilarity], result of:
      0.025864797 = score(doc=7209,freq=2.0), product of:
        0.10247572 = queryWeight, product of:
          3.2635105 = idf(docFreq=4597, maxDocs=44218)
          0.031400457 = queryNorm
        0.25239927 = fieldWeight in 7209, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          3.2635105 = idf(docFreq=4597, maxDocs=44218)
          0.0546875 = fieldNorm(doc=7209)
    0.010583877 = weight(_text_:information in 7209) [ClassicSimilarity], result of:
      0.010583877 = score(doc=7209,freq=4.0), product of:
        0.055122808 = queryWeight, product of:
          1.7554779 = idf(docFreq=20772, maxDocs=44218)
          0.031400457 = queryNorm
        0.1920054 = fieldWeight in 7209, product of:
          2.0 = tf(freq=4.0), with freq of:
            4.0 = termFreq=4.0
          1.7554779 = idf(docFreq=20772, maxDocs=44218)
          0.0546875 = fieldNorm(doc=7209)
    0.022221092 = weight(_text_:retrieval in 7209) [ClassicSimilarity], result of:
      0.022221092 = score(doc=7209,freq=2.0), product of:
        0.09498371 = queryWeight, product of:
          3.024915 = idf(docFreq=5836, maxDocs=44218)
          0.031400457 = queryNorm
        0.23394634 = fieldWeight in 7209, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          3.024915 = idf(docFreq=5836, maxDocs=44218)
          0.0546875 = fieldNorm(doc=7209)
  0.25 = coord(4/16)

Abstract: The Nordic WAIS/WWW project, sponsored by NORDINFO, is a joint project between Lund University Library and the National Technological Library of Denmark. It aims to improve the existing networked information discovery and retrieval tools, Wide Area Information Servers (WAIS) and World Wide Web (WWW), and to move towards unifying WWW and WAIS. Details current results, focusing on the WAIS side of the project. Describes research into automatic indexing and classification of WAIS sources, development of an orientation tool for WAIS, and development of a WAIS index of WWW resources.

Miyamoto, S.: Information clustering based on fuzzy multisets (2003) 0.03

0.030824367 = product of:
  0.12329747 = sum of:
    0.04767549 = weight(_text_:wide in 1071) [ClassicSimilarity], result of:
      0.04767549 = score(doc=1071,freq=2.0), product of:
        0.13912784 = queryWeight, product of:
          4.4307585 = idf(docFreq=1430, maxDocs=44218)
          0.031400457 = queryNorm
        0.342674 = fieldWeight in 1071, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          4.4307585 = idf(docFreq=1430, maxDocs=44218)
          0.0546875 = fieldNorm(doc=1071)
    0.025864797 = weight(_text_:web in 1071) [ClassicSimilarity], result of:
      0.025864797 = score(doc=1071,freq=2.0), product of:
        0.10247572 = queryWeight, product of:
          3.2635105 = idf(docFreq=4597, maxDocs=44218)
          0.031400457 = queryNorm
        0.25239927 = fieldWeight in 1071, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          3.2635105 = idf(docFreq=4597, maxDocs=44218)
          0.0546875 = fieldNorm(doc=1071)
    0.018331813 = weight(_text_:information in 1071) [ClassicSimilarity], result of:
      0.018331813 = score(doc=1071,freq=12.0), product of:
        0.055122808 = queryWeight, product of:
          1.7554779 = idf(docFreq=20772, maxDocs=44218)
          0.031400457 = queryNorm
        0.3325631 = fieldWeight in 1071, product of:
          3.4641016 = tf(freq=12.0), with freq of:
            12.0 = termFreq=12.0
          1.7554779 = idf(docFreq=20772, maxDocs=44218)
          0.0546875 = fieldNorm(doc=1071)
    0.031425368 = weight(_text_:retrieval in 1071) [ClassicSimilarity], result of:
      0.031425368 = score(doc=1071,freq=4.0), product of:
        0.09498371 = queryWeight, product of:
          3.024915 = idf(docFreq=5836, maxDocs=44218)
          0.031400457 = queryNorm
        0.33085006 = fieldWeight in 1071, product of:
          2.0 = tf(freq=4.0), with freq of:
            4.0 = termFreq=4.0
          3.024915 = idf(docFreq=5836, maxDocs=44218)
          0.0546875 = fieldNorm(doc=1071)
  0.25 = coord(4/16)

Abstract: A fuzzy multiset model for information clustering is proposed with application to information retrieval on the World Wide Web. Noting that a search engine retrieves multiple occurrences of the same subjects with possibly different degrees of relevance, we observe that fuzzy multisets provide an appropriate model of information retrieval on the WWW. Information clustering, which means both term clustering and document clustering, is considered. Three methods are proposed: hard c-means, fuzzy c-means, and an agglomerative method using cluster centers. Two distances between fuzzy multisets and algorithms for calculating cluster centers are defined. Theoretical properties of the clustering algorithms are studied. Illustrative examples are given to show how the algorithms work.
Source: Information processing and management. 39(2003) no.2, S.195-213

Search Engines and Beyond : Developing efficient knowledge management systems, April 19-20 1999, Boston, Mass (1999) 0.03

0.030720102 = product of:
  0.098304324 = sum of:
    0.027243135 = weight(_text_:wide in 2596) [ClassicSimilarity], result of:
      0.027243135 = score(doc=2596,freq=2.0), product of:
        0.13912784 = queryWeight, product of:
          4.4307585 = idf(docFreq=1430, maxDocs=44218)
          0.031400457 = queryNorm
        0.1958137 = fieldWeight in 2596, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          4.4307585 = idf(docFreq=1430, maxDocs=44218)
          0.03125 = fieldNorm(doc=2596)
    0.0147798825 = weight(_text_:web in 2596) [ClassicSimilarity], result of:
      0.0147798825 = score(doc=2596,freq=2.0), product of:
        0.10247572 = queryWeight, product of:
          3.2635105 = idf(docFreq=4597, maxDocs=44218)
          0.031400457 = queryNorm
        0.14422815 = fieldWeight in 2596, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          3.2635105 = idf(docFreq=4597, maxDocs=44218)
          0.03125 = fieldNorm(doc=2596)
    0.0060479296 = weight(_text_:information in 2596) [ClassicSimilarity], result of:
      0.0060479296 = score(doc=2596,freq=4.0), product of:
        0.055122808 = queryWeight, product of:
          1.7554779 = idf(docFreq=20772, maxDocs=44218)
          0.031400457 = queryNorm
        0.10971737 = fieldWeight in 2596, product of:
          2.0 = tf(freq=4.0), with freq of:
            4.0 = termFreq=4.0
          1.7554779 = idf(docFreq=20772, maxDocs=44218)
          0.03125 = fieldNorm(doc=2596)
    0.02839307 = weight(_text_:retrieval in 2596) [ClassicSimilarity], result of:
      0.02839307 = score(doc=2596,freq=10.0), product of:
        0.09498371 = queryWeight, product of:
          3.024915 = idf(docFreq=5836, maxDocs=44218)
          0.031400457 = queryNorm
        0.29892567 = fieldWeight in 2596, product of:
          3.1622777 = tf(freq=10.0), with freq of:
            10.0 = termFreq=10.0
          3.024915 = idf(docFreq=5836, maxDocs=44218)
          0.03125 = fieldNorm(doc=2596)
    0.021840302 = weight(_text_:software in 2596) [ClassicSimilarity], result of:
      0.021840302 = score(doc=2596,freq=2.0), product of:
        0.124570385 = queryWeight, product of:
          3.9671519 = idf(docFreq=2274, maxDocs=44218)
          0.031400457 = queryNorm
        0.17532499 = fieldWeight in 2596, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          3.9671519 = idf(docFreq=2274, maxDocs=44218)
          0.03125 = fieldNorm(doc=2596)
  0.3125 = coord(5/16)

Content:
  Ramana Rao (Inxight, Palo Alto, CA): 7 ± 2 Insights on achieving Effective Information Access
  Session One: Updates and a twelve month perspective
    Danny Sullivan (Search Engine Watch, US / England): Portalization and other search trends
    Carol Tenopir (University of Tennessee): Search realities faced by end users and professional searchers
  Session Two: Today's search engines and beyond
    Daniel Hoogterp (Retrieval Technologies, McLean, VA): Effective presentation and utilization of search techniques
    Rick Kenny (Fulcrum Technologies, Ontario, Canada): Beyond document clustering: The knowledge impact statement
    Gary Stock (Ingenius, Kalamazoo, MI): Automated change monitoring
    Gary Culliss (Direct Hit, Wellesley Hills, MA): User popularity ranked search engines
    Byron Dom (IBM, CA): Automatically finding the best pages on the World Wide Web (CLEVER)
    Peter Tomassi (LookSmart, San Francisco, CA): Adding human intellect to search technology
  Session Three: Panel discussion: Human v automated categorization and editing
    Ev Brenner (New York, NY), Chairman
    James Callan (University of Massachusetts, MA)
    Marc Krellenstein (Northern Light Technology, Cambridge, MA)
    Dan Miller (Ask Jeeves, Berkeley, CA)
  Session Four: Updates and a twelve month perspective
    Steve Arnold (AIT, Harrods Creek, KY): Review: The leading edge in search and retrieval software
    Ellen Voorhees (NIST, Gaithersburg, MD): TREC update
  Session Five: Search engines now and beyond
    Intelligent Agents: John Snyder (Muscat, Cambridge, England): Practical issues behind intelligent agents
    Text summarization: Therese Firmin (Dept of Defense, Ft George G. Meade, MD): The TIPSTER/SUMMAC evaluation of automatic text summarization systems
    Cross language searching: Elizabeth Liddy (TextWise, Syracuse, NY): A conceptual interlingua approach to cross-language retrieval
    Video search and retrieval: Armon Amir (IBM, Almaden, CA): CueVideo: Modular system for automatic indexing and browsing of video/audio
    Speech recognition: Michael Witbrock (Lycos, Waltham, MA): Retrieval of spoken documents
    Visualization: James A. Wise (Integral Visuals, Richland, WA): Information visualization in the new millennium: Emerging science or passing fashion?
    Text mining: David Evans (Claritech, Pittsburgh, PA): Text mining - towards decision support

Wätjen, H.-J.: GERHARD : Automatisches Sammeln, Klassifizieren und Indexieren von wissenschaftlich relevanten Informationsressourcen im deutschen World Wide Web (1998) 0.03

0.02941474 = product of:
  0.11765896 = sum of:
    0.04767549 = weight(_text_:wide in 3064) [ClassicSimilarity], result of:
      0.04767549 = score(doc=3064,freq=2.0), product of:
        0.13912784 = queryWeight, product of:
          4.4307585 = idf(docFreq=1430, maxDocs=44218)
          0.031400457 = queryNorm
        0.342674 = fieldWeight in 3064, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          4.4307585 = idf(docFreq=1430, maxDocs=44218)
          0.0546875 = fieldNorm(doc=3064)
    0.036578346 = weight(_text_:web in 3064) [ClassicSimilarity], result of:
      0.036578346 = score(doc=3064,freq=4.0), product of:
        0.10247572 = queryWeight, product of:
          3.2635105 = idf(docFreq=4597, maxDocs=44218)
          0.031400457 = queryNorm
        0.35694647 = fieldWeight in 3064, product of:
          2.0 = tf(freq=4.0), with freq of:
            4.0 = termFreq=4.0
          3.2635105 = idf(docFreq=4597, maxDocs=44218)
          0.0546875 = fieldNorm(doc=3064)
    0.0111840265 = product of:
      0.022368053 = sum of:
        0.022368053 = weight(_text_:online in 3064) [ClassicSimilarity], result of:
          0.022368053 = score(doc=3064,freq=2.0), product of:
            0.09529729 = queryWeight, product of:
              3.0349014 = idf(docFreq=5778, maxDocs=44218)
              0.031400457 = queryNorm
            0.23471867 = fieldWeight in 3064, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.0349014 = idf(docFreq=5778, maxDocs=44218)
              0.0546875 = fieldNorm(doc=3064)
      0.5 = coord(1/2)
    0.022221092 = weight(_text_:retrieval in 3064) [ClassicSimilarity], result of:
      0.022221092 = score(doc=3064,freq=2.0), product of:
        0.09498371 = queryWeight, product of:
          3.024915 = idf(docFreq=5836, maxDocs=44218)
          0.031400457 = queryNorm
        0.23394634 = fieldWeight in 3064, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          3.024915 = idf(docFreq=5836, maxDocs=44218)
          0.0546875 = fieldNorm(doc=3064)
  0.25 = coord(4/16)

Abstract: The intellectual indexing of the Internet is in crisis. Yahoo and other services cannot keep pace with the growth of the Web. GERHARD is currently the only search and navigation service worldwide that also classifies, fully automatically, the Internet resources collected by a robot, using computational-linguistic and statistical methods. Well over one million HTML documents from academically relevant servers in Germany can be searched in the database as with other search engines, and can also be explored by navigating the trilingual Universal Decimal Classification (ETH-Bibliothek Zürich).
Theme: Klassifikationssysteme im Online-Retrieval

Reiner, U.: Automatische DDC-Klassifizierung bibliografischer Titeldatensätze der Deutschen Nationalbibliografie (2009) 0.03

0.02653515 = product of:
  0.08491248 = sum of:
    0.038527615 = weight(_text_:wide in 3284) [ClassicSimilarity], result of:
      0.038527615 = score(doc=3284,freq=4.0), product of:
        0.13912784 = queryWeight, product of:
          4.4307585 = idf(docFreq=1430, maxDocs=44218)
          0.031400457 = queryNorm
        0.2769224 = fieldWeight in 3284, product of:
          2.0 = tf(freq=4.0), with freq of:
            4.0 = termFreq=4.0
          4.4307585 = idf(docFreq=1430, maxDocs=44218)
          0.03125 = fieldNorm(doc=3284)
    0.020901911 = weight(_text_:web in 3284) [ClassicSimilarity], result of:
      0.020901911 = score(doc=3284,freq=4.0), product of:
        0.10247572 = queryWeight, product of:
          3.2635105 = idf(docFreq=4597, maxDocs=44218)
          0.031400457 = queryNorm
        0.2039694 = fieldWeight in 3284, product of:
          2.0 = tf(freq=4.0), with freq of:
            4.0 = termFreq=4.0
          3.2635105 = idf(docFreq=4597, maxDocs=44218)
          0.03125 = fieldNorm(doc=3284)
    0.004276532 = weight(_text_:information in 3284) [ClassicSimilarity], result of:
      0.004276532 = score(doc=3284,freq=2.0), product of:
        0.055122808 = queryWeight, product of:
          1.7554779 = idf(docFreq=20772, maxDocs=44218)
          0.031400457 = queryNorm
        0.0775819 = fieldWeight in 3284, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          1.7554779 = idf(docFreq=20772, maxDocs=44218)
          0.03125 = fieldNorm(doc=3284)
    0.012697767 = weight(_text_:retrieval in 3284) [ClassicSimilarity], result of:
      0.012697767 = score(doc=3284,freq=2.0), product of:
        0.09498371 = queryWeight, product of:
          3.024915 = idf(docFreq=5836, maxDocs=44218)
          0.031400457 = queryNorm
        0.13368362 = fieldWeight in 3284, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          3.024915 = idf(docFreq=5836, maxDocs=44218)
          0.03125 = fieldNorm(doc=3284)
    0.008508657 = product of:
      0.017017314 = sum of:
        0.017017314 = weight(_text_:22 in 3284) [ClassicSimilarity], result of:
          0.017017314 = score(doc=3284,freq=2.0), product of:
            0.10995905 = queryWeight, product of:
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.031400457 = queryNorm
            0.15476047 = fieldWeight in 3284, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.03125 = fieldNorm(doc=3284)
      0.5 = coord(1/2)
  0.3125 = coord(5/16)

Abstract: At least since the advent of the World Wide Web, the volume of publications to be classified has been growing faster than it can be subject-indexed intellectually. Methods are therefore sought to automate the classification of text objects, or at least to support intellectual classification. Methods for automatic document classification (information retrieval, IR) have existed since 1968, and methods for automatic text classification (ATC: Automated Text Categorization) since 1992. As more and more digital objects have become available on the World Wide Web, work on automatic text classification has increased markedly since about 1998. Since 1996 this has included work on automatic DDC and RVK classification of bibliographic title records and full-text documents. To our knowledge, these developments have so far been experimental systems rather than systems in continuous operation. The VZG project Colibri/DDC has also been working on automatic DDC classification, among other things, since 2006. The related investigations and developments serve to answer the research question: "Is it possible to automatically obtain a DDC title classification, coherent in content, for all GVK-PLUS title records?"
Date: 22. 1.2010 14:41:24

Montesi, M.; Navarrete, T.: Classifying web genres in context : A case study documenting the web genres used by a software engineer (2008) 0.03

0.025193062 = product of:
  0.134363 = sum of:
    0.06650947 = weight(_text_:web in 2100) [ClassicSimilarity], result of:
      0.06650947 = score(doc=2100,freq=18.0), product of:
        0.10247572 = queryWeight, product of:
          3.2635105 = idf(docFreq=4597, maxDocs=44218)
          0.031400457 = queryNorm
        0.64902663 = fieldWeight in 2100, product of:
          4.2426405 = tf(freq=18.0), with freq of:
            18.0 = termFreq=18.0
          3.2635105 = idf(docFreq=4597, maxDocs=44218)
          0.046875 = fieldNorm(doc=2100)
    0.011110757 = weight(_text_:information in 2100) [ClassicSimilarity], result of:
      0.011110757 = score(doc=2100,freq=6.0), product of:
        0.055122808 = queryWeight, product of:
          1.7554779 = idf(docFreq=20772, maxDocs=44218)
          0.031400457 = queryNorm
        0.20156369 = fieldWeight in 2100, product of:
          2.4494898 = tf(freq=6.0), with freq of:
            6.0 = termFreq=6.0
          1.7554779 = idf(docFreq=20772, maxDocs=44218)
          0.046875 = fieldNorm(doc=2100)
    0.05674277 = weight(_text_:software in 2100) [ClassicSimilarity], result of:
      0.05674277 = score(doc=2100,freq=6.0), product of:
        0.124570385 = queryWeight, product of:
          3.9671519 = idf(docFreq=2274, maxDocs=44218)
          0.031400457 = queryNorm
        0.4555077 = fieldWeight in 2100, product of:
          2.4494898 = tf(freq=6.0), with freq of:
            6.0 = termFreq=6.0
          3.9671519 = idf(docFreq=2274, maxDocs=44218)
          0.046875 = fieldNorm(doc=2100)
  0.1875 = coord(3/16)

Abstract: This case study analyzes the Internet-based resources that a software engineer uses in his daily work. Methodologically, we studied the web browser history of the participant, classifying all the web pages he had seen over a period of 12 days into web genres. We interviewed him before and after the analysis of the web browser history. In the first interview, he spoke about his general information behavior; in the second, he commented on each web genre, explaining why and how he used it. As a result, three approaches allow us to describe the set of 23 web genres obtained: (a) the purposes they serve for the participant; (b) the role they play in the various work and search phases; and (c) the way they are used in combination with each other. Further observations concern the way the participant assesses the quality of web-based resources, and his information behavior as a software engineer.
Source: Information processing and management. 44(2008) no.4, S.1410-1430
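
The score explanation for doc 2100 above can be checked by hand. The following is a minimal sketch, assuming the usual ClassicSimilarity definitions (tf = sqrt(freq), idf = 1 + ln(maxDocs / (docFreq + 1))); the `queryNorm`, `fieldNorm`, and `coord` values are copied from the listing rather than derived, and the function names are illustrative only.

```python
import math

# Constants copied verbatim from the explain output above (not derived here).
QUERY_NORM = 0.031400457
MAX_DOCS = 44218
FIELD_NORM = 0.046875   # fieldNorm(doc=2100)
COORD = 3 / 16          # coord(3/16): 3 of 16 query clauses matched


def idf(doc_freq, max_docs=MAX_DOCS):
    """Inverse document frequency as used by classic TF-IDF scoring."""
    return 1.0 + math.log(max_docs / (doc_freq + 1))


def term_score(freq, doc_freq, field_norm=FIELD_NORM, query_norm=QUERY_NORM):
    """score = queryWeight * fieldWeight for a single matching term."""
    term_idf = idf(doc_freq)
    query_weight = term_idf * query_norm
    field_weight = math.sqrt(freq) * term_idf * field_norm
    return query_weight * field_weight


# (termFreq, docFreq) for the three terms that matched doc 2100.
matched_terms = {
    "web": (18.0, 4597),
    "information": (6.0, 20772),
    "software": (6.0, 2274),
}

raw_sum = sum(term_score(freq, df) for freq, df in matched_terms.values())
final_score = raw_sum * COORD

print(f"sum of term scores: {raw_sum:.6f}")
print(f"final score:        {final_score:.6f}")
```

Small differences in the last digits are expected, since the search engine computes in single precision while this sketch uses Python floats.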

Khoo, C.S.G.; Ng, K.; Ou, S.: An exploratory study of human clustering of Web pages (2003) 0.02
```
0.023589497 = product of:
  0.09435799 = sum of:
    0.027243135 = weight(_text_:wide in 2741) [ClassicSimilarity], result of:
      0.027243135 = score(doc=2741,freq=2.0), product of:
        0.13912784 = queryWeight, product of:
          4.4307585 = idf(docFreq=1430, maxDocs=44218)
          0.031400457 = queryNorm
        0.1958137 = fieldWeight in 2741, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          4.4307585 = idf(docFreq=1430, maxDocs=44218)
          0.03125 = fieldNorm(doc=2741)
    0.05119902 = weight(_text_:web in 2741) [ClassicSimilarity], result of:
      0.05119902 = score(doc=2741,freq=24.0), product of:
        0.10247572 = queryWeight, product of:
          3.2635105 = idf(docFreq=4597, maxDocs=44218)
          0.031400457 = queryNorm
        0.49962097 = fieldWeight in 2741, product of:
          4.8989797 = tf(freq=24.0), with freq of:
            24.0 = termFreq=24.0
          3.2635105 = idf(docFreq=4597, maxDocs=44218)
          0.03125 = fieldNorm(doc=2741)
    0.007407171 = weight(_text_:information in 2741) [ClassicSimilarity], result of:
      0.007407171 = score(doc=2741,freq=6.0), product of:
        0.055122808 = queryWeight, product of:
          1.7554779 = idf(docFreq=20772, maxDocs=44218)
          0.031400457 = queryNorm
        0.1343758 = fieldWeight in 2741, product of:
          2.4494898 = tf(freq=6.0), with freq of:
            6.0 = termFreq=6.0
          1.7554779 = idf(docFreq=20772, maxDocs=44218)
          0.03125 = fieldNorm(doc=2741)
    0.008508657 = product of:
      0.017017314 = sum of:
        0.017017314 = weight(_text_:22 in 2741) [ClassicSimilarity], result of:
          0.017017314 = score(doc=2741,freq=2.0), product of:
            0.10995905 = queryWeight, product of:
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.031400457 = queryNorm
            0.15476047 = fieldWeight in 2741, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.03125 = fieldNorm(doc=2741)
      0.5 = coord(1/2)
  0.25 = coord(4/16)
```
Abstract: This study seeks to find out how human beings cluster Web pages naturally. Twenty Web pages retrieved by the Northern Light search engine for each of 10 queries were sorted by 3 subjects into categories that were natural or meaningful to them. It was found that different subjects clustered the same set of Web pages quite differently and created different categories. The average inter-subject similarity of the clusters created was a low 0.27. Subjects created an average of 5.4 clusters for each sorting. The categories constructed can be divided into 10 types. About 1/3 of the categories created were topical. Another 20% of the categories relate to the degree of relevance or usefulness. The rest of the categories were subject-independent categories such as format, purpose, authoritativeness and direction to other sources. The authors plan to develop automatic methods for categorizing Web pages using the common categories created by the subjects. It is hoped that the techniques developed can be used by Web search engines to automatically organize Web pages retrieved into categories that are natural to users.
1. Introduction: The World Wide Web is an increasingly important source of information for people globally because of its ease of access, the ease of publishing, its ability to transcend geographic and national boundaries, its flexibility and heterogeneity and its dynamic nature. However, Web users also find it increasingly difficult to locate relevant and useful information in this vast information storehouse. Web search engines, despite their scope and power, appear to be quite ineffective. They retrieve too many pages, and though they attempt to rank retrieved pages in order of probable relevance, often the relevant documents do not appear in the top-ranked 10 or 20 documents displayed. Several studies have found that users do not know how to use the advanced features of Web search engines, and do not know how to formulate and re-formulate queries. Users also typically exert minimal effort in performing, evaluating and refining their searches, and are unwilling to scan more than 10 or 20 items retrieved (Jansen, Spink, Bateman & Saracevic, 1998). This suggests that the conventional ranked-list display of search results does not satisfy user requirements, and that better ways of presenting and summarizing search results have to be developed. One promising approach is to group retrieved pages into clusters or categories to allow users to navigate immediately to the "promising" clusters where the most useful Web pages are likely to be located. This approach has been adopted by a number of search engines (notably Northern Light) and search agents.

Date: 12. 9.2004 9:56:22

Kwon, O.W.; Lee, J.H.: Text categorization based on k-nearest neighbor approach for web site classification (2003) 0.02

0.020803574 = product of:
  0.11095239 = sum of:
    0.03405392 = weight(_text_:wide in 1070) [ClassicSimilarity], result of:
      0.03405392 = score(doc=1070,freq=2.0), product of:
        0.13912784 = queryWeight, product of:
          4.4307585 = idf(docFreq=1430, maxDocs=44218)
          0.031400457 = queryNorm
        0.24476713 = fieldWeight in 1070, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          4.4307585 = idf(docFreq=1430, maxDocs=44218)
          0.0390625 = fieldNorm(doc=1070)
    0.071552806 = weight(_text_:web in 1070) [ClassicSimilarity], result of:
      0.071552806 = score(doc=1070,freq=30.0), product of:
        0.10247572 = queryWeight, product of:
          3.2635105 = idf(docFreq=4597, maxDocs=44218)
          0.031400457 = queryNorm
        0.69824153 = fieldWeight in 1070, product of:
          5.477226 = tf(freq=30.0), with freq of:
            30.0 = termFreq=30.0
          3.2635105 = idf(docFreq=4597, maxDocs=44218)
          0.0390625 = fieldNorm(doc=1070)
    0.005345665 = weight(_text_:information in 1070) [ClassicSimilarity], result of:
      0.005345665 = score(doc=1070,freq=2.0), product of:
        0.055122808 = queryWeight, product of:
          1.7554779 = idf(docFreq=20772, maxDocs=44218)
          0.031400457 = queryNorm
        0.09697737 = fieldWeight in 1070, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          1.7554779 = idf(docFreq=20772, maxDocs=44218)
          0.0390625 = fieldNorm(doc=1070)
  0.1875 = coord(3/16)

Abstract: Automatic categorization is a viable method to deal with the scaling problem on the World Wide Web. For Web site classification, this paper proposes the use of Web pages linked with the home page in a different manner from the sole use of home pages in previous research. To implement our proposed method, we derive a scheme for Web site classification based on the k-nearest neighbor (k-NN) approach. It consists of three phases: Web page selection (connectivity analysis), Web page classification, and Web site classification. Given a Web site, the Web page selection chooses several representative Web pages using connectivity analysis. The k-NN classifier next classifies each of the selected Web pages. Finally, the classified Web pages are extended to a classification of the entire Web site. To improve performance, we supplement the k-NN approach with a feature selection method and a term weighting scheme using markup tags, and also reform its document-document similarity measure. In our experiments on a Korean commercial Web directory, the proposed system, using both a home page and its linked pages, improved the performance of micro-averaging breakeven point by 30.02%, compared with an ordinary classification which uses a home page only.
Source: Information processing and management. 39(2003) no.1, S.25-44
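
The three phases named in the abstract above (Web page selection, Web page classification, Web site classification) can be illustrated with a toy example. This is only a sketch under stated assumptions, not the authors' system: page selection is reduced to "home page plus the first few linked pages" in place of connectivity analysis, pages are plain bag-of-words vectors compared by cosine similarity, and the site label is a simple majority vote over page labels. The training data and all function names are hypothetical.

```python
import math
from collections import Counter


def vectorize(text):
    """Bag-of-words term counts (assumption: whitespace tokenization)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def knn_classify(page, training, k=3):
    """Phase 2: label a page by similarity-weighted vote of its k nearest neighbours."""
    vec = vectorize(page)
    scored = [(cosine(vec, vectorize(text)), label) for text, label in training]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    votes = Counter()
    for similarity, label in scored[:k]:
        votes[label] += similarity
    return votes.most_common(1)[0][0]


def classify_site(home_page, linked_pages, training, k=3, max_pages=5):
    """Phases 1 and 3: pick representative pages, then vote over their labels."""
    # Phase 1 (placeholder for connectivity analysis): home page plus first links.
    selected = [home_page] + linked_pages[:max_pages]
    page_labels = [knn_classify(page, training, k) for page in selected]
    # Phase 3 (assumption): majority vote extends page labels to the whole site.
    return Counter(page_labels).most_common(1)[0][0]


# Hypothetical training pages with known categories.
training = [
    ("cheap flights hotel booking travel deals", "travel"),
    ("city tours travel guide hotel reviews", "travel"),
    ("laptop reviews computer hardware prices", "computers"),
    ("software download computer programming tools", "computers"),
    ("hotel resort travel packages", "travel"),
    ("computer graphics cards hardware benchmarks", "computers"),
]

site_label = classify_site(
    "welcome computer shop",
    ["laptop hardware prices and reviews",
     "computer programming software tools",
     "contact hotel nearby"],
    training,
)
print(site_label)
```

Using the linked pages as well as the home page gives the classifier more text to work with, which is the motivation the abstract gives for going beyond home pages alone.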

Dolin, R.; Agrawal, D.; El Abbadi, A.; Pearlman, J.: Using automated classification for summarizing and selecting heterogeneous information sources (1998) 0.02
```
0.020617982 = product of:
  0.06597754 = sum of:
    0.020432351 = weight(_text_:wide in 1253) [ClassicSimilarity], result of:
      0.020432351 = score(doc=1253,freq=2.0), product of:
        0.13912784 = queryWeight, product of:
          4.4307585 = idf(docFreq=1430, maxDocs=44218)
          0.031400457 = queryNorm
        0.14686027 = fieldWeight in 1253, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          4.4307585 = idf(docFreq=1430, maxDocs=44218)
          0.0234375 = fieldNorm(doc=1253)
    0.015676433 = weight(_text_:web in 1253) [ClassicSimilarity], result of:
      0.015676433 = score(doc=1253,freq=4.0), product of:
        0.10247572 = queryWeight, product of:
          3.2635105 = idf(docFreq=4597, maxDocs=44218)
          0.031400457 = queryNorm
        0.15297705 = fieldWeight in 1253, product of:
          2.0 = tf(freq=4.0), with freq of:
            4.0 = termFreq=4.0
          3.2635105 = idf(docFreq=4597, maxDocs=44218)
          0.0234375 = fieldNorm(doc=1253)
    0.006778544 = product of:
      0.013557088 = sum of:
        0.013557088 = weight(_text_:online in 1253) [ClassicSimilarity], result of:
          0.013557088 = score(doc=1253,freq=4.0), product of:
            0.09529729 = queryWeight, product of:
              3.0349014 = idf(docFreq=5778, maxDocs=44218)
              0.031400457 = queryNorm
            0.142261 = fieldWeight in 1253, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              3.0349014 = idf(docFreq=5778, maxDocs=44218)
              0.0234375 = fieldNorm(doc=1253)
      0.5 = coord(1/2)
    0.009622197 = weight(_text_:information in 1253) [ClassicSimilarity], result of:
      0.009622197 = score(doc=1253,freq=18.0), product of:
        0.055122808 = queryWeight, product of:
          1.7554779 = idf(docFreq=20772, maxDocs=44218)
          0.031400457 = queryNorm
        0.17455927 = fieldWeight in 1253, product of:
          4.2426405 = tf(freq=18.0), with freq of:
            18.0 = termFreq=18.0
          1.7554779 = idf(docFreq=20772, maxDocs=44218)
          0.0234375 = fieldNorm(doc=1253)
    0.013468015 = weight(_text_:retrieval in 1253) [ClassicSimilarity], result of:
      0.013468015 = score(doc=1253,freq=4.0), product of:
        0.09498371 = queryWeight, product of:
          3.024915 = idf(docFreq=5836, maxDocs=44218)
          0.031400457 = queryNorm
        0.1417929 = fieldWeight in 1253, product of:
          2.0 = tf(freq=4.0), with freq of:
            4.0 = termFreq=4.0
          3.024915 = idf(docFreq=5836, maxDocs=44218)
          0.0234375 = fieldNorm(doc=1253)
  0.3125 = coord(5/16)
```
Abstract: Information retrieval over the Internet increasingly requires the filtering of thousands of heterogeneous information sources. Important sources of information include not only traditional databases with structured data and queries, but also increasing numbers of non-traditional, semi- or unstructured collections such as Web sites, FTP archives, etc. As the number and variability of sources increase, new ways of automatically summarizing, discovering, and selecting collections relevant to a user's query are needed. One such method involves the use of classification schemes, such as the Library of Congress Classification (LCC), within which a collection may be represented based on its content, irrespective of the structure of the actual data or documents. For such a system to be useful in a large-scale distributed environment, it must be easy to use for both collection managers and users. As a result, it must be possible to classify documents automatically within a classification scheme. Furthermore, there must be a straightforward and intuitive interface with which the user may use the scheme to assist in information retrieval (IR). Our work with the Alexandria Digital Library (ADL) Project focuses on geo-referenced information, whether text, maps, aerial photographs, or satellite images. As a result, we have emphasized techniques which work with both text and non-text, such as combined textual and graphical queries, multi-dimensional indexing, and IR methods which are not solely dependent on words or phrases. Part of this work involves locating relevant online sources of information. In particular, we have designed and are currently testing aspects of an architecture, Pharos, which we believe will scale up to 1,000,000 heterogeneous sources. Pharos accommodates heterogeneity in content and format, both among multiple sources as well as within a single source.
That is, we consider sources to include Web sites, FTP archives, newsgroups, and full digital libraries; all of these systems can include a wide variety of content and multimedia data formats. Pharos is based on the use of hierarchical classification schemes. These include not only well-known 'subject' (or 'concept') based schemes such as the Dewey Decimal System and the LCC, but also, for example, geographic classifications, which might be constructed as layers of smaller and smaller hierarchical longitude/latitude boxes. Pharos is designed to work with sophisticated queries which utilize subjects, geographical locations, temporal specifications, and other types of information domains. The Pharos architecture requires that hierarchically structured collection metadata be extracted so that it can be partitioned in such a way as to greatly enhance scalability. Automated classification is important to Pharos because it allows information sources to extract the requisite collection metadata automatically that must be distributed.
We are currently experimenting with newsgroups as collections. We have built an initial prototype which automatically classifies and summarizes newsgroups within the LCC. (The prototype can be tested below, and more details may be found at http://pharos.alexandria.ucsb.edu/). The prototype uses electronic library catalog records as a `training set' and Latent Semantic Indexing (LSI) for IR. We use the training set to build a rich set of classification terminology, and associate these terms with the relevant categories in the LCC. This association between terms and classification categories allows us to relate users' queries to nodes in the LCC so that users can select appropriate query categories. Newsgroups are similarly associated with classification categories. Pharos then matches the categories selected by users to relevant newsgroups. In principle, this approach allows users to exclude newsgroups that might have been selected based on an unintended meaning of a query term, and to include newsgroups with relevant content even though the exact query terms may not have been used. This work is extensible to other types of classification, including geographical, temporal, and image feature. Before discussing the methodology of the collection summarization and selection, we first present an online demonstration below. The demonstration is not intended to be a complete end-user interface. Rather, it is intended merely to offer a view of the process to suggest the "look and feel" of the prototype. The demo works as follows. First supply it with a few keywords of interest. The system will then use those terms to try to return to you the most relevant subject categories within the LCC. Assuming that the system recognizes any of your terms (it has over 400,000 terms indexed), it will give you a list of 15 LCC categories sorted by relevancy ranking. From there, you have two choices. 
The first choice, by clicking on the "News" links, is to get a list of newsgroups which the system has identified as relevant to the LCC category you select. The other choice, by clicking on the LCC ID links, is to enter the LCC hierarchy starting at the category of your choice and navigate the tree until you locate the best category for your query. From there, again, you can get a list of newsgroups by clicking on the "News" links. After having shown this demonstration to many people, we would like to suggest that you first give it easier examples before trying to break it. For example, "prostate cancer" (discussed below), "remote sensing", "investment banking", and "gershwin" all work reasonably well.
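
The abstract above mentions geographic classifications built as layers of smaller and smaller hierarchical longitude/latitude boxes. One way such a hierarchy could be sketched is shown below; the quadrant-splitting rule, the labels, and the function name `box_path` are assumptions made for illustration and are not taken from Pharos.

```python
def box_path(lat, lon, depth=3):
    """Return the chain of nested quadrant labels containing (lat, lon).

    Assumption: each level halves the current box in both latitude and
    longitude, so a longer path means a smaller box.
    """
    south, north = -90.0, 90.0
    west, east = -180.0, 180.0
    path = []
    for _ in range(depth):
        mid_lat = (south + north) / 2
        mid_lon = (west + east) / 2
        # Keep the half of the box that contains the point, on each axis.
        if lat >= mid_lat:
            ns, south = "N", mid_lat
        else:
            ns, north = "S", mid_lat
        if lon >= mid_lon:
            ew, west = "E", mid_lon
        else:
            ew, east = "W", mid_lon
        path.append(ns + ew)
    return path, (south, west, north, east)


# Example point near Santa Barbara, California (coordinates are approximate).
path, box = box_path(34.4, -119.7, depth=3)
print(path)  # ['NW', 'SW', 'NE']
print(box)   # (22.5, -135.0, 45.0, -90.0)
```

Two locations that share a path prefix fall into the same box at that level, which is what lets a hierarchy of this kind be navigated like a subject classification.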

Reiner, U.: Automatische DDC-Klassifizierung von bibliografischen Titeldatensätzen (2009) 0.02

0.019921143 = product of:
  0.07968457 = sum of:
    0.01597718 = product of:
      0.03195436 = sum of:
        0.03195436 = weight(_text_:online in 611) [ClassicSimilarity], result of:
          0.03195436 = score(doc=611,freq=2.0), product of:
            0.09529729 = queryWeight, product of:
              3.0349014 = idf(docFreq=5778, maxDocs=44218)
              0.031400457 = queryNorm
            0.33531237 = fieldWeight in 611, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.0349014 = idf(docFreq=5778, maxDocs=44218)
              0.078125 = fieldNorm(doc=611)
      0.5 = coord(1/2)
    0.01069133 = weight(_text_:information in 611) [ClassicSimilarity], result of:
      0.01069133 = score(doc=611,freq=2.0), product of:
        0.055122808 = queryWeight, product of:
          1.7554779 = idf(docFreq=20772, maxDocs=44218)
          0.031400457 = queryNorm
        0.19395474 = fieldWeight in 611, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          1.7554779 = idf(docFreq=20772, maxDocs=44218)
          0.078125 = fieldNorm(doc=611)
    0.031744417 = weight(_text_:retrieval in 611) [ClassicSimilarity], result of:
      0.031744417 = score(doc=611,freq=2.0), product of:
        0.09498371 = queryWeight, product of:
          3.024915 = idf(docFreq=5836, maxDocs=44218)
          0.031400457 = queryNorm
        0.33420905 = fieldWeight in 611, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          3.024915 = idf(docFreq=5836, maxDocs=44218)
          0.078125 = fieldNorm(doc=611)
    0.021271642 = product of:
      0.042543285 = sum of:
        0.042543285 = weight(_text_:22 in 611) [ClassicSimilarity], result of:
          0.042543285 = score(doc=611,freq=2.0), product of:
            0.10995905 = queryWeight, product of:
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.031400457 = queryNorm
            0.38690117 = fieldWeight in 611, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.078125 = fieldNorm(doc=611)
      0.5 = coord(1/2)
  0.25 = coord(4/16)

Content: Slides for a talk given at the 98th Deutscher Bibliothekartag in Erfurt ("Ein neuer Blick auf Bibliotheken"); TK10: Information erschließen und recherchieren; Inhalte erschließen - mit neuen Tools
Date: 22. 8.2009 12:54:24
Theme: Klassifikationssysteme im Online-Retrieval

Krüger, C.: Evaluation des WWW-Suchdienstes GERHARD unter besonderer Beachtung automatischer Indexierung (1999) 0.02
```
0.01813755 = product of:
  0.0967336 = sum of:
    0.04815952 = weight(_text_:wide in 1777) [ClassicSimilarity], result of:
      0.04815952 = score(doc=1777,freq=4.0), product of:
        0.13912784 = queryWeight, product of:
          4.4307585 = idf(docFreq=1430, maxDocs=44218)
          0.031400457 = queryNorm
        0.34615302 = fieldWeight in 1777, product of:
          2.0 = tf(freq=4.0), with freq of:
            4.0 = termFreq=4.0
          4.4307585 = idf(docFreq=1430, maxDocs=44218)
          0.0390625 = fieldNorm(doc=1777)
    0.026127389 = weight(_text_:web in 1777) [ClassicSimilarity], result of:
      0.026127389 = score(doc=1777,freq=4.0), product of:
        0.10247572 = queryWeight, product of:
          3.2635105 = idf(docFreq=4597, maxDocs=44218)
          0.031400457 = queryNorm
        0.25496176 = fieldWeight in 1777, product of:
          2.0 = tf(freq=4.0), with freq of:
            4.0 = termFreq=4.0
          3.2635105 = idf(docFreq=4597, maxDocs=44218)
          0.0390625 = fieldNorm(doc=1777)
    0.022446692 = weight(_text_:retrieval in 1777) [ClassicSimilarity], result of:
      0.022446692 = score(doc=1777,freq=4.0), product of:
        0.09498371 = queryWeight, product of:
          3.024915 = idf(docFreq=5836, maxDocs=44218)
          0.031400457 = queryNorm
        0.23632148 = fieldWeight in 1777, product of:
          2.0 = tf(freq=4.0), with freq of:
            4.0 = termFreq=4.0
          3.024915 = idf(docFreq=5836, maxDocs=44218)
          0.0390625 = fieldNorm(doc=1777)
  0.1875 = coord(3/16)
```
Abstract: This thesis describes and evaluates the WWW search service GERHARD (German Harvest Automated Retrieval and Directory). GERHARD is a search and navigation system for the German World Wide Web that collects only academically relevant documents and classifies them automatically, on the basis of computational-linguistic and statistical methods, using a library classification system. The DFG project GERHARD is an attempt to develop, with a World Wide Web service based on an automatic classification procedure, an alternative to conventional methods of indexing the Internet. In the German-speaking world, GERHARD is the only directory of Internet resources that is created and updated fully automatically (that is, by machine). GERHARD restricts itself to listing documents on academic WWW servers. The basic idea was to replace cost-intensive intellectual indexing and classification of Internet pages with computational-linguistic and statistical methods, so that the listed Internet resources are mapped automatically onto the vocabulary of a library classification system. The WWW address (URL) of GERHARD is http://www.gerhard.de. This diploma thesis describes the service, with particular emphasis on the underlying indexing and classification system, and then tests the effectiveness of GERHARD by means of a small retrieval test.

AlQenaei, Z.M.; Monarchi, D.E.: The use of learning techniques to analyze the results of a manual classification system (2016) 0.02

0.016748656 = product of:
  0.06699462 = sum of:
    0.00798859 = product of:
      0.01597718 = sum of:
        0.01597718 = weight(_text_:online in 2836) [ClassicSimilarity], result of:
          0.01597718 = score(doc=2836,freq=2.0), product of:
            0.09529729 = queryWeight, product of:
              3.0349014 = idf(docFreq=5778, maxDocs=44218)
              0.031400457 = queryNorm
            0.16765618 = fieldWeight in 2836, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.0349014 = idf(docFreq=5778, maxDocs=44218)
              0.0390625 = fieldNorm(doc=2836)
      0.5 = coord(1/2)
    0.009258964 = weight(_text_:information in 2836) [ClassicSimilarity], result of:
      0.009258964 = score(doc=2836,freq=6.0), product of:
        0.055122808 = queryWeight, product of:
          1.7554779 = idf(docFreq=20772, maxDocs=44218)
          0.031400457 = queryNorm
        0.16796975 = fieldWeight in 2836, product of:
          2.4494898 = tf(freq=6.0), with freq of:
            6.0 = termFreq=6.0
          1.7554779 = idf(docFreq=20772, maxDocs=44218)
          0.0390625 = fieldNorm(doc=2836)
    0.022446692 = weight(_text_:retrieval in 2836) [ClassicSimilarity], result of:
      0.022446692 = score(doc=2836,freq=4.0), product of:
        0.09498371 = queryWeight, product of:
          3.024915 = idf(docFreq=5836, maxDocs=44218)
          0.031400457 = queryNorm
        0.23632148 = fieldWeight in 2836, product of:
          2.0 = tf(freq=4.0), with freq of:
            4.0 = termFreq=4.0
          3.024915 = idf(docFreq=5836, maxDocs=44218)
          0.0390625 = fieldNorm(doc=2836)
    0.027300376 = weight(_text_:software in 2836) [ClassicSimilarity], result of:
      0.027300376 = score(doc=2836,freq=2.0), product of:
        0.124570385 = queryWeight, product of:
          3.9671519 = idf(docFreq=2274, maxDocs=44218)
          0.031400457 = queryNorm
        0.21915624 = fieldWeight in 2836, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          3.9671519 = idf(docFreq=2274, maxDocs=44218)
          0.0390625 = fieldNorm(doc=2836)
  0.25 = coord(4/16)
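
The arithmetic in the score breakdown above can be checked by hand. The following is a minimal sketch, assuming the standard Lucene ClassicSimilarity definitions tf = sqrt(freq) and idf = 1 + ln(maxDocs / (docFreq + 1)); the `queryNorm` and `fieldNorm` values are copied from the listing rather than recomputed, and the function names are illustrative, not part of any library.

```python
import math

MAX_DOCS = 44218          # maxDocs, as shown in the listing
QUERY_NORM = 0.031400457  # queryNorm, copied from the listing
FIELD_NORM = 0.0390625    # fieldNorm(doc=2836), copied from the listing


def idf(doc_freq, max_docs=MAX_DOCS):
    """Inverse document frequency: 1 + ln(maxDocs / (docFreq + 1))."""
    return 1.0 + math.log(max_docs / (doc_freq + 1))


def term_score(freq, doc_freq):
    """Score of one term: queryWeight * fieldWeight."""
    tf = math.sqrt(freq)
    term_idf = idf(doc_freq)
    query_weight = term_idf * QUERY_NORM
    field_weight = tf * term_idf * FIELD_NORM
    return query_weight * field_weight


# (freq, docFreq) pairs for the four matching terms of doc 2836.
online = term_score(2.0, 5778) * 0.5   # nested clause, coord(1/2)
information = term_score(6.0, 20772)
retrieval = term_score(4.0, 5836)
software = term_score(2.0, 2274)

# Sum of the matching clauses, scaled by the top-level coord(4/16).
total = (online + information + retrieval + software) * (4 / 16)
print(round(total, 6))
```

Lucene computes these values in single precision, so the result agrees with the listed 0.016748656 only up to rounding in the last digits.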

Abstract: Classification is the process of assigning objects to pre-defined classes based on observations or characteristics of those objects, and there are many approaches to performing this task. The overall objective of this study is to demonstrate the use of two learning techniques to analyze the results of a manual classification system. Our sample consisted of 1,026 documents, from the ACM Computing Classification System, classified by their authors as belonging to one of the groups of the classification system: "H.3 Information Storage and Retrieval." A singular value decomposition of the documents' weighted term-frequency matrix was used to represent each document in a 50-dimensional vector space. The analysis of the representation using both supervised (decision tree) and unsupervised (clustering) techniques suggests that two pairs of the ACM classes are closely related to each other in the vector space. Class 1 (Content Analysis and Indexing) is closely related to Class 3 (Information Search and Retrieval), and Class 4 (Systems and Software) is closely related to Class 5 (Online Information Services). Further analysis was performed to test the diffusion of the words in the two classes using both cosine and Euclidean distance.

Classification, automation, and new media : Proceedings of the 24th Annual Conference of the Gesellschaft für Klassifikation e.V., University of Passau, March 15 - 17, 2000 (2002) 0.02

0.016447278 = product of:
  0.087718815 = sum of:
    0.04815952 = weight(_text_:wide in 5997) [ClassicSimilarity], result of:
      0.04815952 = score(doc=5997,freq=4.0), product of:
        0.13912784 = queryWeight, product of:
          4.4307585 = idf(docFreq=1430, maxDocs=44218)
          0.031400457 = queryNorm
        0.34615302 = fieldWeight in 5997, product of:
          2.0 = tf(freq=4.0), with freq of:
            4.0 = termFreq=4.0
          4.4307585 = idf(docFreq=1430, maxDocs=44218)
          0.0390625 = fieldNorm(doc=5997)
    0.031999387 = weight(_text_:web in 5997) [ClassicSimilarity], result of:
      0.031999387 = score(doc=5997,freq=6.0), product of:
        0.10247572 = queryWeight, product of:
          3.2635105 = idf(docFreq=4597, maxDocs=44218)
          0.031400457 = queryNorm
        0.3122631 = fieldWeight in 5997, product of:
          2.4494898 = tf(freq=6.0), with freq of:
            6.0 = termFreq=6.0
          3.2635105 = idf(docFreq=4597, maxDocs=44218)
          0.0390625 = fieldNorm(doc=5997)
    0.007559912 = weight(_text_:information in 5997) [ClassicSimilarity], result of:
      0.007559912 = score(doc=5997,freq=4.0), product of:
        0.055122808 = queryWeight, product of:
          1.7554779 = idf(docFreq=20772, maxDocs=44218)
          0.031400457 = queryNorm
        0.13714671 = fieldWeight in 5997, product of:
          2.0 = tf(freq=4.0), with freq of:
            4.0 = termFreq=4.0
          1.7554779 = idf(docFreq=20772, maxDocs=44218)
          0.0390625 = fieldNorm(doc=5997)
  0.1875 = coord(3/16)

Abstract: Given the huge amount of information in the internet and in practically every domain of knowledge that we are facing today, knowledge discovery calls for automation. The book deals with methods from classification and data analysis that respond effectively to this rapidly growing challenge. The interested reader will find new methodological insights as well as applications in economics, management science, finance, and marketing, and in pattern recognition, biology, health, and archaeology.
Content: Data Analysis, Statistics, and Classification.- Pattern Recognition and Automation.- Data Mining, Information Processing, and Automation.- New Media, Web Mining, and Automation.- Applications in Management Science, Finance, and Marketing.- Applications in Medicine, Biology, Archaeology, and Others.- Author Index.- Subject Index.
RSWK: World Wide Web / Wissensorganisation / Kongress / Passau <2000>
Subject: World Wide Web / Wissensorganisation / Kongress / Passau <2000>

Vizine-Goetz, D.: NetLab / OCLC collaboration seeks to improve Web searching (1999) 0.02

0.015875868 = product of:
  0.0846713 = sum of:
    0.03694971 = weight(_text_:web in 4180) [ClassicSimilarity], result of:
      0.03694971 = score(doc=4180,freq=2.0), product of:
        0.10247572 = queryWeight, product of:
          3.2635105 = idf(docFreq=4597, maxDocs=44218)
          0.031400457 = queryNorm
        0.36057037 = fieldWeight in 4180, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          3.2635105 = idf(docFreq=4597, maxDocs=44218)
          0.078125 = fieldNorm(doc=4180)
    0.01597718 = product of:
      0.03195436 = sum of:
        0.03195436 = weight(_text_:online in 4180) [ClassicSimilarity], result of:
          0.03195436 = score(doc=4180,freq=2.0), product of:
            0.09529729 = queryWeight, product of:
              3.0349014 = idf(docFreq=5778, maxDocs=44218)
              0.031400457 = queryNorm
            0.33531237 = fieldWeight in 4180, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.0349014 = idf(docFreq=5778, maxDocs=44218)
              0.078125 = fieldNorm(doc=4180)
      0.5 = coord(1/2)
    0.031744417 = weight(_text_:retrieval in 4180) [ClassicSimilarity], result of:
      0.031744417 = score(doc=4180,freq=2.0), product of:
        0.09498371 = queryWeight, product of:
          3.024915 = idf(docFreq=5836, maxDocs=44218)
          0.031400457 = queryNorm
        0.33420905 = fieldWeight in 4180, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          3.024915 = idf(docFreq=5836, maxDocs=44218)
          0.078125 = fieldNorm(doc=4180)
  0.1875 = coord(3/16)

Theme: Klassifikationssysteme im Online-Retrieval

Yilmaz, T.; Ozcan, R.; Altingovde, I.S.; Ulusoy, Ö.: Improving educational web search for question-like queries through subject classification (2019) 0.02

0.015855024 = product of:
  0.063420095 = sum of:
    0.031999387 = weight(_text_:web in 5041) [ClassicSimilarity], result of:
      0.031999387 = score(doc=5041,freq=6.0), product of:
        0.10247572 = queryWeight, product of:
          3.2635105 = idf(docFreq=4597, maxDocs=44218)
          0.031400457 = queryNorm
        0.3122631 = fieldWeight in 5041, product of:
          2.4494898 = tf(freq=6.0), with freq of:
            6.0 = termFreq=6.0
          3.2635105 = idf(docFreq=4597, maxDocs=44218)
          0.0390625 = fieldNorm(doc=5041)
    0.00798859 = product of:
      0.01597718 = sum of:
        0.01597718 = weight(_text_:online in 5041) [ClassicSimilarity], result of:
          0.01597718 = score(doc=5041,freq=2.0), product of:
            0.09529729 = queryWeight, product of:
              3.0349014 = idf(docFreq=5778, maxDocs=44218)
              0.031400457 = queryNorm
            0.16765618 = fieldWeight in 5041, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.0349014 = idf(docFreq=5778, maxDocs=44218)
              0.0390625 = fieldNorm(doc=5041)
      0.5 = coord(1/2)
    0.007559912 = weight(_text_:information in 5041) [ClassicSimilarity], result of:
      0.007559912 = score(doc=5041,freq=4.0), product of:
        0.055122808 = queryWeight, product of:
          1.7554779 = idf(docFreq=20772, maxDocs=44218)
          0.031400457 = queryNorm
        0.13714671 = fieldWeight in 5041, product of:
          2.0 = tf(freq=4.0), with freq of:
            4.0 = termFreq=4.0
          1.7554779 = idf(docFreq=20772, maxDocs=44218)
          0.0390625 = fieldNorm(doc=5041)
    0.015872208 = weight(_text_:retrieval in 5041) [ClassicSimilarity], result of:
      0.015872208 = score(doc=5041,freq=2.0), product of:
        0.09498371 = queryWeight, product of:
          3.024915 = idf(docFreq=5836, maxDocs=44218)
          0.031400457 = queryNorm
        0.16710453 = fieldWeight in 5041, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          3.024915 = idf(docFreq=5836, maxDocs=44218)
          0.0390625 = fieldNorm(doc=5041)
  0.25 = coord(4/16)

Abstract: Students use general web search engines as their primary source of research while trying to find answers to school-related questions. Although search engines are highly relevant for the general population, they may return results that are out of educational context. Another rising trend; social community question answering websites are the second choice for students who try to get answers from other peers online. We attempt discovering possible improvements in educational search by leveraging both of these information sources. For this purpose, we first implement a classifier for educational questions. This classifier is built by an ensemble method that employs several regular learning algorithms and retrieval based approaches that utilize external resources. We also build a query expander to facilitate classification. We further improve the classification using search engine results and obtain 83.5% accuracy. Although our work is entirely based on the Turkish language, the features could easily be mapped to other languages as well. In order to find out whether search engine ranking can be improved in the education domain using the classification model, we collect and label a set of query results retrieved from a general web search engine. We propose five ad-hoc methods to improve search ranking based on the idea that the query-document category relation is an indicator of relevance. We evaluate these methods for overall performance, varying query length and based on factoid and non-factoid queries. We show that some of the methods significantly improve the rankings in the education domain.
Source: Information processing and management. 56(2019) no.1, S.228-246
