Search (278 results, page 2 of 14)

Chou, C.; Chu, T.: An analysis of BERT (NLP) for assisted subject indexing for Project Gutenberg (2022) 0.02

0.017550126 = product of:
  0.03510025 = sum of:
    0.03510025 = product of:
      0.052650377 = sum of:
        0.0072548515 = weight(_text_:e in 1139) [ClassicSimilarity], result of:
          0.0072548515 = score(doc=1139,freq=2.0), product of:
            0.06526148 = queryWeight, product of:
              1.43737 = idf(docFreq=28552, maxDocs=44218)
              0.0454034 = queryNorm
            0.1111659 = fieldWeight in 1139, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              1.43737 = idf(docFreq=28552, maxDocs=44218)
              0.0546875 = fieldNorm(doc=1139)
        0.045395527 = weight(_text_:p in 1139) [ClassicSimilarity], result of:
          0.045395527 = score(doc=1139,freq=2.0), product of:
            0.1632485 = queryWeight, product of:
              3.5955126 = idf(docFreq=3298, maxDocs=44218)
              0.0454034 = queryNorm
            0.27807623 = fieldWeight in 1139, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5955126 = idf(docFreq=3298, maxDocs=44218)
              0.0546875 = fieldNorm(doc=1139)
      0.6666667 = coord(2/3)
  0.5 = coord(1/2)

Language: e
Source: Cataloging and classification quarterly. 60(2022) no.8, p.807-835

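The score trees in this listing can be reproduced by hand. The sketch below recomputes the first hit (doc 1139) from the leaf values printed in its tree. It assumes the standard Lucene ClassicSimilarity formulas, tf = sqrt(freq) and idf = 1 + ln(maxDocs / (docFreq + 1)), which agree with the printed numbers. The function name `classic_term_score` and the constant names are illustrative only and are not part of any library. Lucene computes in 32-bit floats, so the results should match the listing only up to rounding.

```python
import math


def classic_term_score(freq, doc_freq, max_docs, query_norm, field_norm):
    """Recompute one weight(...) node of a ClassicSimilarity explain tree.

    Hypothetical helper for illustration; all inputs are copied from the tree.
    """
    tf = math.sqrt(freq)                              # tf(freq)
    idf = 1.0 + math.log(max_docs / (doc_freq + 1))   # idf(docFreq, maxDocs)
    query_weight = idf * query_norm                   # queryWeight
    field_weight = tf * idf * field_norm              # fieldWeight
    return query_weight * field_weight                # score(doc, freq)


# Values shared by every hit in this result list (taken from the listing).
MAX_DOCS = 44218
QUERY_NORM = 0.0454034

# First hit, doc 1139: the terms "e" and "p" both occur with freq=2.0.
score_e = classic_term_score(2.0, 28552, MAX_DOCS, QUERY_NORM, 0.0546875)
score_p = classic_term_score(2.0, 3298, MAX_DOCS, QUERY_NORM, 0.0546875)

# coord(2/3): two of three inner clauses matched; coord(1/2): outer clause.
total = (score_e + score_p) * (2 / 3) * (1 / 2)

print(f"weight(e) = {score_e:.9f}")
print(f"weight(p) = {score_p:.9f}")
print(f"total     = {total:.9f}")  # compare with the 0.017550126 shown above
```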
Kanan, T.; Fox, E.A.: Automated Arabic text classification with P-Stemmer, machine learning, and a tailored news article taxonomy (2016) 0.02
```
0.017012816 = product of:
  0.03402563 = sum of:
    0.03402563 = product of:
      0.051038444 = sum of:
        0.005182037 = weight(_text_:e in 3151) [ClassicSimilarity], result of:
          0.005182037 = score(doc=3151,freq=2.0), product of:
            0.06526148 = queryWeight, product of:
              1.43737 = idf(docFreq=28552, maxDocs=44218)
              0.0454034 = queryNorm
            0.07940422 = fieldWeight in 3151, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              1.43737 = idf(docFreq=28552, maxDocs=44218)
              0.0390625 = fieldNorm(doc=3151)
        0.04585641 = weight(_text_:p in 3151) [ClassicSimilarity], result of:
          0.04585641 = score(doc=3151,freq=4.0), product of:
            0.1632485 = queryWeight, product of:
              3.5955126 = idf(docFreq=3298, maxDocs=44218)
              0.0454034 = queryNorm
            0.28089944 = fieldWeight in 3151, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              3.5955126 = idf(docFreq=3298, maxDocs=44218)
              0.0390625 = fieldNorm(doc=3151)
      0.6666667 = coord(2/3)
  0.5 = coord(1/2)
```
Abstract

Arabic news articles in electronic collections are difficult to study. Browsing by category is rarely supported. Although helpful machine-learning methods have been applied successfully to similar situations for English news articles, limited research has been completed to yield suitable solutions for Arabic news. In connection with a Qatar National Research Fund (QNRF)-funded project to build digital library community and infrastructure in Qatar, we developed software for browsing a collection of about 237,000 Arabic news articles, which should be applicable to other Arabic news collections. We designed a simple taxonomy for Arabic news stories that is suitable for the needs of Qatar and other nations, is compatible with the subject codes of the International Press Telecommunications Council, and was enhanced with the aid of a librarian expert as well as five Arabic-speaking volunteers. We developed tailored stemming (i.e., a new Arabic light stemmer called P-Stemmer) and automatic classification methods (the best being binary Support Vector Machines classifiers) to work with the taxonomy. Using evaluation techniques commonly used in the information retrieval community, including 10-fold cross-validation and the Wilcoxon signed-rank test, we showed that our approach to stemming and classification is superior to state-of-the-art techniques.

Language: e

Hodges, P.R.: Keyword in title indexes : effectiveness of retrieval in computer searches (1983) 0.02

0.016771864 = product of:
  0.03354373 = sum of:
    0.03354373 = product of:
      0.05031559 = sum of:
        0.0072548515 = weight(_text_:e in 5001) [ClassicSimilarity], result of:
          0.0072548515 = score(doc=5001,freq=2.0), product of:
            0.06526148 = queryWeight, product of:
              1.43737 = idf(docFreq=28552, maxDocs=44218)
              0.0454034 = queryNorm
            0.1111659 = fieldWeight in 5001, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              1.43737 = idf(docFreq=28552, maxDocs=44218)
              0.0546875 = fieldNorm(doc=5001)
        0.04306074 = weight(_text_:22 in 5001) [ClassicSimilarity], result of:
          0.04306074 = score(doc=5001,freq=2.0), product of:
            0.15899497 = queryWeight, product of:
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.0454034 = queryNorm
            0.2708308 = fieldWeight in 5001, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.0546875 = fieldNorm(doc=5001)
      0.6666667 = coord(2/3)
  0.5 = coord(1/2)

Date: 14. 3.1996 13:22:21
Language: e

Bordoni, L.; Pazienza, M.T.: Documents automatic indexing in an environmental domain (1997) 0.02

0.016771864 = product of:
  0.03354373 = sum of:
    0.03354373 = product of:
      0.05031559 = sum of:
        0.0072548515 = weight(_text_:e in 530) [ClassicSimilarity], result of:
          0.0072548515 = score(doc=530,freq=2.0), product of:
            0.06526148 = queryWeight, product of:
              1.43737 = idf(docFreq=28552, maxDocs=44218)
              0.0454034 = queryNorm
            0.1111659 = fieldWeight in 530, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              1.43737 = idf(docFreq=28552, maxDocs=44218)
              0.0546875 = fieldNorm(doc=530)
        0.04306074 = weight(_text_:22 in 530) [ClassicSimilarity], result of:
          0.04306074 = score(doc=530,freq=2.0), product of:
            0.15899497 = queryWeight, product of:
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.0454034 = queryNorm
            0.2708308 = fieldWeight in 530, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.0546875 = fieldNorm(doc=530)
      0.6666667 = coord(2/3)
  0.5 = coord(1/2)

Language: e
Source: International forum on information and documentation. 22(1997) no.1, p.17-28

Wolfekuhler, M.R.; Punch, W.F.: Finding salient features for personal Web pages categories (1997) 0.02

0.016771864 = product of:
  0.03354373 = sum of:
    0.03354373 = product of:
      0.05031559 = sum of:
        0.0072548515 = weight(_text_:e in 2673) [ClassicSimilarity], result of:
          0.0072548515 = score(doc=2673,freq=2.0), product of:
            0.06526148 = queryWeight, product of:
              1.43737 = idf(docFreq=28552, maxDocs=44218)
              0.0454034 = queryNorm
            0.1111659 = fieldWeight in 2673, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              1.43737 = idf(docFreq=28552, maxDocs=44218)
              0.0546875 = fieldNorm(doc=2673)
        0.04306074 = weight(_text_:22 in 2673) [ClassicSimilarity], result of:
          0.04306074 = score(doc=2673,freq=2.0), product of:
            0.15899497 = queryWeight, product of:
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.0454034 = queryNorm
            0.2708308 = fieldWeight in 2673, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.0546875 = fieldNorm(doc=2673)
      0.6666667 = coord(2/3)
  0.5 = coord(1/2)

Date: 1. 8.1996 22:08:06
Language: e

Newman, D.J.; Block, S.: Probabilistic topic decomposition of an eighteenth-century American newspaper (2006) 0.02

0.016771864 = product of:
  0.03354373 = sum of:
    0.03354373 = product of:
      0.05031559 = sum of:
        0.0072548515 = weight(_text_:e in 5291) [ClassicSimilarity], result of:
          0.0072548515 = score(doc=5291,freq=2.0), product of:
            0.06526148 = queryWeight, product of:
              1.43737 = idf(docFreq=28552, maxDocs=44218)
              0.0454034 = queryNorm
            0.1111659 = fieldWeight in 5291, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              1.43737 = idf(docFreq=28552, maxDocs=44218)
              0.0546875 = fieldNorm(doc=5291)
        0.04306074 = weight(_text_:22 in 5291) [ClassicSimilarity], result of:
          0.04306074 = score(doc=5291,freq=2.0), product of:
            0.15899497 = queryWeight, product of:
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.0454034 = queryNorm
            0.2708308 = fieldWeight in 5291, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.0546875 = fieldNorm(doc=5291)
      0.6666667 = coord(2/3)
  0.5 = coord(1/2)

Date: 22. 7.2006 17:32:00
Language: e

Cui, H.; Boufford, D.; Selden, P.: Semantic annotation of biosystematics literature without training examples (2010) 0.02
```
0.015901554 = product of:
  0.03180311 = sum of:
    0.03180311 = product of:
      0.04770466 = sum of:
        0.008794209 = weight(_text_:e in 3422) [ClassicSimilarity], result of:
          0.008794209 = score(doc=3422,freq=4.0), product of:
            0.06526148 = queryWeight, product of:
              1.43737 = idf(docFreq=28552, maxDocs=44218)
              0.0454034 = queryNorm
            0.13475344 = fieldWeight in 3422, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              1.43737 = idf(docFreq=28552, maxDocs=44218)
              0.046875 = fieldNorm(doc=3422)
        0.038910452 = weight(_text_:p in 3422) [ClassicSimilarity], result of:
          0.038910452 = score(doc=3422,freq=2.0), product of:
            0.1632485 = queryWeight, product of:
              3.5955126 = idf(docFreq=3298, maxDocs=44218)
              0.0454034 = queryNorm
            0.23835106 = fieldWeight in 3422, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5955126 = idf(docFreq=3298, maxDocs=44218)
              0.046875 = fieldNorm(doc=3422)
      0.6666667 = coord(2/3)
  0.5 = coord(1/2)
```
Abstract

This article presents an unsupervised algorithm for semantic annotation of morphological descriptions of whole organisms. The algorithm is able to annotate plain text descriptions with high accuracy at the clause level by exploiting the corpus itself. In other words, the algorithm does not need lexicons, syntactic parsers, training examples, or annotation templates. The evaluation on two real-life description collections in botany and paleontology shows that the algorithm has the following desirable features: (a) reduces/eliminates manual labor required to compile dictionaries and prepare source documents; (b) improves annotation coverage: the algorithm annotates what appears in documents and is not limited by predefined and often incomplete templates; (c) learns clean and reusable concepts: the algorithm learns organ names and character states that can be used to construct reusable domain lexicons, as opposed to collection-dependent patterns whose applicability is often limited to a particular collection; (d) insensitive to collection size; and (e) runs in linear time with respect to the number of clauses to be annotated.

Language: e

Oberhauser, O.; Labner, J.: Einführung der automatischen Indexierung im Österreichischen Verbundkatalog? : Bericht über eine empirische Studie (2003) 0.02

0.015131842 = product of:
  0.030263685 = sum of:
    0.030263685 = product of:
      0.090791054 = sum of:
        0.090791054 = weight(_text_:p in 1878) [ClassicSimilarity], result of:
          0.090791054 = score(doc=1878,freq=2.0), product of:
            0.1632485 = queryWeight, product of:
              3.5955126 = idf(docFreq=3298, maxDocs=44218)
              0.0454034 = queryNorm
            0.55615246 = fieldWeight in 1878, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5955126 = idf(docFreq=3298, maxDocs=44218)
              0.109375 = fieldNorm(doc=1878)
      0.33333334 = coord(1/3)
  0.5 = coord(1/2)

Type: p

Munkelt, J.; Schaer, P.; Lepsky, K.: Towards an IR test collection for the German National Library (2018) 0.02

0.015042966 = product of:
  0.030085932 = sum of:
    0.030085932 = product of:
      0.045128897 = sum of:
        0.006218444 = weight(_text_:e in 4311) [ClassicSimilarity], result of:
          0.006218444 = score(doc=4311,freq=2.0), product of:
            0.06526148 = queryWeight, product of:
              1.43737 = idf(docFreq=28552, maxDocs=44218)
              0.0454034 = queryNorm
            0.09528506 = fieldWeight in 4311, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              1.43737 = idf(docFreq=28552, maxDocs=44218)
              0.046875 = fieldNorm(doc=4311)
        0.038910452 = weight(_text_:p in 4311) [ClassicSimilarity], result of:
          0.038910452 = score(doc=4311,freq=2.0), product of:
            0.1632485 = queryWeight, product of:
              3.5955126 = idf(docFreq=3298, maxDocs=44218)
              0.0454034 = queryNorm
            0.23835106 = fieldWeight in 4311, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5955126 = idf(docFreq=3298, maxDocs=44218)
              0.046875 = fieldNorm(doc=4311)
      0.6666667 = coord(2/3)
  0.5 = coord(1/2)

Language: e

Matthews, P.; Glitre, K.: Genre analysis of movies using a topic model of plot summaries (2021) 0.02

0.015042966 = product of:
  0.030085932 = sum of:
    0.030085932 = product of:
      0.045128897 = sum of:
        0.006218444 = weight(_text_:e in 412) [ClassicSimilarity], result of:
          0.006218444 = score(doc=412,freq=2.0), product of:
            0.06526148 = queryWeight, product of:
              1.43737 = idf(docFreq=28552, maxDocs=44218)
              0.0454034 = queryNorm
            0.09528506 = fieldWeight in 412, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              1.43737 = idf(docFreq=28552, maxDocs=44218)
              0.046875 = fieldNorm(doc=412)
        0.038910452 = weight(_text_:p in 412) [ClassicSimilarity], result of:
          0.038910452 = score(doc=412,freq=2.0), product of:
            0.1632485 = queryWeight, product of:
              3.5955126 = idf(docFreq=3298, maxDocs=44218)
              0.0454034 = queryNorm
            0.23835106 = fieldWeight in 412, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5955126 = idf(docFreq=3298, maxDocs=44218)
              0.046875 = fieldNorm(doc=412)
      0.6666667 = coord(2/3)
  0.5 = coord(1/2)

Language: e

Lowe, D.B.; Dollinger, I.; Koster, T.; Herbert, B.E.: Text mining for type of research classification (2021) 0.02

0.015042966 = product of:
  0.030085932 = sum of:
    0.030085932 = product of:
      0.045128897 = sum of:
        0.006218444 = weight(_text_:e in 720) [ClassicSimilarity], result of:
          0.006218444 = score(doc=720,freq=2.0), product of:
            0.06526148 = queryWeight, product of:
              1.43737 = idf(docFreq=28552, maxDocs=44218)
              0.0454034 = queryNorm
            0.09528506 = fieldWeight in 720, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              1.43737 = idf(docFreq=28552, maxDocs=44218)
              0.046875 = fieldNorm(doc=720)
        0.038910452 = weight(_text_:p in 720) [ClassicSimilarity], result of:
          0.038910452 = score(doc=720,freq=2.0), product of:
            0.1632485 = queryWeight, product of:
              3.5955126 = idf(docFreq=3298, maxDocs=44218)
              0.0454034 = queryNorm
            0.23835106 = fieldWeight in 720, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5955126 = idf(docFreq=3298, maxDocs=44218)
              0.046875 = fieldNorm(doc=720)
      0.6666667 = coord(2/3)
  0.5 = coord(1/2)

Language: e
Source: Cataloging and classification quarterly. 59(2021) no.8, p.815-834

Asula, M.; Makke, J.; Freienthal, L.; Kuulmets, H.-A.; Sirel, R.: Kratt: developing an automatic subject indexing tool for the National Library of Estonia : how to transfer metadata information among work cluster members (2021) 0.02

0.015042966 = product of:
  0.030085932 = sum of:
    0.030085932 = product of:
      0.045128897 = sum of:
        0.006218444 = weight(_text_:e in 723) [ClassicSimilarity], result of:
          0.006218444 = score(doc=723,freq=2.0), product of:
            0.06526148 = queryWeight, product of:
              1.43737 = idf(docFreq=28552, maxDocs=44218)
              0.0454034 = queryNorm
            0.09528506 = fieldWeight in 723, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              1.43737 = idf(docFreq=28552, maxDocs=44218)
              0.046875 = fieldNorm(doc=723)
        0.038910452 = weight(_text_:p in 723) [ClassicSimilarity], result of:
          0.038910452 = score(doc=723,freq=2.0), product of:
            0.1632485 = queryWeight, product of:
              3.5955126 = idf(docFreq=3298, maxDocs=44218)
              0.0454034 = queryNorm
            0.23835106 = fieldWeight in 723, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5955126 = idf(docFreq=3298, maxDocs=44218)
              0.046875 = fieldNorm(doc=723)
      0.6666667 = coord(2/3)
  0.5 = coord(1/2)

Language: e
Source: Cataloging and classification quarterly. 59(2021) no.8, p.775-793

Ward, M.L.: The future of the human indexer (1996) 0.01

0.014375883 = product of:
  0.028751766 = sum of:
    0.028751766 = product of:
      0.04312765 = sum of:
        0.006218444 = weight(_text_:e in 7244) [ClassicSimilarity], result of:
          0.006218444 = score(doc=7244,freq=2.0), product of:
            0.06526148 = queryWeight, product of:
              1.43737 = idf(docFreq=28552, maxDocs=44218)
              0.0454034 = queryNorm
            0.09528506 = fieldWeight in 7244, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              1.43737 = idf(docFreq=28552, maxDocs=44218)
              0.046875 = fieldNorm(doc=7244)
        0.036909204 = weight(_text_:22 in 7244) [ClassicSimilarity], result of:
          0.036909204 = score(doc=7244,freq=2.0), product of:
            0.15899497 = queryWeight, product of:
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.0454034 = queryNorm
            0.23214069 = fieldWeight in 7244, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.046875 = fieldNorm(doc=7244)
      0.6666667 = coord(2/3)
  0.5 = coord(1/2)

Date: 9. 2.1997 18:44:22
Language: e

Fuhr, N.; Niewelt, B.: Ein Retrievaltest mit automatisch indexierten Dokumenten (1984) 0.01

0.01435358 = product of:
  0.02870716 = sum of:
    0.02870716 = product of:
      0.08612148 = sum of:
        0.08612148 = weight(_text_:22 in 262) [ClassicSimilarity], result of:
          0.08612148 = score(doc=262,freq=2.0), product of:
            0.15899497 = queryWeight, product of:
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.0454034 = queryNorm
            0.5416616 = fieldWeight in 262, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.109375 = fieldNorm(doc=262)
      0.33333334 = coord(1/3)
  0.5 = coord(1/2)

Date: 20.10.2000 12:22:23

Strobel, S.; Marín-Arraiza, P.: Metadata for scientific audiovisual media : current practices and perspectives of the TIB / AV-portal (2015) 0.01

0.013251295 = product of:
  0.02650259 = sum of:
    0.02650259 = product of:
      0.039753884 = sum of:
        0.007328507 = weight(_text_:e in 3667) [ClassicSimilarity], result of:
          0.007328507 = score(doc=3667,freq=4.0), product of:
            0.06526148 = queryWeight, product of:
              1.43737 = idf(docFreq=28552, maxDocs=44218)
              0.0454034 = queryNorm
            0.112294525 = fieldWeight in 3667, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              1.43737 = idf(docFreq=28552, maxDocs=44218)
              0.0390625 = fieldNorm(doc=3667)
        0.032425378 = weight(_text_:p in 3667) [ClassicSimilarity], result of:
          0.032425378 = score(doc=3667,freq=2.0), product of:
            0.1632485 = queryWeight, product of:
              3.5955126 = idf(docFreq=3298, maxDocs=44218)
              0.0454034 = queryNorm
            0.19862589 = fieldWeight in 3667, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5955126 = idf(docFreq=3298, maxDocs=44218)
              0.0390625 = fieldNorm(doc=3667)
      0.6666667 = coord(2/3)
  0.5 = coord(1/2)

Language: e
Source: Metadata and semantics research: 9th Research Conference, MTSR 2015, Manchester, UK, September 9-11, 2015, Proceedings. Eds.: E. Garoufallou et al

Schäuble, P.: Kostengünstige Konversion großer Bibliothekskataloge (1996) 0.01

0.012970151 = product of:
  0.025940303 = sum of:
    0.025940303 = product of:
      0.077820905 = sum of:
        0.077820905 = weight(_text_:p in 4232) [ClassicSimilarity], result of:
          0.077820905 = score(doc=4232,freq=2.0), product of:
            0.1632485 = queryWeight, product of:
              3.5955126 = idf(docFreq=3298, maxDocs=44218)
              0.0454034 = queryNorm
            0.47670212 = fieldWeight in 4232, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5955126 = idf(docFreq=3298, maxDocs=44218)
              0.09375 = fieldNorm(doc=4232)
      0.33333334 = coord(1/3)
  0.5 = coord(1/2)

Rasmussen, E.M.: Indexing and retrieval for the Web (2002) 0.01
```
0.012794125 = product of:
  0.02558825 = sum of:
    0.02558825 = product of:
      0.038382374 = sum of:
        0.0062828865 = weight(_text_:e in 4285) [ClassicSimilarity], result of:
          0.0062828865 = score(doc=4285,freq=6.0), product of:
            0.06526148 = queryWeight, product of:
              1.43737 = idf(docFreq=28552, maxDocs=44218)
              0.0454034 = queryNorm
            0.096272506 = fieldWeight in 4285, product of:
              2.4494898 = tf(freq=6.0), with freq of:
                6.0 = termFreq=6.0
              1.43737 = idf(docFreq=28552, maxDocs=44218)
              0.02734375 = fieldNorm(doc=4285)
        0.032099485 = weight(_text_:p in 4285) [ClassicSimilarity], result of:
          0.032099485 = score(doc=4285,freq=4.0), product of:
            0.1632485 = queryWeight, product of:
              3.5955126 = idf(docFreq=3298, maxDocs=44218)
              0.0454034 = queryNorm
            0.1966296 = fieldWeight in 4285, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              3.5955126 = idf(docFreq=3298, maxDocs=44218)
              0.02734375 = fieldNorm(doc=4285)
      0.6666667 = coord(2/3)
  0.5 = coord(1/2)
```
Abstract

The introduction and growth of the World Wide Web (WWW, or Web) have resulted in a profound change in the way individuals and organizations access information. In terms of volume, nature, and accessibility, the characteristics of electronic information are significantly different from those of even five or six years ago. Control of, and access to, this flood of information rely heavily on automated techniques for indexing and retrieval. According to Gudivada, Raghavan, Grosky, and Kasanagottu (1997, p. 58), "The ability to search and retrieve information from the Web efficiently and effectively is an enabling technology for realizing its full potential." Almost 93 percent of those surveyed consider the Web an "indispensable" Internet technology, second only to e-mail (Graphic, Visualization & Usability Center, 1998). Although there are other ways of locating information on the Web (browsing or following directory structures), 85 percent of users identify Web pages by means of a search engine (Graphic, Visualization & Usability Center, 1998). A more recent study conducted by the Stanford Institute for the Quantitative Study of Society confirms the finding that searching for information is second only to e-mail as an Internet activity (Nie & Ebring, 2000, online). In fact, Nie and Ebring conclude, "... the Internet today is a giant public library with a decidedly commercial tilt. The most widespread use of the Internet today is as an information search utility for products, travel, hobbies, and general information. Virtually all users interviewed responded that they engaged in one or more of these information gathering activities."
Techniques for automated indexing and information retrieval (IR) have been developed, tested, and refined over the past 40 years, and are well documented (see, for example, Agosti & Smeaton, 1996; Baeza-Yates & Ribeiro-Neto, 1999a; Frakes & Baeza-Yates, 1992; Korfhage, 1997; Salton, 1989; Witten, Moffat, & Bell, 1999). With the introduction of the Web, and the capability to index and retrieve via search engines, these techniques have been extended to a new environment. They have been adopted, altered, and in some cases extended to include new methods. "In short, search engines are indispensable for searching the Web, they employ a variety of relatively advanced IR techniques, and there are some peculiar aspects of search engines that make searching the Web different than more conventional information retrieval" (Gordon & Pathak, 1999, p. 145). The environment for information retrieval on the World Wide Web differs from that of "conventional" information retrieval in a number of fundamental ways. The collection is very large and changes continuously, with pages being added, deleted, and altered. Wide variability between the size, structure, focus, quality, and usefulness of documents makes Web documents much more heterogeneous than a typical electronic document collection. The wide variety of document types includes images, video, audio, and scripts, as well as many different document languages. Duplication of documents and sites is common. Documents are interconnected through networks of hyperlinks. Because of the size and dynamic nature of the Web, preprocessing all documents requires considerable resources and is often not feasible, certainly not on the frequent basis required to ensure currency. Query length is usually much shorter than in other environments (only a few words), and user behavior differs from that in other environments. These differences make the Web a novel environment for information retrieval (Baeza-Yates & Ribeiro-Neto, 1999b; Bharat & Henzinger, 1998; Huang, 2000).

Language: e

Ahlgren, P.; Kekäläinen, J.: Indexing strategies for Swedish full text retrieval under different user scenarios (2007) 0.01

0.012535806 = product of:
  0.025071612 = sum of:
    0.025071612 = product of:
      0.037607417 = sum of:
        0.005182037 = weight(_text_:e in 896) [ClassicSimilarity], result of:
          0.005182037 = score(doc=896,freq=2.0), product of:
            0.06526148 = queryWeight, product of:
              1.43737 = idf(docFreq=28552, maxDocs=44218)
              0.0454034 = queryNorm
            0.07940422 = fieldWeight in 896, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              1.43737 = idf(docFreq=28552, maxDocs=44218)
              0.0390625 = fieldNorm(doc=896)
        0.032425378 = weight(_text_:p in 896) [ClassicSimilarity], result of:
          0.032425378 = score(doc=896,freq=2.0), product of:
            0.1632485 = queryWeight, product of:
              3.5955126 = idf(docFreq=3298, maxDocs=44218)
              0.0454034 = queryNorm
            0.19862589 = fieldWeight in 896, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5955126 = idf(docFreq=3298, maxDocs=44218)
              0.0390625 = fieldNorm(doc=896)
      0.6666667 = coord(2/3)
  0.5 = coord(1/2)

Language: e

Humphrey, S.M.; Névéol, A.; Browne, A.; Gobeil, J.; Ruch, P.; Darmoni, S.J.: Comparing a rule-based versus statistical system for automatic categorization of MEDLINE documents according to biomedical specialty (2009) 0.01

0.012535806 = product of:
  0.025071612 = sum of:
    0.025071612 = product of:
      0.037607417 = sum of:
        0.005182037 = weight(_text_:e in 3300) [ClassicSimilarity], result of:
          0.005182037 = score(doc=3300,freq=2.0), product of:
            0.06526148 = queryWeight, product of:
              1.43737 = idf(docFreq=28552, maxDocs=44218)
              0.0454034 = queryNorm
            0.07940422 = fieldWeight in 3300, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              1.43737 = idf(docFreq=28552, maxDocs=44218)
              0.0390625 = fieldNorm(doc=3300)
        0.032425378 = weight(_text_:p in 3300) [ClassicSimilarity], result of:
          0.032425378 = score(doc=3300,freq=2.0), product of:
            0.1632485 = queryWeight, product of:
              3.5955126 = idf(docFreq=3298, maxDocs=44218)
              0.0454034 = queryNorm
            0.19862589 = fieldWeight in 3300, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5955126 = idf(docFreq=3298, maxDocs=44218)
              0.0390625 = fieldNorm(doc=3300)
      0.6666667 = coord(2/3)
  0.5 = coord(1/2)

Language: e

Fuhr, N.: Ranking-Experimente mit gewichteter Indexierung (1986) 0.01

0.012303068 = product of:
  0.024606137 = sum of:
    0.024606137 = product of:
      0.07381841 = sum of:
        0.07381841 = weight(_text_:22 in 58) [ClassicSimilarity], result of:
          0.07381841 = score(doc=58,freq=2.0), product of:
            0.15899497 = queryWeight, product of:
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.0454034 = queryNorm
            0.46428138 = fieldWeight in 58, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.09375 = fieldNorm(doc=58)
      0.33333334 = coord(1/3)
  0.5 = coord(1/2)

Date: 14. 6.2015 22:12:44
