Search (134 results, page 1 of 7)

  • theme_ss:"Computerlinguistik"
  1. Hotho, A.; Bloehdorn, S.: Data Mining 2004 : Text classification by boosting weak learners based on terms and concepts (2004) 0.55
    
    Content
     Vgl.: http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.91.4940&rep=rep1&type=pdf.
    Date
    8. 1.2013 10:22:32
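     The relevance figure shown beside each hit is a Lucene ClassicSimilarity (tf-idf) score. As a hedged sketch, a single term's contribution to such a score can be reconstructed as queryWeight × fieldWeight; the `freq`, `idf`, and norm inputs below are representative values for an index of this kind, not documented constants, and the function name is ours:

     ```python
     from math import sqrt

     def classic_term_score(freq, idf, query_norm, field_norm):
         """One term's contribution under Lucene ClassicSimilarity:
         score       = queryWeight * fieldWeight
         queryWeight = idf * queryNorm
         fieldWeight = tf * idf * fieldNorm, with tf = sqrt(freq)."""
         query_weight = idf * query_norm
         field_weight = sqrt(freq) * idf * field_norm
         return query_weight * field_weight

     # Representative inputs: a term occurring twice (freq=2.0) in a short field
     score = classic_term_score(freq=2.0, idf=8.478011,
                                query_norm=0.025786186, field_norm=0.046875)
     # score is on the order of 0.12, the scale of one term's weight;
     # the headline score sums such weights and applies a coordination factor.
     ```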
  2. Huo, W.: Automatic multi-word term extraction and its application to Web-page summarization (2012) 0.50
    
    Abstract
    In this thesis we propose three new word association measures for multi-word term extraction. We combine these association measures with LocalMaxs algorithm in our extraction model and compare the results of different multi-word term extraction methods. Our approach is language and domain independent and requires no training data. It can be applied to such tasks as text summarization, information retrieval, and document classification. We further explore the potential of using multi-word terms as an effective representation for general web-page summarization. We extract multi-word terms from human written summaries in a large collection of web-pages, and generate the summaries by aligning document words with these multi-word terms. Our system applies machine translation technology to learn the aligning process from a training set and focuses on selecting high quality multi-word terms from human written summaries to generate suitable results for web-page summarization.
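     The pipeline the abstract outlines (an association measure combined with the LocalMaxs algorithm) can be sketched generically. This is a hedged illustration only: it uses the well-known SCP "glue" measure rather than the thesis's three new measures, a simplified neighbour test, a toy corpus, and function names of our own choosing:

     ```python
     from collections import Counter

     def ngrams(tokens, n):
         return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

     def scp_glue(gram, count, total):
         # Symmetric Conditional Probability "glue": p(gram)^2 divided by the
         # average over all split points of p(left part) * p(right part).
         p = lambda g: count[g] / total
         splits = [p(gram[:i]) * p(gram[i:]) for i in range(1, len(gram))]
         return p(gram) ** 2 / (sum(splits) / len(splits))

     def local_maxs_terms(tokens, max_n=3, min_freq=2):
         # Simplified LocalMaxs: keep an n-gram whose glue is a local maximum
         # relative to its contained (n-1)-grams and containing (n+1)-grams.
         total = len(tokens)
         count = Counter()
         for n in range(1, max_n + 2):      # one extra order for the neighbour test
             count.update(ngrams(tokens, n))
         glue = {g: scp_glue(g, count, total) for g in count if len(g) >= 2}
         terms = []
         for g, gl in glue.items():
             if len(g) > max_n or count[g] < min_freq:
                 continue
             sups = [gl2 for g2, gl2 in glue.items()
                     if len(g2) == len(g) + 1 and (g2[:-1] == g or g2[1:] == g)]
             if len(g) == 2:                # bigrams have no multi-word sub-grams
                 if all(gl > s for s in sups):
                     terms.append(g)
             else:
                 subs = [glue[g[:-1]], glue[g[1:]]]
                 if gl > (max(subs) + max(sups, default=0.0)) / 2:
                     terms.append(g)
         return terms

     tokens = "new york city has new york style pizza".split()
     print(local_maxs_terms(tokens))  # → [('new', 'york')]
     ```

     The repeated bigram "new york" has higher glue than any trigram containing it, so it survives the local-maximum test; the one-off bigrams are filtered by `min_freq`.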
    Content
     A thesis presented to the University of Guelph in partial fulfilment of the requirements for the degree of Master of Science in Computer Science. Vgl. unter: http://www.inf.ufrgs.br/~ceramisch/download_files/publications/2009/p01.pdf.
    Date
    10. 1.2013 19:22:47
  3. Noever, D.; Ciolino, M.: ¬The Turing deception (2022) 0.38
    
    Source
     https://arxiv.org/abs/2212.06721
  4. Information und Sprache : Beiträge zu Informationswissenschaft, Computerlinguistik, Bibliothekswesen und verwandten Fächern. Festschrift für Harald H. Zimmermann. Herausgegeben von Ilse Harms, Heinz-Dirk Luckhardt und Hans W. Giessen (2006) 0.05
    
    Content
    Inhalt: Information und Sprache und mehr - eine Einleitung - Information und Kommunikation Wolf Rauch: Auch Information ist eine Tochter der Zeit Winfried Lenders: Information und kulturelles Gedächtnis Rainer Hammwöhner: Anmerkungen zur Grundlegung der Informationsethik Hans W. Giessen: Ehrwürdig stille Informationen Gernot Wersig: Vereinheitlichte Medientheorie und ihre Sicht auf das Internet Johann Haller, Anja Rütten: Informationswissenschaft und Translationswissenschaft: Spielarten oder Schwestern? Rainer Kuhlen: In Richtung Summarizing für Diskurse in K3 Werner Schweibenz: Sprache, Information und Bedeutung im Museum. Narrative Vermittlung durch Storytelling - Sprache und Computer, insbesondere Information Retrieval und Automatische Indexierung Manfred Thiel: Bedingt wahrscheinliche Syntaxbäume Jürgen Krause: Shell Model, Semantic Web and Web Information Retrieval Elisabeth Niggemann: Wer suchet, der findet? Verbesserung der inhaltlichen Suchmöglichkeiten im Informationssystem Der Deutschen Bibliothek Christa Womser-Hacker: Zur Rolle von Eigennamen im Cross-Language Information Retrieval Klaus-Dirk Schmitz: Wörterbuch, Thesaurus, Terminologie, Ontologie. Was tragen Terminologiewissenschaft und Informationswissenschaft zur Wissensordnung bei?
    Jiri Panyr: Thesauri, Semantische Netze, Frames, Topic Maps, Taxonomien, Ontologien - begriffliche Verwirrung oder konzeptionelle Vielfalt? Heinz-Dieter Maas: Indexieren mit AUTINDEX Wilhelm Gaus, Rainer Kaluscha: Maschinelle inhaltliche Erschließung von Arztbriefen und Auswertung von Reha-Entlassungsberichten Klaus Lepsky: Automatische Indexierung des Reallexikons zur Deutschen Kunstgeschichte - Analysen und Entwicklungen Ilse Harms: Die computervermittelte Kommunikation als ein Instrument des Wissensmanagements in Organisationen August- Wilhelm Scheer, Dirk Werth: Geschäftsregel-basiertes Geschäftsprozessmanagement Thomas Seeger: Akkreditierung und Evaluierung von Hochschullehre und -forschung in Großbritannien. Hinweise für die Situation in Deutschland Bernd Hagenau: Gehabte Sorgen hab' ich gern? Ein Blick zurück auf die Deutschen Bibliothekartage 1975 bis 1980 - Persönliches Jorgo Chatzimarkakis: Sprache und Information in Europa Alfred Gulden: 7 Briefe und eine Anmerkung Günter Scholdt: Der Weg nach Europa im Spiegel von Mundartgedichten Alfred Guldens Wolfgang Müller: Prof. Dr. Harald H. Zimmermann - Seit 45 Jahren der Universität des Saarlandes verbunden Heinz-Dirk Luckhardt: Computerlinguistik und Informationswissenschaft: Facetten des wissenschaftlichen Wirkens von Harald H. Zimmermann Schriftenverzeichnis Harald H. Zimmermanns 1967-2005 - Projekte in Verantwortung von Harald H. Zimmermann - Adressen der Beiträgerinnen und Beiträger
    Footnote
     In "Thesauri, Semantische Netze, Frames, Topic Maps, Taxonomien, Ontologien - begriffliche Verwirrung oder konzeptionelle Vielfalt?" (pp. 139-151), Jiri Panyr (Munich/Saarbrücken) gives a readable and useful overview of the semantic representation forms named in the title, which are repeatedly invoked - often imprecisely or even incorrectly - in connection with the Internet and especially the proposed Semantic Web. His remarks on the fashionable term "ontology" in particular show that it must not be used casually as a quasi-synonym for thesaurus or classification. Panyr's contribution is, incidentally, thematically related to that of K.-D. Schmitz (Cologne), "Wörterbuch, Thesaurus, Terminologie, Ontologie" (pp. 129-137). Apart from its unimaginative title "Wer suchet, der findet?" (pp. 107-118) - fortunately supplied with the subtitle "Verbesserung der inhaltlichen Suchmöglichkeiten im Informationssystem Der Deutschen Bibliothek" - the article by Elisabeth Niggemann (Frankfurt am Main) is admittedly not a scholarly one, but it is certainly the most practical, most readable, and, from a librarian's perspective, most interesting in the book. Niggemann surveys the subject indexing of the bibliographic data of the DDB, which has since become the Deutsche Nationalbibliothek, and gives a status report and outlook on current and planned improvements to subject searching. These include the broad deployment of an automatic indexing procedure (MILOS/IDX), activities in classification (DDC), the linking of national subject heading systems (the MACS project), and work on cross-concordances (CARMEN) and approaches to handling heterogeneity.
     The "commitment" to improving subject access in the national online information system declared here by a central authority fills an observer more accustomed to timidity and indifference with respect and wistful envy.
     Two further contributions also deal with automatic indexing. "Indexieren mit AUTINDEX" by H.-D. Maas (Saarbrücken) is unfortunately brief and written without didactic ambition, so that one cannot really picture how the system works. The workshop report "Automatische Indexierung des Reallexikons zur Deutschen Kunstgeschichte" by K. Lepsky (Cologne) is clearer, showing which problems and steps arise in the digitization, indexing, and web presentation of the full texts of a large specialist reference work. Other interesting contributions deal, for example, with summarization in an e-learning project (R. Kuhlen), with the shell model and the Semantic Web (J. Krause; in English, for reasons not explained), and with the accreditation and evaluation of university teaching and research in Great Britain (T. Seeger). In sum, this is a worthy Festschrift that will surely have pleased its honoree, and for information-science special collections and larger libraries the volume is certainly an enrichment. One drop of bitterness remains: although "Information und Sprache" is a visually attractive book, closer inspection unfortunately reveals all too many misprints, faulty hyphenations, uncorrected grammatical errors, and inconsistencies in italics and punctuation. Copy editors and proofreaders, one must once again painfully note, are a dying profession.
    RSWK
    Informations- und Dokumentationswissenschaft / Aufsatzsammlung
    Information Retrieval / Aufsatzsammlung
    Automatische Indexierung / Aufsatzsammlung
    Linguistische Datenverarbeitung / Aufsatzsammlung
    Subject
    Informations- und Dokumentationswissenschaft / Aufsatzsammlung
    Information Retrieval / Aufsatzsammlung
    Automatische Indexierung / Aufsatzsammlung
    Linguistische Datenverarbeitung / Aufsatzsammlung
  5. Schneider, R.: Web 3.0 ante portas? : Integration von Social Web und Semantic Web (2008) 0.04
    
    Abstract
     The Internet as a medium is in flux, and with it its conditions of publication and reception are changing. What opportunities do the two currently parallel-discussed visions of its future, the Social Web and the Semantic Web, offer? To answer this question, the article examines the foundations of both models with respect to application and technology, but also highlights their shortcomings and the added value of a combination appropriate to the medium. Using the grammatical online information system grammis as an example, it sketches a strategy for integratively exploiting the respective strengths of the two approaches.
    Date
    22. 1.2011 10:38:28
    Source
    Kommunikation, Partizipation und Wirkungen im Social Web, Band 1. Hrsg.: A. Zerfaß u.a
    Theme
    Semantic Web
    Semantische Interoperabilität
  6. Doszkocs, T.E.; Zamora, A.: Dictionary services and spelling aids for Web searching (2004) 0.02
    
    Abstract
    The Specialized Information Services Division (SIS) of the National Library of Medicine (NLM) provides Web access to more than a dozen scientific databases on toxicology and the environment on TOXNET. Search queries on TOXNET often include misspelled or variant English words, medical and scientific jargon, and chemical names. Following the example of search engines like Google and ClinicalTrials.gov, we set out to develop a spelling "suggestion" system for increased recall and precision in TOXNET searching. This paper describes the development of dictionary technology that can be used in a variety of applications such as orthographic verification, writing aids, natural language processing, and information storage and retrieval. The design of the technology allows building complex applications from the components developed in earlier phases of the work in a modular fashion, without extensive rewriting of computer code. Since many of the potential applications envisioned for this work have online or web-based interfaces, the dictionaries and other computer components must have fast response and must be adaptable to open-ended database vocabularies, including chemical nomenclature. The dictionary vocabulary for this work was derived from SIS and other databases and specialized resources, such as NLM's Unified Medical Language System (UMLS). The resulting technology, A-Z Dictionary (AZdict), has three major constituents: 1) the vocabulary list; 2) the word attributes that define part of speech and morphological relationships between words in the list; and 3) a set of programs that implements the retrieval of words and their attributes and determines similarity between words (ChemSpell). These three components can be used in various applications such as spelling verification, spelling aid, part-of-speech tagging, paraphrasing, and many other natural language processing functions.
    Date
    14. 8.2004 17:22:56
    Source
    Online. 28(2004) no.3, S.22-29
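As a rough illustration of the kind of spelling-"suggestion" lookup the entry above describes, here is a minimal sketch based on Levenshtein edit distance. The toy vocabulary and the ranking-by-edit-distance scheme are illustrative assumptions, not the actual AZdict/ChemSpell implementation.

```python
# Minimal spelling-suggestion sketch (hedged: not NLM's AZdict/ChemSpell code).

def edit_distance(a: str, b: str) -> int:
    """Classic Levenshtein distance via dynamic programming over two rows."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1,                 # deletion
                            curr[j - 1] + 1,             # insertion
                            prev[j - 1] + (ca != cb)))   # substitution
        prev = curr
    return prev[-1]

def suggest(word: str, vocabulary, max_dist: int = 2):
    """Return vocabulary terms within max_dist edits of `word`, closest first."""
    scored = [(edit_distance(word.lower(), v.lower()), v) for v in vocabulary]
    return [v for d, v in sorted(scored) if d <= max_dist]

# Toy chemical-name vocabulary, invented for illustration.
vocab = ["toluene", "toxicology", "benzene", "phenol"]
print(suggest("tolulene", vocab))  # ['toluene']
```

A production system of the kind described would add morphological attributes and phonetic matching on top of plain edit distance.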
  7. Heyer, G.; Quasthoff, U.; Wittig, T.: Text Mining : Wissensrohstoff Text. Konzepte, Algorithmen, Ergebnisse (2006) 0.02
    
    Abstract
    A large part of the world's knowledge exists in the form of digital texts on the Internet or in intranets. Today's search engines exploit this raw material of knowledge only rudimentarily: they can recognize semantic relationships only to a limited extent. Everyone is waiting for the Semantic Web, in which the creators of texts add the semantics themselves, but that will take a long time yet. There is, however, a technology that already makes it possible to analyze semantic relationships in raw text and to prepare them for use. The research field of "text mining" uses statistical and pattern-based methods to extract, process, and exploit knowledge from texts; here the groundwork for the search engines of the future is being laid. This is the first German textbook on this pioneering technology. What comes to mind for the word "Stich"? Some think of tennis, others of the card game Skat. Text mining can determine such different contexts automatically and display them as word networks. Which terms most frequently appear immediately to the left and right of the word "Festplatte" (hard disk)? Which word forms and proper names have been appearing in the German language since 2001?
    Text mining answers these and many further questions. This textbook invites readers into a new, fascinating scientific discipline in which previously unknown relationships and perspectives can be discovered, and the raw material of text is turned into knowledge. It addresses students as well as practitioners with a background in computer science, business informatics, and/or linguistics who want to learn about the foundations, methods, and applications of text mining and are looking for ideas for implementing their own applications. It is based on work carried out in recent years at the Natural Language Processing group of the Institute of Computer Science at the University of Leipzig under the direction of Prof. Dr. Heyer. A wealth of practical examples of text mining concepts and algorithms gives the reader a comprehensive yet detailed understanding of the foundations and applications of text mining. Topics covered: knowledge and text; foundations of semantic analysis; text databases; language statistics; clustering; pattern analysis; hybrid methods; example applications; appendices on statistics and linguistic foundations. 360 pages, 54 figures, 58 tables, and 95 glossary entries, with the free e-learning course "Schnelleinstieg: Sprachstatistik". In addition to the book, an online certificate course with mentor and tutor support will shortly be available.
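The neighbour-word question posed in the abstract above ("which terms appear most often directly left and right of a given word?") can be sketched in a few lines. The toy corpus below is invented for illustration; the book's own corpora and statistical significance measures are considerably richer.

```python
# Hedged sketch of left/right neighbour co-occurrence counting; the corpus
# is a made-up toy, not data from the book.
from collections import Counter

def neighbour_counts(sentences, target):
    """Count words immediately left and right of `target` in tokenized sentences."""
    left, right = Counter(), Counter()
    for sent in sentences:
        tokens = sent.lower().split()
        for i, tok in enumerate(tokens):
            if tok == target:
                if i > 0:
                    left[tokens[i - 1]] += 1
                if i + 1 < len(tokens):
                    right[tokens[i + 1]] += 1
    return left, right

corpus = [
    "die neue festplatte ist defekt",
    "eine externe festplatte kaufen",
    "die alte festplatte formatieren",
]
left, right = neighbour_counts(corpus, "festplatte")
print(left.most_common(2), right.most_common(2))
```

Real text-mining systems would weight such raw counts with a significance measure before drawing word networks from them.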
  8. Rötzer, F.: Computer ergooglen die Bedeutung von Worten (2005) 0.02
    
    Content
    "Wie könnten Computer Sprache lernen und dabei auch die Bedeutung von Worten sowie die Beziehungen zwischen ihnen verstehen? Dieses Problem der Semantik stellt eine gewaltige, bislang nur ansatzweise bewältigte Aufgabe dar, da Worte und Wortverbindungen oft mehrere oder auch viele Bedeutungen haben, die zudem vom außersprachlichen Kontext abhängen. Die beiden holländischen (Ein künstliches Bewusstsein aus einfachen Aussagen (1)). Paul Vitanyi (2) und Rudi Cilibrasi vom Nationalen Institut für Mathematik und Informatik (3) in Amsterdam schlagen eine elegante Lösung vor: zum Nachschlagen im Internet, der größten Datenbank, die es gibt, wird einfach Google benutzt. Objekte wie eine Maus können mit ihren Namen "Maus" benannt werden, die Bedeutung allgemeiner Begriffe muss aus ihrem Kontext gelernt werden. Ein semantisches Web zur Repräsentation von Wissen besteht aus den möglichen Verbindungen, die Objekte und ihre Namen eingehen können. Natürlich können in der Wirklichkeit neue Namen, aber auch neue Bedeutungen und damit neue Verknüpfungen geschaffen werden. Sprache ist lebendig und flexibel. Um einer Künstlichen Intelligenz alle Wortbedeutungen beizubringen, müsste mit der Hilfe von menschlichen Experten oder auch vielen Mitarbeitern eine riesige Datenbank mit den möglichen semantischen Netzen aufgebaut und dazu noch ständig aktualisiert werden. Das aber müsste gar nicht notwendig sein, denn mit dem Web gibt es nicht nur die größte und weitgehend kostenlos benutzbare semantische Datenbank, sie wird auch ständig von zahllosen Internetnutzern aktualisiert. Zudem gibt es Suchmaschinen wie Google, die Verbindungen zwischen Worten und damit deren Bedeutungskontext in der Praxis in ihrer Wahrscheinlichkeit quantitativ mit der Angabe der Webseiten, auf denen sie gefunden wurden, messen.
    Mit einem bereits zuvor von Paul Vitanyi und anderen entwickeltem Verfahren, das den Zusammenhang von Objekten misst (normalized information distance - NID ), kann die Nähe zwischen bestimmten Objekten (Bilder, Worte, Muster, Intervalle, Genome, Programme etc.) anhand aller Eigenschaften analysiert und aufgrund der dominanten gemeinsamen Eigenschaft bestimmt werden. Ähnlich können auch die allgemein verwendeten, nicht unbedingt "wahren" Bedeutungen von Namen mit der Google-Suche erschlossen werden. 'At this moment one database stands out as the pinnacle of computer-accessible human knowledge and the most inclusive summary of statistical information: the Google search engine. There can be no doubt that Google has already enabled science to accelerate tremendously and revolutionized the research process. It has dominated the attention of internet users for years, and has recently attracted substantial attention of many Wall Street investors, even reshaping their ideas of company financing.' (Paul Vitanyi und Rudi Cilibrasi) Gibt man ein Wort ein wie beispielsweise "Pferd", erhält man bei Google 4.310.000 indexierte Seiten. Für "Reiter" sind es 3.400.000 Seiten. Kombiniert man beide Begriffe, werden noch 315.000 Seiten erfasst. Für das gemeinsame Auftreten beispielsweise von "Pferd" und "Bart" werden zwar noch immer erstaunliche 67.100 Seiten aufgeführt, aber man sieht schon, dass "Pferd" und "Reiter" enger zusammen hängen. Daraus ergibt sich eine bestimmte Wahrscheinlichkeit für das gemeinsame Auftreten von Begriffen. Aus dieser Häufigkeit, die sich im Vergleich mit der maximalen Menge (5.000.000.000) an indexierten Seiten ergibt, haben die beiden Wissenschaftler eine statistische Größe entwickelt, die sie "normalised Google distance" (NGD) nennen und die normalerweise zwischen 0 und 1 liegt. Je geringer NGD ist, desto enger hängen zwei Begriffe zusammen. "Das ist eine automatische Bedeutungsgenerierung", sagt Vitanyi gegenüber dern New Scientist (4). 
"Das könnte gut eine Möglichkeit darstellen, einen Computer Dinge verstehen und halbintelligent handeln zu lassen." Werden solche Suchen immer wieder durchgeführt, lässt sich eine Karte für die Verbindungen von Worten erstellen. Und aus dieser Karte wiederum kann ein Computer, so die Hoffnung, auch die Bedeutung der einzelnen Worte in unterschiedlichen natürlichen Sprachen und Kontexten erfassen. So habe man über einige Suchen realisiert, dass ein Computer zwischen Farben und Zahlen unterscheiden, holländische Maler aus dem 17. Jahrhundert und Notfälle sowie Fast-Notfälle auseinander halten oder elektrische oder religiöse Begriffe verstehen könne. Überdies habe eine einfache automatische Übersetzung Englisch-Spanisch bewerkstelligt werden können. Auf diese Weise ließe sich auch, so hoffen die Wissenschaftler, die Bedeutung von Worten erlernen, könne man Spracherkennung verbessern oder ein semantisches Web erstellen und natürlich endlich eine bessere automatische Übersetzung von einer Sprache in die andere realisieren.
  9. Sünkler, S.; Kerkmann, F.; Schultheiß, S.: Ok Google . the end of search as we know it : sprachgesteuerte Websuche im Test (2018) 0.01
    
    Abstract
    Voice control systems that assist users on spoken command are becoming increasingly popular with the spread of smartphones and speaker systems such as Amazon Echo or Google Home. One of their central applications is searching in web search engines. But how does "googling" work when the user speaks the query instead of typing it? A project team at HAW Hamburg investigated this question and, on behalf of Deutsche Telekom, examined how effectively, efficiently, and satisfactorily Google Now, Apple Siri, Microsoft Cortana, and Amazon Fire OS perform. The study identified the systems' strengths and weaknesses as well as success criteria for high usability. These findings resulted in the prototype of an optimal voice web search.
  10. Rettinger, A.; Schumilin, A.; Thoma, S.; Ell, B.: Learning a cross-lingual semantic representation of relations expressed in text (2015) 0.01
    
    Series
    Information Systems and Applications, incl. Internet/Web, and HCI; Bd. 9088
    Source
    The Semantic Web: latest advances and new domains. 12th European Semantic Web Conference, ESWC 2015 Portoroz, Slovenia, May 31 -- June 4, 2015. Proceedings. Eds.: F. Gandon u.a
  11. Cimiano, P.; Völker, J.; Studer, R.: Ontologies on demand? : a description of the state-of-the-art, applications, challenges and trends for ontology learning from text (2006) 0.01
    
    Abstract
    Ontologies are nowadays used for many applications that require data, services, and resources in general to be interoperable and machine-understandable. Such applications include, for example, web service discovery and composition, information integration across databases, and intelligent search. The general idea is that data and services are semantically described with respect to ontologies, i.e. formal specifications of a domain of interest, and can thus be shared and reused such that the shared meaning specified by the ontology remains formally the same across different parties and applications. As the cost of creating ontologies is relatively high, different proposals have emerged for learning ontologies from structured and unstructured resources. In this article we examine the maturity of techniques for ontology learning from textual resources, addressing the question of whether the state of the art is mature enough to produce ontologies "on demand".
  12. Ontologie und Axiomatik der Wissensbasis von LILOG (1992) 0.01
    
  13. Bian, G.-W.; Chen, H.-H.: Cross-language information access to multilingual collections on the Internet (2000) 0.01
    
    Abstract
    The language barrier is the major problem that people face in searching for, retrieving, and understanding multilingual collections on the Internet. This paper deals with query translation and document translation in a Chinese-English information retrieval system called MTIR. Bilingual dictionary and monolingual corpus-based approaches are adopted to select suitable translated query terms. A machine transliteration algorithm is introduced to resolve proper name searching. We consider several design issues for document translation, including which material is translated, what roles the HTML tags play in translation, what the tradeoff is between speed and translation performance, and what form the translated result is presented in. About 100,000 Web pages translated in the last 4 months of 1997 are used for a quantitative study of online and real-time Web page translation.
    Date
    16. 2.2000 14:22:39
  14. Granitzer, M.: Statistische Verfahren der Textanalyse (2006) 0.01
    
    Abstract
    This article gives an overview of statistical methods of text analysis in the context of the Semantic Web. It opens with a discussion of methods and common techniques for preprocessing text, such as stemming and part-of-speech tagging. The representations thus introduced serve as the basis for statistical feature analyses as well as for further techniques such as information extraction and machine learning. These special techniques are presented as an overview, with the aspects most relevant to the Semantic Web treated in detail. The article closes with the application of the presented techniques to the creation and maintenance of ontologies, and with pointers to further reading.
    Source
    Semantic Web: Wege zur vernetzten Wissensgesellschaft. Hrsg.: T. Pellegrini, u. A. Blumauer
    Theme
    Semantic Web
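The preprocessing steps such a survey typically covers (tokenization, stemming, term-frequency counting) can be sketched minimally as follows. The crude suffix stripping below is a toy stand-in for a real stemmer, and the German suffix list is an invented assumption for illustration only.

```python
# Hedged sketch of a text-preprocessing pipeline: tokenize, crudely "stem",
# count term frequencies. The suffix list is a toy, not a published stemmer.
from collections import Counter
import re

SUFFIXES = ("ung", "en", "er", "e")  # illustrative German suffixes only

def tokenize(text: str):
    """Lowercase and split into alphabetic tokens (incl. German umlauts)."""
    return re.findall(r"[a-zäöüß]+", text.lower())

def crude_stem(token: str) -> str:
    """Strip the first matching suffix, keeping at least 3 characters."""
    for suf in SUFFIXES:
        if token.endswith(suf) and len(token) - len(suf) >= 3:
            return token[: -len(suf)]
    return token

def term_frequencies(text: str) -> Counter:
    return Counter(crude_stem(t) for t in tokenize(text))

tf = term_frequencies("Die Analyse der Texte erfordert eine Analyse der Verfahren")
print(tf.most_common(2))
```

A real pipeline would substitute a proper stemmer or lemmatizer and typically weight the raw counts (e.g. with tf-idf) before feeding them to learning algorithms.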
  15. Yang, C.C.; Luk, J.: Automatic generation of English/Chinese thesaurus based on a parallel corpus in laws (2003) 0.01
    
    Abstract
    The information available in languages other than English on the World Wide Web is increasing significantly. According to a report from Computer Economics in 1999, 54% of Internet users were English speakers ("English Will Dominate Web for Only Three More Years," Computer Economics, July 9, 1999, http://www.computereconomics.com/new4/pr/pr990610.html). However, it was predicted that there would be only a 60% increase in Internet users among English speakers versus a 150% increase among non-English speakers over the next five years; by 2005, 57% of Internet users would be non-English speakers. A report by CNN.com in 2000 showed that the number of Internet users in China had increased from 8.9 million to 16.9 million between January and June 2000 ("Report: China Internet users double to 17 million," CNN.com, July, 2000, http://cnn.org/2000/TECH/computing/07/27/china.internet.reut/index.html). According to Nielsen/NetRatings, there was a dramatic leap from 22.5 million to 56.6 million Internet users from 2001 to 2002, and China became the second-largest at-home Internet population in the world in 2002 (the US Internet population was 166 million) (Robyn Greenspan, "China Pulls Ahead of Japan," Internet.com, April 22, 2002, http://cyberatlas.internet.com/big-picture/geographics/article/0,,5911_1013841,00.html). All of this evidence reveals the importance of cross-lingual research to satisfy needs in the near future. Digital library research has focused on structural and semantic interoperability in the past. Searching and retrieving objects across variations in protocols, formats, and disciplines have been widely explored (Schatz, B., & Chen, H. (1999). Digital libraries: technological advances and social impacts. IEEE Computer, Special Issue on Digital Libraries, February, 32(2), 45-50; Chen, H., Yen, J., & Yang, C.C. (1999). International activities: development of Asian digital libraries. IEEE Computer, Special Issue on Digital Libraries, 32(2), 48-49.). 
However, research in crossing language boundaries, especially between European and Oriental languages, is still at an initial stage. In this proposal, we focus on cross-lingual semantic interoperability by automatically generating a cross-lingual thesaurus from an English/Chinese parallel corpus. When searchers encounter retrieval problems, professional librarians usually consult a thesaurus to identify other relevant vocabulary. For the problem of searching across language boundaries, a cross-lingual thesaurus generated by co-occurrence analysis and a Hopfield network can be used to suggest additional semantically relevant terms that cannot be obtained from a dictionary. In particular, the automatically generated cross-lingual thesaurus is able to capture unknown words that do not exist in a dictionary, such as names of persons, organizations, and events. Due to Hong Kong's unique historical background, both English and Chinese are used as official languages in all legal documents, so English/Chinese cross-lingual information retrieval is critical for applications in the courts and the government. In this paper, we build an automatic thesaurus with a Hopfield network based on a parallel corpus collected from the Web site of the Department of Justice of the Hong Kong Special Administrative Region (HKSAR) Government. Experiments are conducted to measure the precision and recall of the automatically generated English/Chinese thesaurus. The results show that such a thesaurus is a promising tool for retrieving relevant terms, especially in a language different from that of the input term. The direct translation of the input term can also be retrieved in most cases.
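The co-occurrence step described in the abstract can be illustrated with a minimal sketch: over sentence-aligned English/Chinese pairs, count how often each cross-lingual term pair appears together and normalize by the term's own frequency. This is a toy stand-in for the authors' pipeline (it omits the Hopfield-network spreading-activation stage); the function name and the sample terms are illustrative, not taken from the paper.

```python
from collections import Counter
from itertools import product

def cooccurrence_weights(aligned_pairs):
    """Directional co-occurrence weights between terms of two languages,
    computed over sentence-aligned (en_terms, zh_terms) pairs."""
    tf = Counter()   # number of aligned pairs each term appears in
    co = Counter()   # number of pairs containing both terms of a cross-lingual pair
    for en_terms, zh_terms in aligned_pairs:
        for t in set(en_terms) | set(zh_terms):
            tf[t] += 1
        for ab in product(set(en_terms), set(zh_terms)):
            co[ab] += 1
    # weight(a -> b) = co(a, b) / tf(a): how strongly seeing a suggests b
    return {(a, b): n / tf[a] for (a, b), n in co.items()}

pairs = [
    (["court", "justice"], ["法院", "司法"]),
    (["court", "appeal"], ["法院", "上訴"]),
]
weights = cooccurrence_weights(pairs)
```

Because "court" appears in both aligned pairs but "上訴" in only one, the weight of ("court", "上訴") is 0.5 while ("court", "法院") is 1.0; terms never linked by a dictionary can still receive a weight this way.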
    Footnote
    Teil eines Themenheftes: "Web retrieval and mining: A machine learning perspective"
  16. Grigonyte, G.: Building and evaluating domain ontologies : NLP contributions (2010) 0.01
    RSWK
    Wissenserwerb / Natürliche Sprache / Ontologie <Wissensverarbeitung>
    Subject
    Wissenserwerb / Natürliche Sprache / Ontologie <Wissensverarbeitung>
  17. Wang, F.L.; Yang, C.C.: Mining Web data for Chinese segmentation (2007) 0.01
    Abstract
    Modern information retrieval systems use keywords within documents as indexing terms for the search of relevant documents. As Chinese is an ideographic, character-based language, the words in its texts are not delimited by white space, and indexing of Chinese documents is impossible without a proper segmentation algorithm. Many Chinese segmentation algorithms have been proposed in the past, but traditional algorithms cannot operate without a large dictionary or a large corpus of training data. Nowadays, the Web has become the largest corpus, and it is ideal for Chinese segmentation. Although most search engines have problems segmenting texts into proper words, they maintain huge databases of documents and of the frequencies of character sequences in those documents; these databases are important potential resources for segmentation. In this paper, we propose a segmentation algorithm that mines Web data with the help of search engines. In addition, the Romanized pinyin of the Chinese language indicates the boundaries of words in a text, and our algorithm is the first to utilize Romanized pinyin for segmentation. It is the first unified segmentation algorithm for the Chinese language across different geographical areas, and it is also domain independent because of the nature of the Web. Experiments have been conducted on the datasets of a recent Chinese segmentation competition. The results show that our algorithm outperforms the traditional algorithms in terms of precision and recall. Moreover, our algorithm can effectively deal with the problems of segmentation ambiguity, new word (unknown word) detection, and stop words.
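The core idea of frequency-driven segmentation can be sketched with a small dynamic program: choose the split of a character string that maximizes the product of substring frequencies. Here a hand-made frequency table stands in for the search-engine counts the paper mines; the 4-character window, the smoothing constant, and the sample words are all assumptions for illustration, not the authors' actual algorithm.

```python
def segment(text, freq):
    """Return the segmentation of `text` that maximizes the product of
    substring frequencies (a stand-in for search-engine counts)."""
    n = len(text)
    best = [0.0] * (n + 1)   # best score achievable for text[:i]
    best[0] = 1.0
    back = [0] * (n + 1)     # split point that yields best[i]
    for i in range(1, n + 1):
        for j in range(max(0, i - 4), i):   # candidate words up to 4 chars
            word = text[j:i]
            score = best[j] * freq.get(word, 1e-6)  # smooth unseen strings
            if score > best[i]:
                best[i] = score
                back[i] = j
    # recover the segmentation by walking the back-pointers
    words, i = [], n
    while i > 0:
        words.append(text[back[i]:i])
        i = back[i]
    return list(reversed(words))

words = segment("中国人民", {"中国": 0.5, "人民": 0.4})
```

With the toy table, "中国人民" splits into "中国" + "人民" because their joint score (0.5 × 0.4) dominates any split containing an unseen substring.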
    Footnote
    Beitrag eines Themenschwerpunktes "Mining Web resources for enhancing information retrieval"
  18. Leinfellner, E.: Semantische Netze und Textzusammenhang (1992) 0.01
  19. Radev, D.; Fan, W.; Qu, H.; Wu, H.; Grewal, A.: Probabilistic question answering on the Web (2005) 0.01
    Abstract
    Web-based search engines such as Google and NorthernLight return documents that are relevant to a user query, not answers to user questions. We have developed an architecture that augments existing search engines so that they support natural language question answering. The process entails five steps: query modulation, document retrieval, passage extraction, phrase extraction, and answer ranking. In this article, we describe some probabilistic approaches to the last three of these stages. We show how our techniques apply to a number of existing search engines, and we also present results contrasting three different methods for question answering. Our algorithm, probabilistic phrase reranking (PPR), uses proximity and question type features and achieves a total reciprocal document rank of .20 on the TREC8 corpus. Our techniques have been implemented as a Web-accessible system, called NSIR.
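The proximity feature used in the ranking stage can be sketched as follows: passages whose matched question terms sit close together score higher than passages where they are scattered or missing. This is only a toy illustration of a proximity score, not the PPR algorithm itself; the scoring formula and example data are assumptions.

```python
import math
import re

def rerank(question_terms, passages):
    """Rank passages by a simple proximity score: more matched question
    terms in a tighter span -> higher score."""
    scored = []
    for p in passages:
        tokens = re.findall(r"\w+", p.lower())
        positions = [i for i, t in enumerate(tokens) if t in question_terms]
        if len(positions) >= 2:
            span = positions[-1] - positions[0]     # distance between matches
        elif positions:
            span = len(tokens)                      # single match: weak signal
        else:
            span = math.inf                         # no match at all
        scored.append((len(positions) / (1.0 + span), p))
    return [p for _, p in sorted(scored, key=lambda x: -x[0])]

ranked = rerank(
    {"capital", "france"},
    ["The capital of France is Paris.",
     "France exported wine.",
     "No match here."],
)
```

The first passage wins because both question terms occur two tokens apart, while the second matches only one term and the third none.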
  20. Thelwall, M.; Price, L.: Language evolution and the spread of ideas on the Web : a procedure for identifying emergent hybrid words (2006) 0.01
    Abstract
    Word usage is of interest to linguists for its own sake as well as to social scientists and others who seek to track the spread of ideas, for example, in public debates over political decisions. The historical evolution of language can be analyzed with the tools of corpus linguistics through evolving corpora and the Web. But word usage statistics can only be gathered for known words. In this article, techniques are described and tested for identifying new words from the Web, focusing on the case when the words are related to a topic and have a hybrid form with a common sequence of letters. The results highlight the need to employ a combination of search techniques and show the wide potential of hybrid word family investigations in linguistics and social science.
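The hybrid-word idea described above can be sketched as a filter: scan texts for words that contain a common letter sequence (such as a productive stem) but are absent from a known-word list, and tally the candidates. This is a minimal sketch of the general approach, not the authors' procedure; the stem "blog", the sample sentences, and the function name are illustrative assumptions.

```python
import re

def hybrid_words(texts, stem, lexicon):
    """Count candidate hybrid words containing `stem` that are absent
    from the known-word list `lexicon`."""
    pattern = re.compile(r"\b\w*" + re.escape(stem) + r"\w*\b")
    found = {}
    for text in texts:
        for w in pattern.findall(text.lower()):
            if w != stem and w not in lexicon:   # keep only novel formations
                found[w] = found.get(w, 0) + 1
    return found

found = hybrid_words(
    ["Blogosphere and blogware grew fast.", "The blogosphere debated."],
    "blog",
    {"blog"},
)
```

Counting candidates across a crawl, rather than a single pass like this, is what lets usage statistics be gathered for words no dictionary yet lists.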

Languages

  • e 85
  • d 47
  • m 4

Types

  • a 93
  • m 21
  • el 19
  • s 14
  • x 6
  • p 2
  • d 1

Subjects

Classifications