Search (1 results, page 1 of 1)

  • × author_ss:"Frank, E."
  • × theme_ss:"Semantisches Umfeld in Indexierung u. Retrieval"
  • × year_i:[2010 TO 2020}
  1. Huang, L.; Milne, D.; Frank, E.; Witten, I.H.: Learning a concept-based document similarity measure (2012) 0.00
    0.0048545036 = product of:
      0.033981524 = sum of:
        0.00856136 = weight(_text_:information in 372) [ClassicSimilarity], result of:
          0.00856136 = score(doc=372,freq=4.0), product of:
            0.052020688 = queryWeight, product of:
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.029633347 = queryNorm
            0.16457605 = fieldWeight in 372, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.046875 = fieldNorm(doc=372)
        0.025420163 = weight(_text_:retrieval in 372) [ClassicSimilarity], result of:
          0.025420163 = score(doc=372,freq=4.0), product of:
            0.08963835 = queryWeight, product of:
              3.024915 = idf(docFreq=5836, maxDocs=44218)
              0.029633347 = queryNorm
            0.2835858 = fieldWeight in 372, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              3.024915 = idf(docFreq=5836, maxDocs=44218)
              0.046875 = fieldNorm(doc=372)
      0.14285715 = coord(2/14)
    
    Abstract
    Document similarity measures are crucial components of many text-analysis tasks, including information retrieval, document classification, and document clustering. Conventional measures are brittle: They estimate the surface overlap between documents based on the words they mention and ignore deeper semantic connections. We propose a new measure that assesses similarity at both the lexical and semantic levels, and learns from human judgments how to combine them by using machine-learning techniques. Experiments show that the new measure produces values for documents that are more consistent with people's judgments than people are with each other. We also use it to classify and cluster large document sets covering different genres and topics, and find that it improves both classification and clustering performance.
    Source
    Journal of the American Society for Information Science and Technology. 63(2012) no.8, S.1593-1608
    Theme
    Semantisches Umfeld in Indexierung u. Retrieval