Search (1 results, page 1 of 1)

  • × author_ss:"Batet, M."
  • × theme_ss:"Wissensrepräsentation"
  • × year_i:[2010 TO 2020}
  1. Sánchez, D.; Batet, M.; Valls, A.; Gibert, K.: Ontology-driven web-based semantic similarity (2010) 0.00
    3.2888478E-4 = product of:
      0.0049332716 = sum of:
        0.0049332716 = product of:
          0.009866543 = sum of:
            0.009866543 = weight(_text_:information in 335) [ClassicSimilarity], result of:
              0.009866543 = score(doc=335,freq=8.0), product of:
                0.050870337 = queryWeight, product of:
                  1.7554779 = idf(docFreq=20772, maxDocs=44218)
                  0.028978055 = queryNorm
                0.19395474 = fieldWeight in 335, product of:
                  2.828427 = tf(freq=8.0), with freq of:
                    8.0 = termFreq=8.0
                  1.7554779 = idf(docFreq=20772, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=335)
          0.5 = coord(1/2)
      0.06666667 = coord(1/15)
    
    Abstract
    Estimation of the degree of semantic similarity/distance between concepts is a very common problem in research areas such as natural language processing, knowledge acquisition, information retrieval or data mining. In the past, many similarity measures have been proposed, exploiting explicit knowledge-such as the structure of a taxonomy-or implicit knowledge-such as information distribution. In the former case, taxonomies and/or ontologies are used to introduce additional semantics; in the latter case, frequencies of term appearances in a corpus are considered. Classical measures based on those premises suffer from some problems: in the ?rst case, their excessive dependency of the taxonomical/ontological structure; in the second case, the lack of semantics of a pure statistical analysis of occurrences and/or the ambiguity of estimating concept statistical distribution from term appearances. Measures based on Information Content (IC) of taxonomical concepts combine both approaches. However, they heavily depend on a properly pre-tagged and disambiguated corpus according to the ontological entities in order to computer accurate concept appearance probabilities. This limits the applicability of those measures to other ontologies - like specific domain ontologies - and massive corpus - like the Web. In this paper, several of the presente issues are analyzed. Modifications of classical similarity measures are also proposed. They are based on a contextualized and scalable version of IC computation in the Web by exploiting taxonomical knowledge. The goal is to avoid the measures' dependency on the corpus pre-processing to achieve reliable results and minimize language ambiguity. Our proposals are able to outperform classical approaches when using the Web for estimating concept probabilities.
    Source
    Journal of intelligent information systems. 35(2010) no.x, S.383-413