Document (#37100)

Djioua, B.
Desclés, J.-P.
Alrahabi, M.
Searching and mining with semantic categories
Next generation search engines: advanced models for information retrieval. Eds.: C. Jouis et al.
Hershey, PA : IGI Publishing
A new model is proposed for retrieving information by automatically building a semantic metatext structure for texts, which allows searching for and extracting discourse and semantic information according to certain linguistic categorizations. This paper presents approaches for searching and mining full text with semantic categories. The model is built from two engines. The first, called EXCOM (Djioua et al., 2006; Alrahabi, 2010), is an automatic text annotation system related to discourse and semantic maps, which are specifications of general linguistic ontologies founded on the Applicative and Cognitive Grammar. The annotation layer uses a linguistic method called Contextual Exploration, which handles the polysemic values of a term in texts. Several 'semantic maps' underlying 'points of view' for text mining guide this automatic annotation process. The second engine uses the semantically annotated texts produced by the first to create a semantic inverted index, which can retrieve relevant documents for queries associated with discourse and semantic categories such as definition, quotation, causality, relations between concepts, etc. (Djioua & Desclés, 2007). This semantic indexing process builds a metatext layer for textual contents. Some data and linguistic rule sets, as well as the general architecture that extends third-party software, are provided as supplementary information.
Semantic Web
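
The abstract's second engine, a semantic inverted index queried by discourse category, can be illustrated with a minimal sketch. This is not the EXCOM implementation: the sample segments, the category labels attached to them, and the names build_index and search are all assumptions made for illustration, and whitespace tokenization stands in for real linguistic processing.

```python
from collections import defaultdict

# Hypothetical annotated segments: (doc_id, semantic category, text).
# In the model described above an annotation engine would produce these;
# here they are hand-written sample data.
ANNOTATED = [
    ("d1", "definition", "An inverted index maps terms to documents"),
    ("d1", "causality", "Stemming errors cause missed matches"),
    ("d2", "definition", "A thesaurus is a controlled vocabulary"),
    ("d2", "quotation", "He said that an index is a map"),
]


def build_index(segments):
    """Map each (category, term) pair to the set of documents containing it."""
    index = defaultdict(set)
    for doc_id, category, text in segments:
        for term in text.lower().split():
            index[(category, term)].add(doc_id)
    return index


def search(index, category, terms):
    """Return documents where every term occurs under the given category."""
    postings = [index.get((category, t.lower()), set()) for t in terms]
    return set.intersection(*postings) if postings else set()


INDEX = build_index(ANNOTATED)
print(search(INDEX, "definition", ["index"]))
print(search(INDEX, "quotation", ["index"]))
```

Keying the postings on the category as well as the term is what lets the same word retrieve different documents depending on whether the query asks for a definition or a quotation.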

Similar documents (content)

  1. Rindflesch, T.C.; Fiszman, M.: The interaction of domain knowledge and linguistic structure in natural language processing : interpreting hypernymic propositions in biomedical text (2003) 0.23
    0.22562756 = sum of:
      0.22562756 = product of:
        0.7050861 = sum of:
          0.01829059 = weight(abstract_txt:process in 2097) [ClassicSimilarity], result of:
            0.01829059 = score(doc=2097,freq=1.0), product of:
              0.07224108 = queryWeight, product of:
                1.0148696 = boost
                4.0510116 = idf(docFreq=2091, maxDocs=44218)
                0.017571568 = queryNorm
              0.25318822 = fieldWeight in 2097, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.0510116 = idf(docFreq=2091, maxDocs=44218)
                0.0625 = fieldNorm(doc=2097)
          0.031694397 = weight(abstract_txt:general in 2097) [ClassicSimilarity], result of:
            0.031694397 = score(doc=2097,freq=2.0), product of:
              0.08272004 = queryWeight, product of:
                1.0859841 = boost
                4.3348765 = idf(docFreq=1574, maxDocs=44218)
                0.017571568 = queryNorm
              0.38315257 = fieldWeight in 2097, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.3348765 = idf(docFreq=1574, maxDocs=44218)
                0.0625 = fieldNorm(doc=2097)
          0.020480493 = weight(abstract_txt:which in 2097) [ClassicSimilarity], result of:
            0.020480493 = score(doc=2097,freq=4.0), product of:
              0.05617414 = queryWeight, product of:
                1.096054 = boost
                2.9167147 = idf(docFreq=6503, maxDocs=44218)
                0.017571568 = queryNorm
              0.36458933 = fieldWeight in 2097, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                2.9167147 = idf(docFreq=6503, maxDocs=44218)
                0.0625 = fieldNorm(doc=2097)
          0.038587246 = weight(abstract_txt:automatic in 2097) [ClassicSimilarity], result of:
            0.038587246 = score(doc=2097,freq=1.0), product of:
              0.11883059 = queryWeight, product of:
                1.301614 = boost
                5.1955976 = idf(docFreq=665, maxDocs=44218)
                0.017571568 = queryNorm
              0.32472485 = fieldWeight in 2097, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.1955976 = idf(docFreq=665, maxDocs=44218)
                0.0625 = fieldNorm(doc=2097)
          0.03859526 = weight(abstract_txt:text in 2097) [ClassicSimilarity], result of:
            0.03859526 = score(doc=2097,freq=2.0), product of:
              0.1079797 = queryWeight, product of:
                1.5196192 = boost
                4.0438666 = idf(docFreq=2106, maxDocs=44218)
                0.017571568 = queryNorm
              0.3574307 = fieldWeight in 2097, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.0438666 = idf(docFreq=2106, maxDocs=44218)
                0.0625 = fieldNorm(doc=2097)
          0.10182013 = weight(abstract_txt:discourse in 2097) [ClassicSimilarity], result of:
            0.10182013 = score(doc=2097,freq=1.0), product of:
              0.25974783 = queryWeight, product of:
                2.356892 = boost
                6.2719374 = idf(docFreq=226, maxDocs=44218)
                0.017571568 = queryNorm
              0.3919961 = fieldWeight in 2097, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.2719374 = idf(docFreq=226, maxDocs=44218)
                0.0625 = fieldNorm(doc=2097)
          0.15378658 = weight(abstract_txt:linguistic in 2097) [ClassicSimilarity], result of:
            0.15378658 = score(doc=2097,freq=2.0), product of:
              0.29870653 = queryWeight, product of:
                2.9184716 = boost
                5.8247695 = idf(docFreq=354, maxDocs=44218)
                0.017571568 = queryNorm
              0.51484174 = fieldWeight in 2097, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                5.8247695 = idf(docFreq=354, maxDocs=44218)
                0.0625 = fieldNorm(doc=2097)
          0.3018314 = weight(abstract_txt:semantic in 2097) [ClassicSimilarity], result of:
            0.3018314 = score(doc=2097,freq=6.0), product of:
              0.44063765 = queryWeight, product of:
                5.604591 = boost
                4.4743214 = idf(docFreq=1369, maxDocs=44218)
                0.017571568 = queryNorm
              0.6849878 = fieldWeight in 2097, product of:
                2.4494898 = tf(freq=6.0), with freq of:
                  6.0 = termFreq=6.0
                4.4743214 = idf(docFreq=1369, maxDocs=44218)
                0.0625 = fieldNorm(doc=2097)
        0.32 = coord(8/25)
  2. Sembok, T.M.T.; Rijsbergen, C.J. van: SILOL: a simple logical-linguistic document retrieval system (1990) 0.20
    0.19589575 = sum of:
      0.19589575 = product of:
        0.69962764 = sum of:
          0.027435886 = weight(abstract_txt:process in 6684) [ClassicSimilarity], result of:
            0.027435886 = score(doc=6684,freq=1.0), product of:
              0.07224108 = queryWeight, product of:
                1.0148696 = boost
                4.0510116 = idf(docFreq=2091, maxDocs=44218)
                0.017571568 = queryNorm
              0.37978232 = fieldWeight in 6684, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.0510116 = idf(docFreq=2091, maxDocs=44218)
                0.09375 = fieldNorm(doc=6684)
          0.021722844 = weight(abstract_txt:which in 6684) [ClassicSimilarity], result of:
            0.021722844 = score(doc=6684,freq=2.0), product of:
              0.05617414 = queryWeight, product of:
                1.096054 = boost
                2.9167147 = idf(docFreq=6503, maxDocs=44218)
                0.017571568 = queryNorm
              0.3867054 = fieldWeight in 6684, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                2.9167147 = idf(docFreq=6503, maxDocs=44218)
                0.09375 = fieldNorm(doc=6684)
          0.052558642 = weight(abstract_txt:uses in 6684) [ClassicSimilarity], result of:
            0.052558642 = score(doc=6684,freq=1.0), product of:
              0.111429706 = queryWeight, product of:
                1.2604296 = boost
                5.0312033 = idf(docFreq=784, maxDocs=44218)
                0.017571568 = queryNorm
              0.4716753 = fieldWeight in 6684, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.0312033 = idf(docFreq=784, maxDocs=44218)
                0.09375 = fieldNorm(doc=6684)
          0.061497252 = weight(abstract_txt:called in 6684) [ClassicSimilarity], result of:
            0.061497252 = score(doc=6684,freq=1.0), product of:
              0.12373011 = queryWeight, product of:
                1.3281765 = boost
                5.3016257 = idf(docFreq=598, maxDocs=44218)
                0.017571568 = queryNorm
              0.4970274 = fieldWeight in 6684, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.3016257 = idf(docFreq=598, maxDocs=44218)
                0.09375 = fieldNorm(doc=6684)
          0.111904055 = weight(abstract_txt:texts in 6684) [ClassicSimilarity], result of:
            0.111904055 = score(doc=6684,freq=1.0), product of:
              0.21110533 = queryWeight, product of:
                2.1247768 = boost
                5.6542544 = idf(docFreq=420, maxDocs=44218)
                0.017571568 = queryNorm
              0.53008634 = fieldWeight in 6684, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.6542544 = idf(docFreq=420, maxDocs=44218)
                0.09375 = fieldNorm(doc=6684)
          0.16311531 = weight(abstract_txt:linguistic in 6684) [ClassicSimilarity], result of:
            0.16311531 = score(doc=6684,freq=1.0), product of:
              0.29870653 = queryWeight, product of:
                2.9184716 = boost
                5.8247695 = idf(docFreq=354, maxDocs=44218)
                0.017571568 = queryNorm
              0.5460721 = fieldWeight in 6684, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.8247695 = idf(docFreq=354, maxDocs=44218)
                0.09375 = fieldNorm(doc=6684)
          0.26139364 = weight(abstract_txt:semantic in 6684) [ClassicSimilarity], result of:
            0.26139364 = score(doc=6684,freq=2.0), product of:
              0.44063765 = queryWeight, product of:
                5.604591 = boost
                4.4743214 = idf(docFreq=1369, maxDocs=44218)
                0.017571568 = queryNorm
              0.5932168 = fieldWeight in 6684, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.4743214 = idf(docFreq=1369, maxDocs=44218)
                0.09375 = fieldNorm(doc=6684)
        0.28 = coord(7/25)
  3. Park, J.-r.: Evolution of concept networks and implications for knowledge representation (2007) 0.19
    0.18825722 = sum of:
      0.18825722 = product of:
        0.7844051 = sum of:
          0.01829059 = weight(abstract_txt:process in 847) [ClassicSimilarity], result of:
            0.01829059 = score(doc=847,freq=1.0), product of:
              0.07224108 = queryWeight, product of:
                1.0148696 = boost
                4.0510116 = idf(docFreq=2091, maxDocs=44218)
                0.017571568 = queryNorm
              0.25318822 = fieldWeight in 847, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.0510116 = idf(docFreq=2091, maxDocs=44218)
                0.0625 = fieldNorm(doc=847)
          0.010240247 = weight(abstract_txt:which in 847) [ClassicSimilarity], result of:
            0.010240247 = score(doc=847,freq=1.0), product of:
              0.05617414 = queryWeight, product of:
                1.096054 = boost
                2.9167147 = idf(docFreq=6503, maxDocs=44218)
                0.017571568 = queryNorm
              0.18229467 = fieldWeight in 847, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                2.9167147 = idf(docFreq=6503, maxDocs=44218)
                0.0625 = fieldNorm(doc=847)
          0.03859526 = weight(abstract_txt:text in 847) [ClassicSimilarity], result of:
            0.03859526 = score(doc=847,freq=2.0), product of:
              0.1079797 = queryWeight, product of:
                1.5196192 = boost
                4.0438666 = idf(docFreq=2106, maxDocs=44218)
                0.017571568 = queryNorm
              0.3574307 = fieldWeight in 847, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.0438666 = idf(docFreq=2106, maxDocs=44218)
                0.0625 = fieldNorm(doc=847)
          0.22767675 = weight(abstract_txt:discourse in 847) [ClassicSimilarity], result of:
            0.22767675 = score(doc=847,freq=5.0), product of:
              0.25974783 = queryWeight, product of:
                2.356892 = boost
                6.2719374 = idf(docFreq=226, maxDocs=44218)
                0.017571568 = queryNorm
              0.87652993 = fieldWeight in 847, product of:
                2.236068 = tf(freq=5.0), with freq of:
                  5.0 = termFreq=5.0
                6.2719374 = idf(docFreq=226, maxDocs=44218)
                0.0625 = fieldNorm(doc=847)
          0.24315797 = weight(abstract_txt:linguistic in 847) [ClassicSimilarity], result of:
            0.24315797 = score(doc=847,freq=5.0), product of:
              0.29870653 = queryWeight, product of:
                2.9184716 = boost
                5.8247695 = idf(docFreq=354, maxDocs=44218)
                0.017571568 = queryNorm
              0.8140363 = fieldWeight in 847, product of:
                2.236068 = tf(freq=5.0), with freq of:
                  5.0 = termFreq=5.0
                5.8247695 = idf(docFreq=354, maxDocs=44218)
                0.0625 = fieldNorm(doc=847)
          0.2464443 = weight(abstract_txt:semantic in 847) [ClassicSimilarity], result of:
            0.2464443 = score(doc=847,freq=4.0), product of:
              0.44063765 = queryWeight, product of:
                5.604591 = boost
                4.4743214 = idf(docFreq=1369, maxDocs=44218)
                0.017571568 = queryNorm
              0.5592902 = fieldWeight in 847, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                4.4743214 = idf(docFreq=1369, maxDocs=44218)
                0.0625 = fieldNorm(doc=847)
        0.24 = coord(6/25)
  4. Ibekwe-SanJuan, F.: Constructing and maintaining knowledge organization tools : a symbolic approach (2006) 0.17
    0.1700426 = sum of:
      0.1700426 = product of:
        0.60729504 = sum of:
          0.012671659 = weight(abstract_txt:which in 5595) [ClassicSimilarity], result of:
            0.012671659 = score(doc=5595,freq=2.0), product of:
              0.05617414 = queryWeight, product of:
                1.096054 = boost
                2.9167147 = idf(docFreq=6503, maxDocs=44218)
                0.017571568 = queryNorm
              0.22557814 = fieldWeight in 5595, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                2.9167147 = idf(docFreq=6503, maxDocs=44218)
                0.0546875 = fieldNorm(doc=5595)
          0.04774928 = weight(abstract_txt:automatic in 5595) [ClassicSimilarity], result of:
            0.04774928 = score(doc=5595,freq=2.0), product of:
              0.11883059 = queryWeight, product of:
                1.301614 = boost
                5.1955976 = idf(docFreq=665, maxDocs=44218)
                0.017571568 = queryNorm
              0.4018265 = fieldWeight in 5595, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                5.1955976 = idf(docFreq=665, maxDocs=44218)
                0.0546875 = fieldNorm(doc=5595)
          0.04916019 = weight(abstract_txt:maps in 5595) [ClassicSimilarity], result of:
            0.04916019 = score(doc=5595,freq=1.0), product of:
              0.15265208 = queryWeight, product of:
                1.475263 = boost
                5.888745 = idf(docFreq=332, maxDocs=44218)
                0.017571568 = queryNorm
              0.32204074 = fieldWeight in 5595, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.888745 = idf(docFreq=332, maxDocs=44218)
                0.0546875 = fieldNorm(doc=5595)
          0.03377085 = weight(abstract_txt:text in 5595) [ClassicSimilarity], result of:
            0.03377085 = score(doc=5595,freq=2.0), product of:
              0.1079797 = queryWeight, product of:
                1.5196192 = boost
                4.0438666 = idf(docFreq=2106, maxDocs=44218)
                0.017571568 = queryNorm
              0.31275186 = fieldWeight in 5595, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.0438666 = idf(docFreq=2106, maxDocs=44218)
                0.0546875 = fieldNorm(doc=5595)
          0.06527737 = weight(abstract_txt:texts in 5595) [ClassicSimilarity], result of:
            0.06527737 = score(doc=5595,freq=1.0), product of:
              0.21110533 = queryWeight, product of:
                2.1247768 = boost
                5.6542544 = idf(docFreq=420, maxDocs=44218)
                0.017571568 = queryNorm
              0.30921704 = fieldWeight in 5595, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.6542544 = idf(docFreq=420, maxDocs=44218)
                0.0546875 = fieldNorm(doc=5595)
          0.13456327 = weight(abstract_txt:linguistic in 5595) [ClassicSimilarity], result of:
            0.13456327 = score(doc=5595,freq=2.0), product of:
              0.29870653 = queryWeight, product of:
                2.9184716 = boost
                5.8247695 = idf(docFreq=354, maxDocs=44218)
                0.017571568 = queryNorm
              0.4504865 = fieldWeight in 5595, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                5.8247695 = idf(docFreq=354, maxDocs=44218)
                0.0546875 = fieldNorm(doc=5595)
          0.26410246 = weight(abstract_txt:semantic in 5595) [ClassicSimilarity], result of:
            0.26410246 = score(doc=5595,freq=6.0), product of:
              0.44063765 = queryWeight, product of:
                5.604591 = boost
                4.4743214 = idf(docFreq=1369, maxDocs=44218)
                0.017571568 = queryNorm
              0.5993643 = fieldWeight in 5595, product of:
                2.4494898 = tf(freq=6.0), with freq of:
                  6.0 = termFreq=6.0
                4.4743214 = idf(docFreq=1369, maxDocs=44218)
                0.0546875 = fieldNorm(doc=5595)
        0.28 = coord(7/25)
  5. Wang, W.M.; Cheung, C.F.; Lee, W.B.; Kwok, S.K.: Mining knowledge from natural language texts using fuzzy associated concept mapping (2008) 0.17
    0.1658096 = sum of:
      0.1658096 = product of:
        0.51815504 = sum of:
          0.016004268 = weight(abstract_txt:process in 2121) [ClassicSimilarity], result of:
            0.016004268 = score(doc=2121,freq=1.0), product of:
              0.07224108 = queryWeight, product of:
                1.0148696 = boost
                4.0510116 = idf(docFreq=2091, maxDocs=44218)
                0.017571568 = queryNorm
              0.22153969 = fieldWeight in 2121, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.0510116 = idf(docFreq=2091, maxDocs=44218)
                0.0546875 = fieldNorm(doc=2121)
          0.017920433 = weight(abstract_txt:which in 2121) [ClassicSimilarity], result of:
            0.017920433 = score(doc=2121,freq=4.0), product of:
              0.05617414 = queryWeight, product of:
                1.096054 = boost
                2.9167147 = idf(docFreq=6503, maxDocs=44218)
                0.017571568 = queryNorm
              0.31901568 = fieldWeight in 2121, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                2.9167147 = idf(docFreq=6503, maxDocs=44218)
                0.0546875 = fieldNorm(doc=2121)
          0.03376384 = weight(abstract_txt:automatic in 2121) [ClassicSimilarity], result of:
            0.03376384 = score(doc=2121,freq=1.0), product of:
              0.11883059 = queryWeight, product of:
                1.301614 = boost
                5.1955976 = idf(docFreq=665, maxDocs=44218)
                0.017571568 = queryNorm
              0.28413424 = fieldWeight in 2121, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.1955976 = idf(docFreq=665, maxDocs=44218)
                0.0546875 = fieldNorm(doc=2121)
          0.08514794 = weight(abstract_txt:maps in 2121) [ClassicSimilarity], result of:
            0.08514794 = score(doc=2121,freq=3.0), product of:
              0.15265208 = queryWeight, product of:
                1.475263 = boost
                5.888745 = idf(docFreq=332, maxDocs=44218)
                0.017571568 = queryNorm
              0.5577909 = fieldWeight in 2121, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                5.888745 = idf(docFreq=332, maxDocs=44218)
                0.0546875 = fieldNorm(doc=2121)
          0.053396408 = weight(abstract_txt:text in 2121) [ClassicSimilarity], result of:
            0.053396408 = score(doc=2121,freq=5.0), product of:
              0.1079797 = queryWeight, product of:
                1.5196192 = boost
                4.0438666 = idf(docFreq=2106, maxDocs=44218)
                0.017571568 = queryNorm
              0.49450412 = fieldWeight in 2121, product of:
                2.236068 = tf(freq=5.0), with freq of:
                  5.0 = termFreq=5.0
                4.0438666 = idf(docFreq=2106, maxDocs=44218)
                0.0546875 = fieldNorm(doc=2121)
          0.092316136 = weight(abstract_txt:texts in 2121) [ClassicSimilarity], result of:
            0.092316136 = score(doc=2121,freq=2.0), product of:
              0.21110533 = queryWeight, product of:
                2.1247768 = boost
                5.6542544 = idf(docFreq=420, maxDocs=44218)
                0.017571568 = queryNorm
              0.43729892 = fieldWeight in 2121, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                5.6542544 = idf(docFreq=420, maxDocs=44218)
                0.0546875 = fieldNorm(doc=2121)
          0.085042775 = weight(abstract_txt:mining in 2121) [ClassicSimilarity], result of:
            0.085042775 = score(doc=2121,freq=1.0), product of:
              0.25181547 = queryWeight, product of:
                2.3206248 = boost
                6.1754265 = idf(docFreq=249, maxDocs=44218)
                0.017571568 = queryNorm
              0.33771864 = fieldWeight in 2121, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.1754265 = idf(docFreq=249, maxDocs=44218)
                0.0546875 = fieldNorm(doc=2121)
          0.13456327 = weight(abstract_txt:linguistic in 2121) [ClassicSimilarity], result of:
            0.13456327 = score(doc=2121,freq=2.0), product of:
              0.29870653 = queryWeight, product of:
                2.9184716 = boost
                5.8247695 = idf(docFreq=354, maxDocs=44218)
                0.017571568 = queryNorm
              0.4504865 = fieldWeight in 2121, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                5.8247695 = idf(docFreq=354, maxDocs=44218)
                0.0546875 = fieldNorm(doc=2121)
        0.32 = coord(8/25)
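
The score explanations above follow the ClassicSimilarity (TF-IDF) arithmetic: tf is the square root of the term frequency, idf is 1 + ln(maxDocs / (docFreq + 1)), each term contributes queryWeight times fieldWeight, and the sum is scaled by the coord factor. The sketch below recomputes the score of the first entry (doc 2097) from the values printed in its explanation. The function and constant names are illustrative assumptions; only the numbers come from the listing.

```python
import math

MAX_DOCS = 44218
QUERY_NORM = 0.017571568
FIELD_NORM = 0.0625   # fieldNorm(doc=2097)
COORD = 8 / 25        # 8 of the 25 query terms matched

# (term, boost, docFreq, termFreq), copied from the explanation for doc 2097
TERMS = [
    ("process", 1.0148696, 2091, 1.0),
    ("general", 1.0859841, 1574, 2.0),
    ("which", 1.096054, 6503, 4.0),
    ("automatic", 1.301614, 665, 1.0),
    ("text", 1.5196192, 2106, 2.0),
    ("discourse", 2.356892, 226, 1.0),
    ("linguistic", 2.9184716, 354, 2.0),
    ("semantic", 5.604591, 1369, 6.0),
]


def idf(doc_freq, max_docs=MAX_DOCS):
    """Inverse document frequency as shown in the idf(...) lines."""
    return 1.0 + math.log(max_docs / (doc_freq + 1))


def term_score(boost, doc_freq, freq):
    """queryWeight * fieldWeight for a single matching term."""
    term_idf = idf(doc_freq)
    query_weight = boost * term_idf * QUERY_NORM
    field_weight = math.sqrt(freq) * term_idf * FIELD_NORM
    return query_weight * field_weight


def document_score(terms):
    """Sum of the term scores, scaled by the coord factor."""
    return COORD * sum(term_score(b, df, f) for _, b, df, f in terms)


print(round(document_score(TERMS), 4))
```

The result should agree with the 0.22562756 reported for that entry up to rounding, since the listing was computed in single precision and this sketch uses double precision.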