Document (#25247)

Author
Diaz, I.
Morato, J.
Lloréns, J.
Title
¬An algorithm for term conflation based on tree structures
Source
Journal of the American Society for Information Science and Technology. 53(2002) no.3, p.199-208
Year
2002
Abstract
This work presents a new stemming algorithm. This algorithm stores the stemming information in tree structures. This storage allows us to enhance the performance of the algorithm due to the reduction of the search space and the overall complexity. The final result of that stemming algorithm is a normalized concept, understanding this process as the automatic extraction of the generic form (or a lexeme) for a selected term.
Theme
Computational linguistics
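
The abstract says only that the stemming information is stored in tree structures, which narrows the search space. As a rough illustration of that general idea, and not the authors' algorithm, the sketch below stores suffix rules in a trie keyed on reversed suffixes, so a lookup walks only the characters at the end of the word. The rule table, the `min_stem` threshold and all function names are assumptions made up for this example.

```python
class TrieNode:
    """One node of a trie over reversed suffixes."""

    def __init__(self):
        self.children = {}
        self.replacement = None  # set only where a suffix rule ends


def build_suffix_trie(rules):
    """Build a trie from a {suffix: replacement} mapping."""
    root = TrieNode()
    for suffix, replacement in rules.items():
        node = root
        for ch in reversed(suffix):
            node = node.children.setdefault(ch, TrieNode())
        node.replacement = replacement
    return root


def stem(word, root, min_stem=3):
    """Strip the longest known suffix that leaves at least min_stem characters."""
    node = root
    best = None  # (suffix length, replacement) of the longest match so far
    for depth, ch in enumerate(reversed(word), start=1):
        node = node.children.get(ch)
        if node is None:
            break  # no stored suffix continues with this character
        if node.replacement is not None and len(word) - depth >= min_stem:
            best = (depth, node.replacement)
    if best is None:
        return word
    length, replacement = best
    return word[:-length] + replacement


# Hypothetical English rules, for illustration only.
RULES = {"s": "", "es": "", "ies": "y", "ing": "", "ed": ""}
trie = build_suffix_trie(RULES)

print(stem("libraries", trie))  # library
print(stem("indexed", trie))    # index
print(stem("bus", trie))        # bus (stripping "s" would leave a stem that is too short)
```

The walk stops at the first character with no matching child, so the cost of a lookup is bounded by the longest stored suffix rather than by the number of rules.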

Similar documents (author)

  1. Diaz, P.: Multilingual tools for accessing a Spanish library catalogue (1997) 6.01
    6.010904 = sum of:
      6.010904 = weight(author_txt:diaz in 1163) [ClassicSimilarity], result of:
        6.010904 = fieldWeight in 1163, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          9.617446 = idf(docFreq=7, maxDocs=44218)
          0.625 = fieldNorm(doc=1163)
    
  2. Diaz, D.A.V.: Manejo de informacion en el sistema literario (1997) 6.01
    6.010904 = sum of:
      6.010904 = weight(author_txt:diaz in 2255) [ClassicSimilarity], result of:
        6.010904 = fieldWeight in 2255, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          9.617446 = idf(docFreq=7, maxDocs=44218)
          0.625 = fieldNorm(doc=2255)
    
  3. Diaz, I.: Semi-automatic construction of thesaurus applying domain analysis techniques (1998) 6.01
    6.010904 = sum of:
      6.010904 = weight(author_txt:diaz in 3744) [ClassicSimilarity], result of:
        6.010904 = fieldWeight in 3744, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          9.617446 = idf(docFreq=7, maxDocs=44218)
          0.625 = fieldNorm(doc=3744)
    
  4. Flynn, D.J.; Diaz, O.F.: Information modelling : an international perspective (1995) 4.81
    4.808723 = sum of:
      4.808723 = weight(author_txt:diaz in 6302) [ClassicSimilarity], result of:
        4.808723 = fieldWeight in 6302, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          9.617446 = idf(docFreq=7, maxDocs=44218)
          0.5 = fieldNorm(doc=6302)
    
  5. Diaz, I.G.; Aguilar, G.S.: Bibliometria comparada sobre tecnologia de informacion : diez anos en la base de datos ERIC (1995) 4.81
    4.808723 = sum of:
      4.808723 = weight(author_txt:diaz in 6388) [ClassicSimilarity], result of:
        4.808723 = fieldWeight in 6388, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          9.617446 = idf(docFreq=7, maxDocs=44218)
          0.5 = fieldNorm(doc=6388)
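
The author scores above can be reproduced by hand. The sketch below assumes the standard Lucene ClassicSimilarity definitions, tf = sqrt(freq) and idf = 1 + ln(maxDocs / (docFreq + 1)), and takes the fieldNorm values directly from the listing. The function names are made up for this example.

```python
import math


def idf(doc_freq, max_docs):
    """Inverse document frequency as defined by ClassicSimilarity."""
    return 1.0 + math.log(max_docs / (doc_freq + 1))


def tf(freq):
    """Term frequency factor: square root of the raw count."""
    return math.sqrt(freq)


def field_weight(freq, doc_freq, max_docs, field_norm):
    """fieldWeight = tf * idf * fieldNorm, as in the explain tree."""
    return tf(freq) * idf(doc_freq, max_docs) * field_norm


# Entries 1 to 3 (fieldNorm 0.625) and entries 4 to 5 (fieldNorm 0.5)
print(round(field_weight(1.0, 7, 44218, 0.625), 4))  # 6.0109
print(round(field_weight(1.0, 7, 44218, 0.5), 4))    # 4.8087
```

The only difference between the two groups is the fieldNorm: entries 4 and 5 list two authors, so the longer author field gets a smaller norm and a lower score for the same match.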
    

Similar documents (content)

  1. Goslin, K.; Hofmann, M.: ¬A Wikipedia powered state-based approach to automatic search query enhancement (2018) 0.21
    0.21075884 = sum of:
      0.21075884 = product of:
        0.7527101 = sum of:
          0.022485174 = weight(abstract_txt:concept in 5083) [ClassicSimilarity], result of:
            0.022485174 = score(doc=5083,freq=1.0), product of:
              0.06388035 = queryWeight, product of:
                4.505458 = idf(docFreq=1327, maxDocs=44218)
                0.014178435 = queryNorm
              0.3519889 = fieldWeight in 5083, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.505458 = idf(docFreq=1327, maxDocs=44218)
                0.078125 = fieldNorm(doc=5083)
          0.022946177 = weight(abstract_txt:understanding in 5083) [ClassicSimilarity], result of:
            0.022946177 = score(doc=5083,freq=1.0), product of:
              0.06475053 = queryWeight, product of:
                1.006788 = boost
                4.5360413 = idf(docFreq=1287, maxDocs=44218)
                0.014178435 = queryNorm
              0.35437822 = fieldWeight in 5083, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.5360413 = idf(docFreq=1287, maxDocs=44218)
                0.078125 = fieldNorm(doc=5083)
          0.034481484 = weight(abstract_txt:automatic in 5083) [ClassicSimilarity], result of:
            0.034481484 = score(doc=5083,freq=1.0), product of:
              0.08494941 = queryWeight, product of:
                1.1531786 = boost
                5.1955976 = idf(docFreq=665, maxDocs=44218)
                0.014178435 = queryNorm
              0.40590608 = fieldWeight in 5083, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.1955976 = idf(docFreq=665, maxDocs=44218)
                0.078125 = fieldNorm(doc=5083)
          0.03521738 = weight(abstract_txt:selected in 5083) [ClassicSimilarity], result of:
            0.03521738 = score(doc=5083,freq=1.0), product of:
              0.0861538 = queryWeight, product of:
                1.1613245 = boost
                5.232299 = idf(docFreq=641, maxDocs=44218)
                0.014178435 = queryNorm
              0.40877336 = fieldWeight in 5083, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.232299 = idf(docFreq=641, maxDocs=44218)
                0.078125 = fieldNorm(doc=5083)
          0.05442019 = weight(abstract_txt:term in 5083) [ClassicSimilarity], result of:
            0.05442019 = score(doc=5083,freq=1.0), product of:
              0.1450841 = queryWeight, product of:
                2.1312838 = boost
                4.8012047 = idf(docFreq=987, maxDocs=44218)
                0.014178435 = queryNorm
              0.37509412 = fieldWeight in 5083, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.8012047 = idf(docFreq=987, maxDocs=44218)
                0.078125 = fieldNorm(doc=5083)
          0.023932133 = weight(abstract_txt:this in 5083) [ClassicSimilarity], result of:
            0.023932133 = score(doc=5083,freq=3.0), product of:
              0.07329432 = queryWeight, product of:
                2.142306 = boost
                2.4130175 = idf(docFreq=10762, maxDocs=44218)
                0.014178435 = queryNorm
              0.32652098 = fieldWeight in 5083, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                2.4130175 = idf(docFreq=10762, maxDocs=44218)
                0.078125 = fieldNorm(doc=5083)
          0.5592276 = weight(abstract_txt:algorithm in 5083) [ClassicSimilarity], result of:
            0.5592276 = score(doc=5083,freq=6.0), product of:
              0.5121947 = queryWeight, product of:
                6.3316793 = boost
                5.705423 = idf(docFreq=399, maxDocs=44218)
                0.014178435 = queryNorm
              1.0918262 = fieldWeight in 5083, product of:
                2.4494898 = tf(freq=6.0), with freq of:
                  6.0 = termFreq=6.0
                5.705423 = idf(docFreq=399, maxDocs=44218)
                0.078125 = fieldNorm(doc=5083)
        0.28 = coord(7/25)
    
  2. Ahmad, F.; Yusoff, M.; Sembok, T.M.T.: Experiments with a stemming algorithm for Malay words (1996) 0.18
    0.17695369 = sum of:
      0.17695369 = product of:
        1.1059606 = sum of:
          0.029290166 = weight(abstract_txt:performance in 6504) [ClassicSimilarity], result of:
            0.029290166 = score(doc=6504,freq=1.0), product of:
              0.067473024 = queryWeight, product of:
                1.0277357 = boost
                4.63042 = idf(docFreq=1171, maxDocs=44218)
                0.014178435 = queryNorm
              0.43410188 = fieldWeight in 6504, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.63042 = idf(docFreq=1171, maxDocs=44218)
                0.09375 = fieldNorm(doc=6504)
          0.05575031 = weight(abstract_txt:enhance in 6504) [ClassicSimilarity], result of:
            0.05575031 = score(doc=6504,freq=1.0), product of:
              0.1036288 = queryWeight, product of:
                1.2736691 = boost
                5.7384624 = idf(docFreq=386, maxDocs=44218)
                0.014178435 = queryNorm
              0.53798085 = fieldWeight in 6504, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.7384624 = idf(docFreq=386, maxDocs=44218)
                0.09375 = fieldNorm(doc=6504)
          0.6334759 = weight(abstract_txt:stemming in 6504) [ClassicSimilarity], result of:
            0.6334759 = score(doc=6504,freq=3.0), product of:
              0.523764 = queryWeight, product of:
                4.9595795 = boost
                7.448392 = idf(docFreq=69, maxDocs=44218)
                0.014178435 = queryNorm
              1.2094681 = fieldWeight in 6504, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                7.448392 = idf(docFreq=69, maxDocs=44218)
                0.09375 = fieldNorm(doc=6504)
          0.3874442 = weight(abstract_txt:algorithm in 6504) [ClassicSimilarity], result of:
            0.3874442 = score(doc=6504,freq=2.0), product of:
              0.5121947 = queryWeight, product of:
                6.3316793 = boost
                5.705423 = idf(docFreq=399, maxDocs=44218)
                0.014178435 = queryNorm
              0.7564393 = fieldWeight in 6504, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                5.705423 = idf(docFreq=399, maxDocs=44218)
                0.09375 = fieldNorm(doc=6504)
        0.16 = coord(4/25)
    
  3. Khoo, C.S.G.; Wan, K.-W.: ¬A simple relevancy-ranking strategy for an interface to Boolean OPACs (2004) 0.16
    0.16058525 = sum of:
      0.16058525 = product of:
        1.0036578 = sum of:
          0.040421627 = weight(abstract_txt:overall in 2509) [ClassicSimilarity], result of:
            0.040421627 = score(doc=2509,freq=1.0), product of:
              0.094445 = queryWeight, product of:
                1.2159224 = boost
                5.478287 = idf(docFreq=501, maxDocs=44218)
                0.014178435 = queryNorm
              0.42799118 = fieldWeight in 2509, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.478287 = idf(docFreq=501, maxDocs=44218)
                0.078125 = fieldNorm(doc=2509)
          0.05442019 = weight(abstract_txt:term in 2509) [ClassicSimilarity], result of:
            0.05442019 = score(doc=2509,freq=1.0), product of:
              0.1450841 = queryWeight, product of:
                2.1312838 = boost
                4.8012047 = idf(docFreq=987, maxDocs=44218)
                0.014178435 = queryNorm
              0.37509412 = fieldWeight in 2509, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.8012047 = idf(docFreq=987, maxDocs=44218)
                0.078125 = fieldNorm(doc=2509)
          0.30478123 = weight(abstract_txt:stemming in 2509) [ClassicSimilarity], result of:
            0.30478123 = score(doc=2509,freq=1.0), product of:
              0.523764 = queryWeight, product of:
                4.9595795 = boost
                7.448392 = idf(docFreq=69, maxDocs=44218)
                0.014178435 = queryNorm
              0.5819056 = fieldWeight in 2509, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.448392 = idf(docFreq=69, maxDocs=44218)
                0.078125 = fieldNorm(doc=2509)
          0.6040348 = weight(abstract_txt:algorithm in 2509) [ClassicSimilarity], result of:
            0.6040348 = score(doc=2509,freq=7.0), product of:
              0.5121947 = queryWeight, product of:
                6.3316793 = boost
                5.705423 = idf(docFreq=399, maxDocs=44218)
                0.014178435 = queryNorm
              1.179307 = fieldWeight in 2509, product of:
                2.6457512 = tf(freq=7.0), with freq of:
                  7.0 = termFreq=7.0
                5.705423 = idf(docFreq=399, maxDocs=44218)
                0.078125 = fieldNorm(doc=2509)
        0.16 = coord(4/25)
    
  4. Greengrass, M.: Conflation methods for searching databases of Latin text (1996) 0.15
    0.15326835 = sum of:
      0.15326835 = product of:
        0.9579272 = sum of:
          0.25292116 = weight(abstract_txt:conflation in 6987) [ClassicSimilarity], result of:
            0.25292116 = score(doc=6987,freq=1.0), product of:
              0.28399175 = queryWeight, product of:
                2.1084788 = boost
                9.499662 = idf(docFreq=8, maxDocs=44218)
                0.014178435 = queryNorm
              0.89059335 = fieldWeight in 6987, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                9.499662 = idf(docFreq=8, maxDocs=44218)
                0.09375 = fieldNorm(doc=6987)
          0.06530423 = weight(abstract_txt:term in 6987) [ClassicSimilarity], result of:
            0.06530423 = score(doc=6987,freq=1.0), product of:
              0.1450841 = queryWeight, product of:
                2.1312838 = boost
                4.8012047 = idf(docFreq=987, maxDocs=44218)
                0.014178435 = queryNorm
              0.45011294 = fieldWeight in 6987, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.8012047 = idf(docFreq=987, maxDocs=44218)
                0.09375 = fieldNorm(doc=6987)
          0.36573747 = weight(abstract_txt:stemming in 6987) [ClassicSimilarity], result of:
            0.36573747 = score(doc=6987,freq=1.0), product of:
              0.523764 = queryWeight, product of:
                4.9595795 = boost
                7.448392 = idf(docFreq=69, maxDocs=44218)
                0.014178435 = queryNorm
              0.6982868 = fieldWeight in 6987, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.448392 = idf(docFreq=69, maxDocs=44218)
                0.09375 = fieldNorm(doc=6987)
          0.27396443 = weight(abstract_txt:algorithm in 6987) [ClassicSimilarity], result of:
            0.27396443 = score(doc=6987,freq=1.0), product of:
              0.5121947 = queryWeight, product of:
                6.3316793 = boost
                5.705423 = idf(docFreq=399, maxDocs=44218)
                0.014178435 = queryNorm
              0.5348834 = fieldWeight in 6987, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.705423 = idf(docFreq=399, maxDocs=44218)
                0.09375 = fieldNorm(doc=6987)
        0.16 = coord(4/25)
    
  5. Tseng, Y.-H.; Lin, C.-J.; Lin, Y.-I.: Text mining techniques for patent analysis (2007) 0.15
    0.15140629 = sum of:
      0.15140629 = product of:
        0.47314465 = sum of:
          0.026128333 = weight(abstract_txt:result in 935) [ClassicSimilarity], result of:
            0.026128333 = score(doc=935,freq=1.0), product of:
              0.0819315 = queryWeight, product of:
                1.1325095 = boost
                5.1024737 = idf(docFreq=730, maxDocs=44218)
                0.014178435 = queryNorm
              0.3189046 = fieldWeight in 935, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.1024737 = idf(docFreq=730, maxDocs=44218)
                0.0625 = fieldNorm(doc=935)
          0.03901134 = weight(abstract_txt:automatic in 935) [ClassicSimilarity], result of:
            0.03901134 = score(doc=935,freq=2.0), product of:
              0.08494941 = queryWeight, product of:
                1.1531786 = boost
                5.1955976 = idf(docFreq=665, maxDocs=44218)
                0.014178435 = queryNorm
              0.45923027 = fieldWeight in 935, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                5.1955976 = idf(docFreq=665, maxDocs=44218)
                0.0625 = fieldNorm(doc=935)
          0.06602124 = weight(abstract_txt:extraction in 935) [ClassicSimilarity], result of:
            0.06602124 = score(doc=935,freq=2.0), product of:
              0.120639324 = queryWeight, product of:
                1.3742344 = boost
                6.1915555 = idf(docFreq=245, maxDocs=44218)
                0.014178435 = queryNorm
              0.54726136 = fieldWeight in 935, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                6.1915555 = idf(docFreq=245, maxDocs=44218)
                0.0625 = fieldNorm(doc=935)
          0.048526 = weight(abstract_txt:final in 935) [ClassicSimilarity], result of:
            0.048526 = score(doc=935,freq=1.0), product of:
              0.12379205 = queryWeight, product of:
                1.3920754 = boost
                6.2719374 = idf(docFreq=226, maxDocs=44218)
                0.014178435 = queryNorm
              0.3919961 = fieldWeight in 935, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.2719374 = idf(docFreq=226, maxDocs=44218)
                0.0625 = fieldNorm(doc=935)
          0.051646214 = weight(abstract_txt:generic in 935) [ClassicSimilarity], result of:
            0.051646214 = score(doc=935,freq=1.0), product of:
              0.1290433 = queryWeight, product of:
                1.4212946 = boost
                6.4035826 = idf(docFreq=198, maxDocs=44218)
                0.014178435 = queryNorm
              0.4002239 = fieldWeight in 935, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.4035826 = idf(docFreq=198, maxDocs=44218)
                0.0625 = fieldNorm(doc=935)
          0.043536153 = weight(abstract_txt:term in 935) [ClassicSimilarity], result of:
            0.043536153 = score(doc=935,freq=1.0), product of:
              0.1450841 = queryWeight, product of:
                2.1312838 = boost
                4.8012047 = idf(docFreq=987, maxDocs=44218)
                0.014178435 = queryNorm
              0.3000753 = fieldWeight in 935, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.8012047 = idf(docFreq=987, maxDocs=44218)
                0.0625 = fieldNorm(doc=935)
          0.015632406 = weight(abstract_txt:this in 935) [ClassicSimilarity], result of:
            0.015632406 = score(doc=935,freq=2.0), product of:
              0.07329432 = queryWeight, product of:
                2.142306 = boost
                2.4130175 = idf(docFreq=10762, maxDocs=44218)
                0.014178435 = queryNorm
              0.21328263 = fieldWeight in 935, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                2.4130175 = idf(docFreq=10762, maxDocs=44218)
                0.0625 = fieldNorm(doc=935)
          0.18264295 = weight(abstract_txt:algorithm in 935) [ClassicSimilarity], result of:
            0.18264295 = score(doc=935,freq=1.0), product of:
              0.5121947 = queryWeight, product of:
                6.3316793 = boost
                5.705423 = idf(docFreq=399, maxDocs=44218)
                0.014178435 = queryNorm
              0.35658893 = fieldWeight in 935, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.705423 = idf(docFreq=399, maxDocs=44218)
                0.0625 = fieldNorm(doc=935)
        0.32 = coord(8/25)
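
The content scores combine the same factors with a query-side weight and a coordination factor. As a check, the sketch below recomputes the score of entry 2 (doc 6504) from the boost, docFreq, freq, fieldNorm and queryNorm values shown in its explain tree, assuming the standard ClassicSimilarity formulas. The function and variable names are made up for this example.

```python
import math

MAX_DOCS = 44218
QUERY_NORM = 0.014178435  # queryNorm as shown in the explain output


def idf(doc_freq, max_docs=MAX_DOCS):
    """Inverse document frequency as defined by ClassicSimilarity."""
    return 1.0 + math.log(max_docs / (doc_freq + 1))


def term_score(boost, doc_freq, freq, field_norm):
    """score = queryWeight * fieldWeight for one matching term."""
    query_weight = boost * idf(doc_freq) * QUERY_NORM
    field_weight = math.sqrt(freq) * idf(doc_freq) * field_norm
    return query_weight * field_weight


def document_score(terms, total_query_terms, field_norm):
    """Sum the term scores, then scale by coord = matched / total query terms."""
    total = sum(term_score(boost, df, freq, field_norm) for boost, df, freq in terms)
    return total * len(terms) / total_query_terms


# (boost, docFreq, freq) for the four terms matched in doc 6504
doc_6504 = [
    (1.0277357, 1171, 1.0),  # performance
    (1.2736691, 386, 1.0),   # enhance
    (4.9595795, 69, 3.0),    # stemming
    (6.3316793, 399, 2.0),   # algorithm
]

print(round(document_score(doc_6504, 25, 0.09375), 4))  # 0.177
```

The coord factor explains the ranking: entry 2 has a higher raw sum than entry 1 (1.1059606 versus 0.7527101), but it matches only 4 of the 25 query terms against 7 for entry 1, so its final score is lower.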