Search (8 results, page 1 of 1)

  • author_ss:"Moya-Anegón, F. de"
  • language_ss:"e"
  • year_i:[2000 TO 2010}
  1. Galvez, C.; Moya-Anegón, F. de; Solana, V.H.: Term conflation methods in information retrieval : non-linguistic and linguistic approaches (2005) 0.02
    0.015545678 = product of:
      0.0725465 = sum of:
        0.03856498 = weight(_text_:wide in 4394) [ClassicSimilarity], result of:
          0.03856498 = score(doc=4394,freq=2.0), product of:
            0.1312982 = queryWeight, product of:
              4.4307585 = idf(docFreq=1430, maxDocs=44218)
              0.029633347 = queryNorm
            0.29372054 = fieldWeight in 4394, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              4.4307585 = idf(docFreq=1430, maxDocs=44218)
              0.046875 = fieldNorm(doc=4394)
        0.00856136 = weight(_text_:information in 4394) [ClassicSimilarity], result of:
          0.00856136 = score(doc=4394,freq=4.0), product of:
            0.052020688 = queryWeight, product of:
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.029633347 = queryNorm
            0.16457605 = fieldWeight in 4394, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.046875 = fieldNorm(doc=4394)
        0.025420163 = weight(_text_:retrieval in 4394) [ClassicSimilarity], result of:
          0.025420163 = score(doc=4394,freq=4.0), product of:
            0.08963835 = queryWeight, product of:
              3.024915 = idf(docFreq=5836, maxDocs=44218)
              0.029633347 = queryNorm
            0.2835858 = fieldWeight in 4394, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              3.024915 = idf(docFreq=5836, maxDocs=44218)
              0.046875 = fieldNorm(doc=4394)
      0.21428572 = coord(3/14)
    
    Abstract
    Purpose - To propose a categorization of the different conflation procedures into the two basic approaches, non-linguistic and linguistic techniques, and to justify the application of normalization methods within the framework of linguistic techniques. Design/methodology/approach - Presents a range of term conflation methods that can be used in information retrieval. Uniterm and multiterm variants can be considered equivalent units for the purposes of automatic indexing. Stemming algorithms, segmentation rules, association measures and clustering techniques are well-evaluated non-linguistic methods, and experiments with these techniques show a wide variety of results. Alternatively, lemmatisation and syntactic pattern-matching, through equivalence relations represented in finite-state transducers (FSTs), are emerging methods for the recognition and standardization of terms. Findings - The survey attempts to point out the positive and negative effects of the linguistic approach and its potential as a term conflation method. Originality/value - Outlines the importance of FSTs for the normalization of term variants.
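    The contrast the abstract draws is easy to make concrete. The following is not the authors' system, only a toy comparison of the two approaches using NLTK's Porter stemmer (non-linguistic) and WordNet lemmatizer (linguistic); the word list is invented for illustration.
    ```python
    # Minimal sketch: stemming vs. lemmatization as term conflation.
    # Requires: pip install nltk; then nltk.download('wordnet') once.
    from nltk.stem import PorterStemmer, WordNetLemmatizer

    stemmer = PorterStemmer()
    lemmatizer = WordNetLemmatizer()

    variants = ["retrieval", "retrieving", "retrieved", "retrieves"]

    # Stemming truncates by rule, without a dictionary: all four variants
    # conflate to the artificial stem "retriev".
    print([stemmer.stem(w) for w in variants])

    # Lemmatization maps forms to canonical dictionary entries: the three
    # verb forms become "retrieve", while the noun "retrieval" is left
    # unanalysed - an instance of the underanalysis noted in entry 6 below.
    print([lemmatizer.lemmatize(w, pos="v") for w in variants])
    ```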
  2. Faba-Pérez, C.; Zapico-Alonso, F.; Guerrero-Bote, V.P.; Moya-Anegón, F. de: Comparative analysis of webometric measurements in thematic environments (2005) 0.01
    0.008400954 = product of:
      0.05880668 = sum of:
        0.048818428 = weight(_text_:web in 3554) [ClassicSimilarity], result of:
          0.048818428 = score(doc=3554,freq=8.0), product of:
            0.09670874 = queryWeight, product of:
              3.2635105 = idf(docFreq=4597, maxDocs=44218)
              0.029633347 = queryNorm
            0.50479853 = fieldWeight in 3554, product of:
              2.828427 = tf(freq=8.0), with freq of:
                8.0 = termFreq=8.0
              3.2635105 = idf(docFreq=4597, maxDocs=44218)
              0.0546875 = fieldNorm(doc=3554)
        0.009988253 = weight(_text_:information in 3554) [ClassicSimilarity], result of:
          0.009988253 = score(doc=3554,freq=4.0), product of:
            0.052020688 = queryWeight, product of:
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.029633347 = queryNorm
            0.1920054 = fieldWeight in 3554, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.0546875 = fieldNorm(doc=3554)
      0.14285715 = coord(2/14)
    
    Abstract
    There have been many attempts to evaluate Web spaces on the basis of the information that they provide, their form or functionality, or even the importance given to each of them by the Web itself. The indicators that have been developed for this purpose fall into two groups: those based on the study of a Web space's formal characteristics, and those related to its link structure. In this study we examine most of the webometric indicators that have been proposed in the literature, together with others of our own design, by applying them to a set of thematically related Web spaces and analyzing the relationships between the different indicators.
    Source
    Journal of the American Society for Information Science and Technology. 56(2005) no.8, S.779-785
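    As a rough illustration of the two indicator families the abstract distinguishes, the sketch below computes one formal indicator (page count) and two link-structure indicators (in-links and a crude impact ratio) over an invented graph of Web spaces; it is not the paper's indicator set.
    ```python
    # Toy webometric indicators over a hypothetical link graph.
    import networkx as nx

    # Invented links between four thematically related Web spaces.
    links = [("A", "B"), ("A", "C"), ("B", "C"), ("D", "C"), ("C", "A")]
    g = nx.DiGraph(links)

    pages = {"A": 120, "B": 45, "C": 300, "D": 10}   # formal indicator: size

    for site in sorted(g.nodes):
        inlinks = g.in_degree(site)                  # link-structure indicator
        wif = inlinks / pages[site]                  # crude impact ratio
        print(f"{site}: pages={pages[site]}, inlinks={inlinks}, ratio={wif:.4f}")
    ```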
  3. Guerrero Bote, V.P.; López-Pujalte, C.; Faba, C.; Reyes, M.J.; Zapica, F.; Moya-Anegón, F. de: Artificial neural networks applied to information retrieval (2003) 0.00
    0.003790876 = product of:
      0.02653613 = sum of:
        0.00856136 = weight(_text_:information in 2780) [ClassicSimilarity], result of:
          0.00856136 = score(doc=2780,freq=4.0), product of:
            0.052020688 = queryWeight, product of:
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.029633347 = queryNorm
            0.16457605 = fieldWeight in 2780, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.046875 = fieldNorm(doc=2780)
        0.01797477 = weight(_text_:retrieval in 2780) [ClassicSimilarity], result of:
          0.01797477 = score(doc=2780,freq=2.0), product of:
            0.08963835 = queryWeight, product of:
              3.024915 = idf(docFreq=5836, maxDocs=44218)
              0.029633347 = queryNorm
            0.20052543 = fieldWeight in 2780, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.024915 = idf(docFreq=5836, maxDocs=44218)
              0.046875 = fieldNorm(doc=2780)
      0.14285715 = coord(2/14)
    
    Abstract
    Connectionist models, or neural networks, are a type of AI technique based on small interconnected processing nodes whose collective behaviour is intelligent. They have very broad utility. In IR, they have been used for filtering information, query expansion, relevance feedback, clustering terms or documents, the topological organization of documents, labeling groups of documents, interface design, reduction of document dimensionality, the classification of terms in a brainstorming session, etc. The present work is a fairly exhaustive study and classification of the application of this type of technique to IR. For this purpose, we focus on the main publications in the area of IR and neural networks, as well as on some applications of our own design.
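    One of the connectionist uses listed above, query expansion, can be sketched with a classic spreading-activation pass over a term-document network. The matrix, vocabulary, and weights below are invented for illustration and are not taken from the paper.
    ```python
    # Spreading activation over a term-document network for query expansion.
    import numpy as np

    terms = ["neural", "network", "retrieval", "feedback", "clustering"]
    # Rows = terms, columns = documents; assumed tf-idf-like weights.
    W = np.array([[0.8, 0.0, 0.4],
                  [0.7, 0.1, 0.3],
                  [0.2, 0.9, 0.6],
                  [0.0, 0.8, 0.1],
                  [0.1, 0.0, 0.7]])

    query = np.array([1.0, 1.0, 0.0, 0.0, 0.0])  # activate "neural", "network"

    doc_act = W.T @ query    # forward pass: query terms activate documents
    term_act = W @ doc_act   # backward pass: documents activate further terms

    # Terms that gained activation beyond the original query are expansion
    # candidates; documents would be ranked by doc_act.
    for t, a in sorted(zip(terms, term_act), key=lambda x: -x[1]):
        print(f"{t}: {a:.3f}")
    ```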
  4. Lopez-Pujalte, C.; Guerrero Bote, V.P.; Moya-Anegón, F. de: Evaluation of the application of genetic algorithms to relevance feedback (2003) 0.00
    0.0028605436 = product of:
      0.020023804 = sum of:
        0.0050448296 = weight(_text_:information in 2756) [ClassicSimilarity], result of:
          0.0050448296 = score(doc=2756,freq=2.0), product of:
            0.052020688 = queryWeight, product of:
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.029633347 = queryNorm
            0.09697737 = fieldWeight in 2756, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.0390625 = fieldNorm(doc=2756)
        0.014978974 = weight(_text_:retrieval in 2756) [ClassicSimilarity], result of:
          0.014978974 = score(doc=2756,freq=2.0), product of:
            0.08963835 = queryWeight, product of:
              3.024915 = idf(docFreq=5836, maxDocs=44218)
              0.029633347 = queryNorm
            0.16710453 = fieldWeight in 2756, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.024915 = idf(docFreq=5836, maxDocs=44218)
              0.0390625 = fieldNorm(doc=2756)
      0.14285715 = coord(2/14)
    
    Abstract
    We evaluated the different genetic algorithms applied to relevance feedback that are to be found in the literature and which follow the vector space model (the most commonly used model in this type of application). They were compared with a traditional relevance feedback algorithm - the Ide dec-hi method - since this had given the best results in the study by Salton & Buckley (1990) on this subject. The experiment was performed on the Cranfield collection, and the different algorithms were evaluated using the residual collection method (one of the most suitable methods for evaluating relevance feedback techniques). The results varied greatly depending on the fitness function that was used, from no improvement with some of the genetic algorithms to a more than 127% improvement with one algorithm, surpassing even the traditional Ide dec-hi method. One can therefore conclude that genetic algorithms show great promise as an aid to implementing a truly effective information retrieval system.
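    The Ide dec-hi baseline used above has a simple vector-space form: the new query is the old query plus the sum of the judged relevant document vectors, minus the highest-ranked non-relevant document. A minimal sketch with invented vectors:
    ```python
    # Ide dec-hi relevance feedback (vector space model); toy vectors.
    import numpy as np

    q = np.array([1.0, 0.5, 0.0, 0.0])               # original query
    relevant = [np.array([0.9, 0.4, 0.1, 0.0]),      # judged relevant docs
                np.array([0.8, 0.6, 0.0, 0.2])]
    top_nonrelevant = np.array([0.1, 0.0, 0.9, 0.7]) # top-ranked non-relevant

    q_new = q + sum(relevant) - top_nonrelevant
    print(q_new)  # feedback query used for the next retrieval run
    ```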
  5. López-Pujalte, C.; Guerrero-Bote, V.P.; Moya-Anegón, F. de: Order-based fitness functions for genetic algorithms applied to relevance feedback (2003) 0.00
    0.0028605436 = product of:
      0.020023804 = sum of:
        0.0050448296 = weight(_text_:information in 5154) [ClassicSimilarity], result of:
          0.0050448296 = score(doc=5154,freq=2.0), product of:
            0.052020688 = queryWeight, product of:
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.029633347 = queryNorm
            0.09697737 = fieldWeight in 5154, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.0390625 = fieldNorm(doc=5154)
        0.014978974 = weight(_text_:retrieval in 5154) [ClassicSimilarity], result of:
          0.014978974 = score(doc=5154,freq=2.0), product of:
            0.08963835 = queryWeight, product of:
              3.024915 = idf(docFreq=5836, maxDocs=44218)
              0.029633347 = queryNorm
            0.16710453 = fieldWeight in 5154, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.024915 = idf(docFreq=5836, maxDocs=44218)
              0.0390625 = fieldNorm(doc=5154)
      0.14285715 = coord(2/14)
    
    Abstract
    Lopez-Pujalte and Guerrero-Bote test a relevance feedback genetic algorithm while varying its order-based fitness functions, generating a function based upon the Ide dec-hi method as a baseline. Using the non-zero weighted term types assigned to the query, and to the initially retrieved set of documents, as genes, a chromosome of equal length is created for each. The algorithm is provided with the chromosomes for judged relevant documents, for judged irrelevant documents, and for the irrelevant documents with their terms negated. The algorithm uses random selection of all possible genes, but gives greater likelihood to those with higher fitness values. When the fittest chromosome of a previous population is eliminated, it is restored while the least fit of the new population is eliminated in its stead. A crossover probability of .8 and a mutation probability of .2 were used with 20 generations. Three fitness functions were utilized: the Horng and Yeh function, which takes into account the position of relevant documents, and two new functions, one based on accumulating the cosine similarity for retrieved documents, the other on stored fixed-recall-interval precisions. The Cranfield collection was used with the first 15 documents retrieved from 33 queries chosen to have at least 3 relevant documents in the first 15 and at least 5 relevant documents not initially retrieved. Precision was calculated at fixed recall levels using the residual collection method, which removes viewed documents. One of the three functions improved the original retrieval by 127 percent, while the Ide dec-hi method provided a 120 percent improvement.
    Source
    Journal of the American Society for Information Science and technology. 54(2003) no.2, S.152-160
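    The parameters reported in the abstract (crossover .8, mutation .2, 20 generations, elitism) are enough to sketch the general shape of such a GA. The sketch below is only in the spirit of the paper: the fitness function loosely follows the cosine-accumulation variant, and the documents and population are randomly invented.
    ```python
    # Compact GA for relevance feedback, following the reported setup.
    import numpy as np

    rng = np.random.default_rng(0)
    DIM, POP, GENS = 8, 20, 20
    relevant = rng.random((3, DIM))              # judged relevant doc vectors

    def fitness(chrom):
        # Accumulated cosine similarity between the chromosome (a candidate
        # feedback query) and the judged relevant documents.
        sims = relevant @ chrom / (np.linalg.norm(relevant, axis=1)
                                   * np.linalg.norm(chrom) + 1e-9)
        return sims.sum()

    pop = rng.random((POP, DIM))
    for _ in range(GENS):
        fit = np.array([fitness(c) for c in pop])
        best = pop[fit.argmax()].copy()          # elitism: remember the fittest
        # Fitness-proportional selection of parent pairs.
        parents = pop[rng.choice(POP, size=(POP, 2), p=fit / fit.sum())]
        children = []
        for a, b in parents:
            child = a.copy()
            if rng.random() < 0.8:               # one-point crossover, p = .8
                cut = rng.integers(1, DIM)
                child[cut:] = b[cut:]
            if rng.random() < 0.2:               # mutation, p = .2
                child[rng.integers(DIM)] = rng.random()
            children.append(child)
        pop = np.array(children)
        new_fit = np.array([fitness(c) for c in pop])
        pop[new_fit.argmin()] = best             # restore best over least fit

    fit = np.array([fitness(c) for c in pop])
    print("fittest feedback query:", np.round(pop[fit.argmax()], 3))
    ```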
  6. Galvez, C.; Moya-Anegón, F. de: ¬An evaluation of conflation accuracy using finite-state transducers (2006) 0.00
    0.0022238013 = product of:
      0.031133216 = sum of:
        0.031133216 = weight(_text_:retrieval in 5599) [ClassicSimilarity], result of:
          0.031133216 = score(doc=5599,freq=6.0), product of:
            0.08963835 = queryWeight, product of:
              3.024915 = idf(docFreq=5836, maxDocs=44218)
              0.029633347 = queryNorm
            0.34732026 = fieldWeight in 5599, product of:
              2.4494898 = tf(freq=6.0), with freq of:
                6.0 = termFreq=6.0
              3.024915 = idf(docFreq=5836, maxDocs=44218)
              0.046875 = fieldNorm(doc=5599)
      0.071428575 = coord(1/14)
    
    Abstract
    Purpose - To evaluate the accuracy of conflation methods based on finite-state transducers (FSTs). Design/methodology/approach - Incorrectly lemmatized and stemmed forms may lead to the retrieval of inappropriate documents. Experimental studies to date have focused on retrieval performance, but very few on conflation performance. The process of normalization we used involved a linguistic toolbox that allowed us to construct, through graphic interfaces, electronic dictionaries represented internally by FSTs. The lexical resources developed were applied to a Spanish test corpus for merging term variants in canonical lemmatized forms. Conflation performance was evaluated in terms of an adaptation of recall and precision measures, based on accuracy and coverage, not actual retrieval. The results were compared with those obtained using a Spanish version of the Porter algorithm. Findings - The conclusion is that the main strength of lemmatization is its accuracy, whereas its main limitation is the underanalysis of variant forms. Originality/value - The report outlines the potential of transducers in their application to normalization processes.
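    The exact recall/precision adaptation is the paper's own; the sketch below uses one plausible reading (accuracy = correct conflations among the forms the system analysed, coverage = analysed forms among all test forms) on invented toy data.
    ```python
    # Toy conflation-performance evaluation: accuracy vs. coverage.
    gold = {"niños": "niño", "cantaba": "cantar", "mejores": "mejor"}
    system = {"niños": "niño", "cantaba": "cantar"}  # "mejores" unanalysed

    analysed = [w for w in gold if w in system]
    correct = [w for w in analysed if system[w] == gold[w]]

    accuracy = len(correct) / len(analysed)   # strength of lemmatization
    coverage = len(analysed) / len(gold)      # underanalysis shows up here
    print(f"accuracy={accuracy:.2f}, coverage={coverage:.2f}")
    ```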
  7. Moya-Anegón, F. de; Vargas-Quesada, B.; Chinchilla-Rodríguez, Z.; Corera-Álvarez, E.; Munoz-Fernández, F.J.; Herrero-Solana, V.; SCImago Group: Visualizing the marrow of science (2007) 0.00
    7.134467E-4 = product of:
      0.009988253 = sum of:
        0.009988253 = weight(_text_:information in 1313) [ClassicSimilarity], result of:
          0.009988253 = score(doc=1313,freq=4.0), product of:
            0.052020688 = queryWeight, product of:
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.029633347 = queryNorm
            0.1920054 = fieldWeight in 1313, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.0546875 = fieldNorm(doc=1313)
      0.071428575 = coord(1/14)
    
    Abstract
    This study proposes a new methodology that allows for the generation of scientograms of major scientific domains, constructed on the basis of cocitation of Institute of Scientific Information categories, and pruned using PathfinderNetwork, with a layout determined by algorithms of the spring-embedder type (Kamada-Kawai), then corroborated structurally by factor analysis. We present the complete scientogram of the world for the year 2002. It integrates the natural sciences, the social sciences, and arts and humanities. Its basic structure and the essential relationships therein are revealed, allowing us to simultaneously analyze the macrostructure, microstructure, and marrow of worldwide scientific output.
    Source
    Journal of the American Society for Information Science and Technology. 58(2007) no.14, S.2167-2179
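    The pipeline the abstract describes can be approximated with standard tools. In the sketch below the cocitation matrix is invented, and a maximum spanning tree stands in for Pathfinder pruning (for the common parameter choice PFNET(r=∞, q=n−1) on similarity data, the Pathfinder network is the union of maximum spanning trees); networkx's Kamada-Kawai layout plays the spring-embedder role.
    ```python
    # Sketch: prune an invented cocitation network, then lay it out with
    # the Kamada-Kawai spring-embedder.
    import networkx as nx

    cocitation = {("Physics", "Chemistry"): 40, ("Chemistry", "Biology"): 35,
                  ("Biology", "Medicine"): 50, ("Physics", "Medicine"): 5,
                  ("Physics", "Biology"): 8}

    g = nx.Graph()
    for (a, b), w in cocitation.items():
        g.add_edge(a, b, weight=w)

    # Maximum spanning tree as a simplified stand-in for Pathfinder pruning.
    pruned = nx.maximum_spanning_tree(g, weight="weight")
    pos = nx.kamada_kawai_layout(pruned)   # spring-embedder-type layout
    for node, (x, y) in pos.items():
        print(f"{node}: ({x:.2f}, {y:.2f})")
    ```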
  8. López-Pujalte, C.; Guerrero-Bote, V.P.; Moya-Anegón, F. de: Genetic algorithms in relevance feedback : a second test and new contributions (2003) 0.00
    5.04483E-4 = product of:
      0.0070627616 = sum of:
        0.0070627616 = weight(_text_:information in 1076) [ClassicSimilarity], result of:
          0.0070627616 = score(doc=1076,freq=2.0), product of:
            0.052020688 = queryWeight, product of:
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.029633347 = queryNorm
            0.13576832 = fieldWeight in 1076, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.0546875 = fieldNorm(doc=1076)
      0.071428575 = coord(1/14)
    
    Source
    Information processing and management. 39(2003) no.5, S.669-687