Search (49 results, page 1 of 3)

  • Filter: theme_ss:"Semantisches Umfeld in Indexierung u. Retrieval"
  1. Lund, K.; Burgess, C.; Atchley, R.A.: Semantic and associative priming in high-dimensional semantic space (1995) 0.11
    Score explanation (Lucene ClassicSimilarity): the term "word" in doc 2151 contributes 0.1615221 = queryWeight 0.28165168 (idf 5.2432623 for docFreq 634 of maxDocs 44218, × queryNorm 0.05371688) × fieldWeight 0.5734818 (tf 2.0 for freq 4.0, × idf 5.2432623 × fieldNorm 0.0546875); the term "22" contributes 0.050945267 analogously (freq 2.0, idf 3.5018296 for docFreq 3622); coord(1/2) halves their sum: 0.5 × (0.1615221 + 0.050945267) = 0.106233686.
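    A minimal sketch reproducing this arithmetic, assuming the standard ClassicSimilarity definitions tf = sqrt(freq) and idf = 1 + ln(maxDocs / (docFreq + 1)); the function name is illustrative:

    ```python
    import math

    def term_score(freq, doc_freq, max_docs, field_norm, query_norm):
        """One term's contribution under Lucene ClassicSimilarity (tf-idf)."""
        tf = math.sqrt(freq)
        idf = 1.0 + math.log(max_docs / (doc_freq + 1))
        query_weight = idf * query_norm        # queryWeight = idf x queryNorm
        field_weight = tf * idf * field_norm   # fieldWeight = tf x idf x fieldNorm
        return query_weight * field_weight

    word = term_score(4.0, 634, 44218, 0.0546875, 0.05371688)   # ~0.16152
    t22 = term_score(2.0, 3622, 44218, 0.0546875, 0.05371688)   # ~0.05095
    print(round(0.5 * (word + t22), 6))   # coord(1/2) factor -> ~0.106234
    ```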
    Abstract
    We present a model of semantic memory that utilizes a high-dimensional semantic space constructed from a co-occurrence matrix. This matrix was formed by analyzing a 160 million word corpus. Word vectors were then obtained by extracting rows and columns of this matrix. These vectors were subjected to multidimensional scaling. Words were found to cluster semantically, suggesting that interword distance may be interpretable as a measure of semantic similarity. In attempting to replicate with our simulation the semantic and ...
    Source
    Proceedings of the Seventeenth Annual Conference of the Cognitive Science Society: July 22 - 25, 1995, University of Pittsburgh / ed. by Johanna D. Moore and Jill Fain Lehman
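    A toy sketch of the HAL-style construction described in the abstract above (distance-weighted sliding-window co-occurrence counts; a word's vector is its matrix row concatenated with its column). All names and the windowing details are illustrative assumptions, not the authors' code:

    ```python
    import numpy as np

    def hal_vectors(tokens, vocab, window=5):
        """Toy HAL-style space from distance-weighted co-occurrence counts."""
        index = {w: i for i, w in enumerate(vocab)}
        M = np.zeros((len(vocab), len(vocab)))
        for i, w in enumerate(tokens):
            if w not in index:
                continue
            for d in range(1, window + 1):      # words to the left; nearer = heavier
                if i - d < 0:
                    break
                c = tokens[i - d]
                if c in index:
                    M[index[w], index[c]] += window + 1 - d
        # Row = left contexts, column = right contexts; concatenate both
        return {w: np.concatenate([M[index[w]], M[:, index[w]]]) for w in vocab}

    vecs = hal_vectors("the cat sat on the mat".split(), ["cat", "sat", "mat", "the", "on"])
    ```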
  2. Landauer, T.K.; Foltz, P.W.; Laham, D.: ¬An introduction to Latent Semantic Analysis (1998) 0.06
    Abstract
    Latent Semantic Analysis (LSA) is a theory and method for extracting and representing the contextual-usage meaning of words by statistical computations applied to a large corpus of text (Landauer and Dumais, 1997). The underlying idea is that the aggregate of all the word contexts in which a given word does and does not appear provides a set of mutual constraints that largely determines the similarity of meaning of words and sets of words to each other. The adequacy of LSA's reflection of human knowledge has been established in a variety of ways. For example, its scores overlap those of humans on standard vocabulary and subject matter tests; it mimics human word sorting and category judgments; it simulates word-word and passage-word lexical priming data; and as reported in 3 following articles in this issue, it accurately estimates passage coherence, learnability of passages by individual students, and the quality and quantity of knowledge contained in an essay.
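    At its core, LSA derives term (and passage) vectors from a truncated singular value decomposition of a term-document matrix. A minimal sketch with the usual weighting and preprocessing omitted; names are illustrative:

    ```python
    import numpy as np

    def lsa_term_vectors(term_doc, k):
        """Toy LSA: k-dimensional term vectors via truncated SVD."""
        U, s, Vt = np.linalg.svd(term_doc, full_matrices=False)
        return U[:, :k] * s[:k]

    def cosine(u, v):
        """Similarity of meaning as closeness of direction in the reduced space."""
        return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))
    ```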
  3. Gauch, S.; Chong, M.K.: Automatic word similarity detection for TREC 4 query expansion (1996) 0.05
  4. Lee, Y.-Y.; Ke, H.; Yen, T.-Y.; Huang, H.-H.; Chen, H.-H.: Combining and learning word embedding with WordNet for semantic relatedness and similarity measurement (2020) 0.05
    Abstract
    In this research, we propose 3 different approaches to measure the semantic relatedness between 2 words: (i) boost the performance of the GloVe word embedding model by removing or transforming abnormal dimensions; (ii) linearly combine the information extracted from WordNet and word embeddings; and (iii) utilize word embeddings and 12 kinds of linguistic information extracted from WordNet as features for Support Vector Regression. We conducted our experiments on 8 benchmark data sets and computed Spearman correlations between the outputs of our methods and the ground truth. We report our results together with 3 state-of-the-art approaches. The experimental results show that our method outperforms the state-of-the-art approaches on all the selected English benchmark data sets.
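    Approach (ii), the linear combination, can be pictured with a short hedged sketch; emb_sim and wn_sim stand for any embedding-based and WordNet-based similarity functions, and alpha is an assumed tunable weight:

    ```python
    def combined_relatedness(w1, w2, emb_sim, wn_sim, alpha=0.5):
        """Linear mix of an embedding similarity and a WordNet similarity."""
        return alpha * emb_sim(w1, w2) + (1.0 - alpha) * wn_sim(w1, w2)
    ```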
  5. Harman, D.: Automatic indexing (1994) 0.05
    Content
    Contains the sections: What constitutes a record; What constitutes a word and what 'words' to index; Use of stop lists; Use of suffixing or stemming; Advanced automatic indexing techniques (term weighting, query expansion, the use of multiple-word phrases for indexing)
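    The first stages listed above (tokenization, stop lists, stemming/suffixing) can be sketched in a few lines; the stop list and the suffix-stripping "stemmer" here are toy stand-ins:

    ```python
    def index_tokens(text, stopwords, stem):
        """Tokenize, drop stop-list words, then stem: basic automatic indexing."""
        return [stem(t) for t in text.lower().split() if t not in stopwords]

    tokens = index_tokens("The indexers were indexing records",
                          stopwords={"the", "were"},
                          stem=lambda t: t.rstrip("s"))
    print(tokens)  # ['indexer', 'indexing', 'record']
    ```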
  6. Fernández-Reyes, F.C.; Hermosillo-Valadez, J.; Montes-y-Gómez, M.: ¬A prospect-guided global query expansion strategy using word embeddings (2018) 0.05
    Abstract
    The effectiveness of query expansion methods depends essentially on identifying good candidates, or prospects, semantically related to the query terms. Word embeddings have been used recently in an attempt to address this problem. Nevertheless, query disambiguation is still necessary: the semantic relatedness of each word in the corpus is modeled, but choosing the right expansion terms from the standpoint of the un-modeled query semantics remains an open issue. In this paper we propose a novel query expansion method using word embeddings that models the global query semantics from the standpoint of prospect vocabulary terms. The proposed method makes it possible to explore query-vocabulary semantic closeness in such a way that new terms, semantically related to more relevant topics, are elicited and added as a function of the query as a whole. The method includes candidate pooling strategies that address disambiguation issues without using exogenous resources. We tested our method with three topic sets over CLEF corpora, comparing it across different Information Retrieval models and against another expansion technique that also uses word embeddings. Our experiments indicate that our method achieves significant results that outperform the baselines, improving both recall and precision metrics without relevance feedback.
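    A minimal sketch of the global idea, ranking vocabulary terms by closeness to the query as a whole (here simply the mean of the query term vectors); the paper's candidate pooling strategies are not reproduced, and all names are illustrative:

    ```python
    import numpy as np

    def expand_query(query_terms, vectors, k=5):
        """Pick the k vocabulary words whose embeddings are closest to the whole query."""
        q = np.mean([vectors[t] for t in query_terms if t in vectors], axis=0)
        q = q / np.linalg.norm(q)
        scored = [(float(q @ (v / np.linalg.norm(v))), w)
                  for w, v in vectors.items() if w not in query_terms]
        return [w for _, w in sorted(scored, reverse=True)[:k]]
    ```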
  7. Lund, K.; Burgess, C.: Producing high-dimensional semantic spaces from lexical co-occurrence (1996) 0.04
    Abstract
    A procedure is presented that processes a corpus of text and produces, for each word, a numeric vector containing information about its meaning. This procedure is applied to a large corpus of natural language text taken from Usenet, and the resulting vectors are examined to determine what information is contained within them. These vectors provide the coordinates in a high-dimensional space in which word relationships can be analyzed. Analyses of both vector similarity and multidimensional scaling demonstrate that there is significant semantic information carried in the vectors. A comparison of vector similarity with human reaction times in a single-word priming experiment is presented. These vectors provide the basis for a representational model of semantic memory, the hyperspace analogue to language (HAL).
  8. Symonds, M.; Bruza, P.; Zuccon, G.; Koopman, B.; Sitbon, L.; Turner, I.: Automatic query expansion : a structural linguistic perspective (2014) 0.04
    Abstract
    A user's query is considered to be an imprecise description of their information need. Automatic query expansion is the process of reformulating the original query with the goal of improving retrieval effectiveness. Many successful query expansion techniques model syntagmatic associations, inferred when two terms co-occur more often than by chance in natural language. However, structural linguistics relies on both syntagmatic and paradigmatic associations to deduce the meaning of a word. Given the success of dependency-based approaches to query expansion and the reliance on word meanings in the query formulation process, we argue that modeling both syntagmatic and paradigmatic information in the query expansion process improves retrieval effectiveness. This article develops and evaluates a new query expansion technique that is based on a formal, corpus-based model of word meaning capturing syntagmatic and paradigmatic associations. We demonstrate that when sufficient statistical information exists, as in the case of longer queries, including paradigmatic information alone provides significant improvements in retrieval effectiveness across a wide variety of data sets. More generally, when our new query expansion approach is applied to large-scale web retrieval it demonstrates significant improvements in retrieval effectiveness over a strong baseline system based on a commercial search engine.
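    The two kinds of association can be sketched directly: a syntagmatic association holds between terms that co-occur more often than chance (positive PMI), while a paradigmatic association holds between words with similar context distributions. A hedged sketch, not the paper's formal model:

    ```python
    import math
    import numpy as np

    def syntagmatic_pmi(w1, w2, cooc, total):
        """PMI from a dict-of-dicts of co-occurrence counts; > 0 means above chance."""
        p12 = cooc[w1].get(w2, 0) / total
        p1 = sum(cooc[w1].values()) / total
        p2 = sum(cooc[w2].values()) / total
        return math.log(p12 / (p1 * p2)) if p12 > 0 else float("-inf")

    def paradigmatic_sim(v1, v2):
        """Cosine of two words' co-occurrence vectors: similar contexts, similar meaning."""
        return float(v1 @ v2 / (np.linalg.norm(v1) * np.linalg.norm(v2)))
    ```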
  9. Järvelin, A.; Keskustalo, H.; Sormunen, E.; Saastamoinen, M.; Kettunen, K.: Information retrieval from historical newspaper collections in highly inflectional languages : a query expansion approach (2016) 0.04
    Abstract
    The aim of the study was to test whether query expansion by approximate string matching methods is beneficial in retrieval from historical newspaper collections in a language rich with compounds and inflectional forms (Finnish). First, approximate string matching methods were used to generate lists of index words most similar to contemporary query terms in a digitized newspaper collection from the 1800s. Top index word variants were categorized to estimate the appropriate query expansion ranges in the retrieval test. Second, the effectiveness of approximate string matching methods, automatically generated inflectional forms, and their combinations were measured in a Cranfield-style test. Finally, a detailed topic-level analysis of the test results was conducted. In the index of a historical newspaper collection, the occurrences of a word typically spread across many linguistic and historical variants, along with optical character recognition (OCR) errors. All query expansion methods improved the baseline results. An extensive expansion of around 30 variants for each query word was required to achieve the highest performance improvement. Query expansion based on approximate string matching was superior to using the inflectional forms of the query words, showing that coverage of the different types of variation is more important than precision in handling one type of variation.
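    A minimal sketch of the variant-generation step, with difflib's ratio-based matching standing in for the study's approximate string matching methods; n=30 echoes the roughly 30-variant expansion found most effective:

    ```python
    import difflib

    def query_variants(term, index_vocab, n=30, cutoff=0.75):
        """Index words most string-similar to a contemporary query term."""
        return difflib.get_close_matches(term, index_vocab, n=n, cutoff=cutoff)
    ```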
  10. Colace, F.; Santo, M. de; Greco, L.; Napoletano, P.: Improving relevance feedback-based query expansion by the use of a weighted word pairs approach (2015) 0.03
    Abstract
    In this article, the use of a new term extraction method for query expansion (QE) in text retrieval is investigated. The new method expands the initial query with a structured representation made of weighted word pairs (WWP) extracted from a set of training documents (relevance feedback). Standard text retrieval systems can handle a WWP structure through custom Boolean weighted models. We experimented with both the explicit and pseudo-relevance feedback schemas and compared the proposed term extraction method with others in the literature, such as KLD and RM3. Evaluations have been conducted on a number of test collections (Text REtrieval Conference [TREC]-6, -7, -8, -9, and -10). Results demonstrated that the QE method based on this new structure outperforms the baseline.
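    A toy sketch of a weighted-word-pairs structure extracted from relevance-feedback documents; the paper's actual extraction and weighting model is richer than plain pair counting:

    ```python
    from collections import Counter
    from itertools import combinations

    def weighted_word_pairs(feedback_docs, top_k=10):
        """Weight word pairs by co-occurrence across feedback documents."""
        pairs = Counter()
        for doc in feedback_docs:
            for a, b in combinations(sorted(set(doc.lower().split())), 2):
                pairs[(a, b)] += 1
        total = sum(pairs.values()) or 1
        return [(a, b, c / total) for (a, b), c in pairs.most_common(top_k)]
    ```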
  11. Bernier-Colborne, G.: Identifying semantic relations in a specialized corpus through distributional analysis of a cooccurrence tensor (2014) 0.03
    Abstract
    We describe a method of encoding cooccurrence information in a three-way tensor from which HAL-style word space models can be derived. We use these models to identify semantic relations in a specialized corpus. Results suggest that the tensor-based methods we propose are more robust than the basic HAL model in some respects.
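    A minimal sketch of such a three-way tensor, indexed by (word, context word, positional offset); HAL-style matrices can be recovered by summing over the third mode. The window size and offset encoding are assumptions:

    ```python
    import numpy as np

    def cooccurrence_tensor(tokens, vocab, window=2):
        """Count (word, context, offset) triples in a symmetric window."""
        index = {w: i for i, w in enumerate(vocab)}
        T = np.zeros((len(vocab), len(vocab), 2 * window))
        for i, w in enumerate(tokens):
            if w not in index:
                continue
            for d in range(-window, window + 1):
                j = i + d
                if d == 0 or j < 0 or j >= len(tokens) or tokens[j] not in index:
                    continue
                slot = d + window if d < 0 else d + window - 1  # offsets -w..-1, 1..w
                T[index[w], index[tokens[j]], slot] += 1
        return T
    ```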
  12. Colace, F.; Santo, M. De; Greco, L.; Napoletano, P.: Weighted word pairs for query expansion (2015) 0.03
  13. Boyack, K.W.; Wylie, B.N.; Davidson, G.S.: Information Visualization, Human-Computer Interaction, and Cognitive Psychology : Domain Visualizations (2002) 0.03
    Date
    22. 2.2003 17:25:39
    22. 2.2003 18:17:40
  14. Smeaton, A.F.; Rijsbergen, C.J. van: ¬The retrieval effects of query expansion on a feedback document retrieval system (1983) 0.03
    Date
    30. 3.2001 13:32:22
  15. Fieldhouse, M.; Hancock-Beaulieu, M.: ¬The changing face of OKAPI (1994) 0.02
    Abstract
    Describes the OKAPI projects and OKAPI's development as an experimental online catalogue system over 10 years, first at the University of Westminster (formerly the Polytechnic of Central London) and subsequently at City University, London. The 1st OKAPI project in 1984 introduced 'best match' retrieval and focused on the user interface design. The 2nd investigated word stemming, spelling correction and cross-reference tables as retrieval aids. A comparative study of 2 library catalogues was undertaken in 1987, while in 1988 query expansion and relevance feedback were introduced and evaluated by laboratory tests. In 1990 live evaluation of automatic query expansion was carried out, and in 1993 subject enhancement of bibliographic records was investigated. The latest project has examined the design of a graphical user interface to support interactive query expansion. Discusses the research and evaluation of each project.
  16. Fidel, R.; Efthimiadis, E.N.: Terminological knowledge structure for intermediary expert systems (1995) 0.02
    Abstract
    To provide advice for online searching about term selection and query expansion, an intermediary expert system should indicate a terminological knowledge structure. Terminological attributes could provide the foundation of a knowledge base, and knowledge acquisition could rely on knowledge base techniques coupled with statistical techniques. The strategies of expert searchers would provide 1 source of knowledge. The knowledge structure would include 3 constructs for each term: frequency data, a hedge, and a position in a classification scheme. Switching vocabularies could provide a meta-scheme and facilitate the interoperability of databases in similar subjects. To develop such knowledge structure, research should focus on terminological attributes, word and phrase disambiguation, automated text processing, and the role of thesauri and classification schemes in indexing and retrieval. It should develop techniques that combine knowledge base and statistical methods and that consider user preferences
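    The proposed per-term structure can be pictured as a small record; the field names are illustrative assumptions, not the authors' design:

    ```python
    from dataclasses import dataclass, field
    from typing import List

    @dataclass
    class TermRecord:
        """One term in the terminological knowledge structure: the 3 constructs."""
        term: str
        frequency: int                                  # frequency data, e.g. postings counts
        hedge: List[str] = field(default_factory=list)  # a hedge: related terms searched together
        class_position: str = ""                        # position in a classification scheme
    ```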
  17. Poynder, R.: Web research engines? (1996) 0.02
    Abstract
    Describes the shortcomings of search engines for the WWW comparing their current capabilities to those of the first generation CD-ROM products. Some allow phrase searching and most are improving their Boolean searching. Few allow truncation, wild cards or nested logic. They are stateless, losing previous search criteria. Unlike the indexing and classification systems for today's CD-ROMs, those for Web pages are random, unstructured and of variable quality. Considers that at best Web search engines can only offer free text searching. Discusses whether automatic data classification systems such as Infoseek Ultra can overcome the haphazard nature of the Web with neural network technology, and whether Boolean search techniques may be redundant when replaced by technology such as the Euroferret search engine. However, artificial intelligence is rarely successful on huge, varied databases. Relevance ranking and automatic query expansion still use the same simple inverted indexes. Most Web search engines do nothing more than word counting. Further complications arise with foreign languages
  18. Mayr, P.; Schaer, P.; Mutschke, P.: ¬A science model driven retrieval prototype (2011) 0.02
    Abstract
    This paper is about a better understanding of the structure and dynamics of science and the use of these insights to compensate for the typical problems that arise in metadata-driven Digital Libraries. Three science model driven retrieval services are presented: co-word analysis based query expansion, re-ranking via Bradfordizing, and author centrality. The services are evaluated with relevance assessments, from which two important implications emerge: (1) precision values of the retrieval services are the same or better than the tf-idf retrieval baseline, and (2) each service retrieved a disjoint set of documents. The different services each favor quite different - but still relevant - documents than pure term-frequency based rankings. The proposed models and derived retrieval services therefore open up new viewpoints on the scientific knowledge space and provide an alternative framework to structure scholarly information systems.
  19. Zhang, W.; Yoshida, T.; Tang, X.: ¬A comparative study of TF*IDF, LSI and multi-words for text classification (2011) 0.02
    Abstract
    One of the main themes in text mining is text representation, which is fundamental and indispensable for text-based intelligent information processing. Generally, text representation includes two tasks: indexing and weighting. This paper has comparatively studied TF*IDF, LSI and multi-words for text representation. We used a Chinese and an English document collection to respectively evaluate the three methods in information retrieval and text categorization. Experimental results have demonstrated that in text categorization, LSI has better performance than the other methods in both document collections. Also, LSI has produced the best performance in retrieving English documents. This outcome shows that LSI has both favorable semantic and statistical qualities, and it contradicts the claim that LSI cannot produce discriminative power for indexing.
  20. Brezillon, P.; Saker, I.: Modeling context in information seeking (1999) 0.02
    Abstract
    Context plays an important role in a number of domains where reasoning intervenes, as in understanding, interpretation, diagnosis, etc. The reason is that reasoning activities heavily rely on a background (or experience) that is generally not made explicit and that gives a contextual dimension to knowledge. On the Web in December 1996, AltaVista returned more than 710,000 pages containing the word context, whereas concept gave only 639,000 references. A clear definition of this word remains to be found. There are several formal definitions of this concept (references are given in Brézillon, 1996): a set of preferences and/or beliefs, an infinite and only partially known collection of assumptions, a list of attributes, the product of an interpretation, possible worlds, assumptions under which a statement is true or false. One faces the same situation at the programming level: a collection of context schemas; a path in information retrieval; slots in object-oriented languages; a special, buffer-like data structure; a window on the screen; buttons which are functional, customisable and shareable; an interpreter which controls the system's activity; the characteristics of the situation and the goals of the knowledge use; or entities (things or events) related in a certain way that permits listening to what is said and what is not said. Context is often assimilated to a set of restrictions (e.g., preconditions) that limit access to parts of the applications. The first works considering context explicitly are in Natural Language. Researchers in this domain focus on the linguistic context, sometimes associated with other types of contexts such as semantic context, cognitive context, physical and perceptual context, and social context (Bunt, 1997).

Languages

  • e 44
  • d 5

Types

  • a 46
  • el 4
  • x 1