Search (80 results, page 1 of 4)

  • × theme_ss:"Semantisches Umfeld in Indexierung u. Retrieval"
  1. Fowler, R.H.; Wilson, B.A.; Fowler, W.A.L.: Information navigator : an information system using associative networks for display and retrieval (1992) 0.03
    0.032838944 = product of:
      0.13135578 = sum of:
        0.13135578 = weight(_text_:graphic in 919) [ClassicSimilarity], result of:
          0.13135578 = score(doc=919,freq=2.0), product of:
            0.29924196 = queryWeight, product of:
              6.6217136 = idf(docFreq=159, maxDocs=44218)
              0.045191016 = queryNorm
            0.43896174 = fieldWeight in 919, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              6.6217136 = idf(docFreq=159, maxDocs=44218)
              0.046875 = fieldNorm(doc=919)
      0.25 = coord(1/4)
    
    Abstract
    Document retrieval is a highly interactive process dealing with large amounts of information. Visual representations can provide both a means for managing the complexity of large information structures and an interface style well suited to interactive manipulation. The system we have designed utilizes visually displayed graphic structures and a direct manipulation interface style to supply an integrated environment for retrieval. A common visually displayed network structure is used for query, document content, and term relations. A query can be modified through direct manipulation of its visual form by incorporating terms from any other information structure the system displays. An associative thesaurus of terms and an inter-document network provide information about a document collection that can complement other retrieval aids. Visualization of these large data structures makes use of fisheye views and overview diagrams to help overcome some of the inherent difficulties of orientation and navigation in large information structures.
  2. Sacco, G.M.: Dynamic taxonomies and guided searches (2006) 0.03
    0.02927637 = product of:
      0.11710548 = sum of:
        0.11710548 = sum of:
          0.056493253 = weight(_text_:methods in 5295) [ClassicSimilarity], result of:
            0.056493253 = score(doc=5295,freq=2.0), product of:
              0.18168657 = queryWeight, product of:
                4.0204134 = idf(docFreq=2156, maxDocs=44218)
                0.045191016 = queryNorm
              0.31093797 = fieldWeight in 5295, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.0204134 = idf(docFreq=2156, maxDocs=44218)
                0.0546875 = fieldNorm(doc=5295)
          0.060612224 = weight(_text_:22 in 5295) [ClassicSimilarity], result of:
            0.060612224 = score(doc=5295,freq=4.0), product of:
              0.15825124 = queryWeight, product of:
                3.5018296 = idf(docFreq=3622, maxDocs=44218)
                0.045191016 = queryNorm
              0.38301262 = fieldWeight in 5295, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                3.5018296 = idf(docFreq=3622, maxDocs=44218)
                0.0546875 = fieldNorm(doc=5295)
      0.25 = coord(1/4)
    
    Abstract
    A new search paradigm, in which the primary user activity is the guided exploration of a complex information space rather than the retrieval of items based on precise specifications, is proposed. The author claims that this paradigm is the norm in most practical applications, and that solutions based on traditional search methods are not effective in this context. He then presents a solution based on dynamic taxonomies, a knowledge management model that effectively guides users to reach their goal while giving them total freedom in exploring the information base. Applications, benefits, and current research are discussed.
    Date
    22. 7.2006 17:56:22
  3. Brandão, W.C.; Santos, R.L.T.; Ziviani, N.; Moura, E.S. de; Silva, A.S. da: Learning to expand queries using entities (2014) 0.02
    0.021920148 = product of:
      0.08768059 = sum of:
        0.08768059 = sum of:
          0.057066802 = weight(_text_:methods in 1343) [ClassicSimilarity], result of:
            0.057066802 = score(doc=1343,freq=4.0), product of:
              0.18168657 = queryWeight, product of:
                4.0204134 = idf(docFreq=2156, maxDocs=44218)
                0.045191016 = queryNorm
              0.31409478 = fieldWeight in 1343, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                4.0204134 = idf(docFreq=2156, maxDocs=44218)
                0.0390625 = fieldNorm(doc=1343)
          0.030613795 = weight(_text_:22 in 1343) [ClassicSimilarity], result of:
            0.030613795 = score(doc=1343,freq=2.0), product of:
              0.15825124 = queryWeight, product of:
                3.5018296 = idf(docFreq=3622, maxDocs=44218)
                0.045191016 = queryNorm
              0.19345059 = fieldWeight in 1343, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                3.5018296 = idf(docFreq=3622, maxDocs=44218)
                0.0390625 = fieldNorm(doc=1343)
      0.25 = coord(1/4)
    
    Abstract
    A substantial fraction of web search queries contain references to entities, such as persons, organizations, and locations. Recently, methods that exploit named entities have been shown to be more effective for query expansion than traditional pseudorelevance feedback methods. In this article, we introduce a supervised learning approach that exploits named entities for query expansion using Wikipedia as a repository of high-quality feedback documents. In contrast with existing entity-oriented pseudorelevance feedback approaches, we tackle query expansion as a learning-to-rank problem. As a result, not only do we select effective expansion terms but we also weigh these terms according to their predicted effectiveness. To this end, we exploit the rich structure of Wikipedia articles to devise discriminative term features, including each candidate term's proximity to the original query terms, as well as its frequency across multiple article fields and in category and infobox descriptors. Experiments on three Text REtrieval Conference web test collections attest the effectiveness of our approach, with gains of up to 23.32% in terms of mean average precision, 19.49% in terms of precision at 10, and 7.86% in terms of normalized discounted cumulative gain compared with a state-of-the-art approach for entity-oriented query expansion.
    Date
    22. 8.2014 17:07:50
  4. Song, D.; Bruza, P.D.: Towards context sensitive information inference (2003) 0.02
    0.01774153 = product of:
      0.07096612 = sum of:
        0.07096612 = sum of:
          0.040352322 = weight(_text_:methods in 1428) [ClassicSimilarity], result of:
            0.040352322 = score(doc=1428,freq=2.0), product of:
              0.18168657 = queryWeight, product of:
                4.0204134 = idf(docFreq=2156, maxDocs=44218)
                0.045191016 = queryNorm
              0.22209854 = fieldWeight in 1428, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.0204134 = idf(docFreq=2156, maxDocs=44218)
                0.0390625 = fieldNorm(doc=1428)
          0.030613795 = weight(_text_:22 in 1428) [ClassicSimilarity], result of:
            0.030613795 = score(doc=1428,freq=2.0), product of:
              0.15825124 = queryWeight, product of:
                3.5018296 = idf(docFreq=3622, maxDocs=44218)
                0.045191016 = queryNorm
              0.19345059 = fieldWeight in 1428, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                3.5018296 = idf(docFreq=3622, maxDocs=44218)
                0.0390625 = fieldNorm(doc=1428)
      0.25 = coord(1/4)
    
    Date
    22. 3.2003 19:35:46
    Footnote
    Beitrag eines Themenheftes: Mathematical, logical, and formal methods in information retrieval
  5. Ekmekcioglu, F.C.; Robertson, A.M.; Willett, P.: Effectiveness of query expansion in ranked-output document retrieval systems (1992) 0.01
    0.013978455 = product of:
      0.05591382 = sum of:
        0.05591382 = product of:
          0.11182764 = sum of:
            0.11182764 = weight(_text_:methods in 5689) [ClassicSimilarity], result of:
              0.11182764 = score(doc=5689,freq=6.0), product of:
                0.18168657 = queryWeight, product of:
                  4.0204134 = idf(docFreq=2156, maxDocs=44218)
                  0.045191016 = queryNorm
                0.6154976 = fieldWeight in 5689, product of:
                  2.4494898 = tf(freq=6.0), with freq of:
                    6.0 = termFreq=6.0
                  4.0204134 = idf(docFreq=2156, maxDocs=44218)
                  0.0625 = fieldNorm(doc=5689)
          0.5 = coord(1/2)
      0.25 = coord(1/4)
    
    Abstract
    Reports an evaluation of 3 methods for the expansion of natural language queries in ranked output retrieval systems. The methods are based on term co-occurrence data, on Soundex codes, and on a string similarity measure. Searches for 110 queries in a data base of 26.280 titles and abstracts suggest that there is no significant difference in retrieval effectiveness between any of these methods and unexpanded searches
  6. Walker, S.: Subject access in online catalogues (1991) 0.01
    0.013978455 = product of:
      0.05591382 = sum of:
        0.05591382 = product of:
          0.11182764 = sum of:
            0.11182764 = weight(_text_:methods in 5690) [ClassicSimilarity], result of:
              0.11182764 = score(doc=5690,freq=6.0), product of:
                0.18168657 = queryWeight, product of:
                  4.0204134 = idf(docFreq=2156, maxDocs=44218)
                  0.045191016 = queryNorm
                0.6154976 = fieldWeight in 5690, product of:
                  2.4494898 = tf(freq=6.0), with freq of:
                    6.0 = termFreq=6.0
                  4.0204134 = idf(docFreq=2156, maxDocs=44218)
                  0.0625 = fieldNorm(doc=5690)
          0.5 = coord(1/2)
      0.25 = coord(1/4)
    
    Abstract
    Discusses some of the methods of subject access to on-line catalohues (OPACs) and argues that none are entirley satisfactory. Describes 2 methods for improving subject access: best match searching; and automatic query expansion application and discusses their feasibility. Mentions emerging application standards for information retrieval and concludes that existing standards are incompatible with most methods for improving standards
  7. Celik, I.; Abel, F.; Siehndel, P.: Adaptive faceted search on Twitter (2011) 0.01
    0.011413361 = product of:
      0.045653444 = sum of:
        0.045653444 = product of:
          0.09130689 = sum of:
            0.09130689 = weight(_text_:methods in 2221) [ClassicSimilarity], result of:
              0.09130689 = score(doc=2221,freq=4.0), product of:
                0.18168657 = queryWeight, product of:
                  4.0204134 = idf(docFreq=2156, maxDocs=44218)
                  0.045191016 = queryNorm
                0.5025517 = fieldWeight in 2221, product of:
                  2.0 = tf(freq=4.0), with freq of:
                    4.0 = termFreq=4.0
                  4.0204134 = idf(docFreq=2156, maxDocs=44218)
                  0.0625 = fieldNorm(doc=2221)
          0.5 = coord(1/2)
      0.25 = coord(1/4)
    
    Abstract
    In the last few years, Twitter has become a powerful tool for publishing and discussing information. Yet, content exploration in Twitter requires substantial efforts and users often have to scan information streams by hand. In this paper, we approach this problem by means of faceted search. We propose strategies for inferring facets and facet values on Twitter by enriching the semantics of individual Twitter messages and present di erent methods, including personalized and context-adaptive methods, for making faceted search on Twitter more effective.
  8. Gödert, W.: Inhaltliche Dokumenterschließung, Information Retrieval und Navigation in Informationsräumen (1995) 0.01
    0.011413361 = product of:
      0.045653444 = sum of:
        0.045653444 = product of:
          0.09130689 = sum of:
            0.09130689 = weight(_text_:methods in 4438) [ClassicSimilarity], result of:
              0.09130689 = score(doc=4438,freq=4.0), product of:
                0.18168657 = queryWeight, product of:
                  4.0204134 = idf(docFreq=2156, maxDocs=44218)
                  0.045191016 = queryNorm
                0.5025517 = fieldWeight in 4438, product of:
                  2.0 = tf(freq=4.0), with freq of:
                    4.0 = termFreq=4.0
                  4.0204134 = idf(docFreq=2156, maxDocs=44218)
                  0.0625 = fieldNorm(doc=4438)
          0.5 = coord(1/2)
      0.25 = coord(1/4)
    
    Abstract
    Examines the advantages and disadvantages of precoordinated, postcoordinated and automatic indexing with regard to existing information storage systems, such as card catalogues, OPACs, CR-ROM databases, and online databases. Presents a general model of document content representation and concludes that the library profession needs to address the development of databank design models, relevance feedback methods and automatic indexing assessment methods, to make indexing more effective
  9. Pal, D.; Mitra, M.; Datta, K.: Improving query expansion using WordNet (2014) 0.01
    0.011278818 = product of:
      0.045115273 = sum of:
        0.045115273 = product of:
          0.09023055 = sum of:
            0.09023055 = weight(_text_:methods in 1545) [ClassicSimilarity], result of:
              0.09023055 = score(doc=1545,freq=10.0), product of:
                0.18168657 = queryWeight, product of:
                  4.0204134 = idf(docFreq=2156, maxDocs=44218)
                  0.045191016 = queryNorm
                0.4966275 = fieldWeight in 1545, product of:
                  3.1622777 = tf(freq=10.0), with freq of:
                    10.0 = termFreq=10.0
                  4.0204134 = idf(docFreq=2156, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=1545)
          0.5 = coord(1/2)
      0.25 = coord(1/4)
    
    Abstract
    This study proposes a new way of using WordNet for query expansion (QE). We choose candidate expansion terms from a set of pseudo-relevant documents; however, the usefulness of these terms is measured based on their definitions provided in a hand-crafted lexical resource such as WordNet. Experiments with a number of standard TREC collections WordNet-based that this method outperforms existing WordNet-based methods. It also compares favorably with established QE methods such as KLD and RM3. Leveraging earlier work in which a combination of QE methods was found to outperform each individual method (as well as other well-known QE methods), we next propose a combination-based QE method that takes into account three different aspects of a candidate expansion term's usefulness: (a) its distribution in the pseudo-relevant documents and in the target corpus, (b) its statistical association with query terms, and (c) its semantic relation with the query, as determined by the overlap between the WordNet definitions of the term and query terms. This combination of diverse sources of information appears to work well on a number of test collections, viz., TREC123, TREC5, TREC678, TREC robust (new), and TREC910 collections, and yields significant improvements over competing methods on most of these collections.
  10. Boyack, K.W.; Wylie,B.N.; Davidson, G.S.: Information Visualization, Human-Computer Interaction, and Cognitive Psychology : Domain Visualizations (2002) 0.01
    0.010823611 = product of:
      0.043294445 = sum of:
        0.043294445 = product of:
          0.08658889 = sum of:
            0.08658889 = weight(_text_:22 in 1352) [ClassicSimilarity], result of:
              0.08658889 = score(doc=1352,freq=4.0), product of:
                0.15825124 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.045191016 = queryNorm
                0.54716086 = fieldWeight in 1352, product of:
                  2.0 = tf(freq=4.0), with freq of:
                    4.0 = termFreq=4.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.078125 = fieldNorm(doc=1352)
          0.5 = coord(1/2)
      0.25 = coord(1/4)
    
    Date
    22. 2.2003 17:25:39
    22. 2.2003 18:17:40
  11. Smeaton, A.F.; Rijsbergen, C.J. van: ¬The retrieval effects of query expansion on a feedback document retrieval system (1983) 0.01
    0.010714828 = product of:
      0.042859312 = sum of:
        0.042859312 = product of:
          0.085718624 = sum of:
            0.085718624 = weight(_text_:22 in 2134) [ClassicSimilarity], result of:
              0.085718624 = score(doc=2134,freq=2.0), product of:
                0.15825124 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.045191016 = queryNorm
                0.5416616 = fieldWeight in 2134, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.109375 = fieldNorm(doc=2134)
          0.5 = coord(1/2)
      0.25 = coord(1/4)
    
    Date
    30. 3.2001 13:32:22
  12. Li, D.; Kwong, C.-P.; Lee, D.L.: Unified linear subspace approach to semantic analysis (2009) 0.01
    0.010088081 = product of:
      0.040352322 = sum of:
        0.040352322 = product of:
          0.080704644 = sum of:
            0.080704644 = weight(_text_:methods in 3321) [ClassicSimilarity], result of:
              0.080704644 = score(doc=3321,freq=8.0), product of:
                0.18168657 = queryWeight, product of:
                  4.0204134 = idf(docFreq=2156, maxDocs=44218)
                  0.045191016 = queryNorm
                0.4441971 = fieldWeight in 3321, product of:
                  2.828427 = tf(freq=8.0), with freq of:
                    8.0 = termFreq=8.0
                  4.0204134 = idf(docFreq=2156, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=3321)
          0.5 = coord(1/2)
      0.25 = coord(1/4)
    
    Abstract
    The Basic Vector Space Model (BVSM) is well known in information retrieval. Unfortunately, its retrieval effectiveness is limited because it is based on literal term matching. The Generalized Vector Space Model (GVSM) and Latent Semantic Indexing (LSI) are two prominent semantic retrieval methods, both of which assume there is some underlying latent semantic structure in a dataset that can be used to improve retrieval performance. However, while this structure may be derived from both the term space and the document space, GVSM exploits only the former and LSI the latter. In this article, the latent semantic structure of a dataset is examined from a dual perspective; namely, we consider the term space and the document space simultaneously. This new viewpoint has a natural connection to the notion of kernels. Specifically, a unified kernel function can be derived for a class of vector space models. The dual perspective provides a deeper understanding of the semantic space and makes transparent the geometrical meaning of the unified kernel function. New semantic analysis methods based on the unified kernel function are developed, which combine the advantages of LSI and GVSM. We also prove that the new methods are stable because although the selected rank of the truncated Singular Value Decomposition (SVD) is far from the optimum, the retrieval performance will not be degraded significantly. Experiments performed on standard test collections show that our methods are promising.
  13. Jiang, Y.; Bai, W.; Zhang, X.; Hu, J.: Wikipedia-based information content and semantic similarity computation (2017) 0.01
    0.010088081 = product of:
      0.040352322 = sum of:
        0.040352322 = product of:
          0.080704644 = sum of:
            0.080704644 = weight(_text_:methods in 2877) [ClassicSimilarity], result of:
              0.080704644 = score(doc=2877,freq=8.0), product of:
                0.18168657 = queryWeight, product of:
                  4.0204134 = idf(docFreq=2156, maxDocs=44218)
                  0.045191016 = queryNorm
                0.4441971 = fieldWeight in 2877, product of:
                  2.828427 = tf(freq=8.0), with freq of:
                    8.0 = termFreq=8.0
                  4.0204134 = idf(docFreq=2156, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=2877)
          0.5 = coord(1/2)
      0.25 = coord(1/4)
    
    Abstract
    The Information Content (IC) of a concept is a fundamental dimension in computational linguistics. It enables a better understanding of concept's semantics. In the past, several approaches to compute IC of a concept have been proposed. However, there are some limitations such as the facts of relying on corpora availability, manual tagging, or predefined ontologies and fitting non-dynamic domains in the existing methods. Wikipedia provides a very large domain-independent encyclopedic repository and semantic network for computing IC of concepts with more coverage than usual ontologies. In this paper, we propose some novel methods to IC computation of a concept to solve the shortcomings of existing approaches. The presented methods focus on the IC computation of a concept (i.e., Wikipedia category) drawn from the Wikipedia category structure. We propose several new IC-based measures to compute the semantic similarity between concepts. The evaluation, based on several widely used benchmarks and a benchmark developed in ourselves, sustains the intuitions with respect to human judgments. Overall, some methods proposed in this paper have a good human correlation and constitute some effective ways of determining IC values for concepts and semantic similarity between concepts.
  14. Xu, B.; Lin, H.; Lin, Y.: Assessment of learning to rank methods for query expansion (2016) 0.01
    0.010088081 = product of:
      0.040352322 = sum of:
        0.040352322 = product of:
          0.080704644 = sum of:
            0.080704644 = weight(_text_:methods in 2929) [ClassicSimilarity], result of:
              0.080704644 = score(doc=2929,freq=8.0), product of:
                0.18168657 = queryWeight, product of:
                  4.0204134 = idf(docFreq=2156, maxDocs=44218)
                  0.045191016 = queryNorm
                0.4441971 = fieldWeight in 2929, product of:
                  2.828427 = tf(freq=8.0), with freq of:
                    8.0 = termFreq=8.0
                  4.0204134 = idf(docFreq=2156, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=2929)
          0.5 = coord(1/2)
      0.25 = coord(1/4)
    
    Abstract
    Pseudo relevance feedback, as an effective query expansion method, can significantly improve information retrieval performance. However, the method may negatively impact the retrieval performance when some irrelevant terms are used in the expanded query. Therefore, it is necessary to refine the expansion terms. Learning to rank methods have proven effective in information retrieval to solve ranking problems by ranking the most relevant documents at the top of the returned list, but few attempts have been made to employ learning to rank methods for term refinement in pseudo relevance feedback. This article proposes a novel framework to explore the feasibility of using learning to rank to optimize pseudo relevance feedback by means of reranking the candidate expansion terms. We investigate some learning approaches to choose the candidate terms and introduce some state-of-the-art learning to rank methods to refine the expansion terms. In addition, we propose two term labeling strategies and examine the usefulness of various term features to optimize the framework. Experimental results with three TREC collections show that our framework can effectively improve retrieval performance.
  15. Järvelin, A.; Keskustalo, H.; Sormunen, E.; Saastamoinen, M.; Kettunen, K.: Information retrieval from historical newspaper collections in highly inflectional languages : a query expansion approach (2016) 0.01
    0.010088081 = product of:
      0.040352322 = sum of:
        0.040352322 = product of:
          0.080704644 = sum of:
            0.080704644 = weight(_text_:methods in 3223) [ClassicSimilarity], result of:
              0.080704644 = score(doc=3223,freq=8.0), product of:
                0.18168657 = queryWeight, product of:
                  4.0204134 = idf(docFreq=2156, maxDocs=44218)
                  0.045191016 = queryNorm
                0.4441971 = fieldWeight in 3223, product of:
                  2.828427 = tf(freq=8.0), with freq of:
                    8.0 = termFreq=8.0
                  4.0204134 = idf(docFreq=2156, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=3223)
          0.5 = coord(1/2)
      0.25 = coord(1/4)
    
    Abstract
    The aim of the study was to test whether query expansion by approximate string matching methods is beneficial in retrieval from historical newspaper collections in a language rich with compounds and inflectional forms (Finnish). First, approximate string matching methods were used to generate lists of index words most similar to contemporary query terms in a digitized newspaper collection from the 1800s. Top index word variants were categorized to estimate the appropriate query expansion ranges in the retrieval test. Second, the effectiveness of approximate string matching methods, automatically generated inflectional forms, and their combinations were measured in a Cranfield-style test. Finally, a detailed topic-level analysis of test results was conducted. In the index of historical newspaper collection the occurrences of a word typically spread to many linguistic and historical variants along with optical character recognition (OCR) errors. All query expansion methods improved the baseline results. Extensive expansion of around 30 variants for each query word was required to achieve the highest performance improvement. Query expansion based on approximate string matching was superior to using the inflectional forms of the query words, showing that coverage of the different types of variation is more important than precision in handling one type of variation.
  16. Robertson, S.E.: OKAPI at TREC-3 (1995) 0.01
    0.009986691 = product of:
      0.039946765 = sum of:
        0.039946765 = product of:
          0.07989353 = sum of:
            0.07989353 = weight(_text_:methods in 5694) [ClassicSimilarity], result of:
              0.07989353 = score(doc=5694,freq=4.0), product of:
                0.18168657 = queryWeight, product of:
                  4.0204134 = idf(docFreq=2156, maxDocs=44218)
                  0.045191016 = queryNorm
                0.43973273 = fieldWeight in 5694, product of:
                  2.0 = tf(freq=4.0), with freq of:
                    4.0 = termFreq=4.0
                  4.0204134 = idf(docFreq=2156, maxDocs=44218)
                  0.0546875 = fieldNorm(doc=5694)
          0.5 = coord(1/2)
      0.25 = coord(1/4)
    
    Abstract
    Reports text information retrieval experiments performed as part of the 3 rd round of Text Retrieval Conferences (TREC) using the Okapi online catalogue system at City University, UK. The emphasis in TREC-3 was: further refinement of term weighting functions; an investigation of run time passage determination and searching; expansion of ad hoc queries by terms extracted from the top documents retrieved by a trial search; new methods for choosing query expansion terms after relevance feedback, now split into methods of ranking terms prior to selection and subsequent selection procedures; and the development of a user interface procedure within the new TREC interactive search framework
  17. Lobin, H.; Witt, A.: Semantic and thematic navigation in electronic encyclopedias (1999) 0.01
    0.009986691 = product of:
      0.039946765 = sum of:
        0.039946765 = product of:
          0.07989353 = sum of:
            0.07989353 = weight(_text_:methods in 624) [ClassicSimilarity], result of:
              0.07989353 = score(doc=624,freq=4.0), product of:
                0.18168657 = queryWeight, product of:
                  4.0204134 = idf(docFreq=2156, maxDocs=44218)
                  0.045191016 = queryNorm
                0.43973273 = fieldWeight in 624, product of:
                  2.0 = tf(freq=4.0), with freq of:
                    4.0 = termFreq=4.0
                  4.0204134 = idf(docFreq=2156, maxDocs=44218)
                  0.0546875 = fieldNorm(doc=624)
          0.5 = coord(1/2)
      0.25 = coord(1/4)
    
    Abstract
    In the field of electronic publishing, encyclopedias represent a unique sort of text for investigating advanced methods of navigation. The user of an electronic excyclopedia normally expects special methods for accessing the entries in an encyclopedia database. Navigation through printed encyclopedias in the traditional sense focuses on the alphabetic order of the entries. In electronic encyclopedias, however, thematic structuring of lemmas and, of course, extensive (hyper-) linking mechanisms have been added. This paper will focus on showing developments, which go beyond these navigational strucutres. We will concentrate on the semantic space formed by lemmas to build a network of semantic distances and thematic trails through the encyclopedia
  18. Greenberg, J.: Optimal query expansion (QE) processing methods with semantically encoded structured thesaurus terminology (2001) 0.01
    0.0085600205 = product of:
      0.034240082 = sum of:
        0.034240082 = product of:
          0.068480164 = sum of:
            0.068480164 = weight(_text_:methods in 5750) [ClassicSimilarity], result of:
              0.068480164 = score(doc=5750,freq=4.0), product of:
                0.18168657 = queryWeight, product of:
                  4.0204134 = idf(docFreq=2156, maxDocs=44218)
                  0.045191016 = queryNorm
                0.37691376 = fieldWeight in 5750, product of:
                  2.0 = tf(freq=4.0), with freq of:
                    4.0 = termFreq=4.0
                  4.0204134 = idf(docFreq=2156, maxDocs=44218)
                  0.046875 = fieldNorm(doc=5750)
          0.5 = coord(1/2)
      0.25 = coord(1/4)
    
    Abstract
    While researchers have explored the value of structured thesauri as controlled vocabularies for general information retrieval (IR) activities, they have not identified the optimal query expansion (QE) processing methods for taking advantage of the semantic encoding underlying the terminology in these tools. The study reported on in this article addresses this question, and examined whether QE via semantically encoded thesauri terminology is more effective in the automatic or interactive processing environment. The research found that, regardless of end-users' retrieval goals, synonyms and partial synonyms (SYNs) and narrower terms (NTs) are generally good candidates for automatic QE and that related (RTs) are better candidates for interactive QE. The study also examined end-users' selection of semantically encoded thesauri terms for interactive QE, and explored how retrieval goals and QE processes may be combined in future thesauri-supported IR systems
  19. Zhang, W.; Yoshida, T.; Tang, X.: ¬A comparative study of TF*IDF, LSI and multi-words for text classification (2011) 0.01
    0.0085600205 = product of:
      0.034240082 = sum of:
        0.034240082 = product of:
          0.068480164 = sum of:
            0.068480164 = weight(_text_:methods in 1165) [ClassicSimilarity], result of:
              0.068480164 = score(doc=1165,freq=4.0), product of:
                0.18168657 = queryWeight, product of:
                  4.0204134 = idf(docFreq=2156, maxDocs=44218)
                  0.045191016 = queryNorm
                0.37691376 = fieldWeight in 1165, product of:
                  2.0 = tf(freq=4.0), with freq of:
                    4.0 = termFreq=4.0
                  4.0204134 = idf(docFreq=2156, maxDocs=44218)
                  0.046875 = fieldNorm(doc=1165)
          0.5 = coord(1/2)
      0.25 = coord(1/4)
    
    Abstract
    One of the main themes in text mining is text representation, which is fundamental and indispensable for text-based intellegent information processing. Generally, text representation inludes two tasks: indexing and weighting. This paper has comparatively studied TF*IDF, LSI and multi-word for text representation. We used a Chinese and an English document collection to respectively evaluate the three methods in information retreival and text categorization. Experimental results have demonstrated that in text categorization, LSI has better performance than other methods in both document collections. Also, LSI has produced the best performance in retrieving English documents. This outcome has shown that LSI has both favorable semantic and statistical quality and is different with the claim that LSI can not produce discriminative power for indexing.
  20. Vechtomova, O.; Robertson, S.E.: ¬A domain-independent approach to finding related entities (2012) 0.01
    0.0085600205 = product of:
      0.034240082 = sum of:
        0.034240082 = product of:
          0.068480164 = sum of:
            0.068480164 = weight(_text_:methods in 2733) [ClassicSimilarity], result of:
              0.068480164 = score(doc=2733,freq=4.0), product of:
                0.18168657 = queryWeight, product of:
                  4.0204134 = idf(docFreq=2156, maxDocs=44218)
                  0.045191016 = queryNorm
                0.37691376 = fieldWeight in 2733, product of:
                  2.0 = tf(freq=4.0), with freq of:
                    4.0 = termFreq=4.0
                  4.0204134 = idf(docFreq=2156, maxDocs=44218)
                  0.046875 = fieldNorm(doc=2733)
          0.5 = coord(1/2)
      0.25 = coord(1/4)
    
    Abstract
    We propose an approach to the retrieval of entities that have a specific relationship with the entity given in a query. Our research goal is to investigate whether related entity finding problem can be addressed by combining a measure of relatedness of candidate answer entities to the query, and likelihood that the candidate answer entity belongs to the target entity category specified in the query. An initial list of candidate entities, extracted from top ranked documents retrieved for the query, is refined using a number of statistical and linguistic methods. The proposed method extracts the category of the target entity from the query, identifies instances of this category as seed entities, and computes similarity between candidate and seed entities. The evaluation was conducted on the Related Entity Finding task of the Entity Track of TREC 2010, as well as the QA list questions from TREC 2005 and 2006. Evaluation results demonstrate that the proposed methods are effective in finding related entities.

Authors

Years

Languages

  • e 74
  • d 5
  • More… Less…

Types

  • a 72
  • el 9
  • m 4
  • p 1
  • s 1
  • x 1
  • More… Less…