Search (118 results, page 1 of 6)

  • theme_ss:"Computerlinguistik"
  1. Hotho, A.; Bloehdorn, S.: Data Mining 2004 : Text classification by boosting weak learners based on terms and concepts (2004) 0.10
    0.103721194 = sum of:
      0.08258625 = product of:
        0.24775875 = sum of:
          0.24775875 = weight(_text_:3a in 562) [ClassicSimilarity], result of:
            0.24775875 = score(doc=562,freq=2.0), product of:
              0.4408377 = queryWeight, product of:
                8.478011 = idf(docFreq=24, maxDocs=44218)
                0.051997773 = queryNorm
              0.56201804 = fieldWeight in 562, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                8.478011 = idf(docFreq=24, maxDocs=44218)
                0.046875 = fieldNorm(doc=562)
        0.33333334 = coord(1/3)
      0.02113494 = product of:
        0.04226988 = sum of:
          0.04226988 = weight(_text_:22 in 562) [ClassicSimilarity], result of:
            0.04226988 = score(doc=562,freq=2.0), product of:
              0.18208735 = queryWeight, product of:
                3.5018296 = idf(docFreq=3622, maxDocs=44218)
                0.051997773 = queryNorm
              0.23214069 = fieldWeight in 562, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                3.5018296 = idf(docFreq=3622, maxDocs=44218)
                0.046875 = fieldNorm(doc=562)
        0.5 = coord(1/2)
    
    Content
    Cf.: http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.91.4940&rep=rep1&type=pdf
    Date
    8. 1.2013 10:22:32
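The score breakdowns shown under each entry are Lucene ClassicSimilarity "explain" trees. As a minimal sketch (assuming the standard ClassicSimilarity formulas `tf = sqrt(freq)` and `idf = 1 + ln(maxDocs / (docFreq + 1))`, which the labels in the trees suggest), one term branch combines like this; the input values are taken from the `_text_:3a` branch of entry 1:

```python
import math

def classic_similarity_term_score(freq, doc_freq, max_docs, field_norm, query_norm):
    """Recombine one term branch of a Lucene ClassicSimilarity explain tree."""
    tf = math.sqrt(freq)                              # tf(freq) = sqrt(freq)
    idf = 1.0 + math.log(max_docs / (doc_freq + 1))   # idf = 1 + ln(maxDocs / (docFreq + 1))
    query_weight = idf * query_norm                   # queryWeight = idf * queryNorm
    field_weight = tf * idf * field_norm              # fieldWeight = tf * idf * fieldNorm
    return query_weight * field_weight                # score = queryWeight * fieldWeight

# Values from the weight(_text_:3a in 562) branch above
score = classic_similarity_term_score(
    freq=2.0, doc_freq=24, max_docs=44218,
    field_norm=0.046875, query_norm=0.051997773)
print(score)  # ≈ 0.24776, matching the 0.24775875 shown in the tree
```

The outer `coord(1/3)` and `coord(1/2)` factors in the trees then scale each branch by the fraction of query clauses the document matched.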
  2. Doszkocs, T.E.; Zamora, A.: Dictionary services and spelling aids for Web searching (2004) 0.05
    0.04944489 = product of:
      0.09888978 = sum of:
        0.09888978 = sum of:
          0.049074244 = weight(_text_:search in 2541) [ClassicSimilarity], result of:
            0.049074244 = score(doc=2541,freq=4.0), product of:
              0.18072747 = queryWeight, product of:
                3.475677 = idf(docFreq=3718, maxDocs=44218)
                0.051997773 = queryNorm
              0.27153727 = fieldWeight in 2541, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                3.475677 = idf(docFreq=3718, maxDocs=44218)
                0.0390625 = fieldNorm(doc=2541)
          0.049815536 = weight(_text_:22 in 2541) [ClassicSimilarity], result of:
            0.049815536 = score(doc=2541,freq=4.0), product of:
              0.18208735 = queryWeight, product of:
                3.5018296 = idf(docFreq=3622, maxDocs=44218)
                0.051997773 = queryNorm
              0.27358043 = fieldWeight in 2541, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                3.5018296 = idf(docFreq=3622, maxDocs=44218)
                0.0390625 = fieldNorm(doc=2541)
      0.5 = coord(1/2)
    
    Abstract
    The Specialized Information Services Division (SIS) of the National Library of Medicine (NLM) provides Web access to more than a dozen scientific databases on toxicology and the environment on TOXNET. Search queries on TOXNET often include misspelled or variant English words, medical and scientific jargon, and chemical names. Following the example of search engines like Google and ClinicalTrials.gov, we set out to develop a spelling "suggestion" system for increased recall and precision in TOXNET searching. This paper describes the development of dictionary technology that can be used in a variety of applications such as orthographic verification, writing aid, natural language processing, and information storage and retrieval. The design of the technology allows building complex applications using the components developed in the earlier phases of the work in a modular fashion without extensive rewriting of computer code. Since many of the potential applications envisioned for this work have online or web-based interfaces, the dictionaries and other computer components must have fast response and must be adaptable to open-ended database vocabularies, including chemical nomenclature. The dictionary vocabulary for this work was derived from SIS and other databases and specialized resources, such as NLM's Unified Medical Language System (UMLS). The resulting technology, A-Z Dictionary (AZdict), has three major constituents: 1) the vocabulary list, 2) the word attributes that define part of speech and morphological relationships between words in the list, and 3) a set of programs that implements the retrieval of words and their attributes and determines similarity between words (ChemSpell). These three components can be used in various applications such as spelling verification, spelling aid, part-of-speech tagging, paraphrasing, and many other natural language processing functions.
    Date
    14. 8.2004 17:22:56
    Source
    Online. 28(2004) no.3, S.22-29
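The abstract does not spell out how ChemSpell measures word similarity; as a generic, hypothetical stand-in for that step, a spelling-suggestion pass can rank vocabulary words by Levenshtein edit distance to the query term:

```python
def levenshtein(a, b):
    """Classic dynamic-programming edit distance between two strings."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # deletion
                           cur[j - 1] + 1,              # insertion
                           prev[j - 1] + (ca != cb)))   # substitution
        prev = cur
    return prev[-1]

def suggest(word, vocabulary, max_distance=2):
    """Return in-vocabulary words close to a (possibly misspelled) query term."""
    candidates = [(levenshtein(word, v), v) for v in vocabulary]
    return [v for d, v in sorted(candidates) if d <= max_distance]

print(suggest("tocsicology", ["toxicology", "toxin", "ecology"]))
# -> ['toxicology']
```

A production system like AZdict would additionally consult the word-attribute tables and handle chemical nomenclature, which plain edit distance does not capture.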
  3. Schwarz, C.: THESYS: Thesaurus Syntax System : a fully automatic thesaurus building aid (1988) 0.05
    0.048947945 = product of:
      0.09789589 = sum of:
        0.09789589 = sum of:
          0.048581023 = weight(_text_:search in 1361) [ClassicSimilarity], result of:
            0.048581023 = score(doc=1361,freq=2.0), product of:
              0.18072747 = queryWeight, product of:
                3.475677 = idf(docFreq=3718, maxDocs=44218)
                0.051997773 = queryNorm
              0.2688082 = fieldWeight in 1361, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                3.475677 = idf(docFreq=3718, maxDocs=44218)
                0.0546875 = fieldNorm(doc=1361)
          0.049314864 = weight(_text_:22 in 1361) [ClassicSimilarity], result of:
            0.049314864 = score(doc=1361,freq=2.0), product of:
              0.18208735 = queryWeight, product of:
                3.5018296 = idf(docFreq=3622, maxDocs=44218)
                0.051997773 = queryNorm
              0.2708308 = fieldWeight in 1361, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                3.5018296 = idf(docFreq=3622, maxDocs=44218)
                0.0546875 = fieldNorm(doc=1361)
      0.5 = coord(1/2)
    
    Abstract
    THESYS is based on the natural language processing of free-text databases. It yields statistically evaluated correlations between words of the database. These correlations correspond to traditional thesaurus relations. The person who has to build a thesaurus is thus assisted by the proposals made by THESYS. THESYS is being tested on commercial databases under real-world conditions. It is part of a text processing project at Siemens, called TINA (Text-Inhalts-Analyse). Software from TINA is currently being applied and evaluated by the US Department of Commerce for patent search and indexing (REALIST: REtrieval Aids by Linguistics and STatistics)
    Date
    6. 1.1999 10:22:07
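The abstract does not say which statistic THESYS evaluates; as an assumed illustration of the general idea, word pairs can be scored by document co-occurrence with the Dice coefficient, and high-scoring pairs proposed to the thesaurus builder as candidate relations:

```python
from collections import Counter
from itertools import combinations

def dice_correlations(documents):
    """Score word pairs by the Dice coefficient over document co-occurrence."""
    word_df = Counter()   # documents containing each word
    pair_df = Counter()   # documents containing each word pair
    for doc in documents:
        words = set(doc.lower().split())
        word_df.update(words)
        pair_df.update(combinations(sorted(words), 2))
    return {p: 2 * n / (word_df[p[0]] + word_df[p[1]]) for p, n in pair_df.items()}

docs = ["thesaurus builds term relations",
        "term relations guide the thesaurus",
        "patent indexing uses a thesaurus"]
scores = dice_correlations(docs)
# ("relations", "term") co-occur in both of their 2 documents -> Dice = 1.0
```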
  4. Noever, D.; Ciolino, M.: ¬The Turing deception (2022) 0.04
    0.041293126 = product of:
      0.08258625 = sum of:
        0.08258625 = product of:
          0.24775875 = sum of:
            0.24775875 = weight(_text_:3a in 862) [ClassicSimilarity], result of:
              0.24775875 = score(doc=862,freq=2.0), product of:
                0.4408377 = queryWeight, product of:
                  8.478011 = idf(docFreq=24, maxDocs=44218)
                  0.051997773 = queryNorm
                0.56201804 = fieldWeight in 862, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  8.478011 = idf(docFreq=24, maxDocs=44218)
                  0.046875 = fieldNorm(doc=862)
          0.33333334 = coord(1/3)
      0.5 = coord(1/2)
    
    Source
    https://arxiv.org/abs/2212.06721
  5. Warner, A.J.: Natural language processing (1987) 0.03
    0.028179923 = product of:
      0.056359846 = sum of:
        0.056359846 = product of:
          0.11271969 = sum of:
            0.11271969 = weight(_text_:22 in 337) [ClassicSimilarity], result of:
              0.11271969 = score(doc=337,freq=2.0), product of:
                0.18208735 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.051997773 = queryNorm
                0.61904186 = fieldWeight in 337, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.125 = fieldNorm(doc=337)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Source
    Annual review of information science and technology. 22(1987), S.79-108
  6. Prasad, A.R.D.; Kar, B.B.: Parsing Boolean search expression using definite clause grammars (1994) 0.03
    0.027760584 = product of:
      0.055521168 = sum of:
        0.055521168 = product of:
          0.111042336 = sum of:
            0.111042336 = weight(_text_:search in 8188) [ClassicSimilarity], result of:
              0.111042336 = score(doc=8188,freq=8.0), product of:
                0.18072747 = queryWeight, product of:
                  3.475677 = idf(docFreq=3718, maxDocs=44218)
                  0.051997773 = queryNorm
                0.6144187 = fieldWeight in 8188, product of:
                  2.828427 = tf(freq=8.0), with freq of:
                    8.0 = termFreq=8.0
                  3.475677 = idf(docFreq=3718, maxDocs=44218)
                  0.0625 = fieldNorm(doc=8188)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Abstract
    Briefly discusses the role of search languages in information retrieval and broadly groups the search languages into 4 categories. Explains the idea of definite clause grammars and demonstrates how parsers for Boolean logic-based search languages can easily be developed. Presents a partial Prolog code of the parser that was used in an object-oriented bibliographic database management system
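The paper builds its parser from Prolog definite clause grammars; a hedged Python stand-in for the same grammar (a hand-rolled recursive-descent parser rather than a DCG, with hypothetical node names) looks like this:

```python
import re

# Grammar for one Boolean search language:
#   expr   -> term (OR term)*
#   term   -> factor (AND factor)*
#   factor -> NOT factor | '(' expr ')' | WORD
def parse_boolean(query):
    tokens = re.findall(r"\(|\)|\w+", query)
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def eat():
        nonlocal pos
        tok = tokens[pos]
        pos += 1
        return tok

    def expr():
        node = term()
        while peek() and peek().upper() == "OR":
            eat()
            node = ("OR", node, term())
        return node

    def term():
        node = factor()
        while peek() and peek().upper() == "AND":
            eat()
            node = ("AND", node, factor())
        return node

    def factor():
        if peek() and peek().upper() == "NOT":
            eat()
            return ("NOT", factor())
        if peek() == "(":
            eat()
            node = expr()
            assert eat() == ")", "unbalanced parentheses"
            return node
        return ("TERM", eat())

    tree = expr()
    assert pos == len(tokens), "trailing tokens"
    return tree

print(parse_boolean("cat AND (dog OR NOT fish)"))
# -> ('AND', ('TERM', 'cat'), ('OR', ('TERM', 'dog'), ('NOT', ('TERM', 'fish'))))
```

The appeal of the DCG formulation in the original is that this grammar-to-parser translation is nearly mechanical.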
  7. Zaitseva, E.M.: Developing linguistic tools of thematic search in library information systems (2023) 0.03
    0.026025547 = product of:
      0.052051093 = sum of:
        0.052051093 = product of:
          0.10410219 = sum of:
            0.10410219 = weight(_text_:search in 1187) [ClassicSimilarity], result of:
              0.10410219 = score(doc=1187,freq=18.0), product of:
                0.18072747 = queryWeight, product of:
                  3.475677 = idf(docFreq=3718, maxDocs=44218)
                  0.051997773 = queryNorm
                0.5760175 = fieldWeight in 1187, product of:
                  4.2426405 = tf(freq=18.0), with freq of:
                    18.0 = termFreq=18.0
                  3.475677 = idf(docFreq=3718, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=1187)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Abstract
    Within the R&D program "Information support of research by scientists and specialists on the basis of RNPLS&T Open Archive - the system of scientific knowledge aggregation", the RNPLS&T analyzes the use of linguistic tools for thematic search in modern library information systems and the prospects for their development. The author defines the key common characteristics of the e-catalogs of the largest Russian libraries revealed at the first stage of the analysis. Based on these common characteristics and a detailed comparative analysis, the author outlines and substantiates vectors for enhancing the search interfaces of e-catalogs. The focus is on linguistic tools for thematic search in library information systems; the key vectors suggested are: use of thematic search at different search levels with clear-cut level differentiation; use of combined functionality within the thematic search system; implementation of classification search in all e-catalogs; hierarchical representation of classifications; and use of matching systems for classification information retrieval languages and, in the long term, for classification and verbal information retrieval languages, as well as various verbal information retrieval languages. The author formulates practical recommendations to improve thematic search in library information systems.
  8. McMahon, J.G.; Smith, F.J.: Improved statistical language model performance with automatic generated word hierarchies (1996) 0.02
    0.024657432 = product of:
      0.049314864 = sum of:
        0.049314864 = product of:
          0.09862973 = sum of:
            0.09862973 = weight(_text_:22 in 3164) [ClassicSimilarity], result of:
              0.09862973 = score(doc=3164,freq=2.0), product of:
                0.18208735 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.051997773 = queryNorm
                0.5416616 = fieldWeight in 3164, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.109375 = fieldNorm(doc=3164)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Source
    Computational linguistics. 22(1996) no.2, S.217-248
  9. Ruge, G.: ¬A spreading activation network for automatic generation of thesaurus relationships (1991) 0.02
    0.024657432 = product of:
      0.049314864 = sum of:
        0.049314864 = product of:
          0.09862973 = sum of:
            0.09862973 = weight(_text_:22 in 4506) [ClassicSimilarity], result of:
              0.09862973 = score(doc=4506,freq=2.0), product of:
                0.18208735 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.051997773 = queryNorm
                0.5416616 = fieldWeight in 4506, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.109375 = fieldNorm(doc=4506)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Date
    8.10.2000 11:52:22
  10. Somers, H.: Example-based machine translation : Review article (1999) 0.02
    0.024657432 = product of:
      0.049314864 = sum of:
        0.049314864 = product of:
          0.09862973 = sum of:
            0.09862973 = weight(_text_:22 in 6672) [ClassicSimilarity], result of:
              0.09862973 = score(doc=6672,freq=2.0), product of:
                0.18208735 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.051997773 = queryNorm
                0.5416616 = fieldWeight in 6672, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.109375 = fieldNorm(doc=6672)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Date
    31. 7.1996 9:22:19
  11. New tools for human translators (1997) 0.02
    0.024657432 = product of:
      0.049314864 = sum of:
        0.049314864 = product of:
          0.09862973 = sum of:
            0.09862973 = weight(_text_:22 in 1179) [ClassicSimilarity], result of:
              0.09862973 = score(doc=1179,freq=2.0), product of:
                0.18208735 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.051997773 = queryNorm
                0.5416616 = fieldWeight in 1179, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.109375 = fieldNorm(doc=1179)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Date
    31. 7.1996 9:22:19
  12. Baayen, R.H.; Lieber, H.: Word frequency distributions and lexical semantics (1997) 0.02
    0.024657432 = product of:
      0.049314864 = sum of:
        0.049314864 = product of:
          0.09862973 = sum of:
            0.09862973 = weight(_text_:22 in 3117) [ClassicSimilarity], result of:
              0.09862973 = score(doc=3117,freq=2.0), product of:
                0.18208735 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.051997773 = queryNorm
                0.5416616 = fieldWeight in 3117, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.109375 = fieldNorm(doc=3117)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Date
    28. 2.1999 10:48:22
  13. ¬Der Student aus dem Computer (2023) 0.02
    0.024657432 = product of:
      0.049314864 = sum of:
        0.049314864 = product of:
          0.09862973 = sum of:
            0.09862973 = weight(_text_:22 in 1079) [ClassicSimilarity], result of:
              0.09862973 = score(doc=1079,freq=2.0), product of:
                0.18208735 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.051997773 = queryNorm
                0.5416616 = fieldWeight in 1079, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.109375 = fieldNorm(doc=1079)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Date
    27. 1.2023 16:22:55
  14. Krueger, S.: Getting more out of NEXIS (1996) 0.02
    0.024537122 = product of:
      0.049074244 = sum of:
        0.049074244 = product of:
          0.09814849 = sum of:
            0.09814849 = weight(_text_:search in 4512) [ClassicSimilarity], result of:
              0.09814849 = score(doc=4512,freq=4.0), product of:
                0.18072747 = queryWeight, product of:
                  3.475677 = idf(docFreq=3718, maxDocs=44218)
                  0.051997773 = queryNorm
                0.54307455 = fieldWeight in 4512, product of:
                  2.0 = tf(freq=4.0), with freq of:
                    4.0 = termFreq=4.0
                  3.475677 = idf(docFreq=3718, maxDocs=44218)
                  0.078125 = fieldNorm(doc=4512)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Abstract
    The MORE search command on the LEXIS/NEXIS online databases analyzes the words in a retrieved document, selects and creates a FREESTYLE search, and retrieves the 25 most relevant documents. Shows how MORE works and gives advice about when and when not to use it.
  15. Griffith, C.: FREESTYLE: LEXIS-NEXIS goes natural (1994) 0.02
    0.024537122 = product of:
      0.049074244 = sum of:
        0.049074244 = product of:
          0.09814849 = sum of:
            0.09814849 = weight(_text_:search in 2512) [ClassicSimilarity], result of:
              0.09814849 = score(doc=2512,freq=4.0), product of:
                0.18072747 = queryWeight, product of:
                  3.475677 = idf(docFreq=3718, maxDocs=44218)
                  0.051997773 = queryNorm
                0.54307455 = fieldWeight in 2512, product of:
                  2.0 = tf(freq=4.0), with freq of:
                    4.0 = termFreq=4.0
                  3.475677 = idf(docFreq=3718, maxDocs=44218)
                  0.078125 = fieldNorm(doc=2512)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Abstract
    Describes FREESTYLE, the associative language search engine, developed by Mead Data Central for its LEXIS/NEXIS online service. The special feature of the associative language in FREESTYLE allows users to enter search descriptions in plain English
  16. Notess, G.R.: Up and coming search technologies (2000) 0.02
    0.024290511 = product of:
      0.048581023 = sum of:
        0.048581023 = product of:
          0.097162046 = sum of:
            0.097162046 = weight(_text_:search in 5467) [ClassicSimilarity], result of:
              0.097162046 = score(doc=5467,freq=2.0), product of:
                0.18072747 = queryWeight, product of:
                  3.475677 = idf(docFreq=3718, maxDocs=44218)
                  0.051997773 = queryNorm
                0.5376164 = fieldWeight in 5467, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.475677 = idf(docFreq=3718, maxDocs=44218)
                  0.109375 = fieldNorm(doc=5467)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
  17. Bedathur, S.; Narang, A.: Mind your language : effects of spoken query formulation on retrieval effectiveness (2013) 0.02
    0.024290511 = product of:
      0.048581023 = sum of:
        0.048581023 = product of:
          0.097162046 = sum of:
            0.097162046 = weight(_text_:search in 1150) [ClassicSimilarity], result of:
              0.097162046 = score(doc=1150,freq=8.0), product of:
                0.18072747 = queryWeight, product of:
                  3.475677 = idf(docFreq=3718, maxDocs=44218)
                  0.051997773 = queryNorm
                0.5376164 = fieldWeight in 1150, product of:
                  2.828427 = tf(freq=8.0), with freq of:
                    8.0 = termFreq=8.0
                  3.475677 = idf(docFreq=3718, maxDocs=44218)
                  0.0546875 = fieldNorm(doc=1150)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Abstract
    Voice search is becoming a popular mode for interacting with search engines. As a result, research has gone into building better voice transcription engines, interfaces, and search engines that better handle the inherent verbosity of queries. However, when one considers its use by non-native speakers of English, another aspect that becomes important is the formulation of the query by users. In this paper, we present the results of a preliminary study that we conducted with non-native English speakers who formulate queries for given retrieval tasks. Our results show that current search engines are sensitive in their rankings to the query formulation, which highlights the need for developing more robust ranking methods.
  18. Hsinchun, C.: Knowledge-based document retrieval framework and design (1992) 0.02
    0.024041371 = product of:
      0.048082743 = sum of:
        0.048082743 = product of:
          0.096165486 = sum of:
            0.096165486 = weight(_text_:search in 6686) [ClassicSimilarity], result of:
              0.096165486 = score(doc=6686,freq=6.0), product of:
                0.18072747 = queryWeight, product of:
                  3.475677 = idf(docFreq=3718, maxDocs=44218)
                  0.051997773 = queryNorm
                0.5321022 = fieldWeight in 6686, product of:
                  2.4494898 = tf(freq=6.0), with freq of:
                    6.0 = termFreq=6.0
                  3.475677 = idf(docFreq=3718, maxDocs=44218)
                  0.0625 = fieldNorm(doc=6686)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Abstract
    Presents research on the design of knowledge-based document retrieval systems in which a semantic network was adopted to represent subject knowledge and classification scheme knowledge and experts' search strategies and user modelling capability were modelled as procedural knowledge. These functionalities were incorporated into a prototype knowledge-based retrieval system, Metacat. Describes a system, the design of which was based on the blackboard architecture, which was able to create a user profile, identify task requirements, suggest heuristics-based search strategies, perform semantic-based search assistance, and assist online query refinement
  19. Robertson, S.E.; Sparck Jones, K.: Relevance weighting of search terms (1976) 0.02
    0.024041371 = product of:
      0.048082743 = sum of:
        0.048082743 = product of:
          0.096165486 = sum of:
            0.096165486 = weight(_text_:search in 71) [ClassicSimilarity], result of:
              0.096165486 = score(doc=71,freq=6.0), product of:
                0.18072747 = queryWeight, product of:
                  3.475677 = idf(docFreq=3718, maxDocs=44218)
                  0.051997773 = queryNorm
                0.5321022 = fieldWeight in 71, product of:
                  2.4494898 = tf(freq=6.0), with freq of:
                    6.0 = termFreq=6.0
                  3.475677 = idf(docFreq=3718, maxDocs=44218)
                  0.0625 = fieldNorm(doc=71)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Abstract
    Examines statistical techniques for exploiting relevance information to weight search terms. These techniques are presented as a natural extension of weighting methods using information about the distribution of index terms in documents in general. A series of relevance weighting functions is derived and is justified by theoretical considerations. In particular, it is shown that specific weighted search methods are implied by a general probabilistic theory of retrieval. Different applications of relevance weighting are illustrated by experimental results for test collections
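One well-known weighting function from this line of work is the Robertson/Sparck Jones relevance weight; a minimal sketch, using the customary 0.5 smoothing correction (the paper derives a series of such functions, of which this is only one):

```python
import math

def rsj_weight(r, R, n, N):
    """Robertson/Sparck Jones relevance weight for one search term.

    r: relevant documents containing the term,  R: relevant documents,
    n: documents containing the term,           N: documents in the collection.
    """
    return math.log(((r + 0.5) * (N - n - R + r + 0.5)) /
                    ((n - r + 0.5) * (R - r + 0.5)))

# The more the term concentrates in the relevant set, the larger its weight;
# with no relevance information (r = R = 0) it reduces to an idf-like weight.
print(rsj_weight(r=8, R=10, n=50, N=1000))
```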
  20. Kajanan, S.; Bao, Y.; Datta, A.; VanderMeer, D.; Dutta, K.: Efficient automatic search query formulation using phrase-level analysis (2014) 0.02
    0.024041371 = product of:
      0.048082743 = sum of:
        0.048082743 = product of:
          0.096165486 = sum of:
            0.096165486 = weight(_text_:search in 1264) [ClassicSimilarity], result of:
              0.096165486 = score(doc=1264,freq=24.0), product of:
                0.18072747 = queryWeight, product of:
                  3.475677 = idf(docFreq=3718, maxDocs=44218)
                  0.051997773 = queryNorm
                0.5321022 = fieldWeight in 1264, product of:
                  4.8989797 = tf(freq=24.0), with freq of:
                    24.0 = termFreq=24.0
                  3.475677 = idf(docFreq=3718, maxDocs=44218)
                  0.03125 = fieldNorm(doc=1264)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Abstract
    Over the past decade, the volume of information available digitally over the Internet has grown enormously. Technical developments in the area of search, such as Google's PageRank algorithm, have proved so good at serving relevant results that Internet search has become integrated into daily human activity. One can endlessly explore topics of interest simply by querying and reading through the resulting links. Yet, although search engines are well known for providing relevant results based on users' queries, users do not always receive the results they are looking for. Google's Director of Research describes clickstream evidence of frustrated users repeatedly reformulating queries and searching through page after page of results. Given the general quality of search engine results, one must consider the possibility that the frustrated user's query is not effective; that is, it does not describe the essence of the user's interest. Indeed, extensive research into human search behavior has found that humans are not very effective at formulating good search queries that describe what they are interested in. Ideally, the user should simply point to a portion of text that sparked the user's interest, and a system should automatically formulate a search query that captures the essence of the text. In this paper, we describe an implemented system that provides this capability. We first describe how our work differs from existing work in automatic query formulation, and propose a new method for improved quantification of the relevance of candidate search terms drawn from input text using phrase-level analysis. We then propose an implementable method designed to provide relevant queries based on a user's text input. We demonstrate the quality of our results and the performance of our system through experimental studies. Our results demonstrate that our system produces relevant search terms with roughly two-thirds precision and recall compared to search terms selected by experts, and that typical users find significantly more relevant results (31% more relevant) more quickly (64% faster) using our system than with self-formulated search queries. Further, we show that our implementation can scale to request loads of up to 10 requests per second within current online responsiveness expectations (<2-second response times at the highest loads tested).
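The paper's phrase-level scoring is not reproduced in the abstract; as a simplified, assumed stand-in, the core idea of extracting a query from a text passage can be sketched with single-word tf-idf against a background corpus (the sample `df` table and counts are invented for illustration):

```python
import math
import re
from collections import Counter

def formulate_query(text, background_df, n_docs, k=3):
    """Pick the k most distinctive words of a passage as a search query.

    background_df maps words to document frequencies in a reference corpus
    of n_docs documents; rare-in-background but present-in-passage words win.
    """
    words = re.findall(r"[a-z]+", text.lower())
    tf = Counter(words)
    scores = {w: c * math.log(n_docs / (background_df.get(w, 0) + 1))
              for w, c in tf.items()}
    return sorted(scores, key=scores.get, reverse=True)[:k]

df = {"the": 900, "of": 850, "search": 200, "query": 80, "formulation": 5}
print(formulate_query("formulation of the search query", df, n_docs=1000))
# -> ['formulation', 'query', 'search']
```

The system described above scores candidate phrases rather than isolated words, which is what lets it capture multi-word concepts from the selected text.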

Languages

  • e 98
  • d 20
  • el 1
  • m 1

Types

  • a 97
  • el 14
  • m 7
  • s 5
  • p 3
  • x 3
  • d 1
  • r 1