Search (151 results, page 2 of 8)

  • theme_ss:"Suchmaschinen"
  • type_ss:"a"
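
The two active filters above are Solr-style field queries, and the score breakdowns under each hit below are Lucene ClassicSimilarity explanations of the kind Solr emits with debugQuery=true. A minimal sketch of how this page could be requested, assuming a Solr backend: the host and core name are assumptions, the filters, paging, and result count come from the page itself, and the query terms are inferred from the weight(_text_:22 ...) and weight(_text_:network ...) lines in the explanations.

```python
# Sketch only, not the site's actual client.
import requests

params = {
    "q": "22 network",                     # inferred: each hit matches one of two clauses, hence coord(1/2)
    "fq": ['theme_ss:"Suchmaschinen"', 'type_ss:"a"'],  # the two active filters
    "start": 20,                           # page 2 at 20 hits per page (151 results, 8 pages)
    "rows": 20,
    "debugQuery": "true",                  # emit the per-hit score explanations shown below
    "wt": "json",
}
resp = requests.get("http://localhost:8983/solr/catalog/select", params=params)
print(resp.json()["response"]["numFound"])  # 151 for this search
```
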
  1. Bager, J.: Weniger ist mehr : Internet-Suchmaschinen richtig einsetzen (1998) 0.02
    0.020916866 = product of:
      0.041833732 = sum of:
        0.041833732 = product of:
          0.083667465 = sum of:
            0.083667465 = weight(_text_:22 in 1489) [ClassicSimilarity], result of:
              0.083667465 = score(doc=1489,freq=2.0), product of:
                0.18020853 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.05146125 = queryNorm
                0.46428138 = fieldWeight in 1489, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.09375 = fieldNorm(doc=1489)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Date
    29.12.1998 11:22:00
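
Each nested "product of / sum of" block like the one above is Lucene's ClassicSimilarity (TF-IDF) explain output. Read bottom-up: tf = sqrt(termFreq), idf = 1 + ln(maxDocs / (docFreq + 1)), queryWeight = idf × queryNorm, fieldWeight = tf × idf × fieldNorm, and the matched term's weight is queryWeight × fieldWeight; the two coord(1/2) factors then halve the score twice because only one of two query clauses matched. A minimal sketch reproducing hit 1's numbers:

```python
import math

# Values copied from the explanation tree of hit 1 (doc 1489).
freq, doc_freq, max_docs = 2.0, 3622, 44218
query_norm, field_norm = 0.05146125, 0.09375

tf = math.sqrt(freq)                           # 1.4142135 = tf(freq=2.0)
idf = 1 + math.log(max_docs / (doc_freq + 1))  # 3.5018296
query_weight = idf * query_norm                # 0.18020853
field_weight = tf * idf * field_norm           # 0.46428138
term_weight = query_weight * field_weight      # 0.083667465 = weight(_text_:22 in 1489)
score = term_weight * 0.5 * 0.5                # two coord(1/2) factors
print(score)                                   # 0.020916866 up to float rounding
```
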
  2. Lob, S.: Per Mausklick auf die neusten Nachrichten : Internet-Suchmaschinen liefern Presse-Überblicke und stellen persönliche Zeitungen zusammen (1998) 0.02
    0.020916866 = product of:
      0.041833732 = sum of:
        0.041833732 = product of:
          0.083667465 = sum of:
            0.083667465 = weight(_text_:22 in 1622) [ClassicSimilarity], result of:
              0.083667465 = score(doc=1622,freq=2.0), product of:
                0.18020853 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.05146125 = queryNorm
                0.46428138 = fieldWeight in 1622, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.09375 = fieldNorm(doc=1622)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Date
    29.12.1998 11:22:25
  3. Hannemann, M.: Online ins Schlaraffenland der Wissenschaft : Literatur-Recherche im Internet ist ein teures Unterfangen ohne Erfolgsgarantie - Doch wer systematisch sucht, gelangt zügig ans Ziel (1999) 0.02
    0.020916866 = product of:
      0.041833732 = sum of:
        0.041833732 = product of:
          0.083667465 = sum of:
            0.083667465 = weight(_text_:22 in 3106) [ClassicSimilarity], result of:
              0.083667465 = score(doc=3106,freq=2.0), product of:
                0.18020853 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.05146125 = queryNorm
                0.46428138 = fieldWeight in 3106, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.09375 = fieldNorm(doc=3106)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Date
    3. 5.1997 8:44:22
  4. Ozcan, R.; Altingovde, I.S.; Ulusoy, O.: Exploiting navigational queries for result presentation and caching in Web search engines (2011) 0.02
    0.01993374 = product of:
      0.03986748 = sum of:
        0.03986748 = product of:
          0.07973496 = sum of:
            0.07973496 = weight(_text_:network in 4364) [ClassicSimilarity], result of:
              0.07973496 = score(doc=4364,freq=4.0), product of:
                0.22917621 = queryWeight, product of:
                  4.4533744 = idf(docFreq=1398, maxDocs=44218)
                  0.05146125 = queryNorm
                0.34791988 = fieldWeight in 4364, product of:
                  2.0 = tf(freq=4.0), with freq of:
                    4.0 = termFreq=4.0
                  4.4533744 = idf(docFreq=1398, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=4364)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Abstract
    Caching of query results is an important mechanism for the efficiency and scalability of web search engines. Query results are cached and presented in terms of pages, which typically include 10 results each. In navigational queries, users seek a particular website, which, if found, would typically be listed at the top ranks (first or second) by the search engine. For this type of query, caching and presenting results in the 10-per-page manner may waste cache space and network bandwidth. In this article, we propose nonuniform result page models with varying numbers of results for navigational queries. The experimental results show that our approach reduces the cache miss count by up to 9.17% (because of better utilization of cache space). Furthermore, bandwidth usage, measured in terms of the number of snippets sent, is also reduced by 71% for navigational queries. This means a considerable reduction in the number of transmitted network packets, a crucial gain especially for mobile-search scenarios. A user study reveals that users easily adapt to the proposed result page model and that the efficiency gains observed in the experiments can carry over to real-life situations.
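
To make the bandwidth argument concrete, here is an illustrative contrast between the uniform 10-results-per-page model and a nonuniform model that sends a short first page for navigational queries. The page-size choices are assumptions for illustration, not the paper's tuned values.

```python
def snippets_sent(clicked_rank: int, page_sizes: list[int]) -> int:
    """Snippets transmitted until the page containing clicked_rank is shown."""
    sent = 0
    for size in page_sizes:
        sent += size
        if clicked_rank <= sent:
            return sent
    return sent

# A navigational user clicking the rank-1 result:
print(snippets_sent(1, [10, 10]))    # uniform model: 10 snippets sent
print(snippets_sent(1, [2, 8, 10]))  # nonuniform model: only 2 snippets sent
```
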
  5. Gibson, P.: HotBot's future is in Lycos' hands : users hope that the search engine won't be hobbled by an acquisition (1999) 0.02
    0.019733394 = product of:
      0.039466787 = sum of:
        0.039466787 = product of:
          0.078933574 = sum of:
            0.078933574 = weight(_text_:network in 5195) [ClassicSimilarity], result of:
              0.078933574 = score(doc=5195,freq=2.0), product of:
                0.22917621 = queryWeight, product of:
                  4.4533744 = idf(docFreq=1398, maxDocs=44218)
                  0.05146125 = queryNorm
                0.3444231 = fieldWeight in 5195, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  4.4533744 = idf(docFreq=1398, maxDocs=44218)
                  0.0546875 = fieldNorm(doc=5195)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Abstract
    Presents an overview of Wired Digital Inc.'s HotBot search engine, and ponders the future of the product, now that the company is being acquired by Lycos. Reviews the business strategy that drove Wired Digital to seek acquisition by a company capable of providing needed financial backing, technology infrastructure, and product development and marketing muscle. Lycos was interested in the property as part of its 'best-of-breed' acquisition plan for building out the new Lycos Network. Explores the likely scenarios for the HotBot product going forward under the Lycos brand, and expresses hope that Lycos will have the foresight to keep the attractive and sophisticated search engine well funded and developed
  6. Hock, R.E.: How to do field searching in Web search engines : a field trip (1998) 0.02
    0.019720612 = product of:
      0.039441224 = sum of:
        0.039441224 = product of:
          0.07888245 = sum of:
            0.07888245 = weight(_text_:22 in 3601) [ClassicSimilarity], result of:
              0.07888245 = score(doc=3601,freq=4.0), product of:
                0.18020853 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.05146125 = queryNorm
                0.4377287 = fieldWeight in 3601, product of:
                  2.0 = tf(freq=4.0), with freq of:
                    4.0 = termFreq=4.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.0625 = fieldNorm(doc=3601)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Source
    Online. 22(1998) no.3, S.18-22
  7. Loeper, D. von: Sherlock Holmes im Netz (1997) 0.02
    0.017430723 = product of:
      0.034861445 = sum of:
        0.034861445 = product of:
          0.06972289 = sum of:
            0.06972289 = weight(_text_:22 in 6566) [ClassicSimilarity], result of:
              0.06972289 = score(doc=6566,freq=2.0), product of:
                0.18020853 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.05146125 = queryNorm
                0.38690117 = fieldWeight in 6566, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.078125 = fieldNorm(doc=6566)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Date
    22. 2.1997 19:50:29
  8. Hüskes, R.; Kleber, D.: Den Server im Griff (1999) 0.02
    0.017430723 = product of:
      0.034861445 = sum of:
        0.034861445 = product of:
          0.06972289 = sum of:
            0.06972289 = weight(_text_:22 in 4008) [ClassicSimilarity], result of:
              0.06972289 = score(doc=4008,freq=2.0), product of:
                0.18020853 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.05146125 = queryNorm
                0.38690117 = fieldWeight in 4008, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.078125 = fieldNorm(doc=4008)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Date
    22. 8.1999 21:21:10
  9. Price, A.: Five new Danish subject gateways under development (2000) 0.02
    0.017430723 = product of:
      0.034861445 = sum of:
        0.034861445 = product of:
          0.06972289 = sum of:
            0.06972289 = weight(_text_:22 in 4878) [ClassicSimilarity], result of:
              0.06972289 = score(doc=4878,freq=2.0), product of:
                0.18020853 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.05146125 = queryNorm
                0.38690117 = fieldWeight in 4878, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.078125 = fieldNorm(doc=4878)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Date
    22. 6.2002 19:41:31
  10. Eggeling, T.; Kroschel, A.: Alles finden im Web (2000) 0.02
    0.017430723 = product of:
      0.034861445 = sum of:
        0.034861445 = product of:
          0.06972289 = sum of:
            0.06972289 = weight(_text_:22 in 4884) [ClassicSimilarity], result of:
              0.06972289 = score(doc=4884,freq=2.0), product of:
                0.18020853 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.05146125 = queryNorm
                0.38690117 = fieldWeight in 4884, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.078125 = fieldNorm(doc=4884)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Date
    9. 7.2000 14:06:22
  11. Poulakos, I.: ¬"Die Leute suchen immer dasselbe" (2001) 0.02
    0.017430723 = product of:
      0.034861445 = sum of:
        0.034861445 = product of:
          0.06972289 = sum of:
            0.06972289 = weight(_text_:22 in 5541) [ClassicSimilarity], result of:
              0.06972289 = score(doc=5541,freq=2.0), product of:
                0.18020853 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.05146125 = queryNorm
                0.38690117 = fieldWeight in 5541, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.078125 = fieldNorm(doc=5541)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Date
    18. 1.1997 12:15:22
  12. Sauer, D.: Alles schneller finden (2001) 0.02
    0.017430723 = product of:
      0.034861445 = sum of:
        0.034861445 = product of:
          0.06972289 = sum of:
            0.06972289 = weight(_text_:22 in 6835) [ClassicSimilarity], result of:
              0.06972289 = score(doc=6835,freq=2.0), product of:
                0.18020853 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.05146125 = queryNorm
                0.38690117 = fieldWeight in 6835, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.078125 = fieldNorm(doc=6835)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Date
    11.11.2001 17:25:22
  13. Breyer, K.: Kommerz statt Information (2002) 0.02
    0.017430723 = product of:
      0.034861445 = sum of:
        0.034861445 = product of:
          0.06972289 = sum of:
            0.06972289 = weight(_text_:22 in 568) [ClassicSimilarity], result of:
              0.06972289 = score(doc=568,freq=2.0), product of:
                0.18020853 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.05146125 = queryNorm
                0.38690117 = fieldWeight in 568, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.078125 = fieldNorm(doc=568)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Date
    9. 5.2002 21:21:22
  14. Iwazume, M.; Takeda, H.; Nishida, T.: Ontology-based information capturing from the Internet (1996) 0.02
    0.016914338 = product of:
      0.033828676 = sum of:
        0.033828676 = product of:
          0.06765735 = sum of:
            0.06765735 = weight(_text_:network in 5185) [ClassicSimilarity], result of:
              0.06765735 = score(doc=5185,freq=2.0), product of:
                0.22917621 = queryWeight, product of:
                  4.4533744 = idf(docFreq=1398, maxDocs=44218)
                  0.05146125 = queryNorm
                0.29521978 = fieldWeight in 5185, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  4.4533744 = idf(docFreq=1398, maxDocs=44218)
                  0.046875 = fieldNorm(doc=5185)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Abstract
    In this paper, we present a system called IICA (Intelligent Information Collector and Analyzer) which gathers, classifies, and reorganizes information from the Internet. Ontology plays an important role in IICA. It specifies the common background knowledge shared by the user and IICA, allows IICA to make inexact matches between the user's request and the candidates, and assigns user-oriented categories. IICA extracts information using a state transition network grammar and concept frames. We have implemented and evaluated IICA. The results show the feasibility and robustness of the approach.
  15. Poynder, R.: Web research engines? (1996) 0.02
    0.016914338 = product of:
      0.033828676 = sum of:
        0.033828676 = product of:
          0.06765735 = sum of:
            0.06765735 = weight(_text_:network in 5698) [ClassicSimilarity], result of:
              0.06765735 = score(doc=5698,freq=2.0), product of:
                0.22917621 = queryWeight, product of:
                  4.4533744 = idf(docFreq=1398, maxDocs=44218)
                  0.05146125 = queryNorm
                0.29521978 = fieldWeight in 5698, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  4.4533744 = idf(docFreq=1398, maxDocs=44218)
                  0.046875 = fieldNorm(doc=5698)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Abstract
    Describes the shortcomings of search engines for the WWW, comparing their current capabilities to those of first-generation CD-ROM products. Some allow phrase searching and most are improving their Boolean searching. Few allow truncation, wildcards or nested logic. They are stateless, losing previous search criteria. Unlike the indexing and classification systems for today's CD-ROMs, those for Web pages are random, unstructured and of variable quality. Considers that at best Web search engines can only offer free-text searching. Discusses whether automatic data classification systems such as Infoseek Ultra can overcome the haphazard nature of the Web with neural network technology, and whether Boolean search techniques may become redundant when replaced by technology such as the Euroferret search engine. However, artificial intelligence is rarely successful on huge, varied databases. Relevance ranking and automatic query expansion still use the same simple inverted indexes. Most Web search engines do nothing more than word counting. Further complications arise with foreign languages.
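
The last two sentences describe, in effect, an inverted index queried by raw term counting. A toy sketch of that baseline, with invented documents and query for illustration:

```python
from collections import Counter, defaultdict

docs = {1: "search engines index the web", 2: "web search beats browsing the web"}

index = defaultdict(Counter)                 # term -> {doc_id: term frequency}
for doc_id, text in docs.items():
    for term in text.split():
        index[term][doc_id] += 1

def rank(query: str):
    scores = Counter()
    for term in query.split():
        scores.update(index[term])           # bare word counting, no weighting
    return scores.most_common()

print(rank("web search"))                    # [(2, 3), (1, 2)]
```
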
  16. Ke, W.: Decentralized search and the clustering paradox in large scale information networks (2012) 0.02
    0.016914338 = product of:
      0.033828676 = sum of:
        0.033828676 = product of:
          0.06765735 = sum of:
            0.06765735 = weight(_text_:network in 94) [ClassicSimilarity], result of:
              0.06765735 = score(doc=94,freq=2.0), product of:
                0.22917621 = queryWeight, product of:
                  4.4533744 = idf(docFreq=1398, maxDocs=44218)
                  0.05146125 = queryNorm
                0.29521978 = fieldWeight in 94, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  4.4533744 = idf(docFreq=1398, maxDocs=44218)
                  0.046875 = fieldNorm(doc=94)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Abstract
    Amid the rapid growth of information today, people face an increasing challenge in navigating its sheer volume. Dynamics and heterogeneity of large information spaces such as the Web raise important questions about information retrieval in these environments. Collecting all information in advance and centralizing IR operations are extremely difficult, if not impossible, because systems are dynamic and information is distributed. The chapter discusses some of the key issues facing classic information retrieval models and presents a decentralized, organic view of information systems pertaining to search in large-scale networks. It focuses on the impact of network structure on search performance and discusses a phenomenon we refer to as the Clustering Paradox, in which the topology of interconnected systems imposes a scalability limit.
  17. Shapira, B.; Zabar, B.: Personalized search : integrating collaboration and social networks (2011) 0.01
    0.014095282 = product of:
      0.028190564 = sum of:
        0.028190564 = product of:
          0.05638113 = sum of:
            0.05638113 = weight(_text_:network in 4140) [ClassicSimilarity], result of:
              0.05638113 = score(doc=4140,freq=2.0), product of:
                0.22917621 = queryWeight, product of:
                  4.4533744 = idf(docFreq=1398, maxDocs=44218)
                  0.05146125 = queryNorm
                0.2460165 = fieldWeight in 4140, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  4.4533744 = idf(docFreq=1398, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=4140)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Abstract
    Despite improvements in their capabilities, search engines still fail to provide users with only relevant results. One reason is that most search engines implement a "one size fits all" approach that ignores personal preferences when retrieving the results of a user's query. Recent studies (Smyth, 2010) have elaborated the importance of personalizing search results and have proposed integrating recommender system methods for enhancing results using contextual and extrinsic information that might indicate the user's actual needs. In this article, we review recommender system methods used for personalizing and improving search results and examine the effect of two such methods that are merged for this purpose. One method is based on collaborative users' knowledge; the second integrates information from the user's social network. We propose new methods for collaborative- and social-based search and demonstrate that each of these methods, when separately applied, produces more accurate search results than does a purely keyword-based search engine (referred to as a "standard search engine"), with the social method being more accurate than the collaborative one. However, separately applied, these methods do not produce a sufficient number of results (low coverage). Nevertheless, merging these methods with those implemented by standard search engines overcomes the low-coverage problem and produces personalized results that are significantly more accurate than those of standard search engines while also providing sufficient coverage. The improvement, however, is significant only for topics for which the diversity of terms used for queries among users is low.
  18. Cheng, S.; YunTao, P.; JunPeng, Y.; Hong, G.; ZhengLu, Y.; ZhiYu, H.: PageRank, HITS and impact factor for journal ranking (2009) 0.01
    0.014095282 = product of:
      0.028190564 = sum of:
        0.028190564 = product of:
          0.05638113 = sum of:
            0.05638113 = weight(_text_:network in 2513) [ClassicSimilarity], result of:
              0.05638113 = score(doc=2513,freq=2.0), product of:
                0.22917621 = queryWeight, product of:
                  4.4533744 = idf(docFreq=1398, maxDocs=44218)
                  0.05146125 = queryNorm
                0.2460165 = fieldWeight in 2513, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  4.4533744 = idf(docFreq=1398, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=2513)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Abstract
    Journal citation measures are among the most widely used bibliometric tools. The best-known measure is the ISI Impact Factor: under the standard definition, the impact factor of journal j in a given year is the average number of citations received by papers published in journal j in the previous two years. However, the impact factor has "intrinsic" limitations: it is a ranking measure based fundamentally on a pure count of the in-degrees of nodes in the citation network, and its calculation does not take into account the "impact" or "prestige" of the journals in which the citations appear. Google's PageRank algorithm and Kleinberg's HITS method are webpage ranking algorithms; they compute the scores of webpages based on a combination of the number of hyperlinks that point to a page and the status of the pages those hyperlinks originate from: a page is important if it is pointed to by other important pages. We demonstrate how the popular webpage ranking algorithms PageRank and HITS can be used to rank journals, compare the ISI Impact Factor, PageRank, and HITS for journal ranking (computing PageRank and HITS both with and without self-citations), and discuss the merits, shortcomings, and scope of application of the various algorithms for journal ranking.
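
The abstract's core idea, that a node is important if it is pointed to by other important nodes, is easy to sketch. Below is a minimal power-iteration PageRank over an invented three-journal citation graph; the damping factor 0.85 is the conventional choice, not a value from the paper.

```python
def pagerank(links: dict[str, list[str]], damping: float = 0.85, iters: int = 50):
    nodes = list(links)
    rank = {n: 1.0 / len(nodes) for n in nodes}
    for _ in range(iters):
        new = {n: (1 - damping) / len(nodes) for n in nodes}
        for src, targets in links.items():
            for t in targets:                # spread src's rank over its outlinks
                new[t] += damping * rank[src] / len(targets)
        rank = new
    return rank

citations = {"A": ["B"], "B": ["C"], "C": ["A", "B"]}
print(pagerank(citations))  # B and C outrank A: they receive the weightier citations
```
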
  19. Gencosman, B.C.; Ozmutlu, H.C.; Ozmutlu, S.: Character n-gram application for automatic new topic identification (2014) 0.01
    0.014095282 = product of:
      0.028190564 = sum of:
        0.028190564 = product of:
          0.05638113 = sum of:
            0.05638113 = weight(_text_:network in 2688) [ClassicSimilarity], result of:
              0.05638113 = score(doc=2688,freq=2.0), product of:
                0.22917621 = queryWeight, product of:
                  4.4533744 = idf(docFreq=1398, maxDocs=44218)
                  0.05146125 = queryNorm
                0.2460165 = fieldWeight in 2688, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  4.4533744 = idf(docFreq=1398, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=2688)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Abstract
    The widespread availability of the Internet and the variety of Internet-based applications have resulted in a significant increase in the number of web pages. Determining the behaviors of search engine users has become a critical step in enhancing search engine performance. Search engine user behaviors can be determined by content-based or content-ignorant algorithms. Although many content-ignorant studies have been performed to automatically identify new topics, previous results have demonstrated that spelling errors can cause significant errors in topic shift estimates. In this study, we focused on minimizing the number of wrong estimates caused by spelling errors. We developed a new hybrid algorithm combining character n-gram and neural network methodologies, and compared the experimental results with results from previous studies. For the FAST and Excite datasets, the proposed algorithm improved topic shift estimates by 6.987% and 2.639%, respectively. Moreover, we analyzed the performance of the character n-gram method in several respects, including a comparison with the Levenshtein edit-distance method. The experimental results demonstrated that the character n-gram method outperformed the Levenshtein edit-distance method in terms of topic identification.
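
Character n-grams make topic-shift detection robust to spelling errors because a typo perturbs only a few n-grams. The sketch below shows the general idea as a Jaccard overlap of character trigrams between consecutive queries; it is an illustration of the technique, not the paper's hybrid n-gram/neural-network algorithm, and the threshold for declaring a topic shift is left open.

```python
def char_ngrams(text: str, n: int = 3) -> set[str]:
    padded = f" {text.strip().lower()} "     # pad so word boundaries form n-grams
    return {padded[i:i + n] for i in range(len(padded) - n + 1)}

def ngram_similarity(q1: str, q2: str, n: int = 3) -> float:
    a, b = char_ngrams(q1, n), char_ngrams(q2, n)
    return len(a & b) / len(a | b)           # Jaccard overlap of trigram sets

print(ngram_similarity("search engines", "serch engines"))   # high: same topic despite the typo
print(ngram_similarity("search engines", "football scores")) # low: likely topic shift
```
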
  20. Roy, R.S.; Agarwal, S.; Ganguly, N.; Choudhury, M.: Syntactic complexity of Web search queries through the lenses of language models, networks and users (2016) 0.01
    0.014095282 = product of:
      0.028190564 = sum of:
        0.028190564 = product of:
          0.05638113 = sum of:
            0.05638113 = weight(_text_:network in 3188) [ClassicSimilarity], result of:
              0.05638113 = score(doc=3188,freq=2.0), product of:
                0.22917621 = queryWeight, product of:
                  4.4533744 = idf(docFreq=1398, maxDocs=44218)
                  0.05146125 = queryNorm
                0.2460165 = fieldWeight in 3188, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  4.4533744 = idf(docFreq=1398, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=3188)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Abstract
    Across the world, millions of users interact with search engines every day to satisfy their information needs. As the Web grows bigger over time, such information needs, manifested through user search queries, also become more complex. However, there has been no systematic study that quantifies the structural complexity of Web search queries. In this research, we make an attempt towards understanding and characterizing the syntactic complexity of search queries using a multi-pronged approach. We use traditional statistical language modeling techniques to quantify and compare the perplexity of queries with natural language (NL). We then use complex network analysis for a comparative analysis of the topological properties of queries issued by real Web users and those generated by statistical models. Finally, we conduct experiments to study whether search engine users are able to identify real queries, when presented along with model-generated ones. The three complementary studies show that the syntactic structure of Web queries is more complex than what n-grams can capture, but simpler than NL. Queries, thus, seem to represent an intermediate stage between syntactic and non-syntactic communication.
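
As a rough illustration of the perplexity comparison mentioned above: a language model assigns a probability to a token sequence, and perplexity is the inverse geometric mean of those probabilities, so sequences that fit the model poorly score higher. A minimal unigram sketch with an invented corpus and add-one smoothing (the paper uses more substantial models):

```python
import math
from collections import Counter

corpus = "cheap flights to berlin cheap hotels in berlin".split()
counts = Counter(corpus)
vocab, total = len(counts), len(corpus)

def unigram_perplexity(query: str) -> float:
    tokens = query.lower().split()
    log_prob = sum(math.log((counts[tok] + 1) / (total + vocab))  # add-one smoothing
                   for tok in tokens)
    return math.exp(-log_prob / len(tokens))

print(unigram_perplexity("cheap flights berlin"))  # lower: in-model words
print(unigram_perplexity("quantum entanglement"))  # higher: unseen words
```
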

Languages

  • e 76
  • d 73
  • f 1
  • nl 1