Search (2218 results, page 1 of 111)

  • × language_ss:"e"
  • × type_ss:"a"
  • × year_i:[2000 TO 2010}
  1. Hotho, A.; Bloehdorn, S.: Data Mining 2004 : Text classification by boosting weak learners based on terms and concepts (2004) 0.29
    0.28942767 = product of:
      0.48237944 = sum of:
        0.04544258 = product of:
          0.13632774 = sum of:
            0.13632774 = weight(_text_:3a in 562) [ClassicSimilarity], result of:
              0.13632774 = score(doc=562,freq=2.0), product of:
                0.24256827 = queryWeight, product of:
                  8.478011 = idf(docFreq=24, maxDocs=44218)
                  0.028611459 = queryNorm
                0.56201804 = fieldWeight in 562, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  8.478011 = idf(docFreq=24, maxDocs=44218)
                  0.046875 = fieldNorm(doc=562)
          0.33333334 = coord(1/3)
        0.13632774 = weight(_text_:2f in 562) [ClassicSimilarity], result of:
          0.13632774 = score(doc=562,freq=2.0), product of:
            0.24256827 = queryWeight, product of:
              8.478011 = idf(docFreq=24, maxDocs=44218)
              0.028611459 = queryNorm
            0.56201804 = fieldWeight in 562, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              8.478011 = idf(docFreq=24, maxDocs=44218)
              0.046875 = fieldNorm(doc=562)
        0.13632774 = weight(_text_:2f in 562) [ClassicSimilarity], result of:
          0.13632774 = score(doc=562,freq=2.0), product of:
            0.24256827 = queryWeight, product of:
              8.478011 = idf(docFreq=24, maxDocs=44218)
              0.028611459 = queryNorm
            0.56201804 = fieldWeight in 562, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              8.478011 = idf(docFreq=24, maxDocs=44218)
              0.046875 = fieldNorm(doc=562)
        0.020200694 = weight(_text_:web in 562) [ClassicSimilarity], result of:
          0.020200694 = score(doc=562,freq=2.0), product of:
            0.0933738 = queryWeight, product of:
              3.2635105 = idf(docFreq=4597, maxDocs=44218)
              0.028611459 = queryNorm
            0.21634221 = fieldWeight in 562, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.2635105 = idf(docFreq=4597, maxDocs=44218)
              0.046875 = fieldNorm(doc=562)
        0.13632774 = weight(_text_:2f in 562) [ClassicSimilarity], result of:
          0.13632774 = score(doc=562,freq=2.0), product of:
            0.24256827 = queryWeight, product of:
              8.478011 = idf(docFreq=24, maxDocs=44218)
              0.028611459 = queryNorm
            0.56201804 = fieldWeight in 562, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              8.478011 = idf(docFreq=24, maxDocs=44218)
              0.046875 = fieldNorm(doc=562)
        0.0077529154 = product of:
          0.023258746 = sum of:
            0.023258746 = weight(_text_:22 in 562) [ClassicSimilarity], result of:
              0.023258746 = score(doc=562,freq=2.0), product of:
                0.10019246 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.028611459 = queryNorm
                0.23214069 = fieldWeight in 562, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.046875 = fieldNorm(doc=562)
          0.33333334 = coord(1/3)
      0.6 = coord(6/10)
    
    Content
    Vgl.: http://www.google.de/url?sa=t&rct=j&q=&esrc=s&source=web&cd=1&cad=rja&ved=0CEAQFjAA&url=http%3A%2F%2Fciteseerx.ist.psu.edu%2Fviewdoc%2Fdownload%3Fdoi%3D10.1.1.91.4940%26rep%3Drep1%26type%3Dpdf&ei=dOXrUMeIDYHDtQahsIGACg&usg=AFQjCNHFWVh6gNPvnOrOS9R3rkrXCNVD-A&sig2=5I2F5evRfMnsttSgFF9g7Q&bvm=bv.1357316858,d.Yms.
    Date
    8. 1.2013 10:22:32
  2. Mas, S.; Marleau, Y.: Proposition of a faceted classification model to support corporate information organization and digital records management (2009) 0.23
    0.23112455 = product of:
      0.4622491 = sum of:
        0.04544258 = product of:
          0.13632774 = sum of:
            0.13632774 = weight(_text_:3a in 2918) [ClassicSimilarity], result of:
              0.13632774 = score(doc=2918,freq=2.0), product of:
                0.24256827 = queryWeight, product of:
                  8.478011 = idf(docFreq=24, maxDocs=44218)
                  0.028611459 = queryNorm
                0.56201804 = fieldWeight in 2918, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  8.478011 = idf(docFreq=24, maxDocs=44218)
                  0.046875 = fieldNorm(doc=2918)
          0.33333334 = coord(1/3)
        0.13632774 = weight(_text_:2f in 2918) [ClassicSimilarity], result of:
          0.13632774 = score(doc=2918,freq=2.0), product of:
            0.24256827 = queryWeight, product of:
              8.478011 = idf(docFreq=24, maxDocs=44218)
              0.028611459 = queryNorm
            0.56201804 = fieldWeight in 2918, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              8.478011 = idf(docFreq=24, maxDocs=44218)
              0.046875 = fieldNorm(doc=2918)
        0.13632774 = weight(_text_:2f in 2918) [ClassicSimilarity], result of:
          0.13632774 = score(doc=2918,freq=2.0), product of:
            0.24256827 = queryWeight, product of:
              8.478011 = idf(docFreq=24, maxDocs=44218)
              0.028611459 = queryNorm
            0.56201804 = fieldWeight in 2918, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              8.478011 = idf(docFreq=24, maxDocs=44218)
              0.046875 = fieldNorm(doc=2918)
        0.13632774 = weight(_text_:2f in 2918) [ClassicSimilarity], result of:
          0.13632774 = score(doc=2918,freq=2.0), product of:
            0.24256827 = queryWeight, product of:
              8.478011 = idf(docFreq=24, maxDocs=44218)
              0.028611459 = queryNorm
            0.56201804 = fieldWeight in 2918, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              8.478011 = idf(docFreq=24, maxDocs=44218)
              0.046875 = fieldNorm(doc=2918)
        0.007823291 = product of:
          0.023469873 = sum of:
            0.023469873 = weight(_text_:29 in 2918) [ClassicSimilarity], result of:
              0.023469873 = score(doc=2918,freq=2.0), product of:
                0.10064617 = queryWeight, product of:
                  3.5176873 = idf(docFreq=3565, maxDocs=44218)
                  0.028611459 = queryNorm
                0.23319192 = fieldWeight in 2918, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5176873 = idf(docFreq=3565, maxDocs=44218)
                  0.046875 = fieldNorm(doc=2918)
          0.33333334 = coord(1/3)
      0.5 = coord(5/10)
    
    Date
    29. 8.2009 21:15:48
    Footnote
    Vgl.: http://ieeexplore.ieee.org/Xplore/login.jsp?reload=true&url=http%3A%2F%2Fieeexplore.ieee.org%2Fiel5%2F4755313%2F4755314%2F04755480.pdf%3Farnumber%3D4755480&authDecision=-203.
  3. Vetere, G.; Lenzerini, M.: Models for semantic interoperability in service-oriented architectures (2005) 0.21
    0.21206538 = product of:
      0.53016347 = sum of:
        0.05301635 = product of:
          0.15904905 = sum of:
            0.15904905 = weight(_text_:3a in 306) [ClassicSimilarity], result of:
              0.15904905 = score(doc=306,freq=2.0), product of:
                0.24256827 = queryWeight, product of:
                  8.478011 = idf(docFreq=24, maxDocs=44218)
                  0.028611459 = queryNorm
                0.65568775 = fieldWeight in 306, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  8.478011 = idf(docFreq=24, maxDocs=44218)
                  0.0546875 = fieldNorm(doc=306)
          0.33333334 = coord(1/3)
        0.15904905 = weight(_text_:2f in 306) [ClassicSimilarity], result of:
          0.15904905 = score(doc=306,freq=2.0), product of:
            0.24256827 = queryWeight, product of:
              8.478011 = idf(docFreq=24, maxDocs=44218)
              0.028611459 = queryNorm
            0.65568775 = fieldWeight in 306, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              8.478011 = idf(docFreq=24, maxDocs=44218)
              0.0546875 = fieldNorm(doc=306)
        0.15904905 = weight(_text_:2f in 306) [ClassicSimilarity], result of:
          0.15904905 = score(doc=306,freq=2.0), product of:
            0.24256827 = queryWeight, product of:
              8.478011 = idf(docFreq=24, maxDocs=44218)
              0.028611459 = queryNorm
            0.65568775 = fieldWeight in 306, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              8.478011 = idf(docFreq=24, maxDocs=44218)
              0.0546875 = fieldNorm(doc=306)
        0.15904905 = weight(_text_:2f in 306) [ClassicSimilarity], result of:
          0.15904905 = score(doc=306,freq=2.0), product of:
            0.24256827 = queryWeight, product of:
              8.478011 = idf(docFreq=24, maxDocs=44218)
              0.028611459 = queryNorm
            0.65568775 = fieldWeight in 306, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              8.478011 = idf(docFreq=24, maxDocs=44218)
              0.0546875 = fieldNorm(doc=306)
      0.4 = coord(4/10)
    
    Content
    Vgl.: http://ieeexplore.ieee.org/xpl/login.jsp?tp=&arnumber=5386707&url=http%3A%2F%2Fieeexplore.ieee.org%2Fxpls%2Fabs_all.jsp%3Farnumber%3D5386707.
  4. Hochheiser, H.; Shneiderman, B.: Using interactive visualizations of WWW log data to characterize access patterns and inform site design (2001) 0.06
    0.059582274 = product of:
      0.19860758 = sum of:
        0.03498863 = weight(_text_:web in 5765) [ClassicSimilarity], result of:
          0.03498863 = score(doc=5765,freq=6.0), product of:
            0.0933738 = queryWeight, product of:
              3.2635105 = idf(docFreq=4597, maxDocs=44218)
              0.028611459 = queryNorm
            0.37471575 = fieldWeight in 5765, product of:
              2.4494898 = tf(freq=6.0), with freq of:
                6.0 = termFreq=6.0
              3.2635105 = idf(docFreq=4597, maxDocs=44218)
              0.046875 = fieldNorm(doc=5765)
        0.15579566 = weight(_text_:log in 5765) [ClassicSimilarity], result of:
          0.15579566 = score(doc=5765,freq=8.0), product of:
            0.18335998 = queryWeight, product of:
              6.4086204 = idf(docFreq=197, maxDocs=44218)
              0.028611459 = queryNorm
            0.84967107 = fieldWeight in 5765, product of:
              2.828427 = tf(freq=8.0), with freq of:
                8.0 = termFreq=8.0
              6.4086204 = idf(docFreq=197, maxDocs=44218)
              0.046875 = fieldNorm(doc=5765)
        0.007823291 = product of:
          0.023469873 = sum of:
            0.023469873 = weight(_text_:29 in 5765) [ClassicSimilarity], result of:
              0.023469873 = score(doc=5765,freq=2.0), product of:
                0.10064617 = queryWeight, product of:
                  3.5176873 = idf(docFreq=3565, maxDocs=44218)
                  0.028611459 = queryNorm
                0.23319192 = fieldWeight in 5765, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5176873 = idf(docFreq=3565, maxDocs=44218)
                  0.046875 = fieldNorm(doc=5765)
          0.33333334 = coord(1/3)
      0.3 = coord(3/10)
    
    Abstract
    HTTP server log files provide Web site operators with substantial detail regarding the visitors to their sites. Interest in interpreting this data has spawned an active market for software packages that summarize and analyze this data, providing histograms, pie graphs, and other charts summarizing usage patterns. Although useful, these summaries obscure useful information and restrict users to passive interpretation of static displays. Interactive visualizations can be used to provide users with greater abilities to interpret and explore Web log data. By combining two-dimensional displays of thousands of individual access requests, color, and size coding for additional attributes, and facilities for zooming and filtering, these visualizations provide capabilities for examining data that exceed those of traditional Web log analysis tools. We introduce a series of interactive visualizations that can be used to explore server data across various dimensions. Possible uses of these visualizations are discussed, and difficulties of data collection, presentation, and interpretation are explored.
    Date
    29. 9.2001 14:00:46
  5. Jansen, B.J.; Spink, A.; Pedersen, J.: ¬A temporal comparison of AltaVista Web searching (2005) 0.04
    0.04056076 = product of:
      0.13520253 = sum of:
        0.0494814 = weight(_text_:web in 3454) [ClassicSimilarity], result of:
          0.0494814 = score(doc=3454,freq=12.0), product of:
            0.0933738 = queryWeight, product of:
              3.2635105 = idf(docFreq=4597, maxDocs=44218)
              0.028611459 = queryNorm
            0.5299281 = fieldWeight in 3454, product of:
              3.4641016 = tf(freq=12.0), with freq of:
                12.0 = termFreq=12.0
              3.2635105 = idf(docFreq=4597, maxDocs=44218)
              0.046875 = fieldNorm(doc=3454)
        0.07789783 = weight(_text_:log in 3454) [ClassicSimilarity], result of:
          0.07789783 = score(doc=3454,freq=2.0), product of:
            0.18335998 = queryWeight, product of:
              6.4086204 = idf(docFreq=197, maxDocs=44218)
              0.028611459 = queryNorm
            0.42483553 = fieldWeight in 3454, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              6.4086204 = idf(docFreq=197, maxDocs=44218)
              0.046875 = fieldNorm(doc=3454)
        0.007823291 = product of:
          0.023469873 = sum of:
            0.023469873 = weight(_text_:29 in 3454) [ClassicSimilarity], result of:
              0.023469873 = score(doc=3454,freq=2.0), product of:
                0.10064617 = queryWeight, product of:
                  3.5176873 = idf(docFreq=3565, maxDocs=44218)
                  0.028611459 = queryNorm
                0.23319192 = fieldWeight in 3454, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5176873 = idf(docFreq=3565, maxDocs=44218)
                  0.046875 = fieldNorm(doc=3454)
          0.33333334 = coord(1/3)
      0.3 = coord(3/10)
    
    Abstract
    Major Web search engines, such as AltaVista, are essential tools in the quest to locate online information. This article reports research that used transaction log analysis to examine the characteristics and changes in AltaVista Web searching that occurred from 1998 to 2002. The research questions we examined are (1) What are the changes in AltaVista Web searching from 1998 to 2002? (2) What are the current characteristics of AltaVista searching, including the duration and frequency of search sessions? (3) What changes in the information needs of AltaVista users occurred between 1998 and 2002? The results of our research show (1) a move toward more interactivity with increases in session and query length, (2) with 70% of session durations at 5 minutes or less, the frequency of interaction is increasing, but it is happening very quickly, and (3) a broadening range of Web searchers' information needs, with the most frequent terms accounting for less than 1% of total term usage. We discuss the implications of these findings for the development of Web search engines.
    Date
    3. 6.2005 19:29:59
  6. Chen, Z.; Wenyin, L.; Zhang, F.; Li, M.; Zhang, H.: Web mining for Web image retrieval (2001) 0.03
    0.034791782 = product of:
      0.11597261 = sum of:
        0.044538345 = weight(_text_:web in 6521) [ClassicSimilarity], result of:
          0.044538345 = score(doc=6521,freq=14.0), product of:
            0.0933738 = queryWeight, product of:
              3.2635105 = idf(docFreq=4597, maxDocs=44218)
              0.028611459 = queryNorm
            0.47698978 = fieldWeight in 6521, product of:
              3.7416575 = tf(freq=14.0), with freq of:
                14.0 = termFreq=14.0
              3.2635105 = idf(docFreq=4597, maxDocs=44218)
              0.0390625 = fieldNorm(doc=6521)
        0.06491486 = weight(_text_:log in 6521) [ClassicSimilarity], result of:
          0.06491486 = score(doc=6521,freq=2.0), product of:
            0.18335998 = queryWeight, product of:
              6.4086204 = idf(docFreq=197, maxDocs=44218)
              0.028611459 = queryNorm
            0.3540296 = fieldWeight in 6521, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              6.4086204 = idf(docFreq=197, maxDocs=44218)
              0.0390625 = fieldNorm(doc=6521)
        0.00651941 = product of:
          0.019558229 = sum of:
            0.019558229 = weight(_text_:29 in 6521) [ClassicSimilarity], result of:
              0.019558229 = score(doc=6521,freq=2.0), product of:
                0.10064617 = queryWeight, product of:
                  3.5176873 = idf(docFreq=3565, maxDocs=44218)
                  0.028611459 = queryNorm
                0.19432661 = fieldWeight in 6521, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5176873 = idf(docFreq=3565, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=6521)
          0.33333334 = coord(1/3)
      0.3 = coord(3/10)
    
    Abstract
    The popularity of digital images is rapidly increasing due to improving digital imaging technologies and convenient availability facilitated by the Internet. However, finding user-intended images on the Internet is nontrivial. The main reason is that Web images are usually not annotated using semantic descriptors. In this article, we present an effective approach to and a prototype system for image retrieval from the Internet using Web mining. The system can also serve as a Web image search engine. One of the key ideas in the approach is to extract the text information on the Web pages to semantically describe the images. The text description is then combined with other low-level image features in the image similarity assessment. Another main contribution of this work is that we apply data mining on the log of users' feedback to improve image retrieval performance in three aspects. First, the accuracy of the document space model of image representation obtained from the Web pages is improved by removing clutter and irrelevant text information. Second, a user space model of users' representation of images is constructed and then combined with the document space model to eliminate mismatch between the page author's expression and the user's understanding and expectation. Third, the relationship between low-level and high-level features is discovered, which is extremely useful for assigning the low-level features' weights in similarity assessment.
    Date
    29. 9.2001 17:32:09
  7. Slone, D.J.: ¬The influence of mental models and goals on search patterns during Web interaction (2002) 0.03
    0.033800628 = product of:
      0.11266876 = sum of:
        0.041234493 = weight(_text_:web in 5229) [ClassicSimilarity], result of:
          0.041234493 = score(doc=5229,freq=12.0), product of:
            0.0933738 = queryWeight, product of:
              3.2635105 = idf(docFreq=4597, maxDocs=44218)
              0.028611459 = queryNorm
            0.4416067 = fieldWeight in 5229, product of:
              3.4641016 = tf(freq=12.0), with freq of:
                12.0 = termFreq=12.0
              3.2635105 = idf(docFreq=4597, maxDocs=44218)
              0.0390625 = fieldNorm(doc=5229)
        0.06491486 = weight(_text_:log in 5229) [ClassicSimilarity], result of:
          0.06491486 = score(doc=5229,freq=2.0), product of:
            0.18335998 = queryWeight, product of:
              6.4086204 = idf(docFreq=197, maxDocs=44218)
              0.028611459 = queryNorm
            0.3540296 = fieldWeight in 5229, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              6.4086204 = idf(docFreq=197, maxDocs=44218)
              0.0390625 = fieldNorm(doc=5229)
        0.00651941 = product of:
          0.019558229 = sum of:
            0.019558229 = weight(_text_:29 in 5229) [ClassicSimilarity], result of:
              0.019558229 = score(doc=5229,freq=2.0), product of:
                0.10064617 = queryWeight, product of:
                  3.5176873 = idf(docFreq=3565, maxDocs=44218)
                  0.028611459 = queryNorm
                0.19432661 = fieldWeight in 5229, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5176873 = idf(docFreq=3565, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=5229)
          0.33333334 = coord(1/3)
      0.3 = coord(3/10)
    
    Abstract
    Thirty-one patrons, who were selected by Slone to provide a range of age and experience, agreed when approached while using the catalog of the Wake County library system to try searching via the Internet. Fifteen searched the Wake County online catalog in this manner and 16 searched the World Wide Web, including that catalog. They were subjected to brief pre-structured taped interviews before and after their searches and observed during the searching process resulting in a log of behaviors, comments, pages accessed, and time spent. Data were analyzed across participants and categories. Web searches were characterized as linking, URL, search engine, within a site domain, and searching a web catalog; and participants by the number of these techniques used. Four used only one, 13 used two, 11 used three, two used four, and one all five. Participant experience was characterized as never used, used search engines, browsing experience, email experience, URL experience, catalog experience, and finally chat room/newsgroup experience. Sixteen percent of the participants had never used the Internet, 71% had used search engines, 65% had browsed, 58% had used email, 39% had used URLs, 39% had used online catalogs, and 32% had used chat rooms. The catalog was normally consulted before the web, where both were used, and experience with an online catalog assists in web use. Scrolling was found to be unpopular and practiced halfheartedly.
    Date
    21. 7.2006 11:26:29
  8. Zuccala, A.; Thelwall, M.; Oppenheim, C.; Dhiensa, R.: Web intelligence analyses of digital libraries : a case study of the National electronic Library for Health (NeLH) (2007) 0.03
    0.03255495 = product of:
      0.16277474 = sum of:
        0.04665151 = weight(_text_:web in 838) [ClassicSimilarity], result of:
          0.04665151 = score(doc=838,freq=24.0), product of:
            0.0933738 = queryWeight, product of:
              3.2635105 = idf(docFreq=4597, maxDocs=44218)
              0.028611459 = queryNorm
            0.49962097 = fieldWeight in 838, product of:
              4.8989797 = tf(freq=24.0), with freq of:
                24.0 = termFreq=24.0
              3.2635105 = idf(docFreq=4597, maxDocs=44218)
              0.03125 = fieldNorm(doc=838)
        0.11612323 = weight(_text_:log in 838) [ClassicSimilarity], result of:
          0.11612323 = score(doc=838,freq=10.0), product of:
            0.18335998 = queryWeight, product of:
              6.4086204 = idf(docFreq=197, maxDocs=44218)
              0.028611459 = queryNorm
            0.6333074 = fieldWeight in 838, product of:
              3.1622777 = tf(freq=10.0), with freq of:
                10.0 = termFreq=10.0
              6.4086204 = idf(docFreq=197, maxDocs=44218)
              0.03125 = fieldNorm(doc=838)
      0.2 = coord(2/10)
    
    Abstract
    Purpose - The purpose of this paper is to explore the use of LexiURL as a Web intelligence tool for collecting and analysing links to digital libraries, focusing specifically on the National electronic Library for Health (NeLH). Design/methodology/approach - The Web intelligence techniques in this study are a combination of link analysis (web structure mining), web server log file analysis (web usage mining), and text analysis (web content mining), utilizing the power of commercial search engines and drawing upon the information science fields of bibliometrics and webometrics. LexiURL is a computer program designed to calculate summary statistics for lists of links or URLs. Its output is a series of standard reports, for example listing and counting all of the different domain names in the data. Findings - Link data, when analysed together with user transaction log files (i.e. Web referring domains) can provide insights into who is using a digital library and when, and who could be using the digital library if they are "surfing" a particular part of the Web; in this case any site that is linked to or colinked with the NeLH. This study found that the NeLH was embedded in a multifaceted Web context, including many governmental, educational, commercial and organisational sites, with the most interesting being sites from the .edu domain, representing American Universities. Not many links directed to the NeLH were followed on September 25, 2005 (the date of the log file analysis and link extraction analysis), which means that users who access the digital library have been arriving at the site via only a few select links, bookmarks and search engine searches, or non-electronic sources. Originality/value - A number of studies concerning digital library users have been carried out using log file analysis as a research tool. Log files focus on real-time user transactions, while LexiURL can be used to extract links and colinks associated with a digital library's growing Web network. This Web network is not recognized often enough, and can be a useful indication of where potential users are surfing, even if they have not yet specifically visited the NeLH site.
  9. Deussen, N.: Sogar der Mars könnte bald eine virtuelle Heimat bekommen : Gut 4,2 Milliarden sind nicht genug: Die sechste Version des Internet-Protokolls schafft viele zusätzliche Online-Adressen (2001) 0.03
    0.030893732 = product of:
      0.07723433 = sum of:
        0.02087996 = weight(_text_:kommunikation in 5729) [ClassicSimilarity], result of:
          0.02087996 = score(doc=5729,freq=2.0), product of:
            0.14706601 = queryWeight, product of:
              5.140109 = idf(docFreq=703, maxDocs=44218)
              0.028611459 = queryNorm
            0.14197679 = fieldWeight in 5729, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              5.140109 = idf(docFreq=703, maxDocs=44218)
              0.01953125 = fieldNorm(doc=5729)
        0.0119033735 = weight(_text_:web in 5729) [ClassicSimilarity], result of:
          0.0119033735 = score(doc=5729,freq=4.0), product of:
            0.0933738 = queryWeight, product of:
              3.2635105 = idf(docFreq=4597, maxDocs=44218)
              0.028611459 = queryNorm
            0.12748088 = fieldWeight in 5729, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              3.2635105 = idf(docFreq=4597, maxDocs=44218)
              0.01953125 = fieldNorm(doc=5729)
        0.04119129 = weight(_text_:schutz in 5729) [ClassicSimilarity], result of:
          0.04119129 = score(doc=5729,freq=2.0), product of:
            0.20656188 = queryWeight, product of:
              7.2195506 = idf(docFreq=87, maxDocs=44218)
              0.028611459 = queryNorm
            0.1994138 = fieldWeight in 5729, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              7.2195506 = idf(docFreq=87, maxDocs=44218)
              0.01953125 = fieldNorm(doc=5729)
        0.003259705 = product of:
          0.009779114 = sum of:
            0.009779114 = weight(_text_:29 in 5729) [ClassicSimilarity], result of:
              0.009779114 = score(doc=5729,freq=2.0), product of:
                0.10064617 = queryWeight, product of:
                  3.5176873 = idf(docFreq=3565, maxDocs=44218)
                  0.028611459 = queryNorm
                0.097163305 = fieldWeight in 5729, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5176873 = idf(docFreq=3565, maxDocs=44218)
                  0.01953125 = fieldNorm(doc=5729)
          0.33333334 = coord(1/3)
      0.4 = coord(4/10)
    
    Abstract
    Things are getting tight in the virtual world. The possibilities of the apparent seem to be exhausted. Internet addresses will soon be in short supply. When whirlpools and washing machines need their own Internet access, the stock of identifiers runs low. To counter the looming shortage, a revised version of the Internet Protocol (IP) has been in the works for years. Apart from a few test runs, however, the new edition has not yet found its way onto the net. It has nevertheless already caused a stir, because of data protection problems. There is a kind of etiquette guide for communication between computers on the Internet. The protocol rules specify how computers exchange data with one another. But first the machines need names (such as www.fr-aktuell.de) and addresses (here: 194.175.173.20) so that they can introduce themselves to each other (shake hands) and later send data. The designations are assigned by the Internet Corporation for Assigned Names and Numbers (Icann). Bob Kahn and Vint Cerf made the first proposal for a uniform handover rule in 1974. At that time, barely a thousand mainframes at about 250 sites were trying to communicate with one another in the now legendary, militarily used Arpanet. Rules were needed to bring order to the babel of the different machine types. The idea developed into the protocol that, in computer-science fashion, was given the abbreviation TCP/IP. With about 100,000 connected computers, the network became civilian in 1983, and TCP/IP became the official standard. Currently the fourth version of the Internet Protocol (IPv4) governs the transport of bits. The address is prefixed to every data packet. It consists of digits and is exactly 32 bits long. This yields more than 4.2 billion number combinations. Enough for a globe on which the six-billionth inhabitant only recently saw the light of the real world, or so the computer operators thought at the time. Then came the World Wide Web.
    The stroke of genius from the European Laboratory for Particle Physics (Cern) in Geneva turned the scientific network into a mass medium. Electronic mail also boomed. "The growth of the networks exceeds all expectations," sums up Klaus Birkenbihl of the computer science research center (GMD). Every Web site, every e-mail box, every computer that is online via a leased line needs a unique identification. Estimates of how many IPv4 addresses are still free range between 40 and ten percent. Consumption, in any case, is rising rapidly: the number of Web sites is currently heading toward one billion, and far more network numbers are already being used up for e-mail addresses. Intelligent household appliances will soon draw further on the address space. The corner shop wants to know which refrigerator ordered the milk, the video center needs the identifier of the PC recorder to transfer the film, and the installer's computer needs the IP address of the heating system for remote maintenance. Mobile phones that will later send messages over the Internet, and Internet telephony, may come away empty-handed. But before Internet addresses become hotly coveted black-market goods, a new address system with more possibilities is to be introduced. As early as 1990, the Internet Engineering Task Force (IETF) had thought about a new Internet protocol with a larger supply of addresses. In the IETF, researchers and software and hardware engineers look after the continuous improvement of the network's architecture and operation. One of its working groups predicted that the IPv4 supply would run out in 2005. It took five years before all Internet bodies agreed: a new protocol version, IPv6, was needed. Then nothing further happened. Finally, in 1999, Josh Elliot of Icann announced that new addresses would be distributed with immediate effect. "A historic moment," he rejoiced.
    The new 128-bit header drives the possibilities into the astronomical: 3.4 times ten to the power of 38 addresses, a 3.4 with 38 zeros. The IPv6 Forum chopped the huge number into graspable pieces: "For every square millimeter of the Earth's surface there are now about 667 quadrillion addresses available, and 6.5 times ten to the power of 28 per person." A quadrillion, after all, has a respectable 15 zeros. Shortly afterwards, an outcry went through the net community. The new protocol wrote the globally unique serial numbers of certain network cards onto the virtual address label. Ethernet adapters handle data transport for computers that are permanently online via a leased line, a coaxial cable. The trail of Ethernet users would thus have been easy to follow, their user profiles and surfing habits as visible as open books. The problem, Icann has now announced, has been fixed: there are no longer any fixed identifiers in the address headers. "Every time a computer boots, or even more often, the numbers are reshuffled," explains Hans Petter Dittler, deputy chairman of the German chapter of the Internet Society. The Linux operating system can already work with IPv6. Microsoft wants to build the standard into the next Windows operating system: "We think the proposed standard is important for protecting the privacy of Internet users," says Jawad Khaki, vice president for networking. For a few days now, a preliminary version of IPv6 for Windows 2000 has been available for download on the Microsoft homepage. Protocol chief Vint Cerf is downright euphoric. "With IPv6 we have the basis," the Internet daddy philosophized at the first IPv6 congress in Berlin in 1999, "for extending the Internet from our planet via Mars and the asteroids out into space." In everyday Internet use, however, the old protocol will take precedence for a long time to come. The reason is tangible programming problems: software that explicitly refers to the fourth IP version must be rewritten, for example to handle the longer address fields. Hubert Martens of Munich-based Multinet Services even fears an Internet crash: "The year 2000 problem was harmless compared with what threatens us with IPv6."
    Source
    Frankfurter Rundschau. No. 79, 3 April 2001, p. 29
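The address-space figures quoted in the article abstract above (more than 4.2 billion IPv4 addresses, about 3.4 times ten to the power of 38 IPv6 addresses, roughly 667 quadrillion per square millimetre of the Earth's surface) can be checked with a few lines of arithmetic. This is only a sketch: the Earth surface area of about 510 million square kilometres is an assumption not stated in the article, and all variable names are illustrative.

```python
# Address-space arithmetic for 32-bit (IPv4) and 128-bit (IPv6) addresses.
IPV4_BITS = 32
IPV6_BITS = 128

ipv4_addresses = 2 ** IPV4_BITS   # exact integer
ipv6_addresses = 2 ** IPV6_BITS   # exact integer, about 3.4e38

# Assumption (not from the article): Earth's surface is about 510 million km^2.
EARTH_SURFACE_KM2 = 510_000_000
MM2_PER_KM2 = 10 ** 12            # 1 km = 10^6 mm, so 1 km^2 = 10^12 mm^2
earth_surface_mm2 = EARTH_SURFACE_KM2 * MM2_PER_KM2

# IPv6 addresses per square millimetre of surface.
per_mm2 = ipv6_addresses / earth_surface_mm2

print(f"IPv4 addresses: {ipv4_addresses:,}")
print(f"IPv6 addresses: {ipv6_addresses:.3e}")
print(f"IPv6 addresses per mm^2: {per_mm2:.3e}")
```

Under that surface-area assumption, the per-square-millimetre value comes out on the order of 6.7 times ten to the power of 17, which is consistent with the "667 quadrillion" quoted by the IPv6 Forum.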
  10. Cothey, V.: ¬A longitudinal study of World Wide Web users' information-searching behavior (2002) 0.03
    0.0306469 = product of:
      0.1532345 = sum of:
        0.062353685 = weight(_text_:web in 245) [ClassicSimilarity], result of:
          0.062353685 = score(doc=245,freq=14.0), product of:
            0.0933738 = queryWeight, product of:
              3.2635105 = idf(docFreq=4597, maxDocs=44218)
              0.028611459 = queryNorm
            0.6677857 = fieldWeight in 245, product of:
              3.7416575 = tf(freq=14.0), with freq of:
                14.0 = termFreq=14.0
              3.2635105 = idf(docFreq=4597, maxDocs=44218)
              0.0546875 = fieldNorm(doc=245)
        0.09088081 = weight(_text_:log in 245) [ClassicSimilarity], result of:
          0.09088081 = score(doc=245,freq=2.0), product of:
            0.18335998 = queryWeight, product of:
              6.4086204 = idf(docFreq=197, maxDocs=44218)
              0.028611459 = queryNorm
            0.49564147 = fieldWeight in 245, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              6.4086204 = idf(docFreq=197, maxDocs=44218)
              0.0546875 = fieldNorm(doc=245)
      0.2 = coord(2/10)
    
    Abstract
    A study of the "real world" Web information searching behavior of 206 college students over a 10-month period showed that, contrary to expectations, the users adopted a more passive or browsing approach to Web information searching and became more eclectic in their selection of Web hosts as they gained experience. The study used a longitudinal transaction log analysis of the URLs accessed during 5,431 user days of Web information searching to detect changes in information searching behavior associated with increased experience of using the Web. The findings have implications for the design of future Web information retrieval tools.
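The score explanation shown for this record (doc 245) follows the classic Lucene TF-IDF scheme: tf is the square root of the term frequency, idf is 1 + ln(maxDocs / (docFreq + 1)), queryWeight is idf times queryNorm, fieldWeight is tf times idf times fieldNorm, and the summed term scores are scaled by coord. The sketch below recomputes the listed value from the numbers in the explanation. The function names are illustrative, and queryNorm and fieldNorm are simply copied from the listing rather than derived.

```python
import math

def classic_idf(doc_freq, max_docs):
    """Inverse document frequency as printed in the explanation."""
    return 1.0 + math.log(max_docs / (doc_freq + 1))

def term_score(freq, doc_freq, max_docs, query_norm, field_norm):
    """Score of one query term in one document: queryWeight * fieldWeight."""
    tf = math.sqrt(freq)
    idf = classic_idf(doc_freq, max_docs)
    query_weight = idf * query_norm
    field_weight = tf * idf * field_norm
    return query_weight * field_weight

# Values copied from the explanation for doc 245.
MAX_DOCS = 44218
QUERY_NORM = 0.028611459
FIELD_NORM = 0.0546875

web = term_score(14.0, 4597, MAX_DOCS, QUERY_NORM, FIELD_NORM)
log = term_score(2.0, 197, MAX_DOCS, QUERY_NORM, FIELD_NORM)

# coord(2/10): two of the ten query clauses matched this document.
coord = 2 / 10
total = (web + log) * coord

print(f"web={web:.6f} log={log:.6f} total={total:.6f}")
```

The result agrees with the listed score of about 0.0306 up to rounding; small differences in the last digits are expected because the engine works in single precision.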
  11. Koch, T.; Golub, K.; Ardö, A.: Users browsing behaviour in a DDC-based Web service : a log analysis (2006) 0.03
    0.030113114 = product of:
      0.15056556 = sum of:
        0.040401388 = weight(_text_:web in 2234) [ClassicSimilarity], result of:
          0.040401388 = score(doc=2234,freq=8.0), product of:
            0.0933738 = queryWeight, product of:
              3.2635105 = idf(docFreq=4597, maxDocs=44218)
              0.028611459 = queryNorm
            0.43268442 = fieldWeight in 2234, product of:
              2.828427 = tf(freq=8.0), with freq of:
                8.0 = termFreq=8.0
              3.2635105 = idf(docFreq=4597, maxDocs=44218)
              0.046875 = fieldNorm(doc=2234)
        0.11016417 = weight(_text_:log in 2234) [ClassicSimilarity], result of:
          0.11016417 = score(doc=2234,freq=4.0), product of:
            0.18335998 = queryWeight, product of:
              6.4086204 = idf(docFreq=197, maxDocs=44218)
              0.028611459 = queryNorm
            0.60080814 = fieldWeight in 2234, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              6.4086204 = idf(docFreq=197, maxDocs=44218)
              0.046875 = fieldNorm(doc=2234)
      0.2 = coord(2/10)
    
    Abstract
    This study explores the navigation behaviour of all users of a large web service, Renardus, using web log analysis. Renardus provides integrated searching and browsing access to quality-controlled web resources from major individual subject gateway services. The main navigation feature is subject browsing through the Dewey Decimal Classification (DDC) based on mapping of classes of resources from the distributed gateways to the DDC structure. Among the more surprising results are the hugely dominant share of browsing activities, the good use of browsing support features like the graphical fish-eye overviews, rather long and varied navigation sequences, as well as extensive hierarchical directory-style browsing through the large DDC system.
  12. Ozmutlu, H.C.; Cavdur, F.; Ozmutlu, S.: Cross-validation of neural network applications for automatic new topic identification (2008) 0.03
    0.030015523 = product of:
      0.15007761 = sum of:
        0.037641775 = weight(_text_:web in 1364) [ClassicSimilarity], result of:
          0.037641775 = score(doc=1364,freq=10.0), product of:
            0.0933738 = queryWeight, product of:
              3.2635105 = idf(docFreq=4597, maxDocs=44218)
              0.028611459 = queryNorm
            0.40312994 = fieldWeight in 1364, product of:
              3.1622777 = tf(freq=10.0), with freq of:
                10.0 = termFreq=10.0
              3.2635105 = idf(docFreq=4597, maxDocs=44218)
              0.0390625 = fieldNorm(doc=1364)
        0.11243584 = weight(_text_:log in 1364) [ClassicSimilarity], result of:
          0.11243584 = score(doc=1364,freq=6.0), product of:
            0.18335998 = queryWeight, product of:
              6.4086204 = idf(docFreq=197, maxDocs=44218)
              0.028611459 = queryNorm
            0.61319727 = fieldWeight in 1364, product of:
              2.4494898 = tf(freq=6.0), with freq of:
                6.0 = termFreq=6.0
              6.4086204 = idf(docFreq=197, maxDocs=44218)
              0.0390625 = fieldNorm(doc=1364)
      0.2 = coord(2/10)
    
    Abstract
    The purpose of this study is to provide results from experiments designed to investigate the cross-validation of an artificial neural network application to automatically identify topic changes in Web search engine user sessions by using data logs of different Web search engines for training and testing the neural network. Sample data logs from the FAST and Excite search engines are used in this study. The results of the study show that identification of topic shifts and continuations on a particular Web search engine user session can be achieved with neural networks that are trained on a different Web search engine data log. Although FAST and Excite search engine users differ with respect to some user characteristics (e.g., number of queries per session, number of topics per session), the results of this study demonstrate that both search engine users display similar characteristics as they shift from one topic to another during a single search session. The key finding of this study is that a neural network that is trained on a selected data log could be universal; that is, it can be applicable on all Web search engine transaction logs regardless of the source of the training data log.
  13. Beitzel, S.M.; Jensen, E.C.; Chowdhury, A.; Frieder, O.; Grossman, D.: Temporal analysis of a very large topically categorized Web query log (2007) 0.03
    0.028318608 = product of:
      0.14159304 = sum of:
        0.029157192 = weight(_text_:web in 60) [ClassicSimilarity], result of:
          0.029157192 = score(doc=60,freq=6.0), product of:
            0.0933738 = queryWeight, product of:
              3.2635105 = idf(docFreq=4597, maxDocs=44218)
              0.028611459 = queryNorm
            0.3122631 = fieldWeight in 60, product of:
              2.4494898 = tf(freq=6.0), with freq of:
                6.0 = termFreq=6.0
              3.2635105 = idf(docFreq=4597, maxDocs=44218)
              0.0390625 = fieldNorm(doc=60)
        0.11243584 = weight(_text_:log in 60) [ClassicSimilarity], result of:
          0.11243584 = score(doc=60,freq=6.0), product of:
            0.18335998 = queryWeight, product of:
              6.4086204 = idf(docFreq=197, maxDocs=44218)
              0.028611459 = queryNorm
            0.61319727 = fieldWeight in 60, product of:
              2.4494898 = tf(freq=6.0), with freq of:
                6.0 = termFreq=6.0
              6.4086204 = idf(docFreq=197, maxDocs=44218)
              0.0390625 = fieldNorm(doc=60)
      0.2 = coord(2/10)
    
    Abstract
    The authors review a log of billions of Web queries that constituted the total query traffic for a 6-month period of a general-purpose commercial Web search service. Previously, query logs were studied from a single, cumulative view. In contrast, this study builds on the authors' previous work, which showed changes in popularity and uniqueness of topically categorized queries across the hours in a day. To further their analysis, they examine query traffic on a daily, weekly, and monthly basis by matching it against lists of queries that have been topically precategorized by human editors. These lists represent 13% of the query traffic. They show that query traffic from particular topical categories differs both from the query stream as a whole and from other categories. Additionally, they show that certain categories of queries trend differently over varying periods. The authors' key contribution is twofold: They outline a method for studying both the static and topical properties of a very large query log over varying periods, and they identify and examine topical trends that may provide valuable insight for improving both retrieval effectiveness and efficiency.
  14. Koshman, S.; Spink, A.; Jansen, B.J.: Web searching on the Vivisimo search engine (2006) 0.03
    0.027268365 = product of:
      0.13634183 = sum of:
        0.044538345 = weight(_text_:web in 216) [ClassicSimilarity], result of:
          0.044538345 = score(doc=216,freq=14.0), product of:
            0.0933738 = queryWeight, product of:
              3.2635105 = idf(docFreq=4597, maxDocs=44218)
              0.028611459 = queryNorm
            0.47698978 = fieldWeight in 216, product of:
              3.7416575 = tf(freq=14.0), with freq of:
                14.0 = termFreq=14.0
              3.2635105 = idf(docFreq=4597, maxDocs=44218)
              0.0390625 = fieldNorm(doc=216)
        0.091803476 = weight(_text_:log in 216) [ClassicSimilarity], result of:
          0.091803476 = score(doc=216,freq=4.0), product of:
            0.18335998 = queryWeight, product of:
              6.4086204 = idf(docFreq=197, maxDocs=44218)
              0.028611459 = queryNorm
            0.5006735 = fieldWeight in 216, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              6.4086204 = idf(docFreq=197, maxDocs=44218)
              0.0390625 = fieldNorm(doc=216)
      0.2 = coord(2/10)
    
    Abstract
    The application of clustering to Web search engine technology is a novel approach that offers structure to the information deluge often faced by Web searchers. Clustering methods have been well studied in research labs; however, real user searching with clustering systems in operational Web environments is not well understood. This article reports on results from a transaction log analysis of Vivisimo.com, which is a Web meta-search engine that dynamically clusters users' search results. A transaction log analysis was conducted on 2 weeks' worth of data collected from March 28 to April 4 and April 25 to May 2, 2004, representing 100% of site traffic during these periods and 2,029,734 queries overall. The results show that the highest percentage of queries contained two terms. The highest percentage of search sessions contained one query and was less than 1 minute in duration. Almost half of user interactions with clusters consisted of displaying a cluster's result set, and a small percentage of interactions showed cluster tree expansion. Findings show that 11.1% of search sessions were multitasking searches, and there is a broad variety of search topics in multitasking search sessions. Other searching interactions and statistics on repeat users of the search engine are reported. These results provide insights into search characteristics with a cluster-based Web search engine and extend research into Web searching trends.
  15. Nicholas, D.; Nicholas, P.; Jamali, H.R.; Watkinson, A.: ¬The information seeking behaviour of the users of digital scholarly journals (2006) 0.03
    0.02585395 = product of:
      0.12926975 = sum of:
        0.016833913 = weight(_text_:web in 990) [ClassicSimilarity], result of:
          0.016833913 = score(doc=990,freq=2.0), product of:
            0.0933738 = queryWeight, product of:
              3.2635105 = idf(docFreq=4597, maxDocs=44218)
              0.028611459 = queryNorm
            0.18028519 = fieldWeight in 990, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.2635105 = idf(docFreq=4597, maxDocs=44218)
              0.0390625 = fieldNorm(doc=990)
        0.11243584 = weight(_text_:log in 990) [ClassicSimilarity], result of:
          0.11243584 = score(doc=990,freq=6.0), product of:
            0.18335998 = queryWeight, product of:
              6.4086204 = idf(docFreq=197, maxDocs=44218)
              0.028611459 = queryNorm
            0.61319727 = fieldWeight in 990, product of:
              2.4494898 = tf(freq=6.0), with freq of:
                6.0 = termFreq=6.0
              6.4086204 = idf(docFreq=197, maxDocs=44218)
              0.0390625 = fieldNorm(doc=990)
      0.2 = coord(2/10)
    
    Abstract
    The article employs deep log analysis (DLA) techniques, a more sophisticated form of transaction log analysis, to demonstrate what usage data can disclose about information seeking behaviour of virtual scholars - academics, and researchers. DLA works with the raw server log data, not the processed, pre-defined and selective data provided by journal publishers. It can generate types of analysis that are not generally available via proprietary web logging software because the software filters out relevant data and makes unhelpful assumptions about the meaning of the data. DLA also enables usage data to be associated with search/navigational and/or user demographic data, hence the name 'deep'. In this connection the usage of two digital journal libraries, those of EmeraldInsight, and Blackwell Synergy are investigated. The information seeking behaviour of nearly three million users is analyzed in respect to the extent to which they penetrate the site, the number of visits made, as well as the type of items and content they view. The users are broken down by occupation, place of work, type of subscriber ("Big Deal", non-subscriber, etc.), geographical location, type of university (old and new), referrer link used, and number of items viewed in a session.
  16. Jansen, B.J.; Booth, D.L.; Spink, A.: Determining the informational, navigational, and transactional intent of Web queries (2008) 0.02
    0.024613593 = product of:
      0.12306796 = sum of:
        0.04517013 = weight(_text_:web in 2091) [ClassicSimilarity], result of:
          0.04517013 = score(doc=2091,freq=10.0), product of:
            0.0933738 = queryWeight, product of:
              3.2635105 = idf(docFreq=4597, maxDocs=44218)
              0.028611459 = queryNorm
            0.48375595 = fieldWeight in 2091, product of:
              3.1622777 = tf(freq=10.0), with freq of:
                10.0 = termFreq=10.0
              3.2635105 = idf(docFreq=4597, maxDocs=44218)
              0.046875 = fieldNorm(doc=2091)
        0.07789783 = weight(_text_:log in 2091) [ClassicSimilarity], result of:
          0.07789783 = score(doc=2091,freq=2.0), product of:
            0.18335998 = queryWeight, product of:
              6.4086204 = idf(docFreq=197, maxDocs=44218)
              0.028611459 = queryNorm
            0.42483553 = fieldWeight in 2091, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              6.4086204 = idf(docFreq=197, maxDocs=44218)
              0.046875 = fieldNorm(doc=2091)
      0.2 = coord(2/10)
    
    Abstract
    In this paper, we define and present a comprehensive classification of user intent for Web searching. The classification consists of three hierarchical levels of informational, navigational, and transactional intent. After deriving attributes of each, we then developed a software application that automatically classified queries using a Web search engine log of over a million and a half queries submitted by several hundred thousand users. Our findings show that more than 80% of Web queries are informational in nature, with about 10% each being navigational and transactional. In order to validate the accuracy of our algorithm, we manually coded 400 queries and compared the results from this manual classification to the results determined by the automated method. This comparison showed that the automatic classification has an accuracy of 74%. Of the remaining 25% of the queries, the user intent is vague or multi-faceted, pointing to the need for probabilistic classification. We discuss how search engines can use knowledge of user intent to provide more targeted and relevant results in Web searching.
  17. Choi, B.; Peng, X.: Dynamic and hierarchical classification of Web pages (2004) 0.02
    0.024613593 = product of:
      0.12306796 = sum of:
        0.04517013 = weight(_text_:web in 2555) [ClassicSimilarity], result of:
          0.04517013 = score(doc=2555,freq=10.0), product of:
            0.0933738 = queryWeight, product of:
              3.2635105 = idf(docFreq=4597, maxDocs=44218)
              0.028611459 = queryNorm
            0.48375595 = fieldWeight in 2555, product of:
              3.1622777 = tf(freq=10.0), with freq of:
                10.0 = termFreq=10.0
              3.2635105 = idf(docFreq=4597, maxDocs=44218)
              0.046875 = fieldNorm(doc=2555)
        0.07789783 = weight(_text_:log in 2555) [ClassicSimilarity], result of:
          0.07789783 = score(doc=2555,freq=2.0), product of:
            0.18335998 = queryWeight, product of:
              6.4086204 = idf(docFreq=197, maxDocs=44218)
              0.028611459 = queryNorm
            0.42483553 = fieldWeight in 2555, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              6.4086204 = idf(docFreq=197, maxDocs=44218)
              0.046875 = fieldNorm(doc=2555)
      0.2 = coord(2/10)
    
    Abstract
    Automatic classification of Web pages is an effective way to organise the vast amount of information and to assist in retrieving relevant information from the Internet. Although many automatic classification systems have been proposed, most of them ignore the conflict between the fixed number of categories and the growing number of Web pages being added into the systems. They also require searching through all existing categories to make any classification. This article proposes a dynamic and hierarchical classification system that is capable of adding new categories as required, organising the Web pages into a tree structure, and classifying Web pages by searching through only one path of the tree. The proposed single-path search technique reduces the search complexity from O(n) to O(log(n)). Test results show that the system improves the accuracy of classification by 6 percent in comparison to related systems. The dynamic-category expansion technique also achieves satisfying results for adding new categories into the system as required.
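The single-path idea in the abstract above can be illustrated with a small sketch: instead of comparing a page against every category, the classifier descends a category tree and follows only the best-matching child at each level, so the number of comparisons grows with the depth of the tree rather than with the total number of categories (logarithmic only if the tree is reasonably balanced). Everything here is hypothetical: the `Category` class, the keyword-overlap score, and the example tree are stand-ins, since the abstract does not describe the article's actual similarity measure or tree construction.

```python
from dataclasses import dataclass, field

@dataclass
class Category:
    """Hypothetical category node: a name, keywords, and child categories."""
    name: str
    keywords: set
    children: list = field(default_factory=list)

def overlap(category, page_terms):
    """Stand-in similarity: number of keywords shared with the page."""
    return len(category.keywords & page_terms)

def classify(root, page_terms):
    """Follow one root-to-leaf path, picking the best child at each level."""
    node = root
    path = [node.name]
    while node.children:
        best = max(node.children, key=lambda c: overlap(c, page_terms))
        if overlap(best, page_terms) == 0:
            break  # no child matches; stop at the current category
        node = best
        path.append(node.name)
    return path

# Illustrative tree (not from the article).
physics = Category("physics", {"physics", "quantum"})
biology = Category("biology", {"biology", "cell"})
science = Category("science", {"physics", "biology", "research"}, [physics, biology])
sports = Category("sports", {"football", "tennis"})
root = Category("root", set(), [science, sports])

path = classify(root, {"quantum", "physics", "research"})
print(path)
```

In this toy tree the page is compared with four categories on the way down (science, sports, physics, biology); a flat scheme would compare against every category in the system regardless of its position.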
  18. Huang, C.-K.; Chien, L.-F.; Oyang, Y.-J.: Relevant term suggestion in interactive Web search based on contextual information in query session logs (2003) 0.02
    0.024192134 = product of:
      0.12096067 = sum of:
        0.029157192 = weight(_text_:web in 1612) [ClassicSimilarity], result of:
          0.029157192 = score(doc=1612,freq=6.0), product of:
            0.0933738 = queryWeight, product of:
              3.2635105 = idf(docFreq=4597, maxDocs=44218)
              0.028611459 = queryNorm
            0.3122631 = fieldWeight in 1612, product of:
              2.4494898 = tf(freq=6.0), with freq of:
                6.0 = termFreq=6.0
              3.2635105 = idf(docFreq=4597, maxDocs=44218)
              0.0390625 = fieldNorm(doc=1612)
        0.091803476 = weight(_text_:log in 1612) [ClassicSimilarity], result of:
          0.091803476 = score(doc=1612,freq=4.0), product of:
            0.18335998 = queryWeight, product of:
              6.4086204 = idf(docFreq=197, maxDocs=44218)
              0.028611459 = queryNorm
            0.5006735 = fieldWeight in 1612, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              6.4086204 = idf(docFreq=197, maxDocs=44218)
              0.0390625 = fieldNorm(doc=1612)
      0.2 = coord(2/10)
    
    Abstract
    This paper proposes an effective term suggestion approach to interactive Web search. Conventional approaches to making term suggestions involve extracting co-occurring keyterms from highly ranked retrieved documents. Such approaches must deal with term extraction difficulties and interference from irrelevant documents, and, more importantly, have difficulty extracting terms that are conceptually related but do not frequently co-occur in documents. In this paper, we present a new, effective log-based approach to relevant term extraction and term suggestion. Using this approach, the relevant terms suggested for a user query are those that cooccur in similar query sessions from search engine logs, rather than in the retrieved documents. In addition, the suggested terms in each interactive search step can be organized according to its relevance to the entire query session, rather than to the most recent single query as in conventional approaches. The proposed approach was tested using a proxy server log containing about two million query transactions submitted to search engines in Taiwan. The obtained experimental results show that the proposed approach can provide organized and highly relevant terms, and can exploit the contextual information in a user's query session to make more effective suggestions.
    Footnote
    Part of a special issue: "Web retrieval and mining: A machine learning perspective"
  19. Chau, M.; Fang, X.; Sheng, O.R.U.: Analysis of the query logs of a Web site search engine (2005) 0.02
    0.023629673 = product of:
      0.118148364 = sum of:
        0.053233504 = weight(_text_:web in 4573) [ClassicSimilarity], result of:
          0.053233504 = score(doc=4573,freq=20.0), product of:
            0.0933738 = queryWeight, product of:
              3.2635105 = idf(docFreq=4597, maxDocs=44218)
              0.028611459 = queryNorm
            0.5701118 = fieldWeight in 4573, product of:
              4.472136 = tf(freq=20.0), with freq of:
                20.0 = termFreq=20.0
              3.2635105 = idf(docFreq=4597, maxDocs=44218)
              0.0390625 = fieldNorm(doc=4573)
        0.06491486 = weight(_text_:log in 4573) [ClassicSimilarity], result of:
          0.06491486 = score(doc=4573,freq=2.0), product of:
            0.18335998 = queryWeight, product of:
              6.4086204 = idf(docFreq=197, maxDocs=44218)
              0.028611459 = queryNorm
            0.3540296 = fieldWeight in 4573, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              6.4086204 = idf(docFreq=197, maxDocs=44218)
              0.0390625 = fieldNorm(doc=4573)
      0.2 = coord(2/10)
    
    Abstract
    A large number of studies have investigated the transaction log of general-purpose search engines such as Excite and AltaVista, but few studies have reported on the analysis of search logs for search engines that are limited to particular Web sites, namely, Web site search engines. In this article, we report our research on analyzing the search logs of the search engine of the Utah state government Web site. Our results show that some statistics, such as the number of search terms per query, of Web users are the same for general-purpose search engines and Web site search engines, but others, such as the search topics and the terms used, are considerably different. Possible reasons for the differences include the focused domain of Web site search engines and users' different information needs. The findings are useful for Web site developers to improve the performance of their services provided on the Web and for researchers to conduct further research in this area. The analysis also can be applied in e-government research by investigating how information should be delivered to users in government Web sites.
  20. Pharo, N.; Järvelin, K.: ¬The SST method : a tool for analysing Web information search processes (2004) 0.02
    0.023122046 = product of:
      0.11561023 = sum of:
        0.023806747 = weight(_text_:web in 2533) [ClassicSimilarity], result of:
          0.023806747 = score(doc=2533,freq=4.0), product of:
            0.0933738 = queryWeight, product of:
              3.2635105 = idf(docFreq=4597, maxDocs=44218)
              0.028611459 = queryNorm
            0.25496176 = fieldWeight in 2533, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              3.2635105 = idf(docFreq=4597, maxDocs=44218)
              0.0390625 = fieldNorm(doc=2533)
        0.091803476 = weight(_text_:log in 2533) [ClassicSimilarity], result of:
          0.091803476 = score(doc=2533,freq=4.0), product of:
            0.18335998 = queryWeight, product of:
              6.4086204 = idf(docFreq=197, maxDocs=44218)
              0.028611459 = queryNorm
            0.5006735 = fieldWeight in 2533, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              6.4086204 = idf(docFreq=197, maxDocs=44218)
              0.0390625 = fieldNorm(doc=2533)
      0.2 = coord(2/10)
    
    Abstract
    The article presents the search situation transition (SST) method for analysing Web information search (WIS) processes. The idea of the method is to analyse searching behaviour, the process, in detail and connect both the searcher's actions (captured in a log) and his/her intentions and goals, which log analysis never captures. On the other hand, ex post facto surveys, while popular in WIS research, cannot capture the actual search processes. The method is presented through three facets: its domain, its procedure, and its justification. The method's domain is presented in the form of a conceptual framework which maps five central categories that influence WIS processes: the searcher, the social/organisational environment, the work task, the search task, and the process itself. The method's procedure includes various techniques for data collection and analysis. The article presents examples from real WIS processes and shows how the method can be used to identify the interplay of the categories during the processes. It is shown that the method presents a new approach in information seeking and retrieval by focusing on the search process as a phenomenon and by explicating how different information seeking factors directly affect the search process.
