Search (866 results, page 1 of 44)

Li, L.; Shang, Y.; Zhang, W.: Improvement of HITS-based algorithms on Web documents 0.31

0.31110388 = product of:
  0.72590905 = sum of:
    0.03902899 = product of:
      0.11708697 = sum of:
        0.11708697 = weight(_text_:3a in 2514) [ClassicSimilarity], result of:
          0.11708697 = score(doc=2514,freq=2.0), product of:
            0.20833312 = queryWeight, product of:
              8.478011 = idf(docFreq=24, maxDocs=44218)
              0.024573348 = queryNorm
            0.56201804 = fieldWeight in 2514, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              8.478011 = idf(docFreq=24, maxDocs=44218)
              0.046875 = fieldNorm(doc=2514)
      0.33333334 = coord(1/3)
    0.024536107 = weight(_text_:web in 2514) [ClassicSimilarity], result of:
      0.024536107 = score(doc=2514,freq=4.0), product of:
        0.08019538 = queryWeight, product of:
          3.2635105 = idf(docFreq=4597, maxDocs=44218)
          0.024573348 = queryNorm
        0.3059541 = fieldWeight in 2514, product of:
          2.0 = tf(freq=4.0), with freq of:
            4.0 = termFreq=4.0
          3.2635105 = idf(docFreq=4597, maxDocs=44218)
          0.046875 = fieldNorm(doc=2514)
    0.16558598 = weight(_text_:2f in 2514) [ClassicSimilarity], result of:
      0.16558598 = score(doc=2514,freq=4.0), product of:
        0.20833312 = queryWeight, product of:
          8.478011 = idf(docFreq=24, maxDocs=44218)
          0.024573348 = queryNorm
        0.7948135 = fieldWeight in 2514, product of:
          2.0 = tf(freq=4.0), with freq of:
            4.0 = termFreq=4.0
          8.478011 = idf(docFreq=24, maxDocs=44218)
          0.046875 = fieldNorm(doc=2514)
    0.16558598 = weight(_text_:2f in 2514) [ClassicSimilarity], result of:
      0.16558598 = score(doc=2514,freq=4.0), product of:
        0.20833312 = queryWeight, product of:
          8.478011 = idf(docFreq=24, maxDocs=44218)
          0.024573348 = queryNorm
        0.7948135 = fieldWeight in 2514, product of:
          2.0 = tf(freq=4.0), with freq of:
            4.0 = termFreq=4.0
          8.478011 = idf(docFreq=24, maxDocs=44218)
          0.046875 = fieldNorm(doc=2514)
    0.16558598 = weight(_text_:2f in 2514) [ClassicSimilarity], result of:
      0.16558598 = score(doc=2514,freq=4.0), product of:
        0.20833312 = queryWeight, product of:
          8.478011 = idf(docFreq=24, maxDocs=44218)
          0.024573348 = queryNorm
        0.7948135 = fieldWeight in 2514, product of:
          2.0 = tf(freq=4.0), with freq of:
            4.0 = termFreq=4.0
          8.478011 = idf(docFreq=24, maxDocs=44218)
          0.046875 = fieldNorm(doc=2514)
    0.16558598 = weight(_text_:2f in 2514) [ClassicSimilarity], result of:
      0.16558598 = score(doc=2514,freq=4.0), product of:
        0.20833312 = queryWeight, product of:
          8.478011 = idf(docFreq=24, maxDocs=44218)
          0.024573348 = queryNorm
        0.7948135 = fieldWeight in 2514, product of:
          2.0 = tf(freq=4.0), with freq of:
            4.0 = termFreq=4.0
          8.478011 = idf(docFreq=24, maxDocs=44218)
          0.046875 = fieldNorm(doc=2514)
  0.42857143 = coord(6/14)
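
The per-term lines in the tree above appear to follow the classic TF-IDF arithmetic. As a minimal sketch, assuming tf = sqrt(freq) and idf = 1 + ln(maxDocs / (docFreq + 1)) (both assumptions reproduce the values listed), the `web` term score for doc 2514 can be recomputed like this. The function names are illustrative and not part of any library API.

```python
import math


def classic_idf(doc_freq: int, max_docs: int) -> float:
    """Inverse document frequency, assumed form: 1 + ln(maxDocs / (docFreq + 1))."""
    return 1.0 + math.log(max_docs / (doc_freq + 1))


def classic_term_score(freq: float, doc_freq: int, max_docs: int,
                       query_norm: float, field_norm: float) -> float:
    """Recompute one weight(...) node as queryWeight * fieldWeight."""
    tf = math.sqrt(freq)                   # tf(freq=...) line
    idf = classic_idf(doc_freq, max_docs)  # idf(docFreq=..., maxDocs=...) line
    query_weight = idf * query_norm        # queryWeight line
    field_weight = tf * idf * field_norm   # fieldWeight line
    return query_weight * field_weight


# Inputs copied from the `web` node of doc 2514 in the listing.
web_score = classic_term_score(
    freq=4.0, doc_freq=4597, max_docs=44218,
    query_norm=0.024573348, field_norm=0.046875,
)
# Should agree with the listed 0.024536107 up to floating-point rounding.
print(web_score)
```

The listing stores single-precision values, so a double-precision recomputation can differ in the last digits.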

Content: Cf.: http%3A%2F%2Fdelab.csd.auth.gr%2F~dimitris%2Fcourses%2Fir_spring06%2Fpage_rank_computing%2Fp527-li.pdf. See also: http://www2002.org/CDROM/refereed/643/.
Source: WWW '02: Proceedings of the 11th International Conference on World Wide Web, May 7-11, 2002, Honolulu, Hawaii, USA

Poynder, R.: Web research engines? (1996) 0.03

0.033356033 = product of:
  0.116746105 = sum of:
    0.038794994 = weight(_text_:web in 5698) [ClassicSimilarity], result of:
      0.038794994 = score(doc=5698,freq=10.0), product of:
        0.08019538 = queryWeight, product of:
          3.2635105 = idf(docFreq=4597, maxDocs=44218)
          0.024573348 = queryNorm
        0.48375595 = fieldWeight in 5698, product of:
          3.1622777 = tf(freq=10.0), with freq of:
            10.0 = termFreq=10.0
          3.2635105 = idf(docFreq=4597, maxDocs=44218)
          0.046875 = fieldNorm(doc=5698)
    0.04711391 = weight(_text_:indexierung in 5698) [ClassicSimilarity], result of:
      0.04711391 = score(doc=5698,freq=2.0), product of:
        0.13215348 = queryWeight, product of:
          5.377919 = idf(docFreq=554, maxDocs=44218)
          0.024573348 = queryNorm
        0.35650903 = fieldWeight in 5698, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          5.377919 = idf(docFreq=554, maxDocs=44218)
          0.046875 = fieldNorm(doc=5698)
    0.0050200885 = weight(_text_:information in 5698) [ClassicSimilarity], result of:
      0.0050200885 = score(doc=5698,freq=2.0), product of:
        0.04313797 = queryWeight, product of:
          1.7554779 = idf(docFreq=20772, maxDocs=44218)
          0.024573348 = queryNorm
        0.116372846 = fieldWeight in 5698, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          1.7554779 = idf(docFreq=20772, maxDocs=44218)
          0.046875 = fieldNorm(doc=5698)
    0.02581711 = weight(_text_:retrieval in 5698) [ClassicSimilarity], result of:
      0.02581711 = score(doc=5698,freq=6.0), product of:
        0.07433229 = queryWeight, product of:
          3.024915 = idf(docFreq=5836, maxDocs=44218)
          0.024573348 = queryNorm
        0.34732026 = fieldWeight in 5698, product of:
          2.4494898 = tf(freq=6.0), with freq of:
            6.0 = termFreq=6.0
          3.024915 = idf(docFreq=5836, maxDocs=44218)
          0.046875 = fieldNorm(doc=5698)
  0.2857143 = coord(4/14)
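
The top two lines of each tree combine the per-term scores: the term weights are summed and then scaled by the coordination factor, the fraction of query clauses that matched. A minimal sketch for doc 5698, using the four term scores listed above; `combine_with_coord` is a hypothetical helper name chosen for illustration.

```python
def combine_with_coord(term_scores: list[float], matched: int, total: int) -> float:
    """Final score = coord(matched/total) * sum of matching term scores."""
    coord = matched / total
    return coord * sum(term_scores)


# Term scores for doc 5698: web, indexierung, information, retrieval.
scores_5698 = [0.038794994, 0.04711391, 0.0050200885, 0.02581711]

final_5698 = combine_with_coord(scores_5698, matched=4, total=14)
# Should agree with the listed 0.033356033 up to floating-point rounding.
print(final_5698)
```

The same rule applies to nested nodes, for example the coord(1/3) factor applied to a single matching sub-clause.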

Abstract: Describes the shortcomings of search engines for the WWW, comparing their current capabilities to those of the first-generation CD-ROM products. Some allow phrase searching and most are improving their Boolean searching. Few allow truncation, wild cards or nested logic. They are stateless, losing previous search criteria. Unlike the indexing and classification systems for today's CD-ROMs, those for Web pages are random, unstructured and of variable quality. Considers that at best Web search engines can only offer free text searching. Discusses whether automatic data classification systems such as Infoseek Ultra can overcome the haphazard nature of the Web with neural network technology, and whether Boolean search techniques may be redundant when replaced by technology such as the Euroferret search engine. However, artificial intelligence is rarely successful on huge, varied databases. Relevance ranking and automatic query expansion still use the same simple inverted indexes. Most Web search engines do nothing more than word counting. Further complications arise with foreign languages.
Source: Information world review. 1996, no.120, S.47-48
Theme: Verbale Doksprachen im Online-Retrieval
Klassifikationssysteme im Online-Retrieval
Semantisches Umfeld in Indexierung u. Retrieval

Pahlevi, S.M.; Kitagawa, H.: Conveying taxonomy context for topic-focused Web search (2005) 0.03

0.03226925 = product of:
  0.11294237 = sum of:
    0.045902856 = weight(_text_:web in 3310) [ClassicSimilarity], result of:
      0.045902856 = score(doc=3310,freq=14.0), product of:
        0.08019538 = queryWeight, product of:
          3.2635105 = idf(docFreq=4597, maxDocs=44218)
          0.024573348 = queryNorm
        0.57238775 = fieldWeight in 3310, product of:
          3.7416575 = tf(freq=14.0), with freq of:
            14.0 = termFreq=14.0
          3.2635105 = idf(docFreq=4597, maxDocs=44218)
          0.046875 = fieldNorm(doc=3310)
    0.04711391 = weight(_text_:indexierung in 3310) [ClassicSimilarity], result of:
      0.04711391 = score(doc=3310,freq=2.0), product of:
        0.13215348 = queryWeight, product of:
          5.377919 = idf(docFreq=554, maxDocs=44218)
          0.024573348 = queryNorm
        0.35650903 = fieldWeight in 3310, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          5.377919 = idf(docFreq=554, maxDocs=44218)
          0.046875 = fieldNorm(doc=3310)
    0.0050200885 = weight(_text_:information in 3310) [ClassicSimilarity], result of:
      0.0050200885 = score(doc=3310,freq=2.0), product of:
        0.04313797 = queryWeight, product of:
          1.7554779 = idf(docFreq=20772, maxDocs=44218)
          0.024573348 = queryNorm
        0.116372846 = fieldWeight in 3310, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          1.7554779 = idf(docFreq=20772, maxDocs=44218)
          0.046875 = fieldNorm(doc=3310)
    0.014905514 = weight(_text_:retrieval in 3310) [ClassicSimilarity], result of:
      0.014905514 = score(doc=3310,freq=2.0), product of:
        0.07433229 = queryWeight, product of:
          3.024915 = idf(docFreq=5836, maxDocs=44218)
          0.024573348 = queryNorm
        0.20052543 = fieldWeight in 3310, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          3.024915 = idf(docFreq=5836, maxDocs=44218)
          0.046875 = fieldNorm(doc=3310)
  0.2857143 = coord(4/14)

Abstract: Introducing context to a user query is an effective way to improve search effectiveness. In this article we propose a method employing taxonomy-based search services such as Web directories to facilitate searches in any Web search interface that supports Boolean queries. The proposed method enables one to convey the current search context on the taxonomy of a taxonomy-based search service to the searches conducted with the Web search interfaces. The basic idea is to learn the search context in the form of a Boolean condition that is commonly accepted by many Web search interfaces, and to use the condition to modify the user query before forwarding it to the Web search interfaces. To guarantee that the modified query can always be processed by the Web search interfaces and to make the method adaptive to different user requirements on search result effectiveness, we have developed new fast classification learning algorithms.
Source: Journal of the American Society for Information Science and Technology. 56(2005) no.2, S.173-188
Theme: Semantisches Umfeld in Indexierung u. Retrieval

Milonas, E.: The use of facets in Web search engines 0.03

0.031188522 = product of:
  0.10915982 = sum of:
    0.05355333 = weight(_text_:web in 3545) [ClassicSimilarity], result of:
      0.05355333 = score(doc=3545,freq=14.0), product of:
        0.08019538 = queryWeight, product of:
          3.2635105 = idf(docFreq=4597, maxDocs=44218)
          0.024573348 = queryNorm
        0.6677857 = fieldWeight in 3545, product of:
          3.7416575 = tf(freq=14.0), with freq of:
            14.0 = termFreq=14.0
          3.2635105 = idf(docFreq=4597, maxDocs=44218)
          0.0546875 = fieldNorm(doc=3545)
    0.008282723 = weight(_text_:information in 3545) [ClassicSimilarity], result of:
      0.008282723 = score(doc=3545,freq=4.0), product of:
        0.04313797 = queryWeight, product of:
          1.7554779 = idf(docFreq=20772, maxDocs=44218)
          0.024573348 = queryNorm
        0.1920054 = fieldWeight in 3545, product of:
          2.0 = tf(freq=4.0), with freq of:
            4.0 = termFreq=4.0
          1.7554779 = idf(docFreq=20772, maxDocs=44218)
          0.0546875 = fieldNorm(doc=3545)
    0.03283024 = weight(_text_:frankfurt in 3545) [ClassicSimilarity], result of:
      0.03283024 = score(doc=3545,freq=2.0), product of:
        0.10213336 = queryWeight, product of:
          4.1562657 = idf(docFreq=1882, maxDocs=44218)
          0.024573348 = queryNorm
        0.32144478 = fieldWeight in 3545, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          4.1562657 = idf(docFreq=1882, maxDocs=44218)
          0.0546875 = fieldNorm(doc=3545)
    0.014493528 = product of:
      0.043480583 = sum of:
        0.043480583 = weight(_text_:2010 in 3545) [ClassicSimilarity], result of:
          0.043480583 = score(doc=3545,freq=2.0), product of:
            0.117538005 = queryWeight, product of:
              4.7831497 = idf(docFreq=1005, maxDocs=44218)
              0.024573348 = queryNorm
            0.36992785 = fieldWeight in 3545, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              4.7831497 = idf(docFreq=1005, maxDocs=44218)
              0.0546875 = fieldNorm(doc=3545)
      0.33333334 = coord(1/3)
  0.2857143 = coord(4/14)

Abstract: The World Wide Web consists of a plethora of information that a Web searcher can retrieve via Web search engines such as Google. These Web search engines display an insurmountable amount of information in a seemingly unorganized linear format. Recently, some Web search engines have incorporated facets or terms alongside the linear display allowing the searcher the ability to narrow search results. The goal of this study is to examine the use of facets in these Web search engines.
Source: Paradigms and conceptual systems in knowledge organization: Proceedings of the Eleventh International ISKO conference, Rome, 23-26 February 2010, ed. Claudio Gnoli, Indeks, Frankfurt M

Jindal, V.; Bawa, S.; Batra, S.: A review of ranking approaches for semantic search on Web (2014) 0.03

0.031054968 = product of:
  0.108692385 = sum of:
    0.024536107 = weight(_text_:web in 2799) [ClassicSimilarity], result of:
      0.024536107 = score(doc=2799,freq=4.0), product of:
        0.08019538 = queryWeight, product of:
          3.2635105 = idf(docFreq=4597, maxDocs=44218)
          0.024573348 = queryNorm
        0.3059541 = fieldWeight in 2799, product of:
          2.0 = tf(freq=4.0), with freq of:
            4.0 = termFreq=4.0
          3.2635105 = idf(docFreq=4597, maxDocs=44218)
          0.046875 = fieldNorm(doc=2799)
    0.04711391 = weight(_text_:indexierung in 2799) [ClassicSimilarity], result of:
      0.04711391 = score(doc=2799,freq=2.0), product of:
        0.13215348 = queryWeight, product of:
          5.377919 = idf(docFreq=554, maxDocs=44218)
          0.024573348 = queryNorm
        0.35650903 = fieldWeight in 2799, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          5.377919 = idf(docFreq=554, maxDocs=44218)
          0.046875 = fieldNorm(doc=2799)
    0.01122526 = weight(_text_:information in 2799) [ClassicSimilarity], result of:
      0.01122526 = score(doc=2799,freq=10.0), product of:
        0.04313797 = queryWeight, product of:
          1.7554779 = idf(docFreq=20772, maxDocs=44218)
          0.024573348 = queryNorm
        0.2602176 = fieldWeight in 2799, product of:
          3.1622777 = tf(freq=10.0), with freq of:
            10.0 = termFreq=10.0
          1.7554779 = idf(docFreq=20772, maxDocs=44218)
          0.046875 = fieldNorm(doc=2799)
    0.02581711 = weight(_text_:retrieval in 2799) [ClassicSimilarity], result of:
      0.02581711 = score(doc=2799,freq=6.0), product of:
        0.07433229 = queryWeight, product of:
          3.024915 = idf(docFreq=5836, maxDocs=44218)
          0.024573348 = queryNorm
        0.34732026 = fieldWeight in 2799, product of:
          2.4494898 = tf(freq=6.0), with freq of:
            6.0 = termFreq=6.0
          3.024915 = idf(docFreq=5836, maxDocs=44218)
          0.046875 = fieldNorm(doc=2799)
  0.2857143 = coord(4/14)

Abstract: With ever increasing information being available to the end users, search engines have become the most powerful tools for obtaining useful information scattered on the Web. However, it is very common that even most renowned search engines return result sets with not so useful pages to the user. Research on semantic search aims to improve traditional information search and retrieval methods where the basic relevance criteria rely primarily on the presence of query keywords within the returned pages. This work is an attempt to explore different relevancy ranking approaches based on semantics which are considered appropriate for the retrieval of relevant information. In this paper, various pilot projects and their corresponding outcomes have been investigated based on methodologies adopted and their most distinctive characteristics towards ranking. An overview of selected approaches and their comparison by means of the classification criteria has been presented. With the help of this comparison, some common concepts and outstanding features have been identified.
Source: Information processing and management. 50(2014) no.2, S.416-425
Theme: Semantisches Umfeld in Indexierung u. Retrieval

Horch, A.; Kett, H.; Weisbecker, A.: Semantische Suchsysteme für das Internet : Architekturen und Komponenten semantischer Suchmaschinen (2013) 0.03

0.029805508 = product of:
  0.104319274 = sum of:
    0.02891608 = weight(_text_:web in 4063) [ClassicSimilarity], result of:
      0.02891608 = score(doc=4063,freq=8.0), product of:
        0.08019538 = queryWeight, product of:
          3.2635105 = idf(docFreq=4597, maxDocs=44218)
          0.024573348 = queryNorm
        0.36057037 = fieldWeight in 4063, product of:
          2.828427 = tf(freq=8.0), with freq of:
            8.0 = termFreq=8.0
          3.2635105 = idf(docFreq=4597, maxDocs=44218)
          0.0390625 = fieldNorm(doc=4063)
    0.03926159 = weight(_text_:indexierung in 4063) [ClassicSimilarity], result of:
      0.03926159 = score(doc=4063,freq=2.0), product of:
        0.13215348 = queryWeight, product of:
          5.377919 = idf(docFreq=554, maxDocs=44218)
          0.024573348 = queryNorm
        0.29709086 = fieldWeight in 4063, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          5.377919 = idf(docFreq=554, maxDocs=44218)
          0.0390625 = fieldNorm(doc=4063)
    0.008366814 = weight(_text_:information in 4063) [ClassicSimilarity], result of:
      0.008366814 = score(doc=4063,freq=8.0), product of:
        0.04313797 = queryWeight, product of:
          1.7554779 = idf(docFreq=20772, maxDocs=44218)
          0.024573348 = queryNorm
        0.19395474 = fieldWeight in 4063, product of:
          2.828427 = tf(freq=8.0), with freq of:
            8.0 = termFreq=8.0
          1.7554779 = idf(docFreq=20772, maxDocs=44218)
          0.0390625 = fieldNorm(doc=4063)
    0.027774787 = weight(_text_:retrieval in 4063) [ClassicSimilarity], result of:
      0.027774787 = score(doc=4063,freq=10.0), product of:
        0.07433229 = queryWeight, product of:
          3.024915 = idf(docFreq=5836, maxDocs=44218)
          0.024573348 = queryNorm
        0.37365708 = fieldWeight in 4063, product of:
          3.1622777 = tf(freq=10.0), with freq of:
            10.0 = termFreq=10.0
          3.024915 = idf(docFreq=5836, maxDocs=44218)
          0.0390625 = fieldNorm(doc=4063)
  0.2857143 = coord(4/14)

Abstract: Nowadays the flood of information is growing exponentially. In this "information explosion", an unmanageable amount of new information is created on the Web every day: for example, 430 German-language articles on Wikipedia, 2.4 million tweets on Twitter and 12.2 million comments on Facebook. Whereas a few years ago Google was used in Germany as almost the only search engine for accessing information on the Web, today the opinions published in social media, among other places, and with them the preselection and evaluation of information by individual experts and opinion leaders, are gaining in importance. But how can topic-specific information be identified efficiently for concrete questions, and then prepared and visualized as needed? This study gives an overview of semantic standards and formats, the processes of semantic search, methods and techniques of semantic search systems, components for developing semantic search engines, and the structure of existing applications. The study explains the basic structure of semantic search systems and presents methods of semantic search. It also presents software tools with which individual functionalities of semantic search engines can be implemented. Finally, existing semantic search engines are examined to illustrate the differences between the systems in structure and functionality.
RSWK: Suchmaschine / Semantic Web / Information Retrieval
Suchmaschine / Information Retrieval / Ranking / Datenstruktur / Kontextbezogenes System
Subject: Suchmaschine / Semantic Web / Information Retrieval
Suchmaschine / Information Retrieval / Ranking / Datenstruktur / Kontextbezogenes System
Theme: Semantisches Umfeld in Indexierung u. Retrieval

Scholer, F.; Williams, H.E.; Turpin, A.: Query association surrogates for Web search (2004) 0.03

0.029662343 = product of:
  0.10381819 = sum of:
    0.03469929 = weight(_text_:web in 2236) [ClassicSimilarity], result of:
      0.03469929 = score(doc=2236,freq=8.0), product of:
        0.08019538 = queryWeight, product of:
          3.2635105 = idf(docFreq=4597, maxDocs=44218)
          0.024573348 = queryNorm
        0.43268442 = fieldWeight in 2236, product of:
          2.828427 = tf(freq=8.0), with freq of:
            8.0 = termFreq=8.0
          3.2635105 = idf(docFreq=4597, maxDocs=44218)
          0.046875 = fieldNorm(doc=2236)
    0.04711391 = weight(_text_:indexierung in 2236) [ClassicSimilarity], result of:
      0.04711391 = score(doc=2236,freq=2.0), product of:
        0.13215348 = queryWeight, product of:
          5.377919 = idf(docFreq=554, maxDocs=44218)
          0.024573348 = queryNorm
        0.35650903 = fieldWeight in 2236, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          5.377919 = idf(docFreq=554, maxDocs=44218)
          0.046875 = fieldNorm(doc=2236)
    0.007099477 = weight(_text_:information in 2236) [ClassicSimilarity], result of:
      0.007099477 = score(doc=2236,freq=4.0), product of:
        0.04313797 = queryWeight, product of:
          1.7554779 = idf(docFreq=20772, maxDocs=44218)
          0.024573348 = queryNorm
        0.16457605 = fieldWeight in 2236, product of:
          2.0 = tf(freq=4.0), with freq of:
            4.0 = termFreq=4.0
          1.7554779 = idf(docFreq=20772, maxDocs=44218)
          0.046875 = fieldNorm(doc=2236)
    0.014905514 = weight(_text_:retrieval in 2236) [ClassicSimilarity], result of:
      0.014905514 = score(doc=2236,freq=2.0), product of:
        0.07433229 = queryWeight, product of:
          3.024915 = idf(docFreq=5836, maxDocs=44218)
          0.024573348 = queryNorm
        0.20052543 = fieldWeight in 2236, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          3.024915 = idf(docFreq=5836, maxDocs=44218)
          0.046875 = fieldNorm(doc=2236)
  0.2857143 = coord(4/14)

Abstract: Collection sizes, query rates, and the number of users of Web search engines are increasing. Therefore, there is continued demand for innovation in providing search services that meet user information needs. In this article, we propose new techniques to add additional terms to documents with the goal of providing more accurate searches. Our techniques are based on query association, where queries are stored with documents that are highly similar statistically. We show that adding query associations to documents improves the accuracy of Web topic finding searches by up to 7%, and provides an excellent complement to existing supplement techniques for site finding. We conclude that using document surrogates derived from query association is a valuable new technique for accurate Web searching.
Source: Journal of the American Society for Information Science and Technology. 55(2004) no.7, S.637-650
Theme: Semantisches Umfeld in Indexierung u. Retrieval

Schwartz, C.: Web search engines (1998) 0.03

0.02722879 = product of:
  0.09530076 = sum of:
    0.017349645 = weight(_text_:web in 5700) [ClassicSimilarity], result of:
      0.017349645 = score(doc=5700,freq=2.0), product of:
        0.08019538 = queryWeight, product of:
          3.2635105 = idf(docFreq=4597, maxDocs=44218)
          0.024573348 = queryNorm
        0.21634221 = fieldWeight in 5700, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          3.2635105 = idf(docFreq=4597, maxDocs=44218)
          0.046875 = fieldNorm(doc=5700)
    0.04711391 = weight(_text_:indexierung in 5700) [ClassicSimilarity], result of:
      0.04711391 = score(doc=5700,freq=2.0), product of:
        0.13215348 = queryWeight, product of:
          5.377919 = idf(docFreq=554, maxDocs=44218)
          0.024573348 = queryNorm
        0.35650903 = fieldWeight in 5700, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          5.377919 = idf(docFreq=554, maxDocs=44218)
          0.046875 = fieldNorm(doc=5700)
    0.0050200885 = weight(_text_:information in 5700) [ClassicSimilarity], result of:
      0.0050200885 = score(doc=5700,freq=2.0), product of:
        0.04313797 = queryWeight, product of:
          1.7554779 = idf(docFreq=20772, maxDocs=44218)
          0.024573348 = queryNorm
        0.116372846 = fieldWeight in 5700, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          1.7554779 = idf(docFreq=20772, maxDocs=44218)
          0.046875 = fieldNorm(doc=5700)
    0.02581711 = weight(_text_:retrieval in 5700) [ClassicSimilarity], result of:
      0.02581711 = score(doc=5700,freq=6.0), product of:
        0.07433229 = queryWeight, product of:
          3.024915 = idf(docFreq=5836, maxDocs=44218)
          0.024573348 = queryNorm
        0.34732026 = fieldWeight in 5700, product of:
          2.4494898 = tf(freq=6.0), with freq of:
            6.0 = termFreq=6.0
          3.024915 = idf(docFreq=5836, maxDocs=44218)
          0.046875 = fieldNorm(doc=5700)
  0.2857143 = coord(4/14)

Abstract: This review looks briefly at the history of WWW search engine development, considers the current state of affairs, and reflects on the future. Networked discovery tools have evolved along with Internet resource availability. WWW search engines display some complexity in their variety, content, resource acquisition strategies, and in the array of tools they deploy to assist users. A small but growing body of evaluation literature, much of it not systematic in nature, indicates that performance effectiveness is difficult to assess in this setting. Significant improvements in general-content search engine retrieval and ranking performance may not be possible, and are probably not worth the effort, although search engine providers have introduced some rudimentary attempts at personalization, summarization, and query expansion. The shift to distributed search across multitype database systems could extend general networked discovery and retrieval to include smaller resource collections with rich metadata and navigation tools.
Source: Journal of the American Society for Information Science. 49(1998) no.11, S.973-982
Theme: Semantisches Umfeld in Indexierung u. Retrieval

Roy, R.S.; Agarwal, S.; Ganguly, N.; Choudhury, M.: Syntactic complexity of Web search queries through the lenses of language models, networks and users (2016) 0.03

0.026073685 = product of:
  0.09125789 = sum of:
    0.03232916 = weight(_text_:web in 3188) [ClassicSimilarity], result of:
      0.03232916 = score(doc=3188,freq=10.0), product of:
        0.08019538 = queryWeight, product of:
          3.2635105 = idf(docFreq=4597, maxDocs=44218)
          0.024573348 = queryNorm
        0.40312994 = fieldWeight in 3188, product of:
          3.1622777 = tf(freq=10.0), with freq of:
            10.0 = termFreq=10.0
          3.2635105 = idf(docFreq=4597, maxDocs=44218)
          0.0390625 = fieldNorm(doc=3188)
    0.03926159 = weight(_text_:indexierung in 3188) [ClassicSimilarity], result of:
      0.03926159 = score(doc=3188,freq=2.0), product of:
        0.13215348 = queryWeight, product of:
          5.377919 = idf(docFreq=554, maxDocs=44218)
          0.024573348 = queryNorm
        0.29709086 = fieldWeight in 3188, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          5.377919 = idf(docFreq=554, maxDocs=44218)
          0.0390625 = fieldNorm(doc=3188)
    0.007245874 = weight(_text_:information in 3188) [ClassicSimilarity], result of:
      0.007245874 = score(doc=3188,freq=6.0), product of:
        0.04313797 = queryWeight, product of:
          1.7554779 = idf(docFreq=20772, maxDocs=44218)
          0.024573348 = queryNorm
        0.16796975 = fieldWeight in 3188, product of:
          2.4494898 = tf(freq=6.0), with freq of:
            6.0 = termFreq=6.0
          1.7554779 = idf(docFreq=20772, maxDocs=44218)
          0.0390625 = fieldNorm(doc=3188)
    0.012421262 = weight(_text_:retrieval in 3188) [ClassicSimilarity], result of:
      0.012421262 = score(doc=3188,freq=2.0), product of:
        0.07433229 = queryWeight, product of:
          3.024915 = idf(docFreq=5836, maxDocs=44218)
          0.024573348 = queryNorm
        0.16710453 = fieldWeight in 3188, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          3.024915 = idf(docFreq=5836, maxDocs=44218)
          0.0390625 = fieldNorm(doc=3188)
  0.2857143 = coord(4/14)

Abstract: Across the world, millions of users interact with search engines every day to satisfy their information needs. As the Web grows bigger over time, such information needs, manifested through user search queries, also become more complex. However, there has been no systematic study that quantifies the structural complexity of Web search queries. In this research, we make an attempt towards understanding and characterizing the syntactic complexity of search queries using a multi-pronged approach. We use traditional statistical language modeling techniques to quantify and compare the perplexity of queries with natural language (NL). We then use complex network analysis for a comparative analysis of the topological properties of queries issued by real Web users and those generated by statistical models. Finally, we conduct experiments to study whether search engine users are able to identify real queries, when presented along with model-generated ones. The three complementary studies show that the syntactic structure of Web queries is more complex than what n-grams can capture, but simpler than NL. Queries, thus, seem to represent an intermediate stage between syntactic and non-syntactic communication.
Source: Information processing and management. 52(2016) no.5, S.923-948
Theme: Semantisches Umfeld in Indexierung u. Retrieval

Stock, W.G.: Qualitätskriterien von Suchmaschinen : Checkliste für Retrievalsysteme (2000) 0.03
```
0.025129285 = product of:
  0.087952495 = sum of:
    0.01445804 = weight(_text_:web in 5773) [ClassicSimilarity], result of:
      0.01445804 = score(doc=5773,freq=2.0), product of:
        0.08019538 = queryWeight, product of:
          3.2635105 = idf(docFreq=4597, maxDocs=44218)
          0.024573348 = queryNorm
        0.18028519 = fieldWeight in 5773, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          3.2635105 = idf(docFreq=4597, maxDocs=44218)
          0.0390625 = fieldNorm(doc=5773)
    0.055524275 = weight(_text_:indexierung in 5773) [ClassicSimilarity], result of:
      0.055524275 = score(doc=5773,freq=4.0), product of:
        0.13215348 = queryWeight, product of:
          5.377919 = idf(docFreq=554, maxDocs=44218)
          0.024573348 = queryNorm
        0.42014992 = fieldWeight in 5773, product of:
          2.0 = tf(freq=4.0), with freq of:
            4.0 = termFreq=4.0
          5.377919 = idf(docFreq=554, maxDocs=44218)
          0.0390625 = fieldNorm(doc=5773)
    0.012421262 = weight(_text_:retrieval in 5773) [ClassicSimilarity], result of:
      0.012421262 = score(doc=5773,freq=2.0), product of:
        0.07433229 = queryWeight, product of:
          3.024915 = idf(docFreq=5836, maxDocs=44218)
          0.024573348 = queryNorm
        0.16710453 = fieldWeight in 5773, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          3.024915 = idf(docFreq=5836, maxDocs=44218)
          0.0390625 = fieldNorm(doc=5773)
    0.0055489163 = product of:
      0.016646748 = sum of:
        0.016646748 = weight(_text_:22 in 5773) [ClassicSimilarity], result of:
          0.016646748 = score(doc=5773,freq=2.0), product of:
            0.08605168 = queryWeight, product of:
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.024573348 = queryNorm
            0.19345059 = fieldWeight in 5773, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.0390625 = fieldNorm(doc=5773)
      0.33333334 = coord(1/3)
  0.2857143 = coord(4/14)
```
Abstract

Search engines on the World Wide Web are said to use suboptimal methods and tools, especially compared with the retrieval software of commercial online archives. Elaborate command-oriented retrieval systems cannot be operated by laypeople at all, and by professionals only if they work with them constantly. The search systems of some "independents", that is, isolated information producers on the Internet, are marked by a minimalism reminiscent of the command set of the early 1970s. Retrieval software in intranets, if it is used at all, relies almost without exception on automatic methods of indexing and retrieval and almost completely ignores documentary know-how. Search engines or retrieval systems (we use the two terms synonymously) thus cause difficulties wherever they occur. Their quality is doubted. But what does quality of search engines actually mean? What distinguishes a good retrieval system? And what does a bad one lack? We want to develop a list of criteria that are essential for good searching (and finding!). The concern is thus exclusively the quantity and quality of the search options, not further performance indicators such as speed or ergonomic user interfaces. What is tacitly assumed, however, is the departure from exclusively command-oriented systems; that is, we presuppose screen designs that present the commands in an intuitively clear way. Our checklist contains only options that are either already in use (in some system or other), and thus in part repeats what is long familiar, or whose technical feasibility has already been demonstrated in experimental environments. In this respect the list is a minimum requirement for retrieval systems, and one that can certainly be extended. The catalogue of criteria is organized by (1) the basic functions for searching individual records, (2) the informetric functions for characterizing certain result sets, and (3) the criteria for the power of automatic indexing and natural-language search.

Source

Password. 2000, H.5, S.22-31
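
The score explanation for the Stock entry above builds each term weight from the same few factors: a term-frequency factor, an inverse document frequency, a field norm and a query norm. The following is a minimal sketch that recomputes the `web` weight for doc 5773 from the numbers printed in the tree. It is a reading aid under stated assumptions, not the catalogue's own code: the function names are hypothetical, `queryNorm` and `fieldNorm` are copied from the explain output rather than derived, and the idf formula 1 + ln(maxDocs / (docFreq + 1)) is an assumption that reproduces the printed idf values.

```python
import math

def tf(freq: float) -> float:
    """Term-frequency factor: square root of the raw term frequency."""
    return math.sqrt(freq)

def idf(doc_freq: int, max_docs: int) -> float:
    """Inverse document frequency, assumed to be 1 + ln(maxDocs / (docFreq + 1))."""
    return 1.0 + math.log(max_docs / (doc_freq + 1))

def term_weight(freq, doc_freq, max_docs, query_norm, field_norm):
    """Weight of one query term: queryWeight * fieldWeight."""
    term_idf = idf(doc_freq, max_docs)
    query_weight = term_idf * query_norm             # idf * queryNorm
    field_weight = tf(freq) * term_idf * field_norm  # tf * idf * fieldNorm
    return query_weight * field_weight

# Values copied from the explain tree for weight(_text_:web in 5773).
web_weight = term_weight(freq=2.0, doc_freq=4597, max_docs=44218,
                         query_norm=0.024573348, field_norm=0.0390625)
print(f"{web_weight:.8f}")  # compare with 0.01445804 in the explain tree
```

Because idf enters both `queryWeight` and `fieldWeight`, a rare term such as `indexierung` (docFreq=554) outweighs a common one such as `web` (docFreq=4597), even where the field norm is the same.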

Vidinli, I.B.; Ozcan, R.: New query suggestion framework and algorithms : a case study for an educational search engine (2016) 0.02

0.024111189 = product of:
  0.08438916 = sum of:
    0.017349645 = weight(_text_:web in 3185) [ClassicSimilarity], result of:
      0.017349645 = score(doc=3185,freq=2.0), product of:
        0.08019538 = queryWeight, product of:
          3.2635105 = idf(docFreq=4597, maxDocs=44218)
          0.024573348 = queryNorm
        0.21634221 = fieldWeight in 3185, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          3.2635105 = idf(docFreq=4597, maxDocs=44218)
          0.046875 = fieldNorm(doc=3185)
    0.04711391 = weight(_text_:indexierung in 3185) [ClassicSimilarity], result of:
      0.04711391 = score(doc=3185,freq=2.0), product of:
        0.13215348 = queryWeight, product of:
          5.377919 = idf(docFreq=554, maxDocs=44218)
          0.024573348 = queryNorm
        0.35650903 = fieldWeight in 3185, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          5.377919 = idf(docFreq=554, maxDocs=44218)
          0.046875 = fieldNorm(doc=3185)
    0.0050200885 = weight(_text_:information in 3185) [ClassicSimilarity], result of:
      0.0050200885 = score(doc=3185,freq=2.0), product of:
        0.04313797 = queryWeight, product of:
          1.7554779 = idf(docFreq=20772, maxDocs=44218)
          0.024573348 = queryNorm
        0.116372846 = fieldWeight in 3185, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          1.7554779 = idf(docFreq=20772, maxDocs=44218)
          0.046875 = fieldNorm(doc=3185)
    0.014905514 = weight(_text_:retrieval in 3185) [ClassicSimilarity], result of:
      0.014905514 = score(doc=3185,freq=2.0), product of:
        0.07433229 = queryWeight, product of:
          3.024915 = idf(docFreq=5836, maxDocs=44218)
          0.024573348 = queryNorm
        0.20052543 = fieldWeight in 3185, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          3.024915 = idf(docFreq=5836, maxDocs=44218)
          0.046875 = fieldNorm(doc=3185)
  0.2857143 = coord(4/14)

Abstract: Query suggestion is generally an integrated part of web search engines. In this study, we first redefine and reduce the query suggestion problem as "comparison of queries". We then propose a general modular framework for query suggestion algorithm development. We also develop new query suggestion algorithms which are used in our proposed framework, exploiting query, session and user features. As a case study, we use query logs of a real educational search engine that targets K-12 students in Turkey. We also exploit educational features (course, grade) in our query suggestion algorithms. We test our framework and algorithms over a set of queries by an experiment and demonstrate a 66-90% statistically significant increase in relevance of query suggestions compared to a baseline method.
Source: Information processing and management. 52(2016) no.5, S.733-752
Theme: Semantisches Umfeld in Indexierung u. Retrieval

Bradley, P.: Advanced Internet searcher's handbook (1998) 0.02

0.023418711 = product of:
  0.10928732 = sum of:
    0.04089351 = weight(_text_:web in 5454) [ClassicSimilarity], result of:
      0.04089351 = score(doc=5454,freq=4.0), product of:
        0.08019538 = queryWeight, product of:
          3.2635105 = idf(docFreq=4597, maxDocs=44218)
          0.024573348 = queryNorm
        0.5099235 = fieldWeight in 5454, product of:
          2.0 = tf(freq=4.0), with freq of:
            4.0 = termFreq=4.0
          3.2635105 = idf(docFreq=4597, maxDocs=44218)
          0.078125 = fieldNorm(doc=5454)
    0.018708764 = weight(_text_:information in 5454) [ClassicSimilarity], result of:
      0.018708764 = score(doc=5454,freq=10.0), product of:
        0.04313797 = queryWeight, product of:
          1.7554779 = idf(docFreq=20772, maxDocs=44218)
          0.024573348 = queryNorm
        0.43369597 = fieldWeight in 5454, product of:
          3.1622777 = tf(freq=10.0), with freq of:
            10.0 = termFreq=10.0
          1.7554779 = idf(docFreq=20772, maxDocs=44218)
          0.078125 = fieldNorm(doc=5454)
    0.04968505 = weight(_text_:retrieval in 5454) [ClassicSimilarity], result of:
      0.04968505 = score(doc=5454,freq=8.0), product of:
        0.07433229 = queryWeight, product of:
          3.024915 = idf(docFreq=5836, maxDocs=44218)
          0.024573348 = queryNorm
        0.6684181 = fieldWeight in 5454, product of:
          2.828427 = tf(freq=8.0), with freq of:
            8.0 = termFreq=8.0
          3.024915 = idf(docFreq=5836, maxDocs=44218)
          0.078125 = fieldNorm(doc=5454)
  0.21428572 = coord(3/14)

Footnote: Review in: Information world review. 1999, no.146, S.26 (D. Parr)
LCSH: World Wide Web (Information retrieval system)
Information retrieval
Subject: World Wide Web (Information retrieval system)
Information retrieval
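
The top-level score of the Bradley entry above is obtained in one further step: the weights of the matching clauses are summed and the sum is scaled by the coordination factor `coord(3/14)`, the share of query clauses that matched. A minimal sketch of that step follows, with the clause weights copied from the explain tree. The function name is hypothetical and the code only illustrates the arithmetic shown in the tree.

```python
def combined_score(clause_weights, matched, total):
    """Sum the matching clause weights and scale by coord = matched / total."""
    return sum(clause_weights) * (matched / total)

# Clause weights for doc 5454: web, information, retrieval.
bradley = combined_score([0.04089351, 0.018708764, 0.04968505],
                         matched=3, total=14)
print(f"{bradley:.9f}")  # compare with 0.023418711 in the explain tree
```

Entries whose tree contains a nested `coord(1/3)` clause apply the same step to the inner sum first, and the result then enters the outer sum as one more clause weight.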

Fu, T.; Abbasi, A.; Chen, H.: A focused crawler for Dark Web forums (2010) 0.02

0.02181359 = product of:
  0.07634756 = sum of:
    0.043374117 = weight(_text_:web in 3471) [ClassicSimilarity], result of:
      0.043374117 = score(doc=3471,freq=18.0), product of:
        0.08019538 = queryWeight, product of:
          3.2635105 = idf(docFreq=4597, maxDocs=44218)
          0.024573348 = queryNorm
        0.5408555 = fieldWeight in 3471, product of:
          4.2426405 = tf(freq=18.0), with freq of:
            18.0 = termFreq=18.0
          3.2635105 = idf(docFreq=4597, maxDocs=44218)
          0.0390625 = fieldNorm(doc=3471)
    0.004183407 = weight(_text_:information in 3471) [ClassicSimilarity], result of:
      0.004183407 = score(doc=3471,freq=2.0), product of:
        0.04313797 = queryWeight, product of:
          1.7554779 = idf(docFreq=20772, maxDocs=44218)
          0.024573348 = queryNorm
        0.09697737 = fieldWeight in 3471, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          1.7554779 = idf(docFreq=20772, maxDocs=44218)
          0.0390625 = fieldNorm(doc=3471)
    0.012421262 = weight(_text_:retrieval in 3471) [ClassicSimilarity], result of:
      0.012421262 = score(doc=3471,freq=2.0), product of:
        0.07433229 = queryWeight, product of:
          3.024915 = idf(docFreq=5836, maxDocs=44218)
          0.024573348 = queryNorm
        0.16710453 = fieldWeight in 3471, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          3.024915 = idf(docFreq=5836, maxDocs=44218)
          0.0390625 = fieldNorm(doc=3471)
    0.016368773 = product of:
      0.049106315 = sum of:
        0.049106315 = weight(_text_:2010 in 3471) [ClassicSimilarity], result of:
          0.049106315 = score(doc=3471,freq=5.0), product of:
            0.117538005 = queryWeight, product of:
              4.7831497 = idf(docFreq=1005, maxDocs=44218)
              0.024573348 = queryNorm
            0.41779095 = fieldWeight in 3471, product of:
              2.236068 = tf(freq=5.0), with freq of:
                5.0 = termFreq=5.0
              4.7831497 = idf(docFreq=1005, maxDocs=44218)
              0.0390625 = fieldNorm(doc=3471)
      0.33333334 = coord(1/3)
  0.2857143 = coord(4/14)

Abstract: The unprecedented growth of the Internet has given rise to the Dark Web, the problematic facet of the Web associated with cybercrime, hate, and extremism. Despite the need for tools to collect and analyze Dark Web forums, the covert nature of this part of the Internet makes traditional Web crawling techniques insufficient for capturing such content. In this study, we propose a novel crawling system designed to collect Dark Web forum content. The system uses a human-assisted accessibility approach to gain access to Dark Web forums. Several URL ordering features and techniques enable efficient extraction of forum postings. The system also includes an incremental crawler coupled with a recall-improvement mechanism intended to facilitate enhanced retrieval and updating of collected content. Experiments conducted to evaluate the effectiveness of the human-assisted accessibility approach and the recall-improvement-based, incremental-update procedure yielded favorable results. The human-assisted approach significantly improved access to Dark Web forums while the incremental crawler with recall improvement also outperformed standard periodic- and incremental-update approaches. Using the system, we were able to collect over 100 Dark Web forums from three regions. A case study encompassing link and content analysis of collected forums was used to illustrate the value and importance of gathering and analyzing content from such online communities.
Source: Journal of the American Society for Information Science and Technology. 61(2010) no.6, S.1213-1231
Year: 2010

Herrera-Viedma, E.; Pasi, G.: Soft approaches to information retrieval and information access on the Web : an introduction to the special topic section (2006) 0.02

0.021017525 = product of:
  0.07356133 = sum of:
    0.03271481 = weight(_text_:web in 5285) [ClassicSimilarity], result of:
      0.03271481 = score(doc=5285,freq=16.0), product of:
        0.08019538 = queryWeight, product of:
          3.2635105 = idf(docFreq=4597, maxDocs=44218)
          0.024573348 = queryNorm
        0.4079388 = fieldWeight in 5285, product of:
          4.0 = tf(freq=16.0), with freq of:
            16.0 = termFreq=16.0
          3.2635105 = idf(docFreq=4597, maxDocs=44218)
          0.03125 = fieldNorm(doc=5285)
    0.01206679 = weight(_text_:information in 5285) [ClassicSimilarity], result of:
      0.01206679 = score(doc=5285,freq=26.0), product of:
        0.04313797 = queryWeight, product of:
          1.7554779 = idf(docFreq=20772, maxDocs=44218)
          0.024573348 = queryNorm
        0.2797255 = fieldWeight in 5285, product of:
          5.0990195 = tf(freq=26.0), with freq of:
            26.0 = termFreq=26.0
          1.7554779 = idf(docFreq=20772, maxDocs=44218)
          0.03125 = fieldNorm(doc=5285)
    0.024340604 = weight(_text_:retrieval in 5285) [ClassicSimilarity], result of:
      0.024340604 = score(doc=5285,freq=12.0), product of:
        0.07433229 = queryWeight, product of:
          3.024915 = idf(docFreq=5836, maxDocs=44218)
          0.024573348 = queryNorm
        0.32745665 = fieldWeight in 5285, product of:
          3.4641016 = tf(freq=12.0), with freq of:
            12.0 = termFreq=12.0
          3.024915 = idf(docFreq=5836, maxDocs=44218)
          0.03125 = fieldNorm(doc=5285)
    0.0044391328 = product of:
      0.013317398 = sum of:
        0.013317398 = weight(_text_:22 in 5285) [ClassicSimilarity], result of:
          0.013317398 = score(doc=5285,freq=2.0), product of:
            0.08605168 = queryWeight, product of:
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.024573348 = queryNorm
            0.15476047 = fieldWeight in 5285, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.03125 = fieldNorm(doc=5285)
      0.33333334 = coord(1/3)
  0.2857143 = coord(4/14)

Abstract: The World Wide Web is a popular and interactive medium used to collect, disseminate, and access an increasingly huge amount of information, which constitutes the mainstay of the so-called information and knowledge society. Because of its spectacular growth, related to both Web resources (pages, sites, and services) and number of users, the Web is nowadays the main information repository and provides some automatic systems for locating, accessing, and retrieving information. However, an open and crucial question remains: how to provide fast and effective retrieval of the information relevant to specific users' needs. This is a very hard and complex task, since it is pervaded with subjectivity, vagueness, and uncertainty. The expression soft computing refers to techniques and methodologies that work synergistically with the aim of providing flexible information processing tolerant of imprecision, vagueness, partial truth, and approximation. So, soft computing represents a good candidate to design effective systems for information access and retrieval on the Web. One of the most representative tools of soft computing is fuzzy set theory. This special topic section collects research articles witnessing some recent advances in improving the processes of information access and retrieval on the Web by using soft computing tools, and in particular, by using fuzzy sets and/or integrating them with other soft computing tools. In this introductory article, we first review the problem of Web retrieval and the concept of soft computing technology. We then briefly introduce the articles in this section and conclude by highlighting some future research directions that could benefit from the use of soft computing technologies.
Date: 22. 7.2006 16:59:33
Footnote: Contribution to a Special Topic Section on Soft Approaches to Information Retrieval and Information Access on the Web
Source: Journal of the American Society for Information Science and Technology. 57(2006) no.4, S.511-514

Liu, Y.; Zhang, M.; Cen, R.; Ru, L.; Ma, S.: Data cleansing for Web information retrieval using query independent features (2007) 0.02
```
0.02083417 = product of:
  0.09722613 = sum of:
    0.055995747 = weight(_text_:web in 607) [ClassicSimilarity], result of:
      0.055995747 = score(doc=607,freq=30.0), product of:
        0.08019538 = queryWeight, product of:
          3.2635105 = idf(docFreq=4597, maxDocs=44218)
          0.024573348 = queryNorm
        0.69824153 = fieldWeight in 607, product of:
          5.477226 = tf(freq=30.0), with freq of:
            30.0 = termFreq=30.0
          3.2635105 = idf(docFreq=4597, maxDocs=44218)
          0.0390625 = fieldNorm(doc=607)
    0.008366814 = weight(_text_:information in 607) [ClassicSimilarity], result of:
      0.008366814 = score(doc=607,freq=8.0), product of:
        0.04313797 = queryWeight, product of:
          1.7554779 = idf(docFreq=20772, maxDocs=44218)
          0.024573348 = queryNorm
        0.19395474 = fieldWeight in 607, product of:
          2.828427 = tf(freq=8.0), with freq of:
            8.0 = termFreq=8.0
          1.7554779 = idf(docFreq=20772, maxDocs=44218)
          0.0390625 = fieldNorm(doc=607)
    0.03286357 = weight(_text_:retrieval in 607) [ClassicSimilarity], result of:
      0.03286357 = score(doc=607,freq=14.0), product of:
        0.07433229 = queryWeight, product of:
          3.024915 = idf(docFreq=5836, maxDocs=44218)
          0.024573348 = queryNorm
        0.442117 = fieldWeight in 607, product of:
          3.7416575 = tf(freq=14.0), with freq of:
            14.0 = termFreq=14.0
          3.024915 = idf(docFreq=5836, maxDocs=44218)
          0.0390625 = fieldNorm(doc=607)
  0.21428572 = coord(3/14)
```
Abstract

Understanding what kinds of Web pages are the most useful for Web search engine users is a critical task in Web information retrieval (IR). Most previous works used hyperlink analysis algorithms to solve this problem. However, little research has been focused on query-independent Web data cleansing for Web IR. In this paper, we first provide analysis of the differences between retrieval target pages and ordinary ones based on more than 30 million Web pages obtained from both the Text Retrieval Conference (TREC) and a widely used Chinese search engine, SOGOU (www.sogou.com). We further propose a learning-based data cleansing algorithm for reducing Web pages that are unlikely to be useful for user requests. We found that there exists a large proportion of low-quality Web pages in both the English and the Chinese Web page corpus, and retrieval target pages can be identified using query-independent features and cleansing algorithms. The experimental results showed that our algorithm is effective in reducing a large portion of Web pages with a small loss in retrieval target pages. It makes it possible for Web IR tools to meet a large fraction of users' needs with only a small part of pages on the Web. These results may help Web search engines make better use of their limited storage and computation resources to improve search performance.

Footnote

Contribution to a special topic section on "Mining Web resources for enhancing information retrieval"

Source

Journal of the American Society for Information Science and Technology. 58(2007) no.12, S.1884-1898

Kurzke, C.; Galle, M.; Bathelt, M.: WebAssistant : a user profile specific information retrieval assistant (1998) 0.02

0.019571388 = product of:
  0.068499856 = sum of:
    0.035058882 = weight(_text_:web in 3559) [ClassicSimilarity], result of:
      0.035058882 = score(doc=3559,freq=6.0), product of:
        0.08019538 = queryWeight, product of:
          3.2635105 = idf(docFreq=4597, maxDocs=44218)
          0.024573348 = queryNorm
        0.43716836 = fieldWeight in 3559, product of:
          2.4494898 = tf(freq=6.0), with freq of:
            6.0 = termFreq=6.0
          3.2635105 = idf(docFreq=4597, maxDocs=44218)
          0.0546875 = fieldNorm(doc=3559)
    0.008282723 = weight(_text_:information in 3559) [ClassicSimilarity], result of:
      0.008282723 = score(doc=3559,freq=4.0), product of:
        0.04313797 = queryWeight, product of:
          1.7554779 = idf(docFreq=20772, maxDocs=44218)
          0.024573348 = queryNorm
        0.1920054 = fieldWeight in 3559, product of:
          2.0 = tf(freq=4.0), with freq of:
            4.0 = termFreq=4.0
          1.7554779 = idf(docFreq=20772, maxDocs=44218)
          0.0546875 = fieldNorm(doc=3559)
    0.017389767 = weight(_text_:retrieval in 3559) [ClassicSimilarity], result of:
      0.017389767 = score(doc=3559,freq=2.0), product of:
        0.07433229 = queryWeight, product of:
          3.024915 = idf(docFreq=5836, maxDocs=44218)
          0.024573348 = queryNorm
        0.23394634 = fieldWeight in 3559, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          3.024915 = idf(docFreq=5836, maxDocs=44218)
          0.0546875 = fieldNorm(doc=3559)
    0.007768482 = product of:
      0.023305446 = sum of:
        0.023305446 = weight(_text_:22 in 3559) [ClassicSimilarity], result of:
          0.023305446 = score(doc=3559,freq=2.0), product of:
            0.08605168 = queryWeight, product of:
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.024573348 = queryNorm
            0.2708308 = fieldWeight in 3559, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.0546875 = fieldNorm(doc=3559)
      0.33333334 = coord(1/3)
  0.2857143 = coord(4/14)

Abstract: Describes the concept of a proxy-based information classification and filtering utility named Web Assistant. On behalf of users, a private view of the WWW is generated based on a previously determined profile. This profile is created by monitoring user and group activities when browsing WWW pages. Additional features allow for easy interoperability between workgroups with similar project interests, maintain personal and common hotlists with automatic modification checks, and provide a sophisticated search engine front-end.
Date: 1. 8.1996 22:08:06
Footnote: Contribution to a special issue devoted to the Proceedings of the 7th International World Wide Web Conference, held 14-18 April 1998, Brisbane, Australia
Theme: Web-Agenten

Jenkins, C.: Automatic classification of Web resources using Java and Dewey Decimal Classification (1998) 0.02

0.019571388 = product of:
  0.068499856 = sum of:
    0.035058882 = weight(_text_:web in 1673) [ClassicSimilarity], result of:
      0.035058882 = score(doc=1673,freq=6.0), product of:
        0.08019538 = queryWeight, product of:
          3.2635105 = idf(docFreq=4597, maxDocs=44218)
          0.024573348 = queryNorm
        0.43716836 = fieldWeight in 1673, product of:
          2.4494898 = tf(freq=6.0), with freq of:
            6.0 = termFreq=6.0
          3.2635105 = idf(docFreq=4597, maxDocs=44218)
          0.0546875 = fieldNorm(doc=1673)
    0.008282723 = weight(_text_:information in 1673) [ClassicSimilarity], result of:
      0.008282723 = score(doc=1673,freq=4.0), product of:
        0.04313797 = queryWeight, product of:
          1.7554779 = idf(docFreq=20772, maxDocs=44218)
          0.024573348 = queryNorm
        0.1920054 = fieldWeight in 1673, product of:
          2.0 = tf(freq=4.0), with freq of:
            4.0 = termFreq=4.0
          1.7554779 = idf(docFreq=20772, maxDocs=44218)
          0.0546875 = fieldNorm(doc=1673)
    0.017389767 = weight(_text_:retrieval in 1673) [ClassicSimilarity], result of:
      0.017389767 = score(doc=1673,freq=2.0), product of:
        0.07433229 = queryWeight, product of:
          3.024915 = idf(docFreq=5836, maxDocs=44218)
          0.024573348 = queryNorm
        0.23394634 = fieldWeight in 1673, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          3.024915 = idf(docFreq=5836, maxDocs=44218)
          0.0546875 = fieldNorm(doc=1673)
    0.007768482 = product of:
      0.023305446 = sum of:
        0.023305446 = weight(_text_:22 in 1673) [ClassicSimilarity], result of:
          0.023305446 = score(doc=1673,freq=2.0), product of:
            0.08605168 = queryWeight, product of:
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.024573348 = queryNorm
            0.2708308 = fieldWeight in 1673, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.0546875 = fieldNorm(doc=1673)
      0.33333334 = coord(1/3)
  0.2857143 = coord(4/14)

Abstract: The Wolverhampton Web Library (WWLib) is a WWW search engine that provides access to UK-based information. The experimental version, developed in 1995, was a success but highlighted the need for a much higher degree of automation. An interesting feature of the experimental WWLib was that it organised information according to DDC. Discusses the advantages of classification and describes the automatic classifier that is being developed in Java as part of the new, fully automated WWLib.
Date: 1. 8.1996 22:08:06
Footnote: Contribution to a special issue devoted to the Proceedings of the 7th International World Wide Web Conference, held 14-18 April 1998, Brisbane, Australia; see also: http://www7.scu.edu.au/programme/posters/1846/com1846.htm.
Theme: Klassifikationssysteme im Online-Retrieval

Amato, G.; Rabitti, F.; Savino, P.: Multimedia document search on the Web (1998) 0.02

0.019474443 = product of:
  0.06816055 = sum of:
    0.03271481 = weight(_text_:web in 3605) [ClassicSimilarity], result of:
      0.03271481 = score(doc=3605,freq=4.0), product of:
        0.08019538 = queryWeight, product of:
          3.2635105 = idf(docFreq=4597, maxDocs=44218)
          0.024573348 = queryNorm
        0.4079388 = fieldWeight in 3605, product of:
          2.0 = tf(freq=4.0), with freq of:
            4.0 = termFreq=4.0
          3.2635105 = idf(docFreq=4597, maxDocs=44218)
          0.0625 = fieldNorm(doc=3605)
    0.006693451 = weight(_text_:information in 3605) [ClassicSimilarity], result of:
      0.006693451 = score(doc=3605,freq=2.0), product of:
        0.04313797 = queryWeight, product of:
          1.7554779 = idf(docFreq=20772, maxDocs=44218)
          0.024573348 = queryNorm
        0.1551638 = fieldWeight in 3605, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          1.7554779 = idf(docFreq=20772, maxDocs=44218)
          0.0625 = fieldNorm(doc=3605)
    0.01987402 = weight(_text_:retrieval in 3605) [ClassicSimilarity], result of:
      0.01987402 = score(doc=3605,freq=2.0), product of:
        0.07433229 = queryWeight, product of:
          3.024915 = idf(docFreq=5836, maxDocs=44218)
          0.024573348 = queryNorm
        0.26736724 = fieldWeight in 3605, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          3.024915 = idf(docFreq=5836, maxDocs=44218)
          0.0625 = fieldNorm(doc=3605)
    0.0088782655 = product of:
      0.026634796 = sum of:
        0.026634796 = weight(_text_:22 in 3605) [ClassicSimilarity], result of:
          0.026634796 = score(doc=3605,freq=2.0), product of:
            0.08605168 = queryWeight, product of:
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.024573348 = queryNorm
            0.30952093 = fieldWeight in 3605, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.0625 = fieldNorm(doc=3605)
      0.33333334 = coord(1/3)
  0.2857143 = coord(4/14)

Abstract: Presents a multimedia model which describes the various multimedia components, their structure and their relationships with a pre-defined taxonomy of concepts, in order to support the information retrieval process of search engines.
Date: 1. 8.1996 22:08:06
Footnote: Contribution to a special issue devoted to the Proceedings of the 7th International World Wide Web Conference, held 14-18 April 1998, Brisbane, Australia

Park, E.-K.; Ra, D.-Y.; Jang, M.-G.: Techniques for improving web retrieval effectiveness (2005) 0.02

0.019398844 = product of:
  0.09052794 = sum of:
    0.038794994 = weight(_text_:web in 1060) [ClassicSimilarity], result of:
      0.038794994 = score(doc=1060,freq=10.0), product of:
        0.08019538 = queryWeight, product of:
          3.2635105 = idf(docFreq=4597, maxDocs=44218)
          0.024573348 = queryNorm
        0.48375595 = fieldWeight in 1060, product of:
          3.1622777 = tf(freq=10.0), with freq of:
            10.0 = termFreq=10.0
          3.2635105 = idf(docFreq=4597, maxDocs=44218)
          0.046875 = fieldNorm(doc=1060)
    0.012296655 = weight(_text_:information in 1060) [ClassicSimilarity], result of:
      0.012296655 = score(doc=1060,freq=12.0), product of:
        0.04313797 = queryWeight, product of:
          1.7554779 = idf(docFreq=20772, maxDocs=44218)
          0.024573348 = queryNorm
        0.2850541 = fieldWeight in 1060, product of:
          3.4641016 = tf(freq=12.0), with freq of:
            12.0 = termFreq=12.0
          1.7554779 = idf(docFreq=20772, maxDocs=44218)
          0.046875 = fieldNorm(doc=1060)
    0.039436284 = weight(_text_:retrieval in 1060) [ClassicSimilarity], result of:
      0.039436284 = score(doc=1060,freq=14.0), product of:
        0.07433229 = queryWeight, product of:
          3.024915 = idf(docFreq=5836, maxDocs=44218)
          0.024573348 = queryNorm
        0.5305404 = fieldWeight in 1060, product of:
          3.7416575 = tf(freq=14.0), with freq of:
            14.0 = termFreq=14.0
          3.024915 = idf(docFreq=5836, maxDocs=44218)
          0.046875 = fieldNorm(doc=1060)
  0.21428572 = coord(3/14)

Abstract: This paper discusses several schemes for improving retrieval effectiveness that can be used in the named page finding tasks of web information retrieval (Overview of the TREC-2002 web track. In: Proceedings of the Eleventh Text Retrieval Conference TREC-2002, NIST Special Publication #500-251, 2003). These methods were applied on top of the basic information retrieval model as additional mechanisms to upgrade the system. Use of the titles of web pages was found to be effective. It was confirmed that anchor texts of incoming links were beneficial, as suggested in other works. Sentence-query similarity is a new type of information proposed by us and was identified as the best information to take advantage of. Stratifying and re-ranking the retrieval list based on the maximum count of index terms in common between a sentence and a query resulted in a significant improvement in performance. To demonstrate these facts, a large-scale web information retrieval system was developed and used for experimentation.
Source: Information processing and management. 41(2005) no.5, S.1207-1224

Garcés, P.J.; Olivas, J.A.; Romero, F.P.: Concept-matching IR systems versus word-matching information retrieval systems : considering fuzzy interrelations for indexing Web pages (2006) 0.02

0.019359758 = product of:
  0.06775915 = sum of:
    0.03232916 = weight(_text_:web in 5288) [ClassicSimilarity], result of:
      0.03232916 = score(doc=5288,freq=10.0), product of:
        0.08019538 = queryWeight, product of:
          3.2635105 = idf(docFreq=4597, maxDocs=44218)
          0.024573348 = queryNorm
        0.40312994 = fieldWeight in 5288, product of:
          3.1622777 = tf(freq=10.0), with freq of:
            10.0 = termFreq=10.0
          3.2635105 = idf(docFreq=4597, maxDocs=44218)
          0.0390625 = fieldNorm(doc=5288)
    0.008366814 = weight(_text_:information in 5288) [ClassicSimilarity], result of:
      0.008366814 = score(doc=5288,freq=8.0), product of:
        0.04313797 = queryWeight, product of:
          1.7554779 = idf(docFreq=20772, maxDocs=44218)
          0.024573348 = queryNorm
        0.19395474 = fieldWeight in 5288, product of:
          2.828427 = tf(freq=8.0), with freq of:
            8.0 = termFreq=8.0
          1.7554779 = idf(docFreq=20772, maxDocs=44218)
          0.0390625 = fieldNorm(doc=5288)
    0.021514257 = weight(_text_:retrieval in 5288) [ClassicSimilarity], result of:
      0.021514257 = score(doc=5288,freq=6.0), product of:
        0.07433229 = queryWeight, product of:
          3.024915 = idf(docFreq=5836, maxDocs=44218)
          0.024573348 = queryNorm
        0.28943354 = fieldWeight in 5288, product of:
          2.4494898 = tf(freq=6.0), with freq of:
            6.0 = termFreq=6.0
          3.024915 = idf(docFreq=5836, maxDocs=44218)
          0.0390625 = fieldNorm(doc=5288)
    0.0055489163 = product of:
      0.016646748 = sum of:
        0.016646748 = weight(_text_:22 in 5288) [ClassicSimilarity], result of:
          0.016646748 = score(doc=5288,freq=2.0), product of:
            0.08605168 = queryWeight, product of:
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.024573348 = queryNorm
            0.19345059 = fieldWeight in 5288, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.0390625 = fieldNorm(doc=5288)
      0.33333334 = coord(1/3)
  0.2857143 = coord(4/14)
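
The score tree above can be recomputed by hand. The following is a minimal sketch, assuming the standard Lucene ClassicSimilarity formulas (tf = sqrt(freq), idf = 1 + ln(maxDocs / (docFreq + 1)), queryWeight = idf * queryNorm, fieldWeight = tf * idf * fieldNorm). The function and variable names are illustrative; the numeric inputs are copied from the tree for doc 5288.

```python
import math

MAX_DOCS = 44218          # maxDocs from the explain tree
QUERY_NORM = 0.024573348  # queryNorm from the explain tree
FIELD_NORM = 0.0390625    # fieldNorm(doc=5288)


def idf(doc_freq, max_docs=MAX_DOCS):
    """ClassicSimilarity inverse document frequency."""
    return 1.0 + math.log(max_docs / (doc_freq + 1))


def term_score(freq, doc_freq, field_norm=FIELD_NORM, query_norm=QUERY_NORM):
    """Score of one term: queryWeight * fieldWeight."""
    term_idf = idf(doc_freq)
    query_weight = term_idf * query_norm
    field_weight = math.sqrt(freq) * term_idf * field_norm
    return query_weight * field_weight


web = term_score(freq=10.0, doc_freq=4597)
information = term_score(freq=8.0, doc_freq=20772)
retrieval = term_score(freq=6.0, doc_freq=5836)
# The "22" clause sits inside a nested query with its own coord(1/3).
term_22 = term_score(freq=2.0, doc_freq=3622) * (1 / 3)

# The outer coord factor is 4 matching clauses out of 14.
total = (web + information + retrieval + term_22) * (4 / 14)
print(round(total, 5))
```

Lucene computes these values in 32-bit floats, so the result should agree with the 0.019359758 shown in the tree only up to small rounding differences.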

Abstract: This article presents a semantic-based Web retrieval system that is capable of retrieving the Web pages that are conceptually related to the implicit concepts of the query. The notion of a concept is handled from a fuzzy point of view by means of semantic areas. In this context, the proposed system improves on most search engines that are based on matching words. The key to the system is a new version of the Fuzzy Interrelations and Synonymy-Based Concept Representation Model (FIS-CRM), which is used to extract and represent the concepts contained in both the Web pages and the user query. This model, which was integrated into other tools such as the Fuzzy Interrelations and Synonymy based Searcher (FISS) metasearcher and the fz-mail system, considers the fuzzy synonymy and the fuzzy generality interrelations as a means of representing word interrelations (stored in a fuzzy synonymy dictionary and ontologies). The new version of the model, which is based on the study of the co-occurrences of synonyms, integrates a soft method for disambiguating word senses. This method also considers the context of the word to be disambiguated and the thematic ontologies and sets of synonyms stored in the dictionary.
Date: 22. 7.2006 17:14:12
Footnote: Contribution to a Special Topic Section on Soft Approaches to Information Retrieval and Information Access on the Web
Source: Journal of the American Society for Information Science and Technology. 57(2006) no.4, S.564-576
