Search (131 results, page 1 of 7)

  • theme_ss:"Computerlinguistik"
  1. Hotho, A.; Bloehdorn, S.: Data Mining 2004 : Text classification by boosting weak learners based on terms and concepts (2004) 0.28
    0.2845536 = product of:
      0.42683035 = sum of:
        0.05952498 = product of:
          0.17857493 = sum of:
            0.17857493 = weight(_text_:3a in 562) [ClassicSimilarity], result of:
              0.17857493 = score(doc=562,freq=2.0), product of:
                0.3177388 = queryWeight, product of:
                  8.478011 = idf(docFreq=24, maxDocs=44218)
                  0.03747799 = queryNorm
                0.56201804 = fieldWeight in 562, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  8.478011 = idf(docFreq=24, maxDocs=44218)
                  0.046875 = fieldNorm(doc=562)
          0.33333334 = coord(1/3)
        0.17857493 = weight(_text_:2f in 562) [ClassicSimilarity], result of:
          0.17857493 = score(doc=562,freq=2.0), product of:
            0.3177388 = queryWeight, product of:
              8.478011 = idf(docFreq=24, maxDocs=44218)
              0.03747799 = queryNorm
            0.56201804 = fieldWeight in 562, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              8.478011 = idf(docFreq=24, maxDocs=44218)
              0.046875 = fieldNorm(doc=562)
        0.17857493 = weight(_text_:2f in 562) [ClassicSimilarity], result of:
          0.17857493 = score(doc=562,freq=2.0), product of:
            0.3177388 = queryWeight, product of:
              8.478011 = idf(docFreq=24, maxDocs=44218)
              0.03747799 = queryNorm
            0.56201804 = fieldWeight in 562, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              8.478011 = idf(docFreq=24, maxDocs=44218)
              0.046875 = fieldNorm(doc=562)
        0.0101555 = product of:
          0.030466499 = sum of:
            0.030466499 = weight(_text_:22 in 562) [ClassicSimilarity], result of:
              0.030466499 = score(doc=562,freq=2.0), product of:
                0.13124153 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.03747799 = queryNorm
                0.23214069 = fieldWeight in 562, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.046875 = fieldNorm(doc=562)
          0.33333334 = coord(1/3)
      0.6666667 = coord(4/6)
    
    Content
    Cf.: http://www.google.de/url?sa=t&rct=j&q=&esrc=s&source=web&cd=1&cad=rja&ved=0CEAQFjAA&url=http%3A%2F%2Fciteseerx.ist.psu.edu%2Fviewdoc%2Fdownload%3Fdoi%3D10.1.1.91.4940%26rep%3Drep1%26type%3Dpdf&ei=dOXrUMeIDYHDtQahsIGACg&usg=AFQjCNHFWVh6gNPvnOrOS9R3rkrXCNVD-A&sig2=5I2F5evRfMnsttSgFF9g7Q&bvm=bv.1357316858,d.Yms.
    Date
    8. 1.2013 10:22:32
  2. Noever, D.; Ciolino, M.: ¬The Turing deception (2022) 0.21
    0.20833743 = product of:
      0.41667485 = sum of:
        0.05952498 = product of:
          0.17857493 = sum of:
            0.17857493 = weight(_text_:3a in 862) [ClassicSimilarity], result of:
              0.17857493 = score(doc=862,freq=2.0), product of:
                0.3177388 = queryWeight, product of:
                  8.478011 = idf(docFreq=24, maxDocs=44218)
                  0.03747799 = queryNorm
                0.56201804 = fieldWeight in 862, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  8.478011 = idf(docFreq=24, maxDocs=44218)
                  0.046875 = fieldNorm(doc=862)
          0.33333334 = coord(1/3)
        0.17857493 = weight(_text_:2f in 862) [ClassicSimilarity], result of:
          0.17857493 = score(doc=862,freq=2.0), product of:
            0.3177388 = queryWeight, product of:
              8.478011 = idf(docFreq=24, maxDocs=44218)
              0.03747799 = queryNorm
            0.56201804 = fieldWeight in 862, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              8.478011 = idf(docFreq=24, maxDocs=44218)
              0.046875 = fieldNorm(doc=862)
        0.17857493 = weight(_text_:2f in 862) [ClassicSimilarity], result of:
          0.17857493 = score(doc=862,freq=2.0), product of:
            0.3177388 = queryWeight, product of:
              8.478011 = idf(docFreq=24, maxDocs=44218)
              0.03747799 = queryNorm
            0.56201804 = fieldWeight in 862, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              8.478011 = idf(docFreq=24, maxDocs=44218)
              0.046875 = fieldNorm(doc=862)
      0.5 = coord(3/6)
    
    Source
    https%3A%2F%2Farxiv.org%2Fabs%2F2212.06721&usg=AOvVaw3i_9pZm9y_dQWoHi6uv0EN
  3. Huo, W.: Automatic multi-word term extraction and its application to Web-page summarization (2012) 0.18
    0.18365268 = product of:
      0.36730537 = sum of:
        0.17857493 = weight(_text_:2f in 563) [ClassicSimilarity], result of:
          0.17857493 = score(doc=563,freq=2.0), product of:
            0.3177388 = queryWeight, product of:
              8.478011 = idf(docFreq=24, maxDocs=44218)
              0.03747799 = queryNorm
            0.56201804 = fieldWeight in 563, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              8.478011 = idf(docFreq=24, maxDocs=44218)
              0.046875 = fieldNorm(doc=563)
        0.17857493 = weight(_text_:2f in 563) [ClassicSimilarity], result of:
          0.17857493 = score(doc=563,freq=2.0), product of:
            0.3177388 = queryWeight, product of:
              8.478011 = idf(docFreq=24, maxDocs=44218)
              0.03747799 = queryNorm
            0.56201804 = fieldWeight in 563, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              8.478011 = idf(docFreq=24, maxDocs=44218)
              0.046875 = fieldNorm(doc=563)
        0.0101555 = product of:
          0.030466499 = sum of:
            0.030466499 = weight(_text_:22 in 563) [ClassicSimilarity], result of:
              0.030466499 = score(doc=563,freq=2.0), product of:
                0.13124153 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.03747799 = queryNorm
                0.23214069 = fieldWeight in 563, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.046875 = fieldNorm(doc=563)
          0.33333334 = coord(1/3)
      0.5 = coord(3/6)
    
    Content
    A thesis presented to the University of Guelph in partial fulfilment of the requirements for the degree of Master of Science in Computer Science. Cf.: http://www.inf.ufrgs.br%2F~ceramisch%2Fdownload_files%2Fpublications%2F2009%2Fp01.pdf.
    Date
    10. 1.2013 19:22:47
  4. Weiß, E.-M.: ChatGPT soll es richten : Microsoft baut KI in Suchmaschine Bing ein (2023) 0.03
    0.03453535 = product of:
      0.20721208 = sum of:
        0.20721208 = weight(_text_:suchmaschine in 866) [ClassicSimilarity], result of:
          0.20721208 = score(doc=866,freq=10.0), product of:
            0.21191008 = queryWeight, product of:
              5.6542544 = idf(docFreq=420, maxDocs=44218)
              0.03747799 = queryNorm
            0.9778302 = fieldWeight in 866, product of:
              3.1622777 = tf(freq=10.0), with freq of:
                10.0 = termFreq=10.0
              5.6542544 = idf(docFreq=420, maxDocs=44218)
              0.0546875 = fieldNorm(doc=866)
      0.16666667 = coord(1/6)
    
    Abstract
    ChatGPT, the artificial intelligence of the moment, was developed by OpenAI, and OpenAI has received considerable support from Microsoft in the past. Now it is time to reap the benefits: the AI is to be built into the search engine Bing, which means direct competition for Google's search algorithms and AI systems. Bing has not been particularly successful in that arena so far. As "The Information" reports, citing two insiders, Microsoft plans to integrate ChatGPT into its search engine Bing. The new, intelligent search could be available as early as March. Microsoft had previously announced at its in-house conference Ignite the integration of the image generator DALL·E 2 into its search engine, though without a concrete launch date. Asked directly, ChatGPT does not yet confirm its future role, but it is aware of the potential advantages.
    Source
    https://www.heise.de/news/ChatGPT-soll-es-richten-Microsoft-baut-KI-in-Suchmaschine-Bing-ein-7447837.html
  5. Jones, K.: Linguistic searching versus relevance ranking : DR-LINK and TARGET (1999) 0.03
    0.028268103 = product of:
      0.16960861 = sum of:
        0.16960861 = weight(_text_:ranking in 6423) [ClassicSimilarity], result of:
          0.16960861 = score(doc=6423,freq=2.0), product of:
            0.20271951 = queryWeight, product of:
              5.4090285 = idf(docFreq=537, maxDocs=44218)
              0.03747799 = queryNorm
            0.8366664 = fieldWeight in 6423, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              5.4090285 = idf(docFreq=537, maxDocs=44218)
              0.109375 = fieldNorm(doc=6423)
      0.16666667 = coord(1/6)
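
The explain tree above can be recomputed by hand. The following is a minimal sketch, assuming the formulas that the displayed values are consistent with (tf = sqrt(termFreq), idf = 1 + ln(maxDocs / (docFreq + 1)), as in Lucene's ClassicSimilarity). The function name and its parameters are hypothetical and chosen only for illustration; queryNorm, fieldNorm and coord are taken from the listing rather than derived.

```python
import math

def classic_similarity_score(freq, doc_freq, max_docs, query_norm, field_norm, coord):
    """Recompute a single-term score from the values shown in an explain tree."""
    tf = math.sqrt(freq)                              # tf(freq) = sqrt(termFreq)
    idf = 1.0 + math.log(max_docs / (doc_freq + 1))   # idf(docFreq, maxDocs)
    query_weight = idf * query_norm                   # queryWeight = idf * queryNorm
    field_weight = tf * idf * field_norm              # fieldWeight = tf * idf * fieldNorm
    return query_weight * field_weight * coord        # final score after coord factor

# Values copied from the tree for doc 6423, term "ranking" (assumed inputs).
score = classic_similarity_score(
    freq=2.0, doc_freq=537, max_docs=44218,
    query_norm=0.03747799, field_norm=0.109375, coord=1 / 6,
)
print(round(score, 4))  # 0.0283
```

The result agrees with the listed 0.028268103 up to single-precision rounding, which suggests the tree is a straightforward product of the listed factors.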
    
  6. Winterschladen, S.; Gurevych, I.: ¬Die perfekte Suchmaschine : Forschungsgruppe entwickelt ein System, das artverwandte Begriffe finden soll (2006) 0.03
    0.025612097 = product of:
      0.15367258 = sum of:
        0.15367258 = weight(_text_:suchmaschine in 5912) [ClassicSimilarity], result of:
          0.15367258 = score(doc=5912,freq=22.0), product of:
            0.21191008 = queryWeight, product of:
              5.6542544 = idf(docFreq=420, maxDocs=44218)
              0.03747799 = queryNorm
            0.72517824 = fieldWeight in 5912, product of:
              4.690416 = tf(freq=22.0), with freq of:
                22.0 = termFreq=22.0
              5.6542544 = idf(docFreq=420, maxDocs=44218)
              0.02734375 = fieldNorm(doc=5912)
      0.16666667 = coord(1/6)
    
    Content
    "KÖLNER STADT-ANZEIGER: Ms. Gurevych, you are developing a next-generation search engine? What should we picture? IRYNA GUREVYCH: Everyone knows conventional search engines such as Google, Yahoo or Altavista. But they are not perfect, because they work purely on the principle of character matching. Conventional search engines cannot satisfy the growing need for information. KStA: Why not? GUREVYCH: Take a concrete example: you search Google for a cake recipe that should not contain any fruit. No search engine in the world can so far handle such queries sensibly. Usually thousands of results come back, and the user has to look for the relevant information like a needle in a haystack. KStA: And you can solve this problem? GUREVYCH: We are developing a search engine that does not rely solely on character matching but also uses linguistic features. Our search engine should therefore also show related terms. KStA: How far along is your research? GUREVYCH: The project is scheduled to run for two years. We started half a year ago, so most of it still lies ahead of us. Even so, the first interim results are already remarkable. KStA: And when will the search engine go online? GUREVYCH: Since this is a project of the German Research Foundation (Deutsche Forschungsgemeinschaft), the search engine will not be released for the time being. We see it as our task to demonstrate, through our research, the improvements that smart search algorithms make possible and to eliminate the errors of the well-known search engines. And we are well on our way. KStA: Are you also working on a very specific project?
GUREVYCH: Yes, the new technology has to pass its first test in a field that seems unusual at first glance: our research group at the Technische Universität Darmstadt is currently developing a novel system to support young people in choosing a career. For this, the Federal Employment Agency (Bundesagentur für Arbeit) is providing us with descriptions of 5,800 occupations in Germany. KStA: And what are you supposed to do with this specific information? GUREVYCH: Young people are to feed our search engine an essay about their career preferences. The system will then run a search query and pick out possible occupations based on the young person's interests. The personal counselling offered by the Federal Employment Agency can thereby be extended to alternative options. A first prototype is due to be ready at the end of the year. KStA: So the point is initially not to find a job for the young person, but to determine the perfect occupation for them? GUREVYCH: Yes, based on the young person's description, the search engine runs a semantic query and picks out the matching occupation. KStA: Have there already been further inquiries from industry? GUREVYCH: No, we have not done any advertising so far. My experience shows that reputable conferences are the best platform for presenting results and attracting attention. A few first publications are already on their way and will appear in 2006. KStA: What do you think the search engine of the future will look like? GUREVYCH: Search engines are becoming ever more specialised. This means there will be dedicated search engines in medicine, at health insurers or in sport, for example. In addition, the trend will increasingly be towards linguistic search engines that look for related terms. The perfect search engine will probably be a combination of statistical and linguistic-semantic search behaviour.
Algorithms that we are developing in the Telecooperation group at TU Darmstadt will be of the greatest importance for the next qualitative leap in the development of search engines."
  7. Luo, L.; Ju, J.; Li, Y.-F.; Haffari, G.; Xiong, B.; Pan, S.: ChatRule: mining logical rules with large language models for knowledge graph reasoning (2023) 0.02
    0.023012474 = product of:
      0.06903742 = sum of:
        0.0605745 = weight(_text_:ranking in 1171) [ClassicSimilarity], result of:
          0.0605745 = score(doc=1171,freq=2.0), product of:
            0.20271951 = queryWeight, product of:
              5.4090285 = idf(docFreq=537, maxDocs=44218)
              0.03747799 = queryNorm
            0.29880944 = fieldWeight in 1171, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              5.4090285 = idf(docFreq=537, maxDocs=44218)
              0.0390625 = fieldNorm(doc=1171)
        0.008462917 = product of:
          0.025388751 = sum of:
            0.025388751 = weight(_text_:22 in 1171) [ClassicSimilarity], result of:
              0.025388751 = score(doc=1171,freq=2.0), product of:
                0.13124153 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.03747799 = queryNorm
                0.19345059 = fieldWeight in 1171, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=1171)
          0.33333334 = coord(1/3)
      0.33333334 = coord(2/6)
    
    Abstract
    Logical rules are essential for uncovering the logical connections between relations, which could improve the reasoning performance and provide interpretable results on knowledge graphs (KGs). Although there have been many efforts to mine meaningful logical rules over KGs, existing methods suffer from the computationally intensive searches over the rule space and a lack of scalability for large-scale KGs. Besides, they often ignore the semantics of relations which is crucial for uncovering logical connections. Recently, large language models (LLMs) have shown impressive performance in the field of natural language processing and various applications, owing to their emergent ability and generalizability. In this paper, we propose a novel framework, ChatRule, unleashing the power of large language models for mining logical rules over knowledge graphs. Specifically, the framework is initiated with an LLM-based rule generator, leveraging both the semantic and structural information of KGs to prompt LLMs to generate logical rules. To refine the generated rules, a rule ranking module estimates the rule quality by incorporating facts from existing KGs. Last, a rule validator harnesses the reasoning ability of LLMs to validate the logical correctness of ranked rules through chain-of-thought reasoning. ChatRule is evaluated on four large-scale KGs, w.r.t. different rule quality metrics and downstream tasks, showing the effectiveness and scalability of our method.
    Date
    23.11.2023 19:07:22
  8. Lobo, S.: ¬Das Ende von Google, wie wir es kannten : Bessere Treffer durch ChatGPT (2022) 0.02
    0.022063822 = product of:
      0.13238293 = sum of:
        0.13238293 = weight(_text_:suchmaschine in 852) [ClassicSimilarity], result of:
          0.13238293 = score(doc=852,freq=2.0), product of:
            0.21191008 = queryWeight, product of:
              5.6542544 = idf(docFreq=420, maxDocs=44218)
              0.03747799 = queryNorm
            0.62471277 = fieldWeight in 852, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              5.6542544 = idf(docFreq=420, maxDocs=44218)
              0.078125 = fieldNorm(doc=852)
      0.16666667 = coord(1/6)
    
    Abstract
    Highest alert level at the world's largest search engine: with ChatGPT and artificial intelligence, a new era could begin.
  9. Nhongkai, S.N.; Bentz, H.-J.: Bilinguale Suche mittels Konzeptnetzen (2006) 0.02
    0.017651059 = product of:
      0.105906345 = sum of:
        0.105906345 = weight(_text_:suchmaschine in 3914) [ClassicSimilarity], result of:
          0.105906345 = score(doc=3914,freq=2.0), product of:
            0.21191008 = queryWeight, product of:
              5.6542544 = idf(docFreq=420, maxDocs=44218)
              0.03747799 = queryNorm
            0.4997702 = fieldWeight in 3914, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              5.6542544 = idf(docFreq=420, maxDocs=44218)
              0.0625 = fieldNorm(doc=3914)
      0.16666667 = coord(1/6)
    
    Abstract
    A new method of full-text search in bilingual text collections is presented and tested on a parallel text corpus (English-German). The bridge is provided by matching word clusters derived from a co-occurrence analysis and supplied by the novel search engine SENTRAX (Essence Extractor Engine). These clusters represent concepts that occur in both text collections. The hypothesis is that retrieval by means of such structural comparisons can succeed.
  10. Lee, J.H.; Kim, M.H.; Lee, Y.J.: Information retrieval based on conceptual distance in is-a hierarchies (1993) 0.01
    0.014134051 = product of:
      0.084804304 = sum of:
        0.084804304 = weight(_text_:ranking in 6729) [ClassicSimilarity], result of:
          0.084804304 = score(doc=6729,freq=2.0), product of:
            0.20271951 = queryWeight, product of:
              5.4090285 = idf(docFreq=537, maxDocs=44218)
              0.03747799 = queryNorm
            0.4183332 = fieldWeight in 6729, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              5.4090285 = idf(docFreq=537, maxDocs=44218)
              0.0546875 = fieldNorm(doc=6729)
      0.16666667 = coord(1/6)
    
    Abstract
    Several document ranking methods have been proposed to calculate the conceptual distance or closeness between a Boolean query and a document. Though they provide good retrieval effectiveness in many cases, they do not support effective weighting schemes for queries and documents and also have several problems resulting from inappropriate evaluation of Boolean operators. We propose a new method called Knowledge-Based Extended Boolean Model (KB-EBM) in which Salton's extended Boolean model is incorporated. KB-EBM evaluates weighted queries and documents effectively, and avoids the problems of the previous methods. KB-EBM provides high quality document rankings by using term dependence information from is-a hierarchies. The performance experiments show that the proposed method closely simulates human behaviour.
  11. Brenner, E.H.: Beyond Boolean : new approaches in information retrieval; the quest for intuitive online search systems past, present & future (1995) 0.01
    0.014134051 = product of:
      0.084804304 = sum of:
        0.084804304 = weight(_text_:ranking in 2547) [ClassicSimilarity], result of:
          0.084804304 = score(doc=2547,freq=2.0), product of:
            0.20271951 = queryWeight, product of:
              5.4090285 = idf(docFreq=537, maxDocs=44218)
              0.03747799 = queryNorm
            0.4183332 = fieldWeight in 2547, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              5.4090285 = idf(docFreq=537, maxDocs=44218)
              0.0546875 = fieldNorm(doc=2547)
      0.16666667 = coord(1/6)
    
    Abstract
    The challenge of effectively bringing specific, relevant information from the global sea of data to our fingertips has become increasingly difficult. Discusses how the online information industry, founded on Boolean search systems, may be evolving to take advantage of other methods, such as 'term weighting', 'relevance ranking' and 'query by example'.
  12. Bedathur, S.; Narang, A.: Mind your language : effects of spoken query formulation on retrieval effectiveness (2013) 0.01
    0.014134051 = product of:
      0.084804304 = sum of:
        0.084804304 = weight(_text_:ranking in 1150) [ClassicSimilarity], result of:
          0.084804304 = score(doc=1150,freq=2.0), product of:
            0.20271951 = queryWeight, product of:
              5.4090285 = idf(docFreq=537, maxDocs=44218)
              0.03747799 = queryNorm
            0.4183332 = fieldWeight in 1150, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              5.4090285 = idf(docFreq=537, maxDocs=44218)
              0.0546875 = fieldNorm(doc=1150)
      0.16666667 = coord(1/6)
    
    Abstract
    Voice search is becoming a popular mode for interacting with search engines. As a result, research has gone into building better voice transcription engines, interfaces, and search engines that better handle the inherent verbosity of queries. However, when one considers its use by non-native speakers of English, another aspect that becomes important is the formulation of the query by users. In this paper, we present the results of a preliminary study that we conducted with non-native English speakers who formulate queries for given retrieval tasks. Our results show that the current search engines are sensitive in their rankings to the query formulation, and thus highlight the need for developing more robust ranking methods.
  13. dpa: 14 Forscher mit viel Geld angelockt : Wolfgang-Paul-Preis (2001) 0.01
    0.013238294 = product of:
      0.07942976 = sum of:
        0.07942976 = weight(_text_:suchmaschine in 6814) [ClassicSimilarity], result of:
          0.07942976 = score(doc=6814,freq=2.0), product of:
            0.21191008 = queryWeight, product of:
              5.6542544 = idf(docFreq=420, maxDocs=44218)
              0.03747799 = queryNorm
            0.37482765 = fieldWeight in 6814, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              5.6542544 = idf(docFreq=420, maxDocs=44218)
              0.046875 = fieldNorm(doc=6814)
      0.16666667 = coord(1/6)
    
    Content
    Includes: "The linguist Christiane Fellbaum (dpa photo) will use her prize money for the "Digitales Wörterbuch der Deutschen Sprache des 20. Jahrhunderts" (Digital Dictionary of the German Language of the 20th Century), to be compiled at the Berlin-Brandenburg Academy of Sciences. She applies her computer where conventional dictionaries can no longer keep up. At the push of a button she produces the word combinations that make a language so rich in images and ideas, and thus unique. Her electronic lexicon of more than 500 million words is later to be accessible as a database. Its basis is the German language of the past hundred years: a representative cross-section compiled from literature, newspaper German, technical-book language, advertising copy and transcribed colloquial speech. Where a dictionary today presents a word with only synonyms or a few possible uses, the researcher spans a vast web of word combinations. In Christiane Fellbaum's system, for example, the entry is not just "verlieren" (to lose) but also "den Faden verlieren" (to lose the thread) or "die Geduld verlieren" (to lose patience), along with all other possible combinations that the computer finds in its stored texts like a search engine."
  14. Yang, Y.; Wilbur, J.: Using corpus statistics to remove redundant words in text categorization (1996) 0.01
    0.0121149 = product of:
      0.0726894 = sum of:
        0.0726894 = weight(_text_:ranking in 4199) [ClassicSimilarity], result of:
          0.0726894 = score(doc=4199,freq=2.0), product of:
            0.20271951 = queryWeight, product of:
              5.4090285 = idf(docFreq=537, maxDocs=44218)
              0.03747799 = queryNorm
            0.35857132 = fieldWeight in 4199, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              5.4090285 = idf(docFreq=537, maxDocs=44218)
              0.046875 = fieldNorm(doc=4199)
      0.16666667 = coord(1/6)
    
    Abstract
    This article studies aggressive word removal in text categorization to reduce the noise in free texts and to enhance the computational efficiency of categorization. We use a novel stop word identification method to automatically generate domain specific stoplists which are much larger than a conventional domain-independent stoplist. In our tests with 3 categorization methods on text collections from different domains/applications, significant numbers of words were removed without sacrificing categorization effectiveness. In the test of the Expert Network method on CACM documents, for example, an 87% removal of unique words reduced the vocabulary of documents from 8,002 distinct words to 1,045 words, which resulted in a 63% time savings and a 74% memory savings in the computation of category ranking, with a 10% precision improvement on average over not using word removal. It is evident in this study that automated word removal based on corpus statistics has a practical and significant impact on the computational tractability of categorization methods in large databases.
  15. Radev, D.; Fan, W.; Qu, H.; Wu, H.; Grewal, A.: Probabilistic question answering on the Web (2005) 0.01
    0.0121149 = product of:
      0.0726894 = sum of:
        0.0726894 = weight(_text_:ranking in 3455) [ClassicSimilarity], result of:
          0.0726894 = score(doc=3455,freq=2.0), product of:
            0.20271951 = queryWeight, product of:
              5.4090285 = idf(docFreq=537, maxDocs=44218)
              0.03747799 = queryNorm
            0.35857132 = fieldWeight in 3455, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              5.4090285 = idf(docFreq=537, maxDocs=44218)
              0.046875 = fieldNorm(doc=3455)
      0.16666667 = coord(1/6)
    
    Abstract
    Web-based search engines such as Google and NorthernLight return documents that are relevant to a user query, not answers to user questions. We have developed an architecture that augments existing search engines so that they support natural language question answering. The process entails five steps: query modulation, document retrieval, passage extraction, phrase extraction, and answer ranking. In this article, we describe some probabilistic approaches to the last three of these stages. We show how our techniques apply to a number of existing search engines, and we also present results contrasting three different methods for question answering. Our algorithm, probabilistic phrase reranking (PPR), uses proximity and question type features and achieves a total reciprocal document rank of .20 on the TREC8 corpus. Our techniques have been implemented as a Web-accessible system, called NSIR.
  16. Strötgen, R.; Mandl, T.; Schneider, R.: Entwicklung und Evaluierung eines Question Answering Systems im Rahmen des Cross Language Evaluation Forum (CLEF) (2006) 0.01
    0.0121149 = product of:
      0.0726894 = sum of:
        0.0726894 = weight(_text_:ranking in 5981) [ClassicSimilarity], result of:
          0.0726894 = score(doc=5981,freq=2.0), product of:
            0.20271951 = queryWeight, product of:
              5.4090285 = idf(docFreq=537, maxDocs=44218)
              0.03747799 = queryNorm
            0.35857132 = fieldWeight in 5981, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              5.4090285 = idf(docFreq=537, maxDocs=44218)
              0.046875 = fieldNorm(doc=5981)
      0.16666667 = coord(1/6)
    
    Abstract
    Question answering systems attempt to deliver a correct answer to specific questions. To do so, they search a document collection and extract a fragment of a document. This paper describes the development of a modular system for multilingual question answering. The development strategy aimed at making a modular system usable as quickly as possible, drawing on many freely available resources. The system integrates modules for named-entity recognition, indexing and retrieval, electronic dictionaries, online translation tools, and text corpora for training and testing purposes, and it implements its own approaches to question and answer taxonomies, passage retrieval, and the ranking of alternative answers.
  17. Vechtomova, O.; Karamuftuoglum, M.; Robertson, S.E.: On document relevance and lexical cohesion between query terms (2006) 0.01
    0.0121149 = product of:
      0.0726894 = sum of:
        0.0726894 = weight(_text_:ranking in 987) [ClassicSimilarity], result of:
          0.0726894 = score(doc=987,freq=2.0), product of:
            0.20271951 = queryWeight, product of:
              5.4090285 = idf(docFreq=537, maxDocs=44218)
              0.03747799 = queryNorm
            0.35857132 = fieldWeight in 987, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              5.4090285 = idf(docFreq=537, maxDocs=44218)
              0.046875 = fieldNorm(doc=987)
      0.16666667 = coord(1/6)
    
    Abstract
    Lexical cohesion is a property of text, achieved through lexical-semantic relations between words in text. Most information retrieval systems make use of lexical relations in text only to a limited extent. In this paper we empirically investigate whether the degree of lexical cohesion between the contexts of query terms' occurrences in a document is related to its relevance to the query. Lexical cohesion between distinct query terms in a document is estimated on the basis of the lexical-semantic relations (repetition, synonymy, hyponymy and sibling) that exist between their collocates - words that co-occur with them in the same windows of text. Experiments suggest that significant differences exist between the lexical cohesion in relevant and non-relevant document sets. A document ranking method based on lexical cohesion shows some performance improvements.
  18. Vechtomova, O.: ¬A method for automatic extraction of multiword units representing business aspects from user reviews (2014) 0.01
    0.0121149 = product of:
      0.0726894 = sum of:
        0.0726894 = weight(_text_:ranking in 1304) [ClassicSimilarity], result of:
          0.0726894 = score(doc=1304,freq=2.0), product of:
            0.20271951 = queryWeight, product of:
              5.4090285 = idf(docFreq=537, maxDocs=44218)
              0.03747799 = queryNorm
            0.35857132 = fieldWeight in 1304, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              5.4090285 = idf(docFreq=537, maxDocs=44218)
              0.046875 = fieldNorm(doc=1304)
      0.16666667 = coord(1/6)
    
    Abstract
    The article describes a semi-supervised approach to extracting multiword aspects of user-written reviews that belong to a given category. The method starts with a small set of seed words, representing the target category, and calculates distributional similarity between the candidate and seed words. We compare 3 distributional similarity measures (Lin's, Weeds's, and balAPinc), and a document retrieval function, BM25, adapted as a word similarity measure. We then introduce a method for identifying multiword aspects by using a combination of syntactic rules and a co-occurrence association measure. Finally, we describe a method for ranking multiword aspects by the likelihood of belonging to the target aspect category. The task used for evaluation is extraction of restaurant dish names from a corpus of restaurant reviews.
  19. Ferber, R.; Wettler, M.; Rapp, R.: ¬An associative model of word selection in the generation of search queries (1995) 0.01
    0.010095751 = product of:
      0.0605745 = sum of:
        0.0605745 = weight(_text_:ranking in 3177) [ClassicSimilarity], result of:
          0.0605745 = score(doc=3177,freq=2.0), product of:
            0.20271951 = queryWeight, product of:
              5.4090285 = idf(docFreq=537, maxDocs=44218)
              0.03747799 = queryNorm
            0.29880944 = fieldWeight in 3177, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              5.4090285 = idf(docFreq=537, maxDocs=44218)
              0.0390625 = fieldNorm(doc=3177)
      0.16666667 = coord(1/6)
    
    Abstract
    To generate a search query based on an end user request, a database searcher has to select appropriate search terms. These terms can either be taken from the request, or they can be added by the searcher. This selection process is simulated by an associative lexical net; the nodes of the net are the terms used in 94 records of written requests to a psychological information agency and the respective online searches. The weights connecting the nodes are calculated from the co-occurrences of these terms in the abstracts of the database PsycLit. To simulate the term selection process of a query, the nodes of all terms used in the written requests are activated, and 1 or more spreading activation cycles are performed. The result of the simulation is a ranking of the terms according to the activities of their nodes. Simulations for all 94 records show a low mean activity rank for the terms selected from the request; the mean activity rank for new terms added by the searcher is lower than the mean activity rank for those terms of the request that were not used in the query.
  20. Sankarasubramaniam, Y.; Ramanathan, K.; Ghosh, S.: Text summarization using Wikipedia (2014) 0.01
    0.010095751 = product of:
      0.0605745 = sum of:
        0.0605745 = weight(_text_:ranking in 2693) [ClassicSimilarity], result of:
          0.0605745 = score(doc=2693,freq=2.0), product of:
            0.20271951 = queryWeight, product of:
              5.4090285 = idf(docFreq=537, maxDocs=44218)
              0.03747799 = queryNorm
            0.29880944 = fieldWeight in 2693, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              5.4090285 = idf(docFreq=537, maxDocs=44218)
              0.0390625 = fieldNorm(doc=2693)
      0.16666667 = coord(1/6)
    
    Abstract
    Automatic text summarization has been an active field of research for many years. Several approaches have been proposed, ranging from simple position and word-frequency methods to learning and graph-based algorithms. The advent of human-generated knowledge bases like Wikipedia offers a further possibility in text summarization - they can be used to understand the input text in terms of salient concepts from the knowledge base. In this paper, we study a novel approach that leverages Wikipedia in conjunction with graph-based ranking. Our approach is to first construct a bipartite sentence-concept graph, and then rank the input sentences using iterative updates on this graph. We consider several models for the bipartite graph, and derive convergence properties under each model. Then, we take up personalized and query-focused summarization, where the sentence ranks additionally depend on user interests and queries, respectively. Finally, we present a Wikipedia-based multi-document summarization algorithm. An important feature of the proposed algorithms is that they enable real-time incremental summarization - users can first view an initial summary, and then request additional content if interested. We evaluate the performance of our proposed summarizer using the ROUGE metric, and the results show that leveraging Wikipedia can significantly improve summary quality. We also present results from a user study, which suggests that using incremental summarization can help in better understanding news articles.
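
The score trees in the entries above all follow the same pattern, so the arithmetic can be checked by hand. The sketch below recomputes the values shown for entry 17 (doc=987) from the numbers in its tree. It assumes the usual ClassicSimilarity formulas, tf = sqrt(freq) and idf = 1 + ln(maxDocs / (docFreq + 1)), which agree with the figures printed here. The function names are hypothetical and are not part of Lucene's API.

```python
import math


def classic_idf(doc_freq, max_docs):
    # Assumed ClassicSimilarity idf: 1 + ln(maxDocs / (docFreq + 1)).
    return 1.0 + math.log(max_docs / (doc_freq + 1))


def classic_term_score(freq, doc_freq, max_docs, query_norm, field_norm):
    # Hypothetical helper mirroring one "weight(...)" node of the explain tree.
    tf = math.sqrt(freq)                    # tf(freq=2.0) -> 1.4142135
    idf = classic_idf(doc_freq, max_docs)   # idf(docFreq=537, maxDocs=44218)
    query_weight = idf * query_norm         # "queryWeight" line
    field_weight = tf * idf * field_norm    # "fieldWeight" line
    return query_weight * field_weight


def apply_coord(term_score_sum, matched, total):
    # coord(matched/total) scales the sum by the share of query clauses matched.
    return term_score_sum * (matched / total)


# Values copied from the tree of entry 17 (term "ranking", doc=987).
term = classic_term_score(
    freq=2.0, doc_freq=537, max_docs=44218,
    query_norm=0.03747799, field_norm=0.046875,
)
final = apply_coord(term, matched=1, total=6)

print(f"term score:  {term:.7f}")
print(f"final score: {final:.7f}")
```

Entries 19 and 20 differ from entries 17 and 18 only in fieldNorm (0.0390625 instead of 0.046875), which is why their final scores are lower even though the term frequency and idf are identical.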

Languages

  • e 89
  • d 37
  • ru 2
  • el 1

Types

  • a 109
  • el 15
  • m 10
  • s 6
  • x 3
  • p 2
  • d 1