Search (91 results, page 1 of 5)

Sprachtechnologie : ein Überblick (2012) 0.01
```
0.012843607 = product of:
  0.051374428 = sum of:
    0.051374428 = weight(_text_:digitale in 1750) [ClassicSimilarity], result of:
      0.051374428 = score(doc=1750,freq=2.0), product of:
        0.18027179 = queryWeight, product of:
          5.158747 = idf(docFreq=690, maxDocs=44218)
          0.034944877 = queryNorm
        0.2849832 = fieldWeight in 1750, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          5.158747 = idf(docFreq=690, maxDocs=44218)
          0.0390625 = fieldNorm(doc=1750)
  0.25 = coord(1/4)
```
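The arithmetic in the score tree above can be checked by hand. The following is a minimal sketch, assuming the standard Lucene ClassicSimilarity definitions (tf = sqrt(freq), idf = 1 + ln(maxDocs / (docFreq + 1))); the function name `classic_similarity_score` is hypothetical, and `queryNorm`, `fieldNorm`, and `coord` are copied from the tree rather than derived. Lucene computes in single precision, so the last digits may differ slightly.

```python
import math

def classic_similarity_score(freq, doc_freq, max_docs, query_norm, field_norm, coord):
    """Recompute a single-term ClassicSimilarity score from explain-tree values.

    Hypothetical helper for illustration; not part of Lucene's API.
    """
    tf = math.sqrt(freq)                              # tf(freq=2.0) -> 1.4142135
    idf = 1.0 + math.log(max_docs / (doc_freq + 1))   # idf(docFreq=690, maxDocs=44218)
    query_weight = idf * query_norm                   # queryWeight
    field_weight = tf * idf * field_norm              # fieldWeight
    # coord(1/4): one of four query clauses matched this document
    return coord * query_weight * field_weight

# Values taken from the explain tree for doc 1750, term "digitale"
score = classic_similarity_score(
    freq=2.0, doc_freq=690, max_docs=44218,
    query_norm=0.034944877, field_norm=0.0390625, coord=0.25,
)
print(f"{score:.9f}")
```

Only `freq`, `docFreq`, and `fieldNorm` vary between records that match the same term, which is why results with equal term frequency and field norm share the same score.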
Abstract

For more than half a century there have been serious, credible attempts to process human language by machine. Machine translation and "natural" dialogues with computers were among the first ideas that staked out the field of what later became computational linguistics or language technology and guided its projects. Today this field, also called natural language processing (NLP), is highly diversified: thanks to the rapid development of computer science, much that was previously unimaginable has become reality (e.g. automated telephone directory assistance), and some things that were once impossible have at least become possible (e.g. handhelds with speech input and output as digital personal (information) assistants). There are various applications of computational linguistics, some of which have made the leap into commercial use (e.g. dictation systems, text classification, machine translation). Natural language systems (NLS) of widely varying functionality (e.g. for answering arbitrary questions or generating complex texts) are still the subject of intensive research, even though the ambitious goals of the past are far from achieved (and have been scaled back accordingly). Where natural language processing stands today, however, is neither obvious nor easy to find out, given the wide range of activities in computational linguistics and language technology (for students of the subject, and even more so for laypeople). One aim of this book is to improve the current state of the literature in this respect by bringing together specifically system-related aspects of computational linguistics as an overview of language technology.
Rötzer, F.: KI-Programm besser als Menschen im Verständnis natürlicher Sprache (2018) 0.01
```
0.010274886 = product of:
  0.041099545 = sum of:
    0.041099545 = weight(_text_:digitale in 4217) [ClassicSimilarity], result of:
      0.041099545 = score(doc=4217,freq=2.0), product of:
        0.18027179 = queryWeight, product of:
          5.158747 = idf(docFreq=690, maxDocs=44218)
          0.034944877 = queryNorm
        0.22798656 = fieldWeight in 4217, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          5.158747 = idf(docFreq=690, maxDocs=44218)
          0.03125 = fieldNorm(doc=4217)
  0.25 = coord(1/4)
```
Abstract

Things now seem to be getting serious. An AI program developed by China's Alibaba Group has for the first time beaten humans at answering questions and understanding text. The Chinese government wants to make the country a leader in the development of artificial intelligence and has drawn up a national strategy to that end. As part of it, the Ministry of Science and Technology named the internet companies Baidu, Alibaba, and Tencent, together with iFlyTek, as the first national team for developing next-generation AI technology. Baidu is responsible for developing autonomous vehicles, Alibaba for developing clouds for "city brains" (smart cities are meant to adapt to their inhabitants and their surroundings), Tencent for developing computer vision for medical applications, and iFlyTek for "voice intelligence". The four companies are to build open platforms that other firms and start-ups can also use. In addition, a technology park for AI development is being built near Beijing for one billion US dollars. Naturally, this is not only about civilian applications but also about military ones. The USA still has more AI companies, but China is already in second place. The Pentagon is worried. China is evidently making rapid progress. At the end of 2017 the AI company iFlyTek, which initially specialized in voice recognition and digital assistants, presented a robot that had passed the written test of the national medical examination. The robot had not only been fed an immense amount of knowledge from 53 medical textbooks, 2 million medical records, and 400,000 medical texts and reports; it is also said to have absorbed clinical experience and case diagnoses from medical experts.
Because China has a shortage of doctors, especially in rural areas, the robot is to be used as an assistant that produces an initial diagnosis by automatically evaluating patient data and otherwise supports doctors with suggestions.
Engerer, V.: Exploring interdisciplinary relationships between linguistics and information retrieval from the 1960s to today (2017) 0.01
```
0.005643786 = product of:
  0.022575144 = sum of:
    0.022575144 = weight(_text_:information in 3434) [ClassicSimilarity], result of:
      0.022575144 = score(doc=3434,freq=20.0), product of:
        0.06134496 = queryWeight, product of:
          1.7554779 = idf(docFreq=20772, maxDocs=44218)
          0.034944877 = queryNorm
        0.36800325 = fieldWeight in 3434, product of:
          4.472136 = tf(freq=20.0), with freq of:
            20.0 = termFreq=20.0
          1.7554779 = idf(docFreq=20772, maxDocs=44218)
          0.046875 = fieldNorm(doc=3434)
  0.25 = coord(1/4)
```
Abstract

This article explores how linguistics has influenced information retrieval (IR) and attempts to explain the impact of linguistics through an analysis of internal developments in information science generally, and IR in particular. It notes that information science/IR has been evolving from a case science into a fully fledged, "disciplined"/disciplinary science. The article establishes correspondences between linguistics and information science/IR using the three established IR paradigms (physical, cognitive, and computational) as a frame of reference. The current relationship between information science/IR and linguistics is elucidated through discussion of some recent information science publications dealing with linguistic topics, and a novel technique, "keyword collocation analysis," is introduced. Insights from interdisciplinarity research and case theory are also discussed. It is demonstrated that the three stages of interdisciplinarity, namely multidisciplinarity, interdisciplinarity (in the narrow sense), and transdisciplinarity, can be linked to different phases of the information science/IR-linguistics relationship and connected to different ways of using linguistic theory in information science and IR.

Source

Journal of the Association for Information Science and Technology. 68(2017) no.3, S.660-680
Ko, Y.: A new term-weighting scheme for text classification using the odds of positive and negative class probabilities (2015) 0.01
```
0.0050479556 = product of:
  0.020191822 = sum of:
    0.020191822 = weight(_text_:information in 2339) [ClassicSimilarity], result of:
      0.020191822 = score(doc=2339,freq=16.0), product of:
        0.06134496 = queryWeight, product of:
          1.7554779 = idf(docFreq=20772, maxDocs=44218)
          0.034944877 = queryNorm
        0.3291521 = fieldWeight in 2339, product of:
          4.0 = tf(freq=16.0), with freq of:
            16.0 = termFreq=16.0
          1.7554779 = idf(docFreq=20772, maxDocs=44218)
          0.046875 = fieldNorm(doc=2339)
  0.25 = coord(1/4)
```
Abstract

Text classification (TC) is a core technique for text mining and information retrieval. It has been applied to many applications in many different research and industrial areas. Term-weighting schemes assign an appropriate weight to each term to obtain a high TC performance. Although term weighting is one of the important modules for TC and TC has different peculiarities from those in information retrieval, many term-weighting schemes used in information retrieval, such as term frequency-inverse document frequency (tf-idf), have been used in TC in the same manner. The peculiarity of TC that differs most from information retrieval is the existence of class information. This article proposes a new term-weighting scheme that uses class information using positive and negative class distributions. As a result, the proposed scheme, log tf-TRR, consistently performs better than do other schemes using class information as well as traditional schemes such as tf-idf.

Source

Journal of the Association for Information Science and Technology. 66(2015) no.12, S.2553-2565
Multi-source, multilingual information extraction and summarization (2013) 0.00
```
0.0044618044 = product of:
  0.017847218 = sum of:
    0.017847218 = weight(_text_:information in 978) [ClassicSimilarity], result of:
      0.017847218 = score(doc=978,freq=18.0), product of:
        0.06134496 = queryWeight, product of:
          1.7554779 = idf(docFreq=20772, maxDocs=44218)
          0.034944877 = queryNorm
        0.2909321 = fieldWeight in 978, product of:
          4.2426405 = tf(freq=18.0), with freq of:
            18.0 = termFreq=18.0
          1.7554779 = idf(docFreq=20772, maxDocs=44218)
          0.0390625 = fieldNorm(doc=978)
  0.25 = coord(1/4)
```
Abstract

Information extraction (IE) and text summarization (TS) are powerful technologies for finding relevant pieces of information in text and presenting them to the user in condensed form. The ongoing information explosion makes IE and TS critical for successful functioning within the information society. These technologies face particular challenges due to the inherent multi-source nature of the information explosion. The technologies must now handle not isolated texts or individual narratives, but rather large-scale repositories and streams (in general, in multiple languages) containing a multiplicity of perspectives, opinions, or commentaries on particular topics, entities or events. There is thus a need to adapt existing techniques and develop new ones to deal with these challenges. This volume contains a selection of papers that present a variety of methodologies for content identification and extraction, as well as for content fusion and regeneration. The chapters cover various aspects of the challenges, depending on the nature of the information sought (names vs. events) and the nature of the sources (news streams vs. image captions vs. scientific research papers, etc.). This volume aims to offer a broad and representative sample of studies from this very active research field.

RSWK

Natürlichsprachiges System / Information Extraction / Automatische Inhaltsanalyse / Zusammenfassung / Aufsatzsammlung

Subject

Natürlichsprachiges System / Information Extraction / Automatische Inhaltsanalyse / Zusammenfassung / Aufsatzsammlung

Schmolz, H.: Anaphora resolution and text retrieval : a linguistic analysis of hypertexts (2015) 0.00

```
0.0042066295 = product of:
  0.016826518 = sum of:
    0.016826518 = weight(_text_:information in 1172) [ClassicSimilarity], result of:
      0.016826518 = score(doc=1172,freq=4.0), product of:
        0.06134496 = queryWeight, product of:
          1.7554779 = idf(docFreq=20772, maxDocs=44218)
          0.034944877 = queryNorm
        0.27429342 = fieldWeight in 1172, product of:
          2.0 = tf(freq=4.0), with freq of:
            4.0 = termFreq=4.0
          1.7554779 = idf(docFreq=20772, maxDocs=44218)
          0.078125 = fieldNorm(doc=1172)
  0.25 = coord(1/4)
```

RSWK: Englisch / Anapher <Syntax> / Hypertext / Information Retrieval / Korpus <Linguistik>
Subject: Englisch / Anapher <Syntax> / Hypertext / Information Retrieval / Korpus <Linguistik>

Keselman, A.; Rosemblat, G.; Kilicoglu, H.; Fiszman, M.; Jin, H.; Shin, D.; Rindflesch, T.C.: Adapting semantic natural language processing technology to address information overload in influenza epidemic management (2010) 0.00
```
0.0042066295 = product of:
  0.016826518 = sum of:
    0.016826518 = weight(_text_:information in 1312) [ClassicSimilarity], result of:
      0.016826518 = score(doc=1312,freq=16.0), product of:
        0.06134496 = queryWeight, product of:
          1.7554779 = idf(docFreq=20772, maxDocs=44218)
          0.034944877 = queryNorm
        0.27429342 = fieldWeight in 1312, product of:
          4.0 = tf(freq=16.0), with freq of:
            16.0 = termFreq=16.0
          1.7554779 = idf(docFreq=20772, maxDocs=44218)
          0.0390625 = fieldNorm(doc=1312)
  0.25 = coord(1/4)
```
Abstract

The explosion of disaster health information results in information overload among response professionals. The objective of this project was to determine the feasibility of applying semantic natural language processing (NLP) technology to addressing this overload. The project characterizes concepts and relationships commonly used in disaster health-related documents on influenza pandemics, as the basis for adapting an existing semantic summarizer to the domain. Methods include human review and semantic NLP analysis of a set of relevant documents. This is followed by a pilot test in which two information specialists use the adapted application for a realistic information-seeking task. According to the results, the ontology of influenza epidemics management can be described via a manageable number of semantic relationships that involve concepts from a limited number of semantic types. Test users demonstrate several ways to engage with the application to obtain useful information. This suggests that existing semantic NLP algorithms can be adapted to support information summarization and visualization in influenza epidemics and other disaster health areas. However, additional research is needed in the areas of terminology development (as many relevant relationships and terms are not part of existing standardized vocabularies), NLP, and user interface design.

Source

Journal of the American Society for Information Science and Technology. 61(2010) no.12, S.2531-2543

Kocijan, K.: Visualizing natural language resources (2015) 0.00

```
0.0042066295 = product of:
  0.016826518 = sum of:
    0.016826518 = weight(_text_:information in 2995) [ClassicSimilarity], result of:
      0.016826518 = score(doc=2995,freq=4.0), product of:
        0.06134496 = queryWeight, product of:
          1.7554779 = idf(docFreq=20772, maxDocs=44218)
          0.034944877 = queryNorm
        0.27429342 = fieldWeight in 2995, product of:
          2.0 = tf(freq=4.0), with freq of:
            4.0 = termFreq=4.0
          1.7554779 = idf(docFreq=20772, maxDocs=44218)
          0.078125 = fieldNorm(doc=2995)
  0.25 = coord(1/4)
```

Source: Re:inventing information science in the networked society: Proceedings of the 14th International Symposium on Information Science, Zadar/Croatia, 19th-21st May 2015. Eds.: F. Pehar, C. Schloegl u. C. Wolff

Babik, W.: Keywords as linguistic tools in information and knowledge organization (2017) 0.00
```
0.004164351 = product of:
  0.016657405 = sum of:
    0.016657405 = weight(_text_:information in 3510) [ClassicSimilarity], result of:
      0.016657405 = score(doc=3510,freq=8.0), product of:
        0.06134496 = queryWeight, product of:
          1.7554779 = idf(docFreq=20772, maxDocs=44218)
          0.034944877 = queryNorm
        0.27153665 = fieldWeight in 3510, product of:
          2.828427 = tf(freq=8.0), with freq of:
            8.0 = termFreq=8.0
          1.7554779 = idf(docFreq=20772, maxDocs=44218)
          0.0546875 = fieldNorm(doc=3510)
  0.25 = coord(1/4)
```
Source

Theorie, Semantik und Organisation von Wissen: Proceedings der 13. Tagung der Deutschen Sektion der Internationalen Gesellschaft für Wissensorganisation (ISKO) und dem 13. Internationalen Symposium der Informationswissenschaft der Higher Education Association for Information Science (HI) Potsdam (19.-20.03.2013): 'Theory, Information and Organization of Knowledge' / Proceedings der 14. Tagung der Deutschen Sektion der Internationalen Gesellschaft für Wissensorganisation (ISKO) und Natural Language & Information Systems (NLDB) Passau (16.06.2015): 'Lexical Resources for Knowledge Organization' / Proceedings des Workshops der Deutschen Sektion der Internationalen Gesellschaft für Wissensorganisation (ISKO) auf der SEMANTICS Leipzig (1.09.2014): 'Knowledge Organization and Semantic Web' / Proceedings des Workshops der Polnischen und Deutschen Sektion der Internationalen Gesellschaft für Wissensorganisation (ISKO) Cottbus (29.-30.09.2011): 'Economics of Knowledge Production and Organization'. Hrsg. von W. Babik, H.P. Ohly u. K. Weber
Rosemblat, G.; Resnick, M.P.; Auston, I.; Shin, D.; Sneiderman, C.; Fiszman, M.; Rindflesch, T.C.: Extending SemRep to the public health domain (2013) 0.00
```
0.0039907596 = product of:
  0.015963038 = sum of:
    0.015963038 = weight(_text_:information in 2096) [ClassicSimilarity], result of:
      0.015963038 = score(doc=2096,freq=10.0), product of:
        0.06134496 = queryWeight, product of:
          1.7554779 = idf(docFreq=20772, maxDocs=44218)
          0.034944877 = queryNorm
        0.2602176 = fieldWeight in 2096, product of:
          3.1622777 = tf(freq=10.0), with freq of:
            10.0 = termFreq=10.0
          1.7554779 = idf(docFreq=20772, maxDocs=44218)
          0.046875 = fieldNorm(doc=2096)
  0.25 = coord(1/4)
```
Abstract

We describe the use of a domain-independent method to extend a natural language processing (NLP) application, SemRep (Rindflesch, Fiszman, & Libbus, 2005), based on the knowledge sources afforded by the Unified Medical Language System (UMLS®; Humphreys, Lindberg, Schoolman, & Barnett, 1998) to support the area of health promotion within the public health domain. Public health professionals require good information about successful health promotion policies and programs that might be considered for application within their own communities. Our effort seeks to improve access to relevant information for the public health profession, to help those in the field remain an information-savvy workforce. Natural language processing and semantic techniques hold promise to help public health professionals navigate the growing ocean of information by organizing and structuring this knowledge into a focused public health framework paired with a user-friendly visualization application as a way to summarize results of PubMed® searches in this field of knowledge.

Source

Journal of the American Society for Information Science and Technology. 64(2013) no.10, S.1963-1974
Hoenkamp, E.; Bruza, P.: How everyday language can and will boost effective information retrieval (2015) 0.00
```
0.0036430482 = product of:
  0.014572193 = sum of:
    0.014572193 = weight(_text_:information in 2123) [ClassicSimilarity], result of:
      0.014572193 = score(doc=2123,freq=12.0), product of:
        0.06134496 = queryWeight, product of:
          1.7554779 = idf(docFreq=20772, maxDocs=44218)
          0.034944877 = queryNorm
        0.23754507 = fieldWeight in 2123, product of:
          3.4641016 = tf(freq=12.0), with freq of:
            12.0 = termFreq=12.0
          1.7554779 = idf(docFreq=20772, maxDocs=44218)
          0.0390625 = fieldNorm(doc=2123)
  0.25 = coord(1/4)
```
Abstract

Typing 2 or 3 keywords into a browser has become an easy and efficient way to find information. Yet, typing even short queries becomes tedious on ever shrinking (virtual) keyboards. Meanwhile, speech processing is maturing rapidly, facilitating everyday language input. Also, wearable technology can inform users proactively by listening in on their conversations or processing their social media interactions. Given these developments, everyday language may soon become the new input of choice. We present an information retrieval (IR) algorithm specifically designed to accept everyday language. It integrates two paradigms of information retrieval, previously studied in isolation; one directed mainly at the surface structure of language, the other primarily at the underlying meaning. The integration was achieved by a Markov machine that encodes meaning by its transition graph, and surface structure by the language it generates. A rigorous evaluation of the approach showed, first, that it can compete with the quality of existing language models, second, that it is more effective the more verbose the input, and third, as a consequence, that it is promising for an imminent transition from keyword input, where the onus is on the user to formulate concise queries, to a modality where users can express their need for information more freely, more informally, and more naturally in everyday language.

Source

Journal of the Association for Information Science and Technology. 66(2015) no.8, S.1546-1558
Kim, S.; Ko, Y.; Oard, D.W.: Combining lexical and statistical translation evidence for cross-language information retrieval (2015) 0.00
```
0.0035694437 = product of:
  0.014277775 = sum of:
    0.014277775 = weight(_text_:information in 1606) [ClassicSimilarity], result of:
      0.014277775 = score(doc=1606,freq=8.0), product of:
        0.06134496 = queryWeight, product of:
          1.7554779 = idf(docFreq=20772, maxDocs=44218)
          0.034944877 = queryNorm
        0.23274569 = fieldWeight in 1606, product of:
          2.828427 = tf(freq=8.0), with freq of:
            8.0 = termFreq=8.0
          1.7554779 = idf(docFreq=20772, maxDocs=44218)
          0.046875 = fieldNorm(doc=1606)
  0.25 = coord(1/4)
```
Abstract

This article explores how best to use lexical and statistical translation evidence together for cross-language information retrieval (CLIR). Lexical translation evidence is assembled from Wikipedia and from a large machine-readable dictionary, statistical translation evidence is drawn from parallel corpora, and evidence from co-occurrence in the document language provides a basis for limiting the adverse effect of translation ambiguity. Coverage statistics for NII Testbeds and Community for Information Access Research (NTCIR) queries confirm that these resources have complementary strengths. Experiments with translation evidence from a small parallel corpus indicate that even rather rough estimates of translation probabilities can yield further improvements over a strong technique for translation weighting based on using Jensen-Shannon divergence as a term-association measure. Finally, a novel approach to posttranslation query expansion using a random walk over the Wikipedia concept link graph is shown to yield further improvements over alternative techniques for posttranslation query expansion. Evaluation results on the NTCIR-5 English-Korean test collection show statistically significant improvements over strong baselines.

Source

Journal of the Association for Information Science and Technology. 66(2015) no.1, S.23-39
Symonds, M.; Bruza, P.; Zuccon, G.; Koopman, B.; Sitbon, L.; Turner, I.: Automatic query expansion : a structural linguistic perspective (2014) 0.00
```
0.0033256328 = product of:
  0.013302531 = sum of:
    0.013302531 = weight(_text_:information in 1338) [ClassicSimilarity], result of:
      0.013302531 = score(doc=1338,freq=10.0), product of:
        0.06134496 = queryWeight, product of:
          1.7554779 = idf(docFreq=20772, maxDocs=44218)
          0.034944877 = queryNorm
        0.21684799 = fieldWeight in 1338, product of:
          3.1622777 = tf(freq=10.0), with freq of:
            10.0 = termFreq=10.0
          1.7554779 = idf(docFreq=20772, maxDocs=44218)
          0.0390625 = fieldNorm(doc=1338)
  0.25 = coord(1/4)
```
Abstract

A user's query is considered to be an imprecise description of their information need. Automatic query expansion is the process of reformulating the original query with the goal of improving retrieval effectiveness. Many successful query expansion techniques model syntagmatic associations that infer two terms co-occur more often than by chance in natural language. However, structural linguistics relies on both syntagmatic and paradigmatic associations to deduce the meaning of a word. Given the success of dependency-based approaches to query expansion and the reliance on word meanings in the query formulation process, we argue that modeling both syntagmatic and paradigmatic information in the query expansion process improves retrieval effectiveness. This article develops and evaluates a new query expansion technique that is based on a formal, corpus-based model of word meaning that models syntagmatic and paradigmatic associations. We demonstrate that when sufficient statistical information exists, as in the case of longer queries, including paradigmatic information alone provides significant improvements in retrieval effectiveness across a wide variety of data sets. More generally, when our new query expansion approach is applied to large-scale web retrieval it demonstrates significant improvements in retrieval effectiveness over a strong baseline system, based on a commercial search engine.

Source

Journal of the Association for Information Science and Technology. 65(2014) no.8, S.1577-1596
Lu, K.; Cai, X.; Ajiferuke, I.; Wolfram, D.: Vocabulary size and its effect on topic representation (2017) 0.00
```
0.003091229 = product of:
  0.012364916 = sum of:
    0.012364916 = weight(_text_:information in 3414) [ClassicSimilarity], result of:
      0.012364916 = score(doc=3414,freq=6.0), product of:
        0.06134496 = queryWeight, product of:
          1.7554779 = idf(docFreq=20772, maxDocs=44218)
          0.034944877 = queryNorm
        0.20156369 = fieldWeight in 3414, product of:
          2.4494898 = tf(freq=6.0), with freq of:
            6.0 = termFreq=6.0
          1.7554779 = idf(docFreq=20772, maxDocs=44218)
          0.046875 = fieldNorm(doc=3414)
  0.25 = coord(1/4)
```
Abstract

This study investigates how computational overhead for topic model training may be reduced by selectively removing terms from the vocabulary of text corpora being modeled. We compare the impact of removing singly occurring terms, the top 0.5%, 1% and 5% most frequently occurring terms and both top 0.5% most frequent and singly occurring terms, along with changes in the number of topics modeled (10, 20, 30, 40, 50, 100) using three datasets. Four outcome measures are compared. The removal of singly occurring terms has little impact on outcomes for all of the measures tested. Document discriminative capacity, as measured by the document space density, is reduced by the removal of frequently occurring terms, but increases with higher numbers of topics. Vocabulary size does not greatly influence entropy, but entropy is affected by the number of topics. Finally, topic similarity, as measured by pairwise topic similarity and Jensen-Shannon divergence, decreases with the removal of frequent terms. The findings have implications for information science research in information retrieval and informetrics that makes use of topic modeling.

Source

Information processing and management. 53(2017) no.3, S.653-665
Korman, D.Z.; Mack, E.; Jett, J.; Renear, A.H.: Defining textual entailment (2018) 0.00
```
0.003091229 = product of:
  0.012364916 = sum of:
    0.012364916 = weight(_text_:information in 4284) [ClassicSimilarity], result of:
      0.012364916 = score(doc=4284,freq=6.0), product of:
        0.06134496 = queryWeight, product of:
          1.7554779 = idf(docFreq=20772, maxDocs=44218)
          0.034944877 = queryNorm
        0.20156369 = fieldWeight in 4284, product of:
          2.4494898 = tf(freq=6.0), with freq of:
            6.0 = termFreq=6.0
          1.7554779 = idf(docFreq=20772, maxDocs=44218)
          0.046875 = fieldNorm(doc=4284)
  0.25 = coord(1/4)
```
Abstract

Textual entailment is a relationship that obtains between fragments of text when one fragment in some sense implies the other fragment. The automation of textual entailment recognition supports a wide variety of text-based tasks, including information retrieval, information extraction, question answering, text summarization, and machine translation. Much ingenuity has been devoted to developing algorithms for identifying textual entailments, but relatively little to saying what textual entailment actually is. This article is a review of the logical and philosophical issues involved in providing an adequate definition of textual entailment. We show that many natural definitions of textual entailment are refuted by counterexamples, including the most widely cited definition of Dagan et al. We then articulate and defend the following revised definition: T textually entails H =df typically, a human reading T would be justified in inferring the proposition expressed by H from the proposition expressed by T. We also show that textual entailment is context-sensitive, nontransitive, and nonmonotonic.

Source

Journal of the Association for Information Science and Technology. 69(2018) no.6, S.763-772
AL-Smadi, M.; Jaradat, Z.; AL-Ayyoub, M.; Jararweh, Y.: Paraphrase identification and semantic text similarity analysis in Arabic news tweets using lexical, syntactic, and semantic features (2017) 0.00
```
0.003091229 = product of:
  0.012364916 = sum of:
    0.012364916 = weight(_text_:information in 5095) [ClassicSimilarity], result of:
      0.012364916 = score(doc=5095,freq=6.0), product of:
        0.06134496 = queryWeight, product of:
          1.7554779 = idf(docFreq=20772, maxDocs=44218)
          0.034944877 = queryNorm
        0.20156369 = fieldWeight in 5095, product of:
          2.4494898 = tf(freq=6.0), with freq of:
            6.0 = termFreq=6.0
          1.7554779 = idf(docFreq=20772, maxDocs=44218)
          0.046875 = fieldNorm(doc=5095)
  0.25 = coord(1/4)
```
Abstract

The rapid growth in digital information has raised considerable challenges, in particular when it comes to automated content analysis. Social media such as Twitter share a lot of their users' information about their events, opinions, personalities, etc. Paraphrase Identification (PI) is concerned with recognizing whether two texts have the same/similar meaning, whereas Semantic Text Similarity (STS) is concerned with the degree of that similarity. This research proposes a state-of-the-art approach for paraphrase identification and semantic text similarity analysis in Arabic news tweets. The approach adopts several phases of text processing, feature extraction and text classification. Lexical, syntactic, and semantic features are extracted to overcome the weakness and limitations of the current technologies in solving these tasks for the Arabic language. Maximum Entropy (MaxEnt) and Support Vector Regression (SVR) classifiers are trained using these features and are evaluated using a dataset prepared for this research. The experimentation results show that the approach achieves good results in comparison to the baseline results.

Source

Information processing and management. 53(2017) no.3, S.640-652
Anizi, M.; Dichy, J.: Improving information retrieval in Arabic through a multi-agent approach and a rich lexical resource (2011) 0.00
```
0.0029745363 = product of:
  0.011898145 = sum of:
    0.011898145 = weight(_text_:information in 4738) [ClassicSimilarity], result of:
      0.011898145 = score(doc=4738,freq=8.0), product of:
        0.06134496 = queryWeight, product of:
          1.7554779 = idf(docFreq=20772, maxDocs=44218)
          0.034944877 = queryNorm
        0.19395474 = fieldWeight in 4738, product of:
          2.828427 = tf(freq=8.0), with freq of:
            8.0 = termFreq=8.0
          1.7554779 = idf(docFreq=20772, maxDocs=44218)
          0.0390625 = fieldNorm(doc=4738)
  0.25 = coord(1/4)
```
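The score explanation above can be checked by hand. The following is a minimal sketch, assuming the standard Lucene ClassicSimilarity formulas (tf as the square root of the term frequency, idf as 1 + ln(maxDocs / (docFreq + 1))). The function names are illustrative and not part of any library, and `query_norm` and `field_norm` are simply copied from the explanation rather than derived.

```python
import math


def classic_idf(doc_freq: int, max_docs: int) -> float:
    """Inverse document frequency as used by Lucene's ClassicSimilarity."""
    return 1.0 + math.log(max_docs / (doc_freq + 1))


def classic_term_score(freq: float, doc_freq: int, max_docs: int,
                       query_norm: float, field_norm: float) -> float:
    """Score of one query term in one document: queryWeight * fieldWeight."""
    tf = math.sqrt(freq)                   # 2.828427 for freq=8.0
    idf = classic_idf(doc_freq, max_docs)  # about 1.7554779
    query_weight = idf * query_norm        # about 0.06134496
    field_weight = tf * idf * field_norm   # about 0.19395474
    return query_weight * field_weight


# Values copied from the explanation for doc 4738 (term "information").
term = classic_term_score(freq=8.0, doc_freq=20772, max_docs=44218,
                          query_norm=0.034944877, field_norm=0.0390625)

# coord(1/4): only one of the four query clauses matched this document.
total = term * (1 / 4)

print(f"term score:  {term:.9f}")
print(f"final score: {total:.10f}")
```

Small differences in the last digits are expected, because Lucene computes these values in single precision while this sketch uses Python's double-precision floats.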
Abstract

This paper addresses the optimization of information retrieval in Arabic. The results derived from the expanding development of sites in Arabic are often spectacular. Nevertheless, several observations indicate that the responses remain disappointing, particularly upon comparing users' requests and quality of responses. One of the problems encountered by users is the loss of time when navigating between different URLs to find adequate responses. This, in many cases, is due to the absence of forms morphologically related to the research keyword. Such problems can be approached through a morphological analyzer drawing on the DIINAR.1 morpho-lexical resource. A second problem concerns the formulation of the query, which may prove ambiguous, as in everyday language. We then focus on contextual disambiguation based on a rich lexical resource that includes collocations and set expressions. The overall scheme of such a resource will only be hinted at here. Our approach leads to the elaboration of a multi-agent system, motivated by a need to solve problems encountered when using conventional methods of analysis, and to improve the results of queries thanks to a better collaboration between different levels of analysis. We suggest resorting to four agents: morphological, morpho-lexical, contextualization, and an interface agent. These agents 'negotiate' and 'cooperate' throughout the analysis process, starting from the submission of the initial query, and going on until an adequate query is obtained.

Content

Contribution within a Special Section: Knowledge Organization, Competitive Intelligence, and Information Systems - Papers from 4th International Conference on "Information Systems & Economic Intelligence," February 17-19th, 2011. Marrakech - Morocco.

Hahn, U.: Methodische Grundlagen der Informationslinguistik (2013) 0.00

```
0.0029745363 = product of:
  0.011898145 = sum of:
    0.011898145 = weight(_text_:information in 719) [ClassicSimilarity], result of:
      0.011898145 = score(doc=719,freq=2.0), product of:
        0.06134496 = queryWeight, product of:
          1.7554779 = idf(docFreq=20772, maxDocs=44218)
          0.034944877 = queryNorm
        0.19395474 = fieldWeight in 719, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          1.7554779 = idf(docFreq=20772, maxDocs=44218)
          0.078125 = fieldNorm(doc=719)
  0.25 = coord(1/4)
```

Source

Grundlagen der praktischen Information und Dokumentation. Handbuch zur Einführung in die Informationswissenschaft und -praxis. 6th, completely revised edition. Ed. by R. Kuhlen, W. Semar and D. Strauch. Founded by Klaus Laisiepen, Ernst Lutterbeck, Karl-Heinrich Meyer-Uhlenried

Ye, Z.; He, B.; Wang, L.; Luo, T.: Utilizing term proximity for blog post retrieval (2013) 0.00
```
0.0029745363 = product of:
  0.011898145 = sum of:
    0.011898145 = weight(_text_:information in 1126) [ClassicSimilarity], result of:
      0.011898145 = score(doc=1126,freq=8.0), product of:
        0.06134496 = queryWeight, product of:
          1.7554779 = idf(docFreq=20772, maxDocs=44218)
          0.034944877 = queryNorm
        0.19395474 = fieldWeight in 1126, product of:
          2.828427 = tf(freq=8.0), with freq of:
            8.0 = termFreq=8.0
          1.7554779 = idf(docFreq=20772, maxDocs=44218)
          0.0390625 = fieldNorm(doc=1126)
  0.25 = coord(1/4)
```
Abstract

Term proximity is effective for many information retrieval (IR) research fields yet remains unexplored in blogosphere IR. The blogosphere is characterized by large amounts of noise, including incohesive, off-topic content and spam. Consequently, the classical bag-of-words unigram IR models are not reliable enough to provide robust and effective retrieval performance. In this article, we propose to boost blog post retrieval performance by employing term proximity information. We investigate a variety of popular and state-of-the-art proximity-based statistical IR models, including a proximity-based counting model, the Markov random field (MRF) model, and the divergence from randomness (DFR) multinomial model. Extensive experimentation on the standard TREC Blog06 test dataset demonstrates that the introduction of term proximity information is indeed beneficial to retrieval from the blogosphere. Results also indicate the superiority of the unordered bi-gram model with the sequential-dependence phrases over other variants of the proximity-based models. Finally, inspired by the effectiveness of proximity models, we extend our study by exploring the proximity evidence between query terms and opinionated terms. The resulting opinionated proximity model shows promising performance in the experiments.

Source

Journal of the American Society for Information Science and Technology. 64(2013) no.11, S.2278-2298

Schmolz, H.: Anaphora resolution and text retrieval : a linguistic analysis of hypertexts (2013) 0.00

```
0.0029745363 = product of:
  0.011898145 = sum of:
    0.011898145 = weight(_text_:information in 1810) [ClassicSimilarity], result of:
      0.011898145 = score(doc=1810,freq=2.0), product of:
        0.06134496 = queryWeight, product of:
          1.7554779 = idf(docFreq=20772, maxDocs=44218)
          0.034944877 = queryNorm
        0.19395474 = fieldWeight in 1810, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          1.7554779 = idf(docFreq=20772, maxDocs=44218)
          0.078125 = fieldNorm(doc=1810)
  0.25 = coord(1/4)
```

Content

Winner of the 2014 VFI Dissertation Prize: "A convincing, thorough linguistic and quantitative analysis of a text element that has so far received little attention in information retrieval, based on a large, purpose-built hypertext corpus, including the evaluation of self-devised resolution rules for use in future IR systems."
