Search (10 results, page 1 of 1)

  • × theme_ss:"Computerlinguistik"
  • × type_ss:"el"
  • × year_i:[2000 TO 2010}
  1. Boleda, G.; Evert, S.: Multiword expressions : a pain in the neck of lexical semantics (2009) 0.02
    0.018346803 = product of:
      0.045867007 = sum of:
        0.008173384 = weight(_text_:a in 4888) [ClassicSimilarity], result of:
          0.008173384 = score(doc=4888,freq=2.0), product of:
            0.053464882 = queryWeight, product of:
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.046368346 = queryNorm
            0.15287387 = fieldWeight in 4888, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.09375 = fieldNorm(doc=4888)
        0.037693623 = product of:
          0.07538725 = sum of:
            0.07538725 = weight(_text_:22 in 4888) [ClassicSimilarity], result of:
              0.07538725 = score(doc=4888,freq=2.0), product of:
                0.16237405 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.046368346 = queryNorm
                0.46428138 = fieldWeight in 4888, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.09375 = fieldNorm(doc=4888)
          0.5 = coord(1/2)
      0.4 = coord(2/5)
    
    Date
    1. 3.2013 14:56:22
  2. Ramisch, C.; Schreiner, P.; Idiart, M.; Villavicencio, A.: ¬An evaluation of methods for the extraction of multiword expressions (20xx) 0.01
    0.006654713 = product of:
      0.016636781 = sum of:
        0.00770594 = weight(_text_:a in 962) [ClassicSimilarity], result of:
          0.00770594 = score(doc=962,freq=4.0), product of:
            0.053464882 = queryWeight, product of:
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.046368346 = queryNorm
            0.14413087 = fieldWeight in 962, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.0625 = fieldNorm(doc=962)
        0.0089308405 = product of:
          0.017861681 = sum of:
            0.017861681 = weight(_text_:information in 962) [ClassicSimilarity], result of:
              0.017861681 = score(doc=962,freq=4.0), product of:
                0.08139861 = queryWeight, product of:
                  1.7554779 = idf(docFreq=20772, maxDocs=44218)
                  0.046368346 = queryNorm
                0.21943474 = fieldWeight in 962, product of:
                  2.0 = tf(freq=4.0), with freq of:
                    4.0 = termFreq=4.0
                  1.7554779 = idf(docFreq=20772, maxDocs=44218)
                  0.0625 = fieldNorm(doc=962)
          0.5 = coord(1/2)
      0.4 = coord(2/5)
    
    Abstract
    This paper focuses on the evaluation of some methods for the automatic acquisition of Multiword Expressions (MWEs). First we investigate the hypothesis that MWEs can be detected solely by the distinct statistical properties of their component words, regardless of their type, comparing 3 statistical measures: Mutual Information, Chi**2 and Permutation Entropy. Moreover, we also look at the impact that the addition of type-specific linguistic information has on the performance of these methods.
    Type
    a
  3. Galitsky, B.: Can many agents answer questions better than one? (2005) 0.01
    0.005549766 = product of:
      0.013874415 = sum of:
        0.009138121 = weight(_text_:a in 3094) [ClassicSimilarity], result of:
          0.009138121 = score(doc=3094,freq=10.0), product of:
            0.053464882 = queryWeight, product of:
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.046368346 = queryNorm
            0.1709182 = fieldWeight in 3094, product of:
              3.1622777 = tf(freq=10.0), with freq of:
                10.0 = termFreq=10.0
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.046875 = fieldNorm(doc=3094)
        0.0047362936 = product of:
          0.009472587 = sum of:
            0.009472587 = weight(_text_:information in 3094) [ClassicSimilarity], result of:
              0.009472587 = score(doc=3094,freq=2.0), product of:
                0.08139861 = queryWeight, product of:
                  1.7554779 = idf(docFreq=20772, maxDocs=44218)
                  0.046368346 = queryNorm
                0.116372846 = fieldWeight in 3094, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  1.7554779 = idf(docFreq=20772, maxDocs=44218)
                  0.046875 = fieldNorm(doc=3094)
          0.5 = coord(1/2)
      0.4 = coord(2/5)
    
    Abstract
    The paper addresses the issue of how online natural language question answering, based on deep semantic analysis, may compete with currently popular keyword search, open domain information retrieval systems, covering a horizontal domain. We suggest the multiagent question answering approach, where each domain is represented by an agent which tries to answer questions taking into account its specific knowledge. The meta-agent controls the cooperation between question answering agents and chooses the most relevant answer(s). We argue that multiagent question answering is optimal in terms of access to business and financial knowledge, flexibility in query phrasing, and efficiency and usability of advice. The knowledge and advice encoded in the system are initially prepared by domain experts. We analyze the commercial application of multiagent question answering and the robustness of the meta-agent. The paper suggests that a multiagent architecture is optimal when a real world question answering domain combines a number of vertical ones to form a horizontal domain.
  4. Collins, C.: WordNet explorer : applying visualization principles to lexical semantics (2006) 0.00
    0.0047055925 = product of:
      0.011763981 = sum of:
        0.005448922 = weight(_text_:a in 1288) [ClassicSimilarity], result of:
          0.005448922 = score(doc=1288,freq=2.0), product of:
            0.053464882 = queryWeight, product of:
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.046368346 = queryNorm
            0.10191591 = fieldWeight in 1288, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.0625 = fieldNorm(doc=1288)
        0.006315058 = product of:
          0.012630116 = sum of:
            0.012630116 = weight(_text_:information in 1288) [ClassicSimilarity], result of:
              0.012630116 = score(doc=1288,freq=2.0), product of:
                0.08139861 = queryWeight, product of:
                  1.7554779 = idf(docFreq=20772, maxDocs=44218)
                  0.046368346 = queryNorm
                0.1551638 = fieldWeight in 1288, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  1.7554779 = idf(docFreq=20772, maxDocs=44218)
                  0.0625 = fieldNorm(doc=1288)
          0.5 = coord(1/2)
      0.4 = coord(2/5)
    
    Abstract
    Interface designs for lexical databases in NLP have suffered from not following design principles developed in the information visualization research community. We present a design paradigm and show it can be used to generate visualizations which maximize the usability and utility ofWordNet. The techniques can be generally applied to other lexical databases used in NLP research.
  5. WordHoard: finding multiword units (20??) 0.00
    0.0025228865 = product of:
      0.012614433 = sum of:
        0.012614433 = weight(_text_:a in 1123) [ClassicSimilarity], result of:
          0.012614433 = score(doc=1123,freq=14.0), product of:
            0.053464882 = queryWeight, product of:
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.046368346 = queryNorm
            0.23593865 = fieldWeight in 1123, product of:
              3.7416575 = tf(freq=14.0), with freq of:
                14.0 = termFreq=14.0
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.0546875 = fieldNorm(doc=1123)
      0.2 = coord(1/5)
    
    Abstract
    WordHoard defines a multiword unit as a special type of collocate in which the component words comprise a meaningful phrase. For example, "Knight of the Round Table" is a meaningful multiword unit or phrase. WordHoard uses the notion of a pseudo-bigram to generalize the computation of bigram (two word) statistical measures to phrases (n-grams) longer than two words, and to allow comparisons of these measures for phrases with different word counts. WordHoard applies the localmaxs algorithm of Silva et al. to the pseudo-bigrams to identify potential compositional phrases that "stand out" in a text. WordHoard can also filter two and three word phrases using the word class filters suggested by Justeson and Katz.
    Type
    a
  6. Bird, S.; Dale, R.; Dorr, B.; Gibson, B.; Joseph, M.; Kan, M.-Y.; Lee, D.; Powley, B.; Radev, D.; Tan, Y.F.: ¬The ACL Anthology Reference Corpus : a reference dataset for bibliographic research in computational linguistics (2008) 0.00
    0.002246648 = product of:
      0.01123324 = sum of:
        0.01123324 = weight(_text_:a in 2804) [ClassicSimilarity], result of:
          0.01123324 = score(doc=2804,freq=34.0), product of:
            0.053464882 = queryWeight, product of:
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.046368346 = queryNorm
            0.21010503 = fieldWeight in 2804, product of:
              5.8309517 = tf(freq=34.0), with freq of:
                34.0 = termFreq=34.0
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.03125 = fieldNorm(doc=2804)
      0.2 = coord(1/5)
    
    Abstract
    The ACL Anthology is a digital archive of conference and journal papers in natural language processing and computational linguistics. Its primary purpose is to serve as a reference repository of research results, but we believe that it can also be an object of study and a platform for research in its own right. We describe an enriched and standardized reference corpus derived from the ACL Anthology that can be used for research in scholarly document processing. This corpus, which we call the ACL Anthology Reference Corpus (ACL ARC), brings together the recent activities of a number of research groups around the world. Our goal is to make the corpus widely available, and to encourage other researchers to use it as a standard testbed for experiments in both bibliographic and bibliometric research.
    Content
    Vgl. auch: Automatic Term Recognition (ATR) is a research task that deals with the identification of domain-specific terms. Terms, in simple words, are textual realization of significant concepts in an expertise domain. Additionally, domain-specific terms may be classified into a number of categories, in which each category represents a significant concept. A term classification task is often defined on top of an ATR procedure to perform such categorization. For instance, in the biomedical domain, terms can be classified as drugs, proteins, and genes. This is a reference dataset for terminology extraction and classification research in computational linguistics. It is a set of manually annotated terms in English language that are extracted from the ACL Anthology Reference Corpus (ACL ARC). The ACL ARC is a canonicalised and frozen subset of scientific publications in the domain of Human Language Technologies (HLT). It consists of 10,921 articles from 1965 to 2006. The dataset, called ACL RD-TEC, is comprised of more than 69,000 candidate terms that are manually annotated as valid and invalid terms. Furthermore, valid terms are classified as technology and non-technology terms. Technology terms refer to a method, process, or in general a technological concept in the domain of HLT, e.g. machine translation, word sense disambiguation, and language modelling. On the other hand, non-technology terms refer to important concepts other than technological; examples of such terms in the domain of HLT are multilingual lexicon, corpora, word sense, and language model. The dataset is created to serve as a gold standard for the comparison of the algorithms of term recognition and classification. [http://catalog.elra.info/product_info.php?products_id=1236].
    Type
    a
  7. Griffiths, T.L.; Steyvers, M.: ¬A probabilistic approach to semantic representation (2002) 0.00
    0.0018875621 = product of:
      0.009437811 = sum of:
        0.009437811 = weight(_text_:a in 3671) [ClassicSimilarity], result of:
          0.009437811 = score(doc=3671,freq=6.0), product of:
            0.053464882 = queryWeight, product of:
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.046368346 = queryNorm
            0.17652355 = fieldWeight in 3671, product of:
              2.4494898 = tf(freq=6.0), with freq of:
                6.0 = termFreq=6.0
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.0625 = fieldNorm(doc=3671)
      0.2 = coord(1/5)
    
    Abstract
    Semantic networks produced from human data have statistical properties that cannot be easily captured by spatial representations. We explore a probabilistic approach to semantic representation that explicitly models the probability with which words occurin diffrent contexts, and hence captures the probabilistic relationships between words. We show that this representation has statistical properties consistent with the large-scale structure of semantic networks constructed by humans, and trace the origins of these properties.
    Type
    a
  8. Nielsen, R.D.; Ward, W.; Martin, J.H.; Palmer, M.: Extracting a representation from text for semantic analysis (2008) 0.00
    0.0018875621 = product of:
      0.009437811 = sum of:
        0.009437811 = weight(_text_:a in 3365) [ClassicSimilarity], result of:
          0.009437811 = score(doc=3365,freq=6.0), product of:
            0.053464882 = queryWeight, product of:
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.046368346 = queryNorm
            0.17652355 = fieldWeight in 3365, product of:
              2.4494898 = tf(freq=6.0), with freq of:
                6.0 = termFreq=6.0
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.0625 = fieldNorm(doc=3365)
      0.2 = coord(1/5)
    
    Abstract
    We present a novel fine-grained semantic representation of text and an approach to constructing it. This representation is largely extractable by today's technologies and facilitates more detailed semantic analysis. We discuss the requirements driving the representation, suggest how it might be of value in the automated tutoring domain, and provide evidence of its validity.
    Type
    a
  9. Rötzer, F.: Computer ergooglen die Bedeutung von Worten (2005) 0.00
    6.698131E-4 = product of:
      0.0033490653 = sum of:
        0.0033490653 = product of:
          0.0066981306 = sum of:
            0.0066981306 = weight(_text_:information in 3385) [ClassicSimilarity], result of:
              0.0066981306 = score(doc=3385,freq=4.0), product of:
                0.08139861 = queryWeight, product of:
                  1.7554779 = idf(docFreq=20772, maxDocs=44218)
                  0.046368346 = queryNorm
                0.08228803 = fieldWeight in 3385, product of:
                  2.0 = tf(freq=4.0), with freq of:
                    4.0 = termFreq=4.0
                  1.7554779 = idf(docFreq=20772, maxDocs=44218)
                  0.0234375 = fieldNorm(doc=3385)
          0.5 = coord(1/2)
      0.2 = coord(1/5)
    
    Content
    Mit einem bereits zuvor von Paul Vitanyi und anderen entwickeltem Verfahren, das den Zusammenhang von Objekten misst (normalized information distance - NID ), kann die Nähe zwischen bestimmten Objekten (Bilder, Worte, Muster, Intervalle, Genome, Programme etc.) anhand aller Eigenschaften analysiert und aufgrund der dominanten gemeinsamen Eigenschaft bestimmt werden. Ähnlich können auch die allgemein verwendeten, nicht unbedingt "wahren" Bedeutungen von Namen mit der Google-Suche erschlossen werden. 'At this moment one database stands out as the pinnacle of computer-accessible human knowledge and the most inclusive summary of statistical information: the Google search engine. There can be no doubt that Google has already enabled science to accelerate tremendously and revolutionized the research process. It has dominated the attention of internet users for years, and has recently attracted substantial attention of many Wall Street investors, even reshaping their ideas of company financing.' (Paul Vitanyi und Rudi Cilibrasi) Gibt man ein Wort ein wie beispielsweise "Pferd", erhält man bei Google 4.310.000 indexierte Seiten. Für "Reiter" sind es 3.400.000 Seiten. Kombiniert man beide Begriffe, werden noch 315.000 Seiten erfasst. Für das gemeinsame Auftreten beispielsweise von "Pferd" und "Bart" werden zwar noch immer erstaunliche 67.100 Seiten aufgeführt, aber man sieht schon, dass "Pferd" und "Reiter" enger zusammen hängen. Daraus ergibt sich eine bestimmte Wahrscheinlichkeit für das gemeinsame Auftreten von Begriffen. Aus dieser Häufigkeit, die sich im Vergleich mit der maximalen Menge (5.000.000.000) an indexierten Seiten ergibt, haben die beiden Wissenschaftler eine statistische Größe entwickelt, die sie "normalised Google distance" (NGD) nennen und die normalerweise zwischen 0 und 1 liegt. Je geringer NGD ist, desto enger hängen zwei Begriffe zusammen. "Das ist eine automatische Bedeutungsgenerierung", sagt Vitanyi gegenüber dern New Scientist (4). "Das könnte gut eine Möglichkeit darstellen, einen Computer Dinge verstehen und halbintelligent handeln zu lassen." Werden solche Suchen immer wieder durchgeführt, lässt sich eine Karte für die Verbindungen von Worten erstellen. Und aus dieser Karte wiederum kann ein Computer, so die Hoffnung, auch die Bedeutung der einzelnen Worte in unterschiedlichen natürlichen Sprachen und Kontexten erfassen. So habe man über einige Suchen realisiert, dass ein Computer zwischen Farben und Zahlen unterscheiden, holländische Maler aus dem 17. Jahrhundert und Notfälle sowie Fast-Notfälle auseinander halten oder elektrische oder religiöse Begriffe verstehen könne. Überdies habe eine einfache automatische Übersetzung Englisch-Spanisch bewerkstelligt werden können. Auf diese Weise ließe sich auch, so hoffen die Wissenschaftler, die Bedeutung von Worten erlernen, könne man Spracherkennung verbessern oder ein semantisches Web erstellen und natürlich endlich eine bessere automatische Übersetzung von einer Sprache in die andere realisieren.
  10. Artemenko, O.; Shramko, M.: Entwicklung eines Werkzeugs zur Sprachidentifikation in mono- und multilingualen Texten (2005) 0.00
    5.525676E-4 = product of:
      0.002762838 = sum of:
        0.002762838 = product of:
          0.005525676 = sum of:
            0.005525676 = weight(_text_:information in 572) [ClassicSimilarity], result of:
              0.005525676 = score(doc=572,freq=2.0), product of:
                0.08139861 = queryWeight, product of:
                  1.7554779 = idf(docFreq=20772, maxDocs=44218)
                  0.046368346 = queryNorm
                0.06788416 = fieldWeight in 572, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  1.7554779 = idf(docFreq=20772, maxDocs=44218)
                  0.02734375 = fieldNorm(doc=572)
          0.5 = coord(1/2)
      0.2 = coord(1/5)
    
    Abstract
    Mit der Verbreitung des Internets vermehrt sich die Menge der im World Wide Web verfügbaren Dokumente. Die Gewährleistung eines effizienten Zugangs zu gewünschten Informationen für die Internetbenutzer wird zu einer großen Herausforderung an die moderne Informationsgesellschaft. Eine Vielzahl von Werkzeugen wird bereits eingesetzt, um den Nutzern die Orientierung in der wachsenden Informationsflut zu erleichtern. Allerdings stellt die enorme Menge an unstrukturierten und verteilten Informationen nicht die einzige Schwierigkeit dar, die bei der Entwicklung von Werkzeugen dieser Art zu bewältigen ist. Die zunehmende Vielsprachigkeit von Web-Inhalten resultiert in dem Bedarf an Sprachidentifikations-Software, die Sprache/en von elektronischen Dokumenten zwecks gezielter Weiterverarbeitung identifiziert. Solche Sprachidentifizierer können beispielsweise effektiv im Bereich des Multilingualen Information Retrieval eingesetzt werden, da auf den Sprachidentifikationsergebnissen Prozesse der automatischen Indexbildung wie Stemming, Stoppwörterextraktion etc. aufbauen. In der vorliegenden Arbeit wird das neue System "LangIdent" zur Sprachidentifikation von elektronischen Textdokumenten vorgestellt, das in erster Linie für Lehre und Forschung an der Universität Hildesheim verwendet werden soll. "LangIdent" enthält eine Auswahl von gängigen Algorithmen zu der monolingualen Sprachidentifikation, die durch den Benutzer interaktiv ausgewählt und eingestellt werden können. Zusätzlich wurde im System ein neuer Algorithmus implementiert, der die Identifikation von Sprachen, in denen ein multilinguales Dokument verfasst ist, ermöglicht. Die Identifikation beschränkt sich nicht nur auf eine Aufzählung von gefundenen Sprachen, vielmehr wird der Text in monolinguale Abschnitte aufgeteilt, jeweils mit der Angabe der identifizierten Sprache.