Search (7 results, page 1 of 1)

Huo, W.: Automatic multi-word term extraction and its application to Web-page summarization (2012) 0.19

0.1939249 = product of:
  0.5817747 = sum of:
    0.1380745 = weight(_text_:2f in 563) [ClassicSimilarity], result of:
      0.1380745 = score(doc=563,freq=2.0), product of:
        0.24567628 = queryWeight, product of:
          8.478011 = idf(docFreq=24, maxDocs=44218)
          0.028978055 = queryNorm
        0.56201804 = fieldWeight in 563, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          8.478011 = idf(docFreq=24, maxDocs=44218)
          0.046875 = fieldNorm(doc=563)
    0.1380745 = weight(_text_:2f in 563) [ClassicSimilarity], result of:
      0.1380745 = score(doc=563,freq=2.0), product of:
        0.24567628 = queryWeight, product of:
          8.478011 = idf(docFreq=24, maxDocs=44218)
          0.028978055 = queryNorm
        0.56201804 = fieldWeight in 563, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          8.478011 = idf(docFreq=24, maxDocs=44218)
          0.046875 = fieldNorm(doc=563)
    0.1380745 = weight(_text_:2f in 563) [ClassicSimilarity], result of:
      0.1380745 = score(doc=563,freq=2.0), product of:
        0.24567628 = queryWeight, product of:
          8.478011 = idf(docFreq=24, maxDocs=44218)
          0.028978055 = queryNorm
        0.56201804 = fieldWeight in 563, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          8.478011 = idf(docFreq=24, maxDocs=44218)
          0.046875 = fieldNorm(doc=563)
    0.1380745 = weight(_text_:2f in 563) [ClassicSimilarity], result of:
      0.1380745 = score(doc=563,freq=2.0), product of:
        0.24567628 = queryWeight, product of:
          8.478011 = idf(docFreq=24, maxDocs=44218)
          0.028978055 = queryNorm
        0.56201804 = fieldWeight in 563, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          8.478011 = idf(docFreq=24, maxDocs=44218)
          0.046875 = fieldNorm(doc=563)
    0.029476684 = sum of:
      0.005919926 = weight(_text_:information in 563) [ClassicSimilarity], result of:
        0.005919926 = score(doc=563,freq=2.0), product of:
          0.050870337 = queryWeight, product of:
            1.7554779 = idf(docFreq=20772, maxDocs=44218)
            0.028978055 = queryNorm
          0.116372846 = fieldWeight in 563, product of:
            1.4142135 = tf(freq=2.0), with freq of:
              2.0 = termFreq=2.0
            1.7554779 = idf(docFreq=20772, maxDocs=44218)
            0.046875 = fieldNorm(doc=563)
      0.023556758 = weight(_text_:22 in 563) [ClassicSimilarity], result of:
        0.023556758 = score(doc=563,freq=2.0), product of:
          0.101476215 = queryWeight, product of:
            3.5018296 = idf(docFreq=3622, maxDocs=44218)
            0.028978055 = queryNorm
          0.23214069 = fieldWeight in 563, product of:
            1.4142135 = tf(freq=2.0), with freq of:
              2.0 = termFreq=2.0
            3.5018296 = idf(docFreq=3622, maxDocs=44218)
            0.046875 = fieldNorm(doc=563)
  0.33333334 = coord(5/15)

Abstract: In this thesis we propose three new word association measures for multi-word term extraction. We combine these association measures with LocalMaxs algorithm in our extraction model and compare the results of different multi-word term extraction methods. Our approach is language and domain independent and requires no training data. It can be applied to such tasks as text summarization, information retrieval, and document classification. We further explore the potential of using multi-word terms as an effective representation for general web-page summarization. We extract multi-word terms from human written summaries in a large collection of web-pages, and generate the summaries by aligning document words with these multi-word terms. Our system applies machine translation technology to learn the aligning process from a training set and focuses on selecting high quality multi-word terms from human written summaries to generate suitable results for web-page summarization.
Content: A Thesis presented to The University of Guelph In partial fulfilment of requirements for the degree of Master of Science in Computer Science. Vgl. Unter: http://www.inf.ufrgs.br%2F~ceramisch%2Fdownload_files%2Fpublications%2F2009%2Fp01.pdf.
Date: 10. 1.2013 19:22:47

Scherer Auberson, K.: Counteracting concept drift in natural language classifiers : proposal for an automated method (2018) 0.02

0.017870152 = product of:
  0.08935076 = sum of:
    0.018872911 = weight(_text_:und in 2849) [ClassicSimilarity], result of:
      0.018872911 = score(doc=2849,freq=8.0), product of:
        0.06422601 = queryWeight, product of:
          2.216367 = idf(docFreq=13101, maxDocs=44218)
          0.028978055 = queryNorm
        0.29385152 = fieldWeight in 2849, product of:
          2.828427 = tf(freq=8.0), with freq of:
            8.0 = termFreq=8.0
          2.216367 = idf(docFreq=13101, maxDocs=44218)
          0.046875 = fieldNorm(doc=2849)
    0.06751789 = weight(_text_:informationswissenschaft in 2849) [ClassicSimilarity], result of:
      0.06751789 = score(doc=2849,freq=6.0), product of:
        0.13053758 = queryWeight, product of:
          4.504705 = idf(docFreq=1328, maxDocs=44218)
          0.028978055 = queryNorm
        0.5172295 = fieldWeight in 2849, product of:
          2.4494898 = tf(freq=6.0), with freq of:
            6.0 = termFreq=6.0
          4.504705 = idf(docFreq=1328, maxDocs=44218)
          0.046875 = fieldNorm(doc=2849)
    0.002959963 = product of:
      0.005919926 = sum of:
        0.005919926 = weight(_text_:information in 2849) [ClassicSimilarity], result of:
          0.005919926 = score(doc=2849,freq=2.0), product of:
            0.050870337 = queryWeight, product of:
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.028978055 = queryNorm
            0.116372846 = fieldWeight in 2849, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.046875 = fieldNorm(doc=2849)
      0.5 = coord(1/2)
  0.2 = coord(3/15)

Abstract: Natural Language Classifier helfen Unternehmen zunehmend dabei die Flut von Textdaten zu überwinden. Aber diese Classifier, einmal trainiert, verlieren mit der Zeit ihre Nützlichkeit. Sie bleiben statisch, aber die zugrundeliegende Domäne der Textdaten verändert sich: Ihre Genauigkeit nimmt aufgrund eines Phänomens ab, das als Konzeptdrift bekannt ist. Die Frage ist ob Konzeptdrift durch die Ausgabe eines Classifiers zuverlässig erkannt werden kann, und falls ja: ist es möglich dem durch nachtrainieren des Classifiers entgegenzuwirken. Es wird eine System-Implementierung mittels Proof-of-Concept vorgestellt, bei der das Konfidenzmass des Classifiers zur Erkennung von Konzeptdrift verwendet wird. Der Classifier wird dann iterativ neu trainiert, indem er Stichproben mit niedrigem Konfidenzmass auswählt, sie korrigiert und im Trainingsset der nächsten Iteration verwendet. Die Leistung des Classifiers wird über die Zeit gemessen, und die Leistung des Systems beobachtet. Basierend darauf werden schließlich Empfehlungen gegeben, die sich bei der Implementierung solcher Systeme als nützlich erweisen können.
Content: Diese Publikation entstand im Rahmen einer Thesis zum Master of Science FHO in Business Administration, Major Information and Data Management.
Imprint: Chur : Hochschule für Technik und Wirtschaft / Arbeitsbereich Informationswissenschaft
Series: Churer Schriften zur Informationswissenschaft / Arbeitsbereich Informationswissenschaft; Schrift 98

Schmolz, H.: Anaphora resolution and text retrieval : a lnguistic analysis of hypertexts (2013) 0.00

0.00275476 = product of:
  0.020660698 = sum of:
    0.015727427 = weight(_text_:und in 1810) [ClassicSimilarity], result of:
      0.015727427 = score(doc=1810,freq=2.0), product of:
        0.06422601 = queryWeight, product of:
          2.216367 = idf(docFreq=13101, maxDocs=44218)
          0.028978055 = queryNorm
        0.24487628 = fieldWeight in 1810, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          2.216367 = idf(docFreq=13101, maxDocs=44218)
          0.078125 = fieldNorm(doc=1810)
    0.0049332716 = product of:
      0.009866543 = sum of:
        0.009866543 = weight(_text_:information in 1810) [ClassicSimilarity], result of:
          0.009866543 = score(doc=1810,freq=2.0), product of:
            0.050870337 = queryWeight, product of:
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.028978055 = queryNorm
            0.19395474 = fieldWeight in 1810, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.078125 = fieldNorm(doc=1810)
      0.5 = coord(1/2)
  0.13333334 = coord(2/15)

Content: Trägerin des VFI-Dissertationspreises 2014: "Überzeugende gründliche linguistische und quantitative Analyse eines im Information Retrieval bisher wenig beachteten Textelementes anhand eines eigens erstellten grossen Hypertextkorpus, einschliesslich der Evaluation selbsterstellter Auflösungsregeln für die Nutzung in künftigen IR-Systemen.".

Bredack, J.: Automatische Extraktion fachterminologischer Mehrwortbegriffe : ein Verfahrensvergleich (2016) 0.00
```
0.0015727426 = product of:
  0.023591138 = sum of:
    0.023591138 = weight(_text_:und in 3194) [ClassicSimilarity], result of:
      0.023591138 = score(doc=3194,freq=18.0), product of:
        0.06422601 = queryWeight, product of:
          2.216367 = idf(docFreq=13101, maxDocs=44218)
          0.028978055 = queryNorm
        0.3673144 = fieldWeight in 3194, product of:
          4.2426405 = tf(freq=18.0), with freq of:
            18.0 = termFreq=18.0
          2.216367 = idf(docFreq=13101, maxDocs=44218)
          0.0390625 = fieldNorm(doc=3194)
  0.06666667 = coord(1/15)
```
Abstract

In dieser Untersuchung wurden zwei Systeme eingesetzt, um MWT aus einer Dokumentkollektion mit fachsprachlichem Bezug (Volltexte des ACL Anthology Reference Corpus) automatisch zu extrahieren. Das thematische Spektrum umfasste alle Bereiche der natürlichen Sprachverarbeitung, im Speziellen die CL als interdisziplinäre Wissenschaft. Ziel war es MWT zu extrahieren, die als potentielle Indexterme im IR Verwendung finden können. Diese sollten auf Konzepte, Methoden, Verfahren und Algorithmen in der CL und angrenzenden Teilgebieten, wie Linguistik und Informatik hinweisen bzw. benennen.
Als Extraktionssysteme wurden der TreeTagger und die Indexierungssoftware Lingo verwendet. Der TreeTagger basiert auf einem statistischen Tagging- und Chunking- Algorithmus, mit dessen Hilfe NPs automatisch identifiziert und extrahiert werden. Er kann für verschiedene Anwendungsszenarien der natürlichen Sprachverarbeitung eingesetzt werden, in erster Linie als POS-Tagger für unterschiedliche Sprachen. Das Indexierungssystem Lingo arbeitet im Gegensatz zum TreeTagger mit elektronischen Wörterbüchern und einem musterbasierten Abgleich. Lingo ist ein auf automatische Indexierung ausgerichtetes System, was eine Vielzahl von Modulen mitliefert, die individuell auf eine bestimmte Aufgabenstellung angepasst und aufeinander abgestimmt werden können. Die unterschiedlichen Verarbeitungsweisen haben sich in den Ergebnismengen beider Systeme deutlich gezeigt. Die gering ausfallenden Übereinstimmungen der Ergebnismengen verdeutlichen die abweichende Funktionsweise und konnte mit einer qualitativen Analyse beispielhaft beschrieben werden. In der vorliegenden Arbeit kann abschließend nicht geklärt werden, welches der beiden Systeme bevorzugt für die Generierung von Indextermen eingesetzt werden sollte.
Renker, L.: Exploration von Textkorpora : Topic Models als Grundlage der Interaktion (2015) 0.00
```
0.0013870286 = product of:
  0.020805428 = sum of:
    0.020805428 = weight(_text_:und in 2380) [ClassicSimilarity], result of:
      0.020805428 = score(doc=2380,freq=14.0), product of:
        0.06422601 = queryWeight, product of:
          2.216367 = idf(docFreq=13101, maxDocs=44218)
          0.028978055 = queryNorm
        0.32394084 = fieldWeight in 2380, product of:
          3.7416575 = tf(freq=14.0), with freq of:
            14.0 = termFreq=14.0
          2.216367 = idf(docFreq=13101, maxDocs=44218)
          0.0390625 = fieldNorm(doc=2380)
  0.06666667 = coord(1/15)
```
Abstract

Das Internet birgt schier endlose Informationen. Ein zentrales Problem besteht heutzutage darin diese auch zugänglich zu machen. Es ist ein fundamentales Domänenwissen erforderlich, um in einer Volltextsuche die korrekten Suchanfragen zu formulieren. Das ist jedoch oftmals nicht vorhanden, so dass viel Zeit aufgewandt werden muss, um einen Überblick des behandelten Themas zu erhalten. In solchen Situationen findet sich ein Nutzer in einem explorativen Suchvorgang, in dem er sich schrittweise an ein Thema heranarbeiten muss. Für die Organisation von Daten werden mittlerweile ganz selbstverständlich Verfahren des Machine Learnings verwendet. In den meisten Fällen bleiben sie allerdings für den Anwender unsichtbar. Die interaktive Verwendung in explorativen Suchprozessen könnte die menschliche Urteilskraft enger mit der maschinellen Verarbeitung großer Datenmengen verbinden. Topic Models sind ebensolche Verfahren. Sie finden in einem Textkorpus verborgene Themen, die sich relativ gut von Menschen interpretieren lassen und sind daher vielversprechend für die Anwendung in explorativen Suchprozessen. Nutzer können damit beim Verstehen unbekannter Quellen unterstützt werden. Bei der Betrachtung entsprechender Forschungsarbeiten fiel auf, dass Topic Models vorwiegend zur Erzeugung statischer Visualisierungen verwendet werden. Das Sensemaking ist ein wesentlicher Bestandteil der explorativen Suche und wird dennoch nur in sehr geringem Umfang genutzt, um algorithmische Neuerungen zu begründen und in einen umfassenden Kontext zu setzen. Daraus leitet sich die Vermutung ab, dass die Verwendung von Modellen des Sensemakings und die nutzerzentrierte Konzeption von explorativen Suchen, neue Funktionen für die Interaktion mit Topic Models hervorbringen und einen Kontext für entsprechende Forschungsarbeiten bieten können.

Footnote

Masterthesis zur Erlangung des akademischen Grades Master of Science (M.Sc.) vorgelegt an der Fachhochschule Köln / Fakultät für Informatik und Ingenieurswissenschaften im Studiengang Medieninformatik.

Imprint

Gummersbach : Fakultät für Informatik und Ingenieurswissenschaften

Schmolz, H.: Anaphora resolution and text retrieval : a lnguistic analysis of hypertexts (2015) 0.00

4.6511332E-4 = product of:
  0.0069766995 = sum of:
    0.0069766995 = product of:
      0.013953399 = sum of:
        0.013953399 = weight(_text_:information in 1172) [ClassicSimilarity], result of:
          0.013953399 = score(doc=1172,freq=4.0), product of:
            0.050870337 = queryWeight, product of:
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.028978055 = queryNorm
            0.27429342 = fieldWeight in 1172, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.078125 = fieldNorm(doc=1172)
      0.5 = coord(1/2)
  0.06666667 = coord(1/15)

RSWK: Englisch / Anapher <Syntax> / Hypertext / Information Retrieval / Korpus <Linguistik>
Subject: Englisch / Anapher <Syntax> / Hypertext / Information Retrieval / Korpus <Linguistik>

Nagy T., I.: Detecting multiword expressions and named entities in natural language texts (2014) 0.00
```
1.1510967E-4 = product of:
  0.001726645 = sum of:
    0.001726645 = product of:
      0.00345329 = sum of:
        0.00345329 = weight(_text_:information in 1536) [ClassicSimilarity], result of:
          0.00345329 = score(doc=1536,freq=2.0), product of:
            0.050870337 = queryWeight, product of:
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.028978055 = queryNorm
            0.06788416 = fieldWeight in 1536, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.02734375 = fieldNorm(doc=1536)
      0.5 = coord(1/2)
  0.06666667 = coord(1/15)
```
Abstract

Multiword expressions (MWEs) are lexical items that can be decomposed into single words and display lexical, syntactic, semantic, pragmatic and/or statistical idiosyncrasy (Sag et al., 2002; Kim, 2008; Calzolari et al., 2002). The proper treatment of multiword expressions such as rock 'n' roll and make a decision is essential for many natural language processing (NLP) applications like information extraction and retrieval, terminology extraction and machine translation, and it is important to identify multiword expressions in context. For example, in machine translation we must know that MWEs form one semantic unit, hence their parts should not be translated separately. For this, multiword expressions should be identified first in the text to be translated. The chief aim of this thesis is to develop machine learning-based approaches for the automatic detection of different types of multiword expressions in English and Hungarian natural language texts. In our investigations, we pay attention to the characteristics of different types of multiword expressions such as nominal compounds, multiword named entities and light verb constructions, and we apply novel methods to identify MWEs in raw texts. In the thesis it will be demonstrated that nominal compounds and multiword amed entities may require a similar approach for their automatic detection as they behave in the same way from a linguistic point of view. Furthermore, it will be shown that the automatic detection of light verb constructions can be carried out using two effective machine learning-based approaches.

Search (7 results, page 1 of 1)

Authors

Languages

Themes

Subjects

Classifications