Search (5 results, page 1 of 1)

Renker, L.: Exploration von Textkorpora : Topic Models als Grundlage der Interaktion (2015) 0.03
```
0.03328773 = product of:
  0.07767137 = sum of:
    0.0328269 = weight(_text_:und in 2380) [ClassicSimilarity], result of:
      0.0328269 = score(doc=2380,freq=14.0), product of:
        0.1013361 = queryWeight, product of:
          2.216367 = idf(docFreq=13101, maxDocs=44218)
          0.045721713 = queryNorm
        0.32394084 = fieldWeight in 2380, product of:
          3.7416575 = tf(freq=14.0), with freq of:
            14.0 = termFreq=14.0
          2.216367 = idf(docFreq=13101, maxDocs=44218)
          0.0390625 = fieldNorm(doc=2380)
    0.01550009 = weight(_text_:in in 2380) [ClassicSimilarity], result of:
      0.01550009 = score(doc=2380,freq=22.0), product of:
        0.062193166 = queryWeight, product of:
          1.3602545 = idf(docFreq=30841, maxDocs=44218)
          0.045721713 = queryNorm
        0.24922498 = fieldWeight in 2380, product of:
          4.690416 = tf(freq=22.0), with freq of:
            22.0 = termFreq=22.0
          1.3602545 = idf(docFreq=30841, maxDocs=44218)
          0.0390625 = fieldNorm(doc=2380)
    0.029344378 = weight(_text_:den in 2380) [ClassicSimilarity], result of:
      0.029344378 = score(doc=2380,freq=4.0), product of:
        0.13104749 = queryWeight, product of:
          2.866198 = idf(docFreq=6840, maxDocs=44218)
          0.045721713 = queryNorm
        0.22392172 = fieldWeight in 2380, product of:
          2.0 = tf(freq=4.0), with freq of:
            4.0 = termFreq=4.0
          2.866198 = idf(docFreq=6840, maxDocs=44218)
          0.0390625 = fieldNorm(doc=2380)
  0.42857143 = coord(3/7)
```
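The explain tree above can be checked by hand. The following is a minimal sketch, assuming the standard Lucene `ClassicSimilarity` formulas (`tf = sqrt(freq)`, `idf = 1 + ln(maxDocs / (docFreq + 1))`). The constants `queryNorm` and `fieldNorm` are copied from the output rather than derived, and the function and variable names are illustrative only.

```python
import math

# Values copied from the explain output for doc 2380 (not derived here).
MAX_DOCS = 44218
QUERY_NORM = 0.045721713
FIELD_NORM = 0.0390625


def idf(doc_freq, max_docs=MAX_DOCS):
    """ClassicSimilarity inverse document frequency."""
    return 1.0 + math.log(max_docs / (doc_freq + 1))


def term_score(freq, doc_freq, field_norm=FIELD_NORM, query_norm=QUERY_NORM):
    """Score of one matching term: queryWeight * fieldWeight."""
    term_idf = idf(doc_freq)
    query_weight = term_idf * query_norm
    field_weight = math.sqrt(freq) * term_idf * field_norm
    return query_weight * field_weight


# term -> (termFreq in doc 2380, docFreq in the index)
matched_terms = {"und": (14, 13101), "in": (22, 30841), "den": (4, 6840)}

total = sum(term_score(freq, df) for freq, df in matched_terms.values())

# coord(3/7): three of the seven query terms matched this document.
final_score = total * (3 / 7)
print(round(final_score, 4))
```

Lucene computes these values in single precision, so a double-precision recomputation may differ from the listed figures in the last digits.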
Abstract

The Internet holds a seemingly endless supply of information. A central problem today is making that information accessible. Formulating the right queries in a full-text search requires fundamental domain knowledge. Such knowledge is often lacking, however, so that much time must be spent gaining an overview of the topic at hand. In such situations, users find themselves in an exploratory search process in which they must work their way into a topic step by step. Machine learning methods are now used as a matter of course to organize data. In most cases, however, they remain invisible to the user. Using them interactively in exploratory search processes could tie human judgment more closely to the machine processing of large volumes of data. Topic models are one such method. They find hidden topics in a text corpus that humans can interpret relatively well, and they are therefore promising for use in exploratory search processes. They can support users in understanding unfamiliar sources. A review of the relevant research showed that topic models are used predominantly to generate static visualizations. Sensemaking is an essential component of exploratory search, yet it is used only to a very limited extent to motivate algorithmic innovations and to place them in a broader context. This leads to the hypothesis that applying models of sensemaking and designing exploratory search in a user-centered way can yield new functions for interacting with topic models and provide a context for the corresponding research.

Footnote

Master's thesis submitted for the degree of Master of Science (M.Sc.) at the Fachhochschule Köln / Fakultät für Informatik und Ingenieurswissenschaften, in the Medieninformatik (media informatics) degree program.

Imprint

Gummersbach : Fakultät für Informatik und Ingenieurswissenschaften

Theme

Semantisches Umfeld in Indexierung u. Retrieval
Järvelin, A.; Keskustalo, H.; Sormunen, E.; Saastamoinen, M.; Kettunen, K.: Information retrieval from historical newspaper collections in highly inflectional languages : a query expansion approach (2016) 0.00
```
0.0020029084 = product of:
  0.014020358 = sum of:
    0.014020358 = weight(_text_:in in 3223) [ClassicSimilarity], result of:
      0.014020358 = score(doc=3223,freq=18.0), product of:
        0.062193166 = queryWeight, product of:
          1.3602545 = idf(docFreq=30841, maxDocs=44218)
          0.045721713 = queryNorm
        0.22543246 = fieldWeight in 3223, product of:
          4.2426405 = tf(freq=18.0), with freq of:
            18.0 = termFreq=18.0
          1.3602545 = idf(docFreq=30841, maxDocs=44218)
          0.0390625 = fieldNorm(doc=3223)
  0.14285715 = coord(1/7)
```
Abstract

The aim of the study was to test whether query expansion by approximate string matching methods is beneficial in retrieval from historical newspaper collections in a language rich with compounds and inflectional forms (Finnish). First, approximate string matching methods were used to generate lists of index words most similar to contemporary query terms in a digitized newspaper collection from the 1800s. Top index word variants were categorized to estimate the appropriate query expansion ranges in the retrieval test. Second, the effectiveness of approximate string matching methods, automatically generated inflectional forms, and their combinations was measured in a Cranfield-style test. Finally, a detailed topic-level analysis of test results was conducted. In the index of a historical newspaper collection, the occurrences of a word are typically spread across many linguistic and historical variants, along with optical character recognition (OCR) errors. All query expansion methods improved the baseline results. Extensive expansion of around 30 variants for each query word was required to achieve the highest performance improvement. Query expansion based on approximate string matching was superior to using the inflectional forms of the query words, showing that coverage of the different types of variation is more important than precision in handling one type of variation.

Theme

Semantisches Umfeld in Indexierung u. Retrieval
Symonds, M.; Bruza, P.; Zuccon, G.; Koopman, B.; Sitbon, L.; Turner, I.: Automatic query expansion : a structural linguistic perspective (2014) 0.00
```
0.0017663995 = product of:
  0.012364795 = sum of:
    0.012364795 = weight(_text_:in in 1338) [ClassicSimilarity], result of:
      0.012364795 = score(doc=1338,freq=14.0), product of:
        0.062193166 = queryWeight, product of:
          1.3602545 = idf(docFreq=30841, maxDocs=44218)
          0.045721713 = queryNorm
        0.19881277 = fieldWeight in 1338, product of:
          3.7416575 = tf(freq=14.0), with freq of:
            14.0 = termFreq=14.0
          1.3602545 = idf(docFreq=30841, maxDocs=44218)
          0.0390625 = fieldNorm(doc=1338)
  0.14285715 = coord(1/7)
```
Abstract

A user's query is considered to be an imprecise description of their information need. Automatic query expansion is the process of reformulating the original query with the goal of improving retrieval effectiveness. Many successful query expansion techniques model syntagmatic associations that infer two terms co-occur more often than by chance in natural language. However, structural linguistics relies on both syntagmatic and paradigmatic associations to deduce the meaning of a word. Given the success of dependency-based approaches to query expansion and the reliance on word meanings in the query formulation process, we argue that modeling both syntagmatic and paradigmatic information in the query expansion process improves retrieval effectiveness. This article develops and evaluates a new query expansion technique that is based on a formal, corpus-based model of word meaning that models syntagmatic and paradigmatic associations. We demonstrate that when sufficient statistical information exists, as in the case of longer queries, including paradigmatic information alone provides significant improvements in retrieval effectiveness across a wide variety of data sets. More generally, when our new query expansion approach is applied to large-scale web retrieval it demonstrates significant improvements in retrieval effectiveness over a strong baseline system, based on a commercial search engine.

Theme

Semantisches Umfeld in Indexierung u. Retrieval
Li, N.; Sun, J.: Improving Chinese term association from the linguistic perspective (2017) 0.00
```
0.0011330162 = product of:
  0.007931113 = sum of:
    0.007931113 = weight(_text_:in in 3381) [ClassicSimilarity], result of:
      0.007931113 = score(doc=3381,freq=4.0), product of:
        0.062193166 = queryWeight, product of:
          1.3602545 = idf(docFreq=30841, maxDocs=44218)
          0.045721713 = queryNorm
        0.12752387 = fieldWeight in 3381, product of:
          2.0 = tf(freq=4.0), with freq of:
            4.0 = termFreq=4.0
          1.3602545 = idf(docFreq=30841, maxDocs=44218)
          0.046875 = fieldNorm(doc=3381)
  0.14285715 = coord(1/7)
```
Abstract

The study aims to solve how to construct the semantic relations of specific domain terms by applying linguistic rules. The semantic structure analysis at the morpheme level was used for semantic measure, and a morpheme-based term association model was proposed by improving and combining the literal-based similarity algorithm and co-occurrence relatedness methods. This study provides a novel insight into the method of semantic analysis and calculation by morpheme parsing, and the proposed solution is feasible for the automatic association of compound terms. The results show that this approach could be used to construct appropriate term association and form a reasonable structural knowledge graph. However, due to linguistic differences, the viability and effectiveness of the use of our method in non-Chinese linguistic environments should be verified.

Theme

Semantisches Umfeld in Indexierung u. Retrieval

Colace, F.; Santo, M. De; Greco, L.; Napoletano, P.: Weighted word pairs for query expansion (2015) 0.00
```
9.346905E-4 = product of:
  0.0065428335 = sum of:
    0.0065428335 = weight(_text_:in in 2687) [ClassicSimilarity], result of:
      0.0065428335 = score(doc=2687,freq=2.0), product of:
        0.062193166 = queryWeight, product of:
          1.3602545 = idf(docFreq=30841, maxDocs=44218)
          0.045721713 = queryNorm
        0.10520181 = fieldWeight in 2687, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          1.3602545 = idf(docFreq=30841, maxDocs=44218)
          0.0546875 = fieldNorm(doc=2687)
  0.14285715 = coord(1/7)
```
Theme

Semantisches Umfeld in Indexierung u. Retrieval
