Search (7 results, page 1 of 1)

Järvelin, A.; Keskustalo, H.; Sormunen, E.; Saastamoinen, M.; Kettunen, K.: Information retrieval from historical newspaper collections in highly inflectional languages : a query expansion approach (2016) 0.02
```
0.018822905 = product of:
  0.09411452 = sum of:
    0.09411452 = weight(_text_:index in 3223) [ClassicSimilarity], result of:
      0.09411452 = score(doc=3223,freq=6.0), product of:
        0.2250935 = queryWeight, product of:
          4.369764 = idf(docFreq=1520, maxDocs=44218)
          0.051511593 = queryNorm
        0.418113 = fieldWeight in 3223, product of:
          2.4494898 = tf(freq=6.0), with freq of:
            6.0 = termFreq=6.0
          4.369764 = idf(docFreq=1520, maxDocs=44218)
          0.0390625 = fieldNorm(doc=3223)
  0.2 = coord(1/5)
```
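The breakdown above can be reproduced by hand. The following is a minimal sketch, assuming the usual ClassicSimilarity definitions (tf as the square root of the term frequency, idf as 1 + ln(maxDocs / (docFreq + 1))); the function name `classic_score` and its parameters are illustrative, not part of any Lucene API, and `queryNorm`, `fieldNorm`, and `coord` are simply copied from the explanation rather than derived.

```python
import math


def classic_score(freq, doc_freq, max_docs, field_norm, query_norm, coord):
    """Recompute a single-term score from the values shown in an explain tree.

    Assumes the classic TF-IDF formulas; query_norm, field_norm and coord
    are taken as given from the explanation output.
    """
    tf = math.sqrt(freq)                              # tf(freq=6.0) -> 2.4494898
    idf = 1.0 + math.log(max_docs / (doc_freq + 1))   # idf(docFreq=1520, maxDocs=44218)
    query_weight = idf * query_norm                   # queryWeight
    field_weight = tf * idf * field_norm              # fieldWeight
    return query_weight * field_weight * coord        # sum of one term, times coord(1/5)


score = classic_score(
    freq=6.0,
    doc_freq=1520,
    max_docs=44218,
    field_norm=0.0390625,
    query_norm=0.051511593,
    coord=1 / 5,
)
print(f"{score:.6f}")  # close to the 0.018822905 reported for doc 3223
```

Lucene computes these values in 32-bit floats, so the last digits may differ slightly from a double-precision recomputation.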
Abstract

The aim of the study was to test whether query expansion by approximate string matching methods is beneficial in retrieval from historical newspaper collections in a language rich with compounds and inflectional forms (Finnish). First, approximate string matching methods were used to generate lists of index words most similar to contemporary query terms in a digitized newspaper collection from the 1800s. Top index word variants were categorized to estimate the appropriate query expansion ranges in the retrieval test. Second, the effectiveness of approximate string matching methods, automatically generated inflectional forms, and their combinations were measured in a Cranfield-style test. Finally, a detailed topic-level analysis of test results was conducted. In the index of historical newspaper collection the occurrences of a word typically spread to many linguistic and historical variants along with optical character recognition (OCR) errors. All query expansion methods improved the baseline results. Extensive expansion of around 30 variants for each query word was required to achieve the highest performance improvement. Query expansion based on approximate string matching was superior to using the inflectional forms of the query words, showing that coverage of the different types of variation is more important than precision in handling one type of variation.
Radev, D.R.; Joseph, M.T.; Gibson, B.; Muthukrishnan, P.: A bibliometric and network analysis of the field of computational linguistics (2016) 0.02
```
0.015214371 = product of:
  0.07607185 = sum of:
    0.07607185 = weight(_text_:index in 2764) [ClassicSimilarity], result of:
      0.07607185 = score(doc=2764,freq=2.0), product of:
        0.2250935 = queryWeight, product of:
          4.369764 = idf(docFreq=1520, maxDocs=44218)
          0.051511593 = queryNorm
        0.33795667 = fieldWeight in 2764, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          4.369764 = idf(docFreq=1520, maxDocs=44218)
          0.0546875 = fieldNorm(doc=2764)
  0.2 = coord(1/5)
```
Abstract

The ACL Anthology is a large collection of research papers in computational linguistics. Citation data were obtained using text extraction from a collection of PDF files with significant manual postprocessing performed to clean up the results. Manual annotation of the references was then performed to complete the citation network. We analyzed the networks of paper citations, author citations, and author collaborations in an attempt to identify the most central papers and authors. The analysis includes general network statistics, PageRank, metrics across publication years and venues, the impact factor and h-index, as well as other measures.
Vlachidis, A.; Binding, C.; Tudhope, D.; May, K.: Excavating grey literature : a case study on the rich indexing of archaeological documents via natural language-processing techniques and knowledge-based resources (2010) 0.01
```
0.008693925 = product of:
  0.043469626 = sum of:
    0.043469626 = weight(_text_:index in 3948) [ClassicSimilarity], result of:
      0.043469626 = score(doc=3948,freq=2.0), product of:
        0.2250935 = queryWeight, product of:
          4.369764 = idf(docFreq=1520, maxDocs=44218)
          0.051511593 = queryNorm
        0.1931181 = fieldWeight in 3948, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          4.369764 = idf(docFreq=1520, maxDocs=44218)
          0.03125 = fieldNorm(doc=3948)
  0.2 = coord(1/5)
```
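Docs 2764 and 3948 both match "index" with freq=2.0, so their scores differ only through fieldNorm (0.0546875 versus 0.03125, a ratio of 1.75, the same as the ratio of the two scores). As a rough sketch, assuming ClassicSimilarity's default length normalization of 1/sqrt(number of terms) and no index-time boost, fieldNorm can be inverted to estimate the field length. The helper `approx_field_length` is hypothetical, and because the stored norm is quantized to a single byte the result is only an order-of-magnitude estimate.

```python
def approx_field_length(field_norm: float, boost: float = 1.0) -> float:
    """Invert field_norm = boost / sqrt(num_terms).

    Assumption: default length normalization and the given boost;
    the stored norm is lossy, so treat the result as approximate.
    """
    return (boost / field_norm) ** 2


# fieldNorm values copied from the explanations for docs 2764 and 3948
for doc_id, norm in [(2764, 0.0546875), (3948, 0.03125)]:
    print(doc_id, round(approx_field_length(norm)))
```

Under these assumptions the lower-ranked record has the longer matched field, which is why the same term frequency yields a smaller score.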
Abstract

Purpose - This paper sets out to discuss the use of information extraction (IE), a natural language-processing (NLP) technique to assist "rich" semantic indexing of diverse archaeological text resources. The focus of the research is to direct a semantic-aware "rich" indexing of diverse natural language resources with properties capable of satisfying information retrieval from online publications and datasets associated with the Semantic Technologies for Archaeological Resources (STAR) project.

Design/methodology/approach - The paper proposes use of the English Heritage extension (CRM-EH) of the standard core ontology in cultural heritage, CIDOC CRM, and exploitation of domain thesauri resources for driving and enhancing an Ontology-Oriented Information Extraction process. The process of semantic indexing is based on a rule-based Information Extraction technique, which is facilitated by the General Architecture of Text Engineering (GATE) toolkit and expressed by Java Annotation Pattern Engine (JAPE) rules.

Findings - Initial results suggest that the combination of information extraction with knowledge resources and standard conceptual models is capable of supporting semantic-aware term indexing. Additional efforts are required for further exploitation of the technique and adoption of formal evaluation methods for assessing the performance of the method in measurable terms.

Originality/value - The value of the paper lies in the semantic indexing of 535 unpublished online documents often referred to as "Grey Literature", from the Archaeological Data Service OASIS corpus (Online AccesS to the Index of archaeological investigationS), with respect to the CRM ontological concepts E49.Time Appellation and P19.Physical Object.

Huo, W.: Automatic multi-word term extraction and its application to Web-page summarization (2012) 0.01

```
0.008374932 = product of:
  0.04187466 = sum of:
    0.04187466 = weight(_text_:22 in 563) [ClassicSimilarity], result of:
      0.04187466 = score(doc=563,freq=2.0), product of:
        0.18038483 = queryWeight, product of:
          3.5018296 = idf(docFreq=3622, maxDocs=44218)
          0.051511593 = queryNorm
        0.23214069 = fieldWeight in 563, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          3.5018296 = idf(docFreq=3622, maxDocs=44218)
          0.046875 = fieldNorm(doc=563)
  0.2 = coord(1/5)
```

Date: 10. 1.2013 19:22:47

Lawrie, D.; Mayfield, J.; McNamee, P.; Oard, P.W.: Cross-language person-entity linking from 20 languages (2015) 0.01
```
0.008374932 = product of:
  0.04187466 = sum of:
    0.04187466 = weight(_text_:22 in 1848) [ClassicSimilarity], result of:
      0.04187466 = score(doc=1848,freq=2.0), product of:
        0.18038483 = queryWeight, product of:
          3.5018296 = idf(docFreq=3622, maxDocs=44218)
          0.051511593 = queryNorm
        0.23214069 = fieldWeight in 1848, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          3.5018296 = idf(docFreq=3622, maxDocs=44218)
          0.046875 = fieldNorm(doc=1848)
  0.2 = coord(1/5)
```
Abstract

The goal of entity linking is to associate references to an entity that is found in unstructured natural language content to an authoritative inventory of known entities. This article describes the construction of 6 test collections for cross-language person-entity linking that together span 22 languages. Fully automated components were used together with 2 crowdsourced validation stages to affordably generate ground-truth annotations with an accuracy comparable to that of a completely manual process. The resulting test collections each contain between 642 (Arabic) and 2,361 (Romanian) person references in non-English texts for which the correct resolution in English Wikipedia is known, plus a similar number of references for which no correct resolution into English Wikipedia is believed to exist. Fully automated cross-language person-name linking experiments with 20 non-English languages yielded a resolution accuracy of between 0.84 (Serbian) and 0.98 (Romanian), which compares favorably with previously reported cross-language entity linking results for Spanish.

Fóris, A.: Network theory and terminology (2013) 0.01

```
0.00697911 = product of:
  0.03489555 = sum of:
    0.03489555 = weight(_text_:22 in 1365) [ClassicSimilarity], result of:
      0.03489555 = score(doc=1365,freq=2.0), product of:
        0.18038483 = queryWeight, product of:
          3.5018296 = idf(docFreq=3622, maxDocs=44218)
          0.051511593 = queryNorm
        0.19345059 = fieldWeight in 1365, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          3.5018296 = idf(docFreq=3622, maxDocs=44218)
          0.0390625 = fieldNorm(doc=1365)
  0.2 = coord(1/5)
```

Date: 2. 9.2014 21:22:48

Deventer, J.P. van; Kruger, C.J.; Johnson, R.D.: Delineating knowledge management through lexical analysis : a retrospective (2015) 0.00

```
0.004885377 = product of:
  0.024426885 = sum of:
    0.024426885 = weight(_text_:22 in 3807) [ClassicSimilarity], result of:
      0.024426885 = score(doc=3807,freq=2.0), product of:
        0.18038483 = queryWeight, product of:
          3.5018296 = idf(docFreq=3622, maxDocs=44218)
          0.051511593 = queryNorm
        0.1354154 = fieldWeight in 3807, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          3.5018296 = idf(docFreq=3622, maxDocs=44218)
          0.02734375 = fieldNorm(doc=3807)
  0.2 = coord(1/5)
```

Date: 20. 1.2015 18:30:22
