Search (17 results, page 1 of 1)

Huo, W.: Automatic multi-word term extraction and its application to Web-page summarization (2012) 0.12

0.11977936 = product of:
  0.23955873 = sum of:
    0.2207295 = weight(_text_:2f in 563) [ClassicSimilarity], result of:
      0.2207295 = score(doc=563,freq=2.0), product of:
        0.3927445 = queryWeight, product of:
          8.478011 = idf(docFreq=24, maxDocs=44218)
          0.046325076 = queryNorm
        0.56201804 = fieldWeight in 563, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          8.478011 = idf(docFreq=24, maxDocs=44218)
          0.046875 = fieldNorm(doc=563)
    0.018829225 = product of:
      0.03765845 = sum of:
        0.03765845 = weight(_text_:22 in 563) [ClassicSimilarity], result of:
          0.03765845 = score(doc=563,freq=2.0), product of:
            0.16222252 = queryWeight, product of:
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.046325076 = queryNorm
            0.23214069 = fieldWeight in 563, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.046875 = fieldNorm(doc=563)
      0.5 = coord(1/2)
  0.5 = coord(2/4)

Content: A thesis presented to the University of Guelph in partial fulfilment of requirements for the degree of Master of Science in Computer Science. Cf.: http://www.inf.ufrgs.br%2F~ceramisch%2Fdownload_files%2Fpublications%2F2009%2Fp01.pdf.
Date: 10. 1.2013 19:22:47
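
The score breakdown for this first hit can be checked by hand. The following is a minimal sketch, assuming the classic Lucene TF-IDF formulas (tf = sqrt(freq), idf = 1 + ln(maxDocs / (docFreq + 1))); the function names are illustrative and not part of any library, and the `queryNorm` and `fieldNorm` values are copied from the listing rather than derived.

```python
import math

def idf(doc_freq, max_docs):
    # Classic TF-IDF inverse document frequency: 1 + ln(maxDocs / (docFreq + 1)).
    return 1.0 + math.log(max_docs / (doc_freq + 1))

def term_score(freq, doc_freq, max_docs, query_norm, field_norm):
    # score = queryWeight * fieldWeight, following the explain tree in the listing.
    tf = math.sqrt(freq)
    term_idf = idf(doc_freq, max_docs)
    query_weight = term_idf * query_norm
    field_weight = tf * term_idf * field_norm
    return query_weight * field_weight

# Values copied from the explain output for doc 563 (assumed inputs, not derived).
MAX_DOCS = 44218
QUERY_NORM = 0.046325076
FIELD_NORM = 0.046875

score_2f = term_score(2.0, 24, MAX_DOCS, QUERY_NORM, FIELD_NORM)
# The "22" clause sits in a nested query where 1 of 2 clauses matched: coord(1/2).
score_22 = term_score(2.0, 3622, MAX_DOCS, QUERY_NORM, FIELD_NORM) * 0.5
# 2 of 4 top-level clauses matched: coord(2/4).
total = (score_2f + score_22) * 0.5

print(f"2f clause: {score_2f:.6f}")
print(f"22 clause: {score_22:.6f}")
print(f"total:     {total:.6f}")
```

Lucene computes these values in 32-bit floats, so the last digits may differ slightly from the listing. The same steps apply to the other hits, where only one of the four top-level clauses matches and the outer factor becomes coord(1/4).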

Deventer, J.P. van; Kruger, C.J.; Johnson, R.D.: Delineating knowledge management through lexical analysis : a retrospective (2015) 0.02
```
0.019734079 = product of:
  0.039468158 = sum of:
    0.028484445 = weight(_text_:social in 3807) [ClassicSimilarity], result of:
      0.028484445 = score(doc=3807,freq=2.0), product of:
        0.1847249 = queryWeight, product of:
          3.9875789 = idf(docFreq=2228, maxDocs=44218)
          0.046325076 = queryNorm
        0.15419927 = fieldWeight in 3807, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          3.9875789 = idf(docFreq=2228, maxDocs=44218)
          0.02734375 = fieldNorm(doc=3807)
    0.010983714 = product of:
      0.021967428 = sum of:
        0.021967428 = weight(_text_:22 in 3807) [ClassicSimilarity], result of:
          0.021967428 = score(doc=3807,freq=2.0), product of:
            0.16222252 = queryWeight, product of:
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.046325076 = queryNorm
            0.1354154 = fieldWeight in 3807, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.02734375 = fieldNorm(doc=3807)
      0.5 = coord(1/2)
  0.5 = coord(2/4)
```
Abstract

Research limitations/implications: In total, 42 definitions were identified spanning a period of 11 years. This represented the first use of KM through the estimated apex of terms used. From 2006 onwards definitions were used in repetition, and all definitions that were considered to repeat were therefore excluded as not being unique instances. The definitions listed are by no means complete and exhaustive. The definitions are viewed outside the scope and context in which they were originally formulated and then used to review the key concepts in the definitions themselves. Social implications: When the authors refer to the aforementioned discussion of KM content as well as the presentation of the method followed in this paper, the authors may have a few implications for future research in KM. First, the research validates ideas presented by the OECD in 2005 pertaining to KM. It also validates that through the evolution of KM, the authors ended with a description of KM that may be seen as a standardised description. If the authors as academics and practitioners, for example, refer to KM as the same construct and/or idea, it has the potential to, speculatively, distinguish between what KM may or may not be. Originality/value: By simplifying the term used to define KM, by focusing on the most common definitions, the paper assists in refocusing KM by reconsidering the dimensions that are the most common in how it has been defined over time. This would hopefully assist in reigniting discussions about KM and how it may be used to the benefit of an organisation.

Date: 20. 1.2015 18:30:22

Lian, T.; Yu, C.; Wang, W.; Yuan, Q.; Hou, Z.: Doctoral dissertations on tourism in China : a co-word analysis (2016) 0.02
```
0.017620182 = product of:
  0.07048073 = sum of:
    0.07048073 = weight(_text_:social in 3178) [ClassicSimilarity], result of:
      0.07048073 = score(doc=3178,freq=6.0), product of:
        0.1847249 = queryWeight, product of:
          3.9875789 = idf(docFreq=2228, maxDocs=44218)
          0.046325076 = queryNorm
        0.3815443 = fieldWeight in 3178, product of:
          2.4494898 = tf(freq=6.0), with freq of:
            6.0 = termFreq=6.0
          3.9875789 = idf(docFreq=2228, maxDocs=44218)
          0.0390625 = fieldNorm(doc=3178)
  0.25 = coord(1/4)
```
Abstract

The aim of this paper is to map the foci of research in doctoral dissertations on tourism in China. In the paper, coword analysis is applied, with keywords coming from six public dissertation databases, i.e. CDFD, Wanfang Data, NLC, CALIS, ISTIC, and NSTL, as well as some university libraries providing doctoral dissertations on tourism. Altogether we have examined 928 doctoral dissertations on tourism written between 1989 and 2013. Doctoral dissertations on tourism in China involve 36 first level disciplines and 102 secondary level disciplines. We collect the top 68 keywords of practical significance in tourism which are mentioned at least four times or more. These keywords are classified into 12 categories based on co-word analysis, including cluster analysis, strategic diagrams analysis, and social network analysis. According to the strategic diagram of the 12 categories, we find the mature and immature areas in tourism study. From social networks, we can see the social network maps of original co-occurrence matrix and k-cores analysis of binary matrix. The paper provides valuable insight into the study of tourism by analyzing doctoral dissertations on tourism in China.
Vechtomova, O.: A method for automatic extraction of multiword units representing business aspects from user reviews (2014) 0.02
```
0.015684258 = product of:
  0.06273703 = sum of:
    0.06273703 = product of:
      0.12547407 = sum of:
        0.12547407 = weight(_text_:aspects in 1304) [ClassicSimilarity], result of:
          0.12547407 = score(doc=1304,freq=8.0), product of:
            0.20938325 = queryWeight, product of:
              4.5198684 = idf(docFreq=1308, maxDocs=44218)
              0.046325076 = queryNorm
            0.5992555 = fieldWeight in 1304, product of:
              2.828427 = tf(freq=8.0), with freq of:
                8.0 = termFreq=8.0
              4.5198684 = idf(docFreq=1308, maxDocs=44218)
              0.046875 = fieldNorm(doc=1304)
      0.5 = coord(1/2)
  0.25 = coord(1/4)
```
Abstract

The article describes a semi-supervised approach to extracting multiword aspects of user-written reviews that belong to a given category. The method starts with a small set of seed words, representing the target category, and calculates distributional similarity between the candidate and seed words. We compare 3 distributional similarity measures (Lin's, Weeds's, and balAPinc), and a document retrieval function, BM25, adapted as a word similarity measure. We then introduce a method for identifying multiword aspects by using a combination of syntactic rules and a co-occurrence association measure. Finally, we describe a method for ranking multiword aspects by the likelihood of belonging to the target aspect category. The task used for evaluation is extraction of restaurant dish names from a corpus of restaurant reviews.
Vasalou, A.; Gill, A.J.; Mazanderani, F.; Papoutsi, C.; Joinson, A.: Privacy dictionary : a new resource for the automated content analysis of privacy (2011) 0.01
```
0.01220762 = product of:
  0.04883048 = sum of:
    0.04883048 = weight(_text_:social in 4915) [ClassicSimilarity], result of:
      0.04883048 = score(doc=4915,freq=2.0), product of:
        0.1847249 = queryWeight, product of:
          3.9875789 = idf(docFreq=2228, maxDocs=44218)
          0.046325076 = queryNorm
        0.26434162 = fieldWeight in 4915, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          3.9875789 = idf(docFreq=2228, maxDocs=44218)
          0.046875 = fieldNorm(doc=4915)
  0.25 = coord(1/4)
```
Abstract

This article presents the privacy dictionary, a new linguistic resource for automated content analysis on privacy-related texts. To overcome the definitional challenges inherent in privacy research, the dictionary was informed by an inclusive set of relevant theoretical perspectives. Using methods from corpus linguistics, we constructed and validated eight dictionary categories on empirical material from a wide range of privacy-sensitive contexts. It was shown that the dictionary categories are able to measure unique linguistic patterns within privacy discussions. At a time when privacy considerations are increasing and online resources provide ever-growing quantities of textual data, the privacy dictionary can play a significant role not only for research in the social sciences but also in technology design and policymaking.
Altmann, E.G.; Cristadoro, G.; Esposti, M.D.: On the origin of long-range correlations in texts (2012) 0.01
```
0.01220762 = product of:
  0.04883048 = sum of:
    0.04883048 = weight(_text_:social in 330) [ClassicSimilarity], result of:
      0.04883048 = score(doc=330,freq=2.0), product of:
        0.1847249 = queryWeight, product of:
          3.9875789 = idf(docFreq=2228, maxDocs=44218)
          0.046325076 = queryNorm
        0.26434162 = fieldWeight in 330, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          3.9875789 = idf(docFreq=2228, maxDocs=44218)
          0.046875 = fieldNorm(doc=330)
  0.25 = coord(1/4)
```
Abstract

The complexity of human interactions with social and natural phenomena is mirrored in the way we describe our experiences through natural language. In order to retain and convey such high-dimensional information, the statistical properties of our linguistic output have to be highly correlated in time. Examples are the robust observations, still largely not understood, of correlations on arbitrarily long scales in literary texts. In this paper we explain how long-range correlations flow from highly structured linguistic levels down to the building blocks of a text (words, letters, etc.). By combining calculations and data analysis we show that correlations take the form of a bursty sequence of events once we approach the semantically relevant topics of the text. The mechanisms we identify are fairly general and can be equally applied to other hierarchical settings.
AL-Smadi, M.; Jaradat, Z.; AL-Ayyoub, M.; Jararweh, Y.: Paraphrase identification and semantic text similarity analysis in Arabic news tweets using lexical, syntactic, and semantic features (2017) 0.01
```
0.01220762 = product of:
  0.04883048 = sum of:
    0.04883048 = weight(_text_:social in 5095) [ClassicSimilarity], result of:
      0.04883048 = score(doc=5095,freq=2.0), product of:
        0.1847249 = queryWeight, product of:
          3.9875789 = idf(docFreq=2228, maxDocs=44218)
          0.046325076 = queryNorm
        0.26434162 = fieldWeight in 5095, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          3.9875789 = idf(docFreq=2228, maxDocs=44218)
          0.046875 = fieldNorm(doc=5095)
  0.25 = coord(1/4)
```
Abstract

The rapid growth in digital information has raised considerable challenges, in particular when it comes to automated content analysis. Social media such as Twitter share a lot of their users' information about their events, opinions, personalities, etc. Paraphrase Identification (PI) is concerned with recognizing whether two texts have the same or similar meaning, whereas Semantic Text Similarity (STS) is concerned with the degree of that similarity. This research proposes a state-of-the-art approach for paraphrase identification and semantic text similarity analysis in Arabic news tweets. The approach adopts several phases of text processing, feature extraction and text classification. Lexical, syntactic, and semantic features are extracted to overcome the weaknesses and limitations of the current technologies in solving these tasks for the Arabic language. Maximum Entropy (MaxEnt) and Support Vector Regression (SVR) classifiers are trained using these features and are evaluated using a dataset prepared for this research. The experimentation results show that the approach achieves good results in comparison to the baseline results.
Hoenkamp, E.; Bruza, P.: How everyday language can and will boost effective information retrieval (2015) 0.01
```
0.010173016 = product of:
  0.040692065 = sum of:
    0.040692065 = weight(_text_:social in 2123) [ClassicSimilarity], result of:
      0.040692065 = score(doc=2123,freq=2.0), product of:
        0.1847249 = queryWeight, product of:
          3.9875789 = idf(docFreq=2228, maxDocs=44218)
          0.046325076 = queryNorm
        0.22028469 = fieldWeight in 2123, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          3.9875789 = idf(docFreq=2228, maxDocs=44218)
          0.0390625 = fieldNorm(doc=2123)
  0.25 = coord(1/4)
```
Abstract

Typing 2 or 3 keywords into a browser has become an easy and efficient way to find information. Yet, typing even short queries becomes tedious on ever shrinking (virtual) keyboards. Meanwhile, speech processing is maturing rapidly, facilitating everyday language input. Also, wearable technology can inform users proactively by listening in on their conversations or processing their social media interactions. Given these developments, everyday language may soon become the new input of choice. We present an information retrieval (IR) algorithm specifically designed to accept everyday language. It integrates two paradigms of information retrieval, previously studied in isolation: one directed mainly at the surface structure of language, the other primarily at the underlying meaning. The integration was achieved by a Markov machine that encodes meaning by its transition graph, and surface structure by the language it generates. A rigorous evaluation of the approach showed, first, that it can compete with the quality of existing language models, second, that it is more effective the more verbose the input, and third, as a consequence, that it is promising for an imminent transition from keyword input, where the onus is on the user to formulate concise queries, to a modality where users can express their need for information more freely, more informally, and more naturally in everyday language.
Luo, Z.; Yu, Y.; Osborne, M.; Wang, T.: Structuring tweets for improving Twitter search (2015) 0.01
```
0.010173016 = product of:
  0.040692065 = sum of:
    0.040692065 = weight(_text_:social in 2335) [ClassicSimilarity], result of:
      0.040692065 = score(doc=2335,freq=2.0), product of:
        0.1847249 = queryWeight, product of:
          3.9875789 = idf(docFreq=2228, maxDocs=44218)
          0.046325076 = queryNorm
        0.22028469 = fieldWeight in 2335, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          3.9875789 = idf(docFreq=2228, maxDocs=44218)
          0.0390625 = fieldNorm(doc=2335)
  0.25 = coord(1/4)
```
Abstract

Spam and wildly varying documents make searching in Twitter challenging. Most Twitter search systems treat a Tweet as plain text when modeling relevance. However, a series of conventions allows users to Tweet in structural ways using a combination of different blocks of text. These blocks include plain text, hashtags, links, mentions, etc. Each block encodes a variety of communicative intent and the sequence of these blocks captures changing discourse. Previous work shows that exploiting structural information can improve the retrieval of structured documents (e.g., web pages). In this study we utilize the structure of Tweets, induced by these blocks, for Twitter retrieval and Twitter opinion retrieval. For Twitter retrieval, a set of features, derived from the blocks of text and their combinations, is used in a learning-to-rank scenario. We show that structuring Tweets can achieve state-of-the-art performance. Our approach does not rely on social media features, but when we do add this additional information, performance improves significantly. For Twitter opinion retrieval, we explore the question of whether structural information derived from the body of Tweets and opinionatedness ratings of Tweets can improve performance. Experimental results show that retrieval using a novel unsupervised opinionatedness feature based on structuring Tweets achieves comparable performance with a supervised method using manually tagged Tweets. Topic-related specific structured Tweet sets are shown to help with query-dependent opinion retrieval.
Helbig, H.: Knowledge representation and the semantics of natural language (2014) 0.01
```
0.010173016 = product of:
  0.040692065 = sum of:
    0.040692065 = weight(_text_:social in 2396) [ClassicSimilarity], result of:
      0.040692065 = score(doc=2396,freq=2.0), product of:
        0.1847249 = queryWeight, product of:
          3.9875789 = idf(docFreq=2228, maxDocs=44218)
          0.046325076 = queryNorm
        0.22028469 = fieldWeight in 2396, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          3.9875789 = idf(docFreq=2228, maxDocs=44218)
          0.0390625 = fieldNorm(doc=2396)
  0.25 = coord(1/4)
```
Abstract

Natural language is not only the most important means of communication between human beings, it is also used over historical periods for the preservation of cultural achievements and their transmission from one generation to the other. During the last few decades, the flood of digitalized information has been growing tremendously. This tendency will continue with the globalisation of information societies and with the growing importance of national and international computer networks. This is one reason why the theoretical understanding and the automated treatment of communication processes based on natural language have such a decisive social and economic impact. In this context, the semantic representation of knowledge originally formulated in natural language plays a central part, because it connects all components of natural language processing systems, be they the automatic understanding of natural language (analysis), the rational reasoning over knowledge bases, or the generation of natural language expressions from formal representations. This book presents a method for the semantic representation of natural language expressions (texts, sentences, phrases, etc.) which can be used as a universal knowledge representation paradigm in the human sciences, like linguistics, cognitive psychology, or philosophy of language, as well as in computational linguistics and in artificial intelligence. It is also an attempt to close the gap between these disciplines, which to a large extent are still working separately.
Anguiano Peña, G.; Naumis Peña, C.: Method for selecting specialized terms from a general language corpus (2015) 0.01
```
0.007842129 = product of:
  0.031368516 = sum of:
    0.031368516 = product of:
      0.06273703 = sum of:
        0.06273703 = weight(_text_:aspects in 2196) [ClassicSimilarity], result of:
          0.06273703 = score(doc=2196,freq=2.0), product of:
            0.20938325 = queryWeight, product of:
              4.5198684 = idf(docFreq=1308, maxDocs=44218)
              0.046325076 = queryNorm
            0.29962775 = fieldWeight in 2196, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              4.5198684 = idf(docFreq=1308, maxDocs=44218)
              0.046875 = fieldNorm(doc=2196)
      0.5 = coord(1/2)
  0.25 = coord(1/4)
```
Abstract

Among the many aspects studied by library and information science are linguistic phenomena associated with document content analysis, for purposes of both information organization and retrieval. To this end, terms used in scientific and technical language must be recovered and their area of domain and behavior studied. Through language, society controls the knowledge available to people. Document content analysis, in this case of scientific texts, facilitates gathering knowledge of lexical units and their major applications and separating such specialized terms from the general language, to create indexing languages. The model presented here or other lexicographic resources with similar characteristics may be useful in the near future, in computer-assisted indexing or as corpora monitors, with respect to new text analyses or specialized corpora. Thus, using techniques for document content analysis of a lexicographically labeled general language corpus proposed herein, components which enable the extraction of lexical units from specialized language may be obtained and characterized.
Multi-source, multilingual information extraction and summarization (2013) 0.01
```
0.0065351077 = product of:
  0.026140431 = sum of:
    0.026140431 = product of:
      0.052280862 = sum of:
        0.052280862 = weight(_text_:aspects in 978) [ClassicSimilarity], result of:
          0.052280862 = score(doc=978,freq=2.0), product of:
            0.20938325 = queryWeight, product of:
              4.5198684 = idf(docFreq=1308, maxDocs=44218)
              0.046325076 = queryNorm
            0.2496898 = fieldWeight in 978, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              4.5198684 = idf(docFreq=1308, maxDocs=44218)
              0.0390625 = fieldNorm(doc=978)
      0.5 = coord(1/2)
  0.25 = coord(1/4)
```
Abstract

Information extraction (IE) and text summarization (TS) are powerful technologies for finding relevant pieces of information in text and presenting them to the user in condensed form. The ongoing information explosion makes IE and TS critical for successful functioning within the information society. These technologies face particular challenges due to the inherent multi-source nature of the information explosion. The technologies must now handle not isolated texts or individual narratives, but rather large-scale repositories and streams (in general, in multiple languages) containing a multiplicity of perspectives, opinions, or commentaries on particular topics, entities or events. There is thus a need to adapt existing techniques and develop new ones to deal with these challenges. This volume contains a selection of papers that present a variety of methodologies for content identification and extraction, as well as for content fusion and regeneration. The chapters cover various aspects of the challenges, depending on the nature of the information sought (names vs. events) and the nature of the sources (news streams vs. image captions vs. scientific research papers, etc.). This volume aims to offer a broad and representative sample of studies from this very active research field.
Gencosman, B.C.; Ozmutlu, H.C.; Ozmutlu, S.: Character n-gram application for automatic new topic identification (2014) 0.01
```
0.0065351077 = product of:
  0.026140431 = sum of:
    0.026140431 = product of:
      0.052280862 = sum of:
        0.052280862 = weight(_text_:aspects in 2688) [ClassicSimilarity], result of:
          0.052280862 = score(doc=2688,freq=2.0), product of:
            0.20938325 = queryWeight, product of:
              4.5198684 = idf(docFreq=1308, maxDocs=44218)
              0.046325076 = queryNorm
            0.2496898 = fieldWeight in 2688, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              4.5198684 = idf(docFreq=1308, maxDocs=44218)
              0.0390625 = fieldNorm(doc=2688)
      0.5 = coord(1/2)
  0.25 = coord(1/4)
```
Abstract

The widespread availability of the Internet and the variety of Internet-based applications have resulted in a significant increase in the number of web pages. Determining the behaviors of search engine users has become a critical step in enhancing search engine performance. Search engine user behaviors can be determined by content-based or content-ignorant algorithms. Although many content-ignorant studies have been performed to automatically identify new topics, previous results have demonstrated that spelling errors can cause significant errors in topic shift estimates. In this study, we focused on minimizing the number of wrong estimates that were based on spelling errors. We developed a new hybrid algorithm combining character n-gram and neural network methodologies, and compared the experimental results with results from previous studies. For the FAST and Excite datasets, the proposed algorithm improved topic shift estimates by 6.987% and 2.639%, respectively. Moreover, we analyzed the performance of the character n-gram method in different aspects, including a comparison with the Levenshtein edit-distance method. The experimental results demonstrated that the character n-gram method outperformed the Levenshtein edit-distance method in terms of topic identification.

Lezius, W.: Morphy - Morphologie und Tagging für das Deutsche (2013) 0.01

0.006276408 = product of:
  0.025105633 = sum of:
    0.025105633 = product of:
      0.050211266 = sum of:
        0.050211266 = weight(_text_:22 in 1490) [ClassicSimilarity], result of:
          0.050211266 = score(doc=1490,freq=2.0), product of:
            0.16222252 = queryWeight, product of:
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.046325076 = queryNorm
            0.30952093 = fieldWeight in 1490, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.0625 = fieldNorm(doc=1490)
      0.5 = coord(1/2)
  0.25 = coord(1/4)

Date: 22. 3.2015 9:30:24

Lawrie, D.; Mayfield, J.; McNamee, P.; Oard, P.W.: Cross-language person-entity linking from 20 languages (2015) 0.00
```
0.004707306 = product of:
  0.018829225 = sum of:
    0.018829225 = product of:
      0.03765845 = sum of:
        0.03765845 = weight(_text_:22 in 1848) [ClassicSimilarity], result of:
          0.03765845 = score(doc=1848,freq=2.0), product of:
            0.16222252 = queryWeight, product of:
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.046325076 = queryNorm
            0.23214069 = fieldWeight in 1848, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.046875 = fieldNorm(doc=1848)
      0.5 = coord(1/2)
  0.25 = coord(1/4)
```
Abstract

The goal of entity linking is to associate references to an entity that is found in unstructured natural language content to an authoritative inventory of known entities. This article describes the construction of 6 test collections for cross-language person-entity linking that together span 22 languages. Fully automated components were used together with 2 crowdsourced validation stages to affordably generate ground-truth annotations with an accuracy comparable to that of a completely manual process. The resulting test collections each contain between 642 (Arabic) and 2,361 (Romanian) person references in non-English texts for which the correct resolution in English Wikipedia is known, plus a similar number of references for which no correct resolution into English Wikipedia is believed to exist. Fully automated cross-language person-name linking experiments with 20 non-English languages yielded a resolution accuracy of between 0.84 (Serbian) and 0.98 (Romanian), which compares favorably with previously reported cross-language entity linking results for Spanish.

Fóris, A.: Network theory and terminology (2013) 0.00

0.0039227554 = product of:
  0.015691021 = sum of:
    0.015691021 = product of:
      0.031382043 = sum of:
        0.031382043 = weight(_text_:22 in 1365) [ClassicSimilarity], result of:
          0.031382043 = score(doc=1365,freq=2.0), product of:
            0.16222252 = queryWeight, product of:
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.046325076 = queryNorm
            0.19345059 = fieldWeight in 1365, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.0390625 = fieldNorm(doc=1365)
      0.5 = coord(1/2)
  0.25 = coord(1/4)

Date: 2. 9.2014 21:22:48

Rötzer, F.: KI-Programm besser als Menschen im Verständnis natürlicher Sprache (2018) 0.00

0.003138204 = product of:
  0.012552816 = sum of:
    0.012552816 = product of:
      0.025105633 = sum of:
        0.025105633 = weight(_text_:22 in 4217) [ClassicSimilarity], result of:
          0.025105633 = score(doc=4217,freq=2.0), product of:
            0.16222252 = queryWeight, product of:
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.046325076 = queryNorm
            0.15476047 = fieldWeight in 4217, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.03125 = fieldNorm(doc=4217)
      0.5 = coord(1/2)
  0.25 = coord(1/4)

Date: 22. 1.2018 11:32:44
