Search (24 results, page 1 of 2)

  • × theme_ss:"Computerlinguistik"
  • × year_i:[2010 TO 2020}
  1. Fóris, A.: Network theory and terminology (2013) 0.03
    0.027774185 = product of:
      0.083322555 = sum of:
        0.083322555 = product of:
          0.124983825 = sum of:
            0.095468804 = weight(_text_:network in 1365) [ClassicSimilarity], result of:
              0.095468804 = score(doc=1365,freq=8.0), product of:
                0.19402927 = queryWeight, product of:
                  4.4533744 = idf(docFreq=1398, maxDocs=44218)
                  0.043569047 = queryNorm
                0.492033 = fieldWeight in 1365, product of:
                  2.828427 = tf(freq=8.0), with freq of:
                    8.0 = termFreq=8.0
                  4.4533744 = idf(docFreq=1398, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=1365)
            0.029515022 = weight(_text_:22 in 1365) [ClassicSimilarity], result of:
              0.029515022 = score(doc=1365,freq=2.0), product of:
                0.15257138 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.043569047 = queryNorm
                0.19345059 = fieldWeight in 1365, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=1365)
          0.6666667 = coord(2/3)
      0.33333334 = coord(1/3)
    
    Abstract
    The paper aims to present the relations of network theory and terminology. The model of scale-free networks, which has been recently developed and widely applied since, can be effectively used in terminology research as well. Operation based on the principle of networks is a universal characteristic of complex systems. Networks are governed by general laws. The model of scale-free networks can be viewed as a statistical-probability model, and it can be described with mathematical tools. Its main feature is that "everything is connected to everything else," that is, every node is reachable (in a few steps) starting from any other node; this phenomena is called "the small world phenomenon." The existence of a linguistic network and the general laws of the operation of networks enable us to place issues of language use in the complex system of relations that reveal the deeper connection s between phenomena with the help of networks embedded in each other. The realization of the metaphor that language also has a network structure is the basis of the classification methods of the terminological system, and likewise of the ways of creating terminology databases, which serve the purpose of providing easy and versatile accessibility to specialised knowledge.
    Date
    2. 9.2014 21:22:48
  2. Snajder, J.; Almic, P.: Modeling semantic compositionality of Croatian multiword expressions (2015) 0.02
    0.020671291 = product of:
      0.062013872 = sum of:
        0.062013872 = product of:
          0.093020804 = sum of:
            0.057281278 = weight(_text_:network in 2920) [ClassicSimilarity], result of:
              0.057281278 = score(doc=2920,freq=2.0), product of:
                0.19402927 = queryWeight, product of:
                  4.4533744 = idf(docFreq=1398, maxDocs=44218)
                  0.043569047 = queryNorm
                0.29521978 = fieldWeight in 2920, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  4.4533744 = idf(docFreq=1398, maxDocs=44218)
                  0.046875 = fieldNorm(doc=2920)
            0.035739526 = weight(_text_:29 in 2920) [ClassicSimilarity], result of:
              0.035739526 = score(doc=2920,freq=2.0), product of:
                0.15326229 = queryWeight, product of:
                  3.5176873 = idf(docFreq=3565, maxDocs=44218)
                  0.043569047 = queryNorm
                0.23319192 = fieldWeight in 2920, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5176873 = idf(docFreq=3565, maxDocs=44218)
                  0.046875 = fieldNorm(doc=2920)
          0.6666667 = coord(2/3)
      0.33333334 = coord(1/3)
    
    Abstract
    A distinguishing feature of many multiword expressions (MWEs) is their semantic non-compositionality. Determining the semantic compositionality of MWEs is important for many natural language processing tasks. We address the task of modeling semantic compositionality of Croatian MWEs. We adopt a composition-based approach within the distributional semantics framework. We build and evaluate models based on Latent Semantic Analysis and the recently proposed neural network-based Skip-gram model, and experiment with different composition functions. We show that the compositionality scores predicted by the Skip-gram additive models correlate well with human judgments (=0.50). When framed as a classification task, the model achieves an accuracy of 0.64.
    Date
    29. 4.2016 12:42:17
  3. Radev, D.R.; Joseph, M.T.; Gibson, B.; Muthukrishnan, P.: ¬A bibliometric and network analysis of the field of computational linguistics (2016) 0.01
    0.012861087 = product of:
      0.03858326 = sum of:
        0.03858326 = product of:
          0.11574978 = sum of:
            0.11574978 = weight(_text_:network in 2764) [ClassicSimilarity], result of:
              0.11574978 = score(doc=2764,freq=6.0), product of:
                0.19402927 = queryWeight, product of:
                  4.4533744 = idf(docFreq=1398, maxDocs=44218)
                  0.043569047 = queryNorm
                0.59655833 = fieldWeight in 2764, product of:
                  2.4494898 = tf(freq=6.0), with freq of:
                    6.0 = termFreq=6.0
                  4.4533744 = idf(docFreq=1398, maxDocs=44218)
                  0.0546875 = fieldNorm(doc=2764)
          0.33333334 = coord(1/3)
      0.33333334 = coord(1/3)
    
    Abstract
    The ACL Anthology is a large collection of research papers in computational linguistics. Citation data were obtained using text extraction from a collection of PDF files with significant manual postprocessing performed to clean up the results. Manual annotation of the references was then performed to complete the citation network. We analyzed the networks of paper citations, author citations, and author collaborations in an attempt to identify the most central papers and authors. The analysis includes general network statistics, PageRank, metrics across publication years and venues, the impact factor and h-index, as well as other measures.
  4. Agarwal, B.; Ramampiaro, H.; Langseth, H.; Ruocco, M.: ¬A deep network model for paraphrase detection in short text messages (2018) 0.01
    0.010607645 = product of:
      0.031822935 = sum of:
        0.031822935 = product of:
          0.095468804 = sum of:
            0.095468804 = weight(_text_:network in 5043) [ClassicSimilarity], result of:
              0.095468804 = score(doc=5043,freq=8.0), product of:
                0.19402927 = queryWeight, product of:
                  4.4533744 = idf(docFreq=1398, maxDocs=44218)
                  0.043569047 = queryNorm
                0.492033 = fieldWeight in 5043, product of:
                  2.828427 = tf(freq=8.0), with freq of:
                    8.0 = termFreq=8.0
                  4.4533744 = idf(docFreq=1398, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=5043)
          0.33333334 = coord(1/3)
      0.33333334 = coord(1/3)
    
    Abstract
    This paper is concerned with paraphrase detection, i.e., identifying sentences that are semantically identical. The ability to detect similar sentences written in natural language is crucial for several applications, such as text mining, text summarization, plagiarism detection, authorship authentication and question answering. Recognizing this importance, we study in particular how to address the challenges with detecting paraphrases in user generated short texts, such as Twitter, which often contain language irregularity and noise, and do not necessarily contain as much semantic information as longer clean texts. We propose a novel deep neural network-based approach that relies on coarse-grained sentence modelling using a convolutional neural network (CNN) and a recurrent neural network (RNN) model, combined with a specific fine-grained word-level similarity matching model. More specifically, we develop a new architecture, called DeepParaphrase, which enables to create an informative semantic representation of each sentence by (1) using CNN to extract the local region information in form of important n-grams from the sentence, and (2) applying RNN to capture the long-term dependency information. In addition, we perform a comparative study on state-of-the-art approaches within paraphrase detection. An important insight from this study is that existing paraphrase approaches perform well when applied on clean texts, but they do not necessarily deliver good performance against noisy texts, and vice versa. In contrast, our evaluation has shown that the proposed DeepParaphrase-based approach achieves good results in both types of texts, thus making it more robust and generic than the existing approaches.
  5. Lian, T.; Yu, C.; Wang, W.; Yuan, Q.; Hou, Z.: Doctoral dissertations on tourism in China : a co-word analysis (2016) 0.01
    0.007500738 = product of:
      0.022502214 = sum of:
        0.022502214 = product of:
          0.06750664 = sum of:
            0.06750664 = weight(_text_:network in 3178) [ClassicSimilarity], result of:
              0.06750664 = score(doc=3178,freq=4.0), product of:
                0.19402927 = queryWeight, product of:
                  4.4533744 = idf(docFreq=1398, maxDocs=44218)
                  0.043569047 = queryNorm
                0.34791988 = fieldWeight in 3178, product of:
                  2.0 = tf(freq=4.0), with freq of:
                    4.0 = termFreq=4.0
                  4.4533744 = idf(docFreq=1398, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=3178)
          0.33333334 = coord(1/3)
      0.33333334 = coord(1/3)
    
    Abstract
    The aim of this paper is to map the foci of research in doctoral dissertations on tourism in China. In the paper, coword analysis is applied, with keywords coming from six public dissertation databases, i.e. CDFD, Wanfang Data, NLC, CALIS, ISTIC, and NSTL, as well as some university libraries providing doctoral dissertations on tourism. Altogether we have examined 928 doctoral dissertations on tourism written between 1989 and 2013. Doctoral dissertations on tourism in China involve 36 first level disciplines and 102 secondary level disciplines. We collect the top 68 keywords of practical significance in tourism which are mentioned at least four times or more. These keywords are classified into 12 categories based on co-word analysis, including cluster analysis, strategic diagrams analysis, and social network analysis. According to the strategic diagram of the 12 categories, we find the mature and immature areas in tourism study. From social networks, we can see the social network maps of original co-occurrence matrix and k-cores analysis of binary matrix. The paper provides valuable insight into the study of tourism by analyzing doctoral dissertations on tourism in China.
  6. Snajder, J.: Distributional semantics of multi-word expressions (2013) 0.01
    0.006618432 = product of:
      0.019855294 = sum of:
        0.019855294 = product of:
          0.059565883 = sum of:
            0.059565883 = weight(_text_:29 in 2868) [ClassicSimilarity], result of:
              0.059565883 = score(doc=2868,freq=2.0), product of:
                0.15326229 = queryWeight, product of:
                  3.5176873 = idf(docFreq=3565, maxDocs=44218)
                  0.043569047 = queryNorm
                0.38865322 = fieldWeight in 2868, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5176873 = idf(docFreq=3565, maxDocs=44218)
                  0.078125 = fieldNorm(doc=2868)
          0.33333334 = coord(1/3)
      0.33333334 = coord(1/3)
    
    Date
    29. 4.2016 12:04:50
  7. Engerer, V.: Informationswissenschaft und Linguistik. : kurze Geschichte eines fruchtbaren interdisziplinäaren Verhäaltnisses in drei Akten (2012) 0.01
    0.006618432 = product of:
      0.019855294 = sum of:
        0.019855294 = product of:
          0.059565883 = sum of:
            0.059565883 = weight(_text_:29 in 3376) [ClassicSimilarity], result of:
              0.059565883 = score(doc=3376,freq=2.0), product of:
                0.15326229 = queryWeight, product of:
                  3.5176873 = idf(docFreq=3565, maxDocs=44218)
                  0.043569047 = queryNorm
                0.38865322 = fieldWeight in 3376, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5176873 = idf(docFreq=3565, maxDocs=44218)
                  0.078125 = fieldNorm(doc=3376)
          0.33333334 = coord(1/3)
      0.33333334 = coord(1/3)
    
    Date
    19. 2.2017 13:29:08
  8. Vaswani, A.; Shazeer, N.; Parmar, N.; Uszkoreit, J.; Jones, L.; Gomez, A.N.; Kaiser, L.; Polosukhin, I.: Attention Is all you need (2017) 0.01
    0.0063645868 = product of:
      0.01909376 = sum of:
        0.01909376 = product of:
          0.057281278 = sum of:
            0.057281278 = weight(_text_:network in 970) [ClassicSimilarity], result of:
              0.057281278 = score(doc=970,freq=2.0), product of:
                0.19402927 = queryWeight, product of:
                  4.4533744 = idf(docFreq=1398, maxDocs=44218)
                  0.043569047 = queryNorm
                0.29521978 = fieldWeight in 970, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  4.4533744 = idf(docFreq=1398, maxDocs=44218)
                  0.046875 = fieldNorm(doc=970)
          0.33333334 = coord(1/3)
      0.33333334 = coord(1/3)
    
    Abstract
    The dominant sequence transduction models are based on complex recurrent or convolutional neural networks in an encoder-decoder configuration. The best performing models also connect the encoder and decoder through an attention mechanism. We propose a new simple network architecture, the Transformer, based solely on attention mechanisms, dispensing with recurrence and convolutions entirely. Experiments on two machine translation tasks show these models to be superior in quality while being more parallelizable and requiring significantly less time to train. Our model achieves 28.4 BLEU on the WMT 2014 English-to-German translation task, improving over the existing best results, including ensembles by over 2 BLEU. On the WMT 2014 English-to-French translation task, our model establishes a new single-model state-of-the-art BLEU score of 41.8 after training for 3.5 days on eight GPUs, a small fraction of the training costs of the best models from the literature. We show that the Transformer generalizes well to other tasks by applying it successfully to English constituency parsing both with large and limited training data.
  9. Clark, M.; Kim, Y.; Kruschwitz, U.; Song, D.; Albakour, D.; Dignum, S.; Beresi, U.C.; Fasli, M.; Roeck, A De: Automatically structuring domain knowledge from text : an overview of current research (2012) 0.01
    0.0056159254 = product of:
      0.016847776 = sum of:
        0.016847776 = product of:
          0.050543327 = sum of:
            0.050543327 = weight(_text_:29 in 2738) [ClassicSimilarity], result of:
              0.050543327 = score(doc=2738,freq=4.0), product of:
                0.15326229 = queryWeight, product of:
                  3.5176873 = idf(docFreq=3565, maxDocs=44218)
                  0.043569047 = queryNorm
                0.3297832 = fieldWeight in 2738, product of:
                  2.0 = tf(freq=4.0), with freq of:
                    4.0 = termFreq=4.0
                  3.5176873 = idf(docFreq=3565, maxDocs=44218)
                  0.046875 = fieldNorm(doc=2738)
          0.33333334 = coord(1/3)
      0.33333334 = coord(1/3)
    
    Date
    29. 1.2016 18:29:51
  10. Levin, M.; Krawczyk, S.; Bethard, S.; Jurafsky, D.: Citation-based bootstrapping for large-scale author disambiguation (2012) 0.01
    0.0053038225 = product of:
      0.015911467 = sum of:
        0.015911467 = product of:
          0.047734402 = sum of:
            0.047734402 = weight(_text_:network in 246) [ClassicSimilarity], result of:
              0.047734402 = score(doc=246,freq=2.0), product of:
                0.19402927 = queryWeight, product of:
                  4.4533744 = idf(docFreq=1398, maxDocs=44218)
                  0.043569047 = queryNorm
                0.2460165 = fieldWeight in 246, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  4.4533744 = idf(docFreq=1398, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=246)
          0.33333334 = coord(1/3)
      0.33333334 = coord(1/3)
    
    Abstract
    We present a new, two-stage, self-supervised algorithm for author disambiguation in large bibliographic databases. In the first "bootstrap" stage, a collection of high-precision features is used to bootstrap a training set with positive and negative examples of coreferring authors. A supervised feature-based classifier is then trained on the bootstrap clusters and used to cluster the authors in a larger unlabeled dataset. Our self-supervised approach shares the advantages of unsupervised approaches (no need for expensive hand labels) as well as supervised approaches (a rich set of features that can be discriminatively trained). The algorithm disambiguates 54,000,000 author instances in Thomson Reuters' Web of Knowledge with B3 F1 of.807. We analyze parameters and features, particularly those from citation networks, which have not been deeply investigated in author disambiguation. The most important citation feature is self-citation, which can be approximated without expensive extraction of the full network. For the supervised stage, the minor improvement due to other citation features (increasing F1 from.748 to.767) suggests they may not be worth the trouble of extracting from databases that don't already have them. A lean feature set without expensive abstract and title features performs 130 times faster with about equal F1.
  11. Gencosman, B.C.; Ozmutlu, H.C.; Ozmutlu, S.: Character n-gram application for automatic new topic identification (2014) 0.01
    0.0053038225 = product of:
      0.015911467 = sum of:
        0.015911467 = product of:
          0.047734402 = sum of:
            0.047734402 = weight(_text_:network in 2688) [ClassicSimilarity], result of:
              0.047734402 = score(doc=2688,freq=2.0), product of:
                0.19402927 = queryWeight, product of:
                  4.4533744 = idf(docFreq=1398, maxDocs=44218)
                  0.043569047 = queryNorm
                0.2460165 = fieldWeight in 2688, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  4.4533744 = idf(docFreq=1398, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=2688)
          0.33333334 = coord(1/3)
      0.33333334 = coord(1/3)
    
    Abstract
    The widespread availability of the Internet and the variety of Internet-based applications have resulted in a significant increase in the amount of web pages. Determining the behaviors of search engine users has become a critical step in enhancing search engine performance. Search engine user behaviors can be determined by content-based or content-ignorant algorithms. Although many content-ignorant studies have been performed to automatically identify new topics, previous results have demonstrated that spelling errors can cause significant errors in topic shift estimates. In this study, we focused on minimizing the number of wrong estimates that were based on spelling errors. We developed a new hybrid algorithm combining character n-gram and neural network methodologies, and compared the experimental results with results from previous studies. For the FAST and Excite datasets, the proposed algorithm improved topic shift estimates by 6.987% and 2.639%, respectively. Moreover, we analyzed the performance of the character n-gram method in different aspects including the comparison with Levenshtein edit-distance method. The experimental results demonstrated that the character n-gram method outperformed to the Levensthein edit distance method in terms of topic identification.
  12. Doval, Y.; Gómez-Rodríguez, C.: Comparing neural- and N-gram-based language models for word segmentation (2019) 0.01
    0.0053038225 = product of:
      0.015911467 = sum of:
        0.015911467 = product of:
          0.047734402 = sum of:
            0.047734402 = weight(_text_:network in 4675) [ClassicSimilarity], result of:
              0.047734402 = score(doc=4675,freq=2.0), product of:
                0.19402927 = queryWeight, product of:
                  4.4533744 = idf(docFreq=1398, maxDocs=44218)
                  0.043569047 = queryNorm
                0.2460165 = fieldWeight in 4675, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  4.4533744 = idf(docFreq=1398, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=4675)
          0.33333334 = coord(1/3)
      0.33333334 = coord(1/3)
    
    Abstract
    Word segmentation is the task of inserting or deleting word boundary characters in order to separate character sequences that correspond to words in some language. In this article we propose an approach based on a beam search algorithm and a language model working at the byte/character level, the latter component implemented either as an n-gram model or a recurrent neural network. The resulting system analyzes the text input with no word boundaries one token at a time, which can be a character or a byte, and uses the information gathered by the language model to determine if a boundary must be placed in the current position or not. Our aim is to use this system in a preprocessing step for a microtext normalization system. This means that it needs to effectively cope with the data sparsity present on this kind of texts. We also strove to surpass the performance of two readily available word segmentation systems: The well-known and accessible Word Breaker by Microsoft, and the Python module WordSegment by Grant Jenks. The results show that we have met our objectives, and we hope to continue to improve both the precision and the efficiency of our system in the future.
  13. Lezius, W.: Morphy - Morphologie und Tagging für das Deutsche (2013) 0.01
    0.005247115 = product of:
      0.015741345 = sum of:
        0.015741345 = product of:
          0.047224034 = sum of:
            0.047224034 = weight(_text_:22 in 1490) [ClassicSimilarity], result of:
              0.047224034 = score(doc=1490,freq=2.0), product of:
                0.15257138 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.043569047 = queryNorm
                0.30952093 = fieldWeight in 1490, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.0625 = fieldNorm(doc=1490)
          0.33333334 = coord(1/3)
      0.33333334 = coord(1/3)
    
    Date
    22. 3.2015 9:30:24
  14. Stoykova, V.; Petkova, E.: Automatic extraction of mathematical terms for precalculus (2012) 0.00
    0.0046329014 = product of:
      0.013898704 = sum of:
        0.013898704 = product of:
          0.041696113 = sum of:
            0.041696113 = weight(_text_:29 in 156) [ClassicSimilarity], result of:
              0.041696113 = score(doc=156,freq=2.0), product of:
                0.15326229 = queryWeight, product of:
                  3.5176873 = idf(docFreq=3565, maxDocs=44218)
                  0.043569047 = queryNorm
                0.27205724 = fieldWeight in 156, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5176873 = idf(docFreq=3565, maxDocs=44218)
                  0.0546875 = fieldNorm(doc=156)
          0.33333334 = coord(1/3)
      0.33333334 = coord(1/3)
    
    Date
    29. 5.2012 10:17:08
  15. Rayson, P.; Piao, S.; Sharoff, S.; Evert, S.; Moiron, B.V.: Multiword expressions : hard going or plain sailing? (2015) 0.00
    0.0046329014 = product of:
      0.013898704 = sum of:
        0.013898704 = product of:
          0.041696113 = sum of:
            0.041696113 = weight(_text_:29 in 2918) [ClassicSimilarity], result of:
              0.041696113 = score(doc=2918,freq=2.0), product of:
                0.15326229 = queryWeight, product of:
                  3.5176873 = idf(docFreq=3565, maxDocs=44218)
                  0.043569047 = queryNorm
                0.27205724 = fieldWeight in 2918, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5176873 = idf(docFreq=3565, maxDocs=44218)
                  0.0546875 = fieldNorm(doc=2918)
          0.33333334 = coord(1/3)
      0.33333334 = coord(1/3)
    
    Date
    29. 4.2016 12:05:56
  16. Babik, W.: Keywords as linguistic tools in information and knowledge organization (2017) 0.00
    0.0046329014 = product of:
      0.013898704 = sum of:
        0.013898704 = product of:
          0.041696113 = sum of:
            0.041696113 = weight(_text_:29 in 3510) [ClassicSimilarity], result of:
              0.041696113 = score(doc=3510,freq=2.0), product of:
                0.15326229 = queryWeight, product of:
                  3.5176873 = idf(docFreq=3565, maxDocs=44218)
                  0.043569047 = queryNorm
                0.27205724 = fieldWeight in 3510, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5176873 = idf(docFreq=3565, maxDocs=44218)
                  0.0546875 = fieldNorm(doc=3510)
          0.33333334 = coord(1/3)
      0.33333334 = coord(1/3)
    
    Source
    Theorie, Semantik und Organisation von Wissen: Proceedings der 13. Tagung der Deutschen Sektion der Internationalen Gesellschaft für Wissensorganisation (ISKO) und dem 13. Internationalen Symposium der Informationswissenschaft der Higher Education Association for Information Science (HI) Potsdam (19.-20.03.2013): 'Theory, Information and Organization of Knowledge' / Proceedings der 14. Tagung der Deutschen Sektion der Internationalen Gesellschaft für Wissensorganisation (ISKO) und Natural Language & Information Systems (NLDB) Passau (16.06.2015): 'Lexical Resources for Knowledge Organization' / Proceedings des Workshops der Deutschen Sektion der Internationalen Gesellschaft für Wissensorganisation (ISKO) auf der SEMANTICS Leipzig (1.09.2014): 'Knowledge Organization and Semantic Web' / Proceedings des Workshops der Polnischen und Deutschen Sektion der Internationalen Gesellschaft für Wissensorganisation (ISKO) Cottbus (29.-30.09.2011): 'Economics of Knowledge Production and Organization'. Hrsg. von W. Babik, H.P. Ohly u. K. Weber
  17. Schöneberg, U.; Sperber, W.: POS tagging and its applications for mathematics (2014) 0.00
    0.003971059 = product of:
      0.011913176 = sum of:
        0.011913176 = product of:
          0.035739526 = sum of:
            0.035739526 = weight(_text_:29 in 1748) [ClassicSimilarity], result of:
              0.035739526 = score(doc=1748,freq=2.0), product of:
                0.15326229 = queryWeight, product of:
                  3.5176873 = idf(docFreq=3565, maxDocs=44218)
                  0.043569047 = queryNorm
                0.23319192 = fieldWeight in 1748, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5176873 = idf(docFreq=3565, maxDocs=44218)
                  0.046875 = fieldNorm(doc=1748)
          0.33333334 = coord(1/3)
      0.33333334 = coord(1/3)
    
    Date
    29. 3.2015 19:34:37
  18. Geißler, S.: Maschinelles Lernen und NLP : Reif für die industrielle Anwendung! (2019) 0.00
    0.003971059 = product of:
      0.011913176 = sum of:
        0.011913176 = product of:
          0.035739526 = sum of:
            0.035739526 = weight(_text_:29 in 3547) [ClassicSimilarity], result of:
              0.035739526 = score(doc=3547,freq=2.0), product of:
                0.15326229 = queryWeight, product of:
                  3.5176873 = idf(docFreq=3565, maxDocs=44218)
                  0.043569047 = queryNorm
                0.23319192 = fieldWeight in 3547, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5176873 = idf(docFreq=3565, maxDocs=44218)
                  0.046875 = fieldNorm(doc=3547)
          0.33333334 = coord(1/3)
      0.33333334 = coord(1/3)
    
    Date
    2. 9.2019 19:29:24
  19. Huo, W.: Automatic multi-word term extraction and its application to Web-page summarization (2012) 0.00
    0.0039353366 = product of:
      0.011806009 = sum of:
        0.011806009 = product of:
          0.035418026 = sum of:
            0.035418026 = weight(_text_:22 in 563) [ClassicSimilarity], result of:
              0.035418026 = score(doc=563,freq=2.0), product of:
                0.15257138 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.043569047 = queryNorm
                0.23214069 = fieldWeight in 563, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.046875 = fieldNorm(doc=563)
          0.33333334 = coord(1/3)
      0.33333334 = coord(1/3)
    
    Date
    10. 1.2013 19:22:47
  20. Lawrie, D.; Mayfield, J.; McNamee, P.; Oard, P.W.: Cross-language person-entity linking from 20 languages (2015) 0.00
    0.0039353366 = product of:
      0.011806009 = sum of:
        0.011806009 = product of:
          0.035418026 = sum of:
            0.035418026 = weight(_text_:22 in 1848) [ClassicSimilarity], result of:
              0.035418026 = score(doc=1848,freq=2.0), product of:
                0.15257138 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.043569047 = queryNorm
                0.23214069 = fieldWeight in 1848, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.046875 = fieldNorm(doc=1848)
          0.33333334 = coord(1/3)
      0.33333334 = coord(1/3)
    
    Abstract
    The goal of entity linking is to associate references to an entity that is found in unstructured natural language content to an authoritative inventory of known entities. This article describes the construction of 6 test collections for cross-language person-entity linking that together span 22 languages. Fully automated components were used together with 2 crowdsourced validation stages to affordably generate ground-truth annotations with an accuracy comparable to that of a completely manual process. The resulting test collections each contain between 642 (Arabic) and 2,361 (Romanian) person references in non-English texts for which the correct resolution in English Wikipedia is known, plus a similar number of references for which no correct resolution into English Wikipedia is believed to exist. Fully automated cross-language person-name linking experiments with 20 non-English languages yielded a resolution accuracy of between 0.84 (Serbian) and 0.98 (Romanian), which compares favorably with previously reported cross-language entity linking results for Spanish.