Search (131 results, page 3 of 7)

  • theme_ss:"Computerlinguistik"
  1. Whitelock, P.; Kilby, K.: Linguistic and computational techniques in machine translation system design : 2nd ed (1995) 0.01
    0.006618432 = product of:
      0.019855294 = sum of:
        0.019855294 = product of:
          0.059565883 = sum of:
            0.059565883 = weight(_text_:29 in 1750) [ClassicSimilarity], result of:
              0.059565883 = score(doc=1750,freq=2.0), product of:
                0.15326229 = queryWeight, product of:
                  3.5176873 = idf(docFreq=3565, maxDocs=44218)
                  0.043569047 = queryNorm
                0.38865322 = fieldWeight in 1750, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5176873 = idf(docFreq=3565, maxDocs=44218)
                  0.078125 = fieldNorm(doc=1750)
          0.33333334 = coord(1/3)
      0.33333334 = coord(1/3)
    
    Date
    29. 3.1996 18:28:09
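Note: the indented breakdown shown with each hit is Lucene's ClassicSimilarity (TF-IDF) explain output; the scored query term here is the date fragment "29". As a minimal sketch (variable names are mine, not part of the record), the numbers of this first hit can be reproduced directly; the idf value is consistent with ClassicSimilarity's idf = 1 + ln(maxDocs / (docFreq + 1)).

```python
# Reproduce the explain tree of hit 1 (term "29" in doc 1750) under Lucene ClassicSimilarity.
# All constants are copied verbatim from the breakdown above.
from math import sqrt

tf = sqrt(2.0)                 # 1.4142135 = tf(freq=2.0)
idf = 3.5176873                # idf(docFreq=3565, maxDocs=44218)
query_norm = 0.043569047       # queryNorm
field_norm = 0.078125          # fieldNorm(doc=1750)
coord = 1.0 / 3.0              # coord(1/3): one of three query clauses matched

query_weight = idf * query_norm           # 0.15326229
field_weight = tf * idf * field_norm      # 0.38865322
term_score = query_weight * field_weight  # 0.059565883
doc_score = term_score * coord * coord    # 0.006618432, displayed as 0.01

print(f"{doc_score:.9f}")
```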
  2. Rau, L.F.: Conceptual information extraction and retrieval from natural language input (198) 0.01
    0.006618432 = product of:
      0.019855294 = sum of:
        0.019855294 = product of:
          0.059565883 = sum of:
            0.059565883 = weight(_text_:29 in 1955) [ClassicSimilarity], result of:
              0.059565883 = score(doc=1955,freq=2.0), product of:
                0.15326229 = queryWeight, product of:
                  3.5176873 = idf(docFreq=3565, maxDocs=44218)
                  0.043569047 = queryNorm
                0.38865322 = fieldWeight in 1955, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5176873 = idf(docFreq=3565, maxDocs=44218)
                  0.078125 = fieldNorm(doc=1955)
          0.33333334 = coord(1/3)
      0.33333334 = coord(1/3)
    
    Date
    16. 8.1998 13:29:20
  3. Sheremet'eva, S.O.: Teoreticheskie i metodologicheskie problemy inzhenernoi lingvistiki [Theoretical and methodological problems of engineering linguistics] (1998) 0.01
    0.006618432 = product of:
      0.019855294 = sum of:
        0.019855294 = product of:
          0.059565883 = sum of:
            0.059565883 = weight(_text_:29 in 6316) [ClassicSimilarity], result of:
              0.059565883 = score(doc=6316,freq=2.0), product of:
                0.15326229 = queryWeight, product of:
                  3.5176873 = idf(docFreq=3565, maxDocs=44218)
                  0.043569047 = queryNorm
                0.38865322 = fieldWeight in 6316, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5176873 = idf(docFreq=3565, maxDocs=44218)
                  0.078125 = fieldNorm(doc=6316)
          0.33333334 = coord(1/3)
      0.33333334 = coord(1/3)
    
    Date
    6. 3.1999 13:56:29
  4. Liu, S.; Liu, F.; Yu, C.; Meng, W.: An effective approach to document retrieval via utilizing WordNet and recognizing phrases (2004) 0.01
    0.006618432 = product of:
      0.019855294 = sum of:
        0.019855294 = product of:
          0.059565883 = sum of:
            0.059565883 = weight(_text_:29 in 4078) [ClassicSimilarity], result of:
              0.059565883 = score(doc=4078,freq=2.0), product of:
                0.15326229 = queryWeight, product of:
                  3.5176873 = idf(docFreq=3565, maxDocs=44218)
                  0.043569047 = queryNorm
                0.38865322 = fieldWeight in 4078, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5176873 = idf(docFreq=3565, maxDocs=44218)
                  0.078125 = fieldNorm(doc=4078)
          0.33333334 = coord(1/3)
      0.33333334 = coord(1/3)
    
    Date
    10.10.2005 10:29:08
  5. Snajder, J.: Distributional semantics of multi-word expressions (2013) 0.01
    0.006618432 = product of:
      0.019855294 = sum of:
        0.019855294 = product of:
          0.059565883 = sum of:
            0.059565883 = weight(_text_:29 in 2868) [ClassicSimilarity], result of:
              0.059565883 = score(doc=2868,freq=2.0), product of:
                0.15326229 = queryWeight, product of:
                  3.5176873 = idf(docFreq=3565, maxDocs=44218)
                  0.043569047 = queryNorm
                0.38865322 = fieldWeight in 2868, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5176873 = idf(docFreq=3565, maxDocs=44218)
                  0.078125 = fieldNorm(doc=2868)
          0.33333334 = coord(1/3)
      0.33333334 = coord(1/3)
    
    Date
    29. 4.2016 12:04:50
  6. Engerer, V.: Informationswissenschaft und Linguistik : kurze Geschichte eines fruchtbaren interdisziplinären Verhältnisses in drei Akten [Information science and linguistics : a short history of a fruitful interdisciplinary relationship in three acts] (2012) 0.01
    0.006618432 = product of:
      0.019855294 = sum of:
        0.019855294 = product of:
          0.059565883 = sum of:
            0.059565883 = weight(_text_:29 in 3376) [ClassicSimilarity], result of:
              0.059565883 = score(doc=3376,freq=2.0), product of:
                0.15326229 = queryWeight, product of:
                  3.5176873 = idf(docFreq=3565, maxDocs=44218)
                  0.043569047 = queryNorm
                0.38865322 = fieldWeight in 3376, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5176873 = idf(docFreq=3565, maxDocs=44218)
                  0.078125 = fieldNorm(doc=3376)
          0.33333334 = coord(1/3)
      0.33333334 = coord(1/3)
    
    Date
    19. 2.2017 13:29:08
  7. Hutchins, J.: From first conception to first demonstration : the nascent years of machine translation, 1947-1954. A chronology (1997) 0.01
    0.006558894 = product of:
      0.019676682 = sum of:
        0.019676682 = product of:
          0.059030045 = sum of:
            0.059030045 = weight(_text_:22 in 1463) [ClassicSimilarity], result of:
              0.059030045 = score(doc=1463,freq=2.0), product of:
                0.15257138 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.043569047 = queryNorm
                0.38690117 = fieldWeight in 1463, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.078125 = fieldNorm(doc=1463)
          0.33333334 = coord(1/3)
      0.33333334 = coord(1/3)
    
    Date
    31. 7.1996 9:22:19
  8. Kuhlmann, U.; Monnerjahn, P.: Sprache auf Knopfdruck : Sieben automatische Übersetzungsprogramme im Test [Language at the push of a button : seven automatic translation programs put to the test] (2000) 0.01
    0.006558894 = product of:
      0.019676682 = sum of:
        0.019676682 = product of:
          0.059030045 = sum of:
            0.059030045 = weight(_text_:22 in 5428) [ClassicSimilarity], result of:
              0.059030045 = score(doc=5428,freq=2.0), product of:
                0.15257138 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.043569047 = queryNorm
                0.38690117 = fieldWeight in 5428, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.078125 = fieldNorm(doc=5428)
          0.33333334 = coord(1/3)
      0.33333334 = coord(1/3)
    
    Source
    c't. 2000, H.22, S.220-229
  9. Lezius, W.; Rapp, R.; Wettler, M.: A morphology-system and part-of-speech tagger for German (1996) 0.01
    0.006558894 = product of:
      0.019676682 = sum of:
        0.019676682 = product of:
          0.059030045 = sum of:
            0.059030045 = weight(_text_:22 in 1693) [ClassicSimilarity], result of:
              0.059030045 = score(doc=1693,freq=2.0), product of:
                0.15257138 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.043569047 = queryNorm
                0.38690117 = fieldWeight in 1693, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.078125 = fieldNorm(doc=1693)
          0.33333334 = coord(1/3)
      0.33333334 = coord(1/3)
    
    Date
    22. 3.2015 9:37:18
  10. Yang, Y.; Wilbur, J.: Using corpus statistics to remove redundant words in text categorization (1996) 0.01
    0.0063645868 = product of:
      0.01909376 = sum of:
        0.01909376 = product of:
          0.057281278 = sum of:
            0.057281278 = weight(_text_:network in 4199) [ClassicSimilarity], result of:
              0.057281278 = score(doc=4199,freq=2.0), product of:
                0.19402927 = queryWeight, product of:
                  4.4533744 = idf(docFreq=1398, maxDocs=44218)
                  0.043569047 = queryNorm
                0.29521978 = fieldWeight in 4199, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  4.4533744 = idf(docFreq=1398, maxDocs=44218)
                  0.046875 = fieldNorm(doc=4199)
          0.33333334 = coord(1/3)
      0.33333334 = coord(1/3)
    
    Abstract
    This article studies aggressive word removal in text categorization to reduce the noise in free texts and to enhance the computational efficiency of categorization. We use a novel stop word identification method to automatically generate domain-specific stoplists which are much larger than a conventional domain-independent stoplist. In our tests with 3 categorization methods on text collections from different domains/applications, significant numbers of words were removed without sacrificing categorization effectiveness. In the test of the Expert Network method on CACM documents, for example, an 87% removal of unique words reduced the vocabulary of documents from 8,002 distinct words to 1,045 words, which resulted in a 63% time savings and a 74% memory savings in the computation of category ranking, with a 10% precision improvement on average over not using word removal. It is evident in this study that automated word removal based on corpus statistics has a practical and significant impact on the computational tractability of categorization methods in large databases.
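Note: the abstract does not spell out the stop-word identification method, so the following is only a hedged sketch of the general idea: derive a domain-specific stoplist from corpus statistics (here a simple document-frequency cutoff stands in for the paper's method) and strip those words before categorization.

```python
# Hedged sketch, not the authors' algorithm: build a domain-specific stoplist from
# corpus statistics and remove the flagged words before categorization.
from collections import Counter

def build_stoplist(docs, df_cutoff=0.5):
    """docs: iterable of token lists; flag terms occurring in more than
    df_cutoff of all documents (a crude stand-in for the paper's statistics)."""
    df = Counter()
    for doc in docs:
        df.update(set(doc))
    n = len(docs)
    return {term for term, count in df.items() if count / n > df_cutoff}

def remove_stopwords(doc, stoplist):
    return [t for t in doc if t not in stoplist]

docs = [["query", "expansion", "in", "ir"],
        ["neural", "ir", "models", "in", "practice"],
        ["rare", "term"]]
stoplist = build_stoplist(docs)
print([remove_stopwords(d, stoplist) for d in docs])  # "in" and "ir" are removed
```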
  11. Goller, C.; Löning, J.; Will, T.; Wolff, W.: Automatic document classification : a thorough evaluation of various methods (2000) 0.01
    0.0063645868 = product of:
      0.01909376 = sum of:
        0.01909376 = product of:
          0.057281278 = sum of:
            0.057281278 = weight(_text_:network in 5480) [ClassicSimilarity], result of:
              0.057281278 = score(doc=5480,freq=2.0), product of:
                0.19402927 = queryWeight, product of:
                  4.4533744 = idf(docFreq=1398, maxDocs=44218)
                  0.043569047 = queryNorm
                0.29521978 = fieldWeight in 5480, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  4.4533744 = idf(docFreq=1398, maxDocs=44218)
                  0.046875 = fieldNorm(doc=5480)
          0.33333334 = coord(1/3)
      0.33333334 = coord(1/3)
    
    Abstract
    (Automatic) document classification is generally defined as content-based assignment of one or more predefined categories to documents. Usually, machine learning, statistical pattern recognition, or neural network approaches are used to construct classifiers automatically. In this paper we thoroughly evaluate a wide variety of these methods on a document classification task for German text. We evaluate different feature construction and selection methods and various classifiers. Our main results are: (1) feature selection is necessary not only to reduce learning and classification time, but also to avoid overfitting (even for Support Vector Machines); (2) surprisingly, our morphological analysis does not improve classification quality compared to a letter 5-gram approach; (3) Support Vector Machines are significantly better than all other classification methods
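Note: as a hedged illustration of the letter 5-gram baseline the abstract mentions (not the authors' setup; scikit-learn and the sample texts are my assumptions), a character 5-gram representation can feed a linear Support Vector Machine directly. The paper additionally stresses feature selection to avoid overfitting; a chi-square filter could be inserted between the two pipeline steps.

```python
# Hedged sketch (not the authors' code): letter 5-grams plus a linear SVM,
# the combination the abstract singles out for German text classification.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

clf = make_pipeline(
    TfidfVectorizer(analyzer="char_wb", ngram_range=(5, 5)),  # letter 5-grams
    LinearSVC(),                                              # Support Vector Machine
)

train_texts = ["Der Vertrag wurde gestern unterzeichnet.",
               "Die Mannschaft verlor das Auswärtsspiel deutlich."]
train_labels = ["wirtschaft", "sport"]
clf.fit(train_texts, train_labels)
print(clf.predict(["Das Team gewann das Finale."]))
```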
  12. Vaswani, A.; Shazeer, N.; Parmar, N.; Uszkoreit, J.; Jones, L.; Gomez, A.N.; Kaiser, L.; Polosukhin, I.: Attention Is all you need (2017) 0.01
    0.0063645868 = product of:
      0.01909376 = sum of:
        0.01909376 = product of:
          0.057281278 = sum of:
            0.057281278 = weight(_text_:network in 970) [ClassicSimilarity], result of:
              0.057281278 = score(doc=970,freq=2.0), product of:
                0.19402927 = queryWeight, product of:
                  4.4533744 = idf(docFreq=1398, maxDocs=44218)
                  0.043569047 = queryNorm
                0.29521978 = fieldWeight in 970, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  4.4533744 = idf(docFreq=1398, maxDocs=44218)
                  0.046875 = fieldNorm(doc=970)
          0.33333334 = coord(1/3)
      0.33333334 = coord(1/3)
    
    Abstract
    The dominant sequence transduction models are based on complex recurrent or convolutional neural networks in an encoder-decoder configuration. The best performing models also connect the encoder and decoder through an attention mechanism. We propose a new simple network architecture, the Transformer, based solely on attention mechanisms, dispensing with recurrence and convolutions entirely. Experiments on two machine translation tasks show these models to be superior in quality while being more parallelizable and requiring significantly less time to train. Our model achieves 28.4 BLEU on the WMT 2014 English-to-German translation task, improving over the existing best results, including ensembles, by over 2 BLEU. On the WMT 2014 English-to-French translation task, our model establishes a new single-model state-of-the-art BLEU score of 41.8 after training for 3.5 days on eight GPUs, a small fraction of the training costs of the best models from the literature. We show that the Transformer generalizes well to other tasks by applying it successfully to English constituency parsing both with large and limited training data.
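Note: the Transformer's central operation is scaled dot-product attention, Attention(Q, K, V) = softmax(QK^T / sqrt(d_k)) V. Below is a minimal NumPy sketch of that equation; shapes and names are illustrative, not taken from the paper's code.

```python
# Hedged sketch of scaled dot-product attention, the core of the Transformer.
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                  # similarity of each query to each key
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)   # row-wise softmax
    return weights @ V                               # weighted sum of the values

rng = np.random.default_rng(0)
Q = rng.normal(size=(4, 8))   # 4 query positions, d_k = 8
K = rng.normal(size=(6, 8))   # 6 key positions
V = rng.normal(size=(6, 8))
print(scaled_dot_product_attention(Q, K, V).shape)   # (4, 8)
```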
  13. Clark, M.; Kim, Y.; Kruschwitz, U.; Song, D.; Albakour, D.; Dignum, S.; Beresi, U.C.; Fasli, M.; De Roeck, A.: Automatically structuring domain knowledge from text : an overview of current research (2012) 0.01
    0.0056159254 = product of:
      0.016847776 = sum of:
        0.016847776 = product of:
          0.050543327 = sum of:
            0.050543327 = weight(_text_:29 in 2738) [ClassicSimilarity], result of:
              0.050543327 = score(doc=2738,freq=4.0), product of:
                0.15326229 = queryWeight, product of:
                  3.5176873 = idf(docFreq=3565, maxDocs=44218)
                  0.043569047 = queryNorm
                0.3297832 = fieldWeight in 2738, product of:
                  2.0 = tf(freq=4.0), with freq of:
                    4.0 = termFreq=4.0
                  3.5176873 = idf(docFreq=3565, maxDocs=44218)
                  0.046875 = fieldNorm(doc=2738)
          0.33333334 = coord(1/3)
      0.33333334 = coord(1/3)
    
    Date
    29. 1.2016 18:29:51
  14. Warner, J.: Analogies between linguistics and information theory (2007) 0.01
    0.0053038225 = product of:
      0.015911467 = sum of:
        0.015911467 = product of:
          0.047734402 = sum of:
            0.047734402 = weight(_text_:network in 138) [ClassicSimilarity], result of:
              0.047734402 = score(doc=138,freq=2.0), product of:
                0.19402927 = queryWeight, product of:
                  4.4533744 = idf(docFreq=1398, maxDocs=44218)
                  0.043569047 = queryNorm
                0.2460165 = fieldWeight in 138, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  4.4533744 = idf(docFreq=1398, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=138)
          0.33333334 = coord(1/3)
      0.33333334 = coord(1/3)
    
    Abstract
    An analogy is established between the syntagm and paradigm from Saussurean linguistics and the message and messages for selection from the information theory initiated by Claude Shannon. The analogy is pursued both as an end in itself and for its analytic value in understanding patterns of retrieval from full-text systems. The multivalency of individual words when isolated from their syntagm is contrasted with the relative stability of meaning of multiword sequences, when searching ordinary written discourse. The syntagm is understood as the linear sequence of oral and written language. Saussure's understanding of the word, as a unit that compels recognition by the mind, is endorsed, although not regarded as final. The lesser multivalency of multiword sequences is understood as the greater determination of signification by the extended syntagm. The paradigm is primarily understood as the network of associations a word acquires when considered apart from the syntagm. The restriction of information theory to expression or signals, and its focus on the combinatorial aspects of the message, is sustained. The message in the model of communication in information theory can include sequences of written language. Shannon's understanding of the written word, as a cohesive group of letters, with strong internal statistical influences, is added to the Saussurean conception. Sequences of more than one word are regarded as weakly correlated concatenations of cohesive units.
  15. Levin, M.; Krawczyk, S.; Bethard, S.; Jurafsky, D.: Citation-based bootstrapping for large-scale author disambiguation (2012) 0.01
    0.0053038225 = product of:
      0.015911467 = sum of:
        0.015911467 = product of:
          0.047734402 = sum of:
            0.047734402 = weight(_text_:network in 246) [ClassicSimilarity], result of:
              0.047734402 = score(doc=246,freq=2.0), product of:
                0.19402927 = queryWeight, product of:
                  4.4533744 = idf(docFreq=1398, maxDocs=44218)
                  0.043569047 = queryNorm
                0.2460165 = fieldWeight in 246, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  4.4533744 = idf(docFreq=1398, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=246)
          0.33333334 = coord(1/3)
      0.33333334 = coord(1/3)
    
    Abstract
    We present a new, two-stage, self-supervised algorithm for author disambiguation in large bibliographic databases. In the first "bootstrap" stage, a collection of high-precision features is used to bootstrap a training set with positive and negative examples of coreferring authors. A supervised feature-based classifier is then trained on the bootstrap clusters and used to cluster the authors in a larger unlabeled dataset. Our self-supervised approach shares the advantages of unsupervised approaches (no need for expensive hand labels) as well as supervised approaches (a rich set of features that can be discriminatively trained). The algorithm disambiguates 54,000,000 author instances in Thomson Reuters' Web of Knowledge with B3 F1 of .807. We analyze parameters and features, particularly those from citation networks, which have not been deeply investigated in author disambiguation. The most important citation feature is self-citation, which can be approximated without expensive extraction of the full network. For the supervised stage, the minor improvement due to other citation features (increasing F1 from .748 to .767) suggests they may not be worth the trouble of extracting from databases that don't already have them. A lean feature set without expensive abstract and title features performs 130 times faster with about equal F1.
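Note: a hedged sketch of the two-stage, self-supervised idea described above (the rule, the features, and the toy records are mine, not the authors'): a high-precision rule bootstraps positive and negative mention pairs, and a classifier trained on those pairs then scores arbitrary pairs of author mentions.

```python
# Hedged sketch, not the authors' algorithm: bootstrap labels, then train a classifier.
from sklearn.linear_model import LogisticRegression

def bootstrap_label(a, b):
    """High-precision rule: same name plus a shared coauthor -> positive;
    different first initials -> negative; otherwise unlabeled (None)."""
    if a["name"] == b["name"] and set(a["coauthors"]) & set(b["coauthors"]):
        return 1
    if a["name"].split()[0][0] != b["name"].split()[0][0]:
        return 0
    return None

def features(a, b):
    return [
        float(a["name"] == b["name"]),
        len(set(a["coauthors"]) & set(b["coauthors"])),
        float(b["title"] in a.get("cited_titles", [])),  # crude self-citation proxy
    ]

mentions = [
    {"name": "J. Smith", "coauthors": ["Lee"], "title": "Parsing A", "cited_titles": ["Parsing B"]},
    {"name": "J. Smith", "coauthors": ["Lee"], "title": "Parsing B", "cited_titles": []},
    {"name": "A. Jones", "coauthors": ["Wu"], "title": "Optics", "cited_titles": []},
]
pairs = [(a, b) for i, a in enumerate(mentions) for b in mentions[i + 1:]]
X, y = zip(*[(features(a, b), lab) for a, b in pairs if (lab := bootstrap_label(a, b)) is not None])
clf = LogisticRegression().fit(list(X), list(y))
print(clf.predict([features(mentions[0], mentions[1])]))  # 1 -> same author
```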
  16. Rindflesch, T.C.; Fiszman, M.: The interaction of domain knowledge and linguistic structure in natural language processing : interpreting hypernymic propositions in biomedical text (2003) 0.01
    0.0053038225 = product of:
      0.015911467 = sum of:
        0.015911467 = product of:
          0.047734402 = sum of:
            0.047734402 = weight(_text_:network in 2097) [ClassicSimilarity], result of:
              0.047734402 = score(doc=2097,freq=2.0), product of:
                0.19402927 = queryWeight, product of:
                  4.4533744 = idf(docFreq=1398, maxDocs=44218)
                  0.043569047 = queryNorm
                0.2460165 = fieldWeight in 2097, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  4.4533744 = idf(docFreq=1398, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=2097)
          0.33333334 = coord(1/3)
      0.33333334 = coord(1/3)
    
    Abstract
    Interpretation of semantic propositions in free-text documents such as MEDLINE citations would provide valuable support for biomedical applications, and several approaches to semantic interpretation are being pursued in the biomedical informatics community. In this paper, we describe a methodology for interpreting linguistic structures that encode hypernymic propositions, in which a more specific concept is in a taxonomic relationship with a more general concept. In order to effectively process these constructions, we exploit underspecified syntactic analysis and structured domain knowledge from the Unified Medical Language System (UMLS). After introducing the syntactic processing on which our system depends, we focus on the UMLS knowledge that supports interpretation of hypernymic propositions. We first use semantic groups from the Semantic Network to ensure that the two concepts involved are compatible; hierarchical information in the Metathesaurus then determines which concept is more general and which more specific. A preliminary evaluation of a sample based on the semantic group Chemicals and Drugs provides 83% precision. An error analysis was conducted and potential solutions to the problems encountered are presented. The research discussed here serves as a paradigm for investigating the interaction between domain knowledge and linguistic structure in natural language processing, and could also make a contribution to research on automatic processing of discourse structure. Additional implications of the system we present include its integration in advanced semantic interpretation processors for biomedical text and its use for information extraction in specific domains. The approach has the potential to support a range of applications, including information retrieval and ontology engineering.
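Note: a hedged sketch of the two checks the abstract describes (toy dictionaries stand in for the UMLS Semantic Network and Metathesaurus; this is not the authors' system): the two concepts must share a semantic group, and hierarchy information then decides which concept is the more general one.

```python
# Hedged sketch: semantic-group compatibility check, then hierarchy lookup.
SEMANTIC_GROUP = {"ibuprofen": "Chemicals & Drugs", "NSAID": "Chemicals & Drugs",
                  "appendectomy": "Procedures"}
ANCESTORS = {"ibuprofen": {"NSAID", "anti-inflammatory agent"}, "NSAID": set()}

def interpret_hypernymic(concept_a, concept_b):
    if SEMANTIC_GROUP.get(concept_a) != SEMANTIC_GROUP.get(concept_b):
        return None                              # incompatible groups: assert no relation
    if concept_b in ANCESTORS.get(concept_a, set()):
        return (concept_a, "ISA", concept_b)     # b is the more general concept
    if concept_a in ANCESTORS.get(concept_b, set()):
        return (concept_b, "ISA", concept_a)
    return None

print(interpret_hypernymic("ibuprofen", "NSAID"))         # ('ibuprofen', 'ISA', 'NSAID')
print(interpret_hypernymic("ibuprofen", "appendectomy"))  # None
```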
  17. Gencosman, B.C.; Ozmutlu, H.C.; Ozmutlu, S.: Character n-gram application for automatic new topic identification (2014) 0.01
    0.0053038225 = product of:
      0.015911467 = sum of:
        0.015911467 = product of:
          0.047734402 = sum of:
            0.047734402 = weight(_text_:network in 2688) [ClassicSimilarity], result of:
              0.047734402 = score(doc=2688,freq=2.0), product of:
                0.19402927 = queryWeight, product of:
                  4.4533744 = idf(docFreq=1398, maxDocs=44218)
                  0.043569047 = queryNorm
                0.2460165 = fieldWeight in 2688, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  4.4533744 = idf(docFreq=1398, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=2688)
          0.33333334 = coord(1/3)
      0.33333334 = coord(1/3)
    
    Abstract
    The widespread availability of the Internet and the variety of Internet-based applications have resulted in a significant increase in the amount of web pages. Determining the behaviors of search engine users has become a critical step in enhancing search engine performance. Search engine user behaviors can be determined by content-based or content-ignorant algorithms. Although many content-ignorant studies have been performed to automatically identify new topics, previous results have demonstrated that spelling errors can cause significant errors in topic shift estimates. In this study, we focused on minimizing the number of wrong estimates that were based on spelling errors. We developed a new hybrid algorithm combining character n-gram and neural network methodologies, and compared the experimental results with results from previous studies. For the FAST and Excite datasets, the proposed algorithm improved topic shift estimates by 6.987% and 2.639%, respectively. Moreover, we analyzed the performance of the character n-gram method in different aspects, including a comparison with the Levenshtein edit-distance method. The experimental results demonstrated that the character n-gram method outperformed the Levenshtein edit-distance method in terms of topic identification.
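Note: the paper's actual system is a hybrid of character n-grams and a neural network; the following hedged sketch shows only the n-gram half of the idea, measuring character n-gram overlap between consecutive queries so that spelling errors do not register as topic shifts (the threshold and names are mine).

```python
# Hedged sketch: character n-gram overlap as a spelling-tolerant topic-shift test.
def char_ngrams(text, n=3):
    text = f" {text.lower()} "
    return {text[i:i + n] for i in range(len(text) - n + 1)}

def same_topic(q1, q2, n=3, threshold=0.2):
    a, b = char_ngrams(q1, n), char_ngrams(q2, n)
    jaccard = len(a & b) / len(a | b)
    return jaccard >= threshold

print(same_topic("information retreival", "information retrieval"))  # True despite the typo
print(same_topic("information retrieval", "cheap flights"))          # False -> new topic
```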
  18. Doval, Y.; Gómez-Rodríguez, C.: Comparing neural- and N-gram-based language models for word segmentation (2019) 0.01
    0.0053038225 = product of:
      0.015911467 = sum of:
        0.015911467 = product of:
          0.047734402 = sum of:
            0.047734402 = weight(_text_:network in 4675) [ClassicSimilarity], result of:
              0.047734402 = score(doc=4675,freq=2.0), product of:
                0.19402927 = queryWeight, product of:
                  4.4533744 = idf(docFreq=1398, maxDocs=44218)
                  0.043569047 = queryNorm
                0.2460165 = fieldWeight in 4675, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  4.4533744 = idf(docFreq=1398, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=4675)
          0.33333334 = coord(1/3)
      0.33333334 = coord(1/3)
    
    Abstract
    Word segmentation is the task of inserting or deleting word boundary characters in order to separate character sequences that correspond to words in some language. In this article we propose an approach based on a beam search algorithm and a language model working at the byte/character level, the latter component implemented either as an n-gram model or a recurrent neural network. The resulting system analyzes the text input with no word boundaries one token at a time, which can be a character or a byte, and uses the information gathered by the language model to determine if a boundary must be placed in the current position or not. Our aim is to use this system in a preprocessing step for a microtext normalization system. This means that it needs to effectively cope with the data sparsity present on this kind of texts. We also strove to surpass the performance of two readily available word segmentation systems: The well-known and accessible Word Breaker by Microsoft, and the Python module WordSegment by Grant Jenks. The results show that we have met our objectives, and we hope to continue to improve both the precision and the efficiency of our system in the future.
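Note: a hedged sketch of beam-search segmentation (not the paper's implementation): candidate boundary placements are scored by a language model; here a toy word-frequency table stands in for the paper's character/byte-level model.

```python
# Hedged sketch: beam search over word-boundary placements, scored by a toy log-probability table.
WORD_LOGP = {"the": -2.0, "them": -5.0, "me": -4.0, "theme": -6.0,
             "cat": -3.0, "at": -3.5, "sat": -4.0}
UNK = -12.0  # penalty for out-of-vocabulary chunks

def segment(text, beam_width=5):
    beam = [(0.0, [], 0)]                       # (score, words so far, chars consumed)
    for _ in range(len(text)):
        candidates = []
        for score, words, pos in beam:
            if pos == len(text):
                candidates.append((score, words, pos))   # already complete, carry forward
                continue
            for end in range(pos + 1, len(text) + 1):
                w = text[pos:end]
                candidates.append((score + WORD_LOGP.get(w, UNK), words + [w], end))
        beam = sorted(candidates, key=lambda c: c[0], reverse=True)[:beam_width]
    done = [c for c in beam if c[2] == len(text)]
    return max(done, key=lambda c: c[0])[1]

print(segment("thecatsat"))  # ['the', 'cat', 'sat']
```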
  19. Soni, S.; Lerman, K.; Eisenstein, J.: Follow the leader : documents on the leading edge of semantic change get more citations (2021) 0.01
    0.0053038225 = product of:
      0.015911467 = sum of:
        0.015911467 = product of:
          0.047734402 = sum of:
            0.047734402 = weight(_text_:network in 169) [ClassicSimilarity], result of:
              0.047734402 = score(doc=169,freq=2.0), product of:
                0.19402927 = queryWeight, product of:
                  4.4533744 = idf(docFreq=1398, maxDocs=44218)
                  0.043569047 = queryNorm
                0.2460165 = fieldWeight in 169, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  4.4533744 = idf(docFreq=1398, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=169)
          0.33333334 = coord(1/3)
      0.33333334 = coord(1/3)
    
    Abstract
    Diachronic word embeddings (vector representations of words over time) offer remarkable insights into the evolution of language and provide a tool for quantifying sociocultural change from text documents. Prior work has used such embeddings to identify shifts in the meaning of individual words. However, simply knowing that a word has changed in meaning is insufficient to identify the instances of word usage that convey the historical meaning or the newer meaning. In this study, we link diachronic word embeddings to documents, by situating those documents as leaders or laggards with respect to ongoing semantic changes. Specifically, we propose a novel method to quantify the degree of semantic progressiveness in each word usage, and then show how these usages can be aggregated to obtain scores for each document. We analyze two large collections of documents, representing legal opinions and scientific articles. Documents that are scored as semantically progressive receive a larger number of citations, indicating that they are especially influential. Our work thus provides a new technique for identifying lexical semantic leaders and demonstrates a new link between progressive use of language and influence in a citation network.
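Note: a hedged sketch of the scoring idea (not the authors' method; the vectors here are random stand-ins for diachronic embeddings): each usage is rated by whether its context vector is closer to the word's later-period embedding than to its earlier-period one, and a document's score is the average over its usages.

```python
# Hedged sketch: per-usage progressiveness scores aggregated to a document score.
import numpy as np

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def usage_progressiveness(context_vec, old_vec, new_vec):
    return cosine(context_vec, new_vec) - cosine(context_vec, old_vec)

def document_progressiveness(usages):
    """usages: list of (context_vec, old_vec, new_vec) triples for tracked words."""
    return float(np.mean([usage_progressiveness(*u) for u in usages]))

rng = np.random.default_rng(1)
old_vec, new_vec = rng.normal(size=50), rng.normal(size=50)
usages = [(new_vec + 0.1 * rng.normal(size=50), old_vec, new_vec) for _ in range(5)]
print(document_progressiveness(usages))  # positive -> document leads the semantic change
```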
  20. Chiba, K.; Kyojima, M.: Document transformation based on syntax-directed free translation (1995) 0.01
    0.0052947453 = product of:
      0.015884236 = sum of:
        0.015884236 = product of:
          0.047652703 = sum of:
            0.047652703 = weight(_text_:29 in 4069) [ClassicSimilarity], result of:
              0.047652703 = score(doc=4069,freq=2.0), product of:
                0.15326229 = queryWeight, product of:
                  3.5176873 = idf(docFreq=3565, maxDocs=44218)
                  0.043569047 = queryNorm
                0.31092256 = fieldWeight in 4069, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5176873 = idf(docFreq=3565, maxDocs=44218)
                  0.0625 = fieldNorm(doc=4069)
          0.33333334 = coord(1/3)
      0.33333334 = coord(1/3)
    
    Source
    Electronic publishing. 8(1995) no.1, S.15-29

Languages

  • e 94
  • d 31
  • ru 2
  • f 1

Types

  • a 112
  • el 13
  • m 9
  • s 4
  • x 3
  • p 2
  • d 1