Search (274 results, page 1 of 14)

  • Filter: theme_ss:"Computerlinguistik"
  1. Hotho, A.; Bloehdorn, S.: Data Mining 2004 : Text classification by boosting weak learners based on terms and concepts (2004) 0.56
    0.55651474 = 0.6492672 (sum of clause weights) × 0.85714287 coord(6/7). Clause weights: "3a" 0.18553518 × coord(1/3) = 0.06184506; "2f" 0.18553518, counted three times; "with" 0.014989593 (freq=2); "22" 0.031653978 × coord(1/2) = 0.015826989. Each term weight is queryWeight × fieldWeight; for "2f": queryWeight = 8.478011 idf(docFreq=24, maxDocs=44218) × 0.038938753 queryNorm = 0.3301232, and fieldWeight = 1.4142135 tf(freq=2.0) × 8.478011 idf × 0.046875 fieldNorm = 0.56201804, giving 0.3301232 × 0.56201804 = 0.18553518.
    
    Abstract
    Document representations for text classification are typically based on the classical Bag-Of-Words paradigm. This approach comes with deficiencies that motivate the integration of features on a higher semantic level than single words. In this paper we propose an enhancement of the classical document representation through concepts extracted from background knowledge. Boosting is used for the actual classification. Experimental evaluations on two well-known text corpora support our approach through consistent improvement of the results.
    Content
    Cf.: http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.91.4940&rep=rep1&type=pdf.
    Date
    8. 1.2013 10:22:32
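The score breakdowns in this listing all follow Lucene's ClassicSimilarity tf-idf scheme. A minimal sketch, assuming the standard ClassicSimilarity definitions tf = sqrt(freq) and idf = 1 + ln(maxDocs / (docFreq + 1)) (these reproduce the numbers retained in the summary for result 1), of how such a score is assembled:

```python
import math

# Constants taken from the breakdown of result 1 (doc 562) above.
MAX_DOCS = 44218
QUERY_NORM = 0.038938753
FIELD_NORM = 0.046875

def idf(doc_freq: int) -> float:
    # Assumed ClassicSimilarity idf: 1 + ln(maxDocs / (docFreq + 1))
    return 1.0 + math.log(MAX_DOCS / (doc_freq + 1))

def term_score(doc_freq: int, freq: float) -> float:
    query_weight = idf(doc_freq) * QUERY_NORM                    # 0.3301232 for "2f"
    field_weight = math.sqrt(freq) * idf(doc_freq) * FIELD_NORM  # 0.56201804 for "2f"
    return query_weight * field_weight                           # 0.18553518 for "2f"

w_2f = term_score(doc_freq=24, freq=2.0)   # "3a" shares docFreq=24, so its weight equals w_2f
w_with = term_score(doc_freq=10797, freq=2.0)
w_22 = term_score(doc_freq=3622, freq=2.0)

# "3a" carries coord(1/3), "22" carries coord(1/2); "2f" appears in three clauses.
# Six of the seven query clauses match, hence the final coord(6/7).
total = w_2f / 3 + 3 * w_2f + w_with + w_22 / 2
print(total * 6 / 7)  # ~0.55651474, matching the score shown above
```

The same decomposition, with different document frequencies, term frequencies and field norms, accounts for every score summary below.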
  2. Huo, W.: Automatic multi-word term extraction and its application to Web-page summarization (2012) 0.42
    0.42402217 = 0.593631 × 0.71428573 coord(5/7); clause weights: "2f" 0.18553518 (three times), "with" 0.021198487 (freq=4), "22" 0.031653978 × coord(1/2) = 0.015826989.
    
    Abstract
    In this thesis we propose three new word association measures for multi-word term extraction. We combine these association measures with the LocalMaxs algorithm in our extraction model and compare the results of different multi-word term extraction methods. Our approach is language- and domain-independent and requires no training data. It can be applied to such tasks as text summarization, information retrieval, and document classification. We further explore the potential of using multi-word terms as an effective representation for general web-page summarization. We extract multi-word terms from human-written summaries in a large collection of web-pages, and generate the summaries by aligning document words with these multi-word terms. Our system applies machine translation technology to learn the aligning process from a training set and focuses on selecting high-quality multi-word terms from human-written summaries to generate suitable results for web-page summarization.
    Content
    A thesis presented to the University of Guelph in partial fulfilment of the requirements for the degree of Master of Science in Computer Science. Cf.: http://www.inf.ufrgs.br/~ceramisch/download_files/publications/2009/p01.pdf.
    Date
    10. 1.2013 19:22:47
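The abstract above combines word-association "glue" measures with the LocalMaxs algorithm. As a rough illustration only (this is not the thesis's model: the glue here is the symmetric conditional probability of Silva & Lopes, and candidates are restricted to bigrams and trigrams), the local-maximum selection can be sketched as:

```python
from collections import Counter

def ngram_counts(tokens, max_n=4):
    counts = Counter()
    for n in range(1, max_n + 1):
        for i in range(len(tokens) - n + 1):
            counts[tuple(tokens[i:i + n])] += 1
    return counts

def scp(gram, counts, total):
    # Symmetric conditional probability "glue", averaged over all split points.
    p = counts[gram] / total
    splits = [(gram[:i], gram[i:]) for i in range(1, len(gram))]
    avg = sum((counts[a] / total) * (counts[b] / total) for a, b in splits) / len(splits)
    return p * p / avg

def local_maxs(tokens):
    counts = ngram_counts(tokens)
    total = len(tokens)  # rough normaliser, adequate for a sketch
    glue = {g: scp(g, counts, total) for g in counts if len(g) >= 2}
    terms = []
    for g, score in glue.items():
        if len(g) not in (2, 3):
            continue
        subs = [g[:-1], g[1:]] if len(g) > 2 else []       # embedded (n-1)-grams
        supers = [h for h in glue if len(h) == len(g) + 1
                  and (h[:-1] == g or h[1:] == g)]         # containing (n+1)-grams
        if all(score >= glue[s] for s in subs) and all(score > glue[h] for h in supers):
            terms.append(" ".join(g))
    return terms

print(local_maxs("the new york times wrote about new york city".split()))  # ['new york']
```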
  3. Noever, D.; Ciolino, M.: ¬The Turing deception (2022) 0.35
    0.35340035 = 0.6184506 × 0.5714286 coord(4/7); clause weights: "3a" 0.18553518 × coord(1/3) = 0.06184506, "2f" 0.18553518 (three times).
    
    Source
    https://arxiv.org/abs/2212.06721
  4. Meng, K.; Ba, Z.; Ma, Y.; Li, G.: ¬A network coupling approach to detecting hierarchical linkages between science and technology (2024) 0.04
    0.042336486 = 0.1481777 × 0.2857143 coord(2/7); clause weights: "interactions" 0.12697922 (freq=4), "with" 0.021198487 (freq=4).
    
    Abstract
    Detecting science-technology hierarchical linkages is beneficial for understanding deep interactions between science and technology (S&T). Previous studies have mainly focused on linear linkages between S&T but ignored their structural linkages. In this paper, we propose a network coupling approach to inspect hierarchical interactions of S&T by integrating their knowledge linkages and structural linkages. S&T knowledge networks are first enhanced with bidirectional encoder representations from transformers (BERT) knowledge alignment, and then their hierarchical structures are identified based on K-core decomposition. Hierarchical coupling preferences and strengths of the S&T networks over time are further calculated based on similarities of coupling nodes' degree distributions and similarities of coupling edges' weight distributions. Extensive experimental results indicate that our approach is feasible and robust in identifying the coupling hierarchy, with superior performance compared to other isomorphism and dissimilarity algorithms. Our research extends the methodology of S&T linkage measurement by identifying patterns and paths of the interaction of S&T hierarchical knowledge.
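One step of the pipeline above, identifying hierarchical layers by K-core decomposition, is easy to isolate. A minimal sketch with networkx on a toy graph (the BERT-based knowledge alignment and the coupling measures of the paper are not modelled here):

```python
import networkx as nx

# Toy stand-in for an S&T knowledge network; in the paper the edges would
# come from BERT-aligned knowledge linkages, which are not reproduced here.
G = nx.Graph([(1, 2), (1, 3), (2, 3), (2, 4), (3, 4), (4, 5), (5, 6)])

core = nx.core_number(G)  # node -> largest k such that the node sits in the k-core
layers = {}
for node, k in core.items():
    layers.setdefault(k, []).append(node)

# Higher k means a deeper, more densely interlinked layer of the hierarchy.
for k in sorted(layers, reverse=True):
    print(f"{k}-core layer: {sorted(layers[k])}")
```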
  5. Griffiths, T.L.; Steyvers, M.: ¬A probabilistic approach to semantic representation (2002) 0.03
    0.030465176 = 0.10662811 × 0.2857143 coord(2/7); clause weights: "with" 0.02826465 (freq=4), "humans" 0.15672693 × coord(1/2) = 0.07836346.
    
    Abstract
    Semantic networks produced from human data have statistical properties that cannot be easily captured by spatial representations. We explore a probabilistic approach to semantic representation that explicitly models the probability with which words occur in different contexts, and hence captures the probabilistic relationships between words. We show that this representation has statistical properties consistent with the large-scale structure of semantic networks constructed by humans, and trace the origins of these properties.
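As a toy illustration of the kind of quantity such a model is built on, P(word | context), the sketch below estimates conditional co-occurrence probabilities from raw counts (deliberately simpler than the authors' probabilistic model; the data is invented):

```python
from collections import Counter

# Hypothetical toy "contexts" (e.g. documents or text windows).
contexts = [
    ["bank", "money", "loan"],
    ["bank", "river", "water"],
    ["money", "loan", "interest"],
]

word_count = Counter(w for c in contexts for w in c)
pair_count = Counter((w1, w2) for c in contexts for w1 in c for w2 in c if w1 != w2)

def p_given(w2: str, w1: str) -> float:
    # P(w2 | w1): fraction of w1's occurrences whose context also contains w2.
    return pair_count[(w1, w2)] / word_count[w1]

print(p_given("money", "bank"))  # 0.5: "bank" splits between finance and river contexts
print(p_given("water", "bank"))  # 0.5
```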
  6. Altmann, E.G.; Cristadoro, G.; Esposti, M.D.: On the origin of long-range correlations in texts (2012) 0.03
    0.029936418 = 0.104777455 × 0.2857143 coord(2/7); clause weights: "interactions" 0.08978786, "with" 0.014989593.
    
    Abstract
    The complexity of human interactions with social and natural phenomena is mirrored in the way we describe our experiences through natural language. In order to retain and convey such high-dimensional information, the statistical properties of our linguistic output have to be highly correlated in time. An example is the robust observation, still largely not understood, of correlations on arbitrarily long scales in literary texts. In this paper we explain how long-range correlations flow from highly structured linguistic levels down to the building blocks of a text (words, letters, etc.). By combining calculations and data analysis we show that correlations take the form of a bursty sequence of events once we approach the semantically relevant topics of the text. The mechanisms we identify are fairly general and can be equally applied to other hierarchical settings.
  7. Hoenkamp, E.; Bruza, P.: How everyday language can and will boost effective information retrieval (2015) 0.02
    0.024947014 = 0.087314546 × 0.2857143 coord(2/7); clause weights: "interactions" 0.074823216, "with" 0.012491328.
    
    Abstract
    Typing 2 or 3 keywords into a browser has become an easy and efficient way to find information. Yet typing even short queries becomes tedious on ever-shrinking (virtual) keyboards. Meanwhile, speech processing is maturing rapidly, facilitating everyday language input. Also, wearable technology can inform users proactively by listening in on their conversations or processing their social media interactions. Given these developments, everyday language may soon become the new input of choice. We present an information retrieval (IR) algorithm specifically designed to accept everyday language. It integrates two paradigms of information retrieval, previously studied in isolation; one directed mainly at the surface structure of language, the other primarily at the underlying meaning. The integration was achieved by a Markov machine that encodes meaning by its transition graph, and surface structure by the language it generates. A rigorous evaluation of the approach showed, first, that it can compete with the quality of existing language models; second, that it is more effective the more verbose the input; and third, as a consequence, that it is promising for an imminent transition from keyword input, where the onus is on the user to formulate concise queries, to a modality where users can express their need for information more freely, more informally, and more naturally in everyday language.
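The "Markov machine" itself is described only at a high level above. Purely as an illustration of a transition graph that also generates language (a plain bigram chain, not the authors' integrated IR model), consider:

```python
import random
from collections import defaultdict

def transition_graph(tokens):
    graph = defaultdict(list)
    for a, b in zip(tokens, tokens[1:]):
        graph[a].append(b)  # repeated successors encode transition probabilities
    return graph

def generate(graph, start, length=8, seed=1):
    rng = random.Random(seed)
    out, cur = [start], start
    for _ in range(length - 1):
        if cur not in graph:
            break
        cur = rng.choice(graph[cur])
        out.append(cur)
    return " ".join(out)

graph = transition_graph("the cat sat on the mat and the dog sat on the floor".split())
print(generate(graph, "the"))  # e.g. "the mat and the cat sat on the"
```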
  8. Brown, T.B.; Mann, B.; Ryder, N.; Subbiah, M.; Kaplan, J.; Dhariwal, P.; Neelakantan, A.; Shyam, P.; Sastry, G.; Askell, A.; Agarwal, S.; Herbert-Voss, A.; Krueger, G.; Henighan, T.; Child, R.; Ramesh, A.; Ziegler, D.M.; Wu, J.; Winter, C.; Hesse, C.; Chen, M.; Sigler, E.; Litwin, M.; Gray, S.; Chess, B.; Clark, J.; Berner, C.; McCandlish, S.; Radford, A.; Sutskever, I.; Amodei, D.: Language models are few-shot learners (2020) 0.02
    0.021542134 = 0.07539746 × 0.2857143 coord(2/7); clause weights: "with" 0.019986123 (freq=8), "humans" 0.11082268 × coord(1/2) = 0.05541134 (freq=4).
    
    Abstract
    Recent work has demonstrated substantial gains on many NLP tasks and benchmarks by pre-training on a large corpus of text followed by fine-tuning on a specific task. While typically task-agnostic in architecture, this method still requires task-specific fine-tuning datasets of thousands or tens of thousands of examples. By contrast, humans can generally perform a new language task from only a few examples or from simple instructions - something which current NLP systems still largely struggle to do. Here we show that scaling up language models greatly improves task-agnostic, few-shot performance, sometimes even reaching competitiveness with prior state-of-the-art fine-tuning approaches. Specifically, we train GPT-3, an autoregressive language model with 175 billion parameters, 10x more than any previous non-sparse language model, and test its performance in the few-shot setting. For all tasks, GPT-3 is applied without any gradient updates or fine-tuning, with tasks and few-shot demonstrations specified purely via text interaction with the model. GPT-3 achieves strong performance on many NLP datasets, including translation, question-answering, and cloze tasks, as well as several tasks that require on-the-fly reasoning or domain adaptation, such as unscrambling words, using a novel word in a sentence, or performing 3-digit arithmetic. At the same time, we also identify some datasets where GPT-3's few-shot learning still struggles, as well as some datasets where GPT-3 faces methodological issues related to training on large web corpora. Finally, we find that GPT-3 can generate samples of news articles which human evaluators have difficulty distinguishing from articles written by humans. We discuss broader societal impacts of this finding and of GPT-3 in general.
  9. Natural language processing and speech technology : Results of the 3rd KONVENS Conference, Bielefeld, October 1996 (1996) 0.02
    0.021074913 = 0.07376219 × 0.2857143 coord(2/7); clause weights: "with" 0.014989593, "humans" 0.117545195 × coord(1/2) = 0.058772597.
    
    Content
    Contains, among others, the contributions: HILDEBRANDT, B. u.a.: Kognitive Modellierung von Sprach- und Bildverstehen; KELLER, F.: How do humans deal with ungrammatical input? Experimental evidence and computational modelling; MARX, J.: Die 'Computer-Talk-These' in der Sprachgenerierung: Hinweise zur Gestaltung natürlichsprachlicher Zustandsanzeigen in multimodalen Informationssystemen; SCHULTZ, T. u. H. SOLTAU: Automatische Identifizierung spontan gesprochener Sprachen mit neuronalen Netzen; WAUSCHKUHN, O.: Ein Werkzeug zur partiellen syntaktischen Analyse deutscher Textkorpora; LEZIUS, W., R. RAPP u. M. WETTLER: A morphology-system and part-of-speech tagger for German; KONRAD, K. u.a.: CLEARS: ein Werkzeug für Ausbildung und Forschung in der Computerlinguistik
  10. McMahon, J.G.; Smith, F.J.: Improved statistical language model performance with automatic generated word hierarchies (1996) 0.02
    0.02054439 = 0.07190536 × 0.2857143 coord(2/7); clause weights: "with" 0.034975715, "22" 0.07385929 × coord(1/2) = 0.036929645.
    
    Source
    Computational linguistics. 22(1996) no.2, S.217-248
  11. Ruge, G.: ¬A spreading activation network for automatic generation of thesaurus relationships (1991) 0.02
    0.02054439 = 0.07190536 × 0.2857143 coord(2/7); clause weights: "with" 0.034975715, "22" 0.07385929 × coord(1/2) = 0.036929645.
    
    Date
    8.10.2000 11:52:22
    Source
    Library science with a slant to documentation. 28(1991) no.4, S.125-130
  12. Witschel, H.F.: Terminology extraction and automatic indexing : comparison and qualitative evaluation of methods (2005) 0.02
    0.019040734 = 0.06664257 × 0.2857143 coord(2/7); clause weights: "with" 0.017665405 (freq=4), "humans" 0.097954325 × coord(1/2) = 0.048977163.
    
    Abstract
    Many terminology engineering processes involve the task of automatic terminology extraction: before the terminology of a given domain can be modelled, organised or standardised, important concepts (or terms) of this domain have to be identified and fed into terminological databases. These serve in further steps as a starting point for compiling dictionaries, thesauri or maybe even terminological ontologies for the domain. For the extraction of the initial concepts, extraction methods are needed that operate on specialised language texts. On the other hand, many machine learning or information retrieval applications require automatic indexing techniques. In machine learning applications concerned with the automatic clustering or classification of texts, feature vectors are often needed that describe the contents of a given text briefly but meaningfully. These feature vectors typically consist of a fairly small set of index terms together with weights indicating their importance. Short but meaningful descriptions of document contents as provided by good index terms are also useful to humans: some knowledge management applications (e.g. topic maps) use them as a set of basic concepts (topics). The author believes that the tasks of terminology extraction and automatic indexing have much in common and can thus benefit from the same set of basic algorithms. It is the goal of this paper to outline some methods that may be used in both contexts, but also to find the discriminating factors between the two tasks that call for the variation of parameters or the application of different techniques. The discussion of these methods will be based on statistical, syntactical and especially morphological properties of (index) terms. The paper is concluded by the presentation of some qualitative and quantitative results comparing statistical and morphological methods.
  13. Kajanan, S.; Bao, Y.; Datta, A.; VanderMeer, D.; Dutta, K.: Efficient automatic search query formulation using phrase-level analysis (2014) 0.01
    0.014049942 = 0.049174793 × 0.2857143 coord(2/7); clause weights: "with" 0.009993061, "humans" 0.07836346 × coord(1/2) = 0.03918173.
    
    Abstract
    Over the past decade, the volume of information available digitally over the Internet has grown enormously. Technical developments in the area of search, such as Google's PageRank algorithm, have proved so good at serving relevant results that Internet search has become integrated into daily human activity. One can endlessly explore topics of interest simply by querying and reading through the resulting links. Yet, although search engines are well known for providing relevant results based on users' queries, users do not always receive the results they are looking for. Google's Director of Research describes clickstream evidence of frustrated users repeatedly reformulating queries and searching through page after page of results. Given the general quality of search engine results, one must consider the possibility that the frustrated user's query is not effective; that is, it does not describe the essence of the user's interest. Indeed, extensive research into human search behavior has found that humans are not very effective at formulating good search queries that describe what they are interested in. Ideally, the user should simply point to a portion of text that sparked the user's interest, and a system should automatically formulate a search query that captures the essence of the text. In this paper, we describe an implemented system that provides this capability. We first describe how our work differs from existing work in automatic query formulation, and propose a new method for improved quantification of the relevance of candidate search terms drawn from input text using phrase-level analysis. We then propose an implementable method designed to provide relevant queries based on a user's text input. We demonstrate the quality of our results and performance of our system through experimental studies. Our results demonstrate that our system produces relevant search terms with roughly two-thirds precision and recall compared to search terms selected by experts, and that typical users find significantly more relevant results (31% more relevant) more quickly (64% faster) using our system than self-formulated search queries. Further, we show that our implementation can scale to request loads of up to 10 requests per second within current online responsiveness expectations (<2-second response times at the highest loads tested).
  14. Lawrie, D.; Mayfield, J.; McNamee, P.; Oard, D.W.: Cross-language person-entity linking from 20 languages (2015) 0.01
    0.013087479 = 0.045806177 × 0.2857143 coord(2/7); clause weights: "with" 0.029979186 (freq=8), "22" 0.031653978 × coord(1/2) = 0.015826989.
    
    Abstract
    The goal of entity linking is to associate references to an entity that is found in unstructured natural language content to an authoritative inventory of known entities. This article describes the construction of 6 test collections for cross-language person-entity linking that together span 22 languages. Fully automated components were used together with 2 crowdsourced validation stages to affordably generate ground-truth annotations with an accuracy comparable to that of a completely manual process. The resulting test collections each contain between 642 (Arabic) and 2,361 (Romanian) person references in non-English texts for which the correct resolution in English Wikipedia is known, plus a similar number of references for which no correct resolution into English Wikipedia is believed to exist. Fully automated cross-language person-name linking experiments with 20 non-English languages yielded a resolution accuracy of between 0.84 (Serbian) and 0.98 (Romanian), which compares favorably with previously reported cross-language entity linking results for Spanish.
  15. Nait-Baha, L.; Jackiewicz, A.; Djioua, B.; Laublet, P.: Query reformulation for information retrieval on the Web using the point of view methodology : preliminary results (2001) 0.01
    0.0128268385 = 0.08978786 × 0.14285715 coord(1/7); clause weight: "interactions" 0.08978786.
    
    Abstract
    The work we are presenting is devoted to information collected on the WWW. By the term 'collected' we mean the whole process of retrieving, extracting and presenting results to the user. This research is part of the RAP (Research, Analyze, Propose) project, in which we propose to combine two methods: (i) query reformulation using linguistic markers according to a given point of view; and (ii) text semantic analysis by means of contextual exploration results (Descles, 1991). The general project architecture describing the interactions between the users, the RAP system and the WWW search engines is presented in Nait-Baha et al. (1998). This paper focuses on showing how we use linguistic markers to reformulate queries according to a given point of view.
  16. Wanner, L.: Lexical choice in text generation and machine translation (1996) 0.01
    0.011739651 = 0.041088775 × 0.2857143 coord(2/7); clause weights: "with" 0.019986123, "22" 0.042205308 × coord(1/2) = 0.021102654.
    
    Abstract
    Presents the state of the art in lexical choice research in text generation and machine translation. Discusses the existing implementations with respect to: the place of lexical choice in the overall generation process; the information flow within the generation process and the consequences thereof for lexical choice; the internal organization of the lexical choice process; and the phenomena covered by lexical choice. Identifies possible future directions in lexical choice research.
    Date
    31. 7.1996 9:22:19
  17. Riloff, E.: ¬An empirical study of automated dictionary construction for information extraction in three domains (1996) 0.01
    0.011739651 = 0.041088775 × 0.2857143 coord(2/7); clause weights: "with" 0.019986123, "22" 0.042205308 × coord(1/2) = 0.021102654.
    
    Abstract
    AutoSlog is a system that addresses the knowledge engineering bottleneck for information extraction. AutoSlog automatically creates domain-specific dictionaries for information extraction, given an appropriate training corpus. Describes experiments with AutoSlog in the terrorism, joint-venture and microelectronics domains. Compares the performance of AutoSlog across the 3 domains, discusses the lessons learned and presents results from 2 experiments which demonstrate that novice users can generate effective dictionaries using AutoSlog.
    Date
    6. 3.1997 16:22:15
  18. Morris, V.: Automated language identification of bibliographic resources (2020) 0.01
    0.011739651 = 0.041088775 × 0.2857143 coord(2/7); clause weights: "with" 0.019986123, "22" 0.042205308 × coord(1/2) = 0.021102654.
    
    Abstract
    This article describes experiments in the use of machine learning techniques at the British Library to assign language codes to catalog records, in order to provide information about the language of content of the resources described. In the first phase of the project, language codes were assigned to 1.15 million records with 99.7% confidence. The automated language identification tools developed will be used to contribute to future enhancement of over 4 million legacy records.
    Date
    2. 3.2020 19:04:22
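The article does not spell out the model used; a common baseline for language identification of short catalog strings is a character n-gram classifier. A minimal sketch with scikit-learn (the training rows and ISO 639-2 codes here are invented; this is not the British Library's actual pipeline):

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Hypothetical training data: catalog title strings with known language codes.
titles = ["Die Verwandlung", "Der Prozess", "La peste", "Le petit prince",
          "Pride and prejudice", "Wuthering heights"]
langs = ["ger", "ger", "fre", "fre", "eng", "eng"]

model = make_pipeline(
    TfidfVectorizer(analyzer="char_wb", ngram_range=(1, 3)),  # character n-grams
    LogisticRegression(max_iter=1000),
)
model.fit(titles, langs)
print(model.predict(["Der Zauberberg"]))  # expected: ['ger']
```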
  19. Stede, M.: Lexicalization in natural language generation (2002) 0.01
    0.010689031 = 0.074823216 × 0.14285715 coord(1/7); clause weight: "interactions" 0.074823216.
    
    Abstract
    Natural language generation (NLG), the automatic production of text by computers, is commonly seen as a process consisting of several distinct phases. Obviously, choosing words is a central aspect of generating language. In which of these phases it should take place is not entirely clear, however. The decision depends on various factors: what exactly is seen as an individual lexical item; how the relation between word meaning and background knowledge (concepts) is defined; how one accounts for the interactions between individual lexical choices in the same sentence; what criteria are employed for choosing between similar words; and whether or not output is required in one or more languages. This article surveys these issues and the answers that have been proposed in NLG research. For many applications of natural language processing, large-scale lexical resources have become available in recent years, such as the WordNet database. In language generation, however, generic lexicons are not in use yet; rather, almost every generation project develops its own format for lexical representations. The reason is that the entries of a generation lexicon need their specific interfaces to the input representations processed by the generator; lexical semantics in an NLG lexicon needs to be tailored to the input. On the other hand, the large lexicons used for language analysis typically have only very limited semantic information at all. Yet the syntactic behavior of words remains the same regardless of the particular application; thus, it should be possible to build at least parts of generic NLG lexical entries automatically, which could then be used by different systems.
  20. Rahmstorf, G.: Concept structures for large vocabularies (1998) 0.01
    0.010578708 = 0.037025474 × 0.2857143 coord(2/7); clause weights: "with" 0.021198487 (freq=4), "22" 0.031653978 × coord(1/2) = 0.015826989.
    
    Abstract
    A technology is described which supports the acquisition, visualisation and manipulation of large vocabularies with associated structures. It is used for dictionary production, terminology databases, thesauri, library classification systems, etc. Essential features of the technology are a lexicographic user interface, variable word description, an unlimited list of word readings, a concept language, automatic transformation of formulas into graphic structures, structure manipulation operations and retransformation into formulas. The concept language includes notations for undefined concepts. The structure of defined concepts can be constructed interactively. The technology supports the generation of large vocabularies with structures representing word senses. Concept structures and ordering systems for indexing and retrieval can be constructed separately and connected by associating relations.
    Date
    30.12.2001 19:01:22

Types

  • a 229
  • el 35
  • m 21
  • s 11
  • p 5
  • x 5
  • b 1
  • d 1
  • r 1