Search (11 results, page 1 of 1)

Riloff, E.: An empirical study of automated dictionary construction for information extraction in three domains (1996) 0.05

```
0.048393354 = product of:
  0.09678671 = sum of:
    0.09678671 = sum of:
      0.03996552 = weight(_text_:2 in 6752) [ClassicSimilarity], result of:
        0.03996552 = score(doc=6752,freq=4.0), product of:
          0.1294644 = queryWeight, product of:
            2.4695914 = idf(docFreq=10170, maxDocs=44218)
            0.05242341 = queryNorm
          0.30869892 = fieldWeight in 6752, product of:
            2.0 = tf(freq=4.0), with freq of:
              4.0 = termFreq=4.0
            2.4695914 = idf(docFreq=10170, maxDocs=44218)
            0.0625 = fieldNorm(doc=6752)
      0.056821186 = weight(_text_:22 in 6752) [ClassicSimilarity], result of:
        0.056821186 = score(doc=6752,freq=2.0), product of:
          0.18357785 = queryWeight, product of:
            3.5018296 = idf(docFreq=3622, maxDocs=44218)
            0.05242341 = queryNorm
          0.30952093 = fieldWeight in 6752, product of:
            1.4142135 = tf(freq=2.0), with freq of:
              2.0 = termFreq=2.0
            3.5018296 = idf(docFreq=3622, maxDocs=44218)
            0.0625 = fieldNorm(doc=6752)
  0.5 = coord(1/2)
```

Abstract: AutoSlog is a system that addresses the knowledge engineering bottleneck for information extraction. AutoSlog automatically creates domain-specific dictionaries for information extraction, given an appropriate training corpus. Describes experiments with AutoSlog in the terrorism, joint ventures and microelectronics domains. Compares the performance of AutoSlog across the 3 domains, discusses the lessons learned and presents results from 2 experiments which demonstrate that novice users can generate effective dictionaries using AutoSlog.
Date: 6. 3.1997 16:22:15
Source: Artificial intelligence. 85(1996) nos.1/2, pp.101-134
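
The explain trees in this listing follow Lucene's ClassicSimilarity (TF-IDF) model. As a minimal Python sketch, the Riloff score can be recomputed from the numbers in its tree, assuming the idf formula that reproduces the values shown, 1 + ln(maxDocs / (docFreq + 1)), and tf = √freq. queryNorm and fieldNorm are copied from the explain output rather than derived, and the function names are illustrative, not Lucene's API.

```python
import math

MAX_DOCS = 44218
QUERY_NORM = 0.05242341  # copied from the explain output, not recomputed


def idf(doc_freq: int, max_docs: int) -> float:
    """Assumed ClassicSimilarity idf: 1 + ln(maxDocs / (docFreq + 1))."""
    return 1.0 + math.log(max_docs / (doc_freq + 1))


def tf(freq: float) -> float:
    """Term frequency damped by a square root."""
    return math.sqrt(freq)


def term_score(freq: float, doc_freq: int, field_norm: float) -> float:
    """score = queryWeight * fieldWeight for one term in one document."""
    term_idf = idf(doc_freq, MAX_DOCS)
    query_weight = term_idf * QUERY_NORM             # queryWeight = idf * queryNorm
    field_weight = tf(freq) * term_idf * field_norm  # fieldWeight = tf * idf * fieldNorm
    return query_weight * field_weight


# Riloff entry (doc 6752): "2" occurs 4 times, "22" twice, fieldNorm = 0.0625.
s2 = term_score(4.0, 10170, 0.0625)
s22 = term_score(2.0, 3622, 0.0625)
total = (s2 + s22) * 0.5  # outer coord(1/2)
print(f"{s2:.8f} {s22:.8f} {total:.8f}")
```

The printed values should agree with the tree's 0.03996552, 0.056821186 and 0.048393354 to within rounding; Lucene itself computes in 32-bit floats. Note that idf enters each term score twice, once through queryWeight and once through fieldWeight, so the rarer term "22" gains disproportionately.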

Chowdhury, G.G.: Natural language processing and information retrieval : pt.1: basic issues; pt.2: major applications (1991) 0.02

```
0.0152961165 = product of:
  0.030592233 = sum of:
    0.030592233 = product of:
      0.061184466 = sum of:
        0.061184466 = weight(_text_:2 in 3313) [ClassicSimilarity], result of:
          0.061184466 = score(doc=3313,freq=6.0), product of:
            0.1294644 = queryWeight, product of:
              2.4695914 = idf(docFreq=10170, maxDocs=44218)
              0.05242341 = queryNorm
            0.47259682 = fieldWeight in 3313, product of:
              2.4494898 = tf(freq=6.0), with freq of:
                6.0 = termFreq=6.0
              2.4695914 = idf(docFreq=10170, maxDocs=44218)
              0.078125 = fieldNorm(doc=3313)
      0.5 = coord(1/2)
  0.5 = coord(1/2)
```

Source: IASLIC bulletin. 36(1991) no.2, pp.45-49 (pt.1); pp.51-59 (pt.2)
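
The Chowdhury entry, like every other entry that matches only one of the two terms "2" and "22", carries two coord(1/2) factors: one for the inner two-term clause and one for the outer query. The hedged sketch below models only that coordination step of older (pre-7.0) Lucene Boolean scoring; the outer query's second clause does not appear in the explain output and is simply treated as unmatched, and the helper names are hypothetical.

```python
from typing import List, Optional


def coord(overlap: int, max_overlap: int) -> float:
    """Coordination factor: the fraction of a Boolean query's clauses that matched."""
    return overlap / max_overlap


def boolean_score(clause_scores: List[Optional[float]]) -> float:
    """Sum the matching clause scores (None = no match) and scale by coord."""
    matched = [s for s in clause_scores if s is not None]
    if not matched:
        return 0.0
    return sum(matched) * coord(len(matched), len(clause_scores))


# Chowdhury (doc 3313): only "2" matched; its term score is copied from the explain tree.
inner_3313 = boolean_score([0.061184466, None])  # "22" absent -> coord(1/2)
doc_3313 = boolean_score([inner_3313, None])     # second outer clause unmatched -> coord(1/2)

# Riloff (doc 6752): both terms matched, so the inner coord is 2/2 = 1.
inner_6752 = boolean_score([0.03996552, 0.056821186])
doc_6752 = boolean_score([inner_6752, None])

print(doc_3313, doc_6752)
```

A coord factor of 1 leaves the sum unchanged, which is consistent with the Riloff tree, where the inner "sum of" has no coord line.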

Zimmermann, H.H.: Wortrelationierung in der Sprachtechnik : Stilhilfen, Retrievalhilfen, Übersetzungshilfen (1992) 0.01

```
0.014987071 = product of:
  0.029974142 = sum of:
    0.029974142 = product of:
      0.059948284 = sum of:
        0.059948284 = weight(_text_:2 in 1372) [ClassicSimilarity], result of:
          0.059948284 = score(doc=1372,freq=4.0), product of:
            0.1294644 = queryWeight, product of:
              2.4695914 = idf(docFreq=10170, maxDocs=44218)
              0.05242341 = queryNorm
            0.4630484 = fieldWeight in 1372, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              2.4695914 = idf(docFreq=10170, maxDocs=44218)
              0.09375 = fieldNorm(doc=1372)
      0.5 = coord(1/2)
  0.5 = coord(1/2)
```

Series: Fortschritte in der Wissensorganisation; vol.2
Source: Kognitive Ansätze zum Ordnen und Darstellen von Wissen. 2nd conference of the German ISKO chapter, including the papers from the workshop "Thesauri als Werkzeuge der Sprachtechnologie", Weilburg, 15-18 October 1991

Inhaltserschließung von Massendaten : zur Wirksamkeit informationslinguistischer Verfahren am Beispiel des deutschen Patentinformationssystems (1987) 0.01

```
0.012363703 = product of:
  0.024727406 = sum of:
    0.024727406 = product of:
      0.049454812 = sum of:
        0.049454812 = weight(_text_:2 in 6764) [ClassicSimilarity], result of:
          0.049454812 = score(doc=6764,freq=2.0), product of:
            0.1294644 = queryWeight, product of:
              2.4695914 = idf(docFreq=10170, maxDocs=44218)
              0.05242341 = queryNorm
            0.38199544 = fieldWeight in 6764, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              2.4695914 = idf(docFreq=10170, maxDocs=44218)
              0.109375 = fieldNorm(doc=6764)
      0.5 = coord(1/2)
  0.5 = coord(1/2)
```

ISBN: 3-487-07839-2

Lorenz, S.: Konzeption und prototypische Realisierung einer begriffsbasierten Texterschließung (2006) 0.01

```
0.010653973 = product of:
  0.021307945 = sum of:
    0.021307945 = product of:
      0.04261589 = sum of:
        0.04261589 = weight(_text_:22 in 1746) [ClassicSimilarity], result of:
          0.04261589 = score(doc=1746,freq=2.0), product of:
            0.18357785 = queryWeight, product of:
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.05242341 = queryNorm
            0.23214069 = fieldWeight in 1746, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.046875 = fieldNorm(doc=1746)
      0.5 = coord(1/2)
  0.5 = coord(1/2)
```

Date: 22. 3.2015 9:17:30

Malone, L.C.; Driscoll, J.R.; Pepe, J.W.: Modeling the performance of an automated keywording system (1991) 0.01

```
0.007064973 = product of:
  0.014129946 = sum of:
    0.014129946 = product of:
      0.028259892 = sum of:
        0.028259892 = weight(_text_:2 in 6682) [ClassicSimilarity], result of:
          0.028259892 = score(doc=6682,freq=2.0), product of:
            0.1294644 = queryWeight, product of:
              2.4695914 = idf(docFreq=10170, maxDocs=44218)
              0.05242341 = queryNorm
            0.2182831 = fieldWeight in 6682, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              2.4695914 = idf(docFreq=10170, maxDocs=44218)
              0.0625 = fieldNorm(doc=6682)
      0.5 = coord(1/2)
  0.5 = coord(1/2)
```

Source: Information processing and management. 27(1991) nos.2/3, pp.145-151

Fagan, J.L.: The effectiveness of a nonsyntactic approach to automatic phrase indexing for document retrieval (1989) 0.01
```
0.006244613 = product of:
  0.012489226 = sum of:
    0.012489226 = product of:
      0.024978451 = sum of:
        0.024978451 = weight(_text_:2 in 1845) [ClassicSimilarity], result of:
          0.024978451 = score(doc=1845,freq=4.0), product of:
            0.1294644 = queryWeight, product of:
              2.4695914 = idf(docFreq=10170, maxDocs=44218)
              0.05242341 = queryNorm
            0.19293682 = fieldWeight in 1845, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              2.4695914 = idf(docFreq=10170, maxDocs=44218)
              0.0390625 = fieldNorm(doc=1845)
      0.5 = coord(1/2)
  0.5 = coord(1/2)
```
Abstract

It may be possible to improve the quality of automatic indexing systems by using complex descriptors, for example, phrases, in addition to the simple descriptors (words or word stems) that are normally used in automatically constructed representations of document content. This study is directed toward the goal of developing effective methods of identifying phrases in natural language text from which good quality phrase descriptors can be constructed. The effectiveness of one method, a simple nonsyntactic phrase indexing procedure, has been tested on five experimental document collections. The results have been analyzed in order to identify the inadequacies of the procedure, and to determine what kinds of information about text structure are needed in order to construct phrase descriptors that are good indicators of document content. Two primary conclusions have been reached: (1) In the retrieval experiments, the nonsyntactic phrase construction procedure did not consistently yield substantial improvements in effectiveness. It is therefore not likely that phrase indexing of this kind will prove to be an important method of enhancing the performance of automatic document indexing and retrieval systems in operational environments. (2) Many of the shortcomings of the nonsyntactic approach can be overcome by incorporating syntactic information into the phrase construction process. However, a general syntactic analysis facility may be required, since many useful sources of phrases cannot be exploited if only a limited inventory of syntactic patterns can be recognized. Further research should be conducted into methods of incorporating automatic syntactic analysis into content analysis for document retrieval.

Source

Journal of the American Society for Information Science. 40(1989) no.2, pp.115-132

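
Fagan's abstract does not spell out his procedure, so the following is only a generic illustration of what "nonsyntactic" phrase construction can mean: pairing content words by proximity within a sentence, with no parsing, and keeping pairs that recur across documents. The stopword list, the `window` of 3 positions and the `min_docs` threshold are hypothetical choices, and the sketch does no stemming, so "document" and "documents" stay distinct.

```python
import re
from collections import Counter
from itertools import combinations
from typing import List, Set, Tuple

# Hypothetical stopword list, for illustration only.
STOPWORDS = {"a", "an", "and", "by", "for", "in", "is", "of", "on", "the", "to", "with"}


def content_words(sentence: str) -> List[str]:
    """Lowercase alphabetic tokens with stopwords removed (no stemming)."""
    return [w for w in re.findall(r"[a-z]+", sentence.lower()) if w not in STOPWORDS]


def candidate_phrases(text: str, window: int = 3) -> Set[Tuple[str, str]]:
    """Unordered pairs of distinct content words at most `window` positions apart in one sentence."""
    phrases = set()
    for sentence in re.split(r"[.!?]+", text):
        words = content_words(sentence)
        for i, j in combinations(range(len(words)), 2):
            if j - i <= window and words[i] != words[j]:
                phrases.add(tuple(sorted((words[i], words[j]))))
    return phrases


def phrase_descriptors(docs: List[str], min_docs: int = 2) -> Set[Tuple[str, str]]:
    """Keep candidate pairs that occur in at least `min_docs` documents."""
    df = Counter()
    for doc in docs:
        df.update(candidate_phrases(doc))
    return {pair for pair, n in df.items() if n >= min_docs}


docs = [
    "Automatic phrase indexing for document retrieval.",
    "Phrase indexing improves retrieval of documents.",
    "Syntactic analysis of text.",
]
descriptors = phrase_descriptors(docs)
print(sorted(descriptors))
```

On these toy sentences the pairs (indexing, phrase), (indexing, retrieval) and (phrase, retrieval) survive. Nothing in the procedure checks that the two words are syntactically related, which is the gap the abstract's second conclusion points to.
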
Zhang, Y.; Zhang, C.; Li, J.: Joint modeling of characters, words, and conversation contexts for microblog keyphrase extraction (2020) 0.01
```
0.006244613 = product of:
  0.012489226 = sum of:
    0.012489226 = product of:
      0.024978451 = sum of:
        0.024978451 = weight(_text_:2 in 5816) [ClassicSimilarity], result of:
          0.024978451 = score(doc=5816,freq=4.0), product of:
            0.1294644 = queryWeight, product of:
              2.4695914 = idf(docFreq=10170, maxDocs=44218)
              0.05242341 = queryNorm
            0.19293682 = fieldWeight in 5816, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              2.4695914 = idf(docFreq=10170, maxDocs=44218)
              0.0390625 = fieldNorm(doc=5816)
      0.5 = coord(1/2)
  0.5 = coord(1/2)
```
Abstract

Millions of messages are produced on microblog platforms every day, leading to a pressing need for automatic identification of key points from the massive texts. To absorb salient content from the vast bulk of microblog posts, this article focuses on the task of microblog keyphrase extraction. In previous work, most efforts treat messages as independent documents and might suffer from the data sparsity problem exhibited in short and informal microblog posts. Instead, we propose to enrich contexts by exploiting conversations initiated by target posts and formed by their replies, which are generally centered around topics relevant to the target posts and therefore helpful for keyphrase identification. Concretely, we present a neural keyphrase extraction framework, which has 2 modules: a conversation context encoder and a keyphrase tagger. The conversation context encoder captures indicative representations from the conversation contexts and feeds them into the keyphrase tagger, and the keyphrase tagger extracts salient words from target posts. The 2 modules were trained jointly to optimize the conversation context encoding and keyphrase extraction processes. In the conversation context encoder, we leverage hierarchical structures to capture the word-level and message-level indicative representations hierarchically. In both of the modules, we apply character-level representations, which enable the model to explore morphological features and deal with the out-of-vocabulary problem caused by the informal language style of microblog messages. Extensive comparison results on real-life data sets indicate that our model outperforms state-of-the-art models from previous studies.

Goller, C.; Löning, J.; Will, T.; Wolff, W.: Automatic document classification : a thorough evaluation of various methods (2000) 0.01
```
0.0052987295 = product of:
  0.010597459 = sum of:
    0.010597459 = product of:
      0.021194918 = sum of:
        0.021194918 = weight(_text_:2 in 5480) [ClassicSimilarity], result of:
          0.021194918 = score(doc=5480,freq=2.0), product of:
            0.1294644 = queryWeight, product of:
              2.4695914 = idf(docFreq=10170, maxDocs=44218)
              0.05242341 = queryNorm
            0.16371232 = fieldWeight in 5480, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              2.4695914 = idf(docFreq=10170, maxDocs=44218)
              0.046875 = fieldNorm(doc=5480)
      0.5 = coord(1/2)
  0.5 = coord(1/2)
```
Abstract

(Automatic) document classification is generally defined as content-based assignment of one or more predefined categories to documents. Usually, machine learning, statistical pattern recognition, or neural network approaches are used to construct classifiers automatically. In this paper we thoroughly evaluate a wide variety of these methods on a document classification task for German text. We evaluate different feature construction and selection methods and various classifiers. Our main results are: (1) feature selection is necessary not only to reduce learning and classification time, but also to avoid overfitting (even for Support Vector Machines); (2) surprisingly, our morphological analysis does not improve classification quality compared to a letter 5-gram approach; (3) Support Vector Machines are significantly better than all other classification methods.

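
Goller et al. report that their morphological analysis did not improve on a letter 5-gram approach for German text. The sketch below shows one common way to build such features, assuming lowercasing, whitespace normalization and space padding at the edges; a plain document-frequency cutoff (`min_df`, hypothetical) stands in for the feature selection methods the paper compares, which the abstract does not name.

```python
from collections import Counter
from typing import List


def letter_ngrams(text: str, n: int = 5) -> Counter:
    """Count overlapping letter n-grams; padding spaces make word boundaries visible."""
    normalized = " " + " ".join(text.lower().split()) + " "
    return Counter(normalized[i:i + n] for i in range(len(normalized) - n + 1))


def select_features(docs: List[str], min_df: int = 2, n: int = 5) -> List[str]:
    """Keep n-grams found in at least `min_df` documents (a simple stand-in for feature selection)."""
    df = Counter()
    for doc in docs:
        df.update(set(letter_ngrams(doc, n)))
    return sorted(gram for gram, count in df.items() if count >= min_df)


def vectorize(doc: str, vocabulary: List[str], n: int = 5) -> List[int]:
    """Represent a document as counts over the selected n-gram vocabulary."""
    counts = letter_ngrams(doc, n)
    return [counts[gram] for gram in vocabulary]


vocabulary = select_features(["Haus", "Hausboot"])
print(vocabulary)
print(vectorize("Haus Haus", vocabulary))
```

Because "Haus" and the compound "Hausboot" share the 5-gram " haus", the compound and its first element end up with a common feature without any morphological analysis, which is the intuition behind comparing the two approaches on German.
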
Needham, R.M.; Sparck Jones, K.: Keywords and clumps (1985) 0.00
```
0.004371229 = product of:
  0.008742458 = sum of:
    0.008742458 = product of:
      0.017484916 = sum of:
        0.017484916 = weight(_text_:2 in 3645) [ClassicSimilarity], result of:
          0.017484916 = score(doc=3645,freq=4.0), product of:
            0.1294644 = queryWeight, product of:
              2.4695914 = idf(docFreq=10170, maxDocs=44218)
              0.05242341 = queryNorm
            0.13505578 = fieldWeight in 3645, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              2.4695914 = idf(docFreq=10170, maxDocs=44218)
              0.02734375 = fieldNorm(doc=3645)
      0.5 = coord(1/2)
  0.5 = coord(1/2)
```
Abstract

The selection that follows was chosen as it represents "a very early paper on the possibilities allowed by computers on documentation." In the early 1960s computers were being used to provide simple automatic indexing systems wherein keywords were extracted from documents. The problem with such systems was that they lacked vocabulary control; thus, documents related in subject matter were not always collocated in retrieval. To improve retrieval by improving recall is the raison d'être of vocabulary control tools such as classifications and thesauri. The question arose whether it was possible by automatic means to construct classes of terms which, when substituted one for another, could be used to improve retrieval performance. One of the first theoretical approaches to this question was initiated by R. M. Needham and Karen Sparck Jones at the Cambridge Language Research Institute in England. The question was later pursued using experimental methodologies by Sparck Jones, who, as a Senior Research Associate in the Computer Laboratory at the University of Cambridge, has devoted her life's work to research in information retrieval and automatic natural language processing. Based on the principles of numerical taxonomy, automatic classification techniques start from the premise that two objects are similar to the degree that they share attributes in common. When these two objects are keywords, their similarity is measured in terms of the number of documents they index in common. Step 1 in automatic classification is to compute mathematically the degree to which two terms are similar. Step 2 is to group together those terms that are "most similar" to each other, forming equivalence classes of intersubstitutable terms. The technique for forming such classes varies and is the factor that characteristically distinguishes different approaches to automatic classification.
The technique used by Needham and Sparck Jones, that of clumping, is described in the selection that follows. Questions that must be asked are whether the use of automatically generated classes really does improve retrieval performance and whether there is a true economic advantage in substituting mechanical for manual labor. Several years after her work with clumping, Sparck Jones was to observe that while it was not wholly satisfactory in itself, it was valuable in that it stimulated research into automatic classification. To this it might be added that it was valuable in that it introduced to library/information science the methods of numerical taxonomy, thus stimulating us to think again about the fundamental nature and purpose of classification. In this connection it might be useful to review how automatically derived classes differ from those of manually constructed classifications: 1) the manner of their derivation is purely a posteriori, the ultimate operationalization of the principle of literary warrant; 2) the relationship between members forming such classes is essentially statistical; the members of a given class are similar to each other not because they possess the class-defining characteristic but by virtue of sharing a family resemblance; and finally, 3) automatically derived classes are not related meaningfully one to another, that is, they are not ordered in traditional hierarchical and precedence relationships.

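
The two steps of automatic classification described in this abstract can be sketched in Python. Step 1 follows the abstract directly: the similarity of two keywords is the number of documents they both index. For Step 2 the clump definition itself is not given here, so a simpler substitute is used: keywords linked by a similarity at or above a hypothetical `threshold` are merged into one class (single-link grouping via union-find). The keyword index is made-up toy data.

```python
from collections import defaultdict
from itertools import combinations
from typing import Dict, List, Tuple


def cooccurrence(index: Dict[str, List[str]]) -> Dict[Tuple[str, str], int]:
    """Step 1: similarity of two keywords = number of documents indexed by both."""
    sim: Dict[Tuple[str, str], int] = defaultdict(int)
    for keywords in index.values():
        for a, b in combinations(sorted(set(keywords)), 2):
            sim[(a, b)] += 1
    return dict(sim)


def group_terms(index: Dict[str, List[str]], threshold: int) -> List[List[str]]:
    """Step 2 (substitute for clumping): merge keywords whose similarity >= threshold."""
    parent: Dict[str, str] = {}

    def find(term: str) -> str:
        parent.setdefault(term, term)
        while parent[term] != term:
            parent[term] = parent[parent[term]]  # path halving
            term = parent[term]
        return term

    for keywords in index.values():
        for kw in keywords:
            find(kw)  # register every keyword, including isolated ones
    for (a, b), score in cooccurrence(index).items():
        if score >= threshold:
            parent[find(a)] = find(b)

    classes: Dict[str, List[str]] = defaultdict(list)
    for kw in list(parent):
        classes[find(kw)].append(kw)
    return sorted(sorted(members) for members in classes.values())


index = {
    "d1": ["computer", "indexing", "retrieval"],
    "d2": ["indexing", "retrieval"],
    "d3": ["thesaurus", "vocabulary"],
    "d4": ["thesaurus", "vocabulary", "retrieval"],
}
print(cooccurrence(index)[("indexing", "retrieval")])
print(group_terms(index, threshold=2))
```

With `threshold=2` the toy index yields the classes [computer], [indexing, retrieval] and [thesaurus, vocabulary]. Lowering the threshold to 1 chains all five keywords into one class, because "retrieval" co-occurs once with both "thesaurus" and "vocabulary"; this chaining is a known weakness of single-link grouping.
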
Kajanan, S.; Bao, Y.; Datta, A.; VanderMeer, D.; Dutta, K.: Efficient automatic search query formulation using phrase-level analysis (2014) 0.00
```
0.0035324865 = product of:
  0.007064973 = sum of:
    0.007064973 = product of:
      0.014129946 = sum of:
        0.014129946 = weight(_text_:2 in 1264) [ClassicSimilarity], result of:
          0.014129946 = score(doc=1264,freq=2.0), product of:
            0.1294644 = queryWeight, product of:
              2.4695914 = idf(docFreq=10170, maxDocs=44218)
              0.05242341 = queryNorm
            0.10914155 = fieldWeight in 1264, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              2.4695914 = idf(docFreq=10170, maxDocs=44218)
              0.03125 = fieldNorm(doc=1264)
      0.5 = coord(1/2)
  0.5 = coord(1/2)
```
Abstract

Over the past decade, the volume of information available digitally over the Internet has grown enormously. Technical developments in the area of search, such as Google's PageRank algorithm, have proved so good at serving relevant results that Internet search has become integrated into daily human activity. One can endlessly explore topics of interest simply by querying and reading through the resulting links. Yet, although search engines are well known for providing relevant results based on users' queries, users do not always receive the results they are looking for. Google's Director of Research describes clickstream evidence of frustrated users repeatedly reformulating queries and searching through page after page of results. Given the general quality of search engine results, one must consider the possibility that the frustrated user's query is not effective; that is, it does not describe the essence of the user's interest. Indeed, extensive research into human search behavior has found that humans are not very effective at formulating good search queries that describe what they are interested in. Ideally, the user should simply point to a portion of text that sparked the user's interest, and a system should automatically formulate a search query that captures the essence of the text. In this paper, we describe an implemented system that provides this capability. We first describe how our work differs from existing work in automatic query formulation, and propose a new method for improved quantification of the relevance of candidate search terms drawn from input text using phrase-level analysis. We then propose an implementable method designed to provide relevant queries based on a user's text input. We demonstrate the quality of our results and performance of our system through experimental studies.
Our results demonstrate that our system produces relevant search terms with roughly two-thirds precision and recall compared to search terms selected by experts, and that typical users find significantly more relevant results (31% more relevant) more quickly (64% faster) using our system than self-formulated search queries. Further, we show that our implementation can scale to request loads of up to 10 requests per second within current online responsiveness expectations (<2-second response times at the highest loads tested).
