Search (90 results, page 1 of 5)

  • language_ss:"e"
  • theme_ss:"Computerlinguistik"
  • type_ss:"a"
  1. Becker, D.: Automated language processing (1981) 0.08
    0.0763061 = product of:
      0.3052244 = sum of:
        0.3052244 = weight(_text_:becker in 287) [ClassicSimilarity], result of:
          0.3052244 = score(doc=287,freq=2.0), product of:
            0.25693014 = queryWeight, product of:
              6.7201533 = idf(docFreq=144, maxDocs=44218)
              0.03823278 = queryNorm
            1.1879665 = fieldWeight in 287, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              6.7201533 = idf(docFreq=144, maxDocs=44218)
              0.125 = fieldNorm(doc=287)
      0.25 = coord(1/4)
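The explain tree above is Lucene's ClassicSimilarity (TF-IDF) scoring output. A minimal sketch, recomputing the first result's score from the factors shown (the function name is illustrative, not a Lucene API):

```python
import math

def classic_similarity(freq, idf, query_norm, field_norm, coord):
    """Recompute one leaf of a Lucene ClassicSimilarity explain tree."""
    tf = math.sqrt(freq)                  # tf(freq) = sqrt(termFreq)
    query_weight = idf * query_norm       # queryWeight = idf * queryNorm
    field_weight = tf * idf * field_norm  # fieldWeight = tf * idf * fieldNorm
    return query_weight * field_weight * coord

# weight(_text_:becker in doc 287), with coord(1/4) applied at the top
score = classic_similarity(freq=2.0, idf=6.7201533, query_norm=0.03823278,
                           field_norm=0.125, coord=1 / 4)
print(round(score, 7))  # ≈ 0.0763061, the displayed score
```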
    
  2. Hotho, A.; Bloehdorn, S.: Data Mining 2004 : Text classification by boosting weak learners based on terms and concepts (2004) 0.07
    0.0658593 = product of:
      0.0878124 = sum of:
        0.036434274 = product of:
          0.18217137 = sum of:
            0.18217137 = weight(_text_:3a in 562) [ClassicSimilarity], result of:
              0.18217137 = score(doc=562,freq=2.0), product of:
                0.32413796 = queryWeight, product of:
                  8.478011 = idf(docFreq=24, maxDocs=44218)
                  0.03823278 = queryNorm
                0.56201804 = fieldWeight in 562, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  8.478011 = idf(docFreq=24, maxDocs=44218)
                  0.046875 = fieldNorm(doc=562)
          0.2 = coord(1/5)
        0.035838082 = weight(_text_:data in 562) [ClassicSimilarity], result of:
          0.035838082 = score(doc=562,freq=4.0), product of:
            0.120893985 = queryWeight, product of:
              3.1620505 = idf(docFreq=5088, maxDocs=44218)
              0.03823278 = queryNorm
            0.29644224 = fieldWeight in 562, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              3.1620505 = idf(docFreq=5088, maxDocs=44218)
              0.046875 = fieldNorm(doc=562)
        0.015540041 = product of:
          0.031080082 = sum of:
            0.031080082 = weight(_text_:22 in 562) [ClassicSimilarity], result of:
              0.031080082 = score(doc=562,freq=2.0), product of:
                0.13388468 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.03823278 = queryNorm
                0.23214069 = fieldWeight in 562, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.046875 = fieldNorm(doc=562)
          0.5 = coord(1/2)
      0.75 = coord(3/4)
    
    Content
    Cf.: http://www.google.de/url?sa=t&rct=j&q=&esrc=s&source=web&cd=1&cad=rja&ved=0CEAQFjAA&url=http%3A%2F%2Fciteseerx.ist.psu.edu%2Fviewdoc%2Fdownload%3Fdoi%3D10.1.1.91.4940%26rep%3Drep1%26type%3Dpdf&ei=dOXrUMeIDYHDtQahsIGACg&usg=AFQjCNHFWVh6gNPvnOrOS9R3rkrXCNVD-A&sig2=5I2F5evRfMnsttSgFF9g7Q&bvm=bv.1357316858,d.Yms.
    Date
    8. 1.2013 10:22:32
    Source
    Proceedings of the 4th IEEE International Conference on Data Mining (ICDM 2004), 1-4 November 2004, Brighton, UK
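Higher up the explain tree, partial matches are down-weighted by a coordination factor: result 2 matched 3 of 4 top-level query clauses, so the clause sum is scaled by coord(3/4). A sketch recomputing it from the numbers shown above:

```python
# Leaf scores from the explain tree of result 2 (doc 562), each already
# scaled by its own inner coord: "3a" (x 1/5), "data", and "22" (x 1/2).
clause_scores = [0.036434274, 0.035838082, 0.015540041]
matched, total = 3, 4                         # 3 of 4 top-level clauses matched
score = sum(clause_scores) * matched / total  # coord(3/4)
print(round(score, 7))  # ≈ 0.0658593, the displayed score
```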
  3. Basili, R.; Pazienza, M.T.; Velardi, P.: ¬An empirical symbolic approach to natural language processing (1996) 0.03
    0.03425208 = product of:
      0.06850416 = sum of:
        0.04778411 = weight(_text_:data in 6753) [ClassicSimilarity], result of:
          0.04778411 = score(doc=6753,freq=4.0), product of:
            0.120893985 = queryWeight, product of:
              3.1620505 = idf(docFreq=5088, maxDocs=44218)
              0.03823278 = queryNorm
            0.3952563 = fieldWeight in 6753, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              3.1620505 = idf(docFreq=5088, maxDocs=44218)
              0.0625 = fieldNorm(doc=6753)
        0.020720055 = product of:
          0.04144011 = sum of:
            0.04144011 = weight(_text_:22 in 6753) [ClassicSimilarity], result of:
              0.04144011 = score(doc=6753,freq=2.0), product of:
                0.13388468 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.03823278 = queryNorm
                0.30952093 = fieldWeight in 6753, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.0625 = fieldNorm(doc=6753)
          0.5 = coord(1/2)
      0.5 = coord(2/4)
    
    Abstract
    Describes and evaluates the results of a large-scale lexical learning system, ARISTO-LEX, that uses a combination of probabilistic and knowledge-based methods for the acquisition of selectional restrictions of words in sublanguages. Presents experimental data obtained from different corpora in different domains and languages, and shows that the acquired lexical data not only have practical applications in natural language processing but are also useful for a comparative analysis of sublanguages.
    Date
    6. 3.1997 16:22:15
  4. Liddy, E.D.: Natural language processing for information retrieval and knowledge discovery (1998) 0.02
    0.02384748 = product of:
      0.04769496 = sum of:
        0.02956491 = weight(_text_:data in 2345) [ClassicSimilarity], result of:
          0.02956491 = score(doc=2345,freq=2.0), product of:
            0.120893985 = queryWeight, product of:
              3.1620505 = idf(docFreq=5088, maxDocs=44218)
              0.03823278 = queryNorm
            0.24455236 = fieldWeight in 2345, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.1620505 = idf(docFreq=5088, maxDocs=44218)
              0.0546875 = fieldNorm(doc=2345)
        0.01813005 = product of:
          0.0362601 = sum of:
            0.0362601 = weight(_text_:22 in 2345) [ClassicSimilarity], result of:
              0.0362601 = score(doc=2345,freq=2.0), product of:
                0.13388468 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.03823278 = queryNorm
                0.2708308 = fieldWeight in 2345, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.0546875 = fieldNorm(doc=2345)
          0.5 = coord(1/2)
      0.5 = coord(2/4)
    
    Date
    22. 9.1997 19:16:05
    Source
    Visualizing subject access for 21st century information resources: Papers presented at the 1997 Clinic on Library Applications of Data Processing, 2-4 Mar 1997, Graduate School of Library and Information Science, University of Illinois at Urbana-Champaign. Ed.: P.A. Cochrane et al
  5. Rahmstorf, G.: Concept structures for large vocabularies (1998) 0.02
    0.020440696 = product of:
      0.04088139 = sum of:
        0.02534135 = weight(_text_:data in 75) [ClassicSimilarity], result of:
          0.02534135 = score(doc=75,freq=2.0), product of:
            0.120893985 = queryWeight, product of:
              3.1620505 = idf(docFreq=5088, maxDocs=44218)
              0.03823278 = queryNorm
            0.2096163 = fieldWeight in 75, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.1620505 = idf(docFreq=5088, maxDocs=44218)
              0.046875 = fieldNorm(doc=75)
        0.015540041 = product of:
          0.031080082 = sum of:
            0.031080082 = weight(_text_:22 in 75) [ClassicSimilarity], result of:
              0.031080082 = score(doc=75,freq=2.0), product of:
                0.13388468 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.03823278 = queryNorm
                0.23214069 = fieldWeight in 75, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.046875 = fieldNorm(doc=75)
          0.5 = coord(1/2)
      0.5 = coord(2/4)
    
    Abstract
    A technology is described which supports the acquisition, visualisation and manipulation of large vocabularies with associated structures. It is used for dictionary production, terminology data bases, thesauri, library classification systems etc. Essential features of the technology are a lexicographic user interface, variable word description, unlimited list of word readings, a concept language, automatic transformations of formulas into graphic structures, structure manipulation operations and retransformation into formulas. The concept language includes notations for undefined concepts. The structure of defined concepts can be constructed interactively. The technology supports the generation of large vocabularies with structures representing word senses. Concept structures and ordering systems for indexing and retrieval can be constructed separately and connected by associating relations.
    Date
    30.12.2001 19:01:22
  6. Arsenault, C.: Aggregation consistency and frequency of Chinese words and characters (2006) 0.02
    0.015838344 = product of:
      0.063353375 = sum of:
        0.063353375 = weight(_text_:data in 609) [ClassicSimilarity], result of:
          0.063353375 = score(doc=609,freq=18.0), product of:
            0.120893985 = queryWeight, product of:
              3.1620505 = idf(docFreq=5088, maxDocs=44218)
              0.03823278 = queryNorm
            0.52404076 = fieldWeight in 609, product of:
              4.2426405 = tf(freq=18.0), with freq of:
                18.0 = termFreq=18.0
              3.1620505 = idf(docFreq=5088, maxDocs=44218)
              0.0390625 = fieldNorm(doc=609)
      0.25 = coord(1/4)
    
    Abstract
    Purpose - Aims to measure syllable aggregation consistency of Romanized Chinese data in the title fields of bibliographic records. Also aims to verify if the term frequency distributions satisfy conventional bibliometric laws. Design/methodology/approach - Uses Cooper's interindexer formula to evaluate aggregation consistency within and between two sets of Chinese bibliographic data. Compares the term frequency distributions of polysyllabic words and monosyllabic characters (for vernacular and Romanized data) with the Lotka and the generalised Zipf theoretical distributions. The fits are tested with the Kolmogorov-Smirnov test. Findings - Finds high internal aggregation consistency within each data set but some aggregation discrepancy between sets. Shows that word (polysyllabic) distributions satisfy Lotka's law but that character (monosyllabic) distributions do not abide by the law. Research limitations/implications - The findings are limited to only two sets of bibliographic data (for aggregation consistency analysis) and to one set of data for the frequency distribution analysis. Only two bibliometric distributions are tested. Internal consistency within each database remains fairly high. Therefore the main argument against syllable aggregation does not appear to hold true. The analysis revealed that Chinese words and characters behave differently in terms of frequency distribution but that there is no noticeable difference between vernacular and Romanized data. The distribution of Romanized characters exhibits the worst case in terms of fit to either Lotka's or Zipf's laws, which indicates that Romanized data in aggregated form appear to be a preferable option. Originality/value - Provides empirical data on consistency and distribution of Romanized Chinese titles in bibliographic records.
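Arsenault's fit test can be illustrated with a toy version: compare an observed frequency-of-frequencies distribution against the proportions Lotka's law predicts, using the Kolmogorov-Smirnov statistic. All data below are invented for illustration; the paper works on real bibliographic records.

```python
def lotka_expected(n_max, c=2.0):
    """Proportions predicted by (generalized) Lotka's law, p(n) ~ 1/n**c,
    normalized over occurrence counts 1..n_max."""
    weights = [1.0 / n ** c for n in range(1, n_max + 1)]
    k = 1.0 / sum(weights)
    return [k * w for w in weights]

def ks_statistic(observed, expected):
    """Kolmogorov-Smirnov statistic: largest gap between the two CDFs."""
    d = cum_o = cum_e = 0.0
    for o, e in zip(observed, expected):
        cum_o += o
        cum_e += e
        d = max(d, abs(cum_o - cum_e))
    return d

# Invented frequency-of-frequencies data: counts[i] = terms occurring i+1 times
counts = [600, 150, 65, 38, 24]
observed = [c / sum(counts) for c in counts]
d = ks_statistic(observed, lotka_expected(len(counts)))
print(f"KS statistic: {d:.4f}")  # a small value indicates a good fit
```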
  7. Xiang, R.; Chersoni, E.; Lu, Q.; Huang, C.-R.; Li, W.; Long, Y.: Lexical data augmentation for sentiment analysis (2021) 0.01
    0.014932535 = product of:
      0.05973014 = sum of:
        0.05973014 = weight(_text_:data in 392) [ClassicSimilarity], result of:
          0.05973014 = score(doc=392,freq=16.0), product of:
            0.120893985 = queryWeight, product of:
              3.1620505 = idf(docFreq=5088, maxDocs=44218)
              0.03823278 = queryNorm
            0.49407038 = fieldWeight in 392, product of:
              4.0 = tf(freq=16.0), with freq of:
                16.0 = termFreq=16.0
              3.1620505 = idf(docFreq=5088, maxDocs=44218)
              0.0390625 = fieldNorm(doc=392)
      0.25 = coord(1/4)
    
    Abstract
    Machine learning methods, especially deep learning models, have achieved impressive performance in various natural language processing tasks, including sentiment analysis. However, deep learning models demand large amounts of training data. Data augmentation techniques are widely used to generate new instances, based on modifications to existing data or on external knowledge bases, to address the scarcity of annotated data that hinders the full potential of machine learning techniques. This paper presents our work using part-of-speech (POS) focused lexical substitution for data augmentation (PLSDA) to enhance the performance of machine learning algorithms in sentiment analysis. We exploit POS information to identify words to be replaced and investigate different augmentation strategies to find semantically related substitutions when generating new instances. The choice of POS tags as well as a variety of strategies, such as semantic-based substitution methods and sampling methods, are discussed in detail. Performance evaluation focuses on the comparison between PLSDA and two previous lexical substitution-based data augmentation methods, one thesaurus-based and the other based on lexicon manipulation. Our approach is tested on five English sentiment analysis benchmarks: SST-2, MR, IMDB, Twitter, and AirRecord. Hyperparameters such as the candidate similarity threshold and the number of newly generated instances are optimized. Results show that six classifiers (SVM, LSTM, BiLSTM-AT, bidirectional encoder representations from transformers [BERT], XLNet, and RoBERTa) trained with PLSDA achieve an accuracy improvement of more than 0.6% compared with the two previous lexical substitution methods, averaged over five benchmarks. Introducing POS constraints and well-designed augmentation strategies can improve the reliability of lexical data augmentation methods. Consequently, PLSDA significantly improves the performance of sentiment analysis algorithms.
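The core move of the paper, POS-constrained lexical substitution, can be sketched as follows. The synonym table and tags here are invented for illustration; PLSDA itself draws candidates from thesauri, sentiment lexicons, or embedding neighbours.

```python
import random

# Invented POS-tagged synonym table (a stand-in for a real lexical resource)
SUBSTITUTIONS = {
    ("good", "ADJ"): ["fine", "great"],
    ("movie", "NOUN"): ["film"],
    ("really", "ADV"): ["truly"],
}

def augment(tagged_tokens, seed=0):
    """Create one new training instance by POS-constrained substitution."""
    rng = random.Random(seed)
    out = []
    for token, pos in tagged_tokens:
        candidates = SUBSTITUTIONS.get((token, pos))
        out.append(rng.choice(candidates) if candidates else token)
    return " ".join(out)

sentence = [("a", "DET"), ("really", "ADV"), ("good", "ADJ"), ("movie", "NOUN")]
print(augment(sentence))  # e.g. "a truly fine movie"
```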
  8. Ruge, G.: Experiments on linguistically-based term associations (1992) 0.01
    0.014630836 = product of:
      0.058523346 = sum of:
        0.058523346 = weight(_text_:data in 1810) [ClassicSimilarity], result of:
          0.058523346 = score(doc=1810,freq=6.0), product of:
            0.120893985 = queryWeight, product of:
              3.1620505 = idf(docFreq=5088, maxDocs=44218)
              0.03823278 = queryNorm
            0.48408815 = fieldWeight in 1810, product of:
              2.4494898 = tf(freq=6.0), with freq of:
                6.0 = termFreq=6.0
              3.1620505 = idf(docFreq=5088, maxDocs=44218)
              0.0625 = fieldNorm(doc=1810)
      0.25 = coord(1/4)
    
    Abstract
    Describes the hyperterm system REALIST (REtrieval Aids by LInguistic and STatistics) and its semantic component, which generates semantic term relations such as synonyms. It takes as input a free-text database and generates as output term pairs that are semantically related with respect to their meanings in the database. In the first step, an automatic syntactic analysis provides linguistic knowledge about the terms of the database; in the second step, this knowledge is compared by statistical similarity computation. Various experiments with different similarity measures are described.
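REALIST's second step, comparing terms by statistical similarity over their syntactic contexts, is commonly done with a cosine measure. A toy sketch with invented co-occurrence counts (not data from the paper):

```python
import math

def cosine(u, v):
    """Cosine similarity between two co-occurrence count vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm

# Invented counts of each term appearing with four syntactic contexts
contexts = {
    "retrieval": [4, 0, 3, 1],
    "search":    [3, 1, 2, 0],
    "banana":    [0, 5, 0, 2],
}
print(round(cosine(contexts["retrieval"], contexts["search"]), 2))  # high
print(round(cosine(contexts["retrieval"], contexts["banana"]), 2))  # low
```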
  9. Gill, A.J.; Hinrichs-Krapels, S.; Blanke, T.; Grant, J.; Hedges, M.; Tanner, S.: Insight workflow : systematically combining human and computational methods to explore textual data (2017) 0.01
    0.012931953 = product of:
      0.051727813 = sum of:
        0.051727813 = weight(_text_:data in 3682) [ClassicSimilarity], result of:
          0.051727813 = score(doc=3682,freq=12.0), product of:
            0.120893985 = queryWeight, product of:
              3.1620505 = idf(docFreq=5088, maxDocs=44218)
              0.03823278 = queryNorm
            0.4278775 = fieldWeight in 3682, product of:
              3.4641016 = tf(freq=12.0), with freq of:
                12.0 = termFreq=12.0
              3.1620505 = idf(docFreq=5088, maxDocs=44218)
              0.0390625 = fieldNorm(doc=3682)
      0.25 = coord(1/4)
    
    Abstract
    Analyzing large quantities of real-world textual data has the potential to provide new insights for researchers. However, such data present challenges for both human and computational methods, requiring a diverse range of specialist skills, often shared across a number of individuals. In this paper we use the analysis of a real-world data set as our case study, and use this exploration as a demonstration of our "insight workflow," which we present for use and adaptation by other researchers. The data we use are impact case study documents collected as part of the UK Research Excellence Framework (REF), consisting of 6,679 documents and 6.25 million words; the analysis was commissioned by the Higher Education Funding Council for England (published as report HEFCE 2015). In our exploration and analysis we used a variety of techniques, ranging from keyword in context and frequency information to more sophisticated methods (topic modeling), with these automated techniques providing an empirical point of entry for in-depth and intensive human analysis. We present the 60 topics to demonstrate the output of our methods, and illustrate how the variety of analysis techniques can be combined to provide insights. We note potential limitations and propose future work.
    Theme
    Data Mining
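The "keyword in context" entry point the authors mention is straightforward to reproduce; a minimal sketch (the sample text is invented):

```python
def kwic(tokens, keyword, window=3):
    """Keyword-in-context: each hit with `window` tokens of context per side."""
    lines = []
    for i, tok in enumerate(tokens):
        if tok.lower() == keyword.lower():
            left = " ".join(tokens[max(0, i - window):i])
            right = " ".join(tokens[i + 1:i + 1 + window])
            lines.append(f"{left} [{tok}] {right}".strip())
    return lines

text = "the impact of research on policy and the impact on practice".split()
for line in kwic(text, "impact"):
    print(line)
```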
  10. McKelvie, D.; Brew, C.; Thompson, H.S.: Using SGML as a basis for data-intensive natural language processing (1998) 0.01
    0.012670675 = product of:
      0.0506827 = sum of:
        0.0506827 = weight(_text_:data in 3147) [ClassicSimilarity], result of:
          0.0506827 = score(doc=3147,freq=2.0), product of:
            0.120893985 = queryWeight, product of:
              3.1620505 = idf(docFreq=5088, maxDocs=44218)
              0.03823278 = queryNorm
            0.4192326 = fieldWeight in 3147, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.1620505 = idf(docFreq=5088, maxDocs=44218)
              0.09375 = fieldNorm(doc=3147)
      0.25 = coord(1/4)
    
  11. Owei, V.; Higa, K.: ¬A paradigm for natural language explanation of database queries : a semantic data model approach (1994) 0.01
    0.011946027 = product of:
      0.04778411 = sum of:
        0.04778411 = weight(_text_:data in 8189) [ClassicSimilarity], result of:
          0.04778411 = score(doc=8189,freq=4.0), product of:
            0.120893985 = queryWeight, product of:
              3.1620505 = idf(docFreq=5088, maxDocs=44218)
              0.03823278 = queryNorm
            0.3952563 = fieldWeight in 8189, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              3.1620505 = idf(docFreq=5088, maxDocs=44218)
              0.0625 = fieldNorm(doc=8189)
      0.25 = coord(1/4)
    
    Abstract
    Describes an interface that provides the user with automatic feedback in the form of an explanation of how the database management system interprets user-specified queries. Proposes an approach that exploits the rich semantics of graphical semantic data models to construct a restricted natural language explanation of database queries that are specified in a very high-level declarative form. These interpretations of the specified query represent the system's 'understanding' of the query and are returned to the user for validation.
  12. Fox, C.: Lexical analysis and stoplists (1992) 0.01
    0.011946027 = product of:
      0.04778411 = sum of:
        0.04778411 = weight(_text_:data in 3502) [ClassicSimilarity], result of:
          0.04778411 = score(doc=3502,freq=4.0), product of:
            0.120893985 = queryWeight, product of:
              3.1620505 = idf(docFreq=5088, maxDocs=44218)
              0.03823278 = queryNorm
            0.3952563 = fieldWeight in 3502, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              3.1620505 = idf(docFreq=5088, maxDocs=44218)
              0.0625 = fieldNorm(doc=3502)
      0.25 = coord(1/4)
    
    Abstract
    Lexical analysis is a fundamental operation in both query processing and automatic indexing, and filtering stoplist words is an important step in the automatic indexing process. Presents basic algorithms and data structures for lexical analysis, and shows how stoplist word removal can be efficiently incorporated into lexical analysis
    Source
    Information retrieval: data structures and algorithms. Ed.: W.B. Frakes u. R. Baeza-Yates
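Fox's pipeline (tokenize, then drop stoplist words before indexing) in a minimal sketch; the stoplist here is a toy subset, not the chapter's:

```python
import re

STOPLIST = {"a", "an", "and", "in", "is", "of", "the"}  # toy stoplist

def index_terms(text):
    """Lexical analysis (tokenize, lowercase) with stoplist filtering."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return [t for t in tokens if t not in STOPLIST]

print(index_terms("Lexical analysis is a fundamental operation in indexing."))
# → ['lexical', 'analysis', 'fundamental', 'operation', 'indexing']
```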
  13. Benoit, G.: Data discretization for novel relationship discovery in information retrieval (2002) 0.01
    0.011946027 = product of:
      0.04778411 = sum of:
        0.04778411 = weight(_text_:data in 5197) [ClassicSimilarity], result of:
          0.04778411 = score(doc=5197,freq=4.0), product of:
            0.120893985 = queryWeight, product of:
              3.1620505 = idf(docFreq=5088, maxDocs=44218)
              0.03823278 = queryNorm
            0.3952563 = fieldWeight in 5197, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              3.1620505 = idf(docFreq=5088, maxDocs=44218)
              0.0625 = fieldNorm(doc=5197)
      0.25 = coord(1/4)
    
    Abstract
    A sample of 600 Dialog and Swiss-Prot full text records in genetics and molecular biology were parsed and term frequencies calculated to provide data for a test of Benoit's visualization model for retrieval. A retrieved set is displayed graphically allowing for manipulation of document and concept relationships in real time, which hopefully will reveal unanticipated relationships.
  14. Niemi, T.; Jämsen , J.: ¬A query language for discovering semantic associations, part I : approach and formal definition of query primitives (2007) 0.01
    0.011805206 = product of:
      0.047220822 = sum of:
        0.047220822 = weight(_text_:data in 591) [ClassicSimilarity], result of:
          0.047220822 = score(doc=591,freq=10.0), product of:
            0.120893985 = queryWeight, product of:
              3.1620505 = idf(docFreq=5088, maxDocs=44218)
              0.03823278 = queryNorm
            0.39059696 = fieldWeight in 591, product of:
              3.1622777 = tf(freq=10.0), with freq of:
                10.0 = termFreq=10.0
              3.1620505 = idf(docFreq=5088, maxDocs=44218)
              0.0390625 = fieldNorm(doc=591)
      0.25 = coord(1/4)
    
    Abstract
    In contemporary query languages, the user is responsible for navigation among semantically related data. Because of the huge amount of data and the complex structural relationships among data in modern applications, it is unrealistic to suppose that the user could know completely the content and structure of the available information. There are several query languages whose purpose is to facilitate navigation in unknown structures of databases. However, the background assumption of these languages is that the user knows how data are related to each other semantically in the structure at hand. So far only little attention has been paid to how unknown semantic associations among available data can be discovered. We address this problem in this article. A semantic association between two entities can be constructed if a sequence of relationships expressed explicitly in a database can be found that connects these entities to each other. This sequence may contain several other entities through which the original entities are connected to each other indirectly. We introduce an expressive and declarative query language for discovering semantic associations. Our query language is able, for example, to discover semantic associations between entities for which only some of the characteristics are known. Further, it integrates the manipulation of semantic associations with the manipulation of documents that may contain information on entities in semantic associations.
  15. Rozinajová, V.; Macko, P.: Using natural language to search linked data (2017) 0.01
    0.011805206 = product of:
      0.047220822 = sum of:
        0.047220822 = weight(_text_:data in 3488) [ClassicSimilarity], result of:
          0.047220822 = score(doc=3488,freq=10.0), product of:
            0.120893985 = queryWeight, product of:
              3.1620505 = idf(docFreq=5088, maxDocs=44218)
              0.03823278 = queryNorm
            0.39059696 = fieldWeight in 3488, product of:
              3.1622777 = tf(freq=10.0), with freq of:
                10.0 = termFreq=10.0
              3.1620505 = idf(docFreq=5088, maxDocs=44218)
              0.0390625 = fieldNorm(doc=3488)
      0.25 = coord(1/4)
    
    Abstract
    There are many endeavors aiming to offer users more effective ways of getting relevant information from the web. One of them is represented by the concept of Linked Data, which provides interconnected data sources. But querying these types of data is difficult not only for conventional web users but also for experts in this field. Therefore, a more convenient way of posing user queries would be of great value. One direction could be to allow the user to use natural language. To make this task easier we have proposed a method for translating natural language queries to SPARQL queries. It is based on sentence structure, utilizing dependencies between the words in user queries. The dependencies are used to map the query to the semantic web structure, which is in the next step translated to a SPARQL query. According to our first experiments, we are able to answer a significant group of user queries.
    Source
    Semantic keyword-based search on structured data sources: COST Action IC1302. Second International KEYSTONE Conference, IKC 2016, Cluj-Napoca, Romania, September 8-9, 2016, Revised Selected Papers. Eds.: A. Calì et al.
  16. French, J.C.; Powell, A.L.; Schulman, E.: Using clustering strategies for creating authority files (2000) 0.01
    0.010973128 = product of:
      0.04389251 = sum of:
        0.04389251 = weight(_text_:data in 4811) [ClassicSimilarity], result of:
          0.04389251 = score(doc=4811,freq=6.0), product of:
            0.120893985 = queryWeight, product of:
              3.1620505 = idf(docFreq=5088, maxDocs=44218)
              0.03823278 = queryNorm
            0.3630661 = fieldWeight in 4811, product of:
              2.4494898 = tf(freq=6.0), with freq of:
                6.0 = termFreq=6.0
              3.1620505 = idf(docFreq=5088, maxDocs=44218)
              0.046875 = fieldNorm(doc=4811)
      0.25 = coord(1/4)
    
    Abstract
    As more online databases are integrated into digital libraries, the issue of quality control of the data becomes increasingly important, especially as it relates to the effective retrieval of information. Authority work, the need to discover and reconcile variant forms of strings in bibliographical entries, will become more critical in the future. Spelling variants, misspellings, and transliteration differences will all increase the difficulty of retrieving information. We investigate a number of approximate string matching techniques that have traditionally been used to help with this problem. We then introduce the notion of approximate word matching and show how it can be used to improve detection and categorization of variant forms. We demonstrate the utility of these approaches using data from the Astrophysics Data System and show how we can reduce the human effort involved in the creation of authority files
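Approximate word matching of the kind the paper investigates usually starts from an edit distance; a minimal sketch (the similarity threshold is an illustrative choice, not the authors'):

```python
def edit_distance(a, b):
    """Levenshtein distance via the classic dynamic-programming recurrence."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,          # deletion
                           cur[j - 1] + 1,       # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]

def same_authority(x, y, threshold=0.8):
    """Group variant name forms whose normalized similarity clears a threshold."""
    sim = 1 - edit_distance(x.lower(), y.lower()) / max(len(x), len(y))
    return sim >= threshold

print(same_authority("Schulman, E.", "Schulmann, E."))  # spelling variant: True
```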
  17. Beitzel, S.M.; Jensen, E.C.; Chowdhury, A.; Grossman, D.; Frieder, O.; Goharian, N.: Fusion of effective retrieval strategies in the same information retrieval system (2004) 0.01
    0.010973128 = product of:
      0.04389251 = sum of:
        0.04389251 = weight(_text_:data in 2502) [ClassicSimilarity], result of:
          0.04389251 = score(doc=2502,freq=6.0), product of:
            0.120893985 = queryWeight, product of:
              3.1620505 = idf(docFreq=5088, maxDocs=44218)
              0.03823278 = queryNorm
            0.3630661 = fieldWeight in 2502, product of:
              2.4494898 = tf(freq=6.0), with freq of:
                6.0 = termFreq=6.0
              3.1620505 = idf(docFreq=5088, maxDocs=44218)
              0.046875 = fieldNorm(doc=2502)
      0.25 = coord(1/4)
    
    Abstract
    Prior efforts have shown that under certain situations retrieval effectiveness may be improved via the use of data fusion techniques. Although these improvements have been observed from the fusion of result sets from several distinct information retrieval systems, it has often been thought that fusing different document retrieval strategies in a single information retrieval system will lead to similar improvements. In this study, we show that this is not the case. We hold constant systemic differences such as parsing, stemming, phrase processing, and relevance feedback, and fuse result sets generated from highly effective retrieval strategies in the same information retrieval system. From this, we show that data fusion of highly effective retrieval strategies alone shows little or no improvement in retrieval effectiveness. Furthermore, we present a detailed analysis of the performance of modern data fusion approaches, and demonstrate the reasons why they do not perform well when applied to this problem. Detailed results and analyses are included to support our conclusions.
  18. Ruge, G.; Schwarz, C.: Term association and computational linguistics (1991) 0.01
    0.010558897 = product of:
      0.042235587 = sum of:
        0.042235587 = weight(_text_:data in 2310) [ClassicSimilarity], result of:
          0.042235587 = score(doc=2310,freq=2.0), product of:
            0.120893985 = queryWeight, product of:
              3.1620505 = idf(docFreq=5088, maxDocs=44218)
              0.03823278 = queryNorm
            0.34936053 = fieldWeight in 2310, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.1620505 = idf(docFreq=5088, maxDocs=44218)
              0.078125 = fieldNorm(doc=2310)
      0.25 = coord(1/4)
    
    Abstract
    Most systems for term associations are statistically based; in general they exploit term co-occurrences. A critical overview of statistical approaches in this field is given. A new approach, based on a linguistic analysis of large amounts of textual data, is outlined.
  19. Roberts, C.W.; Popping, R.: Computer-supported content analysis : some recent developments (1993) 0.01
    0.010558897 = product of:
      0.042235587 = sum of:
        0.042235587 = weight(_text_:data in 4236) [ClassicSimilarity], result of:
          0.042235587 = score(doc=4236,freq=2.0), product of:
            0.120893985 = queryWeight, product of:
              3.1620505 = idf(docFreq=5088, maxDocs=44218)
              0.03823278 = queryNorm
            0.34936053 = fieldWeight in 4236, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.1620505 = idf(docFreq=5088, maxDocs=44218)
              0.078125 = fieldNorm(doc=4236)
      0.25 = coord(1/4)
    
    Abstract
    Presents an overview of some recent developments in the clause-based content analysis of linguistic data. Introduces network analysis of evaluative texts, for the analysis of cognitive maps, and linguistic content analysis. Focuses on the types of substantive inferences afforded by the three approaches
  20. Griffith, C.: FREESTYLE: LEXIS-NEXIS goes natural (1994) 0.01
    0.010558897 = product of:
      0.042235587 = sum of:
        0.042235587 = weight(_text_:data in 2512) [ClassicSimilarity], result of:
          0.042235587 = score(doc=2512,freq=2.0), product of:
            0.120893985 = queryWeight, product of:
              3.1620505 = idf(docFreq=5088, maxDocs=44218)
              0.03823278 = queryNorm
            0.34936053 = fieldWeight in 2512, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.1620505 = idf(docFreq=5088, maxDocs=44218)
              0.078125 = fieldNorm(doc=2512)
      0.25 = coord(1/4)
    
    Abstract
    Describes FREESTYLE, the associative language search engine, developed by Mead Data Central for its LEXIS/NEXIS online service. The special feature of the associative language in FREESTYLE allows users to enter search descriptions in plain English
