Search (67 results, page 2 of 4)

  • × theme_ss:"Automatisches Indexieren"
  • × year_i:[2000 TO 2010}
  1. Snajder, J.; Dalbelo Basic, B.; Tadic, M.: Automatic acquisition of inflectional lexica for morphological normalisation (2008) 0.01
    0.006334501 = product of:
      0.015836252 = sum of:
        0.009138121 = weight(_text_:a in 2910) [ClassicSimilarity], result of:
          0.009138121 = score(doc=2910,freq=10.0), product of:
            0.053464882 = queryWeight, product of:
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.046368346 = queryNorm
            0.1709182 = fieldWeight in 2910, product of:
              3.1622777 = tf(freq=10.0), with freq of:
                10.0 = termFreq=10.0
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.046875 = fieldNorm(doc=2910)
        0.0066981306 = product of:
          0.013396261 = sum of:
            0.013396261 = weight(_text_:information in 2910) [ClassicSimilarity], result of:
              0.013396261 = score(doc=2910,freq=4.0), product of:
                0.08139861 = queryWeight, product of:
                  1.7554779 = idf(docFreq=20772, maxDocs=44218)
                  0.046368346 = queryNorm
                0.16457605 = fieldWeight in 2910, product of:
                  2.0 = tf(freq=4.0), with freq of:
                    4.0 = termFreq=4.0
                  1.7554779 = idf(docFreq=20772, maxDocs=44218)
                  0.046875 = fieldNorm(doc=2910)
          0.5 = coord(1/2)
      0.4 = coord(2/5)
    
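    The indented breakdown above (and the analogous ones in the entries below) is Lucene's ClassicSimilarity explain output for the query terms _text_:a and _text_:information. A minimal Python sketch that reproduces this first entry's arithmetic, with the constants copied from the tree and the formulas taken from ClassicSimilarity's tf-idf scheme:

      import math

      def idf(doc_freq, max_docs):
          # ClassicSimilarity: idf = 1 + ln(maxDocs / (docFreq + 1))
          return 1.0 + math.log(max_docs / (doc_freq + 1))

      def term_weight(freq, doc_freq, max_docs, query_norm, field_norm):
          # weight = queryWeight * fieldWeight
          #        = (idf * queryNorm) * (sqrt(freq) * idf * fieldNorm)
          i = idf(doc_freq, max_docs)
          return (i * query_norm) * (math.sqrt(freq) * i * field_norm)

      QUERY_NORM, MAX_DOCS = 0.046368346, 44218
      w_a = term_weight(10.0, 37942, MAX_DOCS, QUERY_NORM, 0.046875)
      w_info = term_weight(4.0, 20772, MAX_DOCS, QUERY_NORM, 0.046875) * 0.5  # coord(1/2)
      print((w_a + w_info) * 0.4)  # coord(2/5) -> ~0.006334, shown rounded as 0.01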
    Abstract
    Due to natural language morphology, words can take on various morphological forms. Morphological normalisation, often used in information retrieval and text mining systems, conflates morphological variants of a word to a single representative form. In this paper, we describe an approach to lexicon-based inflectional normalisation. This approach lies between stemming and lemmatisation and is suitable for the morphological normalisation of inflectionally complex languages. To eliminate the immense effort required to compile the lexicon by hand, we focus on the problem of automatically acquiring an inflectional morphological lexicon from raw corpora. We propose a convenient and highly expressive morphology representation formalism on which the acquisition procedure is based. Our approach is applied to the morphologically complex Croatian language, but it should be equally applicable to other languages of similar morphological complexity. Experimental results show that our approach can be used to acquire a lexicon whose linguistic quality allows for rather good normalisation performance. The lexicon lookup itself is sketched after this record.
    Source
    Information processing and management. 44(2008) no.5, S.1720-1731
    Type
    a
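  A minimal sketch of the lexicon-based inflectional normalisation described above: inflected forms are conflated to one representative form by lookup in an automatically acquired lexicon. The Croatian entries below are illustrative assumptions, not the paper's data.

      # Toy inflectional lexicon: surface form -> representative form.
      # In the paper such a lexicon is acquired automatically from raw
      # corpora; these entries are invented for illustration.
      INFLECTIONAL_LEXICON = {
          "knjige": "knjiga", "knjigu": "knjiga", "knjigama": "knjiga",
          "gradovi": "grad", "gradove": "grad",
      }

      def normalise(token: str) -> str:
          # Fall back to the surface form for out-of-lexicon tokens.
          return INFLECTIONAL_LEXICON.get(token.lower(), token.lower())

      print([normalise(t) for t in ["Knjige", "gradove", "Zagreb"]])
      # -> ['knjiga', 'grad', 'zagreb']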
  2. Chung, Y.M.; Lee, J.Y.: A corpus-based approach to comparative evaluation of statistical term association measures (2001) 0.01
    0.006203569 = product of:
      0.015508923 = sum of:
        0.0076151006 = weight(_text_:a in 5769) [ClassicSimilarity], result of:
          0.0076151006 = score(doc=5769,freq=10.0), product of:
            0.053464882 = queryWeight, product of:
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.046368346 = queryNorm
            0.14243183 = fieldWeight in 5769, product of:
              3.1622777 = tf(freq=10.0), with freq of:
                10.0 = termFreq=10.0
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.0390625 = fieldNorm(doc=5769)
        0.007893822 = product of:
          0.015787644 = sum of:
            0.015787644 = weight(_text_:information in 5769) [ClassicSimilarity], result of:
              0.015787644 = score(doc=5769,freq=8.0), product of:
                0.08139861 = queryWeight, product of:
                  1.7554779 = idf(docFreq=20772, maxDocs=44218)
                  0.046368346 = queryNorm
                0.19395474 = fieldWeight in 5769, product of:
                  2.828427 = tf(freq=8.0), with freq of:
                    8.0 = termFreq=8.0
                  1.7554779 = idf(docFreq=20772, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=5769)
          0.5 = coord(1/2)
      0.4 = coord(2/5)
    
    Abstract
    Statistical association measures have been widely applied in information retrieval research, usually employing a clustering of documents or terms on the basis of their relationships. Applications of the association measures for term clustering include automatic thesaurus construction and query expansion. This research evaluates the similarity of six association measures by comparing the relationship and behavior they demonstrate in various analyses of a test corpus. Analysis techniques include comparisons of highly ranked term pairs and term clusters, analyses of the correlation among the association measures using Pearson's correlation coefficient and MDS mapping, and an analysis of the impact of term frequency on the association values by means of z-scores. The major findings of the study are as follows: First, the most similar association measures are mutual information and Yule's coefficient of colligation Y, whereas cosine and Jaccard coefficients, as well as the χ² statistic and likelihood ratio, demonstrate quite similar behavior for terms with high frequency. Second, among all the measures, the χ² statistic is the least affected by the frequency of terms. Third, although cosine and Jaccard coefficients tend to emphasize high-frequency terms, mutual information and Yule's Y seem to overestimate rare terms. Several of these measures are sketched in code after this record.
    Source
    Journal of the American Society for Information Science and Technology. 52(2001) no.4, S.283-296
    Type
    a
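  For orientation, a sketch of five of the six association measures compared above (the likelihood ratio is omitted for brevity), computed from a 2x2 term co-occurrence contingency table with the standard formulas. The counts are invented, not taken from the paper's corpus.

      import math

      def association_measures(a, b, c, d):
          # 2x2 contingency: a = docs containing both terms, b and c = docs
          # containing only one of them, d = docs with neither; n = total.
          n = a + b + c + d
          return {
              "cosine": a / math.sqrt((a + b) * (a + c)),
              "jaccard": a / (a + b + c),
              "mutual_information": math.log2(a * n / ((a + b) * (a + c))),
              "yules_y": (math.sqrt(a * d) - math.sqrt(b * c))
                         / (math.sqrt(a * d) + math.sqrt(b * c)),
              "chi_square": n * (a * d - b * c) ** 2
                            / ((a + b) * (c + d) * (a + c) * (b + d)),
          }

      print(association_measures(a=30, b=20, c=10, d=940))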
  3. Souza, R.R.; Raghavan, K.S.: A methodology for noun phrase-based automatic indexing (2006) 0.01
    0.005948606 = product of:
      0.014871514 = sum of:
        0.008173384 = weight(_text_:a in 173) [ClassicSimilarity], result of:
          0.008173384 = score(doc=173,freq=8.0), product of:
            0.053464882 = queryWeight, product of:
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.046368346 = queryNorm
            0.15287387 = fieldWeight in 173, product of:
              2.828427 = tf(freq=8.0), with freq of:
                8.0 = termFreq=8.0
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.046875 = fieldNorm(doc=173)
        0.0066981306 = product of:
          0.013396261 = sum of:
            0.013396261 = weight(_text_:information in 173) [ClassicSimilarity], result of:
              0.013396261 = score(doc=173,freq=4.0), product of:
                0.08139861 = queryWeight, product of:
                  1.7554779 = idf(docFreq=20772, maxDocs=44218)
                  0.046368346 = queryNorm
                0.16457605 = fieldWeight in 173, product of:
                  2.0 = tf(freq=4.0), with freq of:
                    4.0 = termFreq=4.0
                  1.7554779 = idf(docFreq=20772, maxDocs=44218)
                  0.046875 = fieldNorm(doc=173)
          0.5 = coord(1/2)
      0.4 = coord(2/5)
    
    Abstract
    The scholarly community is increasingly employing the Web both for publication of scholarly output and for locating and accessing relevant scholarly literature. Organization of this vast body of digital information assumes significance in this context. The sheer volume of digital information to be handled makes traditional indexing and knowledge representation strategies ineffective and impractical. It is, therefore, worth exploring new approaches. One approach under discussion considers the intrinsic semantics of document texts. Based on the hypothesis that noun phrases in a text are semantically rich in terms of their ability to represent the subject content of the document, this approach seeks to identify and extract noun phrases instead of single keywords, and to use them as descriptors. This paper presents a methodology that has been developed for extracting noun phrases from Portuguese texts. The results of an experiment carried out to test the adequacy of the methodology are also presented. A toy noun-phrase chunker follows this record.
    Type
    a
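  A toy version of the noun-phrase route to descriptors: collect maximal runs of noun/adjective tokens that contain at least one noun. The pattern and the hand-assigned POS tags are illustrative assumptions, not the authors' grammar for Portuguese.

      def noun_phrases(tagged):
          # tagged: list of (token, pos) pairs; collects maximal NOUN/ADJ runs
          # containing at least one NOUN as candidate descriptors.
          phrases, run = [], []
          for token, pos in tagged + [("", "EOS")]:
              if pos in ("NOUN", "ADJ"):
                  run.append((token, pos))
              else:
                  if len(run) >= 2 and any(p == "NOUN" for _, p in run):
                      phrases.append(" ".join(t for t, _ in run))
                  run = []
          return phrases

      sample = [("a", "DET"), ("indexação", "NOUN"), ("automática", "ADJ"),
                ("de", "ADP"), ("textos", "NOUN"), ("científicos", "ADJ")]
      print(noun_phrases(sample))
      # -> ['indexação automática', 'textos científicos']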
  4. Anderson, J.D.; Pérez-Carballo, J.: The nature of indexing: how humans and machines analyze messages and texts for retrieval : Part II: Machine indexing, and the allocation of human versus machine effort (2001) 0.01
    0.00588199 = product of:
      0.014704974 = sum of:
        0.0068111527 = weight(_text_:a in 368) [ClassicSimilarity], result of:
          0.0068111527 = score(doc=368,freq=2.0), product of:
            0.053464882 = queryWeight, product of:
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.046368346 = queryNorm
            0.12739488 = fieldWeight in 368, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.078125 = fieldNorm(doc=368)
        0.007893822 = product of:
          0.015787644 = sum of:
            0.015787644 = weight(_text_:information in 368) [ClassicSimilarity], result of:
              0.015787644 = score(doc=368,freq=2.0), product of:
                0.08139861 = queryWeight, product of:
                  1.7554779 = idf(docFreq=20772, maxDocs=44218)
                  0.046368346 = queryNorm
                0.19395474 = fieldWeight in 368, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  1.7554779 = idf(docFreq=20772, maxDocs=44218)
                  0.078125 = fieldNorm(doc=368)
          0.5 = coord(1/2)
      0.4 = coord(2/5)
    
    Source
    Information processing and management. 37(2001) no.2, S.255-277
    Type
    a
  5. Bunk, T.: Deskriptoren, Stoppwortlisten und kryptische Zeichen (2008) 0.01
    0.00588199 = product of:
      0.014704974 = sum of:
        0.0068111527 = weight(_text_:a in 2471) [ClassicSimilarity], result of:
          0.0068111527 = score(doc=2471,freq=2.0), product of:
            0.053464882 = queryWeight, product of:
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.046368346 = queryNorm
            0.12739488 = fieldWeight in 2471, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.078125 = fieldNorm(doc=2471)
        0.007893822 = product of:
          0.015787644 = sum of:
            0.015787644 = weight(_text_:information in 2471) [ClassicSimilarity], result of:
              0.015787644 = score(doc=2471,freq=2.0), product of:
                0.08139861 = queryWeight, product of:
                  1.7554779 = idf(docFreq=20772, maxDocs=44218)
                  0.046368346 = queryNorm
                0.19395474 = fieldWeight in 2471, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  1.7554779 = idf(docFreq=20772, maxDocs=44218)
                  0.078125 = fieldNorm(doc=2471)
          0.5 = coord(1/2)
      0.4 = coord(2/5)
    
    Source
    Information - Wissenschaft und Praxis. 59(2008) H.5, S.285-292
    Type
    a
  6. Stock, W.G.: Textwortmethode (2000) 0.01
    0.00588199 = product of:
      0.014704974 = sum of:
        0.0068111527 = weight(_text_:a in 3408) [ClassicSimilarity], result of:
          0.0068111527 = score(doc=3408,freq=2.0), product of:
            0.053464882 = queryWeight, product of:
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.046368346 = queryNorm
            0.12739488 = fieldWeight in 3408, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.078125 = fieldNorm(doc=3408)
        0.007893822 = product of:
          0.015787644 = sum of:
            0.015787644 = weight(_text_:information in 3408) [ClassicSimilarity], result of:
              0.015787644 = score(doc=3408,freq=2.0), product of:
                0.08139861 = queryWeight, product of:
                  1.7554779 = idf(docFreq=20772, maxDocs=44218)
                  0.046368346 = queryNorm
                0.19395474 = fieldWeight in 3408, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  1.7554779 = idf(docFreq=20772, maxDocs=44218)
                  0.078125 = fieldNorm(doc=3408)
          0.5 = coord(1/2)
      0.4 = coord(2/5)
    
    Source
    Auf dem Weg zur Informationskultur: Wa(h)re Information? Festschrift für Norbert Henrichs zum 65. Geburtstag, Hrsg.: T.A. Schröder
    Type
    a
  7. Dolamic, L.; Savoy, J.: When stopword lists make the difference (2009) 0.01
    0.005513504 = product of:
      0.01378376 = sum of:
        0.008258085 = weight(_text_:a in 3319) [ClassicSimilarity], result of:
          0.008258085 = score(doc=3319,freq=6.0), product of:
            0.053464882 = queryWeight, product of:
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.046368346 = queryNorm
            0.1544581 = fieldWeight in 3319, product of:
              2.4494898 = tf(freq=6.0), with freq of:
                6.0 = termFreq=6.0
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.0546875 = fieldNorm(doc=3319)
        0.005525676 = product of:
          0.011051352 = sum of:
            0.011051352 = weight(_text_:information in 3319) [ClassicSimilarity], result of:
              0.011051352 = score(doc=3319,freq=2.0), product of:
                0.08139861 = queryWeight, product of:
                  1.7554779 = idf(docFreq=20772, maxDocs=44218)
                  0.046368346 = queryNorm
                0.13576832 = fieldWeight in 3319, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  1.7554779 = idf(docFreq=20772, maxDocs=44218)
                  0.0546875 = fieldNorm(doc=3319)
          0.5 = coord(1/2)
      0.4 = coord(2/5)
    
    Abstract
    In this brief communication, we evaluate the use of two stopword lists for the English language (one comprising 571 words and another with 9) and compare them with a search approach accounting for all word forms. We show that with the original Okapi model, or with certain models derived from the Divergence from Randomness (DFR) paradigm, significantly lower performance levels may result when using short or no stopword lists. For other DFR models and a revised Okapi implementation, performance differences between approaches using short or long stopword lists, or no list at all, are usually not statistically significant. Similar conclusions can be drawn for other natural languages such as French, Hindi, or Persian. The effect of list length on indexing is illustrated below.
    Source
    Journal of the American Society for Information Science and Technology. 61(2010) no.1, S.200-203
    Type
    a
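  A toy illustration of what the compared configurations index. The nine-word list below is an invented stand-in for the paper's short list, and the 571-word list is abbreviated to a few placeholder entries.

      SHORT_STOPLIST = {"the", "of", "and", "to", "a", "in", "is", "for", "on"}
      LONG_STOPLIST = SHORT_STOPLIST | {"however", "therefore", "between",
                                        "during", "although"}  # ... 571 in total

      def index_terms(text, stoplist=frozenset()):
          # Naive tokenisation; an empty stoplist keeps all word forms.
          return [t for t in text.lower().split() if t not in stoplist]

      query = "the impact of stopword lists on retrieval"
      for name, sl in [("none", frozenset()), ("short", SHORT_STOPLIST),
                       ("long", LONG_STOPLIST)]:
          print(name, index_terms(query, sl))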
  8. Dolamic, L.; Savoy, J.: Indexing and searching strategies for the Russian language (2009) 0.01
    0.005278751 = product of:
      0.013196876 = sum of:
        0.0076151006 = weight(_text_:a in 3301) [ClassicSimilarity], result of:
          0.0076151006 = score(doc=3301,freq=10.0), product of:
            0.053464882 = queryWeight, product of:
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.046368346 = queryNorm
            0.14243183 = fieldWeight in 3301, product of:
              3.1622777 = tf(freq=10.0), with freq of:
                10.0 = termFreq=10.0
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.0390625 = fieldNorm(doc=3301)
        0.0055817757 = product of:
          0.011163551 = sum of:
            0.011163551 = weight(_text_:information in 3301) [ClassicSimilarity], result of:
              0.011163551 = score(doc=3301,freq=4.0), product of:
                0.08139861 = queryWeight, product of:
                  1.7554779 = idf(docFreq=20772, maxDocs=44218)
                  0.046368346 = queryNorm
                0.13714671 = fieldWeight in 3301, product of:
                  2.0 = tf(freq=4.0), with freq of:
                    4.0 = termFreq=4.0
                  1.7554779 = idf(docFreq=20772, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=3301)
          0.5 = coord(1/2)
      0.4 = coord(2/5)
    
    Abstract
    This paper describes and evaluates various stemming and indexing strategies for the Russian language. We design and evaluate two stemming approaches, a light and a more aggressive one, and compare these stemmers to the Snowball stemmer, to no stemming, and to a language-independent approach (n-grams). To evaluate the suggested stemming strategies we apply various probabilistic information retrieval (IR) models, including Okapi, Divergence from Randomness (DFR), and a statistical language model (LM), as well as two vector-space approaches, namely the classical tf idf scheme and the dtu-dtn model. We find that the vector-space dtu-dtn and the DFR models tend to yield better retrieval effectiveness than the Okapi, LM, or tf idf models, although only the latter two IR approaches show statistically significant performance differences. Ignoring stemming generally reduces the MAP by more than 50%, and these differences are always significant. With an n-gram approach, performance levels are usually lower than with an approach involving stemming. Finally, our light stemmer tends to perform best, although performance differences between the light, aggressive, and Snowball stemmers are not statistically significant. Both alternatives are sketched below.
    Source
    Journal of the American Society for Information Science and Technology. 60(2009) no.12, S.2540-2547
    Type
    a
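  A toy contrast of a light suffix-stripping stemmer with the language-independent n-gram alternative. The suffix list is a tiny invented subset, not the authors' stemmer.

      def light_stem_ru(word,
                        suffixes=("ами", "ями", "ов", "ев", "ах", "ях",
                                  "ы", "и", "а", "я", "у", "е")):
          # Strip one common inflectional ending, keeping a stem of >= 3 chars.
          for suf in sorted(suffixes, key=len, reverse=True):
              if word.endswith(suf) and len(word) - len(suf) >= 3:
                  return word[:-len(suf)]
          return word

      def ngrams(word, n=4):
          # Language-independent indexing: overlapping character n-grams.
          return [word[i:i + n] for i in range(max(1, len(word) - n + 1))]

      print(light_stem_ru("книгами"))  # -> 'книг'
      print(ngrams("книгами"))         # -> ['книг', 'нига', 'игам', 'гами']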
  9. Roberts, D.; Souter, C.: ¬The automation of controlled vocabulary subject indexing of medical journal articles (2000) 0.01
    0.0051638708 = product of:
      0.012909677 = sum of:
        0.008173384 = weight(_text_:a in 711) [ClassicSimilarity], result of:
          0.008173384 = score(doc=711,freq=8.0), product of:
            0.053464882 = queryWeight, product of:
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.046368346 = queryNorm
            0.15287387 = fieldWeight in 711, product of:
              2.828427 = tf(freq=8.0), with freq of:
                8.0 = termFreq=8.0
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.046875 = fieldNorm(doc=711)
        0.0047362936 = product of:
          0.009472587 = sum of:
            0.009472587 = weight(_text_:information in 711) [ClassicSimilarity], result of:
              0.009472587 = score(doc=711,freq=2.0), product of:
                0.08139861 = queryWeight, product of:
                  1.7554779 = idf(docFreq=20772, maxDocs=44218)
                  0.046368346 = queryNorm
                0.116372846 = fieldWeight in 711, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  1.7554779 = idf(docFreq=20772, maxDocs=44218)
                  0.046875 = fieldNorm(doc=711)
          0.5 = coord(1/2)
      0.4 = coord(2/5)
    
    Abstract
    This article discusses the possibility of automating sophisticated subject indexing of medical journal articles. Approaches to subject descriptor assignment in information retrieval research are usually based either upon the manual descriptors in the database or upon generation of search parameters from the text of the article. The principles of the Medline indexing system are described, followed by a summary of a pilot project based upon the Amed database. The results suggest that a more extended study, based upon Medline, should encompass several components:
    • Extraction of 'concept strings' from titles and abstracts of records, based upon linguistic features characteristic of medical literature.
    • Use of the Unified Medical Language System (UMLS) for identification of controlled vocabulary descriptors.
    • Coordination of descriptors, utilising features of the Medline indexing system.
    The emphasis should be on system manipulation of data, based upon input, available resources and specifically designed rules. A sketch of the controlled-vocabulary mapping step follows this record.
    Type
    a
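  A minimal sketch of the controlled-vocabulary step suggested above: mapping extracted concept strings to descriptors via a lookup table. The mini-vocabulary is invented; a real system would consult the UMLS Metathesaurus.

      # Free-text concept strings -> controlled (MeSH-like) descriptors.
      VOCABULARY = {
          "heart attack": "Myocardial Infarction",
          "myocardial infarction": "Myocardial Infarction",
          "high blood pressure": "Hypertension",
      }

      def assign_descriptors(concept_strings):
          # Keep only strings that map to a descriptor, deduplicated.
          return sorted({VOCABULARY[s] for s in map(str.lower, concept_strings)
                         if s in VOCABULARY})

      print(assign_descriptors(["Heart attack", "chest pain",
                                "High blood pressure"]))
      # -> ['Hypertension', 'Myocardial Infarction']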
  10. Ahlgren, P.; Kekäläinen, J.: Indexing strategies for Swedish full text retrieval under different user scenarios (2007) 0.00
    0.004915534 = product of:
      0.012288835 = sum of:
        0.008341924 = weight(_text_:a in 896) [ClassicSimilarity], result of:
          0.008341924 = score(doc=896,freq=12.0), product of:
            0.053464882 = queryWeight, product of:
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.046368346 = queryNorm
            0.15602624 = fieldWeight in 896, product of:
              3.4641016 = tf(freq=12.0), with freq of:
                12.0 = termFreq=12.0
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.0390625 = fieldNorm(doc=896)
        0.003946911 = product of:
          0.007893822 = sum of:
            0.007893822 = weight(_text_:information in 896) [ClassicSimilarity], result of:
              0.007893822 = score(doc=896,freq=2.0), product of:
                0.08139861 = queryWeight, product of:
                  1.7554779 = idf(docFreq=20772, maxDocs=44218)
                  0.046368346 = queryNorm
                0.09697737 = fieldWeight in 896, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  1.7554779 = idf(docFreq=20772, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=896)
          0.5 = coord(1/2)
      0.4 = coord(2/5)
    
    Abstract
    This paper deals with Swedish full-text retrieval and the problem of morphological variation of query terms in the document database. The effects of combining indexing strategies with query terms on retrieval effectiveness were studied. Three of five tested combinations involved indexing strategies that used conflation, in the form of normalization. Two of these three combinations also used indexing strategies that employed compound splitting. Normalization and compound splitting were performed by SWETWOL, a morphological analyzer for the Swedish language. A fourth combination attempted to group related terms by right-hand truncation of query terms. The four combinations were compared to each other and to a baseline combination, in which no attempt was made to counteract the problem of morphological variation of query terms in the document database. The five combinations were evaluated under six different user scenarios, each simulating a certain user type. The four alternative combinations outperformed the baseline under every user scenario, with the truncation combination performing best in each case. The main conclusion of the paper is that normalization and right-hand truncation (performed by a search expert) enhanced retrieval effectiveness in comparison to the baseline. The performance of the three combinations based on normalization was not far below that of the truncation combination. Right-hand truncation is illustrated in the sketch below.
    Source
    Information processing and management. 43(2007) no.1, S.81-102
    Type
    a
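  A minimal sketch of right-hand truncation against a toy index of Swedish inflectional variants (hand-picked for illustration). Note the false match 'bild' ('picture'), the kind of noise that expert-performed truncation tries to avoid.

      def matches(index_terms, query_term):
          # 'bil*' matches every indexed form with that prefix;
          # without '*', only the exact form matches.
          if query_term.endswith("*"):
              prefix = query_term[:-1]
              return [t for t in index_terms if t.startswith(prefix)]
          return [t for t in index_terms if t == query_term]

      index = ["bil", "bilen", "bilar", "bilarna", "bild"]
      print(matches(index, "bil"))   # -> ['bil']
      print(matches(index, "bil*"))  # -> all five forms, including 'bild'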
  11. Witschel, H.F.: Terminology extraction and automatic indexing : comparison and qualitative evaluation of methods (2005) 0.00
    0.004915534 = product of:
      0.012288835 = sum of:
        0.008341924 = weight(_text_:a in 1842) [ClassicSimilarity], result of:
          0.008341924 = score(doc=1842,freq=12.0), product of:
            0.053464882 = queryWeight, product of:
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.046368346 = queryNorm
            0.15602624 = fieldWeight in 1842, product of:
              3.4641016 = tf(freq=12.0), with freq of:
                12.0 = termFreq=12.0
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.0390625 = fieldNorm(doc=1842)
        0.003946911 = product of:
          0.007893822 = sum of:
            0.007893822 = weight(_text_:information in 1842) [ClassicSimilarity], result of:
              0.007893822 = score(doc=1842,freq=2.0), product of:
                0.08139861 = queryWeight, product of:
                  1.7554779 = idf(docFreq=20772, maxDocs=44218)
                  0.046368346 = queryNorm
                0.09697737 = fieldWeight in 1842, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  1.7554779 = idf(docFreq=20772, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=1842)
          0.5 = coord(1/2)
      0.4 = coord(2/5)
    
    Abstract
    Many terminology engineering processes involve the task of automatic terminology extraction: before the terminology of a given domain can be modelled, organised or standardised, important concepts (or terms) of this domain have to be identified and fed into terminological databases. These serve in further steps as a starting point for compiling dictionaries, thesauri or maybe even terminological ontologies for the domain. For the extraction of the initial concepts, extraction methods are needed that operate on specialised language texts. On the other hand, many machine learning or information retrieval applications require automatic indexing techniques. In machine learning applications concerned with the automatic clustering or classification of texts, feature vectors are often needed that describe the contents of a given text briefly but meaningfully. These feature vectors typically consist of a fairly small set of index terms together with weights indicating their importance. Short but meaningful descriptions of document contents as provided by good index terms are also useful to humans: some knowledge management applications (e.g. topic maps) use them as a set of basic concepts (topics). The author believes that the tasks of terminology extraction and automatic indexing have much in common and can thus benefit from the same set of basic algorithms. It is the goal of this paper to outline some methods that may be used in both contexts, but also to find the discriminating factors between the two tasks that call for the variation of parameters or the application of different techniques. The discussion of these methods is based on statistical, syntactical and especially morphological properties of (index) terms. The paper concludes with the presentation of some qualitative and quantitative results comparing statistical and morphological methods. A minimal tf-idf sketch of the shared core follows this record.
    Type
    a
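  A minimal sketch of the shared core the author points to: rank a document's terms by tf-idf and keep the top k, usable both as index terms and as terminology candidates. The corpus statistics are invented.

      import math
      from collections import Counter

      def top_terms(doc_tokens, corpus_df, n_docs, k=3):
          # Score each term by tf * idf and return the k best.
          tf = Counter(doc_tokens)
          scores = {t: f * math.log(n_docs / (1 + corpus_df.get(t, 0)))
                    for t, f in tf.items()}
          return sorted(scores, key=scores.get, reverse=True)[:k]

      df = {"the": 1000, "of": 900, "indexing": 40, "automatic": 120,
            "terminology": 25}
      doc = "the automatic indexing of terminology the the indexing".split()
      print(top_terms(doc, df, n_docs=1000))
      # -> ['indexing', 'terminology', 'automatic']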
  12. Niggemann, E.: Wer suchet, der findet? : Verbesserung der inhaltlichen Suchmöglichkeiten im Informationssystem Der Deutschen Bibliothek (2006) 0.00
    0.0049073496 = product of:
      0.012268374 = sum of:
        0.0067426977 = weight(_text_:a in 5812) [ClassicSimilarity], result of:
          0.0067426977 = score(doc=5812,freq=4.0), product of:
            0.053464882 = queryWeight, product of:
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.046368346 = queryNorm
            0.12611452 = fieldWeight in 5812, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.0546875 = fieldNorm(doc=5812)
        0.005525676 = product of:
          0.011051352 = sum of:
            0.011051352 = weight(_text_:information in 5812) [ClassicSimilarity], result of:
              0.011051352 = score(doc=5812,freq=2.0), product of:
                0.08139861 = queryWeight, product of:
                  1.7554779 = idf(docFreq=20772, maxDocs=44218)
                  0.046368346 = queryNorm
                0.13576832 = fieldWeight in 5812, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  1.7554779 = idf(docFreq=20772, maxDocs=44218)
                  0.0546875 = fieldNorm(doc=5812)
          0.5 = coord(1/2)
      0.4 = coord(2/5)
    
    Abstract
    Electronic library catalogues and bibliographies have lost their monopoly on searches for books, articles, musical works, and the like. Global search engines are strong competitors, and libraries must plan today so that their services are still attractive tomorrow. Die Deutsche Bibliothek (DDB) will extend its traditional catalogue search into a global, network-based information system that seeks to combine the advantages of neutral, quality-based catalogue searching with those of modern search engines. This contribution deals with the improvement of subject search facilities in the information system of Die Deutsche Bibliothek; further lines of development are only briefly touched upon in the outlook.
    Source
    Information und Sprache: Beiträge zu Informationswissenschaft, Computerlinguistik, Bibliothekswesen und verwandten Fächern. Festschrift für Harald H. Zimmermann. Herausgegeben von Ilse Harms, Heinz-Dirk Luckhardt und Hans W. Giessen
    Type
    a
  13. Hauer, M.: Neue Qualitäten in Bibliotheken : Durch Content-Ergänzung, maschinelle Indexierung und modernes Information Retrieval können Recherchen in Bibliothekskatalogen deutlich verbessert werden (2004) 0.00
    0.0047055925 = product of:
      0.011763981 = sum of:
        0.005448922 = weight(_text_:a in 886) [ClassicSimilarity], result of:
          0.005448922 = score(doc=886,freq=2.0), product of:
            0.053464882 = queryWeight, product of:
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.046368346 = queryNorm
            0.10191591 = fieldWeight in 886, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.0625 = fieldNorm(doc=886)
        0.006315058 = product of:
          0.012630116 = sum of:
            0.012630116 = weight(_text_:information in 886) [ClassicSimilarity], result of:
              0.012630116 = score(doc=886,freq=2.0), product of:
                0.08139861 = queryWeight, product of:
                  1.7554779 = idf(docFreq=20772, maxDocs=44218)
                  0.046368346 = queryNorm
                0.1551638 = fieldWeight in 886, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  1.7554779 = idf(docFreq=20772, maxDocs=44218)
                  0.0625 = fieldNorm(doc=886)
          0.5 = coord(1/2)
      0.4 = coord(2/5)
    
    Type
    a
  14. Schneider, A.: Moderne Retrievalverfahren in klassischen bibliotheksbezogenen Anwendungen : Projekte und Perspektiven (2008) 0.00
    0.0044313995 = product of:
      0.011078498 = sum of:
        0.002724461 = weight(_text_:a in 4031) [ClassicSimilarity], result of:
          0.002724461 = score(doc=4031,freq=2.0), product of:
            0.053464882 = queryWeight, product of:
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.046368346 = queryNorm
            0.050957955 = fieldWeight in 4031, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.03125 = fieldNorm(doc=4031)
        0.008354037 = product of:
          0.016708074 = sum of:
            0.016708074 = weight(_text_:information in 4031) [ClassicSimilarity], result of:
              0.016708074 = score(doc=4031,freq=14.0), product of:
                0.08139861 = queryWeight, product of:
                  1.7554779 = idf(docFreq=20772, maxDocs=44218)
                  0.046368346 = queryNorm
                0.20526241 = fieldWeight in 4031, product of:
                  3.7416575 = tf(freq=14.0), with freq of:
                    14.0 = termFreq=14.0
                  1.7554779 = idf(docFreq=20772, maxDocs=44218)
                  0.03125 = fieldNorm(doc=4031)
          0.5 = coord(1/2)
      0.4 = coord(2/5)
    
    Abstract
    This work deals with modern retrieval methods in classical library-related applications. As the link between the two seemingly opposed word groups in the title suggests, it combines aspects of computer science and information science with aspects of the library tradition. After a brief sketch of the starting situation, the so-called information flood, in the first chapter, the second chapter provides an introduction to the theory of information retrieval. It covers the foundations of information retrieval and of information retrieval systems as well as the various approaches to subject access, treating descriptive and subject cataloguing, indexing, and automatic indexing. Within the theory of information retrieval, different retrieval models and evaluation by means of retrieval tests are also presented. Theory is followed in the third chapter by the practice of information retrieval, distinguishing between in-house applications, applications in the information and documentation sector, and applications in libraries. The in-house application is illustrated by the KURS database for education and further training. The library application concerns above all the OPAC, as a compromise between library indexing and end-user requirements, and its enrichment (so-called catalogue enrichment) to improve retrieval. The library sector is treated in more detail, with a review of completed projects on information and indexing systems from the 1990s (OSIRIS, MILOS I and II, KASCADE) and a look at current projects. The next two chapters each present a current project for improving retrieval through catalogue enrichment, automatic indexing, and advanced retrieval methods: the search portal dandelon.com and the 180T project of the Hochschulbibliothekszentrum des Landes Nordrhein-Westfalen. For each project, the goal, partners, organisation, course, and technology used are described. The projects differ in that in one case a large union catalogue centre coordinates the work, whereas in the other each participating library is itself responsible for implementation. The sixth and final chapter draws conclusions and outlines perspectives, assessing the two projects and giving an outlook on developments concerning the library catalogue. This publication is based on a master's thesis in the postgraduate distance-learning programme Master of Arts (Library and Information Science) at the Humboldt-Universität zu Berlin.
  15. Mielke, B.: Wider einige gängige Ansichten zur juristischen Informationserschließung (2002) 0.00
    0.004313929 = product of:
      0.0107848225 = sum of:
        0.004086692 = weight(_text_:a in 2145) [ClassicSimilarity], result of:
          0.004086692 = score(doc=2145,freq=2.0), product of:
            0.053464882 = queryWeight, product of:
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.046368346 = queryNorm
            0.07643694 = fieldWeight in 2145, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.046875 = fieldNorm(doc=2145)
        0.0066981306 = product of:
          0.013396261 = sum of:
            0.013396261 = weight(_text_:information in 2145) [ClassicSimilarity], result of:
              0.013396261 = score(doc=2145,freq=4.0), product of:
                0.08139861 = queryWeight, product of:
                  1.7554779 = idf(docFreq=20772, maxDocs=44218)
                  0.046368346 = queryNorm
                0.16457605 = fieldWeight in 2145, product of:
                  2.0 = tf(freq=4.0), with freq of:
                    4.0 = termFreq=4.0
                  1.7554779 = idf(docFreq=20772, maxDocs=44218)
                  0.046875 = fieldNorm(doc=2145)
          0.5 = coord(1/2)
      0.4 = coord(2/5)
    
    Source
    Information und Mobilität: Optimierung und Vermeidung von Mobilität durch Information. Proceedings des 8. Internationalen Symposiums für Informationswissenschaft (ISI 2002), 7.-10.10.2002, Regensburg. Hrsg.: Rainer Hammwöhner, Christian Wolff, Christa Womser-Hacker
    Type
    a
  16. Nohr, H.: Theorie des Information Retrieval II : Automatische Indexierung (2004) 0.00
    0.003594941 = product of:
      0.008987352 = sum of:
        0.0034055763 = weight(_text_:a in 8) [ClassicSimilarity], result of:
          0.0034055763 = score(doc=8,freq=2.0), product of:
            0.053464882 = queryWeight, product of:
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.046368346 = queryNorm
            0.06369744 = fieldWeight in 8, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.0390625 = fieldNorm(doc=8)
        0.0055817757 = product of:
          0.011163551 = sum of:
            0.011163551 = weight(_text_:information in 8) [ClassicSimilarity], result of:
              0.011163551 = score(doc=8,freq=4.0), product of:
                0.08139861 = queryWeight, product of:
                  1.7554779 = idf(docFreq=20772, maxDocs=44218)
                  0.046368346 = queryNorm
                0.13714671 = fieldWeight in 8, product of:
                  2.0 = tf(freq=4.0), with freq of:
                    4.0 = termFreq=4.0
                  1.7554779 = idf(docFreq=20772, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=8)
          0.5 = coord(1/2)
      0.4 = coord(2/5)
    
    Source
    Grundlagen der praktischen Information und Dokumentation. 5., völlig neu gefaßte Ausgabe. 2 Bde. Hrsg. von R. Kuhlen, Th. Seeger u. D. Strauch. Begründet von Klaus Laisiepen, Ernst Lutterbeck, Karl-Heinrich Meyer-Uhlenried. Bd.1: Handbuch zur Einführung in die Informationswissenschaft und -praxis
    Type
    a
  17. Ladewig, C.; Henkes, M.: Verfahren zur automatischen inhaltlichen Erschließung von elektronischen Texten : ASPECTIX (2001) 0.00
    0.003529194 = product of:
      0.008822985 = sum of:
        0.004086692 = weight(_text_:a in 5794) [ClassicSimilarity], result of:
          0.004086692 = score(doc=5794,freq=2.0), product of:
            0.053464882 = queryWeight, product of:
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.046368346 = queryNorm
            0.07643694 = fieldWeight in 5794, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.046875 = fieldNorm(doc=5794)
        0.0047362936 = product of:
          0.009472587 = sum of:
            0.009472587 = weight(_text_:information in 5794) [ClassicSimilarity], result of:
              0.009472587 = score(doc=5794,freq=2.0), product of:
                0.08139861 = queryWeight, product of:
                  1.7554779 = idf(docFreq=20772, maxDocs=44218)
                  0.046368346 = queryNorm
                0.116372846 = fieldWeight in 5794, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  1.7554779 = idf(docFreq=20772, maxDocs=44218)
                  0.046875 = fieldNorm(doc=5794)
          0.5 = coord(1/2)
      0.4 = coord(2/5)
    
    Source
    nfd Information - Wissenschaft und Praxis. 52(2001) H.3, S.159-164
    Type
    a
  18. Rapke, K.: Automatische Indexierung von Volltexten für die Gruner+Jahr Pressedatenbank (2001) 0.00
    0.003529194 = product of:
      0.008822985 = sum of:
        0.004086692 = weight(_text_:a in 6386) [ClassicSimilarity], result of:
          0.004086692 = score(doc=6386,freq=2.0), product of:
            0.053464882 = queryWeight, product of:
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.046368346 = queryNorm
            0.07643694 = fieldWeight in 6386, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.046875 = fieldNorm(doc=6386)
        0.0047362936 = product of:
          0.009472587 = sum of:
            0.009472587 = weight(_text_:information in 6386) [ClassicSimilarity], result of:
              0.009472587 = score(doc=6386,freq=2.0), product of:
                0.08139861 = queryWeight, product of:
                  1.7554779 = idf(docFreq=20772, maxDocs=44218)
                  0.046368346 = queryNorm
                0.116372846 = fieldWeight in 6386, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  1.7554779 = idf(docFreq=20772, maxDocs=44218)
                  0.046875 = fieldNorm(doc=6386)
          0.5 = coord(1/2)
      0.4 = coord(2/5)
    
    Source
    nfd Information - Wissenschaft und Praxis. 52(2001) H.5, S.251-262
    Type
    a
  19. Jones, S.; Paynter, G.W.: Automatic extraction of document keyphrases for use in digital libraries : evaluations and applications (2002) 0.00
    0.0035052493 = product of:
      0.008763123 = sum of:
        0.0048162127 = weight(_text_:a in 601) [ClassicSimilarity], result of:
          0.0048162127 = score(doc=601,freq=4.0), product of:
            0.053464882 = queryWeight, product of:
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.046368346 = queryNorm
            0.090081796 = fieldWeight in 601, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.0390625 = fieldNorm(doc=601)
        0.003946911 = product of:
          0.007893822 = sum of:
            0.007893822 = weight(_text_:information in 601) [ClassicSimilarity], result of:
              0.007893822 = score(doc=601,freq=2.0), product of:
                0.08139861 = queryWeight, product of:
                  1.7554779 = idf(docFreq=20772, maxDocs=44218)
                  0.046368346 = queryNorm
                0.09697737 = fieldWeight in 601, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  1.7554779 = idf(docFreq=20772, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=601)
          0.5 = coord(1/2)
      0.4 = coord(2/5)
    
    Abstract
    This article describes an evaluation of the Kea automatic keyphrase extraction algorithm. Document keyphrases are conventionally used as concise descriptors of document content, and are increasingly used in novel ways, including document clustering, searching and browsing interfaces, and retrieval engines. However, it is costly and time-consuming to manually assign keyphrases to documents, motivating the development of tools that automatically perform this function. Previous studies have evaluated Kea's performance by measuring its ability to identify author keywords and keyphrases, but this methodology has a number of well-known limitations. The results presented in this article are based on evaluations by human assessors of the quality and appropriateness of Kea keyphrases. The results indicate that, in general, Kea produces keyphrases that are rated positively by human assessors. However, typical Kea settings can degrade performance, particularly those relating to keyphrase length and domain specificity. We found that for some settings, Kea's performance is better than that of similar systems, and that Kea's ranking of extracted keyphrases is effective. We also determined that author-specified keyphrases appear to exhibit an inherent ranking, and that they are rated highly and therefore suitable for use in training and evaluation of automatic keyphrasing systems. Kea's two base features are sketched below.
    Source
    Journal of the American Society for Information Science and Technology. 53(2002) no.8, S.653-677
    Type
    a
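  A sketch of the two classic Kea features (TF-IDF and the relative position of the first occurrence); Kea itself combines them in a naive Bayes model, which is omitted here. All values are illustrative.

      import math

      def kea_features(phrase, doc_tokens, doc_freq, n_docs):
          # Earlier first occurrence and higher tf-idf both make a candidate
          # phrase more likely to be a keyphrase.
          text = " ".join(doc_tokens)
          tf = text.count(phrase) / max(1, len(doc_tokens))
          tfidf = tf * math.log2(n_docs / (1 + doc_freq))
          first_occurrence = max(0, text.find(phrase)) / max(1, len(text))
          return tfidf, first_occurrence

      doc = ("automatic keyphrase extraction helps digital libraries "
             "index documents").split()
      print(kea_features("keyphrase extraction", doc, doc_freq=12, n_docs=1000))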
  20. Rädler, K.: In Bibliothekskatalogen "googlen" : Integration von Inhaltsverzeichnissen, Volltexten und WEB-Ressourcen in Bibliothekskataloge (2004) 0.00
    0.0035052493 = product of:
      0.008763123 = sum of:
        0.0048162127 = weight(_text_:a in 2432) [ClassicSimilarity], result of:
          0.0048162127 = score(doc=2432,freq=4.0), product of:
            0.053464882 = queryWeight, product of:
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.046368346 = queryNorm
            0.090081796 = fieldWeight in 2432, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.0390625 = fieldNorm(doc=2432)
        0.003946911 = product of:
          0.007893822 = sum of:
            0.007893822 = weight(_text_:information in 2432) [ClassicSimilarity], result of:
              0.007893822 = score(doc=2432,freq=2.0), product of:
                0.08139861 = queryWeight, product of:
                  1.7554779 = idf(docFreq=20772, maxDocs=44218)
                  0.046368346 = queryNorm
                0.09697737 = fieldWeight in 2432, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  1.7554779 = idf(docFreq=20772, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=2432)
          0.5 = coord(1/2)
      0.4 = coord(2/5)
    
    Abstract
    Starting situation: catalogue searches via the Internet, i.e. from outside the library, are, as expected, increasing sharply and are by now the rule. With this, the need to obtain content information beyond the title has naturally grown, information that makes it possible to judge suitability far better before placing an order or perhaps driving 50 km to the library to borrow a book. This information deficit is increasingly experienced as a serious shortcoming. Tables of contents summarise content briefly and concisely; they are the first place consulted for judging relevance, and almost all relevant terms of a specialist monograph can already be found there. On the other hand, it is becoming ever clearer that the intellectual indexing of individual documentary units with the narrowest encompassing terms of a documentation language (subject headings, classes), in keeping with the library paradigm, is a necessary but by no means sufficient method for activating the expensively acquired library asset 'information' for users with their specific problems and for offering it as an information service. Information on very specific questions, often discussed only in shorter sections (chapters), can at present be found only indirectly, at great expense of time, and often not at all; it lies fallow, so to speak. Extending the depth of intellectual indexing down to individual details of content is not defensible on staffing and hence financial grounds. Libraries are therefore falling further and further behind in the perception of information seekers. The enormous wealth of information lies beyond the information and search horizon of the bibliographic records in the catalogue.
    Location
    A
    Type
    a

Languages

  • d 38
  • e 29

Types

  • a 55
  • x 8
  • m 3
  • el 2
  • p 1