Search (266 results, page 1 of 14)

  • theme_ss:"Automatisches Indexieren"
  1. Bordoni, L.; Pazienza, M.T.: Documents automatic indexing in an environmental domain (1997) 0.03
    0.03227781 = product of:
      0.053796347 = sum of:
        0.011641062 = product of:
          0.05820531 = sum of:
            0.05820531 = weight(_text_:problem in 530) [ClassicSimilarity], result of:
              0.05820531 = score(doc=530,freq=2.0), product of:
                0.17731056 = queryWeight, product of:
                  4.244485 = idf(docFreq=1723, maxDocs=44218)
                  0.04177434 = queryNorm
                0.3282676 = fieldWeight in 530, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  4.244485 = idf(docFreq=1723, maxDocs=44218)
                  0.0546875 = fieldNorm(doc=530)
          0.2 = coord(1/5)
        0.022345824 = weight(_text_:of in 530) [ClassicSimilarity], result of:
          0.022345824 = score(doc=530,freq=16.0), product of:
            0.06532493 = queryWeight, product of:
              1.5637573 = idf(docFreq=25162, maxDocs=44218)
              0.04177434 = queryNorm
            0.34207192 = fieldWeight in 530, product of:
              4.0 = tf(freq=16.0), with freq of:
                16.0 = termFreq=16.0
              1.5637573 = idf(docFreq=25162, maxDocs=44218)
              0.0546875 = fieldNorm(doc=530)
        0.019809462 = product of:
          0.039618924 = sum of:
            0.039618924 = weight(_text_:22 in 530) [ClassicSimilarity], result of:
              0.039618924 = score(doc=530,freq=2.0), product of:
                0.14628662 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.04177434 = queryNorm
                0.2708308 = fieldWeight in 530, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.0546875 = fieldNorm(doc=530)
          0.5 = coord(1/2)
      0.6 = coord(3/5)
    
    Abstract
     Describes an application of Natural Language Processing (NLP) techniques in HIRMA (Hypertextual Information Retrieval Managed by ARIOSTO) to the problem of document indexing: the system incorporates NLP techniques to determine the subject of document texts and to associate them with relevant semantic indexes. Briefly describes the overall system, the details of its implementation on a corpus of scientific abstracts related to environmental topics, and experimental evidence of the system's behaviour. Analyzes in detail an experiment designed to evaluate the system's retrieval ability in terms of recall and precision.
    Source
    International forum on information and documentation. 22(1997) no.1, S.17-28
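
     The score breakdowns attached to each entry follow Lucene's ClassicSimilarity (TF-IDF) formula: tf = sqrt(termFreq), idf = 1 + ln(maxDocs / (docFreq + 1)), queryWeight = idf * queryNorm, fieldWeight = tf * idf * fieldNorm, and a term's contribution is coord * queryWeight * fieldWeight. A minimal Python sketch (function name illustrative) reproduces the weight of "problem" in doc 530 from the entry above:

        import math

        def classic_similarity(freq, doc_freq, max_docs, query_norm, field_norm, coord):
            """One term's contribution in a Lucene ClassicSimilarity explain tree."""
            tf = math.sqrt(freq)                             # 1.4142135 for freq=2.0
            idf = 1.0 + math.log(max_docs / (doc_freq + 1))  # 4.244485 for docFreq=1723
            query_weight = idf * query_norm                  # 0.17731056
            field_weight = tf * idf * field_norm             # 0.3282676 (fieldWeight)
            return coord * query_weight * field_weight       # 0.2 * 0.05820531

        # weight(_text_:problem in 530) scaled by coord(1/5):
        print(classic_similarity(freq=2.0, doc_freq=1723, max_docs=44218,
                                 query_norm=0.04177434, field_norm=0.0546875, coord=0.2))
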
  2. Vilares, D.; Alonso, M.A.; Gómez-Rodríguez, C.: On the usefulness of lexical and syntactic processing in polarity classification of Twitter messages (2015) 0.03
    0.025467014 = product of:
      0.063667536 = sum of:
        0.018716287 = weight(_text_:of in 2161) [ClassicSimilarity], result of:
          0.018716287 = score(doc=2161,freq=22.0), product of:
            0.06532493 = queryWeight, product of:
              1.5637573 = idf(docFreq=25162, maxDocs=44218)
              0.04177434 = queryNorm
            0.28651062 = fieldWeight in 2161, product of:
              4.690416 = tf(freq=22.0), with freq of:
                22.0 = termFreq=22.0
              1.5637573 = idf(docFreq=25162, maxDocs=44218)
              0.0390625 = fieldNorm(doc=2161)
        0.04495125 = product of:
          0.0899025 = sum of:
            0.0899025 = weight(_text_:mind in 2161) [ClassicSimilarity], result of:
              0.0899025 = score(doc=2161,freq=2.0), product of:
                0.2607373 = queryWeight, product of:
                  6.241566 = idf(docFreq=233, maxDocs=44218)
                  0.04177434 = queryNorm
                0.34480107 = fieldWeight in 2161, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  6.241566 = idf(docFreq=233, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=2161)
          0.5 = coord(1/2)
      0.4 = coord(2/5)
    
    Abstract
    Millions of micro texts are published every day on Twitter. Identifying the sentiment present in them can be helpful for measuring the frame of mind of the public, their satisfaction with respect to a product, or their support of a social event. In this context, polarity classification is a subfield of sentiment analysis focused on determining whether the content of a text is objective or subjective, and in the latter case, if it conveys a positive or a negative opinion. Most polarity detection techniques tend to take into account individual terms in the text and even some degree of linguistic knowledge, but they do not usually consider syntactic relations between words. This article explores how relating lexical, syntactic, and psychometric information can be helpful to perform polarity classification on Spanish tweets. We provide an evaluation for both shallow and deep linguistic perspectives. Empirical results show an improved performance of syntactic approaches over pure lexical models when using large training sets to create a classifier, but this tendency is reversed when small training collections are used.
    Source
    Journal of the Association for Information Science and Technology. 66(2015) no.9, S.1799-1816
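
     A purely lexical baseline of the kind the article compares against can be sketched with scikit-learn (hypothetical training data; the authors' models additionally exploit syntactic and psychometric information):

        from sklearn.feature_extraction.text import CountVectorizer
        from sklearn.linear_model import LogisticRegression

        tweets = ["me encanta este producto", "odio esta aplicación"]  # hypothetical tweets
        labels = ["positive", "negative"]

        vectorizer = CountVectorizer(ngram_range=(1, 2))  # unigrams and bigrams as lexical features
        X = vectorizer.fit_transform(tweets)

        classifier = LogisticRegression().fit(X, labels)
        print(classifier.predict(vectorizer.transform(["encanta esta aplicación"])))
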
  3. Plaunt, C.; Norgard, B.A.: An association-based method for automatic indexing with a controlled vocabulary (1998) 0.03
    0.025122102 = product of:
      0.04187017 = sum of:
        0.0117592495 = product of:
          0.058796246 = sum of:
            0.058796246 = weight(_text_:problem in 1794) [ClassicSimilarity], result of:
              0.058796246 = score(doc=1794,freq=4.0), product of:
                0.17731056 = queryWeight, product of:
                  4.244485 = idf(docFreq=1723, maxDocs=44218)
                  0.04177434 = queryNorm
                0.33160037 = fieldWeight in 1794, product of:
                  2.0 = tf(freq=4.0), with freq of:
                    4.0 = termFreq=4.0
                  4.244485 = idf(docFreq=1723, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=1794)
          0.2 = coord(1/5)
        0.015961302 = weight(_text_:of in 1794) [ClassicSimilarity], result of:
          0.015961302 = score(doc=1794,freq=16.0), product of:
            0.06532493 = queryWeight, product of:
              1.5637573 = idf(docFreq=25162, maxDocs=44218)
              0.04177434 = queryNorm
            0.24433708 = fieldWeight in 1794, product of:
              4.0 = tf(freq=16.0), with freq of:
                16.0 = termFreq=16.0
              1.5637573 = idf(docFreq=25162, maxDocs=44218)
              0.0390625 = fieldNorm(doc=1794)
        0.0141496165 = product of:
          0.028299233 = sum of:
            0.028299233 = weight(_text_:22 in 1794) [ClassicSimilarity], result of:
              0.028299233 = score(doc=1794,freq=2.0), product of:
                0.14628662 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.04177434 = queryNorm
                0.19345059 = fieldWeight in 1794, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=1794)
          0.5 = coord(1/2)
      0.6 = coord(3/5)
    
    Abstract
     In this article, we describe and test a two-stage algorithm based on a lexical collocation technique which maps from the lexical clues contained in a document representation into a controlled vocabulary list of subject headings. Using a collection of 4,626 INSPEC documents, we create a 'dictionary' of associations between the lexical items contained in the titles, authors, and abstracts, and the controlled vocabulary subject headings assigned to those records by human indexers, using a likelihood ratio statistic as the measure of association. In the deployment stage, we use the dictionary to predict which of the controlled vocabulary subject headings best describe new documents when they are presented to the system. Our evaluation of this algorithm, in which we compare the automatically assigned subject headings to the subject headings assigned to the test documents by human catalogers, shows that we can obtain results comparable to, and consistent with, human cataloging. In effect we have cast this as a classic partial-match information retrieval problem: we consider the problem to be one of 'retrieving' (or assigning) the most probably 'relevant' (or correct) controlled vocabulary subject headings to a document based on the clues contained in that document.
    Date
    11. 9.2000 19:53:22
    Source
    Journal of the American Society for Information Science. 49(1998) no.10, S.888-902
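
     The term-to-heading association dictionary described above can be approximated with Dunning's log-likelihood ratio over a 2x2 contingency table of co-occurrence counts (the counts below are hypothetical):

        import math

        def llr(k11, k12, k21, k22):
            """Dunning's log-likelihood ratio for a 2x2 co-occurrence table."""
            def s(*counts):  # sum of k * ln(k / total) over the nonzero cells
                total = sum(counts)
                return sum(k * math.log(k / total) for k in counts if k > 0)
            return 2.0 * (s(k11, k12, k21, k22)
                          - s(k11 + k12, k21 + k22)
                          - s(k11 + k21, k12 + k22))

        # k11: records whose title contains "neural" AND carry the heading "neural nets";
        # k12, k21, k22: the remaining cells of the contingency table.
        print(llr(k11=80, k12=20, k21=40, k22=4486))
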
  4. Newman, D.J.; Block, S.: Probabilistic topic decomposition of an eighteenth-century American newspaper (2006) 0.02
    0.017917141 = product of:
      0.044792853 = sum of:
        0.024983391 = weight(_text_:of in 5291) [ClassicSimilarity], result of:
          0.024983391 = score(doc=5291,freq=20.0), product of:
            0.06532493 = queryWeight, product of:
              1.5637573 = idf(docFreq=25162, maxDocs=44218)
              0.04177434 = queryNorm
            0.38244802 = fieldWeight in 5291, product of:
              4.472136 = tf(freq=20.0), with freq of:
                20.0 = termFreq=20.0
              1.5637573 = idf(docFreq=25162, maxDocs=44218)
              0.0546875 = fieldNorm(doc=5291)
        0.019809462 = product of:
          0.039618924 = sum of:
            0.039618924 = weight(_text_:22 in 5291) [ClassicSimilarity], result of:
              0.039618924 = score(doc=5291,freq=2.0), product of:
                0.14628662 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.04177434 = queryNorm
                0.2708308 = fieldWeight in 5291, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.0546875 = fieldNorm(doc=5291)
          0.5 = coord(1/2)
      0.4 = coord(2/5)
    
    Abstract
     We use a probabilistic mixture decomposition method to determine topics in the Pennsylvania Gazette, a major colonial U.S. newspaper published from 1728 to 1800. We assess the value of several topic decomposition techniques for historical research and compare the accuracy and efficacy of various methods. After determining the topics covered by the 80,000 articles and advertisements in the entire 18th-century run of the Gazette, we calculate how the prevalence of those topics changed over time, and give historically relevant examples of our findings. This approach reveals important information about the content of this colonial newspaper, and suggests the value of such approaches to a more complete understanding of early American print culture and society.
    Date
    22. 7.2006 17:32:00
    Source
    Journal of the American Society for Information Science and Technology. 57(2006) no.6, S.753-767
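
     As an illustrative stand-in for the probabilistic mixture decomposition used in the study, latent Dirichlet allocation from scikit-learn yields per-document topic proportions that could likewise be aggregated over publication years (toy corpus):

        from sklearn.feature_extraction.text import CountVectorizer
        from sklearn.decomposition import LatentDirichletAllocation

        articles = ["ship arrived port cargo tobacco",        # hypothetical Gazette snippets
                    "runaway servant reward subscriber",
                    "assembly act province governor"]
        X = CountVectorizer().fit_transform(articles)

        lda = LatentDirichletAllocation(n_components=3, random_state=0).fit(X)
        print(lda.transform(X))  # topic prevalence per document
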
  5. Gomez, I.: Coping with the problem of subject classification diversity (1996) 0.02
    0.0170663 = product of:
      0.04266575 = sum of:
        0.01646295 = product of:
          0.08231475 = sum of:
            0.08231475 = weight(_text_:problem in 5074) [ClassicSimilarity], result of:
              0.08231475 = score(doc=5074,freq=4.0), product of:
                0.17731056 = queryWeight, product of:
                  4.244485 = idf(docFreq=1723, maxDocs=44218)
                  0.04177434 = queryNorm
                0.46424055 = fieldWeight in 5074, product of:
                  2.0 = tf(freq=4.0), with freq of:
                    4.0 = termFreq=4.0
                  4.244485 = idf(docFreq=1723, maxDocs=44218)
                  0.0546875 = fieldNorm(doc=5074)
          0.2 = coord(1/5)
        0.026202802 = weight(_text_:of in 5074) [ClassicSimilarity], result of:
          0.026202802 = score(doc=5074,freq=22.0), product of:
            0.06532493 = queryWeight, product of:
              1.5637573 = idf(docFreq=25162, maxDocs=44218)
              0.04177434 = queryNorm
            0.40111488 = fieldWeight in 5074, product of:
              4.690416 = tf(freq=22.0), with freq of:
                22.0 = termFreq=22.0
              1.5637573 = idf(docFreq=25162, maxDocs=44218)
              0.0546875 = fieldNorm(doc=5074)
      0.4 = coord(2/5)
    
    Abstract
     The delimitation of a research field in bibliometric studies presents the problem of the diversity of subject classifications used in the sources of input and output data. Classification of documents according to thematic codes or keywords is the most accurate method, mainly used in specialized bibliographic or patent databases. Classification of journals into disciplines offers lower specificity and suffers some shortcomings, such as the change over time of both journals and disciplines and the increasing interdisciplinarity of research. Standardization of subject classifications emerges as an important point in bibliometric studies in order to allow international comparisons, although flexibility is needed to meet the needs of local studies.
  6. Biebricher, N.; Fuhr, N.; Lustig, G.; Schwantner, M.; Knorz, G.: The automatic indexing system AIR/PHYS : from research to application (1988) 0.02
    0.015834233 = product of:
      0.03958558 = sum of:
        0.011286346 = weight(_text_:of in 1952) [ClassicSimilarity], result of:
          0.011286346 = score(doc=1952,freq=2.0), product of:
            0.06532493 = queryWeight, product of:
              1.5637573 = idf(docFreq=25162, maxDocs=44218)
              0.04177434 = queryNorm
            0.17277241 = fieldWeight in 1952, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              1.5637573 = idf(docFreq=25162, maxDocs=44218)
              0.078125 = fieldNorm(doc=1952)
        0.028299233 = product of:
          0.056598466 = sum of:
            0.056598466 = weight(_text_:22 in 1952) [ClassicSimilarity], result of:
              0.056598466 = score(doc=1952,freq=2.0), product of:
                0.14628662 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.04177434 = queryNorm
                0.38690117 = fieldWeight in 1952, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.078125 = fieldNorm(doc=1952)
          0.5 = coord(1/2)
      0.4 = coord(2/5)
    
    Date
    16. 8.1998 12:51:22
    Source
    Proceedings of the 11th annual conference on research and development in information retrieval. Ed.: Y. Chiaramella
  7. Kutschekmanesch, S.; Lutes, B.; Moelle, K.; Thiel, U.; Tzeras, K.: Automated multilingual indexing : a synthesis of rule-based and thesaurus-based methods (1998) 0.02
    0.015834233 = product of:
      0.03958558 = sum of:
        0.011286346 = weight(_text_:of in 4157) [ClassicSimilarity], result of:
          0.011286346 = score(doc=4157,freq=2.0), product of:
            0.06532493 = queryWeight, product of:
              1.5637573 = idf(docFreq=25162, maxDocs=44218)
              0.04177434 = queryNorm
            0.17277241 = fieldWeight in 4157, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              1.5637573 = idf(docFreq=25162, maxDocs=44218)
              0.078125 = fieldNorm(doc=4157)
        0.028299233 = product of:
          0.056598466 = sum of:
            0.056598466 = weight(_text_:22 in 4157) [ClassicSimilarity], result of:
              0.056598466 = score(doc=4157,freq=2.0), product of:
                0.14628662 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.04177434 = queryNorm
                0.38690117 = fieldWeight in 4157, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.078125 = fieldNorm(doc=4157)
          0.5 = coord(1/2)
      0.4 = coord(2/5)
    
    Source
     Information und Märkte: 50. Deutscher Dokumentartag 1998, Kongreß der Deutschen Gesellschaft für Dokumentation e.V. (DGD), Rheinische Friedrich-Wilhelms-Universität Bonn, 22.-24. September 1998. Ed.: Marlies Ockenfeld and Gerhard J. Mantwill
  8. Tsareva, P.V.: Algoritmy dlya raspoznavaniya pozitivnykh i negativnykh vkhozdenii deskriptorov v tekst i protsedura avtomaticheskoi klassifikatsii tekstov (1999) 0.02
    0.015834233 = product of:
      0.03958558 = sum of:
        0.011286346 = weight(_text_:of in 374) [ClassicSimilarity], result of:
          0.011286346 = score(doc=374,freq=2.0), product of:
            0.06532493 = queryWeight, product of:
              1.5637573 = idf(docFreq=25162, maxDocs=44218)
              0.04177434 = queryNorm
            0.17277241 = fieldWeight in 374, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              1.5637573 = idf(docFreq=25162, maxDocs=44218)
              0.078125 = fieldNorm(doc=374)
        0.028299233 = product of:
          0.056598466 = sum of:
            0.056598466 = weight(_text_:22 in 374) [ClassicSimilarity], result of:
              0.056598466 = score(doc=374,freq=2.0), product of:
                0.14628662 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.04177434 = queryNorm
                0.38690117 = fieldWeight in 374, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.078125 = fieldNorm(doc=374)
          0.5 = coord(1/2)
      0.4 = coord(2/5)
    
    Date
    1. 4.2002 10:22:41
    Footnote
     Translation of the title: Algorithms for selection of positive and negative descriptors from text and automated text indexing
  9. Stankovic, R. et al.: Indexing of textual databases based on lexical resources : a case study for Serbian (2016) 0.02
    0.015834233 = product of:
      0.03958558 = sum of:
        0.011286346 = weight(_text_:of in 2759) [ClassicSimilarity], result of:
          0.011286346 = score(doc=2759,freq=2.0), product of:
            0.06532493 = queryWeight, product of:
              1.5637573 = idf(docFreq=25162, maxDocs=44218)
              0.04177434 = queryNorm
            0.17277241 = fieldWeight in 2759, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              1.5637573 = idf(docFreq=25162, maxDocs=44218)
              0.078125 = fieldNorm(doc=2759)
        0.028299233 = product of:
          0.056598466 = sum of:
            0.056598466 = weight(_text_:22 in 2759) [ClassicSimilarity], result of:
              0.056598466 = score(doc=2759,freq=2.0), product of:
                0.14628662 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.04177434 = queryNorm
                0.38690117 = fieldWeight in 2759, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.078125 = fieldNorm(doc=2759)
          0.5 = coord(1/2)
      0.4 = coord(2/5)
    
    Date
    1. 2.2016 18:25:22
  10. Tsujii, J.-I.: Automatic acquisition of semantic collocation from corpora (1995) 0.02
    0.015311283 = product of:
      0.038278207 = sum of:
        0.01563882 = weight(_text_:of in 4709) [ClassicSimilarity], result of:
          0.01563882 = score(doc=4709,freq=6.0), product of:
            0.06532493 = queryWeight, product of:
              1.5637573 = idf(docFreq=25162, maxDocs=44218)
              0.04177434 = queryNorm
            0.23940048 = fieldWeight in 4709, product of:
              2.4494898 = tf(freq=6.0), with freq of:
                6.0 = termFreq=6.0
              1.5637573 = idf(docFreq=25162, maxDocs=44218)
              0.0625 = fieldNorm(doc=4709)
        0.022639386 = product of:
          0.045278773 = sum of:
            0.045278773 = weight(_text_:22 in 4709) [ClassicSimilarity], result of:
              0.045278773 = score(doc=4709,freq=2.0), product of:
                0.14628662 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.04177434 = queryNorm
                0.30952093 = fieldWeight in 4709, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.0625 = fieldNorm(doc=4709)
          0.5 = coord(1/2)
      0.4 = coord(2/5)
    
    Abstract
     Proposes automatic linguistic knowledge acquisition from sublanguage corpora. The system combines existing linguistic knowledge and human intervention with corpus-based techniques. The algorithm involves a gradual approximation which works to converge linguistic knowledge gradually towards desirable results. The first experiment revealed the characteristics of this algorithm, and the others demonstrated its effectiveness on a real corpus.
    Date
    31. 7.1996 9:22:19
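
     A corpus-based collocation score of the general kind acquired here can be sketched with pointwise mutual information over adjacent word pairs (toy corpus; the proposed algorithm additionally folds in existing linguistic knowledge and human intervention):

        import math
        from collections import Counter

        tokens = "measure water level measure water quality check water level".split()
        unigrams = Counter(tokens)
        bigrams = Counter(zip(tokens, tokens[1:]))
        n = len(tokens)

        def pmi(w1, w2):
            p_joint = bigrams[(w1, w2)] / (n - 1)
            return math.log(p_joint / ((unigrams[w1] / n) * (unigrams[w2] / n)))

        print(pmi("water", "level"))  # higher PMI suggests a semantic collocation
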
  11. Hodges, P.R.: Keyword in title indexes : effectiveness of retrieval in computer searches (1983) 0.01
    0.014990156 = product of:
      0.03747539 = sum of:
        0.017665926 = weight(_text_:of in 5001) [ClassicSimilarity], result of:
          0.017665926 = score(doc=5001,freq=10.0), product of:
            0.06532493 = queryWeight, product of:
              1.5637573 = idf(docFreq=25162, maxDocs=44218)
              0.04177434 = queryNorm
            0.2704316 = fieldWeight in 5001, product of:
              3.1622777 = tf(freq=10.0), with freq of:
                10.0 = termFreq=10.0
              1.5637573 = idf(docFreq=25162, maxDocs=44218)
              0.0546875 = fieldNorm(doc=5001)
        0.019809462 = product of:
          0.039618924 = sum of:
            0.039618924 = weight(_text_:22 in 5001) [ClassicSimilarity], result of:
              0.039618924 = score(doc=5001,freq=2.0), product of:
                0.14628662 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.04177434 = queryNorm
                0.2708308 = fieldWeight in 5001, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.0546875 = fieldNorm(doc=5001)
          0.5 = coord(1/2)
      0.4 = coord(2/5)
    
    Abstract
     A study was done to test the effectiveness of retrieval using title word searching. It was based on actual search profiles used in the Mechanized Information Center at Ohio State University, in order to replicate actual searching conditions as closely as possible. Fewer than 50% of the relevant titles were retrieved by keywords in titles. The low rate of retrieval can be attributed to three sources: the titles themselves, user and information specialist ignorance of the subject vocabulary in use, and general language problems. Across fields it was found that the social sciences had the best retrieval rate, with science having the next best, and arts and humanities the lowest. Ways to enhance and supplement keyword-in-title searching on the computer and in printed indexes are discussed.
    Date
    14. 3.1996 13:22:21
  12. Alexander, M.: Retrieving digital data with fuzzy matching (1997) 0.01
    0.01416828 = product of:
      0.0354207 = sum of:
        0.0133040715 = product of:
          0.066520356 = sum of:
            0.066520356 = weight(_text_:problem in 151) [ClassicSimilarity], result of:
              0.066520356 = score(doc=151,freq=2.0), product of:
                0.17731056 = queryWeight, product of:
                  4.244485 = idf(docFreq=1723, maxDocs=44218)
                  0.04177434 = queryNorm
                0.375163 = fieldWeight in 151, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  4.244485 = idf(docFreq=1723, maxDocs=44218)
                  0.0625 = fieldNorm(doc=151)
          0.2 = coord(1/5)
        0.02211663 = weight(_text_:of in 151) [ClassicSimilarity], result of:
          0.02211663 = score(doc=151,freq=12.0), product of:
            0.06532493 = queryWeight, product of:
              1.5637573 = idf(docFreq=25162, maxDocs=44218)
              0.04177434 = queryNorm
            0.33856338 = fieldWeight in 151, product of:
              3.4641016 = tf(freq=12.0), with freq of:
                12.0 = termFreq=12.0
              1.5637573 = idf(docFreq=25162, maxDocs=44218)
              0.0625 = fieldNorm(doc=151)
      0.4 = coord(2/5)
    
    Abstract
     In 1993 the British Library established a programme of activities entitled Initiatives for Access (IFA) to identify and develop computer applications based on the new technologies emerging in the areas of digital and network services. Discusses the problem of the effective retrieval of digital data after its capture, focusing on the product Excalibur EFS, which looks at the way information is stored at its fundamental level and identifies patterns in the numbers. Looks at the benefits of Excalibur and outlines other experiments in progress as part of the IFA programme.
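
     Fuzzy matching of this general sort can be sketched with Python's standard difflib (Excalibur EFS itself used proprietary adaptive pattern recognition over the underlying binary data, so this only illustrates the idea of OCR-tolerant retrieval):

        from difflib import SequenceMatcher

        def fuzzy_score(query, candidate):
            return SequenceMatcher(None, query.lower(), candidate.lower()).ratio()

        # an OCR-damaged string still scores high against the intended query:
        print(fuzzy_score("British Library", "Britlsh L1brary"))
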
  13. Riloff, E.: An empirical study of automated dictionary construction for information extraction in three domains (1996) 0.01
    0.014163372 = product of:
      0.03540843 = sum of:
        0.0127690425 = weight(_text_:of in 6752) [ClassicSimilarity], result of:
          0.0127690425 = score(doc=6752,freq=4.0), product of:
            0.06532493 = queryWeight, product of:
              1.5637573 = idf(docFreq=25162, maxDocs=44218)
              0.04177434 = queryNorm
            0.19546966 = fieldWeight in 6752, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              1.5637573 = idf(docFreq=25162, maxDocs=44218)
              0.0625 = fieldNorm(doc=6752)
        0.022639386 = product of:
          0.045278773 = sum of:
            0.045278773 = weight(_text_:22 in 6752) [ClassicSimilarity], result of:
              0.045278773 = score(doc=6752,freq=2.0), product of:
                0.14628662 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.04177434 = queryNorm
                0.30952093 = fieldWeight in 6752, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.0625 = fieldNorm(doc=6752)
          0.5 = coord(1/2)
      0.4 = coord(2/5)
    
    Abstract
    AutoSlog is a system that addresses the knowledge engineering bottleneck for information extraction. AutoSlog automatically creates domain specific dictionaries for information extraction, given an appropriate training corpus. Describes experiments with AutoSlog in terrorism, joint ventures and microelectronics domains. Compares the performance of AutoSlog across the 3 domains, discusses the lessons learned and presents results from 2 experiments which demonstrate that novice users can generate effective dictionaries using AutoSlog
    Date
    6. 3.1997 16:22:15
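
     The flavor of the extraction dictionaries AutoSlog builds can be suggested with hand-written trigger patterns (patterns and sentence are hypothetical; AutoSlog derives such patterns automatically from a training corpus):

        import re

        patterns = {
            "bombing-target": re.compile(r"(?P<target>[\w ]+?) was bombed"),
            "venture-partner": re.compile(r"joint venture with (?P<partner>[A-Z][\w ]*)"),
        }

        sentence = "The public telephone office was bombed yesterday."
        for name, pattern in patterns.items():
            match = pattern.search(sentence)
            if match:
                print(name, "->", match.groupdict())
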
  14. Ward, M.L.: The future of the human indexer (1996) 0.01
    0.013958423 = product of:
      0.034896057 = sum of:
        0.01791652 = weight(_text_:of in 7244) [ClassicSimilarity], result of:
          0.01791652 = score(doc=7244,freq=14.0), product of:
            0.06532493 = queryWeight, product of:
              1.5637573 = idf(docFreq=25162, maxDocs=44218)
              0.04177434 = queryNorm
            0.2742677 = fieldWeight in 7244, product of:
              3.7416575 = tf(freq=14.0), with freq of:
                14.0 = termFreq=14.0
              1.5637573 = idf(docFreq=25162, maxDocs=44218)
              0.046875 = fieldNorm(doc=7244)
        0.016979538 = product of:
          0.033959076 = sum of:
            0.033959076 = weight(_text_:22 in 7244) [ClassicSimilarity], result of:
              0.033959076 = score(doc=7244,freq=2.0), product of:
                0.14628662 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.04177434 = queryNorm
                0.23214069 = fieldWeight in 7244, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.046875 = fieldNorm(doc=7244)
          0.5 = coord(1/2)
      0.4 = coord(2/5)
    
    Abstract
     Considers the principles of indexing and the intellectual skills involved in order to determine what automatic indexing systems would be required in order to supplant or complement the human indexer. Good indexing requires: considerable prior knowledge of the literature; judgement as to what to index and at what depth to index; reading skills; abstracting skills; and classification skills. Illustrates these features with a detailed description of the abstracting and indexing processes involved in generating entries for the mechanical engineering database POWERLINK. Briefly assesses the possibility of replacing human indexers with specialist indexing software, with particular reference to the Object Analyzer from the InTEXT automatic indexing system, using the criteria described for human indexers. At present, it is unlikely that the automatic indexer will replace the human indexer, but when more primary texts are available in electronic form, it may be a useful productivity tool for dealing with large quantities of low-grade texts (should they be wanted in the database).
    Date
    9. 2.1997 18:44:22
    Source
    Journal of librarianship and information science. 28(1996) no.4, S.217-225
  15. Clavel, G.; Walther, F.; Walther, J.: Indexation automatique de fonds bibliotheconomiques (1993) 0.01
    0.013594754 = product of:
      0.033986885 = sum of:
        0.011641062 = product of:
          0.05820531 = sum of:
            0.05820531 = weight(_text_:problem in 6610) [ClassicSimilarity], result of:
              0.05820531 = score(doc=6610,freq=2.0), product of:
                0.17731056 = queryWeight, product of:
                  4.244485 = idf(docFreq=1723, maxDocs=44218)
                  0.04177434 = queryNorm
                0.3282676 = fieldWeight in 6610, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  4.244485 = idf(docFreq=1723, maxDocs=44218)
                  0.0546875 = fieldNorm(doc=6610)
          0.2 = coord(1/5)
        0.022345824 = weight(_text_:of in 6610) [ClassicSimilarity], result of:
          0.022345824 = score(doc=6610,freq=16.0), product of:
            0.06532493 = queryWeight, product of:
              1.5637573 = idf(docFreq=25162, maxDocs=44218)
              0.04177434 = queryNorm
            0.34207192 = fieldWeight in 6610, product of:
              4.0 = tf(freq=16.0), with freq of:
                16.0 = termFreq=16.0
              1.5637573 = idf(docFreq=25162, maxDocs=44218)
              0.0546875 = fieldNorm(doc=6610)
      0.4 = coord(2/5)
    
    Abstract
    A discussion of developments to date in the field of computerized indexing, based on presentations given at a seminar held at the Institute of Policy Studies in Paris in Nov 91. The methods tested so far, based on a linguistic approach, whether using natural language or special thesauri, encounter the same central problem - they are only successful when applied to collections of similar types of documents covering very specific subject areas. Despite this, the search for some sort of universal indexing metalanguage continues. In the end, computerized indexing works best when used in conjunction with manual indexing - ideally in the hands of a trained library science professional, who can extract the maximum value from a collection of documents for a particular user population
  16. Mesquita, L.A.P.; Souza, R.R.; Baracho Porto, R.M.A.: Noun phrases in automatic indexing : a structural analysis of the distribution of relevant terms in doctoral theses (2014) 0.01
    0.013556954 = product of:
      0.033892386 = sum of:
        0.022572692 = weight(_text_:of in 1442) [ClassicSimilarity], result of:
          0.022572692 = score(doc=1442,freq=50.0), product of:
            0.06532493 = queryWeight, product of:
              1.5637573 = idf(docFreq=25162, maxDocs=44218)
              0.04177434 = queryNorm
            0.34554482 = fieldWeight in 1442, product of:
              7.071068 = tf(freq=50.0), with freq of:
                50.0 = termFreq=50.0
              1.5637573 = idf(docFreq=25162, maxDocs=44218)
              0.03125 = fieldNorm(doc=1442)
        0.011319693 = product of:
          0.022639386 = sum of:
            0.022639386 = weight(_text_:22 in 1442) [ClassicSimilarity], result of:
              0.022639386 = score(doc=1442,freq=2.0), product of:
                0.14628662 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.04177434 = queryNorm
                0.15476047 = fieldWeight in 1442, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.03125 = fieldNorm(doc=1442)
          0.5 = coord(1/2)
      0.4 = coord(2/5)
    
    Abstract
     The main objective of this research was to analyze whether relevant terms show a characteristic distribution behavior over a scientific text that could serve as a criterion for their automatic indexing. The terms considered in this study were only the full noun phrases contained in the texts themselves. The texts considered were a total of 98 doctoral theses from the eight areas of knowledge of a single university. Initially, 20 full noun phrases were automatically extracted from each text as candidates for its most relevant terms, and the author of each text assigned a relevance value from 0 (not relevant) to 6 (highly relevant) to each of the 20 noun phrases sent. Only 22.1% of the noun phrases were considered not relevant. The relevance values assigned by the authors were associated with the positions of the terms in the text, each full noun phrase found in the text being counted as a valid linear position. The results show the resulting distributions for two types of position: linear, with values consolidated into ten equal consecutive parts, and structural, considering parts of the text (such as introduction, development and conclusion). As a result of considerable importance, all areas of knowledge related to the Natural Sciences showed a characteristic behavior in the distribution of relevant terms, and all areas of knowledge related to the Social Sciences showed the same characteristic behavior among themselves, but distinct from that of the Natural Sciences. The difference in distribution behavior between the Natural and Social Sciences can be clearly visualized through graphs. All behaviors, including the general behavior of all areas of knowledge together, were characterized in polynomial equations and can be applied in the future as criteria for automatic indexing. To date this work is novel for two reasons: it presents a method for characterizing the distribution of relevant terms in a scientific text, and, through this method, it points out a quantitative difference between the Natural and Social Sciences.
    Source
    Knowledge organization in the 21st century: between historical patterns and future prospects. Proceedings of the Thirteenth International ISKO Conference 19-22 May 2014, Kraków, Poland. Ed.: Wieslaw Babik
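
     The final characterization step, fitting polynomial equations to relevance values across text positions, can be sketched with NumPy (the mean relevance values below are hypothetical):

        import numpy as np

        positions = np.arange(1, 11)  # ten equal consecutive parts of the text
        mean_relevance = np.array([4.8, 3.9, 3.2, 2.9, 2.7,
                                   2.6, 2.8, 3.0, 3.4, 4.1])  # hypothetical means

        coefficients = np.polyfit(positions, mean_relevance, deg=2)  # quadratic fit
        print(np.poly1d(coefficients))
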
  17. Benson, A.C.: Image descriptions and their relational expressions : a review of the literature and the issues (2015) 0.01
    0.013374514 = product of:
      0.033436283 = sum of:
        0.009978054 = product of:
          0.04989027 = sum of:
            0.04989027 = weight(_text_:problem in 1867) [ClassicSimilarity], result of:
              0.04989027 = score(doc=1867,freq=2.0), product of:
                0.17731056 = queryWeight, product of:
                  4.244485 = idf(docFreq=1723, maxDocs=44218)
                  0.04177434 = queryNorm
                0.28137225 = fieldWeight in 1867, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  4.244485 = idf(docFreq=1723, maxDocs=44218)
                  0.046875 = fieldNorm(doc=1867)
          0.2 = coord(1/5)
        0.02345823 = weight(_text_:of in 1867) [ClassicSimilarity], result of:
          0.02345823 = score(doc=1867,freq=24.0), product of:
            0.06532493 = queryWeight, product of:
              1.5637573 = idf(docFreq=25162, maxDocs=44218)
              0.04177434 = queryNorm
            0.3591007 = fieldWeight in 1867, product of:
              4.8989797 = tf(freq=24.0), with freq of:
                24.0 = termFreq=24.0
              1.5637573 = idf(docFreq=25162, maxDocs=44218)
              0.046875 = fieldNorm(doc=1867)
      0.4 = coord(2/5)
    
    Abstract
     Purpose - The purpose of this paper is to survey the treatment of relationships, relationship expressions and the ways in which they manifest themselves in image descriptions. Design/methodology/approach - The term "relationship" is construed in the broadest possible way to include spatial relationships ("to the right of"), temporal ("in 1936," "at noon"), meronymic ("part of"), and attributive ("has color," "has dimension"). The interactions of these vaguely delimited categories with image information, image creation, and description in libraries and archives are complex and in need of explanation. Findings - The review brings into question many generally held beliefs about the relationship problem, such as the belief that the semantics of relationships are somehow embedded in the relationship term itself and that image search and retrieval solutions can be found through refinement of word-matching systems. Originality/value - This review has no hope of systematically examining all evidence in all disciplines pertaining to this topic. It instead focusses on a general description of a theoretical treatment in Library and Information Science.
    Source
    Journal of documentation. 71(2015) no.1, S.143-164
  18. Ahlgren, P.; Kekäläinen, J.: Indexing strategies for Swedish full text retrieval under different user scenarios (2007) 0.01
    0.013149628 = product of:
      0.03287407 = sum of:
        0.0117592495 = product of:
          0.058796246 = sum of:
            0.058796246 = weight(_text_:problem in 896) [ClassicSimilarity], result of:
              0.058796246 = score(doc=896,freq=4.0), product of:
                0.17731056 = queryWeight, product of:
                  4.244485 = idf(docFreq=1723, maxDocs=44218)
                  0.04177434 = queryNorm
                0.33160037 = fieldWeight in 896, product of:
                  2.0 = tf(freq=4.0), with freq of:
                    4.0 = termFreq=4.0
                  4.244485 = idf(docFreq=1723, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=896)
          0.2 = coord(1/5)
        0.02111482 = weight(_text_:of in 896) [ClassicSimilarity], result of:
          0.02111482 = score(doc=896,freq=28.0), product of:
            0.06532493 = queryWeight, product of:
              1.5637573 = idf(docFreq=25162, maxDocs=44218)
              0.04177434 = queryNorm
            0.32322758 = fieldWeight in 896, product of:
              5.2915025 = tf(freq=28.0), with freq of:
                28.0 = termFreq=28.0
              1.5637573 = idf(docFreq=25162, maxDocs=44218)
              0.0390625 = fieldNorm(doc=896)
      0.4 = coord(2/5)
    
    Abstract
    This paper deals with Swedish full text retrieval and the problem of morphological variation of query terms in the document database. The effects of combination of indexing strategies with query terms on retrieval effectiveness were studied. Three of five tested combinations involved indexing strategies that used conflation, in the form of normalization. Further, two of these three combinations used indexing strategies that employed compound splitting. Normalization and compound splitting were performed by SWETWOL, a morphological analyzer for the Swedish language. A fourth combination attempted to group related terms by right hand truncation of query terms. The four combinations were compared to each other and to a baseline combination, where no attempt was made to counteract the problem of morphological variation of query terms in the document database. The five combinations were evaluated under six different user scenarios, where each scenario simulated a certain user type. The four alternative combinations outperformed the baseline, for each user scenario. The truncation combination had the best performance under each user scenario. The main conclusion of the paper is that normalization and right hand truncation (performed by a search expert) enhanced retrieval effectiveness in comparison to the baseline. The performance of the three combinations of indexing strategies with query terms based on normalization was not far below the performance of the truncation combination.
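
     Right-hand truncation, the best-performing strategy here, amounts to prefix matching against the index vocabulary (hypothetical Swedish index terms; SWETWOL-style normalization and compound splitting are not shown):

        index_terms = ["bil", "bilar", "bilarna", "bilbälte", "cykel"]

        def truncation_matches(prefix, vocabulary):
            return [term for term in vocabulary if term.startswith(prefix)]

        print(truncation_matches("bil", index_terms))  # groups morphological variants of "bil"
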
  19. Milstead, J.L.: Thesauri in a full-text world (1998) 0.01
    0.012797958 = product of:
      0.031994894 = sum of:
        0.017845279 = weight(_text_:of in 2337) [ClassicSimilarity], result of:
          0.017845279 = score(doc=2337,freq=20.0), product of:
            0.06532493 = queryWeight, product of:
              1.5637573 = idf(docFreq=25162, maxDocs=44218)
              0.04177434 = queryNorm
            0.27317715 = fieldWeight in 2337, product of:
              4.472136 = tf(freq=20.0), with freq of:
                20.0 = termFreq=20.0
              1.5637573 = idf(docFreq=25162, maxDocs=44218)
              0.0390625 = fieldNorm(doc=2337)
        0.0141496165 = product of:
          0.028299233 = sum of:
            0.028299233 = weight(_text_:22 in 2337) [ClassicSimilarity], result of:
              0.028299233 = score(doc=2337,freq=2.0), product of:
                0.14628662 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.04177434 = queryNorm
                0.19345059 = fieldWeight in 2337, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=2337)
          0.5 = coord(1/2)
      0.4 = coord(2/5)
    
    Abstract
     Despite early claims to the contrary, thesauri continue to find use as access tools for information in the full-text environment. Their mode of use is changing, but this change actually represents an expansion rather than a contradiction of their utility. Thesauri and similar vocabulary tools can complement full-text access by aiding users in focusing their searches, by supplementing the linguistic analysis of the text search engine, and even by serving as one of the tools used by the linguistic engine for its analysis. While human indexing continues to be used for many databases, the trend is to increase the use of machine aids for this purpose. All machine-aided indexing (MAI) systems rely on thesauri as the basis for term selection. In the 21st century, the balance of effort between human and machine will change at both input and output, but thesauri will continue to play an important role for the foreseeable future.
    Date
    22. 9.1997 19:16:05
    Imprint
    Urbana-Champaign, IL : Illinois University at Urbana-Champaign, Graduate School of Library and Information Science
    Source
    Visualizing subject access for 21st century information resources: Papers presented at the 1997 Clinic on Library Applications of Data Processing, 2-4 Mar 1997, Graduate School of Library and Information Science, University of Illinois at Urbana-Champaign. Ed.: P.A. Cochrane et al
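
     The core of machine-aided indexing as described, selecting descriptors from a thesaurus, can be sketched as lead-in term matching (tiny hypothetical thesaurus; production MAI systems layer linguistic analysis on top of this):

        thesaurus = {  # descriptor -> lead-in (entry) terms
            "automatic indexing": ["machine indexing", "automated indexing"],
            "thesauri": ["thesaurus", "controlled vocabulary"],
        }

        def suggest_descriptors(text):
            text = text.lower()
            return [descriptor for descriptor, lead_ins in thesaurus.items()
                    if any(term in text for term in [descriptor] + lead_ins)]

        print(suggest_descriptors("A study of automated indexing with a controlled vocabulary."))
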
  20. Correa, C.A.; Kobashi, N.Y.: A hybrid model of automatic indexing based on paraconsistent logic 0.01
    0.012556955 = product of:
      0.031392388 = sum of:
        0.009978054 = product of:
          0.04989027 = sum of:
            0.04989027 = weight(_text_:problem in 3537) [ClassicSimilarity], result of:
              0.04989027 = score(doc=3537,freq=2.0), product of:
                0.17731056 = queryWeight, product of:
                  4.244485 = idf(docFreq=1723, maxDocs=44218)
                  0.04177434 = queryNorm
                0.28137225 = fieldWeight in 3537, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  4.244485 = idf(docFreq=1723, maxDocs=44218)
                  0.046875 = fieldNorm(doc=3537)
          0.2 = coord(1/5)
        0.021414334 = weight(_text_:of in 3537) [ClassicSimilarity], result of:
          0.021414334 = score(doc=3537,freq=20.0), product of:
            0.06532493 = queryWeight, product of:
              1.5637573 = idf(docFreq=25162, maxDocs=44218)
              0.04177434 = queryNorm
            0.32781258 = fieldWeight in 3537, product of:
              4.472136 = tf(freq=20.0), with freq of:
                20.0 = termFreq=20.0
              1.5637573 = idf(docFreq=25162, maxDocs=44218)
              0.046875 = fieldNorm(doc=3537)
      0.4 = coord(2/5)
    
    Abstract
     The processes of information organization, information retrieval and information visualization have one point in common: they are strongly connected by the procedures associated with the indexing of texts or documents. Indexing is an essential component of text analysis, and the indexing process is equally important for retrieval and for the visualization of information. In this context, it is worth mentioning solutions that use automatic indexing. Research proposing solutions for automatic indexing rests on different theoretical assumptions, such as statistics, linguistics and controlled vocabulary (Leiva 1999); most solutions developed are hybrid models combining these assumptions. Other solutions to the problem of automatic indexing are based on theories that allow the treatment of uncertainty, imprecision and vagueness. The aim of this paper is to argue for the theoretical potential of paraconsistent logic, a non-classical logic capable of handling situations involving uncertainty, imprecision and vagueness, for use in hybrid models of automatic indexing.
    Source
    Paradigms and conceptual systems in knowledge organization: Proceedings of the Eleventh International ISKO conference, Rome, 23-26 February 2010, ed. Claudio Gnoli, Indeks, Frankfurt M
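
     In paraconsistent annotated logic of the kind invoked here, a candidate index term carries a degree of favorable evidence (mu) and a degree of contrary evidence (lambda), from which certainty and contradiction degrees are derived; the evidence values below are hypothetical:

        def annotate(mu, lam):
            certainty = mu - lam            # Gc: net belief that the term indexes the text
            contradiction = mu + lam - 1.0  # Gct: amount of conflicting evidence
            return certainty, contradiction

        # strong statistical evidence for a term, weaker vocabulary evidence against it:
        print(annotate(mu=0.9, lam=0.3))  # approximately (0.6, 0.2)
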

Types

  • a 245
  • el 21
  • x 10
  • m 5
  • s 2