Search (128 results, page 1 of 7)

  • theme_ss:"Automatisches Indexieren"
  1. Voorhees, E.M.: Implementing agglomerative hierarchic clustering algorithms for use in document retrieval (1986) 0.03
    0.033043582 = product of:
      0.14869611 = sum of:
        0.10820941 = weight(_text_:processing in 402) [ClassicSimilarity], result of:
          0.10820941 = score(doc=402,freq=2.0), product of:
            0.15121111 = queryWeight, product of:
              4.048147 = idf(docFreq=2097, maxDocs=44218)
              0.037353165 = queryNorm
            0.7156181 = fieldWeight in 402, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              4.048147 = idf(docFreq=2097, maxDocs=44218)
              0.125 = fieldNorm(doc=402)
        0.040486705 = product of:
          0.08097341 = sum of:
            0.08097341 = weight(_text_:22 in 402) [ClassicSimilarity], result of:
              0.08097341 = score(doc=402,freq=2.0), product of:
                0.13080442 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.037353165 = queryNorm
                0.61904186 = fieldWeight in 402, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.125 = fieldNorm(doc=402)
          0.5 = coord(1/2)
      0.22222222 = coord(2/9)
    
    Source
     Information processing and management. 22(1986) no.6, p.465-476
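     The indented breakdown above each score is Lucene's ClassicSimilarity explain() output: each matching term contributes queryWeight × fieldWeight, where queryWeight = idf × queryNorm and fieldWeight = tf × idf × fieldNorm, and the sum is scaled by a coord factor for the fraction of query clauses matched. A minimal sketch that reproduces the numbers for result 1; queryNorm is copied from the listing, since it depends on the whole query:

     ```python
     import math

     MAX_DOCS = 44218
     QUERY_NORM = 0.037353165  # query-dependent; taken from the listing

     def idf(doc_freq: int) -> float:
         # ClassicSimilarity: idf(t) = 1 + ln(maxDocs / (docFreq + 1))
         return 1.0 + math.log(MAX_DOCS / (doc_freq + 1))

     def term_score(freq: float, doc_freq: int, field_norm: float) -> float:
         tf = math.sqrt(freq)                            # tf = sqrt(termFreq)
         query_weight = idf(doc_freq) * QUERY_NORM       # queryWeight
         field_weight = tf * idf(doc_freq) * field_norm  # fieldWeight
         return query_weight * field_weight

     processing = term_score(freq=2.0, doc_freq=2097, field_norm=0.125)
     term_22 = term_score(freq=2.0, doc_freq=3622, field_norm=0.125)

     inner = term_22 * (1 / 2)               # "22" matched 1 of 2 sub-clauses: coord(1/2)
     total = (processing + inner) * (2 / 9)  # 2 of 9 query clauses matched: coord(2/9)

     print(f"{processing:.8f}")  # ~0.10820941
     print(f"{total:.9f}")       # ~0.033043582
     ```

     Only freq, docFreq, and fieldNorm vary across the other breakdowns in this list; fieldNorm encodes field length (0.125 = 1/sqrt(64), so shorter fields score higher).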
  2. Fox, C.: Lexical analysis and stoplists (1992) 0.02
    0.022397658 = product of:
      0.10078946 = sum of:
        0.04668475 = weight(_text_:data in 3502) [ClassicSimilarity], result of:
          0.04668475 = score(doc=3502,freq=4.0), product of:
            0.118112594 = queryWeight, product of:
              3.1620505 = idf(docFreq=5088, maxDocs=44218)
              0.037353165 = queryNorm
            0.3952563 = fieldWeight in 3502, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              3.1620505 = idf(docFreq=5088, maxDocs=44218)
              0.0625 = fieldNorm(doc=3502)
        0.054104704 = weight(_text_:processing in 3502) [ClassicSimilarity], result of:
          0.054104704 = score(doc=3502,freq=2.0), product of:
            0.15121111 = queryWeight, product of:
              4.048147 = idf(docFreq=2097, maxDocs=44218)
              0.037353165 = queryNorm
            0.35780904 = fieldWeight in 3502, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              4.048147 = idf(docFreq=2097, maxDocs=44218)
              0.0625 = fieldNorm(doc=3502)
      0.22222222 = coord(2/9)
    
    Abstract
    Lexical analysis is a fundamental operation in both query processing and automatic indexing, and filtering stoplist words is an important step in the automatic indexing process. Presents basic algorithms and data structures for lexical analysis, and shows how stoplist word removal can be efficiently incorporated into lexical analysis
    Source
     Information retrieval: data structures and algorithms. Ed.: W.B. Frakes and R. Baeza-Yates
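     Fox's central point, that stoplist filtering can be folded directly into lexical analysis rather than run as a second pass over the token stream, is easy to sketch. A toy illustration; the stoplist and tokenizer rules are placeholders, not Fox's published algorithms:

     ```python
     from typing import Iterator

     STOPLIST = frozenset({"a", "an", "and", "in", "is", "of", "the", "to"})  # toy list

     def tokens(text: str) -> Iterator[str]:
         """Lexical analysis with stoplist removal folded into token emission:
         each token is checked against a hash set as it is produced, so no
         second pass over the token stream is needed."""
         word = []
         for ch in text.lower():
             if ch.isalnum():
                 word.append(ch)
             elif word:
                 token = "".join(word)
                 word.clear()
                 if token not in STOPLIST:
                     yield token
         if word:
             token = "".join(word)
             if token not in STOPLIST:
                 yield token

     print(list(tokens("Lexical analysis is a fundamental operation.")))
     # ['lexical', 'analysis', 'fundamental', 'operation']
     ```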
  3. Milstead, J.L.: Thesauri in a full-text world (1998) 0.02
    0.022366492 = product of:
      0.067099474 = sum of:
        0.02063194 = weight(_text_:data in 2337) [ClassicSimilarity], result of:
          0.02063194 = score(doc=2337,freq=2.0), product of:
            0.118112594 = queryWeight, product of:
              3.1620505 = idf(docFreq=5088, maxDocs=44218)
              0.037353165 = queryNorm
            0.17468026 = fieldWeight in 2337, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.1620505 = idf(docFreq=5088, maxDocs=44218)
              0.0390625 = fieldNorm(doc=2337)
        0.03381544 = weight(_text_:processing in 2337) [ClassicSimilarity], result of:
          0.03381544 = score(doc=2337,freq=2.0), product of:
            0.15121111 = queryWeight, product of:
              4.048147 = idf(docFreq=2097, maxDocs=44218)
              0.037353165 = queryNorm
            0.22363065 = fieldWeight in 2337, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              4.048147 = idf(docFreq=2097, maxDocs=44218)
              0.0390625 = fieldNorm(doc=2337)
        0.012652095 = product of:
          0.02530419 = sum of:
            0.02530419 = weight(_text_:22 in 2337) [ClassicSimilarity], result of:
              0.02530419 = score(doc=2337,freq=2.0), product of:
                0.13080442 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.037353165 = queryNorm
                0.19345059 = fieldWeight in 2337, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=2337)
          0.5 = coord(1/2)
      0.33333334 = coord(3/9)
    
    Date
    22. 9.1997 19:16:05
    Source
    Visualizing subject access for 21st century information resources: Papers presented at the 1997 Clinic on Library Applications of Data Processing, 2-4 Mar 1997, Graduate School of Library and Information Science, University of Illinois at Urbana-Champaign. Ed.: P.A. Cochrane et al
  4. Wolfe, E.W.: ¬A case study in automated metadata enhancement : Natural Language Processing in the humanities (2019) 0.02
    0.021296859 = product of:
      0.095835865 = sum of:
        0.028884713 = weight(_text_:data in 5236) [ClassicSimilarity], result of:
          0.028884713 = score(doc=5236,freq=2.0), product of:
            0.118112594 = queryWeight, product of:
              3.1620505 = idf(docFreq=5088, maxDocs=44218)
              0.037353165 = queryNorm
            0.24455236 = fieldWeight in 5236, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.1620505 = idf(docFreq=5088, maxDocs=44218)
              0.0546875 = fieldNorm(doc=5236)
        0.066951156 = weight(_text_:processing in 5236) [ClassicSimilarity], result of:
          0.066951156 = score(doc=5236,freq=4.0), product of:
            0.15121111 = queryWeight, product of:
              4.048147 = idf(docFreq=2097, maxDocs=44218)
              0.037353165 = queryNorm
            0.4427661 = fieldWeight in 5236, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              4.048147 = idf(docFreq=2097, maxDocs=44218)
              0.0546875 = fieldNorm(doc=5236)
      0.22222222 = coord(2/9)
    
    Abstract
     The Black Book Interactive Project at the University of Kansas (KU) is developing an expanded corpus of novels by African American authors, with an emphasis on lesser-known writers and a goal of expanding research in this field. Using a custom metadata schema that highlights race-related elements, each novel is analyzed for features such as literary style, targeted content analysis, historical context, and other areas. Librarians at KU have developed a variety of computational text analysis processes to assist with specific aspects of this metadata collection, including text mining and natural language processing, automated subject extraction based on word sense disambiguation, harvesting data from Wikidata, and other actions.
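     One of the steps listed above, harvesting data from Wikidata, amounts to a SPARQL request against the public query service. A sketch under stated assumptions: the query, the identifiers used (P106 = occupation, Q36180 = writer), and the result handling are illustrative, not the project's actual schema:

     ```python
     import requests

     SPARQL = """
     SELECT ?author ?authorLabel WHERE {
       ?author wdt:P106 wd:Q36180 .           # occupation: writer
       SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
     }
     LIMIT 10
     """

     resp = requests.get(
         "https://query.wikidata.org/sparql",
         params={"query": SPARQL, "format": "json"},
         headers={"User-Agent": "metadata-harvest-sketch/0.1"},
         timeout=30,
     )
     for row in resp.json()["results"]["bindings"]:
         print(row["author"]["value"], row["authorLabel"]["value"])
     ```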
  5. Advances in intelligent retrieval: Proc. of a conference ... Wadham College, Oxford, 16.-17.4.1985 (1986) 0.02
    0.020533392 = product of:
      0.09240027 = sum of:
        0.03501356 = weight(_text_:data in 1384) [ClassicSimilarity], result of:
          0.03501356 = score(doc=1384,freq=4.0), product of:
            0.118112594 = queryWeight, product of:
              3.1620505 = idf(docFreq=5088, maxDocs=44218)
              0.037353165 = queryNorm
            0.29644224 = fieldWeight in 1384, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              3.1620505 = idf(docFreq=5088, maxDocs=44218)
              0.046875 = fieldNorm(doc=1384)
        0.057386704 = weight(_text_:processing in 1384) [ClassicSimilarity], result of:
          0.057386704 = score(doc=1384,freq=4.0), product of:
            0.15121111 = queryWeight, product of:
              4.048147 = idf(docFreq=2097, maxDocs=44218)
              0.037353165 = queryNorm
            0.3795138 = fieldWeight in 1384, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              4.048147 = idf(docFreq=2097, maxDocs=44218)
              0.046875 = fieldNorm(doc=1384)
      0.22222222 = coord(2/9)
    
    Content
     Contains the contributions: ADDIS, T.: Extended relational analysis: a design approach to knowledge-based systems; PARKINSON, D.: Supercomputers and non-numeric processing; McGREGOR, D.R. and J.R. MALONE: An architectural approach to advances in information retrieval; ALLEN, M.J. and O.S. HARRISON: Word processing and information retrieval: some practical problems; MURTAGH, F.: Clustering and nearest neighborhood searching; ENSER, P.G.B.: Experimenting with the automatic classification of books; TESKEY, N. and Z. RAZAK: An analysis of ranking for free text retrieval systems; ZARRI, G.P.: Interactive information retrieval: an artificial intelligence approach to deal with biographical data; HANCOX, P. and F. SMITH: A case system processor for the PRECIS indexing language; ROUAULT, J.: Linguistic methods in information retrieval systems; ARAGON-RAMIREZ, V. and C.D. PAICE: Design of a system for the online elucidation of natural language search statements; BROOKS, H.M., P.J. DANIELS and N.J. BELKIN: Problem descriptions and user models: developing an intelligent interface for document retrieval systems; BLACK, W.J., P. HARGREAVES and P.B. MAYES: HEADS: a cataloguing advisory system; BELL, D.A.: An architecture for integrating data, knowledge, and information bases
  6. Short, M.: Text mining and subject analysis for fiction; or, using machine learning and information extraction to assign subject headings to dime novels (2019) 0.02
    0.020520551 = product of:
      0.09234248 = sum of:
        0.06345777 = weight(_text_:cataloging in 5481) [ClassicSimilarity], result of:
          0.06345777 = score(doc=5481,freq=4.0), product of:
            0.14721331 = queryWeight, product of:
              3.9411201 = idf(docFreq=2334, maxDocs=44218)
              0.037353165 = queryNorm
            0.43106002 = fieldWeight in 5481, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              3.9411201 = idf(docFreq=2334, maxDocs=44218)
              0.0546875 = fieldNorm(doc=5481)
        0.028884713 = weight(_text_:data in 5481) [ClassicSimilarity], result of:
          0.028884713 = score(doc=5481,freq=2.0), product of:
            0.118112594 = queryWeight, product of:
              3.1620505 = idf(docFreq=5088, maxDocs=44218)
              0.037353165 = queryNorm
            0.24455236 = fieldWeight in 5481, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.1620505 = idf(docFreq=5088, maxDocs=44218)
              0.0546875 = fieldNorm(doc=5481)
      0.22222222 = coord(2/9)
    
    Abstract
    This article describes multiple experiments in text mining at Northern Illinois University that were undertaken to improve the efficiency and accuracy of cataloging. It focuses narrowly on subject analysis of dime novels, a format of inexpensive fiction that was popular in the United States between 1860 and 1915. NIU holds more than 55,000 dime novels in its collections, which it is in the process of comprehensively digitizing. Classification, keyword extraction, named-entity recognition, clustering, and topic modeling are discussed as means of assigning subject headings to improve their discoverability by researchers and to increase the productivity of digitization workflows.
    Source
     Cataloging and classification quarterly. 57(2019) no.5, p.315-336
    Theme
    Data Mining
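     Of the techniques listed, supervised classification is the most conventional; a sketch of a multi-label subject heading classifier over TF-IDF features, assuming a small hand-catalogued training sample (the texts, headings, and model choice are all illustrative):

     ```python
     from sklearn.feature_extraction.text import TfidfVectorizer
     from sklearn.linear_model import LogisticRegression
     from sklearn.multiclass import OneVsRestClassifier
     from sklearn.pipeline import make_pipeline
     from sklearn.preprocessing import MultiLabelBinarizer

     # Tiny placeholder training set: full texts plus their subject headings.
     texts = ["...full text of a detective dime novel...",
              "...full text of a frontier adventure..."]
     labels = [["Detective and mystery stories"], ["Frontier and pioneer life"]]

     mlb = MultiLabelBinarizer()
     y = mlb.fit_transform(labels)  # one binary column per heading

     clf = make_pipeline(
         TfidfVectorizer(max_features=20000, sublinear_tf=True),
         OneVsRestClassifier(LogisticRegression(max_iter=1000)),
     )
     clf.fit(texts, y)

     pred = clf.predict(["...an uncatalogued novel about a private detective..."])
     print(mlb.inverse_transform(pred))  # candidate headings for cataloger review
     ```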
  7. Chou, C.; Chu, T.: ¬An analysis of BERT (NLP) for assisted subject indexing for Project Gutenberg (2022) 0.02
    0.020491786 = product of:
      0.092213035 = sum of:
        0.04487142 = weight(_text_:cataloging in 1139) [ClassicSimilarity], result of:
          0.04487142 = score(doc=1139,freq=2.0), product of:
            0.14721331 = queryWeight, product of:
              3.9411201 = idf(docFreq=2334, maxDocs=44218)
              0.037353165 = queryNorm
            0.30480546 = fieldWeight in 1139, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.9411201 = idf(docFreq=2334, maxDocs=44218)
              0.0546875 = fieldNorm(doc=1139)
        0.047341615 = weight(_text_:processing in 1139) [ClassicSimilarity], result of:
          0.047341615 = score(doc=1139,freq=2.0), product of:
            0.15121111 = queryWeight, product of:
              4.048147 = idf(docFreq=2097, maxDocs=44218)
              0.037353165 = queryNorm
            0.3130829 = fieldWeight in 1139, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              4.048147 = idf(docFreq=2097, maxDocs=44218)
              0.0546875 = fieldNorm(doc=1139)
      0.22222222 = coord(2/9)
    
    Abstract
    In light of AI (Artificial Intelligence) and NLP (Natural language processing) technologies, this article examines the feasibility of using AI/NLP models to enhance the subject indexing of digital resources. While BERT (Bidirectional Encoder Representations from Transformers) models are widely used in scholarly communities, the authors assess whether BERT models can be used in machine-assisted indexing in the Project Gutenberg collection, through suggesting Library of Congress subject headings filtered by certain Library of Congress Classification subclass labels. The findings of this study are informative for further research on BERT models to assist with automatic subject indexing for digital library collections.
    Source
    Cataloging and classification quarterly. 60(2022) no.8, p.807-835
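     The setup the abstract describes, scoring a text against a filtered list of candidate subject headings, can be imitated with an off-the-shelf NLI-based zero-shot classifier. A sketch only; the model and candidate headings below are stand-ins, not the authors' actual configuration:

     ```python
     from transformers import pipeline

     # NLI-based zero-shot classification as a stand-in for the authors' BERT setup.
     classifier = pipeline("zero-shot-classification",
                           model="facebook/bart-large-mnli")

     candidate_headings = [   # e.g. LCSH candidates filtered by an LCC subclass
         "Science fiction",
         "Detective and mystery stories",
         "Sea stories",
     ]
     excerpt = "The submarine dove beneath the ice cap, its crew racing a rival vessel..."

     result = classifier(excerpt, candidate_headings, multi_label=True)
     for label, score in zip(result["labels"], result["scores"]):
         print(f"{score:.2f}  {label}")
     ```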
  8. Milstead, J.L.: Methodologies for subject analysis in bibliographic databases (1992) 0.02
    0.01959795 = product of:
      0.08819077 = sum of:
        0.040849157 = weight(_text_:data in 2311) [ClassicSimilarity], result of:
          0.040849157 = score(doc=2311,freq=4.0), product of:
            0.118112594 = queryWeight, product of:
              3.1620505 = idf(docFreq=5088, maxDocs=44218)
              0.037353165 = queryNorm
            0.34584928 = fieldWeight in 2311, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              3.1620505 = idf(docFreq=5088, maxDocs=44218)
              0.0546875 = fieldNorm(doc=2311)
        0.047341615 = weight(_text_:processing in 2311) [ClassicSimilarity], result of:
          0.047341615 = score(doc=2311,freq=2.0), product of:
            0.15121111 = queryWeight, product of:
              4.048147 = idf(docFreq=2097, maxDocs=44218)
              0.037353165 = queryNorm
            0.3130829 = fieldWeight in 2311, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              4.048147 = idf(docFreq=2097, maxDocs=44218)
              0.0546875 = fieldNorm(doc=2311)
      0.22222222 = coord(2/9)
    
    Abstract
     The goal of the study was to determine the state of the art of subject analysis as applied to large bibliographic databases. The intent was to gather and evaluate information, casting it in a form that could be applied by management. There was no attempt to determine actual costs or trade-offs among costs and possible benefits. Commercial automatic indexing packages were also reviewed. The overall conclusion was that database producers should begin working seriously on upgrading their thesauri and codifying their indexing policies as a means of moving toward development of machine aids to indexing, but that fully automatic indexing is not yet ready for wholesale implementation
    Source
     Information processing and management. 28(1992) no.3, p.407-431
  9. Polity, Y.: Vers une ergonomie linguistique (1994) 0.02
    0.019359069 = product of:
      0.08711581 = sum of:
        0.0330111 = weight(_text_:data in 36) [ClassicSimilarity], result of:
          0.0330111 = score(doc=36,freq=2.0), product of:
            0.118112594 = queryWeight, product of:
              3.1620505 = idf(docFreq=5088, maxDocs=44218)
              0.037353165 = queryNorm
            0.2794884 = fieldWeight in 36, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.1620505 = idf(docFreq=5088, maxDocs=44218)
              0.0625 = fieldNorm(doc=36)
        0.054104704 = weight(_text_:processing in 36) [ClassicSimilarity], result of:
          0.054104704 = score(doc=36,freq=2.0), product of:
            0.15121111 = queryWeight, product of:
              4.048147 = idf(docFreq=2097, maxDocs=44218)
              0.037353165 = queryNorm
            0.35780904 = fieldWeight in 36, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              4.048147 = idf(docFreq=2097, maxDocs=44218)
              0.0625 = fieldNorm(doc=36)
      0.22222222 = coord(2/9)
    
    Abstract
     Analyzes a special type of man-machine interaction, that of searching an information system with natural language. Proposes a model for full text processing for information retrieval that considers the system's users and how they employ information. Describes how INIST (the National Institute for Scientific and Technical Information) is developing computer assisted indexing as an aid to improving relevance when retrieving information from bibliographic data banks
  10. Bordoni, L.; Pazienza, M.T.: Documents automatic indexing in an environmental domain (1997) 0.02
    0.018814243 = product of:
      0.08466409 = sum of:
        0.066951156 = weight(_text_:processing in 530) [ClassicSimilarity], result of:
          0.066951156 = score(doc=530,freq=4.0), product of:
            0.15121111 = queryWeight, product of:
              4.048147 = idf(docFreq=2097, maxDocs=44218)
              0.037353165 = queryNorm
            0.4427661 = fieldWeight in 530, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              4.048147 = idf(docFreq=2097, maxDocs=44218)
              0.0546875 = fieldNorm(doc=530)
        0.017712934 = product of:
          0.035425868 = sum of:
            0.035425868 = weight(_text_:22 in 530) [ClassicSimilarity], result of:
              0.035425868 = score(doc=530,freq=2.0), product of:
                0.13080442 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.037353165 = queryNorm
                0.2708308 = fieldWeight in 530, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.0546875 = fieldNorm(doc=530)
          0.5 = coord(1/2)
      0.22222222 = coord(2/9)
    
    Abstract
    Describes an application of Natural Language Processing (NLP) techniques, in HIRMA (Hypertextual Information Retrieval Managed by ARIOSTO), to the problem of document indexing by referring to a system which incorporates natural language processing techniques to determine the subject of the text of documents and to associate them with relevant semantic indexes. Describes briefly the overall system, details of its implementation on a corpus of scientific abstracts related to environmental topics and experimental evidence of the system's behaviour. Analyzes in detail an experiment designed to evaluate the system's retrieval ability in terms of recall and precision
    Source
     International forum on information and documentation. 22(1997) no.1, p.17-28
  11. Driscoll, J.R.; Rajala, D.A.; Shaffer, W.H.: ¬The operation and performance of an artificially intelligent keywording system (1991) 0.02
    0.016939184 = product of:
      0.076226324 = sum of:
        0.028884713 = weight(_text_:data in 6681) [ClassicSimilarity], result of:
          0.028884713 = score(doc=6681,freq=2.0), product of:
            0.118112594 = queryWeight, product of:
              3.1620505 = idf(docFreq=5088, maxDocs=44218)
              0.037353165 = queryNorm
            0.24455236 = fieldWeight in 6681, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.1620505 = idf(docFreq=5088, maxDocs=44218)
              0.0546875 = fieldNorm(doc=6681)
        0.047341615 = weight(_text_:processing in 6681) [ClassicSimilarity], result of:
          0.047341615 = score(doc=6681,freq=2.0), product of:
            0.15121111 = queryWeight, product of:
              4.048147 = idf(docFreq=2097, maxDocs=44218)
              0.037353165 = queryNorm
            0.3130829 = fieldWeight in 6681, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              4.048147 = idf(docFreq=2097, maxDocs=44218)
              0.0546875 = fieldNorm(doc=6681)
      0.22222222 = coord(2/9)
    
    Abstract
    Presents a new approach to text analysis for automating the key phrase indexing process, using artificial intelligence techniques. This mimics the behaviour of human experts by using a rule base consisting of insertion and deletion rules generated by subject-matter experts. The insertion rules are based on the idea that some phrases found in a text imply or trigger other phrases. The deletion rules apply to semantically ambiguous phrases where text presence alone does not determine appropriateness as a key phrase. The insertion and deletion rules are used to transform a list of found phrases to a list of key phrases for indexing a document. Statistical data are provided to demonstrate the performance of this expert rule based system
    Source
     Information processing and management. 27(1991) no.1, p.43-54
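     The rule mechanism described above is straightforward to sketch: insertion rules map a found phrase to additional triggered phrases, and deletion rules veto an ambiguous phrase unless supporting context is also present. The rules here are invented for illustration:

     ```python
     # Invented toy rules illustrating the mechanism described in the abstract.
     INSERTION_RULES = {            # a found phrase implies (triggers) others
         "solid rocket booster": {"propulsion", "launch vehicles"},
     }
     DELETION_RULES = {             # ambiguous phrase -> context needed to keep it
         "charge": {"battery", "electrical"},
     }

     def keyphrases(found: list[str]) -> list[str]:
         phrases = set(found)
         for phrase in found:                            # apply insertion rules
             phrases |= INSERTION_RULES.get(phrase, set())
         for phrase, context in DELETION_RULES.items():  # apply deletion rules
             if phrase in phrases and not (context & phrases):
                 phrases.discard(phrase)                 # no supporting context: drop
         return sorted(phrases)

     print(keyphrases(["solid rocket booster", "charge"]))
     # ['launch vehicles', 'propulsion', 'solid rocket booster']
     ```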
  12. SIGIR'92 : Proceedings of the 15th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (1992) 0.02
    0.016675735 = product of:
      0.07504081 = sum of:
        0.051370002 = weight(_text_:germany in 6671) [ClassicSimilarity], result of:
          0.051370002 = score(doc=6671,freq=2.0), product of:
            0.22275731 = queryWeight, product of:
              5.963546 = idf(docFreq=308, maxDocs=44218)
              0.037353165 = queryNorm
            0.23060973 = fieldWeight in 6671, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              5.963546 = idf(docFreq=308, maxDocs=44218)
              0.02734375 = fieldNorm(doc=6671)
        0.023670807 = weight(_text_:processing in 6671) [ClassicSimilarity], result of:
          0.023670807 = score(doc=6671,freq=2.0), product of:
            0.15121111 = queryWeight, product of:
              4.048147 = idf(docFreq=2097, maxDocs=44218)
              0.037353165 = queryNorm
            0.15654145 = fieldWeight in 6671, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              4.048147 = idf(docFreq=2097, maxDocs=44218)
              0.02734375 = fieldNorm(doc=6671)
      0.22222222 = coord(2/9)
    
    Abstract
     The conference was organized by the Royal School of Librarianship in Copenhagen and was held in cooperation with AICA-GLIR (Italy), BCS-IRSG (UK), DD (Denmark), GI (Germany), and INRIA (France). It had support from Apple Computer, Denmark. The volume contains the 32 papers and reports on the two panel sessions, moderated by W.B. Croft and R. Kovetz, respectively
    Content
     HARMAN, D.: Relevance feedback revisited; AALBERSBERG, I.J.: Incremental relevance feedback; TAGUE-SUTCLIFFE, J.: Measuring the informativeness of a retrieval process; LEWIS, D.D.: An evaluation of phrasal and clustered representations on a text categorization task; BLOSSEVILLE, M.J., G. HÉBRAIL, M.G. MONTEIL and N. PÉNOT: Automatic document classification: natural language processing, statistical analysis, and expert system techniques used together; MASAND, B., G. LINOFF and D. WALTZ: Classifying news stories using memory based reasoning; KEEN, E.M.: Term position ranking: some new test results; CROUCH, C.J. and B. YANG: Experiments in automatic statistical thesaurus construction; GREFENSTETTE, G.: Use of syntactic context to produce term association lists for text retrieval; ANICK, P.G. and R.A. FLYNN: Versioning a full-text information retrieval system; BURKOWSKI, F.J.: Retrieval activities in a database consisting of heterogeneous collections; DEERWESTER, S.C., K. WACLENA and M. LaMAR: A textual object management system; NIE, J.-Y.: Towards a probabilistic modal logic for semantic-based information retrieval; WANG, A.W., S.K.M. WONG and Y.Y. YAO: An analysis of vector space models based on computational geometry; BARTELL, B.T., G.W. COTTRELL and R.K. BELEW: Latent semantic indexing is an optimal special case of multidimensional scaling; GLAVITSCH, U. and P. SCHÄUBLE: A system for retrieving speech documents; MARGULIS, E.L.: N-Poisson document modelling; HESS, M.: An incrementally extensible document retrieval system based on linguistics and logical principles; COOPER, W.S., F.C. GEY and D.P. DABNEY: Probabilistic retrieval based on staged logistic regression; FUHR, N.: Integration of probabilistic fact and text retrieval; CROFT, B., L.A. SMITH and H. TURTLE: A loosely-coupled integration of a text retrieval system and an object-oriented database system; DUMAIS, S.T. and J. NIELSEN: Automating the assignment of submitted manuscripts to reviewers; GOST, M.A. and M. MASOTTI: Design of an OPAC database to permit different subject searching accesses; ROBERTSON, A.M. and P. WILLETT: Searching for historical word forms in a database of 17th century English text using spelling correction methods; FOX, E.A., Q.F. CHEN and L.S. HEATH: A faster algorithm for constructing minimal perfect hash functions; MOFFAT, A. and J. ZOBEL: Parameterised compression for sparse bitmaps; GRANDI, F., P. TIBERIO and P. ZEZULA: Frame-sliced partitioned parallel signature files; ALLEN, B.: Cognitive differences in end user searching of a CD-ROM index; SONNENWALD, D.H.: Developing a theory to guide the process of designing information retrieval systems; CUTTING, D.R., J.O. PEDERSEN, D. KARGER and J.W. TUKEY: Scatter/Gather: a cluster-based approach to browsing large document collections; CHALMERS, M. and P. CHITSON: Bead: Explorations in information visualization; WILLIAMSON, C. and B. SHNEIDERMAN: The dynamic HomeFinder: evaluating dynamic queries in a real-estate information exploring system
  13. Stankovic, R. et al.: Indexing of textual databases based on lexical resources : a case study for Serbian (2016) 0.01
    0.014792904 = product of:
      0.06656807 = sum of:
        0.04126388 = weight(_text_:data in 2759) [ClassicSimilarity], result of:
          0.04126388 = score(doc=2759,freq=2.0), product of:
            0.118112594 = queryWeight, product of:
              3.1620505 = idf(docFreq=5088, maxDocs=44218)
              0.037353165 = queryNorm
            0.34936053 = fieldWeight in 2759, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.1620505 = idf(docFreq=5088, maxDocs=44218)
              0.078125 = fieldNorm(doc=2759)
        0.02530419 = product of:
          0.05060838 = sum of:
            0.05060838 = weight(_text_:22 in 2759) [ClassicSimilarity], result of:
              0.05060838 = score(doc=2759,freq=2.0), product of:
                0.13080442 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.037353165 = queryNorm
                0.38690117 = fieldWeight in 2759, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.078125 = fieldNorm(doc=2759)
          0.5 = coord(1/2)
      0.22222222 = coord(2/9)
    
    Date
    1. 2.2016 18:25:22
    Source
    Semantic keyword-based search on structured data sources: First COST Action IC1302 International KEYSTONE Conference, IKC 2015, Coimbra, Portugal, September 8-9, 2015. Revised Selected Papers. Eds.: J. Cardoso et al
  14. Mongin, L.; Fu, Y.Y.; Mostafa, J.: Open Archives data Service prototype and automated subject indexing using D-Lib archive content as a testbed (2003) 0.01
    0.014519301 = product of:
      0.06533685 = sum of:
        0.024758326 = weight(_text_:data in 1167) [ClassicSimilarity], result of:
          0.024758326 = score(doc=1167,freq=2.0), product of:
            0.118112594 = queryWeight, product of:
              3.1620505 = idf(docFreq=5088, maxDocs=44218)
              0.037353165 = queryNorm
            0.2096163 = fieldWeight in 1167, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.1620505 = idf(docFreq=5088, maxDocs=44218)
              0.046875 = fieldNorm(doc=1167)
        0.040578526 = weight(_text_:processing in 1167) [ClassicSimilarity], result of:
          0.040578526 = score(doc=1167,freq=2.0), product of:
            0.15121111 = queryWeight, product of:
              4.048147 = idf(docFreq=2097, maxDocs=44218)
              0.037353165 = queryNorm
            0.26835677 = fieldWeight in 1167, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              4.048147 = idf(docFreq=2097, maxDocs=44218)
              0.046875 = fieldNorm(doc=1167)
      0.22222222 = coord(2/9)
    
    Abstract
     The Indiana University School of Library and Information Science opened a new research laboratory in January 2003: the Information Processing Laboratory (IU IP Lab). The purpose of the new laboratory is to facilitate collaboration between scientists in the department in the areas of information retrieval (IR) and information visualization (IV) research. The lab has several areas of focus, including grid and cluster computing and a standard Java-based software platform to support plug-and-play research datasets, a selection of standard IR modules, and standard IV algorithms. Future development includes software to enable researchers to contribute datasets, IR algorithms, and visualization algorithms into the standard environment. We decided early on to use OAI-PMH as a resource discovery tool because it is consistent with our mission.
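     OAI-PMH, mentioned in the last sentence, is a small HTTP-plus-XML protocol, so a minimal ListRecords harvest of Dublin Core records needs only the standard library. The endpoint URL is a placeholder, not D-Lib's:

     ```python
     import urllib.request
     import xml.etree.ElementTree as ET

     BASE = "https://example.org/oai"   # placeholder OAI-PMH base URL
     url = BASE + "?verb=ListRecords&metadataPrefix=oai_dc"

     with urllib.request.urlopen(url, timeout=30) as resp:
         tree = ET.parse(resp)

     OAI = "{http://www.openarchives.org/OAI/2.0/}"
     DC = "{http://purl.org/dc/elements/1.1/}"
     for record in tree.iter(OAI + "record"):
         for title in record.iter(DC + "title"):
             print(title.text)          # one line per harvested record title
     ```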
  15. Medelyan, O.; Witten, I.H.: Domain-independent automatic keyphrase indexing with small training sets (2008) 0.01
    0.014048787 = product of:
      0.06321954 = sum of:
        0.038461216 = weight(_text_:cataloging in 1871) [ClassicSimilarity], result of:
          0.038461216 = score(doc=1871,freq=2.0), product of:
            0.14721331 = queryWeight, product of:
              3.9411201 = idf(docFreq=2334, maxDocs=44218)
              0.037353165 = queryNorm
            0.26126182 = fieldWeight in 1871, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.9411201 = idf(docFreq=2334, maxDocs=44218)
              0.046875 = fieldNorm(doc=1871)
        0.024758326 = weight(_text_:data in 1871) [ClassicSimilarity], result of:
          0.024758326 = score(doc=1871,freq=2.0), product of:
            0.118112594 = queryWeight, product of:
              3.1620505 = idf(docFreq=5088, maxDocs=44218)
              0.037353165 = queryNorm
            0.2096163 = fieldWeight in 1871, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.1620505 = idf(docFreq=5088, maxDocs=44218)
              0.046875 = fieldNorm(doc=1871)
      0.22222222 = coord(2/9)
    
    Abstract
    Keyphrases are widely used in both physical and digital libraries as a brief, but precise, summary of documents. They help organize material based on content, provide thematic access, represent search results, and assist with navigation. Manual assignment is expensive because trained human indexers must reach an understanding of the document and select appropriate descriptors according to defined cataloging rules. We propose a new method that enhances automatic keyphrase extraction by using semantic information about terms and phrases gleaned from a domain-specific thesaurus. The key advantage of the new approach is that it performs well with very little training data. We evaluate it on a large set of manually indexed documents in the domain of agriculture, compare its consistency with a group of six professional indexers, and explore its performance on smaller collections of documents in other domains and of French and Spanish documents.
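     Two classic keyphrase features behind extractors of this kind, TF-IDF and relative first-occurrence position, are simple to compute. A sketch; the authors' method additionally uses thesaurus-derived semantic features that are not reproduced here:

     ```python
     import math

     # Assumes the candidate phrase occurs in the document at least once.
     def keyphrase_features(doc_tokens: list[str], phrase: str,
                            doc_freq: dict[str, int], n_docs: int) -> tuple[float, float]:
         tf = doc_tokens.count(phrase) / len(doc_tokens)
         idf = math.log(n_docs / (1 + doc_freq.get(phrase, 0)))
         first_pos = doc_tokens.index(phrase) / len(doc_tokens)  # earlier = stronger
         return tf * idf, first_pos

     doc = "wheat yield under drought stress and drought tolerance in wheat".split()
     print(keyphrase_features(doc, "drought", doc_freq={"drought": 12}, n_docs=1000))
     ```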
  16. Lowe, D.B.; Dollinger, I.; Koster, T.; Herbert, B.E.: Text mining for type of research classification (2021) 0.01
    0.014048787 = product of:
      0.06321954 = sum of:
        0.038461216 = weight(_text_:cataloging in 720) [ClassicSimilarity], result of:
          0.038461216 = score(doc=720,freq=2.0), product of:
            0.14721331 = queryWeight, product of:
              3.9411201 = idf(docFreq=2334, maxDocs=44218)
              0.037353165 = queryNorm
            0.26126182 = fieldWeight in 720, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.9411201 = idf(docFreq=2334, maxDocs=44218)
              0.046875 = fieldNorm(doc=720)
        0.024758326 = weight(_text_:data in 720) [ClassicSimilarity], result of:
          0.024758326 = score(doc=720,freq=2.0), product of:
            0.118112594 = queryWeight, product of:
              3.1620505 = idf(docFreq=5088, maxDocs=44218)
              0.037353165 = queryNorm
            0.2096163 = fieldWeight in 720, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.1620505 = idf(docFreq=5088, maxDocs=44218)
              0.046875 = fieldNorm(doc=720)
      0.22222222 = coord(2/9)
    
    Source
    Cataloging and classification quarterly. 59(2021) no.8, p.815-834
    Theme
    Data Mining
  17. Chowdhury, G.G.: Natural language processing and information retrieval : pt.1: basic issues; pt.2: major applications (1991) 0.01
    0.013015569 = product of:
      0.11714012 = sum of:
        0.11714012 = weight(_text_:processing in 3313) [ClassicSimilarity], result of:
          0.11714012 = score(doc=3313,freq=6.0), product of:
            0.15121111 = queryWeight, product of:
              4.048147 = idf(docFreq=2097, maxDocs=44218)
              0.037353165 = queryNorm
            0.7746793 = fieldWeight in 3313, product of:
              2.4494898 = tf(freq=6.0), with freq of:
                6.0 = termFreq=6.0
              4.048147 = idf(docFreq=2097, maxDocs=44218)
              0.078125 = fieldNorm(doc=3313)
      0.11111111 = coord(1/9)
    
    Abstract
    Reviews the basic issues and procedures involved in natural language processing of textual material for final use in information retrieval. Covers: natural language processing; natural language understanding; syntactic and semantic analysis; parsing; knowledge bases and knowledge representation
  18. Donahue, J.; Hendricks, L.A.; Guadarrama, S.; Rohrbach, M.; Venugopalan, S.; Saenko, K.; Darrell, T.: Long-term recurrent convolutional networks for visual recognition and description (2014) 0.01
    0.012099418 = product of:
      0.05444738 = sum of:
        0.02063194 = weight(_text_:data in 1873) [ClassicSimilarity], result of:
          0.02063194 = score(doc=1873,freq=2.0), product of:
            0.118112594 = queryWeight, product of:
              3.1620505 = idf(docFreq=5088, maxDocs=44218)
              0.037353165 = queryNorm
            0.17468026 = fieldWeight in 1873, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.1620505 = idf(docFreq=5088, maxDocs=44218)
              0.0390625 = fieldNorm(doc=1873)
        0.03381544 = weight(_text_:processing in 1873) [ClassicSimilarity], result of:
          0.03381544 = score(doc=1873,freq=2.0), product of:
            0.15121111 = queryWeight, product of:
              4.048147 = idf(docFreq=2097, maxDocs=44218)
              0.037353165 = queryNorm
            0.22363065 = fieldWeight in 1873, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              4.048147 = idf(docFreq=2097, maxDocs=44218)
              0.0390625 = fieldNorm(doc=1873)
      0.22222222 = coord(2/9)
    
    Abstract
    Models based on deep convolutional networks have dominated recent image interpretation tasks; we investigate whether models which are also recurrent, or "temporally deep", are effective for tasks involving sequences, visual and otherwise. We develop a novel recurrent convolutional architecture suitable for large-scale visual learning which is end-to-end trainable, and demonstrate the value of these models on benchmark video recognition tasks, image description and retrieval problems, and video narration challenges. In contrast to current models which assume a fixed spatio-temporal receptive field or simple temporal averaging for sequential processing, recurrent convolutional models are "doubly deep" in that they can be compositional in spatial and temporal "layers". Such models may have advantages when target concepts are complex and/or training data are limited. Learning long-term dependencies is possible when nonlinearities are incorporated into the network state updates. Long-term RNN models are appealing in that they directly can map variable-length inputs (e.g., video frames) to variable length outputs (e.g., natural language text) and can model complex temporal dynamics; yet they can be optimized with backpropagation. Our recurrent long-term models are directly connected to modern visual convnet models and can be jointly trained to simultaneously learn temporal dynamics and convolutional perceptual representations. Our results show such models have distinct advantages over state-of-the-art models for recognition or generation which are separately defined and/or optimized.
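     The architecture reads as a per-frame CNN feeding an LSTM over time, trained end to end. A minimal sketch with illustrative sizes, not the paper's actual networks, which build on large pretrained convnets:

     ```python
     import torch
     from torch import nn

     class LRCN(nn.Module):
         """Minimal long-term recurrent convolutional network sketch:
         a per-frame CNN feeds an LSTM over time (sizes are illustrative)."""
         def __init__(self, n_classes: int = 10):
             super().__init__()
             self.cnn = nn.Sequential(                    # per-frame visual features
                 nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
                 nn.AdaptiveAvgPool2d(4), nn.Flatten(),   # -> 16*4*4 = 256
             )
             self.rnn = nn.LSTM(256, 128, batch_first=True)  # temporal dynamics
             self.head = nn.Linear(128, n_classes)

         def forward(self, video: torch.Tensor) -> torch.Tensor:
             b, t, c, h, w = video.shape
             feats = self.cnn(video.reshape(b * t, c, h, w)).reshape(b, t, -1)
             out, _ = self.rnn(feats)
             return self.head(out[:, -1])                 # classify from last step

     logits = LRCN()(torch.randn(2, 8, 3, 32, 32))        # two eight-frame clips
     print(logits.shape)                                  # torch.Size([2, 10])
     ```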
  19. Li, X.; Zhang, A.; Li, C.; Ouyang, J.; Cai, Y.: Exploring coherent topics by topic modeling with term weighting (2018) 0.01
    0.012099418 = product of:
      0.05444738 = sum of:
        0.02063194 = weight(_text_:data in 5045) [ClassicSimilarity], result of:
          0.02063194 = score(doc=5045,freq=2.0), product of:
            0.118112594 = queryWeight, product of:
              3.1620505 = idf(docFreq=5088, maxDocs=44218)
              0.037353165 = queryNorm
            0.17468026 = fieldWeight in 5045, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.1620505 = idf(docFreq=5088, maxDocs=44218)
              0.0390625 = fieldNorm(doc=5045)
        0.03381544 = weight(_text_:processing in 5045) [ClassicSimilarity], result of:
          0.03381544 = score(doc=5045,freq=2.0), product of:
            0.15121111 = queryWeight, product of:
              4.048147 = idf(docFreq=2097, maxDocs=44218)
              0.037353165 = queryNorm
            0.22363065 = fieldWeight in 5045, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              4.048147 = idf(docFreq=2097, maxDocs=44218)
              0.0390625 = fieldNorm(doc=5045)
      0.22222222 = coord(2/9)
    
    Abstract
    Topic models often produce unexplainable topics that are filled with noisy words. The reason is that words in topic modeling have equal weights. High frequency words dominate the top topic word lists, but most of them are meaningless words, e.g., domain-specific stopwords. To address this issue, in this paper we aim to investigate how to weight words, and then develop a straightforward but effective term weighting scheme, namely entropy weighting (EW). The proposed EW scheme is based on conditional entropy measured by word co-occurrences. Compared with existing term weighting schemes, the highlight of EW is that it can automatically reward informative words. For more robust word weight, we further suggest a combination form of EW (CEW) with two existing weighting schemes. Basically, our CEW assigns meaningless words lower weights and informative words higher weights, leading to more coherent topics during topic modeling inference. We apply CEW to Dirichlet multinomial mixture and latent Dirichlet allocation, and evaluate it by topic quality, document clustering and classification tasks on 8 real world data sets. Experimental results show that weighting words can effectively improve the topic modeling performance over both short texts and normal long texts. More importantly, the proposed CEW significantly outperforms the existing term weighting schemes, since it further considers which words are informative.
    Source
     Information processing and management. 54(2018) no.6, p.1345-1358
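     The core quantity behind EW, an entropy over a word's co-occurrence distribution, takes only a few lines to compute. A sketch of one plausible reading; the paper's exact estimator, and how the entropy maps onto final term weights, may differ:

     ```python
     import math
     from collections import Counter, defaultdict
     from itertools import combinations

     # Toy "documents" for counting word co-occurrences.
     docs = [["topic", "model", "word"],
             ["topic", "weight", "word"],
             ["entropy", "weight", "model"]]

     cooc: dict[str, Counter] = defaultdict(Counter)
     for doc in docs:
         for w, v in combinations(set(doc), 2):
             cooc[w][v] += 1
             cooc[v][w] += 1

     def cooccurrence_entropy(word: str) -> float:
         # Shannon entropy of the distribution of words co-occurring with `word`.
         counts = cooc[word]
         total = sum(counts.values())
         return -sum((c / total) * math.log(c / total) for c in counts.values())

     for w in sorted(cooc):
         print(f"{w:8s} {cooccurrence_entropy(w):.3f}")
     ```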
  20. Jones, K.P.: Natural-language processing and automatic indexing : a reply (1990) 0.01
    0.012023267 = product of:
      0.10820941 = sum of:
        0.10820941 = weight(_text_:processing in 394) [ClassicSimilarity], result of:
          0.10820941 = score(doc=394,freq=2.0), product of:
            0.15121111 = queryWeight, product of:
              4.048147 = idf(docFreq=2097, maxDocs=44218)
              0.037353165 = queryNorm
            0.7156181 = fieldWeight in 394, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              4.048147 = idf(docFreq=2097, maxDocs=44218)
              0.125 = fieldNorm(doc=394)
      0.11111111 = coord(1/9)
    

Languages

  • e 101
  • d 22
  • f 2
  • ja 1
  • ru 1
  • sp 1

Types

  • a 116
  • el 15
  • x 4
  • m 3
  • s 2