Search (77 results, page 1 of 4)

Smiraglia, R.P.; Cai, X.: Tracking the evolution of clustering, machine learning, automatic indexing and automatic classification in knowledge organization (2017) 0.03
```
0.032367557 = product of:
  0.12947023 = sum of:
    0.0946141 = weight(_text_:case in 3627) [ClassicSimilarity], result of:
      0.0946141 = score(doc=3627,freq=10.0), product of:
        0.1742197 = queryWeight, product of:
          4.3964143 = idf(docFreq=1480, maxDocs=44218)
          0.03962768 = queryNorm
        0.54307353 = fieldWeight in 3627, product of:
          3.1622777 = tf(freq=10.0), with freq of:
            10.0 = termFreq=10.0
          4.3964143 = idf(docFreq=1480, maxDocs=44218)
          0.0390625 = fieldNorm(doc=3627)
    0.034856133 = weight(_text_:studies in 3627) [ClassicSimilarity], result of:
      0.034856133 = score(doc=3627,freq=2.0), product of:
        0.15812531 = queryWeight, product of:
          3.9902744 = idf(docFreq=2222, maxDocs=44218)
          0.03962768 = queryNorm
        0.22043361 = fieldWeight in 3627, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          3.9902744 = idf(docFreq=2222, maxDocs=44218)
          0.0390625 = fieldNorm(doc=3627)
  0.25 = coord(2/8)
```
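The explain tree above can be reproduced by hand. The following is a minimal sketch, assuming the standard Lucene ClassicSimilarity formulas (tf = sqrt(freq), idf = 1 + ln(maxDocs / (docFreq + 1))). The helper names `tf`, `idf` and `term_score` are illustrative and are not part of any library. `queryNorm` and `fieldNorm` are copied from the tree rather than recomputed.

```python
import math

def tf(freq):
    # ClassicSimilarity term-frequency factor: square root of the raw count.
    return math.sqrt(freq)

def idf(doc_freq, max_docs):
    # ClassicSimilarity inverse document frequency.
    return 1.0 + math.log(max_docs / (doc_freq + 1))

def term_score(freq, doc_freq, max_docs, query_norm, field_norm):
    # score = queryWeight * fieldWeight, as in the explain tree.
    term_idf = idf(doc_freq, max_docs)
    query_weight = term_idf * query_norm
    field_weight = tf(freq) * term_idf * field_norm
    return query_weight * field_weight

# Values copied from the explain output for doc 3627.
MAX_DOCS = 44218
QUERY_NORM = 0.03962768
FIELD_NORM = 0.0390625

case_score = term_score(10.0, 1480, MAX_DOCS, QUERY_NORM, FIELD_NORM)
studies_score = term_score(2.0, 2222, MAX_DOCS, QUERY_NORM, FIELD_NORM)

# coord(2/8): two of the eight query clauses matched this document.
coord = 2 / 8
total = (case_score + studies_score) * coord

print(round(case_score, 4), round(studies_score, 4), round(total, 4))
```

Under these assumptions the three printed values should agree, to rounding, with the 0.0946141, 0.034856133 and 0.032367557 shown in the tree. Lucene computes in single precision, so the last digits may differ slightly.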
Abstract

A very important extension of the traditional domain of knowledge organization (KO) arises from attempts to incorporate techniques devised in the computer science domain for automatic concept extraction and for grouping, categorizing, clustering and otherwise organizing knowledge using mechanical means. Four specific terms have emerged to identify the most prevalent techniques: machine learning, clustering, automatic indexing, and automatic classification. Our study presents three domain analytical case analyses in search of answers. The first case relies on citations located using the ISKO-supported "Knowledge Organization Bibliography." The second case relies on works in both Web of Science and SCOPUS. Case three applies co-word analysis and citation analysis to the contents of the papers in the present special issue. We observe scholars involved in "clustering" and "automatic classification" who share common thematic emphases. But we have found no coherence, no common activity and no social semantics. We have not found a research front, or a common teleology within the KO domain. We also have found a lively group of authors who have succeeded in submitting papers to this special issue, and their work quite interestingly aligns with the case studies we report. There is an emphasis on KO for information retrieval; there is much work on clustering (which involves conceptual points within texts) and automatic classification (which involves semantic groupings at the meta-document level).

Stankovic, R. et al.: Indexing of textual databases based on lexical resources : a case study for Serbian (2016) 0.03

```
0.027867611 = product of:
  0.111470446 = sum of:
    0.08462543 = weight(_text_:case in 2759) [ClassicSimilarity], result of:
      0.08462543 = score(doc=2759,freq=2.0), product of:
        0.1742197 = queryWeight, product of:
          4.3964143 = idf(docFreq=1480, maxDocs=44218)
          0.03962768 = queryNorm
        0.48573974 = fieldWeight in 2759, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          4.3964143 = idf(docFreq=1480, maxDocs=44218)
          0.078125 = fieldNorm(doc=2759)
    0.026845016 = product of:
      0.05369003 = sum of:
        0.05369003 = weight(_text_:22 in 2759) [ClassicSimilarity], result of:
          0.05369003 = score(doc=2759,freq=2.0), product of:
            0.13876937 = queryWeight, product of:
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.03962768 = queryNorm
            0.38690117 = fieldWeight in 2759, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.078125 = fieldNorm(doc=2759)
      0.5 = coord(1/2)
  0.25 = coord(2/8)
```

Date: 1. 2.2016 18:25:22

Husevag, A.-S.R.: Named entities in indexing : a case study of TV subtitles and metadata records (2016) 0.02

```
0.016484251 = product of:
  0.065937005 = sum of:
    0.023624292 = weight(_text_:libraries in 3105) [ClassicSimilarity], result of:
      0.023624292 = score(doc=3105,freq=2.0), product of:
        0.13017908 = queryWeight, product of:
          3.2850544 = idf(docFreq=4499, maxDocs=44218)
          0.03962768 = queryNorm
        0.18147534 = fieldWeight in 3105, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          3.2850544 = idf(docFreq=4499, maxDocs=44218)
          0.0390625 = fieldNorm(doc=3105)
    0.042312715 = weight(_text_:case in 3105) [ClassicSimilarity], result of:
      0.042312715 = score(doc=3105,freq=2.0), product of:
        0.1742197 = queryWeight, product of:
          4.3964143 = idf(docFreq=1480, maxDocs=44218)
          0.03962768 = queryNorm
        0.24286987 = fieldWeight in 3105, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          4.3964143 = idf(docFreq=1480, maxDocs=44218)
          0.0390625 = fieldNorm(doc=3105)
  0.25 = coord(2/8)
```

Source: Proceedings of the 15th European Networked Knowledge Organization Systems Workshop (NKOS 2016) co-located with the 20th International Conference on Theory and Practice of Digital Libraries 2016 (TPDL 2016), Hannover, Germany, September 9, 2016. Ed. by Philipp Mayr et al. [http://ceur-ws.org/Vol-1676/=urn:nbn:de:0074-1676-5]

Jones, S.; Paynter, G.W.: Automatic extraction of document keyphrases for use in digital libraries : evaluations and applications (2002) 0.01
```
0.014620107 = product of:
  0.058480427 = sum of:
    0.023624292 = weight(_text_:libraries in 601) [ClassicSimilarity], result of:
      0.023624292 = score(doc=601,freq=2.0), product of:
        0.13017908 = queryWeight, product of:
          3.2850544 = idf(docFreq=4499, maxDocs=44218)
          0.03962768 = queryNorm
        0.18147534 = fieldWeight in 601, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          3.2850544 = idf(docFreq=4499, maxDocs=44218)
          0.0390625 = fieldNorm(doc=601)
    0.034856133 = weight(_text_:studies in 601) [ClassicSimilarity], result of:
      0.034856133 = score(doc=601,freq=2.0), product of:
        0.15812531 = queryWeight, product of:
          3.9902744 = idf(docFreq=2222, maxDocs=44218)
          0.03962768 = queryNorm
        0.22043361 = fieldWeight in 601, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          3.9902744 = idf(docFreq=2222, maxDocs=44218)
          0.0390625 = fieldNorm(doc=601)
  0.25 = coord(2/8)
```
Abstract

This article describes an evaluation of the Kea automatic keyphrase extraction algorithm. Document keyphrases are conventionally used as concise descriptors of document content, and are increasingly used in novel ways, including document clustering, searching and browsing interfaces, and retrieval engines. However, it is costly and time consuming to manually assign keyphrases to documents, motivating the development of tools that automatically perform this function. Previous studies have evaluated Kea's performance by measuring its ability to identify author keywords and keyphrases, but this methodology has a number of well-known limitations. The results presented in this article are based on evaluations by human assessors of the quality and appropriateness of Kea keyphrases. The results indicate that, in general, Kea produces keyphrases that are rated positively by human assessors. However, typical Kea settings can degrade performance, particularly those relating to keyphrase length and domain specificity. We found that for some settings, Kea's performance is better than that of similar systems, and that Kea's ranking of extracted keyphrases is effective. We also determined that author-specified keyphrases appear to exhibit an inherent ranking, and that they are rated highly and therefore suitable for use in training and evaluation of automatic keyphrasing systems.

Hodges, P.R.: Keyword in title indexes : effectiveness of retrieval in computer searches (1983) 0.01

```
0.01296638 = product of:
  0.05186552 = sum of:
    0.03307401 = weight(_text_:libraries in 5001) [ClassicSimilarity], result of:
      0.03307401 = score(doc=5001,freq=2.0), product of:
        0.13017908 = queryWeight, product of:
          3.2850544 = idf(docFreq=4499, maxDocs=44218)
          0.03962768 = queryNorm
        0.25406548 = fieldWeight in 5001, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          3.2850544 = idf(docFreq=4499, maxDocs=44218)
          0.0546875 = fieldNorm(doc=5001)
    0.018791512 = product of:
      0.037583023 = sum of:
        0.037583023 = weight(_text_:22 in 5001) [ClassicSimilarity], result of:
          0.037583023 = score(doc=5001,freq=2.0), product of:
            0.13876937 = queryWeight, product of:
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.03962768 = queryNorm
            0.2708308 = fieldWeight in 5001, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.0546875 = fieldNorm(doc=5001)
      0.5 = coord(1/2)
  0.25 = coord(2/8)
```

Date: 14. 3.1996 13:22:21
Source: Special libraries. 74(1983) no.1, S. 56-60

Kajanan, S.; Bao, Y.; Datta, A.; VanderMeer, D.; Dutta, K.: Efficient automatic search query formulation using phrase-level analysis (2014) 0.01
```
0.012285966 = product of:
  0.049143866 = sum of:
    0.027884906 = weight(_text_:studies in 1264) [ClassicSimilarity], result of:
      0.027884906 = score(doc=1264,freq=2.0), product of:
        0.15812531 = queryWeight, product of:
          3.9902744 = idf(docFreq=2222, maxDocs=44218)
          0.03962768 = queryNorm
        0.17634688 = fieldWeight in 1264, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          3.9902744 = idf(docFreq=2222, maxDocs=44218)
          0.03125 = fieldNorm(doc=1264)
    0.02125896 = product of:
      0.04251792 = sum of:
        0.04251792 = weight(_text_:area in 1264) [ClassicSimilarity], result of:
          0.04251792 = score(doc=1264,freq=2.0), product of:
            0.1952553 = queryWeight, product of:
              4.927245 = idf(docFreq=870, maxDocs=44218)
              0.03962768 = queryNorm
            0.21775553 = fieldWeight in 1264, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              4.927245 = idf(docFreq=870, maxDocs=44218)
              0.03125 = fieldNorm(doc=1264)
      0.5 = coord(1/2)
  0.25 = coord(2/8)
```
Abstract

Over the past decade, the volume of information available digitally over the Internet has grown enormously. Technical developments in the area of search, such as Google's Page Rank algorithm, have proved so good at serving relevant results that Internet search has become integrated into daily human activity. One can endlessly explore topics of interest simply by querying and reading through the resulting links. Yet, although search engines are well known for providing relevant results based on users' queries, users do not always receive the results they are looking for. Google's Director of Research describes clickstream evidence of frustrated users repeatedly reformulating queries and searching through page after page of results. Given the general quality of search engine results, one must consider the possibility that the frustrated user's query is not effective; that is, it does not describe the essence of the user's interest. Indeed, extensive research into human search behavior has found that humans are not very effective at formulating good search queries that describe what they are interested in. Ideally, the user should simply point to a portion of text that sparked the user's interest, and a system should automatically formulate a search query that captures the essence of the text. In this paper, we describe an implemented system that provides this capability. We first describe how our work differs from existing work in automatic query formulation, and propose a new method for improved quantification of the relevance of candidate search terms drawn from input text using phrase-level analysis. We then propose an implementable method designed to provide relevant queries based on a user's text input. We demonstrate the quality of our results and performance of our system through experimental studies. 
Our results demonstrate that our system produces relevant search terms with roughly two-thirds precision and recall compared to search terms selected by experts, and that typical users find significantly more relevant results (31% more relevant) more quickly (64% faster) using our system than self-formulated search queries. Further, we show that our implementation can scale to request loads of up to 10 requests per second within current online responsiveness expectations (<2-second response times at the highest loads tested).

Gomez, I.: Coping with the problem of subject classification diversity (1996) 0.01
```
0.010565204 = product of:
  0.08452163 = sum of:
    0.08452163 = weight(_text_:studies in 5074) [ClassicSimilarity], result of:
      0.08452163 = score(doc=5074,freq=6.0), product of:
        0.15812531 = queryWeight, product of:
          3.9902744 = idf(docFreq=2222, maxDocs=44218)
          0.03962768 = queryNorm
        0.53452307 = fieldWeight in 5074, product of:
          2.4494898 = tf(freq=6.0), with freq of:
            6.0 = termFreq=6.0
          3.9902744 = idf(docFreq=2222, maxDocs=44218)
          0.0546875 = fieldNorm(doc=5074)
  0.125 = coord(1/8)
```
Abstract

The delimitation of a research field in bibliometric studies presents the problem of the diversity of subject classifications used in the sources of input and output data. Classification of documents according to thematic codes or keywords is the most accurate method, mainly used in specialized bibliographic or patent databases. Classification of journals in disciplines presents lower specificity, and some shortcomings such as the change over time of both journals and disciplines and the increasing interdisciplinarity of research. Standardization of subject classifications emerges as an important point in bibliometric studies in order to allow international comparisons, although flexibility is needed to meet the needs of local studies.

Salton, G.: Automatic processing of foreign language documents (1985) 0.01
```
0.010039598 = product of:
  0.04015839 = sum of:
    0.018899433 = weight(_text_:libraries in 3650) [ClassicSimilarity], result of:
      0.018899433 = score(doc=3650,freq=2.0), product of:
        0.13017908 = queryWeight, product of:
          3.2850544 = idf(docFreq=4499, maxDocs=44218)
          0.03962768 = queryNorm
        0.14518027 = fieldWeight in 3650, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          3.2850544 = idf(docFreq=4499, maxDocs=44218)
          0.03125 = fieldNorm(doc=3650)
    0.02125896 = product of:
      0.04251792 = sum of:
        0.04251792 = weight(_text_:area in 3650) [ClassicSimilarity], result of:
          0.04251792 = score(doc=3650,freq=2.0), product of:
            0.1952553 = queryWeight, product of:
              4.927245 = idf(docFreq=870, maxDocs=44218)
              0.03962768 = queryNorm
            0.21775553 = fieldWeight in 3650, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              4.927245 = idf(docFreq=870, maxDocs=44218)
              0.03125 = fieldNorm(doc=3650)
      0.5 = coord(1/2)
  0.25 = coord(2/8)
```
Abstract

The attempt to computerize a process, such as indexing, abstracting, classifying, or retrieving information, begins with an analysis of the process into its intellectual and nonintellectual components. That part of the process which is amenable to computerization is mechanical or algorithmic. What is not is intellectual or creative and requires human intervention. Gerard Salton has been an innovator, experimenter, and promoter in the area of mechanized information systems since the early 1960s. He has been particularly ingenious at analyzing the process of information retrieval into its algorithmic components. He received a doctorate in applied mathematics from Harvard University before moving to the computer science department at Cornell, where he developed a prototype automatic retrieval system called SMART. Working with this system he and his students contributed for over a decade to our theoretical understanding of the retrieval process. On a more practical level, they have contributed design criteria for operating retrieval systems. The following selection presents one of the early descriptions of the SMART system; it is valuable as it shows the direction automatic retrieval methods were to take beyond simple word-matching techniques. These include various word normalization techniques to improve recall, for instance, the separation of words into stems and affixes; the correlation and clustering, using statistical association measures, of related terms; and the identification, using a concept thesaurus, of synonymous, broader, narrower, and sibling terms. They include, as well, techniques, both linguistic and statistical, to deal with the thorny problem of how to automatically extract from texts index terms that consist of more than one word. They include weighting techniques and various document-request matching algorithms. Significant among the latter are those which produce a retrieval output of citations ranked in relevance order.
During the 1970s, Salton and his students went on to further refine these various techniques, particularly the weighting and statistical association measures. Many of their early innovations seem commonplace today. Some of their later techniques are still ahead of their time and await technological developments for implementation. The particular focus of the selection that follows is on the evaluation of a particular component of the SMART system, a multilingual thesaurus. By mapping English language expressions and their German equivalents to a common concept number, the thesaurus permitted the automatic processing of German language documents against English language queries and vice versa. The results of the evaluation, as it turned out, were somewhat inconclusive. However, this SMART experiment suggested in a bold and optimistic way how one might proceed to answer such complex questions as: What is meant by retrieval language compatibility? How is it to be achieved, and how evaluated?

Imprint

Littleton, CO : Libraries Unlimited

Banerjee, K.; Johnson, M.: Improving access to archival collections with automated entity extraction (2015) 0.01
```
0.008975883 = product of:
  0.071807064 = sum of:
    0.071807064 = weight(_text_:case in 2144) [ClassicSimilarity], result of:
      0.071807064 = score(doc=2144,freq=4.0), product of:
        0.1742197 = queryWeight, product of:
          4.3964143 = idf(docFreq=1480, maxDocs=44218)
          0.03962768 = queryNorm
        0.41216385 = fieldWeight in 2144, product of:
          2.0 = tf(freq=4.0), with freq of:
            4.0 = termFreq=4.0
          4.3964143 = idf(docFreq=1480, maxDocs=44218)
          0.046875 = fieldNorm(doc=2144)
  0.125 = coord(1/8)
```
Abstract

The complexity and diversity of archival resources make constructing rich metadata records time consuming and expensive, which in turn limits access to these valuable materials. However, significant automation of the metadata creation process would dramatically reduce the cost of providing access points, improve access to individual resources, and establish connections between resources that would otherwise remain unknown. Using a case study at Oregon Health & Science University as a lens to examine the conceptual and technical challenges associated with automated extraction of access points, we discuss using publicly accessible APIs to extract entities (people, places, concepts, etc.) from digital and digitized objects. We describe why Linked Open Data is not well suited for a use case such as ours. We conclude with recommendations about how this method can be used in archives as well as for other library applications.

Gibb, F.; Smart, G.: Knowledge-based indexing : the view from SIMPR (1991) 0.01

```
0.008268503 = product of:
  0.06614802 = sum of:
    0.06614802 = weight(_text_:libraries in 4424) [ClassicSimilarity], result of:
      0.06614802 = score(doc=4424,freq=2.0), product of:
        0.13017908 = queryWeight, product of:
          3.2850544 = idf(docFreq=4499, maxDocs=44218)
          0.03962768 = queryNorm
        0.50813097 = fieldWeight in 4424, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          3.2850544 = idf(docFreq=4499, maxDocs=44218)
          0.109375 = fieldNorm(doc=4424)
  0.125 = coord(1/8)
```

Source: Libraries and expert systems. Ed. C. MacDonald et al

Wolfe, E.W.: A case study in automated metadata enhancement : Natural Language Processing in the humanities (2019) 0.01

```
0.0074047255 = product of:
  0.059237804 = sum of:
    0.059237804 = weight(_text_:case in 5236) [ClassicSimilarity], result of:
      0.059237804 = score(doc=5236,freq=2.0), product of:
        0.1742197 = queryWeight, product of:
          4.3964143 = idf(docFreq=1480, maxDocs=44218)
          0.03962768 = queryNorm
        0.34001783 = fieldWeight in 5236, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          4.3964143 = idf(docFreq=1480, maxDocs=44218)
          0.0546875 = fieldNorm(doc=5236)
  0.125 = coord(1/8)
```

Pirkola, A.: Morphological typology of languages for IR (2001) 0.01
```
0.0073941024 = product of:
  0.05915282 = sum of:
    0.05915282 = weight(_text_:studies in 4476) [ClassicSimilarity], result of:
      0.05915282 = score(doc=4476,freq=4.0), product of:
        0.15812531 = queryWeight, product of:
          3.9902744 = idf(docFreq=2222, maxDocs=44218)
          0.03962768 = queryNorm
        0.37408823 = fieldWeight in 4476, product of:
          2.0 = tf(freq=4.0), with freq of:
            4.0 = termFreq=4.0
          3.9902744 = idf(docFreq=2222, maxDocs=44218)
          0.046875 = fieldNorm(doc=4476)
  0.125 = coord(1/8)
```
Abstract

This paper presents a morphological classification of languages from the IR perspective. Linguistic typology research has shown that the morphological complexity of every language in the world can be described by two variables, index of synthesis and index of fusion. These variables provide a theoretical basis for IR research handling morphological issues. A common theoretical framework is needed in particular because of the increasing significance of cross-language retrieval research and CLIR systems processing different languages. The paper elaborates the linguistic morphological typology for the purposes of IR research. It studies how the indexes of synthesis and fusion could be used as practical tools in mono- and cross-lingual IR research. The need for semantic and syntactic typologies is discussed. The paper also reviews studies made in different languages on the effects of morphology and stemming in IR.

Samstag-Schnock, U.; Meadow, C.T.: PBS: an economical natural language query interpreter (1993) 0.01

```
0.0069712265 = product of:
  0.055769812 = sum of:
    0.055769812 = weight(_text_:studies in 5091) [ClassicSimilarity], result of:
      0.055769812 = score(doc=5091,freq=2.0), product of:
        0.15812531 = queryWeight, product of:
          3.9902744 = idf(docFreq=2222, maxDocs=44218)
          0.03962768 = queryNorm
        0.35269377 = fieldWeight in 5091, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          3.9902744 = idf(docFreq=2222, maxDocs=44218)
          0.0625 = fieldNorm(doc=5091)
  0.125 = coord(1/8)
```

Abstract: Reports on the design and implementation of the information searching and retrieval software, PBS (Parsing, Boolean recognition, Stemming) for the front end OAK 2, a new version of OAK developed at Toronto Univ. OAK 2 is a research tool for user behaviour studies. PBS receives natural language search statements from an end user and identifies search facets and implied Boolean logic operators

Advances in intelligent retrieval: Proc. of a conference ... Wadham College, Oxford, 16.-17.4.1985 (1986) 0.01
```
0.0063469075 = product of:
  0.05077526 = sum of:
    0.05077526 = weight(_text_:case in 1384) [ClassicSimilarity], result of:
      0.05077526 = score(doc=1384,freq=2.0), product of:
        0.1742197 = queryWeight, product of:
          4.3964143 = idf(docFreq=1480, maxDocs=44218)
          0.03962768 = queryNorm
        0.29144385 = fieldWeight in 1384, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          4.3964143 = idf(docFreq=1480, maxDocs=44218)
          0.046875 = fieldNorm(doc=1384)
  0.125 = coord(1/8)
```
Content

Contains the contributions: ADDIS, T.: Extended relational analysis: a design approach to knowledge-based systems; PARKINSON, D.: Supercomputers and non-numeric processing; McGREGOR, D.R. and J.R. MALONE: An architectural approach to advances in information retrieval; ALLEN, M.J. and O.S. HARRISON: Word processing and information retrieval: some practical problems; MURTAGH, F.: Clustering and nearest neighborhood searching; ENSER, P.G.B.: Experimenting with the automatic classification of books; TESKEY, N. and Z. RAZAK: An analysis of ranking for free text retrieval systems; ZARRI, G.P.: Interactive information retrieval: an artificial intelligence approach to deal with biographical data; HANCOX, P. and F. SMITH: A case system processor for the PRECIS indexing language; ROUAULT, J.: Linguistic methods in information retrieval systems; ARAGON-RAMIREZ, V. and C.D. PAICE: Design of a system for the online elucidation of natural language search statements; BROOKS, H.M., P.J. DANIELS and N.J. BELKIN: Problem descriptions and user models: developing an intelligent interface for document retrieval systems; BLACK, W.J., P. HARGREAVES and P.B. MAYES: HEADS: a cataloguing advisory system; BELL, D.A.: An architecture for integrating data, knowledge, and information bases

Liu, G.Z.: Semantic vector space model : implementation and evaluation (1997) 0.01
```
0.0063469075 = product of:
  0.05077526 = sum of:
    0.05077526 = weight(_text_:case in 161) [ClassicSimilarity], result of:
      0.05077526 = score(doc=161,freq=2.0), product of:
        0.1742197 = queryWeight, product of:
          4.3964143 = idf(docFreq=1480, maxDocs=44218)
          0.03962768 = queryNorm
        0.29144385 = fieldWeight in 161, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          4.3964143 = idf(docFreq=1480, maxDocs=44218)
          0.046875 = fieldNorm(doc=161)
  0.125 = coord(1/8)
```
Abstract

Presents the Semantic Vector Space Model (SVSM), a text representation and searching technique based on the combination of Vector Space Model (VSM) with heuristic syntax parsing and distributed representation of semantic case structures. Both documents and queries are represented as semantic matrices. A search mechanism is designed to compute the similarity between 2 semantic matrices to predict relevancy. A prototype system was built to implement this model by modifying the SMART system and using the Xerox Part of Speech tagger as the pre-processor of the indexing. The prototype system was used in an experimental study to evaluate this technique in terms of precision, recall, and effectiveness of relevance ranking. Results show that if documents and queries were too short, the technique was less effective than VSM. But with longer documents and queries, especially when original documents were used as queries, the system based on this technique was found to perform better than SMART.

Flores, F.N.; Moreira, V.P.: Assessing the impact of stemming accuracy on information retrieval : a multilingual perspective (2016) 0.01
```
0.0063469075 = product of:
  0.05077526 = sum of:
    0.05077526 = weight(_text_:case in 3187) [ClassicSimilarity], result of:
      0.05077526 = score(doc=3187,freq=2.0), product of:
        0.1742197 = queryWeight, product of:
          4.3964143 = idf(docFreq=1480, maxDocs=44218)
          0.03962768 = queryNorm
        0.29144385 = fieldWeight in 3187, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          4.3964143 = idf(docFreq=1480, maxDocs=44218)
          0.046875 = fieldNorm(doc=3187)
  0.125 = coord(1/8)
```
Abstract

The quality of stemming algorithms is typically measured in two different ways: (i) how accurately they map the variant forms of a word to the same stem; or (ii) how much improvement they bring to Information Retrieval systems. In this article, we evaluate various stemming algorithms, in four languages, in terms of accuracy and in terms of their aid to Information Retrieval. The aim is to assess whether the most accurate stemmers are also the ones that bring the biggest gain in Information Retrieval. Experiments in English, French, Portuguese, and Spanish show that this is not always the case, as stemmers with higher error rates yield better retrieval quality. As a byproduct, we also identified the most accurate stemmers and the best for Information Retrieval purposes.

Clavel, G.; Walther, F.; Walther, J.: Indexation automatique de fonds bibliotheconomiques (1993) 0.01
```
0.006099823 = product of:
  0.048798583 = sum of:
    0.048798583 = weight(_text_:studies in 6610) [ClassicSimilarity], result of:
      0.048798583 = score(doc=6610,freq=2.0), product of:
        0.15812531 = queryWeight, product of:
          3.9902744 = idf(docFreq=2222, maxDocs=44218)
          0.03962768 = queryNorm
        0.30860704 = fieldWeight in 6610, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          3.9902744 = idf(docFreq=2222, maxDocs=44218)
          0.0546875 = fieldNorm(doc=6610)
  0.125 = coord(1/8)
```
Abstract

A discussion of developments to date in the field of computerized indexing, based on presentations given at a seminar held at the Institute of Policy Studies in Paris in Nov 91. The methods tested so far, based on a linguistic approach, whether using natural language or special thesauri, encounter the same central problem - they are only successful when applied to collections of similar types of documents covering very specific subject areas. Despite this, the search for some sort of universal indexing metalanguage continues. In the end, computerized indexing works best when used in conjunction with manual indexing - ideally in the hands of a trained library science professional, who can extract the maximum value from a collection of documents for a particular user population

Krutulis, J.D.; Jacob, E.K.: A theoretical model for the study of emergent structure in adaptive information networks (1995) 0.01

```
0.006099823 = product of:
  0.048798583 = sum of:
    0.048798583 = weight(_text_:studies in 3353) [ClassicSimilarity], result of:
      0.048798583 = score(doc=3353,freq=2.0), product of:
        0.15812531 = queryWeight, product of:
          3.9902744 = idf(docFreq=2222, maxDocs=44218)
          0.03962768 = queryNorm
        0.30860704 = fieldWeight in 3353, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          3.9902744 = idf(docFreq=2222, maxDocs=44218)
          0.0546875 = fieldNorm(doc=3353)
  0.125 = coord(1/8)
```

Imprint: Alberta : Alberta University, School of Library and Information Studies

Lepsky, K.; Müller, T.; Wille, J.: Metadata improvement for image information retrieval (2010) 0.01
```
0.006099823 = product of:
  0.048798583 = sum of:
    0.048798583 = weight(_text_:studies in 4995) [ClassicSimilarity], result of:
      0.048798583 = score(doc=4995,freq=2.0), product of:
        0.15812531 = queryWeight, product of:
          3.9902744 = idf(docFreq=2222, maxDocs=44218)
          0.03962768 = queryNorm
        0.30860704 = fieldWeight in 4995, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          3.9902744 = idf(docFreq=2222, maxDocs=44218)
          0.0546875 = fieldNorm(doc=4995)
  0.125 = coord(1/8)
```
Abstract

This paper discusses the goals and results of the research project Perseus-a as an attempt to improve information retrieval of digital images by automatically connecting them with text-based descriptions. The development uses the image collection of prometheus, the distributed digital image archive for research and studies, the articles of the digitized Reallexikon zur Deutschen Kunstgeschichte, art historical terminological resources and classification data, and an open source system for linguistic and statistic automatic indexing called lingo.

Voorhees, E.M.: Implementing agglomerative hierarchic clustering algorithms for use in document retrieval (1986) 0.01

```
0.0053690034 = product of:
  0.042952027 = sum of:
    0.042952027 = product of:
      0.085904054 = sum of:
        0.085904054 = weight(_text_:22 in 402) [ClassicSimilarity], result of:
          0.085904054 = score(doc=402,freq=2.0), product of:
            0.13876937 = queryWeight, product of:
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.03962768 = queryNorm
            0.61904186 = fieldWeight in 402, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.125 = fieldNorm(doc=402)
      0.5 = coord(1/2)
  0.125 = coord(1/8)
```

Source: Information processing and management. 22(1986) no.6, S.465-476
