Search (73 results, page 1 of 4)

  • theme_ss:"Automatisches Indexieren"
  1. Smiraglia, R.P.; Cai, X.: Tracking the evolution of clustering, machine learning, automatic indexing and automatic classification in knowledge organization (2017) 0.04
    0.038080614 = product of:
      0.13328214 = sum of:
        0.097399764 = weight(_text_:case in 3627) [ClassicSimilarity], result of:
          0.097399764 = score(doc=3627,freq=10.0), product of:
            0.17934912 = queryWeight, product of:
              4.3964143 = idf(docFreq=1480, maxDocs=44218)
              0.04079441 = queryNorm
            0.54307353 = fieldWeight in 3627, product of:
              3.1622777 = tf(freq=10.0), with freq of:
                10.0 = termFreq=10.0
              4.3964143 = idf(docFreq=1480, maxDocs=44218)
              0.0390625 = fieldNorm(doc=3627)
        0.03588238 = weight(_text_:studies in 3627) [ClassicSimilarity], result of:
          0.03588238 = score(doc=3627,freq=2.0), product of:
            0.1627809 = queryWeight, product of:
              3.9902744 = idf(docFreq=2222, maxDocs=44218)
              0.04079441 = queryNorm
            0.22043361 = fieldWeight in 3627, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.9902744 = idf(docFreq=2222, maxDocs=44218)
              0.0390625 = fieldNorm(doc=3627)
      0.2857143 = coord(2/7)
    
    Abstract
    A very important extension of the traditional domain of knowledge organization (KO) arises from attempts to incorporate techniques devised in the computer science domain for automatic concept extraction and for grouping, categorizing, clustering and otherwise organizing knowledge using mechanical means. Four specific terms have emerged to identify the most prevalent techniques: machine learning, clustering, automatic indexing, and automatic classification. Our study presents three domain analytical case analyses in search of answers. The first case relies on citations located using the ISKO-supported "Knowledge Organization Bibliography." The second case relies on works in both Web of Science and SCOPUS. Case three applies co-word analysis and citation analysis to the contents of the papers in the present special issue. We observe scholars involved in "clustering" and "automatic classification" who share common thematic emphases. But we have found no coherence, no common activity and no social semantics. We have not found a research front, or a common teleology within the KO domain. We also have found a lively group of authors who have succeeded in submitting papers to this special issue, and their work quite interestingly aligns with the case studies we report. There is an emphasis on KO for information retrieval; there is much work on clustering (which involves conceptual points within texts) and automatic classification (which involves semantic groupings at the meta-document level).
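The score trees above are Lucene ClassicSimilarity explain output: each matching term contributes queryWeight × fieldWeight, where queryWeight = idf × queryNorm and fieldWeight = √tf × idf × fieldNorm, and the summed contributions are scaled by the coordination factor coord. As a sanity check, a minimal sketch recomputing result 1's score from the values recorded in its tree:

```python
import math

# ClassicSimilarity, as recorded in the explain trees above:
#   termScore   = queryWeight * fieldWeight
#   queryWeight = idf * queryNorm
#   fieldWeight = sqrt(freq) * idf * fieldNorm
# The summed term scores are then scaled by coord (matched/total clauses).

def term_score(freq, idf, query_norm, field_norm):
    query_weight = idf * query_norm
    field_weight = math.sqrt(freq) * idf * field_norm
    return query_weight * field_weight

# Values copied from the tree for result 1 (Smiraglia & Cai 2017)
query_norm = 0.04079441
field_norm = 0.0390625

case_score = term_score(10.0, 4.3964143, query_norm, field_norm)    # "case"
studies_score = term_score(2.0, 3.9902744, query_norm, field_norm)  # "studies"

total = (case_score + studies_score) * (2.0 / 7.0)  # coord(2/7)

print(case_score)  # ≈ 0.097399764
print(total)       # ≈ 0.038080614
```

The recomputed values agree with the explain tree to floating-point rounding, which makes this a quick way to verify how a ranking was produced.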
  2. Stankovic, R. et al.: Indexing of textual databases based on lexical resources : a case study for Serbian (2016) 0.03
    0.0327864 = product of:
      0.1147524 = sum of:
        0.087116994 = weight(_text_:case in 2759) [ClassicSimilarity], result of:
          0.087116994 = score(doc=2759,freq=2.0), product of:
            0.17934912 = queryWeight, product of:
              4.3964143 = idf(docFreq=1480, maxDocs=44218)
              0.04079441 = queryNorm
            0.48573974 = fieldWeight in 2759, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              4.3964143 = idf(docFreq=1480, maxDocs=44218)
              0.078125 = fieldNorm(doc=2759)
        0.0276354 = product of:
          0.0552708 = sum of:
            0.0552708 = weight(_text_:22 in 2759) [ClassicSimilarity], result of:
              0.0552708 = score(doc=2759,freq=2.0), product of:
                0.14285508 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.04079441 = queryNorm
                0.38690117 = fieldWeight in 2759, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.078125 = fieldNorm(doc=2759)
          0.5 = coord(1/2)
      0.2857143 = coord(2/7)
    
    Date
    1. 2.2016 18:25:22
  3. Husevag, A.-S.R.: Named entities in indexing : a case study of TV subtitles and metadata records (2016) 0.02
    0.019393813 = product of:
      0.06787834 = sum of:
        0.024319848 = weight(_text_:libraries in 3105) [ClassicSimilarity], result of:
          0.024319848 = score(doc=3105,freq=2.0), product of:
            0.13401186 = queryWeight, product of:
              3.2850544 = idf(docFreq=4499, maxDocs=44218)
              0.04079441 = queryNorm
            0.18147534 = fieldWeight in 3105, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.2850544 = idf(docFreq=4499, maxDocs=44218)
              0.0390625 = fieldNorm(doc=3105)
        0.043558497 = weight(_text_:case in 3105) [ClassicSimilarity], result of:
          0.043558497 = score(doc=3105,freq=2.0), product of:
            0.17934912 = queryWeight, product of:
              4.3964143 = idf(docFreq=1480, maxDocs=44218)
              0.04079441 = queryNorm
            0.24286987 = fieldWeight in 3105, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              4.3964143 = idf(docFreq=1480, maxDocs=44218)
              0.0390625 = fieldNorm(doc=3105)
      0.2857143 = coord(2/7)
    
    Source
     Proceedings of the 15th European Networked Knowledge Organization Systems Workshop (NKOS 2016) co-located with the 20th International Conference on Theory and Practice of Digital Libraries 2016 (TPDL 2016), Hannover, Germany, September 9, 2016. Ed. by Philipp Mayr et al. [http://ceur-ws.org/Vol-1676/=urn:nbn:de:0074-1676-5]
  4. Jones, S.; Paynter, G.W.: Automatic extraction of document keyphrases for use in digital libraries : evaluations and applications (2002) 0.02
    0.017200638 = product of:
      0.060202226 = sum of:
        0.024319848 = weight(_text_:libraries in 601) [ClassicSimilarity], result of:
          0.024319848 = score(doc=601,freq=2.0), product of:
            0.13401186 = queryWeight, product of:
              3.2850544 = idf(docFreq=4499, maxDocs=44218)
              0.04079441 = queryNorm
            0.18147534 = fieldWeight in 601, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.2850544 = idf(docFreq=4499, maxDocs=44218)
              0.0390625 = fieldNorm(doc=601)
        0.03588238 = weight(_text_:studies in 601) [ClassicSimilarity], result of:
          0.03588238 = score(doc=601,freq=2.0), product of:
            0.1627809 = queryWeight, product of:
              3.9902744 = idf(docFreq=2222, maxDocs=44218)
              0.04079441 = queryNorm
            0.22043361 = fieldWeight in 601, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.9902744 = idf(docFreq=2222, maxDocs=44218)
              0.0390625 = fieldNorm(doc=601)
      0.2857143 = coord(2/7)
    
    Abstract
    This article describes an evaluation of the Kea automatic keyphrase extraction algorithm. Document keyphrases are conventionally used as concise descriptors of document content, and are increasingly used in novel ways, including document clustering, searching and browsing interfaces, and retrieval engines. However, it is costly and time consuming to manually assign keyphrases to documents, motivating the development of tools that automatically perform this function. Previous studies have evaluated Kea's performance by measuring its ability to identify author keywords and keyphrases, but this methodology has a number of well-known limitations. The results presented in this article are based on evaluations by human assessors of the quality and appropriateness of Kea keyphrases. The results indicate that, in general, Kea produces keyphrases that are rated positively by human assessors. However, typical Kea settings can degrade performance, particularly those relating to keyphrase length and domain specificity. We found that for some settings, Kea's performance is better than that of similar systems, and that Kea's ranking of extracted keyphrases is effective. We also determined that author-specified keyphrases appear to exhibit an inherent ranking, and that they are rated highly and therefore suitable for use in training and evaluation of automatic keyphrasing systems.
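Kea's actual pipeline scores candidate phrases with TF×IDF and first-occurrence features in a Naive Bayes model; as a stand-in for that scoring step, the sketch below only extracts candidate n-grams under a Kea-like constraint (a candidate may not begin or end with a stopword) and ranks them by frequency. The stopword list and example text are invented for illustration.

```python
import re
from collections import Counter

STOPWORDS = {"the", "a", "of", "and", "to", "in", "is", "are", "for", "that"}

def candidates(text, max_len=3):
    """Count candidate keyphrases of 1..max_len words."""
    words = re.findall(r"[a-z]+", text.lower())
    phrases = Counter()
    for n in range(1, max_len + 1):
        for i in range(len(words) - n + 1):
            gram = words[i:i + n]
            # Kea-like constraint: no leading or trailing stopword
            if gram[0] in STOPWORDS or gram[-1] in STOPWORDS:
                continue
            phrases[" ".join(gram)] += 1
    return phrases

def top_keyphrases(text, k=5):
    return [p for p, _ in candidates(text).most_common(k)]

doc = ("Automatic keyphrase extraction assigns keyphrases to documents. "
       "Keyphrase extraction replaces costly manual keyphrase assignment.")
print(top_keyphrases(doc))
```

A real evaluation along the article's lines would then compare such ranked lists against human assessors' quality ratings rather than against author keywords alone.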
  5. Hodges, P.R.: Keyword in title indexes : effectiveness of retrieval in computer searches (1983) 0.02
    0.01525502 = product of:
      0.053392567 = sum of:
        0.03404779 = weight(_text_:libraries in 5001) [ClassicSimilarity], result of:
          0.03404779 = score(doc=5001,freq=2.0), product of:
            0.13401186 = queryWeight, product of:
              3.2850544 = idf(docFreq=4499, maxDocs=44218)
              0.04079441 = queryNorm
            0.25406548 = fieldWeight in 5001, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.2850544 = idf(docFreq=4499, maxDocs=44218)
              0.0546875 = fieldNorm(doc=5001)
        0.019344779 = product of:
          0.038689557 = sum of:
            0.038689557 = weight(_text_:22 in 5001) [ClassicSimilarity], result of:
              0.038689557 = score(doc=5001,freq=2.0), product of:
                0.14285508 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.04079441 = queryNorm
                0.2708308 = fieldWeight in 5001, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.0546875 = fieldNorm(doc=5001)
          0.5 = coord(1/2)
      0.2857143 = coord(2/7)
    
    Date
    14. 3.1996 13:22:21
    Source
    Special libraries. 74(1983) no.1, S. 56-60
  6. Gomez, I.: Coping with the problem of subject classification diversity (1996) 0.01
    0.012430022 = product of:
      0.087010145 = sum of:
        0.087010145 = weight(_text_:studies in 5074) [ClassicSimilarity], result of:
          0.087010145 = score(doc=5074,freq=6.0), product of:
            0.1627809 = queryWeight, product of:
              3.9902744 = idf(docFreq=2222, maxDocs=44218)
              0.04079441 = queryNorm
            0.53452307 = fieldWeight in 5074, product of:
              2.4494898 = tf(freq=6.0), with freq of:
                6.0 = termFreq=6.0
              3.9902744 = idf(docFreq=2222, maxDocs=44218)
              0.0546875 = fieldNorm(doc=5074)
      0.14285715 = coord(1/7)
    
    Abstract
     The delimitation of a research field in bibliometric studies presents the problem of the diversity of subject classifications used in the sources of input and output data. Classification of documents according to thematic codes or keywords is the most accurate method, mainly used in specialized bibliographic or patent databases. Classification of journals into disciplines offers lower specificity and has some shortcomings, such as the change over time of both journals and disciplines and the increasing interdisciplinarity of research. Standardization of subject classifications emerges as an important point in bibliometric studies in order to allow international comparisons, although flexibility is needed to meet the needs of local studies.
  7. Banerjee, K.; Johnson, M.: Improving access to archival collections with automated entity extraction (2015) 0.01
    0.010560175 = product of:
      0.073921226 = sum of:
        0.073921226 = weight(_text_:case in 2144) [ClassicSimilarity], result of:
          0.073921226 = score(doc=2144,freq=4.0), product of:
            0.17934912 = queryWeight, product of:
              4.3964143 = idf(docFreq=1480, maxDocs=44218)
              0.04079441 = queryNorm
            0.41216385 = fieldWeight in 2144, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              4.3964143 = idf(docFreq=1480, maxDocs=44218)
              0.046875 = fieldNorm(doc=2144)
      0.14285715 = coord(1/7)
    
    Abstract
     The complexity and diversity of archival resources make constructing rich metadata records time consuming and expensive, which in turn limits access to these valuable materials. However, significant automation of the metadata creation process would dramatically reduce the cost of providing access points, improve access to individual resources, and establish connections between resources that would otherwise remain unknown. Using a case study at Oregon Health & Science University as a lens to examine the conceptual and technical challenges associated with automated extraction of access points, we discuss using publicly accessible APIs to extract entities (e.g. people, places, concepts) from digital and digitized objects. We describe why Linked Open Data is not well suited for a use case such as ours. We conclude with recommendations about how this method can be used in archives as well as for other library applications.
  8. Gibb, F.; Smart, G.: Knowledge-based indexing : the view from SIMPR (1991) 0.01
    0.00972794 = product of:
      0.06809558 = sum of:
        0.06809558 = weight(_text_:libraries in 4424) [ClassicSimilarity], result of:
          0.06809558 = score(doc=4424,freq=2.0), product of:
            0.13401186 = queryWeight, product of:
              3.2850544 = idf(docFreq=4499, maxDocs=44218)
              0.04079441 = queryNorm
            0.50813097 = fieldWeight in 4424, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.2850544 = idf(docFreq=4499, maxDocs=44218)
              0.109375 = fieldNorm(doc=4424)
      0.14285715 = coord(1/7)
    
    Source
    Libraries and expert systems. Ed. C. MacDonald et al
  9. Wolfe, E.W.: A case study in automated metadata enhancement : Natural Language Processing in the humanities (2019) 0.01
    0.0087117 = product of:
      0.0609819 = sum of:
        0.0609819 = weight(_text_:case in 5236) [ClassicSimilarity], result of:
          0.0609819 = score(doc=5236,freq=2.0), product of:
            0.17934912 = queryWeight, product of:
              4.3964143 = idf(docFreq=1480, maxDocs=44218)
              0.04079441 = queryNorm
            0.34001783 = fieldWeight in 5236, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              4.3964143 = idf(docFreq=1480, maxDocs=44218)
              0.0546875 = fieldNorm(doc=5236)
      0.14285715 = coord(1/7)
    
  10. Pirkola, A.: Morphological typology of languages for IR (2001) 0.01
    0.008699203 = product of:
      0.06089442 = sum of:
        0.06089442 = weight(_text_:studies in 4476) [ClassicSimilarity], result of:
          0.06089442 = score(doc=4476,freq=4.0), product of:
            0.1627809 = queryWeight, product of:
              3.9902744 = idf(docFreq=2222, maxDocs=44218)
              0.04079441 = queryNorm
            0.37408823 = fieldWeight in 4476, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              3.9902744 = idf(docFreq=2222, maxDocs=44218)
              0.046875 = fieldNorm(doc=4476)
      0.14285715 = coord(1/7)
    
    Abstract
    This paper presents a morphological classification of languages from the IR perspective. Linguistic typology research has shown that the morphological complexity of every language in the world can be described by two variables, index of synthesis and index of fusion. These variables provide a theoretical basis for IR research handling morphological issues. A common theoretical framework is needed in particular because of the increasing significance of cross-language retrieval research and CLIR systems processing different languages. The paper elaborates the linguistic morphological typology for the purposes of IR research. It studies how the indexes of synthesis and fusion could be used as practical tools in mono- and cross-lingual IR research. The need for semantic and syntactic typologies is discussed. The paper also reviews studies made in different languages on the effects of morphology and stemming in IR.
  11. Samstag-Schnock, U.; Meadow, C.T.: PBS: an economical natural language query interpreter (1993) 0.01
    0.008201688 = product of:
      0.05741181 = sum of:
        0.05741181 = weight(_text_:studies in 5091) [ClassicSimilarity], result of:
          0.05741181 = score(doc=5091,freq=2.0), product of:
            0.1627809 = queryWeight, product of:
              3.9902744 = idf(docFreq=2222, maxDocs=44218)
              0.04079441 = queryNorm
            0.35269377 = fieldWeight in 5091, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.9902744 = idf(docFreq=2222, maxDocs=44218)
              0.0625 = fieldNorm(doc=5091)
      0.14285715 = coord(1/7)
    
    Abstract
    Reports on the design and implementation of the information searching and retrieval software, PBS (Parsing, Boolean recognition, Stemming) for the front end OAK 2, a new version of OAK developed at Toronto Univ. OAK 2 is a research tool for user behaviour studies. PBS receives natural language search statements from an end user and identifies search facets and implied Boolean logic operators
  12. Advances in intelligent retrieval: Proc. of a conference ... Wadham College, Oxford, 16.-17.4.1985 (1986) 0.01
    0.0074671716 = product of:
      0.0522702 = sum of:
        0.0522702 = weight(_text_:case in 1384) [ClassicSimilarity], result of:
          0.0522702 = score(doc=1384,freq=2.0), product of:
            0.17934912 = queryWeight, product of:
              4.3964143 = idf(docFreq=1480, maxDocs=44218)
              0.04079441 = queryNorm
            0.29144385 = fieldWeight in 1384, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              4.3964143 = idf(docFreq=1480, maxDocs=44218)
              0.046875 = fieldNorm(doc=1384)
      0.14285715 = coord(1/7)
    
    Content
     Contains the contributions: ADDIS, T.: Extended relational analysis: a design approach to knowledge-based systems; PARKINSON, D.: Supercomputers and non-numeric processing; McGREGOR, D.R. and J.R. MALONE: An architectural approach to advances in information retrieval; ALLEN, M.J. and O.S. HARRISON: Word processing and information retrieval: some practical problems; MURTAGH, F.: Clustering and nearest neighborhood searching; ENSER, P.G.B.: Experimenting with the automatic classification of books; TESKEY, N. and Z. RAZAK: An analysis of ranking for free text retrieval systems; ZARRI, G.P.: Interactive information retrieval: an artificial intelligence approach to deal with biographical data; HANCOX, P. and F. SMITH: A case system processor for the PRECIS indexing language; ROUAULT, J.: Linguistic methods in information retrieval systems; ARAGON-RAMIREZ, V. and C.D. PAICE: Design of a system for the online elucidation of natural language search statements; BROOKS, H.M., P.J. DANIELS and N.J. BELKIN: Problem descriptions and user models: developing an intelligent interface for document retrieval systems; BLACK, W.J., P. HARGREAVES and P.B. MAYES: HEADS: a cataloguing advisory system; BELL, D.A.: An architecture for integrating data, knowledge, and information bases
  13. Liu, G.Z.: Semantic vector space model : implementation and evaluation (1997) 0.01
    0.0074671716 = product of:
      0.0522702 = sum of:
        0.0522702 = weight(_text_:case in 161) [ClassicSimilarity], result of:
          0.0522702 = score(doc=161,freq=2.0), product of:
            0.17934912 = queryWeight, product of:
              4.3964143 = idf(docFreq=1480, maxDocs=44218)
              0.04079441 = queryNorm
            0.29144385 = fieldWeight in 161, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              4.3964143 = idf(docFreq=1480, maxDocs=44218)
              0.046875 = fieldNorm(doc=161)
      0.14285715 = coord(1/7)
    
    Abstract
     Presents the Semantic Vector Space Model (SVSM), a text representation and searching technique based on the combination of the Vector Space Model (VSM) with heuristic syntax parsing and distributed representation of semantic case structures. Both documents and queries are represented as semantic matrices. A search mechanism is designed to compute the similarity between 2 semantic matrices to predict relevancy. A prototype system was built to implement this model by modifying the SMART system and using the Xerox Part-of-Speech tagger as the pre-processor for indexing. The prototype system was used in an experimental study to evaluate this technique in terms of precision, recall, and effectiveness of relevance ranking. Results show that if documents and queries were too short, the technique was less effective than VSM. But with longer documents and queries, especially when original documents were used as queries, the system based on this technique was found to perform better than SMART.
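A toy illustration of the matrix-matching idea in this abstract: documents and queries are indexed by (semantic case role, term) pairs, and relevance is predicted by the cosine similarity between the two matrices. The role inventory and weights below are invented for illustration; the paper's parser and weighting scheme are not reproduced.

```python
import math

def cosine(m1, m2):
    """Cosine similarity of two sparse matrices stored as dicts
    keyed by (case role, term)."""
    keys = set(m1) | set(m2)
    dot = sum(m1.get(k, 0.0) * m2.get(k, 0.0) for k in keys)
    n1 = math.sqrt(sum(v * v for v in m1.values()))
    n2 = math.sqrt(sum(v * v for v in m2.values()))
    return dot / (n1 * n2) if n1 and n2 else 0.0

# Hypothetical semantic matrices: role/term cells with weights
doc = {("agent", "system"): 1.0,
       ("object", "document"): 2.0,
       ("action", "index"): 1.0}
query = {("object", "document"): 1.0,
         ("action", "index"): 1.0}

print(round(cosine(doc, query), 3))  # 0.866
```

Because cells are keyed by role as well as term, "document" as an object and "document" as an agent would not match, which is the distinction a plain VSM cannot make.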
  14. Flores, F.N.; Moreira, V.P.: Assessing the impact of stemming accuracy on information retrieval : a multilingual perspective (2016) 0.01
    0.0074671716 = product of:
      0.0522702 = sum of:
        0.0522702 = weight(_text_:case in 3187) [ClassicSimilarity], result of:
          0.0522702 = score(doc=3187,freq=2.0), product of:
            0.17934912 = queryWeight, product of:
              4.3964143 = idf(docFreq=1480, maxDocs=44218)
              0.04079441 = queryNorm
            0.29144385 = fieldWeight in 3187, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              4.3964143 = idf(docFreq=1480, maxDocs=44218)
              0.046875 = fieldNorm(doc=3187)
      0.14285715 = coord(1/7)
    
    Abstract
    The quality of stemming algorithms is typically measured in two different ways: (i) how accurately they map the variant forms of a word to the same stem; or (ii) how much improvement they bring to Information Retrieval systems. In this article, we evaluate various stemming algorithms, in four languages, in terms of accuracy and in terms of their aid to Information Retrieval. The aim is to assess whether the most accurate stemmers are also the ones that bring the biggest gain in Information Retrieval. Experiments in English, French, Portuguese, and Spanish show that this is not always the case, as stemmers with higher error rates yield better retrieval quality. As a byproduct, we also identified the most accurate stemmers and the best for Information Retrieval purposes.
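The accuracy/retrieval tension described above can be seen even with a toy suffix-stripping stemmer: aggressive stripping that "overstems" counts as an error against a gold mapping of variant forms, yet the extra conflation can still raise recall in retrieval. The rule list below is invented for illustration and is not one of the stemmers the article evaluates.

```python
# Suffixes tried longest-first; a stem must keep at least 3 characters.
SUFFIXES = ("ization", "ational", "ations", "ation",
            "ness", "ing", "ed", "es", "s")

def stem(word):
    """Strip the first matching suffix, if any."""
    for suf in SUFFIXES:
        if word.endswith(suf) and len(word) - len(suf) >= 3:
            return word[:-len(suf)]
    return word

# Correct conflation: variant forms map to one stem
print(stem("indexing"), stem("indexes"), stem("indexed"))  # index index index

# Overstemming error: unrelated words collapse to the same stem,
# hurting measured accuracy while still merging their postings
print(stem("organization"), stem("organs"))  # organ organ
```

The second pair is exactly the kind of "error" that lowers a stemmer's accuracy score while sometimes improving retrieval, which is the divergence the article measures across four languages.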
  15. Clavel, G.; Walther, F.; Walther, J.: Indexation automatique de fonds bibliotheconomiques (1993) 0.01
    0.007176476 = product of:
      0.05023533 = sum of:
        0.05023533 = weight(_text_:studies in 6610) [ClassicSimilarity], result of:
          0.05023533 = score(doc=6610,freq=2.0), product of:
            0.1627809 = queryWeight, product of:
              3.9902744 = idf(docFreq=2222, maxDocs=44218)
              0.04079441 = queryNorm
            0.30860704 = fieldWeight in 6610, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.9902744 = idf(docFreq=2222, maxDocs=44218)
              0.0546875 = fieldNorm(doc=6610)
      0.14285715 = coord(1/7)
    
    Abstract
    A discussion of developments to date in the field of computerized indexing, based on presentations given at a seminar held at the Institute of Policy Studies in Paris in Nov 91. The methods tested so far, based on a linguistic approach, whether using natural language or special thesauri, encounter the same central problem - they are only successful when applied to collections of similar types of documents covering very specific subject areas. Despite this, the search for some sort of universal indexing metalanguage continues. In the end, computerized indexing works best when used in conjunction with manual indexing - ideally in the hands of a trained library science professional, who can extract the maximum value from a collection of documents for a particular user population
  16. Krutulis, J.D.; Jacob, E.K.: ¬A theoretical model for the study of emergent structure in adaptive information networks (1995) 0.01
    0.007176476 = product of:
      0.05023533 = sum of:
        0.05023533 = weight(_text_:studies in 3353) [ClassicSimilarity], result of:
          0.05023533 = score(doc=3353,freq=2.0), product of:
            0.1627809 = queryWeight, product of:
              3.9902744 = idf(docFreq=2222, maxDocs=44218)
              0.04079441 = queryNorm
            0.30860704 = fieldWeight in 3353, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.9902744 = idf(docFreq=2222, maxDocs=44218)
              0.0546875 = fieldNorm(doc=3353)
      0.14285715 = coord(1/7)
    
    Imprint
    Alberta : Alberta University, School of Library and Information Studies
  17. Lepsky, K.; Müller, T.; Wille, J.: Metadata improvement for image information retrieval (2010) 0.01
    0.007176476 = product of:
      0.05023533 = sum of:
        0.05023533 = weight(_text_:studies in 4995) [ClassicSimilarity], result of:
          0.05023533 = score(doc=4995,freq=2.0), product of:
            0.1627809 = queryWeight, product of:
              3.9902744 = idf(docFreq=2222, maxDocs=44218)
              0.04079441 = queryNorm
            0.30860704 = fieldWeight in 4995, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.9902744 = idf(docFreq=2222, maxDocs=44218)
              0.0546875 = fieldNorm(doc=4995)
      0.14285715 = coord(1/7)
    
    Abstract
    This paper discusses the goals and results of the research project Perseus-a as an attempt to improve information retrieval of digital images by automatically connecting them with text-based descriptions. The development uses the image collection of prometheus, the distributed digital image archive for research and studies, the articles of the digitized Reallexikon zur Deutschen Kunstgeschichte, art historical terminological resources and classification data, and an open source system for linguistic and statistic automatic indexing called lingo.
  18. Voorhees, E.M.: Implementing agglomerative hierarchic clustering algorithms for use in document retrieval (1986) 0.01
    0.0063166628 = product of:
      0.044216637 = sum of:
        0.044216637 = product of:
          0.08843327 = sum of:
            0.08843327 = weight(_text_:22 in 402) [ClassicSimilarity], result of:
              0.08843327 = score(doc=402,freq=2.0), product of:
                0.14285508 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.04079441 = queryNorm
                0.61904186 = fieldWeight in 402, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.125 = fieldNorm(doc=402)
          0.5 = coord(1/2)
      0.14285715 = coord(1/7)
    
    Source
    Information processing and management. 22(1986) no.6, S.465-476
  19. Vilares, D.; Alonso, M.A.; Gómez-Rodríguez, C.: On the usefulness of lexical and syntactic processing in polarity classification of Twitter messages (2015) 0.01
    0.0062226425 = product of:
      0.043558497 = sum of:
        0.043558497 = weight(_text_:case in 2161) [ClassicSimilarity], result of:
          0.043558497 = score(doc=2161,freq=2.0), product of:
            0.17934912 = queryWeight, product of:
              4.3964143 = idf(docFreq=1480, maxDocs=44218)
              0.04079441 = queryNorm
            0.24286987 = fieldWeight in 2161, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              4.3964143 = idf(docFreq=1480, maxDocs=44218)
              0.0390625 = fieldNorm(doc=2161)
      0.14285715 = coord(1/7)
    
    Abstract
    Millions of micro texts are published every day on Twitter. Identifying the sentiment present in them can be helpful for measuring the frame of mind of the public, their satisfaction with respect to a product, or their support of a social event. In this context, polarity classification is a subfield of sentiment analysis focused on determining whether the content of a text is objective or subjective, and in the latter case, if it conveys a positive or a negative opinion. Most polarity detection techniques tend to take into account individual terms in the text and even some degree of linguistic knowledge, but they do not usually consider syntactic relations between words. This article explores how relating lexical, syntactic, and psychometric information can be helpful to perform polarity classification on Spanish tweets. We provide an evaluation for both shallow and deep linguistic perspectives. Empirical results show an improved performance of syntactic approaches over pure lexical models when using large training sets to create a classifier, but this tendency is reversed when small training collections are used.
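For contrast with the syntactic approaches the article advocates, a "pure lexical" polarity baseline can be sketched in a few lines: sum term polarities from a lexicon, with a crude window-of-one negation flip. The tiny lexicon and the negation rule are invented for illustration; real lexical baselines use far larger resources.

```python
# Hypothetical polarity lexicon and negator list (illustrative only)
POLARITY = {"good": 1, "great": 1, "love": 1,
            "bad": -1, "poor": -1, "hate": -1}
NEGATORS = {"not", "no", "never"}

def polarity(tokens):
    score = 0
    for i, tok in enumerate(tokens):
        value = POLARITY.get(tok, 0)
        # flip polarity if the immediately preceding token negates it
        if value and i > 0 and tokens[i - 1] in NEGATORS:
            value = -value
        score += value
    return "positive" if score > 0 else "negative" if score < 0 else "neutral"

print(polarity("this phone is not bad".split()))    # positive
print(polarity("i hate the poor battery".split()))  # negative
```

A window-of-one rule already shows the limitation the article targets: negation or modifiers outside the fixed window are missed, whereas a syntactic parse can attach them to the right word.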
  20. Strobel, S.; Marín-Arraiza, P.: Metadata for scientific audiovisual media : current practices and perspectives of the TIB / AV-portal (2015) 0.01
    0.0062226425 = product of:
      0.043558497 = sum of:
        0.043558497 = weight(_text_:case in 3667) [ClassicSimilarity], result of:
          0.043558497 = score(doc=3667,freq=2.0), product of:
            0.17934912 = queryWeight, product of:
              4.3964143 = idf(docFreq=1480, maxDocs=44218)
              0.04079441 = queryNorm
            0.24286987 = fieldWeight in 3667, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              4.3964143 = idf(docFreq=1480, maxDocs=44218)
              0.0390625 = fieldNorm(doc=3667)
      0.14285715 = coord(1/7)
    
    Abstract
    Descriptive metadata play a key role in finding relevant search results in large amounts of unstructured data. However, current scientific audiovisual media are provided with little metadata, which makes them hard to find, let alone individual sequences. In this paper, the TIB / AV-Portal is presented as a use case where methods concerning the automatic generation of metadata, a semantic search and cross-lingual retrieval (German/English) have already been applied. These methods result in a better discoverability of the scientific audiovisual media hosted in the portal. Text, speech, and image content of the video are automatically indexed by specialised GND (Gemeinsame Normdatei) subject headings. A semantic search is established based on properties of the GND ontology. The cross-lingual retrieval uses English 'translations' that were derived by an ontology mapping (DBpedia i. a.). Further ways of increasing the discoverability and reuse of the metadata are publishing them as Linked Open Data and interlinking them with other data sets.

Languages

  • e 54
  • d 16
  • f 1
  • ru 1
  • sp 1

Types

  • a 67
  • el 7
  • x 3
  • s 2
  • m 1