Search (55 results, page 1 of 3)

Taylor, S.L.: Integrating natural language understanding with document structure analysis (1994) 0.05

0.052838933 = product of:
  0.21135573 = sum of:
    0.10511641 = weight(_text_:supported in 1794) [ClassicSimilarity], result of:
      0.10511641 = score(doc=1794,freq=2.0), product of:
        0.22949564 = queryWeight, product of:
          5.9223356 = idf(docFreq=321, maxDocs=44218)
          0.03875087 = queryNorm
        0.45803228 = fieldWeight in 1794, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          5.9223356 = idf(docFreq=321, maxDocs=44218)
          0.0546875 = fieldNorm(doc=1794)
    0.10623932 = weight(_text_:cooperative in 1794) [ClassicSimilarity], result of:
      0.10623932 = score(doc=1794,freq=2.0), product of:
        0.23071818 = queryWeight, product of:
          5.953884 = idf(docFreq=311, maxDocs=44218)
          0.03875087 = queryNorm
        0.46047226 = fieldWeight in 1794, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          5.953884 = idf(docFreq=311, maxDocs=44218)
          0.0546875 = fieldNorm(doc=1794)
  0.25 = coord(2/8)
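
The arithmetic in the explain tree above can be reproduced by hand. The following is a minimal sketch, assuming the formulas that the printed numbers are consistent with: tf = sqrt(freq), idf = 1 + ln(maxDocs / (docFreq + 1)), queryWeight = idf * queryNorm, fieldWeight = tf * idf * fieldNorm, and a final multiplication by coord. The function and variable names are illustrative and not part of the search system; Lucene computes in 32-bit floats, so the last digits may differ slightly.

```python
import math

def idf(doc_freq, max_docs):
    # Inverse document frequency as it appears in the explain output
    # (assumed form: 1 + ln(maxDocs / (docFreq + 1))).
    return 1.0 + math.log(max_docs / (doc_freq + 1))

def term_score(freq, doc_freq, max_docs, query_norm, field_norm):
    # score = queryWeight * fieldWeight for a single query term.
    tf = math.sqrt(freq)
    term_idf = idf(doc_freq, max_docs)
    query_weight = term_idf * query_norm
    field_weight = tf * term_idf * field_norm
    return query_weight * field_weight

# Values copied from the explain tree for doc 1794.
QUERY_NORM = 0.03875087
MAX_DOCS = 44218
FIELD_NORM = 0.0546875

supported = term_score(2.0, 321, MAX_DOCS, QUERY_NORM, FIELD_NORM)
cooperative = term_score(2.0, 311, MAX_DOCS, QUERY_NORM, FIELD_NORM)

# coord(2/8): two of the eight query clauses matched this document.
total = (supported + cooperative) * (2 / 8)
print(supported, cooperative, total)
```

The printed values should agree with the tree (0.10511641, 0.10623932 and 0.052838933) to within float rounding.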

Abstract: Document understanding, the interpretation of a document from its image form, is a technology area which benefits greatly from the integration of natural language processing with image processing. Develops a prototype of an Intelligent Document Understanding System (IDUS) which employs several technologies: image processing, optical character recognition, document structure analysis and text understanding in a cooperative fashion. Discusses those areas of research during development of IDUS where it is found that the most benefit from the integration of natural language processing and image processing occurred: document structure analysis, OCR correction, and text analysis. Discusses 2 applications which are supported by IDUS: text retrieval and automatic generation of hypertext links

Smiraglia, R.P.; Cai, X.: Tracking the evolution of clustering, machine learning, automatic indexing and automatic classification in knowledge organization (2017) 0.03
```
0.028966932 = product of:
  0.11586773 = sum of:
    0.07508315 = weight(_text_:supported in 3627) [ClassicSimilarity], result of:
      0.07508315 = score(doc=3627,freq=2.0), product of:
        0.22949564 = queryWeight, product of:
          5.9223356 = idf(docFreq=321, maxDocs=44218)
          0.03875087 = queryNorm
        0.3271659 = fieldWeight in 3627, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          5.9223356 = idf(docFreq=321, maxDocs=44218)
          0.0390625 = fieldNorm(doc=3627)
    0.040784575 = weight(_text_:work in 3627) [ClassicSimilarity], result of:
      0.040784575 = score(doc=3627,freq=4.0), product of:
        0.14223081 = queryWeight, product of:
          3.6703904 = idf(docFreq=3060, maxDocs=44218)
          0.03875087 = queryNorm
        0.28674924 = fieldWeight in 3627, product of:
          2.0 = tf(freq=4.0), with freq of:
            4.0 = termFreq=4.0
          3.6703904 = idf(docFreq=3060, maxDocs=44218)
          0.0390625 = fieldNorm(doc=3627)
  0.25 = coord(2/8)
```
Abstract

A very important extension of the traditional domain of knowledge organization (KO) arises from attempts to incorporate techniques devised in the computer science domain for automatic concept extraction and for grouping, categorizing, clustering and otherwise organizing knowledge using mechanical means. Four specific terms have emerged to identify the most prevalent techniques: machine learning, clustering, automatic indexing, and automatic classification. Our study presents three domain analytical case analyses in search of answers. The first case relies on citations located using the ISKO-supported "Knowledge Organization Bibliography." The second case relies on works in both Web of Science and SCOPUS. Case three applies co-word analysis and citation analysis to the contents of the papers in the present special issue. We observe scholars involved in "clustering" and "automatic classification" who share common thematic emphases. But we have found no coherence, no common activity and no social semantics. We have not found a research front, or a common teleology within the KO domain. We also have found a lively group of authors who have succeeded in submitting papers to this special issue, and their work quite interestingly aligns with the case studies we report. There is an emphasis on KO for information retrieval; there is much work on clustering (which involves conceptual points within texts) and automatic classification (which involves semantic groupings at the meta-document level).

Silvester, J.P.: Computer supported indexing : a history and evaluation of NASA's MAI system (1998) 0.03

0.026279103 = product of:
  0.21023282 = sum of:
    0.21023282 = weight(_text_:supported in 1302) [ClassicSimilarity], result of:
      0.21023282 = score(doc=1302,freq=2.0), product of:
        0.22949564 = queryWeight, product of:
          5.9223356 = idf(docFreq=321, maxDocs=44218)
          0.03875087 = queryNorm
        0.91606456 = fieldWeight in 1302, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          5.9223356 = idf(docFreq=321, maxDocs=44218)
          0.109375 = fieldNorm(doc=1302)
  0.125 = coord(1/8)

Kanan, T.; Fox, E.A.: Automated arabic text classification with P-Stemmer, machine learning, and a tailored news article taxonomy (2016) 0.03
```
0.02598055 = product of:
  0.1039222 = sum of:
    0.07508315 = weight(_text_:supported in 3151) [ClassicSimilarity], result of:
      0.07508315 = score(doc=3151,freq=2.0), product of:
        0.22949564 = queryWeight, product of:
          5.9223356 = idf(docFreq=321, maxDocs=44218)
          0.03875087 = queryNorm
        0.3271659 = fieldWeight in 3151, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          5.9223356 = idf(docFreq=321, maxDocs=44218)
          0.0390625 = fieldNorm(doc=3151)
    0.028839052 = weight(_text_:work in 3151) [ClassicSimilarity], result of:
      0.028839052 = score(doc=3151,freq=2.0), product of:
        0.14223081 = queryWeight, product of:
          3.6703904 = idf(docFreq=3060, maxDocs=44218)
          0.03875087 = queryNorm
        0.20276234 = fieldWeight in 3151, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          3.6703904 = idf(docFreq=3060, maxDocs=44218)
          0.0390625 = fieldNorm(doc=3151)
  0.25 = coord(2/8)
```
Abstract

Arabic news articles in electronic collections are difficult to study. Browsing by category is rarely supported. Although helpful machine-learning methods have been applied successfully to similar situations for English news articles, limited research has been completed to yield suitable solutions for Arabic news. In connection with a Qatar National Research Fund (QNRF)-funded project to build digital library community and infrastructure in Qatar, we developed software for browsing a collection of about 237,000 Arabic news articles, which should be applicable to other Arabic news collections. We designed a simple taxonomy for Arabic news stories that is suitable for the needs of Qatar and other nations, is compatible with the subject codes of the International Press Telecommunications Council, and was enhanced with the aid of a librarian expert as well as five Arabic-speaking volunteers. We developed tailored stemming (i.e., a new Arabic light stemmer called P-Stemmer) and automatic classification methods (the best being binary Support Vector Machines classifiers) to work with the taxonomy. Using evaluation techniques commonly used in the information retrieval community, including 10-fold cross-validation and the Wilcoxon signed-rank test, we showed that our approach to stemming and classification is superior to state-of-the-art techniques.

Munkelt, J.: Erstellung einer DNB-Retrieval-Testkollektion (2018) 0.02

0.019800395 = product of:
  0.15840316 = sum of:
    0.15840316 = weight(_text_:hochschule in 4310) [ClassicSimilarity], result of:
      0.15840316 = score(doc=4310,freq=4.0), product of:
        0.23689921 = queryWeight, product of:
          6.113391 = idf(docFreq=265, maxDocs=44218)
          0.03875087 = queryNorm
        0.6686521 = fieldWeight in 4310, product of:
          2.0 = tf(freq=4.0), with freq of:
            4.0 = termFreq=4.0
          6.113391 = idf(docFreq=265, maxDocs=44218)
          0.0546875 = fieldNorm(doc=4310)
  0.125 = coord(1/8)

Content: Bachelor's thesis, Library Science, Fakultät für Informations- und Kommunikationswissenschaften, Technische Hochschule Köln
Imprint: Köln : Technische Hochschule, Fakultät für Informations- und Kommunikationswissenschaften

Krüger, C.: Evaluation des WWW-Suchdienstes GERHARD unter besonderer Beachtung automatischer Indexierung (1999) 0.01

0.01414314 = product of:
  0.11314512 = sum of:
    0.11314512 = weight(_text_:hochschule in 1777) [ClassicSimilarity], result of:
      0.11314512 = score(doc=1777,freq=4.0), product of:
        0.23689921 = queryWeight, product of:
          6.113391 = idf(docFreq=265, maxDocs=44218)
          0.03875087 = queryNorm
        0.47760868 = fieldWeight in 1777, product of:
          2.0 = tf(freq=4.0), with freq of:
            4.0 = termFreq=4.0
          6.113391 = idf(docFreq=265, maxDocs=44218)
          0.0390625 = fieldNorm(doc=1777)
  0.125 = coord(1/8)

Footnote: Diploma thesis in subject indexing (Inhaltliche Erschließung), Information Management degree programme, FH Stuttgart - Hochschule für Bibliotheks- und Informationswesen
Imprint: Stuttgart : FH - Hochschule für Bibliotheks- und Informationswesen

Pollmeier, M.: Verlagsschlagwörter als Grundlage für den Einsatz eines maschinellen Verfahrens zur verbalen Erschließung der Kinder- und Jugendliteratur durch die Deutsche Nationalbibliothek : eine Datenanalyse (2019) 0.01

0.01414314 = product of:
  0.11314512 = sum of:
    0.11314512 = weight(_text_:hochschule in 1081) [ClassicSimilarity], result of:
      0.11314512 = score(doc=1081,freq=4.0), product of:
        0.23689921 = queryWeight, product of:
          6.113391 = idf(docFreq=265, maxDocs=44218)
          0.03875087 = queryNorm
        0.47760868 = fieldWeight in 1081, product of:
          2.0 = tf(freq=4.0), with freq of:
            4.0 = termFreq=4.0
          6.113391 = idf(docFreq=265, maxDocs=44218)
          0.0390625 = fieldNorm(doc=1081)
  0.125 = coord(1/8)

Footnote: Bachelor's thesis at the Hochschule für Technik, Wirtschaft und Kultur Leipzig, Fakultät Informatik und Medien, degree programme in Library and Information Science.
Imprint: Leipzig : Hochschule für Technik, Wirtschaft und Kultur / Fakultät Informatik und Medien

Automatische Indexierung zwischen Forschung und Anwendung (1986) 0.01
```
0.014000993 = product of:
  0.112007946 = sum of:
    0.112007946 = weight(_text_:hochschule in 953) [ClassicSimilarity], result of:
      0.112007946 = score(doc=953,freq=2.0), product of:
        0.23689921 = queryWeight, product of:
          6.113391 = idf(docFreq=265, maxDocs=44218)
          0.03875087 = queryNorm
        0.47280845 = fieldWeight in 953, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          6.113391 = idf(docFreq=265, maxDocs=44218)
          0.0546875 = fieldNorm(doc=953)
  0.125 = coord(1/8)
```
Abstract

The automatic indexing of documents for information retrieval, i.e. the automatic characterization of document content by means of descriptors (subject headings), has been a field of theoretical and experimental research for more than 25 years. By contrast, the application of automatic indexing in input production for a large retrieval system did not begin until October 1985. It concerns the indexing of English abstracts for the physics database of the Informationszentrum Energie, Physik, Mathematik GmbH in Karlsruhe. In this book, staff of the Technische Hochschule Darmstadt describe the research and development work that led to this pilot application.

Mesquita, L.A.P.; Souza, R.R.; Baracho Porto, R.M.A.: Noun phrases in automatic indexing : a structural analysis of the distribution of relevant terms in doctoral theses (2014) 0.01
```
0.008392914 = product of:
  0.033571657 = sum of:
    0.02307124 = weight(_text_:work in 1442) [ClassicSimilarity], result of:
      0.02307124 = score(doc=1442,freq=2.0), product of:
        0.14223081 = queryWeight, product of:
          3.6703904 = idf(docFreq=3060, maxDocs=44218)
          0.03875087 = queryNorm
        0.16220987 = fieldWeight in 1442, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          3.6703904 = idf(docFreq=3060, maxDocs=44218)
          0.03125 = fieldNorm(doc=1442)
    0.010500416 = product of:
      0.021000832 = sum of:
        0.021000832 = weight(_text_:22 in 1442) [ClassicSimilarity], result of:
          0.021000832 = score(doc=1442,freq=2.0), product of:
            0.13569894 = queryWeight, product of:
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.03875087 = queryNorm
            0.15476047 = fieldWeight in 1442, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.03125 = fieldNorm(doc=1442)
      0.5 = coord(1/2)
  0.25 = coord(2/8)
```
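
The tree above contains a nested clause: the `_text_:22` match sits inside an inner boolean query with its own coord(1/2) before the outer coord(2/8) is applied. A minimal sketch of how the two levels combine, using the leaf scores exactly as printed; the helper name `combine` is hypothetical and only mirrors the "product of sum and coord" pattern visible in the output.

```python
def combine(clause_scores, matched, total):
    # Sum the scores of the matching clauses, then scale by coord(matched/total).
    return sum(clause_scores) * (matched / total)

# Leaf scores copied from the explain tree for doc 1442.
work_score = 0.02307124
term_22_score = 0.021000832

# Inner boolean query: one of its two clauses matched.
inner = combine([term_22_score], matched=1, total=2)

# Outer query: two of its eight clauses matched.
doc_score = combine([work_score, inner], matched=2, total=8)
print(inner, doc_score)
```

Under these assumptions the inner value corresponds to the 0.010500416 line and the outer value to the document score of 0.008392914.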
Abstract

The main objective of this research was to analyze whether relevant terms show a characteristic distribution over a scientific text that could serve as a criterion for automatic indexing. The terms considered in this study were only full noun phrases contained in the texts themselves. The texts were 98 doctoral theses from eight areas of knowledge at a single university. Initially, 20 full noun phrases were automatically extracted from each text as candidates for the most relevant terms, and the author of each text assigned a relevance value from 0 (not relevant) to 6 (highly relevant) to each of the 20 noun phrases sent. Only 22.1% of the noun phrases were considered not relevant. The relevance values assigned by the authors were associated with the positions of the terms in the text. Each full noun phrase found in the text was considered a valid linear position. The results showed the values of this distribution for two types of position: linear, with values consolidated into ten equal consecutive parts; and structural, considering parts of the text (such as introduction, development and conclusion). As a result of considerable importance, all areas of knowledge related to the Natural Sciences showed a characteristic behavior in the distribution of relevant terms, and all areas related to the Social Sciences showed a shared characteristic behavior of their own, distinct from that of the Natural Sciences. The difference in distribution behavior between the Natural and Social Sciences can be clearly visualized through graphs. All behaviors, including the general behavior of all areas of knowledge together, were characterized in polynomial equations and can be applied in future as criteria for automatic indexing.
To date this work is original for two reasons: it presents a method for characterizing the distribution of relevant terms in a scientific text, and, through this method, it points out a quantitative trait difference between the Natural and Social Sciences.

Source

Knowledge organization in the 21st century: between historical patterns and future prospects. Proceedings of the Thirteenth International ISKO Conference 19-22 May 2014, Kraków, Poland. Ed.: Wieslaw Babik

Buckley, C.; Allan, J.; Salton, G.: Automatic routing and retrieval using Smart : TREC-2 (1995) 0.01
```
0.0061176866 = product of:
  0.048941493 = sum of:
    0.048941493 = weight(_text_:work in 5699) [ClassicSimilarity], result of:
      0.048941493 = score(doc=5699,freq=4.0), product of:
        0.14223081 = queryWeight, product of:
          3.6703904 = idf(docFreq=3060, maxDocs=44218)
          0.03875087 = queryNorm
        0.3440991 = fieldWeight in 5699, product of:
          2.0 = tf(freq=4.0), with freq of:
            4.0 = termFreq=4.0
          3.6703904 = idf(docFreq=3060, maxDocs=44218)
          0.046875 = fieldNorm(doc=5699)
  0.125 = coord(1/8)
```
Abstract

The Smart information retrieval project emphasizes completely automatic approaches to the understanding and retrieval of large quantities of text. The work in the TREC-2 environment continues, performing both routing and ad hoc experiments. The ad hoc work extends investigations into combining global similarities, giving an overall indication of how a document matches a query, with local similarities identifying a smaller part of the document that matches the query. The performance of ad hoc runs is good, but it is clear that full advantage is not being taken of the available local information. The routing experiments use conventional relevance feedback approaches to routing, but with a much greater degree of query expansion than was previously done. The length of a query vector is increased by a factor of 5 to 10 by adding terms found in previously seen relevant documents. This approach improves effectiveness by 30-40% over the original query

Munkelt, J.; Schaer, P.; Lepsky, K.: Towards an IR test collection for the German National Library (2018) 0.01
```
0.0061176866 = product of:
  0.048941493 = sum of:
    0.048941493 = weight(_text_:work in 4311) [ClassicSimilarity], result of:
      0.048941493 = score(doc=4311,freq=4.0), product of:
        0.14223081 = queryWeight, product of:
          3.6703904 = idf(docFreq=3060, maxDocs=44218)
          0.03875087 = queryNorm
        0.3440991 = fieldWeight in 4311, product of:
          2.0 = tf(freq=4.0), with freq of:
            4.0 = termFreq=4.0
          3.6703904 = idf(docFreq=3060, maxDocs=44218)
          0.046875 = fieldNorm(doc=4311)
  0.125 = coord(1/8)
```
Abstract

Automatic content indexing is one of the innovations that are increasingly changing the way libraries work. In theory, it promises a cataloguing service that would hardly be possible with humans in terms of speed, quantity and maybe quality. The German National Library (DNB) has also recognised this potential and is increasingly relying on the automatic indexing of its catalogue content. The DNB took a major step in this direction in 2017, which was announced in two papers. The announcement was rather restrained, but the content of the papers is all the more explosive for the library community: Since September 2017, the DNB has discontinued the intellectual indexing of series B and H and has switched to an automatic process for these series. The subject indexing of online publications (series O) has been purely automatic since 2010; from September 2017, monographs and periodicals published outside the publishing industry and university publications will no longer be indexed by people. This raises the question: What is the quality of the automatic indexing compared to the manual work or, in other words, to what degree can automatic indexing replace people without a significant drop in quality?

Oliver, C.T.: One-eyed king: automated indexing (1989) 0.01

0.00576781 = product of:
  0.04614248 = sum of:
    0.04614248 = weight(_text_:work in 2316) [ClassicSimilarity], result of:
      0.04614248 = score(doc=2316,freq=2.0), product of:
        0.14223081 = queryWeight, product of:
          3.6703904 = idf(docFreq=3060, maxDocs=44218)
          0.03875087 = queryNorm
        0.32441974 = fieldWeight in 2316, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          3.6703904 = idf(docFreq=3060, maxDocs=44218)
          0.0625 = fieldNorm(doc=2316)
  0.125 = coord(1/8)

Abstract: In a work entitled 'Adagia' published in 1508, Erasmus collected ancient Greek and Roman proverbs. He included this proverb: "Among the blind, the one-eyed man is king". In a field where there is little interest in the theoretical research of related fields, and in understanding the theoretical assumptions on which practical activity is based, a one-eyed man, such as automatic or mechanical indexing, easily appears respectable and becomes widely practiced despite its obvious deficiencies

Li, Z.: Research on dynamic morphological indexing (1998) 0.01

0.00576781 = product of:
  0.04614248 = sum of:
    0.04614248 = weight(_text_:work in 3242) [ClassicSimilarity], result of:
      0.04614248 = score(doc=3242,freq=2.0), product of:
        0.14223081 = queryWeight, product of:
          3.6703904 = idf(docFreq=3060, maxDocs=44218)
          0.03875087 = queryNorm
        0.32441974 = fieldWeight in 3242, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          3.6703904 = idf(docFreq=3060, maxDocs=44218)
          0.0625 = fieldNorm(doc=3242)
  0.125 = coord(1/8)

Abstract: Notes that in automatic indexing of Chinese words using dictionary matching methods, there is some difficulty in the indexing of proper nouns. Presents a solution called dynamic morphological indexing, based on work using automatic indexing of archive documents. Presents the algorithm for this solution

Salton, G.: SMART System: 1961-1976 (2009) 0.01

0.00576781 = product of:
  0.04614248 = sum of:
    0.04614248 = weight(_text_:work in 3879) [ClassicSimilarity], result of:
      0.04614248 = score(doc=3879,freq=2.0), product of:
        0.14223081 = queryWeight, product of:
          3.6703904 = idf(docFreq=3060, maxDocs=44218)
          0.03875087 = queryNorm
        0.32441974 = fieldWeight in 3879, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          3.6703904 = idf(docFreq=3060, maxDocs=44218)
          0.0625 = fieldNorm(doc=3879)
  0.125 = coord(1/8)

Abstract: While a number of researchers had experimented during the 1950's on automatic indexing and retrieval in various forms, it was Gerard Salton who brought the information retrieval experimental paradigm to full fruition, with his "SMART" system. His work has been enormously influential.

Voorhees, E.M.: Implementing agglomerative hierarchic clustering algorithms for use in document retrieval (1986) 0.01

0.005250208 = product of:
  0.042001665 = sum of:
    0.042001665 = product of:
      0.08400333 = sum of:
        0.08400333 = weight(_text_:22 in 402) [ClassicSimilarity], result of:
          0.08400333 = score(doc=402,freq=2.0), product of:
            0.13569894 = queryWeight, product of:
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.03875087 = queryNorm
            0.61904186 = fieldWeight in 402, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.125 = fieldNorm(doc=402)
      0.5 = coord(1/2)
  0.125 = coord(1/8)

Source: Information processing and management. 22(1986) no.6, S.465-476

Moulaison-Sandy, H.; Adkins, D.; Bossaller, J.; Cho, H.: ¬An automated approach to describing fiction : a methodology to use book reviews to identify affect (2021) 0.01
```
0.005046834 = product of:
  0.04037467 = sum of:
    0.04037467 = weight(_text_:work in 710) [ClassicSimilarity], result of:
      0.04037467 = score(doc=710,freq=2.0), product of:
        0.14223081 = queryWeight, product of:
          3.6703904 = idf(docFreq=3060, maxDocs=44218)
          0.03875087 = queryNorm
        0.28386727 = fieldWeight in 710, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          3.6703904 = idf(docFreq=3060, maxDocs=44218)
          0.0546875 = fieldNorm(doc=710)
  0.125 = coord(1/8)
```
Abstract

Subject headings and genre terms are notoriously difficult to apply, yet are important for fiction. The current project functions as a proof of concept, using a text-mining methodology to identify affective information (emotion and tone) about fiction titles from professional book reviews as a potential first step in automating the subject analysis process. Findings are presented and discussed, comparing results to the range of aboutness and isness information in library cataloging records. The methodology is likewise presented, and how future work might expand on the current project to enhance catalog records through text-mining is explored.

Fuhr, N.; Niewelt, B.: ¬Ein Retrievaltest mit automatisch indexierten Dokumenten (1984) 0.00

0.004593932 = product of:
  0.036751457 = sum of:
    0.036751457 = product of:
      0.07350291 = sum of:
        0.07350291 = weight(_text_:22 in 262) [ClassicSimilarity], result of:
          0.07350291 = score(doc=262,freq=2.0), product of:
            0.13569894 = queryWeight, product of:
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.03875087 = queryNorm
            0.5416616 = fieldWeight in 262, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.109375 = fieldNorm(doc=262)
      0.5 = coord(1/2)
  0.125 = coord(1/8)

Date: 20.10.2000 12:22:23

Hlava, M.M.K.: Automatic indexing : comparing rule-based and statistics-based indexing systems (2005) 0.00

0.004593932 = product of:
  0.036751457 = sum of:
    0.036751457 = product of:
      0.07350291 = sum of:
        0.07350291 = weight(_text_:22 in 6265) [ClassicSimilarity], result of:
          0.07350291 = score(doc=6265,freq=2.0), product of:
            0.13569894 = queryWeight, product of:
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.03875087 = queryNorm
            0.5416616 = fieldWeight in 6265, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.109375 = fieldNorm(doc=6265)
      0.5 = coord(1/2)
  0.125 = coord(1/8)

Source: Information outlook. 9(2005) no.8, S.22-23

Mansour, N.; Haraty, R.A.; Daher, W.; Houri, M.: ¬An auto-indexing method for Arabic text (2008) 0.00
```
0.004325858 = product of:
  0.034606863 = sum of:
    0.034606863 = weight(_text_:work in 2103) [ClassicSimilarity], result of:
      0.034606863 = score(doc=2103,freq=2.0), product of:
        0.14223081 = queryWeight, product of:
          3.6703904 = idf(docFreq=3060, maxDocs=44218)
          0.03875087 = queryNorm
        0.2433148 = fieldWeight in 2103, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          3.6703904 = idf(docFreq=3060, maxDocs=44218)
          0.046875 = fieldNorm(doc=2103)
  0.125 = coord(1/8)
```
Abstract

This work addresses the information retrieval problem of auto-indexing Arabic documents. Auto-indexing a text document refers to automatically extracting words that are suitable for building an index for the document. In this paper, we propose an auto-indexing method for Arabic text documents. This method is mainly based on morphological analysis and on a technique for assigning weights to words. The morphological analysis uses a number of grammatical rules to extract stem words that become candidate index words. The weight assignment technique computes weights for these words relative to the container document. The weight is based on how widely the word is spread within a document and not only on its rate of occurrence. The candidate index words are then sorted in descending order by weight so that information retrievers can select the more important index words. We empirically verify the usefulness of our method using several examples. For these examples, we obtained an average recall of 46% and an average precision of 64%.

Zhitomirsky-Geffet, M.; Prebor, G.; Bloch, O.: Improving proverb search and retrieval with a generic multidimensional ontology (2017) 0.00
```
0.004325858 = product of:
  0.034606863 = sum of:
    0.034606863 = weight(_text_:work in 3320) [ClassicSimilarity], result of:
      0.034606863 = score(doc=3320,freq=2.0), product of:
        0.14223081 = queryWeight, product of:
          3.6703904 = idf(docFreq=3060, maxDocs=44218)
          0.03875087 = queryNorm
        0.2433148 = fieldWeight in 3320, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          3.6703904 = idf(docFreq=3060, maxDocs=44218)
          0.046875 = fieldNorm(doc=3320)
  0.125 = coord(1/8)
```
Abstract

The goal of this research is to develop a generic ontological model for proverbs that unifies potential classification criteria and various characteristics of proverbs to enable their effective retrieval and large-scale analysis. Because proverbs can be described and indexed by multiple characteristics and criteria, we built a multidimensional ontology suitable for proverb classification. To evaluate the effectiveness of the constructed ontology for improving search and retrieval of proverbs, a large-scale user experiment was arranged with 70 users who were asked to search a proverb repository using ontology-based and free-text search interfaces. The comparative analysis of the results shows that the use of this ontology helped to substantially improve the search recall, precision, user satisfaction, and efficiency and to minimize user effort during the search process. A practical contribution of this work is an automated web-based proverb search and retrieval system which incorporates the proposed ontological scheme and an initial corpus of ontology-based annotated proverbs.
