Search (61 results, page 1 of 4)

  • year_i:[2010 TO 2020}
  • theme_ss:"Automatisches Indexieren"
  1. Lepsky, K.; Müller, T.; Wille, J.: Metadata improvement for image information retrieval (2010) 0.01
    0.011066877 = product of:
      0.051645428 = sum of:
        0.018822279 = weight(_text_:system in 4995) [ClassicSimilarity], result of:
          0.018822279 = score(doc=4995,freq=2.0), product of:
            0.07727166 = queryWeight, product of:
              3.1495528 = idf(docFreq=5152, maxDocs=44218)
              0.02453417 = queryNorm
            0.2435858 = fieldWeight in 4995, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.1495528 = idf(docFreq=5152, maxDocs=44218)
              0.0546875 = fieldNorm(doc=4995)
        0.008269517 = weight(_text_:information in 4995) [ClassicSimilarity], result of:
          0.008269517 = score(doc=4995,freq=4.0), product of:
            0.04306919 = queryWeight, product of:
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.02453417 = queryNorm
            0.1920054 = fieldWeight in 4995, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.0546875 = fieldNorm(doc=4995)
        0.024553634 = weight(_text_:retrieval in 4995) [ClassicSimilarity], result of:
          0.024553634 = score(doc=4995,freq=4.0), product of:
            0.07421378 = queryWeight, product of:
              3.024915 = idf(docFreq=5836, maxDocs=44218)
              0.02453417 = queryNorm
            0.33085006 = fieldWeight in 4995, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              3.024915 = idf(docFreq=5836, maxDocs=44218)
              0.0546875 = fieldNorm(doc=4995)
      0.21428572 = coord(3/14)
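    The explain tree above is Lucene's ClassicSimilarity (TF-IDF) scoring. As a quick sanity check, the following Python sketch recomputes the first clause, weight(_text_:system in 4995), from the statistics shown: tf is the square root of the term frequency, idf is ln(maxDocs/(docFreq+1)) + 1, queryWeight = idf x queryNorm, fieldWeight = tf x idf x fieldNorm, and the clause score is queryWeight x fieldWeight; the document score is the sum of the clause scores times the coordination factor coord(3/14). All constants are copied from the listing; this is an illustrative reconstruction, not the code the search portal runs.

      import math

      # Statistics copied from the explain output for doc 4995, term "system".
      freq       = 2.0         # termFreq
      doc_freq   = 5152        # docFreq
      max_docs   = 44218       # maxDocs
      query_norm = 0.02453417  # queryNorm
      field_norm = 0.0546875   # fieldNorm(doc=4995)

      tf  = math.sqrt(freq)                          # ~1.4142135
      idf = math.log(max_docs / (doc_freq + 1)) + 1  # ~3.1495528

      query_weight = idf * query_norm                # ~0.07727166
      field_weight = tf * idf * field_norm           # ~0.2435858
      clause_score = query_weight * field_weight     # ~0.018822279

      # Document score: the three clause scores summed, scaled by coord(3/14).
      clause_scores = [clause_score, 0.008269517, 0.024553634]
      doc_score = sum(clause_scores) * (3 / 14)      # ~0.011066877

      print(round(clause_score, 9), round(doc_score, 9))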
    
    Abstract
    This paper discusses the goals and results of the research project Perseus-a, an attempt to improve the information retrieval of digital images by automatically connecting them with text-based descriptions. The development uses the image collection of prometheus, the distributed digital image archive for research and studies; the articles of the digitized Reallexikon zur Deutschen Kunstgeschichte; art-historical terminological resources and classification data; and an open-source system for linguistic and statistical automatic indexing called lingo.
  2. Zhitomirsky-Geffet, M.; Prebor, G.; Bloch, O.: Improving proverb search and retrieval with a generic multidimensional ontology (2017) 0.01
    0.010909065 = product of:
      0.050908968 = sum of:
        0.016133383 = weight(_text_:system in 3320) [ClassicSimilarity], result of:
          0.016133383 = score(doc=3320,freq=2.0), product of:
            0.07727166 = queryWeight, product of:
              3.1495528 = idf(docFreq=5152, maxDocs=44218)
              0.02453417 = queryNorm
            0.20878783 = fieldWeight in 3320, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.1495528 = idf(docFreq=5152, maxDocs=44218)
              0.046875 = fieldNorm(doc=3320)
        0.0050120843 = weight(_text_:information in 3320) [ClassicSimilarity], result of:
          0.0050120843 = score(doc=3320,freq=2.0), product of:
            0.04306919 = queryWeight, product of:
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.02453417 = queryNorm
            0.116372846 = fieldWeight in 3320, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.046875 = fieldNorm(doc=3320)
        0.029763501 = weight(_text_:retrieval in 3320) [ClassicSimilarity], result of:
          0.029763501 = score(doc=3320,freq=8.0), product of:
            0.07421378 = queryWeight, product of:
              3.024915 = idf(docFreq=5836, maxDocs=44218)
              0.02453417 = queryNorm
            0.40105087 = fieldWeight in 3320, product of:
              2.828427 = tf(freq=8.0), with freq of:
                8.0 = termFreq=8.0
              3.024915 = idf(docFreq=5836, maxDocs=44218)
              0.046875 = fieldNorm(doc=3320)
      0.21428572 = coord(3/14)
    
    Abstract
    The goal of this research is to develop a generic ontological model for proverbs that unifies potential classification criteria and various characteristics of proverbs to enable their effective retrieval and large-scale analysis. Because proverbs can be described and indexed by multiple characteristics and criteria, we built a multidimensional ontology suitable for proverb classification. To evaluate the effectiveness of the constructed ontology for improving search and retrieval of proverbs, a large-scale user experiment was arranged with 70 users who were asked to search a proverb repository using ontology-based and free-text search interfaces. The comparative analysis of the results shows that the use of this ontology helped to substantially improve the search recall, precision, user satisfaction, and efficiency and to minimize user effort during the search process. A practical contribution of this work is an automated web-based proverb search and retrieval system which incorporates the proposed ontological scheme and an initial corpus of ontology-based annotated proverbs.
    Source
    Journal of the Association for Information Science and Technology. 68(2017) no.1, S.141-153
  3. Martins, A.L.; Souza, R.R.; Ribeiro de Mello, H.: The use of noun phrases in information retrieval : proposing a mechanism for automatic classification (2014) 0.01
    0.010034669 = product of:
      0.03512134 = sum of:
        0.015210699 = weight(_text_:system in 1441) [ClassicSimilarity], result of:
          0.015210699 = score(doc=1441,freq=4.0), product of:
            0.07727166 = queryWeight, product of:
              3.1495528 = idf(docFreq=5152, maxDocs=44218)
              0.02453417 = queryNorm
            0.19684705 = fieldWeight in 1441, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              3.1495528 = idf(docFreq=5152, maxDocs=44218)
              0.03125 = fieldNorm(doc=1441)
        0.0033413896 = weight(_text_:information in 1441) [ClassicSimilarity], result of:
          0.0033413896 = score(doc=1441,freq=2.0), product of:
            0.04306919 = queryWeight, product of:
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.02453417 = queryNorm
            0.0775819 = fieldWeight in 1441, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.03125 = fieldNorm(doc=1441)
        0.009921167 = weight(_text_:retrieval in 1441) [ClassicSimilarity], result of:
          0.009921167 = score(doc=1441,freq=2.0), product of:
            0.07421378 = queryWeight, product of:
              3.024915 = idf(docFreq=5836, maxDocs=44218)
              0.02453417 = queryNorm
            0.13368362 = fieldWeight in 1441, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.024915 = idf(docFreq=5836, maxDocs=44218)
              0.03125 = fieldNorm(doc=1441)
        0.0066480828 = product of:
          0.0132961655 = sum of:
            0.0132961655 = weight(_text_:22 in 1441) [ClassicSimilarity], result of:
              0.0132961655 = score(doc=1441,freq=2.0), product of:
                0.085914485 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.02453417 = queryNorm
                0.15476047 = fieldWeight in 1441, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.03125 = fieldNorm(doc=1441)
          0.5 = coord(1/2)
      0.2857143 = coord(4/14)
    
    Abstract
    This paper presents research on syntactic structures known as noun phrases (NPs) applied to increase the effectiveness and efficiency of document classification mechanisms. Our hypothesis is that NPs can be used instead of single words as semantic aggregators, reducing the number of words fed to the classification system without losing semantic coverage and thereby increasing its efficiency. The experiment divided the document classification process into three phases: a) NP preprocessing; b) system training; and c) classification experiments. In the first phase, a corpus of digitized texts was submitted to a natural language processing platform for part-of-speech tagging, and then Perl scripts from the PALAVRAS package were used to extract the noun phrases. The preprocessing also involved a) removing low-meaning NP pre-modifiers such as quantifiers; b) identifying synonyms and substituting common hypernyms for them; and c) stemming the relevant words contained in the NPs, for similarity checking against other NPs. The first tests on the resulting documents demonstrated the effectiveness of this step: we compared the structural similarity of the documents before and after the preprocessing of phase one, and the texts remained consistent with the originals and kept their readability. The second phase involves submitting the modified documents to an SVM algorithm to identify clusters and classify the documents, with the classification rules established using a machine learning approach. Finally, tests will be conducted to check the effectiveness of the whole process.
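    A minimal sketch of the phase-one preprocessing described above, in Python rather than the Perl/PALAVRAS toolchain the authors used: it chunks noun phrases from an already POS-tagged sentence, drops quantifier pre-modifiers, substitutes a common hypernym for known synonyms, and stems the remaining words. The tag set, the tiny synonym table and the crude suffix stemmer are illustrative assumptions, not parts of the original experiment.

      # Illustrative stand-in for the NP preprocessing phase (not the PALAVRAS pipeline).
      TAGGED = [("three", "NUM"), ("large", "ADJ"), ("digital", "ADJ"),
                ("libraries", "NOUN"), ("use", "VERB"), ("automatic", "ADJ"),
                ("categorisation", "NOUN")]

      QUANTIFIERS = {"three", "several", "many", "some"}   # low-meaning pre-modifiers
      HYPERNYMS = {"categorisation": "classification"}     # synonym -> common hypernym

      def stem(word):
          # Very crude suffix stripping, used only for similarity checks between NPs.
          for suffix in ("ies", "ing", "s"):
              if word.endswith(suffix) and len(word) > len(suffix) + 2:
                  return word[:-len(suffix)] + ("y" if suffix == "ies" else "")
          return word

      def noun_phrases(tagged):
          # Collect maximal NUM/ADJ/NOUN runs that contain at least one NOUN.
          runs, current = [], []
          for word, tag in tagged + [("", "END")]:
              if tag in ("NUM", "ADJ", "NOUN"):
                  current.append((word, tag))
              else:
                  if any(t == "NOUN" for _, t in current):
                      runs.append([w for w, _ in current])
                  current = []
          return runs

      def preprocess(np):
          words = [w for w in np if w not in QUANTIFIERS]   # a) drop quantifiers
          words = [HYPERNYMS.get(w, w) for w in words]      # b) hypernym substitution
          return [stem(w) for w in words]                   # c) stemming

      for np in noun_phrases(TAGGED):
          print(np, "->", preprocess(np))
      # ['three', 'large', 'digital', 'libraries'] -> ['large', 'digital', 'library']
      # ['automatic', 'categorisation'] -> ['automatic', 'classification']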
    Source
    Knowledge organization in the 21st century: between historical patterns and future prospects. Proceedings of the Thirteenth International ISKO Conference 19-22 May 2014, Kraków, Poland. Ed.: Wieslaw Babik
  4. Golub, K.; Lykke, M.; Tudhope, D.: Enhancing social tagging with automated keywords from the Dewey Decimal Classification (2014) 0.01
    0.008378824 = product of:
      0.03910118 = sum of:
        0.013444485 = weight(_text_:system in 2918) [ClassicSimilarity], result of:
          0.013444485 = score(doc=2918,freq=2.0), product of:
            0.07727166 = queryWeight, product of:
              3.1495528 = idf(docFreq=5152, maxDocs=44218)
              0.02453417 = queryNorm
            0.17398985 = fieldWeight in 2918, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.1495528 = idf(docFreq=5152, maxDocs=44218)
              0.0390625 = fieldNorm(doc=2918)
        0.004176737 = weight(_text_:information in 2918) [ClassicSimilarity], result of:
          0.004176737 = score(doc=2918,freq=2.0), product of:
            0.04306919 = queryWeight, product of:
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.02453417 = queryNorm
            0.09697737 = fieldWeight in 2918, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.0390625 = fieldNorm(doc=2918)
        0.021479957 = weight(_text_:retrieval in 2918) [ClassicSimilarity], result of:
          0.021479957 = score(doc=2918,freq=6.0), product of:
            0.07421378 = queryWeight, product of:
              3.024915 = idf(docFreq=5836, maxDocs=44218)
              0.02453417 = queryNorm
            0.28943354 = fieldWeight in 2918, product of:
              2.4494898 = tf(freq=6.0), with freq of:
                6.0 = termFreq=6.0
              3.024915 = idf(docFreq=5836, maxDocs=44218)
              0.0390625 = fieldNorm(doc=2918)
      0.21428572 = coord(3/14)
    
    Abstract
    Purpose - The purpose of this paper is to explore the potential of applying the Dewey Decimal Classification (DDC) as an established knowledge organization system (KOS) for enhancing social tagging, with the ultimate purpose of improving subject indexing and information retrieval. Design/methodology/approach - Over 11,000 Intute metadata records in politics were used. In total, 28 politics students were each given four tasks, in which a total of 60 resources were tagged in two different configurations, one with uncontrolled social tags only and another with uncontrolled social tags as well as suggestions from a controlled vocabulary. The controlled vocabulary was the DDC, which also comprised mappings from the Library of Congress Subject Headings. Findings - The results demonstrate the importance of controlled vocabulary suggestions for indexing and retrieval: they help produce ideas of which tags to use, make it easier to find a focus for the tagging, ensure consistency and increase the number of access points in retrieval. The value and usefulness of the suggestions proved to depend on their quality, both as to conceptual relevance to the user and as to appropriateness of the terminology. Originality/value - No research has investigated the enhancement of social tagging with suggestions from the DDC, an established KOS, in a user trial comparing social tagging only with social tagging enhanced by the suggestions. This paper is a final reflection on all aspects of the study.
  5. Flores, F.N.; Moreira, V.P.: Assessing the impact of stemming accuracy on information retrieval : a multilingual perspective (2016) 0.01
    0.0069613922 = product of:
      0.048729744 = sum of:
        0.012277049 = weight(_text_:information in 3187) [ClassicSimilarity], result of:
          0.012277049 = score(doc=3187,freq=12.0), product of:
            0.04306919 = queryWeight, product of:
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.02453417 = queryNorm
            0.2850541 = fieldWeight in 3187, product of:
              3.4641016 = tf(freq=12.0), with freq of:
                12.0 = termFreq=12.0
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.046875 = fieldNorm(doc=3187)
        0.036452696 = weight(_text_:retrieval in 3187) [ClassicSimilarity], result of:
          0.036452696 = score(doc=3187,freq=12.0), product of:
            0.07421378 = queryWeight, product of:
              3.024915 = idf(docFreq=5836, maxDocs=44218)
              0.02453417 = queryNorm
            0.49118498 = fieldWeight in 3187, product of:
              3.4641016 = tf(freq=12.0), with freq of:
                12.0 = termFreq=12.0
              3.024915 = idf(docFreq=5836, maxDocs=44218)
              0.046875 = fieldNorm(doc=3187)
      0.14285715 = coord(2/14)
    
    Abstract
    The quality of stemming algorithms is typically measured in two different ways: (i) how accurately they map the variant forms of a word to the same stem; or (ii) how much improvement they bring to Information Retrieval systems. In this article, we evaluate various stemming algorithms, in four languages, in terms of accuracy and in terms of their aid to Information Retrieval. The aim is to assess whether the most accurate stemmers are also the ones that bring the biggest gain in Information Retrieval. Experiments in English, French, Portuguese, and Spanish show that this is not always the case, as stemmers with higher error rates yield better retrieval quality. As a byproduct, we also identified the most accurate stemmers and the best for Information Retrieval purposes.
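    The first of the two measures mentioned above - how reliably a stemmer maps the variant forms of a word to one stem - can be checked against small gold groups of variants, as in the hypothetical sketch below. The toy stemmers and word groups are assumptions for illustration and have nothing to do with the stemmers or test collections of the study.

      # Toy accuracy check for two stemmers on gold groups of word variants.
      GOLD_GROUPS = [
          {"connect", "connected", "connecting", "connection"},
          {"retrieve", "retrieval", "retrieves"},
          {"index", "indexes", "indexing"},
      ]

      def light_stemmer(w):       # strips a few plural/gerund endings
          for s in ("ing", "es", "s"):
              if w.endswith(s) and len(w) > len(s) + 2:
                  return w[:-len(s)]
          return w

      def aggressive_stemmer(w):  # additionally conflates -ion/-al/-ed/-e forms
          for s in ("ion", "al", "ing", "es", "s", "ed", "e"):
              if w.endswith(s) and len(w) > len(s) + 2:
                  return w[:-len(s)]
          return w

      def accuracy(stemmer):
          # A group counts as correct if all of its variants share a single stem.
          correct = sum(len({stemmer(w) for w in group}) == 1 for group in GOLD_GROUPS)
          return correct / len(GOLD_GROUPS)

      print("light:", accuracy(light_stemmer))             # 0.33...
      print("aggressive:", accuracy(aggressive_stemmer))   # 1.0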
    Source
    Information processing and management. 52(2016) no.5, S.840-854
  6. Greiner-Petter, A.; Schubotz, M.; Cohl, H.S.; Gipp, B.: Semantic preserving bijective mappings for expressions involving special functions between computer algebra systems and document preparation systems (2019) 0.01
    0.0056966185 = product of:
      0.02658422 = sum of:
        0.015210699 = weight(_text_:system in 5499) [ClassicSimilarity], result of:
          0.015210699 = score(doc=5499,freq=4.0), product of:
            0.07727166 = queryWeight, product of:
              3.1495528 = idf(docFreq=5152, maxDocs=44218)
              0.02453417 = queryNorm
            0.19684705 = fieldWeight in 5499, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              3.1495528 = idf(docFreq=5152, maxDocs=44218)
              0.03125 = fieldNorm(doc=5499)
        0.0047254385 = weight(_text_:information in 5499) [ClassicSimilarity], result of:
          0.0047254385 = score(doc=5499,freq=4.0), product of:
            0.04306919 = queryWeight, product of:
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.02453417 = queryNorm
            0.10971737 = fieldWeight in 5499, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.03125 = fieldNorm(doc=5499)
        0.0066480828 = product of:
          0.0132961655 = sum of:
            0.0132961655 = weight(_text_:22 in 5499) [ClassicSimilarity], result of:
              0.0132961655 = score(doc=5499,freq=2.0), product of:
                0.085914485 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.02453417 = queryNorm
                0.15476047 = fieldWeight in 5499, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.03125 = fieldNorm(doc=5499)
          0.5 = coord(1/2)
      0.21428572 = coord(3/14)
    
    Abstract
    Purpose - Modern mathematicians and scientists of math-related disciplines often use Document Preparation Systems (DPS) to write and Computer Algebra Systems (CAS) to calculate mathematical expressions. Usually, they translate the expressions manually between DPS and CAS. This process is time-consuming and error-prone. The purpose of this paper is to automate this translation. This paper uses Maple and Mathematica as the CAS, and LaTeX as the DPS. Design/methodology/approach - Bruce Miller at the National Institute of Standards and Technology (NIST) developed a collection of special LaTeX macros that create links from mathematical symbols to their definitions in the NIST Digital Library of Mathematical Functions (DLMF). The authors are using these macros to perform rule-based translations between the formulae in the DLMF and CAS. Moreover, the authors develop software to ease the creation of new rules and to discover inconsistencies. Findings - The authors created 396 mappings and translated 58.8 percent of DLMF formulae (2,405 expressions) successfully between Maple and DLMF. For a significant percentage, the special function definitions in Maple and the DLMF were different. An atomic symbol in one system maps to a composite expression in the other system. The translator was also successfully used for automatic verification of mathematical online compendia and CAS. The evaluation techniques discovered two errors in the DLMF and one defect in Maple. Originality/value - This paper introduces the first translation tool for special functions between LaTeX and CAS. The approach improves error-prone manual translations and can be used to verify mathematical online compendia and CAS.
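    A toy illustration of the rule-based translation idea described above, assuming DLMF-style semantic LaTeX macros on the input side: each macro is mapped to a Maple function name by a small rule table and rewritten with a regular expression. The two rules and the argument convention are invented for the example; the actual project defines 396 mappings and handles far more of the syntax.

      import re

      # Hypothetical rule table: semantic LaTeX macro -> Maple function name.
      RULES = {
          r"\BesselJ": "BesselJ",    # \BesselJ{\nu}{z} -> BesselJ(nu, z)
          r"\EulerGamma": "GAMMA",   # \EulerGamma{z}   -> GAMMA(z)
      }

      MACRO = re.compile(r"\\(\w+)((?:\{[^{}]*\})+)")  # macro name + brace-delimited args

      def to_maple(latex):
          def replace(match):
              name, raw_args = "\\" + match.group(1), match.group(2)
              if name not in RULES:
                  return match.group(0)                # leave unknown macros untouched
              args = re.findall(r"\{([^{}]*)\}", raw_args)
              args = [a.lstrip("\\") for a in args]    # crude cleanup of \nu etc.
              return f"{RULES[name]}({', '.join(args)})"
          return MACRO.sub(replace, latex)

      print(to_maple(r"\BesselJ{\nu}{z} + \EulerGamma{z}"))
      # BesselJ(nu, z) + GAMMA(z)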
    Date
    20. 1.2015 18:30:22
    Footnote
    Contribution in a special issue: Information Science in the German-speaking Countries.
    Source
    Aslib journal of information management. 71(2019) no.3, S.415-439
  7. Gil-Leiva, I.: SISA-automatic indexing system for scientific articles : experiments with location heuristics rules versus TF-IDF rules (2017) 0.01
    0.0053854003 = product of:
      0.0376978 = sum of:
        0.022816047 = weight(_text_:system in 3622) [ClassicSimilarity], result of:
          0.022816047 = score(doc=3622,freq=4.0), product of:
            0.07727166 = queryWeight, product of:
              3.1495528 = idf(docFreq=5152, maxDocs=44218)
              0.02453417 = queryNorm
            0.29527056 = fieldWeight in 3622, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              3.1495528 = idf(docFreq=5152, maxDocs=44218)
              0.046875 = fieldNorm(doc=3622)
        0.014881751 = weight(_text_:retrieval in 3622) [ClassicSimilarity], result of:
          0.014881751 = score(doc=3622,freq=2.0), product of:
            0.07421378 = queryWeight, product of:
              3.024915 = idf(docFreq=5836, maxDocs=44218)
              0.02453417 = queryNorm
            0.20052543 = fieldWeight in 3622, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.024915 = idf(docFreq=5836, maxDocs=44218)
              0.046875 = fieldNorm(doc=3622)
      0.14285715 = coord(2/14)
    
    Abstract
    Indexing is contextualized and a brief description is provided of some of the most widely used automatic indexing systems. We describe SISA, a system which uses location heuristic rules and statistical rules such as term frequency (TF) or TF-IDF to obtain automatic or semi-automatic indexing, depending on the user's preference. The aim of this research is to ascertain which rules (location heuristic rules or TF-IDF rules) provide the best indexing terms. SISA is used to obtain the automatic indexing of 200 scientific articles on fruit growing written in Portuguese. It uses, on the one hand, location heuristic rules founded on the value for indexing of certain parts of the articles, such as titles, abstracts, keywords, headings, first paragraphs, conclusions and references and, on the other, TF-IDF rules. The indexing is then evaluated to ascertain retrieval performance through recall, precision and F-measure. Automatic indexing of the articles with location heuristic rules provided the best results on the evaluation measures.
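    A compact sketch of the two kinds of rules being compared, under invented data and a deliberately simplified scoring scheme: a TF-IDF rule scores candidate terms over a tiny corpus, while a location heuristic boosts terms that occur in privileged parts of the article (here only the title). Nothing below reproduces SISA itself.

      import math
      from collections import Counter

      DOCS = {
          "d1": {"title": "pruning apple trees",
                 "body": "pruning improves apple yield pruning is done in winter"},
          "d2": {"title": "citrus irrigation",
                 "body": "irrigation schedules for citrus orchards and apple orchards"},
      }

      def tokens(text):
          return text.lower().split()

      def tfidf_terms(doc_id, k=3):
          # TF-IDF rule: rank body terms by term frequency x inverse document frequency.
          tf = Counter(tokens(DOCS[doc_id]["body"]))
          n = len(DOCS)
          def idf(t):
              df = sum(t in tokens(d["body"]) for d in DOCS.values())
              return math.log((n + 1) / (df + 1)) + 1
          scored = {t: f * idf(t) for t, f in tf.items()}
          return sorted(scored, key=scored.get, reverse=True)[:k]

      def location_terms(doc_id, k=3):
          # Location heuristic: terms from the title receive an extra weight of 3.
          weights = Counter(tokens(DOCS[doc_id]["body"]))
          for t in tokens(DOCS[doc_id]["title"]):
              weights[t] += 3
          return sorted(weights, key=weights.get, reverse=True)[:k]

      print("TF-IDF rule:  ", tfidf_terms("d1"))     # ['pruning', 'improves', 'yield']
      print("Location rule:", location_terms("d1"))  # ['pruning', 'apple', 'trees']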
  8. Grün, S.: Mehrwortbegriffe und Latent Semantic Analysis : Bewertung automatisch extrahierter Mehrwortgruppen mit LSA (2017) 0.00
    0.004261919 = product of:
      0.02983343 = sum of:
        0.008353474 = weight(_text_:information in 3954) [ClassicSimilarity], result of:
          0.008353474 = score(doc=3954,freq=8.0), product of:
            0.04306919 = queryWeight, product of:
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.02453417 = queryNorm
            0.19395474 = fieldWeight in 3954, product of:
              2.828427 = tf(freq=8.0), with freq of:
                8.0 = termFreq=8.0
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.0390625 = fieldNorm(doc=3954)
        0.021479957 = weight(_text_:retrieval in 3954) [ClassicSimilarity], result of:
          0.021479957 = score(doc=3954,freq=6.0), product of:
            0.07421378 = queryWeight, product of:
              3.024915 = idf(docFreq=5836, maxDocs=44218)
              0.02453417 = queryNorm
            0.28943354 = fieldWeight in 3954, product of:
              2.4494898 = tf(freq=6.0), with freq of:
                6.0 = termFreq=6.0
              3.024915 = idf(docFreq=5836, maxDocs=44218)
              0.0390625 = fieldNorm(doc=3954)
      0.14285715 = coord(2/14)
    
    Abstract
    This study investigates the potential of multiword terms for information retrieval. The aim of the work is to use the Latent Semantic Analysis (LSA) method to weight candidates that were rated positively by intellectual assessment higher than negatively rated candidates; the positive candidates should thus be preferred in an information retrieval ranking. A version of the social science GIRT database (German Indexing and Retrieval Testdatabase) was used as the collection. The automatic indexing system Lingo was used to identify candidates for multiword terms. The required core functionalities were lemmatisation, identification of compounds, algorithmic multiword recognition, and weighting of index terms through the LSA model. The multiword candidates recognised by Lingo and weighted by LSA were then evaluated. First, an intellectual selection of positive and negative multiword candidates was carried out. In the second step of the evaluation, the yield was calculated in order to obtain the proportion of positive multiword candidates. In the last step of the evaluation, R-precision was used to calculate how many positively rated multiword candidates made it to position k of the ranking. The yield of positive multiword candidates averaged about 39%, while R-precision achieved an average value of 54%. The LSA model thus produces an ambivalent result with a positive tendency.
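    As an illustration of the LSA weighting step (hypothetical data, not the GIRT collection or the Lingo output): a small term-document matrix is decomposed with a truncated SVD, and each multiword candidate is weighted by the length of its vector in the reduced space, so candidates that are well connected to the collection receive higher weights than isolated ones.

      import numpy as np

      # Rows = multiword candidates, columns = documents (toy counts, purely illustrative).
      terms = ["information retrieval", "automatic indexing", "apple pie"]
      X = np.array([[3, 2, 0, 4],
                    [2, 3, 1, 2],
                    [0, 0, 1, 0]], dtype=float)

      # Truncated SVD (LSA) keeping k = 2 latent dimensions.
      U, s, Vt = np.linalg.svd(X, full_matrices=False)
      k = 2
      term_vectors = U[:, :k] * s[:k]            # candidate coordinates in latent space

      # Weight each candidate by the norm of its latent vector.
      weights = np.linalg.norm(term_vectors, axis=1)
      for term, w in sorted(zip(terms, weights), key=lambda p: -p[1]):
          print(f"{term:25s} {w:.3f}")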
    Footnote
    Master's thesis, Studiengang Informationswissenschaft und Sprachtechnologie, Institut für Sprache und Information, Philosophische Fakultät, Heinrich-Heine-Universität Düsseldorf
    Imprint
    Düsseldorf : Heinrich-Heine-Universität / Philosophische Fakultät / Institut für Sprache und Information
  9. Kajanan, S.; Bao, Y.; Datta, A.; VanderMeer, D.; Dutta, K.: Efficient automatic search query formulation using phrase-level analysis (2014) 0.00
    0.0041108094 = product of:
      0.028775666 = sum of:
        0.024050226 = weight(_text_:system in 1264) [ClassicSimilarity], result of:
          0.024050226 = score(doc=1264,freq=10.0), product of:
            0.07727166 = queryWeight, product of:
              3.1495528 = idf(docFreq=5152, maxDocs=44218)
              0.02453417 = queryNorm
            0.31124252 = fieldWeight in 1264, product of:
              3.1622777 = tf(freq=10.0), with freq of:
                10.0 = termFreq=10.0
              3.1495528 = idf(docFreq=5152, maxDocs=44218)
              0.03125 = fieldNorm(doc=1264)
        0.0047254385 = weight(_text_:information in 1264) [ClassicSimilarity], result of:
          0.0047254385 = score(doc=1264,freq=4.0), product of:
            0.04306919 = queryWeight, product of:
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.02453417 = queryNorm
            0.10971737 = fieldWeight in 1264, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.03125 = fieldNorm(doc=1264)
      0.14285715 = coord(2/14)
    
    Abstract
    Over the past decade, the volume of information available digitally over the Internet has grown enormously. Technical developments in the area of search, such as Google's Page Rank algorithm, have proved so good at serving relevant results that Internet search has become integrated into daily human activity. One can endlessly explore topics of interest simply by querying and reading through the resulting links. Yet, although search engines are well known for providing relevant results based on users' queries, users do not always receive the results they are looking for. Google's Director of Research describes clickstream evidence of frustrated users repeatedly reformulating queries and searching through page after page of results. Given the general quality of search engine results, one must consider the possibility that the frustrated user's query is not effective; that is, it does not describe the essence of the user's interest. Indeed, extensive research into human search behavior has found that humans are not very effective at formulating good search queries that describe what they are interested in. Ideally, the user should simply point to a portion of text that sparked the user's interest, and a system should automatically formulate a search query that captures the essence of the text. In this paper, we describe an implemented system that provides this capability. We first describe how our work differs from existing work in automatic query formulation, and propose a new method for improved quantification of the relevance of candidate search terms drawn from input text using phrase-level analysis. We then propose an implementable method designed to provide relevant queries based on a user's text input. We demonstrate the quality of our results and performance of our system through experimental studies. Our results demonstrate that our system produces relevant search terms with roughly two-thirds precision and recall compared to search terms selected by experts, and that typical users find significantly more relevant results (31% more relevant) more quickly (64% faster) using our system than self-formulated search queries. Further, we show that our implementation can scale to request loads of up to 10 requests per second within current online responsiveness expectations (<2-second response times at the highest loads tested).
    Source
    Journal of the Association for Information Science and Technology. 65(2014) no.5, S.1058-1075
  10. Golub, K.; Soergel, D.; Buchanan, G.; Tudhope, D.; Lykke, M.; Hiom, D.: A framework for evaluating automatic indexing or classification in the context of retrieval (2016) 0.00
    0.00410204 = product of:
      0.028714279 = sum of:
        0.0072343214 = weight(_text_:information in 3311) [ClassicSimilarity], result of:
          0.0072343214 = score(doc=3311,freq=6.0), product of:
            0.04306919 = queryWeight, product of:
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.02453417 = queryNorm
            0.16796975 = fieldWeight in 3311, product of:
              2.4494898 = tf(freq=6.0), with freq of:
                6.0 = termFreq=6.0
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.0390625 = fieldNorm(doc=3311)
        0.021479957 = weight(_text_:retrieval in 3311) [ClassicSimilarity], result of:
          0.021479957 = score(doc=3311,freq=6.0), product of:
            0.07421378 = queryWeight, product of:
              3.024915 = idf(docFreq=5836, maxDocs=44218)
              0.02453417 = queryNorm
            0.28943354 = fieldWeight in 3311, product of:
              2.4494898 = tf(freq=6.0), with freq of:
                6.0 = termFreq=6.0
              3.024915 = idf(docFreq=5836, maxDocs=44218)
              0.0390625 = fieldNorm(doc=3311)
      0.14285715 = coord(2/14)
    
    Abstract
    Tools for automatic subject assignment help deal with scale and sustainability in creating and enriching metadata, establishing more connections across and between resources and enhancing consistency. Although some software vendors and experimental researchers claim the tools can replace manual subject indexing, hard scientific evidence of their performance in operating information environments is scarce. A major reason for this is that research is usually conducted in laboratory conditions, excluding the complexities of real-life systems and situations. The article reviews and discusses issues with existing evaluation approaches such as problems of aboutness and relevance assessments, implying the need to use more than a single "gold standard" method when evaluating indexing and retrieval, and proposes a comprehensive evaluation framework. The framework is informed by a systematic review of the literature on evaluation approaches: evaluating indexing quality directly through assessment by an evaluator or through comparison with a gold standard, evaluating the quality of computer-assisted indexing directly in the context of an indexing workflow, and evaluating indexing quality indirectly through analyzing retrieval performance.
    Series
    Advances in information science
    Source
    Journal of the Association for Information Science and Technology. 67(2016) no.1, S.3-16
  11. Williams, R.V.: Hans Peter Luhn and Herbert M. Ohlman : their roles in the origins of keyword-in-context/permutation automatic indexing (2010) 0.00
    0.0040277084 = product of:
      0.028193956 = sum of:
        0.021511177 = weight(_text_:system in 3440) [ClassicSimilarity], result of:
          0.021511177 = score(doc=3440,freq=2.0), product of:
            0.07727166 = queryWeight, product of:
              3.1495528 = idf(docFreq=5152, maxDocs=44218)
              0.02453417 = queryNorm
            0.27838376 = fieldWeight in 3440, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.1495528 = idf(docFreq=5152, maxDocs=44218)
              0.0625 = fieldNorm(doc=3440)
        0.006682779 = weight(_text_:information in 3440) [ClassicSimilarity], result of:
          0.006682779 = score(doc=3440,freq=2.0), product of:
            0.04306919 = queryWeight, product of:
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.02453417 = queryNorm
            0.1551638 = fieldWeight in 3440, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.0625 = fieldNorm(doc=3440)
      0.14285715 = coord(2/14)
    
    Abstract
    The invention of automatic indexing using a keyword-in-context approach has generally been attributed solely to Hans Peter Luhn of IBM. This article shows that credit for this invention belongs equally to Luhn and Herbert Ohlman of the System Development Corporation. It also traces the origins of title derivative automatic indexing, its development and implementation, and current status.
    Source
    Journal of the American Society for Information Science and Technology. 61(2010) no.4, S.835-849
  12. Gödert, W.: Detecting multiword phrases in mathematical text corpora (2012) 0.00
    0.0037893022 = product of:
      0.026525114 = sum of:
        0.006682779 = weight(_text_:information in 466) [ClassicSimilarity], result of:
          0.006682779 = score(doc=466,freq=2.0), product of:
            0.04306919 = queryWeight, product of:
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.02453417 = queryNorm
            0.1551638 = fieldWeight in 466, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.0625 = fieldNorm(doc=466)
        0.019842334 = weight(_text_:retrieval in 466) [ClassicSimilarity], result of:
          0.019842334 = score(doc=466,freq=2.0), product of:
            0.07421378 = queryWeight, product of:
              3.024915 = idf(docFreq=5836, maxDocs=44218)
              0.02453417 = queryNorm
            0.26736724 = fieldWeight in 466, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.024915 = idf(docFreq=5836, maxDocs=44218)
              0.0625 = fieldNorm(doc=466)
      0.14285715 = coord(2/14)
    
    Abstract
    We present an approach for detecting multiword phrases in mathematical text corpora. The method used is based on characteristic features of mathematical terminology. It makes use of a software tool named Lingo, which allows words to be identified by means of previously defined dictionaries for specific word classes such as adjectives, personal names or nouns. The detection of multiword groups is done algorithmically. Possible advantages of the method for indexing and information retrieval, and conclusions for applying dictionary-based methods of automatic indexing instead of stemming procedures, are discussed.
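    A small sketch of the dictionary-based approach, with invented mini-dictionaries standing in for Lingo's word-class dictionaries: each token is assigned a word class by lookup, and adjacent adjective-noun or noun-noun pairs are emitted as multiword candidates. The dictionaries and the pairing rule are illustrative assumptions, not the actual resources used in the paper.

      # Mini word-class dictionaries standing in for Lingo's dictionary files.
      ADJECTIVES = {"partial", "linear", "differential", "finite"}
      NOUNS = {"equation", "operator", "element", "method", "space"}

      def word_class(token):
          t = token.lower()
          if t in ADJECTIVES:
              return "ADJ"
          if t in NOUNS:
              return "NOUN"
          return "OTHER"

      def multiword_candidates(text):
          tokens = text.split()
          classes = [word_class(t) for t in tokens]
          pairs = []
          for (t1, c1), (t2, c2) in zip(zip(tokens, classes), zip(tokens[1:], classes[1:])):
              if (c1, c2) in {("ADJ", "NOUN"), ("NOUN", "NOUN")}:
                  pairs.append(f"{t1} {t2}")
          return pairs

      print(multiword_candidates("the finite element method solves a partial differential equation"))
      # ['finite element', 'element method', 'differential equation']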
  13. Busch, D.: Domänenspezifische hybride automatische Indexierung von bibliographischen Metadaten (2019) 0.00
    0.0037293585 = product of:
      0.026105508 = sum of:
        0.016133383 = weight(_text_:system in 5628) [ClassicSimilarity], result of:
          0.016133383 = score(doc=5628,freq=2.0), product of:
            0.07727166 = queryWeight, product of:
              3.1495528 = idf(docFreq=5152, maxDocs=44218)
              0.02453417 = queryNorm
            0.20878783 = fieldWeight in 5628, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.1495528 = idf(docFreq=5152, maxDocs=44218)
              0.046875 = fieldNorm(doc=5628)
        0.009972124 = product of:
          0.019944249 = sum of:
            0.019944249 = weight(_text_:22 in 5628) [ClassicSimilarity], result of:
              0.019944249 = score(doc=5628,freq=2.0), product of:
                0.085914485 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.02453417 = queryNorm
                0.23214069 = fieldWeight in 5628, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.046875 = fieldNorm(doc=5628)
          0.5 = coord(1/2)
      0.14285715 = coord(2/14)
    
    Abstract
    At the Fraunhofer-Informationszentrum Raum und Bau (IRB), specialist literature in the field of planning and building is indexed bibliographically. The resulting documents (metadata records) are used, among other things, in the production of the IRB's bibliographic databases. Fig. 1 shows a document describing a journal article. The documents are indexed with descriptors from a nomenclature (the IRB subject heading list). A descriptor is "a designation that can be used on its own, is unambiguously suited to characterising content, and is admitted in the documentation system in question". At present, indexing is carried out intellectually by human experts. Intellectual indexing is time-consuming and expensive. One solution to this problem is automatic indexing, in which descriptors are assigned by a computer program; such programs are also referred to below as classifiers. This article is about a system for the automatic indexing of German-language documents in the field of building and construction with descriptors from the IRB subject heading list.
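    A hypothetical, heavily simplified sketch of what such a classifier can look like: one centroid of token counts per descriptor is learned from already indexed records, and a new record receives every descriptor whose centroid is sufficiently similar to its text. The toy records, the stop word list and the threshold are assumptions; the hybrid system described in the article is considerably more elaborate.

      import math
      from collections import Counter

      TRAIN = [  # (record text, manually assigned descriptors) - invented examples
          ("Wärmedämmung von Außenwänden im Holzbau", {"Wärmedämmung", "Holzbau"}),
          ("Schallschutz und Wärmedämmung im Wohnungsbau", {"Wärmedämmung", "Schallschutz"}),
          ("Tragwerksplanung im mehrgeschossigen Holzbau", {"Holzbau", "Tragwerk"}),
      ]
      STOPWORDS = {"im", "von", "und"}

      def vec(text):
          return Counter(t for t in text.lower().split() if t not in STOPWORDS)

      def cosine(a, b):
          dot = sum(a[t] * b[t] for t in a)
          na = math.sqrt(sum(v * v for v in a.values()))
          nb = math.sqrt(sum(v * v for v in b.values()))
          return dot / (na * nb) if na and nb else 0.0

      # One centroid (summed token counts) per descriptor.
      centroids = {}
      for text, descriptors in TRAIN:
          for d in descriptors:
              centroids.setdefault(d, Counter()).update(vec(text))

      def assign(text, threshold=0.45):
          v = vec(text)
          return {d for d, c in centroids.items() if cosine(v, c) >= threshold}

      print(assign("Wärmedämmung im Geschosswohnungsbau"))  # {'Wärmedämmung'}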
    Source
    B.I.T.online. 22(2019) H.6, S.465-469
  14. Benson, A.C.: Image descriptions and their relational expressions : a review of the literature and the issues (2015) 0.00
    0.0031385582 = product of:
      0.021969907 = sum of:
        0.0070881573 = weight(_text_:information in 1867) [ClassicSimilarity], result of:
          0.0070881573 = score(doc=1867,freq=4.0), product of:
            0.04306919 = queryWeight, product of:
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.02453417 = queryNorm
            0.16457605 = fieldWeight in 1867, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.046875 = fieldNorm(doc=1867)
        0.014881751 = weight(_text_:retrieval in 1867) [ClassicSimilarity], result of:
          0.014881751 = score(doc=1867,freq=2.0), product of:
            0.07421378 = queryWeight, product of:
              3.024915 = idf(docFreq=5836, maxDocs=44218)
              0.02453417 = queryNorm
            0.20052543 = fieldWeight in 1867, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.024915 = idf(docFreq=5836, maxDocs=44218)
              0.046875 = fieldNorm(doc=1867)
      0.14285715 = coord(2/14)
    
    Abstract
    Purpose - The purpose of this paper is to survey the treatment of relationships, relationship expressions and the ways in which they manifest themselves in image descriptions. Design/methodology/approach - The term "relationship" is construed in the broadest possible way to include spatial relationships ("to the right of"), temporal ("in 1936," "at noon"), meronymic ("part of"), and attributive ("has color," "has dimension"). The intersection of these vaguely delimited categories with image information, image creation, and description in libraries and archives is complex and in need of explanation. Findings - The review brings into question many generally held beliefs about the relationship problem, such as the belief that the semantics of relationships are somehow embedded in the relationship term itself and that image search and retrieval solutions can be found through refinement of word-matching systems. Originality/value - This review has no hope of systematically examining all evidence in all disciplines pertaining to this topic. It instead focusses on a general description of a theoretical treatment in Library and Information Science.
  15. Vlachidis, A.; Tudhope, D.: A knowledge-based approach to information extraction for semantic interoperability in the archaeology domain (2016) 0.00
    0.0031139941 = product of:
      0.021797959 = sum of:
        0.013444485 = weight(_text_:system in 2895) [ClassicSimilarity], result of:
          0.013444485 = score(doc=2895,freq=2.0), product of:
            0.07727166 = queryWeight, product of:
              3.1495528 = idf(docFreq=5152, maxDocs=44218)
              0.02453417 = queryNorm
            0.17398985 = fieldWeight in 2895, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.1495528 = idf(docFreq=5152, maxDocs=44218)
              0.0390625 = fieldNorm(doc=2895)
        0.008353474 = weight(_text_:information in 2895) [ClassicSimilarity], result of:
          0.008353474 = score(doc=2895,freq=8.0), product of:
            0.04306919 = queryWeight, product of:
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.02453417 = queryNorm
            0.19395474 = fieldWeight in 2895, product of:
              2.828427 = tf(freq=8.0), with freq of:
                8.0 = termFreq=8.0
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.0390625 = fieldNorm(doc=2895)
      0.14285715 = coord(2/14)
    
    Abstract
    The article presents a method for automatic semantic indexing of archaeological grey-literature reports using empirical (rule-based) Information Extraction techniques in combination with domain-specific knowledge organization systems. The semantic annotation system (OPTIMA) performs the tasks of Named Entity Recognition, Relation Extraction, Negation Detection, and Word-Sense Disambiguation using hand-crafted rules and terminological resources for associating contextual abstractions with classes of the standard ontology CIDOC Conceptual Reference Model (CRM) for cultural heritage and its archaeological extension, CRM-EH. Relation Extraction (RE) performance benefits from a syntactic-based definition of RE patterns derived from domain-oriented corpus analysis. The evaluation also shows clear benefit in the use of assistive natural language processing (NLP) modules relating to Word-Sense Disambiguation, Negation Detection, and Noun Phrase Validation, together with controlled thesaurus expansion. The semantic indexing results demonstrate the capacity of rule-based Information Extraction techniques to deliver interoperable semantic abstractions (semantic annotations) with respect to the CIDOC CRM and archaeological thesauri. Major contributions include recognition of relevant entities using shallow parsing NLP techniques driven by a complementary use of ontological and terminological domain resources and empirical derivation of context-driven RE rules for the recognition of semantic relationships from phrases of unstructured text.
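    A toy illustration of the rule-based pattern: entities are recognised by lookup in a small gazetteer (standing in for the archaeological thesauri), and a mention is flagged as negated when a negation cue appears shortly before it - a much simplified form of the negation detection the abstract mentions. Gazetteer, cue list, class labels and window size are all invented for the example.

      import re

      GAZETTEER = {"ditch": "EH_Context", "pit": "EH_Context",
                   "roman pottery": "EH_Find", "coin": "EH_Find"}
      NEGATION_CUES = {"no", "not", "without", "absence"}
      WINDOW = 4  # tokens inspected before a mention for a negation cue

      def annotate(text):
          tokens = re.findall(r"[a-z]+", text.lower())
          annotations = []
          for length in (2, 1):                    # try longer gazetteer entries first
              for i in range(len(tokens) - length + 1):
                  phrase = " ".join(tokens[i:i + length])
                  if phrase in GAZETTEER:
                      window = tokens[max(0, i - WINDOW):i]
                      negated = any(t in NEGATION_CUES for t in window)
                      annotations.append((phrase, GAZETTEER[phrase], negated))
          return annotations

      text = "The trench contained Roman pottery and a coin, but no evidence of a ditch."
      for phrase, cls, negated in annotate(text):
          print(phrase, cls, "negated =", negated)
      # 'ditch' is recognised but flagged negated=True; the two finds are not negated.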
    Source
    Journal of the Association for Information Science and Technology. 67(2016) no.5, S.1138-1152
  16. Strobel, S.; Marín-Arraiza, P.: Metadata for scientific audiovisual media : current practices and perspectives of the TIB / AV-portal (2015) 0.00
    0.00310215 = product of:
      0.021715049 = sum of:
        0.004176737 = weight(_text_:information in 3667) [ClassicSimilarity], result of:
          0.004176737 = score(doc=3667,freq=2.0), product of:
            0.04306919 = queryWeight, product of:
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.02453417 = queryNorm
            0.09697737 = fieldWeight in 3667, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.0390625 = fieldNorm(doc=3667)
        0.017538311 = weight(_text_:retrieval in 3667) [ClassicSimilarity], result of:
          0.017538311 = score(doc=3667,freq=4.0), product of:
            0.07421378 = queryWeight, product of:
              3.024915 = idf(docFreq=5836, maxDocs=44218)
              0.02453417 = queryNorm
            0.23632148 = fieldWeight in 3667, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              3.024915 = idf(docFreq=5836, maxDocs=44218)
              0.0390625 = fieldNorm(doc=3667)
      0.14285715 = coord(2/14)
    
    Abstract
    Descriptive metadata play a key role in finding relevant search results in large amounts of unstructured data. However, current scientific audiovisual media are provided with little metadata, which makes them hard to find, let alone individual sequences. In this paper, the TIB / AV-Portal is presented as a use case where methods for the automatic generation of metadata, a semantic search and cross-lingual retrieval (German/English) have already been applied. These methods result in better discoverability of the scientific audiovisual media hosted in the portal. Text, speech, and image content of the videos are automatically indexed with specialised GND (Gemeinsame Normdatei) subject headings. A semantic search is established based on properties of the GND ontology. The cross-lingual retrieval uses English 'translations' that were derived by an ontology mapping (DBpedia, among others). Further ways of increasing the discoverability and reuse of the metadata are publishing them as Linked Open Data and interlinking them with other data sets.
    Series
    Communications in computer and information science; 544
  17. Kempf, A.O.: Automatische Inhaltserschließung in der Fachinformation (2013) 0.00
    0.0026154653 = product of:
      0.018308256 = sum of:
        0.005906798 = weight(_text_:information in 905) [ClassicSimilarity], result of:
          0.005906798 = score(doc=905,freq=4.0), product of:
            0.04306919 = queryWeight, product of:
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.02453417 = queryNorm
            0.13714671 = fieldWeight in 905, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.0390625 = fieldNorm(doc=905)
        0.012401459 = weight(_text_:retrieval in 905) [ClassicSimilarity], result of:
          0.012401459 = score(doc=905,freq=2.0), product of:
            0.07421378 = queryWeight, product of:
              3.024915 = idf(docFreq=5836, maxDocs=44218)
              0.02453417 = queryNorm
            0.16710453 = fieldWeight in 905, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.024915 = idf(docFreq=5836, maxDocs=44218)
              0.0390625 = fieldNorm(doc=905)
      0.14285715 = coord(2/14)
    
    Abstract
    The article is based on a master's thesis entitled "Automatische Indexierung in der sozialwissenschaftlichen Fachinformation. Eine Evaluationsstudie zur maschinellen Erschließung für die Datenbank SOLIS" (Kempf 2012), written within the postgraduate programme in Library and Information Science at the Humboldt-Universität zu Berlin, at the Chair of Information Retrieval. Building on the shell model ("Schalenmodell") of subject indexing in specialist information, the article presents evaluation results for an automatic indexing procedure intended for use in social science information services. Starting from the application scenario described by Krause, according to which SOLIS holdings (Sozialwissenschaftliches Literaturinformationssystem) of lower relevance should be indexed automatically, two test series were carried out on this document base with the indexing software MindServer from the company Recommind. Besides the effects of general system settings examined in the first test series, the second test series compared the indexing performance of the software for the peripheral and the core areas of the literature database. For the latter test series, specific versions of the indexing software were built for both areas of the database and trained on document corpora from the corresponding areas. The results of the evaluation, carried out against intellectually generated comparison data, point to differences in indexing performance between peripheral and core areas that argue, on the one hand, against the use of automatic indexing procedures in the peripheral areas. On the other hand, there are indications that the indexing results can be improved by building training sets specific to subject subfields.
    Source
    Information - Wissenschaft und Praxis. 64(2013) H.2/3, S.96-106
  18. Kanan, T.; Fox, E.A.: Automated arabic text classification with P-Stemmer, machine learning, and a tailored news article taxonomy (2016) 0.00
    0.0026154653 = product of:
      0.018308256 = sum of:
        0.005906798 = weight(_text_:information in 3151) [ClassicSimilarity], result of:
          0.005906798 = score(doc=3151,freq=4.0), product of:
            0.04306919 = queryWeight, product of:
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.02453417 = queryNorm
            0.13714671 = fieldWeight in 3151, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.0390625 = fieldNorm(doc=3151)
        0.012401459 = weight(_text_:retrieval in 3151) [ClassicSimilarity], result of:
          0.012401459 = score(doc=3151,freq=2.0), product of:
            0.07421378 = queryWeight, product of:
              3.024915 = idf(docFreq=5836, maxDocs=44218)
              0.02453417 = queryNorm
            0.16710453 = fieldWeight in 3151, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.024915 = idf(docFreq=5836, maxDocs=44218)
              0.0390625 = fieldNorm(doc=3151)
      0.14285715 = coord(2/14)
    
    Abstract
    Arabic news articles in electronic collections are difficult to study. Browsing by category is rarely supported. Although helpful machine-learning methods have been applied successfully to similar situations for English news articles, limited research has been completed to yield suitable solutions for Arabic news. In connection with a Qatar National Research Fund (QNRF)-funded project to build digital library community and infrastructure in Qatar, we developed software for browsing a collection of about 237,000 Arabic news articles, which should be applicable to other Arabic news collections. We designed a simple taxonomy for Arabic news stories that is suitable for the needs of Qatar and other nations, is compatible with the subject codes of the International Press Telecommunications Council, and was enhanced with the aid of a librarian expert as well as five Arabic-speaking volunteers. We developed tailored stemming (i.e., a new Arabic light stemmer called P-Stemmer) and automatic classification methods (the best being binary Support Vector Machines classifiers) to work with the taxonomy. Using evaluation techniques commonly used in the information retrieval community, including 10-fold cross-validation and the Wilcoxon signed-rank test, we showed that our approach to stemming and classification is superior to state-of-the-art techniques.
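    The P-Stemmer referred to above is a light stemmer, i.e. it removes a fixed set of prefixes and suffixes rather than reducing words to roots. The sketch below shows the general light-stemming pattern with a small, illustrative affix list; it is not the actual rule set of P-Stemmer.

      # Generic Arabic light-stemming sketch (illustrative affixes, not P-Stemmer's rules).
      PREFIXES = ["وال", "بال", "كال", "فال", "ال", "و"]
      SUFFIXES = ["ات", "ون", "ين", "ها", "ية", "ة"]

      def light_stem(word, min_len=3):
          for p in PREFIXES:                   # strip at most one prefix
              if word.startswith(p) and len(word) - len(p) >= min_len:
                  word = word[len(p):]
                  break
          for s in SUFFIXES:                   # strip at most one suffix
              if word.endswith(s) and len(word) - len(s) >= min_len:
                  word = word[:-len(s)]
                  break
          return word

      # Three surface variants of "library/libraries" reduce to the same stem.
      for w in ["والمكتبات", "المكتبة", "مكتبات"]:
          print(w, "->", light_stem(w))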
    Source
    Journal of the Association for Information Science and Technology. 67(2016) no.11, S.2667-2683
  19. Lu, K.; Mao, J.: ¬An automatic approach to weighted subject indexing : an empirical study in the biomedical domain (2015) 0.00
    0.0026154653 = product of:
      0.018308256 = sum of:
        0.005906798 = weight(_text_:information in 4005) [ClassicSimilarity], result of:
          0.005906798 = score(doc=4005,freq=4.0), product of:
            0.04306919 = queryWeight, product of:
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.02453417 = queryNorm
            0.13714671 = fieldWeight in 4005, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.0390625 = fieldNorm(doc=4005)
        0.012401459 = weight(_text_:retrieval in 4005) [ClassicSimilarity], result of:
          0.012401459 = score(doc=4005,freq=2.0), product of:
            0.07421378 = queryWeight, product of:
              3.024915 = idf(docFreq=5836, maxDocs=44218)
              0.02453417 = queryNorm
            0.16710453 = fieldWeight in 4005, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.024915 = idf(docFreq=5836, maxDocs=44218)
              0.0390625 = fieldNorm(doc=4005)
      0.14285715 = coord(2/14)
    
    Abstract
    Subject indexing is an intellectually intensive process that has many inherent uncertainties. Existing manual subject indexing systems generally produce binary outcomes for whether or not to assign an indexing term. This does not sufficiently reflect the extent to which the indexing terms are associated with the documents. On the other hand, the idea of probabilistic or weighted indexing was proposed a long time ago and has seen success in capturing uncertainties in the automatic indexing process. One hurdle to overcome in implementing weighted indexing in manual subject indexing systems is the practical burden that could be added to the already intensive indexing process. This study proposes a method to infer automatically the associations between subject terms and documents through text mining. By uncovering the connections between MeSH descriptors and document text, we are able to derive the weights of MeSH descriptors manually assigned to documents. Our initial results suggest that the inference method is feasible and promising. The study has practical implications for improving subject indexing practice and providing better support for information retrieval.
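    A minimal sketch of the weighting idea, under simplifying assumptions: for each manually assigned descriptor, a weight is derived from how prominently the descriptor's words occur in the record's text. The scoring rule and the toy record are invented for illustration; the study itself infers the associations by text mining over MEDLINE-scale data.

      from collections import Counter

      # Toy record with manually assigned MeSH-like descriptors (invented example).
      TEXT = ("Aspirin reduced the risk of myocardial infarction; low-dose aspirin "
              "did not affect blood pressure in the trial.")
      DESCRIPTORS = ["Aspirin", "Myocardial Infarction", "Blood Pressure", "Risk Factors"]

      def weight(descriptor, text):
          tokens = Counter(t.strip(".,;").lower() for t in text.split())
          total = sum(tokens.values())
          words = descriptor.lower().split()
          # Mean relative frequency of the descriptor's words in the text.
          return sum(tokens[w] for w in words) / (len(words) * total)

      for d in sorted(DESCRIPTORS, key=lambda d: -weight(d, TEXT)):
          print(f"{d:22s} {weight(d, TEXT):.4f}")
      # "Aspirin" gets the highest weight; "Risk Factors" the lowest.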
    Source
    Journal of the Association for Information Science and Technology. 66(2015) no.9, S.1776-1784
  20. Weiner, U.: Vor uns die Dokumentenflut oder Automatische Indexierung als notwendige und sinnvolle Ergänzung zur intellektuellen Sacherschließung (2012) 0.00
    0.0025173177 = product of:
      0.017621223 = sum of:
        0.013444485 = weight(_text_:system in 598) [ClassicSimilarity], result of:
          0.013444485 = score(doc=598,freq=2.0), product of:
            0.07727166 = queryWeight, product of:
              3.1495528 = idf(docFreq=5152, maxDocs=44218)
              0.02453417 = queryNorm
            0.17398985 = fieldWeight in 598, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.1495528 = idf(docFreq=5152, maxDocs=44218)
              0.0390625 = fieldNorm(doc=598)
        0.004176737 = weight(_text_:information in 598) [ClassicSimilarity], result of:
          0.004176737 = score(doc=598,freq=2.0), product of:
            0.04306919 = queryWeight, product of:
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.02453417 = queryNorm
            0.09697737 = fieldWeight in 598, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.0390625 = fieldNorm(doc=598)
      0.14285715 = coord(2/14)
    
    Abstract
    Against the background of users' changed expectations of library search options - away from the classic online catalogue and towards a "one-stop shop" with functionality such as thematic browsing, relevance ranking and the like - on the one hand, and the need to process mass data (keyword: document flood) on the other, systems for automatic indexing are once again moving into the centre of interest. Since in Austria this topic has so far been addressed in the library sector only very selectively, in relation to a few specific projects, a general theoretical overview of the different procedural approaches to automatic indexing is given first. In a next step, the IDX-based indexing software MILOS (with the subprojects MILOS I, MILOS II and KASCADE) and the modular system intelligentCAPTURE (with the integrated indexing software AUTINDEX) are presented as the only automatic indexing systems that were in practical use in the German-speaking area until a few years ago. With the growing need to explore new approaches to subject indexing, numerous software developments have been tested for their applicability in the library sector over the past five to six years. As a representative of these systems for automatic subject indexing that are still under development, the project PETRUS, which was carried out at the DNB (German National Library) between 2009 and 2011 and comprises the components PICA Match&Merge and the Extraction Platform of the company Averbis, is presented.
    Footnote
    Vienna, Univ., programme Library and Information Studies, Master's thesis, 2012

Languages

  • e 40
  • d 19
  • a 1
  • m 1

Types

  • a 53
  • el 9
  • x 5
  • m 1

Classifications