Search (53 results, page 1 of 3)

  • × language_ss:"e"
  • × theme_ss:"Automatisches Indexieren"
  • × year_i:[2010 TO 2020}
  1. Munkelt, J.: Erstellung einer DNB-Retrieval-Testkollektion (2018) 0.03
    0.027426437 = product of:
      0.11754187 = sum of:
        0.020072797 = weight(_text_:und in 4310) [ClassicSimilarity], result of:
          0.020072797 = score(doc=4310,freq=12.0), product of:
            0.04780656 = queryWeight, product of:
              2.216367 = idf(docFreq=13101, maxDocs=44218)
              0.021569785 = queryNorm
            0.41987535 = fieldWeight in 4310, product of:
              3.4641016 = tf(freq=12.0), with freq of:
                12.0 = termFreq=12.0
              2.216367 = idf(docFreq=13101, maxDocs=44218)
              0.0546875 = fieldNorm(doc=4310)
        0.004365201 = weight(_text_:in in 4310) [ClassicSimilarity], result of:
          0.004365201 = score(doc=4310,freq=4.0), product of:
            0.029340398 = queryWeight, product of:
              1.3602545 = idf(docFreq=30841, maxDocs=44218)
              0.021569785 = queryNorm
            0.14877784 = fieldWeight in 4310, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              1.3602545 = idf(docFreq=30841, maxDocs=44218)
              0.0546875 = fieldNorm(doc=4310)
        0.020072797 = weight(_text_:und in 4310) [ClassicSimilarity], result of:
          0.020072797 = score(doc=4310,freq=12.0), product of:
            0.04780656 = queryWeight, product of:
              2.216367 = idf(docFreq=13101, maxDocs=44218)
              0.021569785 = queryNorm
            0.41987535 = fieldWeight in 4310, product of:
              3.4641016 = tf(freq=12.0), with freq of:
                12.0 = termFreq=12.0
              2.216367 = idf(docFreq=13101, maxDocs=44218)
              0.0546875 = fieldNorm(doc=4310)
        0.023686372 = weight(_text_:bibliotheken in 4310) [ClassicSimilarity], result of:
          0.023686372 = score(doc=4310,freq=2.0), product of:
            0.08127756 = queryWeight, product of:
              3.768121 = idf(docFreq=2775, maxDocs=44218)
              0.021569785 = queryNorm
            0.29142573 = fieldWeight in 4310, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.768121 = idf(docFreq=2775, maxDocs=44218)
              0.0546875 = fieldNorm(doc=4310)
        0.023686372 = weight(_text_:bibliotheken in 4310) [ClassicSimilarity], result of:
          0.023686372 = score(doc=4310,freq=2.0), product of:
            0.08127756 = queryWeight, product of:
              3.768121 = idf(docFreq=2775, maxDocs=44218)
              0.021569785 = queryNorm
            0.29142573 = fieldWeight in 4310, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.768121 = idf(docFreq=2775, maxDocs=44218)
              0.0546875 = fieldNorm(doc=4310)
        0.023686372 = weight(_text_:bibliotheken in 4310) [ClassicSimilarity], result of:
          0.023686372 = score(doc=4310,freq=2.0), product of:
            0.08127756 = queryWeight, product of:
              3.768121 = idf(docFreq=2775, maxDocs=44218)
              0.021569785 = queryNorm
            0.29142573 = fieldWeight in 4310, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.768121 = idf(docFreq=2775, maxDocs=44218)
              0.0546875 = fieldNorm(doc=4310)
        0.0019719584 = weight(_text_:s in 4310) [ClassicSimilarity], result of:
          0.0019719584 = score(doc=4310,freq=2.0), product of:
            0.023451481 = queryWeight, product of:
              1.0872376 = idf(docFreq=40523, maxDocs=44218)
              0.021569785 = queryNorm
            0.08408674 = fieldWeight in 4310, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              1.0872376 = idf(docFreq=40523, maxDocs=44218)
              0.0546875 = fieldNorm(doc=4310)
      0.23333333 = coord(7/30)
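    The explain tree above can be reproduced from its own inputs. A minimal sketch in Python, assuming Lucene's ClassicSimilarity formulas (idf = 1 + ln(maxDocs/(docFreq+1)), tf = sqrt(freq)); all constants are copied from the "und" branch for doc 4310:
      import math

      # Recompute the "und" branch of the explain tree above (doc 4310)
      # using ClassicSimilarity's TF-IDF formulas.
      doc_freq, max_docs = 13101, 44218
      freq, field_norm = 12.0, 0.0546875
      query_norm = 0.021569785

      idf = 1.0 + math.log(max_docs / (doc_freq + 1))   # 2.216367
      tf = math.sqrt(freq)                              # 3.4641016
      query_weight = idf * query_norm                   # 0.04780656
      field_weight = tf * idf * field_norm              # 0.41987535
      print(query_weight * field_weight)                # 0.020072797

      # The document score is the sum of all such term weights, scaled
      # by coord(7/30): 7 of the 30 query clauses matched.
      print(0.11754187 * 7 / 30)                        # 0.027426437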
    
    Abstract
    Since autumn 2017, subject cataloging of certain types of works at the Deutsche Nationalbibliothek has been carried out purely by machine. The quality of this procedure, which can substantially shape the process organization of libraries, is a matter of controversy among experts. Their positions are first explained in detail before the necessity of a quality assessment of the procedure, and the foundations for one, are set out. A central component of any future assessment is a test collection; its creation and documentation are the focus of this thesis. In this context, the history of test collections and the requirements for successful ones are also discussed. Finally, a retrieval test is carried out that demonstrates the usability of the test collection produced. Its results serve solely as a functional check: a quality assessment of automatic subject cataloging, whether in this specific case or in general, is not undertaken and is not the aim of this work.
    Content
    Bachelor's thesis, Library Science, Fakultät für Informations- und Kommunikationswissenschaften, Technische Hochschule Köln
    Imprint
    Köln : Technische Hochschule, Fakultät für Informations- und Kommunikationswissenschaften
    Pages
    II, 79 S
  2. Tavakolizadeh-Ravari, M.: Analysis of the long term dynamics in thesaurus developments and its consequences (2017) 0.01
    0.010460943 = product of:
      0.06276566 = sum of:
        0.014807926 = weight(_text_:und in 3081) [ClassicSimilarity], result of:
          0.014807926 = score(doc=3081,freq=20.0), product of:
            0.04780656 = queryWeight, product of:
              2.216367 = idf(docFreq=13101, maxDocs=44218)
              0.021569785 = queryNorm
            0.3097467 = fieldWeight in 3081, product of:
              4.472136 = tf(freq=20.0), with freq of:
                20.0 = termFreq=20.0
              2.216367 = idf(docFreq=13101, maxDocs=44218)
              0.03125 = fieldNorm(doc=3081)
        0.027356375 = weight(_text_:informationswissenschaft in 3081) [ClassicSimilarity], result of:
          0.027356375 = score(doc=3081,freq=4.0), product of:
            0.09716552 = queryWeight, product of:
              4.504705 = idf(docFreq=1328, maxDocs=44218)
              0.021569785 = queryNorm
            0.28154406 = fieldWeight in 3081, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              4.504705 = idf(docFreq=1328, maxDocs=44218)
              0.03125 = fieldNorm(doc=3081)
        0.0046665967 = weight(_text_:in in 3081) [ClassicSimilarity], result of:
          0.0046665967 = score(doc=3081,freq=14.0), product of:
            0.029340398 = queryWeight, product of:
              1.3602545 = idf(docFreq=30841, maxDocs=44218)
              0.021569785 = queryNorm
            0.15905021 = fieldWeight in 3081, product of:
              3.7416575 = tf(freq=14.0), with freq of:
                14.0 = termFreq=14.0
              1.3602545 = idf(docFreq=30841, maxDocs=44218)
              0.03125 = fieldNorm(doc=3081)
        0.014807926 = weight(_text_:und in 3081) [ClassicSimilarity], result of:
          0.014807926 = score(doc=3081,freq=20.0), product of:
            0.04780656 = queryWeight, product of:
              2.216367 = idf(docFreq=13101, maxDocs=44218)
              0.021569785 = queryNorm
            0.3097467 = fieldWeight in 3081, product of:
              4.472136 = tf(freq=20.0), with freq of:
                20.0 = termFreq=20.0
              2.216367 = idf(docFreq=13101, maxDocs=44218)
              0.03125 = fieldNorm(doc=3081)
        0.0011268335 = weight(_text_:s in 3081) [ClassicSimilarity], result of:
          0.0011268335 = score(doc=3081,freq=2.0), product of:
            0.023451481 = queryWeight, product of:
              1.0872376 = idf(docFreq=40523, maxDocs=44218)
              0.021569785 = queryNorm
            0.048049565 = fieldWeight in 3081, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              1.0872376 = idf(docFreq=40523, maxDocs=44218)
              0.03125 = fieldNorm(doc=3081)
      0.16666667 = coord(5/30)
    
    Abstract
    The thesis analyzes the dynamic development and use of thesaurus terms. In addition, it focuses on the factors that influence the number of index terms per document or journal. The MeSH thesaurus and the corresponding database MEDLINE served as the objects of study. The main findings are: 1. The MeSH thesaurus has grown logarithmically through three distinct phases. Such a thesaurus should follow the equation T = 3,076.6 ln(d) - 22,695 + 0.0039d (T = terms, ln = natural logarithm, d = documents). To construct such a thesaurus one therefore needs about 1,600 documents covering different topics of the thesaurus's domain. The dynamic development of thesauri such as MeSH requires the introduction of one new term per 256 newly indexed documents. 2. The distribution of thesaurus terms yielded three categories: heavily used, normally used, and rarely used headings. The last group is in a test phase, while in the first and second categories the newly added descriptors drive thesaurus growth. 3. There is a logarithmic relationship between the number of index terms per article and its page count, for articles between one and twenty-one pages. 4. Journal articles that appear in MEDLINE with abstracts receive almost two more descriptors. 5. The findability of non-English-language documents in MEDLINE is lower than that of English documents. 6. Articles from journals with an impact factor between 0 and 15 do not receive more index terms than those of the other journals covered by MEDLINE. 7. In an indexing system, different journals carry more or less weight with respect to their findability. The distribution of index terms per page showed that MEDLINE has three categories of publications; in addition, there are a few strongly preferred journals.
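    Both numerical claims in point 1 follow directly from the quoted equation; a quick check (assuming the formula exactly as quoted):
      import math

      # Growth formula from the abstract:
      # T(d) = 3076.6*ln(d) - 22695 + 0.0039*d  (T = terms, d = documents)
      def T(d):
          return 3076.6 * math.log(d) - 22695 + 0.0039 * d

      print(T(1600))     # ~ +10, i.e. T reaches zero near d = 1,600

      # Marginal growth dT/dd = 3076.6/d + 0.0039 tends to 0.0039 for
      # large d, i.e. about one new term per 1/0.0039 = 256 documents.
      print(1 / 0.0039)  # ~ 256.4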
    Footnote
    Dissertation, Humboldt-Universität zu Berlin - Institut für Bibliotheks- und Informationswissenschaft.
    Imprint
    Berlin : Humboldt-Universität zu Berlin / Institut für Bibliotheks- und Informationswissenschaft
    Pages
    128 S
    Theme
    Konzeption und Anwendung des Prinzips Thesaurus
  3. Junger, U.: Can indexing be automated? : the example of the Deutsche Nationalbibliothek (2014) 0.01
    0.0059930957 = product of:
      0.059930958 = sum of:
        0.0053462577 = weight(_text_:in in 1969) [ClassicSimilarity], result of:
          0.0053462577 = score(doc=1969,freq=6.0), product of:
            0.029340398 = queryWeight, product of:
              1.3602545 = idf(docFreq=30841, maxDocs=44218)
              0.021569785 = queryNorm
            0.1822149 = fieldWeight in 1969, product of:
              2.4494898 = tf(freq=6.0), with freq of:
                6.0 = termFreq=6.0
              1.3602545 = idf(docFreq=30841, maxDocs=44218)
              0.0546875 = fieldNorm(doc=1969)
        0.05261274 = weight(_text_:deutsche in 1969) [ClassicSimilarity], result of:
          0.05261274 = score(doc=1969,freq=4.0), product of:
            0.10186133 = queryWeight, product of:
              4.7224083 = idf(docFreq=1068, maxDocs=44218)
              0.021569785 = queryNorm
            0.5165134 = fieldWeight in 1969, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              4.7224083 = idf(docFreq=1068, maxDocs=44218)
              0.0546875 = fieldNorm(doc=1969)
        0.0019719584 = weight(_text_:s in 1969) [ClassicSimilarity], result of:
          0.0019719584 = score(doc=1969,freq=2.0), product of:
            0.023451481 = queryWeight, product of:
              1.0872376 = idf(docFreq=40523, maxDocs=44218)
              0.021569785 = queryNorm
            0.08408674 = fieldWeight in 1969, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              1.0872376 = idf(docFreq=40523, maxDocs=44218)
              0.0546875 = fieldNorm(doc=1969)
      0.1 = coord(3/30)
    
    Abstract
    The German Integrated Authority File (Gemeinsame Normdatei, GND) provides a broad controlled vocabulary for indexing documents on all subjects. While the GND has traditionally been used for intellectual subject cataloging, primarily of books, the Deutsche Nationalbibliothek (DNB, German National Library) has been working on developing and implementing procedures for automated assignment of subject headings for online publications. This project, its results, and problems are outlined in this article.
    Footnote
    Contribution to a special issue "Beyond libraries: Subject metadata in the digital environment and Semantic Web" - contains the papers of the IFLA Satellite Post-Conference of the same name, 17-18 August 2012, Tallinn.
    Source
    Cataloging and classification quarterly. 52(2014) no.1, S.102-109
  4. Junger, U.: Can indexing be automated? : the example of the Deutsche Nationalbibliothek (2012) 0.01
    0.00589499 = product of:
      0.058949903 = sum of:
        0.004365201 = weight(_text_:in in 1717) [ClassicSimilarity], result of:
          0.004365201 = score(doc=1717,freq=4.0), product of:
            0.029340398 = queryWeight, product of:
              1.3602545 = idf(docFreq=30841, maxDocs=44218)
              0.021569785 = queryNorm
            0.14877784 = fieldWeight in 1717, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              1.3602545 = idf(docFreq=30841, maxDocs=44218)
              0.0546875 = fieldNorm(doc=1717)
        0.05261274 = weight(_text_:deutsche in 1717) [ClassicSimilarity], result of:
          0.05261274 = score(doc=1717,freq=4.0), product of:
            0.10186133 = queryWeight, product of:
              4.7224083 = idf(docFreq=1068, maxDocs=44218)
              0.021569785 = queryNorm
            0.5165134 = fieldWeight in 1717, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              4.7224083 = idf(docFreq=1068, maxDocs=44218)
              0.0546875 = fieldNorm(doc=1717)
        0.0019719584 = weight(_text_:s in 1717) [ClassicSimilarity], result of:
          0.0019719584 = score(doc=1717,freq=2.0), product of:
            0.023451481 = queryWeight, product of:
              1.0872376 = idf(docFreq=40523, maxDocs=44218)
              0.021569785 = queryNorm
            0.08408674 = fieldWeight in 1717, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              1.0872376 = idf(docFreq=40523, maxDocs=44218)
              0.0546875 = fieldNorm(doc=1717)
      0.1 = coord(3/30)
    
    Abstract
    The German subject headings authority file (Schlagwortnormdatei, SWD) provides a broad controlled vocabulary for indexing documents on all subjects. While the SWD has traditionally been used for intellectual subject cataloguing, primarily of books, the Deutsche Nationalbibliothek (DNB, German National Library) has been working on developing and implementing procedures for automated assignment of subject headings for online publications. This project, its results, and problems are sketched in the paper.
    Content
    Paper for the conference "Beyond libraries - subject metadata in the digital environment and semantic web", IFLA Satellite Post-Conference, 17-18 August 2012, Tallinn. See: http://www.nlib.ee/index.php?id=17763.
    Source
    Cataloguing & Classification Quarterly 52(2014) no.1, S.102-109
  5. Siebenkäs, A.; Markscheffel, B.: Conception of a workflow for the semi-automatic construction of a thesaurus for the German printing industry (2015) 0.00
    0.0030302042 = product of:
      0.02272653 = sum of:
        0.008194685 = weight(_text_:und in 2091) [ClassicSimilarity], result of:
          0.008194685 = score(doc=2091,freq=2.0), product of:
            0.04780656 = queryWeight, product of:
              2.216367 = idf(docFreq=13101, maxDocs=44218)
              0.021569785 = queryNorm
            0.17141339 = fieldWeight in 2091, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              2.216367 = idf(docFreq=13101, maxDocs=44218)
              0.0546875 = fieldNorm(doc=2091)
        0.004365201 = weight(_text_:in in 2091) [ClassicSimilarity], result of:
          0.004365201 = score(doc=2091,freq=4.0), product of:
            0.029340398 = queryWeight, product of:
              1.3602545 = idf(docFreq=30841, maxDocs=44218)
              0.021569785 = queryNorm
            0.14877784 = fieldWeight in 2091, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              1.3602545 = idf(docFreq=30841, maxDocs=44218)
              0.0546875 = fieldNorm(doc=2091)
        0.008194685 = weight(_text_:und in 2091) [ClassicSimilarity], result of:
          0.008194685 = score(doc=2091,freq=2.0), product of:
            0.04780656 = queryWeight, product of:
              2.216367 = idf(docFreq=13101, maxDocs=44218)
              0.021569785 = queryNorm
            0.17141339 = fieldWeight in 2091, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              2.216367 = idf(docFreq=13101, maxDocs=44218)
              0.0546875 = fieldNorm(doc=2091)
        0.0019719584 = weight(_text_:s in 2091) [ClassicSimilarity], result of:
          0.0019719584 = score(doc=2091,freq=2.0), product of:
            0.023451481 = queryWeight, product of:
              1.0872376 = idf(docFreq=40523, maxDocs=44218)
              0.021569785 = queryNorm
            0.08408674 = fieldWeight in 2091, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              1.0872376 = idf(docFreq=40523, maxDocs=44218)
              0.0546875 = fieldNorm(doc=2091)
      0.13333334 = coord(4/30)
    
    Abstract
    During the BMWI-funded project "Print-IT", the need for a uniform and consistent thesaurus-based vocabulary for the German printing industry became evident. In this paper we introduce a semi-automatic construction approach for such a thesaurus and present a workflow which supports users in generating thesaurus-typical information structures from relevant digitized resources with the help of common IT tools.
    Pages
    S.217-229
    Source
    Re:inventing information science in the networked society: Proceedings of the 14th International Symposium on Information Science, Zadar/Croatia, 19th-21st May 2015. Eds.: F. Pehar, C. Schloegl and C. Wolff
    Theme
    Konzeption und Anwendung des Prinzips Thesaurus
  6. Willis, C.; Losee, R.M.: ¬A random walk on an ontology : using thesaurus structure for automatic subject indexing (2013) 0.00
    0.0022048783 = product of:
      0.016536586 = sum of:
        0.004682677 = weight(_text_:und in 1016) [ClassicSimilarity], result of:
          0.004682677 = score(doc=1016,freq=2.0), product of:
            0.04780656 = queryWeight, product of:
              2.216367 = idf(docFreq=13101, maxDocs=44218)
              0.021569785 = queryNorm
            0.09795051 = fieldWeight in 1016, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              2.216367 = idf(docFreq=13101, maxDocs=44218)
              0.03125 = fieldNorm(doc=1016)
        0.0055776495 = weight(_text_:in in 1016) [ClassicSimilarity], result of:
          0.0055776495 = score(doc=1016,freq=20.0), product of:
            0.029340398 = queryWeight, product of:
              1.3602545 = idf(docFreq=30841, maxDocs=44218)
              0.021569785 = queryNorm
            0.19010136 = fieldWeight in 1016, product of:
              4.472136 = tf(freq=20.0), with freq of:
                20.0 = termFreq=20.0
              1.3602545 = idf(docFreq=30841, maxDocs=44218)
              0.03125 = fieldNorm(doc=1016)
        0.004682677 = weight(_text_:und in 1016) [ClassicSimilarity], result of:
          0.004682677 = score(doc=1016,freq=2.0), product of:
            0.04780656 = queryWeight, product of:
              2.216367 = idf(docFreq=13101, maxDocs=44218)
              0.021569785 = queryNorm
            0.09795051 = fieldWeight in 1016, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              2.216367 = idf(docFreq=13101, maxDocs=44218)
              0.03125 = fieldNorm(doc=1016)
        0.0015935833 = weight(_text_:s in 1016) [ClassicSimilarity], result of:
          0.0015935833 = score(doc=1016,freq=4.0), product of:
            0.023451481 = queryWeight, product of:
              1.0872376 = idf(docFreq=40523, maxDocs=44218)
              0.021569785 = queryNorm
            0.06795235 = fieldWeight in 1016, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              1.0872376 = idf(docFreq=40523, maxDocs=44218)
              0.03125 = fieldNorm(doc=1016)
      0.13333334 = coord(4/30)
    
    Abstract
    Relationships between terms and features are an essential component of thesauri, ontologies, and a range of controlled vocabularies. In this article, we describe ways to identify important concepts in documents using the relationships in a thesaurus or other vocabulary structures. We introduce a methodology for the analysis and modeling of the indexing process based on a weighted random walk algorithm. The primary goal of this research is the analysis of the contribution of thesaurus structure to the indexing process. The resulting models are evaluated in the context of automatic subject indexing using four collections of documents pre-indexed with four different thesauri (AGROVOC [UN Food and Agriculture Organization], high-energy physics taxonomy [HEP], National Agricultural Library Thesaurus [NALT], and Medical Subject Headings [MeSH]). We also introduce a thesaurus-centric matching algorithm intended to improve the quality of candidate concepts. In all cases, the weighted random walk improves automatic indexing performance over matching alone, with an increase in average precision (AP) of 9% for HEP, 11% for MeSH, 35% for NALT, and 37% for AGROVOC. The results of the analysis support our hypothesis that subject indexing is in part a browsing process, and that using the vocabulary and its structure in a thesaurus contributes to the indexing process. The amount that the vocabulary structure contributes was found to differ among the four thesauri, possibly due to the vocabulary used in the corresponding thesauri and the structural relationships between the terms. Each of the thesauri and the manual indexing associated with it is characterized using the methods developed here.
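    The core idea, scoring candidate index terms by walking thesaurus relations outward from the terms matched in a document, can be sketched in a few lines. The toy graph, edge weights, and restart probability below are illustrative assumptions, not the authors' actual configuration:
      import random
      from collections import defaultdict

      edges = {  # term -> [(neighbor, weight)]: a toy thesaurus fragment
          "indexing": [("subject indexing", 1.0), ("cataloging", 0.5)],
          "subject indexing": [("indexing", 1.0), ("thesaurus", 0.8)],
          "cataloging": [("indexing", 0.5)],
          "thesaurus": [("subject indexing", 0.8), ("controlled vocabulary", 1.0)],
          "controlled vocabulary": [("thesaurus", 1.0)],
      }

      def walk_scores(seeds, steps=10000, restart=0.3):
          # Visit counts approximate each term's relevance to the seeds.
          visits = defaultdict(int)
          node = random.choice(seeds)
          for _ in range(steps):
              if random.random() < restart or not edges.get(node):
                  node = random.choice(seeds)   # restart at a matched term
              else:
                  nbrs, ws = zip(*edges[node])
                  node = random.choices(nbrs, weights=ws)[0]
              visits[node] += 1
          return sorted(visits.items(), key=lambda kv: -kv[1])

      print(walk_scores(["indexing", "thesaurus"]))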
    Content
    Correction of a reference in: JASIST 64(2013) no.8, S.1757.
    Source
    Journal of the American Society for Information Science and Technology. 64(2013) no.7, S.1330-1344
    Theme
    Konzeption und Anwendung des Prinzips Thesaurus
  7. Stankovic, R. et al.: Indexing of textual databases based on lexical resources : a case study for Serbian (2016) 0.00
    0.0021838644 = product of:
      0.021838643 = sum of:
        0.004409519 = weight(_text_:in in 2759) [ClassicSimilarity], result of:
          0.004409519 = score(doc=2759,freq=2.0), product of:
            0.029340398 = queryWeight, product of:
              1.3602545 = idf(docFreq=30841, maxDocs=44218)
              0.021569785 = queryNorm
            0.15028831 = fieldWeight in 2759, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              1.3602545 = idf(docFreq=30841, maxDocs=44218)
              0.078125 = fieldNorm(doc=2759)
        0.0028170836 = weight(_text_:s in 2759) [ClassicSimilarity], result of:
          0.0028170836 = score(doc=2759,freq=2.0), product of:
            0.023451481 = queryWeight, product of:
              1.0872376 = idf(docFreq=40523, maxDocs=44218)
              0.021569785 = queryNorm
            0.120123915 = fieldWeight in 2759, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              1.0872376 = idf(docFreq=40523, maxDocs=44218)
              0.078125 = fieldNorm(doc=2759)
        0.0146120405 = product of:
          0.029224081 = sum of:
            0.029224081 = weight(_text_:22 in 2759) [ClassicSimilarity], result of:
              0.029224081 = score(doc=2759,freq=2.0), product of:
                0.07553371 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.021569785 = queryNorm
                0.38690117 = fieldWeight in 2759, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.078125 = fieldNorm(doc=2759)
          0.5 = coord(1/2)
      0.1 = coord(3/30)
    
    Date
    1. 2.2016 18:25:22
    Pages
    S.167-181
    Series
    Lecture notes in computer science ; 9398
  8. Karpathy, A.; Fei-Fei, L.: Deep visual-semantic alignments for generating image descriptions (2015) 0.00
    0.0021577757 = product of:
      0.021577757 = sum of:
        0.0070240153 = weight(_text_:und in 1868) [ClassicSimilarity], result of:
          0.0070240153 = score(doc=1868,freq=2.0), product of:
            0.04780656 = queryWeight, product of:
              2.216367 = idf(docFreq=13101, maxDocs=44218)
              0.021569785 = queryNorm
            0.14692576 = fieldWeight in 1868, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              2.216367 = idf(docFreq=13101, maxDocs=44218)
              0.046875 = fieldNorm(doc=1868)
        0.007529726 = product of:
          0.022589177 = sum of:
            0.022589177 = weight(_text_:l in 1868) [ClassicSimilarity], result of:
              0.022589177 = score(doc=1868,freq=2.0), product of:
                0.0857324 = queryWeight, product of:
                  3.9746525 = idf(docFreq=2257, maxDocs=44218)
                  0.021569785 = queryNorm
                0.26348472 = fieldWeight in 1868, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.9746525 = idf(docFreq=2257, maxDocs=44218)
                  0.046875 = fieldNorm(doc=1868)
          0.33333334 = coord(1/3)
        0.0070240153 = weight(_text_:und in 1868) [ClassicSimilarity], result of:
          0.0070240153 = score(doc=1868,freq=2.0), product of:
            0.04780656 = queryWeight, product of:
              2.216367 = idf(docFreq=13101, maxDocs=44218)
              0.021569785 = queryNorm
            0.14692576 = fieldWeight in 1868, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              2.216367 = idf(docFreq=13101, maxDocs=44218)
              0.046875 = fieldNorm(doc=1868)
      0.1 = coord(3/30)
    
    Content
    See also: http://cs.stanford.edu/people/karpathy/cvpr2015.pdf, http://cs.stanford.edu/people/karpathy/deepimagesent/ and https://news.ycombinator.com/item?id=8621658.
  9. Mesquita, L.A.P.; Souza, R.R.; Baracho Porto, R.M.A.: Noun phrases in automatic indexing: : a structural analysis of the distribution of relevant terms in doctoral theses (2014) 0.00
    0.0013331149 = product of:
      0.013331149 = sum of:
        0.0063594985 = weight(_text_:in in 1442) [ClassicSimilarity], result of:
          0.0063594985 = score(doc=1442,freq=26.0), product of:
            0.029340398 = queryWeight, product of:
              1.3602545 = idf(docFreq=30841, maxDocs=44218)
              0.021569785 = queryNorm
            0.2167489 = fieldWeight in 1442, product of:
              5.0990195 = tf(freq=26.0), with freq of:
                26.0 = termFreq=26.0
              1.3602545 = idf(docFreq=30841, maxDocs=44218)
              0.03125 = fieldNorm(doc=1442)
        0.0011268335 = weight(_text_:s in 1442) [ClassicSimilarity], result of:
          0.0011268335 = score(doc=1442,freq=2.0), product of:
            0.023451481 = queryWeight, product of:
              1.0872376 = idf(docFreq=40523, maxDocs=44218)
              0.021569785 = queryNorm
            0.048049565 = fieldWeight in 1442, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              1.0872376 = idf(docFreq=40523, maxDocs=44218)
              0.03125 = fieldNorm(doc=1442)
        0.005844816 = product of:
          0.011689632 = sum of:
            0.011689632 = weight(_text_:22 in 1442) [ClassicSimilarity], result of:
              0.011689632 = score(doc=1442,freq=2.0), product of:
                0.07553371 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.021569785 = queryNorm
                0.15476047 = fieldWeight in 1442, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.03125 = fieldNorm(doc=1442)
          0.5 = coord(1/2)
      0.1 = coord(3/30)
    
    Abstract
    The main objective of this research was to analyze whether relevant terms show a characteristic distribution behavior over a scientific text that could serve as a criterion for their automatic indexing. The terms considered in this study were only full noun phrases contained in the texts themselves. The corpus comprised 98 doctoral theses from the eight areas of knowledge of a single university. Initially, 20 full noun phrases were automatically extracted from each text as candidates for its most relevant terms, and the author of each text assigned each of the 20 noun phrases a relevance value from 0 (not relevant) to 6 (highly relevant). Only 22.1% of the noun phrases were considered not relevant. The relevance values assigned by the authors were then associated with the terms' positions in the text, each full noun phrase found in the text counting as a valid linear position. The results show the values of this distribution for two types of position: linear, with values consolidated into ten equal consecutive parts, and structural, considering parts of the text (such as introduction, development, and conclusion). Notably, all areas of knowledge belonging to the Natural Sciences showed one characteristic distribution behavior of relevant terms, while all areas belonging to the Social Sciences showed another characteristic behavior, distinct from that of the Natural Sciences; the difference between the two can be clearly visualized in graphs. All behaviors, including the overall behavior of all areas of knowledge together, were characterized by polynomial equations and can be applied in the future as criteria for automatic indexing. To date this work is novel for two reasons: it presents a method for characterizing the distribution of relevant terms in a scientific text, and, through this method, it points out a quantitative difference in behavior between the Natural and Social Sciences.
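    The final step, characterizing a positional distribution by a polynomial equation, amounts to an ordinary least-squares polynomial fit. A sketch with invented decile shares (the real curves come from the 98 theses):
      import numpy as np

      # Share of relevant noun phrases per tenth of the text
      # (illustrative values only).
      deciles = np.arange(1, 11)
      share = np.array([.18, .14, .10, .08, .07, .07, .08, .08, .09, .11])

      coeffs = np.polyfit(deciles, share, deg=3)  # characteristic curve
      print(np.poly1d(coeffs))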
    Pages
    S.327-334
    Series
    Advances in knowledge organization; vol. 14
    Source
    Knowledge organization in the 21st century: between historical patterns and future prospects. Proceedings of the Thirteenth International ISKO Conference 19-22 May 2014, Kraków, Poland. Ed.: Wieslaw Babik
  10. Blank, I.; Rokach, L.; Shani, G.: Leveraging metadata to recommend keywords for academic papers (2016) 0.00
    0.0012613306 = product of:
      0.012613306 = sum of:
        0.004929992 = weight(_text_:in in 3232) [ClassicSimilarity], result of:
          0.004929992 = score(doc=3232,freq=10.0), product of:
            0.029340398 = queryWeight, product of:
              1.3602545 = idf(docFreq=30841, maxDocs=44218)
              0.021569785 = queryNorm
            0.16802745 = fieldWeight in 3232, product of:
              3.1622777 = tf(freq=10.0), with freq of:
                10.0 = termFreq=10.0
              1.3602545 = idf(docFreq=30841, maxDocs=44218)
              0.0390625 = fieldNorm(doc=3232)
        0.006274772 = product of:
          0.018824315 = sum of:
            0.018824315 = weight(_text_:l in 3232) [ClassicSimilarity], result of:
              0.018824315 = score(doc=3232,freq=2.0), product of:
                0.0857324 = queryWeight, product of:
                  3.9746525 = idf(docFreq=2257, maxDocs=44218)
                  0.021569785 = queryNorm
                0.2195706 = fieldWeight in 3232, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.9746525 = idf(docFreq=2257, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=3232)
          0.33333334 = coord(1/3)
        0.0014085418 = weight(_text_:s in 3232) [ClassicSimilarity], result of:
          0.0014085418 = score(doc=3232,freq=2.0), product of:
            0.023451481 = queryWeight, product of:
              1.0872376 = idf(docFreq=40523, maxDocs=44218)
              0.021569785 = queryNorm
            0.060061958 = fieldWeight in 3232, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              1.0872376 = idf(docFreq=40523, maxDocs=44218)
              0.0390625 = fieldNorm(doc=3232)
      0.1 = coord(3/30)
    
    Abstract
    Users of research databases, such as CiteSeerX, Google Scholar, and Microsoft Academic, often search for papers using a set of keywords. Unfortunately, many authors avoid listing sufficient keywords for their papers. As such, these applications may need to automatically associate good descriptive keywords with papers. When the full text of the paper is available, this problem has been thoroughly studied. In many cases, however, due to copyright limitations, research databases do not have access to the full text. On the other hand, such databases typically maintain metadata, such as the title and abstract and the citation network of each paper. In this paper we study the problem of predicting which keywords are appropriate for a research paper, using different methods based on the citation network and available metadata. Our main goal is to provide search engines with the ability to extract keywords from the available metadata. However, our system can also be used for other applications, such as recommending keywords for the authors of new papers. We create a data set of research papers, and their citation network, keywords, and other metadata, containing over 470K papers and more than 2 million keywords. We compare our methods with predicting keywords using the title and abstract, in offline experiments and in a user study, concluding that the citation network provides much better predictions.
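    A minimal sketch of the citation-network signal the authors exploit: recommend the keywords that occur most often among a paper's neighbors in the citation graph. Papers, links, and keywords below are invented for illustration:
      from collections import Counter

      keywords = {
          "p1": {"information retrieval", "indexing"},
          "p2": {"indexing", "machine learning"},
          "p3": {"machine learning", "neural networks"},
      }
      cites = {"new_paper": ["p1", "p2", "p3"]}

      def recommend(paper, k=2):
          # Rank keywords by frequency among the cited neighbors.
          counts = Counter(kw for ref in cites[paper] for kw in keywords[ref])
          return [kw for kw, _ in counts.most_common(k)]

      print(recommend("new_paper"))  # the two most frequent neighbor keywords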
    Source
    Journal of the Association for Information Science and Technology. 67(2016) no.12, S.3073-3091
  11. Greiner-Petter, A.; Schubotz, M.; Cohl, H.S.; Gipp, B.: Semantic preserving bijective mappings for expressions involving special functions between computer algebra systems and document preparation systems (2019) 0.00
    0.0012263072 = product of:
      0.012263073 = sum of:
        0.0052914224 = weight(_text_:in in 5499) [ClassicSimilarity], result of:
          0.0052914224 = score(doc=5499,freq=18.0), product of:
            0.029340398 = queryWeight, product of:
              1.3602545 = idf(docFreq=30841, maxDocs=44218)
              0.021569785 = queryNorm
            0.18034597 = fieldWeight in 5499, product of:
              4.2426405 = tf(freq=18.0), with freq of:
                18.0 = termFreq=18.0
              1.3602545 = idf(docFreq=30841, maxDocs=44218)
              0.03125 = fieldNorm(doc=5499)
        0.0011268335 = weight(_text_:s in 5499) [ClassicSimilarity], result of:
          0.0011268335 = score(doc=5499,freq=2.0), product of:
            0.023451481 = queryWeight, product of:
              1.0872376 = idf(docFreq=40523, maxDocs=44218)
              0.021569785 = queryNorm
            0.048049565 = fieldWeight in 5499, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              1.0872376 = idf(docFreq=40523, maxDocs=44218)
              0.03125 = fieldNorm(doc=5499)
        0.005844816 = product of:
          0.011689632 = sum of:
            0.011689632 = weight(_text_:22 in 5499) [ClassicSimilarity], result of:
              0.011689632 = score(doc=5499,freq=2.0), product of:
                0.07553371 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.021569785 = queryNorm
                0.15476047 = fieldWeight in 5499, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.03125 = fieldNorm(doc=5499)
          0.5 = coord(1/2)
      0.1 = coord(3/30)
    
    Abstract
    Purpose: Modern mathematicians and scientists of math-related disciplines often use Document Preparation Systems (DPS) to write and Computer Algebra Systems (CAS) to calculate mathematical expressions. Usually, they translate the expressions manually between DPS and CAS. This process is time-consuming and error-prone. The purpose of this paper is to automate this translation. This paper uses Maple and Mathematica as the CAS, and LaTeX as the DPS.
    Design/methodology/approach: Bruce Miller at the National Institute of Standards and Technology (NIST) developed a collection of special LaTeX macros that create links from mathematical symbols to their definitions in the NIST Digital Library of Mathematical Functions (DLMF). The authors are using these macros to perform rule-based translations between the formulae in the DLMF and CAS. Moreover, the authors develop software to ease the creation of new rules and to discover inconsistencies.
    Findings: The authors created 396 mappings and translated 58.8 percent of DLMF formulae (2,405 expressions) successfully between Maple and DLMF. For a significant percentage, the special function definitions in Maple and the DLMF were different. An atomic symbol in one system maps to a composite expression in the other system. The translator was also successfully used for automatic verification of mathematical online compendia and CAS. The evaluation techniques discovered two errors in the DLMF and one defect in Maple.
    Originality/value: This paper introduces the first translation tool for special functions between LaTeX and CAS. The approach improves error-prone manual translations and can be used to verify mathematical online compendia and CAS.
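    The rule-based translation described under Findings can be illustrated with two invented rewrite rules; the macro names and CAS syntax below are assumptions for demonstration, not actual DLMF/Maple mappings:
      import re

      # Each rule maps a DLMF-style semantic LaTeX macro to a CAS call.
      rules = [
          (r"\\EulerGamma@\{([^}]*)\}", r"GAMMA(\1)"),
          (r"\\BesselJ\{([^}]*)\}@\{([^}]*)\}", r"BesselJ(\1, \2)"),
      ]

      def latex_to_cas(expr):
          for pattern, repl in rules:
              expr = re.sub(pattern, repl, expr)
          return expr

      print(latex_to_cas(r"\EulerGamma@{z} + \BesselJ{0}@{x}"))
      # -> GAMMA(z) + BesselJ(0, x)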
    Date
    20. 1.2015 18:30:22
    Footnote
    Contribution to a special issue: Information Science in the German-speaking Countries.
    Source
    Aslib journal of information management. 71(2019) no.3, S.415-439
  12. Martins, A.L.; Souza, R.R.; Ribeiro de Mello, H.: ¬The use of noun phrases in information retrieval : proposing a mechanism for automatic classification (2014) 0.00
    0.0011638247 = product of:
      0.0116382465 = sum of:
        0.0046665967 = weight(_text_:in in 1441) [ClassicSimilarity], result of:
          0.0046665967 = score(doc=1441,freq=14.0), product of:
            0.029340398 = queryWeight, product of:
              1.3602545 = idf(docFreq=30841, maxDocs=44218)
              0.021569785 = queryNorm
            0.15905021 = fieldWeight in 1441, product of:
              3.7416575 = tf(freq=14.0), with freq of:
                14.0 = termFreq=14.0
              1.3602545 = idf(docFreq=30841, maxDocs=44218)
              0.03125 = fieldNorm(doc=1441)
        0.0011268335 = weight(_text_:s in 1441) [ClassicSimilarity], result of:
          0.0011268335 = score(doc=1441,freq=2.0), product of:
            0.023451481 = queryWeight, product of:
              1.0872376 = idf(docFreq=40523, maxDocs=44218)
              0.021569785 = queryNorm
            0.048049565 = fieldWeight in 1441, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              1.0872376 = idf(docFreq=40523, maxDocs=44218)
              0.03125 = fieldNorm(doc=1441)
        0.005844816 = product of:
          0.011689632 = sum of:
            0.011689632 = weight(_text_:22 in 1441) [ClassicSimilarity], result of:
              0.011689632 = score(doc=1441,freq=2.0), product of:
                0.07553371 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.021569785 = queryNorm
                0.15476047 = fieldWeight in 1441, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.03125 = fieldNorm(doc=1441)
          0.5 = coord(1/2)
      0.1 = coord(3/30)
    
    Abstract
    This paper presents research on syntactic structures known as noun phrases (NPs) applied to increase the effectiveness and efficiency of document classification mechanisms. Our hypothesis is that NPs can be used instead of single words as semantic aggregators, reducing the number of words used by the classification system without losing semantic coverage and thereby increasing its efficiency. The experiment divided the document classification process into three phases: a) NP preprocessing, b) system training, and c) classification experiments. In the first step, a corpus of digitized texts was submitted to a natural language processing platform where part-of-speech tagging was done, and then Perl scripts from the PALAVRAS package were used to extract the noun phrases. The preprocessing also involved a) removing low-meaning NP pre-modifiers, such as quantifiers; b) identifying synonyms and substituting common hypernyms for them; and c) stemming the relevant words contained in the NPs, for similarity checking against other NPs. The first tests on the resulting documents demonstrated the method's effectiveness: we compared the structural similarity of the documents before and after the preprocessing steps of phase one, and the texts remained consistent with the originals and kept their readability. The second phase involves submitting the modified documents to an SVM algorithm to identify clusters and classify the documents, with classification rules established using a machine learning approach. Finally, tests will be conducted to check the effectiveness of the whole process.
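    Phases (b) and (c) correspond to a standard supervised text-classification setup. A sketch assuming scikit-learn, with placeholder NP-reduced documents and labels standing in for the output of the paper's phase (a):
      from sklearn.feature_extraction.text import TfidfVectorizer
      from sklearn.pipeline import make_pipeline
      from sklearn.svm import LinearSVC

      # Documents reduced to their noun phrases (illustrative data).
      docs = ["noun phrase extraction document classification",
              "semantic aggregator classification system",
              "image retrieval pixel data"]
      labels = ["library science", "library science", "computer vision"]

      model = make_pipeline(TfidfVectorizer(), LinearSVC())
      model.fit(docs, labels)
      print(model.predict(["noun phrase classification"]))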
    Pages
    S.320-326
    Series
    Advances in knowledge organization; vol. 14
    Source
    Knowledge organization in the 21st century: between historical patterns and future prospects. Proceedings of the Thirteenth International ISKO Conference 19-22 May 2014, Kraków, Poland. Ed.: Wieslaw Babik
  13. Markoff, J.: Researchers announce advance in image-recognition software (2014) 0.00
    9.012275E-4 = product of:
      0.009012275 = sum of:
        0.005170619 = weight(_text_:in in 1875) [ClassicSimilarity], result of:
          0.005170619 = score(doc=1875,freq=44.0), product of:
            0.029340398 = queryWeight, product of:
              1.3602545 = idf(docFreq=30841, maxDocs=44218)
              0.021569785 = queryNorm
            0.17622867 = fieldWeight in 1875, product of:
              6.6332498 = tf(freq=44.0), with freq of:
                44.0 = termFreq=44.0
              1.3602545 = idf(docFreq=30841, maxDocs=44218)
              0.01953125 = fieldNorm(doc=1875)
        0.003137386 = product of:
          0.009412157 = sum of:
            0.009412157 = weight(_text_:l in 1875) [ClassicSimilarity], result of:
              0.009412157 = score(doc=1875,freq=2.0), product of:
                0.0857324 = queryWeight, product of:
                  3.9746525 = idf(docFreq=2257, maxDocs=44218)
                  0.021569785 = queryNorm
                0.1097853 = fieldWeight in 1875, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.9746525 = idf(docFreq=2257, maxDocs=44218)
                  0.01953125 = fieldNorm(doc=1875)
          0.33333334 = coord(1/3)
        7.042709E-4 = weight(_text_:s in 1875) [ClassicSimilarity], result of:
          7.042709E-4 = score(doc=1875,freq=2.0), product of:
            0.023451481 = queryWeight, product of:
              1.0872376 = idf(docFreq=40523, maxDocs=44218)
              0.021569785 = queryNorm
            0.030030979 = fieldWeight in 1875, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              1.0872376 = idf(docFreq=40523, maxDocs=44218)
              0.01953125 = fieldNorm(doc=1875)
      0.1 = coord(3/30)
    
    Content
    "Until now, so-called computer vision has largely been limited to recognizing individual objects. The new software, described on Monday by researchers at Google and at Stanford University, teaches itself to identify entire scenes: a group of young men playing Frisbee, for example, or a herd of elephants marching on a grassy plain. The software then writes a caption in English describing the picture. Compared with human observations, the researchers found, the computer-written descriptions are surprisingly accurate. The advances may make it possible to better catalog and search for the billions of images and hours of video available online, which are often poorly described and archived. At the moment, search engines like Google rely largely on written language accompanying an image or video to ascertain what it contains. "I consider the pixel data in images and video to be the dark matter of the Internet," said Fei-Fei Li, director of the Stanford Artificial Intelligence Laboratory, who led the research with Andrej Karpathy, a graduate student. "We are now starting to illuminate it." Dr. Li and Mr. Karpathy published their research as a Stanford University technical report. The Google team published their paper on arXiv.org, an open source site hosted by Cornell University.
    In the longer term, the new research may lead to technology that helps the blind and robots navigate natural environments. But it also raises chilling possibilities for surveillance. During the past 15 years, video cameras have been placed in a vast number of public and private spaces. In the future, the software operating the cameras will not only be able to identify particular humans via facial recognition, experts say, but also identify certain types of behavior, perhaps even automatically alerting authorities. Two years ago Google researchers created image-recognition software and presented it with 10 million images taken from YouTube videos. Without human guidance, the program trained itself to recognize cats - a testament to the number of cat videos on YouTube. Current artificial intelligence programs in new cars already can identify pedestrians and bicyclists from cameras positioned atop the windshield and can stop the car automatically if the driver does not take action to avoid a collision. But "just single object recognition is not very beneficial," said Ali Farhadi, a computer scientist at the University of Washington who has published research on software that generates sentences from digital pictures. "We've focused on objects, and we've ignored verbs," he said, adding that these programs do not grasp what is going on in an image. Both the Google and Stanford groups tackled the problem by refining software programs known as neural networks, inspired by our understanding of how the brain works. Neural networks can "train" themselves to discover similarities and patterns in data, even when their human creators do not know the patterns exist.
    In living organisms, webs of neurons in the brain vastly outperform even the best computer-based networks in perception and pattern recognition. But by adopting some of the same architecture, computers are catching up, learning to identify patterns in speech and imagery with increasing accuracy. The advances are apparent to consumers who use Apple's Siri personal assistant, for example, or Google's image search. Both groups of researchers employed similar approaches, weaving together two types of neural networks, one focused on recognizing images and the other on human language. In both cases the researchers trained the software with relatively small sets of digital images that had been annotated with descriptive sentences by humans. After the software programs "learned" to see patterns in the pictures and description, the researchers turned them on previously unseen images. The programs were able to identify objects and actions with roughly double the accuracy of earlier efforts, although still nowhere near human perception capabilities. "I was amazed that even with the small amount of training data that we were able to do so well," said Oriol Vinyals, a Google computer scientist who wrote the paper with Alexander Toshev, Samy Bengio and Dumitru Erhan, members of the Google Brain project. "The field is just starting, and we will see a lot of increases."
    Computer vision specialists said that despite the improvements, these software systems had made only limited progress toward the goal of digitally duplicating human vision and, even more elusive, understanding. "I don't know that I would say this is 'understanding' in the sense we want," said John R. Smith, a senior manager at I.B.M.'s T.J. Watson Research Center in Yorktown Heights, N.Y. "I think even the ability to generate language here is very limited." But the Google and Stanford teams said that they expect to see significant increases in accuracy as they improve their software and train these programs with larger sets of annotated images. A research group led by Tamara L. Berg, a computer scientist at the University of North Carolina at Chapel Hill, is training a neural network with one million images annotated by humans. "You're trying to tell the story behind the image," she said. "A natural scene will be very complex, and you want to pick out the most important objects in the image.""
    Footnote
    A version of this article appears in print on November 18, 2014, on page A13 of the New York edition with the headline: Advance Reported in Content-Recognition Software. See: http://cs.stanford.edu/people/karpathy/cvpr2015.pdf. See also: http://googleresearch.blogspot.de/2014/11/a-picture-is-worth-thousand-coherent.html and https://news.ycombinator.com/item?id=8621658.
    Source
    http://www.nytimes.com/2014/11/18/science/researchers-announce-breakthrough-in-content-recognition-software.html
  14. Simões, M. da Graça; Machado, L.M.; Souza, R.R.; Almeida, M.B.; Tavares Lopes, A.: Automatic indexing and ontologies : the consistency of research chronology and authoring in the context of Information Science (2018) 0.00
    6.4605067E-4 = product of:
      0.00969076 = sum of:
        0.006901989 = weight(_text_:in in 5909) [ClassicSimilarity], result of:
          0.006901989 = score(doc=5909,freq=10.0), product of:
            0.029340398 = queryWeight, product of:
              1.3602545 = idf(docFreq=30841, maxDocs=44218)
              0.021569785 = queryNorm
            0.23523843 = fieldWeight in 5909, product of:
              3.1622777 = tf(freq=10.0), with freq of:
                10.0 = termFreq=10.0
              1.3602545 = idf(docFreq=30841, maxDocs=44218)
              0.0546875 = fieldNorm(doc=5909)
        0.0027887707 = weight(_text_:s in 5909) [ClassicSimilarity], result of:
          0.0027887707 = score(doc=5909,freq=4.0), product of:
            0.023451481 = queryWeight, product of:
              1.0872376 = idf(docFreq=40523, maxDocs=44218)
              0.021569785 = queryNorm
            0.118916616 = fieldWeight in 5909, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              1.0872376 = idf(docFreq=40523, maxDocs=44218)
              0.0546875 = fieldNorm(doc=5909)
      0.06666667 = coord(2/30)
    
    Footnote
    Under the title 'Indexação automática e ontologias: identificação dos contributos convergentes na ciência da informação' in: Ciência da Informação 46(2017) no.1, S.152-162.
    Pages
    S.86-94
    Series
    Advances in knowledge organization; vol.16
    Source
    Challenges and opportunities for knowledge organization in the digital age: proceedings of the Fifteenth International ISKO Conference, 9-11 July 2018, Porto, Portugal / organized by: International Society for Knowledge Organization (ISKO), ISKO Spain and Portugal Chapter, University of Porto - Faculty of Arts and Humanities, Research Centre in Communication, Information and Digital Culture (CIC.digital) - Porto. Eds.: F. Ribeiro and M.E. Cerveira
  15. Benson, A.C.: Image descriptions and their relational expressions : a review of the literature and the issues (2015) 0.00
    6.4182567E-4 = product of:
      0.009627384 = sum of:
        0.007937134 = weight(_text_:in in 1867) [ClassicSimilarity], result of:
          0.007937134 = score(doc=1867,freq=18.0), product of:
            0.029340398 = queryWeight, product of:
              1.3602545 = idf(docFreq=30841, maxDocs=44218)
              0.021569785 = queryNorm
            0.27051896 = fieldWeight in 1867, product of:
              4.2426405 = tf(freq=18.0), with freq of:
                18.0 = termFreq=18.0
              1.3602545 = idf(docFreq=30841, maxDocs=44218)
              0.046875 = fieldNorm(doc=1867)
        0.0016902501 = weight(_text_:s in 1867) [ClassicSimilarity], result of:
          0.0016902501 = score(doc=1867,freq=2.0), product of:
            0.023451481 = queryWeight, product of:
              1.0872376 = idf(docFreq=40523, maxDocs=44218)
              0.021569785 = queryNorm
            0.072074346 = fieldWeight in 1867, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              1.0872376 = idf(docFreq=40523, maxDocs=44218)
              0.046875 = fieldNorm(doc=1867)
      0.06666667 = coord(2/30)
    
    Abstract
    Purpose - The purpose of this paper is to survey the treatment of relationships, relationship expressions and the ways in which they manifest themselves in image descriptions. Design/methodology/approach - The term "relationship" is construed in the broadest possible way to include spatial relationships ("to the right of"), temporal ("in 1936," "at noon"), meronymic ("part of"), and attributive ("has color," "has dimension"). The interaction of these vaguely delimited categories with image information, image creation, and description in libraries and archives is complex and in need of explanation. Findings - The review brings into question many generally held beliefs about the relationship problem, such as the belief that the semantics of relationships are somehow embedded in the relationship term itself and that image search and retrieval solutions can be found through refinement of word-matching systems. Originality/value - This review makes no claim to systematically examine all evidence in all disciplines pertaining to this topic. It instead focuses on a general description of a theoretical treatment in Library and Information Science.
    Source
    Journal of documentation. 71(2015) no.1, S.143-164
  16. Keller, A.: Attitudes among German- and English-speaking librarians toward (automatic) subject indexing (2015) 0.00
    6.355139E-4 = product of:
      0.009532709 = sum of:
        0.00756075 = weight(_text_:in in 2629) [ClassicSimilarity], result of:
          0.00756075 = score(doc=2629,freq=12.0), product of:
            0.029340398 = queryWeight, product of:
              1.3602545 = idf(docFreq=30841, maxDocs=44218)
              0.021569785 = queryNorm
            0.2576908 = fieldWeight in 2629, product of:
              3.4641016 = tf(freq=12.0), with freq of:
                12.0 = termFreq=12.0
              1.3602545 = idf(docFreq=30841, maxDocs=44218)
              0.0546875 = fieldNorm(doc=2629)
        0.0019719584 = weight(_text_:s in 2629) [ClassicSimilarity], result of:
          0.0019719584 = score(doc=2629,freq=2.0), product of:
            0.023451481 = queryWeight, product of:
              1.0872376 = idf(docFreq=40523, maxDocs=44218)
              0.021569785 = queryNorm
            0.08408674 = fieldWeight in 2629, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              1.0872376 = idf(docFreq=40523, maxDocs=44218)
              0.0546875 = fieldNorm(doc=2629)
      0.06666667 = coord(2/30)
    
    Abstract
    The survey described in this article investigates the attitudes of librarians in German- and English-speaking countries toward subject indexing in general, and automatic subject indexing in particular. The results show great similarity between attitudes in both language areas. Respondents agree that the current quality standards should be upheld and dismiss critical voices claiming that subject indexing has lost relevance. With regard to automatic subject indexing, respondents demonstrate considerable skepticism, both with regard to the likely timeframe and the expected quality of such systems. The author considers how this low acceptance poses a difficulty for those involved in change management.
    Source
    Cataloging and classification quarterly. 53(2015) no.8, S.895-904
  17. Flores, F.N.; Moreira, V.P.: Assessing the impact of stemming accuracy on information retrieval : a multilingual perspective (2016) 0.00
    5.79343E-4 = product of:
      0.008690145 = sum of:
        0.0069998945 = weight(_text_:in in 3187) [ClassicSimilarity], result of:
          0.0069998945 = score(doc=3187,freq=14.0), product of:
            0.029340398 = queryWeight, product of:
              1.3602545 = idf(docFreq=30841, maxDocs=44218)
              0.021569785 = queryNorm
            0.23857531 = fieldWeight in 3187, product of:
              3.7416575 = tf(freq=14.0), with freq of:
                14.0 = termFreq=14.0
              1.3602545 = idf(docFreq=30841, maxDocs=44218)
              0.046875 = fieldNorm(doc=3187)
        0.0016902501 = weight(_text_:s in 3187) [ClassicSimilarity], result of:
          0.0016902501 = score(doc=3187,freq=2.0), product of:
            0.023451481 = queryWeight, product of:
              1.0872376 = idf(docFreq=40523, maxDocs=44218)
              0.021569785 = queryNorm
            0.072074346 = fieldWeight in 3187, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              1.0872376 = idf(docFreq=40523, maxDocs=44218)
              0.046875 = fieldNorm(doc=3187)
      0.06666667 = coord(2/30)
    
    Abstract
    The quality of stemming algorithms is typically measured in two different ways: (i) how accurately they map the variant forms of a word to the same stem; or (ii) how much improvement they bring to Information Retrieval systems. In this article, we evaluate various stemming algorithms, in four languages, in terms of accuracy and in terms of their aid to Information Retrieval. The aim is to assess whether the most accurate stemmers are also the ones that bring the biggest gain in Information Retrieval. Experiments in English, French, Portuguese, and Spanish show that this is not always the case, as stemmers with higher error rates yield better retrieval quality. As a byproduct, we also identified the most accurate stemmers and the best for Information Retrieval purposes.
    Source
    Information processing and management. 52(2016) no.5, S.840-854
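
    Criterion (i) above, how accurately variant forms of a word map to one stem, can be illustrated with a toy conflation check. A sketch assuming NLTK's SnowballStemmer (which also covers French, Portuguese and Spanish); the word groups are invented examples, not the paper's test data:

        from nltk.stem.snowball import SnowballStemmer

        # A stemmer "succeeds" on a group if all variant forms share one stem.
        def conflates(stemmer, variants):
            return len({stemmer.stem(w) for w in variants}) == 1

        groups = [
            ["connect", "connected", "connection", "connections"],
            ["run", "running", "runs"],
        ]

        stemmer = SnowballStemmer("english")
        accuracy = sum(conflates(stemmer, g) for g in groups) / len(groups)
        print(f"conflation accuracy on toy groups: {accuracy:.2f}")
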
  18. Lu, K.; Mao, J.; Li, G.: Toward effective automated weighted subject indexing : a comparison of different approaches in different environments (2018) 0.00
    5.7375047E-4 = product of:
      0.008606257 = sum of:
        0.006614278 = weight(_text_:in in 4292) [ClassicSimilarity], result of:
          0.006614278 = score(doc=4292,freq=18.0), product of:
            0.029340398 = queryWeight, product of:
              1.3602545 = idf(docFreq=30841, maxDocs=44218)
              0.021569785 = queryNorm
            0.22543246 = fieldWeight in 4292, product of:
              4.2426405 = tf(freq=18.0), with freq of:
                18.0 = termFreq=18.0
              1.3602545 = idf(docFreq=30841, maxDocs=44218)
              0.0390625 = fieldNorm(doc=4292)
        0.001991979 = weight(_text_:s in 4292) [ClassicSimilarity], result of:
          0.001991979 = score(doc=4292,freq=4.0), product of:
            0.023451481 = queryWeight, product of:
              1.0872376 = idf(docFreq=40523, maxDocs=44218)
              0.021569785 = queryNorm
            0.08494043 = fieldWeight in 4292, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              1.0872376 = idf(docFreq=40523, maxDocs=44218)
              0.0390625 = fieldNorm(doc=4292)
      0.06666667 = coord(2/30)
    
    Abstract
    Subject indexing plays an important role in supporting subject access to information resources. Current subject indexing systems do not make adequate distinctions on the importance of assigned subject descriptors. Assigning numeric weights to subject descriptors to distinguish their importance to the documents can strengthen the role of subject metadata. Automated methods are more cost-effective. This study compares different automated weighting methods in different environments. Two evaluation methods were used to assess the performance. Experiments on three datasets in the biomedical domain suggest the performance of different weighting methods depends on whether it is an abstract or full-text environment. Mutual information with bag-of-words representation shows the best average performance in the full-text environment, while cosine with bag-of-words representation is the best in an abstract environment. The cosine measure has relatively consistent and robust performance. A direct weighting method, IDF (Inverse Document Frequency), can produce quick and reasonable estimates of the weights. Bag-of-words representation generally outperforms the concept-based representation. Further improvement in performance can be obtained by using the learning-to-rank method to integrate different weighting methods. This study follows up Lu and Mao (Journal of the Association for Information Science and Technology, 66, 1776-1784, 2015), in which an automated weighted subject indexing method was proposed and validated. The findings from this study contribute to more effective weighted subject indexing.
    Footnote
    See the erratum in JASIST 69(2018) no.7, S.956.
    Source
    Journal of the Association for Information Science and Technology. 69(2018) no.1, S.121-133
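
    One way to read the "cosine with bag-of-words" weighting compared in no.18: score each assigned descriptor by the cosine between its label terms and the document's term frequencies. A hypothetical sketch under that reading, not the authors' implementation or data:

        from collections import Counter
        from math import sqrt

        # Cosine similarity between two sparse term-frequency vectors.
        def cosine(a, b):
            dot = sum(a[t] * b[t] for t in a)
            na = sqrt(sum(v * v for v in a.values()))
            nb = sqrt(sum(v * v for v in b.values()))
            return dot / (na * nb) if na and nb else 0.0

        doc_bow = Counter("subject indexing supports subject access to resources".split())
        descriptors = ["subject indexing", "information resources"]

        # Higher cosine = descriptor terms better represented in the document.
        for d in descriptors:
            print(d, round(cosine(Counter(d.split()), doc_bow), 3))
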
  19. Husevag, A.-S.R.: Named entities in indexing : a case study of TV subtitles and metadata records (2016) 0.00
    5.58707E-4 = product of:
      0.008380604 = sum of:
        0.006972062 = weight(_text_:in in 3105) [ClassicSimilarity], result of:
          0.006972062 = score(doc=3105,freq=20.0), product of:
            0.029340398 = queryWeight, product of:
              1.3602545 = idf(docFreq=30841, maxDocs=44218)
              0.021569785 = queryNorm
            0.2376267 = fieldWeight in 3105, product of:
              4.472136 = tf(freq=20.0), with freq of:
                20.0 = termFreq=20.0
              1.3602545 = idf(docFreq=30841, maxDocs=44218)
              0.0390625 = fieldNorm(doc=3105)
        0.0014085418 = weight(_text_:s in 3105) [ClassicSimilarity], result of:
          0.0014085418 = score(doc=3105,freq=2.0), product of:
            0.023451481 = queryWeight, product of:
              1.0872376 = idf(docFreq=40523, maxDocs=44218)
              0.021569785 = queryNorm
            0.060061958 = fieldWeight in 3105, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              1.0872376 = idf(docFreq=40523, maxDocs=44218)
              0.0390625 = fieldNorm(doc=3105)
      0.06666667 = coord(2/30)
    
    Abstract
    This paper explores the possible role of named entities in an automatic indexing process, based on text in subtitles. This is done by analyzing entity types, name density and name frequencies in subtitles and metadata records from different TV programs. The name density in metadata records is much higher than the name density in subtitles, and named entities with high frequencies in the subtitles are more likely to be mentioned in the metadata records. Personal names, geographical names and names of organizations were the most prominent entity types in both the news subtitles and news metadata, while persons, works and locations are the most prominent in culture programs.
    Pages
    S.48-58
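
    The name-density measure described in no.19 is simply the share of tokens that belong to named entities. A minimal sketch with hypothetical pre-tagged text standing in for real subtitles and metadata records:

        # Share of tokens that are (pre-identified) named-entity tokens.
        def name_density(tokens, entity_tokens):
            return sum(t in entity_tokens for t in tokens) / len(tokens) if tokens else 0.0

        subtitle = "tonight Angela Merkel meets the press in Berlin".split()
        metadata = "Angela Merkel press conference Berlin".split()
        entities = {"Angela", "Merkel", "Berlin"}

        print(f"subtitle density: {name_density(subtitle, entities):.2f}")  # 3/8 = 0.38
        print(f"metadata density: {name_density(metadata, entities):.2f}")  # 3/5 = 0.60
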
  20. Williams, R.V.: Hans Peter Luhn and Herbert M. Ohlman : their roles in the origins of keyword-in-context/permutation automatic indexing (2010) 0.00
    5.575784E-4 = product of:
      0.008363675 = sum of:
        0.006110009 = weight(_text_:in in 3440) [ClassicSimilarity], result of:
          0.006110009 = score(doc=3440,freq=6.0), product of:
            0.029340398 = queryWeight, product of:
              1.3602545 = idf(docFreq=30841, maxDocs=44218)
              0.021569785 = queryNorm
            0.2082456 = fieldWeight in 3440, product of:
              2.4494898 = tf(freq=6.0), with freq of:
                6.0 = termFreq=6.0
              1.3602545 = idf(docFreq=30841, maxDocs=44218)
              0.0625 = fieldNorm(doc=3440)
        0.002253667 = weight(_text_:s in 3440) [ClassicSimilarity], result of:
          0.002253667 = score(doc=3440,freq=2.0), product of:
            0.023451481 = queryWeight, product of:
              1.0872376 = idf(docFreq=40523, maxDocs=44218)
              0.021569785 = queryNorm
            0.09609913 = fieldWeight in 3440, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              1.0872376 = idf(docFreq=40523, maxDocs=44218)
              0.0625 = fieldNorm(doc=3440)
      0.06666667 = coord(2/30)
    
    Abstract
    The invention of automatic indexing using a keyword-in-context approach has generally been attributed solely to Hans Peter Luhn of IBM. This article shows that credit for this invention belongs equally to Luhn and Herbert Ohlman of the System Development Corporation. It also traces the origins of title-derivative automatic indexing, its development and implementation, and its current status.
    Source
    Journal of the American Society for Information Science and Technology. 61(2010) no.4, S.835-849
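
    Keyword-in-context/permutation indexing, the technique whose origins no.20 traces, rotates a title so that every substantive word becomes an access point. A compact illustrative sketch; the stopword list and sample title are invented:

        STOPWORDS = {"a", "an", "and", "the", "of", "in", "on", "for"}

        # Yield one rotation of the title per non-stopword, keyword first,
        # with "/" marking the wrap-around point.
        def kwic(title):
            words = title.split()
            for i, w in enumerate(words):
                if w.lower() not in STOPWORDS:
                    yield " ".join(words[i:] + ["/"] + words[:i])

        for entry in sorted(kwic("Automatic indexing of scientific titles")):
            print(entry)
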

Types

  • a 49
  • el 15
  • x 1