Search (114 results, page 1 of 6)

  • Filter: theme_ss:"Automatisches Indexieren"
  1. Wolfekuhler, M.R.; Punch, W.F.: Finding salient features for personal Web pages categories (1997) 0.03
    0.02734995 = product of:
      0.0546999 = sum of:
        0.026445134 = product of:
          0.052890267 = sum of:
            0.052890267 = weight(_text_:web in 2673) [ClassicSimilarity], result of:
              0.052890267 = score(doc=2673,freq=6.0), product of:
                0.12098375 = queryWeight, product of:
                  3.2635105 = idf(docFreq=4597, maxDocs=44218)
                  0.03707166 = queryNorm
                0.43716836 = fieldWeight in 2673, product of:
                  2.4494898 = tf(freq=6.0), with freq of:
                    6.0 = termFreq=6.0
                  3.2635105 = idf(docFreq=4597, maxDocs=44218)
                  0.0546875 = fieldNorm(doc=2673)
          0.5 = coord(1/2)
        0.028254768 = product of:
          0.07063692 = sum of:
            0.035478037 = weight(_text_:29 in 2673) [ClassicSimilarity], result of:
              0.035478037 = score(doc=2673,freq=2.0), product of:
                0.13040651 = queryWeight, product of:
                  3.5176873 = idf(docFreq=3565, maxDocs=44218)
                  0.03707166 = queryNorm
                0.27205724 = fieldWeight in 2673, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5176873 = idf(docFreq=3565, maxDocs=44218)
                  0.0546875 = fieldNorm(doc=2673)
            0.035158884 = weight(_text_:22 in 2673) [ClassicSimilarity], result of:
              0.035158884 = score(doc=2673,freq=2.0), product of:
                0.12981863 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.03707166 = queryNorm
                0.2708308 = fieldWeight in 2673, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.0546875 = fieldNorm(doc=2673)
          0.4 = coord(2/5)
      0.5 = coord(2/4)
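    The breakdown above is Lucene's ClassicSimilarity "explain" output. As a minimal sketch, assuming the standard ClassicSimilarity tf-idf formulas, the innermost numbers of the first "web" clause can be reproduced from the values shown in the tree:

      import math

      def tf(freq):
          # ClassicSimilarity term frequency: square root of the raw count
          return math.sqrt(freq)

      def idf(doc_freq, max_docs):
          # ClassicSimilarity inverse document frequency
          return 1.0 + math.log(max_docs / (doc_freq + 1))

      # values copied from the "web in 2673" clause above
      query_norm = 0.03707166
      field_norm = 0.0546875    # encodes field length and index-time boosts

      idf_web = idf(doc_freq=4597, max_docs=44218)    # ~3.2635105
      query_weight = idf_web * query_norm             # ~0.12098375
      field_weight = tf(6.0) * idf_web * field_norm   # ~0.43716836
      print(query_weight * field_weight)              # ~0.052890267

    Each coord(n/m) factor then scales a clause sum by the fraction of that group's query clauses that matched (e.g. coord(2/4) = 0.5 in the top-level product), damping documents that match fewer query terms.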
    
    Abstract
    Examines techniques that discover features in sets of pre-categorized documents, such that similar documents can be found on the WWW. Examines techniques that classify training examples with high accuracy, then explains why this is not necessarily useful. Describes a method for extracting word clusters from the raw document features. Results show that the clustering technique succeeds in discovering word groups in personal Web pages that can be used to find similar information on the WWW
    Date
    1. 8.1996 22:08:06
    Footnote
    Contribution to a special issue of papers from the 6th International World Wide Web conference, held 7-11 Apr 1997, Santa Clara, California
    Source
    Computer networks and ISDN systems. 29(1997) no.8, S.1147-1156
  2. Carevic, Z.: Semi-automatische Verschlagwortung zur Integration externer semantischer Inhalte innerhalb einer medizinischen Kooperationsplattform (2012) 0.02
    0.021874327 = product of:
      0.08749731 = sum of:
        0.08749731 = sum of:
          0.017449262 = weight(_text_:web in 897) [ClassicSimilarity], result of:
            0.017449262 = score(doc=897,freq=2.0), product of:
              0.12098375 = queryWeight, product of:
                3.2635105 = idf(docFreq=4597, maxDocs=44218)
                0.03707166 = queryNorm
              0.14422815 = fieldWeight in 897, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                3.2635105 = idf(docFreq=4597, maxDocs=44218)
                0.03125 = fieldNorm(doc=897)
          0.07004805 = weight(_text_:seiten in 897) [ClassicSimilarity], result of:
            0.07004805 = score(doc=897,freq=4.0), product of:
              0.20383513 = queryWeight, product of:
                5.4984083 = idf(docFreq=491, maxDocs=44218)
                0.03707166 = queryNorm
              0.34365052 = fieldWeight in 897, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                5.4984083 = idf(docFreq=491, maxDocs=44218)
                0.03125 = fieldNorm(doc=897)
      0.25 = coord(1/4)
    
    Abstract
    This thesis deals with the integration of external semantic content on the basis of a medical concept system. The underlying assumption is that using a uniform terminology on both the query-system side and the knowledge-base side leads to high-quality results. To achieve this, the query system must map natural language onto the terminology in use, which is done via (semi-)automatic subject indexing of text-based content. The main research questions are the following. Automatic indexing of text-based content: can automatic indexing of text-based content be optimized on the basis of a concept system? The central concern of this thesis is the (semi-)automatic indexing of text-based content against a medical concept system. To this end, the current state of research is reviewed. A number of tokenizers are compared in order to determine which algorithms are suitable for detecting word boundaries, with particular attention to how word-boundary detection behaves in a domain-specific setting (see the tokenizer sketch after this abstract). Based on the tokens identified in a text, the effects of stemming and POS tagging on the total amount of content to be analyzed are examined. Finally, the thesis evaluates how a controlled vocabulary can increase indexing precision, under the assumption that domain-specific content is also defined within a domain-specific concept system. For this purpose, a general process model is developed and used to perform the indexing.
    Integration of external content: to what extent can a uniform terminology shared by the query system and the knowledge base support the information-gathering process? In a first phase, the thesis determines which knowledge bases from the medical domain are available in the Linked Data Cloud. Building on these results, information from several decentralized knowledge bases is integrated by way of example, with the focus on the terminology used and on the use of Semantic Web technologies. In addition to the Linked Data Cloud, medical literature is retrieved from PubMed; as with the Linked Data Cloud, the integration is performed using a uniform terminology. A further question is how information from a total of 21 million article citations in PubMed can be integrated in a meaningful way, and which mechanisms can be used to optimize the precision of the results. Suitability of medical concept systems: which medical concept systems exist, and how suitable are they as the underlying vocabulary for automatic indexing and for the integration of semantic content? The focus here is on assessing the richness of the concept systems, with the level of detail being of particular interest: is a given system specific or general, and can it describe subfields of medicine, such as surgery or anesthesia, in sufficient depth?
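    A minimal sketch of the word-boundary problem raised in the abstract, assuming invented sample text; the tokenizers actually compared in the thesis are more involved. A naive splitter tears apart hyphenated compounds and decimal dosages that a domain-aware pattern keeps intact:

      import re

      WORD = r"[A-Za-zÄÖÜäöüß0-9]+"

      def naive_tokenize(text):
          # split on anything that is not a letter or digit
          return re.findall(WORD, text)

      def domain_tokenize(text):
          # keep hyphenated compounds and decimal dosages together
          return re.findall(WORD + r"(?:[-.]" + WORD + r")*", text)

      sample = "HWS-Distorsion nach Sturz, Ibuprofen 2.5 mg"   # hypothetical
      print(naive_tokenize(sample))    # ['HWS', 'Distorsion', ..., '2', '5', 'mg']
      print(domain_tokenize(sample))   # ['HWS-Distorsion', ..., '2.5', 'mg']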
  3. Tavakolizadeh-Ravari, M.: Analysis of the long term dynamics in thesaurus developments and its consequences (2017) 0.01
    0.014485295 = product of:
      0.02897059 = sum of:
        0.024765724 = product of:
          0.04953145 = sum of:
            0.04953145 = weight(_text_:seiten in 3081) [ClassicSimilarity], result of:
              0.04953145 = score(doc=3081,freq=2.0), product of:
                0.20383513 = queryWeight, product of:
                  5.4984083 = idf(docFreq=491, maxDocs=44218)
                  0.03707166 = queryNorm
                0.2429976 = fieldWeight in 3081, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  5.4984083 = idf(docFreq=491, maxDocs=44218)
                  0.03125 = fieldNorm(doc=3081)
          0.5 = coord(1/2)
        0.004204865 = product of:
          0.021024324 = sum of:
            0.021024324 = weight(_text_:28 in 3081) [ClassicSimilarity], result of:
              0.021024324 = score(doc=3081,freq=2.0), product of:
                0.13280044 = queryWeight, product of:
                  3.5822632 = idf(docFreq=3342, maxDocs=44218)
                  0.03707166 = queryNorm
                0.15831517 = fieldWeight in 3081, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5822632 = idf(docFreq=3342, maxDocs=44218)
                  0.03125 = fieldNorm(doc=3081)
          0.2 = coord(1/5)
      0.5 = coord(2/4)
    
    Abstract
    The thesis analyzes the dynamic development and use of thesaurus terms, and additionally focuses on the factors that influence the number of index terms per document or journal. MeSH and the corresponding database MEDLINE served as the objects of study. The most important findings are: 1. The MeSH thesaurus has grown logarithmically through three distinct phases. Such a thesaurus should follow the equation T = 3076.6 ln(d) - 22695 + 0.0039 d (where T is the number of terms, ln the natural logarithm, and d the number of documents). To construct such a thesaurus, one therefore needs about 1,600 documents on different topics within the thesaurus's domain. The dynamic development of thesauri like MeSH requires the introduction of one new term per 256 newly indexed documents. 2. The distribution of thesaurus terms yields three categories: heavily, normally, and rarely used headings. The last group is in a test phase, while in the first two categories the newly added descriptors drive thesaurus growth. 3. There is a logarithmic relationship between the number of index terms per article and its page count, for articles of between one and twenty-one pages. 4. Journal articles that appear in MEDLINE with abstracts receive almost two more descriptors. 5. The findability of non-English-language documents in MEDLINE is lower than that of English-language documents. 6. Articles from journals with an impact factor between zero and fifteen do not receive more index terms than those of the other journals covered by MEDLINE. 7. Within an indexing system, different journals carry more or less weight in terms of their findability. The distribution of index terms per page showed that MEDLINE publications fall into three categories; moreover, there are a few strongly preferred journals.
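    A quick numerical check of the growth equation from point 1, evaluated for an arbitrary example corpus size. The linear coefficient also recovers the "one new term per 256 newly indexed documents" rate quoted above, since 1/0.0039 is roughly 256:

      import math

      def mesh_terms(d):
          # T = 3076.6 ln(d) - 22695 + 0.0039 d, from the abstract
          return 3076.6 * math.log(d) - 22695 + 0.0039 * d

      # example: a corpus of one million documents
      print(round(mesh_terms(1_000_000)))   # ~23710 terms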
    Date
    24. 8.2016 13:45:28
  4. Spitters, M.J.: Adjust : automatische thesauriele ontsluiting van grote hoeveelheden krantenartikelen (1999) 0.01
    0.012389246 = product of:
      0.049556985 = sum of:
        0.049556985 = product of:
          0.123892464 = sum of:
            0.06307297 = weight(_text_:28 in 3938) [ClassicSimilarity], result of:
              0.06307297 = score(doc=3938,freq=2.0), product of:
                0.13280044 = queryWeight, product of:
                  3.5822632 = idf(docFreq=3342, maxDocs=44218)
                  0.03707166 = queryNorm
                0.4749455 = fieldWeight in 3938, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5822632 = idf(docFreq=3342, maxDocs=44218)
                  0.09375 = fieldNorm(doc=3938)
            0.06081949 = weight(_text_:29 in 3938) [ClassicSimilarity], result of:
              0.06081949 = score(doc=3938,freq=2.0), product of:
                0.13040651 = queryWeight, product of:
                  3.5176873 = idf(docFreq=3565, maxDocs=44218)
                  0.03707166 = queryNorm
                0.46638384 = fieldWeight in 3938, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5176873 = idf(docFreq=3565, maxDocs=44218)
                  0.09375 = fieldNorm(doc=3938)
          0.4 = coord(2/5)
      0.25 = coord(1/4)
    
    Date
    27. 8.2005 12:55:28
    Source
    Informatie professional. 3(1999) no.10, S.29-31
  5. Schulz, K.U.; Brunner, L.: Vollautomatische thematische Verschlagwortung großer Textkollektionen mittels semantischer Netze (2017) 0.01
    0.0111818565 = product of:
      0.022363713 = sum of:
        0.015268105 = product of:
          0.03053621 = sum of:
            0.03053621 = weight(_text_:web in 3493) [ClassicSimilarity], result of:
              0.03053621 = score(doc=3493,freq=2.0), product of:
                0.12098375 = queryWeight, product of:
                  3.2635105 = idf(docFreq=4597, maxDocs=44218)
                  0.03707166 = queryNorm
                0.25239927 = fieldWeight in 3493, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.2635105 = idf(docFreq=4597, maxDocs=44218)
                  0.0546875 = fieldNorm(doc=3493)
          0.5 = coord(1/2)
        0.0070956075 = product of:
          0.035478037 = sum of:
            0.035478037 = weight(_text_:29 in 3493) [ClassicSimilarity], result of:
              0.035478037 = score(doc=3493,freq=2.0), product of:
                0.13040651 = queryWeight, product of:
                  3.5176873 = idf(docFreq=3565, maxDocs=44218)
                  0.03707166 = queryNorm
                0.27205724 = fieldWeight in 3493, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5176873 = idf(docFreq=3565, maxDocs=44218)
                  0.0546875 = fieldNorm(doc=3493)
          0.2 = coord(1/5)
      0.5 = coord(2/4)
    
    Source
    Theorie, Semantik und Organisation von Wissen: Proceedings der 13. Tagung der Deutschen Sektion der Internationalen Gesellschaft für Wissensorganisation (ISKO) und dem 13. Internationalen Symposium der Informationswissenschaft der Higher Education Association for Information Science (HI) Potsdam (19.-20.03.2013): 'Theory, Information and Organization of Knowledge' / Proceedings der 14. Tagung der Deutschen Sektion der Internationalen Gesellschaft für Wissensorganisation (ISKO) und Natural Language & Information Systems (NLDB) Passau (16.06.2015): 'Lexical Resources for Knowledge Organization' / Proceedings des Workshops der Deutschen Sektion der Internationalen Gesellschaft für Wissensorganisation (ISKO) auf der SEMANTICS Leipzig (1.09.2014): 'Knowledge Organization and Semantic Web' / Proceedings des Workshops der Polnischen und Deutschen Sektion der Internationalen Gesellschaft für Wissensorganisation (ISKO) Cottbus (29.-30.09.2011): 'Economics of Knowledge Production and Organization'. Hrsg. von W. Babik, H.P. Ohly u. K. Weber
  6. Böhm, A.; Seifert, C.; Schlötterer, J.; Granitzer, M.: Identifying tweets from the economic domain (2017) 0.01
    0.0111818565 = product of:
      0.022363713 = sum of:
        0.015268105 = product of:
          0.03053621 = sum of:
            0.03053621 = weight(_text_:web in 3495) [ClassicSimilarity], result of:
              0.03053621 = score(doc=3495,freq=2.0), product of:
                0.12098375 = queryWeight, product of:
                  3.2635105 = idf(docFreq=4597, maxDocs=44218)
                  0.03707166 = queryNorm
                0.25239927 = fieldWeight in 3495, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.2635105 = idf(docFreq=4597, maxDocs=44218)
                  0.0546875 = fieldNorm(doc=3495)
          0.5 = coord(1/2)
        0.0070956075 = product of:
          0.035478037 = sum of:
            0.035478037 = weight(_text_:29 in 3495) [ClassicSimilarity], result of:
              0.035478037 = score(doc=3495,freq=2.0), product of:
                0.13040651 = queryWeight, product of:
                  3.5176873 = idf(docFreq=3565, maxDocs=44218)
                  0.03707166 = queryNorm
                0.27205724 = fieldWeight in 3495, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5176873 = idf(docFreq=3565, maxDocs=44218)
                  0.0546875 = fieldNorm(doc=3495)
          0.2 = coord(1/5)
      0.5 = coord(2/4)
    
    Source
    Theorie, Semantik und Organisation von Wissen: Proceedings der 13. Tagung der Deutschen Sektion der Internationalen Gesellschaft für Wissensorganisation (ISKO) und dem 13. Internationalen Symposium der Informationswissenschaft der Higher Education Association for Information Science (HI) Potsdam (19.-20.03.2013): 'Theory, Information and Organization of Knowledge' / Proceedings der 14. Tagung der Deutschen Sektion der Internationalen Gesellschaft für Wissensorganisation (ISKO) und Natural Language & Information Systems (NLDB) Passau (16.06.2015): 'Lexical Resources for Knowledge Organization' / Proceedings des Workshops der Deutschen Sektion der Internationalen Gesellschaft für Wissensorganisation (ISKO) auf der SEMANTICS Leipzig (1.09.2014): 'Knowledge Organization and Semantic Web' / Proceedings des Workshops der Polnischen und Deutschen Sektion der Internationalen Gesellschaft für Wissensorganisation (ISKO) Cottbus (29.-30.09.2011): 'Economics of Knowledge Production and Organization'. Hrsg. von W. Babik, H.P. Ohly u. K. Weber
  7. Kempf, A.O.: Neue Verfahrenswege der Wissensorganisation : eine Evaluation automatischer Indexierung in der sozialwissenschaftlichen Fachinformation (2017) 0.01
    0.0111818565 = product of:
      0.022363713 = sum of:
        0.015268105 = product of:
          0.03053621 = sum of:
            0.03053621 = weight(_text_:web in 3497) [ClassicSimilarity], result of:
              0.03053621 = score(doc=3497,freq=2.0), product of:
                0.12098375 = queryWeight, product of:
                  3.2635105 = idf(docFreq=4597, maxDocs=44218)
                  0.03707166 = queryNorm
                0.25239927 = fieldWeight in 3497, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.2635105 = idf(docFreq=4597, maxDocs=44218)
                  0.0546875 = fieldNorm(doc=3497)
          0.5 = coord(1/2)
        0.0070956075 = product of:
          0.035478037 = sum of:
            0.035478037 = weight(_text_:29 in 3497) [ClassicSimilarity], result of:
              0.035478037 = score(doc=3497,freq=2.0), product of:
                0.13040651 = queryWeight, product of:
                  3.5176873 = idf(docFreq=3565, maxDocs=44218)
                  0.03707166 = queryNorm
                0.27205724 = fieldWeight in 3497, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5176873 = idf(docFreq=3565, maxDocs=44218)
                  0.0546875 = fieldNorm(doc=3497)
          0.2 = coord(1/5)
      0.5 = coord(2/4)
    
    Source
    Theorie, Semantik und Organisation von Wissen: Proceedings der 13. Tagung der Deutschen Sektion der Internationalen Gesellschaft für Wissensorganisation (ISKO) und dem 13. Internationalen Symposium der Informationswissenschaft der Higher Education Association for Information Science (HI) Potsdam (19.-20.03.2013): 'Theory, Information and Organization of Knowledge' / Proceedings der 14. Tagung der Deutschen Sektion der Internationalen Gesellschaft für Wissensorganisation (ISKO) und Natural Language & Information Systems (NLDB) Passau (16.06.2015): 'Lexical Resources for Knowledge Organization' / Proceedings des Workshops der Deutschen Sektion der Internationalen Gesellschaft für Wissensorganisation (ISKO) auf der SEMANTICS Leipzig (1.09.2014): 'Knowledge Organization and Semantic Web' / Proceedings des Workshops der Polnischen und Deutschen Sektion der Internationalen Gesellschaft für Wissensorganisation (ISKO) Cottbus (29.-30.09.2011): 'Economics of Knowledge Production and Organization'. Hrsg. von W. Babik, H.P. Ohly u. K. Weber
  8. Salton, G.: Another look at automatic text-retrieval systems (1986) 0.01
    0.010324373 = product of:
      0.04129749 = sum of:
        0.04129749 = product of:
          0.10324372 = sum of:
            0.052560814 = weight(_text_:28 in 1356) [ClassicSimilarity], result of:
              0.052560814 = score(doc=1356,freq=2.0), product of:
                0.13280044 = queryWeight, product of:
                  3.5822632 = idf(docFreq=3342, maxDocs=44218)
                  0.03707166 = queryNorm
                0.39578792 = fieldWeight in 1356, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5822632 = idf(docFreq=3342, maxDocs=44218)
                  0.078125 = fieldNorm(doc=1356)
            0.05068291 = weight(_text_:29 in 1356) [ClassicSimilarity], result of:
              0.05068291 = score(doc=1356,freq=2.0), product of:
                0.13040651 = queryWeight, product of:
                  3.5176873 = idf(docFreq=3565, maxDocs=44218)
                  0.03707166 = queryNorm
                0.38865322 = fieldWeight in 1356, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5176873 = idf(docFreq=3565, maxDocs=44218)
                  0.078125 = fieldNorm(doc=1356)
          0.4 = coord(2/5)
      0.25 = coord(1/4)
    
    Footnote
    Refers to: Blair, D.C.: An evaluation of retrieval effectiveness for a full-text document-retrieval system. Comm. ACM 28(1985) S.280-299. - See also: Blair, D.C.: Full text retrieval ... Int. Class. 13(1986) S.18-23; Blair, D.C., M.E. Maron: Full-text information retrieval ... Inf. Proc. Man. 26(1990) S.437-447.
    Source
    Communications of the Association for Computing Machinery. 29(1986), S.648-656
  9. Souza, R.R.; Raghavan, K.S.: ¬A methodology for noun phrase-based automatic indexing (2006) 0.01
    0.009697122 = product of:
      0.019394243 = sum of:
        0.013086946 = product of:
          0.026173891 = sum of:
            0.026173891 = weight(_text_:web in 173) [ClassicSimilarity], result of:
              0.026173891 = score(doc=173,freq=2.0), product of:
                0.12098375 = queryWeight, product of:
                  3.2635105 = idf(docFreq=4597, maxDocs=44218)
                  0.03707166 = queryNorm
                0.21634221 = fieldWeight in 173, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.2635105 = idf(docFreq=4597, maxDocs=44218)
                  0.046875 = fieldNorm(doc=173)
          0.5 = coord(1/2)
        0.0063072974 = product of:
          0.031536486 = sum of:
            0.031536486 = weight(_text_:28 in 173) [ClassicSimilarity], result of:
              0.031536486 = score(doc=173,freq=2.0), product of:
                0.13280044 = queryWeight, product of:
                  3.5822632 = idf(docFreq=3342, maxDocs=44218)
                  0.03707166 = queryNorm
                0.23747274 = fieldWeight in 173, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5822632 = idf(docFreq=3342, maxDocs=44218)
                  0.046875 = fieldNorm(doc=173)
          0.2 = coord(1/5)
      0.5 = coord(2/4)
    
    Abstract
    The scholarly community is increasingly employing the Web both for publication of scholarly output and for locating and accessing relevant scholarly literature. Organization of this vast body of digital information assumes significance in this context. The sheer volume of digital information to be handled makes traditional indexing and knowledge representation strategies ineffective and impractical. It is, therefore, worth exploring new approaches. An approach being discussed considers the intrinsic semantics of texts of documents. Based on the hypothesis that noun phrases in a text are semantically rich in terms of their ability to represent the subject content of the document, this approach seeks to identify and extract noun phrases instead of single keywords, and use them as descriptors. This paper presents a methodology that has been developed for extracting noun phrases from Portuguese texts. The results of an experiment carried out to test the adequacy of the methodology are also presented.
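    A minimal sketch of pattern-based noun-phrase extraction of the kind described, assuming hypothetical POS-tagged input; the paper's actual methodology targets Portuguese texts and is linguistically richer:

      # hypothetical (word, tag) pairs; a real pipeline would obtain them
      # from a POS tagger for the target language
      tagged = [("automatic", "ADJ"), ("indexing", "NOUN"), ("of", "ADP"),
                ("digital", "ADJ"), ("documents", "NOUN"), ("is", "VERB"),
                ("useful", "ADJ")]

      def noun_phrases(tagged):
          # extract maximal ADJ* NOUN+ runs as candidate descriptors
          phrases, i = [], 0
          while i < len(tagged):
              j = i
              while j < len(tagged) and tagged[j][1] == "ADJ":
                  j += 1
              k = j
              while k < len(tagged) and tagged[k][1] == "NOUN":
                  k += 1
              if k > j:                 # the run contains at least one noun
                  phrases.append(" ".join(w for w, _ in tagged[i:k]))
                  i = k
              else:
                  i += 1
          return phrases

      print(noun_phrases(tagged))   # ['automatic indexing', 'digital documents']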
    Date
    6. 1.1997 18:30:28
  10. Lepsky, K.; Vorhauer, J.: Lingo - ein open source System für die Automatische Indexierung deutschsprachiger Dokumente (2006) 0.01
    0.008223023 = product of:
      0.032892093 = sum of:
        0.032892093 = product of:
          0.08223023 = sum of:
            0.042048648 = weight(_text_:28 in 3581) [ClassicSimilarity], result of:
              0.042048648 = score(doc=3581,freq=2.0), product of:
                0.13280044 = queryWeight, product of:
                  3.5822632 = idf(docFreq=3342, maxDocs=44218)
                  0.03707166 = queryNorm
                0.31663033 = fieldWeight in 3581, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5822632 = idf(docFreq=3342, maxDocs=44218)
                  0.0625 = fieldNorm(doc=3581)
            0.040181585 = weight(_text_:22 in 3581) [ClassicSimilarity], result of:
              0.040181585 = score(doc=3581,freq=2.0), product of:
                0.12981863 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.03707166 = queryNorm
                0.30952093 = fieldWeight in 3581, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.0625 = fieldNorm(doc=3581)
          0.4 = coord(2/5)
      0.25 = coord(1/4)
    
    Date
    24. 3.2006 12:22:02
    Source
    ABI-Technik. 26(2006) H.1, S.18-28
  11. Pintscher, L.; Bourgonje, P.; Moreno Schneider, J.; Ostendorff, M.; Rehm, G.: Wissensbasen für die automatische Erschließung und ihre Qualität am Beispiel von Wikidata : die Inhaltserschließungspolitik der Deutschen Nationalbibliothek (2021) 0.01
    0.008080935 = product of:
      0.01616187 = sum of:
        0.010905789 = product of:
          0.021811578 = sum of:
            0.021811578 = weight(_text_:web in 366) [ClassicSimilarity], result of:
              0.021811578 = score(doc=366,freq=2.0), product of:
                0.12098375 = queryWeight, product of:
                  3.2635105 = idf(docFreq=4597, maxDocs=44218)
                  0.03707166 = queryNorm
                0.18028519 = fieldWeight in 366, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.2635105 = idf(docFreq=4597, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=366)
          0.5 = coord(1/2)
        0.0052560815 = product of:
          0.026280407 = sum of:
            0.026280407 = weight(_text_:28 in 366) [ClassicSimilarity], result of:
              0.026280407 = score(doc=366,freq=2.0), product of:
                0.13280044 = queryWeight, product of:
                  3.5822632 = idf(docFreq=3342, maxDocs=44218)
                  0.03707166 = queryNorm
                0.19789396 = fieldWeight in 366, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5822632 = idf(docFreq=3342, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=366)
          0.2 = coord(1/5)
      0.5 = coord(2/4)
    
    Abstract
    Wikidata is a free knowledge base that provides general data about the world. It is developed and operated by Wikimedia, as is its sister project Wikipedia. The data in Wikidata is collected and maintained by a large community of volunteers, and the data as well as the underlying ontology are used by many projects, institutions, and companies as the basis for applications and visualizations, but also for training machine-learning methods. Wikidata uses MediaWiki and its Wikibase extension as the technical foundation for collaborative work on a knowledge base that makes linked open data accessible to humans and machines. At the end of 2020, Wikidata described more than 90 million entities using more than 8,000 properties, for a total of more than 1.15 billion statements about the described entities. The data objects of these entities are linked to equivalent entries in more than 5,500 external databases, catalogs, and websites, which makes Wikidata one of the central hubs of the Linked Data Web. More than 11,500 active editors enter new data into the knowledge base and maintain it; they are organized in wiki projects, each addressing particular subject areas or tasks. The data is used in more than half of the content pages of the Wikimedia projects and is, among other things, queried more than 6.5 million times a day via the SPARQL endpoint in order to embed it in external applications and visualizations.
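    The SPARQL endpoint mentioned above can be queried programmatically. A minimal sketch against the public Wikidata Query Service; the example query (a few items that are instances of "human", wd:Q5) is illustrative and not taken from the article:

      import json
      import urllib.parse
      import urllib.request

      ENDPOINT = "https://query.wikidata.org/sparql"
      QUERY = """
      SELECT ?item ?itemLabel WHERE {
        ?item wdt:P31 wd:Q5 .
        SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
      }
      LIMIT 5
      """

      url = ENDPOINT + "?" + urllib.parse.urlencode({"query": QUERY, "format": "json"})
      req = urllib.request.Request(url, headers={"User-Agent": "sketch/0.1"})
      with urllib.request.urlopen(req) as resp:
          data = json.load(resp)

      for row in data["results"]["bindings"]:
          print(row["item"]["value"], row["itemLabel"]["value"])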
    Date
    23. 9.2021 19:08:28
  12. Bordoni, L.; Pazienza, M.T.: Documents automatic indexing in an environmental domain (1997) 0.01
    0.007195145 = product of:
      0.02878058 = sum of:
        0.02878058 = product of:
          0.07195145 = sum of:
            0.03679257 = weight(_text_:28 in 530) [ClassicSimilarity], result of:
              0.03679257 = score(doc=530,freq=2.0), product of:
                0.13280044 = queryWeight, product of:
                  3.5822632 = idf(docFreq=3342, maxDocs=44218)
                  0.03707166 = queryNorm
                0.27705154 = fieldWeight in 530, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5822632 = idf(docFreq=3342, maxDocs=44218)
                  0.0546875 = fieldNorm(doc=530)
            0.035158884 = weight(_text_:22 in 530) [ClassicSimilarity], result of:
              0.035158884 = score(doc=530,freq=2.0), product of:
                0.12981863 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.03707166 = queryNorm
                0.2708308 = fieldWeight in 530, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.0546875 = fieldNorm(doc=530)
          0.4 = coord(2/5)
      0.25 = coord(1/4)
    
    Source
    International forum on information and documentation. 22(1997) no.1, S.17-28
  13. Rasmussen, E.M.: Indexing and retrieval for the Web (2002) 0.01
    0.0071410015 = product of:
      0.028564006 = sum of:
        0.028564006 = product of:
          0.057128012 = sum of:
            0.057128012 = weight(_text_:web in 4285) [ClassicSimilarity], result of:
              0.057128012 = score(doc=4285,freq=28.0), product of:
                0.12098375 = queryWeight, product of:
                  3.2635105 = idf(docFreq=4597, maxDocs=44218)
                  0.03707166 = queryNorm
                0.47219574 = fieldWeight in 4285, product of:
                  5.2915025 = tf(freq=28.0), with freq of:
                    28.0 = termFreq=28.0
                  3.2635105 = idf(docFreq=4597, maxDocs=44218)
                  0.02734375 = fieldNorm(doc=4285)
          0.5 = coord(1/2)
      0.25 = coord(1/4)
    
    Abstract
    The introduction and growth of the World Wide Web (WWW, or Web) have resulted in a profound change in the way individuals and organizations access information. In terms of volume, nature, and accessibility, the characteristics of electronic information are significantly different from those of even five or six years ago. Control of, and access to, this flood of information rely heavily on automated techniques for indexing and retrieval. According to Gudivada, Raghavan, Grosky, and Kasanagottu (1997, p. 58), "The ability to search and retrieve information from the Web efficiently and effectively is an enabling technology for realizing its full potential." Almost 93 percent of those surveyed consider the Web an "indispensable" Internet technology, second only to e-mail (Graphics, Visualization & Usability Center, 1998). Although there are other ways of locating information on the Web (browsing or following directory structures), 85 percent of users identify Web pages by means of a search engine (Graphics, Visualization & Usability Center, 1998). A more recent study conducted by the Stanford Institute for the Quantitative Study of Society confirms the finding that searching for information is second only to e-mail as an Internet activity (Nie & Erbring, 2000, online). In fact, Nie and Erbring conclude, "... the Internet today is a giant public library with a decidedly commercial tilt. The most widespread use of the Internet today is as an information search utility for products, travel, hobbies, and general information. Virtually all users interviewed responded that they engaged in one or more of these information gathering activities."
    Techniques for automated indexing and information retrieval (IR) have been developed, tested, and refined over the past 40 years, and are well documented (see, for example, Agosti & Smeaton, 1996; Baeza-Yates & Ribeiro-Neto, 1999a; Frakes & Baeza-Yates, 1992; Korfhage, 1997; Salton, 1989; Witten, Moffat, & Bell, 1999). With the introduction of the Web, and the capability to index and retrieve via search engines, these techniques have been extended to a new environment. They have been adopted, altered, and in some cases extended to include new methods. "In short, search engines are indispensable for searching the Web, they employ a variety of relatively advanced IR techniques, and there are some peculiar aspects of search engines that make searching the Web different than more conventional information retrieval" (Gordon & Pathak, 1999, p. 145). The environment for information retrieval on the World Wide Web differs from that of "conventional" information retrieval in a number of fundamental ways. The collection is very large and changes continuously, with pages being added, deleted, and altered. Wide variability between the size, structure, focus, quality, and usefulness of documents makes Web documents much more heterogeneous than a typical electronic document collection. The wide variety of document types includes images, video, audio, and scripts, as well as many different document languages. Duplication of documents and sites is common. Documents are interconnected through networks of hyperlinks. Because of the size and dynamic nature of the Web, preprocessing all documents requires considerable resources and is often not feasible, certainly not on the frequent basis required to ensure currency. Query length is usually much shorter than in other environments (often only a few words), and user behavior differs from that in other environments. These differences make the Web a novel environment for information retrieval (Baeza-Yates & Ribeiro-Neto, 1999b; Bharat & Henzinger, 1998; Huang, 2000).
  14. Franke-Maier, M.: Anforderungen an die Qualität der Inhaltserschließung im Spannungsfeld von intellektuell und automatisch erzeugten Metadaten (2018) 0.01
    0.007063692 = product of:
      0.028254768 = sum of:
        0.028254768 = product of:
          0.07063692 = sum of:
            0.035478037 = weight(_text_:29 in 5344) [ClassicSimilarity], result of:
              0.035478037 = score(doc=5344,freq=2.0), product of:
                0.13040651 = queryWeight, product of:
                  3.5176873 = idf(docFreq=3565, maxDocs=44218)
                  0.03707166 = queryNorm
                0.27205724 = fieldWeight in 5344, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5176873 = idf(docFreq=3565, maxDocs=44218)
                  0.0546875 = fieldNorm(doc=5344)
            0.035158884 = weight(_text_:22 in 5344) [ClassicSimilarity], result of:
              0.035158884 = score(doc=5344,freq=2.0), product of:
                0.12981863 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.03707166 = queryNorm
                0.2708308 = fieldWeight in 5344, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.0546875 = fieldNorm(doc=5344)
          0.4 = coord(2/5)
      0.25 = coord(1/4)
    
    Abstract
    Since the Deutscher Bibliothekartag 2018 at the latest, the discussion of the Deutsche Nationalbibliothek's automatic subject indexing methods has turned from a politically driven debate into a debate about quality. The following contribution addresses questions of subject indexing quality in the digital age, where heterogeneous products of different methods meet, and attempts to define key quality requirements. This conference paper summarizes the ideas presented by the author as impulses at the workshop of the GBV expert group "Erschließung und Informationsvermittlung" on 29 August 2018 in Kiel. The workshop took place as part of the 22nd Verbundkonferenz of the GBV.
  15. Ward, M.L.: ¬The future of the human indexer (1996) 0.01
    0.0061672674 = product of:
      0.02466907 = sum of:
        0.02466907 = product of:
          0.061672673 = sum of:
            0.031536486 = weight(_text_:28 in 7244) [ClassicSimilarity], result of:
              0.031536486 = score(doc=7244,freq=2.0), product of:
                0.13280044 = queryWeight, product of:
                  3.5822632 = idf(docFreq=3342, maxDocs=44218)
                  0.03707166 = queryNorm
                0.23747274 = fieldWeight in 7244, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5822632 = idf(docFreq=3342, maxDocs=44218)
                  0.046875 = fieldNorm(doc=7244)
            0.030136187 = weight(_text_:22 in 7244) [ClassicSimilarity], result of:
              0.030136187 = score(doc=7244,freq=2.0), product of:
                0.12981863 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.03707166 = queryNorm
                0.23214069 = fieldWeight in 7244, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.046875 = fieldNorm(doc=7244)
          0.4 = coord(2/5)
      0.25 = coord(1/4)
    
    Date
    9. 2.1997 18:44:22
    Source
    Journal of librarianship and information science. 28(1996) no.4, S.217-225
  16. Fauzi, F.; Belkhatir, M.: Multifaceted conceptual image indexing on the world wide web (2013) 0.01
    0.0056668143 = product of:
      0.022667257 = sum of:
        0.022667257 = product of:
          0.045334514 = sum of:
            0.045334514 = weight(_text_:web in 2721) [ClassicSimilarity], result of:
              0.045334514 = score(doc=2721,freq=6.0), product of:
                0.12098375 = queryWeight, product of:
                  3.2635105 = idf(docFreq=4597, maxDocs=44218)
                  0.03707166 = queryNorm
                0.37471575 = fieldWeight in 2721, product of:
                  2.4494898 = tf(freq=6.0), with freq of:
                    6.0 = termFreq=6.0
                  3.2635105 = idf(docFreq=4597, maxDocs=44218)
                  0.046875 = fieldNorm(doc=2721)
          0.5 = coord(1/2)
      0.25 = coord(1/4)
    
    Abstract
    In this paper, we describe a user-centered design of an automated multifaceted concept-based indexing framework which analyzes the semantics of the Web image contextual information and classifies it into five broad semantic concept facets: signal, object, abstract, scene, and relational; and identifies the semantic relationships between the concepts. An important aspect of our indexing model is that it relates to the users' levels of image descriptions. Also, a major contribution relies on the fact that the classification is performed automatically with the raw image contextual information extracted from any general webpage and is not solely based on image tags like state-of-the-art solutions. Human Language Technology techniques and an external knowledge base are used to analyze the information both syntactically and semantically. Experimental results on a human-annotated Web image collection and corresponding contextual information indicate that our method outperforms empirical frameworks employing tf-idf and location-based tf-idf weighting schemes as well as n-gram indexing in a recall/precision based evaluation framework.
  17. McKiernan, G.: Automated categorisation of Web resources : a profile of selected projects, research, products, and services (1996) 0.01
    0.0054528946 = product of:
      0.021811578 = sum of:
        0.021811578 = product of:
          0.043623157 = sum of:
            0.043623157 = weight(_text_:web in 2533) [ClassicSimilarity], result of:
              0.043623157 = score(doc=2533,freq=2.0), product of:
                0.12098375 = queryWeight, product of:
                  3.2635105 = idf(docFreq=4597, maxDocs=44218)
                  0.03707166 = queryNorm
                0.36057037 = fieldWeight in 2533, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.2635105 = idf(docFreq=4597, maxDocs=44218)
                  0.078125 = fieldNorm(doc=2533)
          0.5 = coord(1/2)
      0.25 = coord(1/4)
    
  18. Shafer, K.: Scorpion Project explores using Dewey to organize the Web (1996) 0.01
    0.0053980905 = product of:
      0.021592362 = sum of:
        0.021592362 = product of:
          0.043184724 = sum of:
            0.043184724 = weight(_text_:web in 6750) [ClassicSimilarity], result of:
              0.043184724 = score(doc=6750,freq=4.0), product of:
                0.12098375 = queryWeight, product of:
                  3.2635105 = idf(docFreq=4597, maxDocs=44218)
                  0.03707166 = queryNorm
                0.35694647 = fieldWeight in 6750, product of:
                  2.0 = tf(freq=4.0), with freq of:
                    4.0 = termFreq=4.0
                  3.2635105 = idf(docFreq=4597, maxDocs=44218)
                  0.0546875 = fieldNorm(doc=6750)
          0.5 = coord(1/2)
      0.25 = coord(1/4)
    
    Abstract
    As the amount of accessible information on the WWW increases, so will the cost of accessing it, even if search services remain free, due to the increasing amount of time users will have to spend to find needed items. Considers what the seemingly unorganized Web and the organized world of libraries can offer each other. The OCLC Scorpion Project is attempting to combine indexing and cataloguing, specifically focusing on building tools for automatic subject recognition using the techniques of library science and information retrieval. If subject headings or concept domains can be automatically assigned to electronic items, improved filtering tools for searching can be produced
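    A toy sketch of the automatic subject recognition idea: treat each classification caption as a tiny document and assign an incoming page to the best-matching class. The miniature caption set and the plain cosine ranking are illustrative assumptions; Scorpion's actual Dewey concept database and ranking were far richer:

      import math
      from collections import Counter

      # hypothetical miniature "concept database": class number -> caption
      captions = {
          "004": "computer science data processing",
          "020": "library information science",
          "025": "library operations cataloging indexing",
      }

      def vectorize(text):
          return Counter(text.lower().split())

      def cosine(a, b):
          dot = sum(a[t] * b[t] for t in a)
          norm = math.sqrt(sum(v * v for v in a.values())) \
               * math.sqrt(sum(v * v for v in b.values()))
          return dot / norm if norm else 0.0

      def classify(page_text):
          page = vectorize(page_text)
          return max(captions, key=lambda c: cosine(page, vectorize(captions[c])))

      # note: without stemming, "libraries" does not match "library"
      print(classify("a page about indexing and cataloging in libraries"))   # 025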
  19. Wang, S.; Koopman, R.: Embed first, then predict (2019) 0.01
    0.0051621865 = product of:
      0.020648746 = sum of:
        0.020648746 = product of:
          0.05162186 = sum of:
            0.026280407 = weight(_text_:28 in 5400) [ClassicSimilarity], result of:
              0.026280407 = score(doc=5400,freq=2.0), product of:
                0.13280044 = queryWeight, product of:
                  3.5822632 = idf(docFreq=3342, maxDocs=44218)
                  0.03707166 = queryNorm
                0.19789396 = fieldWeight in 5400, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5822632 = idf(docFreq=3342, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=5400)
            0.025341455 = weight(_text_:29 in 5400) [ClassicSimilarity], result of:
              0.025341455 = score(doc=5400,freq=2.0), product of:
                0.13040651 = queryWeight, product of:
                  3.5176873 = idf(docFreq=3565, maxDocs=44218)
                  0.03707166 = queryNorm
                0.19432661 = fieldWeight in 5400, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5176873 = idf(docFreq=3565, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=5400)
          0.4 = coord(2/5)
      0.25 = coord(1/4)
    
    Date
    29. 9.2019 12:18:42
    Footnote
    Contribution to a special issue: Research Information Systems and Science Classifications; including papers from "Trajectories for Research: Fathoming the Promise of the NARCIS Classification," 27-28 September 2018, The Hague, The Netherlands.
  20. Milstead, J.L.: Thesauri in a full-text world (1998) 0.01
    0.0051393895 = product of:
      0.020557558 = sum of:
        0.020557558 = product of:
          0.051393896 = sum of:
            0.026280407 = weight(_text_:28 in 2337) [ClassicSimilarity], result of:
              0.026280407 = score(doc=2337,freq=2.0), product of:
                0.13280044 = queryWeight, product of:
                  3.5822632 = idf(docFreq=3342, maxDocs=44218)
                  0.03707166 = queryNorm
                0.19789396 = fieldWeight in 2337, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5822632 = idf(docFreq=3342, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=2337)
            0.025113491 = weight(_text_:22 in 2337) [ClassicSimilarity], result of:
              0.025113491 = score(doc=2337,freq=2.0), product of:
                0.12981863 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.03707166 = queryNorm
                0.19345059 = fieldWeight in 2337, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=2337)
          0.4 = coord(2/5)
      0.25 = coord(1/4)
    
    Date
    22. 9.1997 19:16:05
    Pages
    S.28-38

Types

  • a 100
  • el 12
  • x 6
  • m 5
  • p 1
  • s 1