-
Voorhees, E.M.: Implementing agglomerative hierarchic clustering algorithms for use in document retrieval (1986)
0.04
0.038550586 = product of:
0.07710117 = sum of:
0.025789656 = weight(_text_:information in 402) [ClassicSimilarity], result of:
0.025789656 = score(doc=402,freq=2.0), product of:
0.08310462 = queryWeight, product of:
1.7554779 = idf(docFreq=20772, maxDocs=44218)
0.047340166 = queryNorm
0.3103276 = fieldWeight in 402, product of:
1.4142135 = tf(freq=2.0), with freq of:
2.0 = termFreq=2.0
1.7554779 = idf(docFreq=20772, maxDocs=44218)
0.125 = fieldNorm(doc=402)
0.05131151 = product of:
0.10262302 = sum of:
0.10262302 = weight(_text_:22 in 402) [ClassicSimilarity], result of:
0.10262302 = score(doc=402,freq=2.0), product of:
0.16577719 = queryWeight, product of:
3.5018296 = idf(docFreq=3622, maxDocs=44218)
0.047340166 = queryNorm
0.61904186 = fieldWeight in 402, product of:
1.4142135 = tf(freq=2.0), with freq of:
2.0 = termFreq=2.0
3.5018296 = idf(docFreq=3622, maxDocs=44218)
0.125 = fieldNorm(doc=402)
0.5 = coord(1/2)
0.5 = coord(2/4)
- Source
- Information processing and management. 22(1986) no.6, S.465-476
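The numeric breakdown under each entry looks like Lucene's explain output for `ClassicSimilarity` (TF-IDF). The Python below is a hedged sketch that recomputes the score of doc 402 from the values printed above. It assumes the usual ClassicSimilarity definitions tf = sqrt(freq) and idf = 1 + ln(maxDocs / (docFreq + 1)), and a query boost of 1. `queryNorm` and `fieldNorm` are copied from the explanation rather than derived, because the full query is not shown. The helper names are illustrative only and are not Lucene API.

```python
import math


def classic_tf(freq: float) -> float:
    """ClassicSimilarity term frequency: square root of the raw count."""
    return math.sqrt(freq)


def classic_idf(doc_freq: int, max_docs: int) -> float:
    """ClassicSimilarity idf: 1 + ln(maxDocs / (docFreq + 1))."""
    return 1.0 + math.log(max_docs / (doc_freq + 1))


def term_weight(freq: float, doc_freq: int, max_docs: int,
                query_norm: float, field_norm: float) -> float:
    """score = queryWeight * fieldWeight, as in the explain tree (boost assumed 1)."""
    idf = classic_idf(doc_freq, max_docs)
    query_weight = idf * query_norm
    field_weight = classic_tf(freq) * idf * field_norm
    return query_weight * field_weight


def coord(matched: int, total: int) -> float:
    """Coordination factor: fraction of query clauses that matched."""
    return matched / total


# Values copied from the explanation for doc 402.
MAX_DOCS = 44218
QUERY_NORM = 0.047340166
FIELD_NORM = 0.125

w_information = term_weight(2.0, 20772, MAX_DOCS, QUERY_NORM, FIELD_NORM)
w_22 = term_weight(2.0, 3622, MAX_DOCS, QUERY_NORM, FIELD_NORM)

# The "22" clause sits in a nested sub-query where 1 of 2 clauses matched;
# the top-level query matched 2 of its 4 clauses.
score_402 = coord(2, 4) * (w_information + coord(1, 2) * w_22)

print(f"{w_information:.6f} {w_22:.6f} {score_402:.6f}")
```

Setting the field norm to 0.109375 (doc 6265) scales both term weights by 0.875 and gives about 0.03373, the figure shown for the second entry. The coarse fieldNorm values (0.125, 0.109375, 0.078125, ...) are consistent with Lucene storing the length norm 1/sqrt(number of terms) in a lossy single byte, although that encoding is not visible in the output itself.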
-
Hlava, M.M.K.: Automatic indexing : comparing rule-based and statistics-based indexing systems (2005)
0.03
0.03373176 = product of:
0.06746352 = sum of:
0.02256595 = weight(_text_:information in 6265) [ClassicSimilarity], result of:
0.02256595 = score(doc=6265,freq=2.0), product of:
0.08310462 = queryWeight, product of:
1.7554779 = idf(docFreq=20772, maxDocs=44218)
0.047340166 = queryNorm
0.27153665 = fieldWeight in 6265, product of:
1.4142135 = tf(freq=2.0), with freq of:
2.0 = termFreq=2.0
1.7554779 = idf(docFreq=20772, maxDocs=44218)
0.109375 = fieldNorm(doc=6265)
0.04489757 = product of:
0.08979514 = sum of:
0.08979514 = weight(_text_:22 in 6265) [ClassicSimilarity], result of:
0.08979514 = score(doc=6265,freq=2.0), product of:
0.16577719 = queryWeight, product of:
3.5018296 = idf(docFreq=3622, maxDocs=44218)
0.047340166 = queryNorm
0.5416616 = fieldWeight in 6265, product of:
1.4142135 = tf(freq=2.0), with freq of:
2.0 = termFreq=2.0
3.5018296 = idf(docFreq=3622, maxDocs=44218)
0.109375 = fieldNorm(doc=6265)
0.5 = coord(1/2)
0.5 = coord(2/4)
- Source
- Information outlook. 9(2005) no.8, S.22-23
-
Biebricher, N.; Fuhr, N.; Lustig, G.; Schwantner, M.; Knorz, G.: ¬The automatic indexing system AIR/PHYS : from research to application (1988)
0.03
0.02743237 = product of:
0.05486474 = sum of:
0.02279505 = weight(_text_:information in 1952) [ClassicSimilarity], result of:
0.02279505 = score(doc=1952,freq=4.0), product of:
0.08310462 = queryWeight, product of:
1.7554779 = idf(docFreq=20772, maxDocs=44218)
0.047340166 = queryNorm
0.27429342 = fieldWeight in 1952, product of:
2.0 = tf(freq=4.0), with freq of:
4.0 = termFreq=4.0
1.7554779 = idf(docFreq=20772, maxDocs=44218)
0.078125 = fieldNorm(doc=1952)
0.032069694 = product of:
0.06413939 = sum of:
0.06413939 = weight(_text_:22 in 1952) [ClassicSimilarity], result of:
0.06413939 = score(doc=1952,freq=2.0), product of:
0.16577719 = queryWeight, product of:
3.5018296 = idf(docFreq=3622, maxDocs=44218)
0.047340166 = queryNorm
0.38690117 = fieldWeight in 1952, product of:
1.4142135 = tf(freq=2.0), with freq of:
2.0 = termFreq=2.0
3.5018296 = idf(docFreq=3622, maxDocs=44218)
0.078125 = fieldNorm(doc=1952)
0.5 = coord(1/2)
0.5 = coord(2/4)
- Date
- 16. 8.1998 12:51:22
- Footnote
- Reprinted in: Readings in information retrieval. Ed.: K. Sparck Jones and P. Willett. San Francisco: Morgan Kaufmann 1997. S.513-517.
- Source
- Proceedings of the 11th annual conference on research and development in information retrieval. Ed.: Y. Chiaramella
-
McKiernan, G.: Automated categorisation of Web resources : a profile of selected projects, research, products, and services (1996)
0.03
0.025684355 = product of:
0.05136871 = sum of:
0.016118534 = weight(_text_:information in 2533) [ClassicSimilarity], result of:
0.016118534 = score(doc=2533,freq=2.0), product of:
0.08310462 = queryWeight, product of:
1.7554779 = idf(docFreq=20772, maxDocs=44218)
0.047340166 = queryNorm
0.19395474 = fieldWeight in 2533, product of:
1.4142135 = tf(freq=2.0), with freq of:
2.0 = termFreq=2.0
1.7554779 = idf(docFreq=20772, maxDocs=44218)
0.078125 = fieldNorm(doc=2533)
0.035250176 = product of:
0.07050035 = sum of:
0.07050035 = weight(_text_:services in 2533) [ClassicSimilarity], result of:
0.07050035 = score(doc=2533,freq=2.0), product of:
0.1738033 = queryWeight, product of:
3.6713707 = idf(docFreq=3057, maxDocs=44218)
0.047340166 = queryNorm
0.405633 = fieldWeight in 2533, product of:
1.4142135 = tf(freq=2.0), with freq of:
2.0 = termFreq=2.0
3.6713707 = idf(docFreq=3057, maxDocs=44218)
0.078125 = fieldNorm(doc=2533)
0.5 = coord(1/2)
0.5 = coord(2/4)
- Source
- New review of information networking. 1996, no.2, S.15-40
-
Kutschekmanesch, S.; Lutes, B.; Moelle, K.; Thiel, U.; Tzeras, K.: Automated multilingual indexing : a synthesis of rule-based and thesaurus-based methods (1998)
0.02
0.024094114 = product of:
0.04818823 = sum of:
0.016118534 = weight(_text_:information in 4157) [ClassicSimilarity], result of:
0.016118534 = score(doc=4157,freq=2.0), product of:
0.08310462 = queryWeight, product of:
1.7554779 = idf(docFreq=20772, maxDocs=44218)
0.047340166 = queryNorm
0.19395474 = fieldWeight in 4157, product of:
1.4142135 = tf(freq=2.0), with freq of:
2.0 = termFreq=2.0
1.7554779 = idf(docFreq=20772, maxDocs=44218)
0.078125 = fieldNorm(doc=4157)
0.032069694 = product of:
0.06413939 = sum of:
0.06413939 = weight(_text_:22 in 4157) [ClassicSimilarity], result of:
0.06413939 = score(doc=4157,freq=2.0), product of:
0.16577719 = queryWeight, product of:
3.5018296 = idf(docFreq=3622, maxDocs=44218)
0.047340166 = queryNorm
0.38690117 = fieldWeight in 4157, product of:
1.4142135 = tf(freq=2.0), with freq of:
2.0 = termFreq=2.0
3.5018296 = idf(docFreq=3622, maxDocs=44218)
0.078125 = fieldNorm(doc=4157)
0.5 = coord(1/2)
0.5 = coord(2/4)
- Source
- Information und Märkte: 50. Deutscher Dokumentartag 1998, congress of the Deutsche Gesellschaft für Dokumentation e.V. (DGD), Rheinische Friedrich-Wilhelms-Universität Bonn, 22-24 September 1998. Ed.: Marlies Ockenfeld and Gerhard J. Mantwill
-
Riloff, E.: ¬An empirical study of automated dictionary construction for information extraction in three domains (1996)
0.02
0.023995128 = product of:
0.047990255 = sum of:
0.022334497 = weight(_text_:information in 6752) [ClassicSimilarity], result of:
0.022334497 = score(doc=6752,freq=6.0), product of:
0.08310462 = queryWeight, product of:
1.7554779 = idf(docFreq=20772, maxDocs=44218)
0.047340166 = queryNorm
0.2687516 = fieldWeight in 6752, product of:
2.4494898 = tf(freq=6.0), with freq of:
6.0 = termFreq=6.0
1.7554779 = idf(docFreq=20772, maxDocs=44218)
0.0625 = fieldNorm(doc=6752)
0.025655756 = product of:
0.05131151 = sum of:
0.05131151 = weight(_text_:22 in 6752) [ClassicSimilarity], result of:
0.05131151 = score(doc=6752,freq=2.0), product of:
0.16577719 = queryWeight, product of:
3.5018296 = idf(docFreq=3622, maxDocs=44218)
0.047340166 = queryNorm
0.30952093 = fieldWeight in 6752, product of:
1.4142135 = tf(freq=2.0), with freq of:
2.0 = termFreq=2.0
3.5018296 = idf(docFreq=3622, maxDocs=44218)
0.0625 = fieldNorm(doc=6752)
0.5 = coord(1/2)
0.5 = coord(2/4)
- Abstract
- AutoSlog is a system that addresses the knowledge engineering bottleneck for information extraction. AutoSlog automatically creates domain-specific dictionaries for information extraction, given an appropriate training corpus. Describes experiments with AutoSlog in the terrorism, joint ventures and microelectronics domains. Compares the performance of AutoSlog across the 3 domains, discusses the lessons learned and presents results from 2 experiments which demonstrate that novice users can generate effective dictionaries using AutoSlog
- Date
- 6. 3.1997 16:22:15
-
Lepsky, K.; Vorhauer, J.: Lingo - ein open source System für die Automatische Indexierung deutschsprachiger Dokumente (2006)
0.02
0.019275293 = product of:
0.038550586 = sum of:
0.012894828 = weight(_text_:information in 3581) [ClassicSimilarity], result of:
0.012894828 = score(doc=3581,freq=2.0), product of:
0.08310462 = queryWeight, product of:
1.7554779 = idf(docFreq=20772, maxDocs=44218)
0.047340166 = queryNorm
0.1551638 = fieldWeight in 3581, product of:
1.4142135 = tf(freq=2.0), with freq of:
2.0 = termFreq=2.0
1.7554779 = idf(docFreq=20772, maxDocs=44218)
0.0625 = fieldNorm(doc=3581)
0.025655756 = product of:
0.05131151 = sum of:
0.05131151 = weight(_text_:22 in 3581) [ClassicSimilarity], result of:
0.05131151 = score(doc=3581,freq=2.0), product of:
0.16577719 = queryWeight, product of:
3.5018296 = idf(docFreq=3622, maxDocs=44218)
0.047340166 = queryNorm
0.30952093 = fieldWeight in 3581, product of:
1.4142135 = tf(freq=2.0), with freq of:
2.0 = termFreq=2.0
3.5018296 = idf(docFreq=3622, maxDocs=44218)
0.0625 = fieldNorm(doc=3581)
0.5 = coord(1/2)
0.5 = coord(2/4)
- Abstract
- Lingo is a freely available (open source) system for the automatic indexing of German-language text. The development of lingo focused on a high degree of configurability and on the system's flexibility for different applications. The paper shows the benefit of linguistically based automatic indexing for information retrieval. The linguistic functionality that lingo offers for improving retrieval is presented and illustrated with examples: base-form recognition, compound recognition and compound splitting, word relation mapping, lexical and algorithmic multi-word group recognition, and OCR error correction. The open architecture of lingo is described, and possible application scenarios and limitations are identified.
- Date
- 24. 3.2006 12:22:02
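The abstract lists compound splitting among lingo's functions but does not describe how lingo does it. As a rough illustration of what dictionary-based compound splitting involves, here is a minimal Python sketch. The toy lexicon, the minimum part length and the list of linking elements (Fugenelemente such as the "s" in "Wissensmanagement") are assumptions chosen for illustration; this is not lingo's algorithm.

```python
from functools import lru_cache
from typing import List, Optional, Tuple

# Hypothetical toy lexicon; a real system would load a full German word list.
LEXICON = {"information", "system", "wissen", "management", "dokument", "verwaltung"}
# Assumed set of German linking elements that may join compound parts.
LINKING_ELEMENTS = ("", "s", "es", "n", "en")
MIN_PART = 3  # ignore fragments shorter than this


def split_compound(word: str) -> Optional[List[str]]:
    """Return one split of `word` into lexicon entries, or None if none exists."""
    word = word.lower()

    @lru_cache(maxsize=None)
    def split_from(start: int) -> Optional[Tuple[str, ...]]:
        if start == len(word):
            return ()
        # Try the longest candidate part first, backtracking on failure.
        for end in range(len(word), start + MIN_PART - 1, -1):
            part = word[start:end]
            if part not in LEXICON:
                continue
            for link in LINKING_ELEMENTS:
                nxt = end + len(link)
                # A linking element must be followed by another part.
                if link and nxt >= len(word):
                    continue
                if word.startswith(link, end):
                    rest = split_from(nxt)
                    if rest is not None:
                        return (part,) + rest
        return None

    parts = split_from(0)
    return list(parts) if parts else None
```

With this lexicon, `split_compound("Wissensmanagement")` yields `["wissen", "management"]`. A production splitter would also need to rank competing splits and decide when not to split lexicalized compounds.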
-
Hodges, P.R.: Keyword in title indexes : effectiveness of retrieval in computer searches (1983)
0.02
0.01920266 = product of:
0.03840532 = sum of:
0.015956536 = weight(_text_:information in 5001) [ClassicSimilarity], result of:
0.015956536 = score(doc=5001,freq=4.0), product of:
0.08310462 = queryWeight, product of:
1.7554779 = idf(docFreq=20772, maxDocs=44218)
0.047340166 = queryNorm
0.1920054 = fieldWeight in 5001, product of:
2.0 = tf(freq=4.0), with freq of:
4.0 = termFreq=4.0
1.7554779 = idf(docFreq=20772, maxDocs=44218)
0.0546875 = fieldNorm(doc=5001)
0.022448786 = product of:
0.04489757 = sum of:
0.04489757 = weight(_text_:22 in 5001) [ClassicSimilarity], result of:
0.04489757 = score(doc=5001,freq=2.0), product of:
0.16577719 = queryWeight, product of:
3.5018296 = idf(docFreq=3622, maxDocs=44218)
0.047340166 = queryNorm
0.2708308 = fieldWeight in 5001, product of:
1.4142135 = tf(freq=2.0), with freq of:
2.0 = termFreq=2.0
3.5018296 = idf(docFreq=3622, maxDocs=44218)
0.0546875 = fieldNorm(doc=5001)
0.5 = coord(1/2)
0.5 = coord(2/4)
- Abstract
- A study was done to test the effectiveness of retrieval using title word searching. It was based on actual search profiles used in the Mechanized Information Center at Ohio State University, in order to replicate actual searching conditions as closely as possible. Fewer than 50% of the relevant titles were retrieved by keywords in titles. The low rate of retrieval can be attributed to three sources: the titles themselves, user and information specialist ignorance of the subject vocabulary in use, and general language problems. Across fields it was found that the social sciences had the best retrieval rate, with science having the next best, and arts and humanities the lowest. Ways to enhance and supplement keyword in title searching on the computer and in printed indexes are discussed.
- Date
- 14. 3.1996 13:22:21
-
Bordoni, L.; Pazienza, M.T.: Documents automatic indexing in an environmental domain (1997)
0.02
0.01920266 = product of:
0.03840532 = sum of:
0.015956536 = weight(_text_:information in 530) [ClassicSimilarity], result of:
0.015956536 = score(doc=530,freq=4.0), product of:
0.08310462 = queryWeight, product of:
1.7554779 = idf(docFreq=20772, maxDocs=44218)
0.047340166 = queryNorm
0.1920054 = fieldWeight in 530, product of:
2.0 = tf(freq=4.0), with freq of:
4.0 = termFreq=4.0
1.7554779 = idf(docFreq=20772, maxDocs=44218)
0.0546875 = fieldNorm(doc=530)
0.022448786 = product of:
0.04489757 = sum of:
0.04489757 = weight(_text_:22 in 530) [ClassicSimilarity], result of:
0.04489757 = score(doc=530,freq=2.0), product of:
0.16577719 = queryWeight, product of:
3.5018296 = idf(docFreq=3622, maxDocs=44218)
0.047340166 = queryNorm
0.2708308 = fieldWeight in 530, product of:
1.4142135 = tf(freq=2.0), with freq of:
2.0 = termFreq=2.0
3.5018296 = idf(docFreq=3622, maxDocs=44218)
0.0546875 = fieldNorm(doc=530)
0.5 = coord(1/2)
0.5 = coord(2/4)
- Abstract
- Describes an application of Natural Language Processing (NLP) techniques, in HIRMA (Hypertextual Information Retrieval Managed by ARIOSTO), to the problem of document indexing by referring to a system which incorporates natural language processing techniques to determine the subject of the text of documents and to associate them with relevant semantic indexes. Describes briefly the overall system, details of its implementation on a corpus of scientific abstracts related to environmental topics and experimental evidence of the system's behaviour. Analyzes in detail an experiment designed to evaluate the system's retrieval ability in terms of recall and precision
- Source
- International forum on information and documentation. 22(1997) no.1, S.17-28
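The abstract evaluates retrieval in terms of recall and precision. Precision is the fraction of retrieved documents that are relevant; recall is the fraction of relevant documents that are retrieved. A minimal set-based Python sketch for a single query follows; the document IDs are hypothetical, and the paper's actual evaluation protocol is not described here.

```python
from typing import Iterable, Tuple


def precision_recall(retrieved: Iterable[str], relevant: Iterable[str]) -> Tuple[float, float]:
    """Set-based precision and recall for one query."""
    retrieved_set, relevant_set = set(retrieved), set(relevant)
    hits = len(retrieved_set & relevant_set)  # relevant documents that were retrieved
    precision = hits / len(retrieved_set) if retrieved_set else 0.0
    recall = hits / len(relevant_set) if relevant_set else 0.0
    return precision, recall
```

Returning 0.0 for an empty set is a convention chosen for this sketch; evaluation tools differ on that edge case.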
-
Newman, D.J.; Block, S.: Probabilistic topic decomposition of an eighteenth-century American newspaper (2006)
0.02
0.01920266 = product of:
0.03840532 = sum of:
0.015956536 = weight(_text_:information in 5291) [ClassicSimilarity], result of:
0.015956536 = score(doc=5291,freq=4.0), product of:
0.08310462 = queryWeight, product of:
1.7554779 = idf(docFreq=20772, maxDocs=44218)
0.047340166 = queryNorm
0.1920054 = fieldWeight in 5291, product of:
2.0 = tf(freq=4.0), with freq of:
4.0 = termFreq=4.0
1.7554779 = idf(docFreq=20772, maxDocs=44218)
0.0546875 = fieldNorm(doc=5291)
0.022448786 = product of:
0.04489757 = sum of:
0.04489757 = weight(_text_:22 in 5291) [ClassicSimilarity], result of:
0.04489757 = score(doc=5291,freq=2.0), product of:
0.16577719 = queryWeight, product of:
3.5018296 = idf(docFreq=3622, maxDocs=44218)
0.047340166 = queryNorm
0.2708308 = fieldWeight in 5291, product of:
1.4142135 = tf(freq=2.0), with freq of:
2.0 = termFreq=2.0
3.5018296 = idf(docFreq=3622, maxDocs=44218)
0.0546875 = fieldNorm(doc=5291)
0.5 = coord(1/2)
0.5 = coord(2/4)
- Abstract
- We use a probabilistic mixture decomposition method to determine topics in the Pennsylvania Gazette, a major colonial U.S. newspaper from 1728-1800. We assess the value of several topic decomposition techniques for historical research and compare the accuracy and efficacy of various methods. After determining the topics covered by the 80,000 articles and advertisements in the entire 18th century run of the Gazette, we calculate how the prevalence of those topics changed over time, and give historically relevant examples of our findings. This approach reveals important information about the content of this colonial newspaper, and suggests the value of such approaches to a more complete understanding of early American print culture and society.
- Date
- 22. 7.2006 17:32:00
- Source
- Journal of the American Society for Information Science and Technology. 57(2006) no.6, S.753-767
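The abstract does not define how topic prevalence over time was calculated. One simple definition is the mean proportion of each topic over all articles from a given year. It assumes that a topic model has already assigned every article a vector of topic proportions. The Python sketch below computes that average under those assumptions. The function name and inputs are hypothetical and may differ from the authors' procedure.

```python
from typing import Dict, List, Sequence


def topic_prevalence_by_year(doc_years: Sequence[int],
                             doc_topics: Sequence[Sequence[float]]) -> Dict[int, List[float]]:
    """Average per-document topic proportions within each publication year.

    doc_years[i] is the year of document i; doc_topics[i] holds its topic
    proportions (one value per topic, e.g. from a topic model).
    """
    sums: Dict[int, List[float]] = {}
    counts: Dict[int, int] = {}
    for year, theta in zip(doc_years, doc_topics):
        if year not in sums:
            sums[year] = [0.0] * len(theta)
            counts[year] = 0
        for k, p in enumerate(theta):
            sums[year][k] += p
        counts[year] += 1
    # Years are returned in chronological order.
    return {year: [s / counts[year] for s in sums[year]] for year in sorted(sums)}
```

Averaging per document weights short advertisements and long articles equally. Weighting by article length would be an equally plausible alternative.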
-
Wolfekuhler, M.R.; Punch, W.F.: Finding salient features for personal Web pages categories (1997)
0.02
0.01686588 = product of:
0.03373176 = sum of:
0.011282975 = weight(_text_:information in 2673) [ClassicSimilarity], result of:
0.011282975 = score(doc=2673,freq=2.0), product of:
0.08310462 = queryWeight, product of:
1.7554779 = idf(docFreq=20772, maxDocs=44218)
0.047340166 = queryNorm
0.13576832 = fieldWeight in 2673, product of:
1.4142135 = tf(freq=2.0), with freq of:
2.0 = termFreq=2.0
1.7554779 = idf(docFreq=20772, maxDocs=44218)
0.0546875 = fieldNorm(doc=2673)
0.022448786 = product of:
0.04489757 = sum of:
0.04489757 = weight(_text_:22 in 2673) [ClassicSimilarity], result of:
0.04489757 = score(doc=2673,freq=2.0), product of:
0.16577719 = queryWeight, product of:
3.5018296 = idf(docFreq=3622, maxDocs=44218)
0.047340166 = queryNorm
0.2708308 = fieldWeight in 2673, product of:
1.4142135 = tf(freq=2.0), with freq of:
2.0 = termFreq=2.0
3.5018296 = idf(docFreq=3622, maxDocs=44218)
0.0546875 = fieldNorm(doc=2673)
0.5 = coord(1/2)
0.5 = coord(2/4)
- Abstract
- Examines techniques that discover features in sets of pre-categorized documents, such that similar documents can be found on the WWW. Examines techniques which will classify training examples with high accuracy, then explains why this is not necessarily useful. Describes a method for extracting word clusters from the raw document features. Results show that the clustering technique is successful in discovering word groups in personal Web pages which can be used to find similar information on the WWW
- Date
- 1. 8.1996 22:08:06
-
Renz, M.: Automatische Inhaltserschließung im Zeichen von Wissensmanagement (2001)
0.02
0.01686588 = product of:
0.03373176 = sum of:
0.011282975 = weight(_text_:information in 5671) [ClassicSimilarity], result of:
0.011282975 = score(doc=5671,freq=2.0), product of:
0.08310462 = queryWeight, product of:
1.7554779 = idf(docFreq=20772, maxDocs=44218)
0.047340166 = queryNorm
0.13576832 = fieldWeight in 5671, product of:
1.4142135 = tf(freq=2.0), with freq of:
2.0 = termFreq=2.0
1.7554779 = idf(docFreq=20772, maxDocs=44218)
0.0546875 = fieldNorm(doc=5671)
0.022448786 = product of:
0.04489757 = sum of:
0.04489757 = weight(_text_:22 in 5671) [ClassicSimilarity], result of:
0.04489757 = score(doc=5671,freq=2.0), product of:
0.16577719 = queryWeight, product of:
3.5018296 = idf(docFreq=3622, maxDocs=44218)
0.047340166 = queryNorm
0.2708308 = fieldWeight in 5671, product of:
1.4142135 = tf(freq=2.0), with freq of:
2.0 = termFreq=2.0
3.5018296 = idf(docFreq=3622, maxDocs=44218)
0.0546875 = fieldNorm(doc=5671)
0.5 = coord(1/2)
0.5 = coord(2/4)
- Date
- 22. 3.2001 13:14:48
- Source
- nfd Information - Wissenschaft und Praxis. 52(2001) no.2, S.69-78
-
Milstead, J.L.: Thesauri in a full-text world (1998)
0.02
0.016076691 = product of:
0.032153383 = sum of:
0.016118534 = weight(_text_:information in 2337) [ClassicSimilarity], result of:
0.016118534 = score(doc=2337,freq=8.0), product of:
0.08310462 = queryWeight, product of:
1.7554779 = idf(docFreq=20772, maxDocs=44218)
0.047340166 = queryNorm
0.19395474 = fieldWeight in 2337, product of:
2.828427 = tf(freq=8.0), with freq of:
8.0 = termFreq=8.0
1.7554779 = idf(docFreq=20772, maxDocs=44218)
0.0390625 = fieldNorm(doc=2337)
0.016034847 = product of:
0.032069694 = sum of:
0.032069694 = weight(_text_:22 in 2337) [ClassicSimilarity], result of:
0.032069694 = score(doc=2337,freq=2.0), product of:
0.16577719 = queryWeight, product of:
3.5018296 = idf(docFreq=3622, maxDocs=44218)
0.047340166 = queryNorm
0.19345059 = fieldWeight in 2337, product of:
1.4142135 = tf(freq=2.0), with freq of:
2.0 = termFreq=2.0
3.5018296 = idf(docFreq=3622, maxDocs=44218)
0.0390625 = fieldNorm(doc=2337)
0.5 = coord(1/2)
0.5 = coord(2/4)
- Abstract
- Despite early claims to the contrary, thesauri continue to find use as access tools for information in the full-text environment. Their mode of use is changing, but this change actually represents an expansion rather than a contradiction of their utility. Thesauri and similar vocabulary tools can complement full-text access by aiding users in focusing their searches, by supplementing the linguistic analysis of the text search engine, and even by serving as one of the tools used by the linguistic engine for its analysis. While human indexing continues to be used for many databases, the trend is to increase the use of machine aids for this purpose. All machine-aided indexing (MAI) systems rely on thesauri as the basis for term selection. In the 21st century, the balance of effort between human and machine will change at both input and output, but thesauri will continue to play an important role for the foreseeable future
- Date
- 22. 9.1997 19:16:05
- Imprint
- Urbana-Champaign, IL : Illinois University at Urbana-Champaign, Graduate School of Library and Information Science
- Source
- Visualizing subject access for 21st century information resources: Papers presented at the 1997 Clinic on Library Applications of Data Processing, 2-4 Mar 1997, Graduate School of Library and Information Science, University of Illinois at Urbana-Champaign. Ed.: P.A. Cochrane et al
-
Ward, M.L.: ¬The future of the human indexer (1996)
0.01
0.014456468 = product of:
0.028912935 = sum of:
0.009671121 = weight(_text_:information in 7244) [ClassicSimilarity], result of:
0.009671121 = score(doc=7244,freq=2.0), product of:
0.08310462 = queryWeight, product of:
1.7554779 = idf(docFreq=20772, maxDocs=44218)
0.047340166 = queryNorm
0.116372846 = fieldWeight in 7244, product of:
1.4142135 = tf(freq=2.0), with freq of:
2.0 = termFreq=2.0
1.7554779 = idf(docFreq=20772, maxDocs=44218)
0.046875 = fieldNorm(doc=7244)
0.019241815 = product of:
0.03848363 = sum of:
0.03848363 = weight(_text_:22 in 7244) [ClassicSimilarity], result of:
0.03848363 = score(doc=7244,freq=2.0), product of:
0.16577719 = queryWeight, product of:
3.5018296 = idf(docFreq=3622, maxDocs=44218)
0.047340166 = queryNorm
0.23214069 = fieldWeight in 7244, product of:
1.4142135 = tf(freq=2.0), with freq of:
2.0 = termFreq=2.0
3.5018296 = idf(docFreq=3622, maxDocs=44218)
0.046875 = fieldNorm(doc=7244)
0.5 = coord(1/2)
0.5 = coord(2/4)
- Date
- 9. 2.1997 18:44:22
- Source
- Journal of librarianship and information science. 28(1996) no.4, S.217-225
-
Lorenz, S.: Konzeption und prototypische Realisierung einer begriffsbasierten Texterschließung (2006)
0.01
0.014456468 = product of:
0.028912935 = sum of:
0.009671121 = weight(_text_:information in 1746) [ClassicSimilarity], result of:
0.009671121 = score(doc=1746,freq=2.0), product of:
0.08310462 = queryWeight, product of:
1.7554779 = idf(docFreq=20772, maxDocs=44218)
0.047340166 = queryNorm
0.116372846 = fieldWeight in 1746, product of:
1.4142135 = tf(freq=2.0), with freq of:
2.0 = termFreq=2.0
1.7554779 = idf(docFreq=20772, maxDocs=44218)
0.046875 = fieldNorm(doc=1746)
0.019241815 = product of:
0.03848363 = sum of:
0.03848363 = weight(_text_:22 in 1746) [ClassicSimilarity], result of:
0.03848363 = score(doc=1746,freq=2.0), product of:
0.16577719 = queryWeight, product of:
3.5018296 = idf(docFreq=3622, maxDocs=44218)
0.047340166 = queryNorm
0.23214069 = fieldWeight in 1746, product of:
1.4142135 = tf(freq=2.0), with freq of:
2.0 = termFreq=2.0
3.5018296 = idf(docFreq=3622, maxDocs=44218)
0.046875 = fieldNorm(doc=1746)
0.5 = coord(1/2)
0.5 = coord(2/4)
- Abstract
- This thesis develops an approach that overcomes the fixation on the word and the weaknesses associated with it. The approach permits information to be extracted on the basis of the concepts represented, and thus forms the basis for content-oriented text indexing. The subsequent prototypical implementation serves to test the concept and to assess and evaluate its possibilities and limitations. Work on information extraction is devoted almost exclusively to English, where very good results are achieved, especially for named entities. Results are considerably worse for less regular languages such as German. For this reason, and for practical considerations, above all the author's familiarity with it, German is the primary subject of the investigation. Moving away from a narrow focus on terms while emphasizing the concepts they represent suggests that not only the words used but also the language used become secondary. To keep the scope of this thesis manageable, the investigation of this point concentrates mainly on the difficulties and particularities associated with different languages.
- Date
- 22. 3.2015 9:17:30
-
Plaunt, C.; Norgard, B.A.: ¬An association-based method for automatic indexing with a controlled vocabulary (1998)
0.01
0.013716185 = product of:
0.02743237 = sum of:
0.011397525 = weight(_text_:information in 1794) [ClassicSimilarity], result of:
0.011397525 = score(doc=1794,freq=4.0), product of:
0.08310462 = queryWeight, product of:
1.7554779 = idf(docFreq=20772, maxDocs=44218)
0.047340166 = queryNorm
0.13714671 = fieldWeight in 1794, product of:
2.0 = tf(freq=4.0), with freq of:
4.0 = termFreq=4.0
1.7554779 = idf(docFreq=20772, maxDocs=44218)
0.0390625 = fieldNorm(doc=1794)
0.016034847 = product of:
0.032069694 = sum of:
0.032069694 = weight(_text_:22 in 1794) [ClassicSimilarity], result of:
0.032069694 = score(doc=1794,freq=2.0), product of:
0.16577719 = queryWeight, product of:
3.5018296 = idf(docFreq=3622, maxDocs=44218)
0.047340166 = queryNorm
0.19345059 = fieldWeight in 1794, product of:
1.4142135 = tf(freq=2.0), with freq of:
2.0 = termFreq=2.0
3.5018296 = idf(docFreq=3622, maxDocs=44218)
0.0390625 = fieldNorm(doc=1794)
0.5 = coord(1/2)
0.5 = coord(2/4)
- Abstract
- In this article, we describe and test a two-stage algorithm based on a lexical collocation technique which maps from the lexical clues contained in a document representation into a controlled vocabulary list of subject headings. Using a collection of 4,626 INSPEC documents, we create a 'dictionary' of associations between the lexical items contained in the titles, authors, and abstracts, and controlled vocabulary subject headings assigned to those records by human indexers using a likelihood ratio statistic as the measure of association. In the deployment stage, we use the dictionary to predict which of the controlled vocabulary subject headings best describe new documents when they are presented to the system. Our evaluation of this algorithm, in which we compare the automatically assigned subject headings to the subject headings assigned to the test documents by human catalogers, shows that we can obtain results comparable to, and consistent with, human cataloging. In effect we have cast this as a classic partial match information retrieval problem. We consider the problem to be one of 'retrieving' (or assigning) the most probably 'relevant' (or correct) controlled vocabulary subject headings to a document based on the clues contained in that document
- Date
- 11. 9.2000 19:53:22
- Source
- Journal of the American Society for Information Science. 49(1998) no.10, S.888-902
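The abstract names a likelihood ratio statistic as the association measure but not its exact form. A common choice for collocation-style association is Dunning's log-likelihood ratio G² over a 2x2 contingency table of (word present or absent) by (heading assigned or not). The Python sketch below assumes that formulation. It also ranks candidate headings for a new document by summing association scores, which is a hypothetical combination rule and not necessarily the one used in the paper.

```python
import math
from typing import Dict, Iterable, List, Tuple


def g2(k11: int, k12: int, k21: int, k22: int) -> float:
    """Dunning's log-likelihood ratio G^2 for a 2x2 contingency table.

    k11: training records containing the word and assigned the heading
    k12: records containing the word but not assigned the heading
    k21: records assigned the heading but not containing the word
    k22: records with neither
    """
    n = k11 + k12 + k21 + k22
    observed = ((k11, k12), (k21, k22))
    row_totals = (k11 + k12, k21 + k22)
    col_totals = (k11 + k21, k12 + k22)
    total = 0.0
    for i in range(2):
        for j in range(2):
            o = observed[i][j]
            if o > 0:  # empty cells contribute 0 (limit of o * log o)
                expected = row_totals[i] * col_totals[j] / n
                total += o * math.log(o / expected)
    return 2.0 * total


def rank_headings(doc_words: Iterable[str],
                  assoc: Dict[Tuple[str, str], float]) -> List[str]:
    """Rank headings by summed association with the words of a new document.

    Summing is a hypothetical rule chosen for illustration.
    """
    words = set(doc_words)
    scores: Dict[str, float] = {}
    for (word, heading), strength in assoc.items():
        if word in words:
            scores[heading] = scores.get(heading, 0.0) + strength
    return sorted(scores, key=lambda h: scores[h], reverse=True)
```

For example, `g2(10, 10, 10, 10)` is 0 because word and heading are independent in that table. Note that G² is large for negative as well as positive association, so a practical system would likely keep only pairs where `k11` exceeds its expected count.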
-
Nohr, H.: Grundlagen der automatischen Indexierung : ein Lehrbuch (2003)
0.01
0.011997564 = product of:
0.023995128 = sum of:
0.011167249 = weight(_text_:information in 1767) [ClassicSimilarity], result of:
0.011167249 = score(doc=1767,freq=6.0), product of:
0.08310462 = queryWeight, product of:
1.7554779 = idf(docFreq=20772, maxDocs=44218)
0.047340166 = queryNorm
0.1343758 = fieldWeight in 1767, product of:
2.4494898 = tf(freq=6.0), with freq of:
6.0 = termFreq=6.0
1.7554779 = idf(docFreq=20772, maxDocs=44218)
0.03125 = fieldNorm(doc=1767)
0.012827878 = product of:
0.025655756 = sum of:
0.025655756 = weight(_text_:22 in 1767) [ClassicSimilarity], result of:
0.025655756 = score(doc=1767,freq=2.0), product of:
0.16577719 = queryWeight, product of:
3.5018296 = idf(docFreq=3622, maxDocs=44218)
0.047340166 = queryNorm
0.15476047 = fieldWeight in 1767, product of:
1.4142135 = tf(freq=2.0), with freq of:
2.0 = termFreq=2.0
3.5018296 = idf(docFreq=3622, maxDocs=44218)
0.03125 = fieldNorm(doc=1767)
0.5 = coord(1/2)
0.5 = coord(2/4)
- Date
- 22. 6.2009 12:46:51
- Footnote
- Review in: nfd 54(2003) no.5, S.314 (W. Ratzek): "In order to extract decision-relevant data from the ever-growing flood of more or less relevant documents, companies, public administrations and specialist information centres must develop, deploy and maintain effective and efficient filtering systems. This textbook by Holger Nohr is the first to offer a basic introduction to the topic of 'automatic indexing'. After all, as the introduction puts it: 'How you gather, manage, and use information will determine whether you win or lose' (Bill Gates). The first chapter, 'Introduction', focuses on the fundamentals. It describes the relationships between document management systems, information retrieval and indexing for planning, decision-making and innovation processes in both for-profit and non-profit organizations. At the end of the introductory chapter, Nohr discusses the debate on intellectual versus automatic indexing, which leads into the second chapter, 'Automatic indexing'. Here the author gives an overview of topics including: problems of automatic language processing and indexing; various methods of automatic indexing, e.g. simple keyword extraction / full-text inversion; statistical methods; and pattern-matching methods. Nohr then covers the 'methods of automatic indexing' in depth, with many examples, in the third and longest chapter. The fourth chapter, 'Keyphrase Extraction', occupies an in-between position: 'Approaches that extract key phrases from documents (Keyphrase Extraction) represent an intermediate stage on the way from automatic indexing to the automatic generation of textual summaries (Automatic Text Summarization). The boundaries between automatic indexing methods and those of text summarization are fluid.' (p. 91).
Nohr describes how this works using NCR's Extractor/Copernic Summarizer as an example.
In the fifth chapter, 'Information Extraction', Nohr addresses a problem that deserves even greater emphasis in the field: 'The steadily increasing number of electronic documents makes it desirable not only to index them automatically but also to extract the relevant information from these documents automatically, so that it can be transferred, e.g., into corporate information systems for further processing or analysis.' (p. 103). The sixth chapter treats 'Indexing and retrieval methods' as interdependent processes, focusing on relevance ranking and relevance feedback and on the use of information-linguistic methods in searching. 'Evaluation of automatic indexing' closes the book thematically; it deals mainly with the quality of indexing and with common retrieval measures in retrieval tests and their use. It is also worth noting that each chapter opens with learning objectives, and review questions for each chapter are provided at the back of the book. The very numerous practical examples, a list of abbreviations and a subject index add to the book's usefulness. Reading it improved the reviewer's understanding of the connections between the basic tools of library, information and documentation work (BID), business informatics (especially data warehousing) and artificial intelligence. 'Grundlagen der automatischen Indexierung' should also be required reading in library science programmes. Holger Nohr's textbook is also suitable for BID professionals who want to refresh their more or less solid knowledge of 'automatic indexing' quickly, clearly and informatively."
-
Fuhr, N.; Niewelt, B.: ¬Ein Retrievaltest mit automatisch indexierten Dokumenten (1984)
0.01
0.011224393 = product of:
0.04489757 = sum of:
0.04489757 = product of:
0.08979514 = sum of:
0.08979514 = weight(_text_:22 in 262) [ClassicSimilarity], result of:
0.08979514 = score(doc=262,freq=2.0), product of:
0.16577719 = queryWeight, product of:
3.5018296 = idf(docFreq=3622, maxDocs=44218)
0.047340166 = queryNorm
0.5416616 = fieldWeight in 262, product of:
1.4142135 = tf(freq=2.0), with freq of:
2.0 = termFreq=2.0
3.5018296 = idf(docFreq=3622, maxDocs=44218)
0.109375 = fieldNorm(doc=262)
0.5 = coord(1/2)
0.25 = coord(1/4)
- Date
- 20.10.2000 12:22:23
-
Greiner-Petter, A.; Schubotz, M.; Cohl, H.S.; Gipp, B.: Semantic preserving bijective mappings for expressions involving special functions between computer algebra systems and document preparation systems (2019)
0.01
0.010972949 = product of:
0.021945897 = sum of:
0.00911802 = weight(_text_:information in 5499) [ClassicSimilarity], result of:
0.00911802 = score(doc=5499,freq=4.0), product of:
0.08310462 = queryWeight, product of:
1.7554779 = idf(docFreq=20772, maxDocs=44218)
0.047340166 = queryNorm
0.10971737 = fieldWeight in 5499, product of:
2.0 = tf(freq=4.0), with freq of:
4.0 = termFreq=4.0
1.7554779 = idf(docFreq=20772, maxDocs=44218)
0.03125 = fieldNorm(doc=5499)
0.012827878 = product of:
0.025655756 = sum of:
0.025655756 = weight(_text_:22 in 5499) [ClassicSimilarity], result of:
0.025655756 = score(doc=5499,freq=2.0), product of:
0.16577719 = queryWeight, product of:
3.5018296 = idf(docFreq=3622, maxDocs=44218)
0.047340166 = queryNorm
0.15476047 = fieldWeight in 5499, product of:
1.4142135 = tf(freq=2.0), with freq of:
2.0 = termFreq=2.0
3.5018296 = idf(docFreq=3622, maxDocs=44218)
0.03125 = fieldNorm(doc=5499)
0.5 = coord(1/2)
0.5 = coord(2/4)
- Date
- 20. 1.2015 18:30:22
- Footnote
- Contribution to a special issue: Information Science in the German-speaking Countries.
- Source
- Aslib journal of information management. 71(2019) no.3, S.415-439
-
Martins, A.L.; Souza, R.R.; Ribeiro de Mello, H.: ¬The use of noun phrases in information retrieval : proposing a mechanism for automatic classification (2014)
0.01
0.009637646 = product of:
0.019275293 = sum of:
0.006447414 = weight(_text_:information in 1441) [ClassicSimilarity], result of:
0.006447414 = score(doc=1441,freq=2.0), product of:
0.08310462 = queryWeight, product of:
1.7554779 = idf(docFreq=20772, maxDocs=44218)
0.047340166 = queryNorm
0.0775819 = fieldWeight in 1441, product of:
1.4142135 = tf(freq=2.0), with freq of:
2.0 = termFreq=2.0
1.7554779 = idf(docFreq=20772, maxDocs=44218)
0.03125 = fieldNorm(doc=1441)
0.012827878 = product of:
0.025655756 = sum of:
0.025655756 = weight(_text_:22 in 1441) [ClassicSimilarity], result of:
0.025655756 = score(doc=1441,freq=2.0), product of:
0.16577719 = queryWeight, product of:
3.5018296 = idf(docFreq=3622, maxDocs=44218)
0.047340166 = queryNorm
0.15476047 = fieldWeight in 1441, product of:
1.4142135 = tf(freq=2.0), with freq of:
2.0 = termFreq=2.0
3.5018296 = idf(docFreq=3622, maxDocs=44218)
0.03125 = fieldNorm(doc=1441)
0.5 = coord(1/2)
0.5 = coord(2/4)
- Source
- Knowledge organization in the 21st century: between historical patterns and future prospects. Proceedings of the Thirteenth International ISKO Conference 19-22 May 2014, Kraków, Poland. Ed.: Wieslaw Babik