Search (268 results, page 1 of 14)

  • Active filter: theme_ss:"Automatisches Indexieren"
  1. Newman, D.J.; Block, S.: Probabilistic topic decomposition of an eighteenth-century American newspaper (2006) 0.05
    
    Abstract
    We use a probabilistic mixture decomposition method to determine topics in the Pennsylvania Gazette, a major colonial U.S. newspaper from 1728-1800. We assess the value of several topic decomposition techniques for historical research and compare the accuracy and efficacy of various methods. After determining the topics covered by the 80,000 articles and advertisements in the entire 18th century run of the Gazette, we calculate how the prevalence of those topics changed over time, and give historically relevant examples of our findings. This approach reveals important information about the content of this colonial newspaper, and suggests the value of such approaches to a more complete understanding of early American print culture and society.
    Date
    22. 7.2006 17:32:00
    Source
    Journal of the American Society for Information Science and Technology. 57(2006) no.6, S.753-767
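    Example
    The decimal printed after each hit (0.05 for result 1) is a Lucene ClassicSimilarity relevance score: for each matched query term, tf (the square root of the term frequency) is multiplied by idf and a field-length norm, then scaled by the query norm, and the sum is multiplied by a coordination factor for the fraction of query clauses matched (2/4 for result 1). A minimal sketch that reproduces result 1's score from the component values the engine reports; the helper name is ours:

      import math

      def classic_similarity(terms, query_norm, coord):
          # ClassicSimilarity: score = coord * sum over matched terms of
          # queryWeight * fieldWeight, where queryWeight = idf * queryNorm
          # and fieldWeight = tf * idf * fieldNorm, with tf = sqrt(freq).
          score = 0.0
          for freq, idf, field_norm in terms:
              tf = math.sqrt(freq)
              score += (idf * query_norm) * (tf * idf * field_norm)
          return coord * score

      # (freq, idf, fieldNorm) for the terms matched in result 1:
      terms = [(4.0, 1.7554779, 0.0546875),   # "information"
               (2.0, 2.978387,  0.0546875),   # "technology"
               (2.0, 3.5018296, 0.0546875)]   # "22"
      print(classic_similarity(terms, query_norm=0.047586527, coord=2 / 4))
      # -> 0.0469..., shown rounded as 0.05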
  2. Voorhees, E.M.: Implementing agglomerative hierarchic clustering algorithms for use in document retrieval (1986) 0.04
    
    Source
    Information processing and management. 22(1986) no.6, S.465-476
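    Example
    Agglomerative hierarchic clustering, the subject of this paper, starts from one cluster per document and repeatedly merges the closest pair. A generic SciPy illustration with toy tf-idf vectors; this is the textbook method, not Voorhees' implementation:

      import numpy as np
      from scipy.cluster.hierarchy import fcluster, linkage

      # Toy tf-idf matrix: 4 documents x 3 terms (values invented).
      docs = np.array([[0.9, 0.1, 0.0],
                       [0.8, 0.2, 0.1],
                       [0.0, 0.7, 0.9],
                       [0.1, 0.8, 0.8]])

      # 'average' linkage scores a cluster pair by the mean pairwise
      # distance between their members; cosine distance suits documents.
      Z = linkage(docs, method="average", metric="cosine")
      print(fcluster(Z, t=2, criterion="maxclust"))  # e.g. [1 1 2 2]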
  3. Hlava, M.M.K.: Automatic indexing : comparing rule-based and statistics-based indexing systems (2005) 0.03
    
    Source
    Information outlook. 9(2005) no.8, S.22-23
  4. Biebricher, N.; Fuhr, N.; Lustig, G.; Schwantner, M.; Knorz, G.: The automatic indexing system AIR/PHYS : from research to application (1988) 0.03
    
    Date
    16. 8.1998 12:51:22
    Footnote
    Reprinted in: Readings in information retrieval. Ed.: K. Sparck Jones and P. Willett. San Francisco: Morgan Kaufmann 1997. S.513-517.
    Source
    Proceedings of the 11th annual conference on research and development in information retrieval. Ed.: Y. Chiaramella
  5. Greiner-Petter, A.; Schubotz, M.; Cohl, H.S.; Gipp, B.: Semantic preserving bijective mappings for expressions involving special functions between computer algebra systems and document preparation systems (2019) 0.03
    
    Abstract
    Purpose: Modern mathematicians and scientists of math-related disciplines often use Document Preparation Systems (DPS) to write and Computer Algebra Systems (CAS) to calculate mathematical expressions. Usually, they translate the expressions manually between DPS and CAS. This process is time-consuming and error-prone. The purpose of this paper is to automate this translation. This paper uses Maple and Mathematica as the CAS, and LaTeX as the DPS.
    Design/methodology/approach: Bruce Miller at the National Institute of Standards and Technology (NIST) developed a collection of special LaTeX macros that create links from mathematical symbols to their definitions in the NIST Digital Library of Mathematical Functions (DLMF). The authors are using these macros to perform rule-based translations between the formulae in the DLMF and CAS. Moreover, the authors develop software to ease the creation of new rules and to discover inconsistencies.
    Findings: The authors created 396 mappings and translated 58.8 percent of DLMF formulae (2,405 expressions) successfully between Maple and DLMF. For a significant percentage, the special function definitions in Maple and the DLMF were different. An atomic symbol in one system maps to a composite expression in the other system. The translator was also successfully used for automatic verification of mathematical online compendia and CAS. The evaluation techniques discovered two errors in the DLMF and one defect in Maple.
    Originality/value: This paper introduces the first translation tool for special functions between LaTeX and CAS. The approach improves error-prone manual translations and can be used to verify mathematical online compendia and CAS.
    Date
    20. 1.2015 18:30:22
    Footnote
    Article in a special issue: Information Science in the German-speaking Countries.
    Source
    Aslib journal of information management. 71(2019) no.3, S.415-439
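    Example
    The rule-based translation described above maps semantically marked-up LaTeX macros to CAS function names. A toy version of such a rule; the two-entry table and the simplified \macro{...}@{...} argument pattern are illustrative assumptions, not the project's actual 396 mappings:

      import re

      # Assumed mapping table: semantic LaTeX macro -> Maple function.
      RULES = {r"\BesselJ": "BesselJ", r"\EulerGamma": "GAMMA"}

      PATTERN = re.compile(r"\\(\w+)\{([^}]*)\}@\{([^}]*)\}")

      def to_maple(latex):
          # Rewrite every \macro{param}@{arg} as name(param, arg);
          # unknown macros surface as errors, flagging a missing rule.
          def repl(m):
              name = RULES.get("\\" + m.group(1))
              if name is None:
                  raise ValueError("no rule for \\" + m.group(1))
              return "{}({}, {})".format(name, m.group(2), m.group(3))
          return PATTERN.sub(repl, latex)

      print(to_maple(r"\BesselJ{0}@{x}"))  # -> BesselJ(0, x)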
  6. Kutschekmanesch, S.; Lutes, B.; Moelle, K.; Thiel, U.; Tzeras, K.: Automated multilingual indexing : a synthesis of rule-based and thesaurus-based methods (1998) 0.02
    
    Source
    Information und Märkte: 50. Deutscher Dokumentartag 1998, Kongreß der Deutschen Gesellschaft für Dokumentation e.V. (DGD), Rheinische Friedrich-Wilhelms-Universität Bonn, 22.-24. September 1998. Ed.: Marlies Ockenfeld and Gerhard J. Mantwill
  7. Riloff, E.: An empirical study of automated dictionary construction for information extraction in three domains (1996) 0.02
    
    Abstract
    AutoSlog is a system that addresses the knowledge engineering bottleneck for information extraction. AutoSlog automatically creates domain specific dictionaries for information extraction, given an appropriate training corpus. Describes experiments with AutoSlog in terrorism, joint ventures and microelectronics domains. Compares the performance of AutoSlog across the 3 domains, discusses the lessons learned and presents results from 2 experiments which demonstrate that novice users can generate effective dictionaries using AutoSlog
    Date
    6. 3.1997 16:22:15
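    Example
    AutoSlog's core move is to turn a training sentence whose slot filler is already known into a reusable extraction pattern. A single-heuristic caricature; the pattern syntax and the passive-voice heuristic are illustrative, not Riloff's actual rule set:

      import re

      def pattern_from_example(sentence, known_filler):
          # If the known filler is the subject of a passive verb,
          # propose the pattern '<x> was/were <verb>'.
          m = re.search(re.escape(known_filler) + r"\s+(was|were)\s+(\w+)",
                        sentence)
          return "<x> {} {}".format(m.group(1), m.group(2)) if m else None

      # One training example from an invented terrorism-domain corpus:
      print(pattern_from_example(
          "The embassy was bombed by unknown attackers", "The embassy"))
      # -> '<x> was bombed', later matched against new text to fill <x>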
  8. Smart, G.: Using language analysis to manage information (1993) 0.02
    
    Abstract
    The ESPRIT project SIMPR developed software to analyse documents and generate indexes for them. Of immediate application as a document indexing and classification system, this also offers a technology for information modelling that has broader implications, supporting many new uses for information management software. The project was based on the assumption that information can only be managed successfully by computer systems that can view the information contained in a document through the language in which the document is written, and that systems need to be sufficiently flexible to respond to the changing requirements of document use.
  9. Alexander, M.: Automatic indexing of document images using Excalibur EFS (1995) 0.02
    
    Abstract
    Discusses research into the application of adaptive pattern recognition technology to enable effective retrieval from scanned document images. Describes the application at the British Library of Excalibur EFS software, which uses adaptive pattern recognition technology to provide access to digital information in its native form, fuzzy-search retrieval, and automatic indexing capabilities. It was used to make specialist printed catalogues and indexes accessible on computer via content-based indexes.
    Source
    Library technology news. 1995, no.16, S.4-8
  10. Lepsky, K.; Vorhauer, J.: Lingo - ein open source System für die Automatische Indexierung deutschsprachiger Dokumente (2006) 0.02
    
    Abstract
    Lingo is a freely available open source system for the automatic indexing of German-language text. High configurability and flexibility for different deployment scenarios were the main goals in lingo's development. The article demonstrates the value of linguistically based automatic indexing for information retrieval. The linguistic functionality that lingo provides for improving retrieval is presented and illustrated with examples: base-form identification, compound recognition and compound decomposition, word relations, lexical and algorithmic recognition of multi-word groups, and OCR error correction. Lingo's open architecture is described, and possible deployment scenarios and limits of application are identified.
    Date
    24. 3.2006 12:22:02
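    Example
    Of the linguistic functions listed, compound decomposition is the simplest to illustrate: split a German compound into lexicon words, allowing the linking element 's'. A greedy toy splitter with an assumed mini-lexicon; lingo's actual algorithm is not shown here:

      def split_compound(word, lexicon):
          # Try the longest dictionary prefix first, then recurse on the
          # remainder, optionally skipping a linking 's' (Fugen-s).
          word = word.lower()
          if word in lexicon:
              return [word]
          for i in range(len(word) - 1, 2, -1):
              head, rest = word[:i], word[i:]
              if head in lexicon:
                  if rest.startswith("s") and rest[1:] in lexicon:
                      rest = rest[1:]
                  tail = split_compound(rest, lexicon)
                  if tail:
                      return [head] + tail
          return None

      lexicon = {"information", "system", "retrieval"}  # assumed
      print(split_compound("Informationssystem", lexicon))
      # -> ['information', 'system']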
  11. Hodges, P.R.: Keyword in title indexes : effectiveness of retrieval in computer searches (1983) 0.02
    
    Abstract
    A study was done to test the effectiveness of retrieval using title word searching. It was based on actual search profiles used in the Mechanized Information Center at Ohio State University, in order to replicate actual searching conditions as closely as possible. Fewer than 50% of the relevant titles were retrieved by keywords in titles. The low rate of retrieval can be attributed to three sources: titles themselves, user and information specialist ignorance of the subject vocabulary in use, and general language problems. Across fields it was found that the social sciences had the best retrieval rate, with science having the next best, and arts and humanities the lowest. Ways to enhance and supplement keyword-in-title searching on the computer and in printed indexes are discussed.
    Date
    14. 3.1996 13:22:21
  12. Bordoni, L.; Pazienza, M.T.: Documents automatic indexing in an environmental domain (1997) 0.02
    
    Abstract
    Describes an application of Natural Language Processing (NLP) techniques, in HIRMA (Hypertextual Information Retrieval Managed by ARIOSTO), to the problem of document indexing by referring to a system which incorporates natural language processing techniques to determine the subject of the text of documents and to associate them with relevant semantic indexes. Describes briefly the overall system, details of its implementation on a corpus of scientific abstracts related to environmental topics and experimental evidence of the system's behaviour. Analyzes in detail an experiment designed to evaluate the system's retrieval ability in terms of recall and precision
    Source
    International forum on information and documentation. 22(1997) no.1, S.17-28
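    Example
    Recall and precision, the measures used to evaluate HIRMA's retrieval ability, are ratios over the sets of retrieved and relevant documents. For reference, with invented document IDs:

      def precision_recall(retrieved, relevant):
          # precision: share of retrieved documents that are relevant;
          # recall: share of relevant documents that were retrieved.
          hits = len(retrieved & relevant)
          return hits / len(retrieved), hits / len(relevant)

      retrieved = {"d1", "d2", "d3", "d4"}  # what the system returned
      relevant = {"d1", "d2", "d5"}         # what the judges marked
      p, r = precision_recall(retrieved, relevant)
      print("precision=%.2f recall=%.2f" % (p, r))  # 0.50 / 0.67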
  13. Rasmussen, E.M.: Indexing and retrieval for the Web (2002) 0.02
    
    Abstract
    The introduction and growth of the World Wide Web (WWW, or Web) have resulted in a profound change in the way individuals and organizations access information. In terms of volume, nature, and accessibility, the characteristics of electronic information are significantly different from those of even five or six years ago. Control of, and access to, this flood of information rely heavily on automated techniques for indexing and retrieval. According to Gudivada, Raghavan, Grosky, and Kasanagottu (1997, p. 58), "The ability to search and retrieve information from the Web efficiently and effectively is an enabling technology for realizing its full potential." Almost 93 percent of those surveyed consider the Web an "indispensable" Internet technology, second only to e-mail (Graphics, Visualization & Usability Center, 1998). Although there are other ways of locating information on the Web (browsing or following directory structures), 85 percent of users identify Web pages by means of a search engine (Graphics, Visualization & Usability Center, 1998). A more recent study conducted by the Stanford Institute for the Quantitative Study of Society confirms the finding that searching for information is second only to e-mail as an Internet activity (Nie & Erbring, 2000, online). In fact, Nie and Erbring conclude, "... the Internet today is a giant public library with a decidedly commercial tilt. The most widespread use of the Internet today is as an information search utility for products, travel, hobbies, and general information. Virtually all users interviewed responded that they engaged in one or more of these information gathering activities."
    Techniques for automated indexing and information retrieval (IR) have been developed, tested, and refined over the past 40 years, and are well documented (see, for example, Agosti & Smeaton, 1996; Baeza-Yates & Ribeiro-Neto, 1999a; Frakes & Baeza-Yates, 1992; Korfhage, 1997; Salton, 1989; Witten, Moffat, & Bell, 1999). With the introduction of the Web, and the capability to index and retrieve via search engines, these techniques have been extended to a new environment. They have been adopted, altered, and in some cases extended to include new methods. "In short, search engines are indispensable for searching the Web, they employ a variety of relatively advanced IR techniques, and there are some peculiar aspects of search engines that make searching the Web different than more conventional information retrieval" (Gordon & Pathak, 1999, p. 145). The environment for information retrieval on the World Wide Web differs from that of "conventional" information retrieval in a number of fundamental ways. The collection is very large and changes continuously, with pages being added, deleted, and altered. Wide variability between the size, structure, focus, quality, and usefulness of documents makes Web documents much more heterogeneous than a typical electronic document collection. The wide variety of document types includes images, video, audio, and scripts, as well as many different document languages. Duplication of documents and sites is common. Documents are interconnected through networks of hyperlinks. Because of the size and dynamic nature of the Web, preprocessing all documents requires considerable resources and is often not feasible, certainly not on the frequent basis required to ensure currency. Query length is usually much shorter than in other environments (only a few words), and user behavior differs from that in other environments. These differences make the Web a novel environment for information retrieval (Baeza-Yates & Ribeiro-Neto, 1999b; Bharat & Henzinger, 1998; Huang, 2000).
    Source
    Annual review of information science and technology. 37(2003), S.91-126
  14. Fauzi, F.; Belkhatir, M.: Multifaceted conceptual image indexing on the world wide web (2013) 0.02
    
    Abstract
    In this paper, we describe a user-centered design of an automated multifaceted concept-based indexing framework which analyzes the semantics of the Web image contextual information and classifies it into five broad semantic concept facets: signal, object, abstract, scene, and relational; and identifies the semantic relationships between the concepts. An important aspect of our indexing model is that it relates to the users' levels of image descriptions. Also, a major contribution relies on the fact that the classification is performed automatically with the raw image contextual information extracted from any general webpage and is not solely based on image tags like state-of-the-art solutions. Human Language Technology techniques and an external knowledge base are used to analyze the information both syntactically and semantically. Experimental results on a human-annotated Web image collection and corresponding contextual information indicate that our method outperforms empirical frameworks employing tf-idf and location-based tf-idf weighting schemes as well as n-gram indexing in a recall/precision based evaluation framework.
    Source
    Information processing and management. 49(2013) no.2, S.420-440
  15. Wolfekuhler, M.R.; Punch, W.F.: Finding salient features for personal Web pages categories (1997) 0.02
    
    Abstract
    Examines techniques that discover features in sets of pre-categorized documents, such that similar documents can be found on the WWW. Examines techniques which will classify training examples with high accuracy, then explains why this is not necessarily useful. Describes a method for extracting word clusters from the raw document features. Results show that the clustering technique is successful in discovering word groups in personal Web pages which can be used to find similar information on the WWW.
    Date
    1. 8.1996 22:08:06
  16. Renz, M.: Automatische Inhaltserschließung im Zeichen von Wissensmanagement (2001) 0.02
    
    Date
    22. 3.2001 13:14:48
    Source
    nfd Information - Wissenschaft und Praxis. 52(2001) H.2, S.69-78
  17. Lu, K.; Mao, J.; Li, G.: Toward effective automated weighted subject indexing : a comparison of different approaches in different environments (2018) 0.02
    
    Abstract
    Subject indexing plays an important role in supporting subject access to information resources. Current subject indexing systems do not make adequate distinctions on the importance of assigned subject descriptors. Assigning numeric weights to subject descriptors to distinguish their importance to the documents can strengthen the role of subject metadata. Automated methods are more cost-effective. This study compares different automated weighting methods in different environments. Two evaluation methods were used to assess the performance. Experiments on three datasets in the biomedical domain suggest the performance of different weighting methods depends on whether it is an abstract or full text environment. Mutual information with bag-of-words representation shows the best average performance in the full text environment, while cosine with bag-of-words representation is the best in an abstract environment. The cosine measure has relatively consistent and robust performance. A direct weighting method, IDF (Inverse Document Frequency), can produce quick and reasonable estimates of the weights. Bag-of-words representation generally outperforms the concept-based representation. Further improvement in performance can be obtained by using the learning-to-rank method to integrate different weighting methods. This study follows up Lu and Mao (Journal of the Association for Information Science and Technology, 66, 1776-1784, 2015), in which an automated weighted subject indexing method was proposed and validated. The findings from this study contribute to more effective weighted subject indexing.
    Source
    Journal of the Association for Information Science and Technology. 69(2018) no.1, S.121-133
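    Example
    Among the weighting methods compared, the cosine measure is straightforward to sketch: weight each assigned descriptor by the cosine between its textual profile and the document's bag-of-words. A rough scikit-learn illustration; the document and descriptor strings are invented, not the authors' biomedical datasets:

      from sklearn.feature_extraction.text import TfidfVectorizer
      from sklearn.metrics.pairwise import cosine_similarity

      document = "gene expression profiling in tumor cells"
      descriptors = ["gene expression", "neoplasms tumor", "cell biology"]

      # Fit one shared vector space over the document and descriptors.
      vec = TfidfVectorizer().fit([document] + descriptors)
      doc_v = vec.transform([document])
      for d in descriptors:
          w = cosine_similarity(doc_v, vec.transform([d]))[0, 0]
          print("%-18s weight %.2f" % (d, w))  # higher = more central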
  18. Milstead, J.L.: Thesauri in a full-text world (1998) 0.02
    
    Abstract
    Despite early claims to the contrary, thesauri continue to find use as access tools for information in the full-text environment. Their mode of use is changing, but this change actually represents an expansion rather than a contradiction of their utility. Thesauri and similar vocabulary tools can complement full-text access by aiding users in focusing their searches, by supplementing the linguistic analysis of the text search engine, and even by serving as one of the tools used by the linguistic engine for its analysis. While human indexing continues to be used for many databases, the trend is to increase the use of machine aids for this purpose. All machine-aided indexing (MAI) systems rely on thesauri as the basis for term selection. In the 21st century, the balance of effort between human and machine will change at both input and output, but thesauri will continue to play an important role for the foreseeable future.
    Date
    22. 9.1997 19:16:05
    Imprint
    Urbana-Champaign, IL : University of Illinois at Urbana-Champaign, Graduate School of Library and Information Science
    Source
    Visualizing subject access for 21st century information resources: Papers presented at the 1997 Clinic on Library Applications of Data Processing, 2-4 Mar 1997, Graduate School of Library and Information Science, University of Illinois at Urbana-Champaign. Ed.: P.A. Cochrane et al
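    Example
    The observation that machine-aided indexing systems rely on thesauri for term selection reduces, at its simplest, to mapping text tokens through an entry vocabulary onto preferred descriptors. A toy sketch with an assumed three-entry thesaurus:

      # Assumed entry vocabulary: lead-in term -> preferred descriptor.
      THESAURUS = {
          "cars": "Automobiles",
          "automobiles": "Automobiles",
          "thesauri": "Controlled vocabularies",
      }

      def suggest_descriptors(text):
          # Propose a descriptor for every token the thesaurus knows.
          return {THESAURUS[t] for t in text.lower().split()
                  if t in THESAURUS}

      print(suggest_descriptors("Used cars and new automobiles"))
      # -> {'Automobiles'}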
  19. Williams, R.V.: Hans Peter Luhn and Herbert M. Ohlman : their roles in the origins of keyword-in-context/permutation automatic indexing (2010) 0.02
    
    Source
    Journal of the American Society for Information Science and Technology. 61(2010) no.4, S.835-849
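    Example
    The keyword-in-context/permutation technique whose origins this paper traces fits in a few lines: every significant word of a title becomes an index entry, with the title rotated so that word leads. A minimal sketch with an assumed stop list:

      STOP = {"a", "an", "and", "the", "of", "in", "for"}

      def kwic(title):
          # Yield (keyword, rotated context) for each non-stopword.
          words = title.split()
          for i, w in enumerate(words):
              if w.lower() not in STOP:
                  yield w.lower(), " ".join(words[i:] + ["/"] + words[:i])

      for key, ctx in sorted(kwic("Automatic indexing of document images")):
          print("%-10s %s" % (key, ctx))
      # automatic  Automatic indexing of document images /
      # document   document images / Automatic indexing of
      # ...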
  20. Li, W.; Wong, K.-F.; Yuan, C.: Toward automatic Chinese temporal information extraction (2001) 0.02
    
    Abstract
    Over the past few years, temporal information processing and temporal database management have increasingly become hot topics. Nevertheless, only a few researchers have investigated these areas in the Chinese language. This lays down the objective of our research: to exploit Chinese language processing techniques for temporal information extraction and concept reasoning. In this article, we first study the mechanism for expressing time in Chinese. On the basis of the study, we then design a general frame structure for maintaining the extracted temporal concepts and propose a system for extracting time-dependent information from Hong Kong financial news. In the system, temporal knowledge is represented by different types of temporal concepts (TTC) and different temporal relations, including absolute and relative relations, which are used to correlate between action times and reference times. In analyzing a sentence, the algorithm first determines the situation related to the verb. This in turn will identify the type of temporal concept associated with the verb. After that, the relevant temporal information is extracted and the temporal relations are derived. These relations link relevant concept frames together in chronological order, which in turn provide the knowledge to fulfill users' queries, e.g., for question-answering (i.e., Q&A) applications
    Source
    Journal of the American Society for Information Science and Technology. 52(2001) no.9, S.748-762
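    Example
    The pipeline the authors describe (spot a temporal expression, classify it as absolute or relative, and resolve it against a reference time) can be mimicked for English dates in a toy way. The regexes and frame fields below are our assumptions; the paper itself handles Chinese text:

      import re
      from dataclasses import dataclass
      from datetime import date, timedelta

      @dataclass
      class TemporalFrame:
          # Simplified stand-in for the paper's temporal concept frames.
          text: str
          relation: str  # 'absolute' or 'relative'
          resolved: date

      def extract_times(text, reference):
          # Absolute dates in YYYY-MM-DD form resolve to themselves.
          for m in re.finditer(r"\d{4}-\d{2}-\d{2}", text):
              y, mo, d = map(int, m.group().split("-"))
              yield TemporalFrame(m.group(), "absolute", date(y, mo, d))
          # One relative expression, resolved against the reference time.
          if "yesterday" in text.lower():
              yield TemporalFrame("yesterday", "relative",
                                  reference - timedelta(days=1))

      for f in extract_times("Shares fell yesterday; results due 2001-09-01.",
                             reference=date(2001, 8, 15)):
          print(f)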

Types

  • a 232
  • x 14
  • m 13
  • el 10
  • s 8
  • d 1