Search (301 results, page 1 of 16)

  • theme_ss:"Automatisches Indexieren"
  1. Biebricher, N.; Fuhr, N.; Lustig, G.; Schwantner, M.; Knorz, G.: The automatic indexing system AIR/PHYS : from research to application (1988) 0.05
    0.05065006 = product of:
      0.08441676 = sum of:
        0.026643137 = weight(_text_:on in 1952) [ClassicSimilarity], result of:
          0.026643137 = score(doc=1952,freq=2.0), product of:
            0.109641045 = queryWeight, product of:
              2.199415 = idf(docFreq=13325, maxDocs=44218)
              0.049850095 = queryNorm
            0.24300331 = fieldWeight in 1952, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              2.199415 = idf(docFreq=13325, maxDocs=44218)
              0.078125 = fieldNorm(doc=1952)
        0.024003621 = weight(_text_:information in 1952) [ClassicSimilarity], result of:
          0.024003621 = score(doc=1952,freq=4.0), product of:
            0.08751074 = queryWeight, product of:
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.049850095 = queryNorm
            0.27429342 = fieldWeight in 1952, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.078125 = fieldNorm(doc=1952)
        0.03377 = product of:
          0.06754 = sum of:
            0.06754 = weight(_text_:22 in 1952) [ClassicSimilarity], result of:
              0.06754 = score(doc=1952,freq=2.0), product of:
                0.17456654 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.049850095 = queryNorm
                0.38690117 = fieldWeight in 1952, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.078125 = fieldNorm(doc=1952)
          0.5 = coord(1/2)
      0.6 = coord(3/5)
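     The bracketed [ClassicSimilarity] labels mark these trees as Lucene "explain" output. As a minimal sketch of how such a score is assembled (assuming Lucene's classic formulas tf = sqrt(freq) and idf = 1 + ln(maxDocs/(docFreq + 1)); the function name and layout are illustrative), the following Python recomputes record 1's 0.05065006 from the values shown above:

```python
import math

def classic_term_score(freq, doc_freq, max_docs, query_norm, field_norm):
    """One term clause of a Lucene ClassicSimilarity explanation:
    score = queryWeight * fieldWeight
          = (idf * queryNorm) * (tf * idf * fieldNorm)."""
    tf = math.sqrt(freq)
    idf = 1.0 + math.log(max_docs / (doc_freq + 1))
    return (idf * query_norm) * (tf * idf * field_norm)

QUERY_NORM, FIELD_NORM, MAX_DOCS = 0.049850095, 0.078125, 44218

s_on   = classic_term_score(2.0, 13325, MAX_DOCS, QUERY_NORM, FIELD_NORM)  # ~0.0266431
s_info = classic_term_score(4.0, 20772, MAX_DOCS, QUERY_NORM, FIELD_NORM)  # ~0.0240036
s_22   = classic_term_score(2.0, 3622, MAX_DOCS, QUERY_NORM, FIELD_NORM) * 0.5  # inner coord(1/2)
total  = (s_on + s_info + s_22) * 3 / 5  # coord(3/5): 3 of 5 query clauses matched
print(round(total, 8))  # ~0.05065006
```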
    
    Date
    16. 8.1998 12:51:22
    Footnote
     Reprinted in: Readings in information retrieval. Eds.: K. Sparck Jones and P. Willett. San Francisco: Morgan Kaufmann 1997. S.513-517.
    Source
    Proceedings of the 11th annual conference on research and development in information retrieval. Ed.: Y. Chiaramella
  2. Smart, G.: Using language analysis to manage information (1993) 0.04
    0.04273203 = product of:
      0.07122005 = sum of:
        0.02131451 = weight(_text_:on in 4423) [ClassicSimilarity], result of:
          0.02131451 = score(doc=4423,freq=2.0), product of:
            0.109641045 = queryWeight, product of:
              2.199415 = idf(docFreq=13325, maxDocs=44218)
              0.049850095 = queryNorm
            0.19440265 = fieldWeight in 4423, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              2.199415 = idf(docFreq=13325, maxDocs=44218)
              0.0625 = fieldNorm(doc=4423)
        0.030362446 = weight(_text_:information in 4423) [ClassicSimilarity], result of:
          0.030362446 = score(doc=4423,freq=10.0), product of:
            0.08751074 = queryWeight, product of:
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.049850095 = queryNorm
            0.3469568 = fieldWeight in 4423, product of:
              3.1622777 = tf(freq=10.0), with freq of:
                10.0 = termFreq=10.0
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.0625 = fieldNorm(doc=4423)
        0.01954309 = product of:
          0.03908618 = sum of:
            0.03908618 = weight(_text_:technology in 4423) [ClassicSimilarity], result of:
              0.03908618 = score(doc=4423,freq=2.0), product of:
                0.14847288 = queryWeight, product of:
                  2.978387 = idf(docFreq=6114, maxDocs=44218)
                  0.049850095 = queryNorm
                0.2632547 = fieldWeight in 4423, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  2.978387 = idf(docFreq=6114, maxDocs=44218)
                  0.0625 = fieldNorm(doc=4423)
          0.5 = coord(1/2)
      0.6 = coord(3/5)
    
    Abstract
     The ESPRIT project SIMPR developed software to analyse documents and generate indexes for them. Of immediate application as a document indexing and classification system, it also offers a technology for information modelling with broader implications, supporting many new uses for information management software. The project was based on the assumption that information can only be managed successfully by computer systems that view the information contained in a document through the language in which the document is written, and that such systems need to be sufficiently flexible to respond to the changing requirements of document use.
  3. Fauzi, F.; Belkhatir, M.: Multifaceted conceptual image indexing on the world wide web (2013) 0.04
    0.041640554 = product of:
      0.06940092 = sum of:
        0.031971764 = weight(_text_:on in 2721) [ClassicSimilarity], result of:
          0.031971764 = score(doc=2721,freq=8.0), product of:
            0.109641045 = queryWeight, product of:
              2.199415 = idf(docFreq=13325, maxDocs=44218)
              0.049850095 = queryNorm
            0.29160398 = fieldWeight in 2721, product of:
              2.828427 = tf(freq=8.0), with freq of:
                8.0 = termFreq=8.0
              2.199415 = idf(docFreq=13325, maxDocs=44218)
              0.046875 = fieldNorm(doc=2721)
        0.022771835 = weight(_text_:information in 2721) [ClassicSimilarity], result of:
          0.022771835 = score(doc=2721,freq=10.0), product of:
            0.08751074 = queryWeight, product of:
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.049850095 = queryNorm
            0.2602176 = fieldWeight in 2721, product of:
              3.1622777 = tf(freq=10.0), with freq of:
                10.0 = termFreq=10.0
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.046875 = fieldNorm(doc=2721)
        0.014657319 = product of:
          0.029314637 = sum of:
            0.029314637 = weight(_text_:technology in 2721) [ClassicSimilarity], result of:
              0.029314637 = score(doc=2721,freq=2.0), product of:
                0.14847288 = queryWeight, product of:
                  2.978387 = idf(docFreq=6114, maxDocs=44218)
                  0.049850095 = queryNorm
                0.19744103 = fieldWeight in 2721, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  2.978387 = idf(docFreq=6114, maxDocs=44218)
                  0.046875 = fieldNorm(doc=2721)
          0.5 = coord(1/2)
      0.6 = coord(3/5)
    
    Abstract
     In this paper, we describe a user-centered design of an automated multifaceted concept-based indexing framework which analyzes the semantics of Web image contextual information and classifies it into five broad semantic concept facets: signal, object, abstract, scene, and relational; it also identifies the semantic relationships between the concepts. An important aspect of our indexing model is that it relates to users' levels of image description. A major contribution is that the classification is performed automatically on the raw image contextual information extracted from any general webpage, rather than relying solely on image tags as state-of-the-art solutions do. Human Language Technology techniques and an external knowledge base are used to analyze the information both syntactically and semantically. Experimental results on a human-annotated Web image collection and corresponding contextual information indicate that our method outperforms empirical frameworks employing tf-idf and location-based tf-idf weighting schemes, as well as n-gram indexing, in a recall/precision-based evaluation framework.
    Source
    Information processing and management. 49(2013) no.2, S.420-440
  4. Alexander, M.: Automatic indexing of document images using Excalibur EFS (1995) 0.04
    0.04124558 = product of:
      0.06874263 = sum of:
        0.02131451 = weight(_text_:on in 1911) [ClassicSimilarity], result of:
          0.02131451 = score(doc=1911,freq=2.0), product of:
            0.109641045 = queryWeight, product of:
              2.199415 = idf(docFreq=13325, maxDocs=44218)
              0.049850095 = queryNorm
            0.19440265 = fieldWeight in 1911, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              2.199415 = idf(docFreq=13325, maxDocs=44218)
              0.0625 = fieldNorm(doc=1911)
        0.013578499 = weight(_text_:information in 1911) [ClassicSimilarity], result of:
          0.013578499 = score(doc=1911,freq=2.0), product of:
            0.08751074 = queryWeight, product of:
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.049850095 = queryNorm
            0.1551638 = fieldWeight in 1911, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.0625 = fieldNorm(doc=1911)
        0.03384963 = product of:
          0.06769926 = sum of:
            0.06769926 = weight(_text_:technology in 1911) [ClassicSimilarity], result of:
              0.06769926 = score(doc=1911,freq=6.0), product of:
                0.14847288 = queryWeight, product of:
                  2.978387 = idf(docFreq=6114, maxDocs=44218)
                  0.049850095 = queryNorm
                0.45597056 = fieldWeight in 1911, product of:
                  2.4494898 = tf(freq=6.0), with freq of:
                    6.0 = termFreq=6.0
                  2.978387 = idf(docFreq=6114, maxDocs=44218)
                  0.0625 = fieldNorm(doc=1911)
          0.5 = coord(1/2)
      0.6 = coord(3/5)
    
    Abstract
     Discusses research into the application of adaptive pattern recognition technology to enable effective retrieval from scanned document images. Describes the application at the British Library of Excalibur EFS software, which uses adaptive pattern recognition technology to provide access to digital information in its native forms, with fuzzy-search retrieval and automatic indexing capabilities. It was used to make specialist printed catalogues and indexes accessible by computer via content-based indexes.
    Source
    Library technology news. 1995, no.16, S.4-8
  5. Hodges, P.R.: Keyword in title indexes : effectiveness of retrieval in computer searches (1983) 0.04
    0.04009014 = product of:
      0.066816896 = sum of:
        0.02637536 = weight(_text_:on in 5001) [ClassicSimilarity], result of:
          0.02637536 = score(doc=5001,freq=4.0), product of:
            0.109641045 = queryWeight, product of:
              2.199415 = idf(docFreq=13325, maxDocs=44218)
              0.049850095 = queryNorm
            0.24056101 = fieldWeight in 5001, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              2.199415 = idf(docFreq=13325, maxDocs=44218)
              0.0546875 = fieldNorm(doc=5001)
        0.016802534 = weight(_text_:information in 5001) [ClassicSimilarity], result of:
          0.016802534 = score(doc=5001,freq=4.0), product of:
            0.08751074 = queryWeight, product of:
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.049850095 = queryNorm
            0.1920054 = fieldWeight in 5001, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.0546875 = fieldNorm(doc=5001)
        0.023639 = product of:
          0.047278 = sum of:
            0.047278 = weight(_text_:22 in 5001) [ClassicSimilarity], result of:
              0.047278 = score(doc=5001,freq=2.0), product of:
                0.17456654 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.049850095 = queryNorm
                0.2708308 = fieldWeight in 5001, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.0546875 = fieldNorm(doc=5001)
          0.5 = coord(1/2)
      0.6 = coord(3/5)
    
    Abstract
     A study was done to test the effectiveness of retrieval using title word searching. It was based on actual search profiles used in the Mechanized Information Center at Ohio State University, in order to replicate actual searching conditions as closely as possible. Fewer than 50% of the relevant titles were retrieved by keywords in titles. The low rate of retrieval can be attributed to three sources: the titles themselves, ignorance of the subject vocabulary in use on the part of users and information specialists, and general language problems. Across fields it was found that the social sciences had the best retrieval rate, with science having the next best, and arts and humanities the lowest. Ways to enhance and supplement keyword-in-title searching on the computer and in printed indexes are discussed.
    Date
    14. 3.1996 13:22:21
  6. Bordoni, L.; Pazienza, M.T.: Documents automatic indexing in an environmental domain (1997) 0.04
    0.04009014 = product of:
      0.066816896 = sum of:
        0.02637536 = weight(_text_:on in 530) [ClassicSimilarity], result of:
          0.02637536 = score(doc=530,freq=4.0), product of:
            0.109641045 = queryWeight, product of:
              2.199415 = idf(docFreq=13325, maxDocs=44218)
              0.049850095 = queryNorm
            0.24056101 = fieldWeight in 530, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              2.199415 = idf(docFreq=13325, maxDocs=44218)
              0.0546875 = fieldNorm(doc=530)
        0.016802534 = weight(_text_:information in 530) [ClassicSimilarity], result of:
          0.016802534 = score(doc=530,freq=4.0), product of:
            0.08751074 = queryWeight, product of:
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.049850095 = queryNorm
            0.1920054 = fieldWeight in 530, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.0546875 = fieldNorm(doc=530)
        0.023639 = product of:
          0.047278 = sum of:
            0.047278 = weight(_text_:22 in 530) [ClassicSimilarity], result of:
              0.047278 = score(doc=530,freq=2.0), product of:
                0.17456654 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.049850095 = queryNorm
                0.2708308 = fieldWeight in 530, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.0546875 = fieldNorm(doc=530)
          0.5 = coord(1/2)
      0.6 = coord(3/5)
    
    Abstract
     Describes an application of Natural Language Processing (NLP) techniques, in HIRMA (Hypertextual Information Retrieval Managed by ARIOSTO), to the problem of document indexing, referring to a system which incorporates NLP techniques to determine the subject of document texts and to associate them with relevant semantic indexes. Briefly describes the overall system, details of its implementation on a corpus of scientific abstracts related to environmental topics, and experimental evidence of the system's behaviour. Analyzes in detail an experiment designed to evaluate the system's retrieval ability in terms of recall and precision.
    Source
    International forum on information and documentation. 22(1997) no.1, S.17-28
  7. Newman, D.J.; Block, S.: Probabilistic topic decomposition of an eighteenth-century American newspaper (2006) 0.04
    0.039312378 = product of:
      0.098280944 = sum of:
        0.016802534 = weight(_text_:information in 5291) [ClassicSimilarity], result of:
          0.016802534 = score(doc=5291,freq=4.0), product of:
            0.08751074 = queryWeight, product of:
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.049850095 = queryNorm
            0.1920054 = fieldWeight in 5291, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.0546875 = fieldNorm(doc=5291)
        0.08147841 = sum of:
          0.03420041 = weight(_text_:technology in 5291) [ClassicSimilarity], result of:
            0.03420041 = score(doc=5291,freq=2.0), product of:
              0.14847288 = queryWeight, product of:
                2.978387 = idf(docFreq=6114, maxDocs=44218)
                0.049850095 = queryNorm
              0.23034787 = fieldWeight in 5291, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                2.978387 = idf(docFreq=6114, maxDocs=44218)
                0.0546875 = fieldNorm(doc=5291)
          0.047278 = weight(_text_:22 in 5291) [ClassicSimilarity], result of:
            0.047278 = score(doc=5291,freq=2.0), product of:
              0.17456654 = queryWeight, product of:
                3.5018296 = idf(docFreq=3622, maxDocs=44218)
                0.049850095 = queryNorm
              0.2708308 = fieldWeight in 5291, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                3.5018296 = idf(docFreq=3622, maxDocs=44218)
                0.0546875 = fieldNorm(doc=5291)
      0.4 = coord(2/5)
    
    Abstract
     We use a probabilistic mixture decomposition method to determine topics in the Pennsylvania Gazette, a major colonial U.S. newspaper published from 1728 to 1800. We assess the value of several topic decomposition techniques for historical research and compare the accuracy and efficacy of various methods. After determining the topics covered by the 80,000 articles and advertisements in the entire 18th-century run of the Gazette, we calculate how the prevalence of those topics changed over time, and give historically relevant examples of our findings. This approach reveals important information about the content of this colonial newspaper, and suggests the value of such approaches for a more complete understanding of early American print culture and society.
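     The abstract does not name the exact mixture model, so the sketch below uses latent Dirichlet allocation (a common probabilistic topic decomposition) via scikit-learn as a hedged stand-in; the three "articles" are invented for illustration and nothing here reproduces the paper's data or settings:

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

docs = [  # invented stand-ins for Gazette articles and advertisements
    "ship arrived port cargo rum sugar sold",
    "runaway servant reward paid subscriber",
    "assembly act province governor council",
]
vec = CountVectorizer()
X = vec.fit_transform(docs)
lda = LatentDirichletAllocation(n_components=3, random_state=0).fit(X)
words = vec.get_feature_names_out()
for k, comp in enumerate(lda.components_):  # print top words per inferred topic
    print(k, [words[i] for i in comp.argsort()[::-1][:4]])
```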
    Date
    22. 7.2006 17:32:00
    Source
    Journal of the American Society for Information Science and Technology. 57(2006) no.6, S.753-767
  8. Wolfekuhler, M.R.; Punch, W.F.: Finding salient features for personal Web pages categories (1997) 0.04
    0.03713733 = product of:
      0.06189555 = sum of:
        0.02637536 = weight(_text_:on in 2673) [ClassicSimilarity], result of:
          0.02637536 = score(doc=2673,freq=4.0), product of:
            0.109641045 = queryWeight, product of:
              2.199415 = idf(docFreq=13325, maxDocs=44218)
              0.049850095 = queryNorm
            0.24056101 = fieldWeight in 2673, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              2.199415 = idf(docFreq=13325, maxDocs=44218)
              0.0546875 = fieldNorm(doc=2673)
        0.011881187 = weight(_text_:information in 2673) [ClassicSimilarity], result of:
          0.011881187 = score(doc=2673,freq=2.0), product of:
            0.08751074 = queryWeight, product of:
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.049850095 = queryNorm
            0.13576832 = fieldWeight in 2673, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.0546875 = fieldNorm(doc=2673)
        0.023639 = product of:
          0.047278 = sum of:
            0.047278 = weight(_text_:22 in 2673) [ClassicSimilarity], result of:
              0.047278 = score(doc=2673,freq=2.0), product of:
                0.17456654 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.049850095 = queryNorm
                0.2708308 = fieldWeight in 2673, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.0546875 = fieldNorm(doc=2673)
          0.5 = coord(1/2)
      0.6 = coord(3/5)
    
    Abstract
     Examines techniques that discover features in sets of pre-categorized documents, such that similar documents can be found on the WWW. Examines techniques which classify training examples with high accuracy, then explains why this is not necessarily useful. Describes a method for extracting word clusters from the raw document features. Results show that the clustering technique is successful in discovering word groups in personal Web pages which can be used to find similar information on the WWW.
    Date
    1. 8.1996 22:08:06
  9. Lu, K.; Mao, J.; Li, G.: Toward effective automated weighted subject indexing : a comparison of different approaches in different environments (2018) 0.03
    0.034392346 = product of:
      0.057320572 = sum of:
        0.023073634 = weight(_text_:on in 4292) [ClassicSimilarity], result of:
          0.023073634 = score(doc=4292,freq=6.0), product of:
            0.109641045 = queryWeight, product of:
              2.199415 = idf(docFreq=13325, maxDocs=44218)
              0.049850095 = queryNorm
            0.21044704 = fieldWeight in 4292, product of:
              2.4494898 = tf(freq=6.0), with freq of:
                6.0 = termFreq=6.0
              2.199415 = idf(docFreq=13325, maxDocs=44218)
              0.0390625 = fieldNorm(doc=4292)
        0.016973123 = weight(_text_:information in 4292) [ClassicSimilarity], result of:
          0.016973123 = score(doc=4292,freq=8.0), product of:
            0.08751074 = queryWeight, product of:
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.049850095 = queryNorm
            0.19395474 = fieldWeight in 4292, product of:
              2.828427 = tf(freq=8.0), with freq of:
                8.0 = termFreq=8.0
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.0390625 = fieldNorm(doc=4292)
        0.017273815 = product of:
          0.03454763 = sum of:
            0.03454763 = weight(_text_:technology in 4292) [ClassicSimilarity], result of:
              0.03454763 = score(doc=4292,freq=4.0), product of:
                0.14847288 = queryWeight, product of:
                  2.978387 = idf(docFreq=6114, maxDocs=44218)
                  0.049850095 = queryNorm
                0.23268649 = fieldWeight in 4292, product of:
                  2.0 = tf(freq=4.0), with freq of:
                    4.0 = termFreq=4.0
                  2.978387 = idf(docFreq=6114, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=4292)
          0.5 = coord(1/2)
      0.6 = coord(3/5)
    
    Abstract
     Subject indexing plays an important role in supporting subject access to information resources. Current subject indexing systems do not make adequate distinctions regarding the importance of assigned subject descriptors. Assigning numeric weights to subject descriptors to distinguish their importance to the documents can strengthen the role of subject metadata, and automated methods are more cost-effective. This study compares different automated weighting methods in different environments, using two evaluation methods to assess performance. Experiments on three datasets in the biomedical domain suggest that the performance of the weighting methods depends on whether the environment is abstract-only or full text. Mutual information with bag-of-words representation shows the best average performance in the full-text environment, while cosine with bag-of-words representation is the best in an abstract environment. The cosine measure has relatively consistent and robust performance. A direct weighting method, IDF (Inverse Document Frequency), can produce quick and reasonable estimates of the weights. Bag-of-words representation generally outperforms the concept-based representation. Further improvement in performance can be obtained by using the learning-to-rank method to integrate different weighting methods. This study follows up Lu and Mao (Journal of the Association for Information Science and Technology, 66, 1776-1784, 2015), in which an automated weighted subject indexing method was proposed and validated. The findings from this study contribute to more effective weighted subject indexing.
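     As a hedged illustration of the "direct" IDF weighting the abstract mentions (the precise formulation is not given there; the function name and all counts below are assumptions), one might weight each assigned descriptor by the summed IDF of its words that occur in the document:

```python
import math

def idf_descriptor_weights(doc_tokens, descriptors, doc_freq, n_docs):
    """Sketch of direct IDF weighting: a subject descriptor is weighted
    by the summed IDF of its constituent words appearing in the document.
    The paper compares such a scheme against cosine and mutual-information
    weighting; this exact form is an illustrative assumption."""
    present = set(doc_tokens)
    return {
        d: sum(math.log(n_docs / (1 + doc_freq.get(w, 0)))
               for w in d.lower().split() if w in present)
        for d in descriptors
    }

# Illustrative call with made-up document frequencies:
print(idf_descriptor_weights(
    ["automated", "weighted", "subject", "indexing"],
    ["Subject indexing", "Automated methods"],
    doc_freq={"subject": 900, "indexing": 400, "automated": 150, "methods": 2000},
    n_docs=10000,
))
```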
    Source
    Journal of the Association for Information Science and Technology. 69(2018) no.1, S.121-133
  10. Medelyan, O.; Witten, I.H.: Domain-independent automatic keyphrase indexing with small training sets (2008) 0.03
    0.034048714 = product of:
      0.056747854 = sum of:
        0.027688364 = weight(_text_:on in 1871) [ClassicSimilarity], result of:
          0.027688364 = score(doc=1871,freq=6.0), product of:
            0.109641045 = queryWeight, product of:
              2.199415 = idf(docFreq=13325, maxDocs=44218)
              0.049850095 = queryNorm
            0.25253648 = fieldWeight in 1871, product of:
              2.4494898 = tf(freq=6.0), with freq of:
                6.0 = termFreq=6.0
              2.199415 = idf(docFreq=13325, maxDocs=44218)
              0.046875 = fieldNorm(doc=1871)
        0.0144021725 = weight(_text_:information in 1871) [ClassicSimilarity], result of:
          0.0144021725 = score(doc=1871,freq=4.0), product of:
            0.08751074 = queryWeight, product of:
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.049850095 = queryNorm
            0.16457605 = fieldWeight in 1871, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.046875 = fieldNorm(doc=1871)
        0.014657319 = product of:
          0.029314637 = sum of:
            0.029314637 = weight(_text_:technology in 1871) [ClassicSimilarity], result of:
              0.029314637 = score(doc=1871,freq=2.0), product of:
                0.14847288 = queryWeight, product of:
                  2.978387 = idf(docFreq=6114, maxDocs=44218)
                  0.049850095 = queryNorm
                0.19744103 = fieldWeight in 1871, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  2.978387 = idf(docFreq=6114, maxDocs=44218)
                  0.046875 = fieldNorm(doc=1871)
          0.5 = coord(1/2)
      0.6 = coord(3/5)
    
    Abstract
    Keyphrases are widely used in both physical and digital libraries as a brief, but precise, summary of documents. They help organize material based on content, provide thematic access, represent search results, and assist with navigation. Manual assignment is expensive because trained human indexers must reach an understanding of the document and select appropriate descriptors according to defined cataloging rules. We propose a new method that enhances automatic keyphrase extraction by using semantic information about terms and phrases gleaned from a domain-specific thesaurus. The key advantage of the new approach is that it performs well with very little training data. We evaluate it on a large set of manually indexed documents in the domain of agriculture, compare its consistency with a group of six professional indexers, and explore its performance on smaller collections of documents in other domains and of French and Spanish documents.
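     A rough sketch in the spirit of this abstract: candidate keyphrases are document n-grams that match thesaurus terms, scored here with two classic features (TF*IDF and position of first occurrence). The real system's feature set and learner are richer; the function name, tokens, and IDF values are illustrative assumptions.

```python
from collections import Counter

def thesaurus_keyphrases(doc_tokens, thesaurus_terms, idf, k=5):
    """Sketch: match thesaurus terms as n-grams in the document, then
    score each candidate by TF*IDF times a first-occurrence bonus.
    Assumes lower-cased tokens."""
    n = len(doc_tokens)
    scores = {}
    for term in thesaurus_terms:
        words = term.lower().split()
        hits = sum(doc_tokens[i:i + len(words)] == words
                   for i in range(n - len(words) + 1))
        if hits:
            first = next(i for i in range(n) if doc_tokens[i] == words[0])
            scores[term] = (hits / n) * idf.get(term, 1.0) * (1.0 - first / n)
    return Counter(scores).most_common(k)

# Illustrative call with made-up tokens and IDF values:
print(thesaurus_keyphrases(
    ["automatic", "keyphrase", "indexing", "of", "agricultural", "documents"],
    ["keyphrase indexing", "agriculture"],
    idf={"keyphrase indexing": 3.2, "agriculture": 2.1},
))
```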
    Source
    Journal of the American Society for Information Science and Technology. 59(2008) no.7, S.1026-1040
  11. Moreno, J.M.T.: Automatic text summarization (2014) 0.03
    0.033328988 = product of:
      0.05554831 = sum of:
        0.029787935 = weight(_text_:on in 1518) [ClassicSimilarity], result of:
          0.029787935 = score(doc=1518,freq=10.0), product of:
            0.109641045 = queryWeight, product of:
              2.199415 = idf(docFreq=13325, maxDocs=44218)
              0.049850095 = queryNorm
            0.271686 = fieldWeight in 1518, product of:
              3.1622777 = tf(freq=10.0), with freq of:
                10.0 = termFreq=10.0
              2.199415 = idf(docFreq=13325, maxDocs=44218)
              0.0390625 = fieldNorm(doc=1518)
        0.0084865615 = weight(_text_:information in 1518) [ClassicSimilarity], result of:
          0.0084865615 = score(doc=1518,freq=2.0), product of:
            0.08751074 = queryWeight, product of:
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.049850095 = queryNorm
            0.09697737 = fieldWeight in 1518, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.0390625 = fieldNorm(doc=1518)
        0.017273815 = product of:
          0.03454763 = sum of:
            0.03454763 = weight(_text_:technology in 1518) [ClassicSimilarity], result of:
              0.03454763 = score(doc=1518,freq=4.0), product of:
                0.14847288 = queryWeight, product of:
                  2.978387 = idf(docFreq=6114, maxDocs=44218)
                  0.049850095 = queryNorm
                0.23268649 = fieldWeight in 1518, product of:
                  2.0 = tf(freq=4.0), with freq of:
                    4.0 = termFreq=4.0
                  2.978387 = idf(docFreq=6114, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=1518)
          0.5 = coord(1/2)
      0.6 = coord(3/5)
    
    Abstract
     This new textbook examines the motivations for, and the different algorithms of, automatic document summarization (ADS), and presents a recent state of the art. The book shows the main problems of ADS, the difficulties involved, and the solutions provided by the community, and presents recent advances in ADS as well as current applications and trends. The approaches covered are statistical, linguistic, and symbolic, and several examples are included to clarify the theoretical concepts. The books currently available in the area of automatic document summarization are not recent, while powerful algorithms with several applications of ADS have been developed in recent years. The development of recent technology has shaped both the algorithms and their applications, and the massive use of social networks and new forms of technology requires the adaptation of classical text summarization methods. This is a new textbook on automatic text summarization, based on teaching materials used in one- or two-semester courses. It presents an extensive state of the art and describes the new systems on the subject. Previous automatic summarization books have been either collections of specialized papers or authored books with only a chapter or two devoted to the field as a whole; the classic books on the subject, on the other hand, are not recent.
    Content
     Automatic Text Summarization: Some Important Concepts (23); Single-Document Summarization (53); Guided Multi-Document Summarization (109); Emerging Systems (151); Source- and Domain-Specific Summarization (179); Text Abstracting (219); Evaluating Document Summaries (243); Conclusion (275); Information Retrieval, NLP, and Automatic Text Summarization (281); Automatic Text Summarization Resources (305)
  12. Voorhees, E.M.: Implementing agglomerative hierarchic clustering algorithms for use in document retrieval (1986) 0.03
    0.0324756 = product of:
      0.08118899 = sum of:
        0.027156997 = weight(_text_:information in 402) [ClassicSimilarity], result of:
          0.027156997 = score(doc=402,freq=2.0), product of:
            0.08751074 = queryWeight, product of:
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.049850095 = queryNorm
            0.3103276 = fieldWeight in 402, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.125 = fieldNorm(doc=402)
        0.054031998 = product of:
          0.108063996 = sum of:
            0.108063996 = weight(_text_:22 in 402) [ClassicSimilarity], result of:
              0.108063996 = score(doc=402,freq=2.0), product of:
                0.17456654 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.049850095 = queryNorm
                0.61904186 = fieldWeight in 402, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.125 = fieldNorm(doc=402)
          0.5 = coord(1/2)
      0.4 = coord(2/5)
    
    Source
    Information processing and management. 22(1986) no.6, S.465-476
  13. Milstead, J.L.: Thesauri in a full-text world (1998) 0.03
    0.031618603 = product of:
      0.052697666 = sum of:
        0.018839544 = weight(_text_:on in 2337) [ClassicSimilarity], result of:
          0.018839544 = score(doc=2337,freq=4.0), product of:
            0.109641045 = queryWeight, product of:
              2.199415 = idf(docFreq=13325, maxDocs=44218)
              0.049850095 = queryNorm
            0.1718293 = fieldWeight in 2337, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              2.199415 = idf(docFreq=13325, maxDocs=44218)
              0.0390625 = fieldNorm(doc=2337)
        0.016973123 = weight(_text_:information in 2337) [ClassicSimilarity], result of:
          0.016973123 = score(doc=2337,freq=8.0), product of:
            0.08751074 = queryWeight, product of:
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.049850095 = queryNorm
            0.19395474 = fieldWeight in 2337, product of:
              2.828427 = tf(freq=8.0), with freq of:
                8.0 = termFreq=8.0
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.0390625 = fieldNorm(doc=2337)
        0.016885 = product of:
          0.03377 = sum of:
            0.03377 = weight(_text_:22 in 2337) [ClassicSimilarity], result of:
              0.03377 = score(doc=2337,freq=2.0), product of:
                0.17456654 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.049850095 = queryNorm
                0.19345059 = fieldWeight in 2337, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=2337)
          0.5 = coord(1/2)
      0.6 = coord(3/5)
    
    Abstract
     Despite early claims to the contrary, thesauri continue to find use as access tools for information in the full-text environment. Their mode of use is changing, but this change actually represents an expansion rather than a contradiction of their utility. Thesauri and similar vocabulary tools can complement full-text access by aiding users in focusing their searches, by supplementing the linguistic analysis of the text search engine, and even by serving as one of the tools used by the linguistic engine for its analysis. While human indexing continues to be used for many databases, the trend is to increase the use of machine aids for this purpose. All machine-aided indexing (MAI) systems rely on thesauri as the basis for term selection. In the 21st century, the balance of effort between human and machine will change at both input and output, but thesauri will continue to play an important role for the foreseeable future.
    Date
    22. 9.1997 19:16:05
    Imprint
    Urbana-Champaign, IL : Illinois University at Urbana-Champaign, Graduate School of Library and Information Science
    Source
    Visualizing subject access for 21st century information resources: Papers presented at the 1997 Clinic on Library Applications of Data Processing, 2-4 Mar 1997, Graduate School of Library and Information Science, University of Illinois at Urbana-Champaign. Ed.: P.A. Cochrane et al
  14. Vilares, D.; Alonso, M.A.; Gómez-Rodríguez, C.: On the usefulness of lexical and syntactic processing in polarity classification of Twitter messages (2015) 0.03
    0.030515628 = product of:
      0.050859377 = sum of:
        0.026643137 = weight(_text_:on in 2161) [ClassicSimilarity], result of:
          0.026643137 = score(doc=2161,freq=8.0), product of:
            0.109641045 = queryWeight, product of:
              2.199415 = idf(docFreq=13325, maxDocs=44218)
              0.049850095 = queryNorm
            0.24300331 = fieldWeight in 2161, product of:
              2.828427 = tf(freq=8.0), with freq of:
                8.0 = termFreq=8.0
              2.199415 = idf(docFreq=13325, maxDocs=44218)
              0.0390625 = fieldNorm(doc=2161)
        0.012001811 = weight(_text_:information in 2161) [ClassicSimilarity], result of:
          0.012001811 = score(doc=2161,freq=4.0), product of:
            0.08751074 = queryWeight, product of:
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.049850095 = queryNorm
            0.13714671 = fieldWeight in 2161, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.0390625 = fieldNorm(doc=2161)
        0.012214432 = product of:
          0.024428863 = sum of:
            0.024428863 = weight(_text_:technology in 2161) [ClassicSimilarity], result of:
              0.024428863 = score(doc=2161,freq=2.0), product of:
                0.14847288 = queryWeight, product of:
                  2.978387 = idf(docFreq=6114, maxDocs=44218)
                  0.049850095 = queryNorm
                0.16453418 = fieldWeight in 2161, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  2.978387 = idf(docFreq=6114, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=2161)
          0.5 = coord(1/2)
      0.6 = coord(3/5)
    
    Abstract
     Millions of microtexts are published every day on Twitter. Identifying the sentiment present in them can be helpful for measuring the frame of mind of the public, their satisfaction with respect to a product, or their support of a social event. In this context, polarity classification is a subfield of sentiment analysis focused on determining whether the content of a text is objective or subjective, and in the latter case, whether it conveys a positive or a negative opinion. Most polarity detection techniques tend to take into account individual terms in the text and even some degree of linguistic knowledge, but they do not usually consider syntactic relations between words. This article explores how relating lexical, syntactic, and psychometric information can help in performing polarity classification on Spanish tweets. We provide an evaluation from both shallow and deep linguistic perspectives. Empirical results show an improved performance of syntactic approaches over purely lexical models when large training sets are used to create a classifier, but this tendency is reversed when small training collections are used.
    Source
    Journal of the Association for Information Science and Technology. 66(2015) no.9, S.1799-1816
  15. Humphrey, S.M.; Névéol, A.; Browne, A.; Gobeil, J.; Ruch, P.; Darmoni, S.J.: Comparing a rule-based versus statistical system for automatic categorization of MEDLINE documents according to biomedical specialty (2009) 0.03
    0.029992333 = product of:
      0.04998722 = sum of:
        0.023073634 = weight(_text_:on in 3300) [ClassicSimilarity], result of:
          0.023073634 = score(doc=3300,freq=6.0), product of:
            0.109641045 = queryWeight, product of:
              2.199415 = idf(docFreq=13325, maxDocs=44218)
              0.049850095 = queryNorm
            0.21044704 = fieldWeight in 3300, product of:
              2.4494898 = tf(freq=6.0), with freq of:
                6.0 = termFreq=6.0
              2.199415 = idf(docFreq=13325, maxDocs=44218)
              0.0390625 = fieldNorm(doc=3300)
        0.014699157 = weight(_text_:information in 3300) [ClassicSimilarity], result of:
          0.014699157 = score(doc=3300,freq=6.0), product of:
            0.08751074 = queryWeight, product of:
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.049850095 = queryNorm
            0.16796975 = fieldWeight in 3300, product of:
              2.4494898 = tf(freq=6.0), with freq of:
                6.0 = termFreq=6.0
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.0390625 = fieldNorm(doc=3300)
        0.012214432 = product of:
          0.024428863 = sum of:
            0.024428863 = weight(_text_:technology in 3300) [ClassicSimilarity], result of:
              0.024428863 = score(doc=3300,freq=2.0), product of:
                0.14847288 = queryWeight, product of:
                  2.978387 = idf(docFreq=6114, maxDocs=44218)
                  0.049850095 = queryNorm
                0.16453418 = fieldWeight in 3300, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  2.978387 = idf(docFreq=6114, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=3300)
          0.5 = coord(1/2)
      0.6 = coord(3/5)
    
    Abstract
     Automatic document categorization is an important research problem in Information Science and Natural Language Processing. Many applications, including Word Sense Disambiguation and Information Retrieval in large collections, can benefit from such categorization. This paper focuses on automatic categorization of documents from the biomedical literature into broad discipline-based categories. Two different systems are described and contrasted: CISMeF, which uses rules based on human indexing of the documents by the Medical Subject Headings (MeSH) controlled vocabulary in order to assign metaterms (MTs), and Journal Descriptor Indexing (JDI), based on human categorization of about 4,000 journals and statistical associations between journal descriptors (JDs) and textwords in the documents. We evaluate and compare the performance of these systems against a gold standard of humanly assigned categories for 100 MEDLINE documents, using six measures selected from trec_eval. The results show that performance is comparable for five of the measures, and for one measure JDI is superior. We conclude that these results favor JDI, given the significantly greater intellectual overhead involved in human indexing and maintaining a rule base for mapping MeSH terms to MTs. We also note a JDI method that associates JDs with MeSH indexing rather than textwords, and it may be worthwhile to investigate whether this JDI method (statistical) and CISMeF (rule-based) might be combined and then evaluated to show whether they are complementary to one another.
    Source
    Journal of the American Society for Information Science and Technology. 60(2009) no.12, S.2530-2539
  16. Chung, Y.M.; Lee, J.Y.: A corpus-based approach to comparative evaluation of statistical term association measures (2001) 0.03
    0.02881626 = product of:
      0.0480271 = sum of:
        0.018839544 = weight(_text_:on in 5769) [ClassicSimilarity], result of:
          0.018839544 = score(doc=5769,freq=4.0), product of:
            0.109641045 = queryWeight, product of:
              2.199415 = idf(docFreq=13325, maxDocs=44218)
              0.049850095 = queryNorm
            0.1718293 = fieldWeight in 5769, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              2.199415 = idf(docFreq=13325, maxDocs=44218)
              0.0390625 = fieldNorm(doc=5769)
        0.016973123 = weight(_text_:information in 5769) [ClassicSimilarity], result of:
          0.016973123 = score(doc=5769,freq=8.0), product of:
            0.08751074 = queryWeight, product of:
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.049850095 = queryNorm
            0.19395474 = fieldWeight in 5769, product of:
              2.828427 = tf(freq=8.0), with freq of:
                8.0 = termFreq=8.0
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.0390625 = fieldNorm(doc=5769)
        0.012214432 = product of:
          0.024428863 = sum of:
            0.024428863 = weight(_text_:technology in 5769) [ClassicSimilarity], result of:
              0.024428863 = score(doc=5769,freq=2.0), product of:
                0.14847288 = queryWeight, product of:
                  2.978387 = idf(docFreq=6114, maxDocs=44218)
                  0.049850095 = queryNorm
                0.16453418 = fieldWeight in 5769, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  2.978387 = idf(docFreq=6114, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=5769)
          0.5 = coord(1/2)
      0.6 = coord(3/5)
    
    Abstract
     Statistical association measures have been widely applied in information retrieval research, usually employing a clustering of documents or terms on the basis of their relationships. Applications of the association measures for term clustering include automatic thesaurus construction and query expansion. This research evaluates the similarity of six association measures by comparing the relationships and behavior they demonstrate in various analyses of a test corpus. Analysis techniques include comparisons of highly ranked term pairs and term clusters, analyses of the correlation among the association measures using Pearson's correlation coefficient and MDS mapping, and an analysis of the impact of term frequency on the association values by means of z-scores. The major findings of the study are as follows: First, the most similar association measures are mutual information and Yule's coefficient of colligation Y, whereas the cosine and Jaccard coefficients, as well as the chi-square statistic and likelihood ratio, demonstrate quite similar behavior for terms with high frequency. Second, among all the measures, the chi-square statistic is the least affected by the frequency of terms. Third, although the cosine and Jaccard coefficients tend to emphasize high-frequency terms, mutual information and Yule's Y seem to overestimate rare terms.
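     For concreteness, a sketch of the six measures compared in the study, computed from a 2x2 term co-occurrence table. The mutual-information variant shown is the common pointwise formulation, and the counts in the example call are invented:

```python
import math

def association_measures(a, b, c, d):
    """Six term-association measures from a 2x2 co-occurrence table:
    a = docs with both terms, b / c = docs with only one term,
    d = docs with neither."""
    n = a + b + c + d
    expected = [(a + b) * (a + c) / n, (a + b) * (b + d) / n,
                (c + d) * (a + c) / n, (c + d) * (b + d) / n]
    return {
        "cosine": a / math.sqrt((a + b) * (a + c)),
        "jaccard": a / (a + b + c),
        "mutual_information": math.log2(n * a / ((a + b) * (a + c))),
        "yules_y": (math.sqrt(a * d) - math.sqrt(b * c)) /
                   (math.sqrt(a * d) + math.sqrt(b * c)),
        "chi_square": n * (a * d - b * c) ** 2 /
                      ((a + b) * (c + d) * (a + c) * (b + d)),
        "log_likelihood": 2 * sum(o * math.log(o / e)
                                  for o, e in zip([a, b, c, d], expected)
                                  if o > 0),
    }

print(association_measures(a=30, b=70, c=50, d=9850))  # invented counts
```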
    Source
     Journal of the American Society for Information Science and Technology. 52(2001) no.4, S.283-296
  17. Plaunt, C.; Norgard, B.A.: ¬An association-based method for automatic indexing with a controlled vocabulary (1998) 0.03
    0.028635815 = product of:
      0.047726355 = sum of:
        0.018839544 = weight(_text_:on in 1794) [ClassicSimilarity], result of:
          0.018839544 = score(doc=1794,freq=4.0), product of:
            0.109641045 = queryWeight, product of:
              2.199415 = idf(docFreq=13325, maxDocs=44218)
              0.049850095 = queryNorm
            0.1718293 = fieldWeight in 1794, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              2.199415 = idf(docFreq=13325, maxDocs=44218)
              0.0390625 = fieldNorm(doc=1794)
        0.012001811 = weight(_text_:information in 1794) [ClassicSimilarity], result of:
          0.012001811 = score(doc=1794,freq=4.0), product of:
            0.08751074 = queryWeight, product of:
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.049850095 = queryNorm
            0.13714671 = fieldWeight in 1794, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.0390625 = fieldNorm(doc=1794)
        0.016885 = product of:
          0.03377 = sum of:
            0.03377 = weight(_text_:22 in 1794) [ClassicSimilarity], result of:
              0.03377 = score(doc=1794,freq=2.0), product of:
                0.17456654 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.049850095 = queryNorm
                0.19345059 = fieldWeight in 1794, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=1794)
          0.5 = coord(1/2)
      0.6 = coord(3/5)
    
    Abstract
     In this article, we describe and test a two-stage algorithm based on a lexical collocation technique which maps from the lexical clues contained in a document representation into a controlled vocabulary list of subject headings. Using a collection of 4,626 INSPEC documents, we create a 'dictionary' of associations between the lexical items contained in the titles, authors, and abstracts, and the controlled vocabulary subject headings assigned to those records by human indexers, using a likelihood ratio statistic as the measure of association. In the deployment stage, we use the dictionary to predict which of the controlled vocabulary subject headings best describe new documents when they are presented to the system. Our evaluation of this algorithm, in which we compare the automatically assigned subject headings to the subject headings assigned to the test documents by human catalogers, shows that we can obtain results comparable to, and consistent with, human cataloging. In effect we have cast this as a classic partial-match information retrieval problem: we consider the problem to be one of 'retrieving' (or assigning) the most probably 'relevant' (or correct) controlled vocabulary subject headings to a document based on the clues contained in that document.
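     A schematic sketch of the two-stage algorithm as described: stage 1 builds a dictionary of term-to-heading associations from human-indexed records using a likelihood-ratio statistic, and stage 2 ranks candidate headings for a new document by summed association. The threshold and helper names are illustrative assumptions, not the paper's parameters.

```python
import math
from collections import Counter, defaultdict

def llr(a, b, c, d):
    """Log-likelihood ratio for a 2x2 term/heading contingency table."""
    n = a + b + c + d
    expected = [(a + b) * (a + c) / n, (a + b) * (b + d) / n,
                (c + d) * (a + c) / n, (c + d) * (b + d) / n]
    return 2 * sum(o * math.log(o / e)
                   for o, e in zip([a, b, c, d], expected) if o > 0)

def train_associations(records, min_score=10.0):
    """Stage 1: learn term -> heading associations from human-indexed
    records, given as (token list, assigned headings) pairs."""
    pair, term, head, n = Counter(), Counter(), Counter(), 0
    for tokens, headings in records:
        n += 1
        for t in set(tokens):
            term[t] += 1
            for h in headings:
                pair[t, h] += 1
        for h in headings:
            head[h] += 1
    dictionary = defaultdict(dict)
    for (t, h), a in pair.items():
        score = llr(a, term[t] - a, head[h] - a, n - term[t] - head[h] + a)
        if score >= min_score:  # illustrative threshold
            dictionary[t][h] = score
    return dictionary

def predict_headings(tokens, dictionary, k=5):
    """Stage 2: rank candidate headings by summed association score."""
    scores = Counter()
    for t in set(tokens):
        for h, s in dictionary.get(t, {}).items():
            scores[h] += s
    return scores.most_common(k)
```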
    Date
    11. 9.2000 19:53:22
    Source
    Journal of the American Society for Information Science. 49(1998) no.10, S.888-902
  18. Stankovic, R. et al.: Indexing of textual databases based on lexical resources : a case study for Serbian (2016) 0.03
    0.028579636 = product of:
      0.071449086 = sum of:
        0.037679087 = weight(_text_:on in 2759) [ClassicSimilarity], result of:
          0.037679087 = score(doc=2759,freq=4.0), product of:
            0.109641045 = queryWeight, product of:
              2.199415 = idf(docFreq=13325, maxDocs=44218)
              0.049850095 = queryNorm
            0.3436586 = fieldWeight in 2759, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              2.199415 = idf(docFreq=13325, maxDocs=44218)
              0.078125 = fieldNorm(doc=2759)
        0.03377 = product of:
          0.06754 = sum of:
            0.06754 = weight(_text_:22 in 2759) [ClassicSimilarity], result of:
              0.06754 = score(doc=2759,freq=2.0), product of:
                0.17456654 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.049850095 = queryNorm
                0.38690117 = fieldWeight in 2759, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.078125 = fieldNorm(doc=2759)
          0.5 = coord(1/2)
      0.4 = coord(2/5)
    
    Date
    1. 2.2016 18:25:22
    Source
    Semantic keyword-based search on structured data sources: First COST Action IC1302 International KEYSTONE Conference, IKC 2015, Coimbra, Portugal, September 8-9, 2015. Revised Selected Papers. Eds.: J. Cardoso et al
  19. Hlava, M.M.K.: Automatic indexing : comparing rule-based and statistics-based indexing systems (2005) 0.03
    0.028416147 = product of:
      0.07104037 = sum of:
        0.023762373 = weight(_text_:information in 6265) [ClassicSimilarity], result of:
          0.023762373 = score(doc=6265,freq=2.0), product of:
            0.08751074 = queryWeight, product of:
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.049850095 = queryNorm
            0.27153665 = fieldWeight in 6265, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.109375 = fieldNorm(doc=6265)
        0.047278 = product of:
          0.094556 = sum of:
            0.094556 = weight(_text_:22 in 6265) [ClassicSimilarity], result of:
              0.094556 = score(doc=6265,freq=2.0), product of:
                0.17456654 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.049850095 = queryNorm
                0.5416616 = fieldWeight in 6265, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.109375 = fieldNorm(doc=6265)
          0.5 = coord(1/2)
      0.4 = coord(2/5)
    
    Source
    Information outlook. 9(2005) no.8, S.22-23
  20. Zhang, Y.; Zhang, C.; Li, J.: Joint modeling of characters, words, and conversation contexts for microblog keyphrase extraction (2020) 0.03
    0.028406477 = product of:
      0.047344126 = sum of:
        0.026643137 = weight(_text_:on in 5816) [ClassicSimilarity], result of:
          0.026643137 = score(doc=5816,freq=8.0), product of:
            0.109641045 = queryWeight, product of:
              2.199415 = idf(docFreq=13325, maxDocs=44218)
              0.049850095 = queryNorm
            0.24300331 = fieldWeight in 5816, product of:
              2.828427 = tf(freq=8.0), with freq of:
                8.0 = termFreq=8.0
              2.199415 = idf(docFreq=13325, maxDocs=44218)
              0.0390625 = fieldNorm(doc=5816)
        0.0084865615 = weight(_text_:information in 5816) [ClassicSimilarity], result of:
          0.0084865615 = score(doc=5816,freq=2.0), product of:
            0.08751074 = queryWeight, product of:
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.049850095 = queryNorm
            0.09697737 = fieldWeight in 5816, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.0390625 = fieldNorm(doc=5816)
        0.012214432 = product of:
          0.024428863 = sum of:
            0.024428863 = weight(_text_:technology in 5816) [ClassicSimilarity], result of:
              0.024428863 = score(doc=5816,freq=2.0), product of:
                0.14847288 = queryWeight, product of:
                  2.978387 = idf(docFreq=6114, maxDocs=44218)
                  0.049850095 = queryNorm
                0.16453418 = fieldWeight in 5816, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  2.978387 = idf(docFreq=6114, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=5816)
          0.5 = coord(1/2)
      0.6 = coord(3/5)
    
    Abstract
     Millions of messages are produced on microblog platforms every day, leading to a pressing need for automatic identification of key points from the massive texts. To absorb salient content from the vast bulk of microblog posts, this article focuses on the task of microblog keyphrase extraction. In previous work, most efforts treat messages as independent documents and might suffer from the data sparsity problem exhibited in short and informal microblog posts. In contrast, we propose to enrich contexts by exploiting the conversations initialized by target posts and formed by their replies, which are generally centered on topics relevant to the target posts and are therefore helpful for keyphrase identification. Concretely, we present a neural keyphrase extraction framework with 2 modules: a conversation context encoder and a keyphrase tagger. The conversation context encoder captures an indicative representation from the conversation context and feeds it into the keyphrase tagger, and the keyphrase tagger extracts salient words from target posts. The 2 modules are trained jointly to optimize the conversation context encoding and keyphrase extraction processes. In the conversation context encoder, we leverage hierarchical structures to capture word-level and message-level indicative representations hierarchically. In both modules, we apply character-level representations, which enable the model to explore morphological features and deal with the out-of-vocabulary problem caused by the informal language style of microblog messages. Extensive comparison results on real-life data sets indicate that our model outperforms state-of-the-art models from previous studies.
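     A schematic PyTorch sketch of the two-module design the abstract describes: a conversation-context encoder feeding a keyphrase tagger over the target post, with character-level word representations. Layer choices, dimensions, and the use of GRUs are assumptions for illustration, not the paper's specification.

```python
import torch
import torch.nn as nn

class ConversationKeyphraseTagger(nn.Module):
    """Sketch: encode the conversation context, encode the target post
    (both from word + character-level representations), then tag each
    post token, e.g. with BIO keyphrase labels."""
    def __init__(self, vocab, chars, dim=64, tags=3):  # dim must be even
        super().__init__()
        self.word_emb = nn.Embedding(vocab, dim)
        self.char_emb = nn.Embedding(chars, dim)
        self.char_enc = nn.GRU(dim, dim // 2, batch_first=True, bidirectional=True)
        self.ctx_enc = nn.GRU(2 * dim, dim, batch_first=True)
        self.post_enc = nn.GRU(2 * dim, dim, batch_first=True)
        self.tagger = nn.Linear(2 * dim, tags)

    def token_repr(self, words, chars):
        # chars: (batch, seq, chars_per_word) -> char-level word vectors
        b, s, c = chars.shape
        _, h = self.char_enc(self.char_emb(chars.view(b * s, c)))
        char_vec = h.transpose(0, 1).reshape(b, s, -1)     # (b, s, dim)
        return torch.cat([self.word_emb(words), char_vec], dim=-1)

    def forward(self, post_words, post_chars, ctx_words, ctx_chars):
        ctx, _ = self.ctx_enc(self.token_repr(ctx_words, ctx_chars))
        post, _ = self.post_enc(self.token_repr(post_words, post_chars))
        ctx_summary = ctx[:, -1:, :].expand_as(post)       # broadcast context
        return self.tagger(torch.cat([post, ctx_summary], dim=-1))
```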
    Source
    Journal of the Association for Information Science and Technology. 71(2020) no.5, S.553-567

Types

  • a 263
  • el 20
  • x 14
  • m 13
  • s 8
  • d 1