Search (172 results, page 1 of 9)

  • × theme_ss:"Computerlinguistik"
  1. Hotho, A.; Bloehdorn, S.: Data Mining 2004 : Text classification by boosting weak learners based on terms and concepts (2004) 0.36
    0.36031273 = product of:
      0.6305472 = sum of:
        0.047687992 = product of:
          0.14306398 = sum of:
            0.14306398 = weight(_text_:3a in 562) [ClassicSimilarity], result of:
              0.14306398 = score(doc=562,freq=2.0), product of:
                0.25455406 = queryWeight, product of:
                  8.478011 = idf(docFreq=24, maxDocs=44218)
                  0.03002521 = queryNorm
                0.56201804 = fieldWeight in 562, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  8.478011 = idf(docFreq=24, maxDocs=44218)
                  0.046875 = fieldNorm(doc=562)
          0.33333334 = coord(1/3)
        0.14306398 = weight(_text_:2f in 562) [ClassicSimilarity], result of:
          0.14306398 = score(doc=562,freq=2.0), product of:
            0.25455406 = queryWeight, product of:
              8.478011 = idf(docFreq=24, maxDocs=44218)
              0.03002521 = queryNorm
            0.56201804 = fieldWeight in 562, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              8.478011 = idf(docFreq=24, maxDocs=44218)
              0.046875 = fieldNorm(doc=562)
        0.03496567 = weight(_text_:classification in 562) [ClassicSimilarity], result of:
          0.03496567 = score(doc=562,freq=6.0), product of:
            0.09562149 = queryWeight, product of:
              3.1847067 = idf(docFreq=4974, maxDocs=44218)
              0.03002521 = queryNorm
            0.3656675 = fieldWeight in 562, product of:
              2.4494898 = tf(freq=6.0), with freq of:
                6.0 = termFreq=6.0
              3.1847067 = idf(docFreq=4974, maxDocs=44218)
              0.046875 = fieldNorm(doc=562)
        0.07153199 = product of:
          0.14306398 = sum of:
            0.14306398 = weight(_text_:3a in 562) [ClassicSimilarity], result of:
              0.14306398 = score(doc=562,freq=2.0), product of:
                0.25455406 = queryWeight, product of:
                  8.478011 = idf(docFreq=24, maxDocs=44218)
                  0.03002521 = queryNorm
                0.56201804 = fieldWeight in 562, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  8.478011 = idf(docFreq=24, maxDocs=44218)
                  0.046875 = fieldNorm(doc=562)
          0.5 = coord(1/2)
        0.14306398 = weight(_text_:2f in 562) [ClassicSimilarity], result of:
          0.14306398 = score(doc=562,freq=2.0), product of:
            0.25455406 = queryWeight, product of:
              8.478011 = idf(docFreq=24, maxDocs=44218)
              0.03002521 = queryNorm
            0.56201804 = fieldWeight in 562, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              8.478011 = idf(docFreq=24, maxDocs=44218)
              0.046875 = fieldNorm(doc=562)
        0.03496567 = weight(_text_:classification in 562) [ClassicSimilarity], result of:
          0.03496567 = score(doc=562,freq=6.0), product of:
            0.09562149 = queryWeight, product of:
              3.1847067 = idf(docFreq=4974, maxDocs=44218)
              0.03002521 = queryNorm
            0.3656675 = fieldWeight in 562, product of:
              2.4494898 = tf(freq=6.0), with freq of:
                6.0 = termFreq=6.0
              3.1847067 = idf(docFreq=4974, maxDocs=44218)
              0.046875 = fieldNorm(doc=562)
        0.14306398 = weight(_text_:2f in 562) [ClassicSimilarity], result of:
          0.14306398 = score(doc=562,freq=2.0), product of:
            0.25455406 = queryWeight, product of:
              8.478011 = idf(docFreq=24, maxDocs=44218)
              0.03002521 = queryNorm
            0.56201804 = fieldWeight in 562, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              8.478011 = idf(docFreq=24, maxDocs=44218)
              0.046875 = fieldNorm(doc=562)
        0.0122040035 = product of:
          0.024408007 = sum of:
            0.024408007 = weight(_text_:22 in 562) [ClassicSimilarity], result of:
              0.024408007 = score(doc=562,freq=2.0), product of:
                0.10514317 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.03002521 = queryNorm
                0.23214069 = fieldWeight in 562, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.046875 = fieldNorm(doc=562)
          0.5 = coord(1/2)
      0.5714286 = coord(8/14)
    
    Abstract
    Document representations for text classification are typically based on the classical Bag-Of-Words paradigm. This approach comes with deficiencies that motivate the integration of features on a higher semantic level than single words. In this paper we propose an enhancement of the classical document representation through concepts extracted from background knowledge. Boosting is used for actual classification. Experimental evaluations on two well known text corpora support our approach through consistent improvement of the results.
    Content
    Vgl.: http://www.google.de/url?sa=t&rct=j&q=&esrc=s&source=web&cd=1&cad=rja&ved=0CEAQFjAA&url=http%3A%2F%2Fciteseerx.ist.psu.edu%2Fviewdoc%2Fdownload%3Fdoi%3D10.1.1.91.4940%26rep%3Drep1%26type%3Dpdf&ei=dOXrUMeIDYHDtQahsIGACg&usg=AFQjCNHFWVh6gNPvnOrOS9R3rkrXCNVD-A&sig2=5I2F5evRfMnsttSgFF9g7Q&bvm=bv.1357316858,d.Yms.
    Date
    8. 1.2013 10:22:32
  2. Huo, W.: Automatic multi-word term extraction and its application to Web-page summarization (2012) 0.21
    0.2064732 = product of:
      0.4817708 = sum of:
        0.14306398 = weight(_text_:2f in 563) [ClassicSimilarity], result of:
          0.14306398 = score(doc=563,freq=2.0), product of:
            0.25455406 = queryWeight, product of:
              8.478011 = idf(docFreq=24, maxDocs=44218)
              0.03002521 = queryNorm
            0.56201804 = fieldWeight in 563, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              8.478011 = idf(docFreq=24, maxDocs=44218)
              0.046875 = fieldNorm(doc=563)
        0.02018744 = weight(_text_:classification in 563) [ClassicSimilarity], result of:
          0.02018744 = score(doc=563,freq=2.0), product of:
            0.09562149 = queryWeight, product of:
              3.1847067 = idf(docFreq=4974, maxDocs=44218)
              0.03002521 = queryNorm
            0.21111822 = fieldWeight in 563, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.1847067 = idf(docFreq=4974, maxDocs=44218)
              0.046875 = fieldNorm(doc=563)
        0.14306398 = weight(_text_:2f in 563) [ClassicSimilarity], result of:
          0.14306398 = score(doc=563,freq=2.0), product of:
            0.25455406 = queryWeight, product of:
              8.478011 = idf(docFreq=24, maxDocs=44218)
              0.03002521 = queryNorm
            0.56201804 = fieldWeight in 563, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              8.478011 = idf(docFreq=24, maxDocs=44218)
              0.046875 = fieldNorm(doc=563)
        0.02018744 = weight(_text_:classification in 563) [ClassicSimilarity], result of:
          0.02018744 = score(doc=563,freq=2.0), product of:
            0.09562149 = queryWeight, product of:
              3.1847067 = idf(docFreq=4974, maxDocs=44218)
              0.03002521 = queryNorm
            0.21111822 = fieldWeight in 563, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.1847067 = idf(docFreq=4974, maxDocs=44218)
              0.046875 = fieldNorm(doc=563)
        0.14306398 = weight(_text_:2f in 563) [ClassicSimilarity], result of:
          0.14306398 = score(doc=563,freq=2.0), product of:
            0.25455406 = queryWeight, product of:
              8.478011 = idf(docFreq=24, maxDocs=44218)
              0.03002521 = queryNorm
            0.56201804 = fieldWeight in 563, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              8.478011 = idf(docFreq=24, maxDocs=44218)
              0.046875 = fieldNorm(doc=563)
        0.0122040035 = product of:
          0.024408007 = sum of:
            0.024408007 = weight(_text_:22 in 563) [ClassicSimilarity], result of:
              0.024408007 = score(doc=563,freq=2.0), product of:
                0.10514317 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.03002521 = queryNorm
                0.23214069 = fieldWeight in 563, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.046875 = fieldNorm(doc=563)
          0.5 = coord(1/2)
      0.42857143 = coord(6/14)
    
    Abstract
    In this thesis we propose three new word association measures for multi-word term extraction. We combine these association measures with LocalMaxs algorithm in our extraction model and compare the results of different multi-word term extraction methods. Our approach is language and domain independent and requires no training data. It can be applied to such tasks as text summarization, information retrieval, and document classification. We further explore the potential of using multi-word terms as an effective representation for general web-page summarization. We extract multi-word terms from human written summaries in a large collection of web-pages, and generate the summaries by aligning document words with these multi-word terms. Our system applies machine translation technology to learn the aligning process from a training set and focuses on selecting high quality multi-word terms from human written summaries to generate suitable results for web-page summarization.
    Content
    A Thesis presented to The University of Guelph In partial fulfilment of requirements for the degree of Master of Science in Computer Science. Vgl. Unter: http://www.inf.ufrgs.br%2F~ceramisch%2Fdownload_files%2Fpublications%2F2009%2Fp01.pdf.
    Date
    10. 1.2013 19:22:47
  3. Noever, D.; Ciolino, M.: ¬The Turing deception (2022) 0.20
    0.1958614 = product of:
      0.5484119 = sum of:
        0.047687992 = product of:
          0.14306398 = sum of:
            0.14306398 = weight(_text_:3a in 862) [ClassicSimilarity], result of:
              0.14306398 = score(doc=862,freq=2.0), product of:
                0.25455406 = queryWeight, product of:
                  8.478011 = idf(docFreq=24, maxDocs=44218)
                  0.03002521 = queryNorm
                0.56201804 = fieldWeight in 862, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  8.478011 = idf(docFreq=24, maxDocs=44218)
                  0.046875 = fieldNorm(doc=862)
          0.33333334 = coord(1/3)
        0.14306398 = weight(_text_:2f in 862) [ClassicSimilarity], result of:
          0.14306398 = score(doc=862,freq=2.0), product of:
            0.25455406 = queryWeight, product of:
              8.478011 = idf(docFreq=24, maxDocs=44218)
              0.03002521 = queryNorm
            0.56201804 = fieldWeight in 862, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              8.478011 = idf(docFreq=24, maxDocs=44218)
              0.046875 = fieldNorm(doc=862)
        0.07153199 = product of:
          0.14306398 = sum of:
            0.14306398 = weight(_text_:3a in 862) [ClassicSimilarity], result of:
              0.14306398 = score(doc=862,freq=2.0), product of:
                0.25455406 = queryWeight, product of:
                  8.478011 = idf(docFreq=24, maxDocs=44218)
                  0.03002521 = queryNorm
                0.56201804 = fieldWeight in 862, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  8.478011 = idf(docFreq=24, maxDocs=44218)
                  0.046875 = fieldNorm(doc=862)
          0.5 = coord(1/2)
        0.14306398 = weight(_text_:2f in 862) [ClassicSimilarity], result of:
          0.14306398 = score(doc=862,freq=2.0), product of:
            0.25455406 = queryWeight, product of:
              8.478011 = idf(docFreq=24, maxDocs=44218)
              0.03002521 = queryNorm
            0.56201804 = fieldWeight in 862, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              8.478011 = idf(docFreq=24, maxDocs=44218)
              0.046875 = fieldNorm(doc=862)
        0.14306398 = weight(_text_:2f in 862) [ClassicSimilarity], result of:
          0.14306398 = score(doc=862,freq=2.0), product of:
            0.25455406 = queryWeight, product of:
              8.478011 = idf(docFreq=24, maxDocs=44218)
              0.03002521 = queryNorm
            0.56201804 = fieldWeight in 862, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              8.478011 = idf(docFreq=24, maxDocs=44218)
              0.046875 = fieldNorm(doc=862)
      0.35714287 = coord(5/14)
    
    Source
    https%3A%2F%2Farxiv.org%2Fabs%2F2212.06721&usg=AOvVaw3i_9pZm9y_dQWoHi6uv0EN
  4. Morris, V.: Automated language identification of bibliographic resources (2020) 0.03
    0.031521946 = product of:
      0.11032681 = sum of:
        0.026916584 = weight(_text_:classification in 5749) [ClassicSimilarity], result of:
          0.026916584 = score(doc=5749,freq=2.0), product of:
            0.09562149 = queryWeight, product of:
              3.1847067 = idf(docFreq=4974, maxDocs=44218)
              0.03002521 = queryNorm
            0.28149095 = fieldWeight in 5749, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.1847067 = idf(docFreq=4974, maxDocs=44218)
              0.0625 = fieldNorm(doc=5749)
        0.04022163 = weight(_text_:bibliographic in 5749) [ClassicSimilarity], result of:
          0.04022163 = score(doc=5749,freq=2.0), product of:
            0.11688946 = queryWeight, product of:
              3.893044 = idf(docFreq=2449, maxDocs=44218)
              0.03002521 = queryNorm
            0.34409973 = fieldWeight in 5749, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.893044 = idf(docFreq=2449, maxDocs=44218)
              0.0625 = fieldNorm(doc=5749)
        0.026916584 = weight(_text_:classification in 5749) [ClassicSimilarity], result of:
          0.026916584 = score(doc=5749,freq=2.0), product of:
            0.09562149 = queryWeight, product of:
              3.1847067 = idf(docFreq=4974, maxDocs=44218)
              0.03002521 = queryNorm
            0.28149095 = fieldWeight in 5749, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.1847067 = idf(docFreq=4974, maxDocs=44218)
              0.0625 = fieldNorm(doc=5749)
        0.016272005 = product of:
          0.03254401 = sum of:
            0.03254401 = weight(_text_:22 in 5749) [ClassicSimilarity], result of:
              0.03254401 = score(doc=5749,freq=2.0), product of:
                0.10514317 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.03002521 = queryNorm
                0.30952093 = fieldWeight in 5749, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.0625 = fieldNorm(doc=5749)
          0.5 = coord(1/2)
      0.2857143 = coord(4/14)
    
    Date
    2. 3.2020 19:04:22
    Source
    Cataloging and classification quarterly. 58(2020) no.1, S.1-27
  5. Chou, C.; Chu, T.: ¬An analysis of BERT (NLP) for assisted subject indexing for Project Gutenberg (2022) 0.03
    0.027005369 = product of:
      0.12602505 = sum of:
        0.059409913 = weight(_text_:subject in 1139) [ClassicSimilarity], result of:
          0.059409913 = score(doc=1139,freq=8.0), product of:
            0.10738805 = queryWeight, product of:
              3.576596 = idf(docFreq=3361, maxDocs=44218)
              0.03002521 = queryNorm
            0.5532265 = fieldWeight in 1139, product of:
              2.828427 = tf(freq=8.0), with freq of:
                8.0 = termFreq=8.0
              3.576596 = idf(docFreq=3361, maxDocs=44218)
              0.0546875 = fieldNorm(doc=1139)
        0.033307575 = weight(_text_:classification in 1139) [ClassicSimilarity], result of:
          0.033307575 = score(doc=1139,freq=4.0), product of:
            0.09562149 = queryWeight, product of:
              3.1847067 = idf(docFreq=4974, maxDocs=44218)
              0.03002521 = queryNorm
            0.34832728 = fieldWeight in 1139, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              3.1847067 = idf(docFreq=4974, maxDocs=44218)
              0.0546875 = fieldNorm(doc=1139)
        0.033307575 = weight(_text_:classification in 1139) [ClassicSimilarity], result of:
          0.033307575 = score(doc=1139,freq=4.0), product of:
            0.09562149 = queryWeight, product of:
              3.1847067 = idf(docFreq=4974, maxDocs=44218)
              0.03002521 = queryNorm
            0.34832728 = fieldWeight in 1139, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              3.1847067 = idf(docFreq=4974, maxDocs=44218)
              0.0546875 = fieldNorm(doc=1139)
      0.21428572 = coord(3/14)
    
    Abstract
    In light of AI (Artificial Intelligence) and NLP (Natural language processing) technologies, this article examines the feasibility of using AI/NLP models to enhance the subject indexing of digital resources. While BERT (Bidirectional Encoder Representations from Transformers) models are widely used in scholarly communities, the authors assess whether BERT models can be used in machine-assisted indexing in the Project Gutenberg collection, through suggesting Library of Congress subject headings filtered by certain Library of Congress Classification subclass labels. The findings of this study are informative for further research on BERT models to assist with automatic subject indexing for digital library collections.
    Source
    Cataloging and classification quarterly. 60(2022) no.8, p.807-835
  6. Ko, Y.: ¬A new term-weighting scheme for text classification using the odds of positive and negative class probabilities (2015) 0.02
    0.024449104 = product of:
      0.11409582 = sum of:
        0.028549349 = weight(_text_:classification in 2339) [ClassicSimilarity], result of:
          0.028549349 = score(doc=2339,freq=4.0), product of:
            0.09562149 = queryWeight, product of:
              3.1847067 = idf(docFreq=4974, maxDocs=44218)
              0.03002521 = queryNorm
            0.29856625 = fieldWeight in 2339, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              3.1847067 = idf(docFreq=4974, maxDocs=44218)
              0.046875 = fieldNorm(doc=2339)
        0.05699712 = product of:
          0.11399424 = sum of:
            0.11399424 = weight(_text_:schemes in 2339) [ClassicSimilarity], result of:
              0.11399424 = score(doc=2339,freq=8.0), product of:
                0.16067243 = queryWeight, product of:
                  5.3512506 = idf(docFreq=569, maxDocs=44218)
                  0.03002521 = queryNorm
                0.7094823 = fieldWeight in 2339, product of:
                  2.828427 = tf(freq=8.0), with freq of:
                    8.0 = termFreq=8.0
                  5.3512506 = idf(docFreq=569, maxDocs=44218)
                  0.046875 = fieldNorm(doc=2339)
          0.5 = coord(1/2)
        0.028549349 = weight(_text_:classification in 2339) [ClassicSimilarity], result of:
          0.028549349 = score(doc=2339,freq=4.0), product of:
            0.09562149 = queryWeight, product of:
              3.1847067 = idf(docFreq=4974, maxDocs=44218)
              0.03002521 = queryNorm
            0.29856625 = fieldWeight in 2339, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              3.1847067 = idf(docFreq=4974, maxDocs=44218)
              0.046875 = fieldNorm(doc=2339)
      0.21428572 = coord(3/14)
    
    Abstract
    Text classification (TC) is a core technique for text mining and information retrieval. It has been applied to many applications in many different research and industrial areas. Term-weighting schemes assign an appropriate weight to each term to obtain a high TC performance. Although term weighting is one of the important modules for TC and TC has different peculiarities from those in information retrieval, many term-weighting schemes used in information retrieval, such as term frequency-inverse document frequency (tf-idf), have been used in TC in the same manner. The peculiarity of TC that differs most from information retrieval is the existence of class information. This article proposes a new term-weighting scheme that uses class information using positive and negative class distributions. As a result, the proposed scheme, log tf-TRR, consistently performs better than do other schemes using class information as well as traditional schemes such as tf-idf.
  7. Carrillo-de-Albornoz, J.; Plaza, L.: ¬An emotion-based model of negation, intensifiers, and modality for polarity and intensity classification (2013) 0.02
    0.023001626 = product of:
      0.10734092 = sum of:
        0.041207436 = weight(_text_:classification in 1005) [ClassicSimilarity], result of:
          0.041207436 = score(doc=1005,freq=12.0), product of:
            0.09562149 = queryWeight, product of:
              3.1847067 = idf(docFreq=4974, maxDocs=44218)
              0.03002521 = queryNorm
            0.43094325 = fieldWeight in 1005, product of:
              3.4641016 = tf(freq=12.0), with freq of:
                12.0 = termFreq=12.0
              3.1847067 = idf(docFreq=4974, maxDocs=44218)
              0.0390625 = fieldNorm(doc=1005)
        0.041207436 = weight(_text_:classification in 1005) [ClassicSimilarity], result of:
          0.041207436 = score(doc=1005,freq=12.0), product of:
            0.09562149 = queryWeight, product of:
              3.1847067 = idf(docFreq=4974, maxDocs=44218)
              0.03002521 = queryNorm
            0.43094325 = fieldWeight in 1005, product of:
              3.4641016 = tf(freq=12.0), with freq of:
                12.0 = termFreq=12.0
              3.1847067 = idf(docFreq=4974, maxDocs=44218)
              0.0390625 = fieldNorm(doc=1005)
        0.024926046 = product of:
          0.04985209 = sum of:
            0.04985209 = weight(_text_:texts in 1005) [ClassicSimilarity], result of:
              0.04985209 = score(doc=1005,freq=2.0), product of:
                0.16460659 = queryWeight, product of:
                  5.4822793 = idf(docFreq=499, maxDocs=44218)
                  0.03002521 = queryNorm
                0.302856 = fieldWeight in 1005, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  5.4822793 = idf(docFreq=499, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=1005)
          0.5 = coord(1/2)
      0.21428572 = coord(3/14)
    
    Abstract
    Negation, intensifiers, and modality are common linguistic constructions that may modify the emotional meaning of the text and therefore need to be taken into consideration in sentiment analysis. Negation is usually considered as a polarity shifter, whereas intensifiers are regarded as amplifiers or diminishers of the strength of such polarity. Modality, in turn, has only been addressed in a very naïve fashion, so that modal forms are treated as polarity blockers. However, processing these constructions as mere polarity modifiers may be adequate for polarity classification, but it is not enough for more complex tasks (e.g., intensity classification), for which a more fine-grained model based on emotions is needed. In this work, we study the effect of modifiers on the emotions affected by them and propose a model of negation, intensifiers, and modality especially conceived for sentiment analysis tasks. We compare our emotion-based strategy with two traditional approaches based on polar expressions and find that representing the text as a set of emotions increases accuracy in different classification tasks and that this representation allows for a more accurate modeling of modifiers that results in further classification improvements. We also study the most common uses of modifiers in opinionated texts and quantify their impact in polarity and intensity classification. Finally, we analyze the joint effect of emotional modifiers and find that interesting synergies exist between them.
  8. Argamon, S.; Whitelaw, C.; Chase, P.; Hota, S.R.; Garg, N.; Levitan, S.: Stylistic text classification using functional lexical features (2007) 0.02
    0.021394843 = product of:
      0.0998426 = sum of:
        0.03496567 = weight(_text_:classification in 280) [ClassicSimilarity], result of:
          0.03496567 = score(doc=280,freq=6.0), product of:
            0.09562149 = queryWeight, product of:
              3.1847067 = idf(docFreq=4974, maxDocs=44218)
              0.03002521 = queryNorm
            0.3656675 = fieldWeight in 280, product of:
              2.4494898 = tf(freq=6.0), with freq of:
                6.0 = termFreq=6.0
              3.1847067 = idf(docFreq=4974, maxDocs=44218)
              0.046875 = fieldNorm(doc=280)
        0.03496567 = weight(_text_:classification in 280) [ClassicSimilarity], result of:
          0.03496567 = score(doc=280,freq=6.0), product of:
            0.09562149 = queryWeight, product of:
              3.1847067 = idf(docFreq=4974, maxDocs=44218)
              0.03002521 = queryNorm
            0.3656675 = fieldWeight in 280, product of:
              2.4494898 = tf(freq=6.0), with freq of:
                6.0 = termFreq=6.0
              3.1847067 = idf(docFreq=4974, maxDocs=44218)
              0.046875 = fieldNorm(doc=280)
        0.029911257 = product of:
          0.059822515 = sum of:
            0.059822515 = weight(_text_:texts in 280) [ClassicSimilarity], result of:
              0.059822515 = score(doc=280,freq=2.0), product of:
                0.16460659 = queryWeight, product of:
                  5.4822793 = idf(docFreq=499, maxDocs=44218)
                  0.03002521 = queryNorm
                0.36342722 = fieldWeight in 280, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  5.4822793 = idf(docFreq=499, maxDocs=44218)
                  0.046875 = fieldNorm(doc=280)
          0.5 = coord(1/2)
      0.21428572 = coord(3/14)
    
    Abstract
    Most text analysis and retrieval work to date has focused on the topic of a text; that is, what it is about. However, a text also contains much useful information in its style, or how it is written. This includes information about its author, its purpose, feelings it is meant to evoke, and more. This article develops a new type of lexical feature for use in stylistic text classification, based on taxonomies of various semantic functions of certain choice words or phrases. We demonstrate the usefulness of such features for the stylistic text classification tasks of determining author identity and nationality, the gender of literary characters, a text's sentiment (positive/ negative evaluation), and the rhetorical character of scientific journal articles. We further show how the use of functional features aids in gaining insight about stylistic differences among different kinds of texts.
  9. Way, E.C.: Knowledge representation and metaphor (oder: meaning) (1994) 0.02
    0.01890399 = product of:
      0.088218614 = sum of:
        0.033948522 = weight(_text_:subject in 771) [ClassicSimilarity], result of:
          0.033948522 = score(doc=771,freq=2.0), product of:
            0.10738805 = queryWeight, product of:
              3.576596 = idf(docFreq=3361, maxDocs=44218)
              0.03002521 = queryNorm
            0.31612942 = fieldWeight in 771, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.576596 = idf(docFreq=3361, maxDocs=44218)
              0.0625 = fieldNorm(doc=771)
        0.03799808 = product of:
          0.07599616 = sum of:
            0.07599616 = weight(_text_:schemes in 771) [ClassicSimilarity], result of:
              0.07599616 = score(doc=771,freq=2.0), product of:
                0.16067243 = queryWeight, product of:
                  5.3512506 = idf(docFreq=569, maxDocs=44218)
                  0.03002521 = queryNorm
                0.4729882 = fieldWeight in 771, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  5.3512506 = idf(docFreq=569, maxDocs=44218)
                  0.0625 = fieldNorm(doc=771)
          0.5 = coord(1/2)
        0.016272005 = product of:
          0.03254401 = sum of:
            0.03254401 = weight(_text_:22 in 771) [ClassicSimilarity], result of:
              0.03254401 = score(doc=771,freq=2.0), product of:
                0.10514317 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.03002521 = queryNorm
                0.30952093 = fieldWeight in 771, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.0625 = fieldNorm(doc=771)
          0.5 = coord(1/2)
      0.21428572 = coord(3/14)
    
    Content
    Enthält folgende 9 Kapitel: The literal and the metaphoric; Views of metaphor; Knowledge representation; Representation schemes and conceptual graphs; The dynamic type hierarchy theory of metaphor; Computational approaches to metaphor; Thenature and structure of semantic hierarchies; Language games, open texture and family resemblance; Programming the dynamic type hierarchy; Subject index
    Footnote
    Bereits 1991 bei Kluwer publiziert // Rez. in: Knowledge organization 22(1995) no.1, S.48-49 (O. Sechser)
  10. Hsinchun, C.: Knowledge-based document retrieval framework and design (1992) 0.02
    0.018810362 = product of:
      0.08778169 = sum of:
        0.033948522 = weight(_text_:subject in 6686) [ClassicSimilarity], result of:
          0.033948522 = score(doc=6686,freq=2.0), product of:
            0.10738805 = queryWeight, product of:
              3.576596 = idf(docFreq=3361, maxDocs=44218)
              0.03002521 = queryNorm
            0.31612942 = fieldWeight in 6686, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.576596 = idf(docFreq=3361, maxDocs=44218)
              0.0625 = fieldNorm(doc=6686)
        0.026916584 = weight(_text_:classification in 6686) [ClassicSimilarity], result of:
          0.026916584 = score(doc=6686,freq=2.0), product of:
            0.09562149 = queryWeight, product of:
              3.1847067 = idf(docFreq=4974, maxDocs=44218)
              0.03002521 = queryNorm
            0.28149095 = fieldWeight in 6686, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.1847067 = idf(docFreq=4974, maxDocs=44218)
              0.0625 = fieldNorm(doc=6686)
        0.026916584 = weight(_text_:classification in 6686) [ClassicSimilarity], result of:
          0.026916584 = score(doc=6686,freq=2.0), product of:
            0.09562149 = queryWeight, product of:
              3.1847067 = idf(docFreq=4974, maxDocs=44218)
              0.03002521 = queryNorm
            0.28149095 = fieldWeight in 6686, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.1847067 = idf(docFreq=4974, maxDocs=44218)
              0.0625 = fieldNorm(doc=6686)
      0.21428572 = coord(3/14)
    
    Abstract
    Presents research on the design of knowledge-based document retrieval systems in which a semantic network was adopted to represent subject knowledge and classification scheme knowledge and experts' search strategies and user modelling capability were modelled as procedural knowledge. These functionalities were incorporated into a prototype knowledge-based retrieval system, Metacat. Describes a system, the design of which was based on the blackboard architecture, which was able to create a user profile, identify task requirements, suggest heuristics-based search strategies, perform semantic-based search assistance, and assist online query refinement
  11. Ibekwe-SanJuan, F.; SanJuan, E.: From term variants to research topics (2002) 0.02
    0.01774993 = product of:
      0.08283301 = sum of:
        0.023791125 = weight(_text_:classification in 1853) [ClassicSimilarity], result of:
          0.023791125 = score(doc=1853,freq=4.0), product of:
            0.09562149 = queryWeight, product of:
              3.1847067 = idf(docFreq=4974, maxDocs=44218)
              0.03002521 = queryNorm
            0.24880521 = fieldWeight in 1853, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              3.1847067 = idf(docFreq=4974, maxDocs=44218)
              0.0390625 = fieldNorm(doc=1853)
        0.023791125 = weight(_text_:classification in 1853) [ClassicSimilarity], result of:
          0.023791125 = score(doc=1853,freq=4.0), product of:
            0.09562149 = queryWeight, product of:
              3.1847067 = idf(docFreq=4974, maxDocs=44218)
              0.03002521 = queryNorm
            0.24880521 = fieldWeight in 1853, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              3.1847067 = idf(docFreq=4974, maxDocs=44218)
              0.0390625 = fieldNorm(doc=1853)
        0.035250753 = product of:
          0.07050151 = sum of:
            0.07050151 = weight(_text_:texts in 1853) [ClassicSimilarity], result of:
              0.07050151 = score(doc=1853,freq=4.0), product of:
                0.16460659 = queryWeight, product of:
                  5.4822793 = idf(docFreq=499, maxDocs=44218)
                  0.03002521 = queryNorm
                0.42830306 = fieldWeight in 1853, product of:
                  2.0 = tf(freq=4.0), with freq of:
                    4.0 = termFreq=4.0
                  5.4822793 = idf(docFreq=499, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=1853)
          0.5 = coord(1/2)
      0.21428572 = coord(3/14)
    
    Abstract
    In a scientific and technological watch (STW) task, an expert user needs to survey the evolution of research topics in his area of specialisation in order to detect interesting changes. The majority of methods proposing evaluation metrics (bibliometrics and scientometrics studies) for STW rely solely an statistical data analysis methods (Co-citation analysis, co-word analysis). Such methods usually work an structured databases where the units of analysis (words, keywords) are already attributed to documents by human indexers. The advent of huge amounts of unstructured textual data has rendered necessary the integration of natural language processing (NLP) techniques to first extract meaningful units from texts. We propose a method for STW which is NLP-oriented. The method not only analyses texts linguistically in order to extract terms from them, but also uses linguistic relations (syntactic variations) as the basis for clustering. Terms and variation relations are formalised as weighted di-graphs which the clustering algorithm, CPCL (Classification by Preferential Clustered Link) will seek to reduce in order to produces classes. These classes ideally represent the research topics present in the corpus. The results of the classification are subjected to validation by an expert in STW.
  12. Bird, S.; Dale, R.; Dorr, B.; Gibson, B.; Joseph, M.; Kan, M.-Y.; Lee, D.; Powley, B.; Radev, D.; Tan, Y.F.: ¬The ACL Anthology Reference Corpus : a reference dataset for bibliographic research in computational linguistics (2008) 0.02
    0.01608469 = product of:
      0.07506188 = sum of:
        0.023310447 = weight(_text_:classification in 2804) [ClassicSimilarity], result of:
          0.023310447 = score(doc=2804,freq=6.0), product of:
            0.09562149 = queryWeight, product of:
              3.1847067 = idf(docFreq=4974, maxDocs=44218)
              0.03002521 = queryNorm
            0.24377833 = fieldWeight in 2804, product of:
              2.4494898 = tf(freq=6.0), with freq of:
                6.0 = termFreq=6.0
              3.1847067 = idf(docFreq=4974, maxDocs=44218)
              0.03125 = fieldNorm(doc=2804)
        0.028440988 = weight(_text_:bibliographic in 2804) [ClassicSimilarity], result of:
          0.028440988 = score(doc=2804,freq=4.0), product of:
            0.11688946 = queryWeight, product of:
              3.893044 = idf(docFreq=2449, maxDocs=44218)
              0.03002521 = queryNorm
            0.24331525 = fieldWeight in 2804, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              3.893044 = idf(docFreq=2449, maxDocs=44218)
              0.03125 = fieldNorm(doc=2804)
        0.023310447 = weight(_text_:classification in 2804) [ClassicSimilarity], result of:
          0.023310447 = score(doc=2804,freq=6.0), product of:
            0.09562149 = queryWeight, product of:
              3.1847067 = idf(docFreq=4974, maxDocs=44218)
              0.03002521 = queryNorm
            0.24377833 = fieldWeight in 2804, product of:
              2.4494898 = tf(freq=6.0), with freq of:
                6.0 = termFreq=6.0
              3.1847067 = idf(docFreq=4974, maxDocs=44218)
              0.03125 = fieldNorm(doc=2804)
      0.21428572 = coord(3/14)
    
    Abstract
    The ACL Anthology is a digital archive of conference and journal papers in natural language processing and computational linguistics. Its primary purpose is to serve as a reference repository of research results, but we believe that it can also be an object of study and a platform for research in its own right. We describe an enriched and standardized reference corpus derived from the ACL Anthology that can be used for research in scholarly document processing. This corpus, which we call the ACL Anthology Reference Corpus (ACL ARC), brings together the recent activities of a number of research groups around the world. Our goal is to make the corpus widely available, and to encourage other researchers to use it as a standard testbed for experiments in both bibliographic and bibliometric research.
    Content
    Vgl. auch: Automatic Term Recognition (ATR) is a research task that deals with the identification of domain-specific terms. Terms, in simple words, are textual realization of significant concepts in an expertise domain. Additionally, domain-specific terms may be classified into a number of categories, in which each category represents a significant concept. A term classification task is often defined on top of an ATR procedure to perform such categorization. For instance, in the biomedical domain, terms can be classified as drugs, proteins, and genes. This is a reference dataset for terminology extraction and classification research in computational linguistics. It is a set of manually annotated terms in English language that are extracted from the ACL Anthology Reference Corpus (ACL ARC). The ACL ARC is a canonicalised and frozen subset of scientific publications in the domain of Human Language Technologies (HLT). It consists of 10,921 articles from 1965 to 2006. The dataset, called ACL RD-TEC, is comprised of more than 69,000 candidate terms that are manually annotated as valid and invalid terms. Furthermore, valid terms are classified as technology and non-technology terms. Technology terms refer to a method, process, or in general a technological concept in the domain of HLT, e.g. machine translation, word sense disambiguation, and language modelling. On the other hand, non-technology terms refer to important concepts other than technological; examples of such terms in the domain of HLT are multilingual lexicon, corpora, word sense, and language model. The dataset is created to serve as a gold standard for the comparison of the algorithms of term recognition and classification. [http://catalog.elra.info/product_info.php?products_id=1236].
  13. Needham, R.M.; Sparck Jones, K.: Keywords and clumps (1985) 0.02
    0.015786104 = product of:
      0.07366848 = sum of:
        0.021004576 = weight(_text_:subject in 3645) [ClassicSimilarity], result of:
          0.021004576 = score(doc=3645,freq=4.0), product of:
            0.10738805 = queryWeight, product of:
              3.576596 = idf(docFreq=3361, maxDocs=44218)
              0.03002521 = queryNorm
            0.1955951 = fieldWeight in 3645, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              3.576596 = idf(docFreq=3361, maxDocs=44218)
              0.02734375 = fieldNorm(doc=3645)
        0.02633195 = weight(_text_:classification in 3645) [ClassicSimilarity], result of:
          0.02633195 = score(doc=3645,freq=10.0), product of:
            0.09562149 = queryWeight, product of:
              3.1847067 = idf(docFreq=4974, maxDocs=44218)
              0.03002521 = queryNorm
            0.27537692 = fieldWeight in 3645, product of:
              3.1622777 = tf(freq=10.0), with freq of:
                10.0 = termFreq=10.0
              3.1847067 = idf(docFreq=4974, maxDocs=44218)
              0.02734375 = fieldNorm(doc=3645)
        0.02633195 = weight(_text_:classification in 3645) [ClassicSimilarity], result of:
          0.02633195 = score(doc=3645,freq=10.0), product of:
            0.09562149 = queryWeight, product of:
              3.1847067 = idf(docFreq=4974, maxDocs=44218)
              0.03002521 = queryNorm
            0.27537692 = fieldWeight in 3645, product of:
              3.1622777 = tf(freq=10.0), with freq of:
                10.0 = termFreq=10.0
              3.1847067 = idf(docFreq=4974, maxDocs=44218)
              0.02734375 = fieldNorm(doc=3645)
      0.21428572 = coord(3/14)
    
    Abstract
    The selection that follows was chosen as it represents "a very early paper an the possibilities allowed by computers an documentation." In the early 1960s computers were being used to provide simple automatic indexing systems wherein keywords were extracted from documents. The problem with such systems was that they lacked vocabulary control, thus documents related in subject matter were not always collocated in retrieval. To improve retrieval by improving recall is the raison d'être of vocabulary control tools such as classifications and thesauri. The question arose whether it was possible by automatic means to construct classes of terms, which when substituted, one for another, could be used to improve retrieval performance? One of the first theoretical approaches to this question was initiated by R. M. Needham and Karen Sparck Jones at the Cambridge Language Research Institute in England.t The question was later pursued using experimental methodologies by Sparck Jones, who, as a Senior Research Associate in the Computer Laboratory at the University of Cambridge, has devoted her life's work to research in information retrieval and automatic naturai language processing. Based an the principles of numerical taxonomy, automatic classification techniques start from the premise that two objects are similar to the degree that they share attributes in common. When these two objects are keywords, their similarity is measured in terms of the number of documents they index in common. Step 1 in automatic classification is to compute mathematically the degree to which two terms are similar. Step 2 is to group together those terms that are "most similar" to each other, forming equivalence classes of intersubstitutable terms. The technique for forming such classes varies and is the factor that characteristically distinguishes different approaches to automatic classification. The technique used by Needham and Sparck Jones, that of clumping, is described in the selection that follows. Questions that must be asked are whether the use of automatically generated classes really does improve retrieval performance and whether there is a true eco nomic advantage in substituting mechanical for manual labor. Several years after her work with clumping, Sparck Jones was to observe that while it was not wholly satisfactory in itself, it was valuable in that it stimulated research into automatic classification. To this it might be added that it was valuable in that it introduced to libraryl information science the methods of numerical taxonomy, thus stimulating us to think again about the fundamental nature and purpose of classification. In this connection it might be useful to review how automatically derived classes differ from those of manually constructed classifications: 1) the manner of their derivation is purely a posteriori, the ultimate operationalization of the principle of literary warrant; 2) the relationship between members forming such classes is essentially statistical; the members of a given class are similar to each other not because they possess the class-defining characteristic but by virtue of sharing a family resemblance; and finally, 3) automatically derived classes are not related meaningfully one to another, that is, they are not ordered in traditional hierarchical and precedence relationships.
    Source
    Theory of subject analysis: a sourcebook. Ed.: L.M. Chan, et al
  14. Melzer, C.: ¬Der Maschine anpassen : PC-Spracherkennung - Programme sind mittlerweile alltagsreif (2005) 0.02
    0.015459906 = product of:
      0.10821934 = sum of:
        0.10110034 = weight(_text_:henry in 4044) [ClassicSimilarity], result of:
          0.10110034 = score(doc=4044,freq=4.0), product of:
            0.23560001 = queryWeight, product of:
              7.84674 = idf(docFreq=46, maxDocs=44218)
              0.03002521 = queryNorm
            0.42911857 = fieldWeight in 4044, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              7.84674 = idf(docFreq=46, maxDocs=44218)
              0.02734375 = fieldNorm(doc=4044)
        0.0071190023 = product of:
          0.014238005 = sum of:
            0.014238005 = weight(_text_:22 in 4044) [ClassicSimilarity], result of:
              0.014238005 = score(doc=4044,freq=2.0), product of:
                0.10514317 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.03002521 = queryNorm
                0.1354154 = fieldWeight in 4044, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.02734375 = fieldNorm(doc=4044)
          0.5 = coord(1/2)
      0.14285715 = coord(2/14)
    
    Content
    Billiger geht es mit "Via Voice Standard" von IBM. Die Software kostet etwa 50 Euro, hat aber erhebliche Schwächen in der Lernfähigkeit: Sie schneidet jedoch immer noch besser ab als das gut drei Mal so teure "Voice Office Premium 10"; das im Test der sechs Programme als einziges nur ein "Befriedigend" bekam. "Man liest über Spracherkennung nicht mehr so viel" weil es funktioniert", glaubt Dorothee Wiegand von der in Hannover erscheinenden Computerzeitschrift "c't". Die Technik" etwa "Dragon Naturally Speaking" von ScanSoft, sei ausgereift, "Spracherkennung ist vor allem Statistik, die Auswertung unendlicher Wortmöglichkeiten. Eigentlich war eher die Hardware das Problem", sagt Wiegand. Da jetzt selbst einfache Heimcomputer schnell und leistungsfähig seien, hätten die Entwickler viel mehr Möglichkeiten."Aber selbst ältere Computer kommen mit den Systemen klar. Sie brauchen nur etwas länger! "Jedes Byte macht die Spracherkennung etwas schneller, ungenauer ist sie sonst aber nicht", bestätigt Kristina Henry von linguatec in München. Auch für die Produkte des Herstellers gelte jedoch, dass "üben und deutlich sprechen wichtiger sind als jede Hardware". Selbst Stimmen von Diktiergeräten würden klar, erkannt, versichert Henry: "Wir wollen einen Schritt weiter gehen und das Diktieren von unterwegs möglich machen." Der Benutzer könnte dann eine Nummer anwählen, etwa im Auto einen Text aufsprechen und ihn zu Hause "getippt" vorfinden. Grundsätzlich passt die Spracherkennungssoftware inzwischen auch auf den privaten Computer. Klar ist aber, dass selbst der bestgesprochene Text nachbearbeitet werden muss. Zudem ist vom Nutzer Geduld gefragt: Ebenso wie sein System lernt, muss der Mensch sich in Aussprache und Geschwindigkeit dem System anpassen. Dann sind die Ergebnisse allerdings beachtlich - und "Sexterminvereinbarung" statt "zwecks Terminvereinbarung" gehört der Vergangenheit an."
    Date
    3. 5.1997 8:44:22
  15. Peng, F.; Huang, X.: Machine learning for Asian language text classification (2007) 0.02
    0.015199591 = product of:
      0.10639714 = sum of:
        0.05319857 = weight(_text_:classification in 831) [ClassicSimilarity], result of:
          0.05319857 = score(doc=831,freq=20.0), product of:
            0.09562149 = queryWeight, product of:
              3.1847067 = idf(docFreq=4974, maxDocs=44218)
              0.03002521 = queryNorm
            0.55634534 = fieldWeight in 831, product of:
              4.472136 = tf(freq=20.0), with freq of:
                20.0 = termFreq=20.0
              3.1847067 = idf(docFreq=4974, maxDocs=44218)
              0.0390625 = fieldNorm(doc=831)
        0.05319857 = weight(_text_:classification in 831) [ClassicSimilarity], result of:
          0.05319857 = score(doc=831,freq=20.0), product of:
            0.09562149 = queryWeight, product of:
              3.1847067 = idf(docFreq=4974, maxDocs=44218)
              0.03002521 = queryNorm
            0.55634534 = fieldWeight in 831, product of:
              4.472136 = tf(freq=20.0), with freq of:
                20.0 = termFreq=20.0
              3.1847067 = idf(docFreq=4974, maxDocs=44218)
              0.0390625 = fieldNorm(doc=831)
      0.14285715 = coord(2/14)
    
    Abstract
    Purpose - The purpose of this research is to compare several machine learning techniques on the task of Asian language text classification, such as Chinese and Japanese where no word boundary information is available in written text. The paper advocates a simple language modeling based approach for this task. Design/methodology/approach - Naïve Bayes, maximum entropy model, support vector machines, and language modeling approaches were implemented and were applied to Chinese and Japanese text classification. To investigate the influence of word segmentation, different word segmentation approaches were investigated and applied to Chinese text. A segmentation-based approach was compared with the non-segmentation-based approach. Findings - There were two findings: the experiments show that statistical language modeling can significantly outperform standard techniques, given the same set of features; and it was found that classification with word level features normally yields improved classification performance, but that classification performance is not monotonically related to segmentation accuracy. In particular, classification performance may initially improve with increased segmentation accuracy, but eventually classification performance stops improving, and can in fact even decrease, after a certain level of segmentation accuracy. Practical implications - Apply the findings to real web text classification is ongoing work. Originality/value - The paper is very relevant to Chinese and Japanese information processing, e.g. webpage classification, web search.
  16. Rahmstorf, G.: Information retrieval using conceptual representations of phrases (1994) 0.02
    0.015061316 = product of:
      0.07028614 = sum of:
        0.02018744 = weight(_text_:classification in 7862) [ClassicSimilarity], result of:
          0.02018744 = score(doc=7862,freq=2.0), product of:
            0.09562149 = queryWeight, product of:
              3.1847067 = idf(docFreq=4974, maxDocs=44218)
              0.03002521 = queryNorm
            0.21111822 = fieldWeight in 7862, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.1847067 = idf(docFreq=4974, maxDocs=44218)
              0.046875 = fieldNorm(doc=7862)
        0.02018744 = weight(_text_:classification in 7862) [ClassicSimilarity], result of:
          0.02018744 = score(doc=7862,freq=2.0), product of:
            0.09562149 = queryWeight, product of:
              3.1847067 = idf(docFreq=4974, maxDocs=44218)
              0.03002521 = queryNorm
            0.21111822 = fieldWeight in 7862, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.1847067 = idf(docFreq=4974, maxDocs=44218)
              0.046875 = fieldNorm(doc=7862)
        0.029911257 = product of:
          0.059822515 = sum of:
            0.059822515 = weight(_text_:texts in 7862) [ClassicSimilarity], result of:
              0.059822515 = score(doc=7862,freq=2.0), product of:
                0.16460659 = queryWeight, product of:
                  5.4822793 = idf(docFreq=499, maxDocs=44218)
                  0.03002521 = queryNorm
                0.36342722 = fieldWeight in 7862, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  5.4822793 = idf(docFreq=499, maxDocs=44218)
                  0.046875 = fieldNorm(doc=7862)
          0.5 = coord(1/2)
      0.21428572 = coord(3/14)
    
    Abstract
    The information retrieval problem is described starting from an analysis of the concepts 'user's information request' and 'information offerings of texts'. It is shown that natural language phrases are a more adequate medium for expressing information requests and information offerings than character string based query and indexing languages complemented by Boolean oprators. The phrases must be represented as concepts to reach a language invariant level for rule based relevance analysis. The special type of representation called advanced thesaurus is used for the semantic representation of natural language phrases and for relevance processing. The analysis of the retrieval problem leads to a symmetric system structure
    Series
    Studies in classification, data analysis, and knowledge organization
  17. Sebastiani, F.: Machine learning in automated text categorization (2002) 0.02
    0.015061316 = product of:
      0.07028614 = sum of:
        0.02018744 = weight(_text_:classification in 3389) [ClassicSimilarity], result of:
          0.02018744 = score(doc=3389,freq=2.0), product of:
            0.09562149 = queryWeight, product of:
              3.1847067 = idf(docFreq=4974, maxDocs=44218)
              0.03002521 = queryNorm
            0.21111822 = fieldWeight in 3389, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.1847067 = idf(docFreq=4974, maxDocs=44218)
              0.046875 = fieldNorm(doc=3389)
        0.02018744 = weight(_text_:classification in 3389) [ClassicSimilarity], result of:
          0.02018744 = score(doc=3389,freq=2.0), product of:
            0.09562149 = queryWeight, product of:
              3.1847067 = idf(docFreq=4974, maxDocs=44218)
              0.03002521 = queryNorm
            0.21111822 = fieldWeight in 3389, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.1847067 = idf(docFreq=4974, maxDocs=44218)
              0.046875 = fieldNorm(doc=3389)
        0.029911257 = product of:
          0.059822515 = sum of:
            0.059822515 = weight(_text_:texts in 3389) [ClassicSimilarity], result of:
              0.059822515 = score(doc=3389,freq=2.0), product of:
                0.16460659 = queryWeight, product of:
                  5.4822793 = idf(docFreq=499, maxDocs=44218)
                  0.03002521 = queryNorm
                0.36342722 = fieldWeight in 3389, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  5.4822793 = idf(docFreq=499, maxDocs=44218)
                  0.046875 = fieldNorm(doc=3389)
          0.5 = coord(1/2)
      0.21428572 = coord(3/14)
    
    Abstract
    The automated categorization (or classification) of texts into predefined categories has witnessed a booming interest in the last 10 years, due to the increased availability of documents in digital form and the ensuing need to organize them. In the research community the dominant approach to this problem is based an machine learning techniques: a general inductive process automatically builds a classifier by learning, from a set of preclassified documents, the characteristics of the categories. The advantages of this approach over the knowledge engineering approach (consisting in the manual definition of a classifier by domain experts) are a very good effectiveness, considerable savings in terms of expert labor power, and straightforward portability to different domains. This survey discusses the main approaches to text categorization that fall within the machine learning paradigm. We will discuss in detail issues pertaining to three different problems, namely, document representation, classifier construction, and classifier evaluation.
  18. Sebastiani, F.: ¬A tutorial an automated text categorisation (1999) 0.02
    0.015061316 = product of:
      0.07028614 = sum of:
        0.02018744 = weight(_text_:classification in 3390) [ClassicSimilarity], result of:
          0.02018744 = score(doc=3390,freq=2.0), product of:
            0.09562149 = queryWeight, product of:
              3.1847067 = idf(docFreq=4974, maxDocs=44218)
              0.03002521 = queryNorm
            0.21111822 = fieldWeight in 3390, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.1847067 = idf(docFreq=4974, maxDocs=44218)
              0.046875 = fieldNorm(doc=3390)
        0.02018744 = weight(_text_:classification in 3390) [ClassicSimilarity], result of:
          0.02018744 = score(doc=3390,freq=2.0), product of:
            0.09562149 = queryWeight, product of:
              3.1847067 = idf(docFreq=4974, maxDocs=44218)
              0.03002521 = queryNorm
            0.21111822 = fieldWeight in 3390, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.1847067 = idf(docFreq=4974, maxDocs=44218)
              0.046875 = fieldNorm(doc=3390)
        0.029911257 = product of:
          0.059822515 = sum of:
            0.059822515 = weight(_text_:texts in 3390) [ClassicSimilarity], result of:
              0.059822515 = score(doc=3390,freq=2.0), product of:
                0.16460659 = queryWeight, product of:
                  5.4822793 = idf(docFreq=499, maxDocs=44218)
                  0.03002521 = queryNorm
                0.36342722 = fieldWeight in 3390, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  5.4822793 = idf(docFreq=499, maxDocs=44218)
                  0.046875 = fieldNorm(doc=3390)
          0.5 = coord(1/2)
      0.21428572 = coord(3/14)
    
    Abstract
    The automated categorisation (or classification) of texts into topical categories has a long history, dating back at least to 1960. Until the late '80s, the dominant approach to the problem involved knowledge-engineering automatic categorisers, i.e. manually building a set of rules encoding expert knowledge an how to classify documents. In the '90s, with the booming production and availability of on-line documents, automated text categorisation has witnessed an increased and renewed interest. A newer paradigm based an machine learning has superseded the previous approach. Within this paradigm, a general inductive process automatically builds a classifier by "learning", from a set of previously classified documents, the characteristics of one or more categories; the advantages are a very good effectiveness, a considerable savings in terms of expert manpower, and domain independence. In this tutorial we look at the main approaches that have been taken towards automatic text categorisation within the general machine learning paradigm. Issues of document indexing, classifier construction, and classifier evaluation, will be touched upon.
  19. AL-Smadi, M.; Jaradat, Z.; AL-Ayyoub, M.; Jararweh, Y.: Paraphrase identification and semantic text similarity analysis in Arabic news tweets using lexical, syntactic, and semantic features (2017) 0.02
    0.015061316 = product of:
      0.07028614 = sum of:
        0.02018744 = weight(_text_:classification in 5095) [ClassicSimilarity], result of:
          0.02018744 = score(doc=5095,freq=2.0), product of:
            0.09562149 = queryWeight, product of:
              3.1847067 = idf(docFreq=4974, maxDocs=44218)
              0.03002521 = queryNorm
            0.21111822 = fieldWeight in 5095, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.1847067 = idf(docFreq=4974, maxDocs=44218)
              0.046875 = fieldNorm(doc=5095)
        0.02018744 = weight(_text_:classification in 5095) [ClassicSimilarity], result of:
          0.02018744 = score(doc=5095,freq=2.0), product of:
            0.09562149 = queryWeight, product of:
              3.1847067 = idf(docFreq=4974, maxDocs=44218)
              0.03002521 = queryNorm
            0.21111822 = fieldWeight in 5095, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.1847067 = idf(docFreq=4974, maxDocs=44218)
              0.046875 = fieldNorm(doc=5095)
        0.029911257 = product of:
          0.059822515 = sum of:
            0.059822515 = weight(_text_:texts in 5095) [ClassicSimilarity], result of:
              0.059822515 = score(doc=5095,freq=2.0), product of:
                0.16460659 = queryWeight, product of:
                  5.4822793 = idf(docFreq=499, maxDocs=44218)
                  0.03002521 = queryNorm
                0.36342722 = fieldWeight in 5095, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  5.4822793 = idf(docFreq=499, maxDocs=44218)
                  0.046875 = fieldNorm(doc=5095)
          0.5 = coord(1/2)
      0.21428572 = coord(3/14)
    
    Abstract
    The rapid growth in digital information has raised considerable challenges in particular when it comes to automated content analysis. Social media such as twitter share a lot of its users' information about their events, opinions, personalities, etc. Paraphrase Identification (PI) is concerned with recognizing whether two texts have the same/similar meaning, whereas the Semantic Text Similarity (STS) is concerned with the degree of that similarity. This research proposes a state-of-the-art approach for paraphrase identification and semantic text similarity analysis in Arabic news tweets. The approach adopts several phases of text processing, features extraction and text classification. Lexical, syntactic, and semantic features are extracted to overcome the weakness and limitations of the current technologies in solving these tasks for the Arabic language. Maximum Entropy (MaxEnt) and Support Vector Regression (SVR) classifiers are trained using these features and are evaluated using a dataset prepared for this research. The experimentation results show that the approach achieves good results in comparison to the baseline results.
  20. Ruge, G.: Sprache und Computer : Wortbedeutung und Termassoziation. Methoden zur automatischen semantischen Klassifikation (1995) 0.02
    0.015022537 = product of:
      0.07010517 = sum of:
        0.026916584 = weight(_text_:classification in 1534) [ClassicSimilarity], result of:
          0.026916584 = score(doc=1534,freq=2.0), product of:
            0.09562149 = queryWeight, product of:
              3.1847067 = idf(docFreq=4974, maxDocs=44218)
              0.03002521 = queryNorm
            0.28149095 = fieldWeight in 1534, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.1847067 = idf(docFreq=4974, maxDocs=44218)
              0.0625 = fieldNorm(doc=1534)
        0.026916584 = weight(_text_:classification in 1534) [ClassicSimilarity], result of:
          0.026916584 = score(doc=1534,freq=2.0), product of:
            0.09562149 = queryWeight, product of:
              3.1847067 = idf(docFreq=4974, maxDocs=44218)
              0.03002521 = queryNorm
            0.28149095 = fieldWeight in 1534, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.1847067 = idf(docFreq=4974, maxDocs=44218)
              0.0625 = fieldNorm(doc=1534)
        0.016272005 = product of:
          0.03254401 = sum of:
            0.03254401 = weight(_text_:22 in 1534) [ClassicSimilarity], result of:
              0.03254401 = score(doc=1534,freq=2.0), product of:
                0.10514317 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.03002521 = queryNorm
                0.30952093 = fieldWeight in 1534, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.0625 = fieldNorm(doc=1534)
          0.5 = coord(1/2)
      0.21428572 = coord(3/14)
    
    Content
    Enthält folgende Kapitel: (1) Motivation; (2) Language philosophical foundations; (3) Structural comparison of extensions; (4) Earlier approaches towards term association; (5) Experiments; (6) Spreading-activation networks or memory models; (7) Perspective. Appendices: Heads and modifiers of 'car'. Glossary. Index. Language and computer. Word semantics and term association. Methods towards an automatic semantic classification
    Footnote
    Rez. in: Knowledge organization 22(1995) no.3/4, S.182-184 (M.T. Rolland)

Years

Languages

  • e 144
  • d 23
  • f 4
  • m 2
  • da 1
  • More… Less…

Types

  • a 139
  • el 18
  • m 17
  • s 7
  • x 5
  • p 4
  • b 1
  • d 1
  • More… Less…

Classifications