Document (#32281)

Author
Argamon, S.
Whitelaw, C.
Chase, P.
Hota, S.R.
Garg, N.
Levitan, S.
Title
Stylistic text classification using functional lexical features
Source
Journal of the American Society for Information Science and Technology. 58(2007) no.6, pp.802-822
Year
2007
Abstract
Most text analysis and retrieval work to date has focused on the topic of a text; that is, what it is about. However, a text also contains much useful information in its style, or how it is written. This includes information about its author, its purpose, feelings it is meant to evoke, and more. This article develops a new type of lexical feature for use in stylistic text classification, based on taxonomies of various semantic functions of certain choice words or phrases. We demonstrate the usefulness of such features for the stylistic text classification tasks of determining author identity and nationality, the gender of literary characters, a text's sentiment (positive/negative evaluation), and the rhetorical character of scientific journal articles. We further show how the use of functional features aids in gaining insight about stylistic differences among different kinds of texts.
Theme
Computational linguistics

Similar documents (content)

  1. Armstrong, G.: Computer-assisted literary analysis using the TACT a text-retrieval program (1996) 0.23
    0.2262296 = sum of:
      0.2262296 = product of:
        1.131148 = sum of:
          0.07953783 = weight(abstract_txt:literary in 5690) [ClassicSimilarity], result of:
            0.07953783 = score(doc=5690,freq=1.0), product of:
              0.10672057 = queryWeight, product of:
                1.0458906 = boost
                6.8140855 = idf(docFreq=131, maxDocs=44218)
                0.014974568 = queryNorm
              0.7452906 = fieldWeight in 5690, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.8140855 = idf(docFreq=131, maxDocs=44218)
                0.109375 = fieldNorm(doc=5690)
          0.1397664 = weight(abstract_txt:lexical in 5690) [ClassicSimilarity], result of:
            0.1397664 = score(doc=5690,freq=1.0), product of:
              0.19579914 = queryWeight, product of:
                2.0034688 = boost
                6.5264034 = idf(docFreq=175, maxDocs=44218)
                0.014974568 = queryNorm
              0.71382535 = fieldWeight in 5690, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.5264034 = idf(docFreq=175, maxDocs=44218)
                0.109375 = fieldNorm(doc=5690)
          0.07053365 = weight(abstract_txt:features in 5690) [ClassicSimilarity], result of:
            0.07053365 = score(doc=5690,freq=1.0), product of:
              0.1420704 = queryWeight, product of:
                2.0901363 = boost
                4.5391517 = idf(docFreq=1283, maxDocs=44218)
                0.014974568 = queryNorm
              0.4964697 = fieldWeight in 5690, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.5391517 = idf(docFreq=1283, maxDocs=44218)
                0.109375 = fieldNorm(doc=5690)
          0.14106125 = weight(abstract_txt:text in 5690) [ClassicSimilarity], result of:
            0.14106125 = score(doc=5690,freq=2.0), product of:
              0.22551624 = queryWeight, product of:
                3.7241461 = boost
                4.0438666 = idf(docFreq=2106, maxDocs=44218)
                0.014974568 = queryNorm
              0.6255037 = fieldWeight in 5690, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.0438666 = idf(docFreq=2106, maxDocs=44218)
                0.109375 = fieldNorm(doc=5690)
          0.70024884 = weight(abstract_txt:stylistic in 5690) [ClassicSimilarity], result of:
            0.70024884 = score(doc=5690,freq=1.0), product of:
              0.7223049 = queryWeight, product of:
                5.4419236 = boost
                8.863674 = idf(docFreq=16, maxDocs=44218)
                0.014974568 = queryNorm
              0.96946436 = fieldWeight in 5690, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                8.863674 = idf(docFreq=16, maxDocs=44218)
                0.109375 = fieldNorm(doc=5690)
        0.2 = coord(5/25)
    
  2. Montesi, M.; Urdiciain, B.G.: Recent linguistic research into author abstracts : its value for information science (2005) 0.13
    0.12618336 = sum of:
      0.12618336 = product of:
        0.6309168 = sum of:
          0.039726198 = weight(abstract_txt:character in 4823) [ClassicSimilarity], result of:
            0.039726198 = score(doc=4823,freq=1.0), product of:
              0.09756087 = queryWeight, product of:
                6.515104 = idf(docFreq=177, maxDocs=44218)
                0.014974568 = queryNorm
              0.407194 = fieldWeight in 4823, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.515104 = idf(docFreq=177, maxDocs=44218)
                0.0625 = fieldNorm(doc=4823)
          0.070876904 = weight(abstract_txt:author in 4823) [ClassicSimilarity], result of:
            0.070876904 = score(doc=4823,freq=4.0), product of:
              0.113907106 = queryWeight, product of:
                1.5281029 = boost
                4.9778743 = idf(docFreq=827, maxDocs=44218)
                0.014974568 = queryNorm
              0.6222343 = fieldWeight in 4823, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                4.9778743 = idf(docFreq=827, maxDocs=44218)
                0.0625 = fieldNorm(doc=4823)
          0.07986651 = weight(abstract_txt:lexical in 4823) [ClassicSimilarity], result of:
            0.07986651 = score(doc=4823,freq=1.0), product of:
              0.19579914 = queryWeight, product of:
                2.0034688 = boost
                6.5264034 = idf(docFreq=175, maxDocs=44218)
                0.014974568 = queryNorm
              0.4079002 = fieldWeight in 4823, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.5264034 = idf(docFreq=175, maxDocs=44218)
                0.0625 = fieldNorm(doc=4823)
          0.040304944 = weight(abstract_txt:features in 4823) [ClassicSimilarity], result of:
            0.040304944 = score(doc=4823,freq=1.0), product of:
              0.1420704 = queryWeight, product of:
                2.0901363 = boost
                4.5391517 = idf(docFreq=1283, maxDocs=44218)
                0.014974568 = queryNorm
              0.28369698 = fieldWeight in 4823, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.5391517 = idf(docFreq=1283, maxDocs=44218)
                0.0625 = fieldNorm(doc=4823)
          0.4001422 = weight(abstract_txt:stylistic in 4823) [ClassicSimilarity], result of:
            0.4001422 = score(doc=4823,freq=1.0), product of:
              0.7223049 = queryWeight, product of:
                5.4419236 = boost
                8.863674 = idf(docFreq=16, maxDocs=44218)
                0.014974568 = queryNorm
              0.55397964 = fieldWeight in 4823, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                8.863674 = idf(docFreq=16, maxDocs=44218)
                0.0625 = fieldNorm(doc=4823)
        0.2 = coord(5/25)
    
  3. HaCohen-Kerner, Y.; Beck, H.; Yehudai, E.; Rosenstein, M.; Mughaz, D.: Cuisine : classification using stylistic feature sets and/or name-based feature sets (2010) 0.12
    0.11555391 = sum of:
      0.11555391 = product of:
        0.9629493 = sum of:
          0.072540306 = weight(abstract_txt:classification in 3706) [ClassicSimilarity], result of:
            0.072540306 = score(doc=3706,freq=7.0), product of:
              0.10988835 = queryWeight, product of:
                1.8382249 = boost
                3.9920752 = idf(docFreq=2218, maxDocs=44218)
                0.014974568 = queryNorm
              0.66012734 = fieldWeight in 3706, product of:
                2.6457512 = tf(freq=7.0), with freq of:
                  7.0 = termFreq=7.0
                3.9920752 = idf(docFreq=2218, maxDocs=44218)
                0.0625 = fieldNorm(doc=3706)
          0.09012459 = weight(abstract_txt:features in 3706) [ClassicSimilarity], result of:
            0.09012459 = score(doc=3706,freq=5.0), product of:
              0.1420704 = queryWeight, product of:
                2.0901363 = boost
                4.5391517 = idf(docFreq=1283, maxDocs=44218)
                0.014974568 = queryNorm
              0.63436574 = fieldWeight in 3706, product of:
                2.236068 = tf(freq=5.0), with freq of:
                  5.0 = termFreq=5.0
                4.5391517 = idf(docFreq=1283, maxDocs=44218)
                0.0625 = fieldNorm(doc=3706)
          0.8002844 = weight(abstract_txt:stylistic in 3706) [ClassicSimilarity], result of:
            0.8002844 = score(doc=3706,freq=4.0), product of:
              0.7223049 = queryWeight, product of:
                5.4419236 = boost
                8.863674 = idf(docFreq=16, maxDocs=44218)
                0.014974568 = queryNorm
              1.1079593 = fieldWeight in 3706, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                8.863674 = idf(docFreq=16, maxDocs=44218)
                0.0625 = fieldNorm(doc=3706)
        0.12 = coord(3/25)
    
  4. Arakawa, Y.; Kameda, A.; Aizawa, A.; Suzuki, T.: Adding Twitter-specific features to stylistic features for classifying tweets by user type and number of retweets (2014) 0.10
    0.09874781 = sum of:
      0.09874781 = product of:
        0.61717385 = sum of:
          0.061307754 = weight(abstract_txt:classification in 1307) [ClassicSimilarity], result of:
            0.061307754 = score(doc=1307,freq=5.0), product of:
              0.10988835 = queryWeight, product of:
                1.8382249 = boost
                3.9920752 = idf(docFreq=2218, maxDocs=44218)
                0.014974568 = queryNorm
              0.5579095 = fieldWeight in 1307, product of:
                2.236068 = tf(freq=5.0), with freq of:
                  5.0 = termFreq=5.0
                3.9920752 = idf(docFreq=2218, maxDocs=44218)
                0.0625 = fieldNorm(doc=1307)
          0.09872655 = weight(abstract_txt:features in 1307) [ClassicSimilarity], result of:
            0.09872655 = score(doc=1307,freq=6.0), product of:
              0.1420704 = queryWeight, product of:
                2.0901363 = boost
                4.5391517 = idf(docFreq=1283, maxDocs=44218)
                0.014974568 = queryNorm
              0.69491285 = fieldWeight in 1307, product of:
                2.4494898 = tf(freq=6.0), with freq of:
                  6.0 = termFreq=6.0
                4.5391517 = idf(docFreq=1283, maxDocs=44218)
                0.0625 = fieldNorm(doc=1307)
          0.05699735 = weight(abstract_txt:text in 1307) [ClassicSimilarity], result of:
            0.05699735 = score(doc=1307,freq=1.0), product of:
              0.22551624 = queryWeight, product of:
                3.7241461 = boost
                4.0438666 = idf(docFreq=2106, maxDocs=44218)
                0.014974568 = queryNorm
              0.25274166 = fieldWeight in 1307, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.0438666 = idf(docFreq=2106, maxDocs=44218)
                0.0625 = fieldNorm(doc=1307)
          0.4001422 = weight(abstract_txt:stylistic in 1307) [ClassicSimilarity], result of:
            0.4001422 = score(doc=1307,freq=1.0), product of:
              0.7223049 = queryWeight, product of:
                5.4419236 = boost
                8.863674 = idf(docFreq=16, maxDocs=44218)
                0.014974568 = queryNorm
              0.55397964 = fieldWeight in 1307, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                8.863674 = idf(docFreq=16, maxDocs=44218)
                0.0625 = fieldNorm(doc=1307)
        0.16 = coord(4/25)
    
  5. Nistico, R.: Studio e indicizzazione delle dediche librarie (1998) 0.09
    0.09085021 = sum of:
      0.09085021 = product of:
        0.7570851 = sum of:
          0.09641442 = weight(abstract_txt:literary in 2823) [ClassicSimilarity], result of:
            0.09641442 = score(doc=2823,freq=2.0), product of:
              0.10672057 = queryWeight, product of:
                1.0458906 = boost
                6.8140855 = idf(docFreq=131, maxDocs=44218)
                0.014974568 = queryNorm
              0.9034286 = fieldWeight in 2823, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                6.8140855 = idf(docFreq=131, maxDocs=44218)
                0.09375 = fieldNorm(doc=2823)
          0.060457412 = weight(abstract_txt:features in 2823) [ClassicSimilarity], result of:
            0.060457412 = score(doc=2823,freq=1.0), product of:
              0.1420704 = queryWeight, product of:
                2.0901363 = boost
                4.5391517 = idf(docFreq=1283, maxDocs=44218)
                0.014974568 = queryNorm
              0.42554545 = fieldWeight in 2823, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.5391517 = idf(docFreq=1283, maxDocs=44218)
                0.09375 = fieldNorm(doc=2823)
          0.6002133 = weight(abstract_txt:stylistic in 2823) [ClassicSimilarity], result of:
            0.6002133 = score(doc=2823,freq=1.0), product of:
              0.7223049 = queryWeight, product of:
                5.4419236 = boost
                8.863674 = idf(docFreq=16, maxDocs=44218)
                0.014974568 = queryNorm
              0.83096945 = fieldWeight in 2823, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                8.863674 = idf(docFreq=16, maxDocs=44218)
                0.09375 = fieldNorm(doc=2823)
        0.12 = coord(3/25)
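
The score trees above can be recomputed by hand. The sketch below is a minimal illustration, assuming the standard ClassicSimilarity (TF-IDF) definitions tf = sqrt(freq) and idf = 1 + ln(maxDocs / (docFreq + 1)), which agree with the numbers in the listing. The function names (idf, term_score, document_score) are hypothetical helpers chosen for this example and are not part of Lucene. All input values are copied from the tree for entry 1 (doc 5690); because the listing reports single-precision values, the recomputed score may differ in the last digits.

```python
import math

# Constants copied from the explain output above.
MAX_DOCS = 44218
QUERY_NORM = 0.014974568


def idf(doc_freq, max_docs=MAX_DOCS):
    """Inverse document frequency as reported in the idf(...) lines."""
    return 1.0 + math.log(max_docs / (doc_freq + 1))


def term_score(freq, doc_freq, field_norm, boost=1.0):
    """Score of one matching term: queryWeight * fieldWeight."""
    tf = math.sqrt(freq)                       # tf(freq=...)
    term_idf = idf(doc_freq)
    query_weight = boost * term_idf * QUERY_NORM
    field_weight = tf * term_idf * field_norm
    return query_weight * field_weight


def document_score(terms, field_norm, matched, total):
    """Sum of term scores, scaled by coord(matched/total)."""
    raw = sum(term_score(freq, doc_freq, field_norm, boost)
              for _term, freq, doc_freq, boost in terms)
    return raw * (matched / total)


# (term, freq, docFreq, boost) for entry 1, doc 5690, fieldNorm 0.109375.
TERMS_5690 = [
    ("literary", 1.0, 131, 1.0458906),
    ("lexical", 1.0, 175, 2.0034688),
    ("features", 1.0, 1283, 2.0901363),
    ("text", 2.0, 2106, 3.7241461),
    ("stylistic", 1.0, 16, 5.4419236),
]

score = document_score(TERMS_5690, field_norm=0.109375, matched=5, total=25)
print(round(score, 4))  # should be close to the 0.2262296 shown for entry 1
```

The same helpers apply to the other entries by substituting their freq, docFreq, boost, fieldNorm and coord values. A term listed without a boost line (for example "character" in entry 2) corresponds to the default boost of 1.0.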