Document (#37331)

Author
Altmann, E.G.
Cristadoro, G.
Esposti, M.D.
Title
On the origin of long-range correlations in texts
Source
Proceedings of the National Academy of Sciences, 2 July 2012. DOI: 10.1073/pnas.1117723109
Year
2012
Abstract
The complexity of human interactions with social and natural phenomena is mirrored in the way we describe our experiences through natural language. In order to retain and convey such high-dimensional information, the statistical properties of our linguistic output have to be highly correlated in time. One example is the robust observation, still largely not understood, of correlations on arbitrarily long scales in literary texts. In this paper we explain how long-range correlations flow from highly structured linguistic levels down to the building blocks of a text (words, letters, etc.). By combining calculations and data analysis we show that correlations take the form of a bursty sequence of events once we approach the semantically relevant topics of the text. The mechanisms we identify are fairly general and can be equally applied to other hierarchical settings.
Content
Cf. the press release on the article: "Auf der Suche nach dem entscheidenden Wort: die Häufung bestimmter Wörter innerhalb eines Textes macht diese zu Schlüsselwörtern" (In search of the decisive word: the clustering of certain words within a text makes them keywords) [11 July 2012]. Available at: http://www.mpg.de/5894319/statistische_Textanalyse?filter_order=L. See also: http://arxiv.org/list/cs.CL/current.
Footnote
Full text at: Altmann_etal_Correlations_texts.pdf.
Theme
Computational linguistics

Similar documents (author)

  1. Altmann, E.: Assessment of reference services (1982) 5.87
    5.871439 = sum of:
      5.871439 = weight(author_txt:altmann in 4608) [ClassicSimilarity], result of:
        5.871439 = fieldWeight in 4608, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          9.394302 = idf(docFreq=9, maxDocs=44218)
          0.625 = fieldNorm(doc=4608)
    
  2. Altmann, O.: Internet in Öffentlichen Bibliotheken : Nutzungsmöglichkeiten und Probleme (1996) 5.87
    5.871439 = sum of:
      5.871439 = weight(author_txt:altmann in 5950) [ClassicSimilarity], result of:
        5.871439 = fieldWeight in 5950, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          9.394302 = idf(docFreq=9, maxDocs=44218)
          0.625 = fieldNorm(doc=5950)
    
  3. Altmann, O.: Internet in Öffentlichen Bibliotheken : Nutzungsmöglichkeiten und Probleme (1997) 5.87
    5.871439 = sum of:
      5.871439 = weight(author_txt:altmann in 7764) [ClassicSimilarity], result of:
        5.871439 = fieldWeight in 7764, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          9.394302 = idf(docFreq=9, maxDocs=44218)
          0.625 = fieldNorm(doc=7764)
    
  4. Altmann, R.: Digitale Dia-Archive : Bilddatenbanken bringen Ordnung in die digitale Foto- und Grafiksammlung (1998) 5.87
    5.871439 = sum of:
      5.871439 = weight(author_txt:altmann in 7791) [ClassicSimilarity], result of:
        5.871439 = fieldWeight in 7791, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          9.394302 = idf(docFreq=9, maxDocs=44218)
          0.625 = fieldNorm(doc=7791)
    
  5. Altmann, M.: Metadaten (2005) 5.87
    5.871439 = sum of:
      5.871439 = weight(author_txt:altmann in 4561) [ClassicSimilarity], result of:
        5.871439 = fieldWeight in 4561, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          9.394302 = idf(docFreq=9, maxDocs=44218)
          0.625 = fieldNorm(doc=4561)
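
The author-match traces above all reduce to the same product, fieldWeight = tf × idf × fieldNorm. The sketch below recomputes it as a sanity check. It is not the catalog's code: the function names are hypothetical, fieldNorm is taken as given from the trace, and the idf formula is the variant that reproduces the numbers shown here (other Lucene versions may use a slightly different numerator).

```python
import math


def classic_tf(freq: float) -> float:
    """Term-frequency factor: square root of the raw count (tf(freq=1.0) = 1.0)."""
    return math.sqrt(freq)


def classic_idf(doc_freq: int, max_docs: int) -> float:
    """Inverse document frequency, in the form that matches the trace values."""
    return 1.0 + math.log(max_docs / (doc_freq + 1))


def field_weight(freq: float, doc_freq: int, max_docs: int, field_norm: float) -> float:
    """fieldWeight = tf * idf * fieldNorm, as in the 'product of' lines."""
    return classic_tf(freq) * classic_idf(doc_freq, max_docs) * field_norm


# Values copied from the trace: one hit for "altmann" in author_txt,
# docFreq=9, maxDocs=44218, fieldNorm=0.625.
score = field_weight(freq=1.0, doc_freq=9, max_docs=44218, field_norm=0.625)
print(score)  # should be close to the 5.871439 shown in the trace
```

Because the query, docFreq, and fieldNorm are identical for all five author hits, they necessarily tie at the same score.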
    

Similar documents (content)

  1. Kokol, P.; Podgorelec, V.; Zorman, M.; Kokol, T.; Njivar, T.: Computer and natural language texts : a comparison based on long-range correlations (1999) 0.18
    0.1801105 = sum of:
      0.1801105 = product of:
        0.9005525 = sum of:
          0.09489199 = weight(abstract_txt:range in 4299) [ClassicSimilarity], result of:
            0.09489199 = score(doc=4299,freq=2.0), product of:
              0.1413837 = queryWeight, product of:
                1.487468 = boost
                5.062254 = idf(docFreq=760, maxDocs=44218)
                0.018776203 = queryNorm
              0.6711664 = fieldWeight in 4299, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                5.062254 = idf(docFreq=760, maxDocs=44218)
                0.09375 = fieldNorm(doc=4299)
          0.095864244 = weight(abstract_txt:natural in 4299) [ClassicSimilarity], result of:
            0.095864244 = score(doc=4299,freq=2.0), product of:
              0.1423478 = queryWeight, product of:
                1.492531 = boost
                5.0794845 = idf(docFreq=747, maxDocs=44218)
                0.018776203 = queryNorm
              0.6734508 = fieldWeight in 4299, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                5.0794845 = idf(docFreq=747, maxDocs=44218)
                0.09375 = fieldNorm(doc=4299)
          0.1322281 = weight(abstract_txt:texts in 4299) [ClassicSimilarity], result of:
            0.1322281 = score(doc=4299,freq=2.0), product of:
              0.17638521 = queryWeight, product of:
                1.6614186 = boost
                5.6542544 = idf(docFreq=420, maxDocs=44218)
                0.018776203 = queryNorm
              0.7496553 = fieldWeight in 4299, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                5.6542544 = idf(docFreq=420, maxDocs=44218)
                0.09375 = fieldNorm(doc=4299)
          0.1595997 = weight(abstract_txt:long in 4299) [ClassicSimilarity], result of:
            0.1595997 = score(doc=4299,freq=2.0), product of:
              0.22889245 = queryWeight, product of:
                2.3179781 = boost
                5.2591357 = idf(docFreq=624, maxDocs=44218)
                0.018776203 = queryNorm
              0.69726944 = fieldWeight in 4299, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                5.2591357 = idf(docFreq=624, maxDocs=44218)
                0.09375 = fieldNorm(doc=4299)
          0.41796848 = weight(abstract_txt:correlations in 4299) [ClassicSimilarity], result of:
            0.41796848 = score(doc=4299,freq=1.0), product of:
              0.60306203 = queryWeight, product of:
                4.3445415 = boost
                7.3928223 = idf(docFreq=73, maxDocs=44218)
                0.018776203 = queryNorm
              0.6930771 = fieldWeight in 4299, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.3928223 = idf(docFreq=73, maxDocs=44218)
                0.09375 = fieldNorm(doc=4299)
        0.2 = coord(5/25)
    
  2. Schwarz, C.: THESYS: Thesaurus Syntax System : a fully automatic thesaurus building aid (1988) 0.09
    0.08617506 = sum of:
      0.08617506 = product of:
        0.7181255 = sum of:
          0.059242558 = weight(abstract_txt:text in 1361) [ClassicSimilarity], result of:
            0.059242558 = score(doc=1361,freq=3.0), product of:
              0.09022047 = queryWeight, product of:
                1.18823 = boost
                4.0438666 = idf(docFreq=2106, maxDocs=44218)
                0.018776203 = queryNorm
              0.6566421 = fieldWeight in 1361, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                4.0438666 = idf(docFreq=2106, maxDocs=44218)
                0.09375 = fieldNorm(doc=1361)
          0.067786254 = weight(abstract_txt:natural in 1361) [ClassicSimilarity], result of:
            0.067786254 = score(doc=1361,freq=1.0), product of:
              0.1423478 = queryWeight, product of:
                1.492531 = boost
                5.0794845 = idf(docFreq=747, maxDocs=44218)
                0.018776203 = queryNorm
              0.47620165 = fieldWeight in 1361, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.0794845 = idf(docFreq=747, maxDocs=44218)
                0.09375 = fieldNorm(doc=1361)
          0.5910967 = weight(abstract_txt:correlations in 1361) [ClassicSimilarity], result of:
            0.5910967 = score(doc=1361,freq=2.0), product of:
              0.60306203 = queryWeight, product of:
                4.3445415 = boost
                7.3928223 = idf(docFreq=73, maxDocs=44218)
                0.018776203 = queryNorm
              0.98015904 = fieldWeight in 1361, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                7.3928223 = idf(docFreq=73, maxDocs=44218)
                0.09375 = fieldNorm(doc=1361)
        0.12 = coord(3/25)
    
  3. Stamatatos, E.: A survey of modern authorship attribution methods (2009) 0.08
    0.0827874 = sum of:
      0.0827874 = product of:
        0.34494752 = sum of:
          0.05454868 = weight(abstract_txt:literary in 2741) [ClassicSimilarity], result of:
            0.05454868 = score(doc=2741,freq=1.0), product of:
              0.12808453 = queryWeight, product of:
                1.0011089 = boost
                6.8140855 = idf(docFreq=131, maxDocs=44218)
                0.018776203 = queryNorm
              0.42588034 = fieldWeight in 2741, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.8140855 = idf(docFreq=131, maxDocs=44218)
                0.0625 = fieldNorm(doc=2741)
          0.03949504 = weight(abstract_txt:text in 2741) [ClassicSimilarity], result of:
            0.03949504 = score(doc=2741,freq=3.0), product of:
              0.09022047 = queryWeight, product of:
                1.18823 = boost
                4.0438666 = idf(docFreq=2106, maxDocs=44218)
                0.018776203 = queryNorm
              0.4377614 = fieldWeight in 2741, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                4.0438666 = idf(docFreq=2106, maxDocs=44218)
                0.0625 = fieldNorm(doc=2741)
          0.04519084 = weight(abstract_txt:natural in 2741) [ClassicSimilarity], result of:
            0.04519084 = score(doc=2741,freq=1.0), product of:
              0.1423478 = queryWeight, product of:
                1.492531 = boost
                5.0794845 = idf(docFreq=747, maxDocs=44218)
                0.018776203 = queryNorm
              0.31746778 = fieldWeight in 2741, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.0794845 = idf(docFreq=747, maxDocs=44218)
                0.0625 = fieldNorm(doc=2741)
          0.06233293 = weight(abstract_txt:texts in 2741) [ClassicSimilarity], result of:
            0.06233293 = score(doc=2741,freq=1.0), product of:
              0.17638521 = queryWeight, product of:
                1.6614186 = boost
                5.6542544 = idf(docFreq=420, maxDocs=44218)
                0.018776203 = queryNorm
              0.3533909 = fieldWeight in 2741, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.6542544 = idf(docFreq=420, maxDocs=44218)
                0.0625 = fieldNorm(doc=2741)
          0.068144016 = weight(abstract_txt:linguistic in 2741) [ClassicSimilarity], result of:
            0.068144016 = score(doc=2741,freq=1.0), product of:
              0.1871841 = queryWeight, product of:
                1.7115219 = boost
                5.8247695 = idf(docFreq=354, maxDocs=44218)
                0.018776203 = queryNorm
              0.3640481 = fieldWeight in 2741, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.8247695 = idf(docFreq=354, maxDocs=44218)
                0.0625 = fieldNorm(doc=2741)
          0.07523603 = weight(abstract_txt:long in 2741) [ClassicSimilarity], result of:
            0.07523603 = score(doc=2741,freq=1.0), product of:
              0.22889245 = queryWeight, product of:
                2.3179781 = boost
                5.2591357 = idf(docFreq=624, maxDocs=44218)
                0.018776203 = queryNorm
              0.32869598 = fieldWeight in 2741, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.2591357 = idf(docFreq=624, maxDocs=44218)
                0.0625 = fieldNorm(doc=2741)
        0.24 = coord(6/25)
    
  4. Kousha, K.; Thelwall, M.: Google Scholar citations and Google Web/URL citations : a multi-discipline exploratory analysis (2007) 0.08
    0.07938348 = sum of:
      0.07938348 = product of:
        0.49614674 = sum of:
          0.06427322 = weight(abstract_txt:correlated in 337) [ClassicSimilarity], result of:
            0.06427322 = score(doc=337,freq=1.0), product of:
              0.14288738 = queryWeight, product of:
                1.0573771 = boost
                7.1970778 = idf(docFreq=89, maxDocs=44218)
                0.018776203 = queryNorm
              0.44981736 = fieldWeight in 337, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.1970778 = idf(docFreq=89, maxDocs=44218)
                0.0625 = fieldNorm(doc=337)
          0.09726555 = weight(abstract_txt:calculations in 337) [ClassicSimilarity], result of:
            0.09726555 = score(doc=337,freq=1.0), product of:
              0.1883417 = queryWeight, product of:
                1.2139652 = boost
                8.2629 = idf(docFreq=30, maxDocs=44218)
                0.018776203 = queryNorm
              0.5164313 = fieldWeight in 337, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                8.2629 = idf(docFreq=30, maxDocs=44218)
                0.0625 = fieldNorm(doc=337)
          0.055962283 = weight(abstract_txt:highly in 337) [ClassicSimilarity], result of:
            0.055962283 = score(doc=337,freq=1.0), product of:
              0.16415247 = queryWeight, product of:
                1.6027719 = boost
                5.4546638 = idf(docFreq=513, maxDocs=44218)
                0.018776203 = queryNorm
              0.34091648 = fieldWeight in 337, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.4546638 = idf(docFreq=513, maxDocs=44218)
                0.0625 = fieldNorm(doc=337)
          0.27864566 = weight(abstract_txt:correlations in 337) [ClassicSimilarity], result of:
            0.27864566 = score(doc=337,freq=1.0), product of:
              0.60306203 = queryWeight, product of:
                4.3445415 = boost
                7.3928223 = idf(docFreq=73, maxDocs=44218)
                0.018776203 = queryNorm
              0.4620514 = fieldWeight in 337, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.3928223 = idf(docFreq=73, maxDocs=44218)
                0.0625 = fieldNorm(doc=337)
        0.16 = coord(4/25)
    
  5. Teich, E.; Degaetano-Ortlieb, S.; Fankhauser, P.; Kermes, H.; Lapshinova-Koltunski, E.: The linguistic construal of disciplinarity : a data-mining approach using register features (2016) 0.07
    0.07483782 = sum of:
      0.07483782 = product of:
        0.37418908 = sum of:
          0.049368802 = weight(abstract_txt:text in 3015) [ClassicSimilarity], result of:
            0.049368802 = score(doc=3015,freq=3.0), product of:
              0.09022047 = queryWeight, product of:
                1.18823 = boost
                4.0438666 = idf(docFreq=2106, maxDocs=44218)
                0.018776203 = queryNorm
              0.54720175 = fieldWeight in 3015, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                4.0438666 = idf(docFreq=2106, maxDocs=44218)
                0.078125 = fieldNorm(doc=3015)
          0.05648855 = weight(abstract_txt:natural in 3015) [ClassicSimilarity], result of:
            0.05648855 = score(doc=3015,freq=1.0), product of:
              0.1423478 = queryWeight, product of:
                1.492531 = boost
                5.0794845 = idf(docFreq=747, maxDocs=44218)
                0.018776203 = queryNorm
              0.39683473 = fieldWeight in 3015, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.0794845 = idf(docFreq=747, maxDocs=44218)
                0.078125 = fieldNorm(doc=3015)
          0.06995285 = weight(abstract_txt:highly in 3015) [ClassicSimilarity], result of:
            0.06995285 = score(doc=3015,freq=1.0), product of:
              0.16415247 = queryWeight, product of:
                1.6027719 = boost
                5.4546638 = idf(docFreq=513, maxDocs=44218)
                0.018776203 = queryNorm
              0.4261456 = fieldWeight in 3015, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.4546638 = idf(docFreq=513, maxDocs=44218)
                0.078125 = fieldNorm(doc=3015)
          0.07791616 = weight(abstract_txt:texts in 3015) [ClassicSimilarity], result of:
            0.07791616 = score(doc=3015,freq=1.0), product of:
              0.17638521 = queryWeight, product of:
                1.6614186 = boost
                5.6542544 = idf(docFreq=420, maxDocs=44218)
                0.018776203 = queryNorm
              0.44173864 = fieldWeight in 3015, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.6542544 = idf(docFreq=420, maxDocs=44218)
                0.078125 = fieldNorm(doc=3015)
          0.12046273 = weight(abstract_txt:linguistic in 3015) [ClassicSimilarity], result of:
            0.12046273 = score(doc=3015,freq=2.0), product of:
              0.1871841 = queryWeight, product of:
                1.7115219 = boost
                5.8247695 = idf(docFreq=354, maxDocs=44218)
                0.018776203 = queryNorm
              0.6435522 = fieldWeight in 3015, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                5.8247695 = idf(docFreq=354, maxDocs=44218)
                0.078125 = fieldNorm(doc=3015)
        0.2 = coord(5/25)
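
The content-match traces above combine three levels: each term contributes queryWeight × fieldWeight, the contributions are summed, and the sum is scaled by coord (matched terms divided by query terms). The following sketch recomputes the first entry (Kokol et al., doc 4299) from the boosts, document frequencies, and norms printed in its trace. The variable and function names are hypothetical, and the idf formula is assumed to be the variant that reproduces the idf values shown.

```python
import math

MAX_DOCS = 44218          # maxDocs from the trace
QUERY_NORM = 0.018776203  # queryNorm from the trace
FIELD_NORM = 0.09375      # fieldNorm(doc=4299)
QUERY_TERMS = 25          # denominator of coord(5/25)

# (term, boost, docFreq, termFreq), copied from the trace for doc 4299
MATCHES = [
    ("range", 1.487468, 760, 2.0),
    ("natural", 1.492531, 747, 2.0),
    ("texts", 1.6614186, 420, 2.0),
    ("long", 2.3179781, 624, 2.0),
    ("correlations", 4.3445415, 73, 1.0),
]


def idf(doc_freq: int, max_docs: int = MAX_DOCS) -> float:
    """Inverse document frequency in the form that matches the trace."""
    return 1.0 + math.log(max_docs / (doc_freq + 1))


def term_score(boost: float, doc_freq: int, freq: float) -> float:
    """Per-term score = queryWeight * fieldWeight."""
    query_weight = boost * idf(doc_freq) * QUERY_NORM
    field_weight = math.sqrt(freq) * idf(doc_freq) * FIELD_NORM
    return query_weight * field_weight


raw_sum = sum(term_score(boost, df, freq) for _, boost, df, freq in MATCHES)
coord = len(MATCHES) / QUERY_TERMS  # 5 of 25 query terms matched
total = raw_sum * coord
print(raw_sum, total)  # should land near 0.9005525 and 0.1801105
```

Note that idf enters both queryWeight and fieldWeight, so a rare term such as "correlations" (docFreq=73) dominates the sum even with a term frequency of 1.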