Document (#40079)

Author
Lin, Y.-R.
Margolin, D.
Lazer, D.
Title
Uncovering social semantics from textual traces : a theory-driven approach and evidence from public statements of U.S. Members of Congress
Source
Journal of the Association for Information Science and Technology. 67(2016) no.9, p.2072-2089
Year
2016
Abstract
The increasing abundance of digital textual archives provides an opportunity for understanding human social systems. Yet the literature has not adequately considered the disparate social processes by which texts are produced. Drawing on communication theory, we identify three common processes by which documents might be detectably similar in their textual features: authors sharing subject matter, sharing goals, and sharing sources. We hypothesize that these processes produce distinct, detectable relationships between authors in different kinds of textual overlap. We develop a novel n-gram extraction technique to capture such signatures based on n-grams of different lengths. We test the hypothesis on a corpus where the author attributes are observable: the public statements of the members of the U.S. Congress. This article presents the first empirical finding that shows different social relationships are detectable through the structure of overlapping textual features. Our study has important implications for designing text modeling techniques to make sense of social phenomena from aggregate digital traces.
Content
Cf.: http://onlinelibrary.wiley.com/doi/10.1002/asi.23540/full.
Location
USA

Similar documents (content)

  1. Gordon, M.D.; Dumais, S.: Using latent semantic indexing for literature based discovery (1998) 0.13
    0.12770887 = sum of:
      0.12770887 = product of:
        0.5321203 = sum of:
          0.14307863 = weight(abstract_txt:hypothesize in 4892) [ClassicSimilarity], result of:
            0.14307863 = score(doc=4892,freq=1.0), product of:
              0.17826211 = queryWeight, product of:
                1.078847 = boost
                8.561393 = idf(docFreq=22, maxDocs=44218)
                0.019299887 = queryNorm
              0.80263054 = fieldWeight in 4892, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                8.561393 = idf(docFreq=22, maxDocs=44218)
                0.09375 = fieldNorm(doc=4892)
          0.14531887 = weight(abstract_txt:uncovering in 4892) [ClassicSimilarity], result of:
            0.14531887 = score(doc=4892,freq=1.0), product of:
              0.18011804 = queryWeight, product of:
                1.0844486 = boost
                8.6058445 = idf(docFreq=21, maxDocs=44218)
                0.019299887 = queryNorm
              0.8067979 = fieldWeight in 4892, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                8.6058445 = idf(docFreq=21, maxDocs=44218)
                0.09375 = fieldNorm(doc=4892)
          0.045804724 = weight(abstract_txt:authors in 4892) [ClassicSimilarity], result of:
            0.045804724 = score(doc=4892,freq=1.0), product of:
              0.10510565 = queryWeight, product of:
                1.1715434 = boost
                4.648501 = idf(docFreq=1150, maxDocs=44218)
                0.019299887 = queryNorm
              0.43579698 = fieldWeight in 4892, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.648501 = idf(docFreq=1150, maxDocs=44218)
                0.09375 = fieldNorm(doc=4892)
          0.07164552 = weight(abstract_txt:relationships in 4892) [ClassicSimilarity], result of:
            0.07164552 = score(doc=4892,freq=2.0), product of:
              0.11240921 = queryWeight, product of:
                1.2115638 = boost
                4.807296 = idf(docFreq=981, maxDocs=44218)
                0.019299887 = queryNorm
              0.63736343 = fieldWeight in 4892, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.807296 = idf(docFreq=981, maxDocs=44218)
                0.09375 = fieldNorm(doc=4892)
          0.03368706 = weight(abstract_txt:different in 4892) [ClassicSimilarity], result of:
            0.03368706 = score(doc=4892,freq=1.0), product of:
              0.09802986 = queryWeight, product of:
                1.3857031 = boost
                3.6655018 = idf(docFreq=3075, maxDocs=44218)
                0.019299887 = queryNorm
              0.3436408 = fieldWeight in 4892, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.6655018 = idf(docFreq=3075, maxDocs=44218)
                0.09375 = fieldNorm(doc=4892)
          0.09258547 = weight(abstract_txt:processes in 4892) [ClassicSimilarity], result of:
            0.09258547 = score(doc=4892,freq=1.0), product of:
              0.1923438 = queryWeight, product of:
                1.9410204 = boost
                5.1344433 = idf(docFreq=707, maxDocs=44218)
                0.019299887 = queryNorm
              0.48135406 = fieldWeight in 4892, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.1344433 = idf(docFreq=707, maxDocs=44218)
                0.09375 = fieldNorm(doc=4892)
        0.24 = coord(6/25)
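
    Note on reading the listing above: the arithmetic can be re-derived by hand. The sketch below is a minimal Python reconstruction, not the search engine's own code. It assumes the classic TF-IDF form that the printed values appear consistent with: tf = sqrt(freq), idf = 1 + ln(maxDocs / (docFreq + 1)), queryWeight = boost * idf * queryNorm, fieldWeight = tf * idf * fieldNorm, and a final coord factor of matched terms over query terms. The function and variable names are illustrative; the numbers are copied from the listing for doc 4892.

```python
import math

def idf(doc_freq, max_docs):
    # Assumed classic TF-IDF form; reproduces e.g. 8.561393 for docFreq=22.
    return 1.0 + math.log(max_docs / (doc_freq + 1))

def term_score(freq, doc_freq, max_docs, boost, query_norm, field_norm):
    # score = queryWeight * fieldWeight, as in the listing.
    term_idf = idf(doc_freq, max_docs)
    query_weight = boost * term_idf * query_norm
    field_weight = math.sqrt(freq) * term_idf * field_norm
    return query_weight * field_weight

# Constants copied from the listing for doc 4892.
MAX_DOCS = 44218
QUERY_NORM = 0.019299887
FIELD_NORM = 0.09375
QUERY_TERM_COUNT = 25  # denominator of coord(6/25)

# (term, freq, docFreq, boost) for each matching term in doc 4892.
TERMS = [
    ("hypothesize", 1.0, 22, 1.078847),
    ("uncovering", 1.0, 21, 1.0844486),
    ("authors", 1.0, 1150, 1.1715434),
    ("relationships", 2.0, 981, 1.2115638),
    ("different", 1.0, 3075, 1.3857031),
    ("processes", 1.0, 707, 1.9410204),
]

term_scores = {
    term: term_score(freq, doc_freq, MAX_DOCS, boost, QUERY_NORM, FIELD_NORM)
    for term, freq, doc_freq, boost in TERMS
}
raw_sum = sum(term_scores.values())
coord = len(TERMS) / QUERY_TERM_COUNT
total = raw_sum * coord

for term, value in term_scores.items():
    print(f"{term:15s} {value:.6f}")
print(f"sum={raw_sum:.6f} coord={coord:.2f} total={total:.6f}")
```

    Under these assumptions the recomputed total should agree with the listed 0.12770887 up to floating-point rounding, since the listing shows single-precision values. The same function applies to the other entries by substituting their freq, docFreq, boost and fieldNorm values.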
    
  2. Milard, B.; Tanguy, L.: Citations in scientific texts : do social relations matter? (2018) 0.11
    0.11473098 = sum of:
      0.11473098 = product of:
        0.5736549 = sum of:
          0.01203484 = weight(abstract_txt:from in 4547) [ClassicSimilarity], result of:
            0.01203484 = score(doc=4547,freq=1.0), product of:
              0.055735346 = queryWeight, product of:
                1.0448557 = boost
                2.7638826 = idf(docFreq=7577, maxDocs=44218)
                0.019299887 = queryNorm
              0.21592833 = fieldWeight in 4547, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                2.7638826 = idf(docFreq=7577, maxDocs=44218)
                0.078125 = fieldNorm(doc=4547)
          0.050260793 = weight(abstract_txt:features in 4547) [ClassicSimilarity], result of:
            0.050260793 = score(doc=4547,freq=2.0), product of:
              0.1002189 = queryWeight, product of:
                1.1439846 = boost
                4.5391517 = idf(docFreq=1283, maxDocs=44218)
                0.019299887 = queryNorm
              0.50151014 = fieldWeight in 4547, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.5391517 = idf(docFreq=1283, maxDocs=44218)
                0.078125 = fieldNorm(doc=4547)
          0.06611342 = weight(abstract_txt:authors in 4547) [ClassicSimilarity], result of:
            0.06611342 = score(doc=4547,freq=3.0), product of:
              0.10510565 = queryWeight, product of:
                1.1715434 = boost
                4.648501 = idf(docFreq=1150, maxDocs=44218)
                0.019299887 = queryNorm
              0.62901866 = fieldWeight in 4547, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                4.648501 = idf(docFreq=1150, maxDocs=44218)
                0.078125 = fieldNorm(doc=4547)
          0.15937036 = weight(abstract_txt:social in 4547) [ClassicSimilarity], result of:
            0.15937036 = score(doc=4547,freq=5.0), product of:
              0.21630593 = queryWeight, product of:
                2.6573548 = boost
                4.2175875 = idf(docFreq=1770, maxDocs=44218)
                0.019299887 = queryNorm
              0.73678225 = fieldWeight in 4547, product of:
                2.236068 = tf(freq=5.0), with freq of:
                  5.0 = termFreq=5.0
                4.2175875 = idf(docFreq=1770, maxDocs=44218)
                0.078125 = fieldNorm(doc=4547)
          0.28587544 = weight(abstract_txt:textual in 4547) [ClassicSimilarity], result of:
            0.28587544 = score(doc=4547,freq=2.0), product of:
              0.43340573 = queryWeight, product of:
                3.761514 = boost
                5.9700394 = idf(docFreq=306, maxDocs=44218)
                0.019299887 = queryNorm
              0.65960234 = fieldWeight in 4547, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                5.9700394 = idf(docFreq=306, maxDocs=44218)
                0.078125 = fieldNorm(doc=4547)
        0.2 = coord(5/25)
    
  3. Tang, R.; Safer, M.A.: Author-rated importance of cited references in biology and psychology publications (2008) 0.10
    0.10315573 = sum of:
      0.10315573 = product of:
        0.42981556 = sum of:
          0.008424388 = weight(abstract_txt:from in 1738) [ClassicSimilarity], result of:
            0.008424388 = score(doc=1738,freq=1.0), product of:
              0.055735346 = queryWeight, product of:
                1.0448557 = boost
                2.7638826 = idf(docFreq=7577, maxDocs=44218)
                0.019299887 = queryNorm
              0.15114984 = fieldWeight in 1738, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                2.7638826 = idf(docFreq=7577, maxDocs=44218)
                0.0546875 = fieldNorm(doc=1738)
          0.08222402 = weight(abstract_txt:lengths in 1738) [ClassicSimilarity], result of:
            0.08222402 = score(doc=1738,freq=1.0), product of:
              0.17649423 = queryWeight, product of:
                1.0734841 = boost
                8.518833 = idf(docFreq=23, maxDocs=44218)
                0.019299887 = queryNorm
              0.4658737 = fieldWeight in 1738, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                8.518833 = idf(docFreq=23, maxDocs=44218)
                0.0546875 = fieldNorm(doc=1738)
          0.049755648 = weight(abstract_txt:features in 1738) [ClassicSimilarity], result of:
            0.049755648 = score(doc=1738,freq=4.0), product of:
              0.1002189 = queryWeight, product of:
                1.1439846 = boost
                4.5391517 = idf(docFreq=1283, maxDocs=44218)
                0.019299887 = queryNorm
              0.4964697 = fieldWeight in 1738, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                4.5391517 = idf(docFreq=1283, maxDocs=44218)
                0.0546875 = fieldNorm(doc=1738)
          0.059746448 = weight(abstract_txt:authors in 1738) [ClassicSimilarity], result of:
            0.059746448 = score(doc=1738,freq=5.0), product of:
              0.10510565 = queryWeight, product of:
                1.1715434 = boost
                4.648501 = idf(docFreq=1150, maxDocs=44218)
                0.019299887 = queryNorm
              0.5684418 = fieldWeight in 1738, product of:
                2.236068 = tf(freq=5.0), with freq of:
                  5.0 = termFreq=5.0
                4.648501 = idf(docFreq=1150, maxDocs=44218)
                0.0546875 = fieldNorm(doc=1738)
          0.029552268 = weight(abstract_txt:relationships in 1738) [ClassicSimilarity], result of:
            0.029552268 = score(doc=1738,freq=1.0), product of:
              0.11240921 = queryWeight, product of:
                1.2115638 = boost
                4.807296 = idf(docFreq=981, maxDocs=44218)
                0.019299887 = queryNorm
              0.26289898 = fieldWeight in 1738, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.807296 = idf(docFreq=981, maxDocs=44218)
                0.0546875 = fieldNorm(doc=1738)
          0.2001128 = weight(abstract_txt:textual in 1738) [ClassicSimilarity], result of:
            0.2001128 = score(doc=1738,freq=2.0), product of:
              0.43340573 = queryWeight, product of:
                3.761514 = boost
                5.9700394 = idf(docFreq=306, maxDocs=44218)
                0.019299887 = queryNorm
              0.46172166 = fieldWeight in 1738, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                5.9700394 = idf(docFreq=306, maxDocs=44218)
                0.0546875 = fieldNorm(doc=1738)
        0.24 = coord(6/25)
    
  4. Baumer, E.P.S.; Mimno, D.; Guha, S.; Quan, E.; Gay, G.K.: Comparing grounded theory and topic modeling : extreme divergence or unlikely convergence? (2017) 0.10
    0.102548435 = sum of:
      0.102548435 = product of:
        0.51274216 = sum of:
          0.020844955 = weight(abstract_txt:from in 3639) [ClassicSimilarity], result of:
            0.020844955 = score(doc=3639,freq=3.0), product of:
              0.055735346 = queryWeight, product of:
                1.0448557 = boost
                2.7638826 = idf(docFreq=7577, maxDocs=44218)
                0.019299887 = queryNorm
              0.37399885 = fieldWeight in 3639, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                2.7638826 = idf(docFreq=7577, maxDocs=44218)
                0.078125 = fieldNorm(doc=3639)
          0.028072549 = weight(abstract_txt:different in 3639) [ClassicSimilarity], result of:
            0.028072549 = score(doc=3639,freq=1.0), product of:
              0.09802986 = queryWeight, product of:
                1.3857031 = boost
                3.6655018 = idf(docFreq=3075, maxDocs=44218)
                0.019299887 = queryNorm
              0.28636733 = fieldWeight in 3639, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.6655018 = idf(docFreq=3075, maxDocs=44218)
                0.078125 = fieldNorm(doc=3639)
          0.077154554 = weight(abstract_txt:processes in 3639) [ClassicSimilarity], result of:
            0.077154554 = score(doc=3639,freq=1.0), product of:
              0.1923438 = queryWeight, product of:
                1.9410204 = boost
                5.1344433 = idf(docFreq=707, maxDocs=44218)
                0.019299887 = queryNorm
              0.40112838 = fieldWeight in 3639, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.1344433 = idf(docFreq=707, maxDocs=44218)
                0.078125 = fieldNorm(doc=3639)
          0.10079466 = weight(abstract_txt:social in 3639) [ClassicSimilarity], result of:
            0.10079466 = score(doc=3639,freq=2.0), product of:
              0.21630593 = queryWeight, product of:
                2.6573548 = boost
                4.2175875 = idf(docFreq=1770, maxDocs=44218)
                0.019299887 = queryNorm
              0.46598196 = fieldWeight in 3639, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.2175875 = idf(docFreq=1770, maxDocs=44218)
                0.078125 = fieldNorm(doc=3639)
          0.28587544 = weight(abstract_txt:textual in 3639) [ClassicSimilarity], result of:
            0.28587544 = score(doc=3639,freq=2.0), product of:
              0.43340573 = queryWeight, product of:
                3.761514 = boost
                5.9700394 = idf(docFreq=306, maxDocs=44218)
                0.019299887 = queryNorm
              0.65960234 = fieldWeight in 3639, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                5.9700394 = idf(docFreq=306, maxDocs=44218)
                0.078125 = fieldNorm(doc=3639)
        0.2 = coord(5/25)
    
  5. Robertson, A.M.; Willett, P.: Applications of n-grams in textual information systems (1998) 0.10
    0.10243724 = sum of:
      0.10243724 = product of:
        0.6402328 = sum of:
          0.13293578 = weight(abstract_txt:gram in 4715) [ClassicSimilarity], result of:
            0.13293578 = score(doc=4715,freq=1.0), product of:
              0.15315786 = queryWeight, product of:
                7.935687 = idf(docFreq=42, maxDocs=44218)
                0.019299887 = queryNorm
              0.86796576 = fieldWeight in 4715, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.935687 = idf(docFreq=42, maxDocs=44218)
                0.109375 = fieldNorm(doc=4715)
          0.20744601 = weight(abstract_txt:grams in 4715) [ClassicSimilarity], result of:
            0.20744601 = score(doc=4715,freq=2.0), product of:
              0.16354531 = queryWeight, product of:
                1.0333546 = boost
                8.200379 = idf(docFreq=32, maxDocs=44218)
                0.019299887 = queryNorm
              1.2684314 = fieldWeight in 4715, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                8.200379 = idf(docFreq=32, maxDocs=44218)
                0.109375 = fieldNorm(doc=4715)
          0.016848776 = weight(abstract_txt:from in 4715) [ClassicSimilarity], result of:
            0.016848776 = score(doc=4715,freq=1.0), product of:
              0.055735346 = queryWeight, product of:
                1.0448557 = boost
                2.7638826 = idf(docFreq=7577, maxDocs=44218)
                0.019299887 = queryNorm
              0.30229968 = fieldWeight in 4715, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                2.7638826 = idf(docFreq=7577, maxDocs=44218)
                0.109375 = fieldNorm(doc=4715)
          0.28300226 = weight(abstract_txt:textual in 4715) [ClassicSimilarity], result of:
            0.28300226 = score(doc=4715,freq=1.0), product of:
              0.43340573 = queryWeight, product of:
                3.761514 = boost
                5.9700394 = idf(docFreq=306, maxDocs=44218)
                0.019299887 = queryNorm
              0.65297306 = fieldWeight in 4715, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.9700394 = idf(docFreq=306, maxDocs=44218)
                0.109375 = fieldNorm(doc=4715)
        0.16 = coord(4/25)