Document (#34100)

Author
Kucukyilmaz, T.
Cambazoglu, B.B.
Aykanat, C.
Can, F.
Title
Chat mining : Predicting user and message attributes in computer-mediated communication
Source
Information processing and management. 44(2008) no.4, pp. 1448-1466
Year
2008
Abstract
The focus of this paper is to investigate the possibility of predicting several user and message attributes in text-based, real-time, online messaging services. For this purpose, a large collection of chat messages is examined. The applicability of various supervised classification techniques for extracting information from the chat messages is evaluated. Two competing models are used for defining the chat mining problem. A term-based approach is used to investigate the user and message attributes in the context of vocabulary use while a style-based approach is used to examine the chat messages according to the variations in the authors' writing styles. Among 100 authors, the identity of an author is correctly predicted with 99.7% accuracy. Moreover, the reverse problem is exploited, and the effect of author attributes on computer-mediated communications is discussed.

Similar documents (author)

  1. Cambazoglu, B. Barla => Barla Cambazoglu, B.: 5.17
    5.171237 = sum of:
      5.171237 = weight(author_txt:cambazoglu in 2506) [ClassicSimilarity], result of:
        5.171237 = fieldWeight in 2506, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          9.7509775 = idf(docFreq=6, maxDocs=44218)
          0.375 = fieldNorm(doc=2506)
    
  2. Arapakis, I.; Cambazoglu, B.B.; Lalmas, M.: On the feasibility of predicting popular news at cold start (2017) 3.66
    3.6566167 = sum of:
      3.6566167 = weight(author_txt:cambazoglu in 3595) [ClassicSimilarity], result of:
        3.6566167 = fieldWeight in 3595, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          9.7509775 = idf(docFreq=6, maxDocs=44218)
          0.375 = fieldNorm(doc=3595)
    
  3. Arapakis, I.; Lalmas, M.; Cambazoglu, B.B.; Marcos, M.-C.; Jose, J.M.: User engagement in online news : under the scope of sentiment, interest, affect, and gaze (2014) 3.05
    3.0471804 = sum of:
      3.0471804 = weight(author_txt:cambazoglu in 1497) [ClassicSimilarity], result of:
        3.0471804 = fieldWeight in 1497, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          9.7509775 = idf(docFreq=6, maxDocs=44218)
          0.3125 = fieldNorm(doc=1497)
    
  4. Kucukyilmaz, T.; Cambazoglu, B.B.; Aykanat, C.; Baeza-Yates, R.: A machine learning approach for result caching in web search engines (2017) 3.05
    3.0471804 = sum of:
      3.0471804 = weight(author_txt:cambazoglu in 5100) [ClassicSimilarity], result of:
        3.0471804 = fieldWeight in 5100, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          9.7509775 = idf(docFreq=6, maxDocs=44218)
          0.3125 = fieldNorm(doc=5100)
    
  5. Sarigil, E.; Sengor Altingovde, I.; Blanco, R.; Barla Cambazoglu, B.; Ozcan, R.; Ulusoy, Ö.: Characterizing, predicting, and handling web search queries that match very few or no results (2018) 2.44
    2.4377444 = sum of:
      2.4377444 = weight(author_txt:cambazoglu in 4039) [ClassicSimilarity], result of:
        2.4377444 = fieldWeight in 4039, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          9.7509775 = idf(docFreq=6, maxDocs=44218)
          0.25 = fieldNorm(doc=4039)
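
The author scores above can be reproduced by hand. The sketch below assumes the standard ClassicSimilarity formulas, tf = sqrt(freq) and idf = 1 + ln(maxDocs / (docFreq + 1)), with fieldWeight as their product times fieldNorm. The function names are hypothetical helpers for illustration, not part of Lucene's API, and Python's double precision differs slightly from Lucene's 32-bit floats in the last digits.

```python
import math

def classic_tf(freq: float) -> float:
    """Term-frequency factor: square root of the raw term frequency."""
    return math.sqrt(freq)

def classic_idf(doc_freq: int, max_docs: int) -> float:
    """Inverse document frequency: 1 + ln(maxDocs / (docFreq + 1))."""
    return 1.0 + math.log(max_docs / (doc_freq + 1))

def field_weight(freq: float, doc_freq: int, max_docs: int, field_norm: float) -> float:
    """fieldWeight = tf * idf * fieldNorm, as in the explain trees above."""
    return classic_tf(freq) * classic_idf(doc_freq, max_docs) * field_norm

# Entry 1: termFreq=2.0, docFreq=6, maxDocs=44218, fieldNorm=0.375
score_1 = field_weight(2.0, 6, 44218, 0.375)   # approx. 5.17

# Entry 5: termFreq=1.0, same idf, fieldNorm=0.25
score_5 = field_weight(1.0, 6, 44218, 0.25)    # approx. 2.44

print(round(score_1, 4), round(score_5, 4))
```

Because every entry matches the same single term (author_txt:cambazoglu), the idf is constant and the ranking is driven only by termFreq and fieldNorm.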
    

Similar documents (content)

  1. Zheng, R.; Li, J.; Chen, H.; Huang, Z.: A framework for authorship identification of online messages : writing-style features and classification techniques (2006) 0.18
    0.17991497 = sum of:
      0.17991497 = product of:
        0.64255345 = sum of:
          0.019110158 = weight(abstract_txt:approach in 5276) [ClassicSimilarity], result of:
            0.019110158 = score(doc=5276,freq=2.0), product of:
              0.05772706 = queryWeight, product of:
                1.080168 = boost
                3.745328 = idf(docFreq=2839, maxDocs=44218)
                0.014269156 = queryNorm
              0.33104333 = fieldWeight in 5276, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                3.745328 = idf(docFreq=2839, maxDocs=44218)
                0.0625 = fieldNorm(doc=5276)
          0.032281924 = weight(abstract_txt:problem in 5276) [ClassicSimilarity], result of:
            0.032281924 = score(doc=5276,freq=2.0), product of:
              0.08187969 = queryWeight, product of:
                1.2864405 = boost
                4.460548 = idf(docFreq=1388, maxDocs=44218)
                0.014269156 = queryNorm
              0.39426047 = fieldWeight in 5276, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.460548 = idf(docFreq=1388, maxDocs=44218)
                0.0625 = fieldNorm(doc=5276)
          0.036537055 = weight(abstract_txt:authors in 5276) [ClassicSimilarity], result of:
            0.036537055 = score(doc=5276,freq=2.0), product of:
              0.088925354 = queryWeight, product of:
                1.3406469 = boost
                4.648501 = idf(docFreq=1150, maxDocs=44218)
                0.014269156 = queryNorm
              0.4108733 = fieldWeight in 5276, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.648501 = idf(docFreq=1150, maxDocs=44218)
                0.0625 = fieldNorm(doc=5276)
          0.01249961 = weight(abstract_txt:based in 5276) [ClassicSimilarity], result of:
            0.01249961 = score(doc=5276,freq=1.0), product of:
              0.06273472 = queryWeight, product of:
                1.3791174 = boost
                3.1879277 = idf(docFreq=4958, maxDocs=44218)
                0.014269156 = queryNorm
              0.19924548 = fieldWeight in 5276, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.1879277 = idf(docFreq=4958, maxDocs=44218)
                0.0625 = fieldNorm(doc=5276)
          0.014625716 = weight(abstract_txt:used in 5276) [ClassicSimilarity], result of:
            0.014625716 = score(doc=5276,freq=1.0), product of:
              0.0696608 = queryWeight, product of:
                1.4532537 = boost
                3.3592992 = idf(docFreq=4177, maxDocs=44218)
                0.014269156 = queryNorm
              0.2099562 = fieldWeight in 5276, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.3592992 = idf(docFreq=4177, maxDocs=44218)
                0.0625 = fieldNorm(doc=5276)
          0.3117216 = weight(abstract_txt:messages in 5276) [ClassicSimilarity], result of:
            0.3117216 = score(doc=5276,freq=6.0), product of:
              0.29469392 = queryWeight, product of:
                2.9890475 = boost
                6.9093957 = idf(docFreq=119, maxDocs=44218)
                0.014269156 = queryNorm
              1.0577809 = fieldWeight in 5276, product of:
                2.4494898 = tf(freq=6.0), with freq of:
                  6.0 = termFreq=6.0
                6.9093957 = idf(docFreq=119, maxDocs=44218)
                0.0625 = fieldNorm(doc=5276)
          0.21577737 = weight(abstract_txt:message in 5276) [ClassicSimilarity], result of:
            0.21577737 = score(doc=5276,freq=2.0), product of:
              0.33258623 = queryWeight, product of:
                3.1754067 = boost
                7.3401785 = idf(docFreq=77, maxDocs=44218)
                0.014269156 = queryNorm
              0.64878625 = fieldWeight in 5276, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                7.3401785 = idf(docFreq=77, maxDocs=44218)
                0.0625 = fieldNorm(doc=5276)
        0.28 = coord(7/25)
    
  2. Miah, M.W.R.; Yearwood, J.; Kulkarni, S.: Constructing an inter-post similarity measure to differentiate the psychological stages in offensive chats (2015) 0.14
    0.14187832 = sum of:
      0.14187832 = product of:
        0.7093916 = sum of:
          0.013512923 = weight(abstract_txt:approach in 1846) [ClassicSimilarity], result of:
            0.013512923 = score(doc=1846,freq=1.0), product of:
              0.05772706 = queryWeight, product of:
                1.080168 = boost
                3.745328 = idf(docFreq=2839, maxDocs=44218)
                0.014269156 = queryNorm
              0.234083 = fieldWeight in 1846, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.745328 = idf(docFreq=2839, maxDocs=44218)
                0.0625 = fieldNorm(doc=1846)
          0.01249961 = weight(abstract_txt:based in 1846) [ClassicSimilarity], result of:
            0.01249961 = score(doc=1846,freq=1.0), product of:
              0.06273472 = queryWeight, product of:
                1.3791174 = boost
                3.1879277 = idf(docFreq=4958, maxDocs=44218)
                0.014269156 = queryNorm
              0.19924548 = fieldWeight in 1846, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.1879277 = idf(docFreq=4958, maxDocs=44218)
                0.0625 = fieldNorm(doc=1846)
          0.020683885 = weight(abstract_txt:used in 1846) [ClassicSimilarity], result of:
            0.020683885 = score(doc=1846,freq=2.0), product of:
              0.0696608 = queryWeight, product of:
                1.4532537 = boost
                3.3592992 = idf(docFreq=4177, maxDocs=44218)
                0.014269156 = queryNorm
              0.2969229 = fieldWeight in 1846, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                3.3592992 = idf(docFreq=4177, maxDocs=44218)
                0.0625 = fieldNorm(doc=1846)
          0.06057322 = weight(abstract_txt:mining in 1846) [ClassicSimilarity], result of:
            0.06057322 = score(doc=1846,freq=1.0), product of:
              0.15694001 = queryWeight, product of:
                1.7810185 = boost
                6.1754265 = idf(docFreq=249, maxDocs=44218)
                0.014269156 = queryNorm
              0.38596416 = fieldWeight in 1846, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.1754265 = idf(docFreq=249, maxDocs=44218)
                0.0625 = fieldNorm(doc=1846)
          0.60212195 = weight(abstract_txt:chat in 1846) [ClassicSimilarity], result of:
            0.60212195 = score(doc=1846,freq=4.0), product of:
              0.6203397 = queryWeight, product of:
                5.5986896 = boost
                7.7650614 = idf(docFreq=50, maxDocs=44218)
                0.014269156 = queryNorm
              0.9706327 = fieldWeight in 1846, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                7.7650614 = idf(docFreq=50, maxDocs=44218)
                0.0625 = fieldNorm(doc=1846)
        0.2 = coord(5/25)
    
  3. Lewis, K.M.; DeGroote, S.L.: Digital reference access points : an analysis of usage (2008) 0.13
    0.12636435 = sum of:
      0.12636435 = product of:
        0.63182175 = sum of:
          0.013512923 = weight(abstract_txt:approach in 551) [ClassicSimilarity], result of:
            0.013512923 = score(doc=551,freq=1.0), product of:
              0.05772706 = queryWeight, product of:
                1.080168 = boost
                3.745328 = idf(docFreq=2839, maxDocs=44218)
                0.014269156 = queryNorm
              0.234083 = fieldWeight in 551, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.745328 = idf(docFreq=2839, maxDocs=44218)
                0.0625 = fieldNorm(doc=551)
          0.020683885 = weight(abstract_txt:used in 551) [ClassicSimilarity], result of:
            0.020683885 = score(doc=551,freq=2.0), product of:
              0.0696608 = queryWeight, product of:
                1.4532537 = boost
                3.3592992 = idf(docFreq=4177, maxDocs=44218)
                0.014269156 = queryNorm
              0.2969229 = fieldWeight in 551, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                3.3592992 = idf(docFreq=4177, maxDocs=44218)
                0.0625 = fieldNorm(doc=551)
          0.019282738 = weight(abstract_txt:user in 551) [ClassicSimilarity], result of:
            0.019282738 = score(doc=551,freq=1.0), product of:
              0.083757326 = queryWeight, product of:
                1.5935241 = boost
                3.6835442 = idf(docFreq=3020, maxDocs=44218)
                0.014269156 = queryNorm
              0.23022151 = fieldWeight in 551, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.6835442 = idf(docFreq=3020, maxDocs=44218)
                0.0625 = fieldNorm(doc=551)
          0.15257764 = weight(abstract_txt:message in 551) [ClassicSimilarity], result of:
            0.15257764 = score(doc=551,freq=1.0), product of:
              0.33258623 = queryWeight, product of:
                3.1754067 = boost
                7.3401785 = idf(docFreq=77, maxDocs=44218)
                0.014269156 = queryNorm
              0.45876116 = fieldWeight in 551, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.3401785 = idf(docFreq=77, maxDocs=44218)
                0.0625 = fieldNorm(doc=551)
          0.42576453 = weight(abstract_txt:chat in 551) [ClassicSimilarity], result of:
            0.42576453 = score(doc=551,freq=2.0), product of:
              0.6203397 = queryWeight, product of:
                5.5986896 = boost
                7.7650614 = idf(docFreq=50, maxDocs=44218)
                0.014269156 = queryNorm
              0.6863409 = fieldWeight in 551, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                7.7650614 = idf(docFreq=50, maxDocs=44218)
                0.0625 = fieldNorm(doc=551)
        0.2 = coord(5/25)
    
  4. Madden, A.D.: A definition of information (2000) 0.10
    0.10104978 = sum of:
      0.10104978 = product of:
        0.5052489 = sum of:
          0.02853346 = weight(abstract_txt:problem in 713) [ClassicSimilarity], result of:
            0.02853346 = score(doc=713,freq=1.0), product of:
              0.08187969 = queryWeight, product of:
                1.2864405 = boost
                4.460548 = idf(docFreq=1388, maxDocs=44218)
                0.014269156 = queryNorm
              0.3484803 = fieldWeight in 713, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.460548 = idf(docFreq=1388, maxDocs=44218)
                0.078125 = fieldNorm(doc=713)
          0.032294497 = weight(abstract_txt:authors in 713) [ClassicSimilarity], result of:
            0.032294497 = score(doc=713,freq=1.0), product of:
              0.088925354 = queryWeight, product of:
                1.3406469 = boost
                4.648501 = idf(docFreq=1150, maxDocs=44218)
                0.014269156 = queryNorm
              0.36316413 = fieldWeight in 713, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.648501 = idf(docFreq=1150, maxDocs=44218)
                0.078125 = fieldNorm(doc=713)
          0.015624512 = weight(abstract_txt:based in 713) [ClassicSimilarity], result of:
            0.015624512 = score(doc=713,freq=1.0), product of:
              0.06273472 = queryWeight, product of:
                1.3791174 = boost
                3.1879277 = idf(docFreq=4958, maxDocs=44218)
                0.014269156 = queryNorm
              0.24905685 = fieldWeight in 713, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.1879277 = idf(docFreq=4958, maxDocs=44218)
                0.078125 = fieldNorm(doc=713)
          0.15907475 = weight(abstract_txt:messages in 713) [ClassicSimilarity], result of:
            0.15907475 = score(doc=713,freq=1.0), product of:
              0.29469392 = queryWeight, product of:
                2.9890475 = boost
                6.9093957 = idf(docFreq=119, maxDocs=44218)
                0.014269156 = queryNorm
              0.53979653 = fieldWeight in 713, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.9093957 = idf(docFreq=119, maxDocs=44218)
                0.078125 = fieldNorm(doc=713)
          0.26972172 = weight(abstract_txt:message in 713) [ClassicSimilarity], result of:
            0.26972172 = score(doc=713,freq=2.0), product of:
              0.33258623 = queryWeight, product of:
                3.1754067 = boost
                7.3401785 = idf(docFreq=77, maxDocs=44218)
                0.014269156 = queryNorm
              0.8109828 = fieldWeight in 713, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                7.3401785 = idf(docFreq=77, maxDocs=44218)
                0.078125 = fieldNorm(doc=713)
        0.2 = coord(5/25)
    
  5. Chuang, K.Y.; Yang, C.C.: Informational support exchanges using different computer-mediated communication formats in a social media alcoholism community (2014) 0.10
    0.09528589 = sum of:
      0.09528589 = product of:
        0.34030676 = sum of:
          0.020705765 = weight(abstract_txt:computer in 1179) [ClassicSimilarity], result of:
            0.020705765 = score(doc=1179,freq=1.0), product of:
              0.076725684 = queryWeight, product of:
                1.2452942 = boost
                4.317879 = idf(docFreq=1601, maxDocs=44218)
                0.014269156 = queryNorm
              0.26986745 = fieldWeight in 1179, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.317879 = idf(docFreq=1601, maxDocs=44218)
                0.0625 = fieldNorm(doc=1179)
          0.031725727 = weight(abstract_txt:author in 1179) [ClassicSimilarity], result of:
            0.031725727 = score(doc=1179,freq=1.0), product of:
              0.10197357 = queryWeight, product of:
                1.4356395 = boost
                4.9778743 = idf(docFreq=827, maxDocs=44218)
                0.014269156 = queryNorm
              0.31111714 = fieldWeight in 1179, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.9778743 = idf(docFreq=827, maxDocs=44218)
                0.0625 = fieldNorm(doc=1179)
          0.014625716 = weight(abstract_txt:used in 1179) [ClassicSimilarity], result of:
            0.014625716 = score(doc=1179,freq=1.0), product of:
              0.0696608 = queryWeight, product of:
                1.4532537 = boost
                3.3592992 = idf(docFreq=4177, maxDocs=44218)
                0.014269156 = queryNorm
              0.2099562 = fieldWeight in 1179, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.3592992 = idf(docFreq=4177, maxDocs=44218)
                0.0625 = fieldNorm(doc=1179)
          0.040289704 = weight(abstract_txt:investigate in 1179) [ClassicSimilarity], result of:
            0.040289704 = score(doc=1179,freq=1.0), product of:
              0.1195848 = queryWeight, product of:
                1.5546749 = boost
                5.390612 = idf(docFreq=547, maxDocs=44218)
                0.014269156 = queryNorm
              0.33691326 = fieldWeight in 1179, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.390612 = idf(docFreq=547, maxDocs=44218)
                0.0625 = fieldNorm(doc=1179)
          0.019282738 = weight(abstract_txt:user in 1179) [ClassicSimilarity], result of:
            0.019282738 = score(doc=1179,freq=1.0), product of:
              0.083757326 = queryWeight, product of:
                1.5935241 = boost
                3.6835442 = idf(docFreq=3020, maxDocs=44218)
                0.014269156 = queryNorm
              0.23022151 = fieldWeight in 1179, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.6835442 = idf(docFreq=3020, maxDocs=44218)
                0.0625 = fieldNorm(doc=1179)
          0.0864173 = weight(abstract_txt:mediated in 1179) [ClassicSimilarity], result of:
            0.0864173 = score(doc=1179,freq=1.0), product of:
              0.19889036 = queryWeight, product of:
                2.0049727 = boost
                6.9519553 = idf(docFreq=114, maxDocs=44218)
                0.014269156 = queryNorm
              0.4344972 = fieldWeight in 1179, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.9519553 = idf(docFreq=114, maxDocs=44218)
                0.0625 = fieldNorm(doc=1179)
          0.1272598 = weight(abstract_txt:messages in 1179) [ClassicSimilarity], result of:
            0.1272598 = score(doc=1179,freq=1.0), product of:
              0.29469392 = queryWeight, product of:
                2.9890475 = boost
                6.9093957 = idf(docFreq=119, maxDocs=44218)
                0.014269156 = queryNorm
              0.43183723 = fieldWeight in 1179, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.9093957 = idf(docFreq=119, maxDocs=44218)
                0.0625 = fieldNorm(doc=1179)
        0.28 = coord(7/25)
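
The content-based scores above combine a per-term score (queryWeight times fieldWeight) with a coordination factor. The sketch below recomputes entry 2 (doc=1846) under the assumption that queryWeight = boost * idf * queryNorm, fieldWeight = sqrt(freq) * idf * fieldNorm, and the document score is coord multiplied by the sum of matched term scores. The helper names and the tuple layout are hypothetical; the boosts, document frequencies, queryNorm, and fieldNorm are copied from the explain tree.

```python
import math

MAX_DOCS = 44218
QUERY_NORM = 0.014269156   # queryNorm shown in the explain output
QUERY_TERMS = 25           # denominator of coord(n/25)

def idf(doc_freq: int, max_docs: int = MAX_DOCS) -> float:
    """ClassicSimilarity idf: 1 + ln(maxDocs / (docFreq + 1))."""
    return 1.0 + math.log(max_docs / (doc_freq + 1))

def term_score(freq: float, doc_freq: int, boost: float, field_norm: float) -> float:
    """Score of one matched term: queryWeight * fieldWeight."""
    term_idf = idf(doc_freq)
    query_weight = boost * term_idf * QUERY_NORM
    field_weight = math.sqrt(freq) * term_idf * field_norm
    return query_weight * field_weight

# (term, termFreq, docFreq, boost) for doc=1846, fieldNorm=0.0625
matched = [
    ("approach", 1.0, 2839, 1.080168),
    ("based",    1.0, 4958, 1.3791174),
    ("used",     2.0, 4177, 1.4532537),
    ("mining",   1.0,  249, 1.7810185),
    ("chat",     4.0,   50, 5.5986896),
]

raw_sum = sum(term_score(f, df, b, 0.0625) for _, f, df, b in matched)
coord = len(matched) / QUERY_TERMS      # 5/25 = 0.2
total = coord * raw_sum                 # approx. 0.14, as listed for entry 2

print(round(raw_sum, 4), round(total, 4))
```

The coord factor rewards documents that match more of the 25 query terms, which is why entry 1 (7 matches) outranks entry 2 despite a smaller raw sum.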