Document (#39159)

Author
Egbert, J.
Biber, D.
Davies, M.
Title
Developing a bottom-up, user-based method of web register classification
Source
Journal of the Association for Information Science and Technology. 66(2015) no.9, pp. 1817-1831
Year
2015
Abstract
This paper introduces a project to develop a reliable, cost-effective method for classifying Internet texts into register categories and to apply that method to the analysis of a large corpus of web documents. To date, the project has proceeded in two key phases. First, we developed a bottom-up method for web register classification, asking end users of the web to use a decision-tree survey to code relevant situational characteristics of web documents, resulting in a bottom-up identification of register and subregister categories. We present details regarding the development and testing of this method through a series of 10 pilot studies. Second, we applied this procedure to a corpus of 53,000 web documents. An analysis of the results demonstrates the effectiveness of these methods for web register classification and provides a preliminary description of the types and distribution of registers on the web.
Content
Cf.: http://onlinelibrary.wiley.com/doi/10.1002/asi.23308/abstract.
Theme
Automatic classification
Internet

Similar documents (author)

  1. Davies, R.: Classification and ratiocination : a perennial quest (1986) 5.05
    5.0537305 = sum of:
      5.0537305 = weight(author_txt:davies in 683) [ClassicSimilarity], result of:
        5.0537305 = fieldWeight in 683, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          8.085969 = idf(docFreq=36, maxDocs=44218)
          0.625 = fieldNorm(doc=683)
    
  2. Davies, R.: Outlines of the emerging paradigm in cataloguing (1987) 5.05
    5.0537305 = sum of:
      5.0537305 = weight(author_txt:davies in 1091) [ClassicSimilarity], result of:
        5.0537305 = fieldWeight in 1091, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          8.085969 = idf(docFreq=36, maxDocs=44218)
          0.625 = fieldNorm(doc=1091)
    
  3. Davies, R.: The creation of new knowledge by information retrieval and classification (1989) 5.05
    5.0537305 = sum of:
      5.0537305 = weight(author_txt:davies in 3874) [ClassicSimilarity], result of:
        5.0537305 = fieldWeight in 3874, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          8.085969 = idf(docFreq=36, maxDocs=44218)
          0.625 = fieldNorm(doc=3874)
    
  4. Davies, R.: Document, information or knowledge? : choices for librarians (1983) 5.05
    5.0537305 = sum of:
      5.0537305 = weight(author_txt:davies in 3895) [ClassicSimilarity], result of:
        5.0537305 = fieldWeight in 3895, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          8.085969 = idf(docFreq=36, maxDocs=44218)
          0.625 = fieldNorm(doc=3895)
    
  5. Davies, P.: Artificial intelligence : its role in the information industry (1991) 5.05
    5.0537305 = sum of:
      5.0537305 = weight(author_txt:davies in 7095) [ClassicSimilarity], result of:
        5.0537305 = fieldWeight in 7095, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          8.085969 = idf(docFreq=36, maxDocs=44218)
          0.625 = fieldNorm(doc=7095)
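
The author-match scores above can be reproduced by hand. The following is a minimal sketch, assuming the standard Lucene ClassicSimilarity definitions tf = sqrt(freq) and idf = 1 + ln(maxDocs / (docFreq + 1)); the function names are illustrative and are not part of the catalog software. The fieldNorm value is taken from the listing as given, since it is a stored, lossily encoded length norm that cannot be recomputed here.

```python
import math


def classic_tf(freq: float) -> float:
    """Term-frequency factor: square root of the raw frequency."""
    return math.sqrt(freq)


def classic_idf(doc_freq: int, max_docs: int) -> float:
    """Inverse document frequency: 1 + ln(maxDocs / (docFreq + 1))."""
    return 1.0 + math.log(max_docs / (doc_freq + 1))


def field_weight(freq: float, doc_freq: int, max_docs: int, field_norm: float) -> float:
    """fieldWeight = tf * idf * fieldNorm, as in the explain trees above."""
    return classic_tf(freq) * classic_idf(doc_freq, max_docs) * field_norm


# Values copied from the listing for author_txt:davies.
idf_davies = classic_idf(doc_freq=36, max_docs=44218)
score_davies = field_weight(freq=1.0, doc_freq=36, max_docs=44218, field_norm=0.625)

print(f"idf   = {idf_davies:.6f}")    # listing shows 8.085969
print(f"score = {score_davies:.6f}")  # listing shows 5.0537305
```

Because all five author matches share the same freq, docFreq and fieldNorm, they receive the same score of about 5.05. Small differences in the last digits are expected, since Lucene computes in 32-bit floats.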
    

Similar documents (content)

  1. Café, L.M.A.; Souza, R.R.: Sentiment analysis and knowledge organization : an overview of the international literature (2017) 0.11
    0.11233861 = sum of:
      0.11233861 = product of:
        0.46807754 = sum of:
          0.034832884 = weight(abstract_txt:code in 3625) [ClassicSimilarity], result of:
            0.034832884 = score(doc=3625,freq=1.0), product of:
              0.09076886 = queryWeight, product of:
                6.140059 = idf(docFreq=258, maxDocs=44218)
                0.014783059 = queryNorm
              0.3837537 = fieldWeight in 3625, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.140059 = idf(docFreq=258, maxDocs=44218)
                0.0625 = fieldNorm(doc=3625)
          0.008969938 = weight(abstract_txt:this in 3625) [ClassicSimilarity], result of:
            0.008969938 = score(doc=3625,freq=2.0), product of:
              0.042056583 = queryWeight, product of:
                1.1789875 = boost
                2.4130175 = idf(docFreq=10762, maxDocs=44218)
                0.014783059 = queryNorm
              0.21328263 = fieldWeight in 3625, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                2.4130175 = idf(docFreq=10762, maxDocs=44218)
                0.0625 = fieldNorm(doc=3625)
          0.0146772675 = weight(abstract_txt:analysis in 3625) [ClassicSimilarity], result of:
            0.0146772675 = score(doc=3625,freq=1.0), product of:
              0.06427628 = queryWeight, product of:
                1.1900684 = boost
                3.6535451 = idf(docFreq=3112, maxDocs=44218)
                0.014783059 = queryNorm
              0.22834657 = fieldWeight in 3625, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.6535451 = idf(docFreq=3112, maxDocs=44218)
                0.0625 = fieldNorm(doc=3625)
          0.040616658 = weight(abstract_txt:classification in 3625) [ClassicSimilarity], result of:
            0.040616658 = score(doc=3625,freq=2.0), product of:
              0.11510932 = queryWeight, product of:
                1.9505067 = boost
                3.9920752 = idf(docFreq=2218, maxDocs=44218)
                0.014783059 = queryNorm
              0.3528529 = fieldWeight in 3625, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                3.9920752 = idf(docFreq=2218, maxDocs=44218)
                0.0625 = fieldNorm(doc=3625)
          0.06825956 = weight(abstract_txt:corpus in 3625) [ClassicSimilarity], result of:
            0.06825956 = score(doc=3625,freq=1.0), product of:
              0.17908652 = queryWeight, product of:
                1.9864517 = boost
                6.0984654 = idf(docFreq=269, maxDocs=44218)
                0.014783059 = queryNorm
              0.3811541 = fieldWeight in 3625, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.0984654 = idf(docFreq=269, maxDocs=44218)
                0.0625 = fieldNorm(doc=3625)
          0.30072123 = weight(abstract_txt:register in 3625) [ClassicSimilarity], result of:
            0.30072123 = score(doc=3625,freq=1.0), product of:
              0.6531957 = queryWeight, product of:
                5.998439 = boost
                7.3661537 = idf(docFreq=75, maxDocs=44218)
                0.014783059 = queryNorm
              0.4603846 = fieldWeight in 3625, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.3661537 = idf(docFreq=75, maxDocs=44218)
                0.0625 = fieldNorm(doc=3625)
        0.24 = coord(6/25)
    
  2. Wartena, C.; Sommer, M.: Automatic classification of scientific records using the German Subject Heading Authority File (SWD) (2012) 0.10
    0.09734903 = sum of:
      0.09734903 = product of:
        0.405621 = sum of:
          0.05385478 = weight(abstract_txt:classifying in 472) [ClassicSimilarity], result of:
            0.05385478 = score(doc=472,freq=1.0), product of:
              0.10458918 = queryWeight, product of:
                1.073433 = boost
                6.590942 = idf(docFreq=164, maxDocs=44218)
                0.014783059 = queryNorm
              0.5149173 = fieldWeight in 472, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.590942 = idf(docFreq=164, maxDocs=44218)
                0.078125 = fieldNorm(doc=472)
          0.06737227 = weight(abstract_txt:utilize in 472) [ClassicSimilarity], result of:
            0.06737227 = score(doc=472,freq=1.0), product of:
              0.12142964 = queryWeight, product of:
                1.1566286 = boost
                7.1017675 = idf(docFreq=98, maxDocs=44218)
                0.014783059 = queryNorm
              0.5548256 = fieldWeight in 472, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.1017675 = idf(docFreq=98, maxDocs=44218)
                0.078125 = fieldNorm(doc=472)
          0.011212423 = weight(abstract_txt:this in 472) [ClassicSimilarity], result of:
            0.011212423 = score(doc=472,freq=2.0), product of:
              0.042056583 = queryWeight, product of:
                1.1789875 = boost
                2.4130175 = idf(docFreq=10762, maxDocs=44218)
                0.014783059 = queryNorm
              0.2666033 = fieldWeight in 472, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                2.4130175 = idf(docFreq=10762, maxDocs=44218)
                0.078125 = fieldNorm(doc=472)
          0.08027572 = weight(abstract_txt:classification in 472) [ClassicSimilarity], result of:
            0.08027572 = score(doc=472,freq=5.0), product of:
              0.11510932 = queryWeight, product of:
                1.9505067 = boost
                3.9920752 = idf(docFreq=2218, maxDocs=44218)
                0.014783059 = queryNorm
              0.69738686 = fieldWeight in 472, product of:
                2.236068 = tf(freq=5.0), with freq of:
                  5.0 = termFreq=5.0
                3.9920752 = idf(docFreq=2218, maxDocs=44218)
                0.078125 = fieldNorm(doc=472)
          0.039500862 = weight(abstract_txt:documents in 472) [ClassicSimilarity], result of:
            0.039500862 = score(doc=472,freq=1.0), product of:
              0.12268233 = queryWeight, product of:
                2.0136464 = boost
                4.1213026 = idf(docFreq=1949, maxDocs=44218)
                0.014783059 = queryNorm
              0.32197678 = fieldWeight in 472, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.1213026 = idf(docFreq=1949, maxDocs=44218)
                0.078125 = fieldNorm(doc=472)
          0.15340494 = weight(abstract_txt:method in 472) [ClassicSimilarity], result of:
            0.15340494 = score(doc=472,freq=5.0), product of:
              0.19510128 = queryWeight, product of:
                2.9321866 = boost
                4.50095 = idf(docFreq=1333, maxDocs=44218)
                0.014783059 = queryNorm
              0.7862836 = fieldWeight in 472, product of:
                2.236068 = tf(freq=5.0), with freq of:
                  5.0 = termFreq=5.0
                4.50095 = idf(docFreq=1333, maxDocs=44218)
                0.078125 = fieldNorm(doc=472)
        0.24 = coord(6/25)
    
  3. Classification Research Group: The need for a faceted classification as the basis of all methods of information retrieval (1985) 0.09
    0.09302072 = sum of:
      0.09302072 = product of:
        0.38758636 = sum of:
          0.01253587 = weight(abstract_txt:this in 3640) [ClassicSimilarity], result of:
            0.01253587 = score(doc=3640,freq=10.0), product of:
              0.042056583 = queryWeight, product of:
                1.1789875 = boost
                2.4130175 = idf(docFreq=10762, maxDocs=44218)
                0.014783059 = queryNorm
              0.29807153 = fieldWeight in 3640, product of:
                3.1622777 = tf(freq=10.0), with freq of:
                  10.0 = termFreq=10.0
                2.4130175 = idf(docFreq=10762, maxDocs=44218)
                0.0390625 = fieldNorm(doc=3640)
          0.012972994 = weight(abstract_txt:analysis in 3640) [ClassicSimilarity], result of:
            0.012972994 = score(doc=3640,freq=2.0), product of:
              0.06427628 = queryWeight, product of:
                1.1900684 = boost
                3.6535451 = idf(docFreq=3112, maxDocs=44218)
                0.014783059 = queryNorm
              0.20183176 = fieldWeight in 3640, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                3.6535451 = idf(docFreq=3112, maxDocs=44218)
                0.0390625 = fieldNorm(doc=3640)
          0.03692475 = weight(abstract_txt:categories in 3640) [ClassicSimilarity], result of:
            0.03692475 = score(doc=3640,freq=2.0), product of:
              0.12909287 = queryWeight, product of:
                1.6865441 = boost
                5.17774 = idf(docFreq=677, maxDocs=44218)
                0.014783059 = queryNorm
              0.28603244 = fieldWeight in 3640, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                5.17774 = idf(docFreq=677, maxDocs=44218)
                0.0390625 = fieldNorm(doc=3640)
          0.062181305 = weight(abstract_txt:classification in 3640) [ClassicSimilarity], result of:
            0.062181305 = score(doc=3640,freq=12.0), product of:
              0.11510932 = queryWeight, product of:
                1.9505067 = boost
                3.9920752 = idf(docFreq=2218, maxDocs=44218)
                0.014783059 = queryNorm
              0.5401935 = fieldWeight in 3640, product of:
                3.4641016 = tf(freq=12.0), with freq of:
                  12.0 = termFreq=12.0
                3.9920752 = idf(docFreq=2218, maxDocs=44218)
                0.0390625 = fieldNorm(doc=3640)
          0.06860477 = weight(abstract_txt:method in 3640) [ClassicSimilarity], result of:
            0.06860477 = score(doc=3640,freq=4.0), product of:
              0.19510128 = queryWeight, product of:
                2.9321866 = boost
                4.50095 = idf(docFreq=1333, maxDocs=44218)
                0.014783059 = queryNorm
              0.3516367 = fieldWeight in 3640, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                4.50095 = idf(docFreq=1333, maxDocs=44218)
                0.0390625 = fieldNorm(doc=3640)
          0.19436666 = weight(abstract_txt:bottom in 3640) [ClassicSimilarity], result of:
            0.19436666 = score(doc=3640,freq=2.0), product of:
              0.4471661 = queryWeight, product of:
                3.8443828 = boost
                7.8682456 = idf(docFreq=45, maxDocs=44218)
                0.014783059 = queryNorm
              0.43466327 = fieldWeight in 3640, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                7.8682456 = idf(docFreq=45, maxDocs=44218)
                0.0390625 = fieldNorm(doc=3640)
        0.24 = coord(6/25)
    
  4. Bettella, C.; Capodaglio, C.; Ramous, C.; Vettore, M.C.: Declassifying the Library of Congress Classification : the case of the Department of Philosophy Library at the University of Padova (Padua, Italy) (2009) 0.09
    0.09025486 = sum of:
      0.09025486 = product of:
        0.37606192 = sum of:
          0.03355904 = weight(abstract_txt:preliminary in 3271) [ClassicSimilarity], result of:
            0.03355904 = score(doc=3271,freq=1.0), product of:
              0.096785784 = queryWeight, product of:
                1.0326124 = boost
                6.340301 = idf(docFreq=211, maxDocs=44218)
                0.014783059 = queryNorm
              0.3467352 = fieldWeight in 3271, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.340301 = idf(docFreq=211, maxDocs=44218)
                0.0546875 = fieldNorm(doc=3271)
          0.043589015 = weight(abstract_txt:phases in 3271) [ClassicSimilarity], result of:
            0.043589015 = score(doc=3271,freq=1.0), product of:
              0.11521877 = queryWeight, product of:
                1.1266608 = boost
                6.9177637 = idf(docFreq=118, maxDocs=44218)
                0.014783059 = queryNorm
              0.3783152 = fieldWeight in 3271, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.9177637 = idf(docFreq=118, maxDocs=44218)
                0.0546875 = fieldNorm(doc=3271)
          0.0055498662 = weight(abstract_txt:this in 3271) [ClassicSimilarity], result of:
            0.0055498662 = score(doc=3271,freq=1.0), product of:
              0.042056583 = queryWeight, product of:
                1.1789875 = boost
                2.4130175 = idf(docFreq=10762, maxDocs=44218)
                0.014783059 = queryNorm
              0.1319619 = fieldWeight in 3271, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                2.4130175 = idf(docFreq=10762, maxDocs=44218)
                0.0546875 = fieldNorm(doc=3271)
          0.043526914 = weight(abstract_txt:classification in 3271) [ClassicSimilarity], result of:
            0.043526914 = score(doc=3271,freq=3.0), product of:
              0.11510932 = queryWeight, product of:
                1.9505067 = boost
                3.9920752 = idf(docFreq=2218, maxDocs=44218)
                0.014783059 = queryNorm
              0.37813544 = fieldWeight in 3271, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                3.9920752 = idf(docFreq=2218, maxDocs=44218)
                0.0546875 = fieldNorm(doc=3271)
          0.057423882 = weight(abstract_txt:project in 3271) [ClassicSimilarity], result of:
            0.057423882 = score(doc=3271,freq=3.0), product of:
              0.13846295 = queryWeight, product of:
                2.1392374 = boost
                4.378348 = idf(docFreq=1507, maxDocs=44218)
                0.014783059 = queryNorm
              0.4147238 = fieldWeight in 3271, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                4.378348 = idf(docFreq=1507, maxDocs=44218)
                0.0546875 = fieldNorm(doc=3271)
          0.1924132 = weight(abstract_txt:bottom in 3271) [ClassicSimilarity], result of:
            0.1924132 = score(doc=3271,freq=1.0), product of:
              0.4471661 = queryWeight, product of:
                3.8443828 = boost
                7.8682456 = idf(docFreq=45, maxDocs=44218)
                0.014783059 = queryNorm
              0.4302947 = fieldWeight in 3271, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.8682456 = idf(docFreq=45, maxDocs=44218)
                0.0546875 = fieldNorm(doc=3271)
        0.24 = coord(6/25)
    
  5. Wagger, S.; Park, R.; Bedford, D.A.D.: Lessons learned in content architecture harmonization and metadata models (2010) 0.09
    0.08854478 = sum of:
      0.08854478 = product of:
        0.3689366 = sum of:
          0.043083824 = weight(abstract_txt:classifying in 3943) [ClassicSimilarity], result of:
            0.043083824 = score(doc=3943,freq=1.0), product of:
              0.10458918 = queryWeight, product of:
                1.073433 = boost
                6.590942 = idf(docFreq=164, maxDocs=44218)
                0.014783059 = queryNorm
              0.41193387 = fieldWeight in 3943, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.590942 = idf(docFreq=164, maxDocs=44218)
                0.0625 = fieldNorm(doc=3943)
          0.008969938 = weight(abstract_txt:this in 3943) [ClassicSimilarity], result of:
            0.008969938 = score(doc=3943,freq=2.0), product of:
              0.042056583 = queryWeight, product of:
                1.1789875 = boost
                2.4130175 = idf(docFreq=10762, maxDocs=44218)
                0.014783059 = queryNorm
              0.21328263 = fieldWeight in 3943, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                2.4130175 = idf(docFreq=10762, maxDocs=44218)
                0.0625 = fieldNorm(doc=3943)
          0.0146772675 = weight(abstract_txt:analysis in 3943) [ClassicSimilarity], result of:
            0.0146772675 = score(doc=3943,freq=1.0), product of:
              0.06427628 = queryWeight, product of:
                1.1900684 = boost
                3.6535451 = idf(docFreq=3112, maxDocs=44218)
                0.014783059 = queryNorm
              0.22834657 = fieldWeight in 3943, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.6535451 = idf(docFreq=3112, maxDocs=44218)
                0.0625 = fieldNorm(doc=3943)
          0.028720316 = weight(abstract_txt:classification in 3943) [ClassicSimilarity], result of:
            0.028720316 = score(doc=3943,freq=1.0), product of:
              0.11510932 = queryWeight, product of:
                1.9505067 = boost
                3.9920752 = idf(docFreq=2218, maxDocs=44218)
                0.014783059 = queryNorm
              0.2495047 = fieldWeight in 3943, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.9920752 = idf(docFreq=2218, maxDocs=44218)
                0.0625 = fieldNorm(doc=3943)
          0.053584456 = weight(abstract_txt:project in 3943) [ClassicSimilarity], result of:
            0.053584456 = score(doc=3943,freq=2.0), product of:
              0.13846295 = queryWeight, product of:
                2.1392374 = boost
                4.378348 = idf(docFreq=1507, maxDocs=44218)
                0.014783059 = queryNorm
              0.38699493 = fieldWeight in 3943, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.378348 = idf(docFreq=1507, maxDocs=44218)
                0.0625 = fieldNorm(doc=3943)
          0.21990079 = weight(abstract_txt:bottom in 3943) [ClassicSimilarity], result of:
            0.21990079 = score(doc=3943,freq=1.0), product of:
              0.4471661 = queryWeight, product of:
                3.8443828 = boost
                7.8682456 = idf(docFreq=45, maxDocs=44218)
                0.014783059 = queryNorm
              0.49176535 = fieldWeight in 3943, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.8682456 = idf(docFreq=45, maxDocs=44218)
                0.0625 = fieldNorm(doc=3943)
        0.24 = coord(6/25)
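
The content-similarity scores combine several matched terms. As a rough sketch, assuming the ClassicSimilarity formulas (tf = sqrt(freq), idf = 1 + ln(maxDocs / (docFreq + 1))), each term contributes queryWeight * fieldWeight, and the sum is scaled by the coordination factor coord = matched terms / query terms. The example recomputes the first entry (doc 3625) from the values in its explain tree; the helper names are hypothetical, and queryNorm, boosts and fieldNorm are copied from the listing rather than derived.

```python
import math

MAX_DOCS = 44218          # maxDocs from the listing
QUERY_NORM = 0.014783059  # queryNorm from the listing
FIELD_NORM = 0.0625       # fieldNorm(doc=3625) from the listing


def idf(doc_freq: int, max_docs: int = MAX_DOCS) -> float:
    """Inverse document frequency: 1 + ln(maxDocs / (docFreq + 1))."""
    return 1.0 + math.log(max_docs / (doc_freq + 1))


def term_score(freq: float, doc_freq: int, field_norm: float, boost: float = 1.0) -> float:
    """Score of one matched term: queryWeight * fieldWeight."""
    term_idf = idf(doc_freq)
    query_weight = boost * term_idf * QUERY_NORM
    field_weight = math.sqrt(freq) * term_idf * field_norm
    return query_weight * field_weight


# (term, freq, docFreq, boost) for doc 3625; "code" has no boost line, so 1.0 is assumed.
matched_terms = [
    ("code", 1.0, 258, 1.0),
    ("this", 2.0, 10762, 1.1789875),
    ("analysis", 1.0, 3112, 1.1900684),
    ("classification", 2.0, 2218, 1.9505067),
    ("corpus", 1.0, 269, 1.9864517),
    ("register", 1.0, 75, 5.998439),
]

term_sum = sum(term_score(f, df, FIELD_NORM, b) for _, f, df, b in matched_terms)
coord = 6 / 25  # 6 of the 25 query terms matched
final_score = term_sum * coord

print(f"sum   = {term_sum:.6f}")     # listing shows 0.46807754
print(f"score = {final_score:.6f}")  # listing shows 0.11233861
```

The rare, heavily boosted term "register" accounts for roughly two thirds of the sum, which is why this entry ranks first even though the term occurs only once in its abstract.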