Document (#33499)

Author
Craven, T.C.
Title
Determining authorship of Web pages
Source
Knowledge organization, information systems and other essays: Professor A. Neelameghan Festschrift. Ed. by K.S. Raghavan and K.N. Prasad
Imprint
New Delhi : Ess Ess Publications
Year
2006
Pages
pp. 237-246
Abstract
Assignability of authors to Web pages using either normal browsing procedures or browsing assisted by simple automatic extraction was investigated. Candidate strings for 1000 pages were extracted automatically from title elements, meta-tags, and address-like and copyright-like passages; 539 of the pages produced at least one candidate: 310 candidates from titles, 66 from meta-tags, 91 from address-like passages, and 259 from copyright-like passages. An assistant attempted to identify personal authors for 943 pages by examining the pages themselves and related pages; this added 90 pages with authors to the pages from which no candidate strings were extracted. Specific problems are noted and some refinements to the extraction methods are suggested.
Theme
Informetrics

Similar documents (author)

  1. Craven, T.C.: An online index entry format based on multiple search terms (1987) 5.21
    5.2059946 = sum of:
      5.2059946 = weight(author_txt:craven in 438) [ClassicSimilarity], result of:
        5.2059946 = fieldWeight in 438, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          8.329592 = idf(docFreq=28, maxDocs=44218)
          0.625 = fieldNorm(doc=438)
    
  2. Craven, T.C.: Adapting of string indexing systems for retrieval using proximity operators (1988) 5.21
    5.2059946 = sum of:
      5.2059946 = weight(author_txt:craven in 705) [ClassicSimilarity], result of:
        5.2059946 = fieldWeight in 705, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          8.329592 = idf(docFreq=28, maxDocs=44218)
          0.625 = fieldNorm(doc=705)
    
  3. Craven, T.C.: Customized extracts based on Boolean queries and sentence dependency structures (1989) 5.21
    5.2059946 = sum of:
      5.2059946 = weight(author_txt:craven in 789) [ClassicSimilarity], result of:
        5.2059946 = fieldWeight in 789, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          8.329592 = idf(docFreq=28, maxDocs=44218)
          0.625 = fieldNorm(doc=789)
    
  4. Craven, T.C.: Research in document classification and indexing (Canada) 1971-1980 (1981) 5.21
    5.2059946 = sum of:
      5.2059946 = weight(author_txt:craven in 1211) [ClassicSimilarity], result of:
        5.2059946 = fieldWeight in 1211, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          8.329592 = idf(docFreq=28, maxDocs=44218)
          0.625 = fieldNorm(doc=1211)
    
  5. Craven, T.C.: NEPHIS: a nested phrase indexing system (1977) 5.21
    5.2059946 = sum of:
      5.2059946 = weight(author_txt:craven in 1333) [ClassicSimilarity], result of:
        5.2059946 = fieldWeight in 1333, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          8.329592 = idf(docFreq=28, maxDocs=44218)
          0.625 = fieldNorm(doc=1333)
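
The five author-match trees above all apply the same arithmetic: fieldWeight is the product of tf, idf, and fieldNorm. The following is a minimal sketch of that arithmetic, assuming tf is the square root of the term frequency and idf is 1 + ln(maxDocs / (docFreq + 1)). These formulas reproduce the printed values but are not taken from the record itself. The function names are illustrative only and are not part of any library.

```python
import math


def classic_tf(freq: float) -> float:
    """Term-frequency factor: assumed to be sqrt(freq)."""
    return math.sqrt(freq)


def classic_idf(doc_freq: int, max_docs: int) -> float:
    """Inverse document frequency: assumed to be 1 + ln(maxDocs / (docFreq + 1))."""
    return 1.0 + math.log(max_docs / (doc_freq + 1))


def field_weight(freq: float, doc_freq: int, max_docs: int, field_norm: float) -> float:
    """fieldWeight = tf * idf * fieldNorm, as in the explain trees."""
    return classic_tf(freq) * classic_idf(doc_freq, max_docs) * field_norm


# Values copied from the author_txt:craven entries in the listing.
author_score = field_weight(freq=1.0, doc_freq=28, max_docs=44218, field_norm=0.625)
print(f"idf   = {classic_idf(28, 44218):.6f}")
print(f"score = {author_score:.6f}")
```

Because freq, docFreq, maxDocs, and fieldNorm are identical for all five documents, every entry receives the same score, which is why the list shows 5.21 throughout. Small differences in the last digits are expected, since the listing was produced with single-precision floats.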
    

Similar documents (content)

  1. Craven, T.C.: 'DESCRIPTION' META tags in locally linked web pages (2001) 0.15
    0.15373836 = sum of:
      0.15373836 = product of:
        0.7686918 = sum of:
          0.03182029 = weight(abstract_txt:were in 701) [ClassicSimilarity], result of:
            0.03182029 = score(doc=701,freq=3.0), product of:
              0.05339476 = queryWeight, product of:
                1.1314435 = boost
                3.6700637 = idf(docFreq=3061, maxDocs=44218)
                0.012858556 = queryNorm
              0.59594405 = fieldWeight in 701, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                3.6700637 = idf(docFreq=3061, maxDocs=44218)
                0.09375 = fieldNorm(doc=701)
          0.09388704 = weight(abstract_txt:tags in 701) [ClassicSimilarity], result of:
            0.09388704 = score(doc=701,freq=1.0), product of:
              0.1584188 = queryWeight, product of:
                1.9488881 = boost
                6.321609 = idf(docFreq=215, maxDocs=44218)
                0.012858556 = queryNorm
              0.59265083 = fieldWeight in 701, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.321609 = idf(docFreq=215, maxDocs=44218)
                0.09375 = fieldNorm(doc=701)
          0.09947488 = weight(abstract_txt:meta in 701) [ClassicSimilarity], result of:
            0.09947488 = score(doc=701,freq=1.0), product of:
              0.16464375 = queryWeight, product of:
                1.9868091 = boost
                6.444614 = idf(docFreq=190, maxDocs=44218)
                0.012858556 = queryNorm
              0.60418254 = fieldWeight in 701, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.444614 = idf(docFreq=190, maxDocs=44218)
                0.09375 = fieldNorm(doc=701)
          0.033290304 = weight(abstract_txt:from in 701) [ClassicSimilarity], result of:
            0.033290304 = score(doc=701,freq=2.0), product of:
              0.09084727 = queryWeight, product of:
                2.556231 = boost
                2.7638826 = idf(docFreq=7577, maxDocs=44218)
                0.012858556 = queryNorm
              0.36644253 = fieldWeight in 701, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                2.7638826 = idf(docFreq=7577, maxDocs=44218)
                0.09375 = fieldNorm(doc=701)
          0.5102193 = weight(abstract_txt:pages in 701) [ClassicSimilarity], result of:
            0.5102193 = score(doc=701,freq=3.0), product of:
              0.56053746 = queryWeight, product of:
                7.776642 = boost
                5.6055775 = idf(docFreq=441, maxDocs=44218)
                0.012858556 = queryNorm
              0.9102323 = fieldWeight in 701, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                5.6055775 = idf(docFreq=441, maxDocs=44218)
                0.09375 = fieldNorm(doc=701)
        0.2 = coord(5/25)
    
  2. Bar-Ilan, J.: The Web as an information source on informetrics? : A content analysis (2000) 0.13
    0.12970208 = sum of:
      0.12970208 = product of:
        0.64851034 = sum of:
          0.021213528 = weight(abstract_txt:were in 4587) [ClassicSimilarity], result of:
            0.021213528 = score(doc=4587,freq=3.0), product of:
              0.05339476 = queryWeight, product of:
                1.1314435 = boost
                3.6700637 = idf(docFreq=3061, maxDocs=44218)
                0.012858556 = queryNorm
              0.39729604 = fieldWeight in 4587, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                3.6700637 = idf(docFreq=3061, maxDocs=44218)
                0.0625 = fieldNorm(doc=4587)
          0.10028585 = weight(abstract_txt:extracted in 4587) [ClassicSimilarity], result of:
            0.10028585 = score(doc=4587,freq=3.0), product of:
              0.1504007 = queryWeight, product of:
                1.8989278 = boost
                6.159553 = idf(docFreq=253, maxDocs=44218)
                0.012858556 = queryNorm
              0.66679114 = fieldWeight in 4587, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                6.159553 = idf(docFreq=253, maxDocs=44218)
                0.0625 = fieldNorm(doc=4587)
          0.05279302 = weight(abstract_txt:authors in 4587) [ClassicSimilarity], result of:
            0.05279302 = score(doc=4587,freq=2.0), product of:
              0.12848978 = queryWeight, product of:
                2.1496286 = boost
                4.648501 = idf(docFreq=1150, maxDocs=44218)
                0.012858556 = queryNorm
              0.4108733 = fieldWeight in 4587, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.648501 = idf(docFreq=1150, maxDocs=44218)
                0.0625 = fieldNorm(doc=4587)
          0.03509106 = weight(abstract_txt:from in 4587) [ClassicSimilarity], result of:
            0.03509106 = score(doc=4587,freq=5.0), product of:
              0.09084727 = queryWeight, product of:
                2.556231 = boost
                2.7638826 = idf(docFreq=7577, maxDocs=44218)
                0.012858556 = queryNorm
              0.38626435 = fieldWeight in 4587, product of:
                2.236068 = tf(freq=5.0), with freq of:
                  5.0 = termFreq=5.0
                2.7638826 = idf(docFreq=7577, maxDocs=44218)
                0.0625 = fieldNorm(doc=4587)
          0.43912688 = weight(abstract_txt:pages in 4587) [ClassicSimilarity], result of:
            0.43912688 = score(doc=4587,freq=5.0), product of:
              0.56053746 = queryWeight, product of:
                7.776642 = boost
                5.6055775 = idf(docFreq=441, maxDocs=44218)
                0.012858556 = queryNorm
              0.7834033 = fieldWeight in 4587, product of:
                2.236068 = tf(freq=5.0), with freq of:
                  5.0 = termFreq=5.0
                5.6055775 = idf(docFreq=441, maxDocs=44218)
                0.0625 = fieldNorm(doc=4587)
        0.2 = coord(5/25)
    
  3. Turner, T.P.; Brackbill, L.: Rising to the top : evaluating the use of HTML META tag to improve retrieval of World Wide Web documents through Internet search engines (1998) 0.13
    0.12872665 = sum of:
      0.12872665 = product of:
        0.64363325 = sum of:
          0.021213528 = weight(abstract_txt:were in 5230) [ClassicSimilarity], result of:
            0.021213528 = score(doc=5230,freq=3.0), product of:
              0.05339476 = queryWeight, product of:
                1.1314435 = boost
                3.6700637 = idf(docFreq=3061, maxDocs=44218)
                0.012858556 = queryNorm
              0.39729604 = fieldWeight in 5230, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                3.6700637 = idf(docFreq=3061, maxDocs=44218)
                0.0625 = fieldNorm(doc=5230)
          0.10841141 = weight(abstract_txt:tags in 5230) [ClassicSimilarity], result of:
            0.10841141 = score(doc=5230,freq=3.0), product of:
              0.1584188 = queryWeight, product of:
                1.9488881 = boost
                6.321609 = idf(docFreq=215, maxDocs=44218)
                0.012858556 = queryNorm
              0.6843342 = fieldWeight in 5230, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                6.321609 = idf(docFreq=215, maxDocs=44218)
                0.0625 = fieldNorm(doc=5230)
          0.19894975 = weight(abstract_txt:meta in 5230) [ClassicSimilarity], result of:
            0.19894975 = score(doc=5230,freq=9.0), product of:
              0.16464375 = queryWeight, product of:
                1.9868091 = boost
                6.444614 = idf(docFreq=190, maxDocs=44218)
                0.012858556 = queryNorm
              1.2083651 = fieldWeight in 5230, product of:
                3.0 = tf(freq=9.0), with freq of:
                  9.0 = termFreq=9.0
                6.444614 = idf(docFreq=190, maxDocs=44218)
                0.0625 = fieldNorm(doc=5230)
          0.037330303 = weight(abstract_txt:authors in 5230) [ClassicSimilarity], result of:
            0.037330303 = score(doc=5230,freq=1.0), product of:
              0.12848978 = queryWeight, product of:
                2.1496286 = boost
                4.648501 = idf(docFreq=1150, maxDocs=44218)
                0.012858556 = queryNorm
              0.2905313 = fieldWeight in 5230, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.648501 = idf(docFreq=1150, maxDocs=44218)
                0.0625 = fieldNorm(doc=5230)
          0.27772823 = weight(abstract_txt:pages in 5230) [ClassicSimilarity], result of:
            0.27772823 = score(doc=5230,freq=2.0), product of:
              0.56053746 = queryWeight, product of:
                7.776642 = boost
                5.6055775 = idf(docFreq=441, maxDocs=44218)
                0.012858556 = queryNorm
              0.49546772 = fieldWeight in 5230, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                5.6055775 = idf(docFreq=441, maxDocs=44218)
                0.0625 = fieldNorm(doc=5230)
        0.2 = coord(5/25)
    
  4. Ajiferuke, I.; Wolfram, D.: Analysis of Web page image tag distribution characteristics (2005) 0.12
    0.1239733 = sum of:
      0.1239733 = product of:
        0.6198665 = sum of:
          0.012247636 = weight(abstract_txt:were in 1059) [ClassicSimilarity], result of:
            0.012247636 = score(doc=1059,freq=1.0), product of:
              0.05339476 = queryWeight, product of:
                1.1314435 = boost
                3.6700637 = idf(docFreq=3061, maxDocs=44218)
                0.012858556 = queryNorm
              0.22937898 = fieldWeight in 1059, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.6700637 = idf(docFreq=3061, maxDocs=44218)
                0.0625 = fieldNorm(doc=1059)
          0.088517554 = weight(abstract_txt:tags in 1059) [ClassicSimilarity], result of:
            0.088517554 = score(doc=1059,freq=2.0), product of:
              0.1584188 = queryWeight, product of:
                1.9488881 = boost
                6.321609 = idf(docFreq=215, maxDocs=44218)
                0.012858556 = queryNorm
              0.5587566 = fieldWeight in 1059, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                6.321609 = idf(docFreq=215, maxDocs=44218)
                0.0625 = fieldNorm(doc=1059)
          0.05279302 = weight(abstract_txt:authors in 1059) [ClassicSimilarity], result of:
            0.05279302 = score(doc=1059,freq=2.0), product of:
              0.12848978 = queryWeight, product of:
                2.1496286 = boost
                4.648501 = idf(docFreq=1150, maxDocs=44218)
                0.012858556 = queryNorm
              0.4108733 = fieldWeight in 1059, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.648501 = idf(docFreq=1150, maxDocs=44218)
                0.0625 = fieldNorm(doc=1059)
          0.027181419 = weight(abstract_txt:from in 1059) [ClassicSimilarity], result of:
            0.027181419 = score(doc=1059,freq=3.0), product of:
              0.09084727 = queryWeight, product of:
                2.556231 = boost
                2.7638826 = idf(docFreq=7577, maxDocs=44218)
                0.012858556 = queryNorm
              0.29919907 = fieldWeight in 1059, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                2.7638826 = idf(docFreq=7577, maxDocs=44218)
                0.0625 = fieldNorm(doc=1059)
          0.43912688 = weight(abstract_txt:pages in 1059) [ClassicSimilarity], result of:
            0.43912688 = score(doc=1059,freq=5.0), product of:
              0.56053746 = queryWeight, product of:
                7.776642 = boost
                5.6055775 = idf(docFreq=441, maxDocs=44218)
                0.012858556 = queryNorm
              0.7834033 = fieldWeight in 1059, product of:
                2.236068 = tf(freq=5.0), with freq of:
                  5.0 = termFreq=5.0
                5.6055775 = idf(docFreq=441, maxDocs=44218)
                0.0625 = fieldNorm(doc=1059)
        0.2 = coord(5/25)
    
  5. Craven, T.C.: Variations in use of meta tag descriptions by Web pages in different languages (2004) 0.10
    0.10293323 = sum of:
      0.10293323 = product of:
        0.85777694 = sum of:
          0.036742907 = weight(abstract_txt:were in 2569) [ClassicSimilarity], result of:
            0.036742907 = score(doc=2569,freq=4.0), product of:
              0.05339476 = queryWeight, product of:
                1.1314435 = boost
                3.6700637 = idf(docFreq=3061, maxDocs=44218)
                0.012858556 = queryNorm
              0.68813694 = fieldWeight in 2569, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                3.6700637 = idf(docFreq=3061, maxDocs=44218)
                0.09375 = fieldNorm(doc=2569)
          0.09947488 = weight(abstract_txt:meta in 2569) [ClassicSimilarity], result of:
            0.09947488 = score(doc=2569,freq=1.0), product of:
              0.16464375 = queryWeight, product of:
                1.9868091 = boost
                6.444614 = idf(docFreq=190, maxDocs=44218)
                0.012858556 = queryNorm
              0.60418254 = fieldWeight in 2569, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.444614 = idf(docFreq=190, maxDocs=44218)
                0.09375 = fieldNorm(doc=2569)
          0.72155917 = weight(abstract_txt:pages in 2569) [ClassicSimilarity], result of:
            0.72155917 = score(doc=2569,freq=6.0), product of:
              0.56053746 = queryWeight, product of:
                7.776642 = boost
                5.6055775 = idf(docFreq=441, maxDocs=44218)
                0.012858556 = queryNorm
              1.287263 = fieldWeight in 2569, product of:
                2.4494898 = tf(freq=6.0), with freq of:
                  6.0 = termFreq=6.0
                5.6055775 = idf(docFreq=441, maxDocs=44218)
                0.09375 = fieldNorm(doc=2569)
        0.12 = coord(3/25)
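
The content-match trees above combine several per-term scores. Each term contributes queryWeight (boost * idf * queryNorm) times fieldWeight (tf * idf * fieldNorm), and the sum is scaled by coord, the fraction of query terms that matched. The sketch below recomputes the last entry (doc 2569) under the same assumed tf and idf formulas that reproduce the printed values. The helper names are hypothetical, and the inputs are copied from the listing.

```python
import math

QUERY_NORM = 0.012858556  # queryNorm as printed in the listing
MAX_DOCS = 44218          # maxDocs as printed in the listing


def idf(doc_freq: int, max_docs: int = MAX_DOCS) -> float:
    """Assumed idf formula: 1 + ln(maxDocs / (docFreq + 1))."""
    return 1.0 + math.log(max_docs / (doc_freq + 1))


def term_score(boost: float, doc_freq: int, freq: float, field_norm: float) -> float:
    """score = queryWeight * fieldWeight for a single matching term."""
    term_idf = idf(doc_freq)
    query_weight = boost * term_idf * QUERY_NORM
    field_weight = math.sqrt(freq) * term_idf * field_norm
    return query_weight * field_weight


def document_score(terms, overlap: int, max_overlap: int) -> float:
    """Sum the per-term scores, then apply coord = overlap / max_overlap."""
    return sum(term_score(*t) for t in terms) * (overlap / max_overlap)


# (boost, docFreq, termFreq, fieldNorm) for the terms matched in doc 2569.
terms_2569 = [
    (1.1314435, 3061, 4.0, 0.09375),  # abstract_txt:were
    (1.9868091, 190, 1.0, 0.09375),   # abstract_txt:meta
    (7.776642, 441, 6.0, 0.09375),    # abstract_txt:pages
]

score_2569 = document_score(terms_2569, overlap=3, max_overlap=25)
print(f"score(doc=2569) = {score_2569:.6f}")
```

The coord factor explains the ranking: doc 2569 has the largest raw sum (0.85777694), but it matches only 3 of the 25 query terms, so it ranks below the entries that match 5 of 25.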