Document (#20675)

Author
Wulfekuhler, M.R.
Punch, W.F.
Title
Finding salient features for personal Web page categories
Source
Computer networks and ISDN systems. 29(1997) no.8, pp. 1147-1156
Year
1997
Abstract
Examines techniques that discover features in sets of pre-categorized documents, so that similar documents can be found on the WWW. Examines techniques that classify training examples with high accuracy, then explains why this is not necessarily useful. Describes a method for extracting word clusters from the raw document features. Results show that the clustering technique succeeds in discovering word groups in personal Web pages that can be used to find similar information on the WWW.
Footnote
Contribution to a special issue of papers from the 6th International World Wide Web conference, held 7-11 Apr 1997, Santa Clara, California
Theme
Internet
Automatic indexing
Metadata

Similar documents (content)

  1. Mao, J.; Cui, H.: Identifying bacterial biotope entities using sequence labeling : performance and feature analysis (2018) 0.18
    0.17740154 = sum of:
      0.17740154 = product of:
        0.6335769 = sum of:
          0.057011493 = weight(abstract_txt:accuracy in 1381) [ClassicSimilarity], result of:
            0.057011493 = score(doc=1381,freq=1.0), product of:
              0.15139823 = queryWeight, product of:
                1.1567633 = boost
                6.025063 = idf(docFreq=277, maxDocs=42306)
                0.021722743 = queryNorm
              0.37656644 = fieldWeight in 1381, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.025063 = idf(docFreq=277, maxDocs=42306)
                0.0625 = fieldNorm(doc=1381)
          0.12563841 = weight(abstract_txt:clusters in 1381) [ClassicSimilarity], result of:
            0.12563841 = score(doc=1381,freq=3.0), product of:
              0.17776805 = queryWeight, product of:
                1.2534615 = boost
                6.5287204 = idf(docFreq=167, maxDocs=42306)
                0.021722743 = queryNorm
              0.7067547 = fieldWeight in 1381, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                6.5287204 = idf(docFreq=167, maxDocs=42306)
                0.0625 = fieldNorm(doc=1381)
          0.0875881 = weight(abstract_txt:extracting in 1381) [ClassicSimilarity], result of:
            0.0875881 = score(doc=1381,freq=1.0), product of:
              0.20157775 = queryWeight, product of:
                1.3347669 = boost
                6.9522038 = idf(docFreq=109, maxDocs=42306)
                0.021722743 = queryNorm
              0.43451273 = fieldWeight in 1381, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.9522038 = idf(docFreq=109, maxDocs=42306)
                0.0625 = fieldNorm(doc=1381)
          0.018837638 = weight(abstract_txt:that in 1381) [ClassicSimilarity], result of:
            0.018837638 = score(doc=1381,freq=3.0), product of:
              0.07235971 = queryWeight, product of:
                1.3851384 = boost
                2.4048555 = idf(docFreq=10381, maxDocs=42306)
                0.021722743 = queryNorm
              0.26033324 = fieldWeight in 1381, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                2.4048555 = idf(docFreq=10381, maxDocs=42306)
                0.0625 = fieldNorm(doc=1381)
          0.048378736 = weight(abstract_txt:techniques in 1381) [ClassicSimilarity], result of:
            0.048378736 = score(doc=1381,freq=1.0), product of:
              0.17097221 = queryWeight, product of:
                1.7384487 = boost
                4.527401 = idf(docFreq=1242, maxDocs=42306)
                0.021722743 = queryNorm
              0.28296256 = fieldWeight in 1381, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.527401 = idf(docFreq=1242, maxDocs=42306)
                0.0625 = fieldNorm(doc=1381)
          0.1470019 = weight(abstract_txt:word in 1381) [ClassicSimilarity], result of:
            0.1470019 = score(doc=1381,freq=3.0), product of:
              0.2486933 = queryWeight, product of:
                2.0966752 = boost
                5.460322 = idf(docFreq=488, maxDocs=42306)
                0.021722743 = queryNorm
              0.5910972 = fieldWeight in 1381, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                5.460322 = idf(docFreq=488, maxDocs=42306)
                0.0625 = fieldNorm(doc=1381)
          0.14912061 = weight(abstract_txt:features in 1381) [ClassicSimilarity], result of:
            0.14912061 = score(doc=1381,freq=4.0), product of:
              0.26113078 = queryWeight, product of:
                2.6313207 = boost
                4.5684576 = idf(docFreq=1192, maxDocs=42306)
                0.021722743 = queryNorm
              0.5710572 = fieldWeight in 1381, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                4.5684576 = idf(docFreq=1192, maxDocs=42306)
                0.0625 = fieldNorm(doc=1381)
        0.28 = coord(7/25)
    
  2. Duwairi, R.; Al-Refai, M.N.; Khasawneh, N.: Feature reduction techniques for Arabic text categorization (2009) 0.17
    0.17247541 = sum of:
      0.17247541 = product of:
        0.6159836 = sum of:
          0.0368323 = weight(abstract_txt:categories in 170) [ClassicSimilarity], result of:
            0.0368323 = score(doc=170,freq=1.0), product of:
              0.113144055 = queryWeight, product of:
                5.208553 = idf(docFreq=628, maxDocs=42306)
                0.021722743 = queryNorm
              0.32553455 = fieldWeight in 170, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.208553 = idf(docFreq=628, maxDocs=42306)
                0.0625 = fieldNorm(doc=170)
          0.09874679 = weight(abstract_txt:accuracy in 170) [ClassicSimilarity], result of:
            0.09874679 = score(doc=170,freq=3.0), product of:
              0.15139823 = queryWeight, product of:
                1.1567633 = boost
                6.025063 = idf(docFreq=277, maxDocs=42306)
                0.021722743 = queryNorm
              0.65223217 = fieldWeight in 170, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                6.025063 = idf(docFreq=277, maxDocs=42306)
                0.0625 = fieldNorm(doc=170)
          0.14507474 = weight(abstract_txt:clusters in 170) [ClassicSimilarity], result of:
            0.14507474 = score(doc=170,freq=4.0), product of:
              0.17776805 = queryWeight, product of:
                1.2534615 = boost
                6.5287204 = idf(docFreq=167, maxDocs=42306)
                0.021722743 = queryNorm
              0.81609005 = fieldWeight in 170, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                6.5287204 = idf(docFreq=167, maxDocs=42306)
                0.0625 = fieldNorm(doc=170)
          0.018837638 = weight(abstract_txt:that in 170) [ClassicSimilarity], result of:
            0.018837638 = score(doc=170,freq=3.0), product of:
              0.07235971 = queryWeight, product of:
                1.3851384 = boost
                2.4048555 = idf(docFreq=10381, maxDocs=42306)
                0.021722743 = queryNorm
              0.26033324 = fieldWeight in 170, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                2.4048555 = idf(docFreq=10381, maxDocs=42306)
                0.0625 = fieldNorm(doc=170)
          0.062954515 = weight(abstract_txt:documents in 170) [ClassicSimilarity], result of:
            0.062954515 = score(doc=170,freq=3.0), product of:
              0.14129713 = queryWeight, product of:
                1.5803956 = boost
                4.115787 = idf(docFreq=1875, maxDocs=42306)
                0.021722743 = queryNorm
              0.445547 = fieldWeight in 170, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                4.115787 = idf(docFreq=1875, maxDocs=42306)
                0.0625 = fieldNorm(doc=170)
          0.08379442 = weight(abstract_txt:techniques in 170) [ClassicSimilarity], result of:
            0.08379442 = score(doc=170,freq=3.0), product of:
              0.17097221 = queryWeight, product of:
                1.7384487 = boost
                4.527401 = idf(docFreq=1242, maxDocs=42306)
                0.021722743 = queryNorm
              0.4901055 = fieldWeight in 170, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                4.527401 = idf(docFreq=1242, maxDocs=42306)
                0.0625 = fieldNorm(doc=170)
          0.16974318 = weight(abstract_txt:word in 170) [ClassicSimilarity], result of:
            0.16974318 = score(doc=170,freq=4.0), product of:
              0.2486933 = queryWeight, product of:
                2.0966752 = boost
                5.460322 = idf(docFreq=488, maxDocs=42306)
                0.021722743 = queryNorm
              0.68254024 = fieldWeight in 170, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                5.460322 = idf(docFreq=488, maxDocs=42306)
                0.0625 = fieldNorm(doc=170)
        0.28 = coord(7/25)
    
  3. Lee, Y.-H.; Wei, C.-P.; Hu, P.J.-H.: An ontology-based technique for preserving user preferences in document-category evolutions (2011) 0.17
    0.16578865 = sum of:
      0.16578865 = product of:
        0.51808953 = sum of:
          0.055820987 = weight(abstract_txt:categories in 1354) [ClassicSimilarity], result of:
            0.055820987 = score(doc=1354,freq=3.0), product of:
              0.113144055 = queryWeight, product of:
                5.208553 = idf(docFreq=628, maxDocs=42306)
                0.021722743 = queryNorm
              0.49336207 = fieldWeight in 1354, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                5.208553 = idf(docFreq=628, maxDocs=42306)
                0.0546875 = fieldNorm(doc=1354)
          0.1050827 = weight(abstract_txt:technique in 1354) [ClassicSimilarity], result of:
            0.1050827 = score(doc=1354,freq=7.0), product of:
              0.13005547 = queryWeight, product of:
                1.0721325 = boost
                5.5842586 = idf(docFreq=431, maxDocs=42306)
                0.021722743 = queryNorm
              0.8079837 = fieldWeight in 1354, product of:
                2.6457512 = tf(freq=7.0), with freq of:
                  7.0 = termFreq=7.0
                5.5842586 = idf(docFreq=431, maxDocs=42306)
                0.0546875 = fieldNorm(doc=1354)
          0.016375776 = weight(abstract_txt:which in 1354) [ClassicSimilarity], result of:
            0.016375776 = score(doc=1354,freq=2.0), product of:
              0.07204576 = queryWeight, product of:
                1.1285046 = boost
                2.938938 = idf(docFreq=6085, maxDocs=42306)
                0.021722743 = queryNorm
              0.22729687 = fieldWeight in 1354, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                2.938938 = idf(docFreq=6085, maxDocs=42306)
                0.0546875 = fieldNorm(doc=1354)
          0.095419355 = weight(abstract_txt:clustering in 1354) [ClassicSimilarity], result of:
            0.095419355 = score(doc=1354,freq=3.0), product of:
              0.161755 = queryWeight, product of:
                1.1956745 = boost
                6.227734 = idf(docFreq=226, maxDocs=42306)
                0.021722743 = queryNorm
              0.5899005 = fieldWeight in 1354, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                6.227734 = idf(docFreq=226, maxDocs=42306)
                0.0546875 = fieldNorm(doc=1354)
          0.016482932 = weight(abstract_txt:that in 1354) [ClassicSimilarity], result of:
            0.016482932 = score(doc=1354,freq=3.0), product of:
              0.07235971 = queryWeight, product of:
                1.3851384 = boost
                2.4048555 = idf(docFreq=10381, maxDocs=42306)
                0.021722743 = queryNorm
              0.22779158 = fieldWeight in 1354, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                2.4048555 = idf(docFreq=10381, maxDocs=42306)
                0.0546875 = fieldNorm(doc=1354)
          0.10050248 = weight(abstract_txt:salient in 1354) [ClassicSimilarity], result of:
            0.10050248 = score(doc=1354,freq=1.0), product of:
              0.24150439 = queryWeight, product of:
                1.4609879 = boost
                7.609633 = idf(docFreq=56, maxDocs=42306)
                0.021722743 = queryNorm
              0.4161518 = fieldWeight in 1354, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.609633 = idf(docFreq=56, maxDocs=42306)
                0.0546875 = fieldNorm(doc=1354)
          0.0550852 = weight(abstract_txt:documents in 1354) [ClassicSimilarity], result of:
            0.0550852 = score(doc=1354,freq=3.0), product of:
              0.14129713 = queryWeight, product of:
                1.5803956 = boost
                4.115787 = idf(docFreq=1875, maxDocs=42306)
                0.021722743 = queryNorm
              0.38985363 = fieldWeight in 1354, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                4.115787 = idf(docFreq=1875, maxDocs=42306)
                0.0546875 = fieldNorm(doc=1354)
          0.07332012 = weight(abstract_txt:techniques in 1354) [ClassicSimilarity], result of:
            0.07332012 = score(doc=1354,freq=3.0), product of:
              0.17097221 = queryWeight, product of:
                1.7384487 = boost
                4.527401 = idf(docFreq=1242, maxDocs=42306)
                0.021722743 = queryNorm
              0.4288423 = fieldWeight in 1354, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                4.527401 = idf(docFreq=1242, maxDocs=42306)
                0.0546875 = fieldNorm(doc=1354)
        0.32 = coord(8/25)
    
  4. Lin, Y.-R.; Margolin, D.; Lazer, D.: Uncovering social semantics from textual traces : a theory-driven approach and evidence from public statements of U.S. Members of Congress (2016) 0.16
    0.15531604 = sum of:
      0.15531604 = product of:
        0.48536265 = sum of:
          0.054397658 = weight(abstract_txt:finding in 79) [ClassicSimilarity], result of:
            0.054397658 = score(doc=79,freq=1.0), product of:
              0.12645207 = queryWeight, product of:
                1.0571755 = boost
                5.506355 = idf(docFreq=466, maxDocs=42306)
                0.021722743 = queryNorm
              0.43018398 = fieldWeight in 79, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.506355 = idf(docFreq=466, maxDocs=42306)
                0.078125 = fieldNorm(doc=79)
          0.056739327 = weight(abstract_txt:technique in 79) [ClassicSimilarity], result of:
            0.056739327 = score(doc=79,freq=1.0), product of:
              0.13005547 = queryWeight, product of:
                1.0721325 = boost
                5.5842586 = idf(docFreq=431, maxDocs=42306)
                0.021722743 = queryNorm
              0.4362702 = fieldWeight in 79, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.5842586 = idf(docFreq=431, maxDocs=42306)
                0.078125 = fieldNorm(doc=79)
          0.023393966 = weight(abstract_txt:which in 79) [ClassicSimilarity], result of:
            0.023393966 = score(doc=79,freq=2.0), product of:
              0.07204576 = queryWeight, product of:
                1.1285046 = boost
                2.938938 = idf(docFreq=6085, maxDocs=42306)
                0.021722743 = queryNorm
              0.32470983 = fieldWeight in 79, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                2.938938 = idf(docFreq=6085, maxDocs=42306)
                0.078125 = fieldNorm(doc=79)
          0.019226084 = weight(abstract_txt:that in 79) [ClassicSimilarity], result of:
            0.019226084 = score(doc=79,freq=2.0), product of:
              0.07235971 = queryWeight, product of:
                1.3851384 = boost
                2.4048555 = idf(docFreq=10381, maxDocs=42306)
                0.021722743 = queryNorm
              0.2657015 = fieldWeight in 79, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                2.4048555 = idf(docFreq=10381, maxDocs=42306)
                0.078125 = fieldNorm(doc=79)
          0.04543351 = weight(abstract_txt:documents in 79) [ClassicSimilarity], result of:
            0.04543351 = score(doc=79,freq=1.0), product of:
              0.14129713 = queryWeight, product of:
                1.5803956 = boost
                4.115787 = idf(docFreq=1875, maxDocs=42306)
                0.021722743 = queryNorm
              0.32154587 = fieldWeight in 79, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.115787 = idf(docFreq=1875, maxDocs=42306)
                0.078125 = fieldNorm(doc=79)
          0.06047342 = weight(abstract_txt:techniques in 79) [ClassicSimilarity], result of:
            0.06047342 = score(doc=79,freq=1.0), product of:
              0.17097221 = queryWeight, product of:
                1.7384487 = boost
                4.527401 = idf(docFreq=1242, maxDocs=42306)
                0.021722743 = queryNorm
              0.3537032 = fieldWeight in 79, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.527401 = idf(docFreq=1242, maxDocs=42306)
                0.078125 = fieldNorm(doc=79)
          0.093893446 = weight(abstract_txt:similar in 79) [ClassicSimilarity], result of:
            0.093893446 = score(doc=79,freq=1.0), product of:
              0.22924826 = queryWeight, product of:
                2.0130389 = boost
                5.2425094 = idf(docFreq=607, maxDocs=42306)
                0.021722743 = queryNorm
              0.40957105 = fieldWeight in 79, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.2425094 = idf(docFreq=607, maxDocs=42306)
                0.078125 = fieldNorm(doc=79)
          0.13180524 = weight(abstract_txt:features in 79) [ClassicSimilarity], result of:
            0.13180524 = score(doc=79,freq=2.0), product of:
              0.26113078 = queryWeight, product of:
                2.6313207 = boost
                4.5684576 = idf(docFreq=1192, maxDocs=42306)
                0.021722743 = queryNorm
              0.504748 = fieldWeight in 79, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.5684576 = idf(docFreq=1192, maxDocs=42306)
                0.078125 = fieldNorm(doc=79)
        0.32 = coord(8/25)
    
  5. Khoo, C.S.G.; Ng, K.; Ou, S.: An exploratory study of human clustering of Web pages (2003) 0.15
    0.15477285 = sum of:
      0.15477285 = product of:
        0.5527602 = sum of:
          0.087355465 = weight(abstract_txt:categories in 3742) [ClassicSimilarity], result of:
            0.087355465 = score(doc=3742,freq=10.0), product of:
              0.113144055 = queryWeight, product of:
                5.208553 = idf(docFreq=628, maxDocs=42306)
                0.021722743 = queryNorm
              0.772073 = fieldWeight in 3742, product of:
                3.1622777 = tf(freq=10.0), with freq of:
                  10.0 = termFreq=10.0
                5.208553 = idf(docFreq=628, maxDocs=42306)
                0.046875 = fieldNorm(doc=3742)
          0.10880605 = weight(abstract_txt:clusters in 3742) [ClassicSimilarity], result of:
            0.10880605 = score(doc=3742,freq=4.0), product of:
              0.17776805 = queryWeight, product of:
                1.2534615 = boost
                6.5287204 = idf(docFreq=167, maxDocs=42306)
                0.021722743 = queryNorm
              0.6120675 = fieldWeight in 3742, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                6.5287204 = idf(docFreq=167, maxDocs=42306)
                0.046875 = fieldNorm(doc=3742)
          0.021581225 = weight(abstract_txt:that in 3742) [ClassicSimilarity], result of:
            0.021581225 = score(doc=3742,freq=7.0), product of:
              0.07235971 = queryWeight, product of:
                1.3851384 = boost
                2.4048555 = idf(docFreq=10381, maxDocs=42306)
                0.021722743 = queryNorm
              0.29824919 = fieldWeight in 3742, product of:
                2.6457512 = tf(freq=7.0), with freq of:
                  7.0 = termFreq=7.0
                2.4048555 = idf(docFreq=10381, maxDocs=42306)
                0.046875 = fieldNorm(doc=3742)
          0.03855161 = weight(abstract_txt:documents in 3742) [ClassicSimilarity], result of:
            0.03855161 = score(doc=3742,freq=2.0), product of:
              0.14129713 = queryWeight, product of:
                1.5803956 = boost
                4.115787 = idf(docFreq=1875, maxDocs=42306)
                0.021722743 = queryNorm
              0.2728407 = fieldWeight in 3742, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.115787 = idf(docFreq=1875, maxDocs=42306)
                0.046875 = fieldNorm(doc=3742)
          0.03628405 = weight(abstract_txt:techniques in 3742) [ClassicSimilarity], result of:
            0.03628405 = score(doc=3742,freq=1.0), product of:
              0.17097221 = queryWeight, product of:
                1.7384487 = boost
                4.527401 = idf(docFreq=1242, maxDocs=42306)
                0.021722743 = queryNorm
              0.21222192 = fieldWeight in 3742, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.527401 = idf(docFreq=1242, maxDocs=42306)
                0.046875 = fieldNorm(doc=3742)
          0.20426156 = weight(abstract_txt:pages in 3742) [ClassicSimilarity], result of:
            0.20426156 = score(doc=3742,freq=9.0), product of:
              0.26011094 = queryWeight, product of:
                2.144265 = boost
                5.5842586 = idf(docFreq=431, maxDocs=42306)
                0.021722743 = queryNorm
              0.7852863 = fieldWeight in 3742, product of:
                3.0 = tf(freq=9.0), with freq of:
                  9.0 = termFreq=9.0
                5.5842586 = idf(docFreq=431, maxDocs=42306)
                0.046875 = fieldNorm(doc=3742)
          0.05592023 = weight(abstract_txt:features in 3742) [ClassicSimilarity], result of:
            0.05592023 = score(doc=3742,freq=1.0), product of:
              0.26113078 = queryWeight, product of:
                2.6313207 = boost
                4.5684576 = idf(docFreq=1192, maxDocs=42306)
                0.021722743 = queryNorm
              0.21414645 = fieldWeight in 3742, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.5684576 = idf(docFreq=1192, maxDocs=42306)
                0.046875 = fieldNorm(doc=3742)
        0.28 = coord(7/25)
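
The score breakdowns above all follow the same arithmetic, which can be reproduced by hand. The following is a minimal Python sketch, not the search engine's own code. It assumes the idf form 1 + ln(maxDocs / (docFreq + 1)), which matches the idf values listed above, and it takes boost, queryNorm and fieldNorm directly from the listing rather than deriving them. The function names (idf, term_score, document_score) are illustrative only. The input values are those shown for entry 5 (doc 3742).

```python
import math

MAX_DOCS = 42306          # maxDocs, as shown in the listing
QUERY_NORM = 0.021722743  # queryNorm, as shown in the listing


def idf(doc_freq, max_docs=MAX_DOCS):
    # Assumed form; it reproduces the idf values printed in the listing.
    return 1.0 + math.log(max_docs / (doc_freq + 1))


def term_score(freq, doc_freq, boost, field_norm):
    # queryWeight = boost * idf * queryNorm
    # fieldWeight = tf * idf * fieldNorm, with tf = sqrt(termFreq)
    i = idf(doc_freq)
    query_weight = boost * i * QUERY_NORM
    field_weight = math.sqrt(freq) * i * field_norm
    return query_weight * field_weight


def document_score(terms, matched, total, field_norm):
    # Sum of the per-term scores, scaled by coord = matched / total.
    raw = sum(term_score(f, df, b, field_norm) for f, df, b in terms)
    return raw * matched / total


# (termFreq, docFreq, boost) for entry 5, doc 3742; a boost of 1.0 is
# used where the listing shows no boost line.
terms_3742 = [
    (10.0, 628, 1.0),          # categories
    (4.0, 167, 1.2534615),     # clusters
    (7.0, 10381, 1.3851384),   # that
    (2.0, 1875, 1.5803956),    # documents
    (1.0, 1242, 1.7384487),    # techniques
    (9.0, 431, 2.144265),      # pages
    (1.0, 1192, 2.6313207),    # features
]

score_3742 = document_score(terms_3742, matched=7, total=25,
                            field_norm=0.046875)
print(round(score_3742, 4))
```

The result should agree with the listed 0.15477285 to within rounding, since the listing uses single-precision arithmetic. The same functions can be applied to the other entries by substituting their termFreq, docFreq, boost, fieldNorm and coord values.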