Search (80 results, page 1 of 4)

  • language_ss:"e"
  • theme_ss:"Automatisches Klassifizieren"
  • type_ss:"a"
  1. Dubin, D.: Dimensions and discriminability (1998) 0.11
    0.10835254 = product of:
      0.16252881 = sum of:
        0.03873757 = weight(_text_:science in 2338) [ClassicSimilarity], result of:
          0.03873757 = score(doc=2338,freq=4.0), product of:
            0.13445559 = queryWeight, product of:
              2.6341193 = idf(docFreq=8627, maxDocs=44218)
              0.05104385 = queryNorm
            0.2881068 = fieldWeight in 2338, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              2.6341193 = idf(docFreq=8627, maxDocs=44218)
              0.0546875 = fieldNorm(doc=2338)
        0.12379125 = sum of:
          0.075381085 = weight(_text_:index in 2338) [ClassicSimilarity], result of:
            0.075381085 = score(doc=2338,freq=2.0), product of:
              0.22304957 = queryWeight, product of:
                4.369764 = idf(docFreq=1520, maxDocs=44218)
                0.05104385 = queryNorm
              0.33795667 = fieldWeight in 2338, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.369764 = idf(docFreq=1520, maxDocs=44218)
                0.0546875 = fieldNorm(doc=2338)
          0.04841016 = weight(_text_:22 in 2338) [ClassicSimilarity], result of:
            0.04841016 = score(doc=2338,freq=2.0), product of:
              0.17874686 = queryWeight, product of:
                3.5018296 = idf(docFreq=3622, maxDocs=44218)
                0.05104385 = queryNorm
              0.2708308 = fieldWeight in 2338, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                3.5018296 = idf(docFreq=3622, maxDocs=44218)
                0.0546875 = fieldNorm(doc=2338)
      0.6666667 = coord(2/3)
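    The indented tree under each hit is Lucene's ClassicSimilarity explain output, and every number in it can be recomputed: tf = sqrt(freq), idf = 1 + ln(maxDocs / (docFreq + 1)), queryWeight = idf * queryNorm, fieldWeight = tf * idf * fieldNorm, and the per-term score is queryWeight * fieldWeight; the hit's final score is the sum of its term scores scaled by the coord factor. A minimal Python sketch (the function name is ours, not part of the record) reproducing the "science in 2338" branch above:

        import math

        def term_score(freq, doc_freq, max_docs, query_norm, field_norm):
            """One term's contribution under Lucene ClassicSimilarity (tf-idf)."""
            tf = math.sqrt(freq)                               # 2.0 = tf(freq=4.0)
            idf = 1.0 + math.log(max_docs / (doc_freq + 1.0))  # 2.6341193 = idf(docFreq=8627)
            query_weight = idf * query_norm                    # 0.13445559 = queryWeight
            field_weight = tf * idf * field_norm               # 0.2881068 = fieldWeight
            return query_weight * field_weight

        # Values read off the "weight(_text_:science in 2338)" branch:
        s = term_score(freq=4.0, doc_freq=8627, max_docs=44218,
                       query_norm=0.05104385, field_norm=0.0546875)
        print(s)  # ~0.03873757; summed with the 0.12379125 branch and scaled
                  # by coord(2/3), this yields the document score 0.10835254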
    
    Abstract
    Visualization interfaces can improve subject access by highlighting the inclusion of document representation components in similarity and discrimination relationships. Within a set of retrieved documents, what kinds of groupings can index terms and subject headings make explicit? The role of controlled vocabulary in classifying search output is examined.
    Date
    22. 9.1997 19:16:05
    Imprint
    Urbana-Champaign, IL : University of Illinois at Urbana-Champaign, Graduate School of Library and Information Science
    Source
    Visualizing subject access for 21st century information resources: Papers presented at the 1997 Clinic on Library Applications of Data Processing, 2-4 Mar 1997, Graduate School of Library and Information Science, University of Illinois at Urbana-Champaign. Ed.: P.A. Cochrane et al.
  2. Hotho, A.; Bloehdorn, S.: Data Mining 2004 : Text classification by boosting weak learners based on terms and concepts (2004) 0.07
    0.06787892 = product of:
      0.10181837 = sum of:
        0.08107116 = product of:
          0.24321347 = sum of:
            0.24321347 = weight(_text_:3a in 562) [ClassicSimilarity], result of:
              0.24321347 = score(doc=562,freq=2.0), product of:
                0.4327503 = queryWeight, product of:
                  8.478011 = idf(docFreq=24, maxDocs=44218)
                  0.05104385 = queryNorm
                0.56201804 = fieldWeight in 562, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  8.478011 = idf(docFreq=24, maxDocs=44218)
                  0.046875 = fieldNorm(doc=562)
          0.33333334 = coord(1/3)
        0.02074721 = product of:
          0.04149442 = sum of:
            0.04149442 = weight(_text_:22 in 562) [ClassicSimilarity], result of:
              0.04149442 = score(doc=562,freq=2.0), product of:
                0.17874686 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.05104385 = queryNorm
                0.23214069 = fieldWeight in 562, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.046875 = fieldNorm(doc=562)
          0.5 = coord(1/2)
      0.6666667 = coord(2/3)
    
    Content
    Cf.: http://www.google.de/url?sa=t&rct=j&q=&esrc=s&source=web&cd=1&cad=rja&ved=0CEAQFjAA&url=http%3A%2F%2Fciteseerx.ist.psu.edu%2Fviewdoc%2Fdownload%3Fdoi%3D10.1.1.91.4940%26rep%3Drep1%26type%3Dpdf&ei=dOXrUMeIDYHDtQahsIGACg&usg=AFQjCNHFWVh6gNPvnOrOS9R3rkrXCNVD-A&sig2=5I2F5evRfMnsttSgFF9g7Q&bvm=bv.1357316858,d.Yms.
    Date
    8. 1.2013 10:22:32
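    This record carries no abstract, only the link above, but the title names the technique: boosting weak learners over term (and concept) features. A hedged sketch of that general idea with scikit-learn, using plain bag-of-words terms only (the toy corpus below is invented, and the paper's ontology-concept features are omitted):

        from sklearn.ensemble import AdaBoostClassifier
        from sklearn.feature_extraction.text import CountVectorizer
        from sklearn.pipeline import make_pipeline

        # Invented two-class toy corpus; 0 = finance, 1 = sport.
        docs = ["stocks fell sharply today", "the striker scored twice",
                "markets rallied on earnings", "the match ended in a draw"]
        labels = [0, 1, 0, 1]

        clf = make_pipeline(
            CountVectorizer(),                    # term (bag-of-words) features
            AdaBoostClassifier(n_estimators=50),  # boosts depth-1 decision stumps
        )
        clf.fit(docs, labels)
        print(clf.predict(["earnings report lifted stocks"]))  # expected: [0]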
  3. HaCohen-Kerner, Y. et al.: Classification using various machine learning methods and combinations of key-phrases and visual features (2016) 0.05
    0.049139693 = product of:
      0.07370954 = sum of:
        0.039130855 = weight(_text_:science in 2748) [ClassicSimilarity], result of:
          0.039130855 = score(doc=2748,freq=2.0), product of:
            0.13445559 = queryWeight, product of:
              2.6341193 = idf(docFreq=8627, maxDocs=44218)
              0.05104385 = queryNorm
            0.2910318 = fieldWeight in 2748, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              2.6341193 = idf(docFreq=8627, maxDocs=44218)
              0.078125 = fieldNorm(doc=2748)
        0.034578685 = product of:
          0.06915737 = sum of:
            0.06915737 = weight(_text_:22 in 2748) [ClassicSimilarity], result of:
              0.06915737 = score(doc=2748,freq=2.0), product of:
                0.17874686 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.05104385 = queryNorm
                0.38690117 = fieldWeight in 2748, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.078125 = fieldNorm(doc=2748)
          0.5 = coord(1/2)
      0.6666667 = coord(2/3)
    
    Date
    1. 2.2016 18:25:22
    Series
    Lecture notes in computer science ; 9398
  4. Yoon, Y.; Lee, C.; Lee, G.G.: An effective procedure for constructing a hierarchical text classification system (2006) 0.03
    0.03439779 = product of:
      0.05159668 = sum of:
        0.027391598 = weight(_text_:science in 5273) [ClassicSimilarity], result of:
          0.027391598 = score(doc=5273,freq=2.0), product of:
            0.13445559 = queryWeight, product of:
              2.6341193 = idf(docFreq=8627, maxDocs=44218)
              0.05104385 = queryNorm
            0.20372227 = fieldWeight in 5273, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              2.6341193 = idf(docFreq=8627, maxDocs=44218)
              0.0546875 = fieldNorm(doc=5273)
        0.02420508 = product of:
          0.04841016 = sum of:
            0.04841016 = weight(_text_:22 in 5273) [ClassicSimilarity], result of:
              0.04841016 = score(doc=5273,freq=2.0), product of:
                0.17874686 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.05104385 = queryNorm
                0.2708308 = fieldWeight in 5273, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.0546875 = fieldNorm(doc=5273)
          0.5 = coord(1/2)
      0.6666667 = coord(2/3)
    
    Date
    22. 7.2006 16:24:52
    Source
    Journal of the American Society for Information Science and Technology. 57(2006) no.3, S.431-442
  5. Liu, R.-L.: Context recognition for hierarchical text classification (2009) 0.03
    0.029483816 = product of:
      0.044225723 = sum of:
        0.023478512 = weight(_text_:science in 2760) [ClassicSimilarity], result of:
          0.023478512 = score(doc=2760,freq=2.0), product of:
            0.13445559 = queryWeight, product of:
              2.6341193 = idf(docFreq=8627, maxDocs=44218)
              0.05104385 = queryNorm
            0.17461908 = fieldWeight in 2760, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              2.6341193 = idf(docFreq=8627, maxDocs=44218)
              0.046875 = fieldNorm(doc=2760)
        0.02074721 = product of:
          0.04149442 = sum of:
            0.04149442 = weight(_text_:22 in 2760) [ClassicSimilarity], result of:
              0.04149442 = score(doc=2760,freq=2.0), product of:
                0.17874686 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.05104385 = queryNorm
                0.23214069 = fieldWeight in 2760, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.046875 = fieldNorm(doc=2760)
          0.5 = coord(1/2)
      0.6666667 = coord(2/3)
    
    Date
    22. 3.2009 19:11:54
    Source
    Journal of the American Society for Information Science and Technology. 60(2009) no.4, S.803-813
  6. Zhu, W.Z.; Allen, R.B.: Document clustering using the LSI subspace signature model (2013) 0.03
    0.029483816 = product of:
      0.044225723 = sum of:
        0.023478512 = weight(_text_:science in 690) [ClassicSimilarity], result of:
          0.023478512 = score(doc=690,freq=2.0), product of:
            0.13445559 = queryWeight, product of:
              2.6341193 = idf(docFreq=8627, maxDocs=44218)
              0.05104385 = queryNorm
            0.17461908 = fieldWeight in 690, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              2.6341193 = idf(docFreq=8627, maxDocs=44218)
              0.046875 = fieldNorm(doc=690)
        0.02074721 = product of:
          0.04149442 = sum of:
            0.04149442 = weight(_text_:22 in 690) [ClassicSimilarity], result of:
              0.04149442 = score(doc=690,freq=2.0), product of:
                0.17874686 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.05104385 = queryNorm
                0.23214069 = fieldWeight in 690, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.046875 = fieldNorm(doc=690)
          0.5 = coord(1/2)
      0.6666667 = coord(2/3)
    
    Date
    23. 3.2013 13:22:36
    Source
    Journal of the American Society for Information Science and Technology. 64(2013) no.4, S.844-860
  7. Egbert, J.; Biber, D.; Davies, M.: Developing a bottom-up, user-based method of web register classification (2015) 0.03
    0.029483816 = product of:
      0.044225723 = sum of:
        0.023478512 = weight(_text_:science in 2158) [ClassicSimilarity], result of:
          0.023478512 = score(doc=2158,freq=2.0), product of:
            0.13445559 = queryWeight, product of:
              2.6341193 = idf(docFreq=8627, maxDocs=44218)
              0.05104385 = queryNorm
            0.17461908 = fieldWeight in 2158, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              2.6341193 = idf(docFreq=8627, maxDocs=44218)
              0.046875 = fieldNorm(doc=2158)
        0.02074721 = product of:
          0.04149442 = sum of:
            0.04149442 = weight(_text_:22 in 2158) [ClassicSimilarity], result of:
              0.04149442 = score(doc=2158,freq=2.0), product of:
                0.17874686 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.05104385 = queryNorm
                0.23214069 = fieldWeight in 2158, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.046875 = fieldNorm(doc=2158)
          0.5 = coord(1/2)
      0.6666667 = coord(2/3)
    
    Date
    4. 8.2015 19:22:04
    Source
    Journal of the Association for Information Science and Technology. 66(2015) no.9, S.1817-1831
  8. Mengle, S.; Goharian, N.: Passage detection using text classification (2009) 0.02
    0.024569847 = product of:
      0.03685477 = sum of:
        0.019565428 = weight(_text_:science in 2765) [ClassicSimilarity], result of:
          0.019565428 = score(doc=2765,freq=2.0), product of:
            0.13445559 = queryWeight, product of:
              2.6341193 = idf(docFreq=8627, maxDocs=44218)
              0.05104385 = queryNorm
            0.1455159 = fieldWeight in 2765, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              2.6341193 = idf(docFreq=8627, maxDocs=44218)
              0.0390625 = fieldNorm(doc=2765)
        0.017289342 = product of:
          0.034578685 = sum of:
            0.034578685 = weight(_text_:22 in 2765) [ClassicSimilarity], result of:
              0.034578685 = score(doc=2765,freq=2.0), product of:
                0.17874686 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.05104385 = queryNorm
                0.19345059 = fieldWeight in 2765, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=2765)
          0.5 = coord(1/2)
      0.6666667 = coord(2/3)
    
    Date
    22. 3.2009 19:14:43
    Source
    Journal of the American Society for Information Science and Technology. 60(2009) no.4, S.814-825
  9. Liu, R.-L.: A passage extractor for classification of disease aspect information (2013) 0.02
    0.024569847 = product of:
      0.03685477 = sum of:
        0.019565428 = weight(_text_:science in 1107) [ClassicSimilarity], result of:
          0.019565428 = score(doc=1107,freq=2.0), product of:
            0.13445559 = queryWeight, product of:
              2.6341193 = idf(docFreq=8627, maxDocs=44218)
              0.05104385 = queryNorm
            0.1455159 = fieldWeight in 1107, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              2.6341193 = idf(docFreq=8627, maxDocs=44218)
              0.0390625 = fieldNorm(doc=1107)
        0.017289342 = product of:
          0.034578685 = sum of:
            0.034578685 = weight(_text_:22 in 1107) [ClassicSimilarity], result of:
              0.034578685 = score(doc=1107,freq=2.0), product of:
                0.17874686 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.05104385 = queryNorm
                0.19345059 = fieldWeight in 1107, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=1107)
          0.5 = coord(1/2)
      0.6666667 = coord(2/3)
    
    Date
    28.10.2013 19:22:57
    Source
    Journal of the American Society for Information Science and Technology. 64(2013) no.11, S.2265-2277
  10. Borko, H.: Research in computer based classification systems (1985) 0.02
    0.021694047 = product of:
      0.03254107 = sum of:
        0.013695799 = weight(_text_:science in 3647) [ClassicSimilarity], result of:
          0.013695799 = score(doc=3647,freq=2.0), product of:
            0.13445559 = queryWeight, product of:
              2.6341193 = idf(docFreq=8627, maxDocs=44218)
              0.05104385 = queryNorm
            0.101861134 = fieldWeight in 3647, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              2.6341193 = idf(docFreq=8627, maxDocs=44218)
              0.02734375 = fieldNorm(doc=3647)
        0.018845271 = product of:
          0.037690543 = sum of:
            0.037690543 = weight(_text_:index in 3647) [ClassicSimilarity], result of:
              0.037690543 = score(doc=3647,freq=2.0), product of:
                0.22304957 = queryWeight, product of:
                  4.369764 = idf(docFreq=1520, maxDocs=44218)
                  0.05104385 = queryNorm
                0.16897833 = fieldWeight in 3647, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  4.369764 = idf(docFreq=1520, maxDocs=44218)
                  0.02734375 = fieldNorm(doc=3647)
          0.5 = coord(1/2)
      0.6666667 = coord(2/3)
    
    Abstract
    The selection in this reader by R. M. Needham and K. Sparck Jones reports an early approach to automatic classification that was taken in England. The following selection reviews various approaches that were being pursued in the United States at about the same time. It then discusses a particular approach initiated in the early 1960s by Harold Borko, at that time Head of the Language Processing and Retrieval Research Staff at the System Development Corporation, Santa Monica, California and, since 1966, a member of the faculty at the Graduate School of Library and Information Science, University of California, Los Angeles. As was described earlier, there are two steps in automatic classification, the first being to identify pairs of terms that are similar by virtue of co-occurring as index terms in the same documents, and the second being to form equivalence classes of intersubstitutable terms. To compute similarities, Borko and his associates used a standard correlation formula; to derive classification categories, where Needham and Sparck Jones used clumping, the Borko team used the statistical technique of factor analysis.
    The fact that documents can be classified automatically, and in any number of ways, is worthy of passing notice. Worthy of serious attention would be a demonstration that a computer-based classification system was effective in the organization and retrieval of documents. One reason for the inclusion of the following selection in the reader is that it addresses the question of evaluation. To evaluate the effectiveness of their automatically derived classification, Borko and his team asked three questions. The first was: Is the classification reliable? In other words, could the categories derived from one sample of texts be used to classify other texts? Reliability was assessed by a case-study comparison of the classes derived from three different samples of abstracts. The not-so-surprising conclusion reached was that automatically derived classes were reliable only to the extent that the sample from which they were derived was representative of the total document collection.
    The second evaluation question asked whether the classification was reasonable, in the sense of adequately describing the content of the document collection. The answer was sought by comparing the automatically derived categories with categories in a related classification system that was manually constructed. Here the conclusion was that the automatic method yielded categories that fairly accurately reflected the major area of interest in the sample collection of texts; however, since there were only eleven such categories and they were quite broad, they could not be regarded as suitable for use in a university or any large general library. The third evaluation question asked whether automatic classification was accurate, in the sense of producing results similar to those obtainable by human classifiers. When using human classification as a criterion, automatic classification was found to be 50 percent accurate.
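    A compact sketch of the two steps described above - term-pair similarity from index-term co-occurrence, then equivalence classes of intersubstitutable terms - assuming a small binary term-document matrix; note that where Borko derived categories with factor analysis, this sketch substitutes a simple correlation-threshold grouping:

        import numpy as np

        # Rows = terms, columns = documents; 1 means the term indexes the document.
        X = np.array([
            [1, 1, 0, 0, 1],   # "retrieval"
            [1, 1, 0, 0, 1],   # "indexing"  (co-occurs with "retrieval")
            [0, 0, 1, 1, 0],   # "circuits"
            [0, 0, 1, 1, 0],   # "voltage"   (co-occurs with "circuits")
        ], dtype=float)
        terms = ["retrieval", "indexing", "circuits", "voltage"]

        # Step 1: a standard correlation formula over term occurrence profiles.
        R = np.corrcoef(X)

        # Step 2: greedily group terms whose correlation with a seed term
        # exceeds a threshold (Borko used factor analysis for this step).
        threshold = 0.8
        classes, unassigned = [], set(range(len(terms)))
        while unassigned:
            seed = unassigned.pop()
            group = {seed} | {j for j in unassigned if R[seed, j] >= threshold}
            unassigned -= group
            classes.append([terms[i] for i in sorted(group)])
        print(classes)  # pairs "retrieval"/"indexing" and "circuits"/"voltage"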
  11. Ardö, A.; Koch, T.: Automatic classification applied to full-text Internet documents in a robot-generated subject index (1999) 0.02
    0.021537453 = product of:
      0.06461236 = sum of:
        0.06461236 = product of:
          0.12922472 = sum of:
            0.12922472 = weight(_text_:index in 382) [ClassicSimilarity], result of:
              0.12922472 = score(doc=382,freq=2.0), product of:
                0.22304957 = queryWeight, product of:
                  4.369764 = idf(docFreq=1520, maxDocs=44218)
                  0.05104385 = queryNorm
                0.5793543 = fieldWeight in 382, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  4.369764 = idf(docFreq=1520, maxDocs=44218)
                  0.09375 = fieldNorm(doc=382)
          0.5 = coord(1/2)
      0.33333334 = coord(1/3)
    
  12. Lindholm, J.; Schönthal, T.; Jansson, K.: Experiences of harvesting Web resources in engineering using automatic classification (2003) 0.02
    0.020305708 = product of:
      0.06091712 = sum of:
        0.06091712 = product of:
          0.12183424 = sum of:
            0.12183424 = weight(_text_:index in 4088) [ClassicSimilarity], result of:
              0.12183424 = score(doc=4088,freq=4.0), product of:
                0.22304957 = queryWeight, product of:
                  4.369764 = idf(docFreq=1520, maxDocs=44218)
                  0.05104385 = queryNorm
                0.5462205 = fieldWeight in 4088, product of:
                  2.0 = tf(freq=4.0), with freq of:
                    4.0 = termFreq=4.0
                  4.369764 = idf(docFreq=1520, maxDocs=44218)
                  0.0625 = fieldNorm(doc=4088)
          0.5 = coord(1/2)
      0.33333334 = coord(1/3)
    
    Abstract
    The authors describe the background and the work involved in setting up Engine-e, a Web index that uses automatic classification as a means for selecting resources in engineering. Considerations in offering a robot-generated Web index as a successor to a manually indexed, quality-controlled subject gateway are also discussed.
  13. Losee, R.M.; Haas, S.W.: Sublanguage terms : dictionaries, usage, and automatic classification (1995) 0.01
    0.014757169 = product of:
      0.044271506 = sum of:
        0.044271506 = weight(_text_:science in 2650) [ClassicSimilarity], result of:
          0.044271506 = score(doc=2650,freq=4.0), product of:
            0.13445559 = queryWeight, product of:
              2.6341193 = idf(docFreq=8627, maxDocs=44218)
              0.05104385 = queryNorm
            0.3292649 = fieldWeight in 2650, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              2.6341193 = idf(docFreq=8627, maxDocs=44218)
              0.0625 = fieldNorm(doc=2650)
      0.33333334 = coord(1/3)
    
    Abstract
    The use of terms from natural and social science titles and abstracts is studied from the perspective of sublanguages and their specialized dictionaries. Explores different notions of sublanguage distinctiveness. Objective methods for separating hard and soft sciences are suggested based on measures of sublanguage use, dictionary characteristics, and sublanguage distinctiveness. Abstracts were automatically classified with a high degree of accuracy by using a formula that considers the degree of uniqueness of terms in each sublanguage. This may prove useful for text filtering in information retrieval systems.
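    The abstract does not reproduce the formula itself; as one hedged reading of "degree of uniqueness", the sketch below weights each term by the reciprocal of the number of sublanguage dictionaries containing it and classifies a text to the best-scoring sublanguage (the dictionaries and the weighting are our illustration, not the authors' exact method):

        # Hypothetical sublanguage dictionaries; in the paper these are built
        # from natural- and social-science titles and abstracts.
        dictionaries = {
            "hard": {"quark", "enzyme", "voltage", "theorem", "catalyst"},
            "soft": {"kinship", "discourse", "attitude", "ethnography", "norm"},
        }

        def uniqueness(term):
            """1/k for a term found in k sublanguage dictionaries, else 0."""
            k = sum(term in d for d in dictionaries.values())
            return 1.0 / k if k else 0.0

        def classify(text):
            tokens = text.lower().split()
            scores = {name: sum(uniqueness(t) for t in tokens if t in d)
                      for name, d in dictionaries.items()}
            return max(scores, key=scores.get)

        print(classify("the enzyme catalyst raised the voltage"))  # -> hard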
    Source
    Journal of the American Society for Information Science. 46(1995) no.7, S.519-529
  14. Subramanian, S.; Shafer, K.E.: Clustering (2001) 0.01
    0.013831474 = product of:
      0.04149442 = sum of:
        0.04149442 = product of:
          0.08298884 = sum of:
            0.08298884 = weight(_text_:22 in 1046) [ClassicSimilarity], result of:
              0.08298884 = score(doc=1046,freq=2.0), product of:
                0.17874686 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.05104385 = queryNorm
                0.46428138 = fieldWeight in 1046, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.09375 = fieldNorm(doc=1046)
          0.5 = coord(1/2)
      0.33333334 = coord(1/3)
    
    Date
    5. 5.2003 14:17:22
  15. Fang, H.: Classifying research articles in multidisciplinary sciences journals into subject categories (2015) 0.01
    0.013043619 = product of:
      0.039130855 = sum of:
        0.039130855 = weight(_text_:science in 2194) [ClassicSimilarity], result of:
          0.039130855 = score(doc=2194,freq=8.0), product of:
            0.13445559 = queryWeight, product of:
              2.6341193 = idf(docFreq=8627, maxDocs=44218)
              0.05104385 = queryNorm
            0.2910318 = fieldWeight in 2194, product of:
              2.828427 = tf(freq=8.0), with freq of:
                8.0 = termFreq=8.0
              2.6341193 = idf(docFreq=8627, maxDocs=44218)
              0.0390625 = fieldNorm(doc=2194)
      0.33333334 = coord(1/3)
    
    Abstract
    In the Thomson Reuters Web of Science database, the subject categories of a journal are applied to all articles in the journal. However, many articles in multidisciplinary Sciences journals may only be represented by a small number of subject categories. To provide more accurate information on the research areas of articles in such journals, we can classify articles in these journals into subject categories as defined by Web of Science based on their references. For an article in a multidisciplinary sciences journal, the method counts the subject categories in all of the article's references indexed by Web of Science, and uses the most numerous subject categories of the references to determine the most appropriate classification of the article. We used articles in an issue of Proceedings of the National Academy of Sciences (PNAS) to validate the correctness of the method by comparing the obtained results with the categories of the articles as defined by PNAS and their content. This study shows that the method provides more precise search results for the subject category of interest in bibliometric investigations through recognition of articles in multidisciplinary sciences journals whose work relates to a particular subject category.
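    The counting rule is straightforward to sketch, assuming each reference's Web of Science subject categories are already at hand (the categories below are invented for illustration):

        from collections import Counter

        def classify_by_references(reference_categories, top_n=1):
            """Assign the subject categories occurring most often among an
            article's references (a majority vote over reference categories)."""
            counts = Counter(cat for ref in reference_categories for cat in ref)
            return [cat for cat, _ in counts.most_common(top_n)]

        # WoS categories of four (invented) references of one article:
        refs = [
            ["Biochemistry & Molecular Biology"],
            ["Biochemistry & Molecular Biology", "Cell Biology"],
            ["Cell Biology"],
            ["Biochemistry & Molecular Biology"],
        ]
        print(classify_by_references(refs))  # ['Biochemistry & Molecular Biology']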
    Object
    Web of Science
  16. Suominen, A.; Toivanen, H.: Map of science with topic modeling : comparison of unsupervised learning and human-assigned subject classification (2016) 0.01
    0.013043619 = product of:
      0.039130855 = sum of:
        0.039130855 = weight(_text_:science in 3121) [ClassicSimilarity], result of:
          0.039130855 = score(doc=3121,freq=8.0), product of:
            0.13445559 = queryWeight, product of:
              2.6341193 = idf(docFreq=8627, maxDocs=44218)
              0.05104385 = queryNorm
            0.2910318 = fieldWeight in 3121, product of:
              2.828427 = tf(freq=8.0), with freq of:
                8.0 = termFreq=8.0
              2.6341193 = idf(docFreq=8627, maxDocs=44218)
              0.0390625 = fieldNorm(doc=3121)
      0.33333334 = coord(1/3)
    
    Abstract
    The delineation of coordinates is fundamental for the cartography of science, and accurate and credible classification of scientific knowledge presents a persistent challenge in this regard. We present a map of Finnish science based on unsupervised-learning classification, and discuss the advantages and disadvantages of this approach vis-à-vis those generated by human reasoning. We conclude that from theoretical and practical perspectives there exist several challenges for human reasoning-based classification frameworks of scientific knowledge, as they typically try to fit new-to-the-world knowledge into historical models of scientific knowledge, and cannot easily be deployed for new large-scale data sets. Automated classification schemes, in contrast, generate classification models only from the available text corpus, thereby identifying credibly novel bodies of knowledge. They also lend themselves to versatile large-scale data analysis, and enable a range of Big Data possibilities. However, we also argue that it is neither possible nor fruitful to declare one or another method a superior approach in terms of realism to classify scientific knowledge, and we believe that the merits of each approach are dependent on the practical objectives of analysis.
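    As a minimal illustration of the unsupervised route the authors argue for - a classification model generated only from the available text corpus - here is a topic-modeling sketch with scikit-learn's LDA on an invented four-document corpus (the authors worked on a national publication data set, not on a toy like this):

        from sklearn.decomposition import LatentDirichletAllocation
        from sklearn.feature_extraction.text import CountVectorizer

        docs = [
            "gene expression protein sequencing",
            "protein folding molecular dynamics",
            "labor market wage policy",
            "monetary policy inflation market",
        ]
        vec = CountVectorizer()
        X = vec.fit_transform(docs)

        lda = LatentDirichletAllocation(n_components=2, random_state=0).fit(X)
        terms = vec.get_feature_names_out()
        for k, topic in enumerate(lda.components_):
            top = [terms[i] for i in topic.argsort()[-3:][::-1]]
            print(f"topic {k}: {top}")  # one biomedical, one economics topic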
    Source
    Journal of the Association for Information Science and Technology. 67(2016) no.10, S.2464-2476
  17. Golub, K.; Lykke, M.: Automated classification of web pages in hierarchical browsing (2009) 0.01
    0.0126910675 = product of:
      0.0380732 = sum of:
        0.0380732 = product of:
          0.0761464 = sum of:
            0.0761464 = weight(_text_:index in 3614) [ClassicSimilarity], result of:
              0.0761464 = score(doc=3614,freq=4.0), product of:
                0.22304957 = queryWeight, product of:
                  4.369764 = idf(docFreq=1520, maxDocs=44218)
                  0.05104385 = queryNorm
                0.3413878 = fieldWeight in 3614, product of:
                  2.0 = tf(freq=4.0), with freq of:
                    4.0 = termFreq=4.0
                  4.369764 = idf(docFreq=1520, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=3614)
          0.5 = coord(1/2)
      0.33333334 = coord(1/3)
    
    Abstract
    Purpose - The purpose of this study is twofold: to investigate whether it is meaningful to use the Engineering Index (Ei) classification scheme for browsing, and then, if proven useful, to investigate the performance of an automated classification algorithm based on the Ei classification scheme.
    Design/methodology/approach - A user study was conducted in which users solved four controlled searching tasks. The users browsed the Ei classification scheme in order to examine the suitability of the classification system for browsing. The classification algorithm was evaluated by the users, who judged the correctness of the automatically assigned classes.
    Findings - The study showed that the Ei classification scheme is suited for browsing. Automatically assigned classes were on average partly correct, with some classes working better than others. Success of browsing was shown to be correlated with and dependent on classification correctness.
    Research limitations/implications - Further research should address problems of disparate evaluations of one and the same web page. Additional reasons behind browsing failures in the Ei classification scheme also need further investigation.
    Practical implications - Improvements for browsing were identified: describing class captions and/or listing their subclasses from the start; allowing searching for words from class captions with synonym search (easily provided for Ei since the classes are mapped to thesauri terms); when searching for class captions, returning the hierarchical tree expanded around the class in whose caption the search term is found. The need for improvements of classification schemes was also indicated.
    Originality/value - A user-based evaluation of automated subject classification in the context of browsing has not been conducted before; hence the study also presents new findings concerning methodology.
    Object
    Engineering Index Classification
  18. Koch, T.: Experiments with automatic classification of WAIS databases and indexing of WWW : some results from the Nordic WAIS/WWW project (1994) 0.01
    0.0125635145 = product of:
      0.037690543 = sum of:
        0.037690543 = product of:
          0.075381085 = sum of:
            0.075381085 = weight(_text_:index in 7209) [ClassicSimilarity], result of:
              0.075381085 = score(doc=7209,freq=2.0), product of:
                0.22304957 = queryWeight, product of:
                  4.369764 = idf(docFreq=1520, maxDocs=44218)
                  0.05104385 = queryNorm
                0.33795667 = fieldWeight in 7209, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  4.369764 = idf(docFreq=1520, maxDocs=44218)
                  0.0546875 = fieldNorm(doc=7209)
          0.5 = coord(1/2)
      0.33333334 = coord(1/3)
    
    Abstract
    The Nordic WAIS/WWW project sponsored by NORDINFO is a joint project between Lund University Library and the National Technological Library of Denmark. It aims to improve the existing networked information discovery and retrieval tools Wide Area Information System (WAIS) and World Wide Web (WWW), and to move towards unifying WWW and WAIS. Details current results focusing on the WAIS side of the project. Describes research into automatic indexing and classification of WAIS sources, development of an orientation tool for WAIS, and development of a WAIS index of WWW resources
  19. Golub, K.: Automated subject classification of textual web documents (2006) 0.01
    0.011296105 = product of:
      0.033888314 = sum of:
        0.033888314 = weight(_text_:science in 5600) [ClassicSimilarity], result of:
          0.033888314 = score(doc=5600,freq=6.0), product of:
            0.13445559 = queryWeight, product of:
              2.6341193 = idf(docFreq=8627, maxDocs=44218)
              0.05104385 = queryNorm
            0.25204095 = fieldWeight in 5600, product of:
              2.4494898 = tf(freq=6.0), with freq of:
                6.0 = termFreq=6.0
              2.6341193 = idf(docFreq=8627, maxDocs=44218)
              0.0390625 = fieldNorm(doc=5600)
      0.33333334 = coord(1/3)
    
    Abstract
    Purpose - To provide an integrated perspective on similarities and differences between approaches to automated classification in different research communities (machine learning, information retrieval and library science), and to point to problems with these approaches and with automated classification as such.
    Design/methodology/approach - A range of works dealing with automated classification of full-text web documents are discussed. Explorations of individual approaches are given in the following sections: special features (description, differences, evaluation), application, and characteristics of web pages.
    Findings - Provides major similarities and differences between the three approaches: document pre-processing and utilization of web-specific document characteristics are common to all the approaches; major differences lie in the applied algorithms and in the employment or not of the vector space model and of controlled vocabularies. Problems of automated classification are recognized.
    Research limitations/implications - The paper does not attempt to provide an exhaustive bibliography of related resources.
    Practical implications - As an integrated overview of approaches from different research communities with application examples, it is very useful for students in library and information science and computer science, as well as for practitioners. Researchers from one community gain information on how similar tasks are conducted in other communities.
    Originality/value - To the author's knowledge, no review paper on automated text classification has attempted to discuss more than one community's approach from an integrated perspective.
  20. Liu, X.; Yu, S.; Janssens, F.; Glänzel, W.; Moreau, Y.; Moor, B.de: Weighted hybrid clustering by combining text mining and bibliometrics on a large-scale journal database (2010) 0.01
    0.0110678775 = product of:
      0.03320363 = sum of:
        0.03320363 = weight(_text_:science in 3464) [ClassicSimilarity], result of:
          0.03320363 = score(doc=3464,freq=4.0), product of:
            0.13445559 = queryWeight, product of:
              2.6341193 = idf(docFreq=8627, maxDocs=44218)
              0.05104385 = queryNorm
            0.24694869 = fieldWeight in 3464, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              2.6341193 = idf(docFreq=8627, maxDocs=44218)
              0.046875 = fieldNorm(doc=3464)
      0.33333334 = coord(1/3)
    
    Abstract
    We propose a new hybrid clustering framework to incorporate text mining with bibliometrics in journal set analysis. The framework integrates two different approaches: clustering ensemble and kernel-fusion clustering. To improve the flexibility and the efficiency of processing large-scale data, we propose an information-based weighting scheme to leverage the effect of multiple data sources in hybrid clustering. Three different algorithms are extended by the proposed weighting scheme and they are employed on a large journal set retrieved from the Web of Science (WoS) database. The clustering performance of the proposed algorithms is systematically evaluated using multiple evaluation methods, and they were cross-compared with alternative methods. Experimental results demonstrate that the proposed weighted hybrid clustering strategy is superior to other methods in clustering performance and efficiency. The proposed approach also provides a more refined structural mapping of journal sets, which is useful for monitoring and detecting new trends in different scientific fields.
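    The core move - fusing a text-similarity kernel with a citation-based kernel and clustering the combined matrix - can be sketched as follows; the fixed scalar weight below stands in for the paper's information-based weighting scheme, and all matrices are toy data:

        import numpy as np
        from sklearn.cluster import SpectralClustering

        rng = np.random.default_rng(0)

        def toy_kernel(n, block):
            """Block-structured similarity matrix with noise, standing in for
            a text or cross-citation kernel over n journals."""
            K = 0.1 * rng.random((n, n))
            K[:block, :block] += 0.8
            K[block:, block:] += 0.8
            return (K + K.T) / 2

        n = 10
        K_text = toy_kernel(n, 5)   # lexical similarity between journals
        K_cite = toy_kernel(n, 5)   # cross-citation similarity

        w = 0.6                     # fixed weight; the paper learns such weights
        K = w * K_text + (1 - w) * K_cite

        labels = SpectralClustering(n_clusters=2, affinity="precomputed",
                                    random_state=0).fit_predict(K)
        print(labels)  # recovers the two blocks of five journals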
    Source
    Journal of the American Society for Information Science and Technology. 61(2010) no.6, S.1105-1119
