Search (34 results, page 1 of 2)

  • theme_ss:"Automatisches Klassifizieren"
  1. Hotho, A.; Bloehdorn, S.: Data Mining 2004 : Text classification by boosting weak learners based on terms and concepts (2004) 0.27
    Content
    Cf.: http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.91.4940&rep=rep1&type=pdf.
    Date
    8. 1.2013 10:22:32
  2. Borko, H.: Research in computer based classification systems (1985) 0.03
    Abstract
    The selection in this reader by R. M. Needham and K. Sparck Jones reports an early approach to automatic classification that was taken in England. The following selection reviews various approaches that were being pursued in the United States at about the same time. It then discusses a particular approach initiated in the early 1960s by Harold Borko, at that time Head of the Language Processing and Retrieval Research Staff at the System Development Corporation, Santa Monica, California and, since 1966, a member of the faculty at the Graduate School of Library and Information Science, University of California, Los Angeles. As was described earlier, there are two steps in automatic classification, the first being to identify pairs of terms that are similar by virtue of co-occurring as index terms in the same documents, and the second being to form equivalence classes of intersubstitutable terms. To compute similarities, Borko and his associates used a standard correlation formula; to derive classification categories, where Needham and Sparck Jones used clumping, the Borko team used the statistical technique of factor analysis. The fact that documents can be classified automatically, and in any number of ways, is worthy of passing notice. Worthy of serious attention would be a demonstration that a computer-based classification system was effective in the organization and retrieval of documents. One reason for the inclusion of the following selection in the reader is that it addresses the question of evaluation. To evaluate the effectiveness of their automatically derived classification, Borko and his team asked three questions. The first was: Is the classification reliable? In other words, could the categories derived from one sample of texts be used to classify other texts? Reliability was assessed by a case-study comparison of the classes derived from three different samples of abstracts. The not-so-surprising conclusion reached was that automatically derived classes were reliable only to the extent that the sample from which they were derived was representative of the total document collection. The second evaluation question asked whether the classification was reasonable, in the sense of adequately describing the content of the document collection. The answer was sought by comparing the automatically derived categories with categories in a related classification system that was manually constructed. Here the conclusion was that the automatic method yielded categories that fairly accurately reflected the major areas of interest in the sample collection of texts; however, since there were only eleven such categories and they were quite broad, they could not be regarded as suitable for use in a university or any large general library. The third evaluation question asked whether automatic classification was accurate, in the sense of producing results similar to those obtainable by human classifiers. When using human classification as a criterion, automatic classification was found to be 50 percent accurate.
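    A minimal sketch of the two steps described above, using a toy corpus and scikit-learn's FactorAnalysis as stand-ins for the correlation and factor-analysis procedure Borko's team used; this is an editorial illustration, not the original implementation:
```python
# (1) term-term correlations from co-occurrence in documents,
# (2) factor analysis over term profiles to derive broad categories.
# Corpus, category count and library choices are illustrative assumptions.
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import FactorAnalysis

docs = [
    "automatic classification of documents by term cooccurrence",
    "factor analysis of index terms for document categories",
    "information retrieval experiments with automatic document classification",
]

vec = CountVectorizer(binary=True)
X = vec.fit_transform(docs).toarray().T          # rows = terms, columns = documents
terms = vec.get_feature_names_out()

# Step 1: correlate term occurrence profiles across documents
corr = np.corrcoef(X)
np.fill_diagonal(corr, 0.0)
i, j = np.unravel_index(np.argmax(corr), corr.shape)
print("most strongly co-occurring terms:", terms[i], "/", terms[j])

# Step 2: factor analysis; each factor is read as one broad category,
# described by its highest-scoring terms
factors = FactorAnalysis(n_components=2, random_state=0).fit_transform(X)
for k in range(factors.shape[1]):
    top = terms[np.argsort(-factors[:, k])[:3]]
    print(f"category {k}:", ", ".join(top))
```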
  3. Dolin, R.; Agrawal, D.; El Abbadi, A.; Pearlman, J.: Using automated classification for summarizing and selecting heterogeneous information sources (1998) 0.01
    Abstract
    Information retrieval over the Internet increasingly requires the filtering of thousands of heterogeneous information sources. Important sources of information include not only traditional databases with structured data and queries, but also increasing numbers of non-traditional, semi- or unstructured collections such as Web sites, FTP archives, etc. As the number and variability of sources increases, new ways of automatically summarizing, discovering, and selecting collections relevant to a user's query are needed. One such method involves the use of classification schemes, such as the Library of Congress Classification (LCC) [10], within which a collection may be represented based on its content, irrespective of the structure of the actual data or documents. For such a system to be useful in a large-scale distributed environment, it must be easy to use for both collection managers and users. As a result, it must be possible to classify documents automatically within a classification scheme. Furthermore, there must be a straightforward and intuitive interface with which the user may use the scheme to assist in information retrieval (IR).
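    A minimal sketch of classifying documents automatically within a scheme, assuming a toy training set and Library of Congress style top-level letters; the classifier and data are illustrative, not the system described in the paper:
```python
# Hedged sketch: assign documents to top-level classes of a scheme
# (toy stand-ins for LCC letters) with a trained text classifier.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

train_texts = [
    "finite element methods for structural engineering",
    "aquatic ecology of freshwater lakes",
    "medieval european political history",
]
train_classes = ["T", "Q", "D"]   # assumed toy top-level letters

clf = make_pipeline(TfidfVectorizer(), MultinomialNB())
clf.fit(train_texts, train_classes)

# Expected to land in the engineering class on this toy training set
print(clf.predict(["bridge load analysis with finite elements"]))
```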
  4. Cortez, E.; Herrera, M.R.; Silva, A.S. da; Moura, E.S. de; Neubert, M.: Lightweight methods for large-scale product categorization (2011) 0.01
    Abstract
    In this article, we present a study about classification methods for large-scale categorization of product offers on e-shopping web sites. We present a study about the performance of previously proposed approaches and deployed a probabilistic approach to model the classification problem. We also studied an alternative way of modeling information about the description of product offers and investigated the usage of price and store of product offers as features adopted in the classification process. Our experiments used two collections of over a million product offers previously categorized by human editors and taxonomies of hundreds of categories from a real e-shopping web site. In these experiments, our method achieved an improvement of up to 9% in the quality of the categorization in comparison with the best baseline we have found.
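    A hedged sketch of combining an offer's text with price and store as additional features in a single categorizer, as the abstract describes; the feature encoding and model are assumptions, not the authors' probabilistic method:
```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.preprocessing import OneHotEncoder, StandardScaler
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Toy product offers; columns and categories are illustrative assumptions
offers = pd.DataFrame({
    "title": ["usb flash drive 64gb", "running shoes size 42", "usb-c charging cable"],
    "price": [12.0, 80.0, 9.0],
    "store": ["techshop", "sportworld", "techshop"],
    "category": ["electronics", "sports", "electronics"],
})

features = ColumnTransformer([
    ("text", TfidfVectorizer(), "title"),               # offer description
    ("price", StandardScaler(), ["price"]),             # price as a numeric feature
    ("store", OneHotEncoder(handle_unknown="ignore"), ["store"]),  # store as a categorical feature
])
model = make_pipeline(features, LogisticRegression(max_iter=1000))
model.fit(offers[["title", "price", "store"]], offers["category"])

new_offer = pd.DataFrame({"title": ["32gb usb stick"], "price": [10.0], "store": ["techshop"]})
print(model.predict(new_offer))
```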
  5. Dolin, R.; Agrawal, D.; El Abbadi, A.; Pearlman, J.: Using automated classification for summarizing and selecting heterogeneous information sources (1998) 0.01
    Abstract
    Information retrieval over the Internet increasingly requires the filtering of thousands of heterogeneous information sources. Important sources of information include not only traditional databases with structured data and queries, but also increasing numbers of non-traditional, semi- or unstructured collections such as Web sites, FTP archives, etc. As the number and variability of sources increases, new ways of automatically summarizing, discovering, and selecting collections relevant to a user's query are needed. One such method involves the use of classification schemes, such as the Library of Congress Classification (LCC), within which a collection may be represented based on its content, irrespective of the structure of the actual data or documents. For such a system to be useful in a large-scale distributed environment, it must be easy to use for both collection managers and users. As a result, it must be possible to classify documents automatically within a classification scheme. Furthermore, there must be a straightforward and intuitive interface with which the user may use the scheme to assist in information retrieval (IR). Our work with the Alexandria Digital Library (ADL) Project focuses on geo-referenced information, whether text, maps, aerial photographs, or satellite images. As a result, we have emphasized techniques which work with both text and non-text, such as combined textual and graphical queries, multi-dimensional indexing, and IR methods which are not solely dependent on words or phrases. Part of this work involves locating relevant online sources of information. In particular, we have designed and are currently testing aspects of an architecture, Pharos, which we believe will scale up to 1.000.000 heterogeneous sources. Pharos accommodates heterogeneity in content and format, both among multiple sources as well as within a single source. That is, we consider sources to include Web sites, FTP archives, newsgroups, and full digital libraries; all of these systems can include a wide variety of content and multimedia data formats. Pharos is based on the use of hierarchical classification schemes. These include not only well-known 'subject' (or 'concept') based schemes such as the Dewey Decimal System and the LCC, but also, for example, geographic classifications, which might be constructed as layers of smaller and smaller hierarchical longitude/latitude boxes. Pharos is designed to work with sophisticated queries which utilize subjects, geographical locations, temporal specifications, and other types of information domains. The Pharos architecture requires that hierarchically structured collection metadata be extracted so that it can be partitioned in such a way as to greatly enhance scalability. Automated classification is important to Pharos because it allows information sources to extract the requisite collection metadata automatically that must be distributed.
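    A minimal sketch of the collection-level metadata idea behind Pharos: per-document class assignments rolled up the nodes of a hierarchical scheme so a whole collection can be summarized by counts; the toy hierarchy, class codes and counts are assumptions:
```python
# Summarize a collection as document counts over the nodes of a
# hierarchical classification scheme, rolling leaf counts up to parents.
from collections import Counter

parent = {                       # child -> parent in a tiny LCC-like hierarchy (assumed)
    "QA76": "QA", "QA": "Q",     # computer science -> mathematics -> science
    "QH541": "QH", "QH": "Q",    # ecology -> natural history -> science
}

doc_classes = ["QA76", "QA76", "QH541", "QA76", "QH541"]  # one leaf class per document

summary = Counter()
for cls in doc_classes:
    node = cls
    while node is not None:      # count the document at its leaf and every ancestor
        summary[node] += 1
        node = parent.get(node)

print(dict(summary))             # e.g. {'QA76': 3, 'QA': 3, 'Q': 5, 'QH541': 2, 'QH': 2}
```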
  6. Subramanian, S.; Shafer, K.E.: Clustering (2001) 0.00
    Date
    5. 5.2003 14:17:22
  7. Reiner, U.: Automatische DDC-Klassifizierung von bibliografischen Titeldatensätzen (2009) 0.00
    Date
    22. 8.2009 12:54:24
  8. HaCohen-Kerner, Y. et al.: Classification using various machine learning methods and combinations of key-phrases and visual features (2016) 0.00
    Date
    1. 2.2016 18:25:22
  9. Sommer, M.: Automatische Generierung von DDC-Notationen für Hochschulveröffentlichungen (2012) 0.00
    Content
    Cf.: http://opus.bsz-bw.de/fhhv/volltexte/2012/397/pdf/Bachelorarbeit_final_Korrektur01.pdf. Bachelor's thesis, Hochschule Hannover, Fakultät III - Medien, Information und Design, Abteilung Information und Kommunikation, Studiengang Informationsmanagement
    Imprint
    Hannover : Hochschule Hannover, Fakultät III - Medien, Information und Design, Abteilung Information und Kommunikation
  10. Cheng, P.T.K.; Wu, A.K.W.: ACS: an automatic classification system (1995) 0.00
    Abstract
    In this paper, we introduce ACS, an automatic classification system for school libraries. First, various approaches towards automatic classification, namely (i) rule-based, (ii) browse and search, and (iii) partial match, are critically reviewed. The central issues of scheme selection, text analysis and similarity measures are discussed. A novel approach towards detecting book-class similarity with Modified Overlap Coefficient (MOC) is also proposed. Finally, the design and implementation of ACS is presented. The test result of over 80% correctness in automatic classification and a cost reduction of 75% compared to manual classification suggest that ACS is highly adoptable
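    The abstract does not give the formula of the Modified Overlap Coefficient, so the sketch below uses the standard overlap coefficient, |A ∩ B| / min(|A|, |B|), as a stand-in for book-to-class matching; terms and class profiles are assumptions:
```python
# Hedged sketch of book-to-class matching with an overlap-style coefficient.
def overlap_coefficient(a: set, b: set) -> float:
    if not a or not b:
        return 0.0
    return len(a & b) / min(len(a), len(b))

book_terms = {"school", "library", "automatic", "classification"}
class_terms = {                              # assumed term profiles per class
    "025.4": {"classification", "library", "cataloguing"},
    "371.3": {"school", "teaching", "methods"},
}

best = max(class_terms, key=lambda c: overlap_coefficient(book_terms, class_terms[c]))
print(best, overlap_coefficient(book_terms, class_terms[best]))   # 025.4 0.666...
```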
  11. Ruiz, M.E.; Srinivasan, P.: Combining machine learning and hierarchical indexing structures for text categorization (2001) 0.00
    Abstract
    This paper presents a method that exploits the hierarchical structure of an indexing vocabulary to guide the development and training of machine learning methods for automatic text categorization. We present the design of a hierarchical classifier based on the divide-and-conquer principle. The method is evaluated using backpropagation neural networks as the machine learning algorithm, which learn to assign MeSH categories to a subset of MEDLINE records. Comparisons with the traditional Rocchio algorithm adapted for text categorization, as well as flat neural network classifiers, are provided. The results indicate that the use of hierarchical structures improves performance significantly.
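    A minimal sketch of the divide-and-conquer idea, with one backpropagation-trained classifier per internal node of a toy category tree; the hierarchy, data and model settings are assumptions, not the MeSH/MEDLINE setup of the paper:
```python
# Each internal node holds its own classifier that only decides among
# that node's children; classification descends the tree node by node.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline

tree = {"root": ["medicine", "technology"],
        "medicine": ["cardiology", "neurology"],
        "technology": []}

train = {   # per-node training texts and child labels (toy assumptions)
    "root": (["heart attack symptoms", "brain imaging study", "semiconductor design"],
             ["medicine", "medicine", "technology"]),
    "medicine": (["heart attack symptoms", "coronary artery disease", "brain imaging study"],
                 ["cardiology", "cardiology", "neurology"]),
}

node_clf = {node: make_pipeline(TfidfVectorizer(),
                                MLPClassifier(max_iter=2000, random_state=0)).fit(X, y)
            for node, (X, y) in train.items()}

def classify(doc, node="root"):
    # Descend the tree, letting each node's classifier pick one child.
    while tree.get(node) and node in node_clf:
        node = node_clf[node].predict([doc])[0]
    return node

print(classify("mri scan of the brain"))
```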
  12. Bock, H.-H.: Datenanalyse zur Strukturierung und Ordnung von Information (1989) 0.00
    Pages
    S.1-22
  13. Dubin, D.: Dimensions and discriminability (1998) 0.00
    Date
    22. 9.1997 19:16:05
  14. Automatic classification research at OCLC (2002) 0.00
    Date
    5. 5.2003 9:22:09
  15. Jenkins, C.: Automatic classification of Web resources using Java and Dewey Decimal Classification (1998) 0.00
    Date
    1. 8.1996 22:08:06
  16. Yoon, Y.; Lee, C.; Lee, G.G.: An effective procedure for constructing a hierarchical text classification system (2006) 0.00
    Date
    22. 7.2006 16:24:52
  17. Yi, K.: Automatic text classification using library classification schemes : trends, issues and challenges (2007) 0.00
    Date
    22. 9.2008 18:31:54
  18. Prabowo, R.; Jackson, M.; Burden, P.; Knoell, H.-D.: Ontology-based automatic classification for the Web pages : design, implementation and evaluation (2002) 0.00
  19. Cosh, K.J.; Burns, R.; Daniel, T.: Content clouds : classifying content in Web 2.0 (2008) 0.00
    Abstract
    Purpose - With increasing amounts of user generated content being produced electronically in the form of wikis, blogs, forums etc. the purpose of this paper is to investigate a new approach to classifying ad hoc content. Design/methodology/approach - The approach applies natural language processing (NLP) tools to automatically extract the content of some text, visualizing the results in a content cloud. Findings - Content clouds share the visual simplicity of a tag cloud, but display the details of an article at a different level of abstraction, providing a complementary classification. Research limitations/implications - Provides the general approach to creating a content cloud. In the future, the process can be refined and enhanced by further evaluation of results. Further work is also required to better identify closely related articles. Practical implications - Being able to automatically classify the content generated by web users will enable others to find more appropriate content. Originality/value - The approach is original. Other researchers have produced a cloud, simply by using skiplists to filter unwanted words, this paper's approach improves this by applying appropriate NLP techniques.
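    A minimal sketch of the content-cloud idea: extract content-bearing words from an article and weight them by frequency for display; a plain stopword filter stands in for the NLP pipeline used in the paper:
```python
import re
from collections import Counter

# Assumed minimal stopword list; a real pipeline would use POS tagging etc.
STOPWORDS = {"the", "a", "an", "of", "and", "to", "in", "is", "for", "on", "this", "that", "with"}

def content_cloud(text: str, top_n: int = 10) -> list[tuple[str, int]]:
    words = re.findall(r"[a-z]+", text.lower())
    counts = Counter(w for w in words if w not in STOPWORDS and len(w) > 2)
    return counts.most_common(top_n)      # (word, weight) pairs for the cloud

article = ("User generated content in wikis, blogs and forums grows quickly, "
           "and classifying this content helps readers find relevant articles.")
print(content_cloud(article))
```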
  20. Ozmutlu, S.; Cosar, G.C.: Analyzing the results of automatic new topic identification (2008) 0.00
    Abstract
    Purpose - Identification of topic changes within a user search session is a key issue in content analysis of search engine user queries. Recently, various studies have focused on new topic identification/session identification of search engine transaction logs, and several problems regarding the estimation of topic shifts and continuations were observed in these studies. This study aims to analyze the reasons for the problems that were encountered as a result of applying automatic new topic identification. Design/methodology/approach - Measures, such as cleaning the data of common words and analyzing the errors of automatic new topic identification, are applied to eliminate the problems in estimating topic shifts and continuations. Findings - The findings show that the resulting errors of automatic new topic identification have a pattern, and further research is required to improve the performance of automatic new topic identification. Originality/value - Improving the performance of automatic new topic identification would be valuable to search engine designers, so that they can develop new clustering and query recommendation algorithms, as well as custom-tailored graphical user interfaces for search engine users.
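    A minimal sketch of automatic new topic identification over a session's queries, labelling each consecutive pair as a shift or continuation according to shared terms after removing common words; the heuristic and word list are assumptions, not the estimation procedure evaluated in the paper:
```python
# Label consecutive query pairs in one session as topic "shift" or "continuation".
COMMON = {"the", "of", "and", "for", "how", "to", "in"}   # assumed common-word list

def label_session(queries: list[str]) -> list[str]:
    labels = []
    prev = None
    for q in queries:
        terms = {t for t in q.lower().split() if t not in COMMON}
        if prev is not None:
            labels.append("continuation" if terms & prev else "shift")
        prev = terms
    return labels

session = ["cheap flights to rome", "rome hotels city centre", "python list comprehension"]
print(label_session(session))   # ['continuation', 'shift']
```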