Search (187 results, page 1 of 10)

  • theme_ss:"Automatisches Klassifizieren" (automatic classification)
  1. Hotho, A.; Bloehdorn, S.: Data Mining 2004 : Text classification by boosting weak learners based on terms and concepts (2004) 0.20
    0.19724123 = product of:
      0.29586184 = sum of:
        0.068278834 = product of:
          0.20483649 = sum of:
            0.20483649 = weight(_text_:3a in 562) [ClassicSimilarity], result of:
              0.20483649 = score(doc=562,freq=2.0), product of:
                0.36446604 = queryWeight, product of:
                  8.478011 = idf(docFreq=24, maxDocs=44218)
                  0.042989567 = queryNorm
                0.56201804 = fieldWeight in 562, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  8.478011 = idf(docFreq=24, maxDocs=44218)
                  0.046875 = fieldNorm(doc=562)
          0.33333334 = coord(1/3)
        0.005273023 = weight(_text_:in in 562) [ClassicSimilarity], result of:
          0.005273023 = score(doc=562,freq=2.0), product of:
            0.058476754 = queryWeight, product of:
              1.3602545 = idf(docFreq=30841, maxDocs=44218)
              0.042989567 = queryNorm
            0.09017298 = fieldWeight in 562, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              1.3602545 = idf(docFreq=30841, maxDocs=44218)
              0.046875 = fieldNorm(doc=562)
        0.20483649 = weight(_text_:2f in 562) [ClassicSimilarity], result of:
          0.20483649 = score(doc=562,freq=2.0), product of:
            0.36446604 = queryWeight, product of:
              8.478011 = idf(docFreq=24, maxDocs=44218)
              0.042989567 = queryNorm
            0.56201804 = fieldWeight in 562, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              8.478011 = idf(docFreq=24, maxDocs=44218)
              0.046875 = fieldNorm(doc=562)
        0.017473478 = product of:
          0.034946956 = sum of:
            0.034946956 = weight(_text_:22 in 562) [ClassicSimilarity], result of:
              0.034946956 = score(doc=562,freq=2.0), product of:
                0.15054214 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.042989567 = queryNorm
                0.23214069 = fieldWeight in 562, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.046875 = fieldNorm(doc=562)
          0.5 = coord(1/2)
      0.6666667 = coord(4/6)
    
    Abstract
    Document representations for text classification are typically based on the classical Bag-Of-Words paradigm. This approach comes with deficiencies that motivate the integration of features on a higher semantic level than single words. In this paper we propose an enhancement of the classical document representation through concepts extracted from background knowledge. Boosting is used for actual classification. Experimental evaluations on two well known text corpora support our approach through consistent improvement of the results.
    Content
    Cf.: http://www.google.de/url?sa=t&rct=j&q=&esrc=s&source=web&cd=1&cad=rja&ved=0CEAQFjAA&url=http%3A%2F%2Fciteseerx.ist.psu.edu%2Fviewdoc%2Fdownload%3Fdoi%3D10.1.1.91.4940%26rep%3Drep1%26type%3Dpdf&ei=dOXrUMeIDYHDtQahsIGACg&usg=AFQjCNHFWVh6gNPvnOrOS9R3rkrXCNVD-A&sig2=5I2F5evRfMnsttSgFF9g7Q&bvm=bv.1357316858,d.Yms.
    Date
    8. 1.2013 10:22:32
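The abstract names two ingredients: a bag-of-words representation enriched with concepts from background knowledge, and boosting as the classifier. The Python sketch below combines them under stated assumptions: `CONCEPTS` is a hypothetical hand-made term-to-concept map standing in for the background knowledge, and the learner is discrete AdaBoost with one-feature decision stumps. It is not the authors' implementation; the paper's concept-extraction and boosting details are not given here.

```python
import math

# Hypothetical background knowledge: surface term -> concept identifier.
CONCEPTS = {
    "car": "concept:vehicle",
    "truck": "concept:vehicle",
    "apple": "concept:fruit",
    "pear": "concept:fruit",
}


def features(text):
    """Binary bag-of-words features plus concept features from CONCEPTS."""
    terms = set(text.lower().split())
    concepts = {CONCEPTS[t] for t in terms if t in CONCEPTS}
    return terms | concepts


def stump(feature, polarity, x):
    """Weak learner: predict `polarity` if the feature is present, else -polarity."""
    return polarity if feature in x else -polarity


def train_adaboost(docs, labels, rounds=10):
    """Discrete AdaBoost over decision stumps; labels must be +1 or -1."""
    X = [features(d) for d in docs]
    vocab = sorted(set().union(*X))
    w = [1.0 / len(X)] * len(X)
    ensemble = []
    for _ in range(rounds):
        # Pick the stump with the lowest weighted training error.
        best = None
        for f in vocab:
            for polarity in (1, -1):
                err = sum(wi for xi, yi, wi in zip(X, labels, w)
                          if stump(f, polarity, xi) != yi)
                if best is None or err < best[0]:
                    best = (err, f, polarity)
        err, f, polarity = best
        err = min(max(err, 1e-10), 1 - 1e-10)  # keep log() finite
        alpha = 0.5 * math.log((1 - err) / err)
        ensemble.append((alpha, f, polarity))
        if err <= 1e-10:
            break  # perfect weak learner: further rounds would only repeat it
        # Increase the weights of misclassified documents, then renormalise.
        w = [wi * math.exp(-alpha * yi * stump(f, polarity, xi))
             for xi, yi, wi in zip(X, labels, w)]
        total = sum(w)
        w = [wi / total for wi in w]
    return ensemble


def predict(ensemble, text):
    """Sign of the alpha-weighted vote of all stumps."""
    x = features(text)
    score = sum(alpha * stump(f, p, x) for alpha, f, p in ensemble)
    return 1 if score >= 0 else -1


model = train_adaboost(
    ["car engine repair", "truck tyre", "apple pie recipe", "pear juice"],
    [1, 1, -1, -1],
)
print(predict(model, "truck repair"), predict(model, "pear tart"))
```

On this toy data the first stump chosen is the concept feature `concept:fruit`, which separates the two classes on its own, so training stops after one round; no single word such as `car` or `apple` can do that.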
  2. Cui, H.; Heidorn, P.B.; Zhang, H.: An approach to automatic classification of text for information retrieval (2002) 0.04
    0.039723717 = product of:
      0.11917115 = sum of:
        0.01375598 = weight(_text_:in in 174) [ClassicSimilarity], result of:
          0.01375598 = score(doc=174,freq=10.0), product of:
            0.058476754 = queryWeight, product of:
              1.3602545 = idf(docFreq=30841, maxDocs=44218)
              0.042989567 = queryNorm
            0.23523843 = fieldWeight in 174, product of:
              3.1622777 = tf(freq=10.0), with freq of:
                10.0 = termFreq=10.0
              1.3602545 = idf(docFreq=30841, maxDocs=44218)
              0.0546875 = fieldNorm(doc=174)
        0.10541517 = weight(_text_:great in 174) [ClassicSimilarity], result of:
          0.10541517 = score(doc=174,freq=2.0), product of:
            0.24206476 = queryWeight, product of:
              5.6307793 = idf(docFreq=430, maxDocs=44218)
              0.042989567 = queryNorm
            0.43548337 = fieldWeight in 174, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              5.6307793 = idf(docFreq=430, maxDocs=44218)
              0.0546875 = fieldNorm(doc=174)
      0.33333334 = coord(2/6)
    
    Abstract
    In this paper, we explore an approach to making better use of semi-structured documents for information retrieval in the domain of biology. Using machine learning techniques, we make the inherent structures of these documents explicit with XML markup. This markup has great potential for improving task performance in specimen identification and the usability of online flora and fauna.
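Every explain tree on this page is Lucene's ClassicSimilarity (TF-IDF) breakdown: tf = √freq, idf = 1 + ln(maxDocs / (docFreq + 1)), queryWeight = idf · queryNorm, fieldWeight = tf · idf · fieldNorm, each clause scores queryWeight · fieldWeight, and a sum of clauses is multiplied by coord(matched/total). The sketch below recomputes entry 2 (document 174) from those formulas, taking queryNorm and fieldNorm as given in the tree and assuming all query boosts are 1; the helper names are illustrative, not Lucene API.

```python
import math


def idf(doc_freq, max_docs):
    # ClassicSimilarity: 1 + ln(maxDocs / (docFreq + 1))
    return 1.0 + math.log(max_docs / (doc_freq + 1))


def term_score(freq, doc_freq, max_docs, query_norm, field_norm):
    """Score of one term clause: queryWeight * fieldWeight."""
    tf = math.sqrt(freq)
    i = idf(doc_freq, max_docs)
    query_weight = i * query_norm        # query boost assumed to be 1
    field_weight = tf * i * field_norm
    return query_weight * field_weight


QUERY_NORM = 0.042989567
MAX_DOCS = 44218

# Document 174: clauses "in" and "great" matched, 2 of 6 top-level clauses.
s_in = term_score(10, 30841, MAX_DOCS, QUERY_NORM, 0.0546875)
s_great = term_score(2, 430, MAX_DOCS, QUERY_NORM, 0.0546875)
total = (s_in + s_great) * (2 / 6)       # coord(2/6)
print(f"{s_in:.6f} {s_great:.6f} {total:.6f}")
```

It should reproduce the tree's values (about 0.013756, 0.105415 and 0.039724) up to float rounding. Nested clauses such as the `_text_:22` group in entry 1 work the same way, with an inner coord(1/2) applied before the outer sum.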
  3. Liu, R.-L.: A passage extractor for classification of disease aspect information (2013) 0.04
    0.038549475 = product of:
      0.07709895 = sum of:
        0.0098257 = weight(_text_:in in 1107) [ClassicSimilarity], result of:
          0.0098257 = score(doc=1107,freq=10.0), product of:
            0.058476754 = queryWeight, product of:
              1.3602545 = idf(docFreq=30841, maxDocs=44218)
              0.042989567 = queryNorm
            0.16802745 = fieldWeight in 1107, product of:
              3.1622777 = tf(freq=10.0), with freq of:
                10.0 = termFreq=10.0
              1.3602545 = idf(docFreq=30841, maxDocs=44218)
              0.0390625 = fieldNorm(doc=1107)
        0.052712012 = weight(_text_:education in 1107) [ClassicSimilarity], result of:
          0.052712012 = score(doc=1107,freq=2.0), product of:
            0.2025344 = queryWeight, product of:
              4.7112455 = idf(docFreq=1080, maxDocs=44218)
              0.042989567 = queryNorm
            0.260262 = fieldWeight in 1107, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              4.7112455 = idf(docFreq=1080, maxDocs=44218)
              0.0390625 = fieldNorm(doc=1107)
        0.014561232 = product of:
          0.029122464 = sum of:
            0.029122464 = weight(_text_:22 in 1107) [ClassicSimilarity], result of:
              0.029122464 = score(doc=1107,freq=2.0), product of:
                0.15054214 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.042989567 = queryNorm
                0.19345059 = fieldWeight in 1107, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=1107)
          0.5 = coord(1/2)
      0.5 = coord(3/6)
    
    Abstract
    Retrieval of disease information is often based on several key aspects such as etiology, diagnosis, treatment, prevention, and symptoms of diseases. Automatic identification of disease aspect information is thus essential. In this article, I model the aspect identification problem as a text classification (TC) problem in which a disease aspect corresponds to a category. The disease aspect classification problem poses two challenges to classifiers: (a) a medical text often contains information about multiple aspects of a disease and hence produces noise for the classifiers and (b) text classifiers often cannot extract the textual parts (i.e., passages) about the categories of interest. I thus develop a technique, PETC (Passage Extractor for Text Classification), that extracts passages (from medical texts) for the underlying text classifiers to classify. Case studies on thousands of Chinese and English medical texts show that PETC enhances a support vector machine (SVM) classifier in classifying disease aspect information. PETC also performs better than three state-of-the-art classifier enhancement techniques, including two passage extraction techniques for text classifiers and a technique that employs term proximity information to enhance text classifiers. The contribution is of significance to evidence-based medicine, health education, and healthcare decision support. PETC can be used in those application domains in which a text to be classified may have several parts about different categories.
    Date
    28.10.2013 19:22:57
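The abstract does not describe how PETC selects passages. As a loose illustration of the general idea only, not PETC itself, the sketch below splits a text into sentence passages, scores each against a hypothetical per-aspect keyword profile (`ASPECT_PROFILES`), and hands only the best-matching passages to a downstream classifier.

```python
import re

# Hypothetical keyword profiles for two disease aspects.
ASPECT_PROFILES = {
    "treatment": {"therapy", "drug", "dose", "treated", "treatment"},
    "symptom": {"fever", "pain", "cough", "symptom", "symptoms"},
}


def split_passages(text):
    """Split a text into sentence-level passages."""
    return [p.strip() for p in re.split(r"(?<=[.!?])\s+", text) if p.strip()]


def extract_passages(text, aspect, top_k=1):
    """Return up to top_k passages with the most keyword hits for an aspect."""
    profile = ASPECT_PROFILES[aspect]
    scored = []
    for idx, passage in enumerate(split_passages(text)):
        tokens = re.findall(r"[a-z]+", passage.lower())
        hits = sum(1 for t in tokens if t in profile)
        scored.append((hits, -idx, passage))  # ties favour earlier passages
    scored.sort(reverse=True)
    # Passages with no hits are dropped rather than passed to the classifier.
    return [p for hits, _, p in scored[:top_k] if hits > 0]


text = ("Patients present with fever and cough. "
        "The drug is given at a low dose. Follow-up is yearly.")
print(extract_passages(text, "treatment"))
```

The point of the design is the one the abstract makes: a text covering several aspects is reduced to the part relevant to the category before classification, so off-aspect sentences add less noise.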
  4. Xu, Y.; Bernard, A.: Knowledge organization through statistical computation : a new approach (2009) 0.03
    0.034048904 = product of:
      0.10214671 = sum of:
        0.01179084 = weight(_text_:in in 3252) [ClassicSimilarity], result of:
          0.01179084 = score(doc=3252,freq=10.0), product of:
            0.058476754 = queryWeight, product of:
              1.3602545 = idf(docFreq=30841, maxDocs=44218)
              0.042989567 = queryNorm
            0.20163295 = fieldWeight in 3252, product of:
              3.1622777 = tf(freq=10.0), with freq of:
                10.0 = termFreq=10.0
              1.3602545 = idf(docFreq=30841, maxDocs=44218)
              0.046875 = fieldNorm(doc=3252)
        0.090355866 = weight(_text_:great in 3252) [ClassicSimilarity], result of:
          0.090355866 = score(doc=3252,freq=2.0), product of:
            0.24206476 = queryWeight, product of:
              5.6307793 = idf(docFreq=430, maxDocs=44218)
              0.042989567 = queryNorm
            0.37327147 = fieldWeight in 3252, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              5.6307793 = idf(docFreq=430, maxDocs=44218)
              0.046875 = fieldNorm(doc=3252)
      0.33333334 = coord(2/6)
    
    Abstract
    Knowledge organization (KO) is an interdisciplinary field that faces problems in knowledge classification, such as how to classify newly emerging knowledge. Given the great complexity and ambiguity of knowledge, classifying it by logical reasoning is becoming inefficient in some cases. This paper proposes a statistical approach to knowledge organization in order to resolve the problems of classifying complex and large-scale knowledge. By integrating the classification process into a mathematical model, a knowledge classifier based on maximum entropy theory is constructed, and the experimental results show that the classification results obtained from the classifier are reliable. The approach proposed in this paper is quite formal and does not depend on specific contexts, so it could easily be adapted for knowledge classification in other domains within KO.
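A maximum entropy classifier whose feature functions pair an input feature with a class is equivalent to multinomial logistic regression. A minimal sketch, assuming binary bag-of-words features and unregularized batch gradient ascent on the log-likelihood; the paper's actual features and training procedure (for example GIS or IIS) are not stated in the abstract.

```python
import math
from collections import defaultdict


def class_probs(w, x, classes):
    """p(c | x) = exp(sum of w[f, c] over features f in x) / Z(x)."""
    scores = {c: sum(w.get((f, c), 0.0) for f in x) for c in classes}
    m = max(scores.values())  # subtract the max for numerical stability
    exps = {c: math.exp(s - m) for c, s in scores.items()}
    z = sum(exps.values())
    return {c: e / z for c, e in exps.items()}


def train_maxent(docs, labels, epochs=200, lr=0.5):
    """Fit weights w[(feature, class)] by batch gradient ascent."""
    classes = sorted(set(labels))
    feats = [set(d.lower().split()) for d in docs]
    w = defaultdict(float)
    for _ in range(epochs):
        grad = defaultdict(float)
        for x, y in zip(feats, labels):
            probs = class_probs(w, x, classes)
            for f in x:
                for c in classes:
                    # Empirical feature count minus the model's expected count.
                    grad[(f, c)] += (1.0 if c == y else 0.0) - probs[c]
        for key, g in grad.items():
            w[key] += lr * g / len(docs)
    return w, classes


def classify(w, classes, text):
    probs = class_probs(w, set(text.lower().split()), classes)
    return max(probs, key=probs.get)


w, classes = train_maxent(
    ["stars galaxy telescope", "planet orbit telescope",
     "cell gene protein", "gene dna cell"],
    ["astro", "astro", "bio", "bio"],
)
print(classify(w, classes, "galaxy orbit"), classify(w, classes, "protein dna"))
```

At a zero gradient each feature's empirical count equals its expected count under the model, which is the maximum entropy constraint. With separable toy data and no regularization the weights keep growing slowly, so a real system would add a prior or stop early.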
  5. Yilmaz, T.; Ozcan, R.; Altingovde, I.S.; Ulusoy, Ö.: Improving educational web search for question-like queries through subject classification (2019) 0.03
    0.02777814 = product of:
      0.083334416 = sum of:
        0.008788372 = weight(_text_:in in 5041) [ClassicSimilarity], result of:
          0.008788372 = score(doc=5041,freq=8.0), product of:
            0.058476754 = queryWeight, product of:
              1.3602545 = idf(docFreq=30841, maxDocs=44218)
              0.042989567 = queryNorm
            0.15028831 = fieldWeight in 5041, product of:
              2.828427 = tf(freq=8.0), with freq of:
                8.0 = termFreq=8.0
              1.3602545 = idf(docFreq=30841, maxDocs=44218)
              0.0390625 = fieldNorm(doc=5041)
        0.07454605 = weight(_text_:education in 5041) [ClassicSimilarity], result of:
          0.07454605 = score(doc=5041,freq=4.0), product of:
            0.2025344 = queryWeight, product of:
              4.7112455 = idf(docFreq=1080, maxDocs=44218)
              0.042989567 = queryNorm
            0.36806607 = fieldWeight in 5041, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              4.7112455 = idf(docFreq=1080, maxDocs=44218)
              0.0390625 = fieldNorm(doc=5041)
      0.33333334 = coord(2/6)
    
    Abstract
    Students use general web search engines as their primary source of research when trying to find answers to school-related questions. Although search engines are highly relevant for the general population, they may return results that are out of educational context. Another rising trend is social community question answering websites, which are the second choice for students who try to get answers from peers online. We attempt to discover possible improvements in educational search by leveraging both of these information sources. For this purpose, we first implement a classifier for educational questions. This classifier is built with an ensemble method that employs several regular learning algorithms and retrieval-based approaches that utilize external resources. We also build a query expander to facilitate classification. We further improve the classification using search engine results and obtain 83.5% accuracy. Although our work is entirely based on the Turkish language, the features could easily be mapped to other languages as well. To find out whether search engine ranking can be improved in the education domain using the classification model, we collect and label a set of query results retrieved from a general web search engine. We propose five ad-hoc methods to improve search ranking based on the idea that the query-document category relation is an indicator of relevance. We evaluate these methods for overall performance, for varying query lengths, and for factoid and non-factoid queries. We show that some of the methods significantly improve the rankings in the education domain.
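The five ranking methods are not spelled out in the abstract. As a hypothetical illustration of the idea that the query-document category relation signals relevance, this sketch adds a fixed `bonus` (an assumed tuning parameter) to the engine score of each result whose category matches the query's predicted category, then re-sorts.

```python
def rerank(results, query_category, bonus=0.5):
    """Re-rank (doc_id, score, category) results, boosting category matches.

    `bonus` is a hypothetical parameter added to the engine score of every
    result whose category equals the query's predicted category.
    """
    def adjusted(result):
        _doc_id, score, category = result
        return score + (bonus if category == query_category else 0.0)

    # sorted() is stable, so equal adjusted scores keep the engine's order.
    return [r[0] for r in sorted(results, key=adjusted, reverse=True)]


results = [("d1", 0.9, "news"), ("d2", 0.7, "education"),
           ("d3", 0.6, "shopping"), ("d4", 0.2, "education")]
print(rerank(results, "education"))
```

A larger `bonus` moves category matches further up; with `bonus=0` the engine's original ranking is returned unchanged.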
  6. Barthel, S.; Tönnies, S.; Balke, W.-T.: Large-scale experiments for mathematical document classification (2013) 0.03
    0.027635835 = product of:
      0.082907505 = sum of:
        0.0076109543 = weight(_text_:in in 1056) [ClassicSimilarity], result of:
          0.0076109543 = score(doc=1056,freq=6.0), product of:
            0.058476754 = queryWeight, product of:
              1.3602545 = idf(docFreq=30841, maxDocs=44218)
              0.042989567 = queryNorm
            0.1301535 = fieldWeight in 1056, product of:
              2.4494898 = tf(freq=6.0), with freq of:
                6.0 = termFreq=6.0
              1.3602545 = idf(docFreq=30841, maxDocs=44218)
              0.0390625 = fieldNorm(doc=1056)
        0.07529655 = weight(_text_:great in 1056) [ClassicSimilarity], result of:
          0.07529655 = score(doc=1056,freq=2.0), product of:
            0.24206476 = queryWeight, product of:
              5.6307793 = idf(docFreq=430, maxDocs=44218)
              0.042989567 = queryNorm
            0.31105953 = fieldWeight in 1056, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              5.6307793 = idf(docFreq=430, maxDocs=44218)
              0.0390625 = fieldNorm(doc=1056)
      0.33333334 = coord(2/6)
    
    Abstract
    The ever-increasing amount of digitally available information is a curse and a blessing at the same time. On the one hand, users have increasingly large amounts of information at their fingertips. On the other hand, assessing and refining web search results becomes more and more tiresome and difficult for non-experts in a domain. Therefore, established digital libraries offer specialized collections with a certain degree of quality. This quality can largely be attributed to the great effort invested in the semantic enrichment of the provided documents, e.g. by annotating them with respect to a domain-specific taxonomy. This process is still done manually in many domains, e.g. chemistry (CAS), medicine (MeSH), or mathematics (MSC). But due to the growing amount of data, this manual task becomes more and more time-consuming and expensive. The only solution to this problem seems to be employing automated classification algorithms, but from the evaluations done in previous research, conclusions about a real-world scenario are difficult to draw. We therefore conducted a large-scale feasibility study on a real-world data set from one of the biggest mathematical digital libraries, i.e. Zentralblatt MATH, with special focus on its practical applicability.
  7. Wang, J.: An extensive study on automated Dewey Decimal Classification (2009) 0.03
    0.026563581 = product of:
      0.07969074 = sum of:
        0.004394186 = weight(_text_:in in 3172) [ClassicSimilarity], result of:
          0.004394186 = score(doc=3172,freq=2.0), product of:
            0.058476754 = queryWeight, product of:
              1.3602545 = idf(docFreq=30841, maxDocs=44218)
              0.042989567 = queryNorm
            0.07514416 = fieldWeight in 3172, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              1.3602545 = idf(docFreq=30841, maxDocs=44218)
              0.0390625 = fieldNorm(doc=3172)
        0.07529655 = weight(_text_:great in 3172) [ClassicSimilarity], result of:
          0.07529655 = score(doc=3172,freq=2.0), product of:
            0.24206476 = queryWeight, product of:
              5.6307793 = idf(docFreq=430, maxDocs=44218)
              0.042989567 = queryNorm
            0.31105953 = fieldWeight in 3172, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              5.6307793 = idf(docFreq=430, maxDocs=44218)
              0.0390625 = fieldNorm(doc=3172)
      0.33333334 = coord(2/6)
    
    Abstract
    In this paper, we present a theoretical analysis and extensive experiments on the automated assignment of Dewey Decimal Classification (DDC) classes to bibliographic data with a supervised machine-learning approach. Library classification systems, such as the DDC, impose great obstacles on state-of-the-art text categorization (TC) technologies, including deep hierarchy, data sparseness, and skewed distribution. We first statistically analyze the document and category distributions over the DDC, and discuss the obstacles imposed by bibliographic corpora and library classification schemes on TC technology. To overcome these obstacles, we propose an innovative algorithm to reshape the DDC structure into a balanced virtual tree by balancing the category distribution and flattening the hierarchy. To improve the classification effectiveness to a level acceptable to real-world applications, we propose an interactive classification model that is able to predict a class of any depth within a limited number of user interactions. The experiments are conducted on a large bibliographic collection created by the Library of Congress within the science and technology domains over 10 years. With no more than three interactions, a classification accuracy of nearly 90% is achieved, thus providing a practical solution to the automatic bibliographic classification problem.
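The abstract's interactive model predicts a class of arbitrary depth within a bounded number of user interactions. The sketch below shows one plausible shape of such a loop, not Wang's algorithm: at each level a scorer ranks the child classes, the user is shown the top candidates and either picks one or stops, and the number of interactions is capped. The toy DDC fragment `TREE`, the `score` function, and the `ask_user` callback are all hypothetical.

```python
# Toy fragment of a DDC-like hierarchy (hypothetical subset; "" is the root).
TREE = {
    "": ["500", "600"],
    "500": ["510", "530"],
    "510": ["516", "519"],
    "600": ["610", "620"],
}


def interactive_classify(score, ask_user, max_interactions=3, top_k=2):
    """Descend the hierarchy, letting the user pick among top-k children.

    score(cls) -> float ranks candidate classes for the current document.
    ask_user(candidates) -> chosen class, or None to stop at the current node.
    """
    node = ""
    for _ in range(max_interactions):
        children = TREE.get(node, [])
        if not children:
            break  # reached a leaf
        candidates = sorted(children, key=score, reverse=True)[:top_k]
        choice = ask_user(candidates)
        if choice is None:
            break  # the user accepts the current, shallower class
        node = choice
    return node


scores = {"500": 0.9, "600": 0.1, "510": 0.8, "530": 0.2, "516": 0.3, "519": 0.7}


def score(cls):
    return scores.get(cls, 0.0)


def accept_top(candidates):
    # Simulated user who always confirms the highest-ranked suggestion.
    return candidates[0]


print(interactive_classify(score, accept_top))
```

Capping `max_interactions` bounds the user's effort, while letting the user stop early yields a class at whatever depth they are confident in.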
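  8. Hoffmann, R.: Entwicklung einer benutzerunterstützten automatisierten Klassifikation von Web - Dokumenten : Untersuchung gegenwärtiger Methoden zur automatisierten Dokumentklassifikation und Implementierung eines Prototyps zum verbesserten Information Retrieval für das xFIND System [Development of a user-supported automated classification of Web documents : a study of current methods for automated document classification and implementation of a prototype for improved information retrieval for the xFIND system] (2002) 0.02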
    0.016400103 = product of:
      0.049200308 = sum of:
        0.007030698 = weight(_text_:in in 4197) [ClassicSimilarity], result of:
          0.007030698 = score(doc=4197,freq=8.0), product of:
            0.058476754 = queryWeight, product of:
              1.3602545 = idf(docFreq=30841, maxDocs=44218)
              0.042989567 = queryNorm
            0.120230645 = fieldWeight in 4197, product of:
              2.828427 = tf(freq=8.0), with freq of:
                8.0 = termFreq=8.0
              1.3602545 = idf(docFreq=30841, maxDocs=44218)
              0.03125 = fieldNorm(doc=4197)
        0.04216961 = weight(_text_:education in 4197) [ClassicSimilarity], result of:
          0.04216961 = score(doc=4197,freq=2.0), product of:
            0.2025344 = queryWeight, product of:
              4.7112455 = idf(docFreq=1080, maxDocs=44218)
              0.042989567 = queryNorm
            0.2082096 = fieldWeight in 4197, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              4.7112455 = idf(docFreq=1080, maxDocs=44218)
              0.03125 = fieldNorm(doc=4197)
      0.33333334 = coord(2/6)
    
    Abstract
    The vast and constantly growing supply of information on the Internet no longer allows people to grasp its content or to search for information in a targeted way. One way to improve information discovery is to categorize or classify information on the basis of its subject content. This subject classification can be carried out by manual (intellectual) methods or by automated procedures. Yet, on its own, neither approach has so far adequately met the expectations placed on it. This thesis therefore investigates the obvious approach of combining the two methods in a meaningful way. The first part of the thesis, the survey section, begins by explaining the problem of information overload in our society and shows that categorizing or classifying this information appears worthwhile, especially on the Internet. It describes the basic options for assigning subjects to documents in order to improve knowledge management and knowledge discovery, presenting, among other things, various classification schemes, topic maps, and semantic networks. The survey section focuses on describing automated methods for subject assignment. In addition to an overview of the most common classification algorithms, it presents commercially available systems as well as research approaches and freely available modules for automatic classification. Systems that at least partially support the approach of combining manual and automatic methods are also considered. The problems that arise when classifying documents on the Internet are identified as well.
    The findings of the survey section feed into the development of a module for user-supported automatic document classification within the xFIND system (extended Framework for Information Discovery). This framework, designed at Graz University of Technology, provides the basis for many new ideas for improving information retrieval. The solution developed in the design section first uses documents, servers, or server areas that already exist in the system and have been classified manually as the basis for automatic classification. Once automatic classification is complete, authors and administrators can adjust the results in a subsequent user-support step. Collective user behavior can also have an influence through voting, that is, by approving or rejecting the classification results. The knowledge of subject experts and users thus ultimately helps improve automatic classification. The design section describes the basic concepts, structure, and operation of the developed module, and presents a number of proposals and ideas for further developing user-supported automatic document classification.
    Content
    Also at: http://www2.iicm.edu/cguetl/education/thesis/rhoff
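The thesis abstract says users can influence automatic classification results by approving or rejecting them. One simple way to model this, purely as an assumption and not the xFIND module's actual formula, is to treat the classifier's confidence as a prior worth `prior_weight` pseudo-votes (a hypothetical parameter) and blend it with the observed approvals and rejections.

```python
def adjusted_confidence(classifier_conf, approvals, rejections, prior_weight=5.0):
    """Blend classifier confidence with user votes (hypothetical scheme).

    The classifier's confidence counts as `prior_weight` pseudo-votes, so a
    few user votes nudge the result while many votes dominate it.
    """
    total_votes = approvals + rejections
    return (classifier_conf * prior_weight + approvals) / (prior_weight + total_votes)


print(adjusted_confidence(0.6, 0, 0))   # no votes: classifier confidence only
print(adjusted_confidence(0.6, 5, 0))   # approvals raise confidence
print(adjusted_confidence(0.6, 0, 5))   # rejections lower it
```

With no votes the classifier's confidence is returned unchanged; as votes accumulate, the result converges to the users' approval rate, which matches the abstract's idea that expert and user knowledge gradually corrects the automatic result.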
  9. Reiner, U.: Automatische DDC-Klassifizierung von bibliografischen Titeldatensätzen [Automatic DDC classification of bibliographic title records] (2009) 0.01
    0.0126369465 = product of:
      0.037910838 = sum of:
        0.008788372 = weight(_text_:in in 611) [ClassicSimilarity], result of:
          0.008788372 = score(doc=611,freq=2.0), product of:
            0.058476754 = queryWeight, product of:
              1.3602545 = idf(docFreq=30841, maxDocs=44218)
              0.042989567 = queryNorm
            0.15028831 = fieldWeight in 611, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              1.3602545 = idf(docFreq=30841, maxDocs=44218)
              0.078125 = fieldNorm(doc=611)
        0.029122464 = product of:
          0.05824493 = sum of:
            0.05824493 = weight(_text_:22 in 611) [ClassicSimilarity], result of:
              0.05824493 = score(doc=611,freq=2.0), product of:
                0.15054214 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.042989567 = queryNorm
                0.38690117 = fieldWeight in 611, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.078125 = fieldNorm(doc=611)
          0.5 = coord(1/2)
      0.33333334 = coord(2/6)
    
    Content
    Slides for the talk given at the 98th Deutscher Bibliothekartag in Erfurt: A new look at libraries; TK10: Indexing and searching information; Indexing content with new tools
    Date
    22. 8.2009 12:54:24
  10. HaCohen-Kerner, Y. et al.: Classification using various machine learning methods and combinations of key-phrases and visual features (2016) 0.01
    0.0126369465 = product of:
      0.037910838 = sum of:
        0.008788372 = weight(_text_:in in 2748) [ClassicSimilarity], result of:
          0.008788372 = score(doc=2748,freq=2.0), product of:
            0.058476754 = queryWeight, product of:
              1.3602545 = idf(docFreq=30841, maxDocs=44218)
              0.042989567 = queryNorm
            0.15028831 = fieldWeight in 2748, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              1.3602545 = idf(docFreq=30841, maxDocs=44218)
              0.078125 = fieldNorm(doc=2748)
        0.029122464 = product of:
          0.05824493 = sum of:
            0.05824493 = weight(_text_:22 in 2748) [ClassicSimilarity], result of:
              0.05824493 = score(doc=2748,freq=2.0), product of:
                0.15054214 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.042989567 = queryNorm
                0.38690117 = fieldWeight in 2748, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.078125 = fieldNorm(doc=2748)
          0.5 = coord(1/2)
      0.33333334 = coord(2/6)
    
    Date
    1. 2.2016 18:25:22
    Series
    Lecture notes in computer science ; 9398
  11. Yoon, Y.; Lee, C.; Lee, G.G.: An effective procedure for constructing a hierarchical text classification system (2006) 0.01
    0.010896482 = product of:
      0.032689445 = sum of:
        0.012303721 = weight(_text_:in in 5273) [ClassicSimilarity], result of:
          0.012303721 = score(doc=5273,freq=8.0), product of:
            0.058476754 = queryWeight, product of:
              1.3602545 = idf(docFreq=30841, maxDocs=44218)
              0.042989567 = queryNorm
            0.21040362 = fieldWeight in 5273, product of:
              2.828427 = tf(freq=8.0), with freq of:
                8.0 = termFreq=8.0
              1.3602545 = idf(docFreq=30841, maxDocs=44218)
              0.0546875 = fieldNorm(doc=5273)
        0.020385725 = product of:
          0.04077145 = sum of:
            0.04077145 = weight(_text_:22 in 5273) [ClassicSimilarity], result of:
              0.04077145 = score(doc=5273,freq=2.0), product of:
                0.15054214 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.042989567 = queryNorm
                0.2708308 = fieldWeight in 5273, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.0546875 = fieldNorm(doc=5273)
          0.5 = coord(1/2)
      0.33333334 = coord(2/6)
    
    Abstract
    In text categorization tasks, classification using some class hierarchies yields better results than classification without a hierarchy. Because large numbers of documents are now divided into subgroups within a hierarchy, a hierarchical classification method is appropriate. However, there is no systematic method for building a hierarchical classification system that performs well on large collections of practical data. In this article, we introduce a new evaluation scheme for internal node classifiers, which can be used effectively to develop a hierarchical classification system. We also show that our method for constructing the hierarchical classification system is very effective, especially for constructing classifiers applied to hierarchy trees with many levels.
    Date
    22. 7.2006 16:24:52
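A common way to build such a system is top-down routing: one classifier per internal node decides which child a document descends to, until a leaf is reached. The sketch below shows only that routing, with hypothetical keyword-overlap classifiers in `NODE_CLASSIFIERS`; the article's evaluation scheme for internal-node classifiers is not reproduced.

```python
# Hypothetical hierarchy: each internal node maps child -> keyword set.
NODE_CLASSIFIERS = {
    "root": {"science": {"physics", "biology", "experiment"},
             "arts": {"painting", "music", "poetry"}},
    "science": {"physics": {"quantum", "particle", "physics"},
                "biology": {"cell", "gene", "biology"}},
}


def classify_top_down(text, node="root"):
    """Route a document from the root to a leaf, one internal node at a time."""
    tokens = set(text.lower().split())
    path = []
    while node in NODE_CLASSIFIERS:
        children = NODE_CLASSIFIERS[node]
        # The internal-node classifier: pick the child with most keyword overlap.
        node = max(children, key=lambda c: len(children[c] & tokens))
        path.append(node)
    return path


print(classify_top_down("a particle physics experiment"))
```

Because each decision is final, an error at an upper level cannot be corrected further down, which is one reason the quality of internal-node classifiers matters in deep hierarchies.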
  12. Dubin, D.: Dimensions and discriminability (1998) 0.01
    0.009695257 = product of:
      0.02908577 = sum of:
        0.008700045 = weight(_text_:in in 2338) [ClassicSimilarity], result of:
          0.008700045 = score(doc=2338,freq=4.0), product of:
            0.058476754 = queryWeight, product of:
              1.3602545 = idf(docFreq=30841, maxDocs=44218)
              0.042989567 = queryNorm
            0.14877784 = fieldWeight in 2338, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              1.3602545 = idf(docFreq=30841, maxDocs=44218)
              0.0546875 = fieldNorm(doc=2338)
        0.020385725 = product of:
          0.04077145 = sum of:
            0.04077145 = weight(_text_:22 in 2338) [ClassicSimilarity], result of:
              0.04077145 = score(doc=2338,freq=2.0), product of:
                0.15054214 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.042989567 = queryNorm
                0.2708308 = fieldWeight in 2338, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.0546875 = fieldNorm(doc=2338)
          0.5 = coord(1/2)
      0.33333334 = coord(2/6)
    
    Abstract
    Visualization interfaces can improve subject access by highlighting the inclusion of document representation components in similarity and discrimination relationships. Within a set of retrieved documents, what kinds of groupings can index terms and subject headings make explicit? The role of controlled vocabulary in classifying search output is examined.
    Date
    22. 9.1997 19:16:05
  13. Jenkins, C.: Automatic classification of Web resources using Java and Dewey Decimal Classification (1998) 0.01
    0.009695257 = product of:
      0.02908577 = sum of:
        0.008700045 = weight(_text_:in in 1673) [ClassicSimilarity], result of:
          0.008700045 = score(doc=1673,freq=4.0), product of:
            0.058476754 = queryWeight, product of:
              1.3602545 = idf(docFreq=30841, maxDocs=44218)
              0.042989567 = queryNorm
            0.14877784 = fieldWeight in 1673, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              1.3602545 = idf(docFreq=30841, maxDocs=44218)
              0.0546875 = fieldNorm(doc=1673)
        0.020385725 = product of:
          0.04077145 = sum of:
            0.04077145 = weight(_text_:22 in 1673) [ClassicSimilarity], result of:
              0.04077145 = score(doc=1673,freq=2.0), product of:
                0.15054214 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.042989567 = queryNorm
                0.2708308 = fieldWeight in 1673, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.0546875 = fieldNorm(doc=1673)
          0.5 = coord(1/2)
      0.33333334 = coord(2/6)
    
    Abstract
    The Wolverhampton Web Library (WWLib) is a WWW search engine that provides access to UK-based information. The experimental version, developed in 1995, was a success but highlighted the need for a much higher degree of automation. An interesting feature of the experimental WWLib was that it organised information according to DDC. Discusses the advantages of classification and describes the automatic classifier that is being developed in Java as part of the new, fully automated WWLib
    Date
    1. 8.1996 22:08:06
  14. Pfeffer, M.: Automatische Vergabe von RVK-Notationen mittels fallbasiertem Schließen [Automatic assignment of RVK notations by means of case-based reasoning] (2009) 0.01
    0.009339842 = product of:
      0.028019525 = sum of:
        0.010546046 = weight(_text_:in in 3051) [ClassicSimilarity], result of:
          0.010546046 = score(doc=3051,freq=8.0), product of:
            0.058476754 = queryWeight, product of:
              1.3602545 = idf(docFreq=30841, maxDocs=44218)
              0.042989567 = queryNorm
            0.18034597 = fieldWeight in 3051, product of:
              2.828427 = tf(freq=8.0), with freq of:
                8.0 = termFreq=8.0
              1.3602545 = idf(docFreq=30841, maxDocs=44218)
              0.046875 = fieldNorm(doc=3051)
        0.017473478 = product of:
          0.034946956 = sum of:
            0.034946956 = weight(_text_:22 in 3051) [ClassicSimilarity], result of:
              0.034946956 = score(doc=3051,freq=2.0), product of:
                0.15054214 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.042989567 = queryNorm
                0.23214069 = fieldWeight in 3051, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.046875 = fieldNorm(doc=3051)
          0.5 = coord(1/2)
      0.33333334 = coord(2/6)
    
    Abstract
    Classifying bibliographic units is indispensable for systematic access to a library's holdings and for arranging them on the shelves. Until now, this task has been carried out manually by subject experts, either individually according to a self-developed classification scheme or cooperatively according to a shared one. This work presents a method for automating the classification process. It uses case-based reasoning, a technique developed in the context of artificial intelligence research. For every work for which bibliographic data are available, the method yields one or more possible classifications. In experiments, the results of the automatic classification are compared with those produced by subject experts. These experiments demonstrate the high quality of the automatic classification and show that the method can significantly reduce the classification workload of subject experts. An almost complete reclassification of a library catalog is also possible, with certain limitations.
    Date
    22. 8.2009 19:51:28
    Source
    Wissen bewegen - Bibliotheken in der Informationsgesellschaft / 97. Deutscher Bibliothekartag in Mannheim, 2008. Ed. by Ulrich Hohoff and Per Knudsen. Compiled by Stefan Siebert
  15. Egbert, J.; Biber, D.; Davies, M.: Developing a bottom-up, user-based method of web register classification (2015) 0.01
    0.008868875 = product of:
      0.026606623 = sum of:
        0.009133145 = weight(_text_:in in 2158) [ClassicSimilarity], result of:
          0.009133145 = score(doc=2158,freq=6.0), product of:
            0.058476754 = queryWeight, product of:
              1.3602545 = idf(docFreq=30841, maxDocs=44218)
              0.042989567 = queryNorm
            0.1561842 = fieldWeight in 2158, product of:
              2.4494898 = tf(freq=6.0), with freq of:
                6.0 = termFreq=6.0
              1.3602545 = idf(docFreq=30841, maxDocs=44218)
              0.046875 = fieldNorm(doc=2158)
        0.017473478 = product of:
          0.034946956 = sum of:
            0.034946956 = weight(_text_:22 in 2158) [ClassicSimilarity], result of:
              0.034946956 = score(doc=2158,freq=2.0), product of:
                0.15054214 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.042989567 = queryNorm
                0.23214069 = fieldWeight in 2158, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.046875 = fieldNorm(doc=2158)
          0.5 = coord(1/2)
      0.33333334 = coord(2/6)
    
    Abstract
    This paper introduces a project to develop a reliable, cost-effective method for classifying Internet texts into register categories, and to apply that approach to the analysis of a large corpus of web documents. To date, the project has proceeded in two key phases. First, we developed a bottom-up method for web register classification, asking end users of the web to use a decision-tree survey to code relevant situational characteristics of web documents, resulting in a bottom-up identification of register and subregister categories. We present details regarding the development and testing of this method through a series of 10 pilot studies. Then, in the second phase of our project, we applied this procedure to a corpus of 53,000 web documents. An analysis of the results demonstrates the effectiveness of these methods for web register classification and provides a preliminary description of the types and distribution of registers on the web.
    Date
    4. 8.2015 19:22:04
  16. Bock, H.-H.: Datenanalyse zur Strukturierung und Ordnung von Information [Data analysis for structuring and ordering information] (1989) 0.01
    0.008845862 = product of:
      0.026537586 = sum of:
        0.0061518606 = weight(_text_:in in 141) [ClassicSimilarity], result of:
          0.0061518606 = score(doc=141,freq=2.0), product of:
            0.058476754 = queryWeight, product of:
              1.3602545 = idf(docFreq=30841, maxDocs=44218)
              0.042989567 = queryNorm
            0.10520181 = fieldWeight in 141, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              1.3602545 = idf(docFreq=30841, maxDocs=44218)
              0.0546875 = fieldNorm(doc=141)
        0.020385725 = product of:
          0.04077145 = sum of:
            0.04077145 = weight(_text_:22 in 141) [ClassicSimilarity], result of:
              0.04077145 = score(doc=141,freq=2.0), product of:
                0.15054214 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.042989567 = queryNorm
                0.2708308 = fieldWeight in 141, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.0546875 = fieldNorm(doc=141)
          0.5 = coord(1/2)
      0.33333334 = coord(2/6)
    
    Abstract
    The task of data analysis is to order data, present them clearly, discover hidden and natural structures, bring out the properties that are essential in this regard, and set up suitable models for describing data. The article provides insight into the methods and principles of data analysis. Typical examples show which data are analyzed, which structures are considered, which representation and ordering methods are used, which objectives are pursued, and which evaluation criteria can be applied. The appropriate use of the different methods is also discussed, with attention drawn to the risk and kinds of misinterpretation
    Pages
    pp. 1-22
  17. Automatic classification research at OCLC (2002) 0.01
    0.008845862 = product of:
      0.026537586 = sum of:
        0.0061518606 = weight(_text_:in in 1563) [ClassicSimilarity], result of:
          0.0061518606 = score(doc=1563,freq=2.0), product of:
            0.058476754 = queryWeight, product of:
              1.3602545 = idf(docFreq=30841, maxDocs=44218)
              0.042989567 = queryNorm
            0.10520181 = fieldWeight in 1563, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              1.3602545 = idf(docFreq=30841, maxDocs=44218)
              0.0546875 = fieldNorm(doc=1563)
        0.020385725 = product of:
          0.04077145 = sum of:
            0.04077145 = weight(_text_:22 in 1563) [ClassicSimilarity], result of:
              0.04077145 = score(doc=1563,freq=2.0), product of:
                0.15054214 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.042989567 = queryNorm
                0.2708308 = fieldWeight in 1563, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.0546875 = fieldNorm(doc=1563)
          0.5 = coord(1/2)
      0.33333334 = coord(2/6)
    
    Abstract
    OCLC enlists the cooperation of the world's libraries to make the written record of humankind's cultural heritage more accessible through electronic media. Part of this goal can be accomplished through the application of the principles of knowledge organization. We believe that cultural artifacts are effectively lost unless they are indexed, cataloged and classified. Accordingly, OCLC has developed products, sponsored research projects, and encouraged participation in international standards communities, the outcomes of which have been improved library classification schemes, cataloging productivity tools, and new proposals for the creation and maintenance of metadata. Though cataloging and classification require expert intellectual effort, we recognize that at least some of the work must be automated if we hope to keep pace with cultural change
    Date
    5. 5.2003 9:22:09
  18. Mengle, S.; Goharian, N.: Passage detection using text classification (2009) 0.01
    0.008441582 = product of:
      0.025324747 = sum of:
        0.010763514 = weight(_text_:in in 2765) [ClassicSimilarity], result of:
          0.010763514 = score(doc=2765,freq=12.0), product of:
            0.058476754 = queryWeight, product of:
              1.3602545 = idf(docFreq=30841, maxDocs=44218)
              0.042989567 = queryNorm
            0.18406484 = fieldWeight in 2765, product of:
              3.4641016 = tf(freq=12.0), with freq of:
                12.0 = termFreq=12.0
              1.3602545 = idf(docFreq=30841, maxDocs=44218)
              0.0390625 = fieldNorm(doc=2765)
        0.014561232 = product of:
          0.029122464 = sum of:
            0.029122464 = weight(_text_:22 in 2765) [ClassicSimilarity], result of:
              0.029122464 = score(doc=2765,freq=2.0), product of:
                0.15054214 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.042989567 = queryNorm
                0.19345059 = fieldWeight in 2765, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=2765)
          0.5 = coord(1/2)
      0.33333334 = coord(2/6)
    
    Abstract
    Passages can be hidden within a text to circumvent restrictions on their transfer. Such release of compartmentalized information is a concern for all corporate and governmental organizations. Passage retrieval is well studied; we posit, however, that passage detection is not. Passage retrieval is the determination of the degree of relevance of blocks of text, namely passages, comprising a document. Rather than determining the relevance of a document in its entirety, passage retrieval determines the relevance of the individual passages. As such, modified traditional information-retrieval techniques compare terms found in user queries with the individual passages to determine a similarity score for passages of interest. In passage detection, passages are classified into predetermined categories. More often than not, passage detection techniques are deployed to detect hidden paragraphs in documents; that is, to hide information, hidden text is injected into passages of documents. Rather than matching query terms against passages to determine their relevance, passage detection classifies the passages using text-mining techniques. Documents with hidden passages are defined as infected. Thus, simply stated, passage retrieval is the search for passages relevant to a user query, while passage detection is the classification of passages: each passage is labeled with one or more categories from a set of predetermined categories. We present a keyword-based dynamic passage approach (KDP) and demonstrate that it outperforms the other document-splitting approaches by 12% to 18% in the passage detection and passage category-prediction tasks, a statistically significant improvement (99% confidence). Furthermore, we evaluate the effects of feature selection, passage length, ambiguous passages, and training-data category distribution on passage-detection accuracy.
    Date
    22. 3.2009 19:14:43
  19. Liu, R.-L.: Context recognition for hierarchical text classification (2009) 0.01
    0.00831022 = product of:
      0.02493066 = sum of:
        0.007457182 = weight(_text_:in in 2760) [ClassicSimilarity], result of:
          0.007457182 = score(doc=2760,freq=4.0), product of:
            0.058476754 = queryWeight, product of:
              1.3602545 = idf(docFreq=30841, maxDocs=44218)
              0.042989567 = queryNorm
            0.12752387 = fieldWeight in 2760, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              1.3602545 = idf(docFreq=30841, maxDocs=44218)
              0.046875 = fieldNorm(doc=2760)
        0.017473478 = product of:
          0.034946956 = sum of:
            0.034946956 = weight(_text_:22 in 2760) [ClassicSimilarity], result of:
              0.034946956 = score(doc=2760,freq=2.0), product of:
                0.15054214 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.042989567 = queryNorm
                0.23214069 = fieldWeight in 2760, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.046875 = fieldNorm(doc=2760)
          0.5 = coord(1/2)
      0.33333334 = coord(2/6)
    
    Abstract
    Information is often organized as a text hierarchy. A hierarchical text-classification system is thus essential for the management, sharing, and dissemination of information. It aims to automatically classify each incoming document into zero, one, or several categories in the text hierarchy. In this paper, we present a technique called CRHTC (context recognition for hierarchical text classification) that performs hierarchical text classification by recognizing the context of discussion (COD) of each category. A category's COD is governed by its ancestor categories, whose contents indicate the contextual background of the category. A document may be classified into a category only if its content matches the category's COD. CRHTC does not require trial runs to set parameters manually, and hence is more portable and easier to implement than other methods. It is empirically evaluated under various conditions. The results show that CRHTC achieves both better and more stable performance than several hierarchical and nonhierarchical text-classification methodologies.
    Date
    22. 3.2009 19:11:54
  20. Zhu, W.Z.; Allen, R.B.: Document clustering using the LSI subspace signature model (2013) 0.01
    0.007582167 = product of:
      0.022746501 = sum of:
        0.005273023 = weight(_text_:in in 690) [ClassicSimilarity], result of:
          0.005273023 = score(doc=690,freq=2.0), product of:
            0.058476754 = queryWeight, product of:
              1.3602545 = idf(docFreq=30841, maxDocs=44218)
              0.042989567 = queryNorm
            0.09017298 = fieldWeight in 690, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              1.3602545 = idf(docFreq=30841, maxDocs=44218)
              0.046875 = fieldNorm(doc=690)
        0.017473478 = product of:
          0.034946956 = sum of:
            0.034946956 = weight(_text_:22 in 690) [ClassicSimilarity], result of:
              0.034946956 = score(doc=690,freq=2.0), product of:
                0.15054214 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.042989567 = queryNorm
                0.23214069 = fieldWeight in 690, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.046875 = fieldNorm(doc=690)
          0.5 = coord(1/2)
      0.33333334 = coord(2/6)
    
    Abstract
    We describe the latent semantic indexing subspace signature model (LSISSM) for semantic content representation of unstructured text. Grounded on singular value decomposition, the model represents terms and documents by the distribution signatures of their statistical contribution across the top-ranking latent concept dimensions. LSISSM matches term signatures with document signatures according to their mapping coherence between the latent semantic indexing (LSI) term subspace and the LSI document subspace. LSISSM performs feature reduction and finds a low-rank approximation of scalable and sparse term-document matrices. Experiments demonstrate that this approach significantly improves the performance of major clustering algorithms such as standard K-means and self-organizing maps compared with the vector space model and the traditional LSI model. The unique contribution ranking mechanism in LSISSM also improves the initialization of standard K-means compared with the random seeding procedure, which sometimes causes low efficiency and effectiveness of clustering. A two-stage initialization strategy based on LSISSM significantly reduces the running time of standard K-means procedures.
    Date
    23. 3.2013 13:22:36
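
The score breakdowns in these results are Lucene ClassicSimilarity explain trees. The following minimal Python sketch rebuilds the score of result 13 (Jenkins 1998, doc 1673) from the printed inputs. It assumes tf = sqrt(freq) and idf = 1 + ln(maxDocs / (docFreq + 1)), which reproduce the tf and idf values shown; queryNorm and fieldNorm are taken as printed rather than recomputed. The function names are hypothetical, chosen only for illustration.

```python
import math

# Constants copied from the explain output on this page.
MAX_DOCS = 44218
QUERY_NORM = 0.042989567  # taken as printed; it depends on the whole query


def classic_idf(doc_freq: int, max_docs: int) -> float:
    """idf = 1 + ln(maxDocs / (docFreq + 1)); matches the printed idf values."""
    return 1.0 + math.log(max_docs / (doc_freq + 1))


def term_weight(freq: float, doc_freq: int, field_norm: float,
                max_docs: int = MAX_DOCS, query_norm: float = QUERY_NORM) -> float:
    """weight(_text_:term in doc) = queryWeight * fieldWeight."""
    idf = classic_idf(doc_freq, max_docs)
    tf = math.sqrt(freq)                    # e.g. tf(freq=4.0) = 2.0
    query_weight = idf * query_norm         # "queryWeight, product of"
    field_weight = tf * idf * field_norm    # "fieldWeight in <doc>, product of"
    return query_weight * field_weight


def coord(matched: int, total: int) -> float:
    """Fraction of query clauses that matched, e.g. coord(2/6)."""
    return matched / total


def score_jenkins_1998():
    """Rebuild the explain tree of result 13 (doc 1673)."""
    field_norm = 0.0546875
    w_in = term_weight(4.0, 30841, field_norm)   # _text_:in, freq=4
    w_22 = term_weight(2.0, 3622, field_norm)    # _text_:22, freq=2
    nested = w_22 * coord(1, 2)                  # inner "product of ... coord(1/2)"
    total = (w_in + nested) * coord(2, 6)        # outer "product of ... coord(2/6)"
    return w_in, w_22, total


if __name__ == "__main__":
    w_in, w_22, total = score_jenkins_1998()
    print(f"in: {w_in:.9f}  22: {w_22:.8f}  total: {total:.9f}")
```

Calling term_weight(8.0, 30841, 0.046875) in the same way should give the 0.010546046 printed for the `in` clause of result 14. The coord factors explain why a document matching only two of the six query clauses keeps just a third of its summed clause weight.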

Languages

  • e 153
  • d 32
  • a 1
  • chi 1

Types

  • a 158
  • el 23
  • x 6
  • m 4
  • r 3
  • s 2
  • d 1