Search (1346 results, page 1 of 68)

  • Active filter: year_i:[2000 TO 2010}
  1. Yang, C.C.; Liu, N.: Web site topic-hierarchy generation based on link structure (2009) 0.14
    0.1355013 = product of:
      0.2710026 = sum of:
        0.2710026 = sum of:
          0.23716255 = weight(_text_:tree in 2738) [ClassicSimilarity], result of:
            0.23716255 = score(doc=2738,freq=8.0), product of:
              0.32745647 = queryWeight, product of:
                6.5552235 = idf(docFreq=170, maxDocs=44218)
                0.049953517 = queryNorm
              0.7242567 = fieldWeight in 2738, product of:
                2.828427 = tf(freq=8.0), with freq of:
                  8.0 = termFreq=8.0
                6.5552235 = idf(docFreq=170, maxDocs=44218)
                0.0390625 = fieldNorm(doc=2738)
          0.03384006 = weight(_text_:22 in 2738) [ClassicSimilarity], result of:
            0.03384006 = score(doc=2738,freq=2.0), product of:
              0.17492871 = queryWeight, product of:
                3.5018296 = idf(docFreq=3622, maxDocs=44218)
                0.049953517 = queryNorm
              0.19345059 = fieldWeight in 2738, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                3.5018296 = idf(docFreq=3622, maxDocs=44218)
                0.0390625 = fieldNorm(doc=2738)
      0.5 = coord(1/2)
    
    Abstract
    Navigating through hyperlinks within a Web site to look for information from one of its Web pages without the support of a site map can be inefficient and ineffective. Although the content of a Web site is usually organized with an inherent structure like a topic hierarchy, which is a directed tree rooted at a Web site's homepage whose vertices and edges correspond to Web pages and hyperlinks, such a topic hierarchy is not always available to the user. In this work, we studied the problem of automatic generation of Web sites' topic hierarchies. We modeled a Web site's link structure as a weighted directed graph and proposed methods for estimating edge weights based on eight types of features and three learning algorithms, namely decision trees, naïve Bayes classifiers, and logistic regression. Three graph algorithms, namely breadth-first search, shortest-path search, and directed minimum-spanning tree, were adapted to generate the topic hierarchy based on the graph model. We have tested the model and algorithms on real Web sites. It is found that the directed minimum-spanning tree algorithm with the decision tree as the weight learning algorithm achieves the highest performance with an average accuracy of 91.9%.
    Date
    22. 3.2009 12:51:47
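    The score breakdowns in this result list are Lucene "explain" output for ClassicSimilarity (TF-IDF). As a minimal sketch, the leaf weights of this first hit can be reproduced from the standard formulas tf = sqrt(freq) and idf = 1 + ln(maxDocs / (docFreq + 1)); queryNorm and fieldNorm are copied from the explain tree rather than recomputed, since both depend on index-wide state (the full query's weight vector and the indexed length of the _text_ field in doc 2738). The trailing coord(1/2) factor scales the sum because only one of two top-level query clauses produced a match.
      import math

      QUERY_NORM = 0.049953517          # taken from the explain tree above

      def idf(doc_freq, max_docs):
          # ClassicSimilarity: 1 + ln(maxDocs / (docFreq + 1))
          return 1.0 + math.log(max_docs / (doc_freq + 1))

      def leaf_weight(freq, doc_freq, max_docs, field_norm):
          term_idf = idf(doc_freq, max_docs)
          query_weight = term_idf * QUERY_NORM                     # idf * queryNorm
          field_weight = math.sqrt(freq) * term_idf * field_norm   # tf * idf * fieldNorm
          return query_weight * field_weight

      w_tree = leaf_weight(8.0, 170, 44218, 0.0390625)   # ~0.23716 (explain: 0.23716255)
      w_22   = leaf_weight(2.0, 3622, 44218, 0.0390625)  # ~0.03384 (explain: 0.03384006)
      score  = (w_tree + w_22) * 0.5                     # coord(1/2) -> ~0.1355013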
  2. Trotman, A.: Searching structured documents (2004) 0.11
    0.10669494 = product of:
      0.21338987 = sum of:
        0.21338987 = sum of:
          0.16601379 = weight(_text_:tree in 2538) [ClassicSimilarity], result of:
            0.16601379 = score(doc=2538,freq=2.0), product of:
              0.32745647 = queryWeight, product of:
                6.5552235 = idf(docFreq=170, maxDocs=44218)
                0.049953517 = queryNorm
              0.5069797 = fieldWeight in 2538, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                6.5552235 = idf(docFreq=170, maxDocs=44218)
                0.0546875 = fieldNorm(doc=2538)
          0.047376085 = weight(_text_:22 in 2538) [ClassicSimilarity], result of:
            0.047376085 = score(doc=2538,freq=2.0), product of:
              0.17492871 = queryWeight, product of:
                3.5018296 = idf(docFreq=3622, maxDocs=44218)
                0.049953517 = queryNorm
              0.2708308 = fieldWeight in 2538, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                3.5018296 = idf(docFreq=3622, maxDocs=44218)
                0.0546875 = fieldNorm(doc=2538)
      0.5 = coord(1/2)
    
    Abstract
    Structured document interchange formats such as XML and SGML are ubiquitous; however, information retrieval systems supporting structured searching are not. Structured searching can result in increased precision. A search for the author "Smith" in an unstructured corpus of documents specializing in iron-working could have a lower precision than a structured search for "Smith as author" in the same corpus. Analysis of XML retrieval languages identifies additional functionality that must be supported, including searching at, and broken across, multiple nodes in the document tree. A data structure is developed to support structured document searching. Application of this structure to information retrieval is then demonstrated. Document ranking is examined and adapted specifically for structured searching.
    Date
    14. 8.2004 10:39:22
  3. Yoon, Y.; Lee, C.; Lee, G.G.: An effective procedure for constructing a hierarchical text classification system (2006) 0.11
    0.10669494 = product of:
      0.21338987 = sum of:
        0.21338987 = sum of:
          0.16601379 = weight(_text_:tree in 5273) [ClassicSimilarity], result of:
            0.16601379 = score(doc=5273,freq=2.0), product of:
              0.32745647 = queryWeight, product of:
                6.5552235 = idf(docFreq=170, maxDocs=44218)
                0.049953517 = queryNorm
              0.5069797 = fieldWeight in 5273, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                6.5552235 = idf(docFreq=170, maxDocs=44218)
                0.0546875 = fieldNorm(doc=5273)
          0.047376085 = weight(_text_:22 in 5273) [ClassicSimilarity], result of:
            0.047376085 = score(doc=5273,freq=2.0), product of:
              0.17492871 = queryWeight, product of:
                3.5018296 = idf(docFreq=3622, maxDocs=44218)
                0.049953517 = queryNorm
              0.2708308 = fieldWeight in 5273, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                3.5018296 = idf(docFreq=3622, maxDocs=44218)
                0.0546875 = fieldNorm(doc=5273)
      0.5 = coord(1/2)
    
    Abstract
    In text categorization tasks, classification over a class hierarchy often yields better results than classification without one. Because large document collections are typically divided into several subgroups within a hierarchy, a hierarchical classification method can be applied appropriately. However, there has been no systematic method for building a hierarchical classification system that performs well on large collections of practical data. In this article, we introduce a new evaluation scheme for internal node classifiers, which can be used effectively to develop a hierarchical classification system. We also show that our method for constructing the hierarchical classification system is very effective, especially for the task of constructing classifiers applied to a hierarchy tree with many levels.
    Date
    22. 7.2006 16:24:52
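    The per-node evaluation idea above assumes the usual top-down architecture: one local classifier at each internal node of the hierarchy routes a document toward a leaf. The sketch below illustrates that general architecture only, not the authors' method or evaluation scheme; the toy hierarchy, documents, and scikit-learn pipeline are assumptions for illustration.
      from sklearn.feature_extraction.text import TfidfVectorizer
      from sklearn.linear_model import LogisticRegression
      from sklearn.pipeline import make_pipeline

      # Toy hierarchy: each internal node routes a document to one child.
      HIERARCHY = {
          "root":    ["science", "arts"],
          "science": ["biology", "computing"],
      }

      def train_node_classifiers(docs, paths):
          """Train one local classifier per internal node; paths[i] is the
          root-to-leaf path of docs[i], e.g. ["science", "biology"]."""
          classifiers = {}
          for node in HIERARCHY:
              X, y = [], []
              for doc, path in zip(docs, paths):
                  full = ["root"] + path
                  if node in full[:-1]:                     # doc passes through node
                      X.append(doc)
                      y.append(full[full.index(node) + 1])  # child it descends to
              clf = make_pipeline(TfidfVectorizer(), LogisticRegression(max_iter=1000))
              classifiers[node] = clf.fit(X, y)
          return classifiers

      def classify(doc, classifiers):
          node = "root"
          while node in HIERARCHY:                          # descend until a leaf
              node = classifiers[node].predict([doc])[0]
          return node

      docs  = ["gene protein cell", "compiler parser code",
               "dna sequence gene", "painting sculpture museum"]
      paths = [["science", "biology"], ["science", "computing"],
               ["science", "biology"], ["arts"]]
      print(classify("gene expression data", train_node_classifiers(docs, paths)))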
  4. Rao, R.: Der 'Hyperbolic tree' und seine Verwandten : 3D-Interfaces erleichtern den Umgang mit grossen Datenmengen (2000) 0.10
    0.10061955 = product of:
      0.2012391 = sum of:
        0.2012391 = product of:
          0.4024782 = sum of:
            0.4024782 = weight(_text_:tree in 5053) [ClassicSimilarity], result of:
              0.4024782 = score(doc=5053,freq=4.0), product of:
                0.32745647 = queryWeight, product of:
                  6.5552235 = idf(docFreq=170, maxDocs=44218)
                  0.049953517 = queryNorm
                1.2291044 = fieldWeight in 5053, product of:
                  2.0 = tf(freq=4.0), with freq of:
                    4.0 = termFreq=4.0
                  6.5552235 = idf(docFreq=170, maxDocs=44218)
                  0.09375 = fieldNorm(doc=5053)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Object
    Hyperbolic tree
  5. Hotho, A.; Bloehdorn, S.: Data Mining 2004 : Text classification by boosting weak learners based on terms and concepts (2004) 0.10
    0.09964347 = sum of:
      0.07933943 = product of:
        0.23801827 = sum of:
          0.23801827 = weight(_text_:3a in 562) [ClassicSimilarity], result of:
            0.23801827 = score(doc=562,freq=2.0), product of:
              0.42350647 = queryWeight, product of:
                8.478011 = idf(docFreq=24, maxDocs=44218)
                0.049953517 = queryNorm
              0.56201804 = fieldWeight in 562, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                8.478011 = idf(docFreq=24, maxDocs=44218)
                0.046875 = fieldNorm(doc=562)
        0.33333334 = coord(1/3)
      0.020304035 = product of:
        0.04060807 = sum of:
          0.04060807 = weight(_text_:22 in 562) [ClassicSimilarity], result of:
            0.04060807 = score(doc=562,freq=2.0), product of:
              0.17492871 = queryWeight, product of:
                3.5018296 = idf(docFreq=3622, maxDocs=44218)
                0.049953517 = queryNorm
              0.23214069 = fieldWeight in 562, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                3.5018296 = idf(docFreq=3622, maxDocs=44218)
                0.046875 = fieldNorm(doc=562)
        0.5 = coord(1/2)
    
    Content
    Cf.: http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.91.4940&rep=rep1&type=pdf.
    Date
    8. 1.2013 10:22:32
  6. Homeopathic thesaurus : keyterms to be used in homeopathy (2000) 0.09
    0.09486502 = product of:
      0.18973003 = sum of:
        0.18973003 = product of:
          0.37946007 = sum of:
            0.37946007 = weight(_text_:tree in 3808) [ClassicSimilarity], result of:
              0.37946007 = score(doc=3808,freq=2.0), product of:
                0.32745647 = queryWeight, product of:
                  6.5552235 = idf(docFreq=170, maxDocs=44218)
                  0.049953517 = queryNorm
                1.1588107 = fieldWeight in 3808, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  6.5552235 = idf(docFreq=170, maxDocs=44218)
                  0.125 = fieldNorm(doc=3808)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Issue
    Tree structure and alphabetical list.
  7. Siva, S.: Document identification and classification using transform coding of gray scale projections and neural tree network (2000) 0.08
    0.083006896 = product of:
      0.16601379 = sum of:
        0.16601379 = product of:
          0.33202758 = sum of:
            0.33202758 = weight(_text_:tree in 1970) [ClassicSimilarity], result of:
              0.33202758 = score(doc=1970,freq=2.0), product of:
                0.32745647 = queryWeight, product of:
                  6.5552235 = idf(docFreq=170, maxDocs=44218)
                  0.049953517 = queryNorm
                1.0139594 = fieldWeight in 1970, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  6.5552235 = idf(docFreq=170, maxDocs=44218)
                  0.109375 = fieldNorm(doc=1970)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
  8. Zhang, M.; Zhou, G.D.; Aw, A.: Exploring syntactic structured features over parse trees for relation extraction using kernel methods (2008) 0.08
    0.07843415 = product of:
      0.1568683 = sum of:
        0.1568683 = product of:
          0.3137366 = sum of:
            0.3137366 = weight(_text_:tree in 2055) [ClassicSimilarity], result of:
              0.3137366 = score(doc=2055,freq=14.0), product of:
                0.32745647 = queryWeight, product of:
                  6.5552235 = idf(docFreq=170, maxDocs=44218)
                  0.049953517 = queryNorm
                0.95810163 = fieldWeight in 2055, product of:
                  3.7416575 = tf(freq=14.0), with freq of:
                    14.0 = termFreq=14.0
                  6.5552235 = idf(docFreq=170, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=2055)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Abstract
    Extracting semantic relationships between entities from text documents is challenging in information extraction and important for deep information processing and management. This paper proposes to use the convolution kernel over parse trees together with support vector machines to model syntactic structured information for relation extraction. Compared with linear kernels, tree kernels can implicitly explore the huge set of syntactic structured features embedded in a parse tree. Our study reveals that the syntactic structured features embedded in a parse tree are very effective in relation extraction and can be well captured by the convolution tree kernel. Evaluation on the ACE benchmark corpora shows that using the convolution tree kernel alone achieves performance comparable to the previous best-reported feature-based methods. It also shows that our method significantly outperforms two previous dependency tree kernels for relation extraction. Moreover, this paper proposes a composite kernel for relation extraction by combining the convolution tree kernel with a simple linear kernel. Our study reveals that the composite kernel can effectively capture both flat and structured features without extensive feature engineering, and can easily scale to include more features. Evaluation on the ACE benchmark corpora shows that the composite kernel outperforms previous best-reported methods in relation extraction.
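    The convolution tree kernel referred to above counts shared subtree fragments between two parse trees. The recursion below is a minimal sketch in the Collins-Duffy style that such kernels build on; the toy trees and the decay factor lam are illustrative assumptions, and this is not the authors' composite kernel.
      # A tree is (label, child, ...); a string child is a word (leaf).
      def nodes(tree):
          return [tree] + [n for c in tree[1:] if not isinstance(c, str)
                           for n in nodes(c)]

      def production(node):   # node label plus the sequence of child labels/words
          return (node[0], tuple(c if isinstance(c, str) else c[0] for c in node[1:]))

      def preterminal(node):  # a POS tag directly above a word
          return len(node) == 2 and isinstance(node[1], str)

      def C(n1, n2, lam):
          # Weighted count of matching fragments rooted at n1 and n2.
          if production(n1) != production(n2):
              return 0.0
          if preterminal(n1):
              return lam
          prod = lam
          for c1, c2 in zip(n1[1:], n2[1:]):
              prod *= 1.0 + C(c1, c2, lam)
          return prod

      def tree_kernel(t1, t2, lam=0.4):
          return sum(C(n1, n2, lam) for n1 in nodes(t1) for n2 in nodes(t2))

      t1 = ("S", ("NP", ("DT", "the"), ("NN", "kernel")), ("VP", ("VBZ", "works")))
      t2 = ("S", ("NP", ("DT", "a"),   ("NN", "kernel")), ("VP", ("VBZ", "works")))
      print(tree_kernel(t1, t2))   # ~2.893 with lam=0.4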
  9. Yager, R.R.: Knowledge trees and protoforms in question-answering systems (2006) 0.08
    0.07621066 = product of:
      0.15242133 = sum of:
        0.15242133 = sum of:
          0.11858127 = weight(_text_:tree in 5281) [ClassicSimilarity], result of:
            0.11858127 = score(doc=5281,freq=2.0), product of:
              0.32745647 = queryWeight, product of:
                6.5552235 = idf(docFreq=170, maxDocs=44218)
                0.049953517 = queryNorm
              0.36212835 = fieldWeight in 5281, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                6.5552235 = idf(docFreq=170, maxDocs=44218)
                0.0390625 = fieldNorm(doc=5281)
          0.03384006 = weight(_text_:22 in 5281) [ClassicSimilarity], result of:
            0.03384006 = score(doc=5281,freq=2.0), product of:
              0.17492871 = queryWeight, product of:
                3.5018296 = idf(docFreq=3622, maxDocs=44218)
                0.049953517 = queryNorm
              0.19345059 = fieldWeight in 5281, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                3.5018296 = idf(docFreq=3622, maxDocs=44218)
                0.0390625 = fieldNorm(doc=5281)
      0.5 = coord(1/2)
    
    Abstract
    We point out that question-answering systems differ from other information-seeking applications, such as search engines, by having a deduction capability, an ability to answer questions by a synthesis of information residing in different parts of its knowledge base. This capability requires appropriate representation of various types of human knowledge, rules for locally manipulating this knowledge, and a framework for providing a global plan for appropriately mobilizing the information in the knowledge to address the question posed. In this article we suggest tools to provide these capabilities. We describe how the fuzzy set-based theory of approximate reasoning can aid in the process of representing knowledge. We discuss how protoforms can be used to aid in deduction and local manipulation of knowledge. The idea of a knowledge tree is introduced to provide a global framework for mobilizing the knowledge base in response to a query. We look at some types of commonsense and default knowledge. This requires us to address the complexity of the nonmonotonicity that these types of knowledge often display. We also briefly discuss the role that Dempster-Shafer structures can play in representing knowledge.
    Date
    22. 7.2006 17:10:27
  10. Craig, A.; Schriar, S.: The Find-It! Illinois controlled vocabulary : improving access to government information through the Jessica subject tree (2001) 0.07
    0.07114877 = product of:
      0.14229754 = sum of:
        0.14229754 = product of:
          0.28459507 = sum of:
            0.28459507 = weight(_text_:tree in 6319) [ClassicSimilarity], result of:
              0.28459507 = score(doc=6319,freq=2.0), product of:
                0.32745647 = queryWeight, product of:
                  6.5552235 = idf(docFreq=170, maxDocs=44218)
                  0.049953517 = queryNorm
                0.8691081 = fieldWeight in 6319, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  6.5552235 = idf(docFreq=170, maxDocs=44218)
                  0.09375 = fieldNorm(doc=6319)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
  11. White, K.J.; Sutcliffe, R.F.E.: Applying incremental tree induction to retrieval : from manuals and medical texts (2006) 0.07
    0.07114877 = product of:
      0.14229754 = sum of:
        0.14229754 = product of:
          0.28459507 = sum of:
            0.28459507 = weight(_text_:tree in 5044) [ClassicSimilarity], result of:
              0.28459507 = score(doc=5044,freq=8.0), product of:
                0.32745647 = queryWeight, product of:
                  6.5552235 = idf(docFreq=170, maxDocs=44218)
                  0.049953517 = queryNorm
                0.8691081 = fieldWeight in 5044, product of:
                  2.828427 = tf(freq=8.0), with freq of:
                    8.0 = termFreq=8.0
                  6.5552235 = idf(docFreq=170, maxDocs=44218)
                  0.046875 = fieldNorm(doc=5044)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Abstract
    The Decision Tree Forest (DTF) is an architecture for information retrieval that uses a separate decision tree for each document in a collection. Experiments were conducted in which DTFs working with the incremental tree induction (ITI) algorithm of Utgoff, Berkman, and Clouse (1997) were trained and evaluated in the medical and word processing domains using the Cystic Fibrosis and SIFT collections. Performance was compared with that of a conventional inverted index system (IIS) using a BM25-derived probabilistic matching function. Initial results using DTF were poor compared to those obtained with IIS. We then simulated scenarios in which large quantities of training data were available, by using only those parts of the document collection that were well covered by the data sets. Consequently, the retrieval effectiveness of DTF improved substantially. In one particular experiment, precision and recall for DTF were 0.65 and 0.67 respectively, values that compared favorably with values of 0.49 and 0.56 for IIS.
  12. Li, J.; Zhang, Z.; Li, X.; Chen, H.: Kernel-based learning for biomedical relation extraction (2008) 0.07
    0.07114877 = product of:
      0.14229754 = sum of:
        0.14229754 = product of:
          0.28459507 = sum of:
            0.28459507 = weight(_text_:tree in 1611) [ClassicSimilarity], result of:
              0.28459507 = score(doc=1611,freq=8.0), product of:
                0.32745647 = queryWeight, product of:
                  6.5552235 = idf(docFreq=170, maxDocs=44218)
                  0.049953517 = queryNorm
                0.8691081 = fieldWeight in 1611, product of:
                  2.828427 = tf(freq=8.0), with freq of:
                    8.0 = termFreq=8.0
                  6.5552235 = idf(docFreq=170, maxDocs=44218)
                  0.046875 = fieldNorm(doc=1611)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Abstract
    Relation extraction is the process of scanning text for relationships between named entities. Recently, significant studies have focused on automatically extracting relations from biomedical corpora. Most existing biomedical relation extractors require manual creation of biomedical lexicons or parsing templates based on domain knowledge. In this study, we propose to use kernel-based learning methods to automatically extract biomedical relations from literature text. We develop a framework of kernel-based learning for biomedical relation extraction. In particular, we modified the standard tree kernel function by incorporating a trace kernel to capture richer contextual information. In our experiments on a biomedical corpus, we compare different kernel functions for biomedical relation detection and classification. The experimental results show that a tree kernel outperforms word and sequence kernels for relation detection, our trace-tree kernel outperforms the standard tree kernel, and a composite kernel outperforms individual kernels for relation extraction.
  13. Diaz, I.; Morato, J.; Lloréns, J.: An algorithm for term conflation based on tree structures (2002) 0.07
    0.0670797 = product of:
      0.1341594 = sum of:
        0.1341594 = product of:
          0.2683188 = sum of:
            0.2683188 = weight(_text_:tree in 246) [ClassicSimilarity], result of:
              0.2683188 = score(doc=246,freq=4.0), product of:
                0.32745647 = queryWeight, product of:
                  6.5552235 = idf(docFreq=170, maxDocs=44218)
                  0.049953517 = queryNorm
                0.81940293 = fieldWeight in 246, product of:
                  2.0 = tf(freq=4.0), with freq of:
                    4.0 = termFreq=4.0
                  6.5552235 = idf(docFreq=170, maxDocs=44218)
                  0.0625 = fieldNorm(doc=246)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Abstract
    This work presents a new stemming algorithm that stores its stemming information in tree structures. This storage improves the algorithm's performance by reducing the search space and the overall complexity. The algorithm's final output is a normalized concept, where normalization means the automatic extraction of the generic form (or lexeme) of a selected term.
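    As a generic illustration of keeping conflation information in a tree, the sketch below uses a character trie that maps term variants to a normalized form; lookup cost grows with the term length rather than the vocabulary size, which is the kind of search-space reduction the abstract alludes to. The trie, vocabulary, and longest-prefix fallback are assumptions for illustration, not the authors' algorithm.
      class Node:
          def __init__(self):
              self.children = {}
              self.normal_form = None   # set on nodes that end a known variant

      class ConflationTrie:
          def __init__(self):
              self.root = Node()

          def insert(self, variant, normal_form):
              node = self.root
              for ch in variant:
                  node = node.children.setdefault(ch, Node())
              node.normal_form = normal_form

          def conflate(self, term):
              # Return the normalized form of term, falling back to the
              # longest registered prefix; unknown terms pass through.
              node, best = self.root, None
              for ch in term:
                  node = node.children.get(ch)
                  if node is None:
                      break
                  if node.normal_form is not None:
                      best = node.normal_form
              return best or term

      trie = ConflationTrie()
      for variant in ("tree", "trees", "treelike"):
          trie.insert(variant, "tree")
      print(trie.conflate("trees"))      # -> tree
      print(trie.conflate("treehouse"))  # -> tree (longest-prefix fallback)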
  14. Philipkoski, K.: Growing the tree of life (2007) 0.07
    0.0670797 = product of:
      0.1341594 = sum of:
        0.1341594 = product of:
          0.2683188 = sum of:
            0.2683188 = weight(_text_:tree in 1209) [ClassicSimilarity], result of:
              0.2683188 = score(doc=1209,freq=4.0), product of:
                0.32745647 = queryWeight, product of:
                  6.5552235 = idf(docFreq=170, maxDocs=44218)
                  0.049953517 = queryNorm
                0.81940293 = fieldWeight in 1209, product of:
                  2.0 = tf(freq=4.0), with freq of:
                    4.0 = termFreq=4.0
                  6.5552235 = idf(docFreq=170, maxDocs=44218)
                  0.0625 = fieldNorm(doc=1209)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Content
    2 figures: 1: The "tree of life," crowded as it is, lays out just 3,000 of the 1.8 million known species, based on RNA sequences. The fuzzy-looking edge of the circle is made of species names. 2: Carolus Linnaeus' first edition of Systema Naturae, published in 1735, summarized all of nature in just 11 pages.
  15. Sieber, W.: Visualisierung von Thesaurus-Strukturen unter besonderer Berücksichtigung eines Hyperbolic Tree Views (2004) 0.06
    0.05869474 = product of:
      0.11738948 = sum of:
        0.11738948 = product of:
          0.23477896 = sum of:
            0.23477896 = weight(_text_:tree in 1456) [ClassicSimilarity], result of:
              0.23477896 = score(doc=1456,freq=4.0), product of:
                0.32745647 = queryWeight, product of:
                  6.5552235 = idf(docFreq=170, maxDocs=44218)
                  0.049953517 = queryNorm
                0.7169776 = fieldWeight in 1456, product of:
                  2.0 = tf(freq=4.0), with freq of:
                    4.0 = termFreq=4.0
                  6.5552235 = idf(docFreq=170, maxDocs=44218)
                  0.0546875 = fieldNorm(doc=1456)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Object
    Hyperbolic tree
  16. Ou, S.; Khoo, C.; Goh, D.H.; Heng, H.-Y.: Automatic discourse parsing of sociology dissertation abstracts as sentence categorization (2004) 0.06
    0.058092725 = product of:
      0.11618545 = sum of:
        0.11618545 = product of:
          0.2323709 = sum of:
            0.2323709 = weight(_text_:tree in 2676) [ClassicSimilarity], result of:
              0.2323709 = score(doc=2676,freq=12.0), product of:
                0.32745647 = queryWeight, product of:
                  6.5552235 = idf(docFreq=170, maxDocs=44218)
                  0.049953517 = queryNorm
                0.70962375 = fieldWeight in 2676, product of:
                  3.4641016 = tf(freq=12.0), with freq of:
                    12.0 = termFreq=12.0
                  6.5552235 = idf(docFreq=170, maxDocs=44218)
                  0.03125 = fieldNorm(doc=2676)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Abstract
    We investigated an approach to automatic discourse parsing of sociology dissertation abstracts as a sentence categorization task. Decision tree induction was used for the automatic categorization. Three models were developed. Model 1 made use of word tokens found in the sentences. Model 2 made use of both word tokens and sentence position in the abstract. In addition to the attributes used in Model 2, Model 3 also considered information regarding the presence of indicator words in surrounding sentences. Model 3 obtained the highest accuracy rate of 74.5% when applied to a test sample, compared to 71.6% for Model 2 and 60.8% for Model 1. The results indicated that information about sentence position can substantially increase the accuracy of categorization, and indicator words in earlier sentences (before the sentence being processed) also contribute to the categorization accuracy.
    Content
    1. Introduction This paper reports our initial effort to develop an automatic method for parsing the discourse structure of sociology dissertation abstracts. This study is part of a broader project to develop a method for multi-document summarization. Accurate discourse parsing will make it easier to perform automatic multi-document summarization of dissertation abstracts. In a previous study, we determined that the macro-level structure of dissertation abstracts typically has five sections (Khoo et al., 2002). In this study, we treated discourse parsing as a text categorization problem - assigning each sentence in a dissertation abstract to one of the five predefined sections or categories. Decision tree induction, a machine-learning method, was applied to word tokens found in the abstracts to construct a decision tree model for this categorization task. Decision tree induction was selected primarily because decision tree models are easy to interpret and can be converted to rules that can be incorporated in other computer programs. A well-known decision-tree induction program, C5.0 (Quinlan, 1993), was used in this study.
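    A rough sketch of Model 2's attribute set (word tokens plus sentence position) follows, with scikit-learn's decision tree standing in for C5.0; the sentences, section labels, and position encoding are invented for illustration.
      import numpy as np
      from sklearn.feature_extraction.text import CountVectorizer
      from sklearn.tree import DecisionTreeClassifier

      # Toy training data: (sentence, its position in the abstract, section label).
      sentences = [
          "This study investigates discourse parsing of abstracts.",
          "Decision tree induction was applied to word tokens.",
          "The model obtained the highest accuracy on the test sample.",
          "These results suggest position information aids categorization.",
      ]
      positions = np.array([[1], [2], [3], [4]])
      labels = ["background", "method", "result", "conclusion"]

      # Model 1 would use word tokens only; Model 2 appends sentence position.
      vectorizer = CountVectorizer()
      X = np.hstack([vectorizer.fit_transform(sentences).toarray(), positions])
      clf = DecisionTreeClassifier(random_state=0).fit(X, labels)

      test = ["Decision tree induction was used for categorization."]
      X_test = np.hstack([vectorizer.transform(test).toarray(), [[2]]])
      print(clf.predict(X_test))   # e.g. ['method']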
  17. Schrodt, R.: Tiefen und Untiefen im wissenschaftlichen Sprachgebrauch (2008) 0.05
    0.052892953 = product of:
      0.10578591 = sum of:
        0.10578591 = product of:
          0.31735772 = sum of:
            0.31735772 = weight(_text_:3a in 140) [ClassicSimilarity], result of:
              0.31735772 = score(doc=140,freq=2.0), product of:
                0.42350647 = queryWeight, product of:
                  8.478011 = idf(docFreq=24, maxDocs=44218)
                  0.049953517 = queryNorm
                0.7493574 = fieldWeight in 140, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  8.478011 = idf(docFreq=24, maxDocs=44218)
                  0.0625 = fieldNorm(doc=140)
          0.33333334 = coord(1/3)
      0.5 = coord(1/2)
    
    Content
    See also: https://studylibde.com/doc/13053640/richard-schrodt. See also: http://www.univie.ac.at/Germanistik/schrodt/vorlesung/wissenschaftssprache.doc.
  18. Schlieder, T.; Meuss, H.: Querying and ranking XML documents (2002) 0.05
    0.050309774 = product of:
      0.10061955 = sum of:
        0.10061955 = product of:
          0.2012391 = sum of:
            0.2012391 = weight(_text_:tree in 459) [ClassicSimilarity], result of:
              0.2012391 = score(doc=459,freq=4.0), product of:
                0.32745647 = queryWeight, product of:
                  6.5552235 = idf(docFreq=170, maxDocs=44218)
                  0.049953517 = queryNorm
                0.6145522 = fieldWeight in 459, product of:
                  2.0 = tf(freq=4.0), with freq of:
                    4.0 = termFreq=4.0
                  6.5552235 = idf(docFreq=170, maxDocs=44218)
                  0.046875 = fieldNorm(doc=459)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Abstract
    XML represents both content and structure of documents. Taking advantage of the document structure promises to greatly improve the retrieval precision. In this article, we present a retrieval technique that adopts the similarity measure of the vector space model, incorporates the document structure, and supports structured queries. Our query model is based on tree matching as a simple and elegant means to formulate queries without knowing the exact structure of the data. Using this query model we propose a logical document concept by deciding on the document boundaries at query time. We combine structured queries and term-based ranking by extending the term concept to structural terms that include substructures of queries and documents. The notions of term frequency and inverse document frequency are adapted to logical documents and structural terms. We introduce an efficient technique to calculate all necessary term frequencies and inverse document frequencies at query time. By adjusting parameters of the retrieval process we are able to model two contrary approaches: the classical vector space model, and the original tree matching approach.
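    One simplified reading of the "structural terms" above is the set of label paths from the root to each node, indexed with frequencies alongside ordinary terms. The sketch below enumerates such paths; this path representation is an assumption for illustration, not the authors' exact definition, which also covers substructures of queries and documents.
      import xml.etree.ElementTree as ET
      from collections import Counter

      def structural_terms(xml_text):
          # Enumerate root-to-node label paths as indexing "terms".
          root = ET.fromstring(xml_text)
          terms = Counter()
          def walk(node, prefix):
              path = prefix + "/" + node.tag
              terms[path] += 1
              for child in node:
                  walk(child, path)
          walk(root, "")
          return terms

      doc = ("<article><sec><title>XML</title><p>text</p></sec>"
             "<sec><p>more</p></sec></article>")
      print(dict(structural_terms(doc)))
      # {'/article': 1, '/article/sec': 2, '/article/sec/title': 1, '/article/sec/p': 2}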
  19. Frank, E.; Paynter, G.W.: Predicting Library of Congress Classifications from Library of Congress Subject Headings (2004) 0.05
    0.050309774 = product of:
      0.10061955 = sum of:
        0.10061955 = product of:
          0.2012391 = sum of:
            0.2012391 = weight(_text_:tree in 2218) [ClassicSimilarity], result of:
              0.2012391 = score(doc=2218,freq=4.0), product of:
                0.32745647 = queryWeight, product of:
                  6.5552235 = idf(docFreq=170, maxDocs=44218)
                  0.049953517 = queryNorm
                0.6145522 = fieldWeight in 2218, product of:
                  2.0 = tf(freq=4.0), with freq of:
                    4.0 = termFreq=4.0
                  6.5552235 = idf(docFreq=170, maxDocs=44218)
                  0.046875 = fieldNorm(doc=2218)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Abstract
    This paper addresses the problem of automatically assigning a Library of Congress Classification (LCC) to a work given its set of Library of Congress Subject Headings (LCSH). LCCs are organized in a tree: The root node of this hierarchy comprises all possible topics, and leaf nodes correspond to the most specialized topic areas defined. We describe a procedure that, given a resource identified by its LCSH, automatically places that resource in the LCC hierarchy. The procedure uses machine learning techniques and training data from a large library catalog to learn a model that maps from sets of LCSH to classifications from the LCC tree. We present empirical results for our technique showing its accuracy on an independent collection of 50,000 LCSH/LCC pairs.
  20. Sieber, W.: Thesaurus-Arbeit versus Informationsvisualisierung : Analyse und Evaluation am Maßstab der Usability (2007) 0.05
    0.050309774 = product of:
      0.10061955 = sum of:
        0.10061955 = product of:
          0.2012391 = sum of:
            0.2012391 = weight(_text_:tree in 1457) [ClassicSimilarity], result of:
              0.2012391 = score(doc=1457,freq=4.0), product of:
                0.32745647 = queryWeight, product of:
                  6.5552235 = idf(docFreq=170, maxDocs=44218)
                  0.049953517 = queryNorm
                0.6145522 = fieldWeight in 1457, product of:
                  2.0 = tf(freq=4.0), with freq of:
                    4.0 = termFreq=4.0
                  6.5552235 = idf(docFreq=170, maxDocs=44218)
                  0.046875 = fieldNorm(doc=1457)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Footnote
    Based on the diploma thesis "Visualisierung von Thesaurus-Strukturen unter besonderer Berücksichtigung eines Hyperbolic Tree Views," FH Köln, Institut für Informationswissenschaft, Studiengang Informationswirtschaft, 2004.
    Object
    Hyperbolic tree

Types

  • a 1125
  • m 151
  • el 68
  • s 52
  • b 26
  • x 15
  • i 9
  • n 2
  • r 1
