Search (1125 results, page 1 of 57)

  • Active filter: type_ss:"a"
  • Active filter: year_i:[2000 TO 2010}
  1. Yang, C.C.; Liu, N.: Web site topic-hierarchy generation based on link structure (2009) 0.14
    0.1355013 = product of:
      0.2710026 = sum of:
        0.2710026 = sum of:
          0.23716255 = weight(_text_:tree in 2738) [ClassicSimilarity], result of:
            0.23716255 = score(doc=2738,freq=8.0), product of:
              0.32745647 = queryWeight, product of:
                6.5552235 = idf(docFreq=170, maxDocs=44218)
                0.049953517 = queryNorm
              0.7242567 = fieldWeight in 2738, product of:
                2.828427 = tf(freq=8.0), with freq of:
                  8.0 = termFreq=8.0
                6.5552235 = idf(docFreq=170, maxDocs=44218)
                0.0390625 = fieldNorm(doc=2738)
          0.03384006 = weight(_text_:22 in 2738) [ClassicSimilarity], result of:
            0.03384006 = score(doc=2738,freq=2.0), product of:
              0.17492871 = queryWeight, product of:
                3.5018296 = idf(docFreq=3622, maxDocs=44218)
                0.049953517 = queryNorm
              0.19345059 = fieldWeight in 2738, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                3.5018296 = idf(docFreq=3622, maxDocs=44218)
                0.0390625 = fieldNorm(doc=2738)
      0.5 = coord(1/2)
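    The indented breakdown above is Lucene's explain() output for ClassicSimilarity (TF-IDF) scoring. As a minimal Python sketch of the arithmetic, using only the constants displayed above (document 2738, query terms "tree" and "22"), the listed score of 0.1355013 can be reproduced:

      from math import log, sqrt

      QUERY_NORM = 0.049953517          # queryNorm from the listing

      def idf(doc_freq, max_docs):
          # ClassicSimilarity idf: 1 + ln(maxDocs / (docFreq + 1))
          return 1.0 + log(max_docs / (doc_freq + 1))

      def term_score(freq, doc_freq, max_docs, field_norm):
          # score = queryWeight * fieldWeight, as in the explain() tree
          term_idf = idf(doc_freq, max_docs)
          query_weight = term_idf * QUERY_NORM                # idf * queryNorm
          field_weight = sqrt(freq) * term_idf * field_norm   # tf * idf * fieldNorm
          return query_weight * field_weight

      tree = term_score(8.0, 170, 44218, 0.0390625)   # ~0.23716255
      two2 = term_score(2.0, 3622, 44218, 0.0390625)  # ~0.03384006
      print((tree + two2) * 0.5)                      # coord(1/2) -> ~0.1355013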
    
    Abstract
    Navigating through hyperlinks within a Web site to look for information from one of its Web pages without the support of a site map can be inefficient and ineffective. Although the content of a Web site is usually organized with an inherent structure like a topic hierarchy, which is a directed tree rooted at a Web site's homepage whose vertices and edges correspond to Web pages and hyperlinks, such a topic hierarchy is not always available to the user. In this work, we studied the problem of automatic generation of Web sites' topic hierarchies. We modeled a Web site's link structure as a weighted directed graph and proposed methods for estimating edge weights based on eight types of features and three learning algorithms, namely decision trees, naïve Bayes classifiers, and logistic regression. Three graph algorithms, namely breadth-first search, shortest-path search, and directed minimum-spanning tree, were adapted to generate the topic hierarchy based on the graph model. We have tested the model and algorithms on real Web sites. It is found that the directed minimum-spanning tree algorithm with the decision tree as the weight learning algorithm achieves the highest performance with an average accuracy of 91.9%.
    Date
    22. 3.2009 12:51:47
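    Yang and Liu's pipeline above turns a site's link graph into a topic tree rooted at the homepage. A minimal sketch of the simplest of their three graph algorithms, breadth-first search; the toy site and its links are invented, and the learned edge-weighting step is omitted:

      from collections import deque

      def bfs_topic_hierarchy(links, homepage):
          # First discovery wins: each page's BFS discoverer becomes its parent.
          parent = {homepage: None}
          queue = deque([homepage])
          while queue:
              page = queue.popleft()
              for target in links.get(page, ()):
                  if target not in parent:
                      parent[target] = page
                      queue.append(target)
          return parent

      site = {"/": ["/products", "/about"],
              "/products": ["/products/a", "/products/b"],
              "/about": ["/"]}
      print(bfs_topic_hierarchy(site, "/"))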
  2. Trotman, A.: Searching structured documents (2004) 0.11
    0.10669494 = product of:
      0.21338987 = sum of:
        0.21338987 = sum of:
          0.16601379 = weight(_text_:tree in 2538) [ClassicSimilarity], result of:
            0.16601379 = score(doc=2538,freq=2.0), product of:
              0.32745647 = queryWeight, product of:
                6.5552235 = idf(docFreq=170, maxDocs=44218)
                0.049953517 = queryNorm
              0.5069797 = fieldWeight in 2538, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                6.5552235 = idf(docFreq=170, maxDocs=44218)
                0.0546875 = fieldNorm(doc=2538)
          0.047376085 = weight(_text_:22 in 2538) [ClassicSimilarity], result of:
            0.047376085 = score(doc=2538,freq=2.0), product of:
              0.17492871 = queryWeight, product of:
                3.5018296 = idf(docFreq=3622, maxDocs=44218)
                0.049953517 = queryNorm
              0.2708308 = fieldWeight in 2538, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                3.5018296 = idf(docFreq=3622, maxDocs=44218)
                0.0546875 = fieldNorm(doc=2538)
      0.5 = coord(1/2)
    
    Abstract
    Structured document interchange formats such as XML and SGML are ubiquitous; however, information retrieval systems supporting structured searching are not. Structured searching can result in increased precision. A search for the author "Smith" in an unstructured corpus of documents specializing in iron-working could have a lower precision than a structured search for "Smith as author" in the same corpus. Analysis of XML retrieval languages identifies additional functionality that must be supported, including searching at, and broken across, multiple nodes in the document tree. A data structure is developed to support structured document searching. Application of this structure to information retrieval is then demonstrated. Document ranking is examined and adapted specifically for structured searching.
    Date
    14. 8.2004 10:39:22
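    Trotman's "Smith as author" example is easy to make concrete. A minimal sketch with Python's standard xml.etree.ElementTree; the document and its element names are invented for illustration:

      import xml.etree.ElementTree as ET

      doc = ET.fromstring(
          "<article><author>Jones</author>"
          "<body>An iron-working survey citing Smith (1990).</body></article>")

      # Unstructured search: "Smith" matches anywhere in the text.
      print("Smith" in "".join(doc.itertext()))                          # True

      # Structured search: the match must fall inside an <author> node.
      print(any("Smith" in (a.text or "") for a in doc.iter("author")))  # False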
  3. Yoon, Y.; Lee, C.; Lee, G.G.: An effective procedure for constructing a hierarchical text classification system (2006) 0.11
    0.10669494 = product of:
      0.21338987 = sum of:
        0.21338987 = sum of:
          0.16601379 = weight(_text_:tree in 5273) [ClassicSimilarity], result of:
            0.16601379 = score(doc=5273,freq=2.0), product of:
              0.32745647 = queryWeight, product of:
                6.5552235 = idf(docFreq=170, maxDocs=44218)
                0.049953517 = queryNorm
              0.5069797 = fieldWeight in 5273, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                6.5552235 = idf(docFreq=170, maxDocs=44218)
                0.0546875 = fieldNorm(doc=5273)
          0.047376085 = weight(_text_:22 in 5273) [ClassicSimilarity], result of:
            0.047376085 = score(doc=5273,freq=2.0), product of:
              0.17492871 = queryWeight, product of:
                3.5018296 = idf(docFreq=3622, maxDocs=44218)
                0.049953517 = queryNorm
              0.2708308 = fieldWeight in 5273, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                3.5018296 = idf(docFreq=3622, maxDocs=44218)
                0.0546875 = fieldNorm(doc=5273)
      0.5 = coord(1/2)
    
    Abstract
    In text categorization tasks, classification over a class hierarchy often yields better results than classification without one. Currently, because a large number of documents are divided into several subgroups in a hierarchy, we can appropriately use a hierarchical classification method. However, we have no systematic method to build a hierarchical classification system that performs well with large collections of practical data. In this article, we introduce a new evaluation scheme for internal node classifiers, which can be used effectively to develop a hierarchical classification system. We also show that our method for constructing the hierarchical classification system is very effective, especially for the task of constructing classifiers applied to a hierarchy tree with many levels.
    Date
    22. 7.2006 16:24:52
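    Yoon, Lee, and Lee's system trains a classifier at each internal node of the class hierarchy. A minimal sketch of that architecture, routing a document down the tree one node classifier at a time; the scikit-learn choice, toy hierarchy, and texts are all assumptions for illustration:

      from sklearn.feature_extraction.text import TfidfVectorizer
      from sklearn.linear_model import LogisticRegression
      from sklearn.pipeline import make_pipeline

      # internal node -> {child label: training documents under that child}
      hierarchy = {
          "root": {"science": ["quantum physics experiment", "cell biology study"],
                   "sports":  ["football match report", "tennis final recap"]},
          "science": {"physics": ["quantum physics experiment", "particle detector"],
                      "biology": ["cell biology study", "gene expression assay"]},
      }

      def fit_node(children):
          texts = [t for docs in children.values() for t in docs]
          labels = [c for c, docs in children.items() for _ in docs]
          return make_pipeline(TfidfVectorizer(), LogisticRegression()).fit(texts, labels)

      node_clf = {node: fit_node(kids) for node, kids in hierarchy.items()}

      def classify(text, node="root"):
          while node in node_clf:                     # descend until a leaf
              node = node_clf[node].predict([text])[0]
          return node

      print(classify("a new quantum particle experiment"))  # physics, on this toy data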
  4. Rao, R.: Der 'Hyperbolic tree' und seine Verwandten : 3D-Interfaces erleichtern den Umgang mit grossen Datenmengen [The 'hyperbolic tree' and its relatives: 3D interfaces ease the handling of large amounts of data] (2000) 0.10
    0.10061955 = product of:
      0.2012391 = sum of:
        0.2012391 = product of:
          0.4024782 = sum of:
            0.4024782 = weight(_text_:tree in 5053) [ClassicSimilarity], result of:
              0.4024782 = score(doc=5053,freq=4.0), product of:
                0.32745647 = queryWeight, product of:
                  6.5552235 = idf(docFreq=170, maxDocs=44218)
                  0.049953517 = queryNorm
                1.2291044 = fieldWeight in 5053, product of:
                  2.0 = tf(freq=4.0), with freq of:
                    4.0 = termFreq=4.0
                  6.5552235 = idf(docFreq=170, maxDocs=44218)
                  0.09375 = fieldNorm(doc=5053)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Object
    Hyperbolic tree
  5. Hotho, A.; Bloehdorn, S.: Data Mining 2004 : Text classification by boosting weak learners based on terms and concepts (2004) 0.10
    0.09964347 = sum of:
      0.07933943 = product of:
        0.23801827 = sum of:
          0.23801827 = weight(_text_:3a in 562) [ClassicSimilarity], result of:
            0.23801827 = score(doc=562,freq=2.0), product of:
              0.42350647 = queryWeight, product of:
                8.478011 = idf(docFreq=24, maxDocs=44218)
                0.049953517 = queryNorm
              0.56201804 = fieldWeight in 562, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                8.478011 = idf(docFreq=24, maxDocs=44218)
                0.046875 = fieldNorm(doc=562)
        0.33333334 = coord(1/3)
      0.020304035 = product of:
        0.04060807 = sum of:
          0.04060807 = weight(_text_:22 in 562) [ClassicSimilarity], result of:
            0.04060807 = score(doc=562,freq=2.0), product of:
              0.17492871 = queryWeight, product of:
                3.5018296 = idf(docFreq=3622, maxDocs=44218)
                0.049953517 = queryNorm
              0.23214069 = fieldWeight in 562, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                3.5018296 = idf(docFreq=3622, maxDocs=44218)
                0.046875 = fieldNorm(doc=562)
        0.5 = coord(1/2)
    
    Content
    Cf.: http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.91.4940&rep=rep1&type=pdf.
    Date
    8. 1.2013 10:22:32
  6. Siva, S.: Document identification and classification using transform coding of gray scale projections and neural tree network (2000) 0.08
    0.083006896 = product of:
      0.16601379 = sum of:
        0.16601379 = product of:
          0.33202758 = sum of:
            0.33202758 = weight(_text_:tree in 1970) [ClassicSimilarity], result of:
              0.33202758 = score(doc=1970,freq=2.0), product of:
                0.32745647 = queryWeight, product of:
                  6.5552235 = idf(docFreq=170, maxDocs=44218)
                  0.049953517 = queryNorm
                1.0139594 = fieldWeight in 1970, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  6.5552235 = idf(docFreq=170, maxDocs=44218)
                  0.109375 = fieldNorm(doc=1970)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
  7. Zhang, M.; Zhou, G.D.; Aw, A.: Exploring syntactic structured features over parse trees for relation extraction using kernel methods (2008) 0.08
    0.07843415 = product of:
      0.1568683 = sum of:
        0.1568683 = product of:
          0.3137366 = sum of:
            0.3137366 = weight(_text_:tree in 2055) [ClassicSimilarity], result of:
              0.3137366 = score(doc=2055,freq=14.0), product of:
                0.32745647 = queryWeight, product of:
                  6.5552235 = idf(docFreq=170, maxDocs=44218)
                  0.049953517 = queryNorm
                0.95810163 = fieldWeight in 2055, product of:
                  3.7416575 = tf(freq=14.0), with freq of:
                    14.0 = termFreq=14.0
                  6.5552235 = idf(docFreq=170, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=2055)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Abstract
    Extracting semantic relationships between entities from text documents is challenging in information extraction and important for deep information processing and management. This paper proposes to use the convolution kernel over parse trees together with support vector machines to model syntactic structured information for relation extraction. Compared with linear kernels, tree kernels can effectively explore implicitly huge syntactic structured features embedded in a parse tree. Our study reveals that the syntactic structured features embedded in a parse tree are very effective in relation extraction and can be well captured by the convolution tree kernel. Evaluation on the ACE benchmark corpora shows that using the convolution tree kernel only can achieve comparable performance with previous best-reported feature-based methods. It also shows that our method significantly outperforms previous two dependency tree kernels for relation extraction. Moreover, this paper proposes a composite kernel for relation extraction by combining the convolution tree kernel with a simple linear kernel. Our study reveals that the composite kernel can effectively capture both flat and structured features without extensive feature engineering, and easily scale to include more features. Evaluation on the ACE benchmark corpora shows that the composite kernel outperforms previous best-reported methods in relation extraction.
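    The convolution tree kernel referred to above counts the subtrees two parse trees share. A minimal sketch of the classic Collins-Duffy recursion with decay factor lam; the toy parse trees are invented:

      # A node is (label, child, ...); a leaf token is a plain string.
      T1 = ("S", ("NP", "he"),  ("VP", ("V", "eats"), ("NP", "fish")))
      T2 = ("S", ("NP", "she"), ("VP", ("V", "eats"), ("NP", "fish")))

      def nodes(t):
          return [] if isinstance(t, str) else [t] + [n for c in t[1:] for n in nodes(c)]

      def label(t):
          return t if isinstance(t, str) else t[0]

      def C(a, b, lam=0.5):
          # Common subtrees rooted at a and b (Collins & Duffy 2001):
          # 0 if the productions differ; else lam * prod(1 + C over child pairs).
          if label(a) != label(b) or tuple(map(label, a[1:])) != tuple(map(label, b[1:])):
              return 0.0
          prod = lam
          for ca, cb in zip(a[1:], b[1:]):
              if not isinstance(ca, str):
                  prod *= 1.0 + C(ca, cb, lam)
          return prod

      def tree_kernel(t1, t2, lam=0.5):
          return sum(C(a, b, lam) for a in nodes(t1) for b in nodes(t2))

      print(tree_kernel(T1, T2))   # 3.1875 on these toy trees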
  8. Yager, R.R.: Knowledge trees and protoforms in question-answering systems (2006) 0.08
    0.07621066 = product of:
      0.15242133 = sum of:
        0.15242133 = sum of:
          0.11858127 = weight(_text_:tree in 5281) [ClassicSimilarity], result of:
            0.11858127 = score(doc=5281,freq=2.0), product of:
              0.32745647 = queryWeight, product of:
                6.5552235 = idf(docFreq=170, maxDocs=44218)
                0.049953517 = queryNorm
              0.36212835 = fieldWeight in 5281, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                6.5552235 = idf(docFreq=170, maxDocs=44218)
                0.0390625 = fieldNorm(doc=5281)
          0.03384006 = weight(_text_:22 in 5281) [ClassicSimilarity], result of:
            0.03384006 = score(doc=5281,freq=2.0), product of:
              0.17492871 = queryWeight, product of:
                3.5018296 = idf(docFreq=3622, maxDocs=44218)
                0.049953517 = queryNorm
              0.19345059 = fieldWeight in 5281, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                3.5018296 = idf(docFreq=3622, maxDocs=44218)
                0.0390625 = fieldNorm(doc=5281)
      0.5 = coord(1/2)
    
    Abstract
    We point out that question-answering systems differ from other information-seeking applications, such as search engines, by having a deduction capability, an ability to answer questions by a synthesis of information residing in different parts of its knowledge base. This capability requires appropriate representation of various types of human knowledge, rules for locally manipulating this knowledge, and a framework for providing a global plan for appropriately mobilizing the information in the knowledge to address the question posed. In this article we suggest tools to provide these capabilities. We describe how the fuzzy set-based theory of approximate reasoning can aid in the process of representing knowledge. We discuss how protoforms can be used to aid in deduction and local manipulation of knowledge. The idea of a knowledge tree is introduced to provide a global framework for mobilizing the knowledge base in response to a query. We look at some types of commonsense and default knowledge. This requires us to address the complexity of the nonmonotonicity that these types of knowledge often display. We also briefly discuss the role that Dempster-Shafer structures can play in representing knowledge.
    Date
    22. 7.2006 17:10:27
  9. Craig, A.; Schriar, S.: The Find-It! Illinois controlled vocabulary : improving access to government information through the Jessica subject tree (2001) 0.07
    0.07114877 = product of:
      0.14229754 = sum of:
        0.14229754 = product of:
          0.28459507 = sum of:
            0.28459507 = weight(_text_:tree in 6319) [ClassicSimilarity], result of:
              0.28459507 = score(doc=6319,freq=2.0), product of:
                0.32745647 = queryWeight, product of:
                  6.5552235 = idf(docFreq=170, maxDocs=44218)
                  0.049953517 = queryNorm
                0.8691081 = fieldWeight in 6319, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  6.5552235 = idf(docFreq=170, maxDocs=44218)
                  0.09375 = fieldNorm(doc=6319)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
  10. White, K.J.; Sutcliffe, R.F.E.: Applying incremental tree induction to retrieval : from manuals and medical texts (2006) 0.07
    0.07114877 = product of:
      0.14229754 = sum of:
        0.14229754 = product of:
          0.28459507 = sum of:
            0.28459507 = weight(_text_:tree in 5044) [ClassicSimilarity], result of:
              0.28459507 = score(doc=5044,freq=8.0), product of:
                0.32745647 = queryWeight, product of:
                  6.5552235 = idf(docFreq=170, maxDocs=44218)
                  0.049953517 = queryNorm
                0.8691081 = fieldWeight in 5044, product of:
                  2.828427 = tf(freq=8.0), with freq of:
                    8.0 = termFreq=8.0
                  6.5552235 = idf(docFreq=170, maxDocs=44218)
                  0.046875 = fieldNorm(doc=5044)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Abstract
    The Decision Tree Forest (DTF) is an architecture for information retrieval that uses a separate decision tree for each document in a collection. Experiments were conducted in which DTFs working with the incremental tree induction (ITI) algorithm of Utgoff, Berkman, and Clouse (1997) were trained and evaluated in the medical and word processing domains using the Cystic Fibrosis and SIFT collections. Performance was compared with that of a conventional inverted index system (IIS) using a BM25-derived probabilistic matching function. Initial results using DTF were poor compared to those obtained with IIS. We then simulated scenarios in which large quantities of training data were available, by using only those parts of the document collection that were well covered by the data sets. Consequently, the retrieval effectiveness of DTF improved substantially. In one particular experiment, precision and recall for DTF were 0.65 and 0.67 respectively, values that compared favorably with values of 0.49 and 0.56 for IIS.
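    The IIS baseline above ranks with a BM25-derived probabilistic matching function. A minimal sketch of standard Okapi BM25 (k1 and b are its usual free parameters; the toy corpus is invented):

      from math import log

      def bm25(query, doc, corpus, k1=1.2, b=0.75):
          # Sum over query terms of idf * length-normalized, saturated tf.
          N = len(corpus)
          avgdl = sum(len(d) for d in corpus) / N
          score = 0.0
          for term in query:
              n = sum(term in d for d in corpus)          # document frequency
              idf = log((N - n + 0.5) / (n + 0.5) + 1.0)
              tf = doc.count(term)
              score += idf * tf * (k1 + 1) / (tf + k1 * (1 - b + b * len(doc) / avgdl))
          return score

      corpus = [["cystic", "fibrosis", "therapy"],
                ["word", "processing", "manual"],
                ["gene", "therapy", "trial"]]
      print(bm25(["cystic", "therapy"], corpus[0], corpus))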
  11. Li, J.; Zhang, Z.; Li, X.; Chen, H.: Kernel-based learning for biomedical relation extraction (2008) 0.07
    0.07114877 = product of:
      0.14229754 = sum of:
        0.14229754 = product of:
          0.28459507 = sum of:
            0.28459507 = weight(_text_:tree in 1611) [ClassicSimilarity], result of:
              0.28459507 = score(doc=1611,freq=8.0), product of:
                0.32745647 = queryWeight, product of:
                  6.5552235 = idf(docFreq=170, maxDocs=44218)
                  0.049953517 = queryNorm
                0.8691081 = fieldWeight in 1611, product of:
                  2.828427 = tf(freq=8.0), with freq of:
                    8.0 = termFreq=8.0
                  6.5552235 = idf(docFreq=170, maxDocs=44218)
                  0.046875 = fieldNorm(doc=1611)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Abstract
    Relation extraction is the process of scanning text for relationships between named entities. Recently, significant studies have focused on automatically extracting relations from biomedical corpora. Most existing biomedical relation extractors require manual creation of biomedical lexicons or parsing templates based on domain knowledge. In this study, we propose to use kernel-based learning methods to automatically extract biomedical relations from literature text. We develop a framework of kernel-based learning for biomedical relation extraction. In particular, we modified the standard tree kernel function by incorporating a trace kernel to capture richer contextual information. In our experiments on a biomedical corpus, we compare different kernel functions for biomedical relation detection and classification. The experimental results show that a tree kernel outperforms word and sequence kernels for relation detection, our trace-tree kernel outperforms the standard tree kernel, and a composite kernel outperforms individual kernels for relation extraction.
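    The composite kernel in this study combines a structured tree kernel with a flat-feature kernel. A minimal sketch of the combination step only; both component kernels are toy stand-ins here, and the interpolation weight alpha is an assumption:

      def linear_kernel(x, y):
          # Toy flat-feature kernel: dot product of feature vectors.
          return sum(a * b for a, b in zip(x, y))

      def tree_kernel(t1, t2):
          # Stand-in for a convolution tree kernel over parse trees.
          return float(t1 == t2)

      def composite_kernel(a, b, alpha=0.4):
          # Weighted interpolation of the flat and structured kernels.
          (xa, ta), (xb, tb) = a, b
          return alpha * linear_kernel(xa, xb) + (1 - alpha) * tree_kernel(ta, tb)

      ex1 = ([1.0, 0.0, 2.0], ("NP", "protein"))
      ex2 = ([0.5, 1.0, 2.0], ("NP", "protein"))
      print(composite_kernel(ex1, ex2))   # 0.4*4.5 + 0.6*1.0 = 2.4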
  12. Diaz, I.; Morato, J.; Lloréns, J.: An algorithm for term conflation based on tree structures (2002) 0.07
    0.0670797 = product of:
      0.1341594 = sum of:
        0.1341594 = product of:
          0.2683188 = sum of:
            0.2683188 = weight(_text_:tree in 246) [ClassicSimilarity], result of:
              0.2683188 = score(doc=246,freq=4.0), product of:
                0.32745647 = queryWeight, product of:
                  6.5552235 = idf(docFreq=170, maxDocs=44218)
                  0.049953517 = queryNorm
                0.81940293 = fieldWeight in 246, product of:
                  2.0 = tf(freq=4.0), with freq of:
                    4.0 = termFreq=4.0
                  6.5552235 = idf(docFreq=170, maxDocs=44218)
                  0.0625 = fieldNorm(doc=246)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Abstract
    This work presents a new stemming algorithm that stores its stemming information in tree structures. This storage enhances the algorithm's performance by reducing the search space and the overall complexity. The final result of the stemming algorithm is a normalized concept, where normalization is understood as the automatic extraction of the generic form (or lexeme) of a selected term.
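    Diaz, Morato, and Lloréns store stemming information in tree structures to shrink the search space. A minimal sketch of one common realization of that idea, a trie of reversed suffix rules, so that matching walks a single path from the end of the word instead of scanning every rule; the rules themselves are invented examples, not the paper's rule set:

      # Suffix rules, longest match wins: strip the suffix, append the replacement.
      RULES = {"ies": "y", "ing": "", "ness": "", "s": ""}

      # Build a trie over *reversed* suffixes so lookup walks from the word's end.
      trie = {}
      for suffix, repl in RULES.items():
          node = trie
          for ch in reversed(suffix):
              node = node.setdefault(ch, {})
          node["$"] = (len(suffix), repl)   # terminal: (chars to strip, replacement)

      def conflate(word):
          # Follow one trie path from the end of the word; apply the longest rule.
          node, best = trie, None
          for ch in reversed(word):
              if ch not in node:
                  break
              node = node[ch]
              if "$" in node:
                  best = node["$"]
          if best is None:
              return word
          strip, repl = best
          return word[:-strip] + repl

      print(conflate("categories"), conflate("happiness"))   # category happi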
  13. Ou, S.; Khoo, C.; Goh, D.H.; Heng, H.-Y.: Automatic discourse parsing of sociology dissertation abstracts as sentence categorization (2004) 0.06
    0.058092725 = product of:
      0.11618545 = sum of:
        0.11618545 = product of:
          0.2323709 = sum of:
            0.2323709 = weight(_text_:tree in 2676) [ClassicSimilarity], result of:
              0.2323709 = score(doc=2676,freq=12.0), product of:
                0.32745647 = queryWeight, product of:
                  6.5552235 = idf(docFreq=170, maxDocs=44218)
                  0.049953517 = queryNorm
                0.70962375 = fieldWeight in 2676, product of:
                  3.4641016 = tf(freq=12.0), with freq of:
                    12.0 = termFreq=12.0
                  6.5552235 = idf(docFreq=170, maxDocs=44218)
                  0.03125 = fieldNorm(doc=2676)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Abstract
    We investigated an approach to automatic discourse parsing of sociology dissertation abstracts as a sentence categorization task. Decision tree induction was used for the automatic categorization. Three models were developed. Model 1 made use of word tokens found in the sentences. Model 2 made use of both word tokens and sentence position in the abstract. In addition to the attributes used in Model 2, Model 3 also considered information regarding the presence of indicator words in surrounding sentences. Model 3 obtained the highest accuracy rate of 74.5% when applied to a test sample, compared to 71.6% for Model 2 and 60.8% for Model 1. The results indicated that information about sentence position can substantially increase the accuracy of categorization, and indicator words in earlier sentences (before the sentence being processed) also contribute to the categorization accuracy.
    Content
    1. Introduction This paper reports our initial effort to develop an automatic method for parsing the discourse structure of sociology dissertation abstracts. This study is part of a broader study to develop a method for multi-document summarization. Accurate discourse parsing will make it easier to perform automatic multi-document summarization of dissertation abstracts. In a previous study, we determined that the macro-level structure of dissertation abstracts typically has five sections (Khoo et al., 2002). In this study, we treated discourse parsing as a text categorization problem - assigning each sentence in a dissertation abstract to one of the five predefined sections or categories. Decision tree induction, a machine-learning method, was applied to word tokens found in the abstracts to construct a decision tree model for the categorization purpose. Decision tree induction was selected primarily because decision tree models are easy to interpret and can be converted to rules that can be incorporated in other computer programs. A well-known decision-tree induction program, C5.0 (Quinlan, 1993), was used in this study.
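    Model 2 above combines word tokens with sentence position. A minimal sketch of that feature design, using scikit-learn's DictVectorizer with DecisionTreeClassifier as a stand-in for C5.0; the toy sentences and labels are invented:

      from sklearn.feature_extraction import DictVectorizer
      from sklearn.tree import DecisionTreeClassifier

      def features(sentence, position):
          # Model 2 style: bag of word tokens plus position in the abstract.
          f = {f"tok={w.lower()}": 1 for w in sentence.split()}
          f["position"] = position
          return f

      train = [("This study investigates online catalogs", 1, "background"),
               ("We surveyed 200 librarians", 2, "method"),
               ("Results show improved recall", 3, "result")]

      vec = DictVectorizer()
      X = vec.fit_transform([features(s, p) for s, p, _ in train])
      y = [label for _, _, label in train]
      clf = DecisionTreeClassifier().fit(X, y)

      test = vec.transform([features("We interviewed 50 students", 2)])
      print(clf.predict(test))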
  14. Schrodt, R.: Tiefen und Untiefen im wissenschaftlichen Sprachgebrauch [Depths and shallows in scholarly language use] (2008) 0.05
    0.052892953 = product of:
      0.10578591 = sum of:
        0.10578591 = product of:
          0.31735772 = sum of:
            0.31735772 = weight(_text_:3a in 140) [ClassicSimilarity], result of:
              0.31735772 = score(doc=140,freq=2.0), product of:
                0.42350647 = queryWeight, product of:
                  8.478011 = idf(docFreq=24, maxDocs=44218)
                  0.049953517 = queryNorm
                0.7493574 = fieldWeight in 140, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  8.478011 = idf(docFreq=24, maxDocs=44218)
                  0.0625 = fieldNorm(doc=140)
          0.33333334 = coord(1/3)
      0.5 = coord(1/2)
    
    Content
    See also: https://studylibde.com/doc/13053640/richard-schrodt. See also: http://www.univie.ac.at/Germanistik/schrodt/vorlesung/wissenschaftssprache.doc.
  15. Schlieder, T.; Meuss, H.: Querying and ranking XML documents (2002) 0.05
    0.050309774 = product of:
      0.10061955 = sum of:
        0.10061955 = product of:
          0.2012391 = sum of:
            0.2012391 = weight(_text_:tree in 459) [ClassicSimilarity], result of:
              0.2012391 = score(doc=459,freq=4.0), product of:
                0.32745647 = queryWeight, product of:
                  6.5552235 = idf(docFreq=170, maxDocs=44218)
                  0.049953517 = queryNorm
                0.6145522 = fieldWeight in 459, product of:
                  2.0 = tf(freq=4.0), with freq of:
                    4.0 = termFreq=4.0
                  6.5552235 = idf(docFreq=170, maxDocs=44218)
                  0.046875 = fieldNorm(doc=459)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Abstract
    XML represents both content and structure of documents. Taking advantage of the document structure promises to greatly improve the retrieval precision. In this article, we present a retrieval technique that adopts the similarity measure of the vector space model, incorporates the document structure, and supports structured queries. Our query model is based on tree matching as a simple and elegant means to formulate queries without knowing the exact structure of the data. Using this query model we propose a logical document concept by deciding on the document boundaries at query time. We combine structured queries and term-based ranking by extending the term concept to structural terms that include substructures of queries and documents. The notions of term frequency and inverse document frequency are adapted to logical documents and structural terms. We introduce an efficient technique to calculate all necessary term frequencies and inverse document frequencies at query time. By adjusting parameters of the retrieval process we are able to model two contrary approaches: the classical vector space model, and the original tree matching approach.
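    The structural terms above extend ordinary terms with substructures of the document tree. A minimal sketch that enumerates label-path terms from a toy XML tree, which ordinary tf/idf machinery could then consume; the path representation is an illustrative choice, not the paper's exact definition:

      import xml.etree.ElementTree as ET
      from collections import Counter

      doc = ET.fromstring(
          "<book><title>XML retrieval</title>"
          "<chapter><title>Ranking</title></chapter></book>")

      def structural_terms(node, path=()):
          # Emit one term per (label path, word) pair, then recurse.
          path += (node.tag,)
          for word in (node.text or "").split():
              yield ("/".join(path), word.lower())
          for child in node:
              yield from structural_terms(child, path)

      tf = Counter(structural_terms(doc))
      print(tf)   # ('book/title', 'xml'): 1, ('book/chapter/title', 'ranking'): 1, ...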
  16. Frank, E.; Paynter, G.W.: Predicting Library of Congress Classifications from Library of Congress Subject Headings (2004) 0.05
    0.050309774 = product of:
      0.10061955 = sum of:
        0.10061955 = product of:
          0.2012391 = sum of:
            0.2012391 = weight(_text_:tree in 2218) [ClassicSimilarity], result of:
              0.2012391 = score(doc=2218,freq=4.0), product of:
                0.32745647 = queryWeight, product of:
                  6.5552235 = idf(docFreq=170, maxDocs=44218)
                  0.049953517 = queryNorm
                0.6145522 = fieldWeight in 2218, product of:
                  2.0 = tf(freq=4.0), with freq of:
                    4.0 = termFreq=4.0
                  6.5552235 = idf(docFreq=170, maxDocs=44218)
                  0.046875 = fieldNorm(doc=2218)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Abstract
    This paper addresses the problem of automatically assigning a Library of Congress Classification (LCC) to a work given its set of Library of Congress Subject Headings (LCSH). LCCs are organized in a tree: The root node of this hierarchy comprises all possible topics, and leaf nodes correspond to the most specialized topic areas defined. We describe a procedure that, given a resource identified by its LCSH, automatically places that resource in the LCC hierarchy. The procedure uses machine learning techniques and training data from a large library catalog to learn a model that maps from sets of LCSH to classifications from the LCC tree. We present empirical results for our technique showing its accuracy on an independent collection of 50,000 LCSH/LCC pairs.
  17. Choi, B.; Peng, X.: Dynamic and hierarchical classification of Web pages (2004) 0.05
    0.050309774 = product of:
      0.10061955 = sum of:
        0.10061955 = product of:
          0.2012391 = sum of:
            0.2012391 = weight(_text_:tree in 2555) [ClassicSimilarity], result of:
              0.2012391 = score(doc=2555,freq=4.0), product of:
                0.32745647 = queryWeight, product of:
                  6.5552235 = idf(docFreq=170, maxDocs=44218)
                  0.049953517 = queryNorm
                0.6145522 = fieldWeight in 2555, product of:
                  2.0 = tf(freq=4.0), with freq of:
                    4.0 = termFreq=4.0
                  6.5552235 = idf(docFreq=170, maxDocs=44218)
                  0.046875 = fieldNorm(doc=2555)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Abstract
    Automatic classification of Web pages is an effective way to organise the vast amount of information and to assist in retrieving relevant information from the Internet. Although many automatic classification systems have been proposed, most of them ignore the conflict between the fixed number of categories and the growing number of Web pages being added into the systems. They also require searching through all existing categories to make any classification. This article proposes a dynamic and hierarchical classification system that is capable of adding new categories as required, organising the Web pages into a tree structure, and classifying Web pages by searching through only one path of the tree. The proposed single-path search technique reduces the search complexity from O(n) to O(log(n)). Test results show that the system improves the accuracy of classification by 6 percent in comparison to related systems. The dynamic-category expansion technique also achieves satisfying results for adding new categories into the system as required.
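    The single-path search above classifies a page by descending one branch of the category tree instead of scanning every category. A minimal sketch with a set-overlap similarity choosing the child at each level; the toy category profiles are invented:

      def overlap(words, profile):
          # Fraction of a category's profile terms present in the page.
          return len(words & profile) / len(profile)

      # category -> (term profile, children); a leaf has no children.
      TREE = {"science": ({"physics", "biology", "quantum", "cell"},
                          {"physics": ({"quantum", "particle"}, {}),
                           "biology": ({"cell", "gene"}, {})}),
              "sports":  ({"football", "tennis", "match"}, {})}

      def classify(words, children=TREE, path=()):
          while children:
              # Descend only the best-matching branch: one root-to-leaf path,
              # O(log n) in a balanced tree instead of O(n) over all categories.
              name = max(children, key=lambda c: overlap(words, children[c][0]))
              path += (name,)
              children = children[name][1]
          return path

      print(classify({"quantum", "particle", "experiment"}))
      # ('science', 'physics') on this toy tree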
  18. Gao, K.; Wang, Y.-C.; Wang, Z.-Q.: Similar interest clustering and partial back-propagation-based recommendation in digital library (2005) 0.05
    0.050309774 = product of:
      0.10061955 = sum of:
        0.10061955 = product of:
          0.2012391 = sum of:
            0.2012391 = weight(_text_:tree in 2582) [ClassicSimilarity], result of:
              0.2012391 = score(doc=2582,freq=4.0), product of:
                0.32745647 = queryWeight, product of:
                  6.5552235 = idf(docFreq=170, maxDocs=44218)
                  0.049953517 = queryNorm
                0.6145522 = fieldWeight in 2582, product of:
                  2.0 = tf(freq=4.0), with freq of:
                    4.0 = termFreq=4.0
                  6.5552235 = idf(docFreq=170, maxDocs=44218)
                  0.046875 = fieldNorm(doc=2582)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Abstract
    Purpose - The purpose of this paper is to propose a recommendation approach for information retrieval. Design/methodology/approach - Relevant results are presented on the basis of a novel data structure named FPT-tree, which is used to obtain common interests. Then, a partial back-propagation neural network is trained on the data, with the learning guided by users' click behaviors. Findings - Experimental results have shown the effectiveness of the approach. Originality/value - The approach attempts to integrate metrics of interest (e.g., click behavior, ranking) into the strategy of the recommendation system. Relevant results are first presented on the basis of a novel data structure named FPT-tree, and those results are then refined through a partial back-propagation neural network, the learning being guided by users' click behaviors.
  19. Yi, K.; Chan, L.M.: Linking folksonomy to Library of Congress subject headings : an exploratory study (2009) 0.05
    0.04743251 = product of:
      0.09486502 = sum of:
        0.09486502 = product of:
          0.18973003 = sum of:
            0.18973003 = weight(_text_:tree in 3616) [ClassicSimilarity], result of:
              0.18973003 = score(doc=3616,freq=8.0), product of:
                0.32745647 = queryWeight, product of:
                  6.5552235 = idf(docFreq=170, maxDocs=44218)
                  0.049953517 = queryNorm
                0.57940537 = fieldWeight in 3616, product of:
                  2.828427 = tf(freq=8.0), with freq of:
                    8.0 = termFreq=8.0
                  6.5552235 = idf(docFreq=170, maxDocs=44218)
                  0.03125 = fieldNorm(doc=3616)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Abstract
    Purpose - The purpose of this paper is to investigate the linking of a folksonomy (user vocabulary) and LCSH (controlled vocabulary) on the basis of word matching, for the potential use of LCSH in bringing order to folksonomies. Design/methodology/approach - A selected sample of a folksonomy from a popular collaborative tagging system, Delicious, was word-matched with LCSH. LCSH was transformed into a tree structure called an LCSH tree for the matching. A close examination was conducted of the characteristics of folksonomies, the overlap of folksonomies with LCSH, and the distribution of folksonomies over the LCSH tree. Findings - The experimental results showed that the total proportion of tags matched with LC subject headings constituted approximately two-thirds of all tags involved, with an additional 10 percent of the remaining tags having potential matches. A number of barriers to the linking, as well as two areas in need of improved matching, are identified and described. Three important tag distribution patterns over the LCSH tree were identified and supported: skewness, multifacetedness, and a Zipfian pattern. Research limitations/implications - The results of the study can be adopted for the development of innovative methods of mapping between folksonomies and LCSH, which directly contributes to effective access and retrieval of tagged web resources and to the integration of multiple information repositories based on the two vocabularies. Practical implications - The linking of controlled vocabularies can be applied to enhance information retrieval capability within collaborative tagging systems as well as across various tagging-system repositories and bibliographic databases. Originality/value - This is among the frontier works that examine the potential of linking a folksonomy, extracted from a collaborative tagging system, to an authority-maintained subject heading system. It provides exploratory data to support further advanced mapping methods for linking the two vocabularies.
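    The matching step in this study is word matching of normalized tags against normalized LCSH terms. A minimal sketch of that idea against a tiny heading list (the headings, tags, and crude plural stripping are illustrative only); note how "websites" misses "Web sites", an instance of the matching barriers the paper describes:

      LCSH = ["Web sites", "Classification", "Programming languages", "Photography"]

      def normalize(term):
          # Lowercase and strip a trailing plural "s" -- deliberately crude.
          t = term.lower().strip()
          return t[:-1] if t.endswith("s") else t

      index = {normalize(h): h for h in LCSH}

      for tag in ["photography", "websites", "classification", "webdev"]:
          print(f"{tag!r:18} -> {index.get(normalize(tag))}")  # unmatched -> None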
  20. Vetere, G.; Lenzerini, M.: Models for semantic interoperability in service-oriented architectures (2005) 0.05
    0.046281334 = product of:
      0.09256267 = sum of:
        0.09256267 = product of:
          0.277688 = sum of:
            0.277688 = weight(_text_:3a in 306) [ClassicSimilarity], result of:
              0.277688 = score(doc=306,freq=2.0), product of:
                0.42350647 = queryWeight, product of:
                  8.478011 = idf(docFreq=24, maxDocs=44218)
                  0.049953517 = queryNorm
                0.65568775 = fieldWeight in 306, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  8.478011 = idf(docFreq=24, maxDocs=44218)
                  0.0546875 = fieldNorm(doc=306)
          0.33333334 = coord(1/3)
      0.5 = coord(1/2)
    
    Content
    Cf.: http://ieeexplore.ieee.org/xpls/abs_all.jsp?arnumber=5386707.
