Search (4 results, page 1 of 1)

  • Filter: year_i:[2010 TO 2020} (Lucene range syntax: 2010 inclusive to 2020 exclusive)
  1. Zeng, Q.; Yu, M.; Yu, W.; Xiong, J.; Shi, Y.; Jiang, M.: Faceted hierarchy : a new graph type to organize scientific concepts and a construction method (2019) 0.07
    Score 0.0692 (Lucene ClassicSimilarity explain, doc 400, queryNorm 0.0521):
      0.0692 = 2/3 coord × (1/3 coord × 0.2481 [_text_:3a: tf 1.414 (freq 2), idf 8.478 (docFreq 24, maxDocs 44218), fieldNorm 0.0469] + 0.0210 [_text_:the: tf 3.464 (freq 12), idf 1.578 (docFreq 24812), fieldNorm 0.0469])
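    All four score breakdowns in this result list follow Lucene's ClassicSimilarity (TF-IDF): tf = sqrt(freq), idf = 1 + ln(maxDocs / (docFreq + 1)), and a term's score is (idf × queryNorm) × (tf × idf × fieldNorm), scaled by the coord factors. A short Python sketch reproducing the doc-400 numbers (the formulas are standard Lucene; the constants are copied from the explain output above):

        import math

        def idf(doc_freq, max_docs):
            # Lucene ClassicSimilarity: idf = 1 + ln(maxDocs / (docFreq + 1))
            return 1.0 + math.log(max_docs / (doc_freq + 1))

        def term_score(freq, doc_freq, max_docs, query_norm, field_norm):
            tf = math.sqrt(freq)                # ClassicSimilarity: tf = sqrt(freq)
            i = idf(doc_freq, max_docs)
            query_weight = i * query_norm       # query-side normalization
            field_weight = tf * i * field_norm  # document-side weight
            return query_weight * field_weight

        query_norm = 0.052059412
        s_3a  = term_score(2.0, 24, 44218, query_norm, 0.046875)      # ~0.2481
        s_the = term_score(12.0, 24812, 44218, query_norm, 0.046875)  # ~0.0210
        total = (2 / 3) * ((1 / 3) * s_3a + s_the)                    # coord factors
        print(round(s_3a, 4), round(s_the, 4), round(total, 4))       # 0.2481 0.021 0.0692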
    
    Abstract
    In a scientific concept hierarchy, a parent concept may have several attributes, each of which takes multiple values that form a group of child concepts. We call these attributes facets: classification, for example, has facets such as application (e.g., face recognition), model (e.g., svm, knn), and metric (e.g., precision). In this work, we aim at building faceted concept hierarchies from scientific literature. Hierarchy construction methods rely heavily on hypernym detection; however, faceted relations are direct parent-to-child links, whereas the hypernym relation is a multi-hop, i.e., ancestor-to-descendant, link with the single facet "type-of". We use information extraction techniques to find synonyms, sibling concepts, and ancestor-descendant relations in a data science corpus, and we propose a hierarchy growth algorithm that infers parent-child links from these three types of relationships, resolving conflicts by maintaining the acyclic structure of the hierarchy (a sketch of such acyclicity-preserving insertion follows this entry).
    Content
    Cf.: https://aclanthology.org/D19-5317.pdf.
    Source
    Graph-Based Methods for Natural Language Processing - proceedings of the Thirteenth Workshop (TextGraphs-13): November 4, 2019, Hong Kong : EMNLP-IJCNLP 2019. Ed.: Dmitry Ustalov
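    The growth algorithm in Zeng et al.'s abstract infers parent-child links from extracted relations and resolves conflicts by keeping the hierarchy acyclic. A minimal Python sketch of that acyclicity check (illustrative only, not the authors' algorithm; the candidate links and confidence scores are invented):

        from collections import defaultdict

        def reachable(graph, src, dst):
            # DFS: is dst reachable from src via existing parent->child links?
            stack, seen = [src], set()
            while stack:
                node = stack.pop()
                if node == dst:
                    return True
                if node in seen:
                    continue
                seen.add(node)
                stack.extend(graph[node])
            return False

        def grow_hierarchy(candidate_links):
            # candidate_links: (parent, child, confidence) triples, e.g. mined
            # from synonym / sibling / ancestor-descendant extractions
            graph = defaultdict(list)
            for parent, child, _ in sorted(candidate_links, key=lambda t: -t[2]):
                # parent->child creates a cycle iff parent is reachable from child
                if not reachable(graph, child, parent):
                    graph[parent].append(child)
            return graph

        links = [("classification", "svm", 0.9), ("model", "svm", 0.8),
                 ("svm", "classification", 0.3)]  # back-edge; rejected as a cycle
        print(dict(grow_hierarchy(links)))

    Greedy insertion in descending confidence order is one simple way to decide which of two conflicting links survives; the paper's actual conflict resolution may differ.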
  2. Farazi, M.: Faceted lightweight ontologies : a formalization and some experiments (2010) 0.07
    Score 0.0662 (Lucene ClassicSimilarity explain, doc 4997):
      0.0662 = 2/3 coord × (1/3 coord × 0.2067 [_text_:3a: tf 1.414 (freq 2), idf 8.478, fieldNorm 0.0391] + 0.0304 [_text_:the: tf 6.000 (freq 36), idf 1.578, fieldNorm 0.0391])
    
    Abstract
    While classifications are heavily used to categorize web content, the evolution of the web foresees a more formal structure - the ontology - which can serve this purpose. Ontologies are core artifacts of the Semantic Web that enable machines to use inference rules to conduct automated reasoning on data. Lightweight ontologies bridge the gap between classifications and ontologies. A lightweight ontology (LO) is an ontology representing a backbone taxonomy in which the concept of a child node is more specific than the concept of its parent node. Formal lightweight ontologies can be generated from informal ones. The key applications of formal lightweight ontologies are document classification, semantic search, and data integration. However, these applications suffer from the following problems: the limited disambiguation accuracy of the state-of-the-art NLP tools used in generating formal lightweight ontologies from informal ones; the lack of background knowledge needed for the formal lightweight ontologies; and the limited reusability of ontologies. In this dissertation, we propose a novel solution to these problems: the faceted lightweight ontology (FLO). A FLO is a lightweight ontology in which the terms present in each node label, and their concepts, are available in the background knowledge (BK), which is organized as a set of facets. A facet can be defined as a distinctive property of a group of concepts that helps differentiate one group from another. Background knowledge can be defined as a subset of a knowledge base, such as WordNet, and often represents a specific domain. (A toy illustration of this facet-organized background-knowledge check follows this entry.)
    Content
    PhD dissertation at the International Doctorate School in Information and Communication Technology. Cf.: https://core.ac.uk/download/pdf/150083013.pdf.
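    Farazi's FLO requires the terms and concepts of each node label to be available in background knowledge organized as facets. A toy Python illustration of that membership check (illustrative only, not the dissertation's formalization; the facets and concepts are invented):

        background_knowledge = {  # facet -> concepts (toy subset of a BK such as WordNet)
            "discipline": {"medicine", "computer science"},
            "method": {"classification", "semantic search"},
        }

        def resolvable(label_terms):
            # True if every term of a node label denotes a concept in some facet
            concepts = set().union(*background_knowledge.values())
            return all(term in concepts for term in label_terms)

        print(resolvable(["medicine", "classification"]))  # True
        print(resolvable(["medicine", "alchemy"]))         # False: 'alchemy' not in BK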
  3. Piros, A.: Az ETO-jelzetek automatikus interpretálásának és elemzésének kérdései [Issues of the automatic interpretation and analysis of UDC notations] (2018) 0.06
    Score 0.0638 (Lucene ClassicSimilarity explain, doc 855):
      0.0638 = 2/3 coord × (1/3 coord × 0.2067 [_text_:3a: tf 1.414 (freq 2), idf 8.478, fieldNorm 0.0391] + 0.0268 [_text_:the: tf 5.292 (freq 28), idf 1.578, fieldNorm 0.0391])
    
    Abstract
    Manually converting UDC numbers to a complex format such as the one mentioned above is an unrealistic expectation; supporting the construction of these representations, as automatically as possible, is a well-founded requirement. An additional advantage of this approach is that existing records can also be processed and converted. In my dissertation I also aim to prove that it is possible to design and implement an algorithm that can convert pre-coordinated UDC numbers into the introduced format by identifying all of their elements and revealing their complete syntactic structure. I will discuss a feasible way of building a UDC-specific XML schema for describing even the most detailed and complicated UDC numbers (containing not only the common auxiliary signs and numbers but also the different types of special auxiliaries). The schema definition is available online at: http://piros.udc-interpreter.hu#xsd. The primary goal of my research is to prove that it is possible to support building, retrieving, and analyzing UDC numbers without compromise, capturing the full syntactic richness of the scheme and preserving the meaning of pre-coordination in the stored numbers. The research has also included the implementation of software that parses UDC classmarks, intended to prove that such a solution can be applied automatically, without additional effort, and even retrospectively to existing collections. (A toy tokenizer illustrating such parsing follows this entry.)
    Content
    Cf. also: New automatic interpreter for complex UDC numbers. At: <https://udcc.org/files/AttilaPiros_EC_36-37_2014-2015.pdf>
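    Piros's parser identifies every element of a pre-coordinated UDC number and reveals its full syntactic structure. A toy Python tokenizer hinting at that structure (illustrative only, not the dissertation's interpreter or its XML schema; it handles only the connectors +, / and :, and the example classmark is made up):

        import re

        # UDC connectors kept as tokens: + (addition), / (extension), : (relation).
        # Real UDC also has auxiliaries such as =..., (0...), "...", which a full
        # interpreter must recognize.
        CONNECTOR = re.compile(r"(\+|/|:)")

        def parse_udc(classmark):
            parts = [p for p in CONNECTOR.split(classmark) if p]
            return [("connector" if CONNECTOR.fullmatch(p) else "number", p)
                    for p in parts]

        for kind, value in parse_udc("821.111-31:27"):
            print(f"<{kind}>{value}</{kind}>")  # crude stand-in for XML output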
  4. Xiong, C.: Knowledge based text representations for information retrieval (2016) 0.05
    Score 0.0542 (Lucene ClassicSimilarity explain, doc 5820):
      0.0542 = 2/3 coord × (1/3 coord × 0.1654 [_text_:3a: tf 1.414 (freq 2), idf 8.478, fieldNorm 0.0313] + 0.0262 [_text_:the: tf 6.481 (freq 42), idf 1.578, fieldNorm 0.0313])
    
    Abstract
    The successes of information retrieval (IR) in recent decades were built upon bag-of-words representations. Effective as it is, bag-of-words provides only a shallow understanding of text; the word space carries a limited amount of information for document ranking. This dissertation goes beyond words and builds knowledge-based text representations, which embed external, carefully curated information from knowledge bases and provide richer, structured evidence for more advanced information retrieval systems. This thesis research first builds query representations with entities associated with the query. Entities' descriptions are used by query expansion techniques that enrich the query with expansion terms. Then we present a general framework that represents a query with entities that appear in the query, are retrieved by the query, or frequently show up in the top retrieved documents. A latent space model is developed to jointly learn the connections from query to entities and the ranking of documents, modeling the external evidence from knowledge bases and the internal ranking features cooperatively. To further improve the quality of relevant entities, a defining factor of our query representations, we introduce learning-to-rank to entity search and retrieve better entities from knowledge bases. In the document representation part, this thesis research also moves one step forward with a bag-of-entities model, in which documents are represented by their automatic entity annotations and ranking is performed in the entity space.
    This proposal includes plans to improve the quality of relevant entities with a co-learning framework that learns from both entity labels and document labels. We also plan to develop a hybrid ranking system that combines word-based and entity-based representations while taking their uncertainties into account. Finally, we plan to enrich the text representations with connections between entities. We propose several ways to infer entity graph representations for texts and to rank documents using these structured representations. This dissertation overcomes the limitations of word-based representations with external, carefully curated information from knowledge bases. We believe this thesis research is a solid start towards a new generation of intelligent, semantic, and structured information retrieval. (A minimal bag-of-entities scoring sketch follows this entry.)
    Content
    Submitted in partial fulfillment of the requirements for the degree of Doctor of Philosophy in Language and Information Technologies. Cf.: https://www.cs.cmu.edu/~cx/papers/knowledge_based_text_representation.pdf.
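    Xiong's bag-of-entities model represents documents by their automatic entity annotations and ranks them in the entity space. A minimal Python scoring sketch of that representation (illustrative only; the thesis learns its ranking models jointly rather than using a plain overlap count, and the entity names here are invented):

        from collections import Counter

        def bag_of_entities_score(query_entities, doc_entity_annotations):
            # Sum, over query entities, of their annotation frequency in the document
            doc_bag = Counter(doc_entity_annotations)
            return sum(doc_bag[e] for e in query_entities)

        doc = ["Carnegie_Mellon_University", "Information_retrieval",
               "Information_retrieval", "Knowledge_base"]
        print(bag_of_entities_score(["Information_retrieval", "Knowledge_base"], doc))  # 3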
