Search (17 results, page 1 of 1)

  • × language_ss:"e"
  • × theme_ss:"Semantisches Umfeld in Indexierung u. Retrieval"
  • × year_i:[2010 TO 2020}
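  The three active filters above are Solr/Lucene field queries; the year range [2010 TO 2020} uses Solr's inclusive-lower/exclusive-upper bracket syntax, i.e. 2010-2019. As a minimal sketch, the same filtered search could be issued programmatically roughly as follows; the endpoint URL, core name, and row count are assumptions, not taken from this page:

      import requests

      # Hypothetical Solr endpoint: host, core name, and request handler are assumptions.
      SOLR_URL = "http://localhost:8983/solr/literature/select"

      params = {
          "q": "*:*",
          # The three active facet filters shown above; [2010 TO 2020} means 2010 <= year < 2020.
          "fq": [
              'language_ss:"e"',
              'theme_ss:"Semantisches Umfeld in Indexierung u. Retrieval"',
              "year_i:[2010 TO 2020}",
          ],
          "rows": 20,
          "wt": "json",
          "debugQuery": "true",  # asks Solr for per-document score explanations
      }

      docs = requests.get(SOLR_URL, params=params).json()["response"]["docs"]
      print(len(docs), "documents")

  Enabling debugQuery is what produces per-document score explanations of the kind shown under each hit below.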
  1. Chebil, W.; Soualmia, L.F.; Omri, M.N.; Darmoni, S.F.: Indexing biomedical documents with a possibilistic network (2016) 0.02
    0.01993374 = product of:
      0.03986748 = sum of:
        0.03986748 = product of:
          0.07973496 = sum of:
            0.07973496 = weight(_text_:network in 2854) [ClassicSimilarity], result of:
              0.07973496 = score(doc=2854,freq=4.0), product of:
                0.22917621 = queryWeight, product of:
                  4.4533744 = idf(docFreq=1398, maxDocs=44218)
                  0.05146125 = queryNorm
                0.34791988 = fieldWeight in 2854, product of:
                  2.0 = tf(freq=4.0), with freq of:
                    4.0 = termFreq=4.0
                  4.4533744 = idf(docFreq=1398, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=2854)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
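    The breakdown above is Lucene ClassicSimilarity (TF-IDF) explain output; the two coord(1/2) factors indicate that only one of two query clauses matched this document. Reading it bottom-up, the figures reconstruct as:

      \begin{aligned}
      \mathrm{queryWeight} &= \mathrm{idf} \cdot \mathrm{queryNorm} = 4.4533744 \times 0.05146125 \approx 0.22917621 \\
      \mathrm{tf} &= \sqrt{\mathrm{freq}} = \sqrt{4.0} = 2.0 \\
      \mathrm{fieldWeight} &= \mathrm{tf} \cdot \mathrm{idf} \cdot \mathrm{fieldNorm} = 2.0 \times 4.4533744 \times 0.0390625 \approx 0.34791988 \\
      \mathrm{score} &= \mathrm{queryWeight} \cdot \mathrm{fieldWeight} \cdot \mathrm{coord}(1/2) \cdot \mathrm{coord}(1/2) = 0.07973496 \times 0.5 \times 0.5 \approx 0.0199
      \end{aligned}

    The same pattern applies to every score explanation in this result list.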
    
    Abstract
    In this article, we propose a new approach for indexing biomedical documents based on a possibilistic network that carries out partial matching between documents and biomedical vocabulary. The main contribution of our approach is to deal with the imprecision and uncertainty of the indexing task using possibility theory. We enhance the estimation of the similarity between a document and a given concept using the two measures of possibility and necessity. Possibility estimates the extent to which a document is not similar to the concept, while necessity provides confirmation that the document is similar to the concept. Our approach also mitigates a limitation of partial matching: although partial matching can extract term variants from the document beyond those listed in dictionaries, it also generates irrelevant information. Our objective is to filter the index using the knowledge provided by the Unified Medical Language System®. Experiments carried out on different corpora show encouraging results (an improvement of +26.37% in mean average precision over the baseline).
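    As background only (standard possibility theory, not the authors' specific network): for a possibility distribution \pi over interpretations, the possibility and necessity of an event A are

      \Pi(A) = \max_{\omega \in A} \pi(\omega), \qquad N(A) = 1 - \Pi(\bar{A}) = \min_{\omega \notin A} \bigl(1 - \pi(\omega)\bigr)

    so a low possibility rules a candidate concept out, while a high necessity positively confirms it, which mirrors the roles of the two measures described in the abstract.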
  2. Ru, C.; Tang, J.; Li, S.; Xie, S.; Wang, T.: Using semantic similarity to reduce wrong labels in distant supervision for relation extraction (2018) 0.02
    0.01993374 = product of:
      0.03986748 = sum of:
        0.03986748 = product of:
          0.07973496 = sum of:
            0.07973496 = weight(_text_:network in 5055) [ClassicSimilarity], result of:
              0.07973496 = score(doc=5055,freq=4.0), product of:
                0.22917621 = queryWeight, product of:
                  4.4533744 = idf(docFreq=1398, maxDocs=44218)
                  0.05146125 = queryNorm
                0.34791988 = fieldWeight in 5055, product of:
                  2.0 = tf(freq=4.0), with freq of:
                    4.0 = termFreq=4.0
                  4.4533744 = idf(docFreq=1398, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=5055)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Abstract
    Distant supervision (DS) has the advantage of automatically generating large amounts of labelled training data and has been widely used for relation extraction. However, the automatically labelled data in distant supervision usually contain many wrong labels (Riedel, Yao, & McCallum, 2010). This paper presents a novel method to reduce these wrong labels. The proposed method uses a semantic Jaccard measure with word embeddings to quantify the semantic similarity between the relation phrase in the knowledge base and the dependency phrases between two entities in a sentence, and filters out wrong labels accordingly. In the process of reducing wrong labels, the semantic Jaccard algorithm selects a core dependency phrase to represent the candidate relation in a sentence, which captures features for relation classification and avoids the negative impact of the irrelevant term sequences that previous neural network models of relation extraction often suffer from. For relation classification, the core dependency phrases are also used as the input of a convolutional neural network (CNN). The experimental results show that, compared with methods using the original DS data, methods using the filtered DS data perform much better in relation extraction, indicating that the semantic-similarity-based method is effective in reducing wrong labels. The CNN model using the core dependency phrases as input achieves the best relation extraction performance of all, which indicates that the core dependency phrases are sufficient to capture the features needed for relation classification while avoiding the negative impact of irrelevant terms.
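    The abstract does not give the exact formula, so the following is only a sketch of one common way to realize a "semantic Jaccard" with word embeddings: exact token overlap is replaced by best-match cosine similarity above a threshold. The threshold value, the symmetric matching, and the dictionary-style embedding lookup are assumptions.

      import numpy as np

      def cosine(u, v):
          return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v) + 1e-12))

      def soft_overlap(xs, ys, embed, threshold):
          # A token in xs counts as matched if its best cosine similarity to any token in ys
          # reaches the threshold.
          return sum(1 for x in xs if max(cosine(embed[x], embed[y]) for y in ys) >= threshold)

      def semantic_jaccard(phrase_a, phrase_b, embed, threshold=0.7):
          # embed maps a token to a vector (e.g. pre-trained word2vec); out-of-vocabulary
          # tokens are skipped. The 0.7 threshold is an arbitrary illustrative value.
          a = [t for t in phrase_a if t in embed]
          b = [t for t in phrase_b if t in embed]
          if not a or not b:
              return 0.0
          intersection = (soft_overlap(a, b, embed, threshold) + soft_overlap(b, a, embed, threshold)) / 2.0
          union = len(a) + len(b) - intersection
          return intersection / union

      # Usage idea: a low score between the knowledge-base relation phrase and the
      # core dependency phrase of a sentence flags a likely wrong DS label, e.g.
      # semantic_jaccard(["place", "of", "birth"], ["was", "born", "in"], word_vectors)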
  3. Rekabsaz, N. et al.: Toward optimized multimodal concept indexing (2016) 0.02
    0.017430723 = product of:
      0.034861445 = sum of:
        0.034861445 = product of:
          0.06972289 = sum of:
            0.06972289 = weight(_text_:22 in 2751) [ClassicSimilarity], result of:
              0.06972289 = score(doc=2751,freq=2.0), product of:
                0.18020853 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.05146125 = queryNorm
                0.38690117 = fieldWeight in 2751, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.078125 = fieldNorm(doc=2751)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Date
    1. 2.2016 18:25:22
  4. Kozikowski, P. et al.: Support of part-whole relations in query answering (2016) 0.02
    0.017430723 = product of:
      0.034861445 = sum of:
        0.034861445 = product of:
          0.06972289 = sum of:
            0.06972289 = weight(_text_:22 in 2754) [ClassicSimilarity], result of:
              0.06972289 = score(doc=2754,freq=2.0), product of:
                0.18020853 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.05146125 = queryNorm
                0.38690117 = fieldWeight in 2754, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.078125 = fieldNorm(doc=2754)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Date
    1. 2.2016 18:25:22
  5. Marx, E. et al.: Exploring term networks for semantic search over RDF knowledge graphs (2016) 0.02
    0.017430723 = product of:
      0.034861445 = sum of:
        0.034861445 = product of:
          0.06972289 = sum of:
            0.06972289 = weight(_text_:22 in 3279) [ClassicSimilarity], result of:
              0.06972289 = score(doc=3279,freq=2.0), product of:
                0.18020853 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.05146125 = queryNorm
                0.38690117 = fieldWeight in 3279, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.078125 = fieldNorm(doc=3279)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Source
    Metadata and semantics research: 10th International Conference, MTSR 2016, Göttingen, Germany, November 22-25, 2016, Proceedings. Eds.: E. Garoufallou
  6. Kopácsi, S. et al.: Development of a classification server to support metadata harmonization in a long term preservation system (2016) 0.02
    0.017430723 = product of:
      0.034861445 = sum of:
        0.034861445 = product of:
          0.06972289 = sum of:
            0.06972289 = weight(_text_:22 in 3280) [ClassicSimilarity], result of:
              0.06972289 = score(doc=3280,freq=2.0), product of:
                0.18020853 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.05146125 = queryNorm
                0.38690117 = fieldWeight in 3280, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.078125 = fieldNorm(doc=3280)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Source
    Metadata and semantics research: 10th International Conference, MTSR 2016, Göttingen, Germany, November 22-25, 2016, Proceedings. Eds.: E. Garoufallou
  7. Sebastian, Y.: Literature-based discovery by learning heterogeneous bibliographic information networks (2017) 0.02
    0.015946992 = product of:
      0.031893983 = sum of:
        0.031893983 = product of:
          0.06378797 = sum of:
            0.06378797 = weight(_text_:network in 535) [ClassicSimilarity], result of:
              0.06378797 = score(doc=535,freq=4.0), product of:
                0.22917621 = queryWeight, product of:
                  4.4533744 = idf(docFreq=1398, maxDocs=44218)
                  0.05146125 = queryNorm
                0.2783359 = fieldWeight in 535, product of:
                  2.0 = tf(freq=4.0), with freq of:
                    4.0 = termFreq=4.0
                  4.4533744 = idf(docFreq=1398, maxDocs=44218)
                  0.03125 = fieldNorm(doc=535)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Abstract
    Literature-based discovery (LBD) research aims at finding effective computational methods for predicting previously unknown connections between clusters of research papers from disparate research areas. Existing methods encompass two general approaches. The first approach searches for these unknown connections by examining the textual contents of research papers. In addition to the existing textual features, the second approach incorporates structural features of the scientific literature, such as citation structures. These approaches, however, have not considered research papers' latent bibliographic metadata structures as important features that can be used for predicting previously unknown relationships between them. This thesis investigates a new graph-based LBD method that exploits the latent bibliographic metadata connections between pairs of research papers. The heterogeneous bibliographic information network is proposed as an efficient graph-based data structure for modeling the complex relationships between these metadata. In contrast to previous approaches, this method seamlessly combines textual and citation information in the form of path-based metadata features for predicting future co-citation links between research papers from disparate research fields. The results reported in this thesis provide evidence that the method is effective for reconstructing historical literature-based discovery hypotheses. This thesis also investigates the effects of semantic modeling and topic modeling on the performance of the proposed method. For semantic modeling, a general-purpose word sense disambiguation technique is proposed to reduce the lexical ambiguity in the titles and abstracts of research papers. The experimental results suggest that the reduced lexical ambiguity did not necessarily improve the method's performance, and the thesis discusses some of the possible contributing factors. Finally, topic modeling is used for learning the latent topical relations between research papers. The learned topic model is incorporated into the heterogeneous bibliographic information network graph and allows new predictive features to be learned. The results in this thesis suggest that topic modeling improves the performance of the proposed method by increasing the overall accuracy for predicting future co-citation links between papers from disparate research fields.
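    As an illustration of path-based metadata features over a heterogeneous bibliographic information network, the sketch below counts instances of two example meta-paths (paper-author-paper and paper-term-paper) between candidate paper pairs; the concrete meta-paths and the plain-dict graph representation are assumptions, not taken from the thesis.

      def metapath_features(pairs, paper_authors, paper_terms):
          # paper_authors / paper_terms map a paper id to a set of author / term ids.
          # Each meta-path count becomes one feature for predicting a future co-citation link.
          features = {}
          for p1, p2 in pairs:
              features[(p1, p2)] = {
                  "paper-author-paper": len(paper_authors.get(p1, set()) & paper_authors.get(p2, set())),
                  "paper-term-paper": len(paper_terms.get(p1, set()) & paper_terms.get(p2, set())),
              }
          return features

      # Usage sketch: feed the counts (plus textual and citation features) into any
      # link-prediction classifier, e.g.
      # metapath_features([("d1", "d2")],
      #                   {"d1": {"smith"}, "d2": {"smith", "lee"}},
      #                   {"d1": {"indexing"}, "d2": {"retrieval"}})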
  8. Jiang, Y.; Zhang, X.; Tang, Y.; Nie, R.: Feature-based approaches to semantic similarity assessment of concepts using Wikipedia (2015) 0.01
    0.014095282 = product of:
      0.028190564 = sum of:
        0.028190564 = product of:
          0.05638113 = sum of:
            0.05638113 = weight(_text_:network in 2682) [ClassicSimilarity], result of:
              0.05638113 = score(doc=2682,freq=2.0), product of:
                0.22917621 = queryWeight, product of:
                  4.4533744 = idf(docFreq=1398, maxDocs=44218)
                  0.05146125 = queryNorm
                0.2460165 = fieldWeight in 2682, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  4.4533744 = idf(docFreq=1398, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=2682)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Abstract
    Semantic similarity assessment between concepts is an important task in many language-related applications. In the past, several approaches have been proposed that assess similarity by evaluating the knowledge modeled in one or more ontologies. However, existing measures have limitations, such as their reliance on predefined ontologies and their restriction to non-dynamic domains. Wikipedia provides a very large, domain-independent encyclopedic repository and semantic network for computing the semantic similarity of concepts, with broader coverage than usual ontologies. In this paper, we propose several novel feature-based similarity assessment methods that depend entirely on Wikipedia and avoid most of the limitations and drawbacks introduced above. To implement feature-based similarity assessment using Wikipedia, we first present a formal representation of Wikipedia concepts. We then give a framework for feature-based similarity built on this formal representation. Lastly, we investigate several feature-based approaches to semantic similarity measures resulting from instantiations of the framework. The evaluation, based on several widely used benchmarks and a benchmark we developed ourselves, confirms the intuitions with respect to human judgements. Overall, several methods proposed in this paper correlate well with human judgements and constitute effective ways of determining similarity between Wikipedia concepts.
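    A minimal sketch of a feature-based similarity between two Wikipedia concepts, using Jaccard overlap of their outgoing links and categories; the choice of these two feature sets and their equal weighting are assumptions made for this sketch, while the paper's formal representation and measures are richer.

      def jaccard(a, b):
          return len(a & b) / len(a | b) if (a or b) else 0.0

      def wikipedia_feature_similarity(c1, c2, w_links=0.5, w_cats=0.5):
          # c1 and c2 are dicts with 'links' and 'categories' as sets of page/category titles,
          # e.g. harvested from a Wikipedia dump or API. The equal weighting is an assumption.
          return (w_links * jaccard(c1["links"], c2["links"])
                  + w_cats * jaccard(c1["categories"], c2["categories"]))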
  9. Jiang, Y.; Bai, W.; Zhang, X.; Hu, J.: Wikipedia-based information content and semantic similarity computation (2017) 0.01
    0.014095282 = product of:
      0.028190564 = sum of:
        0.028190564 = product of:
          0.05638113 = sum of:
            0.05638113 = weight(_text_:network in 2877) [ClassicSimilarity], result of:
              0.05638113 = score(doc=2877,freq=2.0), product of:
                0.22917621 = queryWeight, product of:
                  4.4533744 = idf(docFreq=1398, maxDocs=44218)
                  0.05146125 = queryNorm
                0.2460165 = fieldWeight in 2877, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  4.4533744 = idf(docFreq=1398, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=2877)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Abstract
    The Information Content (IC) of a concept is a fundamental dimension in computational linguistics: it enables a better understanding of a concept's semantics. In the past, several approaches to computing the IC of a concept have been proposed. However, existing methods have limitations, such as their reliance on corpus availability, manual tagging, or predefined ontologies, and their restriction to non-dynamic domains. Wikipedia provides a very large, domain-independent encyclopedic repository and semantic network for computing the IC of concepts, with broader coverage than usual ontologies. In this paper, we propose some novel methods for computing the IC of a concept that address the shortcomings of existing approaches. The presented methods focus on the IC of a concept (i.e., a Wikipedia category) drawn from the Wikipedia category structure. We propose several new IC-based measures to compute the semantic similarity between concepts. The evaluation, based on several widely used benchmarks and a benchmark we developed ourselves, confirms the intuitions with respect to human judgments. Overall, some methods proposed in this paper correlate well with human judgments and constitute effective ways of determining IC values for concepts and semantic similarity between concepts.
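    The abstract does not spell out the IC formula. One standard intrinsic formulation (in the style of Seco et al.), which could be applied to the Wikipedia category structure by counting a category's descendants, together with the Lin measure that typically builds on IC, is:

      \mathrm{IC}(c) = 1 - \frac{\log\bigl(\mathrm{desc}(c) + 1\bigr)}{\log N}, \qquad
      \mathrm{sim}_{\mathrm{Lin}}(c_1, c_2) = \frac{2 \cdot \mathrm{IC}\bigl(\mathrm{lca}(c_1, c_2)\bigr)}{\mathrm{IC}(c_1) + \mathrm{IC}(c_2)}

    where desc(c) is the number of descendants of category c, N the total number of categories, and lca the closest common ancestor. These are standard measures given for orientation, not necessarily those proposed in the paper.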
  10. Roy, R.S.; Agarwal, S.; Ganguly, N.; Choudhury, M.: Syntactic complexity of Web search queries through the lenses of language models, networks and users (2016) 0.01
    0.014095282 = product of:
      0.028190564 = sum of:
        0.028190564 = product of:
          0.05638113 = sum of:
            0.05638113 = weight(_text_:network in 3188) [ClassicSimilarity], result of:
              0.05638113 = score(doc=3188,freq=2.0), product of:
                0.22917621 = queryWeight, product of:
                  4.4533744 = idf(docFreq=1398, maxDocs=44218)
                  0.05146125 = queryNorm
                0.2460165 = fieldWeight in 3188, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  4.4533744 = idf(docFreq=1398, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=3188)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Abstract
    Across the world, millions of users interact with search engines every day to satisfy their information needs. As the Web grows bigger over time, such information needs, manifested through user search queries, also become more complex. However, there has been no systematic study that quantifies the structural complexity of Web search queries. In this research, we make an attempt towards understanding and characterizing the syntactic complexity of search queries using a multi-pronged approach. We use traditional statistical language modeling techniques to quantify and compare the perplexity of queries with natural language (NL). We then use complex network analysis for a comparative analysis of the topological properties of queries issued by real Web users and those generated by statistical models. Finally, we conduct experiments to study whether search engine users are able to identify real queries, when presented along with model-generated ones. The three complementary studies show that the syntactic structure of Web queries is more complex than what n-grams can capture, but simpler than NL. Queries, thus, seem to represent an intermediate stage between syntactic and non-syntactic communication.
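    A minimal sketch of the language-model side of such a comparison: per-token perplexity of a query under a bigram model with add-one smoothing. The smoothing choice and the start-of-sequence handling are assumptions; the study may use other estimators and higher-order n-grams.

      import math

      def bigram_perplexity(query_tokens, unigram_counts, bigram_counts, vocab_size):
          # Per-token perplexity of a token sequence under an add-one-smoothed bigram model.
          # unigram_counts / bigram_counts are frequency dictionaries from a training corpus.
          tokens = ["<s>"] + query_tokens
          log_prob = 0.0
          for prev, cur in zip(tokens, tokens[1:]):
              p = (bigram_counts.get((prev, cur), 0) + 1) / (unigram_counts.get(prev, 0) + vocab_size)
              log_prob += math.log(p)
          return math.exp(-log_prob / len(query_tokens))

      # Comparing average perplexities of real queries, model-generated queries, and
      # natural-language sentences is one way to place queries between the two.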
  11. Qu, R.; Fang, Y.; Bai, W.; Jiang, Y.: Computing semantic similarity based on novel models of semantic representation using Wikipedia (2018) 0.01
    0.014095282 = product of:
      0.028190564 = sum of:
        0.028190564 = product of:
          0.05638113 = sum of:
            0.05638113 = weight(_text_:network in 5052) [ClassicSimilarity], result of:
              0.05638113 = score(doc=5052,freq=2.0), product of:
                0.22917621 = queryWeight, product of:
                  4.4533744 = idf(docFreq=1398, maxDocs=44218)
                  0.05146125 = queryNorm
                0.2460165 = fieldWeight in 5052, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  4.4533744 = idf(docFreq=1398, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=5052)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Abstract
    Computing Semantic Similarity (SS) between concepts is one of the most critical issues in many domains, such as Natural Language Processing and Artificial Intelligence. Over the years, several SS measurement methods have been proposed that exploit different knowledge resources. Wikipedia provides a large domain-independent encyclopedic repository and a semantic network for computing SS between concepts. Traditional feature-based measures rely on linear combinations of different properties, with two main limitations: insufficient information and the loss of semantic information. In this paper, we propose several hybrid SS measurement approaches that use the Information Content (IC) and features of concepts and avoid the limitations introduced above. To integrate discrete properties into one component, we present two models of semantic representation, called CORM and CARM. We then compute SS based on these models and take the IC of categories as a supplement to the SS measurement. The evaluation, based on several widely used benchmarks and a benchmark we developed ourselves, confirms the intuitions with respect to human judgments. In summary, our approaches are more efficient in determining SS between concepts and correlate better with human judgments than previous methods such as Word2Vec and NASARI.
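    For contrast: the kind of linear feature combination that the abstract identifies as limited has the generic form

      \mathrm{sim}(c_1, c_2) = \sum_i w_i \, f_i(c_1, c_2), \qquad \sum_i w_i = 1

    whereas the proposed hybrid measures (CORM, CARM) integrate the discrete properties into a single semantic representation and use the IC of categories as a supplement; their exact form is not given in the abstract.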
  12. Salaba, A.; Zeng, M.L.: Extending the "Explore" user task beyond subject authority data into the linked data sphere (2014) 0.01
    0.012201506 = product of:
      0.024403011 = sum of:
        0.024403011 = product of:
          0.048806023 = sum of:
            0.048806023 = weight(_text_:22 in 1465) [ClassicSimilarity], result of:
              0.048806023 = score(doc=1465,freq=2.0), product of:
                0.18020853 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.05146125 = queryNorm
                0.2708308 = fieldWeight in 1465, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.0546875 = fieldNorm(doc=1465)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Source
    Knowledge organization in the 21st century: between historical patterns and future prospects. Proceedings of the Thirteenth International ISKO Conference 19-22 May 2014, Kraków, Poland. Ed.: Wieslaw Babik
  13. Mlodzka-Stybel, A.: Towards continuous improvement of users' access to a library catalogue (2014) 0.01
    0.012201506 = product of:
      0.024403011 = sum of:
        0.024403011 = product of:
          0.048806023 = sum of:
            0.048806023 = weight(_text_:22 in 1466) [ClassicSimilarity], result of:
              0.048806023 = score(doc=1466,freq=2.0), product of:
                0.18020853 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.05146125 = queryNorm
                0.2708308 = fieldWeight in 1466, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.0546875 = fieldNorm(doc=1466)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Source
    Knowledge organization in the 21st century: between historical patterns and future prospects. Proceedings of the Thirteenth International ISKO Conference 19-22 May 2014, Kraków, Poland. Ed.: Wieslaw Babik
  14. Zeng, M.L.; Gracy, K.F.; Zumer, M.: Using a semantic analysis tool to generate subject access points : a study using Panofsky's theory and two research samples (2014) 0.01
    0.010458433 = product of:
      0.020916866 = sum of:
        0.020916866 = product of:
          0.041833732 = sum of:
            0.041833732 = weight(_text_:22 in 1464) [ClassicSimilarity], result of:
              0.041833732 = score(doc=1464,freq=2.0), product of:
                0.18020853 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.05146125 = queryNorm
                0.23214069 = fieldWeight in 1464, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.046875 = fieldNorm(doc=1464)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Source
    Knowledge organization in the 21st century: between historical patterns and future prospects. Proceedings of the Thirteenth International ISKO Conference 19-22 May 2014, Kraków, Poland. Ed.: Wieslaw Babik
  15. Brandão, W.C.; Santos, R.L.T.; Ziviani, N.; Moura, E.S. de; Silva, A.S. da: Learning to expand queries using entities (2014) 0.01
    0.008715361 = product of:
      0.017430723 = sum of:
        0.017430723 = product of:
          0.034861445 = sum of:
            0.034861445 = weight(_text_:22 in 1343) [ClassicSimilarity], result of:
              0.034861445 = score(doc=1343,freq=2.0), product of:
                0.18020853 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.05146125 = queryNorm
                0.19345059 = fieldWeight in 1343, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=1343)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Date
    22. 8.2014 17:07:50
  16. Brunetti, J.M.; García, R.: User-centered design and evaluation of overview components for semantic data exploration (2014) 0.01
    0.006972289 = product of:
      0.013944578 = sum of:
        0.013944578 = product of:
          0.027889157 = sum of:
            0.027889157 = weight(_text_:22 in 1626) [ClassicSimilarity], result of:
              0.027889157 = score(doc=1626,freq=2.0), product of:
                0.18020853 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.05146125 = queryNorm
                0.15476047 = fieldWeight in 1626, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.03125 = fieldNorm(doc=1626)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Date
    20. 1.2015 18:30:22
  17. Thenmalar, S.; Geetha, T.V.: Enhanced ontology-based indexing and searching (2014) 0.01
    0.006100753 = product of:
      0.012201506 = sum of:
        0.012201506 = product of:
          0.024403011 = sum of:
            0.024403011 = weight(_text_:22 in 1633) [ClassicSimilarity], result of:
              0.024403011 = score(doc=1633,freq=2.0), product of:
                0.18020853 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.05146125 = queryNorm
                0.1354154 = fieldWeight in 1633, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.02734375 = fieldNorm(doc=1633)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Date
    20. 1.2015 18:30:22