Search (97 results, page 1 of 5)

  • year_i:[2010 TO 2020}
  • theme_ss:"Computerlinguistik"
  1. Huo, W.: Automatic multi-word term extraction and its application to Web-page summarization (2012) 0.36
    0.3649658 = product of:
      0.6386901 = sum of:
        0.041844364 = weight(_text_:web in 563) [ClassicSimilarity], result of:
          0.041844364 = score(doc=563,freq=8.0), product of:
            0.09670874 = queryWeight, product of:
              3.2635105 = idf(docFreq=4597, maxDocs=44218)
              0.029633347 = queryNorm
            0.43268442 = fieldWeight in 563, product of:
              2.828427 = tf(freq=8.0), with freq of:
                8.0 = termFreq=8.0
              3.2635105 = idf(docFreq=4597, maxDocs=44218)
              0.046875 = fieldNorm(doc=563)
        0.14119683 = weight(_text_:2f in 563) [ClassicSimilarity], result of:
          0.14119683 = score(doc=563,freq=2.0), product of:
            0.25123185 = queryWeight, product of:
              8.478011 = idf(docFreq=24, maxDocs=44218)
              0.029633347 = queryNorm
            0.56201804 = fieldWeight in 563, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              8.478011 = idf(docFreq=24, maxDocs=44218)
              0.046875 = fieldNorm(doc=563)
        0.14119683 = weight(_text_:2f in 563) [ClassicSimilarity], result of:
          0.14119683 = score(doc=563,freq=2.0), product of:
            0.25123185 = queryWeight, product of:
              8.478011 = idf(docFreq=24, maxDocs=44218)
              0.029633347 = queryNorm
            0.56201804 = fieldWeight in 563, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              8.478011 = idf(docFreq=24, maxDocs=44218)
              0.046875 = fieldNorm(doc=563)
        0.0060537956 = weight(_text_:information in 563) [ClassicSimilarity], result of:
          0.0060537956 = score(doc=563,freq=2.0), product of:
            0.052020688 = queryWeight, product of:
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.029633347 = queryNorm
            0.116372846 = fieldWeight in 563, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.046875 = fieldNorm(doc=563)
        0.01797477 = weight(_text_:retrieval in 563) [ClassicSimilarity], result of:
          0.01797477 = score(doc=563,freq=2.0), product of:
            0.08963835 = queryWeight, product of:
              3.024915 = idf(docFreq=5836, maxDocs=44218)
              0.029633347 = queryNorm
            0.20052543 = fieldWeight in 563, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.024915 = idf(docFreq=5836, maxDocs=44218)
              0.046875 = fieldNorm(doc=563)
        0.14119683 = weight(_text_:2f in 563) [ClassicSimilarity], result of:
          0.14119683 = score(doc=563,freq=2.0), product of:
            0.25123185 = queryWeight, product of:
              8.478011 = idf(docFreq=24, maxDocs=44218)
              0.029633347 = queryNorm
            0.56201804 = fieldWeight in 563, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              8.478011 = idf(docFreq=24, maxDocs=44218)
              0.046875 = fieldNorm(doc=563)
        0.14119683 = weight(_text_:2f in 563) [ClassicSimilarity], result of:
          0.14119683 = score(doc=563,freq=2.0), product of:
            0.25123185 = queryWeight, product of:
              8.478011 = idf(docFreq=24, maxDocs=44218)
              0.029633347 = queryNorm
            0.56201804 = fieldWeight in 563, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              8.478011 = idf(docFreq=24, maxDocs=44218)
              0.046875 = fieldNorm(doc=563)
        0.008029819 = product of:
          0.024089456 = sum of:
            0.024089456 = weight(_text_:22 in 563) [ClassicSimilarity], result of:
              0.024089456 = score(doc=563,freq=2.0), product of:
                0.103770934 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.029633347 = queryNorm
                0.23214069 = fieldWeight in 563, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.046875 = fieldNorm(doc=563)
          0.33333334 = coord(1/3)
      0.5714286 = coord(8/14)
    
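     The tree above is Lucene/Solr "explain" output for the ClassicSimilarity (TF-IDF) formula: each clause weight is queryWeight * fieldWeight, with tf = sqrt(freq) and idf = 1 + ln(maxDocs/(docFreq+1)), and the document score is the clause sum times the coordination factor. A minimal Python sketch (not the search engine's own code) reproduces the first clause from the numbers shown:

       import math

       def classic_weight(freq, doc_freq, max_docs, query_norm, field_norm):
           tf = math.sqrt(freq)                              # tf(freq)
           idf = 1.0 + math.log(max_docs / (doc_freq + 1))   # idf(docFreq, maxDocs)
           query_weight = idf * query_norm                   # queryWeight
           field_weight = tf * idf * field_norm              # fieldWeight
           return query_weight * field_weight

       # weight(_text_:web in 563): freq=8, docFreq=4597, maxDocs=44218
       print(classic_weight(8.0, 4597, 44218, 0.029633347, 0.046875))
       # ~0.041844364, matching the first clause; the final score 0.3649658
       # is the clause sum 0.6386901 times coord(8/14) = 0.5714286.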
    Abstract
    In this thesis we propose three new word association measures for multi-word term extraction. We combine these association measures with LocalMaxs algorithm in our extraction model and compare the results of different multi-word term extraction methods. Our approach is language and domain independent and requires no training data. It can be applied to such tasks as text summarization, information retrieval, and document classification. We further explore the potential of using multi-word terms as an effective representation for general web-page summarization. We extract multi-word terms from human written summaries in a large collection of web-pages, and generate the summaries by aligning document words with these multi-word terms. Our system applies machine translation technology to learn the aligning process from a training set and focuses on selecting high quality multi-word terms from human written summaries to generate suitable results for web-page summarization.
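     The thesis itself is not reproduced on this page, but LocalMaxs is a published algorithm: an n-gram is accepted as a term when its "glue" (association strength) is a local maximum compared with the (n-1)-grams it contains and the (n+1)-grams containing it. A toy sketch with a Dice-style glue follows; the three new association measures proposed in the thesis are not reconstructed here:

       from collections import Counter

       def ngrams(tokens, n):
           return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

       def glue(gram, counts):
           # Dice-style glue: n * f(W) / sum of unigram frequencies (a stand-in)
           return len(gram) * counts[gram] / sum(counts[(w,)] for w in gram)

       def local_maxs(tokens, max_n=4):
           counts = Counter()
           for n in range(1, max_n + 2):
               counts.update(ngrams(tokens, n))
           terms = []
           for n in range(2, max_n + 1):
               for g in set(ngrams(tokens, n)):
                   subs = [g[:-1], g[1:]] if n > 2 else []
                   supers = [s for s in set(ngrams(tokens, n + 1))
                             if s[:-1] == g or s[1:] == g]
                   if (all(glue(g, counts) > glue(x, counts) for x in subs) and
                           all(glue(g, counts) >= glue(s, counts) for s in supers)):
                       terms.append(g)
           return sorted(terms)

       print(local_maxs("new york is in new york state".split()))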
    Content
     A thesis presented to the University of Guelph in partial fulfilment of the requirements for the degree of Master of Science in Computer Science. Cf.: http://www.inf.ufrgs.br/~ceramisch/download_files/publications/2009/p01.pdf.
    Date
     10.01.2013 19:22:47
  2. Symonds, M.; Bruza, P.; Zuccon, G.; Koopman, B.; Sitbon, L.; Turner, I.: Automatic query expansion : a structural linguistic perspective (2014) 0.03
    0.027869733 = product of:
      0.09754406 = sum of:
        0.032137483 = weight(_text_:wide in 1338) [ClassicSimilarity], result of:
          0.032137483 = score(doc=1338,freq=2.0), product of:
            0.1312982 = queryWeight, product of:
              4.4307585 = idf(docFreq=1430, maxDocs=44218)
              0.029633347 = queryNorm
            0.24476713 = fieldWeight in 1338, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              4.4307585 = idf(docFreq=1430, maxDocs=44218)
              0.0390625 = fieldNorm(doc=1338)
        0.017435152 = weight(_text_:web in 1338) [ClassicSimilarity], result of:
          0.017435152 = score(doc=1338,freq=2.0), product of:
            0.09670874 = queryWeight, product of:
              3.2635105 = idf(docFreq=4597, maxDocs=44218)
              0.029633347 = queryNorm
            0.18028519 = fieldWeight in 1338, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.2635105 = idf(docFreq=4597, maxDocs=44218)
              0.0390625 = fieldNorm(doc=1338)
        0.011280581 = weight(_text_:information in 1338) [ClassicSimilarity], result of:
          0.011280581 = score(doc=1338,freq=10.0), product of:
            0.052020688 = queryWeight, product of:
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.029633347 = queryNorm
            0.21684799 = fieldWeight in 1338, product of:
              3.1622777 = tf(freq=10.0), with freq of:
                10.0 = termFreq=10.0
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.0390625 = fieldNorm(doc=1338)
        0.036690846 = weight(_text_:retrieval in 1338) [ClassicSimilarity], result of:
          0.036690846 = score(doc=1338,freq=12.0), product of:
            0.08963835 = queryWeight, product of:
              3.024915 = idf(docFreq=5836, maxDocs=44218)
              0.029633347 = queryNorm
            0.40932083 = fieldWeight in 1338, product of:
              3.4641016 = tf(freq=12.0), with freq of:
                12.0 = termFreq=12.0
              3.024915 = idf(docFreq=5836, maxDocs=44218)
              0.0390625 = fieldNorm(doc=1338)
      0.2857143 = coord(4/14)
    
    Abstract
     A user's query is considered to be an imprecise description of their information need. Automatic query expansion is the process of reformulating the original query with the goal of improving retrieval effectiveness. Many successful query expansion techniques model syntagmatic associations, inferring that two terms co-occur more often than by chance in natural language. However, structural linguistics relies on both syntagmatic and paradigmatic associations to deduce the meaning of a word. Given the success of dependency-based approaches to query expansion and the reliance on word meanings in the query formulation process, we argue that modeling both syntagmatic and paradigmatic information in the query expansion process improves retrieval effectiveness. This article develops and evaluates a new query expansion technique that is based on a formal, corpus-based model of word meaning that models syntagmatic and paradigmatic associations. We demonstrate that when sufficient statistical information exists, as in the case of longer queries, including paradigmatic information alone provides significant improvements in retrieval effectiveness across a wide variety of data sets. More generally, when our new query expansion approach is applied to large-scale web retrieval it demonstrates significant improvements in retrieval effectiveness over a strong baseline system, based on a commercial search engine.
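     The article's formal model of word meaning is not reconstructed here, but the underlying distinction is easy to operationalize: syntagmatic association is direct co-occurrence, while paradigmatic association is similarity of co-occurrence profiles (substitutability). A rough sketch, with the window size and the cosine choice as assumptions:

       import math
       from collections import Counter, defaultdict

       def cooccurrences(sentences, window=2):
           co = defaultdict(Counter)
           for sent in sentences:
               for i, w in enumerate(sent):
                   for j in range(max(0, i - window), min(len(sent), i + window + 1)):
                       if j != i:
                           co[w][sent[j]] += 1
           return co

       def syntagmatic(co, a, b):
           return co[a][b]                     # terms that occur together

       def paradigmatic(co, a, b):
           # terms that occur in similar contexts: cosine of profiles
           num = sum(co[a][w] * co[b][w] for w in co[a].keys() & co[b].keys())
           den = (math.sqrt(sum(v * v for v in co[a].values())) *
                  math.sqrt(sum(v * v for v in co[b].values())))
           return num / den if den else 0.0

       sents = [["strong", "tea"], ["strong", "coffee"], ["hot", "coffee"]]
       co = cooccurrences(sents)
       print(syntagmatic(co, "strong", "tea"))   # 1: they co-occur
       print(paradigmatic(co, "tea", "coffee"))  # ~0.71: both follow "strong"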
    Source
    Journal of the Association for Information Science and Technology. 65(2014) no.8, S.1577-1596
    Theme
    Semantisches Umfeld in Indexierung u. Retrieval
  3. Luo, Z.; Yu, Y.; Osborne, M.; Wang, T.: Structuring tweets for improving Twitter search (2015) 0.01
    0.014390454 = product of:
      0.06715545 = sum of:
        0.017435152 = weight(_text_:web in 2335) [ClassicSimilarity], result of:
          0.017435152 = score(doc=2335,freq=2.0), product of:
            0.09670874 = queryWeight, product of:
              3.2635105 = idf(docFreq=4597, maxDocs=44218)
              0.029633347 = queryNorm
            0.18028519 = fieldWeight in 2335, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.2635105 = idf(docFreq=4597, maxDocs=44218)
              0.0390625 = fieldNorm(doc=2335)
        0.010089659 = weight(_text_:information in 2335) [ClassicSimilarity], result of:
          0.010089659 = score(doc=2335,freq=8.0), product of:
            0.052020688 = queryWeight, product of:
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.029633347 = queryNorm
            0.19395474 = fieldWeight in 2335, product of:
              2.828427 = tf(freq=8.0), with freq of:
                8.0 = termFreq=8.0
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.0390625 = fieldNorm(doc=2335)
        0.03963064 = weight(_text_:retrieval in 2335) [ClassicSimilarity], result of:
          0.03963064 = score(doc=2335,freq=14.0), product of:
            0.08963835 = queryWeight, product of:
              3.024915 = idf(docFreq=5836, maxDocs=44218)
              0.029633347 = queryNorm
            0.442117 = fieldWeight in 2335, product of:
              3.7416575 = tf(freq=14.0), with freq of:
                14.0 = termFreq=14.0
              3.024915 = idf(docFreq=5836, maxDocs=44218)
              0.0390625 = fieldNorm(doc=2335)
      0.21428572 = coord(3/14)
    
    Abstract
     Spam and wildly varying documents make searching in Twitter challenging. Most Twitter search systems generally treat a Tweet as plain text when modeling relevance. However, a series of conventions allows users to Tweet in structural ways using a combination of different blocks of texts. These blocks include plain texts, hashtags, links, mentions, etc. Each block encodes a variety of communicative intent and the sequence of these blocks captures changing discourse. Previous work shows that exploiting the structural information can improve the retrieval of structured documents (e.g., web pages). In this study we utilize the structure of Tweets, induced by these blocks, for Twitter retrieval and Twitter opinion retrieval. For Twitter retrieval, a set of features, derived from the blocks of text and their combinations, is used in a learning-to-rank scenario. We show that structuring Tweets can achieve state-of-the-art performance. Our approach does not rely on social media features, but when we do add this additional information, performance improves significantly. For Twitter opinion retrieval, we explore the question of whether structural information derived from the body of Tweets and opinionatedness ratings of Tweets can improve performance. Experimental results show that retrieval using a novel unsupervised opinionatedness feature based on structuring Tweets achieves performance comparable with a supervised method using manually tagged Tweets. Topic-related specific structured Tweet sets are shown to help with query-dependent opinion retrieval.
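     The paper's exact feature set is not given in the abstract; the sketch below only illustrates the idea of splitting a Tweet into blocks (plain text, hashtags, links, mentions) and turning the block structure into counting features for a learning-to-rank model. The regexes and feature names are assumptions:

       import re

       BLOCKS = {
           "hashtags": r"#\w+",
           "mentions": r"@\w+",
           "links": r"https?://\S+",
       }

       def block_features(tweet):
           feats, rest = {}, tweet
           for name, pattern in BLOCKS.items():
               feats["n_" + name] = len(re.findall(pattern, tweet))
               rest = re.sub(pattern, " ", rest)
           feats["n_plain_words"] = len(rest.split())   # what is left is plain text
           return feats

       print(block_features("Structuring tweets works! http://t.co/x #IR @alice"))
       # {'n_hashtags': 1, 'n_mentions': 1, 'n_links': 1, 'n_plain_words': 3}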
    Source
    Journal of the Association for Information Science and Technology. 66(2015) no.12, S.2522-2539
  4. Korman, D.Z.; Mack, E.; Jett, J.; Renear, A.H.: Defining textual entailment (2018) 0.01
    0.014362549 = product of:
      0.06702523 = sum of:
        0.03856498 = weight(_text_:wide in 4284) [ClassicSimilarity], result of:
          0.03856498 = score(doc=4284,freq=2.0), product of:
            0.1312982 = queryWeight, product of:
              4.4307585 = idf(docFreq=1430, maxDocs=44218)
              0.029633347 = queryNorm
            0.29372054 = fieldWeight in 4284, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              4.4307585 = idf(docFreq=1430, maxDocs=44218)
              0.046875 = fieldNorm(doc=4284)
        0.0104854815 = weight(_text_:information in 4284) [ClassicSimilarity], result of:
          0.0104854815 = score(doc=4284,freq=6.0), product of:
            0.052020688 = queryWeight, product of:
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.029633347 = queryNorm
            0.20156369 = fieldWeight in 4284, product of:
              2.4494898 = tf(freq=6.0), with freq of:
                6.0 = termFreq=6.0
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.046875 = fieldNorm(doc=4284)
        0.01797477 = weight(_text_:retrieval in 4284) [ClassicSimilarity], result of:
          0.01797477 = score(doc=4284,freq=2.0), product of:
            0.08963835 = queryWeight, product of:
              3.024915 = idf(docFreq=5836, maxDocs=44218)
              0.029633347 = queryNorm
            0.20052543 = fieldWeight in 4284, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.024915 = idf(docFreq=5836, maxDocs=44218)
              0.046875 = fieldNorm(doc=4284)
      0.21428572 = coord(3/14)
    
    Abstract
     Textual entailment is a relationship that obtains between fragments of text when one fragment in some sense implies the other fragment. The automation of textual entailment recognition supports a wide variety of text-based tasks, including information retrieval, information extraction, question answering, text summarization, and machine translation. Much ingenuity has been devoted to developing algorithms for identifying textual entailments, but relatively little to saying what textual entailment actually is. This article is a review of the logical and philosophical issues involved in providing an adequate definition of textual entailment. We show that many natural definitions of textual entailment are refuted by counterexamples, including the most widely cited definition of Dagan et al. We then articulate and defend the following revised definition: T textually entails H =df typically, a human reading T would be justified in inferring the proposition expressed by H from the proposition expressed by T. We also show that textual entailment is context-sensitive, nontransitive, and nonmonotonic.
    Source
    Journal of the Association for Information Science and Technology. 69(2018) no.6, S.763-772
  5. Rajasurya, S.; Muralidharan, T.; Devi, S.; Swamynathan, S.: Semantic information retrieval using ontology in university domain (2012) 0.01
    0.013883932 = product of:
      0.06479168 = sum of:
        0.034870304 = weight(_text_:web in 2861) [ClassicSimilarity], result of:
          0.034870304 = score(doc=2861,freq=8.0), product of:
            0.09670874 = queryWeight, product of:
              3.2635105 = idf(docFreq=4597, maxDocs=44218)
              0.029633347 = queryNorm
            0.36057037 = fieldWeight in 2861, product of:
              2.828427 = tf(freq=8.0), with freq of:
                8.0 = termFreq=8.0
              3.2635105 = idf(docFreq=4597, maxDocs=44218)
              0.0390625 = fieldNorm(doc=2861)
        0.008737902 = weight(_text_:information in 2861) [ClassicSimilarity], result of:
          0.008737902 = score(doc=2861,freq=6.0), product of:
            0.052020688 = queryWeight, product of:
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.029633347 = queryNorm
            0.16796975 = fieldWeight in 2861, product of:
              2.4494898 = tf(freq=6.0), with freq of:
                6.0 = termFreq=6.0
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.0390625 = fieldNorm(doc=2861)
        0.021183468 = weight(_text_:retrieval in 2861) [ClassicSimilarity], result of:
          0.021183468 = score(doc=2861,freq=4.0), product of:
            0.08963835 = queryWeight, product of:
              3.024915 = idf(docFreq=5836, maxDocs=44218)
              0.029633347 = queryNorm
            0.23632148 = fieldWeight in 2861, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              3.024915 = idf(docFreq=5836, maxDocs=44218)
              0.0390625 = fieldNorm(doc=2861)
      0.21428572 = coord(3/14)
    
    Abstract
     Today's conventional search engines hardly provide the essential content relevant to the user's search query, because the context and semantics of the user's request are not analyzed to the full extent. Hence the need for semantic web search (SWS), an emerging area of web search that combines Natural Language Processing and Artificial Intelligence. The objective of the work presented here is to design, develop and implement a semantic search engine, SIEU (Semantic Information Extraction in University Domain), confined to the university domain. SIEU uses an ontology as a knowledge base for the information retrieval process. It is not a mere keyword search: it works one layer above what Google or any other search engine retrieves by analyzing just the keywords, since the query is analyzed both syntactically and semantically. The developed system retrieves web results more relevant to the user query through keyword expansion. The results obtained will be accurate enough to satisfy the user's request, and the level of accuracy is enhanced because the query is analyzed semantically. The system will be of great use to developers and researchers who work on the web. The Google results are re-ranked and optimized to provide the relevant links; for the ranking, an algorithm is applied that fetches more apt results for the user query.
  6. Clark, M.; Kim, Y.; Kruschwitz, U.; Song, D.; Albakour, D.; Dignum, S.; Beresi, U.C.; Fasli, M.; De Roeck, A.: Automatically structuring domain knowledge from text : an overview of current research (2012) 0.01
    0.012026694 = product of:
      0.056124568 = sum of:
        0.029588435 = weight(_text_:web in 2738) [ClassicSimilarity], result of:
          0.029588435 = score(doc=2738,freq=4.0), product of:
            0.09670874 = queryWeight, product of:
              3.2635105 = idf(docFreq=4597, maxDocs=44218)
              0.029633347 = queryNorm
            0.3059541 = fieldWeight in 2738, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              3.2635105 = idf(docFreq=4597, maxDocs=44218)
              0.046875 = fieldNorm(doc=2738)
        0.00856136 = weight(_text_:information in 2738) [ClassicSimilarity], result of:
          0.00856136 = score(doc=2738,freq=4.0), product of:
            0.052020688 = queryWeight, product of:
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.029633347 = queryNorm
            0.16457605 = fieldWeight in 2738, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.046875 = fieldNorm(doc=2738)
        0.01797477 = weight(_text_:retrieval in 2738) [ClassicSimilarity], result of:
          0.01797477 = score(doc=2738,freq=2.0), product of:
            0.08963835 = queryWeight, product of:
              3.024915 = idf(docFreq=5836, maxDocs=44218)
              0.029633347 = queryNorm
            0.20052543 = fieldWeight in 2738, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.024915 = idf(docFreq=5836, maxDocs=44218)
              0.046875 = fieldNorm(doc=2738)
      0.21428572 = coord(3/14)
    
    Abstract
    This paper presents an overview of automatic methods for building domain knowledge structures (domain models) from text collections. Applications of domain models have a long history within knowledge engineering and artificial intelligence. In the last couple of decades they have surfaced noticeably as a useful tool within natural language processing, information retrieval and semantic web technology. Inspired by the ubiquitous propagation of domain model structures that are emerging in several research disciplines, we give an overview of the current research landscape and some techniques and approaches. We will also discuss trade-offs between different approaches and point to some recent trends.
    Content
     Contribution to a special issue "Soft Approaches to IA on the Web". Cf.: doi:10.1016/j.ipm.2011.07.002.
    Source
    Information processing and management. 48(2012) no.3, S.552-568
  7. Hmeidi, I.I.; Al-Shalabi, R.F.; Al-Taani, A.T.; Najadat, H.; Al-Hazaimeh, S.A.: ¬A novel approach to the extraction of roots from Arabic words using bigrams (2010) 0.01
    0.011625199 = product of:
      0.054250926 = sum of:
        0.032137483 = weight(_text_:wide in 3426) [ClassicSimilarity], result of:
          0.032137483 = score(doc=3426,freq=2.0), product of:
            0.1312982 = queryWeight, product of:
              4.4307585 = idf(docFreq=1430, maxDocs=44218)
              0.029633347 = queryNorm
            0.24476713 = fieldWeight in 3426, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              4.4307585 = idf(docFreq=1430, maxDocs=44218)
              0.0390625 = fieldNorm(doc=3426)
        0.0071344664 = weight(_text_:information in 3426) [ClassicSimilarity], result of:
          0.0071344664 = score(doc=3426,freq=4.0), product of:
            0.052020688 = queryWeight, product of:
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.029633347 = queryNorm
            0.13714671 = fieldWeight in 3426, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.0390625 = fieldNorm(doc=3426)
        0.014978974 = weight(_text_:retrieval in 3426) [ClassicSimilarity], result of:
          0.014978974 = score(doc=3426,freq=2.0), product of:
            0.08963835 = queryWeight, product of:
              3.024915 = idf(docFreq=5836, maxDocs=44218)
              0.029633347 = queryNorm
            0.16710453 = fieldWeight in 3426, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.024915 = idf(docFreq=5836, maxDocs=44218)
              0.0390625 = fieldNorm(doc=3426)
      0.21428572 = coord(3/14)
    
    Abstract
     Root extraction is one of the most important topics in information retrieval (IR), natural language processing (NLP), text summarization, and many other important fields. In the last two decades, several algorithms have been proposed to extract Arabic roots. Most of these algorithms dealt with triliteral roots only, and some with fixed length words only. In this study, a novel approach to the extraction of roots from Arabic words using bigrams is proposed. Two similarity measures are used, the dissimilarity measure called the Manhattan distance, and Dice's measure of similarity. The proposed algorithm is tested on the Holy Qur'an and on a corpus of 242 abstracts from the Proceedings of the Saudi Arabian National Computer Conferences. The two files used contain a wide range of data: the Holy Qur'an contains most of the ancient Arabic words while the other file contains some modern Arabic words and some words borrowed from foreign languages in addition to the original Arabic words. The results of this study showed that combining N-grams with the Dice measure gives better results than using the Manhattan distance measure.
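     As a rough illustration of the bigram machinery, with Latin transliterations standing in for Arabic script and the candidate-root setup simplified: a word is compared against candidate roots by Dice's bigram similarity and by the Manhattan distance over bigram counts.

       from collections import Counter

       def bigrams(word):
           return [word[i:i + 2] for i in range(len(word) - 1)]

       def dice(w1, w2):
           b1, b2 = Counter(bigrams(w1)), Counter(bigrams(w2))
           common = sum((b1 & b2).values())                 # shared bigrams
           return 2 * common / (sum(b1.values()) + sum(b2.values()))

       def manhattan(w1, w2):
           b1, b2 = Counter(bigrams(w1)), Counter(bigrams(w2))
           return sum(abs(b1[b] - b2[b]) for b in b1.keys() | b2.keys())

       # pick the candidate root closest to the surface word
       word, roots = "maktab", ["ktb", "drs", "slm"]
       print(max(roots, key=lambda r: dice(word, r)))       # 'ktb'
       print(min(roots, key=lambda r: manhattan(word, r)))  # 'ktb'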
    Source
    Journal of the American Society for Information Science and Technology. 61(2010) no.3, S.583-591
  8. Belbachir, F.; Boughanem, M.: Using language models to improve opinion detection (2018) 0.01
    0.011008398 = product of:
      0.05137252 = sum of:
        0.013948122 = weight(_text_:web in 5044) [ClassicSimilarity], result of:
          0.013948122 = score(doc=5044,freq=2.0), product of:
            0.09670874 = queryWeight, product of:
              3.2635105 = idf(docFreq=4597, maxDocs=44218)
              0.029633347 = queryNorm
            0.14422815 = fieldWeight in 5044, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.2635105 = idf(docFreq=4597, maxDocs=44218)
              0.03125 = fieldNorm(doc=5044)
        0.008071727 = weight(_text_:information in 5044) [ClassicSimilarity], result of:
          0.008071727 = score(doc=5044,freq=8.0), product of:
            0.052020688 = queryWeight, product of:
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.029633347 = queryNorm
            0.1551638 = fieldWeight in 5044, product of:
              2.828427 = tf(freq=8.0), with freq of:
                8.0 = termFreq=8.0
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.03125 = fieldNorm(doc=5044)
        0.029352674 = weight(_text_:retrieval in 5044) [ClassicSimilarity], result of:
          0.029352674 = score(doc=5044,freq=12.0), product of:
            0.08963835 = queryWeight, product of:
              3.024915 = idf(docFreq=5836, maxDocs=44218)
              0.029633347 = queryNorm
            0.32745665 = fieldWeight in 5044, product of:
              3.4641016 = tf(freq=12.0), with freq of:
                12.0 = termFreq=12.0
              3.024915 = idf(docFreq=5836, maxDocs=44218)
              0.03125 = fieldNorm(doc=5044)
      0.21428572 = coord(3/14)
    
    Abstract
     Opinion mining is one of the most important research tasks in the information retrieval research community. With the huge volume of opinionated data available on the Web, approaches must be developed to differentiate opinion from fact. In this paper, we present a lexicon-based approach for opinion retrieval. Generally, opinion retrieval consists of two stages: relevance to the query and opinion detection. In our work, we focus on the second stage, which itself focusses on detecting opinionated documents. We compare the document to be analyzed with opinionated sources that contain subjective information. We hypothesize that a document with a strong similarity to opinionated sources is more likely to be opinionated itself. Typical lexicon-based approaches treat and choose their opinion sources according to their test collection, then calculate the opinion score based on the frequency of subjective terms in the document. In our work, we use different open opinion collections without any specific treatment and consider them as a reference collection. We then use language models to determine opinion scores. The analysis document and reference collection are represented by different language models (i.e., Dirichlet, Jelinek-Mercer and two-stage models). These language models are generally used in information retrieval to represent the relationship between documents and queries. However, in our study, we modify these language models to represent opinionated documents. We carry out several experiments using Text REtrieval Conference (TREC) Blogs 06 as our analysis collection and Internet Movie Database (IMDB), Multi-Perspective Question Answering (MPQA) and CHESLY as our reference collection. To improve opinion detection, we study the impact of using different language models to represent the document and reference collection alongside different combinations of opinion and retrieval scores. We then use this data to deduce the best opinion detection models. Using the best models, our approach improves on the best baseline of TREC Blog (baseline4) by 30%.
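     In outline, the approach scores a document by how well the opinionated reference collection "explains" it under a smoothed language model. A sketch with Jelinek-Mercer smoothing (the paper also evaluates Dirichlet and two-stage models; the background floor, lambda and length normalization here are assumptions):

       import math
       from collections import Counter

       def opinion_score(doc, opinion_ref, background, lam=0.7):
           ref, bg = Counter(opinion_ref), Counter(background)
           ref_n, bg_n = sum(ref.values()), sum(bg.values())
           vocab = len(set(opinion_ref) | set(background) | set(doc))
           score = 0.0
           for w in doc:
               p_ref = ref[w] / ref_n
               p_bg = (bg[w] + 1) / (bg_n + vocab)   # add-one floor keeps log finite
               score += math.log(lam * p_ref + (1 - lam) * p_bg)
           return score / max(len(doc), 1)           # length-normalized

       ref = "great awful love hate brilliant boring".split()
       bg = "the film was released in cinemas worldwide".split()
       print(opinion_score("i love this brilliant film".split(), ref, bg) >
             opinion_score("the film was released".split(), ref, bg))    # True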
    Source
    Information processing and management. 54(2018) no.6, S.958-968
  9. Rettinger, A.; Schumilin, A.; Thoma, S.; Ell, B.: Learning a cross-lingual semantic representation of relations expressed in text (2015) 0.01
    0.0100695435 = product of:
      0.0704868 = sum of:
        0.06039714 = weight(_text_:web in 2027) [ClassicSimilarity], result of:
          0.06039714 = score(doc=2027,freq=6.0), product of:
            0.09670874 = queryWeight, product of:
              3.2635105 = idf(docFreq=4597, maxDocs=44218)
              0.029633347 = queryNorm
            0.6245262 = fieldWeight in 2027, product of:
              2.4494898 = tf(freq=6.0), with freq of:
                6.0 = termFreq=6.0
              3.2635105 = idf(docFreq=4597, maxDocs=44218)
              0.078125 = fieldNorm(doc=2027)
        0.010089659 = weight(_text_:information in 2027) [ClassicSimilarity], result of:
          0.010089659 = score(doc=2027,freq=2.0), product of:
            0.052020688 = queryWeight, product of:
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.029633347 = queryNorm
            0.19395474 = fieldWeight in 2027, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.078125 = fieldNorm(doc=2027)
      0.14285715 = coord(2/14)
    
    Series
    Information Systems and Applications, incl. Internet/Web, and HCI; Bd. 9088
    Source
     The Semantic Web: latest advances and new domains. 12th European Semantic Web Conference, ESWC 2015, Portoroz, Slovenia, May 31 - June 4, 2015. Proceedings. Eds.: F. Gandon et al.
  10. Schmolz, H.: Anaphora resolution and text retrieval : a linguistic analysis of hypertexts (2015) 0.01
    0.009451089 = product of:
      0.066157624 = sum of:
        0.014268933 = weight(_text_:information in 1172) [ClassicSimilarity], result of:
          0.014268933 = score(doc=1172,freq=4.0), product of:
            0.052020688 = queryWeight, product of:
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.029633347 = queryNorm
            0.27429342 = fieldWeight in 1172, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.078125 = fieldNorm(doc=1172)
        0.05188869 = weight(_text_:retrieval in 1172) [ClassicSimilarity], result of:
          0.05188869 = score(doc=1172,freq=6.0), product of:
            0.08963835 = queryWeight, product of:
              3.024915 = idf(docFreq=5836, maxDocs=44218)
              0.029633347 = queryNorm
            0.5788671 = fieldWeight in 1172, product of:
              2.4494898 = tf(freq=6.0), with freq of:
                6.0 = termFreq=6.0
              3.024915 = idf(docFreq=5836, maxDocs=44218)
              0.078125 = fieldNorm(doc=1172)
      0.14285715 = coord(2/14)
    
    RSWK
    Englisch / Anapher <Syntax> / Hypertext / Information Retrieval / Korpus <Linguistik>
    Subject
    Englisch / Anapher <Syntax> / Hypertext / Information Retrieval / Korpus <Linguistik>
  11. Spitkovsky, V.; Norvig, P.: From words to concepts and back : dictionaries for linking text, entities and ideas (2012) 0.01
    0.00860955 = product of:
      0.0401779 = sum of:
        0.024158856 = weight(_text_:web in 337) [ClassicSimilarity], result of:
          0.024158856 = score(doc=337,freq=6.0), product of:
            0.09670874 = queryWeight, product of:
              3.2635105 = idf(docFreq=4597, maxDocs=44218)
              0.029633347 = queryNorm
            0.24981049 = fieldWeight in 337, product of:
              2.4494898 = tf(freq=6.0), with freq of:
                6.0 = termFreq=6.0
              3.2635105 = idf(docFreq=4597, maxDocs=44218)
              0.03125 = fieldNorm(doc=337)
        0.0040358636 = weight(_text_:information in 337) [ClassicSimilarity], result of:
          0.0040358636 = score(doc=337,freq=2.0), product of:
            0.052020688 = queryWeight, product of:
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.029633347 = queryNorm
            0.0775819 = fieldWeight in 337, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.03125 = fieldNorm(doc=337)
        0.0119831795 = weight(_text_:retrieval in 337) [ClassicSimilarity], result of:
          0.0119831795 = score(doc=337,freq=2.0), product of:
            0.08963835 = queryWeight, product of:
              3.024915 = idf(docFreq=5836, maxDocs=44218)
              0.029633347 = queryNorm
            0.13368362 = fieldWeight in 337, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.024915 = idf(docFreq=5836, maxDocs=44218)
              0.03125 = fieldNorm(doc=337)
      0.21428572 = coord(3/14)
    
    Abstract
     Human language is both rich and ambiguous. When we hear or read words, we resolve meanings to mental representations, for example recognizing and linking names to the intended persons, locations or organizations. Bridging words and meaning - from turning search queries into relevant results to suggesting targeted keywords for advertisers - is also Google's core competency, and important for many other tasks in information retrieval and natural language processing. We are happy to release a resource, spanning 7,560,141 concepts and 175,100,788 unique text strings, that we hope will help everyone working in these areas. How do we represent concepts? Our approach piggybacks on the unique titles of entries from an encyclopedia, which are mostly proper and common noun phrases. We consider each individual Wikipedia article as representing a concept (an entity or an idea), identified by its URL. Text strings that refer to concepts were collected using the publicly available hypertext of anchors (the text you click on in a web link) that point to each Wikipedia page, thus drawing on the vast link structure of the web. For every English article we harvested the strings associated with its incoming hyperlinks from the rest of Wikipedia, the greater web, and also anchors of parallel, non-English Wikipedia pages. Our dictionaries are cross-lingual, and any concept deemed too fine can be broadened to a desired level of generality using Wikipedia's groupings of articles into hierarchical categories. The data set contains triples, each consisting of (i) text, a short, raw natural language string; (ii) url, a related concept, represented by an English Wikipedia article's canonical location; and (iii) count, an integer indicating the number of times text has been observed connected with the concept's url. Our database thus includes weights that measure degrees of association. For example, the top two entries for football indicate that it is an ambiguous term, which is almost twice as likely to refer to what we in the US call soccer. Cf. also: Spitkovsky, V.I., A.X. Chang: A cross-lingual dictionary for English Wikipedia concepts. In: http://nlp.stanford.edu/pubs/crosswikis.pdf.
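     The released resource is a flat list of (text, url, count) triples. A few lines of Python show the intended use, turning counts into link probabilities for disambiguation; the two rows below are invented stand-ins echoing the football example, not actual values from the data set:

       from collections import Counter, defaultdict

       triples = [  # (text, url, count) rows in the data set's format
           ("football", "en.wikipedia.org/wiki/Association_football", 1500),
           ("football", "en.wikipedia.org/wiki/American_football", 800),
       ]

       dictionary = defaultdict(Counter)
       for text, url, count in triples:
           dictionary[text][url] += count

       def link_probabilities(text):
           counts = dictionary[text]
           total = sum(counts.values())
           return {url: n / total for url, n in counts.items()}

       print(link_probabilities("football"))
       # the weights measure the degree of association between string and concept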
  12. Fernández, R.T.; Losada, D.E.: Effective sentence retrieval based on query-independent evidence (2012) 0.01
    0.008485955 = product of:
      0.059401684 = sum of:
        0.00856136 = weight(_text_:information in 2728) [ClassicSimilarity], result of:
          0.00856136 = score(doc=2728,freq=4.0), product of:
            0.052020688 = queryWeight, product of:
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.029633347 = queryNorm
            0.16457605 = fieldWeight in 2728, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.046875 = fieldNorm(doc=2728)
        0.050840326 = weight(_text_:retrieval in 2728) [ClassicSimilarity], result of:
          0.050840326 = score(doc=2728,freq=16.0), product of:
            0.08963835 = queryWeight, product of:
              3.024915 = idf(docFreq=5836, maxDocs=44218)
              0.029633347 = queryNorm
            0.5671716 = fieldWeight in 2728, product of:
              4.0 = tf(freq=16.0), with freq of:
                16.0 = termFreq=16.0
              3.024915 = idf(docFreq=5836, maxDocs=44218)
              0.046875 = fieldNorm(doc=2728)
      0.14285715 = coord(2/14)
    
    Abstract
    In this paper we propose an effective sentence retrieval method that consists of incorporating query-independent features into standard sentence retrieval models. To meet this aim, we apply a formal methodology and consider different query-independent features. In particular, we show that opinion-based features are promising. Opinion mining is an increasingly important research topic but little is known about how to improve retrieval algorithms with opinion-based components. In this respect, we consider here different kinds of opinion-based features to act as query-independent evidence and study whether this incorporation improves retrieval performance. On the other hand, information needs are usually related to people, locations or organizations. We hypothesize here that using these named entities as query-independent features may also improve the sentence relevance estimation. Finally, the length of the retrieval unit has been shown to be an important component in different retrieval scenarios. We therefore include length-based features in our study. Our evaluation demonstrates that, either in isolation or in combination, these query-independent features help to improve substantially the performance of state-of-the-art sentence retrieval methods.
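     The abstract does not state the combination rule; a linear mixture is the simplest reading of "incorporating query-independent features into standard sentence retrieval models", and the sketch below should be taken only as that:

       def combined_score(query_dependent, features, weights):
           # sentence score = standard retrieval score + weighted
           # query-independent evidence (opinion cues, named entities, length)
           return query_dependent + sum(weights[f] * v for f, v in features.items())

       print(round(combined_score(
           query_dependent=1.20,
           features={"opinion_terms": 3, "named_entities": 1, "length": 18},
           weights={"opinion_terms": 0.15, "named_entities": 0.10, "length": 0.01},
       ), 2))  # 1.93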
    Source
    Information processing and management. 48(2012) no.6, S.1203-1229
  13. Ko, Y.: ¬A new term-weighting scheme for text classification using the odds of positive and negative class probabilities (2015) 0.01
    0.007581752 = product of:
      0.05307226 = sum of:
        0.01712272 = weight(_text_:information in 2339) [ClassicSimilarity], result of:
          0.01712272 = score(doc=2339,freq=16.0), product of:
            0.052020688 = queryWeight, product of:
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.029633347 = queryNorm
            0.3291521 = fieldWeight in 2339, product of:
              4.0 = tf(freq=16.0), with freq of:
                16.0 = termFreq=16.0
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.046875 = fieldNorm(doc=2339)
        0.03594954 = weight(_text_:retrieval in 2339) [ClassicSimilarity], result of:
          0.03594954 = score(doc=2339,freq=8.0), product of:
            0.08963835 = queryWeight, product of:
              3.024915 = idf(docFreq=5836, maxDocs=44218)
              0.029633347 = queryNorm
            0.40105087 = fieldWeight in 2339, product of:
              2.828427 = tf(freq=8.0), with freq of:
                8.0 = termFreq=8.0
              3.024915 = idf(docFreq=5836, maxDocs=44218)
              0.046875 = fieldNorm(doc=2339)
      0.14285715 = coord(2/14)
    
    Abstract
     Text classification (TC) is a core technique for text mining and information retrieval. It has been applied to many applications in many different research and industrial areas. Term-weighting schemes assign an appropriate weight to each term to obtain a high TC performance. Although term weighting is one of the important modules for TC, and TC has different peculiarities from information retrieval, many term-weighting schemes used in information retrieval, such as term frequency-inverse document frequency (tf-idf), have been used in TC in the same manner. The peculiarity of TC that differs most from information retrieval is the existence of class information. This article proposes a new term-weighting scheme that exploits class information in the form of positive and negative class distributions. As a result, the proposed scheme, log tf-TRR, consistently performs better than other schemes that use class information, as well as traditional schemes such as tf-idf.
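     The exact formula is given only in the paper; the sketch below is a guessed reconstruction in which TRR is read as the ratio of a term's smoothed probability in positive-class documents to that in negative-class documents, multiplied by a log-scaled term frequency:

       import math

       def log_tf_trr(tf, pos_freq, pos_total, neg_freq, neg_total):
           # hypothetical reading of "log tf-TRR"; the smoothing is an assumption
           p_pos = (pos_freq + 1) / (pos_total + 2)
           p_neg = (neg_freq + 1) / (neg_total + 2)
           return math.log(1 + tf) * (p_pos / p_neg)

       # a term concentrated in the positive class gets a large weight
       print(log_tf_trr(tf=3, pos_freq=90, pos_total=1000, neg_freq=5, neg_total=1000))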
    Source
    Journal of the Association for Information Science and Technology. 66(2015) no.12, S.2553-2565
  14. Schmolz, H.: Anaphora resolution and text retrieval : a linguistic analysis of hypertexts (2013) 0.01
    0.0074937996 = product of:
      0.052456595 = sum of:
        0.010089659 = weight(_text_:information in 1810) [ClassicSimilarity], result of:
          0.010089659 = score(doc=1810,freq=2.0), product of:
            0.052020688 = queryWeight, product of:
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.029633347 = queryNorm
            0.19395474 = fieldWeight in 1810, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.078125 = fieldNorm(doc=1810)
        0.042366937 = weight(_text_:retrieval in 1810) [ClassicSimilarity], result of:
          0.042366937 = score(doc=1810,freq=4.0), product of:
            0.08963835 = queryWeight, product of:
              3.024915 = idf(docFreq=5836, maxDocs=44218)
              0.029633347 = queryNorm
            0.47264296 = fieldWeight in 1810, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              3.024915 = idf(docFreq=5836, maxDocs=44218)
              0.078125 = fieldNorm(doc=1810)
      0.14285715 = coord(2/14)
    
    Content
     Winner of the VFI Dissertation Prize 2014: "A convincing and thorough linguistic and quantitative analysis of a text element that has so far received little attention in information retrieval, based on a large, purpose-built hypertext corpus, including the evaluation of self-developed resolution rules for use in future IR systems."
  15. Doko, A.; Stula, M.; Seric, L.: Improved sentence retrieval using local context and sentence length (2013) 0.01
    0.007154687 = product of:
      0.050082806 = sum of:
        0.0060537956 = weight(_text_:information in 2705) [ClassicSimilarity], result of:
          0.0060537956 = score(doc=2705,freq=2.0), product of:
            0.052020688 = queryWeight, product of:
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.029633347 = queryNorm
            0.116372846 = fieldWeight in 2705, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.046875 = fieldNorm(doc=2705)
        0.044029012 = weight(_text_:retrieval in 2705) [ClassicSimilarity], result of:
          0.044029012 = score(doc=2705,freq=12.0), product of:
            0.08963835 = queryWeight, product of:
              3.024915 = idf(docFreq=5836, maxDocs=44218)
              0.029633347 = queryNorm
            0.49118498 = fieldWeight in 2705, product of:
              3.4641016 = tf(freq=12.0), with freq of:
                12.0 = termFreq=12.0
              3.024915 = idf(docFreq=5836, maxDocs=44218)
              0.046875 = fieldNorm(doc=2705)
      0.14285715 = coord(2/14)
    
    Abstract
     In this paper we propose improved variants of the sentence retrieval method TF-ISF (a TF-IDF or Term Frequency-Inverse Document Frequency variant for sentence retrieval). The improvement is achieved by using a context consisting of neighboring sentences and at the same time promoting the retrieval of longer sentences. We thoroughly compare the new modified TF-ISF methods to the TF-ISF baseline, to tfmix, an earlier attempt to include context in TF-ISF, and to 3MMPDS, a language-modeling-based method that uses context and promotes the retrieval of long sentences. Experimental results show that the TF-ISF method can be improved using local context. Results also show that the TF-ISF method can be improved by promoting the retrieval of longer sentences. Finally we show that the best results are achieved when combining both modifications. All new methods (TF-ISF variants) also show statistically significantly better results than the other tested methods.
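     A sketch of the baseline TF-ISF scorer with the two modifications described above bolted on: term counts borrowed at reduced weight from neighbouring sentences (local context) and a mild bonus for longer sentences. The neighbour weight and the length bonus are assumptions, not the paper's tuned values:

       import math
       from collections import Counter

       def tf_isf_scores(query, sentences, context=1, neighbour_w=0.5, beta=0.1):
           n = len(sentences)
           sf = Counter(t for s in sentences for t in set(s))   # sentence frequency
           def isf(t):
               return math.log((n + 1) / (sf[t] + 1))
           scores = []
           for i, sent in enumerate(sentences):
               bag = Counter(sent)
               for j in range(max(0, i - context), min(n, i + context + 1)):
                   if j != i:                                   # borrow local context
                       for t in sentences[j]:
                           bag[t] += neighbour_w
               score = sum(math.log(1 + bag[t]) * isf(t) for t in query)
               scores.append(score * (1 + beta * math.log(1 + len(sent))))
           return scores

       docs = [["cats", "purr"], ["cats", "and", "dogs"], ["dogs", "bark"]]
       print(tf_isf_scores(["cats"], docs))   # the middle sentence scores highest here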
    Source
    Information processing and management. 49(2013) no.6, S.1301-1312
  16. Vasalou, A.; Gill, A.J.; Mazanderani, F.; Papoutsi, C.; Joinson, A.: Privacy dictionary : a new resource for the automated content analysis of privacy (2011) 0.01
    0.006374111 = product of:
      0.044618774 = sum of:
        0.03856498 = weight(_text_:wide in 4915) [ClassicSimilarity], result of:
          0.03856498 = score(doc=4915,freq=2.0), product of:
            0.1312982 = queryWeight, product of:
              4.4307585 = idf(docFreq=1430, maxDocs=44218)
              0.029633347 = queryNorm
            0.29372054 = fieldWeight in 4915, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              4.4307585 = idf(docFreq=1430, maxDocs=44218)
              0.046875 = fieldNorm(doc=4915)
        0.0060537956 = weight(_text_:information in 4915) [ClassicSimilarity], result of:
          0.0060537956 = score(doc=4915,freq=2.0), product of:
            0.052020688 = queryWeight, product of:
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.029633347 = queryNorm
            0.116372846 = fieldWeight in 4915, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.046875 = fieldNorm(doc=4915)
      0.14285715 = coord(2/14)
    
    Abstract
    This article presents the privacy dictionary, a new linguistic resource for automated content analysis on privacy-related texts. To overcome the definitional challenges inherent in privacy research, the dictionary was informed by an inclusive set of relevant theoretical perspectives. Using methods from corpus linguistics, we constructed and validated eight dictionary categories on empirical material from a wide range of privacy-sensitive contexts. It was shown that the dictionary categories are able to measure unique linguistic patterns within privacy discussions. At a time when privacy considerations are increasing and online resources provide ever-growing quantities of textual data, the privacy dictionary can play a significant role not only for research in the social sciences but also in technology design and policymaking.
    Source
    Journal of the American Society for Information Science and Technology. 62(2011) no.11, S.2095-2105
  17. Engerer, V.: Exploring interdisciplinary relationships between linguistics and information retrieval from the 1960s to today (2017) 0.01
    0.006366278 = product of:
      0.044563945 = sum of:
        0.019143783 = weight(_text_:information in 3434) [ClassicSimilarity], result of:
          0.019143783 = score(doc=3434,freq=20.0), product of:
            0.052020688 = queryWeight, product of:
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.029633347 = queryNorm
            0.36800325 = fieldWeight in 3434, product of:
              4.472136 = tf(freq=20.0), with freq of:
                20.0 = termFreq=20.0
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.046875 = fieldNorm(doc=3434)
        0.025420163 = weight(_text_:retrieval in 3434) [ClassicSimilarity], result of:
          0.025420163 = score(doc=3434,freq=4.0), product of:
            0.08963835 = queryWeight, product of:
              3.024915 = idf(docFreq=5836, maxDocs=44218)
              0.029633347 = queryNorm
            0.2835858 = fieldWeight in 3434, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              3.024915 = idf(docFreq=5836, maxDocs=44218)
              0.046875 = fieldNorm(doc=3434)
      0.14285715 = coord(2/14)
    
    Abstract
    This article explores how linguistics has influenced information retrieval (IR) and attempts to explain the impact of linguistics through an analysis of internal developments in information science generally, and IR in particular. It notes that information science/IR has been evolving from a case science into a fully fledged, "disciplined"/disciplinary science. The article establishes correspondences between linguistics and information science/IR using the three established IR paradigms-physical, cognitive, and computational-as a frame of reference. The current relationship between information science/IR and linguistics is elucidated through discussion of some recent information science publications dealing with linguistic topics and a novel technique, "keyword collocation analysis," is introduced. Insights from interdisciplinarity research and case theory are also discussed. It is demonstrated that the three stages of interdisciplinarity, namely multidisciplinarity, interdisciplinarity (in the narrow sense), and transdisciplinarity, can be linked to different phases of the information science/IR-linguistics relationship and connected to different ways of using linguistic theory in information science and IR.
    Source
    Journal of the Association for Information Science and Technology. 68(2017) no.3, S.660-680
  18. Wong, W.; Liu, W.; Bennamoun, M.: Ontology learning from text : a look back and into the future (2010) 0.01
    0.0063583 = product of:
      0.044508096 = sum of:
        0.034519844 = weight(_text_:web in 4733) [ClassicSimilarity], result of:
          0.034519844 = score(doc=4733,freq=4.0), product of:
            0.09670874 = queryWeight, product of:
              3.2635105 = idf(docFreq=4597, maxDocs=44218)
              0.029633347 = queryNorm
            0.35694647 = fieldWeight in 4733, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              3.2635105 = idf(docFreq=4597, maxDocs=44218)
              0.0546875 = fieldNorm(doc=4733)
        0.009988253 = weight(_text_:information in 4733) [ClassicSimilarity], result of:
          0.009988253 = score(doc=4733,freq=4.0), product of:
            0.052020688 = queryWeight, product of:
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.029633347 = queryNorm
            0.1920054 = fieldWeight in 4733, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.0546875 = fieldNorm(doc=4733)
      0.14285715 = coord(2/14)
    
    Abstract
     Ontologies are often viewed as the answer to the need for interoperable semantics in modern information systems. The explosion of textual information on the "Read/Write" Web, coupled with the increasing demand for ontologies to power the Semantic Web, has made (semi-)automatic ontology learning from text a very promising research area. This, together with the advanced state of related areas such as natural language processing, has fuelled research into ontology learning over the past decade. This survey looks at how far we have come since the turn of the millennium and discusses the remaining challenges that will define the research directions in this area in the near future.
  19. Rozinajová, V.; Macko, P.: Using natural language to search linked data (2017) 0.01
    0.006000682 = product of:
      0.04200477 = sum of:
        0.034870304 = weight(_text_:web in 3488) [ClassicSimilarity], result of:
          0.034870304 = score(doc=3488,freq=8.0), product of:
            0.09670874 = queryWeight, product of:
              3.2635105 = idf(docFreq=4597, maxDocs=44218)
              0.029633347 = queryNorm
            0.36057037 = fieldWeight in 3488, product of:
              2.828427 = tf(freq=8.0), with freq of:
                8.0 = termFreq=8.0
              3.2635105 = idf(docFreq=4597, maxDocs=44218)
              0.0390625 = fieldNorm(doc=3488)
        0.0071344664 = weight(_text_:information in 3488) [ClassicSimilarity], result of:
          0.0071344664 = score(doc=3488,freq=4.0), product of:
            0.052020688 = queryWeight, product of:
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.029633347 = queryNorm
            0.13714671 = fieldWeight in 3488, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.0390625 = fieldNorm(doc=3488)
      0.14285715 = coord(2/14)
    
    Abstract
     There are many endeavors aiming to offer users more effective ways of getting relevant information from the web. One of them is the concept of Linked Data, which provides interconnected data sources. But querying these types of data is difficult not only for conventional web users but also for experts in this field. Therefore, a more comfortable way of querying would be of great value. One direction is to allow the user to use natural language. To make this task easier we have proposed a method for translating natural language queries to SPARQL queries. It is based on sentence structure, utilizing dependencies between the words in user queries. The dependencies are used to map the query onto the semantic web structure, which is in the next step translated into a SPARQL query. According to our first experiments we are able to answer a significant group of user queries.
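     As a toy illustration of the final step only: once the dependency analysis has reduced a question such as "Who teaches Databases?" to a relation-value pair, emitting SPARQL is string assembly. The URIs below are invented placeholders, not the paper's vocabulary:

       def triple_to_sparql(relation_uri, value_uri):
           # SELECT the unknown subject of a (?x, relation, value) pattern
           return "SELECT ?x WHERE { ?x <%s> <%s> . }" % (relation_uri, value_uri)

       print(triple_to_sparql("http://example.org/teaches",
                              "http://example.org/Databases"))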
    Series
    Information Systems and Applications, incl. Internet/Web, and HCI; 10151
  20. Ye, Z.; He, B.; Wang, L.; Luo, T.: Utilizing term proximity for blog post retrieval (2013) 0.01
    0.005721087 = product of:
      0.04004761 = sum of:
        0.010089659 = weight(_text_:information in 1126) [ClassicSimilarity], result of:
          0.010089659 = score(doc=1126,freq=8.0), product of:
            0.052020688 = queryWeight, product of:
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.029633347 = queryNorm
            0.19395474 = fieldWeight in 1126, product of:
              2.828427 = tf(freq=8.0), with freq of:
                8.0 = termFreq=8.0
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.0390625 = fieldNorm(doc=1126)
        0.029957948 = weight(_text_:retrieval in 1126) [ClassicSimilarity], result of:
          0.029957948 = score(doc=1126,freq=8.0), product of:
            0.08963835 = queryWeight, product of:
              3.024915 = idf(docFreq=5836, maxDocs=44218)
              0.029633347 = queryNorm
            0.33420905 = fieldWeight in 1126, product of:
              2.828427 = tf(freq=8.0), with freq of:
                8.0 = termFreq=8.0
              3.024915 = idf(docFreq=5836, maxDocs=44218)
              0.0390625 = fieldNorm(doc=1126)
      0.14285715 = coord(2/14)
    
    Abstract
     Term proximity is effective for many information retrieval (IR) research fields yet remains unexplored in blogosphere IR. The blogosphere is characterized by large amounts of noise, including incohesive, off-topic content and spam. Consequently, the classical bag-of-words unigram IR models are not reliable enough to provide robust and effective retrieval performance. In this article, we propose to boost blog post retrieval performance by employing term proximity information. We investigate a variety of popular and state-of-the-art proximity-based statistical IR models, including a proximity-based counting model, the Markov random field (MRF) model, and the divergence from randomness (DFR) multinomial model. Extensive experimentation on the standard TREC Blog06 test dataset demonstrates that the introduction of term proximity information is indeed beneficial to retrieval from the blogosphere. Results also indicate the superiority of the unordered bi-gram model with the sequential-dependence phrases over other variants of the proximity-based models. Finally, inspired by the effectiveness of proximity models, we extend our study by exploring the proximity evidence between query terms and opinionated terms. The consequent opinionated proximity model shows promising performance in the experiments.
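     One simple instance of proximity evidence, in the spirit of the proximity-based counting model (the MRF and DFR variants are more involved): count ordered within-window occurrences of a query term pair. The window size is an assumption:

       def ordered_window_count(tokens, a, b, window=5):
           # how often term a is followed by term b within `window` positions
           positions_a = [i for i, t in enumerate(tokens) if t == a]
           positions_b = [i for i, t in enumerate(tokens) if t == b]
           return sum(1 for i in positions_a
                      for j in positions_b if 0 < j - i <= window)

       post = "blog retrieval needs term proximity because blog posts are noisy".split()
       print(ordered_window_count(post, "blog", "retrieval"))   # 1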
    Source
    Journal of the American Society for Information Science and Technology. 64(2013) no.11, S.2278-2298

Languages

  • e 82
  • d 14
  • el 1

Types

  • a 83
  • el 13
  • x 6
  • m 3
  • s 2