Search (149 results, page 1 of 8)

Rajasurya, S.; Muralidharan, T.; Devi, S.; Swamynathan, S.: Semantic information retrieval using ontology in university domain (2012) 0.14

0.13841115 = product of:
  0.1845482 = sum of:
    0.05817665 = weight(_text_:web in 2861) [ClassicSimilarity], result of:
      0.05817665 = score(doc=2861,freq=8.0), product of:
        0.16134618 = queryWeight, product of:
          3.2635105 = idf(docFreq=4597, maxDocs=44218)
          0.049439456 = queryNorm
        0.36057037 = fieldWeight in 2861, product of:
          2.828427 = tf(freq=8.0), with freq of:
            8.0 = termFreq=8.0
          3.2635105 = idf(docFreq=4597, maxDocs=44218)
          0.0390625 = fieldNorm(doc=2861)
    0.08729243 = weight(_text_:search in 2861) [ClassicSimilarity], result of:
      0.08729243 = score(doc=2861,freq=14.0), product of:
        0.17183559 = queryWeight, product of:
          3.475677 = idf(docFreq=3718, maxDocs=44218)
          0.049439456 = queryNorm
        0.5079997 = fieldWeight in 2861, product of:
          3.7416575 = tf(freq=14.0), with freq of:
            14.0 = termFreq=14.0
          3.475677 = idf(docFreq=3718, maxDocs=44218)
          0.0390625 = fieldNorm(doc=2861)
    0.03907912 = product of:
      0.07815824 = sum of:
        0.07815824 = weight(_text_:engine in 2861) [ClassicSimilarity], result of:
          0.07815824 = score(doc=2861,freq=2.0), product of:
            0.26447627 = queryWeight, product of:
              5.349498 = idf(docFreq=570, maxDocs=44218)
              0.049439456 = queryNorm
            0.29552078 = fieldWeight in 2861, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              5.349498 = idf(docFreq=570, maxDocs=44218)
              0.0390625 = fieldNorm(doc=2861)
      0.5 = coord(1/2)
  0.75 = coord(3/4)
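
The explain tree above follows Lucene's classic TF-IDF scoring (ClassicSimilarity). Each matching term contributes queryWeight × fieldWeight, where queryWeight = idf × queryNorm and fieldWeight = tf × idf × fieldNorm with tf = √freq. Each coord factor then scales a Boolean level by the fraction of its clauses that matched. As a rough check, the Python sketch below recomputes the score of doc 2861 from the printed statistics. It assumes the classic idf formula 1 + ln(maxDocs / (docFreq + 1)), which reproduces the printed idf values. It copies queryNorm and fieldNorm from the output rather than recomputing them, because neither the full query nor the field length is shown.

```python
import math

def classic_idf(doc_freq: int, max_docs: int) -> float:
    # Classic Lucene TF-IDF idf: 1 + ln(maxDocs / (docFreq + 1))
    return 1.0 + math.log(max_docs / (doc_freq + 1))

def term_score(freq, doc_freq, max_docs, query_norm, field_norm):
    idf = classic_idf(doc_freq, max_docs)
    query_weight = idf * query_norm                    # queryWeight = idf * queryNorm
    field_weight = math.sqrt(freq) * idf * field_norm  # fieldWeight = tf * idf * fieldNorm, tf = sqrt(freq)
    return query_weight * field_weight

MAX_DOCS = 44218
QUERY_NORM = 0.049439456  # copied from the explain output, not recomputed
FIELD_NORM = 0.0390625    # encoded length norm of doc 2861, also copied

web = term_score(8, 4597, MAX_DOCS, QUERY_NORM, FIELD_NORM)
search = term_score(14, 3718, MAX_DOCS, QUERY_NORM, FIELD_NORM)
# "engine" sits in a nested two-clause Boolean query, so it carries coord(1/2)
engine = term_score(2, 570, MAX_DOCS, QUERY_NORM, FIELD_NORM) * (1 / 2)

# Top-level coord(3/4): three of the four query clauses matched doc 2861
score = (web + search + engine) * (3 / 4)
print(f"web={web:.8f} search={search:.8f} engine={engine:.8f} total={score:.8f}")
```

The final digits may differ slightly from the printed values because Lucene computes these factors as 32-bit floats, while the sketch uses Python's double precision.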

Abstract: Today's conventional search engines rarely provide the content that is actually relevant to the user's search query, because the context and semantics of the user's request are not fully analyzed. This creates the need for semantic web search (SWS), an emerging area of web search that combines Natural Language Processing and Artificial Intelligence. The objective of this work is to design, develop and implement a semantic search engine, SIEU (Semantic Information Extraction in University Domain), confined to the university domain. SIEU uses an ontology as the knowledge base for the information retrieval process. It is not a mere keyword search: it operates one layer above what Google or other search engines retrieve by analyzing keywords alone. Here the query is analyzed both syntactically and semantically. The developed system retrieves web results that are more relevant to the user query through keyword expansion. The results obtained will be accurate enough to satisfy the user's request, and the level of accuracy is enhanced because the query is analyzed semantically. The system will be of great use to developers and researchers who work on the web. The Google results are re-ranked and optimized to provide the relevant links; for ranking, an algorithm is applied that fetches more apt results for the user query.
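
The abstract names two steps but does not give their details: ontology-based keyword expansion, and re-ranking of the engine's results. The sketch below illustrates both under stated assumptions. A toy ontology (`ONTOLOGY`) maps a concept to related terms, and a simple overlap count serves as the ranking score. The names, the ontology and the scoring rule are hypothetical illustrations, not SIEU's actual algorithm.

```python
# Hypothetical toy ontology: concept -> related terms (synonyms, subclasses)
ONTOLOGY = {
    "professor": ["faculty", "lecturer"],
    "course": ["subject", "syllabus"],
}

def expand_keywords(query: str) -> list[str]:
    """Return the query terms followed by their ontology neighbours, without duplicates."""
    terms = query.lower().split()
    expanded = list(terms)
    for term in terms:
        for related in ONTOLOGY.get(term, []):
            if related not in expanded:
                expanded.append(related)
    return expanded

def rerank(results: list[tuple[str, str]], expanded_terms: list[str]) -> list[tuple[str, str]]:
    """Re-order (title, snippet) results by how many expanded terms they mention."""
    def overlap(result):
        text = " ".join(result).lower()
        # Crude substring matching, so "course" also matches "courses"
        return sum(term in text for term in expanded_terms)
    # sorted() is stable even with reverse=True: equal scores keep the engine's order
    return sorted(results, key=overlap, reverse=True)
```

Keeping the engine's original order as the tie-break means the re-ranking only moves results for which the expanded terms provide evidence.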

Gencosman, B.C.; Ozmutlu, H.C.; Ozmutlu, S.: Character n-gram application for automatic new topic identification (2014) 0.12

0.11544123 = product of:
  0.15392165 = sum of:
    0.029088326 = weight(_text_:web in 2688) [ClassicSimilarity], result of:
      0.029088326 = score(doc=2688,freq=2.0), product of:
        0.16134618 = queryWeight, product of:
          3.2635105 = idf(docFreq=4597, maxDocs=44218)
          0.049439456 = queryNorm
        0.18028519 = fieldWeight in 2688, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          3.2635105 = idf(docFreq=4597, maxDocs=44218)
          0.0390625 = fieldNorm(doc=2688)
    0.057146307 = weight(_text_:search in 2688) [ClassicSimilarity], result of:
      0.057146307 = score(doc=2688,freq=6.0), product of:
        0.17183559 = queryWeight, product of:
          3.475677 = idf(docFreq=3718, maxDocs=44218)
          0.049439456 = queryNorm
        0.33256388 = fieldWeight in 2688, product of:
          2.4494898 = tf(freq=6.0), with freq of:
            6.0 = termFreq=6.0
          3.475677 = idf(docFreq=3718, maxDocs=44218)
          0.0390625 = fieldNorm(doc=2688)
    0.06768702 = product of:
      0.13537404 = sum of:
        0.13537404 = weight(_text_:engine in 2688) [ClassicSimilarity], result of:
          0.13537404 = score(doc=2688,freq=6.0), product of:
            0.26447627 = queryWeight, product of:
              5.349498 = idf(docFreq=570, maxDocs=44218)
              0.049439456 = queryNorm
            0.51185703 = fieldWeight in 2688, product of:
              2.4494898 = tf(freq=6.0), with freq of:
                6.0 = termFreq=6.0
              5.349498 = idf(docFreq=570, maxDocs=44218)
              0.0390625 = fieldNorm(doc=2688)
      0.5 = coord(1/2)
  0.75 = coord(3/4)

Abstract: The widespread availability of the Internet and the variety of Internet-based applications have resulted in a significant increase in the number of web pages. Determining the behaviors of search engine users has become a critical step in enhancing search engine performance. Search engine user behaviors can be determined by content-based or content-ignorant algorithms. Although many content-ignorant studies have been performed to automatically identify new topics, previous results have demonstrated that spelling errors can cause significant errors in topic shift estimates. In this study, we focused on minimizing the number of wrong estimates caused by spelling errors. We developed a new hybrid algorithm combining character n-gram and neural network methodologies, and compared the experimental results with results from previous studies. For the FAST and Excite datasets, the proposed algorithm improved topic shift estimates by 6.987% and 2.639%, respectively. Moreover, we analyzed the performance of the character n-gram method from several angles, including a comparison with the Levenshtein edit-distance method. The experimental results demonstrated that the character n-gram method outperformed the Levenshtein edit-distance method in terms of topic identification.
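
The two word-similarity measures compared in the study can be sketched as follows. This is a minimal illustration that compares padded character trigrams with the Dice coefficient. The abstract does not give the study's n-gram length, similarity function or neural-network component, so these choices are assumptions.

```python
def char_ngrams(word: str, n: int = 3) -> set[str]:
    # Pad with '#' so the first and last letters also form their own n-grams
    padded = f"#{word}#"
    return {padded[i:i + n] for i in range(len(padded) - n + 1)}

def ngram_similarity(a: str, b: str, n: int = 3) -> float:
    """Dice coefficient over the two words' character n-gram sets (1.0 = identical sets)."""
    ga, gb = char_ngrams(a, n), char_ngrams(b, n)
    if not ga and not gb:
        return 1.0
    return 2 * len(ga & gb) / (len(ga) + len(gb))

def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions and substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,                # delete ca
                           cur[j - 1] + 1,             # insert cb
                           prev[j - 1] + (ca != cb)))  # substitute (free if equal)
        prev = cur
    return prev[-1]

print(levenshtein("recieve", "receive"), ngram_similarity("recieve", "receive"))
```

For the misspelling "recieve" vs. "receive", the edit distance is 2, because plain Levenshtein distance counts a transposition as two substitutions. Meanwhile, 3 of the 7 trigrams on each side still match. The abstract does not specify how such scores feed a topic-shift decision, for example through a similarity threshold between consecutive queries.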

Airio, E.: Who benefits from CLIR in web retrieval? (2008) 0.11

0.11020951 = product of:
  0.14694601 = sum of:
    0.060458954 = weight(_text_:web in 2342) [ClassicSimilarity], result of:
      0.060458954 = score(doc=2342,freq=6.0), product of:
        0.16134618 = queryWeight, product of:
          3.2635105 = idf(docFreq=4597, maxDocs=44218)
          0.049439456 = queryNorm
        0.37471575 = fieldWeight in 2342, product of:
          2.4494898 = tf(freq=6.0), with freq of:
            6.0 = termFreq=6.0
          3.2635105 = idf(docFreq=4597, maxDocs=44218)
          0.046875 = fieldNorm(doc=2342)
    0.03959212 = weight(_text_:search in 2342) [ClassicSimilarity], result of:
      0.03959212 = score(doc=2342,freq=2.0), product of:
        0.17183559 = queryWeight, product of:
          3.475677 = idf(docFreq=3718, maxDocs=44218)
          0.049439456 = queryNorm
        0.230407 = fieldWeight in 2342, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          3.475677 = idf(docFreq=3718, maxDocs=44218)
          0.046875 = fieldNorm(doc=2342)
    0.04689494 = product of:
      0.09378988 = sum of:
        0.09378988 = weight(_text_:engine in 2342) [ClassicSimilarity], result of:
          0.09378988 = score(doc=2342,freq=2.0), product of:
            0.26447627 = queryWeight, product of:
              5.349498 = idf(docFreq=570, maxDocs=44218)
              0.049439456 = queryNorm
            0.35462496 = fieldWeight in 2342, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              5.349498 = idf(docFreq=570, maxDocs=44218)
              0.046875 = fieldNorm(doc=2342)
      0.5 = coord(1/2)
  0.75 = coord(3/4)

Abstract: Purpose - The aim of the current paper is to test whether query translation is beneficial in web retrieval. Design/methodology/approach - The language pairs were Finnish-Swedish, English-German and Finnish-French. A total of 12-18 participants were recruited for each language pair. Each participant performed four retrieval tasks. The author's aim was to compare the performance of the translated queries with that of the target language queries. Thus, the author asked participants to formulate a source language query and a target language query for each task. The source language queries were translated into the target language using a dictionary-based system. For English-German, machine translation was also used. The author used Google as the search engine. Findings - The results differed depending on the language pair. The author concluded that dictionary coverage had an effect on the results. On average, query translation performed better than in traditional laboratory tests. Originality/value - This research shows that query translation on the web is beneficial, especially for users with moderate and non-active language skills. This is valuable information for developers of cross-language information retrieval systems.
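
Dictionary-based query translation can be illustrated with a word-by-word lookup. This is a minimal sketch with three assumptions: a hypothetical toy Finnish-Swedish dictionary (`DICTIONARY`), untranslatable words passed through unchanged, and alternative translations OR-ed together. The abstract does not describe the study's actual translation system or query syntax.

```python
# Hypothetical toy Finnish -> Swedish dictionary; a real system uses a large bilingual lexicon
DICTIONARY = {
    "kissa": ["katt"],          # cat
    "ruoka": ["mat", "föda"],   # food
}

def translate_query(query, dictionary):
    """Translate word by word; each source word maps to a list of alternatives."""
    translated = []
    for word in query.lower().split():
        # Words missing from the dictionary (often proper names) pass through unchanged
        translated.append(list(dictionary.get(word, [word])))
    return translated

def to_boolean(translated):
    """OR the alternatives for each source word, AND across source words."""
    return " AND ".join("(" + " OR ".join(alts) + ")" for alts in translated)

print(to_boolean(translate_query("Kissa ruoka", DICTIONARY)))  # (katt) AND (mat OR föda)
```

Passing unknown words through connects to dictionary coverage, which the author identified as a factor in the results. A proper name missing from the dictionary still reaches the search engine intact, but a missing common word stays untranslated.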

Hotho, A.; Bloehdorn, S.: Data Mining 2004 : Text classification by boosting weak learners based on terms and concepts (2004) 0.10

0.10014303 = product of:
  0.13352405 = sum of:
    0.078522965 = product of:
      0.23556888 = sum of:
        0.23556888 = weight(_text_:3a in 562) [ClassicSimilarity], result of:
          0.23556888 = score(doc=562,freq=2.0), product of:
            0.41914827 = queryWeight, product of:
              8.478011 = idf(docFreq=24, maxDocs=44218)
              0.049439456 = queryNorm
            0.56201804 = fieldWeight in 562, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              8.478011 = idf(docFreq=24, maxDocs=44218)
              0.046875 = fieldNorm(doc=562)
      0.33333334 = coord(1/3)
    0.03490599 = weight(_text_:web in 562) [ClassicSimilarity], result of:
      0.03490599 = score(doc=562,freq=2.0), product of:
        0.16134618 = queryWeight, product of:
          3.2635105 = idf(docFreq=4597, maxDocs=44218)
          0.049439456 = queryNorm
        0.21634221 = fieldWeight in 562, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          3.2635105 = idf(docFreq=4597, maxDocs=44218)
          0.046875 = fieldNorm(doc=562)
    0.02009509 = product of:
      0.04019018 = sum of:
        0.04019018 = weight(_text_:22 in 562) [ClassicSimilarity], result of:
          0.04019018 = score(doc=562,freq=2.0), product of:
            0.17312855 = queryWeight, product of:
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.049439456 = queryNorm
            0.23214069 = fieldWeight in 562, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.046875 = fieldNorm(doc=562)
      0.5 = coord(1/2)
  0.75 = coord(3/4)

Content: Cf.: http://www.google.de/url?sa=t&rct=j&q=&esrc=s&source=web&cd=1&cad=rja&ved=0CEAQFjAA&url=http%3A%2F%2Fciteseerx.ist.psu.edu%2Fviewdoc%2Fdownload%3Fdoi%3D10.1.1.91.4940%26rep%3Drep1%26type%3Dpdf&ei=dOXrUMeIDYHDtQahsIGACg&usg=AFQjCNHFWVh6gNPvnOrOS9R3rkrXCNVD-A&sig2=5I2F5evRfMnsttSgFF9g7Q&bvm=bv.1357316858,d.Yms.
Date: 8. 1.2013 10:22:32

Chandrasekar, R.; Bangalore, S.: Glean : using syntactic information in document filtering (2002) 0.09

0.093985304 = product of:
  0.12531374 = sum of:
    0.029088326 = weight(_text_:web in 4257) [ClassicSimilarity], result of:
      0.029088326 = score(doc=4257,freq=2.0), product of:
        0.16134618 = queryWeight, product of:
          3.2635105 = idf(docFreq=4597, maxDocs=44218)
          0.049439456 = queryNorm
        0.18028519 = fieldWeight in 4257, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          3.2635105 = idf(docFreq=4597, maxDocs=44218)
          0.0390625 = fieldNorm(doc=4257)
    0.057146307 = weight(_text_:search in 4257) [ClassicSimilarity], result of:
      0.057146307 = score(doc=4257,freq=6.0), product of:
        0.17183559 = queryWeight, product of:
          3.475677 = idf(docFreq=3718, maxDocs=44218)
          0.049439456 = queryNorm
        0.33256388 = fieldWeight in 4257, product of:
          2.4494898 = tf(freq=6.0), with freq of:
            6.0 = termFreq=6.0
          3.475677 = idf(docFreq=3718, maxDocs=44218)
          0.0390625 = fieldNorm(doc=4257)
    0.03907912 = product of:
      0.07815824 = sum of:
        0.07815824 = weight(_text_:engine in 4257) [ClassicSimilarity], result of:
          0.07815824 = score(doc=4257,freq=2.0), product of:
            0.26447627 = queryWeight, product of:
              5.349498 = idf(docFreq=570, maxDocs=44218)
              0.049439456 = queryNorm
            0.29552078 = fieldWeight in 4257, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              5.349498 = idf(docFreq=570, maxDocs=44218)
              0.0390625 = fieldNorm(doc=4257)
      0.5 = coord(1/2)
  0.75 = coord(3/4)

Abstract: In today's networked world, a huge amount of data is available in machine-processable form. Likewise, there are any number of search engines and specialized information retrieval (IR) programs that seek to extract relevant information from these data repositories. Most IR systems and Web search engines have been designed for speed and tend to maximize the quantity of information (recall) rather than the relevance of the information (precision) to the query. As a result, search engine users get inundated with information for practically any query, and are forced to scan a large number of potentially relevant items to get to the information of interest. The Holy Grail of IR is to somehow retrieve those and only those documents pertinent to the user's query. Polysemy and synonymy - the fact that often there are several meanings for a word or phrase, and likewise, many ways to express a concept - make this a very hard task. While conventional IR systems provide usable solutions, there are a number of open problems to be solved, in areas such as syntactic processing, semantic analysis, and user modeling, before we develop systems that "understand" user queries and text collections. Meanwhile, we can use tools and techniques available today to improve the precision of retrieval. In particular, using the approach described in this article, we can approximate understanding by using the syntactic structure and patterns of language use that are latent in documents to make IR more effective.

Doszkocs, T.E.; Zamora, A.: Dictionary services and spelling aids for Web searching (2004) 0.09

0.09054339 = product of:
  0.120724514 = sum of:
    0.050382458 = weight(_text_:web in 2541) [ClassicSimilarity], result of:
      0.050382458 = score(doc=2541,freq=6.0), product of:
        0.16134618 = queryWeight, product of:
          3.2635105 = idf(docFreq=4597, maxDocs=44218)
          0.049439456 = queryNorm
        0.3122631 = fieldWeight in 2541, product of:
          2.4494898 = tf(freq=6.0), with freq of:
            6.0 = termFreq=6.0
          3.2635105 = idf(docFreq=4597, maxDocs=44218)
          0.0390625 = fieldNorm(doc=2541)
    0.046659768 = weight(_text_:search in 2541) [ClassicSimilarity], result of:
      0.046659768 = score(doc=2541,freq=4.0), product of:
        0.17183559 = queryWeight, product of:
          3.475677 = idf(docFreq=3718, maxDocs=44218)
          0.049439456 = queryNorm
        0.27153727 = fieldWeight in 2541, product of:
          2.0 = tf(freq=4.0), with freq of:
            4.0 = termFreq=4.0
          3.475677 = idf(docFreq=3718, maxDocs=44218)
          0.0390625 = fieldNorm(doc=2541)
    0.02368229 = product of:
      0.04736458 = sum of:
        0.04736458 = weight(_text_:22 in 2541) [ClassicSimilarity], result of:
          0.04736458 = score(doc=2541,freq=4.0), product of:
            0.17312855 = queryWeight, product of:
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.049439456 = queryNorm
            0.27358043 = fieldWeight in 2541, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.0390625 = fieldNorm(doc=2541)
      0.5 = coord(1/2)
  0.75 = coord(3/4)

Abstract: The Specialized Information Services Division (SIS) of the National Library of Medicine (NLM) provides Web access to more than a dozen scientific databases on toxicology and the environment on TOXNET. Search queries on TOXNET often include misspelled or variant English words, medical and scientific jargon, and chemical names. Following the example of search engines like Google and ClinicalTrials.gov, we set out to develop a spelling "suggestion" system for increased recall and precision in TOXNET searching. This paper describes the development of dictionary technology that can be used in a variety of applications such as orthographic verification, writing aids, natural language processing, and information storage and retrieval. The design allows complex applications to be built in a modular fashion from components developed in earlier phases of the work, without extensive rewriting of computer code. Since many of the potential applications envisioned for this work have online or web-based interfaces, the dictionaries and other computer components must respond quickly and must be adaptable to open-ended database vocabularies, including chemical nomenclature. The dictionary vocabulary for this work was derived from SIS and other databases and specialized resources, such as NLM's Unified Medical Language System (UMLS). The resulting technology, A-Z Dictionary (AZdict), has three major constituents: 1) the vocabulary list, 2) the word attributes that define part of speech and morphological relationships between words in the list, and 3) a set of programs that implements the retrieval of words and their attributes and determines similarity between words (ChemSpell). These three components can be used in various applications such as spelling verification, spelling aids, part-of-speech tagging, paraphrasing, and many other natural language processing functions.
Date: 14. 8.2004 17:22:56
Source: Online. 28(2004) no.3, pp.22-29
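
Spelling suggestion, in the sense of matching a possibly misspelled query word against a controlled vocabulary, can be sketched with Python's standard `difflib` module. This is an illustration only. `VOCABULARY` is a hypothetical word list, and `difflib`'s ratio-based matching stands in for the AZdict/ChemSpell similarity methods, which are not described here.

```python
import difflib

# Hypothetical vocabulary; a real system would derive it from its database indexes
VOCABULARY = ["benzene", "toluene", "acetone", "arsenic", "mercury", "chloroform"]

def suggest(word, vocabulary=VOCABULARY, n=3, cutoff=0.6):
    """Return up to n vocabulary entries similar to `word`, best match first."""
    word = word.lower()
    if word in vocabulary:
        return [word]  # correctly spelled: nothing to suggest beyond the word itself
    # get_close_matches ranks candidates by SequenceMatcher.ratio() and drops those below cutoff
    return difflib.get_close_matches(word, vocabulary, n=n, cutoff=cutoff)

print(suggest("Benzine"))
```

Because `get_close_matches` discards candidates below `cutoff`, an unrecognizable string such as "xyzzy" yields an empty list rather than a poor suggestion.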

Griffith, C.: FREESTYLE: LEXIS-NEXIS goes natural (1994) 0.09

0.08573888 = product of:
  0.17147776 = sum of:
    0.093319535 = weight(_text_:search in 2512) [ClassicSimilarity], result of:
      0.093319535 = score(doc=2512,freq=4.0), product of:
        0.17183559 = queryWeight, product of:
          3.475677 = idf(docFreq=3718, maxDocs=44218)
          0.049439456 = queryNorm
        0.54307455 = fieldWeight in 2512, product of:
          2.0 = tf(freq=4.0), with freq of:
            4.0 = termFreq=4.0
          3.475677 = idf(docFreq=3718, maxDocs=44218)
          0.078125 = fieldNorm(doc=2512)
    0.07815824 = product of:
      0.15631647 = sum of:
        0.15631647 = weight(_text_:engine in 2512) [ClassicSimilarity], result of:
          0.15631647 = score(doc=2512,freq=2.0), product of:
            0.26447627 = queryWeight, product of:
              5.349498 = idf(docFreq=570, maxDocs=44218)
              0.049439456 = queryNorm
            0.59104156 = fieldWeight in 2512, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              5.349498 = idf(docFreq=570, maxDocs=44218)
              0.078125 = fieldNorm(doc=2512)
      0.5 = coord(1/2)
  0.5 = coord(2/4)

Abstract: Describes FREESTYLE, the associative language search engine, developed by Mead Data Central for its LEXIS/NEXIS online service. The special feature of the associative language in FREESTYLE allows users to enter search descriptions in plain English.

Symonds, M.; Bruza, P.; Zuccon, G.; Koopman, B.; Sitbon, L.; Turner, I.: Automatic query expansion : a structural linguistic perspective (2014) 0.08

0.07587066 = product of:
  0.101160884 = sum of:
    0.029088326 = weight(_text_:web in 1338) [ClassicSimilarity], result of:
      0.029088326 = score(doc=1338,freq=2.0), product of:
        0.16134618 = queryWeight, product of:
          3.2635105 = idf(docFreq=4597, maxDocs=44218)
          0.049439456 = queryNorm
        0.18028519 = fieldWeight in 1338, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          3.2635105 = idf(docFreq=4597, maxDocs=44218)
          0.0390625 = fieldNorm(doc=1338)
    0.032993436 = weight(_text_:search in 1338) [ClassicSimilarity], result of:
      0.032993436 = score(doc=1338,freq=2.0), product of:
        0.17183559 = queryWeight, product of:
          3.475677 = idf(docFreq=3718, maxDocs=44218)
          0.049439456 = queryNorm
        0.19200584 = fieldWeight in 1338, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          3.475677 = idf(docFreq=3718, maxDocs=44218)
          0.0390625 = fieldNorm(doc=1338)
    0.03907912 = product of:
      0.07815824 = sum of:
        0.07815824 = weight(_text_:engine in 1338) [ClassicSimilarity], result of:
          0.07815824 = score(doc=1338,freq=2.0), product of:
            0.26447627 = queryWeight, product of:
              5.349498 = idf(docFreq=570, maxDocs=44218)
              0.049439456 = queryNorm
            0.29552078 = fieldWeight in 1338, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              5.349498 = idf(docFreq=570, maxDocs=44218)
              0.0390625 = fieldNorm(doc=1338)
      0.5 = coord(1/2)
  0.75 = coord(3/4)

Abstract: A user's query is considered to be an imprecise description of their information need. Automatic query expansion is the process of reformulating the original query with the goal of improving retrieval effectiveness. Many successful query expansion techniques model syntagmatic associations, which infer that two terms co-occur more often than by chance in natural language. However, structural linguistics relies on both syntagmatic and paradigmatic associations to deduce the meaning of a word. Given the success of dependency-based approaches to query expansion and the reliance on word meanings in the query formulation process, we argue that modeling both syntagmatic and paradigmatic information in the query expansion process improves retrieval effectiveness. This article develops and evaluates a new query expansion technique that is based on a formal, corpus-based model of word meaning that models syntagmatic and paradigmatic associations. We demonstrate that when sufficient statistical information exists, as in the case of longer queries, including paradigmatic information alone provides significant improvements in retrieval effectiveness across a wide variety of data sets. More generally, when our new query expansion approach is applied to large-scale web retrieval, it demonstrates significant improvements in retrieval effectiveness over a strong baseline system based on a commercial search engine.
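
Simple co-occurrence statistics can make the distinction between syntagmatic and paradigmatic associations concrete. This is a minimal sketch that assumes window-based co-occurrence counts. A candidate term scores syntagmatically when it appears next to the query terms, and paradigmatically when its own context vector resembles theirs (cosine similarity). The mixing weight `gamma` and the toy corpus are hypothetical, and this is not the article's corpus-based model of word meaning.

```python
import math
from collections import Counter, defaultdict

def cooccurrence_vectors(sentences, window=2):
    """For each term, count the terms appearing within `window` positions of it."""
    vectors = defaultdict(Counter)
    for sentence in sentences:
        tokens = sentence.lower().split()
        for i, term in enumerate(tokens):
            for j in range(max(0, i - window), min(len(tokens), i + window + 1)):
                if j != i:
                    vectors[term][tokens[j]] += 1
    return dict(vectors)

def cosine(u, v):
    dot = sum(count * v[key] for key, count in u.items())  # Counter returns 0 for missing keys
    norm_u = math.sqrt(sum(c * c for c in u.values()))
    norm_v = math.sqrt(sum(c * c for c in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def expand_query(query_terms, vectors, gamma=0.5, k=3):
    """Rank candidate expansion terms by a gamma-weighted mix of both association types."""
    if not query_terms:
        return []
    query_vecs = [vectors.get(q, Counter()) for q in query_terms]
    context_total = sum(sum(vec.values()) for vec in query_vecs) or 1
    scores = {}
    for term, vec in vectors.items():
        if term in query_terms:
            continue
        # Syntagmatic: share of the query terms' neighbours that are `term`
        syntagmatic = sum(qv[term] for qv in query_vecs) / context_total
        # Paradigmatic: do `term` and the query terms occur in similar contexts?
        paradigmatic = sum(cosine(qv, vec) for qv in query_vecs) / len(query_vecs)
        scores[term] = (1 - gamma) * syntagmatic + gamma * paradigmatic
    return sorted(scores, key=scores.get, reverse=True)[:k]
```

Consider the corpus `["the cat drinks milk", "the dog drinks milk", "the cat sleeps", "the dog sleeps"]`. Here "dog" never co-occurs with "cat" but has an identical context vector, so a purely paradigmatic setting (`gamma=1.0`) proposes it first. A purely syntagmatic setting (`gamma=0.0`) instead favours the direct neighbour "the".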

Schneider, R.: Web 3.0 ante portas? : Integration von Social Web und Semantic Web [Web 3.0 at the gates? Integrating the Social Web and the Semantic Web] (2008) 0.07

0.06559447 = product of:
  0.13118894 = sum of:
    0.10774467 = weight(_text_:web in 4184) [ClassicSimilarity], result of:
      0.10774467 = score(doc=4184,freq=14.0), product of:
        0.16134618 = queryWeight, product of:
          3.2635105 = idf(docFreq=4597, maxDocs=44218)
          0.049439456 = queryNorm
        0.6677857 = fieldWeight in 4184, product of:
          3.7416575 = tf(freq=14.0), with freq of:
            14.0 = termFreq=14.0
          3.2635105 = idf(docFreq=4597, maxDocs=44218)
          0.0546875 = fieldNorm(doc=4184)
    0.023444273 = product of:
      0.046888545 = sum of:
        0.046888545 = weight(_text_:22 in 4184) [ClassicSimilarity], result of:
          0.046888545 = score(doc=4184,freq=2.0), product of:
            0.17312855 = queryWeight, product of:
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.049439456 = queryNorm
            0.2708308 = fieldWeight in 4184, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.0546875 = fieldNorm(doc=4184)
      0.5 = coord(1/2)
  0.5 = coord(2/4)

Abstract: The Internet as a medium is changing, and with it the conditions under which content is published and received. What opportunities do the Social Web and the Semantic Web, two visions of the future currently being discussed in parallel, offer? To answer this question, the article examines the foundations of both models in terms of application and technology, and also sheds light on their shortcomings and on the added value of combining them in a way suited to the medium. Using the grammatical online information system grammis as an example, it outlines a strategy for the integrated use of the respective strengths of both.
Date: 22. 1.2011 10:38:28
Source: Kommunikation, Partizipation und Wirkungen im Social Web, Vol. 1. Ed.: A. Zerfaß et al.
Theme: Semantic Web

Radev, D.; Fan, W.; Qu, H.; Wu, H.; Grewal, A.: Probabilistic question answering on the Web (2005) 0.06

0.06451727 = product of:
  0.12903453 = sum of:
    0.060458954 = weight(_text_:web in 3455) [ClassicSimilarity], result of:
      0.060458954 = score(doc=3455,freq=6.0), product of:
        0.16134618 = queryWeight, product of:
          3.2635105 = idf(docFreq=4597, maxDocs=44218)
          0.049439456 = queryNorm
        0.37471575 = fieldWeight in 3455, product of:
          2.4494898 = tf(freq=6.0), with freq of:
            6.0 = termFreq=6.0
          3.2635105 = idf(docFreq=4597, maxDocs=44218)
          0.046875 = fieldNorm(doc=3455)
    0.068575576 = weight(_text_:search in 3455) [ClassicSimilarity], result of:
      0.068575576 = score(doc=3455,freq=6.0), product of:
        0.17183559 = queryWeight, product of:
          3.475677 = idf(docFreq=3718, maxDocs=44218)
          0.049439456 = queryNorm
        0.39907667 = fieldWeight in 3455, product of:
          2.4494898 = tf(freq=6.0), with freq of:
            6.0 = termFreq=6.0
          3.475677 = idf(docFreq=3718, maxDocs=44218)
          0.046875 = fieldNorm(doc=3455)
  0.5 = coord(2/4)

Abstract: Web-based search engines such as Google and NorthernLight return documents that are relevant to a user query, not answers to user questions. We have developed an architecture that augments existing search engines so that they support natural language question answering. The process entails five steps: query modulation, document retrieval, passage extraction, phrase extraction, and answer ranking. In this article, we describe some probabilistic approaches to the last three of these stages. We show how our techniques apply to a number of existing search engines, and we also present results contrasting three different methods for question answering. Our algorithm, probabilistic phrase reranking (PPR), uses proximity and question type features and achieves a total reciprocal document rank of .20 on the TREC8 corpus. Our techniques have been implemented as a Web-accessible system, called NSIR.
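
Total reciprocal document rank, the measure reported above, can be sketched under the usual definition. For each question, it sums 1/rank over all returned documents that contain a correct answer. A collection-level figure is assumed here to be the average of these sums over questions. Unlike mean reciprocal rank, which counts only the first correct document, every correct document adds to the total. The function names and the `contains_answer` predicate are illustrative.

```python
from typing import Callable, Sequence

def trdr(ranked_docs: Sequence[str], contains_answer: Callable[[str], bool]) -> float:
    """Sum 1/rank over every returned document that contains a correct answer (ranks start at 1)."""
    return sum(1.0 / rank
               for rank, doc in enumerate(ranked_docs, start=1)
               if contains_answer(doc))

def mean_trdr(runs) -> float:
    """Average TRDR over (ranked_docs, contains_answer) pairs, one pair per question."""
    return sum(trdr(docs, check) for docs, check in runs) / len(runs)
```

For instance, correct answers in the documents at ranks 2 and 8 give 1/2 + 1/8 = 0.625 for that question.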

Kajanan, S.; Bao, Y.; Datta, A.; VanderMeer, D.; Dutta, K.: Efficient automatic search query formulation using phrase-level analysis (2014) 0.06

0.0613487 = product of:
  0.1226974 = sum of:
    0.0914341 = weight(_text_:search in 1264) [ClassicSimilarity], result of:
      0.0914341 = score(doc=1264,freq=24.0), product of:
        0.17183559 = queryWeight, product of:
          3.475677 = idf(docFreq=3718, maxDocs=44218)
          0.049439456 = queryNorm
        0.5321022 = fieldWeight in 1264, product of:
          4.8989797 = tf(freq=24.0), with freq of:
            24.0 = termFreq=24.0
          3.475677 = idf(docFreq=3718, maxDocs=44218)
          0.03125 = fieldNorm(doc=1264)
    0.031263296 = product of:
      0.06252659 = sum of:
        0.06252659 = weight(_text_:engine in 1264) [ClassicSimilarity], result of:
          0.06252659 = score(doc=1264,freq=2.0), product of:
            0.26447627 = queryWeight, product of:
              5.349498 = idf(docFreq=570, maxDocs=44218)
              0.049439456 = queryNorm
            0.23641664 = fieldWeight in 1264, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              5.349498 = idf(docFreq=570, maxDocs=44218)
              0.03125 = fieldNorm(doc=1264)
      0.5 = coord(1/2)
  0.5 = coord(2/4)

Abstract: Over the past decade, the volume of information available digitally over the Internet has grown enormously. Technical developments in the area of search, such as Google's PageRank algorithm, have proved so good at serving relevant results that Internet search has become integrated into daily human activity. One can endlessly explore topics of interest simply by querying and reading through the resulting links. Yet, although search engines are well known for providing relevant results based on users' queries, users do not always receive the results they are looking for. Google's Director of Research describes clickstream evidence of frustrated users repeatedly reformulating queries and searching through page after page of results. Given the general quality of search engine results, one must consider the possibility that the frustrated user's query is not effective; that is, it does not describe the essence of the user's interest. Indeed, extensive research into human search behavior has found that humans are not very effective at formulating good search queries that describe what they are interested in. Ideally, the user should simply point to a portion of text that sparked the user's interest, and a system should automatically formulate a search query that captures the essence of the text. In this paper, we describe an implemented system that provides this capability. We first describe how our work differs from existing work in automatic query formulation, and propose a new method for improved quantification of the relevance of candidate search terms drawn from input text using phrase-level analysis. We then propose an implementable method designed to provide relevant queries based on a user's text input. We demonstrate the quality of our results and the performance of our system through experimental studies. Our results demonstrate that our system produces relevant search terms with roughly two-thirds precision and recall compared to search terms selected by experts, and that typical users find significantly more relevant results (31% more relevant) more quickly (64% faster) using our system than with self-formulated search queries. Further, we show that our implementation can scale to request loads of up to 10 requests per second within current online responsiveness expectations (<2-second response times at the highest loads tested).
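
Comparing system-selected search terms with expert-selected ones amounts to set-based precision and recall. This is a minimal sketch that assumes exact, case-insensitive term matching. The abstract does not state the paper's matching criteria, and the example terms are hypothetical.

```python
def term_precision_recall(system_terms, expert_terms):
    """Precision and recall of system-selected terms against an expert-selected set."""
    system = {t.lower() for t in system_terms}
    expert = {t.lower() for t in expert_terms}
    hits = len(system & expert)                       # terms both sides selected
    precision = hits / len(system) if system else 0.0  # share of system terms the experts chose
    recall = hits / len(expert) if expert else 0.0     # share of expert terms the system found
    return precision, recall

print(term_precision_recall(["solar", "panel", "efficiency"], ["Solar", "efficiency", "photovoltaic"]))
```

Here two of three terms agree on each side, giving precision and recall of 2/3 each, the level the abstract describes as roughly two-thirds.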

Wang, F.L.; Yang, C.C.: Mining Web data for Chinese segmentation (2007) 0.06

0.061094895 = product of:
  0.12218979 = sum of:
    0.06504348 = weight(_text_:web in 604) [ClassicSimilarity], result of:
      0.06504348 = score(doc=604,freq=10.0), product of:
        0.16134618 = queryWeight, product of:
          3.2635105 = idf(docFreq=4597, maxDocs=44218)
          0.049439456 = queryNorm
        0.40312994 = fieldWeight in 604, product of:
          3.1622777 = tf(freq=10.0), with freq of:
            10.0 = termFreq=10.0
          3.2635105 = idf(docFreq=4597, maxDocs=44218)
          0.0390625 = fieldNorm(doc=604)
    0.057146307 = weight(_text_:search in 604) [ClassicSimilarity], result of:
      0.057146307 = score(doc=604,freq=6.0), product of:
        0.17183559 = queryWeight, product of:
          3.475677 = idf(docFreq=3718, maxDocs=44218)
          0.049439456 = queryNorm
        0.33256388 = fieldWeight in 604, product of:
          2.4494898 = tf(freq=6.0), with freq of:
            6.0 = termFreq=6.0
          3.475677 = idf(docFreq=3718, maxDocs=44218)
          0.0390625 = fieldNorm(doc=604)
  0.5 = coord(2/4)
```
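The explain tree above can be recomputed by hand. A minimal sketch, assuming the formulas of Lucene's ClassicSimilarity as they appear in this listing: tf = sqrt(freq), idf = 1 + ln(maxDocs / (docFreq + 1)) (this older form reproduces the idf values shown), queryWeight = idf * queryNorm, fieldWeight = tf * idf * fieldNorm, and a final coord factor. queryNorm and fieldNorm are copied from the output rather than recomputed:

```python
import math


def classic_idf(doc_freq, max_docs):
    # idf = 1 + ln(maxDocs / (docFreq + 1)), matching the idf values in the explain output
    return 1.0 + math.log(max_docs / (doc_freq + 1))


def term_score(freq, doc_freq, max_docs, query_norm, field_norm):
    idf = classic_idf(doc_freq, max_docs)
    tf = math.sqrt(freq)                  # tf = sqrt(termFreq)
    query_weight = idf * query_norm       # queryWeight = idf * queryNorm
    field_weight = tf * idf * field_norm  # fieldWeight = tf * idf * fieldNorm
    return query_weight * field_weight


MAX_DOCS = 44218
QUERY_NORM = 0.049439456  # taken from the explain output, not recomputed

# Document 604: "web" (freq 10) and "search" (freq 6) match 2 of the 4 query clauses.
web = term_score(10, 4597, MAX_DOCS, QUERY_NORM, 0.0390625)
search = term_score(6, 3718, MAX_DOCS, QUERY_NORM, 0.0390625)
coord = 2 / 4
total = (web + search) * coord
print(round(web, 6), round(search, 6), round(total, 6))
```

The printed values should agree with the listed 0.06504348, 0.057146307 and 0.061094895 up to float rounding.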
Abstract

Modern information retrieval systems use keywords within documents as indexing terms for searching relevant documents. As Chinese is an ideographic character-based language, words in Chinese texts are not delimited by white space, so Chinese documents cannot be indexed without a proper segmentation algorithm. Many Chinese segmentation algorithms have been proposed in the past, but traditional segmentation algorithms cannot operate without a large dictionary or a large corpus of training data. The Web has now become the largest available corpus, which makes it ideal for Chinese segmentation. Although most search engines have problems segmenting texts into proper words, they maintain huge databases of documents and of the frequencies of character sequences in those documents. These databases are important potential resources for segmentation. In this paper, we propose a segmentation algorithm that mines Web data with the help of search engines. In addition, the Romanized pinyin of Chinese indicates word boundaries in the text. Our algorithm is the first to use Romanized pinyin for segmentation. It is also the first unified segmentation algorithm for Chinese from different geographical areas, and it is domain independent because of the nature of the Web. Experiments have been conducted on the datasets of a recent Chinese segmentation competition. The results show that our algorithm outperforms the traditional algorithms in terms of precision and recall. Moreover, our algorithm can effectively deal with the problems of segmentation ambiguity, new word (unknown word) detection, and stop words.
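To illustrate how character-sequence frequencies can drive segmentation, here is a sketch that swaps in a standard technique, maximum-probability segmentation by dynamic programming. It is not the authors' algorithm and ignores their use of pinyin; the count table `HIT_COUNTS` is hypothetical and only stands in for search-engine hit counts:

```python
import math

# Hypothetical character-sequence counts standing in for search-engine hit counts.
HIT_COUNTS = {
    "北京": 100, "大学": 120, "北京大学": 80, "学生": 90, "大学生": 50,
    "北": 5, "京": 3, "大": 10, "学": 8, "生": 6,
}
TOTAL = sum(HIT_COUNTS.values())
MAX_WORD_LEN = 4


def word_logprob(word):
    """Log-probability of a candidate word, or None if it may not be used."""
    count = HIT_COUNTS.get(word)
    if count is not None:
        return math.log(count / TOTAL)
    # Unseen single characters get a small smoothed count; unseen longer strings are rejected.
    return math.log(0.5 / TOTAL) if len(word) == 1 else None


def segment(text):
    """Maximum-probability segmentation by dynamic programming over end positions."""
    n = len(text)
    best = [float("-inf")] * (n + 1)  # best[i]: best log-probability of segmenting text[:i]
    back = [0] * (n + 1)              # back[i]: start index of the last word in that segmentation
    best[0] = 0.0
    for end in range(1, n + 1):
        for start in range(max(0, end - MAX_WORD_LEN), end):
            lp = word_logprob(text[start:end])
            if lp is None:
                continue
            if best[start] + lp > best[end]:
                best[end] = best[start] + lp
                back[end] = start
    words, end = [], n
    while end > 0:
        words.append(text[back[end]:end])
        end = back[end]
    return words[::-1]


print(segment("北京大学生"))  # ['北京', '大学生']
```

With these counts the ambiguous string 北京大学生 resolves to 北京 / 大学生 rather than 北京大学 / 生, because the product of the two word probabilities is larger.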

Footnote

Contribution to the special topic issue "Mining Web resources for enhancing information retrieval"

Navarretta, C.; Pedersen, B.S.; Hansen, D.H.: Language technology in knowledge-organization systems (2006) 0.06

```
0.057735257 = product of:
  0.11547051 = sum of:
    0.068575576 = weight(_text_:search in 5706) [ClassicSimilarity], result of:
      0.068575576 = score(doc=5706,freq=6.0), product of:
        0.17183559 = queryWeight, product of:
          3.475677 = idf(docFreq=3718, maxDocs=44218)
          0.049439456 = queryNorm
        0.39907667 = fieldWeight in 5706, product of:
          2.4494898 = tf(freq=6.0), with freq of:
            6.0 = termFreq=6.0
          3.475677 = idf(docFreq=3718, maxDocs=44218)
          0.046875 = fieldNorm(doc=5706)
    0.04689494 = product of:
      0.09378988 = sum of:
        0.09378988 = weight(_text_:engine in 5706) [ClassicSimilarity], result of:
          0.09378988 = score(doc=5706,freq=2.0), product of:
            0.26447627 = queryWeight, product of:
              5.349498 = idf(docFreq=570, maxDocs=44218)
              0.049439456 = queryNorm
            0.35462496 = fieldWeight in 5706, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              5.349498 = idf(docFreq=570, maxDocs=44218)
              0.046875 = fieldNorm(doc=5706)
      0.5 = coord(1/2)
  0.5 = coord(2/4)
```

Abstract: This paper describes the language technology methods developed in the Danish research project VID to extract relevant information from Danish text material for populating knowledge organization systems (KOS) within specific corporate domains. The results achieved by applying these methods to a prototype search engine tuned to the patent and trademark domain indicate that human language technology can support the construction of a linguistically based KOS, and that using linguistic information in search improves recall substantially without harming precision (near 90%). Finally, we describe two research experiments in which (1) linguistic analysis of Danish compounds is exploited to improve search strategies for such compounds, and (2) linguistic knowledge is used to model corporate knowledge as a language-based ontology.
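The first experiment concerns Danish compounds. As an illustration only, the sketch below shows a generic lexicon-based compound splitter that allows the common Danish linking elements -s- and -e-; it is not the VID project's method, and the mini-lexicon `LEXICON` is hypothetical:

```python
# Hypothetical mini-lexicon of Danish words; a real system would use a full lexicon.
LEXICON = {"patent", "ansøgning", "vare", "mærke", "varemærke", "registrering", "kvalitet"}
LINKING_MORPHEMES = ("", "s", "e")  # compounds may insert -s- or -e- between parts


def split_compound(word, min_len=3):
    """Split a compound into lexicon words, preferring the longest head; None if no split is found."""
    word = word.lower()
    if word in LEXICON:
        return [word]
    for i in range(len(word) - min_len, min_len - 1, -1):
        head, rest = word[:i], word[i:]
        if head not in LEXICON:
            continue
        for link in LINKING_MORPHEMES:
            if not rest.startswith(link):
                continue
            tail = split_compound(rest[len(link):], min_len)
            if tail:
                return [head] + tail
    return None


print(split_compound("varemærkeregistrering"))  # ['varemærke', 'registrering']
print(split_compound("kvalitetsmærke"))         # ['kvalitet', 'mærke']
```

In a search setting, a query for a compound could then also be matched against its parts, which is one way linguistic analysis can raise recall.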

Rötzer, F.: Computer ergooglen die Bedeutung von Worten [Computers "google" the meaning of words] (2005) 0.06
```
0.055104755 = product of:
  0.07347301 = sum of:
    0.030229477 = weight(_text_:web in 3385) [ClassicSimilarity], result of:
      0.030229477 = score(doc=3385,freq=6.0), product of:
        0.16134618 = queryWeight, product of:
          3.2635105 = idf(docFreq=4597, maxDocs=44218)
          0.049439456 = queryNorm
        0.18735787 = fieldWeight in 3385, product of:
          2.4494898 = tf(freq=6.0), with freq of:
            6.0 = termFreq=6.0
          3.2635105 = idf(docFreq=4597, maxDocs=44218)
          0.0234375 = fieldNorm(doc=3385)
    0.01979606 = weight(_text_:search in 3385) [ClassicSimilarity], result of:
      0.01979606 = score(doc=3385,freq=2.0), product of:
        0.17183559 = queryWeight, product of:
          3.475677 = idf(docFreq=3718, maxDocs=44218)
          0.049439456 = queryNorm
        0.1152035 = fieldWeight in 3385, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          3.475677 = idf(docFreq=3718, maxDocs=44218)
          0.0234375 = fieldNorm(doc=3385)
    0.02344747 = product of:
      0.04689494 = sum of:
        0.04689494 = weight(_text_:engine in 3385) [ClassicSimilarity], result of:
          0.04689494 = score(doc=3385,freq=2.0), product of:
            0.26447627 = queryWeight, product of:
              5.349498 = idf(docFreq=570, maxDocs=44218)
              0.049439456 = queryNorm
            0.17731248 = fieldWeight in 3385, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              5.349498 = idf(docFreq=570, maxDocs=44218)
              0.0234375 = fieldNorm(doc=3385)
      0.5 = coord(1/2)
  0.75 = coord(3/4)
```
Content

"How could computers learn language and, in doing so, also understand the meaning of words and the relationships between them? This problem of semantics is an enormous task that has so far been solved only in rudimentary form, since words and word combinations often have several or even many meanings, which moreover depend on the extralinguistic context. The two Dutch researchers Paul Vitanyi (2) and Rudi Cilibrasi of the National Institute for Mathematics and Computer Science (3) in Amsterdam (see also "Ein künstliches Bewusstsein aus einfachen Aussagen" [An artificial consciousness from simple statements] (1)) propose an elegant solution: simply use Google to look things up on the Internet, the largest database there is. Objects such as a mouse can be designated by their name, "mouse", but the meaning of general concepts must be learned from their context. A semantic web for representing knowledge consists of the possible connections that objects and their names can enter into. Of course, new names, and also new meanings and therefore new connections, can arise in reality. Language is alive and flexible. To teach an artificial intelligence all word meanings, a huge database of the possible semantic networks would have to be built with the help of human experts or many staff members, and it would also have to be updated constantly. But that may not be necessary at all, because the Web is not only the largest semantic database, and largely free to use, it is also constantly updated by countless Internet users. In addition, there are search engines such as Google which, in practice, quantitatively measure the connections between words, and thus their context of meaning, as a probability by reporting the web pages on which they were found.
Using a method previously developed by Paul Vitanyi and others that measures the relatedness of objects (normalized information distance, NID), the proximity between particular objects (images, words, patterns, intervals, genomes, programs, etc.) can be analyzed on the basis of all their properties and determined by their dominant shared property. Similarly, the commonly used, not necessarily "true" meanings of names can be inferred with Google searches. 'At this moment one database stands out as the pinnacle of computer-accessible human knowledge and the most inclusive summary of statistical information: the Google search engine. There can be no doubt that Google has already enabled science to accelerate tremendously and revolutionized the research process. It has dominated the attention of internet users for years, and has recently attracted substantial attention of many Wall Street investors, even reshaping their ideas of company financing.' (Paul Vitanyi and Rudi Cilibrasi) If one enters a word such as "Pferd" (horse), Google returns 4,310,000 indexed pages. For "Reiter" (rider) there are 3,400,000 pages. Combining both terms still yields 315,000 pages. For the co-occurrence of, for example, "Pferd" and "Bart" (beard), a surprising 67,100 pages are still listed, but one can already see that "Pferd" and "Reiter" are more closely related. This yields a certain probability for the co-occurrence of terms. From this frequency, compared with the maximum number of indexed pages (5,000,000,000), the two scientists developed a statistical measure they call the "normalised Google distance" (NGD), which normally lies between 0 and 1. The lower the NGD, the more closely two terms are related. "This is automatic meaning generation," Vitanyi told New Scientist (4). 
"That could well be one way to make a computer understand things and act semi-intelligently." If such searches are carried out again and again, a map of the connections between words can be built. And from this map, so the hope goes, a computer can in turn grasp the meaning of individual words in different natural languages and contexts. Through a number of searches, the researchers report, a computer was able to distinguish between colors and numbers, to tell apart 17th-century Dutch painters, and emergencies from near-emergencies, and to understand electrical or religious terms. In addition, a simple automatic English-Spanish translation was achieved. In this way, the scientists hope, the meaning of words could be learned, speech recognition improved, or a semantic web built, and, of course, better automatic translation from one language into another finally realized.
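The NGD described above can be computed directly from hit counts. A minimal sketch, assuming the definition from Cilibrasi and Vitanyi's paper, NGD(x, y) = (max(log f(x), log f(y)) - log f(x, y)) / (log N - min(log f(x), log f(y))), with N taken as the 5,000,000,000 indexed pages quoted in the article (the logarithm base cancels out):

```python
import math


def normalized_google_distance(f_x, f_y, f_xy, n):
    """NGD from page counts f(x), f(y), f(x, y) and index size N."""
    log_fx, log_fy, log_fxy = math.log(f_x), math.log(f_y), math.log(f_xy)
    numerator = max(log_fx, log_fy) - log_fxy
    denominator = math.log(n) - min(log_fx, log_fy)
    return numerator / denominator


# Page counts for "Pferd", "Reiter" and both together, as reported in the article.
ngd = normalized_google_distance(4_310_000, 3_400_000, 315_000, 5_000_000_000)
print(f"NGD(Pferd, Reiter) = {ngd:.2f}")  # NGD(Pferd, Reiter) = 0.36
```

The article gives no page count for "Bart" on its own, so the NGD for "Pferd" and "Bart" cannot be computed from the figures quoted.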

Sünkler, S.; Kerkmann, F.; Schultheiß, S.: Ok Google . the end of search as we know it : sprachgesteuerte Websuche im Test [voice-controlled web search put to the test] (2018) 0.05

```
0.053023666 = product of:
  0.10604733 = sum of:
    0.04072366 = weight(_text_:web in 5626) [ClassicSimilarity], result of:
      0.04072366 = score(doc=5626,freq=2.0), product of:
        0.16134618 = queryWeight, product of:
          3.2635105 = idf(docFreq=4597, maxDocs=44218)
          0.049439456 = queryNorm
        0.25239927 = fieldWeight in 5626, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          3.2635105 = idf(docFreq=4597, maxDocs=44218)
          0.0546875 = fieldNorm(doc=5626)
    0.06532367 = weight(_text_:search in 5626) [ClassicSimilarity], result of:
      0.06532367 = score(doc=5626,freq=4.0), product of:
        0.17183559 = queryWeight, product of:
          3.475677 = idf(docFreq=3718, maxDocs=44218)
          0.049439456 = queryNorm
        0.38015217 = fieldWeight in 5626, product of:
          2.0 = tf(freq=4.0), with freq of:
            4.0 = termFreq=4.0
          3.475677 = idf(docFreq=3718, maxDocs=44218)
          0.0546875 = fieldNorm(doc=5626)
  0.5 = coord(2/4)
```

Abstract: Voice control systems that assist users on spoken command are becoming increasingly popular with the spread of smartphones and speaker systems such as Amazon Echo or Google Home. One of their central applications is searching with web search engines. But how does "googling" work when users speak their query instead of typing it? A project team at HAW Hamburg investigated this question and, commissioned by Deutsche Telekom, examined how effectively, efficiently and satisfactorily Google Now, Apple Siri, Microsoft Cortana and Amazon Fire OS perform. The study identified the strengths and weaknesses of the systems as well as success factors for high usability. These findings resulted in a prototype of an optimal Voice Web Search.

Rozinajová, V.; Macko, P.: Using natural language to search linked data (2017) 0.05
```
0.05241821 = product of:
  0.10483642 = sum of:
    0.05817665 = weight(_text_:web in 3488) [ClassicSimilarity], result of:
      0.05817665 = score(doc=3488,freq=8.0), product of:
        0.16134618 = queryWeight, product of:
          3.2635105 = idf(docFreq=4597, maxDocs=44218)
          0.049439456 = queryNorm
        0.36057037 = fieldWeight in 3488, product of:
          2.828427 = tf(freq=8.0), with freq of:
            8.0 = termFreq=8.0
          3.2635105 = idf(docFreq=4597, maxDocs=44218)
          0.0390625 = fieldNorm(doc=3488)
    0.046659768 = weight(_text_:search in 3488) [ClassicSimilarity], result of:
      0.046659768 = score(doc=3488,freq=4.0), product of:
        0.17183559 = queryWeight, product of:
          3.475677 = idf(docFreq=3718, maxDocs=44218)
          0.049439456 = queryNorm
        0.27153727 = fieldWeight in 3488, product of:
          2.0 = tf(freq=4.0), with freq of:
            4.0 = termFreq=4.0
          3.475677 = idf(docFreq=3718, maxDocs=44218)
          0.0390625 = fieldNorm(doc=3488)
  0.5 = coord(2/4)
```
Abstract

There are many endeavors aiming to offer users more effective ways of getting relevant information from the Web. One of them is the concept of Linked Data, which provides interconnected data sources. But querying this type of data is difficult not only for conventional web users but also for experts in the field. Therefore, a more comfortable way of querying would be of great value. One direction is to allow users to query in natural language. To make this task easier, we have proposed a method for translating natural language queries into SPARQL queries. It is based on sentence structure, utilizing the dependencies between the words in user queries. The dependencies are used to map the query onto the semantic web structure, which is then translated into a SPARQL query. According to our first experiments, we are able to answer a significant group of user queries.
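As a rough illustration of mapping word dependencies to a SPARQL query, the sketch below turns hand-written dependency edges for "Who wrote Hamlet?" into a one-triple query. It is not the authors' method: the edges stand in for parser output, and the lexicons and the `ex:` namespace are hypothetical:

```python
# Hypothetical lexicons mapping words to ontology terms in an example namespace.
PROPERTY_LEXICON = {
    # verb -> (property, True if the grammatical object becomes the triple subject)
    "wrote": ("ex:author", True),
}
ENTITY_LEXICON = {"Hamlet": "ex:Hamlet"}
WH_WORDS = {"who", "what", "which"}


def to_sparql(dependencies):
    """Build a one-triple SPARQL query from (head, relation, dependent) dependency edges."""
    verb = subj = obj = None
    for head, relation, dependent in dependencies:
        if relation == "nsubj":
            verb, subj = head, dependent
        elif relation in ("obj", "dobj"):
            verb, obj = head, dependent

    def term(word):
        # Question words become the query variable; other words are looked up as entities.
        return "?x" if word.lower() in WH_WORDS else ENTITY_LEXICON[word]

    prop, object_is_subject = PROPERTY_LEXICON[verb]
    s, o = term(subj), term(obj)
    if object_is_subject:
        s, o = o, s
    return f"PREFIX ex: <http://example.org/>\nSELECT ?x WHERE {{ {s} {prop} {o} . }}"


# Hand-written dependency edges for "Who wrote Hamlet?" (a parser would normally produce these).
query = to_sparql([("wrote", "nsubj", "Who"), ("wrote", "obj", "Hamlet")])
print(query)
```

The result is `SELECT ?x WHERE { ex:Hamlet ex:author ?x . }` under the example prefix; real questions need many more dependency patterns than this single subject-verb-object case.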

Series

Information Systems and Applications, incl. Internet/Web, and HCI; 10151

Source

Semantic keyword-based search on structured data sources: COST Action IC1302. Second International KEYSTONE Conference, IKC 2016, Cluj-Napoca, Romania, September 8-9, 2016, Revised Selected Papers. Eds.: A. Calì et al.

Stoykova, V.; Petkova, E.: Automatic extraction of mathematical terms for precalculus (2012) 0.05

```
0.05045079 = product of:
  0.10090158 = sum of:
    0.046190813 = weight(_text_:search in 156) [ClassicSimilarity], result of:
      0.046190813 = score(doc=156,freq=2.0), product of:
        0.17183559 = queryWeight, product of:
          3.475677 = idf(docFreq=3718, maxDocs=44218)
          0.049439456 = queryNorm
        0.2688082 = fieldWeight in 156, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          3.475677 = idf(docFreq=3718, maxDocs=44218)
          0.0546875 = fieldNorm(doc=156)
    0.05471077 = product of:
      0.10942154 = sum of:
        0.10942154 = weight(_text_:engine in 156) [ClassicSimilarity], result of:
          0.10942154 = score(doc=156,freq=2.0), product of:
            0.26447627 = queryWeight, product of:
              5.349498 = idf(docFreq=570, maxDocs=44218)
              0.049439456 = queryNorm
            0.41372913 = fieldWeight in 156, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              5.349498 = idf(docFreq=570, maxDocs=44218)
              0.0546875 = fieldNorm(doc=156)
      0.5 = coord(1/2)
  0.5 = coord(2/4)
```

Abstract: In this work, we present the results of research evaluating a methodology for extracting mathematical terms for precalculus using techniques for semantically oriented statistical search. We use a corpus-based approach and a combination of different statistically based techniques for extracting keywords, collocations and co-occurrences, as incorporated in the Sketch Engine software. We evaluate the collocation candidate terms for the basic concept function(s) and verify the methodology against definitions of precalculus domain concepts. Finally, we offer a hierarchical representation of the conceptual terms and discuss the results with respect to their possible applications.
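Collocation candidates are typically ranked by an association score. The abstract does not say which score was used; as one example, the sketch below uses logDice, a score offered by Sketch Engine, defined as 14 + log2(2 f(x,y) / (f(x) + f(y))). All counts are hypothetical:

```python
import math


def log_dice(f_xy, f_x, f_y):
    """logDice = 14 + log2(2 * f_xy / (f_x + f_y)); 14 is the maximum, reached when f_xy = f_x = f_y."""
    return 14 + math.log2(2 * f_xy / (f_x + f_y))


# Hypothetical counts: frequency of the node word "function", and for each candidate
# collocate its co-occurrence count with "function" and its own frequency.
F_FUNCTION = 500
candidates = {"inverse": (40, 60), "quadratic": (30, 45), "the": (200, 20000)}

scores = {w: log_dice(f_xy, F_FUNCTION, f_w) for w, (f_xy, f_w) in candidates.items()}
ranked = sorted(scores, key=scores.get, reverse=True)
print(ranked)  # ['inverse', 'quadratic', 'the']
```

Although "the" co-occurs with "function" most often, its high overall frequency pushes its score down, which is the point of normalizing by f(x) + f(y).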

Thelwall, M.; Price, L.: Language evolution and the spread of ideas on the Web : a procedure for identifying emergent hybrid word (2006) 0.05

```
0.050025538 = product of:
  0.100051075 = sum of:
    0.060458954 = weight(_text_:web in 5896) [ClassicSimilarity], result of:
      0.060458954 = score(doc=5896,freq=6.0), product of:
        0.16134618 = queryWeight, product of:
          3.2635105 = idf(docFreq=4597, maxDocs=44218)
          0.049439456 = queryNorm
        0.37471575 = fieldWeight in 5896, product of:
          2.4494898 = tf(freq=6.0), with freq of:
            6.0 = termFreq=6.0
          3.2635105 = idf(docFreq=4597, maxDocs=44218)
          0.046875 = fieldNorm(doc=5896)
    0.03959212 = weight(_text_:search in 5896) [ClassicSimilarity], result of:
      0.03959212 = score(doc=5896,freq=2.0), product of:
        0.17183559 = queryWeight, product of:
          3.475677 = idf(docFreq=3718, maxDocs=44218)
          0.049439456 = queryNorm
        0.230407 = fieldWeight in 5896, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          3.475677 = idf(docFreq=3718, maxDocs=44218)
          0.046875 = fieldNorm(doc=5896)
  0.5 = coord(2/4)
```

Abstract: Word usage is of interest to linguists for its own sake as well as to social scientists and others who seek to track the spread of ideas, for example, in public debates over political decisions. The historical evolution of language can be analyzed with the tools of corpus linguistics through evolving corpora and the Web. But word usage statistics can only be gathered for known words. In this article, techniques are described and tested for identifying new words from the Web, focusing on the case when the words are related to a topic and have a hybrid form with a common sequence of letters. The results highlight the need to employ a combination of search techniques and show the wide potential of hybrid word family investigations in linguistics and social science.
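A minimal sketch of the core filtering step: find words that contain a given common letter sequence and are not in a list of known words. The sequence "blog", the sample text and the known-word list are hypothetical, and the sketch does not reproduce the authors' web-search procedure:

```python
import re
from collections import Counter


def hybrid_word_candidates(text, common_sequence, known_words):
    """Count words containing common_sequence that are not already in known_words."""
    tokens = re.findall(r"[a-z]+", text.lower())
    counts = Counter(
        t for t in tokens
        if common_sequence in t and t != common_sequence and t not in known_words
    )
    return counts.most_common()


sample = "The blogosphere grew as bloggers, videoblogs and moblogs spread; the blog remained central."
known = {"blog", "blogger", "bloggers"}
found = hybrid_word_candidates(sample, "blog", known)
print(found)
```

Here "blogosphere", "videoblogs" and "moblogs" are reported as candidate hybrid words, while the known forms are filtered out.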

Li, Q.; Chen, Y.P.; Myaeng, S.-H.; Jin, Y.; Kang, B.-Y.: Concept unification of terms in different languages via web mining for Information Retrieval (2009) 0.05
```
0.045585044 = product of:
  0.09117009 = sum of:
    0.05817665 = weight(_text_:web in 4215) [ClassicSimilarity], result of:
      0.05817665 = score(doc=4215,freq=8.0), product of:
        0.16134618 = queryWeight, product of:
          3.2635105 = idf(docFreq=4597, maxDocs=44218)
          0.049439456 = queryNorm
        0.36057037 = fieldWeight in 4215, product of:
          2.828427 = tf(freq=8.0), with freq of:
            8.0 = termFreq=8.0
          3.2635105 = idf(docFreq=4597, maxDocs=44218)
          0.0390625 = fieldNorm(doc=4215)
    0.032993436 = weight(_text_:search in 4215) [ClassicSimilarity], result of:
      0.032993436 = score(doc=4215,freq=2.0), product of:
        0.17183559 = queryWeight, product of:
          3.475677 = idf(docFreq=3718, maxDocs=44218)
          0.049439456 = queryNorm
        0.19200584 = fieldWeight in 4215, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          3.475677 = idf(docFreq=3718, maxDocs=44218)
          0.0390625 = fieldNorm(doc=4215)
  0.5 = coord(2/4)
```
Abstract

For historical and cultural reasons, English phrases, especially proper nouns and new words, frequently appear in Web pages written primarily in East Asian languages such as Chinese, Korean, and Japanese. Although such English terms and their equivalents in these East Asian languages refer to the same concept, they are often erroneously treated as independent index units in traditional Information Retrieval (IR). This paper describes the degree to which the problem arises in IR and proposes a novel technique to solve it. Our method first extracts English terms from native Web documents in an East Asian language, and then unifies the extracted terms and their native-language equivalents as one index unit. For Cross-Language Information Retrieval (CLIR), one of the major hindrances to achieving retrieval performance at the level of Mono-Lingual Information Retrieval (MLIR) is the translation of query terms that cannot be found in a bilingual dictionary. The Web mining approach proposed in this paper for concept unification of terms in different languages can also be applied to this well-known challenge in CLIR. Experimental results on the NTCIR and KT-Set test collections show that the high translation precision of our approach greatly improves the performance of both Mono-Lingual and Cross-Language Information Retrieval.

Huo, W.: Automatic multi-word term extraction and its application to Web-page summarization (2012) 0.04

```
0.044953533 = product of:
  0.089907065 = sum of:
    0.06981198 = weight(_text_:web in 563) [ClassicSimilarity], result of:
      0.06981198 = score(doc=563,freq=8.0), product of:
        0.16134618 = queryWeight, product of:
          3.2635105 = idf(docFreq=4597, maxDocs=44218)
          0.049439456 = queryNorm
        0.43268442 = fieldWeight in 563, product of:
          2.828427 = tf(freq=8.0), with freq of:
            8.0 = termFreq=8.0
          3.2635105 = idf(docFreq=4597, maxDocs=44218)
          0.046875 = fieldNorm(doc=563)
    0.02009509 = product of:
      0.04019018 = sum of:
        0.04019018 = weight(_text_:22 in 563) [ClassicSimilarity], result of:
          0.04019018 = score(doc=563,freq=2.0), product of:
            0.17312855 = queryWeight, product of:
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.049439456 = queryNorm
            0.23214069 = fieldWeight in 563, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.046875 = fieldNorm(doc=563)
      0.5 = coord(1/2)
  0.5 = coord(2/4)
```

Abstract: In this thesis we propose three new word association measures for multi-word term extraction. We combine these association measures with the LocalMaxs algorithm in our extraction model and compare the results of different multi-word term extraction methods. Our approach is language and domain independent and requires no training data. It can be applied to tasks such as text summarization, information retrieval, and document classification. We further explore the potential of using multi-word terms as an effective representation for general web-page summarization. We extract multi-word terms from human-written summaries in a large collection of web pages, and generate the summaries by aligning document words with these multi-word terms. Our system applies machine translation technology to learn the alignment process from a training set and focuses on selecting high-quality multi-word terms from human-written summaries to generate suitable results for web-page summarization.
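The LocalMaxs selection rule can be illustrated independently of the association measure it is combined with. A minimal sketch of one common formulation (after Silva and Lopes): a bigram is kept if its glue value exceeds that of every trigram containing it; a longer n-gram is kept if its glue is at least that of its contained (n-1)-grams and exceeds that of every containing (n+1)-gram. The glue values are hypothetical placeholders, not the thesis's three measures:

```python
def contains(longer, shorter):
    """True if tuple `shorter` occurs as a contiguous run inside tuple `longer`."""
    n = len(shorter)
    return any(longer[i:i + n] == shorter for i in range(len(longer) - n + 1))


def is_multiword_unit(ngram, glue):
    """LocalMaxs test: the n-gram's glue must be a local maximum among neighbouring n-grams."""
    g = glue[ngram]
    n = len(ngram)
    # Every (n+1)-gram containing the candidate must have strictly lower glue.
    supersets = [glue[w] for w in glue if len(w) == n + 1 and contains(w, ngram)]
    if any(g <= s for s in supersets):
        return False
    if n == 2:
        return True
    # For n > 2, glue must also be at least that of the two contained (n-1)-grams.
    subsets = [glue[w] for w in (ngram[:-1], ngram[1:]) if w in glue]
    return all(g >= s for s in subsets)


# Hypothetical glue values from some word association measure.
glue = {
    ("new", "york"): 0.8,
    ("york", "city"): 0.3,
    ("in", "new"): 0.1,
    ("new", "york", "city"): 0.5,
    ("in", "new", "york"): 0.2,
}
print([w for w in glue if is_multiword_unit(w, glue)])  # [('new', 'york')]
```

Swapping in a different association measure changes only the glue values; the selection rule stays the same.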
Date: 10. 1.2013 19:22:47
