Search (5 results, page 1 of 1)

  • × author_ss:"Yu, Y."
  1. Luo, Z.; Yu, Y.; Osborne, M.; Wang, T.: Structuring tweets for improving Twitter search (2015) 0.00
    0.0026742492 = product of:
      0.0053484985 = sum of:
        0.0053484985 = product of:
          0.010696997 = sum of:
            0.010696997 = weight(_text_:a in 2335) [ClassicSimilarity], result of:
              0.010696997 = score(doc=2335,freq=20.0), product of:
                0.053105544 = queryWeight, product of:
                  1.153047 = idf(docFreq=37942, maxDocs=44218)
                  0.046056706 = queryNorm
                0.20142901 = fieldWeight in 2335, product of:
                  4.472136 = tf(freq=20.0), with freq of:
                    20.0 = termFreq=20.0
                  1.153047 = idf(docFreq=37942, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=2335)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Abstract
    Spam and wildly varying documents make searching in Twitter challenging. Most Twitter search systems generally treat a Tweet as a plain text when modeling relevance. However, a series of conventions allows users to Tweet in structural ways using a combination of different blocks of texts. These blocks include plain texts, hashtags, links, mentions, etc. Each block encodes a variety of communicative intent and the sequence of these blocks captures changing discourse. Previous work shows that exploiting the structural information can improve the structured documents (e.g., web pages) retrieval. In this study we utilize the structure of Tweets, induced by these blocks, for Twitter retrieval and Twitter opinion retrieval. For Twitter retrieval, a set of features, derived from the blocks of text and their combinations, is used into a learning-to-rank scenario. We show that structuring Tweets can achieve state-of-the-art performance. Our approach does not rely on social media features, but when we do add this additional information, performance improves significantly. For Twitter opinion retrieval, we explore the question of whether structural information derived from the body of Tweets and opinionatedness ratings of Tweets can improve performance. Experimental results show that retrieval using a novel unsupervised opinionatedness feature based on structuring Tweets achieves comparable performance with a supervised method using manually tagged Tweets. Topic-related specific structured Tweet sets are shown to help with query-dependent opinion retrieval.
    Type
    a
  2. Wang, H.; Liu, Q.; Penin, T.; Fu, L.; Zhang, L.; Tran, T.; Yu, Y.; Pan, Y.: Semplore: a scalable IR approach to search the Web of Data (2009) 0.00
    0.002269176 = product of:
      0.004538352 = sum of:
        0.004538352 = product of:
          0.009076704 = sum of:
            0.009076704 = weight(_text_:a in 1638) [ClassicSimilarity], result of:
              0.009076704 = score(doc=1638,freq=10.0), product of:
                0.053105544 = queryWeight, product of:
                  1.153047 = idf(docFreq=37942, maxDocs=44218)
                  0.046056706 = queryNorm
                0.1709182 = fieldWeight in 1638, product of:
                  3.1622777 = tf(freq=10.0), with freq of:
                    10.0 = termFreq=10.0
                  1.153047 = idf(docFreq=37942, maxDocs=44218)
                  0.046875 = fieldNorm(doc=1638)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Abstract
    The Web of Data keeps growing rapidly. However, the full exploitation of this large amount of structured data faces numerous challenges like usability, scalability, imprecise information needs and data change. We present Semplore, an IR-based system that aims at addressing these issues. Semplore supports intuitive faceted search and complex queries both on text and structured data. It combines imprecise keyword search and precise structured query in a unified ranking scheme. Scalable query processing is supported by leveraging inverted indexes traditionally used in IR systems. This is combined with a novel block-based index structure to support efficient index update when data changes. The experimental results show that Semplore is an efficient and effective system for searching the Web of Data and can be used as a basic infrastructure for Web-scale Semantic Web search engines.
    Type
    a
  3. Zhang, L.; Liu, Q.L.; Zhang, J.; Wang, H.F.; Pan, Y.; Yu, Y.: Semplore: an IR approach to scalable hybrid query of Semantic Web data (2007) 0.00
    0.0018909799 = product of:
      0.0037819599 = sum of:
        0.0037819599 = product of:
          0.0075639198 = sum of:
            0.0075639198 = weight(_text_:a in 231) [ClassicSimilarity], result of:
              0.0075639198 = score(doc=231,freq=10.0), product of:
                0.053105544 = queryWeight, product of:
                  1.153047 = idf(docFreq=37942, maxDocs=44218)
                  0.046056706 = queryNorm
                0.14243183 = fieldWeight in 231, product of:
                  3.1622777 = tf(freq=10.0), with freq of:
                    10.0 = termFreq=10.0
                  1.153047 = idf(docFreq=37942, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=231)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Abstract
    As an extension to the current Web, Semantic Web will not only contain structured data with machine understandable semantics but also textual information. While structured queries can be used to find information more precisely on the Semantic Web, keyword searches are still needed to help exploit textual information. It thus becomes very important that we can combine precise structured queries with imprecise keyword searches to have a hybrid query capability. In addition, due to the huge volume of information on the Semantic Web, the hybrid query must be processed in a very scalable way. In this paper, we define such a hybrid query capability that combines unary tree-shaped structured queries with keyword searches. We show how existing information retrieval (IR) index structures and functions can be reused to index semantic web data and its textual information, and how the hybrid query is evaluated on the index structure using IR engines in an efficient and scalable manner. We implemented this IR approach in an engine called Semplore. Comprehensive experiments on its performance show that it is a promising approach. It leads us to believe that it may be possible to evolve current web search engines to query and search the Semantic Web. Finally, we briefy describe how Semplore is used for searching Wikipedia and an IBM customer's product information.
    Type
    a
  4. Song, R.; Luo, Z.; Nie, J.-Y.; Yu, Y.; Hon, H.-W.: Identification of ambiguous queries in web search (2009) 0.00
    0.001757696 = product of:
      0.003515392 = sum of:
        0.003515392 = product of:
          0.007030784 = sum of:
            0.007030784 = weight(_text_:a in 2441) [ClassicSimilarity], result of:
              0.007030784 = score(doc=2441,freq=6.0), product of:
                0.053105544 = queryWeight, product of:
                  1.153047 = idf(docFreq=37942, maxDocs=44218)
                  0.046056706 = queryNorm
                0.13239266 = fieldWeight in 2441, product of:
                  2.4494898 = tf(freq=6.0), with freq of:
                    6.0 = termFreq=6.0
                  1.153047 = idf(docFreq=37942, maxDocs=44218)
                  0.046875 = fieldNorm(doc=2441)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Abstract
    It is widely believed that many queries submitted to search engines are inherently ambiguous (e.g., java and apple). However, few studies have tried to classify queries based on ambiguity and to answer "what the proportion of ambiguous queries is". This paper deals with these issues. First, we clarify the definition of ambiguous queries by constructing the taxonomy of queries from being ambiguous to specific. Second, we ask human annotators to manually classify queries. From manually labeled results, we observe that query ambiguity is to some extent predictable. Third, we propose a supervised learning approach to automatically identify ambiguous queries. Experimental results show that we can correctly identify 87% of labeled queries with the approach. Finally, by using our approach, we estimate that about 16% of queries in a real search log are ambiguous.
    Type
    a
  5. Cao, Y.; Duan, H.; Lin, C.-L.; Yu, Y.: Re-ranking question search results by clustering questions (2011) 0.00
    0.001674345 = product of:
      0.00334869 = sum of:
        0.00334869 = product of:
          0.00669738 = sum of:
            0.00669738 = weight(_text_:a in 4469) [ClassicSimilarity], result of:
              0.00669738 = score(doc=4469,freq=4.0), product of:
                0.053105544 = queryWeight, product of:
                  1.153047 = idf(docFreq=37942, maxDocs=44218)
                  0.046056706 = queryNorm
                0.12611452 = fieldWeight in 4469, product of:
                  2.0 = tf(freq=4.0), with freq of:
                    4.0 = termFreq=4.0
                  1.153047 = idf(docFreq=37942, maxDocs=44218)
                  0.0546875 = fieldNorm(doc=4469)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Abstract
    In this article, we address the problem of question clustering and study its use for re-ranking question search results. In question clustering we have to organize question search results into certain meaningful and condensed groups. Specifically, we propose to use a data structure consisting of question topic and question focus for modeling questions, and then cluster questions on the basis of the data structure. Experimental results show that our approach to question clustering improves the performance of question search significantly better than the approach not utilizing the topic-focus structure.
    Type
    a