Search (10 results, page 1 of 1)

  • author_ss:"Chen, Z."
  1. Chen, Z.; Wenyin, L.; Zhang, F.; Li, M.; Zhang, H.: Web mining for Web image retrieval (2001) 0.03
    0.034791782 = product of:
      0.11597261 = sum of:
        0.044538345 = weight(_text_:web in 6521) [ClassicSimilarity], result of:
          0.044538345 = score(doc=6521,freq=14.0), product of:
            0.0933738 = queryWeight, product of:
              3.2635105 = idf(docFreq=4597, maxDocs=44218)
              0.028611459 = queryNorm
            0.47698978 = fieldWeight in 6521, product of:
              3.7416575 = tf(freq=14.0), with freq of:
                14.0 = termFreq=14.0
              3.2635105 = idf(docFreq=4597, maxDocs=44218)
              0.0390625 = fieldNorm(doc=6521)
        0.06491486 = weight(_text_:log in 6521) [ClassicSimilarity], result of:
          0.06491486 = score(doc=6521,freq=2.0), product of:
            0.18335998 = queryWeight, product of:
              6.4086204 = idf(docFreq=197, maxDocs=44218)
              0.028611459 = queryNorm
            0.3540296 = fieldWeight in 6521, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              6.4086204 = idf(docFreq=197, maxDocs=44218)
              0.0390625 = fieldNorm(doc=6521)
        0.00651941 = product of:
          0.019558229 = sum of:
            0.019558229 = weight(_text_:29 in 6521) [ClassicSimilarity], result of:
              0.019558229 = score(doc=6521,freq=2.0), product of:
                0.10064617 = queryWeight, product of:
                  3.5176873 = idf(docFreq=3565, maxDocs=44218)
                  0.028611459 = queryNorm
                0.19432661 = fieldWeight in 6521, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5176873 = idf(docFreq=3565, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=6521)
          0.33333334 = coord(1/3)
      0.3 = coord(3/10)
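
    For readers puzzling over the tree above: it is Lucene's ClassicSimilarity explanation, and every number follows from tf(freq) = sqrt(freq), idf = ln(maxDocs/(docFreq+1)) + 1, queryWeight = idf * queryNorm, fieldWeight = tf * idf * fieldNorm, and the coord factors. A minimal Python sketch reproducing the score, with all inputs copied from the explanation itself:

      import math

      def idf(doc_freq, max_docs):
          # ClassicSimilarity: idf(t) = ln(maxDocs / (docFreq + 1)) + 1
          return math.log(max_docs / (doc_freq + 1)) + 1

      def term_score(freq, doc_freq, max_docs, field_norm, query_norm):
          # score = queryWeight * fieldWeight
          tf = math.sqrt(freq)
          query_weight = idf(doc_freq, max_docs) * query_norm
          field_weight = tf * idf(doc_freq, max_docs) * field_norm
          return query_weight * field_weight

      QUERY_NORM = 0.028611459  # queryNorm from the explanation above

      web = term_score(14.0, 4597, 44218, 0.0390625, QUERY_NORM)
      log = term_score(2.0, 197, 44218, 0.0390625, QUERY_NORM)
      t29 = term_score(2.0, 3565, 44218, 0.0390625, QUERY_NORM) / 3  # coord(1/3)

      print((web + log + t29) * 0.3)  # coord(3/10) -> ~0.034791782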
    
    Abstract
    The popularity of digital images is rapidly increasing due to improving digital imaging technologies and convenient availability facilitated by the Internet. However, finding user-intended images on the Internet is nontrivial, mainly because Web images are usually not annotated with semantic descriptors. In this article, we present an effective approach to, and a prototype system for, image retrieval from the Internet using Web mining. The system can also serve as a Web image search engine. One of the key ideas in the approach is to extract the text on the Web pages to semantically describe the images. The text description is then combined with other, low-level image features in the image similarity assessment. Another main contribution of this work is that we apply data mining to the log of users' feedback to improve image retrieval performance in three aspects. First, the accuracy of the document space model of image representation obtained from the Web pages is improved by removing clutter and irrelevant text. Second, a user space model of the users' representation of images is constructed and combined with the document space model to eliminate the mismatch between the page author's expression and the user's understanding and expectation. Third, the relationship between low-level and high-level features is discovered, which is extremely useful for assigning weights to the low-level features in the similarity assessment.
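
    The similarity assessment described above reduces, in the simplest reading, to a weighted combination of a text similarity and a visual similarity. A minimal Python sketch, assuming cosine similarity and a linear combination (the weights and feature choices are illustrative, not the authors' exact model):

      from math import sqrt

      def cosine(a, b):
          # cosine similarity between two equal-length feature vectors
          dot = sum(x * y for x, y in zip(a, b))
          na, nb = sqrt(sum(x * x for x in a)), sqrt(sum(x * x for x in b))
          return dot / (na * nb) if na and nb else 0.0

      def image_similarity(query, image, w_text=0.6, w_visual=0.4):
          # 'text': vector built from the page text around the image;
          # 'visual': low-level features (e.g., a color histogram).
          # The article mines user feedback logs to learn how the low-level
          # features should be weighted; fixed illustrative weights here.
          return (w_text * cosine(query["text"], image["text"])
                  + w_visual * cosine(query["visual"], image["visual"]))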
    Date
    29. 9.2001 17:32:09
  2. Chen, Z.; Fu, B.: On the complexity of Rocchio's similarity-based relevance feedback algorithm (2007) 0.01
    0.014515405 = product of:
      0.14515404 = sum of:
        0.14515404 = weight(_text_:log in 578) [ClassicSimilarity], result of:
          0.14515404 = score(doc=578,freq=10.0), product of:
            0.18335998 = queryWeight, product of:
              6.4086204 = idf(docFreq=197, maxDocs=44218)
              0.028611459 = queryNorm
            0.79163426 = fieldWeight in 578, product of:
              3.1622777 = tf(freq=10.0), with freq of:
                10.0 = termFreq=10.0
              6.4086204 = idf(docFreq=197, maxDocs=44218)
              0.0390625 = fieldNorm(doc=578)
      0.1 = coord(1/10)
    
    Abstract
    Rocchio's similarity-based relevance feedback algorithm, one of the most important query reformulation methods in information retrieval, is essentially an adaptive learning algorithm from examples in searching for documents represented by a linear classifier. Despite its popularity in various applications, there is little rigorous analysis of its learning complexity in the literature. In this article, the authors prove for the first time that the learning complexity of Rocchio's algorithm is O(d + d**2(log d + log n)) over the discretized vector space {0, ..., n-1}**d when the inner product similarity measure is used. The upper bound on the learning complexity for searching for documents represented by a monotone linear classifier (q, 0) over {0, ..., n-1}**d can be improved to at most 1 + 2k(n-1)(log d + log(n-1)), where k is the number of nonzero components in q. Several lower bounds on the learning complexity are also obtained for Rocchio's algorithm. For example, the authors prove that Rocchio's algorithm has a lower bound Omega((d choose 2) log n) on its learning complexity over the Boolean vector space {0,1}**d.
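
    For reference, the algorithm under analysis reformulates the query vector from user-judged documents. A minimal Python sketch of the standard Rocchio update rule (the alpha/beta/gamma defaults are the conventional textbook values, not taken from the article):

      import numpy as np

      def rocchio_update(q, relevant, nonrelevant,
                         alpha=1.0, beta=0.75, gamma=0.15):
          # q' = alpha*q + beta*centroid(relevant) - gamma*centroid(nonrelevant)
          q_new = alpha * np.asarray(q, dtype=float)
          if relevant:
              q_new += beta * np.mean(relevant, axis=0)
          if nonrelevant:
              q_new -= gamma * np.mean(nonrelevant, axis=0)
          return q_new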
  3. Wenyin, L.; Chen, Z.; Li, M.; Zhang, H.: ¬A media agent for automatically building a personalized semantic index of Web media objects (2001) 0.01
    0.0114609385 = product of:
      0.05730469 = sum of:
        0.0494814 = weight(_text_:web in 6522) [ClassicSimilarity], result of:
          0.0494814 = score(doc=6522,freq=12.0), product of:
            0.0933738 = queryWeight, product of:
              3.2635105 = idf(docFreq=4597, maxDocs=44218)
              0.028611459 = queryNorm
            0.5299281 = fieldWeight in 6522, product of:
              3.4641016 = tf(freq=12.0), with freq of:
                12.0 = termFreq=12.0
              3.2635105 = idf(docFreq=4597, maxDocs=44218)
              0.046875 = fieldNorm(doc=6522)
        0.007823291 = product of:
          0.023469873 = sum of:
            0.023469873 = weight(_text_:29 in 6522) [ClassicSimilarity], result of:
              0.023469873 = score(doc=6522,freq=2.0), product of:
                0.10064617 = queryWeight, product of:
                  3.5176873 = idf(docFreq=3565, maxDocs=44218)
                  0.028611459 = queryNorm
                0.23319192 = fieldWeight in 6522, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5176873 = idf(docFreq=3565, maxDocs=44218)
                  0.046875 = fieldNorm(doc=6522)
          0.33333334 = coord(1/3)
      0.2 = coord(2/10)
    
    Abstract
    The novel idea of a media agent is briefly presented; the agent automatically builds a personalized semantic index of Web media objects for each particular user. Because the Web is a rich source of multimedia data and the text content on Web pages is usually semantically related to the media objects on the same pages, the media agent can automatically collect the URLs and related text and then build the index of the multimedia data on behalf of the user, whenever and wherever she accesses these multimedia data or their container Web pages. Moreover, the media agent can also use an off-line crawler to build the index for those multimedia objects that are relevant to the user's favorites but have not yet been accessed by the user. When the user wants to find these multimedia data once again, the semantic index facilitates text-based search for her.
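
    The collection step the agent performs, gathering media URLs and the semantically related page text, can be sketched with the Python standard library alone. A toy stand-in (the real agent also tracks the user's accesses and runs an off-line crawler):

      from html.parser import HTMLParser

      class MediaIndexer(HTMLParser):
          # Collects image URLs and nearby text from one page.
          def __init__(self):
              super().__init__()
              self.images, self.text = [], []

          def handle_starttag(self, tag, attrs):
              if tag == "img":
                  a = dict(attrs)
                  if "src" in a:
                      self.images.append(a["src"])
                  if a.get("alt"):
                      self.text.append(a["alt"])

          def handle_data(self, data):
              if data.strip():
                  self.text.append(data.strip())

      parser = MediaIndexer()
      parser.feed('<p>Mount Fuji at dawn</p><img src="fuji.jpg" alt="Mount Fuji">')
      # map each media URL to page text usable for text-based search
      index = {url: " ".join(parser.text) for url in parser.images}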
    Date
    29. 9.2001 17:37:16
    Theme
    Web-Agenten
  4. Lian, T.; Chen, Z.; Lin, Y.; Ma, J.: Temporal patterns of the online video viewing behavior of smart TV viewers (2018) 0.01
    0.006491486 = product of:
      0.06491486 = sum of:
        0.06491486 = weight(_text_:log in 4219) [ClassicSimilarity], result of:
          0.06491486 = score(doc=4219,freq=2.0), product of:
            0.18335998 = queryWeight, product of:
              6.4086204 = idf(docFreq=197, maxDocs=44218)
              0.028611459 = queryNorm
            0.3540296 = fieldWeight in 4219, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              6.4086204 = idf(docFreq=197, maxDocs=44218)
              0.0390625 = fieldNorm(doc=4219)
      0.1 = coord(1/10)
    
    Abstract
    In recent years, millions of households have shifted from traditional TVs to smart TVs for viewing online videos on TV screens. In this article, we perform extensive analyses of a large-scale online video viewing log from smart TVs. Because time influences almost every aspect of our lives, our aim is to understand the temporal patterns of the online video viewing behavior of smart TV viewers at the crowd level. First, we measure the amount of time per hour spent watching online videos on smart TV by each household on each day. By applying clustering techniques, we identify eight daily patterns whose peak hours occur in different segments of the day. The differences among households can be characterized by three types of temporal habits. We also uncover five periodic weekly patterns. There seems to be a circadian rhythm at the crowd level. Further analysis confirms that there is a holiday effect in the online video viewing behavior on smart TVs. Finally, we investigate the popularity variations of different video categories over the day. The obtained insights shed light on how we can partition a day to improve the performance of time-aware video recommendations for smart TV viewers.
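
    The daily-pattern analysis described above amounts to clustering per-household 24-dimensional hour-of-day vectors. A minimal Python sketch assuming k-means as the clustering technique (the article does not prescribe a specific algorithm, and random data stands in for the viewing log):

      import numpy as np
      from sklearn.cluster import KMeans

      # rows: one household-day; columns: hours 0-23; values: time spent
      # watching online video in that hour (random stand-in data)
      rng = np.random.default_rng(0)
      hourly_viewing = rng.random((1000, 24))

      # normalize each day so clusters reflect the *shape* of the daily
      # pattern, not the total amount watched
      profiles = hourly_viewing / hourly_viewing.sum(axis=1, keepdims=True)

      # the article identifies eight daily patterns, hence k=8
      kmeans = KMeans(n_clusters=8, n_init=10, random_state=0).fit(profiles)
      peak_hours = kmeans.cluster_centers_.argmax(axis=1)  # peak hour per pattern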
  5. Shen, D.; Yang, Q.; Chen, Z.: Noise reduction through summarization for Web-page classification (2007) 0.01
    0.006060208 = product of:
      0.06060208 = sum of:
        0.06060208 = weight(_text_:web in 953) [ClassicSimilarity], result of:
          0.06060208 = score(doc=953,freq=18.0), product of:
            0.0933738 = queryWeight, product of:
              3.2635105 = idf(docFreq=4597, maxDocs=44218)
              0.028611459 = queryNorm
            0.64902663 = fieldWeight in 953, product of:
              4.2426405 = tf(freq=18.0), with freq of:
                18.0 = termFreq=18.0
              3.2635105 = idf(docFreq=4597, maxDocs=44218)
              0.046875 = fieldNorm(doc=953)
      0.1 = coord(1/10)
    
    Abstract
    Due to the large variety of noisy information embedded in Web pages, Web-page classification is much more difficult than pure-text classification. In this paper, we propose to improve Web-page classification performance by removing the noise through summarization techniques. We first give empirical evidence that ideal Web-page summaries generated by human editors can indeed improve the performance of Web-page classification algorithms. We then put forward a new Web-page summarization algorithm based on Web-page layout and evaluate it, along with several other state-of-the-art text summarization algorithms, on the LookSmart Web directory. Experimental results show that the classification algorithms (NB or SVM) augmented by any summarization approach achieve an improvement of more than 5.0% over pure-text-based classification algorithms. We further introduce an ensemble method to combine the different summarization algorithms. The ensemble summarization method achieves more than 12.0% improvement over pure-text-based methods.
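
    The summarize-then-classify pipeline evaluated above can be sketched with off-the-shelf components. A minimal Python sketch using a naive lead-sentence summarizer and NB (illustrative only; the article's summarizer scores sentences using Web-page layout information):

      from sklearn.feature_extraction.text import TfidfVectorizer
      from sklearn.naive_bayes import MultinomialNB
      from sklearn.pipeline import make_pipeline

      def summarize(page_text, n_sentences=3):
          # toy summarizer: keep only the leading sentences, dropping the
          # trailing noise (ads, navigation) a page often accumulates
          return " ".join(page_text.split(". ")[:n_sentences])

      pages = [
          "Markets rose sharply today. Tech stocks led gains. Ad: click here.",
          "The team won the final. Fans celebrated downtown. Ad: click here.",
      ]
      labels = ["business", "sports"]

      clf = make_pipeline(TfidfVectorizer(), MultinomialNB())
      clf.fit([summarize(p) for p in pages], labels)
      # classify new pages on their summaries rather than the full noisy text
      print(clf.predict([summarize("Stocks fell today. Investors worried.")]))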
  6. Shen, D.; Chen, Z.; Yang, Q.; Zeng, H.J.; Zhang, B.; Lu, Y.; Ma, W.Y.: Web page classification through summarization (2004) 0.00
    0.0033667826 = product of:
      0.033667825 = sum of:
        0.033667825 = weight(_text_:web in 4132) [ClassicSimilarity], result of:
          0.033667825 = score(doc=4132,freq=2.0), product of:
            0.0933738 = queryWeight, product of:
              3.2635105 = idf(docFreq=4597, maxDocs=44218)
              0.028611459 = queryNorm
            0.36057037 = fieldWeight in 4132, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.2635105 = idf(docFreq=4597, maxDocs=44218)
              0.078125 = fieldNorm(doc=4132)
      0.1 = coord(1/10)
    
  7. Chen, Z.; Meng, X.; Fowler, R.H.; Zhu, B.: Real-time adaptive feature and document learning for Web search (2001) 0.00
    0.0023806747 = product of:
      0.023806747 = sum of:
        0.023806747 = weight(_text_:web in 5209) [ClassicSimilarity], result of:
          0.023806747 = score(doc=5209,freq=4.0), product of:
            0.0933738 = queryWeight, product of:
              3.2635105 = idf(docFreq=4597, maxDocs=44218)
              0.028611459 = queryNorm
            0.25496176 = fieldWeight in 5209, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              3.2635105 = idf(docFreq=4597, maxDocs=44218)
              0.0390625 = fieldNorm(doc=5209)
      0.1 = coord(1/10)
    
    Abstract
    Chen et al. report on the design of FEATURES, a Web search engine with adaptive features based on minimal relevance feedback. Rather than developing user profiles from previous searcher activity at either the server or the client, or updating indexes after search completion, FEATURES allows index and user characterization files to be updated during query modification on retrieval from a general-purpose search engine. Indexing terms relevant to a query are defined as the union of all terms assigned to documents retrieved by the initial search run and are used to build a vector space model on this retrieved set. The top ten weighted terms are presented to the user for a relevant/non-relevant choice, which is used to modify the term weights. Documents are chosen if their summed term weights are greater than some threshold. A user evaluation of the top ten ranked documents as non-relevant will decrease these term weights, and a positive judgement will increase them. A new ordering of the retrieved set then generates new display lists of terms and documents. Precision is improved in a test on Alta Vista searches.
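
    The feedback loop described above, adjusting term weights from the user's judgements and selecting documents by summed term weight against a threshold, can be sketched directly. A minimal Python sketch (the update step size and threshold are illustrative, not taken from the review):

      def update_weights(term_weights, judgements, step=0.1):
          # judgements: {term: True if judged relevant, False otherwise},
          # collected from the user on the top ten weighted terms
          for term, relevant in judgements.items():
              term_weights[term] += step if relevant else -step
          return term_weights

      def select_documents(docs, term_weights, threshold):
          # keep a document if the summed weights of its terms exceed threshold
          scored = {d: sum(term_weights.get(t, 0.0) for t in terms)
                    for d, terms in docs.items()}
          return sorted((d for d, s in scored.items() if s > threshold),
                        key=lambda d: -scored[d])

      weights = {"web": 0.8, "search": 0.7, "adaptive": 0.5}
      weights = update_weights(weights, {"web": True, "adaptive": False})
      docs = {"d1": ["web", "search"], "d2": ["adaptive"]}
      print(select_documents(docs, weights, threshold=0.6))  # ['d1']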
  8. Ren, P.; Chen, Z.; Ma, J.; Zhang, Z.; Si, L.; Wang, S.: Detecting temporal patterns of user queries (2017) 0.00
    0.0020200694 = product of:
      0.020200694 = sum of:
        0.020200694 = weight(_text_:web in 3315) [ClassicSimilarity], result of:
          0.020200694 = score(doc=3315,freq=2.0), product of:
            0.0933738 = queryWeight, product of:
              3.2635105 = idf(docFreq=4597, maxDocs=44218)
              0.028611459 = queryNorm
            0.21634221 = fieldWeight in 3315, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.2635105 = idf(docFreq=4597, maxDocs=44218)
              0.046875 = fieldNorm(doc=3315)
      0.1 = coord(1/10)
    
    Abstract
    Query classification is an important part of exploring the characteristics of Web queries. Existing studies are mainly based on Broder's classification scheme and classify user queries into navigational, informational, and transactional categories according to users' information needs. In this article, we present a novel classification scheme from the perspective of queries' temporal patterns. A query's temporal pattern is the inherent time-series pattern of its search volume, reflecting how the query's popularity evolves over time. By analyzing the temporal patterns of queries, search engines can more deeply understand users' search intents and thus improve performance. Furthermore, we extract three groups of features based on the queries' search-volume time series and use a support vector machine (SVM) to automatically detect the temporal patterns of user queries. Extensive experiments on the Million Query Track data sets of the Text REtrieval Conference (TREC) demonstrate the effectiveness of our approach.
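
    The detection step, extracting feature groups from a query's search-volume time series and feeding them to an SVM, can be sketched as follows. The three features here (trend, weekly autocorrelation, burstiness) are illustrative stand-ins for the article's feature groups, and the data and labels are random placeholders:

      import numpy as np
      from sklearn.svm import SVC

      def features(volume):
          # volume: daily search counts for one query
          v = np.asarray(volume, dtype=float)
          trend = np.polyfit(np.arange(len(v)), v, 1)[0]  # linear trend slope
          weekly = np.corrcoef(v[:-7], v[7:])[0, 1]       # weekly autocorrelation
          burst = v.max() / v.mean()                      # burstiness ratio
          return [trend, weekly, burst]

      rng = np.random.default_rng(1)
      X = [features(rng.random(56)) for _ in range(40)]   # 8 weeks per query
      y = rng.integers(0, 2, 40)          # stand-in temporal-pattern labels
      clf = SVC(kernel="rbf").fit(X, y)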
  9. Chen, Z.; Huang, Y.; Tian, J.; Liu, X.; Fu, K.; Huang, T.: Joint model for subsentence-level sentiment analysis with Markov logic (2015) 0.00
    0.0016833913 = product of:
      0.016833913 = sum of:
        0.016833913 = weight(_text_:web in 2210) [ClassicSimilarity], result of:
          0.016833913 = score(doc=2210,freq=2.0), product of:
            0.0933738 = queryWeight, product of:
              3.2635105 = idf(docFreq=4597, maxDocs=44218)
              0.028611459 = queryNorm
            0.18028519 = fieldWeight in 2210, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.2635105 = idf(docFreq=4597, maxDocs=44218)
              0.0390625 = fieldNorm(doc=2210)
      0.1 = coord(1/10)
    
    Abstract
    Sentiment analysis focuses on the study of opinions that express positive or negative sentiment. With the explosive growth of Web documents, sentiment analysis is becoming a hot topic in both academic research and system design. Fine-grained sentiment analysis is traditionally solved with a 2-step strategy, which suffers from cascade errors. Although joint models, such as joint sentiment/topic and maximum entropy (MaxEnt)/latent Dirichlet allocation, have been proposed to tackle this problem, they focus on the joint learning of aspects and sentiments and are therefore not appropriate for avoiding cascade errors in sentiment analysis at the sentence or subsentence level. In this article, we present a novel joint fine-grained sentiment analysis framework at the subsentence level built on Markov logic. First, we divide the task into 2 separate stages (subjectivity classification and polarity classification). Then, the 2 stages are processed with different feature sets, implemented as local formulas in Markov logic. Finally, global formulas in Markov logic are adopted to realize the interactions between the 2 stages. The joint inference of subjectivity and polarity helps prevent cascade errors. Experiments on a Chinese sentiment data set demonstrate that our joint model brings significant improvements.
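
    The cascade-error problem motivating the joint model is easiest to see in the traditional 2-step pipeline. A minimal Python sketch of that pipeline (classifier internals elided); a stage-1 mistake propagates irrevocably to stage 2, which is exactly what joint inference over global Markov logic formulas avoids:

      def two_step_pipeline(subsentences, subjectivity_clf, polarity_clf):
          # Stage 1: subjectivity classification; Stage 2: polarity, but only
          # for units stage 1 judged subjective. If stage 1 errs, stage 2
          # never sees the unit: the cascade error the joint model prevents
          # by letting global formulas connect both stages at inference time.
          results = {}
          for s in subsentences:
              if subjectivity_clf(s):            # stage 1
                  results[s] = polarity_clf(s)   # stage 2: 'positive'/'negative'
              else:
                  results[s] = "objective"       # polarity never considered
          return results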
  10. Chen, Z.: ¬A conceptual model for storage and retrieval of short scientific texts (1993) 0.00
    0.001303882 = product of:
      0.01303882 = sum of:
        0.01303882 = product of:
          0.039116457 = sum of:
            0.039116457 = weight(_text_:29 in 2715) [ClassicSimilarity], result of:
              0.039116457 = score(doc=2715,freq=2.0), product of:
                0.10064617 = queryWeight, product of:
                  3.5176873 = idf(docFreq=3565, maxDocs=44218)
                  0.028611459 = queryNorm
                0.38865322 = fieldWeight in 2715, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5176873 = idf(docFreq=3565, maxDocs=44218)
                  0.078125 = fieldNorm(doc=2715)
          0.33333334 = coord(1/3)
      0.1 = coord(1/10)
    
    Source
    Information processing and management. 29(1993) no.2, S.209-214