Search (13 results, page 1 of 1)

Gao, J.; Zhang, J.: Clustered SVD strategies in latent semantic indexing (2005) 0.03

```
0.032942846 = product of:
  0.06588569 = sum of:
    0.036211025 = weight(_text_:data in 1166) [ClassicSimilarity], result of:
      0.036211025 = score(doc=1166,freq=2.0), product of:
        0.14807065 = queryWeight, product of:
          3.1620505 = idf(docFreq=5088, maxDocs=44218)
          0.046827413 = queryNorm
        0.24455236 = fieldWeight in 1166, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          3.1620505 = idf(docFreq=5088, maxDocs=44218)
          0.0546875 = fieldNorm(doc=1166)
    0.029674664 = product of:
      0.05934933 = sum of:
        0.05934933 = weight(_text_:processing in 1166) [ClassicSimilarity], result of:
          0.05934933 = score(doc=1166,freq=2.0), product of:
            0.18956426 = queryWeight, product of:
              4.048147 = idf(docFreq=2097, maxDocs=44218)
              0.046827413 = queryNorm
            0.3130829 = fieldWeight in 1166, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              4.048147 = idf(docFreq=2097, maxDocs=44218)
              0.0546875 = fieldNorm(doc=1166)
      0.5 = coord(1/2)
  0.5 = coord(2/4)
```
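
The per-term weights in this explain tree can be checked by hand. The following is a minimal Python sketch, not the search engine's own code. It assumes the classic TF-IDF shape that the reported numbers suggest: tf = sqrt(freq) and idf = 1 + ln(maxDocs / (docFreq + 1)). The values of `queryNorm` and `fieldNorm` are copied from the output as constants, since they depend on the whole query and on index-time length normalization. All function and variable names are hypothetical.

```python
import math

MAX_DOCS = 44218          # maxDocs, copied from the explain output
QUERY_NORM = 0.046827413  # queryNorm, copied from the explain output


def idf(doc_freq, max_docs=MAX_DOCS):
    """Inverse document frequency (assumed formula, inferred from the output)."""
    return 1.0 + math.log(max_docs / (doc_freq + 1))


def term_weight(freq, doc_freq, field_norm, query_norm=QUERY_NORM):
    """weight = queryWeight * fieldWeight, as laid out in the explain tree."""
    term_idf = idf(doc_freq)
    query_weight = term_idf * query_norm                     # idf * queryNorm
    field_weight = math.sqrt(freq) * term_idf * field_norm   # tf * idf * fieldNorm
    return query_weight * field_weight


# Record 1166: both terms occur twice; fieldNorm is 0.0546875.
data_weight = term_weight(freq=2.0, doc_freq=5088, field_norm=0.0546875)
processing_weight = term_weight(freq=2.0, doc_freq=2097, field_norm=0.0546875)

print(f"data:       {data_weight:.6f}")
print(f"processing: {processing_weight:.6f}")
```

Under these assumptions the two values should agree with the reported 0.036211025 and 0.05934933 to about six decimal places. Small differences are expected, because the explain output is computed in single precision.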

Abstract: The text retrieval method using latent semantic indexing (LSI) technique with truncated singular value decomposition (SVD) has been intensively studied in recent years. The SVD reduces the noise contained in the original representation of the term-document matrix and improves the information retrieval accuracy. Recent studies indicate that SVD is mostly useful for small homogeneous data collections. For large inhomogeneous datasets, the performance of the SVD based text retrieval technique may deteriorate. We propose to partition a large inhomogeneous dataset into several smaller ones with clustered structure, on which we apply the truncated SVD. Our experimental results show that the clustered SVD strategies may enhance the retrieval accuracy and reduce the computing and storage costs.
Source: Information processing and management. 41(2005) no.5, S.1051-1064

Zhang, J.; Zhao, Y.: A user term visualization analysis based on a social question and answer log (2013) 0.03
```
0.028887425 = product of:
  0.05777485 = sum of:
    0.03657866 = weight(_text_:data in 2715) [ClassicSimilarity], result of:
      0.03657866 = score(doc=2715,freq=4.0), product of:
        0.14807065 = queryWeight, product of:
          3.1620505 = idf(docFreq=5088, maxDocs=44218)
          0.046827413 = queryNorm
        0.24703519 = fieldWeight in 2715, product of:
          2.0 = tf(freq=4.0), with freq of:
            4.0 = termFreq=4.0
          3.1620505 = idf(docFreq=5088, maxDocs=44218)
          0.0390625 = fieldNorm(doc=2715)
    0.021196188 = product of:
      0.042392377 = sum of:
        0.042392377 = weight(_text_:processing in 2715) [ClassicSimilarity], result of:
          0.042392377 = score(doc=2715,freq=2.0), product of:
            0.18956426 = queryWeight, product of:
              4.048147 = idf(docFreq=2097, maxDocs=44218)
              0.046827413 = queryNorm
            0.22363065 = fieldWeight in 2715, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              4.048147 = idf(docFreq=2097, maxDocs=44218)
              0.0390625 = fieldNorm(doc=2715)
      0.5 = coord(1/2)
  0.5 = coord(2/4)
```
Abstract

The authors of this paper investigate terms of consumers' diabetes based on a log from the Yahoo!Answers social question and answers (Q&A) forum, ascertain characteristics and relationships among terms related to diabetes from the consumers' perspective, and reveal users' diabetes information seeking patterns. In this study, the log analysis method, data coding method, and visualization multiple-dimensional scaling analysis method were used for analysis. The visual analyses were conducted at two levels: terms analysis within a category and category analysis among the categories in the schema. The findings show that the average number of words per question was 128.63, the average number of sentences per question was 8.23, the average number of words per response was 254.83, and the average number of sentences per response was 16.01. There were 12 categories (Cause & Pathophysiology, Sign & Symptom, Diagnosis & Test, Organ & Body Part, Complication & Related Disease, Medication, Treatment, Education & Info Resource, Affect, Social & Culture, Lifestyle, and Nutrient) in the diabetes related schema which emerged from the data coding analysis. The analyses at the two levels show that terms and categories were clustered and patterns were revealed. Future research directions are also included.

Source

Information processing and management. 49(2013) no.5, S.1019-1048

Zhang, J.: TOFIR: A tool of facilitating information retrieval : introduce a visual retrieval model (2001) 0.01

```
0.014837332 = product of:
  0.05934933 = sum of:
    0.05934933 = product of:
      0.11869866 = sum of:
        0.11869866 = weight(_text_:processing in 7711) [ClassicSimilarity], result of:
          0.11869866 = score(doc=7711,freq=2.0), product of:
            0.18956426 = queryWeight, product of:
              4.048147 = idf(docFreq=2097, maxDocs=44218)
              0.046827413 = queryNorm
            0.6261658 = fieldWeight in 7711, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              4.048147 = idf(docFreq=2097, maxDocs=44218)
              0.109375 = fieldNorm(doc=7711)
      0.5 = coord(1/2)
  0.25 = coord(1/4)
```
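
This record has the largest single term weight in the list (0.11869866) yet ranks third, because the `coord` factors scale a score by the fraction of query clauses that matched. The sketch below replays that arithmetic for records 7711 and 1166, using only numbers copied from the explain trees. The helper name `coord` mirrors the label in the output; it is an illustration, not a call into any library.

```python
def coord(matched, total):
    """Fraction of query clauses that matched, as shown by coord(m/n)."""
    return matched / total


# Record 7711: only "processing" matched. It sits in a nested clause
# (1 of 2 sub-clauses), which is itself 1 of 4 top-level clauses.
score_7711 = 0.11869866 * coord(1, 2) * coord(1, 4)

# Record 1166: "data" matched directly and "processing" matched in the
# nested clause, so 2 of 4 top-level clauses contribute.
score_1166 = (0.036211025 + 0.05934933 * coord(1, 2)) * coord(2, 4)

print(f"7711: {score_7711:.6f}")
print(f"1166: {score_1166:.6f}")
print("1166 ranks higher:", score_1166 > score_7711)
```

The weaker but broader match (1166) ends up with roughly twice the score of the stronger single-term match (7711), which is consistent with the order of the result list.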

Source: Information processing and management. 37(2001) no.4, S.639-657

Zhang, L.; Liu, Q.L.; Zhang, J.; Wang, H.F.; Pan, Y.; Yu, Y.: Semplore: an IR approach to scalable hybrid query of Semantic Web data (2007) 0.01
```
0.011199882 = product of:
  0.04479953 = sum of:
    0.04479953 = weight(_text_:data in 231) [ClassicSimilarity], result of:
      0.04479953 = score(doc=231,freq=6.0), product of:
        0.14807065 = queryWeight, product of:
          3.1620505 = idf(docFreq=5088, maxDocs=44218)
          0.046827413 = queryNorm
        0.30255508 = fieldWeight in 231, product of:
          2.4494898 = tf(freq=6.0), with freq of:
            6.0 = termFreq=6.0
          3.1620505 = idf(docFreq=5088, maxDocs=44218)
          0.0390625 = fieldNorm(doc=231)
  0.25 = coord(1/4)
```
Abstract

As an extension to the current Web, Semantic Web will not only contain structured data with machine understandable semantics but also textual information. While structured queries can be used to find information more precisely on the Semantic Web, keyword searches are still needed to help exploit textual information. It thus becomes very important that we can combine precise structured queries with imprecise keyword searches to have a hybrid query capability. In addition, due to the huge volume of information on the Semantic Web, the hybrid query must be processed in a very scalable way. In this paper, we define such a hybrid query capability that combines unary tree-shaped structured queries with keyword searches. We show how existing information retrieval (IR) index structures and functions can be reused to index semantic web data and its textual information, and how the hybrid query is evaluated on the index structure using IR engines in an efficient and scalable manner. We implemented this IR approach in an engine called Semplore. Comprehensive experiments on its performance show that it is a promising approach. It leads us to believe that it may be possible to evolve current web search engines to query and search the Semantic Web. Finally, we briefly describe how Semplore is used for searching Wikipedia and an IBM customer's product information.
Li, D.; Tang, J.; Ding, Y.; Shuai, X.; Chambers, T.; Sun, G.; Luo, Z.; Zhang, J.: Topic-level opinion influence model (TOIM) : an investigation using tencent microblogging (2015) 0.01
```
0.009144665 = product of:
  0.03657866 = sum of:
    0.03657866 = weight(_text_:data in 2345) [ClassicSimilarity], result of:
      0.03657866 = score(doc=2345,freq=4.0), product of:
        0.14807065 = queryWeight, product of:
          3.1620505 = idf(docFreq=5088, maxDocs=44218)
          0.046827413 = queryNorm
        0.24703519 = fieldWeight in 2345, product of:
          2.0 = tf(freq=4.0), with freq of:
            4.0 = termFreq=4.0
          3.1620505 = idf(docFreq=5088, maxDocs=44218)
          0.0390625 = fieldNorm(doc=2345)
  0.25 = coord(1/4)
```
Abstract

Text mining has been widely used in multiple types of user-generated data to infer user opinion, but its application to microblogging is difficult because text messages are short and noisy, providing limited information about user opinion. Given that microblogging users communicate with each other to form a social network, we hypothesize that user opinion is influenced by its neighbors in the network. In this paper, we infer user opinion on a topic by combining two factors: the user's historical opinion about relevant topics and opinion influence from his/her neighbors. We thus build a topic-level opinion influence model (TOIM) by integrating both topic factor and opinion influence factor into a unified probabilistic model. We evaluate our model in one of the largest microblogging sites in China, Tencent Weibo, and the experiments show that TOIM outperforms baseline methods in opinion inference accuracy. Moreover, incorporating indirect influence further improves inference recall and f1-measure. Finally, we demonstrate some useful applications of TOIM in analyzing users' behaviors in Tencent Weibo.

Theme

Data Mining
Li, D.; Luo, Z.; Ding, Y.; Tang, J.; Sun, G.G.-Z.; Dai, X.; Du, J.; Zhang, J.; Kong, S.: User-level microblogging recommendation incorporating social influence (2017) 0.01
```
0.009144665 = product of:
  0.03657866 = sum of:
    0.03657866 = weight(_text_:data in 3426) [ClassicSimilarity], result of:
      0.03657866 = score(doc=3426,freq=4.0), product of:
        0.14807065 = queryWeight, product of:
          3.1620505 = idf(docFreq=5088, maxDocs=44218)
          0.046827413 = queryNorm
        0.24703519 = fieldWeight in 3426, product of:
          2.0 = tf(freq=4.0), with freq of:
            4.0 = termFreq=4.0
          3.1620505 = idf(docFreq=5088, maxDocs=44218)
          0.0390625 = fieldNorm(doc=3426)
  0.25 = coord(1/4)
```
Abstract

With the information overload of user-generated content in microblogging, users find it extremely challenging to browse and find valuable information in their first attempt. In this paper we propose a microblogging recommendation algorithm, TSI-MR (Topic-Level Social Influence-based Microblogging Recommendation), which can significantly improve users' microblogging experiences. The main innovation of this proposed algorithm is that we consider social influences and their indirect structural relationships, which are largely based on social status theory, from the topic level. The primary advantage of this approach is that it can build an accurate description of latent relationships between two users with weak connections, which can improve the performance of the model; furthermore, it can solve sparsity problems of training data to a certain extent. The realization of the model is mainly based on Factor Graph. We also applied a distributed strategy to further improve the efficiency of the model. Finally, we use data from Tencent Weibo, one of the most popular microblogging services in China, to evaluate our methods. The results show that incorporating social influence can improve microblogging performance considerably, and outperform the baseline methods.

Zhang, J.; Dimitroff, A.: The impact of webpage content characteristics on webpage visibility in search engine results : part I (2005) 0.01

```
0.008478476 = product of:
  0.033913903 = sum of:
    0.033913903 = product of:
      0.067827806 = sum of:
        0.067827806 = weight(_text_:processing in 1032) [ClassicSimilarity], result of:
          0.067827806 = score(doc=1032,freq=2.0), product of:
            0.18956426 = queryWeight, product of:
              4.048147 = idf(docFreq=2097, maxDocs=44218)
              0.046827413 = queryNorm
            0.35780904 = fieldWeight in 1032, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              4.048147 = idf(docFreq=2097, maxDocs=44218)
              0.0625 = fieldNorm(doc=1032)
      0.5 = coord(1/2)
  0.25 = coord(1/4)
```

Source: Information processing and management. 41(2005) no.3, S.665-690

Zhang, J.; Jastram, I.: A study of the metadata creation behavior of different user groups on the Internet (2006) 0.01

```
0.007418666 = product of:
  0.029674664 = sum of:
    0.029674664 = product of:
      0.05934933 = sum of:
        0.05934933 = weight(_text_:processing in 982) [ClassicSimilarity], result of:
          0.05934933 = score(doc=982,freq=2.0), product of:
            0.18956426 = queryWeight, product of:
              4.048147 = idf(docFreq=2097, maxDocs=44218)
              0.046827413 = queryNorm
            0.3130829 = fieldWeight in 982, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              4.048147 = idf(docFreq=2097, maxDocs=44218)
              0.0546875 = fieldNorm(doc=982)
      0.5 = coord(1/2)
  0.25 = coord(1/4)
```

Source: Information processing and management. 42(2006) no.4, S.1099-1122

Zhang, J.; Dimitroff, A.: The impact of metadata implementation on webpage visibility in search engine results : part II (2005) 0.01

```
0.007418666 = product of:
  0.029674664 = sum of:
    0.029674664 = product of:
      0.05934933 = sum of:
        0.05934933 = weight(_text_:processing in 1027) [ClassicSimilarity], result of:
          0.05934933 = score(doc=1027,freq=2.0), product of:
            0.18956426 = queryWeight, product of:
              4.048147 = idf(docFreq=2097, maxDocs=44218)
              0.046827413 = queryNorm
            0.3130829 = fieldWeight in 1027, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              4.048147 = idf(docFreq=2097, maxDocs=44218)
              0.0546875 = fieldNorm(doc=1027)
      0.5 = coord(1/2)
  0.25 = coord(1/4)
```

Source: Information processing and management. 41(2005) no.3, S.691-716

Zhang, J.; Dimitroff, A.: The impact of metadata implementation on webpage visibility in search engine results : part II (2005) 0.01

```
0.007418666 = product of:
  0.029674664 = sum of:
    0.029674664 = product of:
      0.05934933 = sum of:
        0.05934933 = weight(_text_:processing in 1033) [ClassicSimilarity], result of:
          0.05934933 = score(doc=1033,freq=2.0), product of:
            0.18956426 = queryWeight, product of:
              4.048147 = idf(docFreq=2097, maxDocs=44218)
              0.046827413 = queryNorm
            0.3130829 = fieldWeight in 1033, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              4.048147 = idf(docFreq=2097, maxDocs=44218)
              0.0546875 = fieldNorm(doc=1033)
      0.5 = coord(1/2)
  0.25 = coord(1/4)
```

Source: Information processing and management. 41(2005) no.3, S.691-715

Zhang, J.; Wolfram, D.; Wang, P.: Analysis of query keywords of sports-related queries using visualization and clustering (2009) 0.01
```
0.006466255 = product of:
  0.02586502 = sum of:
    0.02586502 = weight(_text_:data in 2947) [ClassicSimilarity], result of:
      0.02586502 = score(doc=2947,freq=2.0), product of:
        0.14807065 = queryWeight, product of:
          3.1620505 = idf(docFreq=5088, maxDocs=44218)
          0.046827413 = queryNorm
        0.17468026 = fieldWeight in 2947, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          3.1620505 = idf(docFreq=5088, maxDocs=44218)
          0.0390625 = fieldNorm(doc=2947)
  0.25 = coord(1/4)
```
Abstract

The authors investigated 11 sports-related query keywords extracted from a public search engine query log to better understand sports-related information seeking on the Internet. After the query log contents were cleaned and query data were parsed, popular sports-related keywords were identified, along with frequently co-occurring query terms associated with the identified keywords. Relationships among each sports-related focus keyword and its related keywords were characterized and grouped using multidimensional scaling (MDS) in combination with traditional hierarchical clustering methods. The two approaches were synthesized in a visual context by highlighting the results of the hierarchical clustering analysis in the visual MDS configuration. Important events, people, subjects, merchandise, and so on related to a sport were illustrated, and relationships among the sports were analyzed. A small-scale comparative study of sports searches with and without term assistance was conducted. Searches that used search term assistance by relying on previous query term relationships outperformed the searches without the search term assistance. The findings of this study provide insights into sports information seeking behavior on the Internet. The developed method also may be applied to other query log subject areas.

Zhang, J.; Nguyen, T.: WebStar: a visualization model for hyperlink structures (2005) 0.01

```
0.0063588563 = product of:
  0.025435425 = sum of:
    0.025435425 = product of:
      0.05087085 = sum of:
        0.05087085 = weight(_text_:processing in 1056) [ClassicSimilarity], result of:
          0.05087085 = score(doc=1056,freq=2.0), product of:
            0.18956426 = queryWeight, product of:
              4.048147 = idf(docFreq=2097, maxDocs=44218)
              0.046827413 = queryNorm
            0.26835677 = fieldWeight in 1056, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              4.048147 = idf(docFreq=2097, maxDocs=44218)
              0.046875 = fieldNorm(doc=1056)
      0.5 = coord(1/2)
  0.25 = coord(1/4)
```

Source: Information processing and management. 41(2005) no.4, S.1003-1018

Zhang, J.; Zeng, M.L.: A new similarity measure for subject hierarchical structures (2014) 0.00

```
0.0039652926 = product of:
  0.01586117 = sum of:
    0.01586117 = product of:
      0.03172234 = sum of:
        0.03172234 = weight(_text_:22 in 1778) [ClassicSimilarity], result of:
          0.03172234 = score(doc=1778,freq=2.0), product of:
            0.16398162 = queryWeight, product of:
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.046827413 = queryNorm
            0.19345059 = fieldWeight in 1778, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.0390625 = fieldNorm(doc=1778)
      0.5 = coord(1/2)
  0.25 = coord(1/4)
```

Date: 8. 4.2015 16:22:13
