Search (12 results, page 1 of 1)

  • author_ss:"Liu, X."
  1. Liu, X.; Croft, W.B.: Cluster-based retrieval using language models (2004) 0.01
    0.014103786 = product of:
      0.028207572 = sum of:
        0.028207572 = product of:
          0.056415144 = sum of:
            0.056415144 = weight(_text_:research in 4115) [ClassicSimilarity], result of:
              0.056415144 = score(doc=4115,freq=2.0), product of:
                0.1491455 = queryWeight, product of:
                  2.8529835 = idf(docFreq=6931, maxDocs=44218)
                  0.05227703 = queryNorm
                0.37825575 = fieldWeight in 4115, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  2.8529835 = idf(docFreq=6931, maxDocs=44218)
                  0.09375 = fieldNorm(doc=4115)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
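The relevance explanation above follows Lucene's ClassicSimilarity: the leaf weight is queryWeight × fieldWeight, where queryWeight = idf × queryNorm and fieldWeight = tf × idf × fieldNorm, and the result is then scaled by the coord(1/2) factors. A minimal sketch reproducing the arithmetic of result 1 (the function name is illustrative; values are taken from the explanation):

```python
import math

def classic_similarity_score(freq, doc_freq, max_docs, query_norm,
                             field_norm, coord=1.0):
    """Recompute a Lucene ClassicSimilarity leaf score:
    score = queryWeight * fieldWeight, scaled by the coord factors."""
    idf = 1.0 + math.log(max_docs / (doc_freq + 1))  # classic idf formula
    tf = math.sqrt(freq)                             # tf = sqrt(termFreq)
    query_weight = idf * query_norm
    field_weight = tf * idf * field_norm
    return query_weight * field_weight * coord

# Values from result 1 (term "research" in doc 4115):
score = classic_similarity_score(freq=2.0, doc_freq=6931, max_docs=44218,
                                 query_norm=0.05227703, field_norm=0.09375,
                                 coord=0.5 * 0.5)  # two nested coord(1/2)
print(score)  # ≈ 0.014103786, the displayed document score
```

Note how idf(docFreq=6931, maxDocs=44218) ≈ 2.853 and tf(freq=2.0) ≈ 1.414 match the intermediate values printed in the explanation tree.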
    
    Source
    SIGIR'04: Proceedings of the 27th Annual International ACM-SIGIR Conference on Research and Development in Information Retrieval. Ed.: K. Järvelin et al.
  2. Liu, X.; Turtle, H.: Real-time user interest modeling for real-time ranking (2013) 0.01
    0.009972882 = product of:
      0.019945765 = sum of:
        0.019945765 = product of:
          0.03989153 = sum of:
            0.03989153 = weight(_text_:research in 1035) [ClassicSimilarity], result of:
              0.03989153 = score(doc=1035,freq=4.0), product of:
                0.1491455 = queryWeight, product of:
                  2.8529835 = idf(docFreq=6931, maxDocs=44218)
                  0.05227703 = queryNorm
                0.2674672 = fieldWeight in 1035, product of:
                  2.0 = tf(freq=4.0), with freq of:
                    4.0 = termFreq=4.0
                  2.8529835 = idf(docFreq=6931, maxDocs=44218)
                  0.046875 = fieldNorm(doc=1035)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Abstract
    User interest, as a highly dynamic information need, is often ignored by most existing information retrieval systems. In this research, we present the results of experiments designed to evaluate the performance of a real-time interest model (RIM) that attempts to identify dynamic, changing query-level interests regarding social media outputs. Unlike most existing ranking methods, our approach targets the calculation of the probability that a user is interested in the content of a document, where that interest is itself subject to rapid change. We describe two formulations of the model (a real-time interest vector space model and a real-time interest language model), both stemming from classical relevance-ranking methods, and develop a novel methodology for evaluating the performance of RIM, using Amazon Mechanical Turk to collect (interest-based) relevance judgments on a daily basis. Our results show that the model usually, although not always, performs better than baseline results obtained from commercial web search engines. We identify factors that affect RIM performance and outline plans for future research.
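The real-time interest vector space formulation ranks documents by their similarity to a query-level interest vector. A hypothetical stdlib-only sketch of that idea (the term weights, documents, and function names are invented for illustration, not the paper's actual model):

```python
import math
from collections import Counter

def cosine(u, v):
    """Cosine similarity between two sparse term-weight dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

def interest_rank(interest_terms, documents):
    """Rank documents by similarity to the current interest vector."""
    interest_vec = Counter(interest_terms)
    doc_vecs = [(doc_id, Counter(text.lower().split()))
                for doc_id, text in documents]
    return sorted(((doc_id, cosine(interest_vec, vec))
                   for doc_id, vec in doc_vecs),
                  key=lambda p: p[1], reverse=True)

docs = [("d1", "election results live updates"),
        ("d2", "recipe for apple pie")]
ranking = interest_rank(["election", "updates"], docs)
print(ranking[0][0])  # d1 ranks first for this interest vector
```

In a real-time setting the interest vector would be re-estimated continuously from fresh user signals, which is the part RIM adds over classical vector-space ranking.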
  3. Liu, X.; Chen, X.: Authors' noninstitutional emails and their correlation with retraction (2021) 0.01
    0.009402524 = product of:
      0.018805047 = sum of:
        0.018805047 = product of:
          0.037610095 = sum of:
            0.037610095 = weight(_text_:research in 152) [ClassicSimilarity], result of:
              0.037610095 = score(doc=152,freq=2.0), product of:
                0.1491455 = queryWeight, product of:
                  2.8529835 = idf(docFreq=6931, maxDocs=44218)
                  0.05227703 = queryNorm
                0.2521705 = fieldWeight in 152, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  2.8529835 = idf(docFreq=6931, maxDocs=44218)
                  0.0625 = fieldNorm(doc=152)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Abstract
    We collected research articles from the Retraction Watch database, Scopus, and a major retraction announcement by Springer to identify the emails used by authors. Authors' emails can be either institutional or noninstitutional. The data suggest that retracted articles are more likely to use noninstitutional emails, although it is difficult to generalize. The study places particular focus on authors from China.
  4. Chen, M.; Liu, X.; Qin, J.: Semantic relation extraction from socially-generated tags : a methodology for metadata generation (2008) 0.01
    0.00885352 = product of:
      0.01770704 = sum of:
        0.01770704 = product of:
          0.03541408 = sum of:
            0.03541408 = weight(_text_:22 in 2648) [ClassicSimilarity], result of:
              0.03541408 = score(doc=2648,freq=2.0), product of:
                0.18306525 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.05227703 = queryNorm
                0.19345059 = fieldWeight in 2648, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=2648)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Source
    Metadata for semantic and social applications : proceedings of the International Conference on Dublin Core and Metadata Applications, Berlin, 22 - 26 September 2008, DC 2008: Berlin, Germany / ed. by Jane Greenberg and Wolfgang Klas
  5. Liu, X.; Croft, W.B.: Statistical language modeling for information retrieval (2004) 0.01
    0.008310735 = product of:
      0.01662147 = sum of:
        0.01662147 = product of:
          0.03324294 = sum of:
            0.03324294 = weight(_text_:research in 4277) [ClassicSimilarity], result of:
              0.03324294 = score(doc=4277,freq=4.0), product of:
                0.1491455 = queryWeight, product of:
                  2.8529835 = idf(docFreq=6931, maxDocs=44218)
                  0.05227703 = queryNorm
                0.22288933 = fieldWeight in 4277, product of:
                  2.0 = tf(freq=4.0), with freq of:
                    4.0 = termFreq=4.0
                  2.8529835 = idf(docFreq=6931, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=4277)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Abstract
    This chapter reviews research and applications in statistical language modeling for information retrieval (IR), which has emerged within the past several years as a new probabilistic framework for describing information retrieval processes. Generally speaking, statistical language modeling, or more simply language modeling (LM), involves estimating a probability distribution that captures statistical regularities of natural language use. Applied to information retrieval, language modeling refers to the problem of estimating the likelihood that a query and a document could have been generated by the same language model, given the language model of the document either with or without a language model of the query. The roots of statistical language modeling date to the beginning of the twentieth century, when Markov tried to model letter sequences in works of Russian literature (Manning & Schütze, 1999). Zipf (1929, 1932, 1949, 1965) studied the statistical properties of text and discovered that the frequency of words decays as a power function of each word's rank. However, it was Shannon's (1951) work that inspired later research in this area. In 1951, eager to explore the applications of his newly founded information theory to human language, Shannon used a prediction game involving n-grams to investigate the information content of English text. He evaluated n-gram models' performance by comparing their cross-entropy on texts with the true entropy estimated using predictions made by human subjects. For many years, statistical language models have been used primarily for automatic speech recognition. Since 1980, when the first significant language model was proposed (Rosenfeld, 2000), statistical language modeling has become a fundamental component of speech recognition, machine translation, and spelling correction.
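The query-likelihood approach described above scores a document by the probability that its language model generates the query. A minimal sketch using Jelinek-Mercer smoothing (the smoothing weight and toy collection are assumptions for illustration):

```python
import math
from collections import Counter

def query_likelihood(query, doc, collection, lam=0.5):
    """log P(q | d) with Jelinek-Mercer smoothing:
    P(t | d) = lam * tf(t, d)/|d| + (1 - lam) * cf(t)/|C|."""
    d, c = Counter(doc), Counter(collection)
    dlen, clen = len(doc), len(collection)
    logp = 0.0
    for t in query:
        p = lam * d[t] / dlen + (1 - lam) * c[t] / clen
        logp += math.log(p) if p > 0 else float("-inf")
    return logp

collection = "the cat sat on the mat the dog barked".split()
d1 = "the cat sat on the mat".split()
d2 = "the dog barked".split()
q = "cat mat".split()
# d1 contains both query terms, so its model assigns q a higher likelihood
print(query_likelihood(q, d1, collection) > query_likelihood(q, d2, collection))
```

Smoothing with collection statistics is what keeps unseen query terms from zeroing out the document score, which is the central practical issue in LM-based retrieval.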
  6. Liu, X.; Kaza, S.; Zhang, P.; Chen, H.: Determining inventor status and its effect on knowledge diffusion : a study on nanotechnology literature from China, Russia, and India (2011) 0.01
    0.008310735 = product of:
      0.01662147 = sum of:
        0.01662147 = product of:
          0.03324294 = sum of:
            0.03324294 = weight(_text_:research in 4468) [ClassicSimilarity], result of:
              0.03324294 = score(doc=4468,freq=4.0), product of:
                0.1491455 = queryWeight, product of:
                  2.8529835 = idf(docFreq=6931, maxDocs=44218)
                  0.05227703 = queryNorm
                0.22288933 = fieldWeight in 4468, product of:
                  2.0 = tf(freq=4.0), with freq of:
                    4.0 = termFreq=4.0
                  2.8529835 = idf(docFreq=6931, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=4468)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Abstract
    In an increasingly global research landscape, it is important to identify the most prolific researchers in various institutions and their influence on the diffusion of knowledge. Knowledge diffusion within institutions is influenced by not just the status of individual researchers but also the collaborative culture that determines status. There are various methods to measure individual status, but few studies have compared them or explored the possible effects of different cultures on the status measures. In this article, we examine knowledge diffusion within science and technology-oriented research organizations. Using social network analysis metrics to measure individual status in large-scale coauthorship networks, we studied an individual's impact on the recombination of knowledge to produce innovation in nanotechnology. Data from the most productive and high-impact institutions in China (Chinese Academy of Sciences), Russia (Russian Academy of Sciences), and India (Indian Institutes of Technology) were used. We found that boundary-spanning individuals influenced knowledge diffusion in all countries. However, our results also indicate that cultural and institutional differences may influence knowledge diffusion.
  7. Zhang, C.; Liu, X.; Xu, Y.(C.); Wang, Y.: Quality-structure index : a new metric to measure scientific journal influence (2011) 0.01
    0.005876578 = product of:
      0.011753156 = sum of:
        0.011753156 = product of:
          0.023506312 = sum of:
            0.023506312 = weight(_text_:research in 4366) [ClassicSimilarity], result of:
              0.023506312 = score(doc=4366,freq=2.0), product of:
                0.1491455 = queryWeight, product of:
                  2.8529835 = idf(docFreq=6931, maxDocs=44218)
                  0.05227703 = queryNorm
                0.15760657 = fieldWeight in 4366, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  2.8529835 = idf(docFreq=6931, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=4366)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Abstract
    An innovative model to measure the influence among scientific journals is developed in this study. This model is based on the path analysis of a journal citation network, and its output is a journal influence matrix that describes the directed influence among all journals. Based on this model, an index of journals' overall influence, the quality-structure index (QSI), is derived. Journal ranking based on QSI has the advantage of accounting for both intrinsic journal quality and the structural position of a journal in a citation network. The QSI also integrates the characteristics of two prevailing streams of journal-assessment measures: those based on bibliometric statistics to approximate intrinsic journal quality, such as the Journal Impact Factor, and those using a journal's structural position based on the PageRank-type of algorithm, such as the Eigenfactor score. Empirical results support our finding that the new index is significantly closer to scholars' subjective perception of journal influence than are the two aforementioned measures. In addition, the journal influence matrix offers a new way to measure two-way influences between any two academic journals, hence establishing a theoretical basis for future scientometrics studies to investigate the knowledge flow within and across research disciplines.
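The QSI combines a journal's intrinsic quality with its structural position in the citation network. As a rough illustration of the structural half only, a PageRank-style influence score can be computed over a toy citation graph by power iteration (the graph, damping value, and function name are invented; this is not the paper's actual QSI algorithm):

```python
def influence_scores(cites, damping=0.85, iters=100):
    """Power iteration over a journal citation graph.
    cites[j] = list of journals that journal j cites."""
    journals = list(cites)
    n = len(journals)
    score = {j: 1.0 / n for j in journals}
    for _ in range(iters):
        new = {j: (1 - damping) / n for j in journals}
        for j, targets in cites.items():
            share = score[j] / len(targets) if targets else 0.0
            for t in targets:
                new[t] += damping * share
        score = new
    return score

# Toy network: A and B both cite C; C cites A.
cites = {"A": ["C"], "B": ["C"], "C": ["A"]}
scores = influence_scores(cites)
print(max(scores, key=scores.get))  # C accumulates the most citation weight
```

QSI differs from a pure PageRank-type score by additionally weighting in an intrinsic-quality component, which is what the abstract contrasts against the Eigenfactor-style structural measures.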
  8. Liu, X.; Qin, J.: ¬An interactive metadata model for structural, descriptive, and referential representation of scholarly output (2014) 0.01
    0.005876578 = product of:
      0.011753156 = sum of:
        0.011753156 = product of:
          0.023506312 = sum of:
            0.023506312 = weight(_text_:research in 1253) [ClassicSimilarity], result of:
              0.023506312 = score(doc=1253,freq=2.0), product of:
                0.1491455 = queryWeight, product of:
                  2.8529835 = idf(docFreq=6931, maxDocs=44218)
                  0.05227703 = queryNorm
                0.15760657 = fieldWeight in 1253, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  2.8529835 = idf(docFreq=6931, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=1253)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Abstract
    The scientific metadata model proposed in this article encompasses both classical descriptive metadata, such as those defined in the Dublin Core Metadata Element Set (DC), and the innovative structural and referential metadata properties that go beyond the classical model. Structural metadata capture the structural vocabulary in research publications; referential metadata include not only citations but also data about other types of scholarly output that are based on or related to the same publication. The article describes the structural, descriptive, and referential (SDR) elements of the metadata model and explains the underlying assumptions and justifications for each major component in the model. ScholarWiki, an experimental system developed as a proof of concept, was built on the wiki platform to allow users to interact with the metadata and to edit, delete, and add metadata. By allowing and encouraging scholars (both as authors and as users) to participate in the knowledge- and metadata-editing and enhancing process, the larger community will benefit from more accurate and effective information retrieval. The ScholarWiki system utilizes machine-learning techniques that can automatically produce self-enhanced metadata by learning from the structural metadata that scholars contribute, adding intelligence that automatically enhances and updates the publication metadata wiki pages.
  9. Chen, Z.; Huang, Y.; Tian, J.; Liu, X.; Fu, K.; Huang, T.: Joint model for subsentence-level sentiment analysis with Markov logic (2015) 0.01
    0.005876578 = product of:
      0.011753156 = sum of:
        0.011753156 = product of:
          0.023506312 = sum of:
            0.023506312 = weight(_text_:research in 2210) [ClassicSimilarity], result of:
              0.023506312 = score(doc=2210,freq=2.0), product of:
                0.1491455 = queryWeight, product of:
                  2.8529835 = idf(docFreq=6931, maxDocs=44218)
                  0.05227703 = queryNorm
                0.15760657 = fieldWeight in 2210, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  2.8529835 = idf(docFreq=6931, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=2210)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Abstract
    Sentiment analysis mainly focuses on the study of opinions that express positive or negative sentiments. With the explosive growth of web documents, sentiment analysis is becoming a hot topic in both academic research and system design. Fine-grained sentiment analysis is traditionally solved as a two-step strategy, which results in cascade errors. Although joint models, such as joint sentiment/topic and maximum entropy (MaxEnt)/latent Dirichlet allocation, have been proposed to tackle this problem, they focus on the joint learning of both aspects and sentiments. Thus, they are not appropriate for resolving the cascade errors of sentiment analysis at the sentence or subsentence level. In this article, we present a novel joint fine-grained sentiment analysis framework at the subsentence level with Markov logic. First, we divide the task into two separate stages (subjectivity classification and polarity classification). Then, the two stages are processed, respectively, with different feature sets, which are implemented by local formulas in Markov logic. Finally, global formulas in Markov logic are adopted to realize the interactions of the two stages. The joint inference of subjectivity and polarity helps prevent cascade errors. Experiments on a Chinese sentiment data set demonstrate that our joint model brings significant improvements.
  10. Jiang, Z.; Liu, X.; Chen, Y.: Recovering uncaptured citations in a scholarly network : a two-step citation analysis to estimate publication importance (2016) 0.01
    0.005876578 = product of:
      0.011753156 = sum of:
        0.011753156 = product of:
          0.023506312 = sum of:
            0.023506312 = weight(_text_:research in 3018) [ClassicSimilarity], result of:
              0.023506312 = score(doc=3018,freq=2.0), product of:
                0.1491455 = queryWeight, product of:
                  2.8529835 = idf(docFreq=6931, maxDocs=44218)
                  0.05227703 = queryNorm
                0.15760657 = fieldWeight in 3018, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  2.8529835 = idf(docFreq=6931, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=3018)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Abstract
    The citation relationships between publications, which are significant for assessing the importance of scholarly components within a network, have been used for various scientific applications. Missing citation metadata in scholarly databases, however, create problems for classical citation-based ranking algorithms and challenge the performance of citation-based retrieval systems. In this research, we utilize a two-step citation analysis method to investigate the importance of publications for which citation information is partially missing. First, we calculate the importance of the author, and then we use that importance to estimate the publication importance for selected articles. To evaluate this method, we designed a simulation experiment, "random citation-missing," to test the two-step citation analysis, which we carried out with the Association for Computing Machinery (ACM) Digital Library (DL). In this experiment, we simulated different scenarios in a large-scale scientific digital library, from high-quality citation data to very poor-quality data. The results show that a two-step citation analysis can effectively uncover the importance of publications in different situations. More importantly, we found that the optimized impact from the importance of an author (the first step) increases exponentially as the quality of citation data decreases. The findings from this study can further enhance citation-based publication-ranking algorithms for real-world applications.
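The two-step idea, first score the author and then use the author score to estimate the importance of publications whose citation counts are missing, can be sketched as follows (the simple averaging scheme is an assumption for illustration, not the paper's exact estimator):

```python
def estimate_missing_importance(papers):
    """papers: {paper_id: (author, citation_count or None)}.
    Step 1: author importance = mean citations over that author's
    papers with known counts.
    Step 2: a paper with a missing count inherits its author's score."""
    by_author = {}
    for pid, (author, cites) in papers.items():
        if cites is not None:
            by_author.setdefault(author, []).append(cites)
    author_score = {a: sum(cs) / len(cs) for a, cs in by_author.items()}
    return {pid: cites if cites is not None
            else author_score.get(author, 0.0)
            for pid, (author, cites) in papers.items()}

papers = {"p1": ("liu", 30), "p2": ("liu", 10), "p3": ("liu", None)}
est = estimate_missing_importance(papers)
print(est["p3"])  # 20.0, inherited from the author's known papers
```

The "random citation-missing" simulation in the abstract amounts to deleting citation counts (setting them to None here) at varying rates and checking how well the estimated importance tracks the true one.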
  11. Liu, X.; Hu, M.; Xiao, B.S.; Shao, J.: Is my doctor around me? : Investigating the impact of doctors' presence on patients' review behaviors on an online health platform (2022) 0.01
    0.005876578 = product of:
      0.011753156 = sum of:
        0.011753156 = product of:
          0.023506312 = sum of:
            0.023506312 = weight(_text_:research in 650) [ClassicSimilarity], result of:
              0.023506312 = score(doc=650,freq=2.0), product of:
                0.1491455 = queryWeight, product of:
                  2.8529835 = idf(docFreq=6931, maxDocs=44218)
                  0.05227703 = queryNorm
                0.15760657 = fieldWeight in 650, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  2.8529835 = idf(docFreq=6931, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=650)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Abstract
    Patient-generated online reviews are well established as an important source of information for people to evaluate doctors' quality and improve health outcomes. However, how such reviews are generated in the first place is not well examined. This study examines a hitherto unexplored social driver of online review generation: doctors' presence on online health platforms, which results in the reviewers (i.e., patients) and the reviewees (i.e., doctors) coexisting in the same medium. Drawing on the Stimulus-Organism-Response theory as an overarching framework, we advance hypotheses about the impact of doctors' presence on their patients' review behaviors, including review volume, review effort, and emotional expression. To achieve causal identification, we conduct a quasi-experiment on a large online health platform and employ propensity score matching and difference-in-differences estimation. Our findings show that doctors' presence increases their patients' review volume. Furthermore, doctors' presence motivates their patients to exert greater effort and express more positive emotions in the review text. The results also show that the presence of doctors with higher professional titles has a stronger effect on review volume than the presence of doctors with lower professional titles. Our findings offer important implications for both research and practice.
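The difference-in-differences estimator used for causal identification subtracts the control group's pre-post change from the treatment group's pre-post change. A minimal sketch with hypothetical review-volume means (all values invented for illustration):

```python
def diff_in_diff(treat_pre, treat_post, ctrl_pre, ctrl_post):
    """DiD effect: treated group's change minus control group's change,
    which nets out common time trends shared by both groups."""
    return (treat_post - treat_pre) - (ctrl_post - ctrl_pre)

# Hypothetical mean reviews per doctor, before/after the doctor joins:
effect = diff_in_diff(treat_pre=4.0, treat_post=7.5,
                      ctrl_pre=4.0, ctrl_post=5.0)
print(effect)  # 2.5 extra reviews attributable to doctors' presence
```

Propensity score matching is applied first so that the treated and control doctors are comparable before this subtraction is taken.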
  12. Cui, Y.; Wang, Y.; Liu, X.; Wang, X.; Zhang, X.: Multidimensional scholarly citations : characterizing and understanding scholars' citation behaviors (2023) 0.01
    0.005876578 = product of:
      0.011753156 = sum of:
        0.011753156 = product of:
          0.023506312 = sum of:
            0.023506312 = weight(_text_:research in 847) [ClassicSimilarity], result of:
              0.023506312 = score(doc=847,freq=2.0), product of:
                0.1491455 = queryWeight, product of:
                  2.8529835 = idf(docFreq=6931, maxDocs=44218)
                  0.05227703 = queryNorm
                0.15760657 = fieldWeight in 847, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  2.8529835 = idf(docFreq=6931, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=847)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Abstract
    This study investigates scholars' citation behaviors from a fine-grained perspective. Specifically, each scholarly citation is considered multidimensional rather than logically unidimensional (i.e., present or absent). Thirty million articles from PubMed were accessed for use in empirical research, in which a total of 15 interpretable features of scholarly citations were constructed and grouped into three main categories. Each category corresponds to one aspect of the reasons and motivations behind scholars' citation decision-making during academic writing. Using about 500,000 pairs of actual and randomly generated scholarly citations, a series of Random Forest-based classification experiments were conducted to quantitatively evaluate the correlation between each constructed citation feature and the citation decisions made by scholars. Our experimental results indicate that citation proximity is the category most relevant to scholars' citation decision-making, followed by citation authority and citation inertia. However, big-name scholars whose h-indexes rank among the top 1% exhibit a unique pattern of citation behaviors: their citation decision-making correlates most closely with citation inertia, with the correlation nearly three times as strong as that of their ordinary counterparts. Hopefully, the empirical findings presented in this paper can bring us closer to characterizing and understanding the complex process of generating scholarly citations in academia.