Search (17 results, page 1 of 1)

  • author_ss:"Liu, X."
  1. Frias-Martinez, E.; Chen, S.Y.; Liu, X.: Automatic cognitive style identification of digital library users for personalization (2007) 0.03
    0.029936418 = product of:
      0.104777455 = sum of:
        0.08978786 = weight(_text_:interactions in 74) [ClassicSimilarity], result of:
          0.08978786 = score(doc=74,freq=2.0), product of:
            0.22965278 = queryWeight, product of:
              5.8977947 = idf(docFreq=329, maxDocs=44218)
              0.038938753 = queryNorm
            0.39097226 = fieldWeight in 74, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              5.8977947 = idf(docFreq=329, maxDocs=44218)
              0.046875 = fieldNorm(doc=74)
        0.014989593 = weight(_text_:with in 74) [ClassicSimilarity], result of:
          0.014989593 = score(doc=74,freq=2.0), product of:
            0.09383348 = queryWeight, product of:
              2.409771 = idf(docFreq=10797, maxDocs=44218)
              0.038938753 = queryNorm
            0.15974675 = fieldWeight in 74, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              2.409771 = idf(docFreq=10797, maxDocs=44218)
              0.046875 = fieldNorm(doc=74)
      0.2857143 = coord(2/7)
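    The explain tree above (and the analogous trees for the other results) decomposes the ClassicSimilarity score: for each matching query term, the weight is the product of tf, idf, fieldNorm, and queryNorm (shown as queryWeight times fieldWeight); the term weights are summed and scaled by the coordination factor coord. A minimal Python sketch reproducing the 0.03 shown for this record, using only the numbers in the tree above:

      # Reconstructing the score for doc 74 from the explain output above.
      def term_score(freq, idf, field_norm, query_norm):
          tf = freq ** 0.5                          # tf(freq) = sqrt(freq)
          query_weight = idf * query_norm           # queryWeight = idf * queryNorm
          field_weight = tf * idf * field_norm      # fieldWeight = tf * idf * fieldNorm
          return query_weight * field_weight

      query_norm = 0.038938753
      s_interactions = term_score(2.0, 5.8977947, 0.046875, query_norm)   # 0.08978786
      s_with         = term_score(2.0, 2.409771,  0.046875, query_norm)   # 0.014989593
      coord = 2.0 / 7.0                              # 2 of the 7 query terms matched
      print((s_interactions + s_with) * coord)       # ~0.029936, i.e. the 0.03 shown for result 1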
    
    Abstract
    Digital libraries have become one of the most important Web services for information seeking. One of their main drawbacks is their global approach: in general, there is just one interface for all users. A key element in improving user satisfaction in digital libraries is therefore personalization. Among personalizing factors, cognitive style has been shown to be one of the relevant parameters that affect information seeking, which justifies introducing cognitive style as a parameter of a personalized Web service. Nevertheless, this approach has one major drawback: each user has to take a time-consuming test to determine his or her cognitive style. In this article, we present a study of how different classification systems can be used to automatically identify the cognitive style of a user from that user's interactions with a digital library. These classification systems can then be used to automatically personalize, from a cognitive-style point of view, the interaction between the digital library and each of its users.
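    As a rough, hedged illustration of the approach described above (the interaction features and data below are invented placeholders, not the study's actual variables), a classifier can be trained to map logged interaction features to a cognitive-style label obtained once from an offline test:

      # Hypothetical sketch: predict cognitive style from digital-library interaction logs.
      import numpy as np
      from sklearn.model_selection import cross_val_score
      from sklearn.tree import DecisionTreeClassifier

      rng = np.random.default_rng(0)
      X = rng.random((200, 3))     # e.g. queries per session, browsing depth, dwell time (assumed features)
      y = rng.integers(0, 3, 200)  # style label from a one-off cognitive-style test
      clf = DecisionTreeClassifier(max_depth=3)
      print(cross_val_score(clf, X, y, cv=5).mean())   # how well the logged interactions predict style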
  2. Chen, Z.; Huang, Y.; Tian, J.; Liu, X.; Fu, K.; Huang, T.: Joint model for subsentence-level sentiment analysis with Markov logic (2015) 0.03
    0.028515965 = product of:
      0.09980587 = sum of:
        0.074823216 = weight(_text_:interactions in 2210) [ClassicSimilarity], result of:
          0.074823216 = score(doc=2210,freq=2.0), product of:
            0.22965278 = queryWeight, product of:
              5.8977947 = idf(docFreq=329, maxDocs=44218)
              0.038938753 = queryNorm
            0.3258102 = fieldWeight in 2210, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              5.8977947 = idf(docFreq=329, maxDocs=44218)
              0.0390625 = fieldNorm(doc=2210)
        0.024982655 = weight(_text_:with in 2210) [ClassicSimilarity], result of:
          0.024982655 = score(doc=2210,freq=8.0), product of:
            0.09383348 = queryWeight, product of:
              2.409771 = idf(docFreq=10797, maxDocs=44218)
              0.038938753 = queryNorm
            0.2662446 = fieldWeight in 2210, product of:
              2.828427 = tf(freq=8.0), with freq of:
                8.0 = termFreq=8.0
              2.409771 = idf(docFreq=10797, maxDocs=44218)
              0.0390625 = fieldNorm(doc=2210)
      0.2857143 = coord(2/7)
    
    Abstract
    Sentiment analysis mainly focuses on the study of opinions that express positive or negative sentiments. With the explosive growth of web documents, sentiment analysis is becoming a hot topic in both academic research and system design. Fine-grained sentiment analysis is traditionally solved with a 2-step strategy, which results in cascade errors. Although joint models, such as joint sentiment/topic and maximum entropy (MaxEnt)/latent Dirichlet allocation, have been proposed to tackle this problem, they focus on the joint learning of aspects and sentiments and are therefore not appropriate for resolving cascade errors in sentiment analysis at the sentence or subsentence level. In this article, we present a novel joint fine-grained sentiment analysis framework at the subsentence level based on Markov logic. First, we divide the task into 2 separate stages (subjectivity classification and polarity classification). Then, the 2 stages are processed, respectively, with different feature sets, which are implemented by local formulas in Markov logic. Finally, global formulas in Markov logic are adopted to realize the interactions of the 2 stages. The joint inference of subjectivity and polarity helps prevent cascade errors. Experiments on a Chinese sentiment data set show that our joint model brings significant improvements.
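    A hedged sketch of the 2-step baseline the abstract contrasts with (toy data; the paper replaces this cascade with joint inference over Markov logic local and global formulas, which is not reproduced here):

      # Stage 1: subjectivity classification; Stage 2: polarity classification on the
      # units predicted subjective. Errors in stage 1 cascade into stage 2.
      from sklearn.feature_extraction.text import TfidfVectorizer
      from sklearn.linear_model import LogisticRegression
      from sklearn.pipeline import make_pipeline

      units      = ["the screen is gorgeous", "the battery was replaced", "the keyboard feels awful"]
      subjective = [1, 0, 1]                 # stage 1 labels
      polarity   = [1, None, 0]              # stage 2 labels, defined only for subjective units

      stage1 = make_pipeline(TfidfVectorizer(), LogisticRegression()).fit(units, subjective)
      subj = [(u, p) for u, s, p in zip(units, subjective, polarity) if s == 1]
      stage2 = make_pipeline(TfidfVectorizer(), LogisticRegression()).fit(
          [u for u, _ in subj], [p for _, p in subj])
      print(stage2.predict(["the speakers sound awful"]))   # only meaningful if stage 1 fired correctly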
  3. Chen, M.; Liu, X.; Qin, J.: Semantic relation extraction from socially-generated tags : a methodology for metadata generation (2008) 0.01
    0.009949936 = product of:
      0.034824774 = sum of:
        0.021635616 = weight(_text_:with in 2648) [ClassicSimilarity], result of:
          0.021635616 = score(doc=2648,freq=6.0), product of:
            0.09383348 = queryWeight, product of:
              2.409771 = idf(docFreq=10797, maxDocs=44218)
              0.038938753 = queryNorm
            0.2305746 = fieldWeight in 2648, product of:
              2.4494898 = tf(freq=6.0), with freq of:
                6.0 = termFreq=6.0
              2.409771 = idf(docFreq=10797, maxDocs=44218)
              0.0390625 = fieldNorm(doc=2648)
        0.013189158 = product of:
          0.026378317 = sum of:
            0.026378317 = weight(_text_:22 in 2648) [ClassicSimilarity], result of:
              0.026378317 = score(doc=2648,freq=2.0), product of:
                0.13635688 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.038938753 = queryNorm
                0.19345059 = fieldWeight in 2648, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=2648)
          0.5 = coord(1/2)
      0.2857143 = coord(2/7)
    
    Abstract
    The growing predominance of social semantics in the form of tagging presents the metadata community with both opportunities and challenges in leveraging this new form of content representation for retrieval. One key challenge is the absence of contextual information associated with these tags. This paper presents an experiment that uses Flickr tags as an example of drawing on social semantics sources to enrich subject metadata. The procedure included four steps: 1) collecting a sample of Flickr tags, 2) calculating co-occurrences between tags through mutual information, 3) tracing contextual information for tag pairs via Google search results, and 4) applying natural language processing and machine learning techniques to extract semantic relations between tags. The experiment helped us to build a context sentence collection from the Google search results, which was then processed by natural language processing and machine learning algorithms. This new approach achieved a reasonably good rate of accuracy in assigning semantic relations to tag pairs. The paper also explores the implications of this approach for using social semantics to enrich subject metadata.
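    A minimal, hedged sketch of step 2, reading "mutual information" as pointwise mutual information between co-occurring tags (the tag sets below are invented):

      # PMI(a, b) = log2( P(a, b) / (P(a) * P(b)) ), estimated from tag co-occurrence counts.
      from collections import Counter
      from itertools import combinations
      from math import log2

      photos = [{"sunset", "beach", "ocean"}, {"beach", "ocean"}, {"sunset", "city"}]
      n = len(photos)
      tag_freq = Counter(t for p in photos for t in p)
      pair_freq = Counter(frozenset(c) for p in photos for c in combinations(p, 2))

      def pmi(a, b):
          p_ab = pair_freq[frozenset((a, b))] / n
          return log2(p_ab / ((tag_freq[a] / n) * (tag_freq[b] / n)))

      print(pmi("beach", "ocean"))   # high PMI -> candidate pair for relation extraction (steps 3-4)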
    Source
    Metadata for semantic and social applications : proceedings of the International Conference on Dublin Core and Metadata Applications, Berlin, 22 - 26 September 2008, DC 2008: Berlin, Germany / ed. by Jane Greenberg and Wolfgang Klas
  4. Liu, X.; Zhang, J.; Guo, C.: Full-text citation analysis : a new method to enhance scholarly networks (2013) 0.00
    0.0043710545 = product of:
      0.03059738 = sum of:
        0.03059738 = weight(_text_:with in 1044) [ClassicSimilarity], result of:
          0.03059738 = score(doc=1044,freq=12.0), product of:
            0.09383348 = queryWeight, product of:
              2.409771 = idf(docFreq=10797, maxDocs=44218)
              0.038938753 = queryNorm
            0.3260817 = fieldWeight in 1044, product of:
              3.4641016 = tf(freq=12.0), with freq of:
                12.0 = termFreq=12.0
              2.409771 = idf(docFreq=10797, maxDocs=44218)
              0.0390625 = fieldNorm(doc=1044)
      0.14285715 = coord(1/7)
    
    Abstract
    In this article, we use innovative full-text citation analysis along with supervised topic modeling and network-analysis algorithms to enhance classical bibliometric analysis and publication/author/venue ranking. By utilizing citation contexts extracted from a large number of full-text publications, each citation or publication is represented by a probability distribution over a set of predefined topics, where each topic is labeled by an author-contributed keyword. We then use the publication/citation topic distributions to generate a citation graph with vertex prior and edge transition probability distributions. The publication importance score for each given topic is calculated by PageRank with edge and vertex prior distributions. To evaluate this work, we sampled 104 topics (labeled with keywords) in review papers. The cited publications of each review paper are assumed to be "important publications" for the target topic (keyword), and we use these cited publications to validate our topic-ranking results and to compare different publication-ranking lists. Evaluation results show that full-text citations and publication content prior topic distributions, along with the classical PageRank algorithm, can significantly enhance bibliometric analysis and scientific publication ranking performance, compared with term frequency-inverse document frequency (tf-idf), language model, BM25, PageRank, and PageRank + language model approaches (p < .001), for academic information retrieval (IR) systems.
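    A hedged sketch of the "PageRank with vertex prior" idea using personalized PageRank (the toy graph and topic priors are invented; the paper additionally derives edge weights from citation-context topic distributions):

      # Rank publications for one topic: the personalization vector plays the role
      # of the per-topic vertex prior distribution.
      import networkx as nx

      g = nx.DiGraph()
      g.add_edges_from([("p1", "p2"), ("p1", "p3"), ("p2", "p3"), ("p4", "p3")])  # citing -> cited
      topic_prior = {"p1": 0.4, "p2": 0.3, "p3": 0.2, "p4": 0.1}                  # P(topic | publication)
      scores = nx.pagerank(g, alpha=0.85, personalization=topic_prior)
      print(sorted(scores.items(), key=lambda kv: -kv[1]))                        # topic-specific ranking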
  5. Liu, X.; Bu, Y.; Li, M.; Li, J.: Monodisciplinary collaboration disrupts science more than multidisciplinary collaboration (2024) 0.00
    0.0037089628 = product of:
      0.025962738 = sum of:
        0.025962738 = weight(_text_:with in 1202) [ClassicSimilarity], result of:
          0.025962738 = score(doc=1202,freq=6.0), product of:
            0.09383348 = queryWeight, product of:
              2.409771 = idf(docFreq=10797, maxDocs=44218)
              0.038938753 = queryNorm
            0.2766895 = fieldWeight in 1202, product of:
              2.4494898 = tf(freq=6.0), with freq of:
                6.0 = termFreq=6.0
              2.409771 = idf(docFreq=10797, maxDocs=44218)
              0.046875 = fieldNorm(doc=1202)
      0.14285715 = coord(1/7)
    
    Abstract
    Collaboration across disciplines is a critical form of scientific collaboration for solving complex problems and making innovative contributions. This study focuses on the association between multidisciplinary collaboration, measured by coauthorship in publications, and the disruption of publications, measured by the Disruption (D) index. We used authors' affiliations as a proxy for the disciplines to which they belong and categorized each article as multidisciplinary or monodisciplinary collaboration. The D index quantifies the extent to which a study disrupts its predecessors. We selected 13 journals that publish articles in six disciplines from the Microsoft Academic Graph (MAG) database and then constructed regression models with fixed effects to estimate the relationship between the variables. The findings show that articles with monodisciplinary collaboration are more disruptive than those with multidisciplinary collaboration. Furthermore, we uncovered the mechanism by which monodisciplinary collaboration disrupts science more than multidisciplinary collaboration by exploring the references of the sampled publications.
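    For reference, a hedged sketch of the Disruption (D) index as commonly defined, D = (n_i - n_j) / (n_i + n_j + n_k), computed over the papers that cite the focal paper and/or its references:

      # n_i: papers citing the focal paper but none of its references (disruptive signal)
      # n_j: papers citing both the focal paper and its references (consolidating signal)
      # n_k: papers citing only the focal paper's references
      def disruption(citing_focal, citing_refs):
          n_i = len(citing_focal - citing_refs)
          n_j = len(citing_focal & citing_refs)
          n_k = len(citing_refs - citing_focal)
          return (n_i - n_j) / (n_i + n_j + n_k)

      print(disruption(citing_focal={"a", "b", "c"}, citing_refs={"c", "d"}))   # (2 - 1) / 4 = 0.25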
  6. Yang, Y.; Liu, X.: ¬A re-examination of text categorization methods (1999) 0.00
    0.0035330812 = product of:
      0.024731567 = sum of:
        0.024731567 = weight(_text_:with in 3386) [ClassicSimilarity], result of:
          0.024731567 = score(doc=3386,freq=4.0), product of:
            0.09383348 = queryWeight, product of:
              2.409771 = idf(docFreq=10797, maxDocs=44218)
              0.038938753 = queryNorm
            0.2635687 = fieldWeight in 3386, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              2.409771 = idf(docFreq=10797, maxDocs=44218)
              0.0546875 = fieldNorm(doc=3386)
      0.14285715 = coord(1/7)
    
    Abstract
    This paper reports a controlled study with statistical significance tests on five text categorization methods: Support Vector Machines (SVM), a k-Nearest Neighbor (kNN) classifier, a neural network (NNet) approach, Linear Least-Squares Fit (LLSF) mapping, and a Naive Bayes (NB) classifier. We focus on the robustness of these methods in dealing with a skewed category distribution, and on their performance as a function of the training-set category frequency. Our results show that SVM, kNN, and LLSF significantly outperform NNet and NB when the number of positive training instances per category is small (fewer than 10), and that all the methods perform comparably when the categories are sufficiently common (over 300 instances).
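    A hedged, modern-library sketch of this kind of comparison (scikit-learn on the 20 Newsgroups corpus as a stand-in; downloading the data requires network access, and the LLSF and NNet methods are omitted):

      from sklearn.datasets import fetch_20newsgroups
      from sklearn.feature_extraction.text import TfidfVectorizer
      from sklearn.model_selection import cross_val_score
      from sklearn.naive_bayes import MultinomialNB
      from sklearn.neighbors import KNeighborsClassifier
      from sklearn.pipeline import make_pipeline
      from sklearn.svm import LinearSVC

      data = fetch_20newsgroups(subset="train", categories=["sci.space", "rec.autos", "talk.politics.misc"])
      for name, clf in [("SVM", LinearSVC()), ("kNN", KNeighborsClassifier()), ("NB", MultinomialNB())]:
          pipe = make_pipeline(TfidfVectorizer(), clf)
          print(name, cross_val_score(pipe, data.data, data.target, cv=3, scoring="f1_macro").mean())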
  7. Liu, X.; Yu, S.; Janssens, F.; Glänzel, W.; Moreau, Y.; Moor, B.de: Weighted hybrid clustering by combining text mining and bibliometrics on a large-scale journal database (2010) 0.00
    0.0030283553 = product of:
      0.021198487 = sum of:
        0.021198487 = weight(_text_:with in 3464) [ClassicSimilarity], result of:
          0.021198487 = score(doc=3464,freq=4.0), product of:
            0.09383348 = queryWeight, product of:
              2.409771 = idf(docFreq=10797, maxDocs=44218)
              0.038938753 = queryNorm
            0.22591603 = fieldWeight in 3464, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              2.409771 = idf(docFreq=10797, maxDocs=44218)
              0.046875 = fieldNorm(doc=3464)
      0.14285715 = coord(1/7)
    
    Abstract
    We propose a new hybrid clustering framework to incorporate text mining with bibliometrics in journal set analysis. The framework integrates two different approaches: clustering ensemble and kernel-fusion clustering. To improve the flexibility and efficiency of processing large-scale data, we propose an information-based weighting scheme to leverage the effect of multiple data sources in hybrid clustering. Three different algorithms are extended with the proposed weighting scheme and employed on a large journal set retrieved from the Web of Science (WoS) database. The clustering performance of the proposed algorithms is systematically evaluated using multiple evaluation methods and cross-compared with alternative methods. Experimental results demonstrate that the proposed weighted hybrid clustering strategy is superior to other methods in clustering performance and efficiency. The proposed approach also provides a more refined structural mapping of journal sets, which is useful for monitoring and detecting new trends in different scientific fields.
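    A hedged sketch of the weighted kernel-fusion step (random placeholder kernels stand in for the text-based and citation-based journal similarity matrices, and fixed weights stand in for the paper's information-based weighting scheme):

      import numpy as np
      from sklearn.cluster import SpectralClustering

      rng = np.random.default_rng(1)
      def toy_kernel(n, d=5):
          a = rng.random((n, d))
          k = a @ a.T
          return k / np.outer(np.sqrt(np.diag(k)), np.sqrt(np.diag(k)))   # cosine-normalized kernel

      k_text, k_cite = toy_kernel(12), toy_kernel(12)
      w_text, w_cite = 0.6, 0.4                                           # assumed source weights
      fused = w_text * k_text + w_cite * k_cite
      labels = SpectralClustering(n_clusters=3, affinity="precomputed", random_state=0).fit_predict(fused)
      print(labels)                                                       # journal cluster assignments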
  8. Liu, X.; Chen, X.: Authors' noninstitutional emails and their correlation with retraction (2021) 0.00
    0.0028551605 = product of:
      0.019986123 = sum of:
        0.019986123 = weight(_text_:with in 152) [ClassicSimilarity], result of:
          0.019986123 = score(doc=152,freq=2.0), product of:
            0.09383348 = queryWeight, product of:
              2.409771 = idf(docFreq=10797, maxDocs=44218)
              0.038938753 = queryNorm
            0.21299566 = fieldWeight in 152, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              2.409771 = idf(docFreq=10797, maxDocs=44218)
              0.0625 = fieldNorm(doc=152)
      0.14285715 = coord(1/7)
    
  9. Liu, X.; Croft, W.B.: Statistical language modeling for information retrieval (2004) 0.00
    0.0025236295 = product of:
      0.017665405 = sum of:
        0.017665405 = weight(_text_:with in 4277) [ClassicSimilarity], result of:
          0.017665405 = score(doc=4277,freq=4.0), product of:
            0.09383348 = queryWeight, product of:
              2.409771 = idf(docFreq=10797, maxDocs=44218)
              0.038938753 = queryNorm
            0.18826336 = fieldWeight in 4277, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              2.409771 = idf(docFreq=10797, maxDocs=44218)
              0.0390625 = fieldNorm(doc=4277)
      0.14285715 = coord(1/7)
    
    Abstract
    This chapter reviews research and applications in statistical language modeling for information retrieval (IR), which has emerged within the past several years as a new probabilistic framework for describing information retrieval processes. Generally speaking, statistical language modeling, or more simply language modeling (LM), involves estimating a probability distribution that captures statistical regularities of natural language use. Applied to information retrieval, language modeling refers to the problem of estimating the likelihood that a query and a document could have been generated by the same language model, given the language model of the document either with or without a language model of the query. The roots of statistical language modeling date to the beginning of the twentieth century, when Markov tried to model letter sequences in works of Russian literature (Manning & Schütze, 1999). Zipf (1929, 1932, 1949, 1965) studied the statistical properties of text and discovered that the frequency of words decays as a power function of each word's rank. However, it was Shannon's (1951) work that inspired later research in this area. In 1951, eager to explore the applications of his newly founded information theory to human language, Shannon used a prediction game involving n-grams to investigate the information content of English text. He evaluated n-gram models' performance by comparing their cross-entropy on texts with the true entropy estimated using predictions made by human subjects. For many years, statistical language models were used primarily for automatic speech recognition. Since 1980, when the first significant language model was proposed (Rosenfeld, 2000), statistical language modeling has become a fundamental component of speech recognition, machine translation, and spelling correction.
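    A minimal sketch of the query-likelihood reading described above, with Dirichlet smoothing against a collection model (two toy documents; the smoothing parameter mu is an assumption):

      from collections import Counter
      from math import log

      docs = {"d1": "statistical language model for retrieval".split(),
              "d2": "markov modeled letter sequences in russian literature".split()}
      collection = Counter(w for d in docs.values() for w in d)
      c_len = sum(collection.values())

      def query_likelihood(query, doc, mu=10.0):
          tf = Counter(doc)
          # log P(query | doc model), smoothed toward the collection model
          return sum(log((tf[w] + mu * collection[w] / c_len) / (len(doc) + mu)) for w in query)

      query = "language model".split()
      print(sorted(docs, key=lambda d: query_likelihood(query, docs[d]), reverse=True))   # d1 ranks first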
  10. Clewley, N.; Chen, S.Y.; Liu, X.: Cognitive styles and search engine preferences : field dependence/independence vs holism/serialism (2010) 0.00
    0.0025236295 = product of:
      0.017665405 = sum of:
        0.017665405 = weight(_text_:with in 3961) [ClassicSimilarity], result of:
          0.017665405 = score(doc=3961,freq=4.0), product of:
            0.09383348 = queryWeight, product of:
              2.409771 = idf(docFreq=10797, maxDocs=44218)
              0.038938753 = queryNorm
            0.18826336 = fieldWeight in 3961, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              2.409771 = idf(docFreq=10797, maxDocs=44218)
              0.0390625 = fieldNorm(doc=3961)
      0.14285715 = coord(1/7)
    
    Abstract
    Purpose - Cognitive style has been identified as significantly influential in shaping users' preferences for search engines. In particular, Witkin's field dependence/independence has been widely studied in the area of web searching. It has been suggested that this cognitive style has conceptual links with holism/serialism. This study aims to investigate the differences between field dependence/independence and holism/serialism. Design/methodology/approach - An empirical study was conducted with 120 students from a UK university. Riding's cognitive style analysis (CSA) and Ford's study preference questionnaire (SPQ) were used to identify the students' cognitive styles. A questionnaire was designed to identify users' preferences for the design of search engines. Data mining techniques were applied to analyse the data obtained from the empirical study. Findings - The results highlight three findings. First, a fundamental link is confirmed between the two cognitive styles. Second, the relationship between field-dependent users and holists appears more prominent than that between field-independent users and serialists. Third, the interface design preferences of field-dependent and field-independent users can be split more clearly than those of holists and serialists. Originality/value - The contributions of this study include a deeper understanding of the similarities and differences between field dependence/independence and holism/serialism, as well as a novel methodology for data analysis.
  11. Liu, X.; Hu, M.; Xiao, B.S.; Shao, J.: Is my doctor around me? : Investigating the impact of doctors' presence on patients' review behaviors on an online health platform (2022) 0.00
    0.0025236295 = product of:
      0.017665405 = sum of:
        0.017665405 = weight(_text_:with in 650) [ClassicSimilarity], result of:
          0.017665405 = score(doc=650,freq=4.0), product of:
            0.09383348 = queryWeight, product of:
              2.409771 = idf(docFreq=10797, maxDocs=44218)
              0.038938753 = queryNorm
            0.18826336 = fieldWeight in 650, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              2.409771 = idf(docFreq=10797, maxDocs=44218)
              0.0390625 = fieldNorm(doc=650)
      0.14285715 = coord(1/7)
    
    Abstract
    Patient-generated online reviews are well-established as an important source of information for people to evaluate doctors' quality and improve health outcomes. However, how such reviews are generated in the first place is not well examined. This study examines a hitherto unexplored social driver of online review generation: doctors' presence on online health platforms, which results in the reviewers (i.e., patients) and the reviewees (i.e., doctors) coexisting in the same medium. Drawing on the Stimulus-Organism-Response theory as an overarching framework, we advance hypotheses about the impact of doctors' presence on their patients' review behaviors, including review volume, review effort, and emotional expression. To achieve causal identification, we conduct a quasi-experiment on a large online health platform and employ propensity score matching and difference-in-differences estimation. Our findings show that doctors' presence increases their patients' review volume. Furthermore, doctors' presence motivates their patients to exert greater effort and express more positive emotions in the review text. The results also show that the presence of doctors with higher professional titles has a stronger effect on review volume than the presence of doctors with lower professional titles. Our findings offer important implications both for research and practice.
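    A hedged sketch of the difference-in-differences step on a simulated panel (treated = the doctor is present on the platform, post = the period after presence begins; the propensity score matching step is omitted):

      import numpy as np
      import pandas as pd
      import statsmodels.formula.api as smf

      rng = np.random.default_rng(2)
      n = 400
      df = pd.DataFrame({"treated": rng.integers(0, 2, n), "post": rng.integers(0, 2, n)})
      df["reviews"] = 5 + 2 * df.treated + 1 * df.post + 3 * df.treated * df.post + rng.normal(0, 1, n)
      did = smf.ols("reviews ~ treated * post", data=df).fit()
      print(did.params["treated:post"])   # the difference-in-differences estimate (~3 by construction)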
  12. Cui, Y.; Wang, Y.; Liu, X.; Wang, X.; Zhang, X.: Multidimensional scholarly citations : characterizing and understanding scholars' citation behaviors (2023) 0.00
    0.0025236295 = product of:
      0.017665405 = sum of:
        0.017665405 = weight(_text_:with in 847) [ClassicSimilarity], result of:
          0.017665405 = score(doc=847,freq=4.0), product of:
            0.09383348 = queryWeight, product of:
              2.409771 = idf(docFreq=10797, maxDocs=44218)
              0.038938753 = queryNorm
            0.18826336 = fieldWeight in 847, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              2.409771 = idf(docFreq=10797, maxDocs=44218)
              0.0390625 = fieldNorm(doc=847)
      0.14285715 = coord(1/7)
    
    Abstract
    This study investigates scholars' citation behaviors from a fine-grained perspective. Specifically, each scholarly citation is considered multidimensional rather than logically unidimensional (i.e., present or absent). Thirty million articles from PubMed were accessed for use in empirical research, in which a total of 15 interpretable features of scholarly citations were constructed and grouped into three main categories. Each category corresponds to one aspect of the reasons and motivations behind scholars' citation decision-making during academic writing. Using about 500,000 pairs of actual and randomly generated scholarly citations, a series of Random Forest-based classification experiments were conducted to quantitatively evaluate the correlation between each constructed citation feature and citation decisions made by scholars. Our experimental results indicate that citation proximity is the category most relevant to scholars' citation decision-making, followed by citation authority and citation inertia. However, big-name scholars whose h-indexes rank among the top 1% exhibit a unique pattern of citation behaviors: their citation decision-making correlates most closely with citation inertia, with the correlation nearly three times as strong as that of their ordinary counterparts. Hopefully, the empirical findings presented in this paper can bring us closer to characterizing and understanding the complex process of generating scholarly citations in academia.
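    A hedged sketch of the classification setup (the features and data are placeholders for the paper's 15 constructed citation features; actual versus randomly generated citation pairs supply the labels):

      import numpy as np
      from sklearn.ensemble import RandomForestClassifier

      rng = np.random.default_rng(3)
      n = 1000
      X = np.column_stack([rng.random(n),   # e.g. citation proximity (text similarity, assumed feature)
                           rng.random(n),   # e.g. citation authority (cited author's standing, assumed)
                           rng.random(n)])  # e.g. citation inertia (prior citations to the same work, assumed)
      y = rng.integers(0, 2, n)             # 1 = actual citation pair, 0 = randomly generated pair
      rf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)
      print(rf.feature_importances_)        # proxy for each feature's relevance to citation decisions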
  13. Kwasnik, B.H.; Liu, X.: Classification structures in the changing environment of active commercial websites : the case of eBay.com (2000) 0.00
    0.0021413704 = product of:
      0.014989593 = sum of:
        0.014989593 = weight(_text_:with in 122) [ClassicSimilarity], result of:
          0.014989593 = score(doc=122,freq=2.0), product of:
            0.09383348 = queryWeight, product of:
              2.409771 = idf(docFreq=10797, maxDocs=44218)
              0.038938753 = queryNorm
            0.15974675 = fieldWeight in 122, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              2.409771 = idf(docFreq=10797, maxDocs=44218)
              0.046875 = fieldNorm(doc=122)
      0.14285715 = coord(1/7)
    
    Abstract
    This paper reports on a portion of a larger ongoing project. We address the issues of information organization and retrieval in large, active commercial websites. More specifically, we address the use of classification for providing access to the contents of such sites. We approach this analysis by describing the functionality and structure of the classification scheme of one representative large, active commercial website: eBay.com, a web-based auction site serving millions of users and items. We compare eBay's classification scheme with the Art & Architecture Thesaurus, which is a tool for describing and providing access to material culture.
  14. Liu, X.; Guo, C.; Zhang, L.: Scholar metadata and knowledge generation with human and artificial intelligence (2014) 0.00
    0.0021413704 = product of:
      0.014989593 = sum of:
        0.014989593 = weight(_text_:with in 1287) [ClassicSimilarity], result of:
          0.014989593 = score(doc=1287,freq=2.0), product of:
            0.09383348 = queryWeight, product of:
              2.409771 = idf(docFreq=10797, maxDocs=44218)
              0.038938753 = queryNorm
            0.15974675 = fieldWeight in 1287, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              2.409771 = idf(docFreq=10797, maxDocs=44218)
              0.046875 = fieldNorm(doc=1287)
      0.14285715 = coord(1/7)
    
  15. Liu, X.; Qin, J.: ¬An interactive metadata model for structural, descriptive, and referential representation of scholarly output (2014) 0.00
    0.0017844755 = product of:
      0.012491328 = sum of:
        0.012491328 = weight(_text_:with in 1253) [ClassicSimilarity], result of:
          0.012491328 = score(doc=1253,freq=2.0), product of:
            0.09383348 = queryWeight, product of:
              2.409771 = idf(docFreq=10797, maxDocs=44218)
              0.038938753 = queryNorm
            0.1331223 = fieldWeight in 1253, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              2.409771 = idf(docFreq=10797, maxDocs=44218)
              0.0390625 = fieldNorm(doc=1253)
      0.14285715 = coord(1/7)
    
    Abstract
    The scientific metadata model proposed in this article encompasses both classical descriptive metadata, such as those defined in the Dublin Core Metadata Element Set (DC), and innovative structural and referential metadata properties that go beyond the classical model. Structural metadata capture the structural vocabulary in research publications; referential metadata include not only citations but also data about other types of scholarly output that are based on or related to the same publication. The article describes the structural, descriptive, and referential (SDR) elements of the metadata model and explains the underlying assumptions and justifications for each major component in the model. ScholarWiki, an experimental system developed as a proof of concept, was built on the wiki platform to allow users to interact with the metadata and to edit, delete, and add metadata. By allowing and encouraging scholars (both as authors and as users) to participate in editing and enhancing the knowledge and metadata, the larger community will benefit from more accurate and effective information retrieval. The ScholarWiki system utilizes machine-learning techniques that can automatically produce self-enhanced metadata by learning from the structural metadata that scholars contribute, adding intelligence that automatically enhances and updates the publication metadata wiki pages.
  16. Liu, X.; Zheng, W.; Fang, H.: ¬An exploration of ranking models and feedback method for related entity finding (2013) 0.00
    0.0017844755 = product of:
      0.012491328 = sum of:
        0.012491328 = weight(_text_:with in 2714) [ClassicSimilarity], result of:
          0.012491328 = score(doc=2714,freq=2.0), product of:
            0.09383348 = queryWeight, product of:
              2.409771 = idf(docFreq=10797, maxDocs=44218)
              0.038938753 = queryNorm
            0.1331223 = fieldWeight in 2714, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              2.409771 = idf(docFreq=10797, maxDocs=44218)
              0.0390625 = fieldNorm(doc=2714)
      0.14285715 = coord(1/7)
    
    Abstract
    Most existing search engines focus on document retrieval. However, information needs are certainly not limited to finding relevant documents. Instead, a user may want to find relevant entities such as persons and organizations. In this paper, we study the problem of related entity finding. Our goal is to rank entities based on their relevance to a structured query, which specifies an input entity, the type of related entities and the relation between the input and related entities. We first discuss a general probabilistic framework, derive six possible retrieval models to rank the related entities, and then compare these models both analytically and empirically. To further improve performance, we study the problem of feedback in the context of related entity finding. Specifically, we propose a mixture model based feedback method that can utilize the pseudo feedback entities to estimate an enriched model for the relation between the input and related entities. Experimental results over two standard TREC collections show that the derived relation generation model combined with a relation feedback method performs better than other models.
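    A hedged, much-simplified sketch of the feedback idea: estimate a unigram "relation model" from the contexts of pseudo-feedback entities and mix it with the original relation model (a simple interpolation standing in for the paper's mixture-model estimation; the texts and weight are invented):

      from collections import Counter

      def unigram_lm(texts):
          counts = Counter(w for t in texts for w in t.lower().split())
          total = sum(counts.values())
          return {w: c / total for w, c in counts.items()}

      original = unigram_lm(["organizations that employ the person"])
      feedback = unigram_lm(["works for the university", "employed by the research lab"])
      lam = 0.5                                               # assumed mixture weight
      enriched = {w: lam * original.get(w, 0.0) + (1 - lam) * feedback.get(w, 0.0)
                  for w in set(original) | set(feedback)}
      print(sorted(enriched.items(), key=lambda kv: -kv[1])[:5])   # top terms of the enriched relation model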
  17. Jiang, Z.; Liu, X.; Chen, Y.: Recovering uncaptured citations in a scholarly network : a two-step citation analysis to estimate publication importance (2016) 0.00
    0.0017844755 = product of:
      0.012491328 = sum of:
        0.012491328 = weight(_text_:with in 3018) [ClassicSimilarity], result of:
          0.012491328 = score(doc=3018,freq=2.0), product of:
            0.09383348 = queryWeight, product of:
              2.409771 = idf(docFreq=10797, maxDocs=44218)
              0.038938753 = queryNorm
            0.1331223 = fieldWeight in 3018, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              2.409771 = idf(docFreq=10797, maxDocs=44218)
              0.0390625 = fieldNorm(doc=3018)
      0.14285715 = coord(1/7)
    
    Abstract
    The citation relationships between publications, which are significant for assessing the importance of scholarly components within a network, have been used for various scientific applications. Missing citation metadata in scholarly databases, however, create problems for classical citation-based ranking algorithms and challenge the performance of citation-based retrieval systems. In this research, we utilize a two-step citation analysis method to investigate the importance of publications for which citation information is partially missing. First, we calculate the importance of the author and then use that importance to estimate the publication importance for selected articles. To evaluate this method, we designed a simulation experiment, "random citation missing," to test the two-step citation analysis, which we carried out on the Association for Computing Machinery (ACM) Digital Library (DL). In this experiment, we simulated different scenarios in a large-scale scientific digital library, from high-quality citation data to very poor-quality data. The results show that a two-step citation analysis can effectively uncover the importance of publications in different situations. More importantly, we found that the optimized impact from the importance of an author (first step) increases exponentially as the quality of the citation data decreases. The findings from this study can further enhance citation-based publication-ranking algorithms for real-world applications.
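    A hedged, minimal sketch of the two-step fallback (numbers invented): author importance is computed from publications with trusted citation data and then stands in for a publication whose citations are missing:

      known = {"p1": 30, "p2": 10}                                        # trusted citation counts
      authors = {"p1": {"alice"}, "p2": {"alice", "bob"}, "p3": {"bob"}}  # p3's citations are missing

      def author_importance(a):                     # step 1: importance of the author
          papers = [p for p in known if a in authors[p]]
          return sum(known[p] for p in papers) / len(papers)

      estimate_p3 = sum(author_importance(a) for a in authors["p3"]) / len(authors["p3"])  # step 2
      print(estimate_p3)   # 10.0: bob's average stands in for p3's missing citation signal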