Search (10 results, page 1 of 1)

  • author_ss:"Li, Y."
  1. Crespo, J.A.; Herranz, N.; Li, Y.; Ruiz-Castillo, J.: The effect on citation inequality of differences in citation practices at the web of science subject category level (2014) 0.03
    0.02950487 = product of:
      0.05900974 = sum of:
        0.03657866 = weight(_text_:data in 1291) [ClassicSimilarity], result of:
          0.03657866 = score(doc=1291,freq=4.0), product of:
            0.14807065 = queryWeight, product of:
              3.1620505 = idf(docFreq=5088, maxDocs=44218)
              0.046827413 = queryNorm
            0.24703519 = fieldWeight in 1291, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              3.1620505 = idf(docFreq=5088, maxDocs=44218)
              0.0390625 = fieldNorm(doc=1291)
        0.022431081 = product of:
          0.044862162 = sum of:
            0.044862162 = weight(_text_:22 in 1291) [ClassicSimilarity], result of:
              0.044862162 = score(doc=1291,freq=4.0), product of:
                0.16398162 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.046827413 = queryNorm
                0.27358043 = fieldWeight in 1291, product of:
                  2.0 = tf(freq=4.0), with freq of:
                    4.0 = termFreq=4.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=1291)
          0.5 = coord(1/2)
      0.5 = coord(2/4)
    
    Abstract
    This article studies the impact of differences in citation practices at the subfield, or Web of Science subject category level, using the model introduced in Crespo, Li, and Ruiz-Castillo (2013a), according to which the number of citations received by an article depends on its underlying scientific influence and the field to which it belongs. We use the same Thomson Reuters data set of about 4.4 million articles used in Crespo et al. (2013a) to analyze 22 broad fields. The main results are the following: First, when the classification system goes from 22 fields to 219 subfields the effect on citation inequality of differences in citation practices increases from ≈14% at the field level to 18% at the subfield level. Second, we estimate a set of exchange rates (ERs) over a wide [660, 978] citation quantile interval to express the citation counts of articles into the equivalent counts in the all-sciences case. In the fractional case, for example, we find that in 187 of 219 subfields the ERs are reliable in the sense that the coefficient of variation is smaller than or equal to 0.10. Third, in the fractional case the normalization of the raw data using the ERs (or subfield mean citations) as normalization factors reduces the importance of the differences in citation practices from 18% to 3.8% (3.4%) of overall citation inequality. Fourth, the results in the fractional case are essentially replicated when we adopt a multiplicative approach.
  2. Song, J.; Huang, Y.; Qi, X.; Li, Y.; Li, F.; Fu, K.; Huang, T.: Discovering hierarchical topic evolution in time-stamped documents (2016) 0.02
    0.015519011 = product of:
      0.062076043 = sum of:
        0.062076043 = weight(_text_:data in 2853) [ClassicSimilarity], result of:
          0.062076043 = score(doc=2853,freq=8.0), product of:
            0.14807065 = queryWeight, product of:
              3.1620505 = idf(docFreq=5088, maxDocs=44218)
              0.046827413 = queryNorm
            0.4192326 = fieldWeight in 2853, product of:
              2.828427 = tf(freq=8.0), with freq of:
                8.0 = termFreq=8.0
              3.1620505 = idf(docFreq=5088, maxDocs=44218)
              0.046875 = fieldNorm(doc=2853)
      0.25 = coord(1/4)
    
    Abstract
    The objective of this paper is to propose a hierarchical topic evolution model (HTEM) that can organize time-varying topics in a hierarchy and discover their evolutions with multiple timescales. In the proposed HTEM, topics near the root of the hierarchy are more abstract and also evolve in the longer timescales than those near the leaves. To achieve this goal, the distance-dependent Chinese restaurant process (ddCRP) is extended to a new nested process that is able to simultaneously model the dependencies among data and the relationship between clusters. The HTEM is proposed based on the new process for time-stamped documents, in which the timestamp is utilized to measure the dependencies among documents. Moreover, an efficient Gibbs sampler is developed for the proposed HTEM. Our experimental results on two popular real-world data sets verify that the proposed HTEM can capture coherent topics and discover their hierarchical evolutions. It also outperforms the baseline model in terms of likelihood on held-out data.
    Theme
    Data Mining
  3. Arora, S.K.; Li, Y.; Youtie, J.; Shapira, P.: Using the wayback machine to mine websites in the social sciences : a methodological resource (2016) 0.01
    0.014458986 = product of:
      0.057835944 = sum of:
        0.057835944 = weight(_text_:data in 3050) [ClassicSimilarity], result of:
          0.057835944 = score(doc=3050,freq=10.0), product of:
            0.14807065 = queryWeight, product of:
              3.1620505 = idf(docFreq=5088, maxDocs=44218)
              0.046827413 = queryNorm
            0.39059696 = fieldWeight in 3050, product of:
              3.1622777 = tf(freq=10.0), with freq of:
                10.0 = termFreq=10.0
              3.1620505 = idf(docFreq=5088, maxDocs=44218)
              0.0390625 = fieldNorm(doc=3050)
      0.25 = coord(1/4)
    
    Abstract
    Websites offer an unobtrusive data source for developing and analyzing information about various types of social science phenomena. In this paper, we provide a methodological resource for social scientists looking to expand their toolkit using unstructured web-based text, and in particular, with the Wayback Machine, to access historical website data. After providing a literature review of existing research that uses the Wayback Machine, we put forward a step-by-step description of how the analyst can design a research project using archived websites. We draw on the example of a project that analyzes indicators of innovation activities and strategies in 300 U.S. small- and medium-sized enterprises in green goods industries. We present six steps to access historical Wayback website data: (a) sampling, (b) organizing and defining the boundaries of the web crawl, (c) crawling, (d) website variable operationalization, (e) integration with other data sources, and (f) analysis. Although our examples draw on specific types of firms in green goods industries, the method can be generalized to other areas of research. In discussing the limitations and benefits of using the Wayback Machine, we note that both machine and human effort are essential to developing a high-quality data set from archived web information.
  4. Li, Y.; Shawe-Taylor, J.: Advanced learning algorithms for cross-language patent retrieval and classification (2007) 0.01
    0.008992782 = product of:
      0.035971127 = sum of:
        0.035971127 = product of:
          0.071942255 = sum of:
            0.071942255 = weight(_text_:processing in 931) [ClassicSimilarity], result of:
              0.071942255 = score(doc=931,freq=4.0), product of:
                0.18956426 = queryWeight, product of:
                  4.048147 = idf(docFreq=2097, maxDocs=44218)
                  0.046827413 = queryNorm
                0.3795138 = fieldWeight in 931, product of:
                  2.0 = tf(freq=4.0), with freq of:
                    4.0 = termFreq=4.0
                  4.048147 = idf(docFreq=2097, maxDocs=44218)
                  0.046875 = fieldNorm(doc=931)
          0.5 = coord(1/2)
      0.25 = coord(1/4)
    
    Footnote
    Article within the thematic focus "special issue on patent processing"
    Source
    Information processing and management. 43(2007) no.5, pp.1183-1199
  5. Cao, Q.; Lu, Y.; Dong, D.; Tang, Z.; Li, Y.: The roles of bridging and bonding in social media communities (2013) 0.01
    0.0077595054 = product of:
      0.031038022 = sum of:
        0.031038022 = weight(_text_:data in 1009) [ClassicSimilarity], result of:
          0.031038022 = score(doc=1009,freq=2.0), product of:
            0.14807065 = queryWeight, product of:
              3.1620505 = idf(docFreq=5088, maxDocs=44218)
              0.046827413 = queryNorm
            0.2096163 = fieldWeight in 1009, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.1620505 = idf(docFreq=5088, maxDocs=44218)
              0.046875 = fieldNorm(doc=1009)
      0.25 = coord(1/4)
    
    Abstract
    Social media communities have emerged recently as open and free communication platforms to support real-time information sharing among members. Drawing on social capital theories, we develop a theoretical model to investigate how the two types of social capital (bonding and bridging) contribute to the individual and collective well-being of virtual communities through information exchange. Research hypotheses were tested through survey instruments and computer archive data of 475 members of a large social network site during the Wenchuan earthquake (2008) in China. We find that bonding has a positive and significant impact on bridging. Both bonding and bridging have positive and significant impacts on information quality, but not on information quantity. Results also suggest that information quality is more critical to individuals and collective well-being than information quantity after a disaster.
  6. Xiao, D.; Ji, Y.; Li, Y.; Zhuang, F.; Shi, C.: Coupled matrix factorization and topic modeling for aspect mining (2018) 0.01
    0.0074939844 = product of:
      0.029975938 = sum of:
        0.029975938 = product of:
          0.059951875 = sum of:
            0.059951875 = weight(_text_:processing in 5042) [ClassicSimilarity], result of:
              0.059951875 = score(doc=5042,freq=4.0), product of:
                0.18956426 = queryWeight, product of:
                  4.048147 = idf(docFreq=2097, maxDocs=44218)
                  0.046827413 = queryNorm
                0.3162615 = fieldWeight in 5042, product of:
                  2.0 = tf(freq=4.0), with freq of:
                    4.0 = termFreq=4.0
                  4.048147 = idf(docFreq=2097, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=5042)
          0.5 = coord(1/2)
      0.25 = coord(1/4)
    
    Abstract
    Aspect mining, which aims to extract ad hoc aspects from online reviews and predict rating or opinion on each aspect, can satisfy the personalized needs for evaluation of specific aspect on product quality. Recently, with the increase of related research, how to effectively integrate rating and review information has become the key issue for addressing this problem. Considering that matrix factorization is an effective tool for rating prediction and topic modeling is widely used for review processing, it is a natural idea to combine matrix factorization and topic modeling for aspect mining (or called aspect rating prediction). However, this idea faces several challenges on how to address suitable sharing factors, scale mismatch, and dependency relation of rating and review information. In this paper, we propose a novel model to effectively integrate Matrix factorization and Topic modeling for Aspect rating prediction (MaToAsp). To overcome the above challenges and ensure the performance, MaToAsp employs items as the sharing factors to combine matrix factorization and topic modeling, and introduces an interpretive preference probability to eliminate scale mismatch. In the hybrid model, we establish a dependency relation from ratings to sentiment terms in phrases. The experiments on two real datasets including Chinese Dianping and English Tripadvisor prove that MaToAsp not only obtains reasonable aspect identification but also achieves the best aspect rating prediction performance, compared to recent representative baselines.
    Source
    Information processing and management. 54(2018) no.6, pp.861-873
  7. Zhang, Y.; Li, Y.: A user-centered functional metadata evaluation of moving image collections (2008) 0.01
    0.006466255 = product of:
      0.02586502 = sum of:
        0.02586502 = weight(_text_:data in 1884) [ClassicSimilarity], result of:
          0.02586502 = score(doc=1884,freq=2.0), product of:
            0.14807065 = queryWeight, product of:
              3.1620505 = idf(docFreq=5088, maxDocs=44218)
              0.046827413 = queryNorm
            0.17468026 = fieldWeight in 1884, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.1620505 = idf(docFreq=5088, maxDocs=44218)
              0.0390625 = fieldNorm(doc=1884)
      0.25 = coord(1/4)
    
    Abstract
    In this article, the authors report a series of evaluations of two metadata schemes developed for Moving Image Collections (MIC), an integrated online catalog of moving images. Through two online surveys and one experiment spanning various stages of metadata implementation, the MIC evaluation team explored a user-centered approach in which the four generic user tasks suggested by IFLA FRBR (International Federation of Library Associations and Institutions, Functional Requirements for Bibliographic Records) were embedded in data collection and analyses. Diverse groups of users rated usefulness of individual metadata fields for finding, identifying, selecting, and obtaining moving images. The results demonstrate a consistency across these evaluations with respect to (a) identification of a set of useful metadata fields highly rated by target users for each of the FRBR generic tasks, and (b) indication of a significant interaction between MIC metadata fields and the FRBR generic tasks. The findings provide timely feedback for the MIC implementation specifically, and valuable suggestions to other similar metadata application settings in general. They also suggest the feasibility of using the four IFLA FRBR generic tasks as a framework for user-centered functional metadata evaluations.
  8. Zhang, X.; Li, Y.; Liu, J.; Zhang, Y.: Effects of interaction design in digital libraries on user interactions (2008) 0.01
    0.006466255 = product of:
      0.02586502 = sum of:
        0.02586502 = weight(_text_:data in 1898) [ClassicSimilarity], result of:
          0.02586502 = score(doc=1898,freq=2.0), product of:
            0.14807065 = queryWeight, product of:
              3.1620505 = idf(docFreq=5088, maxDocs=44218)
              0.046827413 = queryNorm
            0.17468026 = fieldWeight in 1898, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.1620505 = idf(docFreq=5088, maxDocs=44218)
              0.0390625 = fieldNorm(doc=1898)
      0.25 = coord(1/4)
    
    Abstract
    Purpose - This study aims to investigate the effects of different search and browse features in digital libraries (DLs) on task interactions, and what features would lead to poor user experience. Design/methodology/approach - Three operational DLs: ACM, IEEE CS, and IEEE Xplore are used in this study. These three DLs present different features in their search and browsing designs. Two information-seeking tasks are constructed: one search task and one browsing task. An experiment was conducted in a usability laboratory. Data from 35 participants are collected on a set of measures for user interactions. Findings - The results demonstrate significant differences in many aspects of the user interactions between the three DLs. For both search and browse designs, the features that lead to poor user interactions are identified. Research limitations/implications - User interactions are affected by specific design features in DLs. Some of the design features may lead to poor user performance and should be improved. The study was limited mainly in the variety and the number of tasks used. Originality/value - The study provided empirical evidence to the effects of interaction design features in DLs on user interactions and performance. The results contribute to our knowledge about DL designs in general and about the three operational DLs in particular.
  9. Li, Y.; Xu, S.; Luo, X.; Lin, S.: A new algorithm for product image search based on salient edge characterization (2014) 0.01
    0.006466255 = product of:
      0.02586502 = sum of:
        0.02586502 = weight(_text_:data in 1552) [ClassicSimilarity], result of:
          0.02586502 = score(doc=1552,freq=2.0), product of:
            0.14807065 = queryWeight, product of:
              3.1620505 = idf(docFreq=5088, maxDocs=44218)
              0.046827413 = queryNorm
            0.17468026 = fieldWeight in 1552, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.1620505 = idf(docFreq=5088, maxDocs=44218)
              0.0390625 = fieldNorm(doc=1552)
      0.25 = coord(1/4)
    
    Abstract
    Visually assisted product image search has gained increasing popularity because of its capability to greatly improve end users' e-commerce shopping experiences. Different from general-purpose content-based image retrieval (CBIR) applications, the specific goal of product image search is to retrieve and rank relevant products from a large-scale product database to visually assist a user's online shopping experience. In this paper, we explore the problem of product image search through salient edge characterization and analysis, for which we propose a novel image search method coupled with an interactive user region-of-interest indication function. Given a product image, the proposed approach first extracts an edge map, based on which contour curves are further extracted. We then segment the extracted contours into fragments according to the detected contour corners. After that, a set of salient edge elements is extracted from each product image. Based on salient edge elements matching and similarity evaluation, the method derives a new pairwise image similarity estimate. Using the new image similarity, we can then retrieve product images. To evaluate the performance of our algorithm, we conducted 120 sessions of querying experiments on a data set comprised of around 13k product images collected from multiple, real-world e-commerce websites. We compared the performance of the proposed method with that of a bag-of-words method (Philbin, Chum, Isard, Sivic, & Zisserman, 2008) and a Pyramid Histogram of Orientated Gradients (PHOG) method (Bosch, Zisserman, & Munoz, 2007). Experimental results demonstrate that the proposed method improves the performance of example-based product image retrieval.
  10. Li, Y.; Belkin, N.J.: A faceted approach to conceptualizing tasks in information seeking (2008) 0.01
    0.005299047 = product of:
      0.021196188 = sum of:
        0.021196188 = product of:
          0.042392377 = sum of:
            0.042392377 = weight(_text_:processing in 2442) [ClassicSimilarity], result of:
              0.042392377 = score(doc=2442,freq=2.0), product of:
                0.18956426 = queryWeight, product of:
                  4.048147 = idf(docFreq=2097, maxDocs=44218)
                  0.046827413 = queryNorm
                0.22363065 = fieldWeight in 2442, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  4.048147 = idf(docFreq=2097, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=2442)
          0.5 = coord(1/2)
      0.25 = coord(1/4)
    
    Source
    Information processing and management. 44(2008) no.6, pp.1822-1837
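
Note on the score breakdowns: the nested numbers under each hit can be recomputed by hand. The sketch below is not the catalogue's own code. It assumes the usual ClassicSimilarity formulas, tf = sqrt(freq) and idf = 1 + ln(maxDocs / (docFreq + 1)), which appear to match the values printed in the listing. queryNorm, fieldNorm, freq, docFreq and the coord factors are copied from the listing; the function names are illustrative only.

```python
import math

# Values copied from the listing above.
QUERY_NORM = 0.046827413
MAX_DOCS = 44218


def idf(doc_freq, max_docs=MAX_DOCS):
    """Inverse document frequency, assumed form: 1 + ln(maxDocs / (docFreq + 1))."""
    return 1.0 + math.log(max_docs / (doc_freq + 1))


def term_score(freq, doc_freq, field_norm, query_norm=QUERY_NORM):
    """Score of one term in one document: queryWeight * fieldWeight."""
    tf = math.sqrt(freq)                        # tf(freq=4.0) = 2.0
    term_idf = idf(doc_freq)
    query_weight = term_idf * query_norm        # "queryWeight" line
    field_weight = tf * term_idf * field_norm   # "fieldWeight" line
    return query_weight * field_weight


# Hit 1 (doc 1291): two matching clauses, each with freq=4.0.
data_part = term_score(freq=4.0, doc_freq=5088, field_norm=0.0390625)
term22_part = term_score(freq=4.0, doc_freq=3622, field_norm=0.0390625) * 0.5  # coord(1/2)
hit1_total = (data_part + term22_part) * 0.5                                   # coord(2/4)

# Hit 4 (doc 931): one matching clause, nested coord(1/2) then coord(1/4).
hit4_total = term_score(freq=4.0, doc_freq=2097, field_norm=0.046875) * 0.5 * 0.25

print(f"hit 1: {hit1_total:.8f}")
print(f"hit 4: {hit4_total:.8f}")
```

The results should agree with the 0.02950487 and 0.008992782 shown for hits 1 and 4 up to floating-point rounding, since the listing was presumably computed in single precision and Python uses double precision.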