Search (8 results, page 1 of 1)

  • author_ss:"Wang, S."
  1. Isaac, A.; Wang, S.; Zinn, C.; Matthezing, H.; Meij, L. van der; Schlobach, S.: Evaluating thesaurus alignments for semantic interoperability in the library domain (2009) 0.01
    0.01152081 = product of:
      0.02304162 = sum of:
        0.02304162 = product of:
          0.04608324 = sum of:
            0.04608324 = weight(_text_:data in 1650) [ClassicSimilarity], result of:
              0.04608324 = score(doc=1650,freq=2.0), product of:
                0.16488427 = queryWeight, product of:
                  3.1620505 = idf(docFreq=5088, maxDocs=44218)
                  0.052144732 = queryNorm
                0.2794884 = fieldWeight in 1650, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.1620505 = idf(docFreq=5088, maxDocs=44218)
                  0.0625 = fieldNorm(doc=1650)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
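    The explain tree above is standard Lucene ClassicSimilarity (tf-idf) output. A minimal Python sketch reproducing its arithmetic from the values shown (the trees for the remaining results follow the same pattern):

      import math

      # Values copied from the explain tree for result 1 (term "data", doc 1650).
      freq       = 2.0          # termFreq: "data" occurs twice in the field
      doc_freq   = 5088         # docFreq: documents containing "data"
      max_docs   = 44218        # maxDocs: documents in the index
      query_norm = 0.052144732  # queryNorm
      field_norm = 0.0625       # fieldNorm stored for doc 1650

      tf  = math.sqrt(freq)                            # 1.4142135 = tf(freq=2.0)
      idf = 1.0 + math.log(max_docs / (doc_freq + 1))  # 3.1620505 = idf(docFreq, maxDocs)

      query_weight = idf * query_norm                  # 0.16488427 = queryWeight
      field_weight = tf * idf * field_norm             # 0.2794884  = fieldWeight
      term_score   = query_weight * field_weight       # 0.04608324

      # Two nested coord(1/2) factors: one of two query clauses matched.
      print(term_score * 0.5 * 0.5)                    # ~0.01152081, the displayed score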
    
    Abstract
    Thesaurus alignments play an important role in realizing efficient access to heterogeneous cultural-heritage data. Current technology, however, provides only limited value for such access because it fails to bridge the gap between theoretical study and practical application requirements. This article explores common real-world library problems and identifies solutions that focus on the application-embedded study, development, and evaluation of matching technology.
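    As an illustration of the kind of application-embedded use the article argues for (a hedged sketch, not the authors' system; the concept pairs below are invented, though GTT and Brinkman are real Dutch thesauri): an alignment can translate a query from one controlled vocabulary into another before searching an aggregated collection.

      # Hypothetical alignment between two thesauri (in the spirit of the
      # GTT-Brinkman case; these concept pairs are invented for illustration).
      alignment = {
          "gtt:waterverontreiniging": ["brinkman:watervervuiling"],
          "gtt:molens":               ["brinkman:windmolens", "brinkman:watermolens"],
      }

      def translate_query(concepts):
          """Expand source-vocabulary concepts with their aligned targets."""
          expanded = set(concepts)
          for c in concepts:
              expanded.update(alignment.get(c, []))
          return sorted(expanded)

      print(translate_query(["gtt:molens"]))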
  2. Zhang, L.; Wang, S.; Liu, B.: Deep learning for sentiment analysis : a survey (2018) 0.01
    0.01152081 = product of:
      0.02304162 = sum of:
        0.02304162 = product of:
          0.04608324 = sum of:
            0.04608324 = weight(_text_:data in 4092) [ClassicSimilarity], result of:
              0.04608324 = score(doc=4092,freq=2.0), product of:
                0.16488427 = queryWeight, product of:
                  3.1620505 = idf(docFreq=5088, maxDocs=44218)
                  0.052144732 = queryNorm
                0.2794884 = fieldWeight in 4092, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.1620505 = idf(docFreq=5088, maxDocs=44218)
                  0.0625 = fieldNorm(doc=4092)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Abstract
    Deep learning has emerged as a powerful machine learning technique that learns multiple layers of representations or features of the data and produces state-of-the-art prediction results. Along with its success in many other application domains, deep learning has also been widely applied to sentiment analysis in recent years. This paper first gives an overview of deep learning and then provides a comprehensive survey of its current applications in sentiment analysis.
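    A minimal sketch of the simplest kind of model within the survey's scope: token embeddings mean-pooled into a document vector and fed to a linear classifier. The toy data is invented, and no specific architecture from the survey is implied.

      import torch
      import torch.nn as nn

      texts  = [[1, 4, 5], [2, 3, 6]]   # token-id sequences (hypothetical)
      labels = torch.tensor([1, 0])     # 1 = positive, 0 = negative

      class SentimentNet(nn.Module):
          def __init__(self, vocab_size=10, dim=8):
              super().__init__()
              self.embed = nn.EmbeddingBag(vocab_size, dim)  # mean-pools token embeddings
              self.out   = nn.Linear(dim, 2)
          def forward(self, flat_tokens, offsets):
              return self.out(self.embed(flat_tokens, offsets))

      model   = SentimentNet()
      flat    = torch.tensor([t for seq in texts for t in seq])
      offsets = torch.tensor([0, len(texts[0])])
      loss = nn.CrossEntropyLoss()(model(flat, offsets), labels)
      loss.backward()   # gradients for one training step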
  3. Cui, C.; Ma, J.; Lian, T.; Chen, Z.; Wang, S.: Improving image annotation via ranking-oriented neighbor search and learning-based keyword propagation (2015) 0.01
    0.010183054 = product of:
      0.020366108 = sum of:
        0.020366108 = product of:
          0.040732216 = sum of:
            0.040732216 = weight(_text_:data in 1609) [ClassicSimilarity], result of:
              0.040732216 = score(doc=1609,freq=4.0), product of:
                0.16488427 = queryWeight, product of:
                  3.1620505 = idf(docFreq=5088, maxDocs=44218)
                  0.052144732 = queryNorm
                0.24703519 = fieldWeight in 1609, product of:
                  2.0 = tf(freq=4.0), with freq of:
                    4.0 = termFreq=4.0
                  3.1620505 = idf(docFreq=5088, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=1609)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Abstract
    Automatic image annotation plays a critical role in modern keyword-based image retrieval systems. For this task, the nearest-neighbor-based scheme works in two phases: first, it finds the most similar neighbors of a new image in the set of labeled images; then, it propagates the keywords associated with those neighbors to the new image. In this article, we propose a novel approach to image annotation that improves both phases of the nearest-neighbor-based scheme simultaneously. In the neighbor-search phase, unlike existing work, which discovers the nearest neighbors using predicted distances, we introduce a ranking-oriented neighbor search mechanism (RNSM), where the ordering of labeled images is optimized directly, without the intermediate step of distance prediction. In the keyword-propagation phase, unlike existing work, which uses simple heuristic rules to select the propagated keywords, we present a learning-based keyword propagation strategy (LKPS), where a scoring function is learned to evaluate the relevance of keywords based on their multiple relations with the nearest neighbors. Extensive experiments on the Corel 5K and MIR Flickr data sets demonstrate the effectiveness of our approach.
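    For orientation, a simplified sketch of the baseline two-phase nearest-neighbor scheme the abstract describes; it is not RNSM or LKPS, and the feature vectors, keywords, and distance weighting are invented for illustration.

      import numpy as np

      # Toy labeled collection: feature vectors and keyword sets (invented).
      features = np.array([[0.9, 0.1], [0.8, 0.2], [0.1, 0.9]])
      keywords = [{"beach", "sky"}, {"beach", "sea"}, {"forest"}]

      def annotate(query_vec, k=2):
          """Phase 1: find k nearest neighbors by Euclidean distance.
          Phase 2: propagate their keywords, weighted by closeness."""
          dists = np.linalg.norm(features - query_vec, axis=1)
          nearest = np.argsort(dists)[:k]
          scores = {}
          for i in nearest:
              for kw in keywords[i]:
                  scores[kw] = scores.get(kw, 0.0) + 1.0 / (1.0 + dists[i])
          return sorted(scores, key=scores.get, reverse=True)

      print(annotate(np.array([0.85, 0.15])))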
  4. Wang, S.; Koopman, R.: Embed first, then predict (2019) 0.01
    0.010183054 = product of:
      0.020366108 = sum of:
        0.020366108 = product of:
          0.040732216 = sum of:
            0.040732216 = weight(_text_:data in 5400) [ClassicSimilarity], result of:
              0.040732216 = score(doc=5400,freq=4.0), product of:
                0.16488427 = queryWeight, product of:
                  3.1620505 = idf(docFreq=5088, maxDocs=44218)
                  0.052144732 = queryNorm
                0.24703519 = fieldWeight in 5400, product of:
                  2.0 = tf(freq=4.0), with freq of:
                    4.0 = termFreq=4.0
                  3.1620505 = idf(docFreq=5088, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=5400)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Abstract
    Automatic subject prediction is a desirable feature for modern digital library systems, as manual indexing can no longer cope with the rapid growth of digital collections. It is also desirable to be able to identify a small set of entities (e.g., authors, citations, bibliographic records) that are most relevant to a query. This becomes more difficult as the amount of data increases dramatically. Data sparsity and model scalability are the major challenges in solving this type of extreme multilabel classification problem automatically. In this paper, we propose to address the problem in two steps: first, we embed different types of entities into the same semantic space, where similarity can be computed easily; second, we propose a novel non-parametric method to identify the most relevant entities beyond direct semantic similarity. We show how effectively this approach predicts even very specialised subjects, which are associated with few documents in the training set and are therefore more problematic for a classifier.
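    A hedged sketch of the first step only: once entities share an embedding space, subject prediction reduces to a similarity ranking. The vectors and subject names are invented; the paper's embedding method and non-parametric second step are not reproduced here.

      import numpy as np

      # Hypothetical pre-computed embeddings of subject headings and a new,
      # unlabeled document in one shared semantic space (values invented).
      subjects = {"machine learning": np.array([0.9, 0.1, 0.0]),
                  "librarianship":    np.array([0.1, 0.9, 0.2])}
      doc_vec = np.array([0.8, 0.2, 0.1])

      def top_subjects(vec, k=1):
          """Rank subjects by cosine similarity to the document embedding."""
          def cos(a, b):
              return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))
          ranked = sorted(subjects, key=lambda s: cos(vec, subjects[s]), reverse=True)
          return ranked[:k]

      print(top_subjects(doc_vec))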
  5. Wang, S.; Ma, Y.; Mao, J.; Bai, Y.; Liang, Z.; Li, G.: Quantifying scientific breakthroughs by a novel disruption indicator based on knowledge entities (2023) 0.01
    0.0088311145 = product of:
      0.017662229 = sum of:
        0.017662229 = product of:
          0.035324458 = sum of:
            0.035324458 = weight(_text_:22 in 882) [ClassicSimilarity], result of:
              0.035324458 = score(doc=882,freq=2.0), product of:
                0.18260197 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.052144732 = queryNorm
                0.19345059 = fieldWeight in 882, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=882)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Date
    22.1.2023 18:37:33
  6. Ren, P.; Chen, Z.; Ma, J.; Zhang, Z.; Si, L.; Wang, S.: Detecting temporal patterns of user queries (2017) 0.01
    0.008640608 = product of:
      0.017281216 = sum of:
        0.017281216 = product of:
          0.03456243 = sum of:
            0.03456243 = weight(_text_:data in 3315) [ClassicSimilarity], result of:
              0.03456243 = score(doc=3315,freq=2.0), product of:
                0.16488427 = queryWeight, product of:
                  3.1620505 = idf(docFreq=5088, maxDocs=44218)
                  0.052144732 = queryNorm
                0.2096163 = fieldWeight in 3315, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.1620505 = idf(docFreq=5088, maxDocs=44218)
                  0.046875 = fieldNorm(doc=3315)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Abstract
    Query classification is an important part of exploring the characteristics of web queries. Existing studies are mainly based on Broder's classification scheme and classify user queries into navigational, informational, and transactional categories according to users' information needs. In this article, we present a novel classification scheme based on queries' temporal patterns. A query's temporal pattern is the inherent time-series pattern of its search volume, which reflects how the query's popularity evolves over time. By analyzing these temporal patterns, search engines can understand users' search intents more deeply and thus improve performance. We extract three groups of features from the queries' search-volume time series and use a support vector machine (SVM) to automatically detect the temporal patterns of user queries. Extensive experiments on the Million Query Track data sets of the Text REtrieval Conference (TREC) demonstrate the effectiveness of our approach.
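    A hedged sketch of the pipeline's shape: summary features extracted from search-volume series feed an SVM classifier. The series, labels, and three features here are invented stand-ins for the paper's three feature groups.

      import numpy as np
      from sklearn.svm import SVC

      # Toy weekly search-volume series with invented labels:
      # 0 = stable pattern, 1 = spiking pattern.
      series = np.array([[10, 11, 10, 12, 11, 10],
                         [ 1,  2, 50,  3,  2,  1],
                         [ 9, 10, 10,  9, 11, 10],
                         [ 2,  1,  2, 60,  2,  2]], dtype=float)
      labels = np.array([0, 1, 0, 1])

      def extract_features(ts):
          """Simple summary features of a search-volume time series."""
          return [ts.mean(), ts.std(), ts.max() / (ts.mean() + 1e-9)]

      X = np.array([extract_features(ts) for ts in series])
      clf = SVC(kernel="rbf").fit(X, labels)
      print(clf.predict([extract_features(np.array([3., 2., 55., 2., 3., 2.]))]))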
  7. Wang, S.; Koopman, R.: Second life for authority records (2015) 0.01
    0.008146443 = product of:
      0.016292887 = sum of:
        0.016292887 = product of:
          0.032585774 = sum of:
            0.032585774 = weight(_text_:data in 2303) [ClassicSimilarity], result of:
              0.032585774 = score(doc=2303,freq=4.0), product of:
                0.16488427 = queryWeight, product of:
                  3.1620505 = idf(docFreq=5088, maxDocs=44218)
                  0.052144732 = queryNorm
                0.19762816 = fieldWeight in 2303, product of:
                  2.0 = tf(freq=4.0), with freq of:
                    4.0 = termFreq=4.0
                  3.1620505 = idf(docFreq=5088, maxDocs=44218)
                  0.03125 = fieldNorm(doc=2303)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Abstract
    Authority control is a standard practice in the library community that provides consistent, unique, and unambiguous references to entities such as persons, places, and concepts. Referring to authority records through unique identifiers is in line with the current linked data principle. When a bibliographic record is presented, the linked authority records are expanded with the authoritative information; this way, updates to the authority records do not affect the indexing of the bibliographic records. The structural information in the authority files can also be leveraged to expand the user's query and retrieve bibliographic records associated with all the variant, narrower, or related terms.

    However, in many digital libraries, especially large-scale aggregations such as WorldCat and Europeana, name strings are often used instead of authority record identifiers. This is partly due to the lack of global authority records that are valid across countries and cultural heritage domains. But even where global authority systems exist, they are not applied at scale. For example, in WorldCat, only 15% of the records have DDC and 3% have UDC codes; fewer than 40% of the records have one or more topical terms catalogued in the 650 MARC field, many of which are too general (such as "sports" or "literature") to be useful for retrieving bibliographic records. As a result, when a user query is based on a Dewey code, the results usually have high precision but much lower recall than they should; and a search on a general topical term returns millions of hits without even being complete. These practices make it difficult to leverage the key benefits of authority files, including those that have been transformed into linked data and enriched with mapping information.

    There are practical reasons for using name strings instead of identifiers; one is indexing and query response. A future infrastructure design should take performance into account while embracing the benefit of linking instead of copying, without introducing extra complexity for users. Notwithstanding these restrictions, we argue that large-scale aggregations also bring new opportunities for better exploiting the benefits of authority records. It is possible to use machine learning techniques to automatically link bibliographic records to authority records based on the manual input of cataloguers. Text mining and visualization techniques can offer a contextual view of authority records, which in turn can be used to retrieve missing or mis-catalogued records. In this talk, we will describe such opportunities in more detail.
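    A minimal sketch of the query-expansion benefit described above: resolving a name string to a (hypothetical) authority record and searching on all of the record's name forms. The record and identifier are invented for illustration.

      # Hypothetical authority record for a person (identifier is a placeholder).
      authority = {
          "id": "viaf:0000",
          "preferred": "Rembrandt Harmenszoon van Rijn",
          "variants": ["Rembrandt van Rijn", "Rembrandt"],
      }

      def expand_query(name):
          """If the name matches an authority record, search on all its forms."""
          forms = [authority["preferred"]] + authority["variants"]
          if name in forms:
              return forms
          return [name]

      print(expand_query("Rembrandt"))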
  8. Xie, I.; Babu, R.; Lee, H.S.; Wang, S.; Lee, T.H.: Orientation tactics and associated factors in the digital library environment : comparison between blind and sighted users (2021) 0.01
    0.007200507 = product of:
      0.014401014 = sum of:
        0.014401014 = product of:
          0.028802028 = sum of:
            0.028802028 = weight(_text_:data in 307) [ClassicSimilarity], result of:
              0.028802028 = score(doc=307,freq=2.0), product of:
                0.16488427 = queryWeight, product of:
                  3.1620505 = idf(docFreq=5088, maxDocs=44218)
                  0.052144732 = queryNorm
                0.17468026 = fieldWeight in 307, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.1620505 = idf(docFreq=5088, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=307)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Abstract
    This is the first study to compare the types of orientation tactics that blind and sighted users applied in their initial interactions with a digital library (DL), together with the associated factors. Multiple methods were employed for data collection: questionnaires, think-aloud protocols, and transaction logs. The paper identifies seven types of orientation tactics applied by the two groups of users. While sighted users focused on skimming DL content, blind users concentrated on exploring DL structure. Moreover, the authors discovered 13 types of system, user, and interaction factors that led to the use of orientation tactics; more system factors than user factors affect blind users' tactics in browsing DL structures. The findings support the social model of disability: it is the sight-centered design of DLs, rather than blind users' disability, that prevents them from interacting effectively with a DL. At the same time, the results reveal a limitation of existing interactive information retrieval models, which do not take people with disabilities into consideration. DL design implications are discussed based on the identified factors.