Search (8 results, page 1 of 1)

Lu, W.; Ding, H.; Jiang, J.: A document expansion framework for tag-based image retrieval (2018) 0.02
```
0.02067415 = product of:
  0.0413483 = sum of:
    0.0413483 = sum of:
      0.010148063 = weight(_text_:a in 4630) [ClassicSimilarity], result of:
        0.010148063 = score(doc=4630,freq=18.0), product of:
          0.053105544 = queryWeight, product of:
            1.153047 = idf(docFreq=37942, maxDocs=44218)
            0.046056706 = queryNorm
          0.19109234 = fieldWeight in 4630, product of:
            4.2426405 = tf(freq=18.0), with freq of:
              18.0 = termFreq=18.0
            1.153047 = idf(docFreq=37942, maxDocs=44218)
            0.0390625 = fieldNorm(doc=4630)
      0.03120024 = weight(_text_:22 in 4630) [ClassicSimilarity], result of:
        0.03120024 = score(doc=4630,freq=2.0), product of:
          0.16128273 = queryWeight, product of:
            3.5018296 = idf(docFreq=3622, maxDocs=44218)
            0.046056706 = queryNorm
          0.19345059 = fieldWeight in 4630, product of:
            1.4142135 = tf(freq=2.0), with freq of:
              2.0 = termFreq=2.0
            3.5018296 = idf(docFreq=3622, maxDocs=44218)
            0.0390625 = fieldNorm(doc=4630)
  0.5 = coord(1/2)
```
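The score tree above can be reproduced from its raw inputs. The following is a minimal sketch, assuming the classic TF-IDF formulas `tf = sqrt(freq)` and `idf = 1 + ln(maxDocs / (docFreq + 1))`, which are consistent with the values shown. The function names are illustrative and not part of any library.

```python
import math

def tf(freq: float) -> float:
    # Term frequency factor, matching 4.2426405 = tf(freq=18.0)
    return math.sqrt(freq)

def idf(doc_freq: int, max_docs: int) -> float:
    # Inverse document frequency, matching 1.153047 = idf(docFreq=37942, maxDocs=44218)
    return 1.0 + math.log(max_docs / (doc_freq + 1))

def term_score(freq, doc_freq, max_docs, field_norm, query_norm):
    term_idf = idf(doc_freq, max_docs)
    query_weight = term_idf * query_norm               # queryWeight = idf * queryNorm
    field_weight = tf(freq) * term_idf * field_norm    # fieldWeight = tf * idf * fieldNorm
    return query_weight * field_weight

# Values copied from the explain output for doc 4630
MAX_DOCS = 44218
QUERY_NORM = 0.046056706
FIELD_NORM = 0.0390625

score_a = term_score(18.0, 37942, MAX_DOCS, FIELD_NORM, QUERY_NORM)
score_22 = term_score(2.0, 3622, MAX_DOCS, FIELD_NORM, QUERY_NORM)

# Both terms matched, so the two term scores are summed; the outer
# coord(1/2) factor then halves the sum.
total = 0.5 * (score_a + score_22)
print(f"a={score_a:.9f} 22={score_22:.8f} total={total:.8f}")
```

Small differences in the last digits are expected, since the listing shows single-precision values while Python computes in double precision.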
Abstract

Purpose - The purpose of this paper is to utilize document expansion techniques for improving image representation and retrieval. This paper proposes a concise framework for tag-based image retrieval (TBIR). Design/methodology/approach - The proposed approach includes three core components: a strategy for selecting expansion (similar) images from the whole corpus (e.g. cluster-based or nearest neighbor-based); a technique for assessing image similarity, which is adopted for selecting expansion images (text, image, or mixed); and a model for matching the expanded image representation with the search query (merging or separate). Findings - The results show that applying the proposed method yields significant improvements in effectiveness; the method performs better at the top of the ranking and greatly improves some topics that score zero in the baseline. Moreover, the nearest neighbor-based expansion strategy outperforms the cluster-based expansion strategy, using image features for selecting expansion images is better than using text features in most cases, and the separate method for calculating the augmented probability P(q|RD) is able to erase the negative influences of error images in RD. Research limitations/implications - Although these methods outperform the baseline only at the top of the ranking rather than across the entire ranked list, TBIR on mobile platforms can still benefit from this approach. Originality/value - Unlike former studies addressing the sparsity, vocabulary mismatch, and tag relatedness in TBIR individually, the approach proposed by this paper addresses all these issues with a single document expansion framework. It is a comprehensive investigation of document expansion techniques in TBIR.

Date

20. 1.2015 18:30:22

Type

a
Jiang, Y.; Meng, R.; Huang, Y.; Lu, W.; Liu, J.: Generating keyphrases for readers : a controllable keyphrase generation framework (2023) 0.02
```
0.01938208 = product of:
  0.03876416 = sum of:
    0.03876416 = sum of:
      0.0075639198 = weight(_text_:a in 1012) [ClassicSimilarity], result of:
        0.0075639198 = score(doc=1012,freq=10.0), product of:
          0.053105544 = queryWeight, product of:
            1.153047 = idf(docFreq=37942, maxDocs=44218)
            0.046056706 = queryNorm
          0.14243183 = fieldWeight in 1012, product of:
            3.1622777 = tf(freq=10.0), with freq of:
              10.0 = termFreq=10.0
            1.153047 = idf(docFreq=37942, maxDocs=44218)
            0.0390625 = fieldNorm(doc=1012)
      0.03120024 = weight(_text_:22 in 1012) [ClassicSimilarity], result of:
        0.03120024 = score(doc=1012,freq=2.0), product of:
          0.16128273 = queryWeight, product of:
            3.5018296 = idf(docFreq=3622, maxDocs=44218)
            0.046056706 = queryNorm
          0.19345059 = fieldWeight in 1012, product of:
            1.4142135 = tf(freq=2.0), with freq of:
              2.0 = termFreq=2.0
            3.5018296 = idf(docFreq=3622, maxDocs=44218)
            0.0390625 = fieldNorm(doc=1012)
  0.5 = coord(1/2)
```
Abstract

With the wide application of keyphrases in many Information Retrieval (IR) and Natural Language Processing (NLP) tasks, automatic keyphrase prediction has been emerging. However, these statistically important phrases are contributing increasingly less to the related tasks because the end-to-end learning mechanism enables models to learn the important semantic information of the text directly. Similarly, keyphrases are of little help for readers to quickly grasp the paper's main idea because the relationship between the keyphrase and the paper is not explicit to readers. Therefore, we propose to generate keyphrases with specific functions for readers to bridge the semantic gap between them and the information producers, and verify the effectiveness of the keyphrase function for assisting users' comprehension with a user experiment. A controllable keyphrase generation framework (the CKPG) that uses the keyphrase function as a control code to generate categorized keyphrases is proposed and implemented based on Transformer, BART, and T5, respectively. For the Computer Science domain, the Macro-avgs of , , and on the Paper with Code dataset are up to 0.680, 0.535, and 0.558, respectively. Our experimental results indicate the effectiveness of the CKPG models.

Date

22. 6.2023 14:55:20

Type

a
Zhang, L.; Lu, W.; Yang, J.: LAGOS-AND : a large gold standard dataset for scholarly author name disambiguation (2023) 0.02
```
0.018529613 = product of:
  0.037059225 = sum of:
    0.037059225 = sum of:
      0.005858987 = weight(_text_:a in 883) [ClassicSimilarity], result of:
        0.005858987 = score(doc=883,freq=6.0), product of:
          0.053105544 = queryWeight, product of:
            1.153047 = idf(docFreq=37942, maxDocs=44218)
            0.046056706 = queryNorm
          0.11032722 = fieldWeight in 883, product of:
            2.4494898 = tf(freq=6.0), with freq of:
              6.0 = termFreq=6.0
            1.153047 = idf(docFreq=37942, maxDocs=44218)
            0.0390625 = fieldNorm(doc=883)
      0.03120024 = weight(_text_:22 in 883) [ClassicSimilarity], result of:
        0.03120024 = score(doc=883,freq=2.0), product of:
          0.16128273 = queryWeight, product of:
            3.5018296 = idf(docFreq=3622, maxDocs=44218)
            0.046056706 = queryNorm
          0.19345059 = fieldWeight in 883, product of:
            1.4142135 = tf(freq=2.0), with freq of:
              2.0 = termFreq=2.0
            3.5018296 = idf(docFreq=3622, maxDocs=44218)
            0.0390625 = fieldNorm(doc=883)
  0.5 = coord(1/2)
```
Abstract

In this article, we present a method to automatically build large labeled datasets for the author ambiguity problem in the academic world by leveraging the authoritative academic resources, ORCID and DOI. Using the method, we built LAGOS-AND, two large, gold-standard sub-datasets for author name disambiguation (AND), of which LAGOS-AND-BLOCK is created for clustering-based AND research and LAGOS-AND-PAIRWISE is created for classification-based AND research. Our LAGOS-AND datasets are substantially different from the existing ones. The initial versions of the datasets (v1.0, released in February 2021) include 7.5 M citations authored by 798 K unique authors (LAGOS-AND-BLOCK) and close to 1 M instances (LAGOS-AND-PAIRWISE). And both datasets show close similarities to the whole Microsoft Academic Graph (MAG) across validations of six facets. In building the datasets, we reveal the variation degrees of last names in three literature databases, PubMed, MAG, and Semantic Scholar, by comparing author names hosted to the authors' official last names shown on the ORCID pages. Furthermore, we evaluate several baseline disambiguation methods as well as the MAG's author IDs system on our datasets, and the evaluation helps identify several interesting findings. We hope the datasets and findings will bring new insights for future studies. The code and datasets are publicly available.

Date

22. 1.2023 18:40:36

Type

a
Lu, W.; MacFarlane, A.; Venuti, F.: Okapi-based XML indexing (2009) 0.00
```
0.0025370158 = product of:
  0.0050740317 = sum of:
    0.0050740317 = product of:
      0.010148063 = sum of:
        0.010148063 = weight(_text_:a in 3629) [ClassicSimilarity], result of:
          0.010148063 = score(doc=3629,freq=18.0), product of:
            0.053105544 = queryWeight, product of:
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.046056706 = queryNorm
            0.19109234 = fieldWeight in 3629, product of:
              4.2426405 = tf(freq=18.0), with freq of:
                18.0 = termFreq=18.0
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.0390625 = fieldNorm(doc=3629)
      0.5 = coord(1/2)
  0.5 = coord(1/2)
```
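In this entry only the term `a` matched, so the same term weight seen in the first result (0.010148063) is halved twice by the two nested `coord(1/2)` factors. A minimal sketch of evaluating such a tree follows. The nested-tuple representation and the `evaluate` helper are assumptions made for illustration, not a format produced by the search engine.

```python
from math import prod

def evaluate(node):
    """Evaluate a nested (op, children) tree; leaves are plain numbers."""
    if isinstance(node, (int, float)):
        return float(node)
    op, children = node
    values = [evaluate(child) for child in children]
    if op == "product":
        return prod(values)
    if op == "sum":
        return sum(values)
    raise ValueError(f"unknown operator: {op}")

# Hand-transcribed structure of the explain output for doc 3629
doc_3629 = ("product", [
    ("sum", [
        ("product", [
            ("sum", [0.010148063]),  # weight(_text_:a in 3629)
            0.5,                     # inner coord(1/2)
        ]),
    ]),
    0.5,                             # outer coord(1/2)
])

score = evaluate(doc_3629)
print(f"{score:.10f}")
```

The result is one quarter of the term weight, which is why this entry is listed with a rounded score of 0.00 while the first three results, where `22` also matched, are listed with 0.02.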
Abstract

Purpose - Being an important data exchange and information storage standard, XML has generated a great deal of interest and particular attention has been paid to the issue of XML indexing. Clear use cases for structured search in XML have been established. However, most of the research in the area is either based on relational database systems or specialized semi-structured data management systems. This paper aims to propose a method for XML indexing based on the information retrieval (IR) system Okapi. Design/methodology/approach - First, the paper reviews the structure of inverted files and gives an overview of the issues of why this indexing mechanism cannot properly support XML retrieval, using the underlying data structures of Okapi as an example. Then the paper explores a revised method implemented on Okapi using path indexing structures. The paper evaluates these index structures through the metrics of indexing run time, path search run time and space costs using the INEX and Reuters RVC1 collections. Findings - Initial results on the INEX collections show that there is a substantial overhead in space costs for the method, but this increase does not affect run time adversely. Indexing results on differing sized Reuters RVC1 sub-collections show that the increase in space costs with increasing the size of a collection is significant, but in terms of run time the increase is linear. Path search results show sub-millisecond run times, demonstrating minimal overhead for XML search. Practical implications - Overall, the results show the method implemented to support XML search in a traditional IR system such as Okapi is viable. Originality/value - The paper provides useful information on a method for XML indexing based on the IR system Okapi.

Type

a
Huang, S.; Qian, J.; Huang, Y.; Lu, W.; Bu, Y.; Yang, J.; Cheng, Q.: Disclosing the relationship between citation structure and future impact of a publication (2022) 0.00
```
0.0022374375 = product of:
  0.004474875 = sum of:
    0.004474875 = product of:
      0.00894975 = sum of:
        0.00894975 = weight(_text_:a in 621) [ClassicSimilarity], result of:
          0.00894975 = score(doc=621,freq=14.0), product of:
            0.053105544 = queryWeight, product of:
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.046056706 = queryNorm
            0.1685276 = fieldWeight in 621, product of:
              3.7416575 = tf(freq=14.0), with freq of:
                14.0 = termFreq=14.0
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.0390625 = fieldNorm(doc=621)
      0.5 = coord(1/2)
  0.5 = coord(1/2)
```
Abstract

Each section header of an article has its distinct communicative function. Citations from distinct sections may be different regarding citing motivation. In this paper, we grouped section headers with similar functions as a structural function and defined the distribution of citations from structural functions for a paper as its citation structure. We aim to explore the relationship between citation structure and the future impact of a publication and disclose the relative importance among citations from different structural functions. Specifically, we proposed two citation counting methods and a citation life cycle identification method, by which the regression data were built. Subsequently, we employed a ridge regression model to predict the future impact of the paper and analyzed the relative weights of regressors. Based on documents collected from the Association for Computational Linguistics Anthology website, our empirical experiments disclosed that functional structure features improve the prediction accuracy of citation count prediction and that there exist differences among citations from different structural functions. Specifically, at the early stage of citation lifetime, citations from Introduction and Method are particularly important for perceiving future impact of papers, and citations from Result and Conclusion are also vital. However, early accumulation of citations from the Background seems less important.

Type

a
Jones, L.M.; Wright, K.D.; Jack, A.I.; Friedman, J.P.; Fresco, D.M.; Veinot, T.; Lu, W.; Moore, S.M.: The relationships between health information behavior and neural processing in African Americans with prehypertension : color or text (2019) 0.00
```
0.0018909799 = product of:
  0.0037819599 = sum of:
    0.0037819599 = product of:
      0.0075639198 = sum of:
        0.0075639198 = weight(_text_:a in 5361) [ClassicSimilarity], result of:
          0.0075639198 = score(doc=5361,freq=10.0), product of:
            0.053105544 = queryWeight, product of:
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.046056706 = queryNorm
            0.14243183 = fieldWeight in 5361, product of:
              3.1622777 = tf(freq=10.0), with freq of:
                10.0 = termFreq=10.0
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.0390625 = fieldNorm(doc=5361)
      0.5 = coord(1/2)
  0.5 = coord(1/2)
```
Abstract

Information behavior may enhance hypertension self-management in African Americans. The goal of this substudy was to examine the relationships between measures of self-reported health information behavior and neural measures of health information processing in a sample of 19 prehypertensive African Americans (mean age = 52.5, 52.6% women). We measured (a) health information seeking, sharing, and use (surveys) and (b) neural activity using functional magnetic resonance imaging (fMRI) to assess response to health information videos. We hypothesized that differential activation (comparison of analytic vs. empathic brain activity when watching a specific type of video) would indicate better function in three, distinct cognitive domains: (a) Analytic Network, (b) Default Mode Network (DMN), and (c) ventromedial prefrontal cortex (vmPFC). Scores on the information sharing measure (but not seeking or use) were positively associated with differential activation in the vmPFC (rs = .53, p = .02) and the DMN (rs = .43, p = .06). Our findings correspond with previous work indicating that activation of the DMN and vmPFC is associated with sharing information to persuade others and with behavior change. Although health information is commonly conveyed as detached and analytic in nature, our findings suggest that neural processing of socially and emotionally salient health information is more closely associated with health information sharing.

Type

a

Huang, Y.; Bu, Y.; Ding, Y.; Lu, W.: From zero to one : a perspective on citing (2019) 0.00

```
0.0014351527 = product of:
  0.0028703054 = sum of:
    0.0028703054 = product of:
      0.005740611 = sum of:
        0.005740611 = weight(_text_:a in 5387) [ClassicSimilarity], result of:
          0.005740611 = score(doc=5387,freq=4.0), product of:
            0.053105544 = queryWeight, product of:
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.046056706 = queryNorm
            0.10809815 = fieldWeight in 5387, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.046875 = fieldNorm(doc=5387)
      0.5 = coord(1/2)
  0.5 = coord(1/2)
```

Type

a

Lu, W.; Li, X.; Liu, Z.; Cheng, Q.: How do author-selected keywords function semantically in scientific manuscripts? (2019) 0.00
```
0.0011959607 = product of:
  0.0023919214 = sum of:
    0.0023919214 = product of:
      0.0047838427 = sum of:
        0.0047838427 = weight(_text_:a in 5453) [ClassicSimilarity], result of:
          0.0047838427 = score(doc=5453,freq=4.0), product of:
            0.053105544 = queryWeight, product of:
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.046056706 = queryNorm
            0.090081796 = fieldWeight in 5453, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.0390625 = fieldNorm(doc=5453)
      0.5 = coord(1/2)
  0.5 = coord(1/2)
```
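Docs 5387 and 5453 both contain the term `a` four times, yet doc 5387 ranks higher. The only input that differs is `fieldNorm` (0.046875 versus 0.0390625), a length normalization factor that is typically larger for shorter fields. A minimal sketch of the comparison, using values copied from the two explain outputs; the `score` helper is illustrative only.

```python
import math

# Shared values copied from the explain outputs
IDF_A = 1.153047
QUERY_NORM = 0.046056706

def score(freq: float, field_norm: float) -> float:
    query_weight = IDF_A * QUERY_NORM
    field_weight = math.sqrt(freq) * IDF_A * field_norm
    return query_weight * field_weight

s_5387 = score(4.0, 0.046875)    # fieldNorm(doc=5387)
s_5453 = score(4.0, 0.0390625)   # fieldNorm(doc=5453)

# With everything else equal, the scores differ by the ratio of the fieldNorms.
ratio = s_5387 / s_5453
print(f"{s_5387:.9f} {s_5453:.10f} ratio={ratio:.3f}")
```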
Abstract

Author-selected keywords have been widely utilized for indexing, information retrieval, bibliometrics and knowledge organization in previous studies. However, few studies exist concerning how author-selected keywords function semantically in scientific manuscripts. In this paper, we investigated this problem from the perspective of term function (TF) by devising indicators of the diversity and symmetry of keyword term functions in papers, as well as the intensity of individual term functions in papers. The data obtained from the whole Journal of Informetrics (JOI) were manually processed by an annotation scheme of keyword term functions, including "research topic," "research method," "research object," "research area," "data" and "others," based on empirical work in content analysis. The results show, quantitatively, that the diversity of keyword term function decreases, and the irregularity increases with the number of author-selected keywords in a paper. Moreover, the distribution of the intensity of individual keyword term function indicated that no significant difference exists between the ranking of the five term functions with the increase of the number of author-selected keywords (i.e., "research topic" > "research method" > "research object" > "research area" > "data"). The findings indicate that precise keyword-related research must take into account the distinct types of author-selected keywords.

Type

a
