Search (6 results, page 1 of 1)

  • Filter: author_ss:"Mao, J."
  1. Wang, S.; Ma, Y.; Mao, J.; Bai, Y.; Liang, Z.; Li, G.: Quantifying scientific breakthroughs by a novel disruption indicator based on knowledge entities (2023) 0.01
    0.0128297005 = product of:
      0.025659401 = sum of:
        0.008582841 = weight(_text_:information in 882) [ClassicSimilarity], result of:
          0.008582841 = score(doc=882,freq=2.0), product of:
            0.08850355 = queryWeight, product of:
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.050415643 = queryNorm
            0.09697737 = fieldWeight in 882, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.0390625 = fieldNorm(doc=882)
        0.01707656 = product of:
          0.03415312 = sum of:
            0.03415312 = weight(_text_:22 in 882) [ClassicSimilarity], result of:
              0.03415312 = score(doc=882,freq=2.0), product of:
                0.17654699 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.050415643 = queryNorm
                0.19345059 = fieldWeight in 882, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=882)
          0.5 = coord(1/2)
      0.5 = coord(2/4)
    
    Date
    22. 1.2023 18:37:33
    Source
    Journal of the Association for Information Science and Technology. 74(2023) no.2, S.150-167
  2. Lu, K.; Mao, J.; Li, G.: Toward effective automated weighted subject indexing : a comparison of different approaches in different environments (2018) 0.00
    0.0042914203 = product of:
      0.017165681 = sum of:
        0.017165681 = weight(_text_:information in 4292) [ClassicSimilarity], result of:
          0.017165681 = score(doc=4292,freq=8.0), product of:
            0.08850355 = queryWeight, product of:
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.050415643 = queryNorm
            0.19395474 = fieldWeight in 4292, product of:
              2.828427 = tf(freq=8.0), with freq of:
                8.0 = termFreq=8.0
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.0390625 = fieldNorm(doc=4292)
      0.25 = coord(1/4)
    
    Abstract
    Subject indexing plays an important role in supporting subject access to information resources. Current subject indexing systems do not make adequate distinctions in the importance of assigned subject descriptors. Assigning numeric weights to subject descriptors to distinguish their importance to the documents can strengthen the role of subject metadata. Automated methods are more cost-effective. This study compares different automated weighting methods in different environments. Two evaluation methods were used to assess the performance. Experiments on three datasets in the biomedical domain suggest that the performance of different weighting methods depends on whether it is an abstract or full-text environment. Mutual information with bag-of-words representation shows the best average performance in the full-text environment, while cosine with bag-of-words representation is the best in an abstract environment. The cosine measure has relatively consistent and robust performance. A direct weighting method, IDF (Inverse Document Frequency), can produce quick and reasonable estimates of the weights. Bag-of-words representation generally outperforms the concept-based representation. Further improvement in performance can be obtained by using the learning-to-rank method to integrate different weighting methods. This study follows up Lu and Mao (Journal of the Association for Information Science and Technology, 66, 1776-1784, 2015), in which an automated weighted subject indexing method was proposed and validated. The findings from this study contribute to more effective weighted subject indexing.
    Source
    Journal of the Association for Information Science and Technology. 69(2018) no.1, S.121-133
  3. Lu, K.; Mao, J.: An automatic approach to weighted subject indexing : an empirical study in the biomedical domain (2015) 0.00
    0.0030344925 = product of:
      0.01213797 = sum of:
        0.01213797 = weight(_text_:information in 4005) [ClassicSimilarity], result of:
          0.01213797 = score(doc=4005,freq=4.0), product of:
            0.08850355 = queryWeight, product of:
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.050415643 = queryNorm
            0.13714671 = fieldWeight in 4005, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.0390625 = fieldNorm(doc=4005)
      0.25 = coord(1/4)
    
    Abstract
    Subject indexing is an intellectually intensive process that has many inherent uncertainties. Existing manual subject indexing systems generally produce binary outcomes for whether or not to assign an indexing term. This does not sufficiently reflect the extent to which the indexing terms are associated with the documents. On the other hand, the idea of probabilistic or weighted indexing was proposed a long time ago and has seen success in capturing uncertainties in the automatic indexing process. One hurdle to overcome in implementing weighted indexing in manual subject indexing systems is the practical burden that could be added to the already intensive indexing process. This study proposes a method to infer automatically the associations between subject terms and documents through text mining. By uncovering the connections between MeSH descriptors and document text, we are able to derive the weights of MeSH descriptors manually assigned to documents. Our initial results suggest that the inference method is feasible and promising. The study has practical implications for improving subject indexing practice and providing better support for information retrieval.
    Source
    Journal of the Association for Information Science and Technology. 66(2015) no.9, S.1776-1784
  4. Mao, J.; Cui, H.: Identifying bacterial biotope entities using sequence labeling : performance and feature analysis (2018) 0.00
    0.0030344925 = product of:
      0.01213797 = sum of:
        0.01213797 = weight(_text_:information in 4462) [ClassicSimilarity], result of:
          0.01213797 = score(doc=4462,freq=4.0), product of:
            0.08850355 = queryWeight, product of:
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.050415643 = queryNorm
            0.13714671 = fieldWeight in 4462, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.0390625 = fieldNorm(doc=4462)
      0.25 = coord(1/4)
    
    Abstract
    Habitat information is important to biodiversity conservation and research. Extracting bacterial biotope entities from scientific publications is important to the large-scale study of the relationships between bacteria and their living environments. To facilitate the further development of robust habitat text mining systems for biodiversity, following the BioNLP task framework, three sequence labeling techniques, CRFs (Conditional Random Fields), MEMM (Maximum Entropy Markov Model) and SVMhmm (Support Vector Machine), and one classifier, SVMmulticlass, are compared on their performance in identifying three types of bacterial biotope entities: bacteria, habitats and geographical locations. A variety of basic word formation features, syntactic features, and semantic features are exploited, and their effectiveness is compared across the three sequence labeling methods. Experiments on two publicly available BioNLP collections show that, in addition to a WordNet feature, word embedding feature clusters (although not trained on the task-specific corpus) consistently improve the performance for all methods on all entity types in both collections. Other features produce mixed results. Our results also show that when trained on limited corpora, Brown clusters yield better performance than word embedding clusters. Further analysis suggests that entity recognition performance can be greatly boosted by improving the accuracy of entity boundary identification.
    Source
    Journal of the Association for Information Science and Technology. 69(2018) no.9, S.1134-1147
  5. Liang, Z.; Mao, J.; Li, G.: Bias against scientific novelty : a prepublication perspective (2023) 0.00
    0.0025748524 = product of:
      0.01029941 = sum of:
        0.01029941 = weight(_text_:information in 845) [ClassicSimilarity], result of:
          0.01029941 = score(doc=845,freq=2.0), product of:
            0.08850355 = queryWeight, product of:
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.050415643 = queryNorm
            0.116372846 = fieldWeight in 845, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.046875 = fieldNorm(doc=845)
      0.25 = coord(1/4)
    
    Source
    Journal of the Association for Information Science and Technology. 74(2023) no.1, S.99-114
  6. Hu, K.; Luo, Q.; Qi, K.; Yang, S.; Mao, J.; Fu, X.; Zheng, J.; Wu, H.; Guo, Y.; Zhu, Q.: Understanding the topic evolution of scientific literatures like an evolving city : using Google Word2Vec model and spatial autocorrelation analysis (2019) 0.00
    0.002427594 = product of:
      0.009710376 = sum of:
        0.009710376 = weight(_text_:information in 5102) [ClassicSimilarity], result of:
          0.009710376 = score(doc=5102,freq=4.0), product of:
            0.08850355 = queryWeight, product of:
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.050415643 = queryNorm
            0.10971737 = fieldWeight in 5102, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.03125 = fieldNorm(doc=5102)
      0.25 = coord(1/4)
    
    Abstract
    Topic evolution has been described by many approaches, from the macro level to the detail level, by extracting topic dynamics from text in literature and other media types. However, why the evolution happens is less studied. In this paper, we focus on whether and how keyword semantics can invoke or affect topic evolution. We assume that the semantic relatedness among keywords can affect topic popularity during the literature surveying and citing process, thus invoking evolution. This assumption, however, needs to be confirmed with an approach that fully considers the semantic interactions among topics; traditional topic evolution analyses in scientometric domains cannot provide such support because they rely on limited semantic information. To address this problem, we apply Google Word2Vec, a neural language model, to enhance the keywords with more complete semantic information. We further model the semantic space as an urban geographic space and analyze the topic evolution geographically using measures of spatial autocorrelation, as if keywords were changing parcels of land in an evolving city. Keyword citations (a keyword citation is counted each time a paper containing the keyword receives a citation) are used as an indicator of keyword popularity. Using bibliographic datasets from the field of geographical natural hazards, the experimental results demonstrate that in some local areas the popularity of a keyword affects that of the surrounding keywords, although there is no significant impact on the evolution of all keywords as a whole. The spatial autocorrelation analysis identifies the interaction patterns (including High-High leading and High-Low suppressing) among keywords in local areas. The approach can be regarded as an analysis framework borrowed from geospatial modeling. Moreover, prediction results in local areas are shown to be more accurate when spatial autocorrelation is taken into account.
    Source
    Information processing and management. 56(2019) no.4, S.1185-1203
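
Note on the relevance figures: the indented breakdowns above are Lucene "explain" trees for the ClassicSimilarity (TF-IDF) scorer. A minimal Python sketch, using only the values printed in result 1's breakdown, reproduces its final score of 0.0128297005 (tf = sqrt(termFreq), fieldWeight = tf · idf · fieldNorm, queryWeight = idf · queryNorm, per-term score = queryWeight · fieldWeight, then the coord factors); the helper function below is illustrative, not part of the search system.

```python
from math import isclose, sqrt

def partial_score(term_freq, idf, field_norm, query_norm):
    """Score contribution of one weight(_text_:term ...) node in a ClassicSimilarity explain tree."""
    tf = sqrt(term_freq)                      # tf(freq)
    field_weight = tf * idf * field_norm      # fieldWeight = tf * idf * fieldNorm
    query_weight = idf * query_norm           # queryWeight = idf * queryNorm
    return query_weight * field_weight        # score contribution of this term

QUERY_NORM = 0.050415643                      # queryNorm shown in every breakdown above

# weight(_text_:information in 882)
s_information = partial_score(2.0, 1.7554779, 0.0390625, QUERY_NORM)

# weight(_text_:22 in 882), wrapped in the inner coord(1/2)
s_22 = partial_score(2.0, 3.5018296, 0.0390625, QUERY_NORM) * (1 / 2)

# outer sum, then coord(2/4): 2 of 4 query clauses matched document 882
total = (s_information + s_22) * (2 / 4)

assert isclose(s_information, 0.008582841, rel_tol=1e-5)
assert isclose(s_22, 0.01707656, rel_tol=1e-5)
assert isclose(total, 0.0128297005, rel_tol=1e-5)
print(f"result 1 score: {total:.10f}")        # ~0.0128297005
```

The same arithmetic applies to the breakdowns of results 2-6, which contain only the _text_:information clause and a coord(1/4) factor.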
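
Results 2 and 3 concern weighted subject indexing, where each assigned descriptor receives a numeric weight instead of a binary assignment; result 2's abstract notes that IDF alone already yields quick, reasonable weight estimates. The sketch below only illustrates that idea on a toy set of descriptor assignments; the corpus, descriptor names, and normalization are assumptions, not the authors' data or implementation.

```python
from math import log

# Toy corpus: document id -> manually assigned subject descriptors (hypothetical data)
assignments = {
    "doc1": ["Humans", "Neoplasms", "Risk Factors"],
    "doc2": ["Humans", "Neoplasms"],
    "doc3": ["Humans", "Diet"],
    "doc4": ["Diet", "Risk Factors"],
}

n_docs = len(assignments)

# Document frequency of each descriptor across the toy corpus
df = {}
for descriptors in assignments.values():
    for d in set(descriptors):
        df[d] = df.get(d, 0) + 1

# IDF as a direct descriptor weight: rarer descriptors are treated as more informative
idf = {d: log(n_docs / f) for d, f in df.items()}

def weighted_descriptors(doc_id):
    """Weights for one document's descriptors, scaled to [0, 1] within the document."""
    weights = {d: idf[d] for d in assignments[doc_id]}
    top = max(weights.values()) or 1.0
    return {d: round(w / top, 3) for d, w in weights.items()}

for doc_id in assignments:
    print(doc_id, weighted_descriptors(doc_id))
```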
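
Result 6 places keywords in a Word2Vec-derived semantic space and measures spatial autocorrelation of keyword popularity (citation counts). The sketch below computes a global Moran's I over toy keyword coordinates and citation counts with a row-standardized k-nearest-neighbour weight matrix; the coordinates, counts, and neighbourhood size are assumptions for illustration, not the paper's data or method details.

```python
import numpy as np

def morans_i(values, coords, k=2):
    """Global Moran's I with a row-standardized k-nearest-neighbour weight matrix."""
    x = np.asarray(values, dtype=float)
    pts = np.asarray(coords, dtype=float)
    n = len(x)

    # Pairwise distances in the (semantic) space; a point is never its own neighbour
    dist = np.linalg.norm(pts[:, None, :] - pts[None, :, :], axis=-1)
    np.fill_diagonal(dist, np.inf)

    # Binary k-NN weights, then row-standardize
    w = np.zeros((n, n))
    for i in range(n):
        w[i, np.argsort(dist[i])[:k]] = 1.0
    w /= w.sum(axis=1, keepdims=True)

    z = x - x.mean()
    numerator = (w * np.outer(z, z)).sum()
    denominator = (z ** 2).sum()
    return (n / w.sum()) * (numerator / denominator)

# Toy "keywords": 2-D positions standing in for Word2Vec coordinates,
# and citation counts standing in for keyword popularity (all hypothetical).
coords = [(0.0, 0.0), (0.1, 0.1), (0.2, 0.0), (2.0, 2.0), (2.1, 2.1), (2.2, 2.0)]
citations = [50, 45, 40, 5, 3, 4]

print(f"Moran's I: {morans_i(citations, coords, k=2):.3f}")  # positive -> clustered popularity
```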