Search (9 results, page 1 of 1)

  • author_ss:"Lu, K."
  • language_ss:"e"
  • year_i:[2010 TO 2020}
  1. Lu, K.; Cai, X.; Ajiferuke, I.; Wolfram, D.: Vocabulary size and its effect on topic representation (2017) 0.03
    0.025475822 = product of:
      0.050951645 = sum of:
        0.035584353 = weight(_text_:management in 3414) [ClassicSimilarity], result of:
          0.035584353 = score(doc=3414,freq=2.0), product of:
            0.15925534 = queryWeight, product of:
              3.3706124 = idf(docFreq=4130, maxDocs=44218)
              0.047248192 = queryNorm
            0.22344214 = fieldWeight in 3414, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.3706124 = idf(docFreq=4130, maxDocs=44218)
              0.046875 = fieldNorm(doc=3414)
        0.015367293 = product of:
          0.030734586 = sum of:
            0.030734586 = weight(_text_:science in 3414) [ClassicSimilarity], result of:
              0.030734586 = score(doc=3414,freq=4.0), product of:
                0.124457374 = queryWeight, product of:
                  2.6341193 = idf(docFreq=8627, maxDocs=44218)
                  0.047248192 = queryNorm
                0.24694869 = fieldWeight in 3414, product of:
                  2.0 = tf(freq=4.0), with freq of:
                    4.0 = termFreq=4.0
                  2.6341193 = idf(docFreq=8627, maxDocs=44218)
                  0.046875 = fieldNorm(doc=3414)
          0.5 = coord(1/2)
      0.5 = coord(2/4)
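The explain tree above follows Lucene's ClassicSimilarity formula: each term's fieldWeight is sqrt(tf) · idf · fieldNorm, the term score multiplies that by queryWeight = idf · queryNorm, and coord(m/n) scales a sum by the fraction of query clauses matched. A minimal Python check of the numbers for doc 3414 (values copied from the tree above):

```python
import math

# queryNorm is shared by all clauses of the query (from the tree above)
QUERY_NORM = 0.047248192

def term_score(freq, idf, field_norm):
    # ClassicSimilarity: tf = sqrt(freq); fieldWeight = tf * idf * fieldNorm;
    # term score = queryWeight * fieldWeight, with queryWeight = idf * queryNorm
    field_weight = math.sqrt(freq) * idf * field_norm
    query_weight = idf * QUERY_NORM
    return query_weight * field_weight

mgmt = term_score(freq=2.0, idf=3.3706124, field_norm=0.046875)  # "management"
sci = term_score(freq=4.0, idf=2.6341193, field_norm=0.046875)   # "science"

# "science" sits one nesting level deeper and carries its own coord(1/2) = 0.5;
# the outer sum is then scaled by coord(2/4) = 0.5
total = (mgmt + sci * 0.5) * 0.5
print(round(total, 9))  # ≈ 0.025475822, matching the top of the tree
```

The same recipe reproduces the remaining eight trees; only the freq, idf, fieldNorm, and coord values change per document.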
    
    Abstract
    This study investigates how computational overhead for topic model training may be reduced by selectively removing terms from the vocabulary of text corpora being modeled. We compare the impact of removing singly occurring terms, the top 0.5%, 1% and 5% most frequently occurring terms and both top 0.5% most frequent and singly occurring terms, along with changes in the number of topics modeled (10, 20, 30, 40, 50, 100) using three datasets. Four outcome measures are compared. The removal of singly occurring terms has little impact on outcomes for all of the measures tested. Document discriminative capacity, as measured by the document space density, is reduced by the removal of frequently occurring terms, but increases with higher numbers of topics. Vocabulary size does not greatly influence entropy, but entropy is affected by the number of topics. Finally, topic similarity, as measured by pairwise topic similarity and Jensen-Shannon divergence, decreases with the removal of frequent terms. The findings have implications for information science research in information retrieval and informetrics that makes use of topic modeling.
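The pruning strategies compared in the abstract (dropping singly occurring terms and/or a top fraction of the most frequent terms) amount to simple frequency filters over the corpus vocabulary. A minimal sketch of that idea; the toy corpus and cut-off values are illustrative, not the study's data:

```python
from collections import Counter

def prune_vocabulary(docs, top_frac=0.005, drop_singletons=True):
    """Drop the top `top_frac` most frequent terms and, optionally,
    terms occurring exactly once, then filter the documents."""
    freq = Counter(t for doc in docs for t in doc)
    ranked = [t for t, _ in freq.most_common()]
    dropped = set(ranked[: int(len(ranked) * top_frac)])
    if drop_singletons:
        dropped |= {t for t, c in freq.items() if c == 1}
    return [[t for t in doc if t not in dropped] for doc in docs]

# Toy tokenized corpus (illustrative only)
docs = [["topic", "model", "vocabulary", "size"],
        ["topic", "model", "training", "cost"],
        ["vocabulary", "pruning", "topic"]]
pruned = prune_vocabulary(docs, top_frac=0.1)  # here only singletons go
```

The filtered documents would then be fed to the topic model trainer in place of the originals.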
    Content
    Cf.: http://www.sciencedirect.com/science/article/pii/S0306457317300298.
    Source
    Information processing and management. 53(2017) no.3, pp.653-665
  2. Ajiferuke, I.; Lu, K.; Wolfram, D.: A comparison of citer and citation-based measure outcomes for multiple disciplines (2010) 0.02
    0.015035374 = product of:
      0.060141496 = sum of:
        0.060141496 = sum of:
          0.021732632 = weight(_text_:science in 4000) [ClassicSimilarity], result of:
            0.021732632 = score(doc=4000,freq=2.0), product of:
              0.124457374 = queryWeight, product of:
                2.6341193 = idf(docFreq=8627, maxDocs=44218)
                0.047248192 = queryNorm
              0.17461908 = fieldWeight in 4000, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                2.6341193 = idf(docFreq=8627, maxDocs=44218)
                0.046875 = fieldNorm(doc=4000)
          0.038408864 = weight(_text_:22 in 4000) [ClassicSimilarity], result of:
            0.038408864 = score(doc=4000,freq=2.0), product of:
              0.16545512 = queryWeight, product of:
                3.5018296 = idf(docFreq=3622, maxDocs=44218)
                0.047248192 = queryNorm
              0.23214069 = fieldWeight in 4000, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                3.5018296 = idf(docFreq=3622, maxDocs=44218)
                0.046875 = fieldNorm(doc=4000)
      0.25 = coord(1/4)
    
    Date
    28. 9.2010 12:54:22
    Source
    Journal of the American Society for Information Science and Technology. 61(2010) no.10, pp.2086-2096
  3. Mu, X.; Lu, K.; Ryu, H.: Explicitly integrating MeSH thesaurus help into health information retrieval systems : an empirical user study (2014) 0.01
    0.007413407 = product of:
      0.029653627 = sum of:
        0.029653627 = weight(_text_:management in 2703) [ClassicSimilarity], result of:
          0.029653627 = score(doc=2703,freq=2.0), product of:
            0.15925534 = queryWeight, product of:
              3.3706124 = idf(docFreq=4130, maxDocs=44218)
              0.047248192 = queryNorm
            0.18620178 = fieldWeight in 2703, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.3706124 = idf(docFreq=4130, maxDocs=44218)
              0.0390625 = fieldNorm(doc=2703)
      0.25 = coord(1/4)
    
    Source
    Information processing and management. 50(2014) no.1, pp.24-40
  4. Koh, K.; Snead, J.T.; Lu, K.: The processes of maker learning and information behavior in a technology-rich high school class (2019) 0.00
    0.0039210445 = product of:
      0.015684178 = sum of:
        0.015684178 = product of:
          0.031368356 = sum of:
            0.031368356 = weight(_text_:science in 5436) [ClassicSimilarity], result of:
              0.031368356 = score(doc=5436,freq=6.0), product of:
                0.124457374 = queryWeight, product of:
                  2.6341193 = idf(docFreq=8627, maxDocs=44218)
                  0.047248192 = queryNorm
                0.25204095 = fieldWeight in 5436, product of:
                  2.4494898 = tf(freq=6.0), with freq of:
                    6.0 = termFreq=6.0
                  2.6341193 = idf(docFreq=8627, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=5436)
          0.5 = coord(1/2)
      0.25 = coord(1/4)
    
    Abstract
    This mixed-method study investigated the processes of making and information behavior as integrated in self-directed learning in a high school maker class. Twenty students engaged in making projects of their choice with low- and high-technologies for 15 weeks. Data collection included visual process mapping activities, surveys, and Dervin's Sense-Making Methodology-informed interviews. Findings included inspirations, actions, emotions, challenges, helps, and learning that occurred during the making processes. Information played an integral role as students engaged in creative production and learning. Students identified information as helps, challenges, how they learn, and learning outcomes. The study proposes a new, evolving process model of making that illustrates production-centered information behavior and learning. The model's spiral form emphasizes the non-linear and cyclical nature of the making process. Squiggly lines represent how the making process is gap-filled and uncertain. The study contributes to the scholarly and professional fields of information science, library and information studies, maker, and STEAM (Science, Technology, Engineering, Art, and Math) learning.
    Source
    Journal of the Association for Information Science and Technology. 70(2019) no.12, pp.1395-1412
  5. Lu, K.; Wolfram, D.: Measuring author research relatedness : a comparison of word-based, topic-based, and author cocitation approaches (2012) 0.00
    0.0032015191 = product of:
      0.012806077 = sum of:
        0.012806077 = product of:
          0.025612153 = sum of:
            0.025612153 = weight(_text_:science in 453) [ClassicSimilarity], result of:
              0.025612153 = score(doc=453,freq=4.0), product of:
                0.124457374 = queryWeight, product of:
                  2.6341193 = idf(docFreq=8627, maxDocs=44218)
                  0.047248192 = queryNorm
                0.20579056 = fieldWeight in 453, product of:
                  2.0 = tf(freq=4.0), with freq of:
                    4.0 = termFreq=4.0
                  2.6341193 = idf(docFreq=8627, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=453)
          0.5 = coord(1/2)
      0.25 = coord(1/4)
    
    Abstract
    Relationships between authors based on characteristics of published literature have been studied for decades. Author cocitation analysis using mapping techniques has been most frequently used to study how closely two authors are thought to be in intellectual space based on how members of the research community co-cite their works. Other approaches exist to study author relatedness based more directly on the text of their published works. In this study we present static and dynamic word-based approaches using vector space modeling, as well as a topic-based approach based on latent Dirichlet allocation for mapping author research relatedness. Vector space modeling is used to define an author space consisting of works by a given author. Outcomes for the two word-based approaches and a topic-based approach for 50 prolific authors in library and information science are compared with more traditional author cocitation analysis using multidimensional scaling and hierarchical cluster analysis. The two word-based approaches produced similar outcomes except where two authors were frequent co-authors for the majority of their articles. The topic-based approach produced the most distinctive map.
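The word-based approaches above represent each author as a bag-of-words vector built from their combined works and compare authors by vector similarity. A minimal sketch of that representation using cosine similarity; the toy works and terms are illustrative assumptions, not the study's data:

```python
import math
from collections import Counter

def author_vector(works):
    # Concatenate an author's works into one term-frequency vector
    return Counter(t for work in works for t in work)

def cosine(u, v):
    # Cosine similarity between two sparse term-frequency vectors
    dot = sum(u[t] * v[t] for t in set(u) & set(v))
    norm = (math.sqrt(sum(c * c for c in u.values()))
            * math.sqrt(sum(c * c for c in v.values())))
    return dot / norm if norm else 0.0

a = author_vector([["citation", "analysis"], ["cocitation", "mapping"]])
b = author_vector([["citation", "mapping"], ["retrieval"]])
sim = cosine(a, b)  # higher = more closely related research
```

The resulting pairwise similarity matrix is what multidimensional scaling or hierarchical clustering would then map, analogously to the author cocitation matrix.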
    Source
    Journal of the American Society for Information Science and Technology. 63(2012) no.10, pp.1973-1986
  6. Lu, K.; Mao, J.; Li, G.: Toward effective automated weighted subject indexing : a comparison of different approaches in different environments (2018) 0.00
    0.0032015191 = product of:
      0.012806077 = sum of:
        0.012806077 = product of:
          0.025612153 = sum of:
            0.025612153 = weight(_text_:science in 4292) [ClassicSimilarity], result of:
              0.025612153 = score(doc=4292,freq=4.0), product of:
                0.124457374 = queryWeight, product of:
                  2.6341193 = idf(docFreq=8627, maxDocs=44218)
                  0.047248192 = queryNorm
                0.20579056 = fieldWeight in 4292, product of:
                  2.0 = tf(freq=4.0), with freq of:
                    4.0 = termFreq=4.0
                  2.6341193 = idf(docFreq=8627, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=4292)
          0.5 = coord(1/2)
      0.25 = coord(1/4)
    
    Abstract
    Subject indexing plays an important role in supporting subject access to information resources. Current subject indexing systems do not make adequate distinctions on the importance of assigned subject descriptors. Assigning numeric weights to subject descriptors to distinguish their importance to the documents can strengthen the role of subject metadata. Automated methods are more cost-effective. This study compares different automated weighting methods in different environments. Two evaluation methods were used to assess the performance. Experiments on three datasets in the biomedical domain suggest the performance of different weighting methods depends on whether it is an abstract or full text environment. Mutual information with bag-of-words representation shows the best average performance in the full text environment, while cosine with bag-of-words representation is the best in an abstract environment. The cosine measure has relatively consistent and robust performance. A direct weighting method, IDF (Inverse Document Frequency), can produce quick and reasonable estimates of the weights. Bag-of-words representation generally outperforms the concept-based representation. Further improvement in performance can be obtained by using the learning-to-rank method to integrate different weighting methods. This study follows up Lu and Mao (Journal of the Association for Information Science and Technology, 66, 1776-1784, 2015), in which an automated weighted subject indexing method was proposed and validated. The findings from this study contribute to more effective weighted subject indexing.
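The direct IDF weighting mentioned in the abstract can be sketched as weighting each assigned descriptor by the inverse document frequency of its term in the collection. A minimal illustration under that reading; the toy collection and descriptor names are assumptions, not the paper's formula or data:

```python
import math

def idf_weights(descriptors, collection):
    """Weight each assigned descriptor by log(N / df): rarer
    descriptors in the collection get larger weights. A sketch of
    direct IDF weighting, not the paper's exact method."""
    n = len(collection)
    weights = {}
    for d in descriptors:
        df = sum(1 for doc in collection if d in doc)
        weights[d] = math.log(n / df) if df else 0.0
    return weights

# Toy collection: each document is its set of assigned descriptors
collection = [{"genomics", "protein"}, {"protein", "assay"}, {"genomics", "cell"}]
w = idf_weights(["genomics", "protein", "assay"], collection)
# "assay" appears in only one document, so it outweighs the common terms
```

Such quick estimates are what the abstract contrasts with the better-performing mutual-information and cosine methods.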
    Source
    Journal of the Association for Information Science and Technology. 69(2018) no.1, pp.121-133
  7. Lu, K.; Kipp, M.E.I.: Understanding the retrieval effectiveness of collaborative tags and author keywords in different retrieval environments : an experimental study on medical collections (2014) 0.00
    0.0022638158 = product of:
      0.009055263 = sum of:
        0.009055263 = product of:
          0.018110527 = sum of:
            0.018110527 = weight(_text_:science in 1215) [ClassicSimilarity], result of:
              0.018110527 = score(doc=1215,freq=2.0), product of:
                0.124457374 = queryWeight, product of:
                  2.6341193 = idf(docFreq=8627, maxDocs=44218)
                  0.047248192 = queryNorm
                0.1455159 = fieldWeight in 1215, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  2.6341193 = idf(docFreq=8627, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=1215)
          0.5 = coord(1/2)
      0.25 = coord(1/4)
    
    Source
    Journal of the Association for Information Science and Technology. 65(2014) no.3, pp.483-500
  8. Lu, K.; Joo, S.; Lee, T.; Hu, R.: Factors that influence query reformulations and search performance in health information retrieval : a multilevel modeling approach (2017) 0.00
    0.0022638158 = product of:
      0.009055263 = sum of:
        0.009055263 = product of:
          0.018110527 = sum of:
            0.018110527 = weight(_text_:science in 3754) [ClassicSimilarity], result of:
              0.018110527 = score(doc=3754,freq=2.0), product of:
                0.124457374 = queryWeight, product of:
                  2.6341193 = idf(docFreq=8627, maxDocs=44218)
                  0.047248192 = queryNorm
                0.1455159 = fieldWeight in 3754, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  2.6341193 = idf(docFreq=8627, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=3754)
          0.5 = coord(1/2)
      0.25 = coord(1/4)
    
    Source
    Journal of the Association for Information Science and Technology. 68(2017) no.8, pp.1886-1898
  9. Lu, K.; Mao, J.: An automatic approach to weighted subject indexing : an empirical study in the biomedical domain (2015) 0.00
    0.0022638158 = product of:
      0.009055263 = sum of:
        0.009055263 = product of:
          0.018110527 = sum of:
            0.018110527 = weight(_text_:science in 4005) [ClassicSimilarity], result of:
              0.018110527 = score(doc=4005,freq=2.0), product of:
                0.124457374 = queryWeight, product of:
                  2.6341193 = idf(docFreq=8627, maxDocs=44218)
                  0.047248192 = queryNorm
                0.1455159 = fieldWeight in 4005, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  2.6341193 = idf(docFreq=8627, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=4005)
          0.5 = coord(1/2)
      0.25 = coord(1/4)
    
    Source
    Journal of the Association for Information Science and Technology. 66(2015) no.9, pp.1776-1784