Search (5 results, page 1 of 1)

Suakkaphong, N.; Zhang, Z.; Chen, H.: Disease named entity recognition using semisupervised learning and conditional random fields (2011) 0.05
```
0.04727441 = product of:
  0.09454882 = sum of:
    0.057835944 = weight(_text_:data in 4367) [ClassicSimilarity], result of:
      0.057835944 = score(doc=4367,freq=10.0), product of:
        0.14807065 = queryWeight, product of:
          3.1620505 = idf(docFreq=5088, maxDocs=44218)
          0.046827413 = queryNorm
        0.39059696 = fieldWeight in 4367, product of:
          3.1622777 = tf(freq=10.0), with freq of:
            10.0 = termFreq=10.0
          3.1620505 = idf(docFreq=5088, maxDocs=44218)
          0.0390625 = fieldNorm(doc=4367)
    0.036712877 = product of:
      0.073425755 = sum of:
        0.073425755 = weight(_text_:processing in 4367) [ClassicSimilarity], result of:
          0.073425755 = score(doc=4367,freq=6.0), product of:
            0.18956426 = queryWeight, product of:
              4.048147 = idf(docFreq=2097, maxDocs=44218)
              0.046827413 = queryNorm
            0.38733965 = fieldWeight in 4367, product of:
              2.4494898 = tf(freq=6.0), with freq of:
                6.0 = termFreq=6.0
              4.048147 = idf(docFreq=2097, maxDocs=44218)
              0.0390625 = fieldNorm(doc=4367)
      0.5 = coord(1/2)
  0.5 = coord(2/4)
```
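The score tree above can be reproduced by hand. The sketch below is a minimal reconstruction, not the search engine's own code: it assumes the classic Lucene TF-IDF formulas that the printed numbers appear to follow (tf as the square root of the term frequency, idf as 1 + ln(maxDocs / (docFreq + 1))), and it copies `queryNorm`, `fieldNorm`, and the `coord` factors directly from the explanation instead of deriving them. The function and constant names are illustrative only.

```python
import math

def idf(doc_freq, max_docs):
    # Assumed classic formula; it matches idf=3.1620505 for docFreq=5088.
    return 1.0 + math.log(max_docs / (doc_freq + 1))

def term_score(freq, doc_freq, max_docs, query_norm, field_norm):
    tf = math.sqrt(freq)                       # tf(freq=10.0) -> 3.1622777
    term_idf = idf(doc_freq, max_docs)
    query_weight = term_idf * query_norm       # "queryWeight" in the tree
    field_weight = tf * term_idf * field_norm  # "fieldWeight" in the tree
    return query_weight * field_weight

# Values copied from the explanation for doc 4367.
QUERY_NORM = 0.046827413
MAX_DOCS = 44218
FIELD_NORM = 0.0390625

data = term_score(10.0, 5088, MAX_DOCS, QUERY_NORM, FIELD_NORM)
processing = term_score(6.0, 2097, MAX_DOCS, QUERY_NORM, FIELD_NORM)

# The "processing" clause carries its own coord(1/2); the outer sum is
# then scaled by coord(2/4) because 2 of 4 query clauses matched.
total = (data + processing * 0.5) * 0.5
print(total)
```

The printed value should land very close to the 0.04727441 reported for this record. Small differences in the last digits are expected, because the engine computes in single precision while this sketch uses Python's double-precision floats.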
Abstract

Information extraction is an important text-mining task that aims at extracting prespecified types of information from large text collections and making them available in structured representations such as databases. In the biomedical domain, information extraction can be applied to help biologists make the most use of their digital-literature archives. Currently, there are large amounts of biomedical literature that contain rich information about biomedical substances. Extracting such knowledge requires a good named entity recognition technique. In this article, we combine conditional random fields (CRFs), a state-of-the-art sequence-labeling algorithm, with two semisupervised learning techniques, bootstrapping and feature sampling, to recognize disease names from biomedical literature. Two data-processing strategies for each technique also were analyzed: one sequentially processing unlabeled data partitions and another one processing unlabeled data partitions in a round-robin fashion. The experimental results showed the advantage of semisupervised learning techniques given limited labeled training data. Specifically, CRFs with bootstrapping implemented in sequential fashion outperformed strictly supervised CRFs for disease name recognition. The project was supported by NIH/NLM Grant R33 LM07299-01, 2002-2005.

Theme

Data Mining
Zhang, Z.; Li, Q.; Zeng, D.; Ga, H.: Extracting evolutionary communities in community question answering (2014) 0.01
```
0.009144665 = product of:
  0.03657866 = sum of:
    0.03657866 = weight(_text_:data in 1286) [ClassicSimilarity], result of:
      0.03657866 = score(doc=1286,freq=4.0), product of:
        0.14807065 = queryWeight, product of:
          3.1620505 = idf(docFreq=5088, maxDocs=44218)
          0.046827413 = queryNorm
        0.24703519 = fieldWeight in 1286, product of:
          2.0 = tf(freq=4.0), with freq of:
            4.0 = termFreq=4.0
          3.1620505 = idf(docFreq=5088, maxDocs=44218)
          0.0390625 = fieldNorm(doc=1286)
  0.25 = coord(1/4)
```
Abstract

With the rapid growth of Web 2.0, community question answering (CQA) has become a prevalent information seeking channel, in which users form interactive communities by posting questions and providing answers. Communities may evolve over time, because of changes in users' interests, activities, and new users joining the network. To better understand user interactions in CQA communities, it is necessary to analyze the community structures and track community evolution over time. Existing work in CQA focuses on question searching or content quality detection, and the important problems of community extraction and evolutionary pattern detection have not been studied. In this article, we propose a probabilistic community model (PCM) to extract overlapping community structures and capture their evolution patterns in CQA. The empirical results show that our algorithm appears to improve the community extraction quality. We show empirically, using the iPhone data set, that interesting community evolution patterns can be discovered, with each evolution pattern reflecting the variation of users' interests over time. Our analysis suggests that individual users could benefit to gain comprehensive information from tracking the transition of products. We also show that the communities provide a decision-making basis for business.

Theme

Data Mining
Zhang, Z.; Zhang, Z.; Law, R.: Editorial responsiveness, journal quality, and total review time : an empirical analysis (2012) 0.01
```
0.009052756 = product of:
  0.036211025 = sum of:
    0.036211025 = weight(_text_:data in 245) [ClassicSimilarity], result of:
      0.036211025 = score(doc=245,freq=2.0), product of:
        0.14807065 = queryWeight, product of:
          3.1620505 = idf(docFreq=5088, maxDocs=44218)
          0.046827413 = queryNorm
        0.24455236 = fieldWeight in 245, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          3.1620505 = idf(docFreq=5088, maxDocs=44218)
          0.0546875 = fieldNorm(doc=245)
  0.25 = coord(1/4)
```
Abstract

This study examined the relationships among perceived editorial responsiveness, perceived journal quality, and review time of submissions for authors in mainland China. Online review data generated by authors who have experienced the submission process in 10 Chinese academic journals were collected. The results of Spearman correlation analysis show that Chinese authors' perceived responsiveness of an editorial office is positively correlated with perceived quality of the journal, and the total review time does not affect perceptions of the quality of a journal and its editorial responsiveness.

Sarnikar, S.; Zhang, Z.; Zhao, J.L.: Query-performance prediction for effective query routing in domain-specific repositories (2014) 0.01
```
0.0077595054 = product of:
  0.031038022 = sum of:
    0.031038022 = weight(_text_:data in 1326) [ClassicSimilarity], result of:
      0.031038022 = score(doc=1326,freq=2.0), product of:
        0.14807065 = queryWeight, product of:
          3.1620505 = idf(docFreq=5088, maxDocs=44218)
          0.046827413 = queryNorm
        0.2096163 = fieldWeight in 1326, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          3.1620505 = idf(docFreq=5088, maxDocs=44218)
          0.046875 = fieldNorm(doc=1326)
  0.25 = coord(1/4)
```
Theme

Data Mining

Ren, P.; Chen, Z.; Ma, J.; Zhang, Z.; Si, L.; Wang, S.: Detecting temporal patterns of user queries (2017) 0.01
```
0.0077595054 = product of:
  0.031038022 = sum of:
    0.031038022 = weight(_text_:data in 3315) [ClassicSimilarity], result of:
      0.031038022 = score(doc=3315,freq=2.0), product of:
        0.14807065 = queryWeight, product of:
          3.1620505 = idf(docFreq=5088, maxDocs=44218)
          0.046827413 = queryNorm
        0.2096163 = fieldWeight in 3315, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          3.1620505 = idf(docFreq=5088, maxDocs=44218)
          0.046875 = fieldNorm(doc=3315)
  0.25 = coord(1/4)
```
Abstract

Query classification is an important part of exploring the characteristics of web queries. Existing studies are mainly based on Broder's classification scheme and classify user queries into navigational, informational, and transactional categories according to users' information needs. In this article, we present a novel classification scheme from the perspective of queries' temporal patterns. Queries' temporal patterns are inherent time series patterns of the search volumes of queries that reflect the evolution of the popularity of a query over time. By analyzing the temporal patterns of queries, search engines can more deeply understand the users' search intents and thus improve performance. Furthermore, we extract three groups of features based on the queries' search volume time series and use a support vector machine (SVM) to automatically detect the temporal patterns of user queries. Extensive experiments on the Million Query Track data sets of the Text REtrieval Conference (TREC) demonstrate the effectiveness of our approach.
