Search (12 results, page 1 of 1)

Qu, B.; Cong, G.; Li, C.; Sun, A.; Chen, H.: An evaluation of classification models for question topic categorization (2012) 0.01
```
0.008697641 = product of:
  0.017395282 = sum of:
    0.017395282 = product of:
      0.034790564 = sum of:
        0.034790564 = weight(_text_:c in 237) [ClassicSimilarity], result of:
          0.034790564 = score(doc=237,freq=4.0), product of:
            0.1291003 = queryWeight, product of:
              3.4494052 = idf(docFreq=3817, maxDocs=44218)
              0.037426826 = queryNorm
            0.2694848 = fieldWeight in 237, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              3.4494052 = idf(docFreq=3817, maxDocs=44218)
              0.0390625 = fieldNorm(doc=237)
      0.5 = coord(1/2)
  0.5 = coord(1/2)
```
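The score explanation above can be reproduced by hand. The following is a minimal sketch, assuming the classic Lucene TF-IDF formulas tf = sqrt(freq) and idf = 1 + ln(maxDocs / (docFreq + 1)), which are consistent with the numbers in the listing. The function name `classic_score` is hypothetical; queryNorm, fieldNorm, and the two coord factors are copied from the listing rather than derived.

```python
import math


def classic_score(freq, doc_freq, max_docs, query_norm, field_norm, coords=()):
    """Recompute a single-term score from the values in an explain tree."""
    tf = math.sqrt(freq)                               # 2.0 for freq=4.0
    idf = 1.0 + math.log(max_docs / (doc_freq + 1))    # ~3.4494 for docFreq=3817
    query_weight = idf * query_norm                    # "queryWeight" node
    field_weight = tf * idf * field_norm               # "fieldWeight" node
    score = query_weight * field_weight
    for coord in coords:                               # each coord(1/2) halves the score
        score *= coord
    return score


# Values for document 237, term "c", taken from the listing.
score_237 = classic_score(
    freq=4.0,
    doc_freq=3817,
    max_docs=44218,
    query_norm=0.037426826,
    field_norm=0.0390625,
    coords=(0.5, 0.5),
)
print(f"{score_237:.6f}")
```

For document 237 this gives roughly 0.0087, which agrees with the 0.008697641 in the listing up to floating-point rounding.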
Abstract

We study the problem of question topic classification using a very large real-world Community Question Answering (CQA) dataset from Yahoo! Answers. The dataset comprises 3.9 million questions, organized into more than 1,000 categories in a hierarchy. To the best of our knowledge, this is the first systematic evaluation of the performance of different classification methods on question topic classification as well as short texts. Specifically, we empirically evaluate the following in classifying questions into CQA categories: (a) the usefulness of n-gram features and bag-of-words features; (b) the performance of three standard classification algorithms (naive Bayes, maximum entropy, and support vector machines); (c) the performance of the state-of-the-art hierarchical classification algorithms; (d) the effect of training data size on performance; and (e) the effectiveness of the different components of CQA data, including subject, content, asker, and the best answer. The experimental results show what aspects are important for question topic classification in terms of both effectiveness and efficiency. We believe that the experimental findings from this study will be useful in real-world classification problems.

Chung, W.; Chen, H.: Browsing the underdeveloped Web : an experiment on the Arabic Medical Web Directory (2009) 0.01

```
0.0076062274 = product of:
  0.015212455 = sum of:
    0.015212455 = product of:
      0.03042491 = sum of:
        0.03042491 = weight(_text_:22 in 2733) [ClassicSimilarity], result of:
          0.03042491 = score(doc=2733,freq=2.0), product of:
            0.13106237 = queryWeight, product of:
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.037426826 = queryNorm
            0.23214069 = fieldWeight in 2733, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.046875 = fieldNorm(doc=2733)
      0.5 = coord(1/2)
  0.5 = coord(1/2)
```

Date: 22. 3.2009 17:57:50

Hu, P.J.-H.; Lin, C.; Chen, H.: User acceptance of intelligence and security informatics technology : a study of COPLINK (2005) 0.01

```
0.007380193 = product of:
  0.014760386 = sum of:
    0.014760386 = product of:
      0.029520772 = sum of:
        0.029520772 = weight(_text_:c in 3233) [ClassicSimilarity], result of:
          0.029520772 = score(doc=3233,freq=2.0), product of:
            0.1291003 = queryWeight, product of:
              3.4494052 = idf(docFreq=3817, maxDocs=44218)
              0.037426826 = queryNorm
            0.22866541 = fieldWeight in 3233, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.4494052 = idf(docFreq=3817, maxDocs=44218)
              0.046875 = fieldNorm(doc=3233)
      0.5 = coord(1/2)
  0.5 = coord(1/2)
```

Ku, Y.; Chiu, C.; Zhang, Y.; Chen, H.; Su, H.: Text mining self-disclosing health information for public health service (2014) 0.01

```
0.007380193 = product of:
  0.014760386 = sum of:
    0.014760386 = product of:
      0.029520772 = sum of:
        0.029520772 = weight(_text_:c in 1262) [ClassicSimilarity], result of:
          0.029520772 = score(doc=1262,freq=2.0), product of:
            0.1291003 = queryWeight, product of:
              3.4494052 = idf(docFreq=3817, maxDocs=44218)
              0.037426826 = queryNorm
            0.22866541 = fieldWeight in 1262, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.4494052 = idf(docFreq=3817, maxDocs=44218)
              0.046875 = fieldNorm(doc=1262)
      0.5 = coord(1/2)
  0.5 = coord(1/2)
```

Carmel, E.; Crawford, S.; Chen, H.: Browsing in hypertext : a cognitive study (1992) 0.01

```
0.0063385232 = product of:
  0.0126770465 = sum of:
    0.0126770465 = product of:
      0.025354093 = sum of:
        0.025354093 = weight(_text_:22 in 7469) [ClassicSimilarity], result of:
          0.025354093 = score(doc=7469,freq=2.0), product of:
            0.13106237 = queryWeight, product of:
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.037426826 = queryNorm
            0.19345059 = fieldWeight in 7469, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.0390625 = fieldNorm(doc=7469)
      0.5 = coord(1/2)
  0.5 = coord(1/2)
```

Source: IEEE transactions on systems, man and cybernetics. 22(1992) no.5, pp. 865-884

Leroy, G.; Chen, H.: Genescene: an ontology-enhanced integration of linguistic and co-occurrence based relations in biomedical texts (2005) 0.01

```
0.0063385232 = product of:
  0.0126770465 = sum of:
    0.0126770465 = product of:
      0.025354093 = sum of:
        0.025354093 = weight(_text_:22 in 5259) [ClassicSimilarity], result of:
          0.025354093 = score(doc=5259,freq=2.0), product of:
            0.13106237 = queryWeight, product of:
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.037426826 = queryNorm
            0.19345059 = fieldWeight in 5259, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.0390625 = fieldNorm(doc=5259)
      0.5 = coord(1/2)
  0.5 = coord(1/2)
```

Date: 22. 7.2006 14:26:01

Zheng, R.; Li, J.; Chen, H.; Huang, Z.: A framework for authorship identification of online messages : writing-style features and classification techniques (2006) 0.01

```
0.0063385232 = product of:
  0.0126770465 = sum of:
    0.0126770465 = product of:
      0.025354093 = sum of:
        0.025354093 = weight(_text_:22 in 5276) [ClassicSimilarity], result of:
          0.025354093 = score(doc=5276,freq=2.0), product of:
            0.13106237 = queryWeight, product of:
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.037426826 = queryNorm
            0.19345059 = fieldWeight in 5276, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.0390625 = fieldNorm(doc=5276)
      0.5 = coord(1/2)
  0.5 = coord(1/2)
```

Date: 22. 7.2006 16:14:37

Hu, D.; Kaza, S.; Chen, H.: Identifying significant facilitators of dark network evolution (2009) 0.01

```
0.0063385232 = product of:
  0.0126770465 = sum of:
    0.0126770465 = product of:
      0.025354093 = sum of:
        0.025354093 = weight(_text_:22 in 2753) [ClassicSimilarity], result of:
          0.025354093 = score(doc=2753,freq=2.0), product of:
            0.13106237 = queryWeight, product of:
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.037426826 = queryNorm
            0.19345059 = fieldWeight in 2753, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.0390625 = fieldNorm(doc=2753)
      0.5 = coord(1/2)
  0.5 = coord(1/2)
```

Date: 22. 3.2009 18:50:30

Chen, H.; Ng, T.D.; Martinez, J.; Schatz, B.R.: A concept space approach to addressing the vocabulary problem in scientific information retrieval : an experiment on the Worm Community System (1997) 0.01
```
0.0061501605 = product of:
  0.012300321 = sum of:
    0.012300321 = product of:
      0.024600642 = sum of:
        0.024600642 = weight(_text_:c in 6492) [ClassicSimilarity], result of:
          0.024600642 = score(doc=6492,freq=2.0), product of:
            0.1291003 = queryWeight, product of:
              3.4494052 = idf(docFreq=3817, maxDocs=44218)
              0.037426826 = queryNorm
            0.1905545 = fieldWeight in 6492, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.4494052 = idf(docFreq=3817, maxDocs=44218)
              0.0390625 = fieldNorm(doc=6492)
      0.5 = coord(1/2)
  0.5 = coord(1/2)
```
Abstract

This research presents an algorithmic approach to addressing the vocabulary problem in scientific information retrieval and information sharing, using the molecular biology domain as an example. We first present a literature review of cognitive studies related to the vocabulary problem and vocabulary-based search aids (thesauri) and then discuss techniques for building robust and domain-specific thesauri to assist in cross-domain scientific information retrieval. Using a variation of the automatic thesaurus generation techniques, which we refer to as the concept space approach, we recently conducted an experiment in the molecular biology domain in which we created a C. elegans worm thesaurus of 7,657 worm-specific terms and a Drosophila fly thesaurus of 15,626 terms. About 30% of these terms overlapped, which created vocabulary paths from one subject domain to the other. Based on a cognitive study of term association involving 4 biologists, we found that a large percentage (59.6-85.6%) of the terms suggested by the subjects were identified in the conjoined fly-worm thesaurus. However, we found only a small percentage (8.4-18.1%) of the associations suggested by the subjects in the thesaurus.

Dang, Y.; Zhang, Y.; Chen, H.; Hu, P.J.-H.; Brown, S.A.; Larson, C.: Arizona Literature Mapper : an integrated approach to monitor and analyze global bioterrorism research literature (2009) 0.01

```
0.0061501605 = product of:
  0.012300321 = sum of:
    0.012300321 = product of:
      0.024600642 = sum of:
        0.024600642 = weight(_text_:c in 2943) [ClassicSimilarity], result of:
          0.024600642 = score(doc=2943,freq=2.0), product of:
            0.1291003 = queryWeight, product of:
              3.4494052 = idf(docFreq=3817, maxDocs=44218)
              0.037426826 = queryNorm
            0.1905545 = fieldWeight in 2943, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.4494052 = idf(docFreq=3817, maxDocs=44218)
              0.0390625 = fieldNorm(doc=2943)
      0.5 = coord(1/2)
  0.5 = coord(1/2)
```

Huang, C.; Fu, T.; Chen, H.: Text-based video content classification for online video-sharing sites (2010) 0.01

```
0.0061501605 = product of:
  0.012300321 = sum of:
    0.012300321 = product of:
      0.024600642 = sum of:
        0.024600642 = weight(_text_:c in 3452) [ClassicSimilarity], result of:
          0.024600642 = score(doc=3452,freq=2.0), product of:
            0.1291003 = queryWeight, product of:
              3.4494052 = idf(docFreq=3817, maxDocs=44218)
              0.037426826 = queryNorm
            0.1905545 = fieldWeight in 3452, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.4494052 = idf(docFreq=3817, maxDocs=44218)
              0.0390625 = fieldNorm(doc=3452)
      0.5 = coord(1/2)
  0.5 = coord(1/2)
```

Chen, H.: Intelligence and security informatics : Introduction to the special topic issue (2005) 0.00
```
0.004305112 = product of:
  0.008610224 = sum of:
    0.008610224 = product of:
      0.017220449 = sum of:
        0.017220449 = weight(_text_:c in 3232) [ClassicSimilarity], result of:
          0.017220449 = score(doc=3232,freq=2.0), product of:
            0.1291003 = queryWeight, product of:
              3.4494052 = idf(docFreq=3817, maxDocs=44218)
              0.037426826 = queryNorm
            0.13338815 = fieldWeight in 3232, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.4494052 = idf(docFreq=3817, maxDocs=44218)
              0.02734375 = fieldNorm(doc=3232)
      0.5 = coord(1/2)
  0.5 = coord(1/2)
```
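Several hits for the term "c" share the same termFreq (2.0), idf, and queryNorm, so their ranking is decided by fieldNorm alone. The sketch below illustrates this for documents 3233, 6492, and 3232, assuming the same classic TF-IDF formulas that the listing's numbers are consistent with. The helper name `score_for_field_norm` is hypothetical, and all inputs are copied from the listing.

```python
import math

QUERY_NORM = 0.037426826                       # "queryNorm" from the listing
IDF_C = 1.0 + math.log(44218 / (3817 + 1))     # idf for term "c", ~3.4494


def score_for_field_norm(field_norm, freq=2.0):
    """Score of a single-term match on "c" with two coord(1/2) factors."""
    query_weight = IDF_C * QUERY_NORM
    field_weight = math.sqrt(freq) * IDF_C * field_norm
    return query_weight * field_weight * 0.5 * 0.5


# fieldNorm values as shown for each document in the listing.
field_norms = {3233: 0.046875, 6492: 0.0390625, 3232: 0.02734375}
scores = {doc: score_for_field_norm(fn) for doc, fn in field_norms.items()}

# Highest score first; with everything else equal, this follows fieldNorm.
ranking = sorted(scores, key=scores.get, reverse=True)
for doc in ranking:
    print(doc, f"{scores[doc]:.6f}")
```

Because the score is linear in fieldNorm here, the ratio between any two of these scores equals the ratio of their fieldNorm values.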
Abstract

Making the Nation Safer: The Role of Science and Technology in Countering Terrorism
The commitment of the scientific, engineering, and health communities to helping the United States and the world respond to security challenges became evident after September 11, 2001. The U.S. National Research Council's report on "Making the Nation Safer: The Role of Science and Technology in Countering Terrorism" (National Research Council, 2002, p. 1) explains the context of such a new commitment: Terrorism is a serious threat to the security of the United States and indeed the world. The vulnerability of societies to terrorist attacks results in part from the proliferation of chemical, biological, and nuclear weapons of mass destruction, but it also is a consequence of the highly efficient and interconnected systems that we rely on for key services such as transportation, information, energy, and health care. The efficient functioning of these systems reflects great technological achievements of the past century, but interconnectedness within and across systems also means that infrastructures are vulnerable to local disruptions, which could lead to widespread or catastrophic failures. As terrorists seek to exploit these vulnerabilities, it is fitting that we harness the nation's exceptional scientific and technological capabilities to counter terrorist threats. A committee of 24 of the leading scientific, engineering, medical, and policy experts in the United States conducted the study described in the report. Eight panels were separately appointed and asked to provide input to the committee. The panels included: (a) biological sciences, (b) chemical issues, (c) nuclear and radiological issues, (d) information technology, (e) transportation, (f) energy facilities, cities, and fixed infrastructure, (g) behavioral, social, and institutional issues, and (h) systems analysis and systems engineering.
The focus of the committee's work was to make the nation safer from emerging terrorist threats that sought to inflict catastrophic damage on the nation's people, its infrastructure, or its economy. The committee considered nine areas, each of which is discussed in a separate chapter in the report: nuclear and radiological materials, human and agricultural health systems, toxic chemicals and explosive materials, information technology, energy systems, transportation systems, cities and fixed infrastructure, the response of people to terrorism, and complex and interdependent systems. The chapter on information technology (IT) is particularly relevant to this special issue. The report recommends that "a strategic long-term research and development agenda should be established to address three primary counterterrorism-related areas in IT: information and network security, the IT needs of emergency responders, and information fusion and management" (National Research Council, 2002, pp. 11-12). The R&D in information and network security should include approaches and architectures for prevention, identification, and containment of cyber-intrusions and recovery from them. The R&D to address IT needs of emergency responders should include ensuring interoperability, maintaining and expanding communications capability during an emergency, communicating with the public during an emergency, and providing support for decision makers. The R&D in information fusion and management for the intelligence, law enforcement, and emergency response communities should include data mining, data integration, language technologies, and processing of image and audio data. Much of the research reported in this special issue is related to information fusion and management for homeland security.
