Search (17 results, page 1 of 1)

  • Filter: author_ss:"Chen, H."
  1. Zheng, R.; Li, J.; Chen, H.; Huang, Z.: A framework for authorship identification of online messages : writing-style features and classification techniques (2006) 0.01
    0.010532449 = product of:
      0.021064898 = sum of:
        0.009444992 = product of:
          0.03777997 = sum of:
            0.03777997 = weight(_text_:learning in 5276) [ClassicSimilarity], result of:
              0.03777997 = score(doc=5276,freq=2.0), product of:
                0.15317118 = queryWeight, product of:
                  4.464877 = idf(docFreq=1382, maxDocs=44218)
                  0.0343058 = queryNorm
                0.24665193 = fieldWeight in 5276, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  4.464877 = idf(docFreq=1382, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=5276)
          0.25 = coord(1/4)
        0.011619906 = product of:
          0.023239812 = sum of:
            0.023239812 = weight(_text_:22 in 5276) [ClassicSimilarity], result of:
              0.023239812 = score(doc=5276,freq=2.0), product of:
                0.120133065 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.0343058 = queryNorm
                0.19345059 = fieldWeight in 5276, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=5276)
          0.5 = coord(1/2)
      0.5 = coord(2/4)
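    The breakdown above is Lucene's ClassicSimilarity (tf-idf) explain output. As a sanity check, this short Python sketch reproduces the final score for this entry from the leaf values printed in the tree (tf = sqrt(freq), score per term = queryWeight x fieldWeight, then the coord factors):

      import math

      def leg(freq, idf, query_norm, field_norm, coord):
          # One weight(_text_:term ...) leg: queryWeight * fieldWeight, scaled by coord.
          query_weight = idf * query_norm                    # 4.464877 * 0.0343058 = 0.15317118
          field_weight = math.sqrt(freq) * idf * field_norm  # tf * idf * fieldNorm
          return query_weight * field_weight * coord

      q_norm = 0.0343058
      learning = leg(2.0, 4.464877, q_norm, 0.0390625, 0.25)  # 0.009444992
      term_22  = leg(2.0, 3.5018296, q_norm, 0.0390625, 0.5)  # 0.011619906
      print((learning + term_22) * 0.5)                       # coord(2/4) -> 0.010532449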
    
    Abstract
    With the rapid proliferation of Internet technologies and applications, misuse of online messages for inappropriate or illegal purposes has become a major concern for society. The anonymous nature of online-message distribution makes identity tracing a critical problem. We developed a framework for authorship identification of online messages to address the identity-tracing problem. In this framework, four types of writing-style features (lexical, syntactic, structural, and content-specific features) are extracted and inductive learning algorithms are used to build feature-based classification models to identify authorship of online messages. To examine this framework, we conducted experiments on English and Chinese online-newsgroup messages. We compared the discriminating power of the four types of features and of three classification techniques: decision trees, backpropagation neural networks, and support vector machines. The experimental results showed that the proposed approach was able to identify authors of online messages with satisfactory accuracy of 70 to 95%. All four types of message features contributed to discriminating authors of online messages. Support vector machines outperformed the other two classification techniques in our experiments. The high performance we achieved for both the English and Chinese datasets showed the potential of this approach in a multiple-language context.
    Date
    22. 7.2006 16:14:37
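    A rough illustration of the classification stage described in the abstract; this is a sketch, not the authors' pipeline: character n-grams stand in for the four writing-style feature types, and a linear SVM plays the role of the best-performing classifier.

      from sklearn.feature_extraction.text import TfidfVectorizer
      from sklearn.pipeline import make_pipeline
      from sklearn.svm import LinearSVC

      # Toy corpus: online messages with known authors.
      messages = [
          "well, i reckon the patch fixes it. cheers!",
          "As previously stated, the module must be recompiled.",
          "cheers mate, reckon that'll do nicely.",
          "It must be noted that the configuration is incorrect.",
      ]
      authors = ["A", "B", "A", "B"]

      clf = make_pipeline(
          # Character n-grams as a crude proxy for lexical/structural style features.
          TfidfVectorizer(analyzer="char_wb", ngram_range=(2, 4)),
          LinearSVC(),  # SVMs were the strongest of the three classifiers compared
      )
      clf.fit(messages, authors)
      print(clf.predict(["reckon i'll send the patch, cheers"]))  # leans toward author A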
  2. Chen, H.: Machine learning for information retrieval : neural networks, symbolic learning, and genetic algorithms (1994) 0.01
    0.0066114943 = product of:
      0.026445977 = sum of:
        0.026445977 = product of:
          0.10578391 = sum of:
            0.10578391 = weight(_text_:learning in 2657) [ClassicSimilarity], result of:
              0.10578391 = score(doc=2657,freq=8.0), product of:
                0.15317118 = queryWeight, product of:
                  4.464877 = idf(docFreq=1382, maxDocs=44218)
                  0.0343058 = queryNorm
                0.6906254 = fieldWeight in 2657, product of:
                  2.828427 = tf(freq=8.0), with freq of:
                    8.0 = termFreq=8.0
                  4.464877 = idf(docFreq=1382, maxDocs=44218)
                  0.0546875 = fieldNorm(doc=2657)
          0.25 = coord(1/4)
      0.25 = coord(1/4)
    
    Abstract
    In the 1980s, knowledge-based techniques also made an impressive contribution to 'intelligent' information retrieval and indexing. More recently, researchers have turned to newer artificial intelligence-based inductive learning techniques, including neural networks, symbolic learning, and genetic algorithms, grounded in diverse paradigms. These have provided great opportunities to enhance the capabilities of current information storage and retrieval systems. Provides an overview of these techniques and presents 3 popular methods: the connectionist Hopfield network; the symbolic ID3/ID5R; and evolution-based genetic algorithms in the context of information retrieval. The techniques are promising in their ability to analyze user queries, identify users' information needs, and suggest alternatives for search, and can greatly complement the prevailing full-text, keyword-based, probabilistic, and knowledge-based techniques
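    The connectionist Hopfield method can be sketched as spreading activation over a term network; the weights, transfer function, and stopping rule below are illustrative assumptions, not the paper's exact formulation.

      import numpy as np

      terms = ["retrieval", "indexing", "neural", "network", "genetic"]
      # Symmetric toy co-occurrence weights standing in for trained Hopfield links.
      W = np.array([
          [0.0, 0.6, 0.2, 0.1, 0.3],
          [0.6, 0.0, 0.1, 0.1, 0.2],
          [0.2, 0.1, 0.0, 0.8, 0.1],
          [0.1, 0.1, 0.8, 0.0, 0.1],
          [0.3, 0.2, 0.1, 0.1, 0.0],
      ])

      a = np.array([0.0, 0.0, 1.0, 0.0, 0.0])  # activate the query term "neural"
      for _ in range(50):                       # relax until activations stabilize
          nxt = np.tanh(W @ a)                  # sigmoid-like transfer function
          if np.allclose(nxt, a, atol=1e-6):
              break
          a = nxt
      for term, act in sorted(zip(terms, a), key=lambda p: -p[1]):
          print(f"{term:10s} {act:.3f}")        # ranked term suggestions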
  3. Chen, H.: Introduction to the JASIST special topic section on Web retrieval and mining : A machine learning perspective (2003) 0.01
    0.0056669954 = product of:
      0.022667982 = sum of:
        0.022667982 = product of:
          0.09067193 = sum of:
            0.09067193 = weight(_text_:learning in 1610) [ClassicSimilarity], result of:
              0.09067193 = score(doc=1610,freq=8.0), product of:
                0.15317118 = queryWeight, product of:
                  4.464877 = idf(docFreq=1382, maxDocs=44218)
                  0.0343058 = queryNorm
                0.59196466 = fieldWeight in 1610, product of:
                  2.828427 = tf(freq=8.0), with freq of:
                    8.0 = termFreq=8.0
                  4.464877 = idf(docFreq=1382, maxDocs=44218)
                  0.046875 = fieldNorm(doc=1610)
          0.25 = coord(1/4)
      0.25 = coord(1/4)
    
    Abstract
    Research in information retrieval (IR) has advanced significantly in the past few decades. Many tasks, such as indexing and text categorization, can be performed automatically with minimal human effort. Machine learning has played an important role in such automation by learning various patterns such as document topics, text structures, and user interests from examples. In recent years, it has become increasingly difficult to search for useful information on the World Wide Web because of its large size and unstructured nature. Useful information and resources are often hidden in the Web. While machine learning has been successfully applied to traditional IR systems, applying these algorithms to the Web poses new challenges due to its large size, link structure, diversity in content and languages, and dynamic nature. On the other hand, these characteristics of the Web also provide interesting patterns and knowledge that are not present in traditional information retrieval systems.
  4. Chen, H.; Shankaranarayanan, G.; She, L.: A machine learning approach to inductive query by examples : an experiment using relevance feedback, ID3, genetic algorithms, and simulated annealing (1998) 0.00
    0.004907762 = product of:
      0.019631049 = sum of:
        0.019631049 = product of:
          0.078524195 = sum of:
            0.078524195 = weight(_text_:learning in 1148) [ClassicSimilarity], result of:
              0.078524195 = score(doc=1148,freq=6.0), product of:
                0.15317118 = queryWeight, product of:
                  4.464877 = idf(docFreq=1382, maxDocs=44218)
                  0.0343058 = queryNorm
                0.51265645 = fieldWeight in 1148, product of:
                  2.4494898 = tf(freq=6.0), with freq of:
                    6.0 = termFreq=6.0
                  4.464877 = idf(docFreq=1382, maxDocs=44218)
                  0.046875 = fieldNorm(doc=1148)
          0.25 = coord(1/4)
      0.25 = coord(1/4)
    
    Abstract
    Information retrieval using probabilistic techniques has attracted significant attention on the part of researchers in information and computer science over the past few decades. In the 1980s, knowledge-based techniques also made an impressive contribution to 'intelligent' information retrieval and indexing. More recently, information science researchers have turned to other newer inductive learning techniques, including symbolic learning, genetic algorithms, and simulated annealing. These newer techniques, which are grounded in diverse paradigms, have provided great opportunities for researchers to enhance the information processing and retrieval capabilities of current information systems. In this article, we first provide an overview of these newer techniques and their use in information retrieval research. In order to familiarize readers with the techniques, we present 3 promising methods: the symbolic ID3 algorithm, evolution-based genetic algorithms, and simulated annealing. We discuss their knowledge representations and algorithms in the unique context of information retrieval
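    A sketch of how simulated annealing can drive inductive query-by-example: search over subsets of candidate query terms, scoring each subset against relevance-feedback judgments. The fitness function and cooling schedule here are illustrative assumptions.

      import math
      import random

      candidates = ["neural", "genetic", "retrieval", "feedback", "annealing", "index"]
      relevant = [{"neural", "retrieval"}, {"genetic", "retrieval", "feedback"}]  # fed-back docs

      def fitness(query):
          # Toy fitness: mean Jaccard overlap between the query and the relevant docs.
          return sum(len(query & d) / len(query | d) for d in relevant) / len(relevant)

      random.seed(0)
      state = set(random.sample(candidates, 3))
      temp = 1.0
      while temp > 1e-3:
          neighbor = set(state)
          neighbor.symmetric_difference_update({random.choice(candidates)})  # flip one term
          delta = fitness(neighbor) - fitness(state)
          if delta > 0 or random.random() < math.exp(delta / temp):
              state = neighbor              # accept better states, or worse ones with probability
          temp *= 0.95                      # geometric cooling schedule
      print(sorted(state), round(fitness(state), 3))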
  5. Li, J.; Zhang, Z.; Li, X.; Chen, H.: Kernel-based learning for biomedical relation extraction (2008) 0.00
    0.004907762 = product of:
      0.019631049 = sum of:
        0.019631049 = product of:
          0.078524195 = sum of:
            0.078524195 = weight(_text_:learning in 1611) [ClassicSimilarity], result of:
              0.078524195 = score(doc=1611,freq=6.0), product of:
                0.15317118 = queryWeight, product of:
                  4.464877 = idf(docFreq=1382, maxDocs=44218)
                  0.0343058 = queryNorm
                0.51265645 = fieldWeight in 1611, product of:
                  2.4494898 = tf(freq=6.0), with freq of:
                    6.0 = termFreq=6.0
                  4.464877 = idf(docFreq=1382, maxDocs=44218)
                  0.046875 = fieldNorm(doc=1611)
          0.25 = coord(1/4)
      0.25 = coord(1/4)
    
    Abstract
    Relation extraction is the process of scanning text for relationships between named entities. Recently, significant research has focused on automatically extracting relations from biomedical corpora. Most existing biomedical relation extractors require manual creation of biomedical lexicons or parsing templates based on domain knowledge. In this study, we propose to use kernel-based learning methods to automatically extract biomedical relations from literature text. We develop a framework of kernel-based learning for biomedical relation extraction. In particular, we modify the standard tree kernel function by incorporating a trace kernel to capture richer contextual information. In our experiments on a biomedical corpus, we compare different kernel functions for biomedical relation detection and classification. The experimental results show that a tree kernel outperforms word and sequence kernels for relation detection, our trace-tree kernel outperforms the standard tree kernel, and a composite kernel outperforms individual kernels for relation extraction.
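    The composite-kernel idea can be sketched with scikit-learn, which accepts a callable Gram-matrix function. A weighted sum of a linear and an RBF kernel stands in here for the paper's tree and trace kernels, which require structured parse-tree inputs beyond this sketch; a weighted sum of two valid kernels is itself a valid kernel.

      import numpy as np
      from sklearn.metrics.pairwise import linear_kernel, rbf_kernel
      from sklearn.svm import SVC

      def composite_kernel(X, Y, alpha=0.5):
          # Weighted combination of two base kernels.
          return alpha * linear_kernel(X, Y) + (1 - alpha) * rbf_kernel(X, Y, gamma=0.5)

      # Toy feature vectors for candidate entity pairs; 1 = relation present.
      X = np.array([[0.0, 1.0], [1.0, 0.0], [1.0, 1.0], [0.0, 0.0]])
      y = np.array([1, 1, 0, 0])

      clf = SVC(kernel=composite_kernel)  # scikit-learn accepts a Gram-matrix callable
      clf.fit(X, y)
      print(clf.predict(X))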
  6. Suakkaphong, N.; Zhang, Z.; Chen, H.: Disease named entity recognition using semisupervised learning and conditional random fields (2011) 0.00
    0.004089802 = product of:
      0.016359208 = sum of:
        0.016359208 = product of:
          0.06543683 = sum of:
            0.06543683 = weight(_text_:learning in 4367) [ClassicSimilarity], result of:
              0.06543683 = score(doc=4367,freq=6.0), product of:
                0.15317118 = queryWeight, product of:
                  4.464877 = idf(docFreq=1382, maxDocs=44218)
                  0.0343058 = queryNorm
                0.42721373 = fieldWeight in 4367, product of:
                  2.4494898 = tf(freq=6.0), with freq of:
                    6.0 = termFreq=6.0
                  4.464877 = idf(docFreq=1382, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=4367)
          0.25 = coord(1/4)
      0.25 = coord(1/4)
    
    Abstract
    Information extraction is an important text-mining task that aims at extracting prespecified types of information from large text collections and making them available in structured representations such as databases. In the biomedical domain, information extraction can be applied to help biologists make the most of their digital-literature archives. Currently, there are large amounts of biomedical literature that contain rich information about biomedical substances. Extracting such knowledge requires a good named entity recognition technique. In this article, we combine conditional random fields (CRFs), a state-of-the-art sequence-labeling algorithm, with two semisupervised learning techniques, bootstrapping and feature sampling, to recognize disease names in biomedical literature. Two data-processing strategies for each technique were also analyzed: one processing unlabeled data partitions sequentially and the other processing them in round-robin fashion. The experimental results showed the advantage of semisupervised learning techniques given limited labeled training data. Specifically, CRFs with bootstrapping implemented in sequential fashion outperformed strictly supervised CRFs for disease name recognition. The project was supported by NIH/NLM Grant R33 LM07299-01, 2002-2005.
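    A schematic of the bootstrapping (self-training) strategy described above, with a generic probabilistic classifier standing in for the CRF and a plain confidence threshold as the selection rule; both are assumptions, since the paper labels token sequences rather than independent instances.

      import numpy as np
      from sklearn.linear_model import LogisticRegression

      def bootstrap(X_lab, y_lab, X_unlab, rounds=3, threshold=0.9):
          # Self-training: repeatedly add high-confidence predictions to the labeled pool.
          for _ in range(rounds):
              model = LogisticRegression().fit(X_lab, y_lab)
              if len(X_unlab) == 0:
                  break
              proba = model.predict_proba(X_unlab)
              keep = proba.max(axis=1) >= threshold
              if not keep.any():
                  break
              X_lab = np.vstack([X_lab, X_unlab[keep]])
              y_lab = np.concatenate([y_lab, model.classes_[proba[keep].argmax(axis=1)]])
              X_unlab = X_unlab[~keep]
          return model

      rng = np.random.default_rng(0)
      X_l = rng.normal(size=(10, 4))
      y_l = np.array([0, 1] * 5)          # seed labels
      X_u = rng.normal(size=(100, 4))     # unlabeled pool
      print(bootstrap(X_l, y_l, X_u).predict(X_u[:5]))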
  7. Chen, H.; Baptista Nunes, J.M.; Ragsdell, G.; An, X.: Somatic and cultural knowledge : drivers of a habitus-driven model of tacit knowledge acquisition (2019) 0.00
    0.0040486967 = product of:
      0.016194787 = sum of:
        0.016194787 = product of:
          0.06477915 = sum of:
            0.06477915 = weight(_text_:learning in 5460) [ClassicSimilarity], result of:
              0.06477915 = score(doc=5460,freq=12.0), product of:
                0.15317118 = queryWeight, product of:
                  4.464877 = idf(docFreq=1382, maxDocs=44218)
                  0.0343058 = queryNorm
                0.42291996 = fieldWeight in 5460, product of:
                  3.4641016 = tf(freq=12.0), with freq of:
                    12.0 = termFreq=12.0
                  4.464877 = idf(docFreq=1382, maxDocs=44218)
                  0.02734375 = fieldNorm(doc=5460)
          0.25 = coord(1/4)
      0.25 = coord(1/4)
    
    Abstract
    Purpose
    The purpose of this paper is to identify and explain the role of individual learning and development in acquiring tacit knowledge in the context of the inexorable and intense continuous change (technological and otherwise) that characterizes our society today, and to investigate the software (SW) sector, which is at the core of contemporary continuous change and is a paradigm of effective and intrinsic knowledge sharing (KS). This makes the SW sector unique and different from others, where KS is so hard to implement.
    Design/methodology/approach
    The study employed an inductive qualitative approach based on a multi-case study of three successful SW companies in China. These companies are representative of the fabric of the sector, namely a small- and medium-sized enterprise, a large private company and a large state-owned enterprise. The fieldwork included 44 participants who were interviewed using a semi-structured script. The interview data were coded and interpreted following the Straussian grounded-theory pattern of open coding, axial coding and selective coding. Interviewing was stopped when theoretical saturation was achieved after a careful process of theoretical sampling.
    Findings
    The findings of this research suggest that individual learning and development are deemed to be fundamental to professional success and survival in the continuously changing environment of the SW industry today. However, individual learning was described by the participants as much more than a mere individual process. It involves a collective and participatory effort within the organization and the sector as a whole, and a KS process that transcends organizational, cultural and national borders. Individuals in particular are mostly motivated by the pressing need to face and adapt to the dynamic and changeable environments of today's digital society that is led by the sector. Software practitioners are continuously in need of learning, refreshing and accumulating tacit knowledge, partly because it is required by their companies, but also due to a sound awareness of continuous technical and technological changes that seem only to increase with the advances of information technology. This led to a clear theoretical understanding that the continuous change facing the sector has driven the individual acquisition of cultural and somatic knowledge, which in turn lays the foundation not only for the awareness of the need for continuous individual professional development but also for the creation of a habitus related to KS and continuous learning.
    Originality/value
    The study reported in this paper shows that there is a theoretical link between the existence of conducive organizational and sector-wide somatic and cultural knowledge and the success of KS practices that lead to individual learning and development. The theory proposed therefore suggests that somatic and cultural knowledge are crucial drivers for the creation of a habitus of individual tacit knowledge acquisition. The paper further proposes a habitus-driven individual development (HDID) theoretical model that can be of use to both academics and practitioners interested in fostering and developing processes of KS and individual development in knowledge-intensive organizations.
  8. Chung, W.; Chen, H.: Browsing the underdeveloped Web : an experiment on the Arabic Medical Web Directory (2009) 0.00
    0.0034859716 = product of:
      0.013943886 = sum of:
        0.013943886 = product of:
          0.027887773 = sum of:
            0.027887773 = weight(_text_:22 in 2733) [ClassicSimilarity], result of:
              0.027887773 = score(doc=2733,freq=2.0), product of:
                0.120133065 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.0343058 = queryNorm
                0.23214069 = fieldWeight in 2733, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.046875 = fieldNorm(doc=2733)
          0.5 = coord(1/2)
      0.25 = coord(1/4)
    
    Date
    22. 3.2009 17:57:50
  9. Chau, M.; Wong, C.H.; Zhou, Y.; Qin, J.; Chen, H.: Evaluating the use of search engine development tools in IT education (2010) 0.00
    0.0033393092 = product of:
      0.013357237 = sum of:
        0.013357237 = product of:
          0.053428948 = sum of:
            0.053428948 = weight(_text_:learning in 3325) [ClassicSimilarity], result of:
              0.053428948 = score(doc=3325,freq=4.0), product of:
                0.15317118 = queryWeight, product of:
                  4.464877 = idf(docFreq=1382, maxDocs=44218)
                  0.0343058 = queryNorm
                0.34881854 = fieldWeight in 3325, product of:
                  2.0 = tf(freq=4.0), with freq of:
                    4.0 = termFreq=4.0
                  4.464877 = idf(docFreq=1382, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=3325)
          0.25 = coord(1/4)
      0.25 = coord(1/4)
    
    Abstract
    It is important for education in computer science and information systems to keep up to date with the latest developments in technology. With the rapid development of the Internet and the Web, many schools have included Internet-related technologies, such as Web search engines and e-commerce, as part of their curricula. Previous research has shown that it is effective to use search engine development tools to facilitate students' learning. However, these tools have not been comparatively evaluated in the classroom. In this article, we review the design of three search engine development tools, SpidersRUs, Greenstone, and Alkaline, followed by an evaluation study that compared the three tools in the classroom. In the study, 33 students were divided into 13 groups and each group used the three tools to develop three independent search engines in a class project. Our evaluation results showed that SpidersRUs performed better than the other two tools in overall satisfaction and in the level of knowledge gained when the tools were used for a class project on Internet applications development.
  10. Ramsey, M.C.; Chen, H.; Zhu, B.; Schatz, B.R.: A collection of visual thesauri for browsing large collections of geographic images (1999) 0.00
    0.0033057472 = product of:
      0.013222989 = sum of:
        0.013222989 = product of:
          0.052891955 = sum of:
            0.052891955 = weight(_text_:learning in 3922) [ClassicSimilarity], result of:
              0.052891955 = score(doc=3922,freq=2.0), product of:
                0.15317118 = queryWeight, product of:
                  4.464877 = idf(docFreq=1382, maxDocs=44218)
                  0.0343058 = queryNorm
                0.3453127 = fieldWeight in 3922, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  4.464877 = idf(docFreq=1382, maxDocs=44218)
                  0.0546875 = fieldNorm(doc=3922)
          0.25 = coord(1/4)
      0.25 = coord(1/4)
    
    Abstract
    Digital libraries of geo-spatial multimedia content are currently deficient in providing fuzzy, concept-based retrieval mechanisms to users. The main challenge is that indexing and thesaurus creation are extremely labor-intensive processes for text documents and especially for images. Recently, 800,000 declassified satellite photographs were made available by the US Geological Survey. Additionally, millions of satellite and aerial photographs are archived in national and local map libraries. Collections this large make human indexing and thesaurus-generation methods impossible to apply. In this article we propose a scalable method to automatically generate visual thesauri of large collections of geo-spatial media using fuzzy, unsupervised machine-learning techniques
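    A minimal sketch of the unsupervised route to a visual thesaurus, assuming images have already been reduced to feature vectors. Plain k-means is used here as a stand-in; the fuzzy techniques the article refers to (e.g. fuzzy c-means or self-organizing maps) differ in detail.

      import numpy as np
      from sklearn.cluster import KMeans

      rng = np.random.default_rng(42)
      features = rng.normal(size=(1000, 64))  # stand-in for per-image texture/shape vectors

      km = KMeans(n_clusters=20, n_init=10, random_state=0).fit(features)

      # "Visual thesaurus": for each concept cluster, the images nearest its centroid
      # serve as browsable visual exemplars.
      dist = km.transform(features)           # distance of every image to every centroid
      for concept in range(3):                # show the first three concepts
          exemplars = np.argsort(dist[:, concept])[:5]
          print(f"concept {concept}: exemplar image ids {exemplars.tolist()}")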
  11. Carmel, E.; Crawford, S.; Chen, H.: Browsing in hypertext : a cognitive study (1992) 0.00
    0.0029049765 = product of:
      0.011619906 = sum of:
        0.011619906 = product of:
          0.023239812 = sum of:
            0.023239812 = weight(_text_:22 in 7469) [ClassicSimilarity], result of:
              0.023239812 = score(doc=7469,freq=2.0), product of:
                0.120133065 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.0343058 = queryNorm
                0.19345059 = fieldWeight in 7469, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=7469)
          0.5 = coord(1/2)
      0.25 = coord(1/4)
    
    Source
    IEEE transactions on systems, man and cybernetics. 22(1992) no.5, S.865-884
  12. Leroy, G.; Chen, H.: Genescene: an ontology-enhanced integration of linguistic and co-occurrence based relations in biomedical texts (2005) 0.00
    0.0029049765 = product of:
      0.011619906 = sum of:
        0.011619906 = product of:
          0.023239812 = sum of:
            0.023239812 = weight(_text_:22 in 5259) [ClassicSimilarity], result of:
              0.023239812 = score(doc=5259,freq=2.0), product of:
                0.120133065 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.0343058 = queryNorm
                0.19345059 = fieldWeight in 5259, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=5259)
          0.5 = coord(1/2)
      0.25 = coord(1/4)
    
    Date
    22. 7.2006 14:26:01
  13. Hu, D.; Kaza, S.; Chen, H.: Identifying significant facilitators of dark network evolution (2009) 0.00
    0.0029049765 = product of:
      0.011619906 = sum of:
        0.011619906 = product of:
          0.023239812 = sum of:
            0.023239812 = weight(_text_:22 in 2753) [ClassicSimilarity], result of:
              0.023239812 = score(doc=2753,freq=2.0), product of:
                0.120133065 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.0343058 = queryNorm
                0.19345059 = fieldWeight in 2753, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=2753)
          0.5 = coord(1/2)
      0.25 = coord(1/4)
    
    Date
    22. 3.2009 18:50:30
  14. Chen, H.; Chau, M.: Web mining : machine learning for Web applications (2003) 0.00
    0.0028334977 = product of:
      0.011333991 = sum of:
        0.011333991 = product of:
          0.045335963 = sum of:
            0.045335963 = weight(_text_:learning in 4242) [ClassicSimilarity], result of:
              0.045335963 = score(doc=4242,freq=2.0), product of:
                0.15317118 = queryWeight, product of:
                  4.464877 = idf(docFreq=1382, maxDocs=44218)
                  0.0343058 = queryNorm
                0.29598233 = fieldWeight in 4242, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  4.464877 = idf(docFreq=1382, maxDocs=44218)
                  0.046875 = fieldNorm(doc=4242)
          0.25 = coord(1/4)
      0.25 = coord(1/4)
    
  15. Chen, H.; Yim, T.; Fye, D.: Automatic thesaurus generation for an electronic community system (1995) 0.00
    0.002361248 = product of:
      0.009444992 = sum of:
        0.009444992 = product of:
          0.03777997 = sum of:
            0.03777997 = weight(_text_:learning in 2918) [ClassicSimilarity], result of:
              0.03777997 = score(doc=2918,freq=2.0), product of:
                0.15317118 = queryWeight, product of:
                  4.464877 = idf(docFreq=1382, maxDocs=44218)
                  0.0343058 = queryNorm
                0.24665193 = fieldWeight in 2918, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  4.464877 = idf(docFreq=1382, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=2918)
          0.25 = coord(1/4)
      0.25 = coord(1/4)
    
    Abstract
    Reports an algorithmic approach to the automatic generation of thesauri for electronic community systems. The techniques used included term filtering, automatic indexing, and cluster analysis. The testbed for the research was the Worm Community System, which contains a comprehensive library of specialized community data and literature, currently in use by molecular biologists who study the nematode worm. The resulting worm thesaurus included 2709 researchers' names, 798 gene names, 20 experimental methods, and 4302 subject descriptors. On average, each term had about 90 weighted neighbouring terms indicating relevant concepts. The thesaurus was developed as an online search aid. Tests of the worm thesaurus in an experiment with 6 worm researchers of varying degrees of expertise and background showed that the thesaurus was an excellent 'memory-jogging' device and that it supported learning and serendipitous browsing. Despite some occurrences of obvious noise, the system was useful in suggesting relevant concepts for the researchers' queries and it helped improve concept recall. With a simple browsing interface, an automatic thesaurus can become a useful tool for online search and can assist researchers in exploring and traversing a dynamic and complex electronic community system
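    The co-occurrence core of such a thesaurus can be sketched in a few lines: index the collection, weight terms, and rank each term's weighted neighbors. The toy documents below are invented, and the paper's term-filtering and cluster-analysis steps are richer than this.

      import numpy as np
      from sklearn.feature_extraction.text import TfidfVectorizer
      from sklearn.metrics.pairwise import cosine_similarity

      docs = [
          "unc-22 gene expression in nematode muscle",
          "muscle development and gene regulation in the worm",
          "laser ablation methods for nematode neurons",
      ]
      vec = TfidfVectorizer()
      X = vec.fit_transform(docs)           # documents x terms, tf-idf weighted
      sim = cosine_similarity(X.T)          # term-term similarity via shared documents
      terms = vec.get_feature_names_out()

      i = list(terms).index("gene")
      neighbors = np.argsort(-sim[i])[1:6]  # skip the term itself, take 5 neighbors
      print([(terms[j], round(sim[i, j], 2)) for j in neighbors])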
  16. Chen, H.; Lally, A.M.; Zhu, B.; Chau, M.: HelpfulMed : Intelligent searching for medical information over the Internet (2003) 0.00
    0.002361248 = product of:
      0.009444992 = sum of:
        0.009444992 = product of:
          0.03777997 = sum of:
            0.03777997 = weight(_text_:learning in 1615) [ClassicSimilarity], result of:
              0.03777997 = score(doc=1615,freq=2.0), product of:
                0.15317118 = queryWeight, product of:
                  4.464877 = idf(docFreq=1382, maxDocs=44218)
                  0.0343058 = queryNorm
                0.24665193 = fieldWeight in 1615, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  4.464877 = idf(docFreq=1382, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=1615)
          0.25 = coord(1/4)
      0.25 = coord(1/4)
    
    Footnote
    Part of a special issue: "Web retrieval and mining: A machine learning perspective"
  17. Yang, M.; Kiang, M.; Chen, H.; Li, Y.: Artificial immune system for illicit content identification in social media (2012) 0.00
    0.002361248 = product of:
      0.009444992 = sum of:
        0.009444992 = product of:
          0.03777997 = sum of:
            0.03777997 = weight(_text_:learning in 4980) [ClassicSimilarity], result of:
              0.03777997 = score(doc=4980,freq=2.0), product of:
                0.15317118 = queryWeight, product of:
                  4.464877 = idf(docFreq=1382, maxDocs=44218)
                  0.0343058 = queryNorm
                0.24665193 = fieldWeight in 4980, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  4.464877 = idf(docFreq=1382, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=4980)
          0.25 = coord(1/4)
      0.25 = coord(1/4)
    
    Abstract
    Social media is frequently used as a platform for the exchange of information and opinions as well as for propaganda dissemination. But online content can be misused for the distribution of illicit information, such as violent postings in web forums. Illicit content is highly distributed in social media, while non-illicit content is unspecific and topically diverse. It is costly and time-consuming to label a large amount of illicit content (positive examples) and non-illicit content (negative examples) to train classification systems. Nevertheless, it is relatively easy to obtain large volumes of unlabeled content in social media. In this article, an artificial immune system-based technique is presented to address the difficulties of illicit content identification in social media. Inspired by the positive selection principle in the immune system, we designed a novel labeling heuristic based on partially supervised learning to extract high-quality positive and negative examples from unlabeled datasets. The empirical evaluation results from two large hate-group web forums suggest that our proposed approach generally outperforms the benchmark techniques and exhibits more stable performance.
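    A rough sketch of a positive-selection-style labeling heuristic: treat known illicit examples as detectors, score unlabeled items by their best match against any detector, and harvest the extremes as new positive and negative training examples. The vectors, similarity measure, and thresholds are all illustrative assumptions.

      import numpy as np
      from sklearn.metrics.pairwise import cosine_similarity

      rng = np.random.default_rng(1)
      seed_positive = rng.normal(loc=1.0, size=(20, 16))  # known illicit postings (as vectors)
      unlabeled = rng.normal(size=(500, 16))              # unlabeled forum content

      # Each labeled positive acts as a "detector"; unlabeled items are scored by
      # their best match against any detector (positive selection).
      scores = cosine_similarity(unlabeled, seed_positive).max(axis=1)

      new_pos = unlabeled[scores >= np.quantile(scores, 0.95)]  # high-confidence positives
      new_neg = unlabeled[scores <= np.quantile(scores, 0.05)]  # high-confidence negatives
      print(len(new_pos), len(new_neg))  # harvested examples for the supervised step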