Search (109 results, page 2 of 6)

  • × language_ss:"e"
  • × theme_ss:"Data Mining"
  1. Blake, C.: Text mining (2011) 0.00
    4.604387E-4 = product of:
      0.00690658 = sum of:
        0.00690658 = product of:
          0.01381316 = sum of:
            0.01381316 = weight(_text_:information in 1599) [ClassicSimilarity], result of:
              0.01381316 = score(doc=1599,freq=2.0), product of:
                0.050870337 = queryWeight, product of:
                  1.7554779 = idf(docFreq=20772, maxDocs=44218)
                  0.028978055 = queryNorm
                0.27153665 = fieldWeight in 1599, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  1.7554779 = idf(docFreq=20772, maxDocs=44218)
                  0.109375 = fieldNorm(doc=1599)
          0.5 = coord(1/2)
      0.06666667 = coord(1/15)
    
    Source
    Annual review of information science and technology. 45(2011) no.1, S.121-155
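
The score breakdowns in this listing follow the shape of Lucene's ClassicSimilarity (TF-IDF) explain output. As a minimal sketch, assuming the classic formulas tf = sqrt(freq) and idf = 1 + ln(maxDocs / (docFreq + 1)), which reproduce the listed values, the numbers for doc 1599 can be recomputed as shown. The function names are hypothetical, and queryNorm and fieldNorm are copied from the listing rather than derived. Lucene computes in 32-bit floats, so the last digits may differ slightly.

```python
import math


def classic_idf(doc_freq: int, max_docs: int) -> float:
    """Inverse document frequency (assumed classic form: 1 + ln(maxDocs / (docFreq + 1)))."""
    return 1.0 + math.log(max_docs / (doc_freq + 1))


def classic_score(freq, doc_freq, max_docs, query_norm, field_norm, coords=()):
    """Recompute one term's contribution the way the explain tree nests it."""
    tf = math.sqrt(freq)                   # tf(freq)
    idf = classic_idf(doc_freq, max_docs)  # idf(docFreq, maxDocs)
    query_weight = idf * query_norm        # queryWeight = idf * queryNorm
    field_weight = tf * idf * field_norm   # fieldWeight = tf * idf * fieldNorm
    score = query_weight * field_weight    # weight of the term in the document
    for c in coords:                       # coord factors from the enclosing queries
        score *= c
    return score


# Values taken from the first result (doc 1599) in the listing.
score = classic_score(
    freq=2.0,
    doc_freq=20772,
    max_docs=44218,
    query_norm=0.028978055,
    field_norm=0.109375,
    coords=(1 / 2, 1 / 15),
)
print(f"{score:.7e}")  # close to the listed 4.604387E-4
```

Across the results on this page only `freq` and `fieldNorm` change from one entry to the next, so the ranking comes from the product sqrt(freq) * fieldNorm. This is why a short record with freq=2.0 can outrank a longer one with freq=10.0.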
  2. Chen, H.; Chau, M.: Web mining : machine learning for Web applications (2003) 0.00
    4.4124527E-4 = product of:
      0.0066186786 = sum of:
        0.0066186786 = product of:
          0.013237357 = sum of:
            0.013237357 = weight(_text_:information in 4242) [ClassicSimilarity], result of:
              0.013237357 = score(doc=4242,freq=10.0), product of:
                0.050870337 = queryWeight, product of:
                  1.7554779 = idf(docFreq=20772, maxDocs=44218)
                  0.028978055 = queryNorm
                0.2602176 = fieldWeight in 4242, product of:
                  3.1622777 = tf(freq=10.0), with freq of:
                    10.0 = termFreq=10.0
                  1.7554779 = idf(docFreq=20772, maxDocs=44218)
                  0.046875 = fieldNorm(doc=4242)
          0.5 = coord(1/2)
      0.06666667 = coord(1/15)
    
    Abstract
    With more than two billion pages created by millions of Web page authors and organizations, the World Wide Web is a tremendously rich knowledge base. The knowledge comes not only from the content of the pages themselves, but also from the unique characteristics of the Web, such as its hyperlink structure and its diversity of content and languages. Analysis of these characteristics often reveals interesting patterns and new knowledge. Such knowledge can be used to improve users' efficiency and effectiveness in searching for information on the Web, and also for applications unrelated to the Web, such as support for decision making or business management. The Web's size and its unstructured and dynamic content, as well as its multilingual nature, make the extraction of useful knowledge a challenging research problem. Furthermore, the Web generates a large amount of data in other formats that contain valuable information. For example, Web server logs' information about user access patterns can be used for information personalization or improving Web page design.
    Source
    Annual review of information science and technology. 38(2004), S.289-330
  3. Ku, L.-W.; Chen, H.-H.: Mining opinions from the Web : beyond relevance retrieval (2007) 0.00
    4.3507366E-4 = product of:
      0.0065261046 = sum of:
        0.0065261046 = product of:
          0.013052209 = sum of:
            0.013052209 = weight(_text_:information in 605) [ClassicSimilarity], result of:
              0.013052209 = score(doc=605,freq=14.0), product of:
                0.050870337 = queryWeight, product of:
                  1.7554779 = idf(docFreq=20772, maxDocs=44218)
                  0.028978055 = queryNorm
                0.256578 = fieldWeight in 605, product of:
                  3.7416575 = tf(freq=14.0), with freq of:
                    14.0 = termFreq=14.0
                  1.7554779 = idf(docFreq=20772, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=605)
          0.5 = coord(1/2)
      0.06666667 = coord(1/15)
    
    Abstract
    Documents discussing public affairs, common themes, interesting products, and so on, are reported and distributed on the Web. Positive and negative opinions embedded in documents are useful references and feedback for governments to improve their services, for companies to market their products, and for customers to purchase their objects. Web opinion mining aims to extract, summarize, and track various aspects of subjective information on the Web. Mining subjective information enables traditional information retrieval (IR) systems to retrieve more data from human viewpoints and provide information with finer granularity. Opinion extraction identifies opinion holders, extracts the relevant opinion sentences, and decides their polarities. Opinion summarization recognizes the major events embedded in documents and summarizes the supportive and the nonsupportive evidence. Opinion tracking captures subjective information from various genres and monitors the developments of opinions from spatial and temporal dimensions. To demonstrate and evaluate the proposed opinion mining algorithms, news and bloggers' articles are adopted. Documents in the evaluation corpora are tagged in different granularities from words, sentences to documents. In the experiments, positive and negative sentiment words and their weights are mined on the basis of Chinese word structures. The f-measure is 73.18% and 63.75% for verbs and nouns, respectively. Utilizing the sentiment words mined together with topical words, we achieve an f-measure of 62.16% at the sentence level and 74.37% at the document level.
    Footnote
    Contribution to a special topic section "Mining Web resources for enhancing information retrieval"
    Source
    Journal of the American Society for Information Science and Technology. 58(2007) no.12, S.1838-1850
  4. Wu, T.; Pottenger, W.M.: A semi-supervised active learning algorithm for information extraction from textual data (2005) 0.00
    4.0279995E-4 = product of:
      0.006041999 = sum of:
        0.006041999 = product of:
          0.012083998 = sum of:
            0.012083998 = weight(_text_:information in 3237) [ClassicSimilarity], result of:
              0.012083998 = score(doc=3237,freq=12.0), product of:
                0.050870337 = queryWeight, product of:
                  1.7554779 = idf(docFreq=20772, maxDocs=44218)
                  0.028978055 = queryNorm
                0.23754507 = fieldWeight in 3237, product of:
                  3.4641016 = tf(freq=12.0), with freq of:
                    12.0 = termFreq=12.0
                  1.7554779 = idf(docFreq=20772, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=3237)
          0.5 = coord(1/2)
      0.06666667 = coord(1/15)
    
    Abstract
    In this article we present a semi-supervised active learning algorithm for pattern discovery in information extraction from textual data. The patterns are reduced regular expressions composed of various characteristics of features useful in information extraction. Our major contribution is a semi-supervised learning algorithm that extracts information from a set of examples labeled as relevant or irrelevant to a given attribute. The approach is semi-supervised because it does not require precise labeling of the exact location of features in the training data. This significantly reduces the effort needed to develop a training set. An active learning algorithm is used to assist the semi-supervised learning algorithm to further reduce the training set development effort. The active learning algorithm is seeded with a single positive example of a given attribute. The context of the seed is used to automatically identify candidates for additional positive examples of the given attribute. Candidate examples are manually pruned during the active learning phase, and our semi-supervised learning algorithm automatically discovers reduced regular expressions for each attribute. We have successfully applied this learning technique in the extraction of textual features from police incident reports, university crime reports, and patents. The performance of our algorithm compares favorably with competitive extraction systems being used in criminal justice information systems.
    Source
    Journal of the American Society for Information Science and Technology. 56(2005) no.3, S.258-271
  5. O'Brien, H.L.; Lebow, M.: Mixed-methods approach to measuring user experience in online news interactions (2013) 0.00
    4.0279995E-4 = product of:
      0.006041999 = sum of:
        0.006041999 = product of:
          0.012083998 = sum of:
            0.012083998 = weight(_text_:information in 1001) [ClassicSimilarity], result of:
              0.012083998 = score(doc=1001,freq=12.0), product of:
                0.050870337 = queryWeight, product of:
                  1.7554779 = idf(docFreq=20772, maxDocs=44218)
                  0.028978055 = queryNorm
                0.23754507 = fieldWeight in 1001, product of:
                  3.4641016 = tf(freq=12.0), with freq of:
                    12.0 = termFreq=12.0
                  1.7554779 = idf(docFreq=20772, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=1001)
          0.5 = coord(1/2)
      0.06666667 = coord(1/15)
    
    Abstract
    When it comes to evaluating online information experiences, what metrics matter? We conducted a study in which 30 people browsed and selected content within an online news website. Data collected included psychometric scales (User Engagement, Cognitive Absorption, System Usability Scales), self-reported interest in news content, and performance metrics (i.e., reading time, browsing time, total time, number of pages visited, and use of recommended links); a subset of the participants had their physiological responses recorded during the interaction (i.e., heart rate, electrodermal activity, electromyogram). Findings demonstrated the concurrent validity of the psychometric scales and interest ratings and revealed that increased time on tasks, number of pages visited, and use of recommended links were not necessarily indicative of greater self-reported engagement, cognitive absorption, or perceived usability. Positive ratings of news content were associated with lower physiological activity. The implications of this research are twofold. First, we propose that user experience is a useful framework for studying online information interactions and will result in a broader conceptualization of information interaction and its evaluation. Second, we advocate a mixed-methods approach to measurement that employs a suite of metrics capable of capturing the pragmatic (e.g., usability) and hedonic (e.g., fun, engagement) aspects of information interactions. We underscore the importance of using multiple measures in information research, because our results emphasize that performance and physiological data must be interpreted in the context of users' subjective experiences.
    Source
    Journal of the American Society for Information Science and Technology. 64(2013) no.8, S.1543-1556
  6. Baeza-Yates, R.; Hurtado, C.; Mendoza, M.: Improving search engines by query clustering (2007) 0.00
    3.987516E-4 = product of:
      0.005981274 = sum of:
        0.005981274 = product of:
          0.011962548 = sum of:
            0.011962548 = weight(_text_:information in 601) [ClassicSimilarity], result of:
              0.011962548 = score(doc=601,freq=6.0), product of:
                0.050870337 = queryWeight, product of:
                  1.7554779 = idf(docFreq=20772, maxDocs=44218)
                  0.028978055 = queryNorm
                0.23515764 = fieldWeight in 601, product of:
                  2.4494898 = tf(freq=6.0), with freq of:
                    6.0 = termFreq=6.0
                  1.7554779 = idf(docFreq=20772, maxDocs=44218)
                  0.0546875 = fieldNorm(doc=601)
          0.5 = coord(1/2)
      0.06666667 = coord(1/15)
    
    Abstract
    In this paper, we present a framework for clustering Web search engine queries whose aim is to identify groups of queries used to search for similar information on the Web. The framework is based on a novel term vector model of queries that integrates user selections and the content of selected documents extracted from the logs of a search engine. The query representation obtained allows us to treat query clustering similarly to standard document clustering. We study the application of the clustering framework to two problems: relevance ranking boosting and query recommendation. Finally, we evaluate with experiments the effectiveness of our approach.
    Footnote
    Contribution to a special topic section "Mining Web resources for enhancing information retrieval"
    Source
    Journal of the American Society for Information Science and Technology. 58(2007) no.12, S.1793-1804
  7. Zhou, L.; Chaovalit, P.: Ontology-supported polarity mining (2008) 0.00
    3.987516E-4 = product of:
      0.005981274 = sum of:
        0.005981274 = product of:
          0.011962548 = sum of:
            0.011962548 = weight(_text_:information in 1343) [ClassicSimilarity], result of:
              0.011962548 = score(doc=1343,freq=6.0), product of:
                0.050870337 = queryWeight, product of:
                  1.7554779 = idf(docFreq=20772, maxDocs=44218)
                  0.028978055 = queryNorm
                0.23515764 = fieldWeight in 1343, product of:
                  2.4494898 = tf(freq=6.0), with freq of:
                    6.0 = termFreq=6.0
                  1.7554779 = idf(docFreq=20772, maxDocs=44218)
                  0.0546875 = fieldNorm(doc=1343)
          0.5 = coord(1/2)
      0.06666667 = coord(1/15)
    
    Abstract
    Polarity mining provides an in-depth analysis of semantic orientations of text information. Motivated by its success in the area of topic mining, we propose an ontology-supported polarity mining (OSPM) approach. The approach aims to enhance polarity mining with ontology by providing detailed topic-specific information. OSPM was evaluated in the movie review domain using both supervised and unsupervised techniques. Results revealed that OSPM outperformed the baseline method without ontology support. The findings of this study not only advance the state of polarity mining research but also shed light on future research directions.
    Source
    Journal of the American Society for Information Science and Technology. 59(2008) no.1, S.98-110
  8. Knowledge discovery and data mining (1998) 0.00
    3.9466174E-4 = product of:
      0.005919926 = sum of:
        0.005919926 = product of:
          0.011839852 = sum of:
            0.011839852 = weight(_text_:information in 2898) [ClassicSimilarity], result of:
              0.011839852 = score(doc=2898,freq=2.0), product of:
                0.050870337 = queryWeight, product of:
                  1.7554779 = idf(docFreq=20772, maxDocs=44218)
                  0.028978055 = queryNorm
                0.23274569 = fieldWeight in 2898, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  1.7554779 = idf(docFreq=20772, maxDocs=44218)
                  0.09375 = fieldNorm(doc=2898)
          0.5 = coord(1/2)
      0.06666667 = coord(1/15)
    
    Source
    Journal of the American Society for Information Science. 49(1998) no.5, S.397-470
  9. Intelligent information processing and web mining : Proceedings of the International IIS: IIPWM'03 Conference held in Zakopane, Poland, June 2-5, 2003 (2003) 0.00
    3.9466174E-4 = product of:
      0.005919926 = sum of:
        0.005919926 = product of:
          0.011839852 = sum of:
            0.011839852 = weight(_text_:information in 4642) [ClassicSimilarity], result of:
              0.011839852 = score(doc=4642,freq=2.0), product of:
                0.050870337 = queryWeight, product of:
                  1.7554779 = idf(docFreq=20772, maxDocs=44218)
                  0.028978055 = queryNorm
                0.23274569 = fieldWeight in 4642, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  1.7554779 = idf(docFreq=20772, maxDocs=44218)
                  0.09375 = fieldNorm(doc=4642)
          0.5 = coord(1/2)
      0.06666667 = coord(1/15)
    
  10. Gaizauskas, R.; Wilks, Y.: Information extraction : beyond document retrieval (1998) 0.00
    3.9466174E-4 = product of:
      0.005919926 = sum of:
        0.005919926 = product of:
          0.011839852 = sum of:
            0.011839852 = weight(_text_:information in 4716) [ClassicSimilarity], result of:
              0.011839852 = score(doc=4716,freq=8.0), product of:
                0.050870337 = queryWeight, product of:
                  1.7554779 = idf(docFreq=20772, maxDocs=44218)
                  0.028978055 = queryNorm
                0.23274569 = fieldWeight in 4716, product of:
                  2.828427 = tf(freq=8.0), with freq of:
                    8.0 = termFreq=8.0
                  1.7554779 = idf(docFreq=20772, maxDocs=44218)
                  0.046875 = fieldNorm(doc=4716)
          0.5 = coord(1/2)
      0.06666667 = coord(1/15)
    
    Abstract
    In this paper we give a synoptic view of the growth of the text processing technology of information extraction (IE) whose function is to extract information about a pre-specified set of entities, relations or events from natural language texts and to record this information in structured representations called templates. Here we describe the nature of the IE task, review the history of the area from its origins in AI work in the 1960s and 70s till the present, discuss the techniques being used to carry out the task, describe application areas where IE systems are or are about to be at work, and conclude with a discussion of the challenges facing the area. What emerges is a picture of an exciting new text processing technology with a host of new applications, both on its own and in conjunction with other technologies, such as information retrieval, machine translation and data mining.
  11. Perugini, S.; Ramakrishnan, N.: Mining Web functional dependencies for flexible information access (2007) 0.00
    3.9466174E-4 = product of:
      0.005919926 = sum of:
        0.005919926 = product of:
          0.011839852 = sum of:
            0.011839852 = weight(_text_:information in 602) [ClassicSimilarity], result of:
              0.011839852 = score(doc=602,freq=8.0), product of:
                0.050870337 = queryWeight, product of:
                  1.7554779 = idf(docFreq=20772, maxDocs=44218)
                  0.028978055 = queryNorm
                0.23274569 = fieldWeight in 602, product of:
                  2.828427 = tf(freq=8.0), with freq of:
                    8.0 = termFreq=8.0
                  1.7554779 = idf(docFreq=20772, maxDocs=44218)
                  0.046875 = fieldNorm(doc=602)
          0.5 = coord(1/2)
      0.06666667 = coord(1/15)
    
    Abstract
    We present an approach to enhancing information access through Web structure mining in contrast to traditional approaches involving usage mining. Specifically, we mine the hardwired hierarchical hyperlink structure of Web sites to identify patterns of term-term co-occurrences we call Web functional dependencies (FDs). Intuitively, a Web FD x -> y declares that all paths through a site involving a hyperlink labeled x also contain a hyperlink labeled y. The complete set of FDs satisfied by a site help characterize (flexible and expressive) interaction paradigms supported by a site, where a paradigm is the set of explorable sequences therein. We describe algorithms for mining FDs and results from mining several hierarchical Web sites and present several interface designs that can exploit such FDs to provide compelling user experiences.
    Footnote
    Contribution to a special topic section "Mining Web resources for enhancing information retrieval"
    Source
    Journal of the American Society for Information Science and Technology. 58(2007) no.12, S.1805-1819
  12. Shi, X.; Yang, C.C.: Mining related queries from Web search engine query logs using an improved association rule mining model (2007) 0.00
    3.6770437E-4 = product of:
      0.005515565 = sum of:
        0.005515565 = product of:
          0.01103113 = sum of:
            0.01103113 = weight(_text_:information in 597) [ClassicSimilarity], result of:
              0.01103113 = score(doc=597,freq=10.0), product of:
                0.050870337 = queryWeight, product of:
                  1.7554779 = idf(docFreq=20772, maxDocs=44218)
                  0.028978055 = queryNorm
                0.21684799 = fieldWeight in 597, product of:
                  3.1622777 = tf(freq=10.0), with freq of:
                    10.0 = termFreq=10.0
                  1.7554779 = idf(docFreq=20772, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=597)
          0.5 = coord(1/2)
      0.06666667 = coord(1/15)
    
    Abstract
    With the overwhelming volume of information, the task of finding relevant information on a given topic on the Web is becoming increasingly difficult. Web search engines hence become one of the most popular solutions available on the Web. However, it has never been easy for novice users to organize and represent their information needs using simple queries. Users have to keep modifying their input queries until they get expected results. Therefore, it is often desirable for search engines to give suggestions on related queries to users. Besides, by identifying those related queries, search engines can potentially perform optimizations on their systems, such as query expansion and file indexing. In this work we propose a method that suggests a list of related queries given an initial input query. The related queries are based on the query log of previously submitted queries by human users, which can be identified using an enhanced model of association rules. Users can utilize the suggested related queries to tune or redirect the search process. Our method not only discovers the related queries, but also ranks them according to the degree of their relatedness. Unlike many other rival techniques, it also performs reasonably well on less frequent input queries.
    Footnote
    Contribution to a special topic section "Mining Web resources for enhancing information retrieval"
    Source
    Journal of the American Society for Information Science and Technology. 58(2007) no.12, S.1871-1883
  13. Suakkaphong, N.; Zhang, Z.; Chen, H.: Disease named entity recognition using semisupervised learning and conditional random fields (2011) 0.00
    3.6770437E-4 = product of:
      0.005515565 = sum of:
        0.005515565 = product of:
          0.01103113 = sum of:
            0.01103113 = weight(_text_:information in 4367) [ClassicSimilarity], result of:
              0.01103113 = score(doc=4367,freq=10.0), product of:
                0.050870337 = queryWeight, product of:
                  1.7554779 = idf(docFreq=20772, maxDocs=44218)
                  0.028978055 = queryNorm
                0.21684799 = fieldWeight in 4367, product of:
                  3.1622777 = tf(freq=10.0), with freq of:
                    10.0 = termFreq=10.0
                  1.7554779 = idf(docFreq=20772, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=4367)
          0.5 = coord(1/2)
      0.06666667 = coord(1/15)
    
    Abstract
    Information extraction is an important text-mining task that aims at extracting prespecified types of information from large text collections and making them available in structured representations such as databases. In the biomedical domain, information extraction can be applied to help biologists make the most use of their digital-literature archives. Currently, there are large amounts of biomedical literature that contain rich information about biomedical substances. Extracting such knowledge requires a good named entity recognition technique. In this article, we combine conditional random fields (CRFs), a state-of-the-art sequence-labeling algorithm, with two semisupervised learning techniques, bootstrapping and feature sampling, to recognize disease names from biomedical literature. Two data-processing strategies for each technique also were analyzed: one sequentially processing unlabeled data partitions and another one processing unlabeled data partitions in a round-robin fashion. The experimental results showed the advantage of semisupervised learning techniques given limited labeled training data. Specifically, CRFs with bootstrapping implemented in sequential fashion outperformed strictly supervised CRFs for disease name recognition. The project was supported by NIH/NLM Grant R33 LM07299-01, 2002-2005.
    Source
    Journal of the American Society for Information Science and Technology. 62(2011) no.4, S.727-737
  14. Jones, K.M.L.; Rubel, A.; LeClere, E.: A matter of trust : higher education institutions as information fiduciaries in an age of educational data mining and learning analytics (2020) 0.00
    3.6770437E-4 = product of:
      0.005515565 = sum of:
        0.005515565 = product of:
          0.01103113 = sum of:
            0.01103113 = weight(_text_:information in 5968) [ClassicSimilarity], result of:
              0.01103113 = score(doc=5968,freq=10.0), product of:
                0.050870337 = queryWeight, product of:
                  1.7554779 = idf(docFreq=20772, maxDocs=44218)
                  0.028978055 = queryNorm
                0.21684799 = fieldWeight in 5968, product of:
                  3.1622777 = tf(freq=10.0), with freq of:
                    10.0 = termFreq=10.0
                  1.7554779 = idf(docFreq=20772, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=5968)
          0.5 = coord(1/2)
      0.06666667 = coord(1/15)
    
    Abstract
    Higher education institutions are mining and analyzing student data to effect educational, political, and managerial outcomes. Done under the banner of "learning analytics," this work can, and often does, surface sensitive data and information about, inter alia, a student's demographics, academic performance, offline and online movements, physical fitness, mental wellbeing, and social network. With these data, institutions and third parties are able to describe student life, predict future behaviors, and intervene to address academic or other barriers to student success (however defined). Learning analytics, consequently, raise serious issues concerning student privacy, autonomy, and the appropriate flow of student data. We argue that issues around privacy lead to valid questions about the degree to which students should trust their institution to use learning analytics data and other artifacts (algorithms, predictive scores) with their interests in mind. We argue that higher education institutions are paradigms of information fiduciaries. As such, colleges and universities have a special responsibility to their students. In this article, we use the information fiduciary concept to analyze cases when learning analytics violate an institution's responsibility to its students.
    Source
    Journal of the Association for Information Science and Technology. 71(2020) no.10, S.1227-1241
  15. Benoit, G.: Data mining (2002) 0.00
    3.4178712E-4 = product of:
      0.0051268064 = sum of:
        0.0051268064 = product of:
          0.010253613 = sum of:
            0.010253613 = weight(_text_:information in 4296) [ClassicSimilarity], result of:
              0.010253613 = score(doc=4296,freq=6.0), product of:
                0.050870337 = queryWeight, product of:
                  1.7554779 = idf(docFreq=20772, maxDocs=44218)
                  0.028978055 = queryNorm
                0.20156369 = fieldWeight in 4296, product of:
                  2.4494898 = tf(freq=6.0), with freq of:
                    6.0 = termFreq=6.0
                  1.7554779 = idf(docFreq=20772, maxDocs=44218)
                  0.046875 = fieldNorm(doc=4296)
          0.5 = coord(1/2)
      0.06666667 = coord(1/15)
    
    Abstract
    Data mining (DM) is a multistaged process of extracting previously unanticipated knowledge from large databases, and applying the results to decision making. Data mining tools detect patterns from the data and infer associations and rules from them. The extracted information may then be applied to prediction or classification models by identifying relations within the data records or between databases. Those patterns and rules can then guide decision making and forecast the effects of those decisions. However, this definition may be applied equally to "knowledge discovery in databases" (KDD). Indeed, in the recent literature of DM and KDD, a source of confusion has emerged, making it difficult to determine the exact parameters of both. KDD is sometimes viewed as the broader discipline, of which data mining is merely a component, specifically pattern extraction, evaluation, and cleansing methods (Raghavan, Deogun, & Sever, 1998, p. 397). Thurasingham (1999, p. 2) remarked that "knowledge discovery," "pattern discovery," "data dredging," "information extraction," and "knowledge mining" are all employed as synonyms for DM. Trybula, in his ARIST chapter on text mining, observed that the "existing work [in KDD] is confusing because the terminology is inconsistent and poorly defined."
    Source
    Annual review of information science and technology. 36(2002), S.265-312
  16. Li, J.; Zhang, P.; Cao, J.: External concept support for group support systems through Web mining (2009) 0.00
    3.4178712E-4 = product of:
      0.0051268064 = sum of:
        0.0051268064 = product of:
          0.010253613 = sum of:
            0.010253613 = weight(_text_:information in 2806) [ClassicSimilarity], result of:
              0.010253613 = score(doc=2806,freq=6.0), product of:
                0.050870337 = queryWeight, product of:
                  1.7554779 = idf(docFreq=20772, maxDocs=44218)
                  0.028978055 = queryNorm
                0.20156369 = fieldWeight in 2806, product of:
                  2.4494898 = tf(freq=6.0), with freq of:
                    6.0 = termFreq=6.0
                  1.7554779 = idf(docFreq=20772, maxDocs=44218)
                  0.046875 = fieldNorm(doc=2806)
          0.5 = coord(1/2)
      0.06666667 = coord(1/15)
    
    Abstract
    External information plays an important role in group decision-making processes, yet research about external information support for Group Support Systems (GSS) has been lacking. In this study, we propose an approach to build a concept space to provide external concept support for GSS users. Built on a Web mining algorithm, the approach can mine a concept space from the Web and retrieve related concepts from the concept space based on users' comments in a real-time manner. We conduct two experiments to evaluate the quality of the proposed approach and the effectiveness of the external concept support provided by this approach. The experiment results indicate that the concept space mined from the Web contained qualified concepts to stimulate divergent thinking. The results also demonstrate that external concept support in GSS greatly enhanced group productivity for idea generation tasks.
    Source
    Journal of the American Society for Information Science and Technology. 60(2009) no.5, S.1057-1070
  17. Biskri, I.; Rompré, L.: Using association rules for query reformulation (2012) 0.00
    3.4178712E-4 = product of:
      0.0051268064 = sum of:
        0.0051268064 = product of:
          0.010253613 = sum of:
            0.010253613 = weight(_text_:information in 92) [ClassicSimilarity], result of:
              0.010253613 = score(doc=92,freq=6.0), product of:
                0.050870337 = queryWeight, product of:
                  1.7554779 = idf(docFreq=20772, maxDocs=44218)
                  0.028978055 = queryNorm
                0.20156369 = fieldWeight in 92, product of:
                  2.4494898 = tf(freq=6.0), with freq of:
                    6.0 = termFreq=6.0
                  1.7554779 = idf(docFreq=20772, maxDocs=44218)
                  0.046875 = fieldNorm(doc=92)
          0.5 = coord(1/2)
      0.06666667 = coord(1/15)
    
    Abstract
    In this paper the authors present research on the combination of two methods of data mining: text classification and maximal association rules. Text classification has been the focus of interest of many researchers for a long time. However, the results take the form of lists of words (classes) that people often do not know what to do with. The use of maximal association rules offers a number of advantages: (1) the detection of dependencies and correlations between the relevant units of information (words) of different classes, and (2) the extraction of hidden knowledge, often relevant, from a large volume of data. The authors show how this combination can improve the process of information retrieval.
    Source
    Next generation search engines: advanced models for information retrieval. Eds.: C. Jouis et al.
  18. Sarnikar, S.; Zhang, Z.; Zhao, J.L.: Query-performance prediction for effective query routing in domain-specific repositories (2014) 0.00
    3.4178712E-4 = product of:
      0.0051268064 = sum of:
        0.0051268064 = product of:
          0.010253613 = sum of:
            0.010253613 = weight(_text_:information in 1326) [ClassicSimilarity], result of:
              0.010253613 = score(doc=1326,freq=6.0), product of:
                0.050870337 = queryWeight, product of:
                  1.7554779 = idf(docFreq=20772, maxDocs=44218)
                  0.028978055 = queryNorm
                0.20156369 = fieldWeight in 1326, product of:
                  2.4494898 = tf(freq=6.0), with freq of:
                    6.0 = termFreq=6.0
                  1.7554779 = idf(docFreq=20772, maxDocs=44218)
                  0.046875 = fieldNorm(doc=1326)
          0.5 = coord(1/2)
      0.06666667 = coord(1/15)
    
    Abstract
    The effective use of corporate memory is becoming increasingly important because every aspect of e-business requires access to information repositories. Unfortunately, less-than-satisfying effectiveness in state-of-the-art information-retrieval techniques is well known, even for some of the best search engines such as Google. In this study, the authors resolve this retrieval ineffectiveness problem by developing a new framework for predicting query performance, which is the first step toward better retrieval effectiveness. Specifically, they examine the relationship between query performance and query context. A query context consists of the query itself, the document collection, and the interaction between the two. The authors first analyze the characteristics of query context and develop various features for predicting query performance. Then, they propose a context-sensitive model for predicting query performance based on the characteristics of the query and the document collection. Finally, they validate this model with respect to five real-world collections of documents and demonstrate its utility in routing queries to the correct repository with high accuracy.
    Source
    Journal of the Association for Information Science and Technology. 65(2014) no.8, S.1597-1614
  19. Liu, Y.; Zhang, M.; Cen, R.; Ru, L.; Ma, S.: Data cleansing for Web information retrieval using query independent features (2007) 0.00
    3.2888478E-4 = product of:
      0.0049332716 = sum of:
        0.0049332716 = product of:
          0.009866543 = sum of:
            0.009866543 = weight(_text_:information in 607) [ClassicSimilarity], result of:
              0.009866543 = score(doc=607,freq=8.0), product of:
                0.050870337 = queryWeight, product of:
                  1.7554779 = idf(docFreq=20772, maxDocs=44218)
                  0.028978055 = queryNorm
                0.19395474 = fieldWeight in 607, product of:
                  2.828427 = tf(freq=8.0), with freq of:
                    8.0 = termFreq=8.0
                  1.7554779 = idf(docFreq=20772, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=607)
          0.5 = coord(1/2)
      0.06666667 = coord(1/15)
    
    Abstract
    Understanding what kinds of Web pages are the most useful for Web search engine users is a critical task in Web information retrieval (IR). Most previous works used hyperlink analysis algorithms to solve this problem. However, little research has been focused on query-independent Web data cleansing for Web IR. In this paper, we first provide analysis of the differences between retrieval target pages and ordinary ones based on more than 30 million Web pages obtained from both the Text Retrieval Conference (TREC) and a widely used Chinese search engine, SOGOU (www.sogou.com). We further propose a learning-based data cleansing algorithm for reducing Web pages that are unlikely to be useful for user requests. We found that there exists a large proportion of low-quality Web pages in both the English and the Chinese Web page corpus, and retrieval target pages can be identified using query-independent features and cleansing algorithms. The experimental results showed that our algorithm is effective in reducing a large portion of Web pages with a small loss in retrieval target pages. It makes it possible for Web IR tools to meet a large fraction of users' needs with only a small part of pages on the Web. These results may help Web search engines make better use of their limited storage and computation resources to improve search performance.
    Footnote
    Contribution to a special topic section on "Mining Web resources for enhancing information retrieval"
    Source
    Journal of the American Society for Information Science and Technology. 58(2007) no.12, S.1884-1898
  20. Wei, C.-P.; Lee, Y.-H.; Chiang, Y.-S.; Chen, C.-T.; Yang, C.C.C.: Exploiting temporal characteristics of features for effectively discovering event episodes from news corpora (2014) 0.00
    3.2888478E-4 = product of:
      0.0049332716 = sum of:
        0.0049332716 = product of:
          0.009866543 = sum of:
            0.009866543 = weight(_text_:information in 1225) [ClassicSimilarity], result of:
              0.009866543 = score(doc=1225,freq=8.0), product of:
                0.050870337 = queryWeight, product of:
                  1.7554779 = idf(docFreq=20772, maxDocs=44218)
                  0.028978055 = queryNorm
                0.19395474 = fieldWeight in 1225, product of:
                  2.828427 = tf(freq=8.0), with freq of:
                    8.0 = termFreq=8.0
                  1.7554779 = idf(docFreq=20772, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=1225)
          0.5 = coord(1/2)
      0.06666667 = coord(1/15)
    
    Abstract
    An organization performing environmental scanning generally monitors or tracks various events concerning its external environment. One of the major resources for environmental scanning is online news documents, which are readily accessible on news websites or infomediaries. However, the proliferation of the World Wide Web, which increases information sources and improves information circulation, has vastly expanded the amount of information to be scanned. Thus, it is essential to develop an effective event episode discovery mechanism to organize news documents pertaining to an event of interest. In this study, we propose two new metrics, Term Frequency × Inverse Document Frequency_Tempo (TF×IDF_Tempo) and TF×Enhanced-IDF_Tempo, and develop a temporal-based event episode discovery (TEED) technique that uses the proposed metrics for feature selection and document representation. Using a traditional TF×IDF-based hierarchical agglomerative clustering technique as a performance benchmark, our empirical evaluation reveals that the proposed TEED technique outperforms its benchmark, as measured by cluster recall and cluster precision. In addition, the use of TF×Enhanced-IDF_Tempo significantly improves the effectiveness of event episode discovery when compared with the use of TF×IDF_Tempo.
    Source
    Journal of the Association for Information Science and Technology. 65(2014) no.3, S.621-634
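
The score breakdowns listed with each result can be reproduced by hand. The following is a minimal sketch, assuming the standard Lucene ClassicSimilarity definitions (tf = sqrt(freq), idf = 1 + ln(maxDocs / (docFreq + 1))); the function names are hypothetical and are not part of the catalog software. The input values are copied from the explain trees of results 17 and 19. Lucene computes in 32-bit floats, so the last digits may differ slightly.

```python
import math


def classic_idf(doc_freq: int, max_docs: int) -> float:
    """Inverse document frequency as defined by ClassicSimilarity."""
    return 1.0 + math.log(max_docs / (doc_freq + 1))


def classic_score(freq, doc_freq, max_docs, query_norm, field_norm, coords):
    """Recompute one explain tree: queryWeight * fieldWeight * coord factors."""
    tf = math.sqrt(freq)                  # tf(freq=...)
    idf = classic_idf(doc_freq, max_docs)
    query_weight = idf * query_norm       # queryWeight
    field_weight = tf * idf * field_norm  # fieldWeight
    score = query_weight * field_weight
    for coord in coords:                  # e.g. coord(1/2), coord(1/15)
        score *= coord
    return score


# Result 17 (doc=92): freq=6.0, fieldNorm=0.046875
score_17 = classic_score(6.0, 20772, 44218, 0.028978055, 0.046875, (1 / 2, 1 / 15))

# Result 19 (doc=607): freq=8.0, fieldNorm=0.0390625
score_19 = classic_score(8.0, 20772, 44218, 0.028978055, 0.0390625, (1 / 2, 1 / 15))

print(f"result 17: {score_17:.7e}")
print(f"result 19: {score_19:.7e}")
```

Because queryNorm, idf, and the coord factors are identical for every result on this page, the ranking is driven only by the term frequency and the field-length norm of each document.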

Types

  • a 96
  • m 11
  • s 9
  • el 2