Search (30 results, page 1 of 2)

  • theme_ss:"Data Mining"
  1. Lam, W.; Yang, C.C.; Menczer, F.: Introduction to the special topic section on mining Web resources for enhancing information retrieval (2007) 0.04
    0.042164885 = product of:
      0.16865954 = sum of:
        0.16865954 = weight(_text_:sites in 600) [ClassicSimilarity], result of:
          0.16865954 = score(doc=600,freq=6.0), product of:
            0.2408473 = queryWeight, product of:
              5.227637 = idf(docFreq=644, maxDocs=44218)
              0.046071928 = queryNorm
            0.7002758 = fieldWeight in 600, product of:
              2.4494898 = tf(freq=6.0), with freq of:
                6.0 = termFreq=6.0
              5.227637 = idf(docFreq=644, maxDocs=44218)
              0.0546875 = fieldNorm(doc=600)
      0.25 = coord(1/4)
    
    Abstract
    The amount of information on the Web has been expanding at an enormous pace. There are a variety of Web documents in different genres, such as news, reports, reviews. Traditionally, the information displayed on Web sites has been static. Recently, there are many Web sites offering content that is dynamically generated and frequently updated. It is also common for Web sites to contain information in different languages since many countries adopt more than one language. Moreover, content may exist in multimedia formats including text, images, video, and audio.
  2. Perugini, S.; Ramakrishnan, N.: Mining Web functional dependencies for flexible information access (2007) 0.03
    0.029509272 = product of:
      0.11803709 = sum of:
        0.11803709 = weight(_text_:sites in 602) [ClassicSimilarity], result of:
          0.11803709 = score(doc=602,freq=4.0), product of:
            0.2408473 = queryWeight, product of:
              5.227637 = idf(docFreq=644, maxDocs=44218)
              0.046071928 = queryNorm
            0.49009097 = fieldWeight in 602, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              5.227637 = idf(docFreq=644, maxDocs=44218)
              0.046875 = fieldNorm(doc=602)
      0.25 = coord(1/4)
    
    Abstract
    We present an approach to enhancing information access through Web structure mining in contrast to traditional approaches involving usage mining. Specifically, we mine the hardwired hierarchical hyperlink structure of Web sites to identify patterns of term-term co-occurrences we call Web functional dependencies (FDs). Intuitively, a Web FD x -> y declares that all paths through a site involving a hyperlink labeled x also contain a hyperlink labeled y. The complete set of FDs satisfied by a site helps characterize (flexible and expressive) interaction paradigms supported by a site, where a paradigm is the set of explorable sequences therein. We describe algorithms for mining FDs and results from mining several hierarchical Web sites and present several interface designs that can exploit such FDs to provide compelling user experiences.
  3. Thelwall, M.; Wilkinson, D.: Public dialogs in social network sites : What is their purpose? (2010) 0.03
    0.029509272 = product of:
      0.11803709 = sum of:
        0.11803709 = weight(_text_:sites in 3327) [ClassicSimilarity], result of:
          0.11803709 = score(doc=3327,freq=4.0), product of:
            0.2408473 = queryWeight, product of:
              5.227637 = idf(docFreq=644, maxDocs=44218)
              0.046071928 = queryNorm
            0.49009097 = fieldWeight in 3327, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              5.227637 = idf(docFreq=644, maxDocs=44218)
              0.046875 = fieldNorm(doc=3327)
      0.25 = coord(1/4)
    
    Abstract
    Social network sites (SNSs) such as MySpace and Facebook are important venues for interpersonal communication, especially among youth. One way in which members can communicate is to write public messages on each other's profile, but how is this unusual means of communication used in practice? An analysis of 2,293 public comment exchanges extracted from large samples of U.S. and U.K. MySpace members found them to be relatively rapid, but rarely used for prolonged exchanges. They seem to fulfill two purposes: making initial contact and keeping in touch occasionally such as at birthdays and other important dates. Although about half of the dialogs seem to exchange some gossip, the dialogs seem typically too short to play the role of gossip-based social grooming for typical pairs of Friends, but close Friends may still communicate extensively in SNSs with other methods.
  4. Search tools (1997) 0.02
    0.024343908 = product of:
      0.09737563 = sum of:
        0.09737563 = weight(_text_:sites in 3834) [ClassicSimilarity], result of:
          0.09737563 = score(doc=3834,freq=2.0), product of:
            0.2408473 = queryWeight, product of:
              5.227637 = idf(docFreq=644, maxDocs=44218)
              0.046071928 = queryNorm
            0.40430441 = fieldWeight in 3834, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              5.227637 = idf(docFreq=644, maxDocs=44218)
              0.0546875 = fieldNorm(doc=3834)
      0.25 = coord(1/4)
    
    Abstract
    Offers brief accounts of Internet search tools. Covers the Lycos revamp; the new navigation service produced jointly by Excite and Netscape, delivering a language specific, locally relevant Web guide for Japan, Germany, France, the UK and Australia; InfoWatcher, a combination offline browser, search engine and push product from Carvelle Inc., USA; Alexa by Alexa Internet and WBI from IBM which are free and provide users with information on how others have used the Web sites which they are visiting; and Concept Explorer from Knowledge Discovery Systems, Inc., California which performs data mining from the Web, Usenet groups, MEDLINE and the US Patent and Trademark Office patent abstracts
  5. Liu, W.; Weichselbraun, A.; Scharl, A.; Chang, E.: Semi-automatic ontology extension using spreading activation (2005) 0.02
    0.024343908 = product of:
      0.09737563 = sum of:
        0.09737563 = weight(_text_:sites in 3028) [ClassicSimilarity], result of:
          0.09737563 = score(doc=3028,freq=2.0), product of:
            0.2408473 = queryWeight, product of:
              5.227637 = idf(docFreq=644, maxDocs=44218)
              0.046071928 = queryNorm
            0.40430441 = fieldWeight in 3028, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              5.227637 = idf(docFreq=644, maxDocs=44218)
              0.0546875 = fieldNorm(doc=3028)
      0.25 = coord(1/4)
    
    Abstract
    This paper describes a system to semi-automatically extend and refine ontologies by mining textual data from the Web sites of international online media. Expanding a seed ontology creates a semantic network through co-occurrence analysis, trigger phrase analysis, and disambiguation based on the WordNet lexical dictionary. Spreading activation then processes this semantic network to find the most probable candidates for inclusion in an extended ontology. Approaches to identifying hierarchical relationships such as subsumption, head noun analysis and WordNet consultation are used to confirm and classify the found relationships. Using a seed ontology on "climate change" as an example, this paper demonstrates how spreading activation improves the result by naturally integrating the mentioned methods.
  6. Hofstede, A.H.M. ter; Proper, H.A.; Van der Weide, T.P.: Exploiting fact verbalisation in conceptual information modelling (1997) 0.02
    0.023516573 = product of:
      0.09406629 = sum of:
        0.09406629 = sum of:
          0.050371516 = weight(_text_:design in 2908) [ClassicSimilarity], result of:
            0.050371516 = score(doc=2908,freq=2.0), product of:
              0.17322445 = queryWeight, product of:
                3.7598698 = idf(docFreq=2798, maxDocs=44218)
                0.046071928 = queryNorm
              0.29078758 = fieldWeight in 2908, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                3.7598698 = idf(docFreq=2798, maxDocs=44218)
                0.0546875 = fieldNorm(doc=2908)
          0.04369477 = weight(_text_:22 in 2908) [ClassicSimilarity], result of:
            0.04369477 = score(doc=2908,freq=2.0), product of:
              0.16133605 = queryWeight, product of:
                3.5018296 = idf(docFreq=3622, maxDocs=44218)
                0.046071928 = queryNorm
              0.2708308 = fieldWeight in 2908, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                3.5018296 = idf(docFreq=3622, maxDocs=44218)
                0.0546875 = fieldNorm(doc=2908)
      0.25 = coord(1/4)
    
    Abstract
    Focuses on the information modelling side of conceptual modelling. Deals with the exploitation of fact verbalisations after finishing the actual information system. Verbalisations are used as input for the design of the so-called information model. Exploits these verbalisations in 4 directions: considers their use for a conceptual query language, the verbalisation of instances, the description of the contents of a database and for the verbalisation of queries in a computer supported query environment. Provides an example session with an envisioned tool for end user query formulations that exploits the verbalisation.
    Source
    Information systems. 22(1997) nos.5/6, S.349-385
  7. Thelwall, M.; Wilkinson, D.; Uppal, S.: Data mining emotion in social network communication : gender differences in MySpace (2009) 0.02
    0.020866206 = product of:
      0.08346482 = sum of:
        0.08346482 = weight(_text_:sites in 3322) [ClassicSimilarity], result of:
          0.08346482 = score(doc=3322,freq=2.0), product of:
            0.2408473 = queryWeight, product of:
              5.227637 = idf(docFreq=644, maxDocs=44218)
              0.046071928 = queryNorm
            0.34654665 = fieldWeight in 3322, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              5.227637 = idf(docFreq=644, maxDocs=44218)
              0.046875 = fieldNorm(doc=3322)
      0.25 = coord(1/4)
    
    Abstract
    Despite the rapid growth in social network sites and in data mining for emotion (sentiment analysis), little research has tied the two together, and none has had social science goals. This article examines the extent to which emotion is present in MySpace comments, using a combination of data mining and content analysis, and exploring age and gender. A random sample of 819 public comments to or from U.S. users was manually classified for strength of positive and negative emotion. Two thirds of the comments expressed positive emotion, but a minority (20%) contained negative emotion, confirming that MySpace is an extraordinarily emotion-rich environment. Females are likely to give and receive more positive comments than are males, but there is no difference for negative comments. It is thus possible that females are more successful social network site users partly because of their greater ability to textually harness positive affect.
  8. Li, D.; Tang, J.; Ding, Y.; Shuai, X.; Chambers, T.; Sun, G.; Luo, Z.; Zhang, J.: Topic-level opinion influence model (TOIM) : an investigation using tencent microblogging (2015) 0.02
    0.017388504 = product of:
      0.069554016 = sum of:
        0.069554016 = weight(_text_:sites in 2345) [ClassicSimilarity], result of:
          0.069554016 = score(doc=2345,freq=2.0), product of:
            0.2408473 = queryWeight, product of:
              5.227637 = idf(docFreq=644, maxDocs=44218)
              0.046071928 = queryNorm
            0.28878886 = fieldWeight in 2345, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              5.227637 = idf(docFreq=644, maxDocs=44218)
              0.0390625 = fieldNorm(doc=2345)
      0.25 = coord(1/4)
    
    Abstract
    Text mining has been widely used in multiple types of user-generated data to infer user opinion, but its application to microblogging is difficult because text messages are short and noisy, providing limited information about user opinion. Given that microblogging users communicate with each other to form a social network, we hypothesize that user opinion is influenced by its neighbors in the network. In this paper, we infer user opinion on a topic by combining two factors: the user's historical opinion about relevant topics and opinion influence from his/her neighbors. We thus build a topic-level opinion influence model (TOIM) by integrating both topic factor and opinion influence factor into a unified probabilistic model. We evaluate our model in one of the largest microblogging sites in China, Tencent Weibo, and the experiments show that TOIM outperforms baseline methods in opinion inference accuracy. Moreover, incorporating indirect influence further improves inference recall and f1-measure. Finally, we demonstrate some useful applications of TOIM in analyzing users' behaviors in Tencent Weibo.
  9. Chowdhury, G.G.: Template mining for information extraction from digital documents (1999) 0.01
    0.010923693 = product of:
      0.04369477 = sum of:
        0.04369477 = product of:
          0.08738954 = sum of:
            0.08738954 = weight(_text_:22 in 4577) [ClassicSimilarity], result of:
              0.08738954 = score(doc=4577,freq=2.0), product of:
                0.16133605 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.046071928 = queryNorm
                0.5416616 = fieldWeight in 4577, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.109375 = fieldNorm(doc=4577)
          0.5 = coord(1/2)
      0.25 = coord(1/4)
    
    Date
    2. 4.2000 18:01:22
  10. Wong, S.K.M.; Butz, C.J.; Xiang, X.: Automated database schema design using mined data dependencies (1998) 0.01
    0.010905754 = product of:
      0.043623015 = sum of:
        0.043623015 = product of:
          0.08724603 = sum of:
            0.08724603 = weight(_text_:design in 2897) [ClassicSimilarity], result of:
              0.08724603 = score(doc=2897,freq=6.0), product of:
                0.17322445 = queryWeight, product of:
                  3.7598698 = idf(docFreq=2798, maxDocs=44218)
                  0.046071928 = queryNorm
                0.5036589 = fieldWeight in 2897, product of:
                  2.4494898 = tf(freq=6.0), with freq of:
                    6.0 = termFreq=6.0
                  3.7598698 = idf(docFreq=2798, maxDocs=44218)
                  0.0546875 = fieldNorm(doc=2897)
          0.5 = coord(1/2)
      0.25 = coord(1/4)
    
    Abstract
    Data dependencies are used in database schema design to enforce the correctness of a database as well as to reduce redundant data. These dependencies are usually determined from the semantics of the attributes and are then enforced upon the relations. Describes a bottom-up procedure for discovering multivalued dependencies in observed data without knowing a priori the relationships among the attributes. The proposed algorithm is an application of the technique designed for learning conditional independencies in probabilistic reasoning. A prototype system for automated database schema design has been implemented. Experiments were carried out to demonstrate both the effectiveness and efficiency of the method
  11. KDD : techniques and applications (1998) 0.01
    0.009363165 = product of:
      0.03745266 = sum of:
        0.03745266 = product of:
          0.07490532 = sum of:
            0.07490532 = weight(_text_:22 in 6783) [ClassicSimilarity], result of:
              0.07490532 = score(doc=6783,freq=2.0), product of:
                0.16133605 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.046071928 = queryNorm
                0.46428138 = fieldWeight in 6783, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.09375 = fieldNorm(doc=6783)
          0.5 = coord(1/2)
      0.25 = coord(1/4)
    
    Footnote
    A special issue of selected papers from the Pacific-Asia Conference on Knowledge Discovery and Data Mining (PAKDD'97), held Singapore, 22-23 Feb 1997
  12. Matson, L.D.; Bonski, D.J.: Do digital libraries need librarians? (1997) 0.01
    0.0062421104 = product of:
      0.024968442 = sum of:
        0.024968442 = product of:
          0.049936883 = sum of:
            0.049936883 = weight(_text_:22 in 1737) [ClassicSimilarity], result of:
              0.049936883 = score(doc=1737,freq=2.0), product of:
                0.16133605 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.046071928 = queryNorm
                0.30952093 = fieldWeight in 1737, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.0625 = fieldNorm(doc=1737)
          0.5 = coord(1/2)
      0.25 = coord(1/4)
    
    Date
    22.11.1998 18:57:22
  13. Lusti, M.: Data Warehousing and Data Mining : Eine Einführung in entscheidungsunterstützende Systeme (1999) 0.01
    0.0062421104 = product of:
      0.024968442 = sum of:
        0.024968442 = product of:
          0.049936883 = sum of:
            0.049936883 = weight(_text_:22 in 4261) [ClassicSimilarity], result of:
              0.049936883 = score(doc=4261,freq=2.0), product of:
                0.16133605 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.046071928 = queryNorm
                0.30952093 = fieldWeight in 4261, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.0625 = fieldNorm(doc=4261)
          0.5 = coord(1/2)
      0.25 = coord(1/4)
    
    Date
    17. 7.2002 19:22:06
  14. Amir, A.; Feldman, R.; Kashi, R.: A new and versatile method for association generation (1997) 0.01
    0.0062421104 = product of:
      0.024968442 = sum of:
        0.024968442 = product of:
          0.049936883 = sum of:
            0.049936883 = weight(_text_:22 in 1270) [ClassicSimilarity], result of:
              0.049936883 = score(doc=1270,freq=2.0), product of:
                0.16133605 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.046071928 = queryNorm
                0.30952093 = fieldWeight in 1270, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.0625 = fieldNorm(doc=1270)
          0.5 = coord(1/2)
      0.25 = coord(1/4)
    
    Source
    Information systems. 22(1997) nos.5/6, S.333-347
  15. Chen, H.; Chau, M.: Web mining : machine learning for Web applications (2003) 0.01
    0.0053969487 = product of:
      0.021587795 = sum of:
        0.021587795 = product of:
          0.04317559 = sum of:
            0.04317559 = weight(_text_:design in 4242) [ClassicSimilarity], result of:
              0.04317559 = score(doc=4242,freq=2.0), product of:
                0.17322445 = queryWeight, product of:
                  3.7598698 = idf(docFreq=2798, maxDocs=44218)
                  0.046071928 = queryNorm
                0.24924651 = fieldWeight in 4242, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.7598698 = idf(docFreq=2798, maxDocs=44218)
                  0.046875 = fieldNorm(doc=4242)
          0.5 = coord(1/2)
      0.25 = coord(1/4)
    
    Abstract
    With more than two billion pages created by millions of Web page authors and organizations, the World Wide Web is a tremendously rich knowledge base. The knowledge comes not only from the content of the pages themselves, but also from the unique characteristics of the Web, such as its hyperlink structure and its diversity of content and languages. Analysis of these characteristics often reveals interesting patterns and new knowledge. Such knowledge can be used to improve users' efficiency and effectiveness in searching for information on the Web, and also for applications unrelated to the Web, such as support for decision making or business management. The Web's size and its unstructured and dynamic content, as well as its multilingual nature, make the extraction of useful knowledge a challenging research problem. Furthermore, the Web generates a large amount of data in other formats that contain valuable information. For example, Web server logs' information about user access patterns can be used for information personalization or improving Web page design.
  16. Dang, X.H.; Ong, K.-L.: Knowledge discovery in data streams (2009) 0.01
    0.0053969487 = product of:
      0.021587795 = sum of:
        0.021587795 = product of:
          0.04317559 = sum of:
            0.04317559 = weight(_text_:design in 3829) [ClassicSimilarity], result of:
              0.04317559 = score(doc=3829,freq=2.0), product of:
                0.17322445 = queryWeight, product of:
                  3.7598698 = idf(docFreq=2798, maxDocs=44218)
                  0.046071928 = queryNorm
                0.24924651 = fieldWeight in 3829, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.7598698 = idf(docFreq=2798, maxDocs=44218)
                  0.046875 = fieldNorm(doc=3829)
          0.5 = coord(1/2)
      0.25 = coord(1/4)
    
    Abstract
    Knowing what to do with the massive amount of data collected has always been an ongoing issue for many organizations. While data mining has been touted to be the solution, it has failed to deliver the impact despite its successes in many areas. One reason is that data mining algorithms were not designed for the real world, i.e., they usually assume a static view of the data and a stable execution environment where resources are abundant. The reality however is that data are constantly changing and the execution environment is dynamic. Hence, it becomes difficult for data mining to truly deliver timely and relevant results. Recently, the processing of stream data has received much attention. What is interesting is that the methodology to design stream-based algorithms may well be the solution to the above problem. In this entry, we discuss this issue and present an overview of recent works.
  17. Chen, Y.-L.; Liu, Y.-H.; Ho, W.-L.: A text mining approach to assist the general public in the retrieval of legal documents (2013) 0.01
    0.0053969487 = product of:
      0.021587795 = sum of:
        0.021587795 = product of:
          0.04317559 = sum of:
            0.04317559 = weight(_text_:design in 521) [ClassicSimilarity], result of:
              0.04317559 = score(doc=521,freq=2.0), product of:
                0.17322445 = queryWeight, product of:
                  3.7598698 = idf(docFreq=2798, maxDocs=44218)
                  0.046071928 = queryNorm
                0.24924651 = fieldWeight in 521, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.7598698 = idf(docFreq=2798, maxDocs=44218)
                  0.046875 = fieldNorm(doc=521)
          0.5 = coord(1/2)
      0.25 = coord(1/4)
    
    Abstract
    Applying text mining techniques to legal issues has been an emerging research topic in recent years. Although some previous studies focused on assisting professionals in the retrieval of related legal documents, they did not take into account the general public and their difficulty in describing legal problems in professional legal terms. Because this problem has not been addressed by previous research, this study aims to design a text-mining-based method that allows the general public to use everyday vocabulary to search for and retrieve criminal judgments. The experimental results indicate that our method can help the general public, who are not familiar with professional legal terms, to acquire relevant criminal judgments more accurately and effectively.
  18. Lackes, R.; Tillmanns, C.: Data Mining für die Unternehmenspraxis : Entscheidungshilfen und Fallstudien mit führenden Softwarelösungen (2006) 0.00
    0.0046815826 = product of:
      0.01872633 = sum of:
        0.01872633 = product of:
          0.03745266 = sum of:
            0.03745266 = weight(_text_:22 in 1383) [ClassicSimilarity], result of:
              0.03745266 = score(doc=1383,freq=2.0), product of:
                0.16133605 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.046071928 = queryNorm
                0.23214069 = fieldWeight in 1383, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.046875 = fieldNorm(doc=1383)
          0.5 = coord(1/2)
      0.25 = coord(1/4)
    
    Date
    22. 3.2008 14:46:06
  19. Lihui, C.; Lian, C.W.: Using Web structure and summarisation techniques for Web content mining (2005) 0.00
    0.0044974573 = product of:
      0.01798983 = sum of:
        0.01798983 = product of:
          0.03597966 = sum of:
            0.03597966 = weight(_text_:design in 1046) [ClassicSimilarity], result of:
              0.03597966 = score(doc=1046,freq=2.0), product of:
                0.17322445 = queryWeight, product of:
                  3.7598698 = idf(docFreq=2798, maxDocs=44218)
                  0.046071928 = queryNorm
                0.20770542 = fieldWeight in 1046, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.7598698 = idf(docFreq=2798, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=1046)
          0.5 = coord(1/2)
      0.25 = coord(1/4)
    
    Abstract
    The dynamic nature and size of the Internet can result in difficulty finding relevant information. Most users typically express their information need via short queries to search engines and they often have to physically sift through the search results based on relevance ranking set by the search engines, making the process of relevance judgement time-consuming. In this paper, we describe a novel representation technique which makes use of the Web structure together with summarisation techniques to better represent knowledge in actual Web Documents. We named the proposed technique as Semantic Virtual Document (SVD). We will discuss how the proposed SVD can be used together with a suitable clustering algorithm to achieve an automatic content-based categorization of similar Web Documents. The auto-categorization facility as well as a "Tree-like" Graphical User Interface (GUI) for post-retrieval document browsing enhances the relevance judgement process for Internet users. Furthermore, we will introduce how our cluster-biased automatic query expansion technique can be used to overcome the ambiguity of short queries typically given by users. We will outline our experimental design to evaluate the effectiveness of the proposed SVD for representation and present a prototype called iSEARCH (Intelligent SEarch And Review of Cluster Hierarchy) for Web content mining. Our results confirm, quantify and extend previous research using Web structure and summarisation techniques, introducing novel techniques for knowledge representation to enhance Web content mining.
  20. Chen, C.-C.; Chen, A.-P.: Using data mining technology to provide a recommendation service in the digital library (2007) 0.00
    0.0044974573 = product of:
      0.01798983 = sum of:
        0.01798983 = product of:
          0.03597966 = sum of:
            0.03597966 = weight(_text_:design in 2533) [ClassicSimilarity], result of:
              0.03597966 = score(doc=2533,freq=2.0), product of:
                0.17322445 = queryWeight, product of:
                  3.7598698 = idf(docFreq=2798, maxDocs=44218)
                  0.046071928 = queryNorm
                0.20770542 = fieldWeight in 2533, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.7598698 = idf(docFreq=2798, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=2533)
          0.5 = coord(1/2)
      0.25 = coord(1/4)
    
    Abstract
    Purpose - Since library storage has been increasing day by day, it is difficult for readers to find the books which interest them as well as representative booklists. How to utilize meaningful information effectively to improve the service quality of the digital library appears to be very important. The purpose of this paper is to provide a recommendation system architecture to promote digital library services in electronic libraries. Design/methodology/approach - In the proposed architecture, a two-phase data mining process used by association rule and clustering methods is designed to generate a recommendation system. The process considers not only the relationship of a cluster of users but also the associations among the information accessed. Findings - The process considered not only the relationship of a cluster of users but also the associations among the information accessed. With the advanced filter, the recommendation supported by the proposed system architecture would be closely served to meet users' needs. Originality/value - This paper not only constructs a recommendation service for readers to search books from the web but takes the initiative in finding the most suitable books for readers as well. Furthermore, library managers are expected to purchase core and hot books from a limited budget to maintain and satisfy the requirements of readers along with promoting digital library services.
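
The score breakdowns listed with each result can be recomputed by hand. The following is a minimal sketch, assuming the classic TF-IDF factors that the numbers above appear to follow: tf = sqrt(freq), idf = 1 + ln(maxDocs / (docFreq + 1)), queryWeight = idf * queryNorm, and fieldWeight = tf * idf * fieldNorm. The function name `term_score` and the constant names are hypothetical, chosen for illustration only; queryNorm, fieldNorm, and the coord factors are copied from the listing rather than derived.

```python
import math

# Values copied from the explain trees in the result list.
QUERY_NORM = 0.046071928
MAX_DOCS = 44218


def term_score(freq, doc_freq, max_docs, field_norm, query_norm):
    """Score contribution of one matching term (hypothetical helper).

    Assumes the classic TF-IDF factors inferred from the listing:
    tf = sqrt(freq) and idf = 1 + ln(max_docs / (doc_freq + 1)).
    """
    tf = math.sqrt(freq)
    idf = 1.0 + math.log(max_docs / (doc_freq + 1))
    query_weight = idf * query_norm        # "queryWeight" line
    field_weight = tf * idf * field_norm   # "fieldWeight" line
    return query_weight * field_weight


# Result 1 (doc 600): term "sites", freq=6, docFreq=644, coord(1/4).
# The listing reports 0.042164885 for this document.
score_600 = term_score(6.0, 644, MAX_DOCS, 0.0546875, QUERY_NORM) * 0.25

# Result 9 (doc 4577): term "22", freq=2, docFreq=3622,
# inner coord(1/2) and outer coord(1/4).
# The listing reports 0.010923693 for this document.
score_4577 = term_score(2.0, 3622, MAX_DOCS, 0.109375, QUERY_NORM) * 0.5 * 0.25

print(f"doc 600:  {score_600:.9f}")
print(f"doc 4577: {score_4577:.9f}")
```

Small differences in the last digits are to be expected, since the listed values look like single-precision results while this sketch uses double precision.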

Languages

  • e 23
  • d 7
