Search (114 results, page 1 of 6)

  • language_ss:"e"
  • theme_ss:"Data Mining"
  • type_ss:"a"
  1. Chowdhury, G.G.: Template mining for information extraction from digital documents (1999) 0.05
    0.048416097 = product of:
      0.09683219 = sum of:
        0.09683219 = sum of:
          0.009471525 = weight(_text_:a in 4577) [ClassicSimilarity], result of:
            0.009471525 = score(doc=4577,freq=2.0), product of:
              0.053105544 = queryWeight, product of:
                1.153047 = idf(docFreq=37942, maxDocs=44218)
                0.046056706 = queryNorm
              0.17835285 = fieldWeight in 4577, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                1.153047 = idf(docFreq=37942, maxDocs=44218)
                0.109375 = fieldNorm(doc=4577)
          0.087360665 = weight(_text_:22 in 4577) [ClassicSimilarity], result of:
            0.087360665 = score(doc=4577,freq=2.0), product of:
              0.16128273 = queryWeight, product of:
                3.5018296 = idf(docFreq=3622, maxDocs=44218)
                0.046056706 = queryNorm
              0.5416616 = fieldWeight in 4577, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                3.5018296 = idf(docFreq=3622, maxDocs=44218)
                0.109375 = fieldNorm(doc=4577)
      0.5 = coord(1/2)
    
    Date
    2. 4.2000 18:01:22
    Type
    a
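  The explain trees attached to each result follow Lucene's ClassicSimilarity (TF-IDF): a term's contribution is queryWeight * fieldWeight, where queryWeight = idf * queryNorm, fieldWeight = tf * idf * fieldNorm, and tf = sqrt(termFreq); coord(1/2) then scales the sum by the fraction of query clauses that matched. A minimal Python re-computation of the first tree above (doc 4577), assuming exactly these formulas:

      import math

      # Re-compute the explain tree for doc 4577 under ClassicSimilarity.
      def term_weight(term_freq, idf, query_norm, field_norm):
          tf = math.sqrt(term_freq)             # 1.4142135 for freq=2.0
          query_weight = idf * query_norm       # e.g. 3.5018296 * 0.046056706 = 0.16128273
          field_weight = tf * idf * field_norm  # e.g. 0.5416616 for "_text_:22"
          return query_weight * field_weight

      query_norm, field_norm = 0.046056706, 0.109375
      w_a  = term_weight(2.0, 1.153047,  query_norm, field_norm)  # ~0.009471525
      w_22 = term_weight(2.0, 3.5018296, query_norm, field_norm)  # ~0.087360665
      score = 0.5 * (w_a + w_22)                # coord(1/2) * sum of term weights
      print(score)                              # ~0.048416097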
  2. Amir, A.; Feldman, R.; Kashi, R.: A new and versatile method for association generation (1997) 0.03
    0.031588875 = product of:
      0.06317775 = sum of:
        0.06317775 = sum of:
          0.013257373 = weight(_text_:a in 1270) [ClassicSimilarity], result of:
            0.013257373 = score(doc=1270,freq=12.0), product of:
              0.053105544 = queryWeight, product of:
                1.153047 = idf(docFreq=37942, maxDocs=44218)
                0.046056706 = queryNorm
              0.24964198 = fieldWeight in 1270, product of:
                3.4641016 = tf(freq=12.0), with freq of:
                  12.0 = termFreq=12.0
                1.153047 = idf(docFreq=37942, maxDocs=44218)
                0.0625 = fieldNorm(doc=1270)
          0.04992038 = weight(_text_:22 in 1270) [ClassicSimilarity], result of:
            0.04992038 = score(doc=1270,freq=2.0), product of:
              0.16128273 = queryWeight, product of:
                3.5018296 = idf(docFreq=3622, maxDocs=44218)
                0.046056706 = queryNorm
              0.30952093 = fieldWeight in 1270, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                3.5018296 = idf(docFreq=3622, maxDocs=44218)
                0.0625 = fieldNorm(doc=1270)
      0.5 = coord(1/2)
    
    Abstract
    Current algorithms for finding associations among the attributes describing data in a database have a number of shortcomings. Presents a novel method for association generation that satisfies all of these desiderata. The method differs from all existing algorithms and is especially suitable for textual databases with binary attributes. Uses subword trees for quick indexing into the required database statistics. Tests the algorithm on the Reuters-22173 database with satisfactory results
    Source
    Information systems. 22(1997) nos.5/6, S.333-347
    Type
    a
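  The entry above concerns association generation over binary attributes; its subword-tree method is not reproduced here. As a point of reference only, a naive support/confidence association sketch over binary-attribute records (toy data, all terms hypothetical):

      from itertools import combinations

      # Naive association generation: emit x -> y when the pair is frequent
      # and the conditional confidence is high. Illustrative baseline only;
      # the cited paper's subword-tree method gathers these statistics faster.
      docs = [
          {"data", "mining", "text"},
          {"data", "mining"},
          {"text", "retrieval"},
          {"data", "mining", "retrieval"},
      ]

      def support(itemset):
          return sum(itemset <= d for d in docs) / len(docs)

      min_sup, min_conf = 0.5, 0.8
      items = sorted(set().union(*docs))
      for a, b in combinations(items, 2):
          for x, y in ((a, b), (b, a)):
              sup = support({x, y})
              conf = sup / support({x}) if support({x}) else 0.0
              if sup >= min_sup and conf >= min_conf:
                  print(f"{x} -> {y}  (support={sup:.2f}, confidence={conf:.2f})")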
  3. Matson, L.D.; Bonski, D.J.: Do digital libraries need librarians? (1997) 0.03
    0.028787265 = product of:
      0.05757453 = sum of:
        0.05757453 = sum of:
          0.007654148 = weight(_text_:a in 1737) [ClassicSimilarity], result of:
            0.007654148 = score(doc=1737,freq=4.0), product of:
              0.053105544 = queryWeight, product of:
                1.153047 = idf(docFreq=37942, maxDocs=44218)
                0.046056706 = queryNorm
              0.14413087 = fieldWeight in 1737, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                1.153047 = idf(docFreq=37942, maxDocs=44218)
                0.0625 = fieldNorm(doc=1737)
          0.04992038 = weight(_text_:22 in 1737) [ClassicSimilarity], result of:
            0.04992038 = score(doc=1737,freq=2.0), product of:
              0.16128273 = queryWeight, product of:
                3.5018296 = idf(docFreq=3622, maxDocs=44218)
                0.046056706 = queryNorm
              0.30952093 = fieldWeight in 1737, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                3.5018296 = idf(docFreq=3622, maxDocs=44218)
                0.0625 = fieldNorm(doc=1737)
      0.5 = coord(1/2)
    
    Abstract
    Defines digital libraries and discusses the effects of new technology on librarians. Examines the different viewpoints of librarians and information technologists on digital libraries. Describes the development of a digital library at the National Drug Intelligence Center, USA, which was carried out in collaboration with information technology experts. The system is based on Web-enabled search technology to find information, data visualization and data mining to visualize it, and SGML as an information standard to store it
    Date
    22.11.1998 18:57:22
    Type
    a
  4. Hofstede, A.H.M. ter; Proper, H.A.; Van der Weide, T.P.: Exploiting fact verbalisation in conceptual information modelling (1997) 0.03
    0.026575929 = product of:
      0.053151857 = sum of:
        0.053151857 = sum of:
          0.009471525 = weight(_text_:a in 2908) [ClassicSimilarity], result of:
            0.009471525 = score(doc=2908,freq=8.0), product of:
              0.053105544 = queryWeight, product of:
                1.153047 = idf(docFreq=37942, maxDocs=44218)
                0.046056706 = queryNorm
              0.17835285 = fieldWeight in 2908, product of:
                2.828427 = tf(freq=8.0), with freq of:
                  8.0 = termFreq=8.0
                1.153047 = idf(docFreq=37942, maxDocs=44218)
                0.0546875 = fieldNorm(doc=2908)
          0.043680333 = weight(_text_:22 in 2908) [ClassicSimilarity], result of:
            0.043680333 = score(doc=2908,freq=2.0), product of:
              0.16128273 = queryWeight, product of:
                3.5018296 = idf(docFreq=3622, maxDocs=44218)
                0.046056706 = queryNorm
              0.2708308 = fieldWeight in 2908, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                3.5018296 = idf(docFreq=3622, maxDocs=44218)
                0.0546875 = fieldNorm(doc=2908)
      0.5 = coord(1/2)
    
    Abstract
    Focuses on the information modelling side of conceptual modelling. Deals with the exploitation of fact verbalisations after finishing the actual information system. Verbalisations are used as input for the design of the so-called information model. Exploits these verbalisations in 4 directions: considers their use for a conceptual query language, for the verbalisation of instances, for the description of the contents of a database, and for the verbalisation of queries in a computer-supported query environment. Provides an example session with an envisioned tool for end-user query formulation that exploits the verbalisation
    Source
    Information systems. 22(1997) nos.5/6, S.349-385
    Type
    a
  5. Fonseca, F.; Marcinkowski, M.; Davis, C.: Cyber-human systems of thought and understanding (2019) 0.02
    0.020074995 = product of:
      0.04014999 = sum of:
        0.04014999 = sum of:
          0.00894975 = weight(_text_:a in 5011) [ClassicSimilarity], result of:
            0.00894975 = score(doc=5011,freq=14.0), product of:
              0.053105544 = queryWeight, product of:
                1.153047 = idf(docFreq=37942, maxDocs=44218)
                0.046056706 = queryNorm
              0.1685276 = fieldWeight in 5011, product of:
                3.7416575 = tf(freq=14.0), with freq of:
                  14.0 = termFreq=14.0
                1.153047 = idf(docFreq=37942, maxDocs=44218)
                0.0390625 = fieldNorm(doc=5011)
          0.03120024 = weight(_text_:22 in 5011) [ClassicSimilarity], result of:
            0.03120024 = score(doc=5011,freq=2.0), product of:
              0.16128273 = queryWeight, product of:
                3.5018296 = idf(docFreq=3622, maxDocs=44218)
                0.046056706 = queryNorm
              0.19345059 = fieldWeight in 5011, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                3.5018296 = idf(docFreq=3622, maxDocs=44218)
                0.0390625 = fieldNorm(doc=5011)
      0.5 = coord(1/2)
    
    Abstract
    The present challenge faced by scientists working with Big Data comes in the overwhelming volume and level of detail provided by current data sets. Exceeding traditional empirical approaches, Big Data opens a new perspective on scientific work in which data comes to play a role in the development of the scientific problematic itself. Addressing this reconfiguration of our relationship with data through readings of Wittgenstein, Macherey, and Popper, we propose a picture of science that encourages scientists to engage with the data in a direct way, using the data itself as an instrument for scientific investigation. Using GIS as a theme, we develop the concept of cyber-human systems of thought and understanding to bridge the divide between representative (theoretical) thinking and (non-theoretical) data-driven science. At the foundation of these systems, we invoke the concept of the "semantic pixel" to establish a logical and virtual space linking data and the work of scientists. It is with this discussion of the relationship between analysts in their pursuit of knowledge and the rise of Big Data that this examination of the philosophical foundations of Big Data addresses the central questions raised by social informatics research.
    Date
    7. 3.2019 16:32:22
    Type
    a
  6. Hallonsten, O.; Holmberg, D.: Analyzing structural stratification in the Swedish higher education system : data contextualization with policy-history analysis (2013) 0.02
    0.01938208 = product of:
      0.03876416 = sum of:
        0.03876416 = sum of:
          0.0075639198 = weight(_text_:a in 668) [ClassicSimilarity], result of:
            0.0075639198 = score(doc=668,freq=10.0), product of:
              0.053105544 = queryWeight, product of:
                1.153047 = idf(docFreq=37942, maxDocs=44218)
                0.046056706 = queryNorm
              0.14243183 = fieldWeight in 668, product of:
                3.1622777 = tf(freq=10.0), with freq of:
                  10.0 = termFreq=10.0
                1.153047 = idf(docFreq=37942, maxDocs=44218)
                0.0390625 = fieldNorm(doc=668)
          0.03120024 = weight(_text_:22 in 668) [ClassicSimilarity], result of:
            0.03120024 = score(doc=668,freq=2.0), product of:
              0.16128273 = queryWeight, product of:
                3.5018296 = idf(docFreq=3622, maxDocs=44218)
                0.046056706 = queryNorm
              0.19345059 = fieldWeight in 668, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                3.5018296 = idf(docFreq=3622, maxDocs=44218)
                0.0390625 = fieldNorm(doc=668)
      0.5 = coord(1/2)
    
    Abstract
    The 20th-century massification of higher education and research in academia is said to have produced structurally stratified higher education systems in many countries. Most manifestly, the research mission of universities appears to be divisive. Authors have claimed that the Swedish system, while formally unified, has developed into a binary state, and statistics seem to support this conclusion. This article makes use of a comprehensive statistical data source on Swedish higher education institutions to illustrate stratification, and uses literature on Swedish research policy history to contextualize the statistics. Highlighting the opportunities as well as the constraints of the data, the article argues that there is great merit in combining statistics with a qualitative analysis when studying the structural characteristics of national higher education systems. Not least, the article shows that it is an over-simplification to describe the Swedish system as binary; the stratification is more complex. On the basis of the analysis, the article also argues that while global trends certainly influence national developments, higher education systems have country-specific features that may enrich the understanding of how systems evolve and therefore should be analyzed as part of a broader study of the increasingly globalized academic system.
    Date
    22. 3.2013 19:43:01
    Type
    a
  7. Vaughan, L.; Chen, Y.: Data mining from web search queries : a comparison of Google trends and Baidu index (2015) 0.02
    0.018982807 = product of:
      0.037965614 = sum of:
        0.037965614 = sum of:
          0.006765375 = weight(_text_:a in 1605) [ClassicSimilarity], result of:
            0.006765375 = score(doc=1605,freq=8.0), product of:
              0.053105544 = queryWeight, product of:
                1.153047 = idf(docFreq=37942, maxDocs=44218)
                0.046056706 = queryNorm
              0.12739488 = fieldWeight in 1605, product of:
                2.828427 = tf(freq=8.0), with freq of:
                  8.0 = termFreq=8.0
                1.153047 = idf(docFreq=37942, maxDocs=44218)
                0.0390625 = fieldNorm(doc=1605)
          0.03120024 = weight(_text_:22 in 1605) [ClassicSimilarity], result of:
            0.03120024 = score(doc=1605,freq=2.0), product of:
              0.16128273 = queryWeight, product of:
                3.5018296 = idf(docFreq=3622, maxDocs=44218)
                0.046056706 = queryNorm
              0.19345059 = fieldWeight in 1605, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                3.5018296 = idf(docFreq=3622, maxDocs=44218)
                0.0390625 = fieldNorm(doc=1605)
      0.5 = coord(1/2)
    
    Abstract
    Numerous studies have explored the possibility of uncovering information from web search queries but few have examined the factors that affect web query data sources. We conducted a study that investigated this issue by comparing Google Trends and Baidu Index. Data from these two services are based on queries entered by users into Google and Baidu, two of the largest search engines in the world. We first compared the features and functions of the two services based on documents and extensive testing. We then carried out an empirical study that collected query volume data from the two sources. We found that data from both sources could be used to predict the quality of Chinese universities and companies. Despite the differences between the two services in terms of technology, such as differing methods of language processing, the search volume data from the two were highly correlated and combining the two data sources did not improve the predictive power of the data. However, there was a major difference between the two in terms of data availability. Baidu Index was able to provide more search volume data than Google Trends did. Our analysis showed that the disadvantage of Google Trends in this regard was due to Google's smaller user base in China. The implication of this finding goes beyond China. Google's user bases in many countries are smaller than that in China, so the search volume data related to those countries could result in the same issue as that related to China.
    Source
    Journal of the Association for Information Science and Technology. 66(2015) no.1, S.13-22
    Type
    a
  8. Fong, A.C.M.: Mining a Web citation database for document clustering (2002) 0.00
    0.00334869 = product of:
      0.00669738 = sum of:
        0.00669738 = product of:
          0.01339476 = sum of:
            0.01339476 = weight(_text_:a in 3940) [ClassicSimilarity], result of:
              0.01339476 = score(doc=3940,freq=4.0), product of:
                0.053105544 = queryWeight, product of:
                  1.153047 = idf(docFreq=37942, maxDocs=44218)
                  0.046056706 = queryNorm
                0.25222903 = fieldWeight in 3940, product of:
                  2.0 = tf(freq=4.0), with freq of:
                    4.0 = termFreq=4.0
                  1.153047 = idf(docFreq=37942, maxDocs=44218)
                  0.109375 = fieldNorm(doc=3940)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Type
    a
  9. Ebrahimi, M.; ShafieiBavani, E.; Wong, R.; Chen, F.: Twitter user geolocation by filtering of highly mentioned users (2018) 0.00
    0.0032090992 = product of:
      0.0064181983 = sum of:
        0.0064181983 = product of:
          0.012836397 = sum of:
            0.012836397 = weight(_text_:a in 4286) [ClassicSimilarity], result of:
              0.012836397 = score(doc=4286,freq=20.0), product of:
                0.053105544 = queryWeight, product of:
                  1.153047 = idf(docFreq=37942, maxDocs=44218)
                  0.046056706 = queryNorm
                0.24171482 = fieldWeight in 4286, product of:
                  4.472136 = tf(freq=20.0), with freq of:
                    20.0 = termFreq=20.0
                  1.153047 = idf(docFreq=37942, maxDocs=44218)
                  0.046875 = fieldNorm(doc=4286)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Abstract
    Geolocated social media data provide a powerful source of information about places and regional human behavior. Because only a small amount of social media data have been geolocation-annotated, inference techniques play a substantial role in increasing the volume of annotated data. Conventional research in this area has been based on the text content of posts from a given user or the social network of the user, with some recent crossovers between the text- and network-based approaches. This paper proposes a novel approach to categorize highly mentioned users (celebrities) into Local and Global types, and consequently use Local celebrities as location indicators. A label propagation algorithm is then used over the refined social network for geolocation inference. Finally, we propose a hybrid approach by merging a text-based method as a back-off strategy into our network-based approach. Empirical experiments over three standard Twitter benchmark data sets demonstrate that our approach outperforms state-of-the-art user geolocation methods.
    Type
    a
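  An illustrative sketch of the label-propagation step named in the abstract above, on a toy mention graph (all identifiers hypothetical; not the authors' code):

      from collections import Counter

      # Label propagation: users with known locations (here, a Local celebrity
      # used as a seed) pass their label to neighbors until a fixed point.
      edges = {
          "u1": ["u2", "celeb_local"],
          "u2": ["u1", "u3"],
          "u3": ["u2", "celeb_local"],
          "celeb_local": ["u1", "u3"],
      }
      labels = {"celeb_local": "Berlin"}       # seed location indicator

      for _ in range(10):                      # iterate to convergence
          updated = dict(labels)
          for node, nbrs in edges.items():
              if node in labels:               # already labeled: keep fixed
                  continue
              votes = Counter(labels[n] for n in nbrs if n in labels)
              if votes:
                  updated[node] = votes.most_common(1)[0][0]
          if updated == labels:
              break
          labels = updated

      print(labels)                            # u1 and u3, then u2, inherit "Berlin"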
  10. Ekbia, H.; Mattioli, M.; Kouper, I.; Arave, G.; Ghazinejad, A.; Bowman, T.; Suri, V.R.; Tsou, A.; Weingart, S.; Sugimoto, C.R.: Big data, bigger dilemmas : a critical review (2015) 0.00
    0.0031642143 = product of:
      0.0063284286 = sum of:
        0.0063284286 = product of:
          0.012656857 = sum of:
            0.012656857 = weight(_text_:a in 2155) [ClassicSimilarity], result of:
              0.012656857 = score(doc=2155,freq=28.0), product of:
                0.053105544 = queryWeight, product of:
                  1.153047 = idf(docFreq=37942, maxDocs=44218)
                  0.046056706 = queryNorm
                0.23833402 = fieldWeight in 2155, product of:
                  5.2915025 = tf(freq=28.0), with freq of:
                    28.0 = termFreq=28.0
                  1.153047 = idf(docFreq=37942, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=2155)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Abstract
    The recent interest in Big Data has generated a broad range of new academic, corporate, and policy practices along with an evolving debate among its proponents, detractors, and skeptics. While the practices draw on a common set of tools, techniques, and technologies, most contributions to the debate come either from a particular disciplinary perspective or with a focus on a domain-specific issue. A close examination of these contributions reveals a set of common problematics that arise in various guises and in different places. It also demonstrates the need for a critical synthesis of the conceptual and practical dilemmas surrounding Big Data. The purpose of this article is to provide such a synthesis by drawing on relevant writings in the sciences, humanities, policy, and trade literature. In bringing these diverse literatures together, we aim to shed light on the common underlying issues that concern and affect all of these areas. By contextualizing the phenomenon of Big Data within larger socioeconomic developments, we also seek to provide a broader understanding of its drivers, barriers, and challenges. This approach allows us to identify attributes of Big Data that require more attention-autonomy, opacity, generativity, disparity, and futurity-leading to questions and ideas for moving beyond dilemmas.
    Type
    a
  11. Bell, D.A.; Guan, J.W.: Computational methods for rough classification and discovery (1998) 0.00
    0.0031324127 = product of:
      0.0062648254 = sum of:
        0.0062648254 = product of:
          0.012529651 = sum of:
            0.012529651 = weight(_text_:a in 2909) [ClassicSimilarity], result of:
              0.012529651 = score(doc=2909,freq=14.0), product of:
                0.053105544 = queryWeight, product of:
                  1.153047 = idf(docFreq=37942, maxDocs=44218)
                  0.046056706 = queryNorm
                0.23593865 = fieldWeight in 2909, product of:
                  3.7416575 = tf(freq=14.0), with freq of:
                    14.0 = termFreq=14.0
                  1.153047 = idf(docFreq=37942, maxDocs=44218)
                  0.0546875 = fieldNorm(doc=2909)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Abstract
    Rough set theory is a mathematical tool to deal with vagueness and uncertainty. To apply the theory, it needs to be associated with efficient and effective computational methods. A relation can be used to represent a decision table for use in decision making. By using this kind of table, rough set theory can be applied successfully to rough classification and knowledge discovery. Presents computational methods for using rough sets to identify classes in datasets, find dependencies in relations, and discover rules which are hidden in databases. Illustrates the methods with a running example from a database of car test results
    Footnote
    Contribution to a special issue devoted to knowledge discovery and data mining
    Type
    a
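  A minimal sketch of the rough-set approximations underlying the abstract above, using a toy version of its car-test decision table (hypothetical values):

      from collections import defaultdict

      # Objects indistinguishable on the condition attributes form equivalence
      # classes; a decision class is approximated from below (certain members)
      # and above (possible members) by those equivalence classes.
      table = {                # object -> (condition attributes, decision)
          "car1": (("high", "heavy"), "fail"),
          "car2": (("high", "heavy"), "pass"),
          "car3": (("low",  "light"), "pass"),
          "car4": (("low",  "heavy"), "fail"),
      }
      target = {o for o, (_, d) in table.items() if d == "pass"}

      classes = defaultdict(set)
      for obj, (cond, _) in table.items():
          classes[cond].add(obj)

      lower = {o for c in classes.values() if c <= target for o in c}
      upper = {o for c in classes.values() if c & target for o in c}
      print(lower)   # {'car3'}: certainly "pass" given these attributes
      print(upper)   # {'car1', 'car2', 'car3'}: possibly "pass" (rough boundary)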
  12. Liu, W.; Weichselbraun, A.; Scharl, A.; Chang, E.: Semi-automatic ontology extension using spreading activation (2005) 0.00
    0.0031324127 = product of:
      0.0062648254 = sum of:
        0.0062648254 = product of:
          0.012529651 = sum of:
            0.012529651 = weight(_text_:a in 3028) [ClassicSimilarity], result of:
              0.012529651 = score(doc=3028,freq=14.0), product of:
                0.053105544 = queryWeight, product of:
                  1.153047 = idf(docFreq=37942, maxDocs=44218)
                  0.046056706 = queryNorm
                0.23593865 = fieldWeight in 3028, product of:
                  3.7416575 = tf(freq=14.0), with freq of:
                    14.0 = termFreq=14.0
                  1.153047 = idf(docFreq=37942, maxDocs=44218)
                  0.0546875 = fieldNorm(doc=3028)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Abstract
    This paper describes a system to semi-automatically extend and refine ontologies by mining textual data from the Web sites of international online media. Expanding a seed ontology creates a semantic network through co-occurrence analysis, trigger phrase analysis, and disambiguation based on the WordNet lexical dictionary. Spreading activation then processes this semantic network to find the most probable candidates for inclusion in an extended ontology. Approaches to identifying hierarchical relationships such as subsumption, head noun analysis and WordNet consultation are used to confirm and classify the found relationships. Using a seed ontology on "climate change" as an example, this paper demonstrates how spreading activation improves the result by naturally integrating the mentioned methods.
    Type
    a
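  A minimal spreading-activation sketch over a toy co-occurrence network (terms and weights hypothetical; the cited system adds trigger-phrase analysis and WordNet disambiguation on top):

      # Activation starts at a seed concept and decays as it spreads along
      # weighted edges; the most activated unseen terms become candidates
      # for extending the ontology.
      graph = {
          "climate change": {"emission": 0.9, "glacier": 0.7, "policy": 0.4},
          "emission":       {"co2": 0.8, "policy": 0.5},
          "glacier":        {"melting": 0.9},
      }
      activation = {"climate change": 1.0}     # seed ontology concept
      decay = 0.5

      for _ in range(2):                       # two pulses of spreading
          spread = {}
          for node, act in activation.items():
              for nbr, w in graph.get(node, {}).items():
                  spread[nbr] = spread.get(nbr, 0.0) + act * w * decay
          for node, act in spread.items():
              activation[node] = activation.get(node, 0.0) + act

      candidates = sorted((t for t in activation if t != "climate change"),
                          key=lambda t: -activation[t])
      print(candidates)                        # emission, glacier, policy, co2, melting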
  13. Pons-Porrata, A.; Berlanga-Llavori, R.; Ruiz-Shulcloper, J.: Topic discovery based on text mining techniques (2007) 0.00
    0.0030444188 = product of:
      0.0060888375 = sum of:
        0.0060888375 = product of:
          0.012177675 = sum of:
            0.012177675 = weight(_text_:a in 916) [ClassicSimilarity], result of:
              0.012177675 = score(doc=916,freq=18.0), product of:
                0.053105544 = queryWeight, product of:
                  1.153047 = idf(docFreq=37942, maxDocs=44218)
                  0.046056706 = queryNorm
                0.22931081 = fieldWeight in 916, product of:
                  4.2426405 = tf(freq=18.0), with freq of:
                    18.0 = termFreq=18.0
                  1.153047 = idf(docFreq=37942, maxDocs=44218)
                  0.046875 = fieldNorm(doc=916)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Abstract
    In this paper, we present a topic discovery system aimed at revealing the implicit knowledge present in news streams. This knowledge is expressed as a hierarchy of topics/subtopics, where each topic contains the set of documents that are related to it and a summary extracted from these documents. Summaries so built are useful to browse and select topics of interest from the generated hierarchies. Our proposal consists of a new incremental hierarchical clustering algorithm, which combines both partitional and agglomerative approaches, taking the main benefits from each. Finally, a new summarization method based on Testor Theory has been proposed to build the topic summaries. Experimental results on the TDT2 collection demonstrate its usefulness and effectiveness not only as a topic detection system, but also as a classification and summarization tool.
    Type
    a
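  As a conventional baseline for the hierarchy-of-topics idea above, a short agglomerative clustering sketch over toy document vectors (assuming numpy and scipy are available; the paper's own algorithm is incremental and mixes partitional with agglomerative steps):

      import numpy as np
      from scipy.cluster.hierarchy import fcluster, linkage

      # Two synthetic "topics": documents drawn around two centers in a
      # 20-dimensional term space, then clustered agglomeratively.
      rng = np.random.default_rng(1)
      docs = np.vstack([rng.normal(0.0, 0.1, (5, 20)),   # topic A documents
                        rng.normal(1.0, 0.1, (5, 20))])  # topic B documents

      Z = linkage(docs, method="ward")                   # dendrogram (hierarchy)
      print(fcluster(Z, t=2, criterion="maxclust"))      # two top-level topics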
  14. Berendt, B.; Krause, B.; Kolbe-Nusser, S.: Intelligent scientific authoring tools : interactive data mining for constructive uses of citation networks (2010) 0.00
    0.0030444188 = product of:
      0.0060888375 = sum of:
        0.0060888375 = product of:
          0.012177675 = sum of:
            0.012177675 = weight(_text_:a in 4226) [ClassicSimilarity], result of:
              0.012177675 = score(doc=4226,freq=18.0), product of:
                0.053105544 = queryWeight, product of:
                  1.153047 = idf(docFreq=37942, maxDocs=44218)
                  0.046056706 = queryNorm
                0.22931081 = fieldWeight in 4226, product of:
                  4.2426405 = tf(freq=18.0), with freq of:
                    18.0 = termFreq=18.0
                  1.153047 = idf(docFreq=37942, maxDocs=44218)
                  0.046875 = fieldNorm(doc=4226)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Abstract
    Many powerful methods and tools exist for extracting meaning from scientific publications, their texts, and their citation links. However, existing proposals often neglect a fundamental aspect of learning: that understanding and learning require an active and constructive exploration of a domain. In this paper, we describe a new method and a tool that use data mining and interactivity to turn the typical search and retrieve dialogue, in which the user asks questions and a system gives answers, into a dialogue that also involves sense-making, in which the user has to become active by constructing a bibliography and a domain model of the search term(s). This model starts from an automatically generated and annotated clustering solution that is iteratively modified by users. The tool is part of an integrated authoring system covering all phases from search through reading and sense-making to writing. Two evaluation studies demonstrate the usability of this interactive and constructive approach, and they show that clusters and groups represent identifiable sub-topics.
    Type
    a
  15. Wong, S.K.M.; Butz, C.J.; Xiang, X.: Automated database schema design using mined data dependencies (1998) 0.00
    0.0029000505 = product of:
      0.005800101 = sum of:
        0.005800101 = product of:
          0.011600202 = sum of:
            0.011600202 = weight(_text_:a in 2897) [ClassicSimilarity], result of:
              0.011600202 = score(doc=2897,freq=12.0), product of:
                0.053105544 = queryWeight, product of:
                  1.153047 = idf(docFreq=37942, maxDocs=44218)
                  0.046056706 = queryNorm
                0.21843673 = fieldWeight in 2897, product of:
                  3.4641016 = tf(freq=12.0), with freq of:
                    12.0 = termFreq=12.0
                  1.153047 = idf(docFreq=37942, maxDocs=44218)
                  0.0546875 = fieldNorm(doc=2897)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Abstract
    Data dependencies are used in database schema design to enforce the correctness of a database as well as to reduce redundant data. These dependencies are usually determined from the semantics of the attributes and are then enforced upon the relations. Describes a bottom-up procedure for discovering multivalued dependencies in observed data without knowing a priori the relationships among the attributes. The proposed algorithm is an application of the technique designed for learning conditional independencies in probabilistic reasoning. A prototype system for automated database schema design has been implemented. Experiments were carried out to demonstrate both the effectiveness and efficiency of the method
    Footnote
    Contribution to a special issue devoted to knowledge discovery and data mining
    Type
    a
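  A small checker for the kind of multivalued dependency the abstract above mines, on a toy relation (hypothetical data). X ->> Y holds when, within every X-group, the Y-values and the remaining values combine freely:

      from itertools import product

      # r(course, teacher, book): check the MVD course ->> teacher by testing
      # whether each course pairs every one of its teachers with every book.
      rows = {
          ("course1", "teacherA", "book1"),
          ("course1", "teacherA", "book2"),
          ("course1", "teacherB", "book1"),
          ("course1", "teacherB", "book2"),
      }

      def holds_mvd(rows):
          groups = {}
          for x, y, z in rows:
              ys, zs = groups.setdefault(x, (set(), set()))
              ys.add(y)
              zs.add(z)
          return all(
              {(x, y, z) for y, z in product(ys, zs)} <= rows
              for x, (ys, zs) in groups.items()
          )

      print(holds_mvd(rows))                                       # True
      print(holds_mvd(rows - {("course1", "teacherB", "book2")}))  # False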
  16. Kong, S.; Ye, F.; Feng, L.; Zhao, Z.: Towards the prediction problems of bursting hashtags on Twitter (2015) 0.00
    0.0029000505 = product of:
      0.005800101 = sum of:
        0.005800101 = product of:
          0.011600202 = sum of:
            0.011600202 = weight(_text_:a in 2338) [ClassicSimilarity], result of:
              0.011600202 = score(doc=2338,freq=12.0), product of:
                0.053105544 = queryWeight, product of:
                  1.153047 = idf(docFreq=37942, maxDocs=44218)
                  0.046056706 = queryNorm
                0.21843673 = fieldWeight in 2338, product of:
                  3.4641016 = tf(freq=12.0), with freq of:
                    12.0 = termFreq=12.0
                  1.153047 = idf(docFreq=37942, maxDocs=44218)
                  0.0546875 = fieldNorm(doc=2338)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Abstract
    Hundreds of thousands of hashtags are generated every day on Twitter. Only a few will burst and become trending topics. In this article, we provide the definition of a bursting hashtag and conduct a systematic study of a series of challenging prediction problems that span the entire life cycles of bursting hashtags. Around the problem of "how to build a system to predict bursting hashtags," we explore different types of features and present machine learning solutions. On real data sets from Twitter, experiments are conducted to evaluate the effectiveness of the proposed solutions and the contributions of features.
    Type
    a
  17. Perugini, S.; Ramakrishnan, N.: Mining Web functional dependencies for flexible information access (2007) 0.00
    0.0028703054 = product of:
      0.005740611 = sum of:
        0.005740611 = product of:
          0.011481222 = sum of:
            0.011481222 = weight(_text_:a in 602) [ClassicSimilarity], result of:
              0.011481222 = score(doc=602,freq=16.0), product of:
                0.053105544 = queryWeight, product of:
                  1.153047 = idf(docFreq=37942, maxDocs=44218)
                  0.046056706 = queryNorm
                0.2161963 = fieldWeight in 602, product of:
                  4.0 = tf(freq=16.0), with freq of:
                    16.0 = termFreq=16.0
                  1.153047 = idf(docFreq=37942, maxDocs=44218)
                  0.046875 = fieldNorm(doc=602)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Abstract
    We present an approach to enhancing information access through Web structure mining, in contrast to traditional approaches involving usage mining. Specifically, we mine the hardwired hierarchical hyperlink structure of Web sites to identify patterns of term-term co-occurrences we call Web functional dependencies (FDs). Intuitively, a Web FD x -> y declares that all paths through a site involving a hyperlink labeled x also contain a hyperlink labeled y. The complete set of FDs satisfied by a site helps characterize the (flexible and expressive) interaction paradigms supported by the site, where a paradigm is the set of explorable sequences therein. We describe algorithms for mining FDs and results from mining several hierarchical Web sites, and present several interface designs that can exploit such FDs to provide compelling user experiences.
    Type
    a
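  The FD definition in the abstract above is directly checkable. A minimal sketch over toy site paths (labels hypothetical):

      # A Web FD "x -> y": every path through the site whose hyperlink labels
      # include x also includes y. Toy paths, illustrative only.
      paths = [
          ["home", "products", "laptops", "specs"],
          ["home", "products", "phones", "specs"],
          ["home", "about"],
      ]

      def holds_fd(x, y, paths):
          return all(y in p for p in paths if x in p)

      print(holds_fd("products", "specs", paths))  # True: products -> specs
      print(holds_fd("home", "products", paths))   # False: the "about" path breaks it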
  18. Kraker, P.; Kittel, C,; Enkhbayar, A.: Open Knowledge Maps : creating a visual interface to the world's scientific knowledge based on natural language processing (2016) 0.00
    0.0028703054 = product of:
      0.005740611 = sum of:
        0.005740611 = product of:
          0.011481222 = sum of:
            0.011481222 = weight(_text_:a in 3205) [ClassicSimilarity], result of:
              0.011481222 = score(doc=3205,freq=16.0), product of:
                0.053105544 = queryWeight, product of:
                  1.153047 = idf(docFreq=37942, maxDocs=44218)
                  0.046056706 = queryNorm
                0.2161963 = fieldWeight in 3205, product of:
                  4.0 = tf(freq=16.0), with freq of:
                    16.0 = termFreq=16.0
                  1.153047 = idf(docFreq=37942, maxDocs=44218)
                  0.046875 = fieldNorm(doc=3205)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Abstract
    The goal of Open Knowledge Maps is to create a visual interface to the world's scientific knowledge. The base for this visual interface consists of so-called knowledge maps, which enable the exploration of existing knowledge and the discovery of new knowledge. Our open source knowledge mapping software applies a mixture of summarization techniques and similarity measures on article metadata, which are iteratively chained together. After processing, the representation is saved in a database for use in a web visualization. In the future, we want to create a space for collective knowledge mapping that brings together individuals and communities involved in exploration and discovery. We want to enable people to guide each other in their discovery by collaboratively annotating and modifying the automatically created maps.
    Type
    a
  19. Maaten, L. van den; Hinton, G.: Visualizing non-metric similarities in multiple maps (2012) 0.00
    0.0028703054 = product of:
      0.005740611 = sum of:
        0.005740611 = product of:
          0.011481222 = sum of:
            0.011481222 = weight(_text_:a in 3884) [ClassicSimilarity], result of:
              0.011481222 = score(doc=3884,freq=16.0), product of:
                0.053105544 = queryWeight, product of:
                  1.153047 = idf(docFreq=37942, maxDocs=44218)
                  0.046056706 = queryNorm
                0.2161963 = fieldWeight in 3884, product of:
                  4.0 = tf(freq=16.0), with freq of:
                    16.0 = termFreq=16.0
                  1.153047 = idf(docFreq=37942, maxDocs=44218)
                  0.046875 = fieldNorm(doc=3884)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Abstract
    Techniques for multidimensional scaling visualize objects as points in a low-dimensional metric map. As a result, the visualizations are subject to the fundamental limitations of metric spaces. These limitations prevent multidimensional scaling from faithfully representing non-metric similarity data such as word associations or event co-occurrences. In particular, multidimensional scaling cannot faithfully represent intransitive pairwise similarities in a visualization, and it cannot faithfully visualize "central" objects. In this paper, we present an extension of a recently proposed multidimensional scaling technique called t-SNE. The extension aims to address the problems of traditional multidimensional scaling techniques when these techniques are used to visualize non-metric similarities. The new technique, called multiple maps t-SNE, alleviates these problems by constructing a collection of maps that reveal complementary structure in the similarity data. We apply multiple maps t-SNE to a large data set of word association data and to a data set of NIPS co-authorships, demonstrating its ability to successfully visualize non-metric similarities.
    Type
    a
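  The multiple-maps extension is the paper's contribution and is not available in standard libraries; as a single-map baseline only, standard t-SNE with scikit-learn (assuming sklearn and numpy are installed):

      import numpy as np
      from sklearn.manifold import TSNE

      # Conventional t-SNE embeds all objects into one 2-D metric map, which
      # is exactly the limitation (intransitive similarities, central objects)
      # that multiple maps t-SNE addresses.
      rng = np.random.default_rng(0)
      X = rng.normal(size=(100, 50))            # 100 objects, 50-dim features

      emb = TSNE(n_components=2, perplexity=30.0,
                 init="pca", random_state=0).fit_transform(X)
      print(emb.shape)                          # (100, 2): one point per object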
  20. Nicholson, S.: Bibliomining for automated collection development in a digital library setting : using data mining to discover Web-based scholarly research works (2003) 0.00
    0.0028047764 = product of:
      0.005609553 = sum of:
        0.005609553 = product of:
          0.011219106 = sum of:
            0.011219106 = weight(_text_:a in 1867) [ClassicSimilarity], result of:
              0.011219106 = score(doc=1867,freq=22.0), product of:
                0.053105544 = queryWeight, product of:
                  1.153047 = idf(docFreq=37942, maxDocs=44218)
                  0.046056706 = queryNorm
                0.21126054 = fieldWeight in 1867, product of:
                  4.690416 = tf(freq=22.0), with freq of:
                    22.0 = termFreq=22.0
                  1.153047 = idf(docFreq=37942, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=1867)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Abstract
    This research creates an intelligent agent for automated collection development in a digital library setting. It uses a predictive model based on facets of each Web page to select scholarly works. The criteria came from the academic library selection literature, and a Delphi study was used to refine the list to 41 criteria. A Perl program was designed to analyze a Web page for each criterion and applied to a large collection of scholarly and nonscholarly Web pages. Bibliomining, or data mining for libraries, was then used to create different classification models. Four techniques were used: logistic regression, nonparametric discriminant analysis, classification trees, and neural networks. Accuracy and return were used to judge the effectiveness of each model on test datasets. In addition, a set of problematic pages that were difficult to classify because of their similarity to scholarly research was gathered and classified using the models. The resulting models could be used in the selection process to automatically create a digital library of Web-based scholarly research works. In addition, the technique can be extended to create a digital library of any type of structured electronic information.
    Type
    a
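  A sketch of one of the four model families named above, logistic regression over boolean page criteria (toy features and data, hypothetical; the study derived 41 criteria from the selection literature):

      import numpy as np
      from sklearn.linear_model import LogisticRegression

      # Boolean page criteria -> scholarly / not scholarly. Stand-in for the
      # bibliomining step; columns: [has_references, has_abstract, has_ads].
      X = np.array([[1, 1, 0], [1, 0, 0], [0, 0, 1],
                    [0, 1, 1], [1, 1, 0], [0, 0, 1]])
      y = np.array([1, 1, 0, 0, 1, 0])          # 1 = scholarly research work

      model = LogisticRegression().fit(X, y)
      page = np.array([[1, 1, 1]])              # a borderline candidate page
      print(model.predict(page), model.predict_proba(page))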
