Search (44 results, page 1 of 3)

  • theme_ss:"Data Mining"
  1. KDD : techniques and applications (1998) 0.09
    0.09336738 = product of:
      0.37346953 = sum of:
        0.34125552 = weight(_text_:pacific in 6783) [ClassicSimilarity], result of:
          0.34125552 = score(doc=6783,freq=2.0), product of:
            0.3193714 = queryWeight, product of:
              8.059301 = idf(docFreq=37, maxDocs=44218)
              0.03962768 = queryNorm
            1.0685225 = fieldWeight in 6783, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              8.059301 = idf(docFreq=37, maxDocs=44218)
              0.09375 = fieldNorm(doc=6783)
        0.03221402 = product of:
          0.06442804 = sum of:
            0.06442804 = weight(_text_:22 in 6783) [ClassicSimilarity], result of:
              0.06442804 = score(doc=6783,freq=2.0), product of:
                0.13876937 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.03962768 = queryNorm
                0.46428138 = fieldWeight in 6783, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.09375 = fieldNorm(doc=6783)
          0.5 = coord(1/2)
      0.25 = coord(2/8)
    
    Footnote
    A special issue of selected papers from the Pacific-Asia Conference on Knowledge Discovery and Data Mining (PAKDD'97), held in Singapore, 22-23 Feb 1997
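
    The indented breakdowns shown with each result are Lucene "explain" trees for its ClassicSimilarity (TF-IDF) scoring. Reading result 1's tree from the inside out: idf = ln(maxDocs / (docFreq + 1)) + 1, tf = sqrt(termFreq), queryWeight = idf x queryNorm, fieldWeight = tf x idf x fieldNorm, each matching term contributes queryWeight x fieldWeight, and coord(m/n) scales a sum by the fraction of query clauses that matched. The following minimal Python sketch (function names are illustrative, not the Lucene API) reproduces the 0.09336738 score of result 1 from the figures above:

```python
import math

MAX_DOCS = 44218              # collection size, from the explain output
QUERY_NORM = 0.03962768       # queryNorm, from the explain output

def idf(doc_freq):
    # ClassicSimilarity idf: ln(maxDocs / (docFreq + 1)) + 1
    return math.log(MAX_DOCS / (doc_freq + 1)) + 1   # docFreq=37 -> 8.059301

def term_score(freq, doc_freq, field_norm):
    query_weight = idf(doc_freq) * QUERY_NORM                    # e.g. 0.3193714
    field_weight = math.sqrt(freq) * idf(doc_freq) * field_norm  # tf * idf * fieldNorm
    return query_weight * field_weight

pacific = term_score(freq=2, doc_freq=37,   field_norm=0.09375)  # 0.34125552
t22     = term_score(freq=2, doc_freq=3622, field_norm=0.09375)  # 0.06442804

# coord(1/2) on the '22' clause group, coord(2/8) on the outer sum
score = (pacific + t22 * 0.5) * (2 / 8)
print(f"{score:.8f}")  # ~0.09336738
```

    The same arithmetic accounts for every other tree on this page; for instance, tf = sqrt(6) = 2.4494898 for the six occurrences of "libraries" in result 4.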
  2. Methodologies for knowledge discovery and data mining : Third Pacific-Asia Conference, PAKDD'99, Beijing, China, April 26-28, 1999, Proceedings (1999) 0.02
    0.024883216 = product of:
      0.19906573 = sum of:
        0.19906573 = weight(_text_:pacific in 3821) [ClassicSimilarity], result of:
          0.19906573 = score(doc=3821,freq=2.0), product of:
            0.3193714 = queryWeight, product of:
              8.059301 = idf(docFreq=37, maxDocs=44218)
              0.03962768 = queryNorm
            0.6233048 = fieldWeight in 3821, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              8.059301 = idf(docFreq=37, maxDocs=44218)
              0.0546875 = fieldNorm(doc=3821)
      0.125 = coord(1/8)
    
  3. Wongthontham, P.; Abu-Salih, B.: Ontology-based approach for semantic data extraction from social big data : state-of-the-art and research directions (2018) 0.02
    0.023150655 = product of:
      0.09260262 = sum of:
        0.05077526 = weight(_text_:case in 4097) [ClassicSimilarity], result of:
          0.05077526 = score(doc=4097,freq=2.0), product of:
            0.1742197 = queryWeight, product of:
              4.3964143 = idf(docFreq=1480, maxDocs=44218)
              0.03962768 = queryNorm
            0.29144385 = fieldWeight in 4097, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              4.3964143 = idf(docFreq=1480, maxDocs=44218)
              0.046875 = fieldNorm(doc=4097)
        0.04182736 = weight(_text_:studies in 4097) [ClassicSimilarity], result of:
          0.04182736 = score(doc=4097,freq=2.0), product of:
            0.15812531 = queryWeight, product of:
              3.9902744 = idf(docFreq=2222, maxDocs=44218)
              0.03962768 = queryNorm
            0.26452032 = fieldWeight in 4097, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.9902744 = idf(docFreq=2222, maxDocs=44218)
              0.046875 = fieldNorm(doc=4097)
      0.25 = coord(2/8)
    
    Abstract
    The challenge of managing and extracting useful knowledge from social media data sources has attracted much attention from academia and industry. To address this challenge, this paper focuses on the semantic analysis of textual data. We propose an ontology-based approach to extract the semantics of textual data and to define the domain of the data. In other words, we semantically analyse the social data at two levels, i.e. the entity level and the domain level. We have chosen Twitter as the social channel for a proof of concept. Domain knowledge is captured in ontologies, which are then used to enrich the semantics of tweets with a specific semantic conceptual representation of the entities that appear in them. Case studies are used to demonstrate this approach. We experiment with and evaluate our proposed approach on a public dataset collected from Twitter in the politics domain. The ontology-based approach leverages entity extraction and concept mapping to improve the quantity and accuracy of concept identification.
  4. Matson, L.D.; Bonski, D.J.: Do digital libraries need librarians? (1997) 0.02
    0.021736393 = product of:
      0.08694557 = sum of:
        0.065469556 = weight(_text_:libraries in 1737) [ClassicSimilarity], result of:
          0.065469556 = score(doc=1737,freq=6.0), product of:
            0.13017908 = queryWeight, product of:
              3.2850544 = idf(docFreq=4499, maxDocs=44218)
              0.03962768 = queryNorm
            0.5029192 = fieldWeight in 1737, product of:
              2.4494898 = tf(freq=6.0), with freq of:
                6.0 = termFreq=6.0
              3.2850544 = idf(docFreq=4499, maxDocs=44218)
              0.0625 = fieldNorm(doc=1737)
        0.021476014 = product of:
          0.042952027 = sum of:
            0.042952027 = weight(_text_:22 in 1737) [ClassicSimilarity], result of:
              0.042952027 = score(doc=1737,freq=2.0), product of:
                0.13876937 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.03962768 = queryNorm
                0.30952093 = fieldWeight in 1737, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.0625 = fieldNorm(doc=1737)
          0.5 = coord(1/2)
      0.25 = coord(2/8)
    
    Abstract
    Defines digital libraries and discusses the effects of new technology on librarians. Examines the differing viewpoints of librarians and information technologists on digital libraries. Describes the development of a digital library at the National Drug Intelligence Center, USA, which was carried out in collaboration with information technology experts. The system is based on Web-enabled search technology to find information, data visualization and data mining to visualize it, and the use of SGML as an information standard to store it
    Date
    22.11.1998 18:57:22
  5. Haravu, L.J.; Neelameghan, A.: Text mining and data mining in knowledge organization and discovery : the making of knowledge-based products (2003) 0.02
    0.019292213 = product of:
      0.07716885 = sum of:
        0.042312715 = weight(_text_:case in 5653) [ClassicSimilarity], result of:
          0.042312715 = score(doc=5653,freq=2.0), product of:
            0.1742197 = queryWeight, product of:
              4.3964143 = idf(docFreq=1480, maxDocs=44218)
              0.03962768 = queryNorm
            0.24286987 = fieldWeight in 5653, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              4.3964143 = idf(docFreq=1480, maxDocs=44218)
              0.0390625 = fieldNorm(doc=5653)
        0.034856133 = weight(_text_:studies in 5653) [ClassicSimilarity], result of:
          0.034856133 = score(doc=5653,freq=2.0), product of:
            0.15812531 = queryWeight, product of:
              3.9902744 = idf(docFreq=2222, maxDocs=44218)
              0.03962768 = queryNorm
            0.22043361 = fieldWeight in 5653, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.9902744 = idf(docFreq=2222, maxDocs=44218)
              0.0390625 = fieldNorm(doc=5653)
      0.25 = coord(2/8)
    
    Abstract
    Discusses the importance of knowledge organization in the context of the information overload caused by the vast quantities of data and information accessible on the internal and external networks of an organization. Defines the characteristics of a knowledge-based product. Elaborates on the techniques and applications of text mining in developing knowledge products. Presents two approaches, as case studies, to the making of knowledge products: (1) steps and processes in the planning, designing and development of a composite multilingual multimedia CD product, with potential international, inter-cultural end users in view, and (2) application of natural language processing software in text mining. Using text mining software, it is possible to link concept terms from a processed text to a related thesaurus, glossary, the schedules of a classification scheme, and facet-structured subject representations. Concludes that the products of text mining and data mining could be made more useful if the features of a faceted scheme for subject classification were incorporated into text mining techniques and products.
  6. Vaughan, L.; Chen, Y.: Data mining from web search queries : a comparison of Google trends and Baidu index (2015) 0.01
    0.01206966 = product of:
      0.04827864 = sum of:
        0.034856133 = weight(_text_:studies in 1605) [ClassicSimilarity], result of:
          0.034856133 = score(doc=1605,freq=2.0), product of:
            0.15812531 = queryWeight, product of:
              3.9902744 = idf(docFreq=2222, maxDocs=44218)
              0.03962768 = queryNorm
            0.22043361 = fieldWeight in 1605, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.9902744 = idf(docFreq=2222, maxDocs=44218)
              0.0390625 = fieldNorm(doc=1605)
        0.013422508 = product of:
          0.026845016 = sum of:
            0.026845016 = weight(_text_:22 in 1605) [ClassicSimilarity], result of:
              0.026845016 = score(doc=1605,freq=2.0), product of:
                0.13876937 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.03962768 = queryNorm
                0.19345059 = fieldWeight in 1605, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=1605)
          0.5 = coord(1/2)
      0.25 = coord(2/8)
    
    Abstract
    Numerous studies have explored the possibility of uncovering information from web search queries but few have examined the factors that affect web query data sources. We conducted a study that investigated this issue by comparing Google Trends and Baidu Index. Data from these two services are based on queries entered by users into Google and Baidu, two of the largest search engines in the world. We first compared the features and functions of the two services based on documents and extensive testing. We then carried out an empirical study that collected query volume data from the two sources. We found that data from both sources could be used to predict the quality of Chinese universities and companies. Despite the differences between the two services in terms of technology, such as differing methods of language processing, the search volume data from the two were highly correlated and combining the two data sources did not improve the predictive power of the data. However, there was a major difference between the two in terms of data availability. Baidu Index was able to provide more search volume data than Google Trends did. Our analysis showed that the disadvantage of Google Trends in this regard was due to Google's smaller user base in China. The implication of this finding goes beyond China. Google's user bases in many countries are smaller than that in China, so the search volume data related to those countries could result in the same issue as that related to China.
    Source
    Journal of the Association for Information Science and Technology. 66(2015) no.1, S.13-22
  7. Gill, A.J.; Hinrichs-Krapels, S.; Blanke, T.; Grant, J.; Hedges, M.; Tanner, S.: Insight workflow : systematically combining human and computational methods to explore textual data (2017) 0.01
    0.007479902 = product of:
      0.059839215 = sum of:
        0.059839215 = weight(_text_:case in 3682) [ClassicSimilarity], result of:
          0.059839215 = score(doc=3682,freq=4.0), product of:
            0.1742197 = queryWeight, product of:
              4.3964143 = idf(docFreq=1480, maxDocs=44218)
              0.03962768 = queryNorm
            0.34346986 = fieldWeight in 3682, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              4.3964143 = idf(docFreq=1480, maxDocs=44218)
              0.0390625 = fieldNorm(doc=3682)
      0.125 = coord(1/8)
    
    Abstract
    Analyzing large quantities of real-world textual data has the potential to provide new insights for researchers. However, such data present challenges for both human and computational methods, requiring a diverse range of specialist skills, often shared across a number of individuals. In this paper we use the analysis of a real-world data set as our case study, and use this exploration as a demonstration of our "insight workflow," which we present for use and adaptation by other researchers. The data we use are impact case study documents collected as part of the UK Research Excellence Framework (REF), consisting of 6,679 documents and 6.25 million words; the analysis was commissioned by the Higher Education Funding Council for England (published as report HEFCE 2015). In our exploration and analysis we used a variety of techniques, ranging from keyword in context and frequency information to more sophisticated methods (topic modeling), with these automated techniques providing an empirical point of entry for in-depth and intensive human analysis. We present the 60 topics to demonstrate the output of our methods, and illustrate how the variety of analysis techniques can be combined to provide insights. We note potential limitations and propose future work.
  8. Deogun, J.S.: Feature selection and effective classifiers (1998) 0.01
    0.0063469075 = product of:
      0.05077526 = sum of:
        0.05077526 = weight(_text_:case in 2911) [ClassicSimilarity], result of:
          0.05077526 = score(doc=2911,freq=2.0), product of:
            0.1742197 = queryWeight, product of:
              4.3964143 = idf(docFreq=1480, maxDocs=44218)
              0.03962768 = queryNorm
            0.29144385 = fieldWeight in 2911, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              4.3964143 = idf(docFreq=1480, maxDocs=44218)
              0.046875 = fieldNorm(doc=2911)
      0.125 = coord(1/8)
    
    Abstract
    Develops and analyzes 4 algorithms for feature selection in the context of rough set methodology. Develops the notion of classification accuracy that can be used for upper or lower classification methods and defines the feature selection problem. Presents a discussion of upper classifiers, develops 4 feature selection heuristics, and discusses the family of stepwise backward selection algorithms. Analyzes the worst-case time complexity of all algorithms presented. Discusses details of the experiments and the results of applying the family of stepwise backward selection algorithms to learning data sets and a duodenal ulcer data set. Includes the experimental setup and the results of comparing lower and upper classifiers on the duodenal ulcer data set. Discusses extended decision tables
  9. Gluck, M.: Multimedia exploratory data analysis for geospatial data mining : the case for augmented seriation (2001) 0.01
    0.0063469075 = product of:
      0.05077526 = sum of:
        0.05077526 = weight(_text_:case in 5214) [ClassicSimilarity], result of:
          0.05077526 = score(doc=5214,freq=2.0), product of:
            0.1742197 = queryWeight, product of:
              4.3964143 = idf(docFreq=1480, maxDocs=44218)
              0.03962768 = queryNorm
            0.29144385 = fieldWeight in 5214, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              4.3964143 = idf(docFreq=1480, maxDocs=44218)
              0.046875 = fieldNorm(doc=5214)
      0.125 = coord(1/8)
    
  10. Knowledge management in fuzzy databases (2000) 0.01
    0.006099823 = product of:
      0.048798583 = sum of:
        0.048798583 = weight(_text_:studies in 4260) [ClassicSimilarity], result of:
          0.048798583 = score(doc=4260,freq=2.0), product of:
            0.15812531 = queryWeight, product of:
              3.9902744 = idf(docFreq=2222, maxDocs=44218)
              0.03962768 = queryNorm
            0.30860704 = fieldWeight in 4260, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.9902744 = idf(docFreq=2222, maxDocs=44218)
              0.0546875 = fieldNorm(doc=4260)
      0.125 = coord(1/8)
    
    Series
    Studies in fuzziness and soft computing; vol.39
  11. Gaizauskas, R.; Wilks, Y.: Information extraction : beyond document retrieval (1998) 0.01
    0.005637133 = product of:
      0.045097064 = sum of:
        0.045097064 = product of:
          0.09019413 = sum of:
            0.09019413 = weight(_text_:area in 4716) [ClassicSimilarity], result of:
              0.09019413 = score(doc=4716,freq=4.0), product of:
                0.1952553 = queryWeight, product of:
                  4.927245 = idf(docFreq=870, maxDocs=44218)
                  0.03962768 = queryNorm
                0.46192923 = fieldWeight in 4716, product of:
                  2.0 = tf(freq=4.0), with freq of:
                    4.0 = termFreq=4.0
                  4.927245 = idf(docFreq=870, maxDocs=44218)
                  0.046875 = fieldNorm(doc=4716)
          0.5 = coord(1/2)
      0.125 = coord(1/8)
    
    Abstract
    In this paper we give a synoptic view of the growth of the text processing technology of information extraction (IE), whose function is to extract information about a pre-specified set of entities, relations or events from natural language texts and to record this information in structured representations called templates. Here we describe the nature of the IE task, review the history of the area from its origins in AI work in the 1960s and 70s to the present, discuss the techniques being used to carry out the task, describe application areas where IE systems are or are about to be at work, and conclude with a discussion of the challenges facing the area. What emerges is a picture of an exciting new text processing technology with a host of new applications, both on its own and in conjunction with other technologies, such as information retrieval, machine translation and data mining
  12. Wu, K.J.; Chen, M.-C.; Sun, Y.: Automatic topics discovery from hyperlinked documents (2004) 0.01
    0.00522842 = product of:
      0.04182736 = sum of:
        0.04182736 = weight(_text_:studies in 2563) [ClassicSimilarity], result of:
          0.04182736 = score(doc=2563,freq=2.0), product of:
            0.15812531 = queryWeight, product of:
              3.9902744 = idf(docFreq=2222, maxDocs=44218)
              0.03962768 = queryNorm
            0.26452032 = fieldWeight in 2563, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.9902744 = idf(docFreq=2222, maxDocs=44218)
              0.046875 = fieldNorm(doc=2563)
      0.125 = coord(1/8)
    
    Abstract
    Topic discovery is an important means for marketing, e-Business and social science studies. It can also be applied to various purposes, such as identifying a group with certain properties and observing the emergence and diminishment of a certain cyber community. Previous topic discovery work (J.M. Kleinberg, Proceedings of the 9th Annual ACM-SIAM Symposium on Discrete Algorithms, San Francisco, California, p. 668) requires manual judgment of the usefulness of outcomes and is thus incapable of handling the explosive growth of the Internet. In this paper, we propose the Automatic Topic Discovery (ATD) method, which combines a method of base set construction, a clustering algorithm and an iterative principal eigenvector computation method to discover the topics relevant to a given query without manual examination. Given a query, ATD returns the topics associated with the query and the top representative pages for each topic. Our experiments show that the ATD method performs better than the traditional eigenvector method in terms of computation time and topic discovery quality.
  13. Loh, S.; Oliveira, J.P.M. de; Gastal, F.L.: Knowledge discovery in textual documentation : qualitative and quantitative analyses (2001) 0.01
    0.00522842 = product of:
      0.04182736 = sum of:
        0.04182736 = weight(_text_:studies in 4482) [ClassicSimilarity], result of:
          0.04182736 = score(doc=4482,freq=2.0), product of:
            0.15812531 = queryWeight, product of:
              3.9902744 = idf(docFreq=2222, maxDocs=44218)
              0.03962768 = queryNorm
            0.26452032 = fieldWeight in 4482, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.9902744 = idf(docFreq=2222, maxDocs=44218)
              0.046875 = fieldNorm(doc=4482)
      0.125 = coord(1/8)
    
    Abstract
    This paper presents an approach for performing knowledge discovery in texts through qualitative and quantitative analyses of high-level textual characteristics. Instead of applying mining techniques to attribute values, terms or keywords extracted from texts, the discovery process works over concepts identified in texts. Concepts represent real-world events and objects, and they help the user to understand ideas, trends, thoughts, opinions and intentions present in texts. The approach combines a quasi-automatic categorisation task (for qualitative analysis) with a mining process (for quantitative analysis). The goal is to find new and useful knowledge inside a textual collection through the use of mining techniques applied over concepts (representing text content). In this paper, an application of the approach to the medical records of a psychiatric hospital is presented. The approach helps physicians to extract knowledge about patients and diseases. This knowledge may be used for epidemiological studies and for training professionals, and it may also be used to support physicians in diagnosing and evaluating diseases.
  14. Berendt, B.; Krause, B.; Kolbe-Nusser, S.: Intelligent scientific authoring tools : interactive data mining for constructive uses of citation networks (2010) 0.01
    0.00522842 = product of:
      0.04182736 = sum of:
        0.04182736 = weight(_text_:studies in 4226) [ClassicSimilarity], result of:
          0.04182736 = score(doc=4226,freq=2.0), product of:
            0.15812531 = queryWeight, product of:
              3.9902744 = idf(docFreq=2222, maxDocs=44218)
              0.03962768 = queryNorm
            0.26452032 = fieldWeight in 4226, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.9902744 = idf(docFreq=2222, maxDocs=44218)
              0.046875 = fieldNorm(doc=4226)
      0.125 = coord(1/8)
    
    Abstract
    Many powerful methods and tools exist for extracting meaning from scientific publications, their texts, and their citation links. However, existing proposals often neglect a fundamental aspect of learning: that understanding and learning require an active and constructive exploration of a domain. In this paper, we describe a new method and a tool that use data mining and interactivity to turn the typical search and retrieve dialogue, in which the user asks questions and a system gives answers, into a dialogue that also involves sense-making, in which the user has to become active by constructing a bibliography and a domain model of the search term(s). This model starts from an automatically generated and annotated clustering solution that is iteratively modified by users. The tool is part of an integrated authoring system covering all phases from search through reading and sense-making to writing. Two evaluation studies demonstrate the usability of this interactive and constructive approach, and they show that clusters and groups represent identifiable sub-topics.
  15. Chen, Y.-L.; Liu, Y.-H.; Ho, W.-L.: ¬A text mining approach to assist the general public in the retrieval of legal documents (2013) 0.01
    0.00522842 = product of:
      0.04182736 = sum of:
        0.04182736 = weight(_text_:studies in 521) [ClassicSimilarity], result of:
          0.04182736 = score(doc=521,freq=2.0), product of:
            0.15812531 = queryWeight, product of:
              3.9902744 = idf(docFreq=2222, maxDocs=44218)
              0.03962768 = queryNorm
            0.26452032 = fieldWeight in 521, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.9902744 = idf(docFreq=2222, maxDocs=44218)
              0.046875 = fieldNorm(doc=521)
      0.125 = coord(1/8)
    
    Abstract
    Applying text mining techniques to legal issues has been an emerging research topic in recent years. Although some previous studies focused on assisting professionals in the retrieval of related legal documents, they did not take into account the general public and their difficulty in describing legal problems in professional legal terms. Because this problem has not been addressed by previous research, this study aims to design a text-mining-based method that allows the general public to use everyday vocabulary to search for and retrieve criminal judgments. The experimental results indicate that our method can help the general public, who are not familiar with professional legal terms, to acquire relevant criminal judgments more accurately and effectively.
  16. Carter, D.; Sholler, D.: Data science on the ground : hype, criticism, and everyday work (2016) 0.01
    0.00522842 = product of:
      0.04182736 = sum of:
        0.04182736 = weight(_text_:studies in 3111) [ClassicSimilarity], result of:
          0.04182736 = score(doc=3111,freq=2.0), product of:
            0.15812531 = queryWeight, product of:
              3.9902744 = idf(docFreq=2222, maxDocs=44218)
              0.03962768 = queryNorm
            0.26452032 = fieldWeight in 3111, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.9902744 = idf(docFreq=2222, maxDocs=44218)
              0.046875 = fieldNorm(doc=3111)
      0.125 = coord(1/8)
    
    Abstract
    Modern organizations often employ data scientists to improve business processes using diverse sets of data. Researchers and practitioners have both touted the benefits and warned of the drawbacks associated with data science and big data approaches, but few studies investigate how data science is carried out "on the ground." In this paper, we first review the hype and criticisms surrounding data science and big data approaches. We then present the findings of semistructured interviews with 18 data analysts from various industries and organizational roles. Using qualitative coding techniques, we evaluated these interviews in light of the hype and criticisms surrounding data science in the popular discourse. We found that although the data analysts we interviewed were sensitive to both the allure and the potential pitfalls of data science, their motivations and evaluations of their work were more nuanced. We conclude by reflecting on the relationship between data analysts' work and the discourses around data science and big data, suggesting how future research can better account for the everyday practices of this profession.
  17. Organisciak, P.; Schmidt, B.M.; Downie, J.S.: Giving shape to large digital libraries through exploratory data analysis (2022) 0.01
    0.005011469 = product of:
      0.040091753 = sum of:
        0.040091753 = weight(_text_:libraries in 473) [ClassicSimilarity], result of:
          0.040091753 = score(doc=473,freq=4.0), product of:
            0.13017908 = queryWeight, product of:
              3.2850544 = idf(docFreq=4499, maxDocs=44218)
              0.03962768 = queryNorm
            0.30797386 = fieldWeight in 473, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              3.2850544 = idf(docFreq=4499, maxDocs=44218)
              0.046875 = fieldNorm(doc=473)
      0.125 = coord(1/8)
    
    Abstract
    The emergence of large multi-institutional digital libraries has opened the door to aggregate-level examinations of the published word. Such large-scale analysis offers a new way to pursue traditional problems in the humanities and social sciences, using digital methods to ask routine questions of large corpora. However, inquiry into multiple centuries of books is constrained by the burdens of scale, where statistical inference is technically complex and limited by hurdles to access and flexibility. This work examines the role that exploratory data analysis and visualization tools may play in understanding large bibliographic datasets. We present one such tool, HathiTrust+Bookworm, which allows multifaceted exploration of the multimillion work HathiTrust Digital Library, and center it in the broader space of scholarly tools for exploratory data analysis.
  18. Chowdhury, G.G.: Template mining for information extraction from digital documents (1999) 0.00
    0.004697878 = product of:
      0.037583023 = sum of:
        0.037583023 = product of:
          0.07516605 = sum of:
            0.07516605 = weight(_text_:22 in 4577) [ClassicSimilarity], result of:
              0.07516605 = score(doc=4577,freq=2.0), product of:
                0.13876937 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.03962768 = queryNorm
                0.5416616 = fieldWeight in 4577, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.109375 = fieldNorm(doc=4577)
          0.5 = coord(1/2)
      0.125 = coord(1/8)
    
    Date
    2. 4.2000 18:01:22
  19. Zhou, L.; Chaovalit, P.: Ontology-supported polarity mining (2008) 0.00
    0.004650397 = product of:
      0.037203178 = sum of:
        0.037203178 = product of:
          0.074406356 = sum of:
            0.074406356 = weight(_text_:area in 1343) [ClassicSimilarity], result of:
              0.074406356 = score(doc=1343,freq=2.0), product of:
                0.1952553 = queryWeight, product of:
                  4.927245 = idf(docFreq=870, maxDocs=44218)
                  0.03962768 = queryNorm
                0.38107216 = fieldWeight in 1343, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  4.927245 = idf(docFreq=870, maxDocs=44218)
                  0.0546875 = fieldNorm(doc=1343)
          0.5 = coord(1/2)
      0.125 = coord(1/8)
    
    Abstract
    Polarity mining provides an in-depth analysis of semantic orientations of text information. Motivated by its success in the area of topic mining, we propose an ontology-supported polarity mining (OSPM) approach. The approach aims to enhance polarity mining with ontology by providing detailed topic-specific information. OSPM was evaluated in the movie review domain using both supervised and unsupervised techniques. Results revealed that OSPM outperformed the baseline method without ontology support. The findings of this study not only advance the state of polarity mining research but also shed light on future research directions.
  20. Classification, automation, and new media : Proceedings of the 24th Annual Conference of the Gesellschaft für Klassifikation e.V., University of Passau, March 15 - 17, 2000 (2002) 0.00
    0.0043570166 = product of:
      0.034856133 = sum of:
        0.034856133 = weight(_text_:studies in 5997) [ClassicSimilarity], result of:
          0.034856133 = score(doc=5997,freq=2.0), product of:
            0.15812531 = queryWeight, product of:
              3.9902744 = idf(docFreq=2222, maxDocs=44218)
              0.03962768 = queryNorm
            0.22043361 = fieldWeight in 5997, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.9902744 = idf(docFreq=2222, maxDocs=44218)
              0.0390625 = fieldNorm(doc=5997)
      0.125 = coord(1/8)
    
    Series
    Proceedings of the ... annual conference of the Gesellschaft für Klassifikation e.V.; 24
    Studies in classification, data analysis, and knowledge organization

Languages

  • e 37
  • d 7
