Search (28 results, page 1 of 2)

  • Filter: theme_ss:"Data Mining"
  1. Kraker, P.; Kittel, C.; Enkhbayar, A.: Open Knowledge Maps : creating a visual interface to the world's scientific knowledge based on natural language processing (2016) 0.03
    0.02709896 = product of:
      0.10839584 = sum of:
        0.10839584 = weight(_text_:open in 3205) [ClassicSimilarity], result of:
          0.10839584 = score(doc=3205,freq=6.0), product of:
            0.20964009 = queryWeight, product of:
              4.5032015 = idf(docFreq=1330, maxDocs=44218)
              0.046553567 = queryNorm
            0.5170568 = fieldWeight in 3205, product of:
              2.4494898 = tf(freq=6.0), with freq of:
                6.0 = termFreq=6.0
              4.5032015 = idf(docFreq=1330, maxDocs=44218)
              0.046875 = fieldNorm(doc=3205)
      0.25 = coord(1/4)
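    The indented breakdown above is Lucene's ClassicSimilarity "explain" output for the query term "open": fieldWeight = tf * idf * fieldNorm, queryWeight = idf * queryNorm, the term score is their product, and the coordination factor scales the total. A minimal sketch that reproduces the arithmetic of this first result (values copied from the tree above; the function name is illustrative, not Lucene's API):

```python
import math

def classic_similarity(freq, idf, field_norm, query_norm, coord):
    """Reproduce the ClassicSimilarity explain arithmetic (illustrative only)."""
    tf = math.sqrt(freq)                       # 2.4494898 for freq=6
    field_weight = tf * idf * field_norm       # 0.5170568
    query_weight = idf * query_norm            # 0.20964009
    term_score = field_weight * query_weight   # 0.10839584
    return coord * term_score                  # 0.02709896

print(classic_similarity(freq=6.0, idf=4.5032015,
                         field_norm=0.046875, query_norm=0.046553567,
                         coord=0.25))
```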
    
    Abstract
    The goal of Open Knowledge Maps is to create a visual interface to the world's scientific knowledge. The basis of this visual interface consists of so-called knowledge maps, which enable the exploration of existing knowledge and the discovery of new knowledge. Our open-source knowledge mapping software applies a mixture of summarization techniques and similarity measures on article metadata, which are iteratively chained together. After processing, the representation is saved in a database for use in a web visualization. In the future, we want to create a space for collective knowledge mapping that brings together individuals and communities involved in exploration and discovery. We want to enable people to guide each other in their discovery by collaboratively annotating and modifying the automatically created maps.
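    The similarity step of such a pipeline can be illustrated with a plain tf-idf/cosine computation over article metadata; the toy data and tokenization below are simplified assumptions and not the Open Knowledge Maps code:

```python
import math
from collections import Counter

# Toy article metadata (titles/keywords); not the Open Knowledge Maps corpus.
articles = {
    "a1": "knowledge maps visual interface scientific knowledge",
    "a2": "visual interface exploration discovery knowledge",
    "a3": "markov models personalized recommendation web",
}

def tfidf_vectors(texts):
    """Plain tf-idf over whitespace tokens."""
    docs = {k: Counter(v.split()) for k, v in texts.items()}
    n = len(docs)
    df = Counter(t for d in docs.values() for t in d)
    return {k: {t: c * math.log(n / df[t]) for t, c in d.items()}
            for k, d in docs.items()}

def cosine(u, v):
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

vecs = tfidf_vectors(articles)
print(cosine(vecs["a1"], vecs["a2"]))  # related articles -> high value
print(cosine(vecs["a1"], vecs["a3"]))  # unrelated articles -> 0.0
```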
  2. Kipcic, O.; Cramer, C.: Wie Zeitungsinhalte Forschung und Entwicklung befördern (2017) 0.02
    0.01825319 = product of:
      0.07301276 = sum of:
        0.07301276 = weight(_text_:open in 3885) [ClassicSimilarity], result of:
          0.07301276 = score(doc=3885,freq=2.0), product of:
            0.20964009 = queryWeight, product of:
              4.5032015 = idf(docFreq=1330, maxDocs=44218)
              0.046553567 = queryNorm
            0.3482767 = fieldWeight in 3885, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              4.5032015 = idf(docFreq=1330, maxDocs=44218)
              0.0546875 = fieldNorm(doc=3885)
      0.25 = coord(1/4)
    
    Source
    Open Password. 2017, Nr.237 vom 20.07.2017 [http://www.password-online.de/?wysija-page=1&controller=email&action=view&email_id=294&wysijap=subscriptions&user_id=623]
  3. Srinivasan, P.: Text mining : generating hypotheses from MEDLINE (2004) 0.02
    0.015645592 = product of:
      0.062582366 = sum of:
        0.062582366 = weight(_text_:open in 2225) [ClassicSimilarity], result of:
          0.062582366 = score(doc=2225,freq=2.0), product of:
            0.20964009 = queryWeight, product of:
              4.5032015 = idf(docFreq=1330, maxDocs=44218)
              0.046553567 = queryNorm
            0.2985229 = fieldWeight in 2225, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              4.5032015 = idf(docFreq=1330, maxDocs=44218)
              0.046875 = fieldNorm(doc=2225)
      0.25 = coord(1/4)
    
    Abstract
    Hypothesis generation, a crucial initial step for making scientific discoveries, relies on prior knowledge, experience, and intuition. Chance connections made between seemingly distinct subareas sometimes turn out to be fruitful. The goal in text mining is to assist in this process by automatically discovering a small set of interesting hypotheses from a suitable text collection. In this report, we present open and closed text mining algorithms that are built within the discovery framework established by Swanson and Smalheiser. Our algorithms represent topics using metadata profiles. When applied to MEDLINE, these are MeSH-based profiles. We present experiments that demonstrate the effectiveness of our algorithms. Specifically, our algorithms successfully generate ranked term lists where the key terms representing novel relationships between topics are ranked high.
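    The open discovery procedure mentioned here follows Swanson and Smalheiser's A-B-C pattern: start from topic A, collect intermediate terms B from A's literature, then rank candidate terms C that co-occur with B but never directly with A. A toy sketch of that pattern (invented term sets, not the paper's MeSH profiles or weighting):

```python
from collections import Counter

# Toy "literature": each document is a set of MeSH-like terms (invented).
docs = [
    {"fish oil", "blood viscosity"},
    {"blood viscosity", "raynaud disease"},
    {"fish oil", "platelet aggregation"},
    {"platelet aggregation", "raynaud disease"},
]

def cooccurring(term, docs):
    """Terms appearing in the same documents as `term`, with counts."""
    counts = Counter()
    for d in docs:
        if term in d:
            counts.update(d - {term})
    return counts

def open_discovery(a_term, docs, top_b=5):
    """Swanson-style open discovery: A -> B -> C, keeping C terms
    that never co-occur directly with A."""
    b_terms = cooccurring(a_term, docs)
    c_scores = Counter()
    for b, b_count in b_terms.most_common(top_b):
        for c, c_count in cooccurring(b, docs).items():
            if c != a_term and c not in b_terms:
                c_scores[c] += b_count * c_count
    return c_scores.most_common()

print(open_discovery("fish oil", docs))   # -> [('raynaud disease', 2)]
```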
  4. Loonus, Y.: Einsatzbereiche der KI und ihre Relevanz für Information Professionals (2017) 0.02
    0.015645592 = product of:
      0.062582366 = sum of:
        0.062582366 = weight(_text_:open in 5668) [ClassicSimilarity], result of:
          0.062582366 = score(doc=5668,freq=2.0), product of:
            0.20964009 = queryWeight, product of:
              4.5032015 = idf(docFreq=1330, maxDocs=44218)
              0.046553567 = queryNorm
            0.2985229 = fieldWeight in 5668, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              4.5032015 = idf(docFreq=1330, maxDocs=44218)
              0.046875 = fieldNorm(doc=5668)
      0.25 = coord(1/4)
    
    Source
    Open Password. 2017, Nr.265 vom 11.10.2017. [http://www.password-online.de/?wysija-page=1&controller=email&action=view&email_id=340&wysijap=subscriptions&user_id=1045]
  5. Chowdhury, G.G.: Template mining for information extraction from digital documents (1999) 0.01
    0.01103789 = product of:
      0.04415156 = sum of:
        0.04415156 = product of:
          0.08830312 = sum of:
            0.08830312 = weight(_text_:22 in 4577) [ClassicSimilarity], result of:
              0.08830312 = score(doc=4577,freq=2.0), product of:
                0.16302267 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.046553567 = queryNorm
                0.5416616 = fieldWeight in 4577, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.109375 = fieldNorm(doc=4577)
          0.5 = coord(1/2)
      0.25 = coord(1/4)
    
    Date
    2. 4.2000 18:01:22
  6. Budzik, J.; Hammond, K.J.; Birnbaum, L.: Information access in context (2001) 0.01
    0.010340675 = product of:
      0.0413627 = sum of:
        0.0413627 = product of:
          0.0827254 = sum of:
            0.0827254 = weight(_text_:access in 3835) [ClassicSimilarity], result of:
              0.0827254 = score(doc=3835,freq=2.0), product of:
                0.15778996 = queryWeight, product of:
                  3.389428 = idf(docFreq=4053, maxDocs=44218)
                  0.046553567 = queryNorm
                0.5242754 = fieldWeight in 3835, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.389428 = idf(docFreq=4053, maxDocs=44218)
                  0.109375 = fieldNorm(doc=3835)
          0.5 = coord(1/2)
      0.25 = coord(1/4)
    
  7. KDD : techniques and applications (1998) 0.01
    0.009461049 = product of:
      0.037844196 = sum of:
        0.037844196 = product of:
          0.07568839 = sum of:
            0.07568839 = weight(_text_:22 in 6783) [ClassicSimilarity], result of:
              0.07568839 = score(doc=6783,freq=2.0), product of:
                0.16302267 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.046553567 = queryNorm
                0.46428138 = fieldWeight in 6783, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.09375 = fieldNorm(doc=6783)
          0.5 = coord(1/2)
      0.25 = coord(1/4)
    
    Footnote
    A special issue of selected papers from the Pacific-Asia Conference on Knowledge Discovery and Data Mining (PAKDD'97), held in Singapore, 22-23 Feb 1997
  8. Matson, L.D.; Bonski, D.J.: Do digital libraries need librarians? (1997) 0.01
    0.006307366 = product of:
      0.025229463 = sum of:
        0.025229463 = product of:
          0.050458927 = sum of:
            0.050458927 = weight(_text_:22 in 1737) [ClassicSimilarity], result of:
              0.050458927 = score(doc=1737,freq=2.0), product of:
                0.16302267 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.046553567 = queryNorm
                0.30952093 = fieldWeight in 1737, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.0625 = fieldNorm(doc=1737)
          0.5 = coord(1/2)
      0.25 = coord(1/4)
    
    Date
    22.11.1998 18:57:22
  9. Lusti, M.: Data Warehousing and Data Mining : Eine Einführung in entscheidungsunterstützende Systeme (1999) 0.01
    0.006307366 = product of:
      0.025229463 = sum of:
        0.025229463 = product of:
          0.050458927 = sum of:
            0.050458927 = weight(_text_:22 in 4261) [ClassicSimilarity], result of:
              0.050458927 = score(doc=4261,freq=2.0), product of:
                0.16302267 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.046553567 = queryNorm
                0.30952093 = fieldWeight in 4261, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.0625 = fieldNorm(doc=4261)
          0.5 = coord(1/2)
      0.25 = coord(1/4)
    
    Date
    17. 7.2002 19:22:06
  10. Amir, A.; Feldman, R.; Kashi, R.: ¬A new and versatile method for association generation (1997) 0.01
    0.006307366 = product of:
      0.025229463 = sum of:
        0.025229463 = product of:
          0.050458927 = sum of:
            0.050458927 = weight(_text_:22 in 1270) [ClassicSimilarity], result of:
              0.050458927 = score(doc=1270,freq=2.0), product of:
                0.16302267 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.046553567 = queryNorm
                0.30952093 = fieldWeight in 1270, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.0625 = fieldNorm(doc=1270)
          0.5 = coord(1/2)
      0.25 = coord(1/4)
    
    Source
    Information systems. 22(1997) nos.5/6, S.333-347
  11. Perugini, S.; Ramakrishnan, N.: Mining Web functional dependencies for flexible information access (2007) 0.01
    0.006267395 = product of:
      0.02506958 = sum of:
        0.02506958 = product of:
          0.05013916 = sum of:
            0.05013916 = weight(_text_:access in 602) [ClassicSimilarity], result of:
              0.05013916 = score(doc=602,freq=4.0), product of:
                0.15778996 = queryWeight, product of:
                  3.389428 = idf(docFreq=4053, maxDocs=44218)
                  0.046553567 = queryNorm
                0.31775886 = fieldWeight in 602, product of:
                  2.0 = tf(freq=4.0), with freq of:
                    4.0 = termFreq=4.0
                  3.389428 = idf(docFreq=4053, maxDocs=44218)
                  0.046875 = fieldNorm(doc=602)
          0.5 = coord(1/2)
      0.25 = coord(1/4)
    
    Abstract
    We present an approach to enhancing information access through Web structure mining in contrast to traditional approaches involving usage mining. Specifically, we mine the hardwired hierarchical hyperlink structure of Web sites to identify patterns of term-term co-occurrences we call Web functional dependencies (FDs). Intuitively, a Web FD x -> y declares that all paths through a site involving a hyperlink labeled x also contain a hyperlink labeled y. The complete set of FDs satisfied by a site help characterize (flexible and expressive) interaction paradigms supported by a site, where a paradigm is the set of explorable sequences therein. We describe algorithms for mining FDs and results from mining several hierarchical Web sites and present several interface designs that can exploit such FDs to provide compelling user experiences.
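    The x -> y dependency defined above can be checked directly over link-label paths: the FD holds when every path containing x also contains y. A small sketch under that definition (toy paths; not the paper's mining algorithm, which works over the full site hierarchy):

```python
# Toy link-label paths through a site (invented labels).
paths = [
    ["products", "laptops", "accessories"],
    ["products", "laptops", "warranty"],
    ["products", "phones", "accessories"],
    ["support", "warranty"],
]

def fd_holds(x, y, paths):
    """Web FD x -> y: every path containing label x also contains label y."""
    return all(y in p for p in paths if x in p)

def mine_fds(paths):
    labels = sorted({label for p in paths for label in p})
    return [(x, y) for x in labels for y in labels
            if x != y and fd_holds(x, y, paths)]

print(mine_fds(paths))
# e.g. [('accessories', 'products'), ('laptops', 'products'), ...]
```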
  12. Information visualization in data mining and knowledge discovery (2002) 0.01
    0.006108161 = product of:
      0.024432644 = sum of:
        0.024432644 = sum of:
          0.0118179135 = weight(_text_:access in 1789) [ClassicSimilarity], result of:
            0.0118179135 = score(doc=1789,freq=2.0), product of:
              0.15778996 = queryWeight, product of:
                3.389428 = idf(docFreq=4053, maxDocs=44218)
                0.046553567 = queryNorm
              0.074896485 = fieldWeight in 1789, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                3.389428 = idf(docFreq=4053, maxDocs=44218)
                0.015625 = fieldNorm(doc=1789)
          0.012614732 = weight(_text_:22 in 1789) [ClassicSimilarity], result of:
            0.012614732 = score(doc=1789,freq=2.0), product of:
              0.16302267 = queryWeight, product of:
                3.5018296 = idf(docFreq=3622, maxDocs=44218)
                0.046553567 = queryNorm
              0.07738023 = fieldWeight in 1789, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                3.5018296 = idf(docFreq=3622, maxDocs=44218)
                0.015625 = fieldNorm(doc=1789)
      0.25 = coord(1/4)
    
    Date
    23. 3.2008 19:10:22
    Footnote
    In 13 chapters, Part Two provides an introduction to KDD, an overview of data mining techniques, and examples of the usefulness of data model visualizations. The importance of visualization throughout the KDD process is stressed in many of the chapters. In particular, the need for measures of visualization effectiveness, benchmarking for identifying best practices, and the use of standardized sample data sets is convincingly presented. Many of the important data mining approaches are discussed in this complementary context. Cluster and outlier detection, classification techniques, and rule discovery algorithms are presented as the basic techniques common to the KDD process. The potential effectiveness of using visualization in the data modeling process is illustrated in chapters focused on using visualization to help users understand the KDD process, ask questions and form hypotheses about their data, and evaluate the accuracy and veracity of their results.
    The 11 chapters of Part Three provide an overview of the KDD process and successful approaches to integrating KDD, data mining, and visualization in complementary domains. Rhodes (Chapter 21) begins this section with an excellent overview of the relation between the KDD process and data mining techniques. He states that the "primary goals of data mining are to describe the existing data and to predict the behavior or characteristics of future data of the same type" (p. 281). These goals are met by data mining tasks such as classification, regression, clustering, summarization, dependency modeling, and change or deviation detection. Subsequent chapters demonstrate how visualization can aid users in the interactive process of knowledge discovery by graphically representing the results from these iterative tasks. Finally, examples of the usefulness of integrating visualization and data mining tools in the domains of business, imagery and text mining, and massive data sets are provided.
    This text concludes with a thorough and useful 17-page index and a lengthy yet integrating 17-page summary of the academic and industrial backgrounds of the contributing authors. A 16-page set of color inserts provides a better representation of the visualizations discussed, and a URL provided suggests that readers may view all the book's figures in color on-line, although as of this submission date it only provides access to a summary of the book and its contents.
    The overall contribution of this work is its focus on bridging two distinct areas of research, making it a valuable addition to the Morgan Kaufmann Series in Database Management Systems. The editors of this text have met their main goal of providing the first textbook integrating knowledge discovery, data mining, and visualization. Although it contributes greatly to our understanding of the development and current state of the field, a major weakness of this text is that there is no concluding chapter to discuss the contributions of the sum of these contributed papers or give direction to possible future areas of research. "Integration of expertise between two different disciplines is a difficult process of communication and reeducation. Integrating data mining and visualization is particularly complex because each of these fields in itself must draw on a wide range of research experience" (p. 300). Although this work contributes to the cross-disciplinary communication needed to advance visualization in KDD, a more formal call for an interdisciplinary research agenda in a concluding chapter would have provided a more satisfying conclusion to a very good introductory text.
  13. Doran, D.; Gokhale, S.S.: ¬A classification framework for web robots (2012) 0.01
    0.0059089568 = product of:
      0.023635827 = sum of:
        0.023635827 = product of:
          0.047271654 = sum of:
            0.047271654 = weight(_text_:access in 505) [ClassicSimilarity], result of:
              0.047271654 = score(doc=505,freq=2.0), product of:
                0.15778996 = queryWeight, product of:
                  3.389428 = idf(docFreq=4053, maxDocs=44218)
                  0.046553567 = queryNorm
                0.29958594 = fieldWeight in 505, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.389428 = idf(docFreq=4053, maxDocs=44218)
                  0.0625 = fieldNorm(doc=505)
          0.5 = coord(1/2)
      0.25 = coord(1/4)
    
    Abstract
    The behavior of modern web robots varies widely when they crawl for different purposes. In this article, we present a framework to classify these web robots from two orthogonal perspectives, namely, their functionality and the types of resources they consume. Applying the classification framework to a year-long access log from the UConn SoE web server, we present trends that point to significant differences in their crawling behavior.
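    One of the two classification dimensions, the types of resources a robot consumes, can be read off an access log by bucketing requested paths by file extension; the log entries and type table below are invented placeholders, not the UConn SoE data or the paper's full framework:

```python
from collections import Counter, defaultdict
import os

# Invented access-log entries: (user agent, requested path).
log = [
    ("Googlebot", "/index.html"), ("Googlebot", "/papers/report.pdf"),
    ("Googlebot", "/robots.txt"), ("ImageBot", "/img/logo.png"),
    ("ImageBot", "/img/photo.jpg"),
]

RESOURCE_TYPES = {".html": "web page", ".pdf": "document",
                  ".png": "image", ".jpg": "image", ".txt": "text"}

def resource_profiles(log):
    """Tally the resource types each robot requests (one classification axis)."""
    profiles = defaultdict(Counter)
    for agent, path in log:
        ext = os.path.splitext(path)[1].lower()
        profiles[agent][RESOURCE_TYPES.get(ext, "other")] += 1
    return profiles

for agent, counts in resource_profiles(log).items():
    print(agent, dict(counts))
```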
  14. Hofstede, A.H.M. ter; Proper, H.A.; Van der Weide, T.P.: Exploiting fact verbalisation in conceptual information modelling (1997) 0.01
    0.005518945 = product of:
      0.02207578 = sum of:
        0.02207578 = product of:
          0.04415156 = sum of:
            0.04415156 = weight(_text_:22 in 2908) [ClassicSimilarity], result of:
              0.04415156 = score(doc=2908,freq=2.0), product of:
                0.16302267 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.046553567 = queryNorm
                0.2708308 = fieldWeight in 2908, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.0546875 = fieldNorm(doc=2908)
          0.5 = coord(1/2)
      0.25 = coord(1/4)
    
    Source
    Information systems. 22(1997) nos.5/6, S.349-385
  15. Liu, Y.; Huang, X.; An, A.: Personalized recommendation with adaptive mixture of markov models (2007) 0.01
    0.005222829 = product of:
      0.020891316 = sum of:
        0.020891316 = product of:
          0.041782632 = sum of:
            0.041782632 = weight(_text_:access in 606) [ClassicSimilarity], result of:
              0.041782632 = score(doc=606,freq=4.0), product of:
                0.15778996 = queryWeight, product of:
                  3.389428 = idf(docFreq=4053, maxDocs=44218)
                  0.046553567 = queryNorm
                0.26479906 = fieldWeight in 606, product of:
                  2.0 = tf(freq=4.0), with freq of:
                    4.0 = termFreq=4.0
                  3.389428 = idf(docFreq=4053, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=606)
          0.5 = coord(1/2)
      0.25 = coord(1/4)
    
    Abstract
    With more and more information available on the Internet, the task of making personalized recommendations to assist the user's navigation has become increasingly important. Considering that there might be millions of users with different backgrounds accessing a Web site every day, it is infeasible to build a separate recommendation system for each user. To address this problem, clustering techniques can first be employed to discover user groups. Then, user navigation patterns for each group can be discovered, to allow the adaptation of a Web site to the interest of each individual group. In this paper, we propose to model user access sequences as stochastic processes, and an approach based on a mixture of Markov models is taken to cluster users and to capture the sequential relationships inherent in user access histories. Several important issues that arise in constructing the Markov models are also addressed. The first issue lies in the complexity of the mixture of Markov models. To improve the efficiency of building/maintaining the mixture of Markov models, we develop a lightweight adaptive algorithm to update the model parameters without recomputing model parameters from scratch. The second issue concerns the proper selection of training data for building the mixture of Markov models. We investigate two different training data selection strategies and perform extensive experiments to compare their effectiveness on a real dataset that is generated by a Web-based knowledge management system, Livelink.
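    The building block of such a mixture is a first-order Markov model over page requests. A minimal sketch of estimating its transition probabilities from access sequences follows (toy sessions; the paper's approach additionally clusters users and fits one such model per cluster, which is not shown):

```python
from collections import defaultdict

# Toy user access sequences (invented page identifiers).
sessions = [
    ["home", "search", "doc1", "doc2"],
    ["home", "doc1", "doc2"],
    ["home", "search", "doc3"],
]

def transition_probs(sessions):
    """Maximum-likelihood estimates of P(next page | current page)."""
    counts = defaultdict(lambda: defaultdict(int))
    for s in sessions:
        for cur, nxt in zip(s, s[1:]):
            counts[cur][nxt] += 1
    return {cur: {nxt: c / sum(nxts.values()) for nxt, c in nxts.items()}
            for cur, nxts in counts.items()}

print(transition_probs(sessions)["home"])
# -> {'search': 0.666..., 'doc1': 0.333...}
```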
  16. Lackes, R.; Tillmanns, C.: Data Mining für die Unternehmenspraxis : Entscheidungshilfen und Fallstudien mit führenden Softwarelösungen (2006) 0.00
    0.0047305245 = product of:
      0.018922098 = sum of:
        0.018922098 = product of:
          0.037844196 = sum of:
            0.037844196 = weight(_text_:22 in 1383) [ClassicSimilarity], result of:
              0.037844196 = score(doc=1383,freq=2.0), product of:
                0.16302267 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.046553567 = queryNorm
                0.23214069 = fieldWeight in 1383, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.046875 = fieldNorm(doc=1383)
          0.5 = coord(1/2)
      0.25 = coord(1/4)
    
    Date
    22. 3.2008 14:46:06
  17. Chen, H.; Chau, M.: Web mining : machine learning for Web applications (2003) 0.00
    0.0044317176 = product of:
      0.01772687 = sum of:
        0.01772687 = product of:
          0.03545374 = sum of:
            0.03545374 = weight(_text_:access in 4242) [ClassicSimilarity], result of:
              0.03545374 = score(doc=4242,freq=2.0), product of:
                0.15778996 = queryWeight, product of:
                  3.389428 = idf(docFreq=4053, maxDocs=44218)
                  0.046553567 = queryNorm
                0.22468945 = fieldWeight in 4242, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.389428 = idf(docFreq=4053, maxDocs=44218)
                  0.046875 = fieldNorm(doc=4242)
          0.5 = coord(1/2)
      0.25 = coord(1/4)
    
    Abstract
    With more than two billion pages created by millions of Web page authors and organizations, the World Wide Web is a tremendously rich knowledge base. The knowledge comes not only from the content of the pages themselves, but also from the unique characteristics of the Web, such as its hyperlink structure and its diversity of content and languages. Analysis of these characteristics often reveals interesting patterns and new knowledge. Such knowledge can be used to improve users' efficiency and effectiveness in searching for information on the Web, and also for applications unrelated to the Web, such as support for decision making or business management. The Web's size and its unstructured and dynamic content, as well as its multilingual nature, make the extraction of useful knowledge a challenging research problem. Furthermore, the Web generates a large amount of data in other formats that contain valuable information. For example, information about user access patterns in Web server logs can be used for information personalization or improving Web page design.
  18. Qiu, X.Y.; Srinivasan, P.; Hu, Y.: Supervised learning models to predict firm performance with annual reports : an empirical study (2014) 0.00
    0.0044317176 = product of:
      0.01772687 = sum of:
        0.01772687 = product of:
          0.03545374 = sum of:
            0.03545374 = weight(_text_:access in 1205) [ClassicSimilarity], result of:
              0.03545374 = score(doc=1205,freq=2.0), product of:
                0.15778996 = queryWeight, product of:
                  3.389428 = idf(docFreq=4053, maxDocs=44218)
                  0.046553567 = queryNorm
                0.22468945 = fieldWeight in 1205, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.389428 = idf(docFreq=4053, maxDocs=44218)
                  0.046875 = fieldNorm(doc=1205)
          0.5 = coord(1/2)
      0.25 = coord(1/4)
    
    Abstract
    Text mining and machine learning methodologies have been applied toward knowledge discovery in several domains, such as biomedicine and business. Interestingly, in the business domain, the text mining and machine learning community has minimally explored company annual reports with their mandatory disclosures. In this study, we explore the question "How can annual reports be used to predict change in company performance from one year to the next?" from a text mining perspective. Our article contributes a systematic study of the potential of company mandatory disclosures using a computational viewpoint in the following aspects: (a) We characterize our research problem along distinct dimensions to gain a reasonably comprehensive understanding of the capacity of supervised learning methods in predicting change in company performance using annual reports, and (b) our findings from unbiased systematic experiments provide further evidence about the economic incentives faced by analysts in their stock recommendations and speculations on analysts having access to more information in producing earnings forecasts.
  19. Sarnikar, S.; Zhang, Z.; Zhao, J.L.: Query-performance prediction for effective query routing in domain-specific repositories (2014) 0.00
    0.0044317176 = product of:
      0.01772687 = sum of:
        0.01772687 = product of:
          0.03545374 = sum of:
            0.03545374 = weight(_text_:access in 1326) [ClassicSimilarity], result of:
              0.03545374 = score(doc=1326,freq=2.0), product of:
                0.15778996 = queryWeight, product of:
                  3.389428 = idf(docFreq=4053, maxDocs=44218)
                  0.046553567 = queryNorm
                0.22468945 = fieldWeight in 1326, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.389428 = idf(docFreq=4053, maxDocs=44218)
                  0.046875 = fieldNorm(doc=1326)
          0.5 = coord(1/2)
      0.25 = coord(1/4)
    
    Abstract
    The effective use of corporate memory is becoming increasingly important because every aspect of e-business requires access to information repositories. Unfortunately, less-than-satisfying effectiveness in state-of-the-art information-retrieval techniques is well known, even for some of the best search engines such as Google. In this study, the authors resolve this retrieval ineffectiveness problem by developing a new framework for predicting query performance, which is the first step toward better retrieval effectiveness. Specifically, they examine the relationship between query performance and query context. A query context consists of the query itself, the document collection, and the interaction between the two. The authors first analyze the characteristics of query context and develop various features for predicting query performance. Then, they propose a context-sensitive model for predicting query performance based on the characteristics of the query and the document collection. Finally, they validate this model with respect to five real-world collections of documents and demonstrate its utility in routing queries to the correct repository with high accuracy.
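    A common starting point for query-performance prediction is a pre-retrieval feature such as the average inverse document frequency of the query terms. The sketch below illustrates only that feature over a toy collection; it is not the context-sensitive model the authors develop:

```python
import math

# Toy collection: documents as term lists (invented).
collection = [
    ["data", "mining", "survey"],
    ["text", "mining", "medline"],
    ["query", "routing", "repositories"],
    ["query", "performance", "prediction"],
]

def avg_idf(query_terms, collection):
    """Average IDF of the query terms -- a simple pre-retrieval
    indicator of how discriminative (and thus how predictable) a query is."""
    n = len(collection)
    idfs = [math.log((n + 1) / (sum(t in doc for doc in collection) + 1))
            for t in query_terms]
    return sum(idfs) / len(idfs)

print(avg_idf(["query", "performance"], collection))  # higher = rarer terms
```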
  20. Organisciak, P.; Schmidt, B.M.; Downie, J.S.: Giving shape to large digital libraries through exploratory data analysis (2022) 0.00
    0.0044317176 = product of:
      0.01772687 = sum of:
        0.01772687 = product of:
          0.03545374 = sum of:
            0.03545374 = weight(_text_:access in 473) [ClassicSimilarity], result of:
              0.03545374 = score(doc=473,freq=2.0), product of:
                0.15778996 = queryWeight, product of:
                  3.389428 = idf(docFreq=4053, maxDocs=44218)
                  0.046553567 = queryNorm
                0.22468945 = fieldWeight in 473, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.389428 = idf(docFreq=4053, maxDocs=44218)
                  0.046875 = fieldNorm(doc=473)
          0.5 = coord(1/2)
      0.25 = coord(1/4)
    
    Abstract
    The emergence of large multi-institutional digital libraries has opened the door to aggregate-level examinations of the published word. Such large-scale analysis offers a new way to pursue traditional problems in the humanities and social sciences, using digital methods to ask routine questions of large corpora. However, inquiry into multiple centuries of books is constrained by the burdens of scale, where statistical inference is technically complex and limited by hurdles to access and flexibility. This work examines the role that exploratory data analysis and visualization tools may play in understanding large bibliographic datasets. We present one such tool, HathiTrust+Bookworm, which allows multifaceted exploration of the multimillion-work HathiTrust Digital Library, and center it in the broader space of scholarly tools for exploratory data analysis.

Languages

  • English: 19
  • German: 9
