Search (52 results, page 2 of 3)

  • × theme_ss:"Data Mining"
  • × year_i:[2010 TO 2020}
  1. Maaten, L. van den: Accelerating t-SNE using Tree-Based Algorithms (2014) 0.00
    0.0042281803 = product of:
      0.012684541 = sum of:
        0.012684541 = weight(_text_:in in 3886) [ClassicSimilarity], result of:
          0.012684541 = score(doc=3886,freq=6.0), product of:
            0.069613084 = queryWeight, product of:
              1.3602545 = idf(docFreq=30841, maxDocs=44218)
              0.051176514 = queryNorm
            0.1822149 = fieldWeight in 3886, product of:
              2.4494898 = tf(freq=6.0), with freq of:
                6.0 = termFreq=6.0
              1.3602545 = idf(docFreq=30841, maxDocs=44218)
              0.0546875 = fieldNorm(doc=3886)
      0.33333334 = coord(1/3)
    
    Abstract
    The paper investigates the acceleration of t-SNE-an embedding technique that is commonly used for the visualization of high-dimensional data in scatter plots-using two tree-based algorithms. In particular, the paper develops variants of the Barnes-Hut algorithm and of the dual-tree algorithm that approximate the gradient used for learning t-SNE embeddings in O(N*logN). Our experiments show that the resulting algorithms substantially accelerate t-SNE, and that they make it possible to learn embeddings of data sets with millions of objects. Somewhat counterintuitively, the Barnes-Hut variant of t-SNE appears to outperform the dual-tree variant.
  2. Liu, X.; Yu, S.; Janssens, F.; Glänzel, W.; Moreau, Y.; Moor, B.de: Weighted hybrid clustering by combining text mining and bibliometrics on a large-scale journal database (2010) 0.00
    0.0041848132 = product of:
      0.012554439 = sum of:
        0.012554439 = weight(_text_:in in 3464) [ClassicSimilarity], result of:
          0.012554439 = score(doc=3464,freq=8.0), product of:
            0.069613084 = queryWeight, product of:
              1.3602545 = idf(docFreq=30841, maxDocs=44218)
              0.051176514 = queryNorm
            0.18034597 = fieldWeight in 3464, product of:
              2.828427 = tf(freq=8.0), with freq of:
                8.0 = termFreq=8.0
              1.3602545 = idf(docFreq=30841, maxDocs=44218)
              0.046875 = fieldNorm(doc=3464)
      0.33333334 = coord(1/3)
    
    Abstract
    We propose a new hybrid clustering framework to incorporate text mining with bibliometrics in journal set analysis. The framework integrates two different approaches: clustering ensemble and kernel-fusion clustering. To improve the flexibility and the efficiency of processing large-scale data, we propose an information-based weighting scheme to leverage the effect of multiple data sources in hybrid clustering. Three different algorithms are extended by the proposed weighting scheme and they are employed on a large journal set retrieved from the Web of Science (WoS) database. The clustering performance of the proposed algorithms is systematically evaluated using multiple evaluation methods, and they were cross-compared with alternative methods. Experimental results demonstrate that the proposed weighted hybrid clustering strategy is superior to other methods in clustering performance and efficiency. The proposed approach also provides a more refined structural mapping of journal sets, which is useful for monitoring and detecting new trends in different scientific fields.
  3. Sarnikar, S.; Zhang, Z.; Zhao, J.L.: Query-performance prediction for effective query routing in domain-specific repositories (2014) 0.00
    0.0041848132 = product of:
      0.012554439 = sum of:
        0.012554439 = weight(_text_:in in 1326) [ClassicSimilarity], result of:
          0.012554439 = score(doc=1326,freq=8.0), product of:
            0.069613084 = queryWeight, product of:
              1.3602545 = idf(docFreq=30841, maxDocs=44218)
              0.051176514 = queryNorm
            0.18034597 = fieldWeight in 1326, product of:
              2.828427 = tf(freq=8.0), with freq of:
                8.0 = termFreq=8.0
              1.3602545 = idf(docFreq=30841, maxDocs=44218)
              0.046875 = fieldNorm(doc=1326)
      0.33333334 = coord(1/3)
    
    Abstract
    The effective use of corporate memory is becoming increasingly important because every aspect of e-business requires access to information repositories. Unfortunately, less-than-satisfying effectiveness in state-of-the-art information-retrieval techniques is well known, even for some of the best search engines such as Google. In this study, the authors resolve this retrieval ineffectiveness problem by developing a new framework for predicting query performance, which is the first step toward better retrieval effectiveness. Specifically, they examine the relationship between query performance and query context. A query context consists of the query itself, the document collection, and the interaction between the two. The authors first analyze the characteristics of query context and develop various features for predicting query performance. Then, they propose a context-sensitive model for predicting query performance based on the characteristics of the query and the document collection. Finally, they validate this model with respect to five real-world collections of documents and demonstrate its utility in routing queries to the correct repository with high accuracy.
  4. Loonus, Y.: Einsatzbereiche der KI und ihre Relevanz für Information Professionals (2017) 0.00
    0.0041848132 = product of:
      0.012554439 = sum of:
        0.012554439 = weight(_text_:in in 5668) [ClassicSimilarity], result of:
          0.012554439 = score(doc=5668,freq=8.0), product of:
            0.069613084 = queryWeight, product of:
              1.3602545 = idf(docFreq=30841, maxDocs=44218)
              0.051176514 = queryNorm
            0.18034597 = fieldWeight in 5668, product of:
              2.828427 = tf(freq=8.0), with freq of:
                8.0 = termFreq=8.0
              1.3602545 = idf(docFreq=30841, maxDocs=44218)
              0.046875 = fieldNorm(doc=5668)
      0.33333334 = coord(1/3)
    
    Abstract
    Es liegt in der Natur des Menschen, Erfahrungen und Ideen in Wort und Schrift mit anderen teilen zu wollen. So produzieren wir jeden Tag gigantische Mengen an Texten, die in digitaler Form geteilt und abgelegt werden. The Radicati Group schätzt, dass 2017 täglich 269 Milliarden E-Mails versendet und empfangen werden. Hinzu kommen größtenteils unstrukturierte Daten wie Soziale Medien, Presse, Websites und firmeninterne Systeme, beispielsweise in Form von CRM-Software oder PDF-Dokumenten. Der weltweite Bestand an unstrukturierten Daten wächst so rasant, dass es kaum möglich ist, seinen Umfang zu quantifizieren. Der Versuch, eine belastbare Zahl zu recherchieren, führt unweigerlich zu diversen Artikeln, die den Anteil unstrukturierter Texte am gesamten Datenbestand auf 80% schätzen. Auch wenn nicht mehr einwandfrei nachvollziehbar ist, woher diese Zahl stammt, kann bei kritischer Reflexion unseres Tagesablaufs kaum bezweifelt werden, dass diese Daten von großer wirtschaftlicher Relevanz sind.
  5. Derek Doran, D.; Gokhale, S.S.: ¬A classification framework for web robots (2012) 0.00
    0.00394548 = product of:
      0.011836439 = sum of:
        0.011836439 = weight(_text_:in in 505) [ClassicSimilarity], result of:
          0.011836439 = score(doc=505,freq=4.0), product of:
            0.069613084 = queryWeight, product of:
              1.3602545 = idf(docFreq=30841, maxDocs=44218)
              0.051176514 = queryNorm
            0.17003182 = fieldWeight in 505, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              1.3602545 = idf(docFreq=30841, maxDocs=44218)
              0.0625 = fieldNorm(doc=505)
      0.33333334 = coord(1/3)
    
    Abstract
    The behavior of modern web robots varies widely when they crawl for different purposes. In this article, we present a framework to classify these web robots from two orthogonal perspectives, namely, their functionality and the types of resources they consume. Applying the classification framework to a year-long access log from the UConn SoE web server, we present trends that point to significant differences in their crawling behavior.
  6. Winterhalter, C.: Licence to mine : ein Überblick über Rahmenbedingungen von Text and Data Mining und den aktuellen Stand der Diskussion (2016) 0.00
    0.00394548 = product of:
      0.011836439 = sum of:
        0.011836439 = weight(_text_:in in 673) [ClassicSimilarity], result of:
          0.011836439 = score(doc=673,freq=4.0), product of:
            0.069613084 = queryWeight, product of:
              1.3602545 = idf(docFreq=30841, maxDocs=44218)
              0.051176514 = queryNorm
            0.17003182 = fieldWeight in 673, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              1.3602545 = idf(docFreq=30841, maxDocs=44218)
              0.0625 = fieldNorm(doc=673)
      0.33333334 = coord(1/3)
    
    Abstract
    Der Artikel gibt einen Überblick über die Möglichkeiten der Anwendung von Text and Data Mining (TDM) und ähnlichen Verfahren auf der Grundlage bestehender Regelungen in Lizenzverträgen zu kostenpflichtigen elektronischen Ressourcen, die Debatte über zusätzliche Lizenzen für TDM am Beispiel von Elseviers TDM Policy und den Stand der Diskussion über die Einführung von Schrankenregelungen im Urheberrecht für TDM zu nichtkommerziellen wissenschaftlichen Zwecken.
    Content
    Beitrag in einem Themenschwerpunkt 'Computerlinguistik und Bibliotheken'. Vgl.: http://0277.ch/ojs/index.php/cdrs_0277/article/view/153/350.
  7. Suakkaphong, N.; Zhang, Z.; Chen, H.: Disease named entity recognition using semisupervised learning and conditional random fields (2011) 0.00
    0.0038989696 = product of:
      0.011696909 = sum of:
        0.011696909 = weight(_text_:in in 4367) [ClassicSimilarity], result of:
          0.011696909 = score(doc=4367,freq=10.0), product of:
            0.069613084 = queryWeight, product of:
              1.3602545 = idf(docFreq=30841, maxDocs=44218)
              0.051176514 = queryNorm
            0.16802745 = fieldWeight in 4367, product of:
              3.1622777 = tf(freq=10.0), with freq of:
                10.0 = termFreq=10.0
              1.3602545 = idf(docFreq=30841, maxDocs=44218)
              0.0390625 = fieldNorm(doc=4367)
      0.33333334 = coord(1/3)
    
    Abstract
    Information extraction is an important text-mining task that aims at extracting prespecified types of information from large text collections and making them available in structured representations such as databases. In the biomedical domain, information extraction can be applied to help biologists make the most use of their digital-literature archives. Currently, there are large amounts of biomedical literature that contain rich information about biomedical substances. Extracting such knowledge requires a good named entity recognition technique. In this article, we combine conditional random fields (CRFs), a state-of-the-art sequence-labeling algorithm, with two semisupervised learning techniques, bootstrapping and feature sampling, to recognize disease names from biomedical literature. Two data-processing strategies for each technique also were analyzed: one sequentially processing unlabeled data partitions and another one processing unlabeled data partitions in a round-robin fashion. The experimental results showed the advantage of semisupervised learning techniques given limited labeled training data. Specifically, CRFs with bootstrapping implemented in sequential fashion outperformed strictly supervised CRFs for disease name recognition. The project was supported by NIH/NLM Grant R33 LM07299-01, 2002-2005.
  8. Ayadi, H.; Torjmen-Khemakhem, M.; Daoud, M.; Huang, J.X.; Jemaa, M.B.: Mining correlations between medically dependent features and image retrieval models for query classification (2017) 0.00
    0.0038989696 = product of:
      0.011696909 = sum of:
        0.011696909 = weight(_text_:in in 3607) [ClassicSimilarity], result of:
          0.011696909 = score(doc=3607,freq=10.0), product of:
            0.069613084 = queryWeight, product of:
              1.3602545 = idf(docFreq=30841, maxDocs=44218)
              0.051176514 = queryNorm
            0.16802745 = fieldWeight in 3607, product of:
              3.1622777 = tf(freq=10.0), with freq of:
                10.0 = termFreq=10.0
              1.3602545 = idf(docFreq=30841, maxDocs=44218)
              0.0390625 = fieldNorm(doc=3607)
      0.33333334 = coord(1/3)
    
    Abstract
    The abundance of medical resources has encouraged the development of systems that allow for efficient searches of information in large medical image data sets. State-of-the-art image retrieval models are classified into three categories: content-based (visual) models, textual models, and combined models. Content-based models use visual features to answer image queries, textual image retrieval models use word matching to answer textual queries, and combined image retrieval models, use both textual and visual features to answer queries. Nevertheless, most of previous works in this field have used the same image retrieval model independently of the query type. In this article, we define a list of generic and specific medical query features and exploit them in an association rule mining technique to discover correlations between query features and image retrieval models. Based on these rules, we propose to use an associative classifier (NaiveClass) to find the best suitable retrieval model given a new textual query. We also propose a second associative classifier (SmartClass) to select the most appropriate default class for the query. Experiments are performed on Medical ImageCLEF queries from 2008 to 2012 to evaluate the impact of the proposed query features on the classification performance. The results show that combining our proposed specific and generic query features is effective in query classification.
  9. Tonkin, E.L.; Tourte, G.J.L.: Working with text. tools, techniques and approaches for text mining (2016) 0.00
    0.0038989696 = product of:
      0.011696909 = sum of:
        0.011696909 = weight(_text_:in in 4019) [ClassicSimilarity], result of:
          0.011696909 = score(doc=4019,freq=10.0), product of:
            0.069613084 = queryWeight, product of:
              1.3602545 = idf(docFreq=30841, maxDocs=44218)
              0.051176514 = queryNorm
            0.16802745 = fieldWeight in 4019, product of:
              3.1622777 = tf(freq=10.0), with freq of:
                10.0 = termFreq=10.0
              1.3602545 = idf(docFreq=30841, maxDocs=44218)
              0.0390625 = fieldNorm(doc=4019)
      0.33333334 = coord(1/3)
    
    Abstract
    What is text mining, and how can it be used? What relevance do these methods have to everyday work in information science and the digital humanities? How does one develop competences in text mining? Working with Text provides a series of cross-disciplinary perspectives on text mining and its applications. As text mining raises legal and ethical issues, the legal background of text mining and the responsibilities of the engineer are discussed in this book. Chapters provide an introduction to the use of the popular GATE text mining package with data drawn from social media, the use of text mining to support semantic search, the development of an authority system to support content tagging, and recent techniques in automatic language evaluation. Focused studies describe text mining on historical texts, automated indexing using constrained vocabularies, and the use of natural language processing to explore the climate science literature. Interviews are included that offer a glimpse into the real-life experience of working within commercial and academic text mining.
    Footnote
    Rez. in: JASIST 69(2018) no.1, S.181-184 (Jacques Savoy).
  10. Mining text data (2012) 0.00
    0.0036906586 = product of:
      0.011071975 = sum of:
        0.011071975 = weight(_text_:in in 362) [ClassicSimilarity], result of:
          0.011071975 = score(doc=362,freq=14.0), product of:
            0.069613084 = queryWeight, product of:
              1.3602545 = idf(docFreq=30841, maxDocs=44218)
              0.051176514 = queryNorm
            0.15905021 = fieldWeight in 362, product of:
              3.7416575 = tf(freq=14.0), with freq of:
                14.0 = termFreq=14.0
              1.3602545 = idf(docFreq=30841, maxDocs=44218)
              0.03125 = fieldNorm(doc=362)
      0.33333334 = coord(1/3)
    
    Abstract
    Text mining applications have experienced tremendous advances because of web 2.0 and social networking applications. Recent advances in hardware and software technology have lead to a number of unique scenarios where text mining algorithms are learned. Mining Text Data introduces an important niche in the text analytics field, and is an edited volume contributed by leading international researchers and practitioners focused on social networks & data mining. This book contains a wide swath in topics across social networks & data mining. Each chapter contains a comprehensive survey including the key research content on the topic, and the future directions of research in the field. There is a special focus on Text Embedded with Heterogeneous and Multimedia Data which makes the mining process much more challenging. A number of methods have been designed such as transfer learning and cross-lingual mining for such cases. Mining Text Data simplifies the content, so that advanced-level students, practitioners and researchers in computer science can benefit from this book. Academic and corporate libraries, as well as ACM, IEEE, and Management Science focused on information security, electronic commerce, databases, data mining, machine learning, and statistics are the primary buyers for this reference book.
    Content
    Inhalt: An Introduction to Text Mining.- Information Extraction from Text.- A Survey of Text Summarization Techniques.- A Survey of Text Clustering Algorithms.- Dimensionality Reduction and Topic Modeling.- A Survey of Text Classification Algorithms.- Transfer Learning for Text Mining.- Probabilistic Models for Text Mining.- Mining Text Streams.- Translingual Mining from Text Data.- Text Mining in Multimedia.- Text Analytics in Social Media.- A Survey of Opinion Mining and Sentiment Analysis.- Biomedical Text Mining: A Survey of Recent Progress.- Index.
  11. Berendt, B.; Krause, B.; Kolbe-Nusser, S.: Intelligent scientific authoring tools : interactive data mining for constructive uses of citation networks (2010) 0.00
    0.0036241547 = product of:
      0.010872464 = sum of:
        0.010872464 = weight(_text_:in in 4226) [ClassicSimilarity], result of:
          0.010872464 = score(doc=4226,freq=6.0), product of:
            0.069613084 = queryWeight, product of:
              1.3602545 = idf(docFreq=30841, maxDocs=44218)
              0.051176514 = queryNorm
            0.1561842 = fieldWeight in 4226, product of:
              2.4494898 = tf(freq=6.0), with freq of:
                6.0 = termFreq=6.0
              1.3602545 = idf(docFreq=30841, maxDocs=44218)
              0.046875 = fieldNorm(doc=4226)
      0.33333334 = coord(1/3)
    
    Abstract
    Many powerful methods and tools exist for extracting meaning from scientific publications, their texts, and their citation links. However, existing proposals often neglect a fundamental aspect of learning: that understanding and learning require an active and constructive exploration of a domain. In this paper, we describe a new method and a tool that use data mining and interactivity to turn the typical search and retrieve dialogue, in which the user asks questions and a system gives answers, into a dialogue that also involves sense-making, in which the user has to become active by constructing a bibliography and a domain model of the search term(s). This model starts from an automatically generated and annotated clustering solution that is iteratively modified by users. The tool is part of an integrated authoring system covering all phases from search through reading and sense-making to writing. Two evaluation studies demonstrate the usability of this interactive and constructive approach, and they show that clusters and groups represent identifiable sub-topics.
  12. Carter, D.; Sholler, D.: Data science on the ground : hype, criticism, and everyday work (2016) 0.00
    0.0036241547 = product of:
      0.010872464 = sum of:
        0.010872464 = weight(_text_:in in 3111) [ClassicSimilarity], result of:
          0.010872464 = score(doc=3111,freq=6.0), product of:
            0.069613084 = queryWeight, product of:
              1.3602545 = idf(docFreq=30841, maxDocs=44218)
              0.051176514 = queryNorm
            0.1561842 = fieldWeight in 3111, product of:
              2.4494898 = tf(freq=6.0), with freq of:
                6.0 = termFreq=6.0
              1.3602545 = idf(docFreq=30841, maxDocs=44218)
              0.046875 = fieldNorm(doc=3111)
      0.33333334 = coord(1/3)
    
    Abstract
    Modern organizations often employ data scientists to improve business processes using diverse sets of data. Researchers and practitioners have both touted the benefits and warned of the drawbacks associated with data science and big data approaches, but few studies investigate how data science is carried out "on the ground." In this paper, we first review the hype and criticisms surrounding data science and big data approaches. We then present the findings of semistructured interviews with 18 data analysts from various industries and organizational roles. Using qualitative coding techniques, we evaluated these interviews in light of the hype and criticisms surrounding data science in the popular discourse. We found that although the data analysts we interviewed were sensitive to both the allure and the potential pitfalls of data science, their motivations and evaluations of their work were more nuanced. We conclude by reflecting on the relationship between data analysts' work and the discourses around data science and big data, suggesting how future research can better account for the everyday practices of this profession.
  13. Drees, B.: Text und data mining : Herausforderungen und Möglichkeiten für Bibliotheken (2016) 0.00
    0.0036241547 = product of:
      0.010872464 = sum of:
        0.010872464 = weight(_text_:in in 3952) [ClassicSimilarity], result of:
          0.010872464 = score(doc=3952,freq=6.0), product of:
            0.069613084 = queryWeight, product of:
              1.3602545 = idf(docFreq=30841, maxDocs=44218)
              0.051176514 = queryNorm
            0.1561842 = fieldWeight in 3952, product of:
              2.4494898 = tf(freq=6.0), with freq of:
                6.0 = termFreq=6.0
              1.3602545 = idf(docFreq=30841, maxDocs=44218)
              0.046875 = fieldNorm(doc=3952)
      0.33333334 = coord(1/3)
    
    Abstract
    Text und Data Mining (TDM) gewinnt als wissenschaftliche Methode zunehmend an Bedeutung und stellt wissenschaftliche Bibliotheken damit vor neue Herausforderungen, bietet gleichzeitig aber auch neue Möglichkeiten. Der vorliegende Beitrag gibt einen Überblick über das Thema TDM aus bibliothekarischer Sicht. Hierzu wird der Begriff Text und Data Mining im Kontext verwandter Begriffe diskutiert sowie Ziele, Aufgaben und Methoden von TDM erläutert. Diese werden anhand beispielhafter TDM-Anwendungen in Wissenschaft und Forschung illustriert. Ferner werden technische und rechtliche Probleme und Hindernisse im TDM-Kontext dargelegt. Abschließend wird die Relevanz von TDM für Bibliotheken, sowohl in ihrer Rolle als Informationsvermittler und -anbieter als auch als Anwender von TDM-Methoden, aufgezeigt. Zudem wurde im Rahmen dieser Arbeit eine Befragung der Betreiber von Dokumentenservern an Bibliotheken in Deutschland zum aktuellen Umgang mit TDM durchgeführt, die zeigt, dass hier noch viel Ausbaupotential besteht. Die dem Artikel zugrunde liegenden Forschungsdaten sind unter dem DOI 10.11588/data/10090 publiziert.
  14. Mandl, T.: Text mining und data minig (2013) 0.00
    0.0034873444 = product of:
      0.010462033 = sum of:
        0.010462033 = weight(_text_:in in 713) [ClassicSimilarity], result of:
          0.010462033 = score(doc=713,freq=2.0), product of:
            0.069613084 = queryWeight, product of:
              1.3602545 = idf(docFreq=30841, maxDocs=44218)
              0.051176514 = queryNorm
            0.15028831 = fieldWeight in 713, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              1.3602545 = idf(docFreq=30841, maxDocs=44218)
              0.078125 = fieldNorm(doc=713)
      0.33333334 = coord(1/3)
    
    Source
    Grundlagen der praktischen Information und Dokumentation. Handbuch zur Einführung in die Informationswissenschaft und -praxis. 6., völlig neu gefaßte Ausgabe. Hrsg. von R. Kuhlen, W. Semar u. D. Strauch. Begründet von Klaus Laisiepen, Ernst Lutterbeck, Karl-Heinrich Meyer-Uhlenried
  15. Ma, Z.; Sun, A.; Cong, G.: On predicting the popularity of newly emerging hashtags in Twitter (2013) 0.00
    0.0034873444 = product of:
      0.010462033 = sum of:
        0.010462033 = weight(_text_:in in 967) [ClassicSimilarity], result of:
          0.010462033 = score(doc=967,freq=8.0), product of:
            0.069613084 = queryWeight, product of:
              1.3602545 = idf(docFreq=30841, maxDocs=44218)
              0.051176514 = queryNorm
            0.15028831 = fieldWeight in 967, product of:
              2.828427 = tf(freq=8.0), with freq of:
                8.0 = termFreq=8.0
              1.3602545 = idf(docFreq=30841, maxDocs=44218)
              0.0390625 = fieldNorm(doc=967)
      0.33333334 = coord(1/3)
    
    Abstract
    Because of Twitter's popularity and the viral nature of information dissemination on Twitter, predicting which Twitter topics will become popular in the near future becomes a task of considerable economic importance. Many Twitter topics are annotated by hashtags. In this article, we propose methods to predict the popularity of new hashtags on Twitter by formulating the problem as a classification task. We use five standard classification models (i.e., Naïve bayes, k-nearest neighbors, decision trees, support vector machines, and logistic regression) for prediction. The main challenge is the identification of effective features for describing new hashtags. We extract 7 content features from a hashtag string and the collection of tweets containing the hashtag and 11 contextual features from the social graph formed by users who have adopted the hashtag. We conducted experiments on a Twitter data set consisting of 31 million tweets from 2 million Singapore-based users. The experimental results show that the standard classifiers using the extracted features significantly outperform the baseline methods that do not use these features. Among the five classifiers, the logistic regression model performs the best in terms of the Micro-F1 measure. We also observe that contextual features are more effective than content features.
  16. Varathan, K.D.; Giachanou, A.; Crestani, F.: Comparative opinion mining : a review (2017) 0.00
    0.0034873444 = product of:
      0.010462033 = sum of:
        0.010462033 = weight(_text_:in in 3540) [ClassicSimilarity], result of:
          0.010462033 = score(doc=3540,freq=8.0), product of:
            0.069613084 = queryWeight, product of:
              1.3602545 = idf(docFreq=30841, maxDocs=44218)
              0.051176514 = queryNorm
            0.15028831 = fieldWeight in 3540, product of:
              2.828427 = tf(freq=8.0), with freq of:
                8.0 = termFreq=8.0
              1.3602545 = idf(docFreq=30841, maxDocs=44218)
              0.0390625 = fieldNorm(doc=3540)
      0.33333334 = coord(1/3)
    
    Abstract
    Opinion mining refers to the use of natural language processing, text analysis, and computational linguistics to identify and extract subjective information in textual material. Opinion mining, also known as sentiment analysis, has received a lot of attention in recent times, as it provides a number of tools to analyze public opinion on a number of different topics. Comparative opinion mining is a subfield of opinion mining which deals with identifying and extracting information that is expressed in a comparative form (e.g., "paper X is better than the Y"). Comparative opinion mining plays a very important role when one tries to evaluate something because it provides a reference point for the comparison. This paper provides a review of the area of comparative opinion mining. It is the first review that cover specifically this topic as all previous reviews dealt mostly with general opinion mining. This survey covers comparative opinion mining from two different angles. One from the perspective of techniques and the other from the perspective of comparative opinion elements. It also incorporates preprocessing tools as well as data set that were used by past researchers that can be useful to future researchers in the field of comparative opinion mining.
  17. Tu, Y.-N.; Hsu, S.-L.: Constructing conceptual trajectory maps to trace the development of research fields (2016) 0.00
    0.0030201292 = product of:
      0.0090603875 = sum of:
        0.0090603875 = weight(_text_:in in 3059) [ClassicSimilarity], result of:
          0.0090603875 = score(doc=3059,freq=6.0), product of:
            0.069613084 = queryWeight, product of:
              1.3602545 = idf(docFreq=30841, maxDocs=44218)
              0.051176514 = queryNorm
            0.1301535 = fieldWeight in 3059, product of:
              2.4494898 = tf(freq=6.0), with freq of:
                6.0 = termFreq=6.0
              1.3602545 = idf(docFreq=30841, maxDocs=44218)
              0.0390625 = fieldNorm(doc=3059)
      0.33333334 = coord(1/3)
    
    Abstract
    This study proposes a new method to construct and trace the trajectory of conceptual development of a research field by combining main path analysis, citation analysis, and text-mining techniques. Main path analysis, a method used commonly to trace the most critical path in a citation network, helps describe the developmental trajectory of a research field. This study extends the main path analysis method and applies text-mining techniques in the new method, which reflects the trajectory of conceptual development in an academic research field more accurately than citation frequency, which represents only the articles examined. Articles can be merged based on similarity of concepts, and by merging concepts the history of a research field can be described more precisely. The new method was applied to the "h-index" and "text mining" fields. The precision, recall, and F-measures of the h-index were 0.738, 0.652, and 0.658 and those of text-mining were 0.501, 0.653, and 0.551, respectively. Last, this study not only establishes the conceptual trajectory map of a research field, but also recommends keywords that are more precise than those used currently by researchers. These precise keywords could enable researchers to gather related works more quickly than before.
  18. Leydesdorff, L.; Persson, O.: Mapping the geography of science : distribution patterns and networks of relations among cities and institutes (2010) 0.00
    0.0029591098 = product of:
      0.0088773295 = sum of:
        0.0088773295 = weight(_text_:in in 3704) [ClassicSimilarity], result of:
          0.0088773295 = score(doc=3704,freq=4.0), product of:
            0.069613084 = queryWeight, product of:
              1.3602545 = idf(docFreq=30841, maxDocs=44218)
              0.051176514 = queryNorm
            0.12752387 = fieldWeight in 3704, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              1.3602545 = idf(docFreq=30841, maxDocs=44218)
              0.046875 = fieldNorm(doc=3704)
      0.33333334 = coord(1/3)
    
    Abstract
    Using Google Earth, Google Maps, and/or network visualization programs such as Pajek, one can overlay the network of relations among addresses in scientific publications onto the geographic map. The authors discuss the pros and cons of various options, and provide software (freeware) for bridging existing gaps between the Science Citation Indices (Thomson Reuters) and Scopus (Elsevier), on the one hand, and these various visualization tools on the other. At the level of city names, the global map can be drawn reliably on the basis of the available address information. At the level of the names of organizations and institutes, there are problems of unification both in the ISI databases and with Scopus. Pajek enables a combination of visualization and statistical analysis, whereas the Google Maps and its derivatives provide superior tools on the Internet.
  19. Berry, M.W.; Esau, R.; Kiefer, B.: ¬The use of text mining techniques in electronic discovery for legal matters (2012) 0.00
    0.0029591098 = product of:
      0.0088773295 = sum of:
        0.0088773295 = weight(_text_:in in 91) [ClassicSimilarity], result of:
          0.0088773295 = score(doc=91,freq=4.0), product of:
            0.069613084 = queryWeight, product of:
              1.3602545 = idf(docFreq=30841, maxDocs=44218)
              0.051176514 = queryNorm
            0.12752387 = fieldWeight in 91, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              1.3602545 = idf(docFreq=30841, maxDocs=44218)
              0.046875 = fieldNorm(doc=91)
      0.33333334 = coord(1/3)
    
    Abstract
    Electronic discovery (eDiscovery) is the process of collecting and analyzing electronic documents to determine their relevance to a legal matter. Office technology has advanced and eased the requirements necessary to create a document. As such, the volume of data has outgrown the manual processes previously used to make relevance judgments. Methods of text mining and information retrieval have been put to use in eDiscovery to help tame the volume of data; however, the results have been uneven. This chapter looks at the historical bias of the collection process. The authors examine how tools like classifiers, latent semantic analysis, and non-negative matrix factorization deal with nuances of the collection process.
  20. Mohr, J.W.; Bogdanov, P.: Topic models : what they are and why they matter (2013) 0.00
    0.0029591098 = product of:
      0.0088773295 = sum of:
        0.0088773295 = weight(_text_:in in 1142) [ClassicSimilarity], result of:
          0.0088773295 = score(doc=1142,freq=4.0), product of:
            0.069613084 = queryWeight, product of:
              1.3602545 = idf(docFreq=30841, maxDocs=44218)
              0.051176514 = queryNorm
            0.12752387 = fieldWeight in 1142, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              1.3602545 = idf(docFreq=30841, maxDocs=44218)
              0.046875 = fieldNorm(doc=1142)
      0.33333334 = coord(1/3)
    
    Abstract
    We provide a brief, non-technical introduction to the text mining methodology known as "topic modeling." We summarize the theory and background of the method and discuss what kinds of things are found by topic models. Using a text corpus comprised of the eight articles from the special issue of Poetics on the subject of topic models, we run a topic model on these articles, both as a way to introduce the methodology and also to help summarize some of the ways in which social and cultural scientists are using topic models. We review some of the critiques and debates over the use of the method and finally, we link these developments back to some of the original innovations in the field of content analysis that were pioneered by Harold D. Lasswell and colleagues during and just after World War II.

Languages

  • e 45
  • d 7

Types

  • a 49
  • el 11
  • m 2
  • p 1
  • s 1
  • More… Less…