Search (683 results, page 2 of 35)

Chen, Y.-L.; Liu, Y.-H.; Ho, W.-L.: A text mining approach to assist the general public in the retrieval of legal documents (2013) 0.05
```
0.053462073 = product of:
  0.10692415 = sum of:
    0.10692415 = product of:
      0.2138483 = sum of:
        0.2138483 = weight(_text_:mining in 521) [ClassicSimilarity], result of:
          0.2138483 = score(doc=521,freq=8.0), product of:
            0.28585905 = queryWeight, product of:
              5.642448 = idf(docFreq=425, maxDocs=44218)
              0.05066224 = queryNorm
            0.74808997 = fieldWeight in 521, product of:
              2.828427 = tf(freq=8.0), with freq of:
                8.0 = termFreq=8.0
              5.642448 = idf(docFreq=425, maxDocs=44218)
              0.046875 = fieldNorm(doc=521)
      0.5 = coord(1/2)
  0.5 = coord(1/2)
```
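The arithmetic in the explain tree above can be re-derived by hand. The following is a minimal sketch, assuming the classic TF-IDF formulas `tf = sqrt(freq)` and `idf = 1 + ln(maxDocs / (docFreq + 1))`, which reproduce the figures shown here; `queryNorm`, `fieldNorm`, and the two `coord` factors are copied from the listing rather than recomputed, and the function names are illustrative only.

```python
import math


def classic_tf(freq: float) -> float:
    """Term-frequency factor: square root of the raw term frequency."""
    return math.sqrt(freq)


def classic_idf(doc_freq: int, max_docs: int) -> float:
    """Inverse document frequency as it appears to be used in the listing (assumption)."""
    return 1.0 + math.log(max_docs / (doc_freq + 1))


def term_score(freq, doc_freq, max_docs, query_norm, field_norm):
    """score = queryWeight * fieldWeight for a single query term."""
    idf = classic_idf(doc_freq, max_docs)
    query_weight = idf * query_norm                     # 0.28585905 in the listing
    field_weight = classic_tf(freq) * idf * field_norm  # 0.74808997 in the listing
    return query_weight * field_weight


# Values copied from the explain tree for doc 521.
score = term_score(freq=8.0, doc_freq=425, max_docs=44218,
                   query_norm=0.05066224, field_norm=0.046875)

# The two coord(1/2) factors scale the term score down to the displayed value.
final = score * 0.5 * 0.5

print(f"term score  = {score:.7f}")
print(f"final score = {final:.7f}")
```

Small differences in the last digits are expected, because the search engine works in single precision while Python uses double precision.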
Abstract

Applying text mining techniques to legal issues has been an emerging research topic in recent years. Although some previous studies focused on assisting professionals in the retrieval of related legal documents, they did not take into account the general public and their difficulty in describing legal problems in professional legal terms. Because this problem has not been addressed by previous research, this study aims to design a text-mining-based method that allows the general public to use everyday vocabulary to search for and retrieve criminal judgments. The experimental results indicate that our method can help the general public, who are not familiar with professional legal terms, to acquire relevant criminal judgments more accurately and effectively.

Theme

Data Mining

Qiu, X.Y.; Srinivasan, P.; Hu, Y.: Supervised learning models to predict firm performance with annual reports : an empirical study (2014) 0.05
```
0.053462073 = product of:
  0.10692415 = sum of:
    0.10692415 = product of:
      0.2138483 = sum of:
        0.2138483 = weight(_text_:mining in 1205) [ClassicSimilarity], result of:
          0.2138483 = score(doc=1205,freq=8.0), product of:
            0.28585905 = queryWeight, product of:
              5.642448 = idf(docFreq=425, maxDocs=44218)
              0.05066224 = queryNorm
            0.74808997 = fieldWeight in 1205, product of:
              2.828427 = tf(freq=8.0), with freq of:
                8.0 = termFreq=8.0
              5.642448 = idf(docFreq=425, maxDocs=44218)
              0.046875 = fieldNorm(doc=1205)
      0.5 = coord(1/2)
  0.5 = coord(1/2)
```
Abstract

Text mining and machine learning methodologies have been applied toward knowledge discovery in several domains, such as biomedicine and business. Interestingly, in the business domain, the text mining and machine learning community has minimally explored company annual reports with their mandatory disclosures. In this study, we explore the question "How can annual reports be used to predict change in company performance from one year to the next?" from a text mining perspective. Our article contributes a systematic study of the potential of company mandatory disclosures using a computational viewpoint in the following aspects: (a) We characterize our research problem along distinct dimensions to gain a reasonably comprehensive understanding of the capacity of supervised learning methods in predicting change in company performance using annual reports, and (b) our findings from unbiased systematic experiments provide further evidence about the economic incentives faced by analysts in their stock recommendations and speculations on analysts having access to more information in producing earnings forecast.

Theme

Data Mining

Drees, B.: Text und data mining : Herausforderungen und Möglichkeiten für Bibliotheken (2016) 0.05
```
0.053462073 = product of:
  0.10692415 = sum of:
    0.10692415 = product of:
      0.2138483 = sum of:
        0.2138483 = weight(_text_:mining in 3952) [ClassicSimilarity], result of:
          0.2138483 = score(doc=3952,freq=8.0), product of:
            0.28585905 = queryWeight, product of:
              5.642448 = idf(docFreq=425, maxDocs=44218)
              0.05066224 = queryNorm
            0.74808997 = fieldWeight in 3952, product of:
              2.828427 = tf(freq=8.0), with freq of:
                8.0 = termFreq=8.0
              5.642448 = idf(docFreq=425, maxDocs=44218)
              0.046875 = fieldNorm(doc=3952)
      0.5 = coord(1/2)
  0.5 = coord(1/2)
```
Abstract

Text and data mining (TDM) is gaining importance as a scientific method, which confronts academic libraries with new challenges but also offers new opportunities. This article gives an overview of TDM from a library perspective. To this end, the term text and data mining is discussed in the context of related terms, and the goals, tasks, and methods of TDM are explained. These are illustrated with example TDM applications in science and research. Technical and legal problems and obstacles in the TDM context are also presented. Finally, the relevance of TDM for libraries is shown, both in their role as information intermediaries and providers and as users of TDM methods. In addition, a survey of the operators of document servers at libraries in Germany was conducted as part of this work on how they currently handle TDM; it shows that there is still considerable room for expansion. The research data underlying the article are published under the DOI 10.11588/data/10090.

Theme

Data Mining

Chardonnens, A.; Hengchen, S.: Text mining for cultural heritage institutions : a 5-step method for cultural heritage institutions (2017) 0.05

```
0.050404526 = product of:
  0.10080905 = sum of:
    0.10080905 = product of:
      0.2016181 = sum of:
        0.2016181 = weight(_text_:mining in 646) [ClassicSimilarity], result of:
          0.2016181 = score(doc=646,freq=4.0), product of:
            0.28585905 = queryWeight, product of:
              5.642448 = idf(docFreq=425, maxDocs=44218)
              0.05066224 = queryNorm
            0.705306 = fieldWeight in 646, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              5.642448 = idf(docFreq=425, maxDocs=44218)
              0.0625 = fieldNorm(doc=646)
      0.5 = coord(1/2)
  0.5 = coord(1/2)
```

Theme: Data Mining

Tu, Y.-N.; Hsu, S.-L.: Constructing conceptual trajectory maps to trace the development of research fields (2016) 0.05
```
0.049810346 = product of:
  0.09962069 = sum of:
    0.09962069 = product of:
      0.19924138 = sum of:
        0.19924138 = weight(_text_:mining in 3059) [ClassicSimilarity], result of:
          0.19924138 = score(doc=3059,freq=10.0), product of:
            0.28585905 = queryWeight, product of:
              5.642448 = idf(docFreq=425, maxDocs=44218)
              0.05066224 = queryNorm
            0.6969917 = fieldWeight in 3059, product of:
              3.1622777 = tf(freq=10.0), with freq of:
                10.0 = termFreq=10.0
              5.642448 = idf(docFreq=425, maxDocs=44218)
              0.0390625 = fieldNorm(doc=3059)
      0.5 = coord(1/2)
  0.5 = coord(1/2)
```
Abstract

This study proposes a new method to construct and trace the trajectory of conceptual development of a research field by combining main path analysis, citation analysis, and text-mining techniques. Main path analysis, a method used commonly to trace the most critical path in a citation network, helps describe the developmental trajectory of a research field. This study extends the main path analysis method and applies text-mining techniques in the new method, which reflects the trajectory of conceptual development in an academic research field more accurately than citation frequency, which represents only the articles examined. Articles can be merged based on similarity of concepts, and by merging concepts the history of a research field can be described more precisely. The new method was applied to the "h-index" and "text mining" fields. The precision, recall, and F-measures of the h-index were 0.738, 0.652, and 0.658 and those of text-mining were 0.501, 0.653, and 0.551, respectively. Last, this study not only establishes the conceptual trajectory map of a research field, but also recommends keywords that are more precise than those used currently by researchers. These precise keywords could enable researchers to gather related works more quickly than before.

Theme

Data Mining

Calvanese, D.; Kalayci, T.E.; Montali, M.; Santoso, A.: OBDA for log extraction in process mining (2017) 0.05
```
0.049810346 = product of:
  0.09962069 = sum of:
    0.09962069 = product of:
      0.19924138 = sum of:
        0.19924138 = weight(_text_:mining in 3931) [ClassicSimilarity], result of:
          0.19924138 = score(doc=3931,freq=10.0), product of:
            0.28585905 = queryWeight, product of:
              5.642448 = idf(docFreq=425, maxDocs=44218)
              0.05066224 = queryNorm
            0.6969917 = fieldWeight in 3931, product of:
              3.1622777 = tf(freq=10.0), with freq of:
                10.0 = termFreq=10.0
              5.642448 = idf(docFreq=425, maxDocs=44218)
              0.0390625 = fieldNorm(doc=3931)
      0.5 = coord(1/2)
  0.5 = coord(1/2)
```
Abstract

Process mining is an emerging area that synergically combines model-based and data-oriented analysis techniques to obtain useful insights on how business processes are executed within an organization. Through process mining, decision makers can discover process models from data, compare expected and actual behaviors, and enrich models with key information about their actual execution. To be applicable, process mining techniques require the input data to be explicitly structured in the form of an event log, which lists when and by whom different case objects (i.e., process instances) have been subject to the execution of tasks. Unfortunately, in many real world set-ups, such event logs are not explicitly given, but are instead implicitly represented in legacy information systems. To apply process mining in this widespread setting, there is a pressing need for techniques able to support various process stakeholders in data preparation and log extraction from legacy information systems. The purpose of this paper is to single out this challenging, open issue, and didactically introduce how techniques from intelligent data management, and in particular ontology-based data access, provide a viable solution with a solid theoretical basis.

Jäger, L.: Von Big Data zu Big Brother (2018) 0.05

```
0.049369447 = product of:
  0.098738894 = sum of:
    0.098738894 = sum of:
      0.07128276 = weight(_text_:mining in 5234) [ClassicSimilarity], result of:
        0.07128276 = score(doc=5234,freq=2.0), product of:
          0.28585905 = queryWeight, product of:
            5.642448 = idf(docFreq=425, maxDocs=44218)
            0.05066224 = queryNorm
          0.24936332 = fieldWeight in 5234, product of:
            1.4142135 = tf(freq=2.0), with freq of:
              2.0 = termFreq=2.0
            5.642448 = idf(docFreq=425, maxDocs=44218)
            0.03125 = fieldNorm(doc=5234)
      0.027456136 = weight(_text_:22 in 5234) [ClassicSimilarity], result of:
        0.027456136 = score(doc=5234,freq=2.0), product of:
          0.17741053 = queryWeight, product of:
            3.5018296 = idf(docFreq=3622, maxDocs=44218)
            0.05066224 = queryNorm
          0.15476047 = fieldWeight in 5234, product of:
            1.4142135 = tf(freq=2.0), with freq of:
              2.0 = termFreq=2.0
            3.5018296 = idf(docFreq=3622, maxDocs=44218)
            0.03125 = fieldNorm(doc=5234)
  0.5 = coord(1/2)
```
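In this entry two terms matched (`mining` and `22`), so the per-term weights are added before the outer `coord(1/2)` factor is applied. A minimal sketch of that sum follows, under the same assumed formulas (`tf = sqrt(freq)`, `idf = 1 + ln(maxDocs / (docFreq + 1))`); the helper name `term_weight` and the constants' names are illustrative, and all input values are copied from the explain tree for doc 5234.

```python
import math


def term_weight(freq, doc_freq, max_docs, query_norm, field_norm):
    """queryWeight * fieldWeight for one matching term (assumed classic TF-IDF)."""
    idf = 1.0 + math.log(max_docs / (doc_freq + 1))
    query_weight = idf * query_norm
    field_weight = math.sqrt(freq) * idf * field_norm
    return query_weight * field_weight


# Constants copied from the explain tree for doc 5234.
MAX_DOCS = 44218
QUERY_NORM = 0.05066224
FIELD_NORM = 0.03125

# Both terms occur twice in the document; they differ only in document frequency.
w_mining = term_weight(2.0, 425, MAX_DOCS, QUERY_NORM, FIELD_NORM)
w_22 = term_weight(2.0, 3622, MAX_DOCS, QUERY_NORM, FIELD_NORM)

# Sum the term weights, then apply the outer coord(1/2) factor from the listing.
total = (w_mining + w_22) * 0.5

print(f"mining = {w_mining:.7f}, 22 = {w_22:.7f}, total = {total:.7f}")
```

The much more common term `22` (docFreq=3622) has a lower idf and therefore contributes less than `mining`, even though both have the same term frequency.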

Date: 22.1.2018 11:33:49
Theme: Data Mining

Liu, X.; Yu, S.; Janssens, F.; Glänzel, W.; Moreau, Y.; Moor, B.de: Weighted hybrid clustering by combining text mining and bibliometrics on a large-scale journal database (2010) 0.05
```
0.046299513 = product of:
  0.09259903 = sum of:
    0.09259903 = product of:
      0.18519805 = sum of:
        0.18519805 = weight(_text_:mining in 3464) [ClassicSimilarity], result of:
          0.18519805 = score(doc=3464,freq=6.0), product of:
            0.28585905 = queryWeight, product of:
              5.642448 = idf(docFreq=425, maxDocs=44218)
              0.05066224 = queryNorm
            0.64786494 = fieldWeight in 3464, product of:
              2.4494898 = tf(freq=6.0), with freq of:
                6.0 = termFreq=6.0
              5.642448 = idf(docFreq=425, maxDocs=44218)
              0.046875 = fieldNorm(doc=3464)
      0.5 = coord(1/2)
  0.5 = coord(1/2)
```
Abstract

We propose a new hybrid clustering framework to incorporate text mining with bibliometrics in journal set analysis. The framework integrates two different approaches: clustering ensemble and kernel-fusion clustering. To improve the flexibility and the efficiency of processing large-scale data, we propose an information-based weighting scheme to leverage the effect of multiple data sources in hybrid clustering. Three different algorithms are extended by the proposed weighting scheme and they are employed on a large journal set retrieved from the Web of Science (WoS) database. The clustering performance of the proposed algorithms is systematically evaluated using multiple evaluation methods, and they were cross-compared with alternative methods. Experimental results demonstrate that the proposed weighted hybrid clustering strategy is superior to other methods in clustering performance and efficiency. The proposed approach also provides a more refined structural mapping of journal sets, which is useful for monitoring and detecting new trends in different scientific fields.

Theme

Data Mining

Berendt, B.; Krause, B.; Kolbe-Nusser, S.: Intelligent scientific authoring tools : interactive data mining for constructive uses of citation networks (2010) 0.05
```
0.046299513 = product of:
  0.09259903 = sum of:
    0.09259903 = product of:
      0.18519805 = sum of:
        0.18519805 = weight(_text_:mining in 4226) [ClassicSimilarity], result of:
          0.18519805 = score(doc=4226,freq=6.0), product of:
            0.28585905 = queryWeight, product of:
              5.642448 = idf(docFreq=425, maxDocs=44218)
              0.05066224 = queryNorm
            0.64786494 = fieldWeight in 4226, product of:
              2.4494898 = tf(freq=6.0), with freq of:
                6.0 = termFreq=6.0
              5.642448 = idf(docFreq=425, maxDocs=44218)
              0.046875 = fieldNorm(doc=4226)
      0.5 = coord(1/2)
  0.5 = coord(1/2)
```
Abstract

Many powerful methods and tools exist for extracting meaning from scientific publications, their texts, and their citation links. However, existing proposals often neglect a fundamental aspect of learning: that understanding and learning require an active and constructive exploration of a domain. In this paper, we describe a new method and a tool that use data mining and interactivity to turn the typical search and retrieve dialogue, in which the user asks questions and a system gives answers, into a dialogue that also involves sense-making, in which the user has to become active by constructing a bibliography and a domain model of the search term(s). This model starts from an automatically generated and annotated clustering solution that is iteratively modified by users. The tool is part of an integrated authoring system covering all phases from search through reading and sense-making to writing. Two evaluation studies demonstrate the usability of this interactive and constructive approach, and they show that clusters and groups represent identifiable sub-topics.

Theme

Data Mining

Berry, M.W.; Esau, R.; Kiefer, B.: The use of text mining techniques in electronic discovery for legal matters (2012) 0.05
```
0.046299513 = product of:
  0.09259903 = sum of:
    0.09259903 = product of:
      0.18519805 = sum of:
        0.18519805 = weight(_text_:mining in 91) [ClassicSimilarity], result of:
          0.18519805 = score(doc=91,freq=6.0), product of:
            0.28585905 = queryWeight, product of:
              5.642448 = idf(docFreq=425, maxDocs=44218)
              0.05066224 = queryNorm
            0.64786494 = fieldWeight in 91, product of:
              2.4494898 = tf(freq=6.0), with freq of:
                6.0 = termFreq=6.0
              5.642448 = idf(docFreq=425, maxDocs=44218)
              0.046875 = fieldNorm(doc=91)
      0.5 = coord(1/2)
  0.5 = coord(1/2)
```
Abstract

Electronic discovery (eDiscovery) is the process of collecting and analyzing electronic documents to determine their relevance to a legal matter. Office technology has advanced and eased the requirements necessary to create a document. As such, the volume of data has outgrown the manual processes previously used to make relevance judgments. Methods of text mining and information retrieval have been put to use in eDiscovery to help tame the volume of data; however, the results have been uneven. This chapter looks at the historical bias of the collection process. The authors examine how tools like classifiers, latent semantic analysis, and non-negative matrix factorization deal with nuances of the collection process.

Theme

Data Mining

Narock, T.; Zhou, L.; Yoon, V.: Semantic similarity of ontology instances using polarity mining (2013) 0.05
```
0.046299513 = product of:
  0.09259903 = sum of:
    0.09259903 = product of:
      0.18519805 = sum of:
        0.18519805 = weight(_text_:mining in 620) [ClassicSimilarity], result of:
          0.18519805 = score(doc=620,freq=6.0), product of:
            0.28585905 = queryWeight, product of:
              5.642448 = idf(docFreq=425, maxDocs=44218)
              0.05066224 = queryNorm
            0.64786494 = fieldWeight in 620, product of:
              2.4494898 = tf(freq=6.0), with freq of:
                6.0 = termFreq=6.0
              5.642448 = idf(docFreq=425, maxDocs=44218)
              0.046875 = fieldNorm(doc=620)
      0.5 = coord(1/2)
  0.5 = coord(1/2)
```
Abstract

Semantic similarity is vital to many areas, such as information retrieval. Various methods have been proposed with a focus on comparing unstructured text documents. Several of these have been enhanced with ontology; however, they have not been applied to ontology instances. With the growth in ontology instance data published online through, for example, Linked Open Data, there is an increasing need to apply semantic similarity to ontology instances. Drawing on ontology-supported polarity mining (OSPM), we propose an algorithm that enhances the computation of semantic similarity with polarity mining techniques. The algorithm is evaluated with online customer review data. The experimental results show that the proposed algorithm outperforms the baseline algorithm in multiple settings.

Teich, E.; Degaetano-Ortlieb, S.; Fankhauser, P.; Kermes, H.; Lapshinova-Koltunski, E.: The linguistic construal of disciplinarity : a data-mining approach using register features (2016) 0.05
```
0.046299513 = product of:
  0.09259903 = sum of:
    0.09259903 = product of:
      0.18519805 = sum of:
        0.18519805 = weight(_text_:mining in 3015) [ClassicSimilarity], result of:
          0.18519805 = score(doc=3015,freq=6.0), product of:
            0.28585905 = queryWeight, product of:
              5.642448 = idf(docFreq=425, maxDocs=44218)
              0.05066224 = queryNorm
            0.64786494 = fieldWeight in 3015, product of:
              2.4494898 = tf(freq=6.0), with freq of:
                6.0 = termFreq=6.0
              5.642448 = idf(docFreq=425, maxDocs=44218)
              0.046875 = fieldNorm(doc=3015)
      0.5 = coord(1/2)
  0.5 = coord(1/2)
```
Abstract

We analyze the linguistic evolution of selected scientific disciplines over a 30-year time span (1970s to 2000s). Our focus is on four highly specialized disciplines at the boundaries of computer science that emerged during that time: computational linguistics, bioinformatics, digital construction, and microelectronics. Our analysis is driven by the question of whether these disciplines develop a distinctive language use, both individually and collectively, over the given time period. The data set is the English Scientific Text Corpus (scitex), which includes texts from the 1970s/1980s and early 2000s. Our theoretical basis is register theory. In terms of methods, we combine corpus-based methods of feature extraction (various aggregated features [part-of-speech based], n-grams, lexico-grammatical patterns) and automatic text classification. The results of our research are directly relevant to the study of linguistic variation and languages for specific purposes (LSP) and have implications for various natural language processing (NLP) tasks, for example, authorship attribution, text mining, or training NLP tools.

Theme

Data Mining

McArthur, D.; Crompton, H.: Understanding public-access cyberlearning projects using text mining and topic analysis (2012) 0.04
```
0.044103958 = product of:
  0.088207915 = sum of:
    0.088207915 = product of:
      0.17641583 = sum of:
        0.17641583 = weight(_text_:mining in 504) [ClassicSimilarity], result of:
          0.17641583 = score(doc=504,freq=4.0), product of:
            0.28585905 = queryWeight, product of:
              5.642448 = idf(docFreq=425, maxDocs=44218)
              0.05066224 = queryNorm
            0.61714274 = fieldWeight in 504, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              5.642448 = idf(docFreq=425, maxDocs=44218)
              0.0546875 = fieldNorm(doc=504)
      0.5 = coord(1/2)
  0.5 = coord(1/2)
```
Abstract

The federal government has encouraged open access to publicly funded federal science research results, but it is unclear what knowledge can be gleaned from them and how the knowledge can be used to improve scientific research and shape federal research policies. In this article, we present the results of a preliminary study of cyberlearning projects funded by the National Science Foundation (NSF) that address these issues. Our work demonstrates that text-mining tools can be used to partially automate the process of finding NSF's cyberlearning awards and characterizing the fine-grained topics implicit in award abstracts. The methodology we have established to assess NSF's cyberlearning investments should generalize to other areas of research and other repositories of public-access documents.

Zeng, Q.; Yu, M.; Yu, W.; Xiong, J.; Shi, Y.; Jiang, M.: Faceted hierarchy : a new graph type to organize scientific concepts and a construction method (2019) 0.04

```
0.04023253 = product of:
  0.08046506 = sum of:
    0.08046506 = product of:
      0.24139518 = sum of:
        0.24139518 = weight(_text_:3a in 400) [ClassicSimilarity], result of:
          0.24139518 = score(doc=400,freq=2.0), product of:
            0.429515 = queryWeight, product of:
              8.478011 = idf(docFreq=24, maxDocs=44218)
              0.05066224 = queryNorm
            0.56201804 = fieldWeight in 400, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              8.478011 = idf(docFreq=24, maxDocs=44218)
              0.046875 = fieldNorm(doc=400)
      0.33333334 = coord(1/3)
  0.5 = coord(1/2)
```

Content: Cf.: https%3A%2F%2Faclanthology.org%2FD19-5317.pdf&usg=AOvVaw0ZZFyq5wWTtNTvNkrvjlGA.

Suchenwirth, L.: Sacherschliessung in Zeiten von Corona : neue Herausforderungen und Chancen (2019) 0.04

```
0.04023253 = product of:
  0.08046506 = sum of:
    0.08046506 = product of:
      0.24139518 = sum of:
        0.24139518 = weight(_text_:3a in 484) [ClassicSimilarity], result of:
          0.24139518 = score(doc=484,freq=2.0), product of:
            0.429515 = queryWeight, product of:
              8.478011 = idf(docFreq=24, maxDocs=44218)
              0.05066224 = queryNorm
            0.56201804 = fieldWeight in 484, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              8.478011 = idf(docFreq=24, maxDocs=44218)
              0.046875 = fieldNorm(doc=484)
      0.33333334 = coord(1/3)
  0.5 = coord(1/2)
```

Footnote: https%3A%2F%2Fjournals.univie.ac.at%2Findex.php%2Fvoebm%2Farticle%2Fdownload%2F5332%2F5271%2F&usg=AOvVaw2yQdFGHlmOwVls7ANCpTii.

Djioua, B.; Desclés, J.-P.; Alrahabi, M.: Searching and mining with semantic categories (2012) 0.04
```
0.038582932 = product of:
  0.077165864 = sum of:
    0.077165864 = product of:
      0.15433173 = sum of:
        0.15433173 = weight(_text_:mining in 99) [ClassicSimilarity], result of:
          0.15433173 = score(doc=99,freq=6.0), product of:
            0.28585905 = queryWeight, product of:
              5.642448 = idf(docFreq=425, maxDocs=44218)
              0.05066224 = queryNorm
            0.5398875 = fieldWeight in 99, product of:
              2.4494898 = tf(freq=6.0), with freq of:
                6.0 = termFreq=6.0
              5.642448 = idf(docFreq=425, maxDocs=44218)
              0.0390625 = fieldNorm(doc=99)
      0.5 = coord(1/2)
  0.5 = coord(1/2)
```
Abstract

A new model is proposed to retrieve information by automatically building a semantic metatext structure for texts that allows searching and extracting discourse and semantic information according to certain linguistic categorizations. This paper presents approaches for searching and mining full text with semantic categories. The model is built up from two engines: The first one, called EXCOM (Djioua et al., 2006; Alrahabi, 2010), is an automatic system for text annotation, related to discourse and semantic maps, which are specifications of general linguistic ontologies founded on the Applicative and Cognitive Grammar. The annotation layer uses a linguistic method called Contextual Exploration, which handles the polysemic values of a term in texts. Several 'semantic maps' underlying 'points of view' for text mining guide this automatic annotation process. The second engine uses the semantically annotated texts produced previously to create a semantic inverted index, which is able to retrieve relevant documents for queries associated with discourse and semantic categories such as definition, quotation, causality, relations between concepts, etc. (Djioua & Desclés, 2007). This semantic indexation process builds a metatext layer for textual contents. Some data and linguistic rule sets, as well as the general architecture that extends third-party software, are expressed as supplementary information.

Ye, Z.; Huang, J.X.; He, B.; Lin, H.: Mining a multilingual association dictionary from Wikipedia for cross-language information retrieval (2012) 0.04
```
0.038582932 = product of:
  0.077165864 = sum of:
    0.077165864 = product of:
      0.15433173 = sum of:
        0.15433173 = weight(_text_:mining in 513) [ClassicSimilarity], result of:
          0.15433173 = score(doc=513,freq=6.0), product of:
            0.28585905 = queryWeight, product of:
              5.642448 = idf(docFreq=425, maxDocs=44218)
              0.05066224 = queryNorm
            0.5398875 = fieldWeight in 513, product of:
              2.4494898 = tf(freq=6.0), with freq of:
                6.0 = termFreq=6.0
              5.642448 = idf(docFreq=425, maxDocs=44218)
              0.0390625 = fieldNorm(doc=513)
      0.5 = coord(1/2)
  0.5 = coord(1/2)
```
Abstract

Wikipedia is characterized by its dense link structure and a large number of articles in different languages, which make it a notable Web corpus for knowledge extraction and mining, in particular for mining the multilingual associations. In this paper, motivated by a psychological theory of word meaning, we propose a graph-based approach to constructing a cross-language association dictionary (CLAD) from Wikipedia, which can be used in a variety of cross-language accessing and processing applications. In order to evaluate the quality of the mined CLAD, and to demonstrate how the mined CLAD can be used in practice, we explore two different applications of the mined CLAD to cross-language information retrieval (CLIR). First, we use the mined CLAD to conduct cross-language query expansion; and, second, we use it to filter out translation candidates with low translation probabilities. Experimental results on a variety of standard CLIR test collections show that the CLIR retrieval performance can be substantially improved with the above two applications of CLAD, which indicates that the mined CLAD is of sound quality.

Ayadi, H.; Torjmen-Khemakhem, M.; Daoud, M.; Huang, J.X.; Jemaa, M.B.: Mining correlations between medically dependent features and image retrieval models for query classification (2017) 0.04
```
0.038582932 = product of:
  0.077165864 = sum of:
    0.077165864 = product of:
      0.15433173 = sum of:
        0.15433173 = weight(_text_:mining in 3607) [ClassicSimilarity], result of:
          0.15433173 = score(doc=3607,freq=6.0), product of:
            0.28585905 = queryWeight, product of:
              5.642448 = idf(docFreq=425, maxDocs=44218)
              0.05066224 = queryNorm
            0.5398875 = fieldWeight in 3607, product of:
              2.4494898 = tf(freq=6.0), with freq of:
                6.0 = termFreq=6.0
              5.642448 = idf(docFreq=425, maxDocs=44218)
              0.0390625 = fieldNorm(doc=3607)
      0.5 = coord(1/2)
  0.5 = coord(1/2)
```
Abstract

The abundance of medical resources has encouraged the development of systems that allow for efficient searches of information in large medical image data sets. State-of-the-art image retrieval models are classified into three categories: content-based (visual) models, textual models, and combined models. Content-based models use visual features to answer image queries, textual image retrieval models use word matching to answer textual queries, and combined image retrieval models use both textual and visual features to answer queries. Nevertheless, most previous work in this field has used the same image retrieval model independently of the query type. In this article, we define a list of generic and specific medical query features and exploit them in an association rule mining technique to discover correlations between query features and image retrieval models. Based on these rules, we propose to use an associative classifier (NaiveClass) to find the most suitable retrieval model given a new textual query. We also propose a second associative classifier (SmartClass) to select the most appropriate default class for the query. Experiments are performed on Medical ImageCLEF queries from 2008 to 2012 to evaluate the impact of the proposed query features on the classification performance. The results show that combining our proposed specific and generic query features is effective in query classification.

Theme

Data Mining

Bandaragoda, T.R.; Silva, D. De; Alahakoon, D.; Ranasinghe, W.; Bolton, D.: Text mining for personalized knowledge extraction from online support groups (2018) 0.04
```
0.038582932 = product of:
  0.077165864 = sum of:
    0.077165864 = product of:
      0.15433173 = sum of:
        0.15433173 = weight(_text_:mining in 4574) [ClassicSimilarity], result of:
          0.15433173 = score(doc=4574,freq=6.0), product of:
            0.28585905 = queryWeight, product of:
              5.642448 = idf(docFreq=425, maxDocs=44218)
              0.05066224 = queryNorm
            0.5398875 = fieldWeight in 4574, product of:
              2.4494898 = tf(freq=6.0), with freq of:
                6.0 = termFreq=6.0
              5.642448 = idf(docFreq=425, maxDocs=44218)
              0.0390625 = fieldNorm(doc=4574)
      0.5 = coord(1/2)
  0.5 = coord(1/2)
```
Abstract

The traditional approach to health care is being revolutionized by the rapid adoption of patient-centered healthcare models. The successful transformation of patients from passive recipients to active participants is largely attributed to increased access to healthcare information. Online support groups present a platform to seek and exchange information in an inclusive environment. As the volume of text on online support groups continues to grow exponentially, it is imperative to improve the quality of retrieved information in terms of relevance, reliability, and usefulness. We present a text-mining approach that generates a knowledge extraction layer to address this void in personalized information retrieval from online support groups. The knowledge extraction layer encapsulates an ensemble of text-mining techniques with a domain ontology to interpose an investigable and extensible structure on hitherto unstructured text. This structure is not limited to personalized information retrieval for patients, as it also imparts aggregates for crowdsourcing analytics by healthcare researchers. The proposed approach was successfully trialed on an active online support group consisting of 800,000 posts by 72,066 participants. Demonstrations for both patient and researcher use cases accentuate the value of the proposed approach to unlock a broad spectrum of personalized and aggregate knowledge concealed within crowdsourced content.

Xiao, D.; Ji, Y.; Li, Y.; Zhuang, F.; Shi, C.: Coupled matrix factorization and topic modeling for aspect mining (2018) 0.04
```
0.038582932 = product of:
  0.077165864 = sum of:
    0.077165864 = product of:
      0.15433173 = sum of:
        0.15433173 = weight(_text_:mining in 5042) [ClassicSimilarity], result of:
          0.15433173 = score(doc=5042,freq=6.0), product of:
            0.28585905 = queryWeight, product of:
              5.642448 = idf(docFreq=425, maxDocs=44218)
              0.05066224 = queryNorm
            0.5398875 = fieldWeight in 5042, product of:
              2.4494898 = tf(freq=6.0), with freq of:
                6.0 = termFreq=6.0
              5.642448 = idf(docFreq=425, maxDocs=44218)
              0.0390625 = fieldNorm(doc=5042)
      0.5 = coord(1/2)
  0.5 = coord(1/2)
```
Abstract

Aspect mining, which aims to extract ad hoc aspects from online reviews and predict rating or opinion on each aspect, can satisfy the personalized needs for evaluation of specific aspect on product quality. Recently, with the increase of related research, how to effectively integrate rating and review information has become the key issue for addressing this problem. Considering that matrix factorization is an effective tool for rating prediction and topic modeling is widely used for review processing, it is a natural idea to combine matrix factorization and topic modeling for aspect mining (or called aspect rating prediction). However, this idea faces several challenges on how to address suitable sharing factors, scale mismatch, and dependency relation of rating and review information. In this paper, we propose a novel model to effectively integrate Matrix factorization and Topic modeling for Aspect rating prediction (MaToAsp). To overcome the above challenges and ensure the performance, MaToAsp employs items as the sharing factors to combine matrix factorization and topic modeling, and introduces an interpretive preference probability to eliminate scale mismatch. In the hybrid model, we establish a dependency relation from ratings to sentiment terms in phrases. The experiments on two real datasets including Chinese Dianping and English Tripadvisor prove that MaToAsp not only obtains reasonable aspect identification but also achieves the best aspect rating prediction performance, compared to recent representative baselines.
