Search (28 results, page 1 of 2)

  • theme_ss:"Data Mining"
  1. Hofstede, A.H.M. ter; Proper, H.A.; Van der Weide, T.P.: Exploiting fact verbalisation in conceptual information modelling (1997) 0.06
    Score detail: weight(_text_:language, tf=2.0) 0.0616 + weight(_text_:22, tf=2.0) 0.0491, sum 0.1107 × coord(1/2) = 0.0554
    
    Abstract
    Focuses on the information modelling side of conceptual modelling. Deals with the exploitation of fact verbalisations after finishing the actual information system. Verbalisations are used as input for the design of the so-called information model. Exploits these verbalisations in 4 directions: considers their use for a conceptual query language, the verbalisation of instances, the description of the contents of a database, and the verbalisation of queries in a computer-supported query environment. Provides an example session with an envisioned tool for end-user query formulation that exploits the verbalisations.
    Source
    Information systems. 22(1997) nos.5/6, S.349-385
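    The score shown for each entry is a Lucene ClassicSimilarity value: per the score details in this list, each matching query term contributes queryWeight × fieldWeight, where queryWeight = idf × queryNorm and fieldWeight = √freq × idf × fieldNorm, and a coord factor scales the sum. A minimal Python sketch reproducing this entry's 0.0554 from the figures shown above (function and variable names are illustrative, not Lucene API):

      def term_score(tf, idf, query_norm, field_norm):
          # ClassicSimilarity per-term contribution: queryWeight * fieldWeight
          query_weight = idf * query_norm                 # idf * queryNorm
          field_weight = (tf ** 0.5) * idf * field_norm   # sqrt(freq) * idf * fieldNorm
          return query_weight * field_weight

      QUERY_NORM = 0.051766515  # shared by all terms of this query

      w_language = term_score(2.0, 3.9232929, QUERY_NORM, 0.0546875)  # 0.0616245
      w_22       = term_score(2.0, 3.5018296, QUERY_NORM, 0.0546875)  # 0.0490955

      coord = 0.5  # coord(1/2) from the score detail above
      print(round(coord * (w_language + w_22), 4))  # 0.0554, displayed as 0.06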
  2. Vaughan, L.; Chen, Y.: Data mining from web search queries : a comparison of Google trends and Baidu index (2015) 0.04
    Score detail: weight(_text_:language, tf=2.0) 0.0440 + weight(_text_:22, tf=2.0) 0.0351, sum 0.0791 × coord(1/2) = 0.0395
    
    Abstract
    Numerous studies have explored the possibility of uncovering information from web search queries, but few have examined the factors that affect web query data sources. We investigated this issue by comparing Google Trends and Baidu Index. Data from these two services are based on queries entered by users into Google and Baidu, two of the largest search engines in the world. We first compared the features and functions of the two services based on documentation and extensive testing. We then carried out an empirical study that collected query volume data from the two sources. We found that data from both sources could be used to predict the quality of Chinese universities and companies. Despite the differences between the two services in terms of technology, such as differing methods of language processing, the search volume data from the two were highly correlated, and combining the two data sources did not improve the predictive power of the data. However, there was a major difference between the two in terms of data availability: Baidu Index was able to provide more search volume data than Google Trends. Our analysis showed that the disadvantage of Google Trends in this regard was due to Google's smaller user base in China. The implication of this finding goes beyond China: Google's user base in many countries is smaller than in China, so search volume data for those countries could suffer from the same issue.
    Source
    Journal of the Association for Information Science and Technology. 66(2015) no.1, S.13-22
  3. Chowdhury, G.G.: Template mining for information extraction from digital documents (1999) 0.02
    Score detail: weight(_text_:22, tf=2.0) 0.0982 × coord(1/2) × coord(1/2) = 0.0245
    
    Date
    2. 4.2000 18:01:22
  4. KDD : techniques and applications (1998) 0.02
    Score detail: weight(_text_:22, tf=2.0) 0.0842 × coord(1/2) × coord(1/2) = 0.0210
    
    Footnote
    A special issue of selected papers from the Pacific-Asia Conference on Knowledge Discovery and Data Mining (PAKDD'97), held in Singapore, 22-23 Feb 1997
  5. Wang, F.L.; Yang, C.C.: Mining Web data for Chinese segmentation (2007) 0.02
    Score detail: weight(_text_:language, tf=6.0) 0.0762 × coord(1/2) × coord(1/2) = 0.0191
    
    Abstract
    Modern information retrieval systems use keywords within documents as indexing terms for the search of relevant documents. As Chinese is an ideographic, character-based language, the words in the texts are not delimited by white space, and indexing of Chinese documents is impossible without a proper segmentation algorithm. Many Chinese segmentation algorithms have been proposed in the past; traditional algorithms cannot operate without a large dictionary or a large corpus of training data. Nowadays, the Web has become the largest corpus and is ideal for Chinese segmentation. Although most search engines have problems in segmenting texts into proper words, they maintain huge databases of documents and of the frequencies of character sequences in those documents. These databases are important potential resources for segmentation. In this paper, we propose a segmentation algorithm that mines Web data with the help of search engines. In addition, the Romanized pinyin of the Chinese language indicates word boundaries in the text, and our algorithm is the first to utilize Romanized pinyin for segmentation. It is the first unified segmentation algorithm for the Chinese language from different geographical areas, and it is also domain independent because of the nature of the Web. Experiments have been conducted on the datasets of a recent Chinese segmentation competition. The results show that our algorithm outperforms the traditional algorithms in terms of precision and recall. Moreover, our algorithm can effectively deal with the problems of segmentation ambiguity, new word (unknown word) detection, and stop words.
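    The core idea, character-sequence frequencies harvested from search-engine databases used to score candidate word boundaries, can be caricatured in a few lines. A minimal Python sketch, assuming a toy frequency table in place of mined Web counts (all names and numbers invented; the paper's pinyin heuristic is omitted):

      import math

      # Toy stand-in for character-sequence counts mined from search-engine databases.
      COUNTS = {"信息": 900, "检索": 800, "系统": 700,
                "信": 5, "息": 5, "检": 5, "索": 5, "系": 5, "统": 5}
      TOTAL = sum(COUNTS.values())

      def segment(text, max_len=4):
          # best[i] = (log-probability, segmentation) of text[:i], filled left to right
          best = [(-math.inf, [])] * (len(text) + 1)
          best[0] = (0.0, [])
          for i in range(1, len(text) + 1):
              for j in range(max(0, i - max_len), i):
                  word = text[j:i]
                  if word not in COUNTS:
                      continue
                  score = best[j][0] + math.log(COUNTS[word] / TOTAL)
                  if score > best[i][0]:
                      best[i] = (score, best[j][1] + [word])
          return best[-1][1]

      print(segment("信息检索系统"))  # -> ['信息', '检索', '系统']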
  6. Teich, E.; Degaetano-Ortlieb, S.; Fankhauser, P.; Kermes, H.; Lapshinova-Koltunski, E.: ¬The linguistic construal of disciplinarity : a data-mining approach using register features (2016) 0.02
    Score detail: weight(_text_:language, tf=4.0) 0.0747 × coord(1/2) × coord(1/2) = 0.0187
    
    Abstract
    We analyze the linguistic evolution of selected scientific disciplines over a 30-year time span (1970s to 2000s). Our focus is on four highly specialized disciplines at the boundaries of computer science that emerged during that time: computational linguistics, bioinformatics, digital construction, and microelectronics. Our analysis is driven by the question whether these disciplines develop a distinctive language use, both individually and collectively, over the given time period. The data set is the English Scientific Text Corpus (scitex), which includes texts from the 1970s/1980s and early 2000s. Our theoretical basis is register theory. In terms of methods, we combine corpus-based feature extraction (various aggregated part-of-speech-based features, n-grams, lexico-grammatical patterns) and automatic text classification. The results of our research are directly relevant to the study of linguistic variation and languages for specific purposes (LSP) and have implications for various natural language processing (NLP) tasks, for example authorship attribution, text mining, or the training of NLP tools.
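    As a toy illustration of the paper's general pipeline, corpus-based feature extraction followed by automatic text classification, the following Python sketch profiles two "disciplines" by word-bigram counts and assigns a new text to the nearest profile by cosine similarity. The snippets and labels are invented; the study itself uses the scitex corpus and richer, part-of-speech-based features:

      from collections import Counter
      from math import sqrt

      def bigrams(text):
          # Word-bigram counts as a crude register profile.
          tokens = text.lower().split()
          return Counter(zip(tokens, tokens[1:]))

      # Tiny invented stand-ins for per-discipline subcorpora.
      CORPORA = {
          "computational linguistics": "we parse the corpus and tag each token in the corpus",
          "microelectronics": "the circuit layout reduces the gate delay of the chip",
      }
      PROFILES = {d: bigrams(t) for d, t in CORPORA.items()}

      def cosine(a, b):
          dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
          norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
          return dot / norm if norm else 0.0

      def classify(text):
          vec = bigrams(text)
          return max(PROFILES, key=lambda d: cosine(vec, PROFILES[d]))

      print(classify("tag each token of the corpus"))  # -> computational linguistics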
  7. Tonkin, E.L.; Tourte, G.J.L.: Working with text : tools, techniques and approaches for text mining (2016) 0.02
    Score detail: weight(_text_:language, tf=4.0) 0.0623 × coord(1/2) × coord(1/2) = 0.0156
    
    Abstract
    What is text mining, and how can it be used? What relevance do these methods have to everyday work in information science and the digital humanities? How does one develop competences in text mining? Working with Text provides a series of cross-disciplinary perspectives on text mining and its applications. As text mining raises legal and ethical issues, the legal background of text mining and the responsibilities of the engineer are discussed in this book. Chapters provide an introduction to the use of the popular GATE text mining package with data drawn from social media, the use of text mining to support semantic search, the development of an authority system to support content tagging, and recent techniques in automatic language evaluation. Focused studies describe text mining on historical texts, automated indexing using constrained vocabularies, and the use of natural language processing to explore the climate science literature. Interviews are included that offer a glimpse into the real-life experience of working within commercial and academic text mining.
  8. Search tools (1997) 0.02
    Score detail: weight(_text_:language, tf=2.0) 0.0616 × coord(1/2) × coord(1/2) = 0.0154
    
    Abstract
    Offers brief accounts of Internet search tools. Covers the Lycos revamp; the new navigation service produced jointly by Excite and Netscape, delivering a language-specific, locally relevant Web guide for Japan, Germany, France, the UK and Australia; InfoWatcher, a combination offline browser, search engine and push product from Carvelle Inc., USA; Alexa by Alexa Internet and WBI from IBM, which are free and provide users with information on how others have used the Web sites they are visiting; and Concept Explorer from Knowledge Discovery Systems, Inc., California, which performs data mining from the Web, Usenet groups, MEDLINE and the US Patent and Trademark Office patent abstracts.
  9. Lam, W.; Yang, C.C.; Menczer, F.: Introduction to the special topic section on mining Web resources for enhancing information retrieval (2007) 0.02
    Score detail: weight(_text_:language, tf=2.0) 0.0616 × coord(1/2) × coord(1/2) = 0.0154
    
    Abstract
    The amount of information on the Web has been expanding at an enormous pace. There is a variety of Web documents in different genres, such as news, reports, and reviews. Traditionally, the information displayed on Web sites has been static. Recently, many Web sites have begun offering content that is dynamically generated and frequently updated. It is also common for Web sites to contain information in different languages, since many countries adopt more than one language. Moreover, content may exist in multimedia formats including text, images, video, and audio.
  10. Matson, L.D.; Bonski, D.J.: Do digital libraries need librarians? (1997) 0.01
    Score detail: weight(_text_:22, tf=2.0) 0.0561 × coord(1/2) × coord(1/2) = 0.0140
    
    Date
    22.11.1998 18:57:22
  11. Lusti, M.: Data Warehousing and Data Mining : Eine Einführung in entscheidungsunterstützende Systeme (1999) 0.01
    Score detail: weight(_text_:22, tf=2.0) 0.0561 × coord(1/2) × coord(1/2) = 0.0140
    
    Date
    17. 7.2002 19:22:06
  12. Amir, A.; Feldman, R.; Kashi, R.: ¬A new and versatile method for association generation (1997) 0.01
    Score detail: weight(_text_:22, tf=2.0) 0.0561 × coord(1/2) × coord(1/2) = 0.0140
    
    Source
    Information systems. 22(1997) nos.5/6, S.333-347
  13. Wong, M.L.; Leung, K.S.; Cheng, J.C.Y.: Discovering knowledge from noisy databases using genetic programming (2000) 0.01
    Score detail: weight(_text_:language, tf=2.0) 0.0528 × coord(1/2) × coord(1/2) = 0.0132
    
    Abstract
    In data mining, we emphasize the need for learning from huge, incomplete, and imperfect data sets. To handle noise in the problem domain, existing learning systems avoid overfitting the imperfect training examples by excluding insignificant patterns. The problem is that these systems use a limited attribute-value language for representing the training examples and the induced knowledge. Moreover, some important patterns are ignored because they are statistically insignificant. In this article, we present a framework (LOGENPRO) that combines genetic programming and inductive logic programming to induce knowledge, represented in various knowledge representation formalisms, from noisy databases. The system has been applied to a real-life medical database; the knowledge discovered provides insights into and allows better understanding of the medical domain.
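    LOGENPRO itself combines genetic programming with inductive logic programming; as a far simpler stand-in for evolutionary rule induction on noisy data, the following Python sketch evolves conjunctive attribute-value rules by selection and mutation only (no crossover, no logic grammars; all names and data are invented):

      import random
      random.seed(1)

      ATTRS = ["fever", "cough"]

      def sample():
          # Invented noisy records: the true concept is fever AND cough, ~10% label noise.
          x = {a: random.random() < 0.5 for a in ATTRS}
          label = x["fever"] and x["cough"]
          if random.random() < 0.1:
              label = not label
          return x, label

      DATA = [sample() for _ in range(200)]

      def random_rule():
          # A rule is a conjunction of attribute=value tests.
          picked = random.sample(ATTRS, random.randint(1, len(ATTRS)))
          return {a: random.choice([True, False]) for a in picked}

      def accuracy(rule):
          return sum(all(x[a] == v for a, v in rule.items()) == y for x, y in DATA) / len(DATA)

      def mutate(rule):
          child = dict(rule)
          a = random.choice(ATTRS)
          child[a] = not child.get(a, False)
          return child

      population = [random_rule() for _ in range(20)]
      for _ in range(30):  # generations: keep the fittest half, refill with mutants
          population.sort(key=accuracy, reverse=True)
          population = population[:10] + [mutate(random.choice(population[:10])) for _ in range(10)]

      best = max(population, key=accuracy)
      print(best, round(accuracy(best), 2))  # e.g. {'fever': True, 'cough': True} ~0.9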
  14. Heyer, G.; Läuter, M.; Quasthoff, U.; Wolff, C.: Texttechnologische Anwendungen am Beispiel Text Mining (2000) 0.01
    Score detail: weight(_text_:language, tf=2.0) 0.0528 × coord(1/2) × coord(1/2) = 0.0132
    
    Source
    Sprachtechnologie für eine dynamische Wirtschaft im Medienzeitalter - Language technologies for dynamic business in the age of the media - L'ingénierie linguistique au service de la dynamisation économique à l'ère du multimédia: Tagungsakten der XXVI. Jahrestagung der Internationalen Vereinigung Sprache und Wirtschaft e.V., 23.-25.11.2000, Fachhochschule Köln. Hrsg.: K.-D. Schmitz
  15. Gaizauskas, R.; Wilks, Y.: Information extraction : beyond document retrieval (1998) 0.01
    Score detail: weight(_text_:language, tf=2.0) 0.0528 × coord(1/2) × coord(1/2) = 0.0132
    
    Abstract
    In this paper we give a synoptic view of the growth of the text processing technology of information extraction (IE), whose function is to extract information about a pre-specified set of entities, relations or events from natural language texts and to record this information in structured representations called templates. We describe the nature of the IE task, review the history of the area from its origins in AI work in the 1960s and 70s to the present, discuss the techniques being used to carry out the task, describe application areas where IE systems are or are about to be at work, and conclude with a discussion of the challenges facing the area. What emerges is a picture of an exciting new text processing technology with a host of new applications, both on its own and in conjunction with other technologies, such as information retrieval, machine translation and data mining.
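    A minimal illustration of the template idea: hand-written patterns extract pre-specified slots from free text into a structured record. The patterns and the succession-event template below are invented for the sketch; real IE systems use far richer linguistic machinery:

      import re

      # Invented mini-template for a management-succession event; one regex per slot.
      SLOT_PATTERNS = {
          "person":  re.compile(r"([A-Z][a-z]+ [A-Z][a-z]+) was appointed"),
          "role":    re.compile(r"was appointed (?:as )?([a-z ]+?) of"),
          "company": re.compile(r" of ([A-Z][A-Za-z]+(?: [A-Z][A-Za-z]+)*)"),
      }

      def fill_template(sentence):
          # Unmatched slots stay None rather than failing.
          return {slot: (m.group(1) if (m := pat.search(sentence)) else None)
                  for slot, pat in SLOT_PATTERNS.items()}

      print(fill_template("Jane Smith was appointed chief executive of Acme Systems."))
      # -> {'person': 'Jane Smith', 'role': 'chief executive', 'company': 'Acme Systems'}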
  16. Kraker, P.; Kittel, C.; Enkhbayar, A.: Open Knowledge Maps : creating a visual interface to the world's scientific knowledge based on natural language processing (2016) 0.01
    Score detail: weight(_text_:language, tf=2.0) 0.0528 × coord(1/2) × coord(1/2) = 0.0132
    
  17. Wang, W.M.; Cheung, C.F.; Lee, W.B.; Kwok, S.K.: Mining knowledge from natural language texts using fuzzy associated concept mapping (2008) 0.01
    Score detail: weight(_text_:language, tf=4.0) 0.0498 × coord(1/2) × coord(1/2) = 0.0125
    
    Abstract
    Natural Language Processing (NLP) techniques have been successfully used to automatically extract information from unstructured text through a detailed analysis of its content, often to satisfy particular information needs. In this paper, an automatic concept map construction technique, Fuzzy Association Concept Mapping (FACM), is proposed for the conversion of abstracted short texts into concept maps. The approach consists of a linguistic module and a recommendation module. The linguistic module is a text mining method that does not require the user to have any prior knowledge of NLP techniques. It incorporates rule-based reasoning (RBR) and case-based reasoning (CBR) for anaphora resolution, and aims at extracting the propositions in text so as to construct a concept map automatically. The recommendation module is arrived at by adopting fuzzy set theories. It is an interactive process which provides suggestions of propositions, relationships among concepts that are not explicitly found in the paragraphs, for further human refinement of the automatically generated concept maps. This technique helps to stimulate individual reflection and generate new knowledge. Evaluation was carried out using the Science Citation Index (SCI) abstract database and CNET News as test data, both well-known sources whose text quality is assured. Experimental results show that the automatically generated concept maps conform to the outputs generated manually by domain experts, since the degree of difference between them is proportionally small. The method provides users with the ability to convert scientific and short texts into a structured format which can be easily processed by computer. Moreover, it provides knowledge workers with extra time to re-think their written text and to view their knowledge from another angle.
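    As a rough sketch of proposition extraction feeding a concept map (not the paper's RBR/CBR linguistic module), the following Python fragment matches naive subject-relation-object patterns and collects the resulting triples; the patterns and example text are invented:

      import re

      # Naive subject-relation-object patterns standing in for proposition extraction.
      SVO = re.compile(r"(\w+(?: \w+)?) (improves|causes|contains|requires) (\w+(?: \w+)?)")

      def build_concept_map(text):
          triples = set()
          for sentence in text.split("."):
              m = SVO.search(sentence.strip())
              if m:
                  triples.add((m.group(1), m.group(2), m.group(3)))
          return triples

      text = "Regular exercise improves sleep quality. Poor sleep causes fatigue."
      for subj, relation, obj in sorted(build_concept_map(text)):
          print(f"{subj} --{relation}--> {obj}")
      # Poor sleep --causes--> fatigue
      # Regular exercise --improves--> sleep quality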
  18. Haravu, L.J.; Neelameghan, A.: Text mining and data mining in knowledge organization and discovery : the making of knowledge-based products (2003) 0.01
    Score detail: weight(_text_:language, tf=2.0) 0.0440 × coord(1/2) × coord(1/2) = 0.0110
    
    Abstract
    Discusses the importance of knowledge organization in the context of the information overload caused by the vast quantities of data and information accessible on internal and external networks of an organization. Defines the characteristics of a knowledge-based product. Elaborates on the techniques and applications of text mining in developing knowledge products. Presents two approaches, as case studies, to the making of knowledge products: (1) steps and processes in the planning, designing and development of a composite multilingual multimedia CD product, with potential international, inter-cultural end users in view, and (2) application of natural language processing software in text mining. Using text mining software, it is possible to link concept terms from a processed text to a related thesaurus, glossary, schedules of a classification scheme, and facet-structured subject representations. Concludes that the products of text mining and data mining could be made more useful if the features of a faceted scheme for subject classification were incorporated into text mining techniques and products.
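    The linking step suggested above, from mined concept terms to thesaurus entries, can be sketched in Python with a toy thesaurus fragment (USE = preferred term, BT = broader term, following common thesaurus conventions; all entries are invented):

      # Invented thesaurus fragment: USE = preferred term, BT = broader term.
      THESAURUS = {
          "data mining": {"USE": None, "BT": "knowledge discovery"},
          "text mining": {"USE": None, "BT": "data mining"},
          "kdd":         {"USE": "knowledge discovery", "BT": None},
      }

      def link_terms(mined_terms):
          # Map each mined term to its preferred and broader thesaurus terms.
          links = {}
          for term in mined_terms:
              entry = THESAURUS.get(term.lower())
              if entry is None:
                  continue  # term not in the thesaurus; a real system might suggest it
              links[term] = {"preferred": entry["USE"] or term.lower(),
                             "broader": entry["BT"]}
          return links

      print(link_terms(["Text mining", "KDD", "tacit knowledge"]))
      # {'Text mining': {'preferred': 'text mining', 'broader': 'data mining'},
      #  'KDD': {'preferred': 'knowledge discovery', 'broader': None}}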
  19. Varathan, K.D.; Giachanou, A.; Crestani, F.: Comparative opinion mining : a review (2017) 0.01
    Score detail: weight(_text_:language, tf=2.0) 0.0440 × coord(1/2) × coord(1/2) = 0.0110
    
    Abstract
    Opinion mining refers to the use of natural language processing, text analysis, and computational linguistics to identify and extract subjective information in textual material. Opinion mining, also known as sentiment analysis, has received a lot of attention in recent times, as it provides a number of tools to analyze public opinion on a number of different topics. Comparative opinion mining is a subfield of opinion mining which deals with identifying and extracting information that is expressed in a comparative form (e.g., "paper X is better than paper Y"). Comparative opinion mining plays a very important role when one tries to evaluate something, because it provides a reference point for the comparison. This paper provides a review of the area of comparative opinion mining. It is the first review that covers this topic specifically; previous reviews dealt mostly with general opinion mining. The survey covers comparative opinion mining from two different angles: one from the perspective of techniques, the other from the perspective of comparative opinion elements. It also describes preprocessing tools as well as data sets that were used by past researchers and that can be useful to future researchers in the field of comparative opinion mining.
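    One of the simplest comparative forms the survey catalogues ("X is better than Y") can be captured with a single pattern. A minimal Python sketch with an invented pattern and example sentences; real comparative opinion mining handles many more constructions:

      import re

      # Invented pattern for one comparative form: "<X> is/are <comparative> than <Y>".
      COMPARATIVE = re.compile(
          r"(?P<x>[\w ]+?) (?:is|are) (?P<cmp>\w+er|better|worse) than (?P<y>[\w ]+)")

      def extract_comparisons(text):
          return [(m["x"].strip(), m["cmp"], m["y"].strip())
                  for m in COMPARATIVE.finditer(text)]

      reviews = "Paper X is better than paper Y. The new camera is lighter than the old model."
      print(extract_comparisons(reviews))
      # [('Paper X', 'better', 'paper Y'), ('The new camera', 'lighter', 'the old model')]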
  20. Lackes, R.; Tillmanns, C.: Data Mining für die Unternehmenspraxis : Entscheidungshilfen und Fallstudien mit führenden Softwarelösungen (2006) 0.01
    Score detail: weight(_text_:22, tf=2.0) 0.0421 × coord(1/2) × coord(1/2) = 0.0105
    
    Date
    22. 3.2008 14:46:06

Languages

  • e 20
  • d 8

Types