Search (29 results, page 1 of 2)

  • × theme_ss:"Data Mining"
  1. Amir, A.; Feldman, R.; Kashi, R.: ¬A new and versatile method for association generation (1997) 0.05
    0.045636237 = product of:
      0.13690871 = sum of:
        0.13690871 = sum of:
          0.07453227 = weight(_text_:indexing in 1270) [ClassicSimilarity], result of:
            0.07453227 = score(doc=1270,freq=2.0), product of:
              0.2202888 = queryWeight, product of:
                3.8278677 = idf(docFreq=2614, maxDocs=44218)
                0.057548698 = queryNorm
              0.3383389 = fieldWeight in 1270, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                3.8278677 = idf(docFreq=2614, maxDocs=44218)
                0.0625 = fieldNorm(doc=1270)
          0.062376432 = weight(_text_:22 in 1270) [ClassicSimilarity], result of:
            0.062376432 = score(doc=1270,freq=2.0), product of:
              0.20152573 = queryWeight, product of:
                3.5018296 = idf(docFreq=3622, maxDocs=44218)
                0.057548698 = queryNorm
              0.30952093 = fieldWeight in 1270, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                3.5018296 = idf(docFreq=3622, maxDocs=44218)
                0.0625 = fieldNorm(doc=1270)
      0.33333334 = coord(1/3)
    
    Abstract
    Current algorithms for finding associations among the attributes describing data in a database have a number of shortcomings. Presents a novel method for association generation, that answers all desiderata. The method is different from all existing algorithms and especially suitable to textual databases with binary attributes. Uses subword trees for quick indexing into the required database statistics. Tests the algorithm on the Reuters-22173 database with satisfactory results
    Source
    Information systems. 22(1997) nos.5/6, S.333-347
  2. Fong, A.C.M.: Mining a Web citation database for document clustering (2002) 0.02
    0.02173858 = product of:
      0.06521574 = sum of:
        0.06521574 = product of:
          0.13043147 = sum of:
            0.13043147 = weight(_text_:indexing in 3940) [ClassicSimilarity], result of:
              0.13043147 = score(doc=3940,freq=2.0), product of:
                0.2202888 = queryWeight, product of:
                  3.8278677 = idf(docFreq=2614, maxDocs=44218)
                  0.057548698 = queryNorm
                0.5920931 = fieldWeight in 3940, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.8278677 = idf(docFreq=2614, maxDocs=44218)
                  0.109375 = fieldNorm(doc=3940)
          0.5 = coord(1/2)
      0.33333334 = coord(1/3)
    
    Theme
    Citation indexing
  3. Chowdhury, G.G.: Template mining for information extraction from digital documents (1999) 0.02
    0.018193126 = product of:
      0.054579377 = sum of:
        0.054579377 = product of:
          0.109158754 = sum of:
            0.109158754 = weight(_text_:22 in 4577) [ClassicSimilarity], result of:
              0.109158754 = score(doc=4577,freq=2.0), product of:
                0.20152573 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.057548698 = queryNorm
                0.5416616 = fieldWeight in 4577, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.109375 = fieldNorm(doc=4577)
          0.5 = coord(1/2)
      0.33333334 = coord(1/3)
    
    Date
    2. 4.2000 18:01:22
  4. Maaten, L. van den; Hinton, G.: Visualizing non-metric similarities in multiple maps (2012) 0.02
    0.016934892 = product of:
      0.050804675 = sum of:
        0.050804675 = product of:
          0.15241402 = sum of:
            0.15241402 = weight(_text_:objects in 3884) [ClassicSimilarity], result of:
              0.15241402 = score(doc=3884,freq=4.0), product of:
                0.30587542 = queryWeight, product of:
                  5.315071 = idf(docFreq=590, maxDocs=44218)
                  0.057548698 = queryNorm
                0.49828792 = fieldWeight in 3884, product of:
                  2.0 = tf(freq=4.0), with freq of:
                    4.0 = termFreq=4.0
                  5.315071 = idf(docFreq=590, maxDocs=44218)
                  0.046875 = fieldNorm(doc=3884)
          0.33333334 = coord(1/3)
      0.33333334 = coord(1/3)
    
    Abstract
    Techniques for multidimensional scaling visualize objects as points in a low-dimensional metric map. As a result, the visualizations are subject to the fundamental limitations of metric spaces. These limitations prevent multidimensional scaling from faithfully representing non-metric similarity data such as word associations or event co-occurrences. In particular, multidimensional scaling cannot faithfully represent intransitive pairwise similarities in a visualization, and it cannot faithfully visualize "central" objects. In this paper, we present an extension of a recently proposed multidimensional scaling technique called t-SNE. The extension aims to address the problems of traditional multidimensional scaling techniques when these techniques are used to visualize non-metric similarities. The new technique, called multiple maps t-SNE, alleviates these problems by constructing a collection of maps that reveal complementary structure in the similarity data. We apply multiple maps t-SNE to a large data set of word association data and to a data set of NIPS co-authorships, demonstrating its ability to successfully visualize non-metric similarities.
  5. Fayyad, U.M.; Djorgovski, S.G.; Weir, N.: From digitized images to online catalogs : data ming a sky server (1996) 0.02
    0.01596637 = product of:
      0.04789911 = sum of:
        0.04789911 = product of:
          0.14369732 = sum of:
            0.14369732 = weight(_text_:objects in 6625) [ClassicSimilarity], result of:
              0.14369732 = score(doc=6625,freq=2.0), product of:
                0.30587542 = queryWeight, product of:
                  5.315071 = idf(docFreq=590, maxDocs=44218)
                  0.057548698 = queryNorm
                0.46979034 = fieldWeight in 6625, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  5.315071 = idf(docFreq=590, maxDocs=44218)
                  0.0625 = fieldNorm(doc=6625)
          0.33333334 = coord(1/3)
      0.33333334 = coord(1/3)
    
    Abstract
    Offers a data mining approach based on machine learning classification methods to the problem of automated cataloguing of online databases of digital images resulting from sky surveys. The SKICAT system automates the reduction and analysis of 3 terabytes of images expected to contain about 2 billion sky objects. It offers a solution to problems associated with the analysis of large data sets in science
  6. KDD : techniques and applications (1998) 0.02
    0.015594108 = product of:
      0.046782322 = sum of:
        0.046782322 = product of:
          0.093564644 = sum of:
            0.093564644 = weight(_text_:22 in 6783) [ClassicSimilarity], result of:
              0.093564644 = score(doc=6783,freq=2.0), product of:
                0.20152573 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.057548698 = queryNorm
                0.46428138 = fieldWeight in 6783, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.09375 = fieldNorm(doc=6783)
          0.5 = coord(1/2)
      0.33333334 = coord(1/3)
    
    Footnote
    A special issue of selected papers from the Pacific-Asia Conference on Knowledge Discovery and Data Mining (PAKDD'97), held Singapore, 22-23 Feb 1997
  7. Maaten, L. van den: Accelerating t-SNE using Tree-Based Algorithms (2014) 0.01
    0.0139705725 = product of:
      0.041911718 = sum of:
        0.041911718 = product of:
          0.12573515 = sum of:
            0.12573515 = weight(_text_:objects in 3886) [ClassicSimilarity], result of:
              0.12573515 = score(doc=3886,freq=2.0), product of:
                0.30587542 = queryWeight, product of:
                  5.315071 = idf(docFreq=590, maxDocs=44218)
                  0.057548698 = queryNorm
                0.41106653 = fieldWeight in 3886, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  5.315071 = idf(docFreq=590, maxDocs=44218)
                  0.0546875 = fieldNorm(doc=3886)
          0.33333334 = coord(1/3)
      0.33333334 = coord(1/3)
    
    Abstract
    The paper investigates the acceleration of t-SNE-an embedding technique that is commonly used for the visualization of high-dimensional data in scatter plots-using two tree-based algorithms. In particular, the paper develops variants of the Barnes-Hut algorithm and of the dual-tree algorithm that approximate the gradient used for learning t-SNE embeddings in O(N*logN). Our experiments show that the resulting algorithms substantially accelerate t-SNE, and that they make it possible to learn embeddings of data sets with millions of objects. Somewhat counterintuitively, the Barnes-Hut variant of t-SNE appears to outperform the dual-tree variant.
  8. Loh, S.; Oliveira, J.P.M. de; Gastal, F.L.: Knowledge discovery in textual documentation : qualitative and quantitative analyses (2001) 0.01
    0.011974777 = product of:
      0.03592433 = sum of:
        0.03592433 = product of:
          0.10777299 = sum of:
            0.10777299 = weight(_text_:objects in 4482) [ClassicSimilarity], result of:
              0.10777299 = score(doc=4482,freq=2.0), product of:
                0.30587542 = queryWeight, product of:
                  5.315071 = idf(docFreq=590, maxDocs=44218)
                  0.057548698 = queryNorm
                0.35234275 = fieldWeight in 4482, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  5.315071 = idf(docFreq=590, maxDocs=44218)
                  0.046875 = fieldNorm(doc=4482)
          0.33333334 = coord(1/3)
      0.33333334 = coord(1/3)
    
    Abstract
    This paper presents an approach for performing knowledge discovery in texts through qualitative and quantitative analyses of high-level textual characteristics. Instead of applying mining techniques on attribute values, terms or keywords extracted from texts, the discovery process works over conceptss identified in texts. Concepts represent real world events and objects, and they help the user to understand ideas, trends, thoughts, opinions and intentions present in texts. The approach combines a quasi-automatic categorisation task (for qualitative analysis) with a mining process (for quantitative analysis). The goal is to find new and useful knowledge inside a textual collection through the use of mining techniques applied over concepts (representing text content). In this paper, an application of the approach to medical records of a psychiatric hospital is presented. The approach helps physicians to extract knowledge about patients and diseases. This knowledge may be used for epidemiological studies, for training professionals and it may be also used to support physicians to diagnose and evaluate diseases.
  9. Survey of text mining : clustering, classification, and retrieval (2004) 0.01
    0.010979641 = product of:
      0.032938924 = sum of:
        0.032938924 = product of:
          0.06587785 = sum of:
            0.06587785 = weight(_text_:indexing in 804) [ClassicSimilarity], result of:
              0.06587785 = score(doc=804,freq=4.0), product of:
                0.2202888 = queryWeight, product of:
                  3.8278677 = idf(docFreq=2614, maxDocs=44218)
                  0.057548698 = queryNorm
                0.29905218 = fieldWeight in 804, product of:
                  2.0 = tf(freq=4.0), with freq of:
                    4.0 = termFreq=4.0
                  3.8278677 = idf(docFreq=2614, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=804)
          0.5 = coord(1/2)
      0.33333334 = coord(1/3)
    
    Abstract
    Extracting content from text continues to be an important research problem for information processing and management. Approaches to capture the semantics of text-based document collections may be based on Bayesian models, probability theory, vector space models, statistical models, or even graph theory. As the volume of digitized textual media continues to grow, so does the need for designing robust, scalable indexing and search strategies (software) to meet a variety of user needs. Knowledge extraction or creation from text requires systematic yet reliable processing that can be codified and adapted for changing needs and environments. This book will draw upon experts in both academia and industry to recommend practical approaches to the purification, indexing, and mining of textual information. It will address document identification, clustering and categorizing documents, cleaning text, and visualizing semantic models of text.
  10. Wang, F.L.; Yang, C.C.: Mining Web data for Chinese segmentation (2007) 0.01
    0.010979641 = product of:
      0.032938924 = sum of:
        0.032938924 = product of:
          0.06587785 = sum of:
            0.06587785 = weight(_text_:indexing in 604) [ClassicSimilarity], result of:
              0.06587785 = score(doc=604,freq=4.0), product of:
                0.2202888 = queryWeight, product of:
                  3.8278677 = idf(docFreq=2614, maxDocs=44218)
                  0.057548698 = queryNorm
                0.29905218 = fieldWeight in 604, product of:
                  2.0 = tf(freq=4.0), with freq of:
                    4.0 = termFreq=4.0
                  3.8278677 = idf(docFreq=2614, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=604)
          0.5 = coord(1/2)
      0.33333334 = coord(1/3)
    
    Abstract
    Modern information retrieval systems use keywords within documents as indexing terms for search of relevant documents. As Chinese is an ideographic character-based language, the words in the texts are not delimited by white spaces. Indexing of Chinese documents is impossible without a proper segmentation algorithm. Many Chinese segmentation algorithms have been proposed in the past. Traditional segmentation algorithms cannot operate without a large dictionary or a large corpus of training data. Nowadays, the Web has become the largest corpus that is ideal for Chinese segmentation. Although most search engines have problems in segmenting texts into proper words, they maintain huge databases of documents and frequencies of character sequences in the documents. Their databases are important potential resources for segmentation. In this paper, we propose a segmentation algorithm by mining Web data with the help of search engines. On the other hand, the Romanized pinyin of Chinese language indicates boundaries of words in the text. Our algorithm is the first to utilize the Romanized pinyin to segmentation. It is the first unified segmentation algorithm for the Chinese language from different geographical areas, and it is also domain independent because of the nature of the Web. Experiments have been conducted on the datasets of a recent Chinese segmentation competition. The results show that our algorithm outperforms the traditional algorithms in terms of precision and recall. Moreover, our algorithm can effectively deal with the problems of segmentation ambiguity, new word (unknown word) detection, and stop words.
  11. Matson, L.D.; Bonski, D.J.: Do digital libraries need librarians? (1997) 0.01
    0.010396073 = product of:
      0.031188216 = sum of:
        0.031188216 = product of:
          0.062376432 = sum of:
            0.062376432 = weight(_text_:22 in 1737) [ClassicSimilarity], result of:
              0.062376432 = score(doc=1737,freq=2.0), product of:
                0.20152573 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.057548698 = queryNorm
                0.30952093 = fieldWeight in 1737, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.0625 = fieldNorm(doc=1737)
          0.5 = coord(1/2)
      0.33333334 = coord(1/3)
    
    Date
    22.11.1998 18:57:22
  12. Lusti, M.: Data Warehousing and Data Mining : Eine Einführung in entscheidungsunterstützende Systeme (1999) 0.01
    0.010396073 = product of:
      0.031188216 = sum of:
        0.031188216 = product of:
          0.062376432 = sum of:
            0.062376432 = weight(_text_:22 in 4261) [ClassicSimilarity], result of:
              0.062376432 = score(doc=4261,freq=2.0), product of:
                0.20152573 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.057548698 = queryNorm
                0.30952093 = fieldWeight in 4261, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.0625 = fieldNorm(doc=4261)
          0.5 = coord(1/2)
      0.33333334 = coord(1/3)
    
    Date
    17. 7.2002 19:22:06
  13. Ku, L.-W.; Chen, H.-H.: Mining opinions from the Web : beyond relevance retrieval (2007) 0.01
    0.009978982 = product of:
      0.029936943 = sum of:
        0.029936943 = product of:
          0.089810826 = sum of:
            0.089810826 = weight(_text_:objects in 605) [ClassicSimilarity], result of:
              0.089810826 = score(doc=605,freq=2.0), product of:
                0.30587542 = queryWeight, product of:
                  5.315071 = idf(docFreq=590, maxDocs=44218)
                  0.057548698 = queryNorm
                0.29361898 = fieldWeight in 605, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  5.315071 = idf(docFreq=590, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=605)
          0.33333334 = coord(1/3)
      0.33333334 = coord(1/3)
    
    Abstract
    Documents discussing public affairs, common themes, interesting products, and so on, are reported and distributed on the Web. Positive and negative opinions embedded in documents are useful references and feedbacks for governments to improve their services, for companies to market their products, and for customers to purchase their objects. Web opinion mining aims to extract, summarize, and track various aspects of subjective information on the Web. Mining subjective information enables traditional information retrieval (IR) systems to retrieve more data from human viewpoints and provide information with finer granularity. Opinion extraction identifies opinion holders, extracts the relevant opinion sentences, and decides their polarities. Opinion summarization recognizes the major events embedded in documents and summarizes the supportive and the nonsupportive evidence. Opinion tracking captures subjective information from various genres and monitors the developments of opinions from spatial and temporal dimensions. To demonstrate and evaluate the proposed opinion mining algorithms, news and bloggers' articles are adopted. Documents in the evaluation corpora are tagged in different granularities from words, sentences to documents. In the experiments, positive and negative sentiment words and their weights are mined on the basis of Chinese word structures. The f-measure is 73.18% and 63.75% for verbs and nouns, respectively. Utilizing the sentiment words mined together with topical words, we achieve f-measure 62.16% at the sentence level and 74.37% at the document level.
  14. Maaten, L. van den; Hinton, G.: Visualizing data using t-SNE (2008) 0.01
    0.009978982 = product of:
      0.029936943 = sum of:
        0.029936943 = product of:
          0.089810826 = sum of:
            0.089810826 = weight(_text_:objects in 3888) [ClassicSimilarity], result of:
              0.089810826 = score(doc=3888,freq=2.0), product of:
                0.30587542 = queryWeight, product of:
                  5.315071 = idf(docFreq=590, maxDocs=44218)
                  0.057548698 = queryNorm
                0.29361898 = fieldWeight in 3888, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  5.315071 = idf(docFreq=590, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=3888)
          0.33333334 = coord(1/3)
      0.33333334 = coord(1/3)
    
    Abstract
    We present a new technique called "t-SNE" that visualizes high-dimensional data by giving each datapoint a location in a two or three-dimensional map. The technique is a variation of Stochastic Neighbor Embedding (Hinton and Roweis, 2002) that is much easier to optimize, and produces significantly better visualizations by reducing the tendency to crowd points together in the center of the map. t-SNE is better than existing techniques at creating a single map that reveals structure at many different scales. This is particularly important for high-dimensional data that lie on several different, but related, low-dimensional manifolds, such as images of objects from multiple classes seen from multiple viewpoints. For visualizing the structure of very large data sets, we show how t-SNE can use random walks on neighborhood graphs to allow the implicit structure of all of the data to influence the way in which a subset of the data is displayed. We illustrate the performance of t-SNE on a wide variety of data sets and compare it with many other non-parametric visualization techniques, including Sammon mapping, Isomap, and Locally Linear Embedding. The visualizations produced by t-SNE are significantly better than those produced by the other techniques on almost all of the data sets.
  15. Hofstede, A.H.M. ter; Proper, H.A.; Van der Weide, T.P.: Exploiting fact verbalisation in conceptual information modelling (1997) 0.01
    0.009096563 = product of:
      0.027289689 = sum of:
        0.027289689 = product of:
          0.054579377 = sum of:
            0.054579377 = weight(_text_:22 in 2908) [ClassicSimilarity], result of:
              0.054579377 = score(doc=2908,freq=2.0), product of:
                0.20152573 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.057548698 = queryNorm
                0.2708308 = fieldWeight in 2908, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.0546875 = fieldNorm(doc=2908)
          0.5 = coord(1/2)
      0.33333334 = coord(1/3)
    
    Source
    Information systems. 22(1997) nos.5/6, S.349-385
  16. Lackes, R.; Tillmanns, C.: Data Mining für die Unternehmenspraxis : Entscheidungshilfen und Fallstudien mit führenden Softwarelösungen (2006) 0.01
    0.007797054 = product of:
      0.023391161 = sum of:
        0.023391161 = product of:
          0.046782322 = sum of:
            0.046782322 = weight(_text_:22 in 1383) [ClassicSimilarity], result of:
              0.046782322 = score(doc=1383,freq=2.0), product of:
                0.20152573 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.057548698 = queryNorm
                0.23214069 = fieldWeight in 1383, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.046875 = fieldNorm(doc=1383)
          0.5 = coord(1/2)
      0.33333334 = coord(1/3)
    
    Date
    22. 3.2008 14:46:06
  17. Shi, X.; Yang, C.C.: Mining related queries from Web search engine query logs using an improved association rule mining model (2007) 0.01
    0.0077637783 = product of:
      0.023291335 = sum of:
        0.023291335 = product of:
          0.04658267 = sum of:
            0.04658267 = weight(_text_:indexing in 597) [ClassicSimilarity], result of:
              0.04658267 = score(doc=597,freq=2.0), product of:
                0.2202888 = queryWeight, product of:
                  3.8278677 = idf(docFreq=2614, maxDocs=44218)
                  0.057548698 = queryNorm
                0.21146181 = fieldWeight in 597, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.8278677 = idf(docFreq=2614, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=597)
          0.5 = coord(1/2)
      0.33333334 = coord(1/3)
    
    Abstract
    With the overwhelming volume of information, the task of finding relevant information on a given topic on the Web is becoming increasingly difficult. Web search engines hence become one of the most popular solutions available on the Web. However, it has never been easy for novice users to organize and represent their information needs using simple queries. Users have to keep modifying their input queries until they get expected results. Therefore, it is often desirable for search engines to give suggestions on related queries to users. Besides, by identifying those related queries, search engines can potentially perform optimizations on their systems, such as query expansion and file indexing. In this work we propose a method that suggests a list of related queries given an initial input query. The related queries are based in the query log of previously submitted queries by human users, which can be identified using an enhanced model of association rules. Users can utilize the suggested related queries to tune or redirect the search process. Our method not only discovers the related queries, but also ranks them according to the degree of their relatedness. Unlike many other rival techniques, it also performs reasonably well on less frequent input queries.
  18. Tonkin, E.L.; Tourte, G.J.L.: Working with text. tools, techniques and approaches for text mining (2016) 0.01
    0.0077637783 = product of:
      0.023291335 = sum of:
        0.023291335 = product of:
          0.04658267 = sum of:
            0.04658267 = weight(_text_:indexing in 4019) [ClassicSimilarity], result of:
              0.04658267 = score(doc=4019,freq=2.0), product of:
                0.2202888 = queryWeight, product of:
                  3.8278677 = idf(docFreq=2614, maxDocs=44218)
                  0.057548698 = queryNorm
                0.21146181 = fieldWeight in 4019, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.8278677 = idf(docFreq=2614, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=4019)
          0.5 = coord(1/2)
      0.33333334 = coord(1/3)
    
    Abstract
    What is text mining, and how can it be used? What relevance do these methods have to everyday work in information science and the digital humanities? How does one develop competences in text mining? Working with Text provides a series of cross-disciplinary perspectives on text mining and its applications. As text mining raises legal and ethical issues, the legal background of text mining and the responsibilities of the engineer are discussed in this book. Chapters provide an introduction to the use of the popular GATE text mining package with data drawn from social media, the use of text mining to support semantic search, the development of an authority system to support content tagging, and recent techniques in automatic language evaluation. Focused studies describe text mining on historical texts, automated indexing using constrained vocabularies, and the use of natural language processing to explore the climate science literature. Interviews are included that offer a glimpse into the real-life experience of working within commercial and academic text mining.
  19. Hallonsten, O.; Holmberg, D.: Analyzing structural stratification in the Swedish higher education system : data contextualization with policy-history analysis (2013) 0.01
    0.006497545 = product of:
      0.019492636 = sum of:
        0.019492636 = product of:
          0.03898527 = sum of:
            0.03898527 = weight(_text_:22 in 668) [ClassicSimilarity], result of:
              0.03898527 = score(doc=668,freq=2.0), product of:
                0.20152573 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.057548698 = queryNorm
                0.19345059 = fieldWeight in 668, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=668)
          0.5 = coord(1/2)
      0.33333334 = coord(1/3)
    
    Date
    22. 3.2013 19:43:01
  20. Vaughan, L.; Chen, Y.: Data mining from web search queries : a comparison of Google trends and Baidu index (2015) 0.01
    0.006497545 = product of:
      0.019492636 = sum of:
        0.019492636 = product of:
          0.03898527 = sum of:
            0.03898527 = weight(_text_:22 in 1605) [ClassicSimilarity], result of:
              0.03898527 = score(doc=1605,freq=2.0), product of:
                0.20152573 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.057548698 = queryNorm
                0.19345059 = fieldWeight in 1605, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=1605)
          0.5 = coord(1/2)
      0.33333334 = coord(1/3)
    
    Source
    Journal of the Association for Information Science and Technology. 66(2015) no.1, S.13-22

Languages

  • e 22
  • d 7

Types