Search (4 results, page 1 of 1)

  • language_ss:"e"
  • theme_ss:"Data Mining"
  • year_i:[2000 TO 2010}
  1. Loh, S.; Oliveira, J.P.M. de; Gastal, F.L.: Knowledge discovery in textual documentation : qualitative and quantitative analyses (2001) 0.03
    0.03397046 = product of:
      0.13588184 = sum of:
        0.13588184 = weight(_text_:objects in 4482) [ClassicSimilarity], result of:
          0.13588184 = score(doc=4482,freq=2.0), product of:
            0.38565242 = queryWeight, product of:
              5.315071 = idf(docFreq=590, maxDocs=44218)
              0.072558284 = queryNorm
            0.35234275 = fieldWeight in 4482, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              5.315071 = idf(docFreq=590, maxDocs=44218)
              0.046875 = fieldNorm(doc=4482)
      0.25 = coord(1/4)
    
    Abstract
    This paper presents an approach for performing knowledge discovery in texts through qualitative and quantitative analyses of high-level textual characteristics. Instead of applying mining techniques to attribute values, terms or keywords extracted from texts, the discovery process works over concepts identified in texts. Concepts represent real-world events and objects, and they help the user to understand ideas, trends, thoughts, opinions and intentions present in texts. The approach combines a quasi-automatic categorisation task (for qualitative analysis) with a mining process (for quantitative analysis). The goal is to find new and useful knowledge inside a textual collection through the use of mining techniques applied over concepts (representing text content). In this paper, an application of the approach to medical records of a psychiatric hospital is presented. The approach helps physicians to extract knowledge about patients and diseases. This knowledge may be used for epidemiological studies and for training professionals, and it may also be used to support physicians in diagnosing and evaluating diseases.
  2. Ku, L.-W.; Chen, H.-H.: Mining opinions from the Web : beyond relevance retrieval (2007) 0.03
    0.028308718 = product of:
      0.11323487 = sum of:
        0.11323487 = weight(_text_:objects in 605) [ClassicSimilarity], result of:
          0.11323487 = score(doc=605,freq=2.0), product of:
            0.38565242 = queryWeight, product of:
              5.315071 = idf(docFreq=590, maxDocs=44218)
              0.072558284 = queryNorm
            0.29361898 = fieldWeight in 605, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              5.315071 = idf(docFreq=590, maxDocs=44218)
              0.0390625 = fieldNorm(doc=605)
      0.25 = coord(1/4)
    
    Abstract
    Documents discussing public affairs, common themes, interesting products, and so on, are reported and distributed on the Web. Positive and negative opinions embedded in documents are useful references and feedback for governments to improve their services, for companies to market their products, and for customers to purchase their objects. Web opinion mining aims to extract, summarize, and track various aspects of subjective information on the Web. Mining subjective information enables traditional information retrieval (IR) systems to retrieve more data from human viewpoints and provide information with finer granularity. Opinion extraction identifies opinion holders, extracts the relevant opinion sentences, and decides their polarities. Opinion summarization recognizes the major events embedded in documents and summarizes the supportive and the nonsupportive evidence. Opinion tracking captures subjective information from various genres and monitors the developments of opinions along spatial and temporal dimensions. To demonstrate and evaluate the proposed opinion mining algorithms, news and bloggers' articles are adopted. Documents in the evaluation corpora are tagged at different granularities, from words and sentences to whole documents. In the experiments, positive and negative sentiment words and their weights are mined on the basis of Chinese word structures. The f-measure is 73.18% and 63.75% for verbs and nouns, respectively. Utilizing the mined sentiment words together with topical words, we achieve an f-measure of 62.16% at the sentence level and 74.37% at the document level.
  3. Maaten, L. van den; Hinton, G.: Visualizing data using t-SNE (2008) 0.03
    0.028308718 = product of:
      0.11323487 = sum of:
        0.11323487 = weight(_text_:objects in 3888) [ClassicSimilarity], result of:
          0.11323487 = score(doc=3888,freq=2.0), product of:
            0.38565242 = queryWeight, product of:
              5.315071 = idf(docFreq=590, maxDocs=44218)
              0.072558284 = queryNorm
            0.29361898 = fieldWeight in 3888, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              5.315071 = idf(docFreq=590, maxDocs=44218)
              0.0390625 = fieldNorm(doc=3888)
      0.25 = coord(1/4)
    
    Abstract
    We present a new technique called "t-SNE" that visualizes high-dimensional data by giving each datapoint a location in a two- or three-dimensional map. The technique is a variation of Stochastic Neighbor Embedding (Hinton and Roweis, 2002) that is much easier to optimize, and produces significantly better visualizations by reducing the tendency to crowd points together in the center of the map. t-SNE is better than existing techniques at creating a single map that reveals structure at many different scales. This is particularly important for high-dimensional data that lie on several different, but related, low-dimensional manifolds, such as images of objects from multiple classes seen from multiple viewpoints. For visualizing the structure of very large data sets, we show how t-SNE can use random walks on neighborhood graphs to allow the implicit structure of all of the data to influence the way in which a subset of the data is displayed. We illustrate the performance of t-SNE on a wide variety of data sets and compare it with many other non-parametric visualization techniques, including Sammon mapping, Isomap, and Locally Linear Embedding. The visualizations produced by t-SNE are significantly better than those produced by the other techniques on almost all of the data sets.
  4. Information visualization in data mining and knowledge discovery (2002) 0.00
    0.0049153226 = product of:
      0.01966129 = sum of:
        0.01966129 = weight(_text_:22 in 1789) [ClassicSimilarity], result of:
          0.01966129 = score(doc=1789,freq=2.0), product of:
            0.25408673 = queryWeight, product of:
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.072558284 = queryNorm
            0.07738023 = fieldWeight in 1789, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.015625 = fieldNorm(doc=1789)
      0.25 = coord(1/4)
    
    Date
    23. 3.2008 19:10:22
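Every explanation tree in the results above follows the same Lucene ClassicSimilarity recipe: the final score is queryWeight × fieldWeight × coord, where queryWeight = idf × queryNorm, fieldWeight = tf × idf × fieldNorm and tf = sqrt(freq). The sketch below recombines those factors in Python. The idf formula it uses, 1 + ln(maxDocs / (docFreq + 1)), is an assumption chosen because it reproduces the printed idf values (5.315071 and 3.5018296); other Lucene versions use a slightly different variant. queryNorm and fieldNorm are copied from the trees rather than recomputed, because queryNorm depends on every clause of the query and fieldNorm is a lossy one-byte encoding of the length normalization stored at index time. The helper names are hypothetical and not part of Lucene's API.

```python
import math

# Hypothetical helpers (not Lucene's API) that recombine the factors printed
# in a ClassicSimilarity explanation tree.


def classic_idf(doc_freq: int, max_docs: int) -> float:
    # Assumed formula; it reproduces idf(docFreq=590, maxDocs=44218) = 5.315071.
    return 1.0 + math.log(max_docs / (doc_freq + 1))


def classic_score(freq: float, doc_freq: int, max_docs: int,
                  query_norm: float, field_norm: float,
                  matched: int, clauses: int) -> float:
    idf = classic_idf(doc_freq, max_docs)
    tf = math.sqrt(freq)                  # tf(freq=2.0) = 1.4142135
    query_weight = idf * query_norm       # queryWeight
    field_weight = tf * idf * field_norm  # fieldWeight; fieldNorm copied from the tree
    coord = matched / clauses             # coord(1/4): 1 of 4 query clauses matched
    return query_weight * field_weight * coord


if __name__ == "__main__":
    QUERY_NORM = 0.072558284  # copied from the trees; depends on the whole query
    for doc, doc_freq, field_norm in [(4482, 590, 0.046875),
                                      (605, 590, 0.0390625),
                                      (1789, 3622, 0.015625)]:
        s = classic_score(2.0, doc_freq, 44218, QUERY_NORM, field_norm, 1, 4)
        print(doc, f"{s:.7f}")
```

The recomputed values agree with the explained scores of documents 4482, 605 and 1789 (0.03397046, 0.028308718 and 0.0049153226) up to floating-point rounding. The breakdown also shows why results 1 to 3 are so close: each matches the term "objects" twice, so only fieldNorm separates them, while result 4 matches only the term "22", whose lower idf and smaller fieldNorm put it far below.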
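Result 3's abstract credits t-SNE's better maps to reducing the tendency to crowd points together. In the t-SNE paper this comes mainly from measuring similarity between map points with a heavy-tailed Student-t kernel (one degree of freedom) instead of the Gaussian used by symmetric SNE. The sketch below computes only those map affinities q_ij, for either kernel, so the difference in tails can be inspected. It is a pure-Python illustration, not the authors' implementation: it omits the high-dimensional affinities, perplexity calibration and gradient descent, and the function names are hypothetical.

```python
import math
from typing import List, Sequence


def pairwise_sq_dists(ys: Sequence[Sequence[float]]) -> List[List[float]]:
    # Squared Euclidean distance between every pair of map points.
    return [[sum((a - b) ** 2 for a, b in zip(yi, yj)) for yj in ys] for yi in ys]


def map_affinities(ys: Sequence[Sequence[float]],
                   heavy_tailed: bool = True) -> List[List[float]]:
    """Joint probabilities q_ij between map points, normalized over all pairs i != j.

    heavy_tailed=True:  Student-t kernel 1 / (1 + d^2), as in t-SNE.
    heavy_tailed=False: Gaussian kernel exp(-d^2), as in symmetric SNE.
    q_ii is fixed at 0.
    """
    d2 = pairwise_sq_dists(ys)
    n = len(ys)
    kernel = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                if heavy_tailed:
                    kernel[i][j] = 1.0 / (1.0 + d2[i][j])
                else:
                    kernel[i][j] = math.exp(-d2[i][j])
    total = sum(map(sum, kernel))
    return [[k / total for k in row] for row in kernel]


if __name__ == "__main__":
    points = [(0.0,), (1.0,), (3.0,)]  # three map points on a line
    for heavy in (True, False):
        q = map_affinities(points, heavy)
        # Affinity of the far pair (0, 3) relative to the near pair (0, 1).
        print("Student-t" if heavy else "Gaussian", q[0][2] / q[0][1])
```

For points at 0, 1 and 3 on a line, the Student-t kernel keeps the far pair at one fifth of the near pair's affinity, whereas the Gaussian shrinks it by a factor of e^8 (roughly 3,000). That heavier tail is what lets the map place moderately dissimilar points far apart instead of crowding them together.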