Search (12 results, page 1 of 1)

  • theme_ss:"Automatisches Klassifizieren"
  • year_i:[2010 TO 2020}  (Lucene range syntax: 2010 inclusive, 2020 exclusive)
  1. Ru, C.; Tang, J.; Li, S.; Xie, S.; Wang, T.: Using semantic similarity to reduce wrong labels in distant supervision for relation extraction (2018) 0.01
    0.007500738 = product of:
      0.022502214 = sum of:
        0.022502214 = product of:
          0.06750664 = sum of:
            0.06750664 = weight(_text_:network in 5055) [ClassicSimilarity], result of:
              0.06750664 = score(doc=5055,freq=4.0), product of:
                0.19402927 = queryWeight, product of:
                  4.4533744 = idf(docFreq=1398, maxDocs=44218)
                  0.043569047 = queryNorm
                0.34791988 = fieldWeight in 5055, product of:
                  2.0 = tf(freq=4.0), with freq of:
                    4.0 = termFreq=4.0
                  4.4533744 = idf(docFreq=1398, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=5055)
          0.33333334 = coord(1/3)
      0.33333334 = coord(1/3)
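    The breakdown above is Lucene's ClassicSimilarity (TF-IDF) explain output: the raw term weight is queryWeight × fieldWeight, where queryWeight = idf × queryNorm and fieldWeight = sqrt(termFreq) × idf × fieldNorm, and each coord(1/3) factor down-weights the score because only one of three clauses matched at that level of the boolean query. A minimal Python sketch that reproduces result 1's figures from the printed factors (nothing here is specific to this catalogue; it only restates the formula the explanation already shows):

      from math import sqrt, isclose

      freq, idf, query_norm, field_norm = 4.0, 4.4533744, 0.043569047, 0.0390625

      tf = sqrt(freq)                        # 2.0 = tf(freq=4.0)
      query_weight = idf * query_norm        # 0.19402927 = queryWeight
      field_weight = tf * idf * field_norm   # 0.34791988 = fieldWeight in 5055
      raw = query_weight * field_weight      # 0.06750664 = weight(_text_:network in 5055)
      score = raw * (1 / 3) * (1 / 3)        # the two coord(1/3) factors

      assert isclose(score, 0.007500738, rel_tol=1e-5)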
    
    Abstract
    Distant supervision (DS) has the advantage of automatically generating large amounts of labelled training data and has been widely used for relation extraction. However, the automatically labelled data in distant supervision usually contain many wrong labels (Riedel, Yao, & McCallum, 2010). This paper presents a novel method to reduce these wrong labels. The proposed method uses the semantic Jaccard with word embedding to measure the semantic similarity between the relation phrase in the knowledge base and the dependency phrases between two entities in a sentence, and uses this similarity to filter out the wrong labels. In the process of reducing wrong labels, the semantic Jaccard algorithm selects a core dependency phrase to represent the candidate relation in a sentence, which can capture the features needed for relation classification and avoid the negative impact of the irrelevant term sequences that previous neural network models of relation extraction often suffer from. In the process of relation classification, the core dependency phrases are also used as the input of a convolutional neural network (CNN). The experimental results show that, compared with the methods using the original DS data, the methods using the filtered DS data performed much better in relation extraction, which indicates that the semantic similarity-based method is effective in reducing wrong labels. The relation extraction performance of the CNN model using the core dependency phrases as input is the best of all, which indicates that the core dependency phrases are sufficient as CNN input to capture the features for relation classification and avoid the negative impact of irrelevant terms.
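    For illustration only, a minimal sketch of a "semantic Jaccard" similarity of the kind the abstract describes, where exact token overlap is relaxed to cosine similarity between word embeddings. The paper's exact formulation is not given here, so this function, the embedding lookup emb, and the filtering threshold mentioned in the final comment are assumptions rather than the authors' implementation:

      import numpy as np

      def cosine(u, v):
          return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

      def semantic_jaccard(phrase_a, phrase_b, emb):
          """Soft Jaccard between two token lists; overlap is relaxed via embeddings."""
          a = [t for t in phrase_a if t in emb]
          b = [t for t in phrase_b if t in emb]
          if not a or not b:
              return 0.0
          # best-match similarity of each token against the other phrase, clamped to >= 0
          sim_ab = sum(max(0.0, max(cosine(emb[x], emb[y]) for y in b)) for x in a)
          sim_ba = sum(max(0.0, max(cosine(emb[x], emb[y]) for x in a)) for y in b)
          soft_intersection = 0.5 * (sim_ab + sim_ba)
          return soft_intersection / (len(a) + len(b) - soft_intersection)

      # A distantly supervised instance would be kept as a positive example only if the
      # similarity between its core dependency phrase and the knowledge-base relation
      # phrase exceeds a chosen threshold (the threshold value is illustrative).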
  2. HaCohen-Kerner, Y. et al.: Classification using various machine learning methods and combinations of key-phrases and visual features (2016) 0.01
    0.006558894 = product of:
      0.019676682 = sum of:
        0.019676682 = product of:
          0.059030045 = sum of:
            0.059030045 = weight(_text_:22 in 2748) [ClassicSimilarity], result of:
              0.059030045 = score(doc=2748,freq=2.0), product of:
                0.15257138 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.043569047 = queryNorm
                0.38690117 = fieldWeight in 2748, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.078125 = fieldNorm(doc=2748)
          0.33333334 = coord(1/3)
      0.33333334 = coord(1/3)
    
    Date
    1. 2.2016 18:25:22
  3. Yang, P.; Gao, W.; Tan, Q.; Wong, K.-F.: A link-bridged topic model for cross-domain document classification (2013) 0.01
    0.0053038225 = product of:
      0.015911467 = sum of:
        0.015911467 = product of:
          0.047734402 = sum of:
            0.047734402 = weight(_text_:network in 2706) [ClassicSimilarity], result of:
              0.047734402 = score(doc=2706,freq=2.0), product of:
                0.19402927 = queryWeight, product of:
                  4.4533744 = idf(docFreq=1398, maxDocs=44218)
                  0.043569047 = queryNorm
                0.2460165 = fieldWeight in 2706, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  4.4533744 = idf(docFreq=1398, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=2706)
          0.33333334 = coord(1/3)
      0.33333334 = coord(1/3)
    
    Abstract
    Transfer learning utilizes labeled data available from a related (source) domain to achieve effective knowledge transfer to the target domain. However, most state-of-the-art cross-domain classification methods treat documents as plain text and ignore the hyperlink (or citation) relationships existing among the documents. In this paper, we propose a novel cross-domain document classification approach called the Link-Bridged Topic model (LBT). LBT consists of two key steps. Firstly, LBT utilizes an auxiliary link network to discover direct or indirect co-citation relationships among documents by embedding the background knowledge into a graph kernel. The mined co-citation relationships are leveraged to bridge the gap across different domains. Secondly, LBT simultaneously combines the content information and link structures into a unified latent topic model. The model is based on the assumption that the documents of the source and target domains share some common topics from the point of view of both content information and link structure. By mapping both domains' data into the latent topic space, LBT encodes the knowledge about domain commonality and difference as shared topics with associated differential probabilities. The learned latent topics must be consistent with the source and target data, as well as with the content and link statistics. The shared topics then act as a bridge to facilitate knowledge transfer from the source to the target domain. Experiments on different types of datasets show that our algorithm significantly improves the generalization performance of cross-domain document classification.
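    As a rough illustration of the first step described above (mining direct and indirect co-citation relationships from an auxiliary link network via a graph kernel), a minimal sketch using an exponential diffusion kernel; the kernel choice, the beta parameter, and the toy adjacency matrix are assumptions for illustration, not the LBT implementation:

      import numpy as np
      from scipy.linalg import expm

      def cocitation_kernel(adjacency, beta=0.2):
          """adjacency[i, j] = 1 if document i cites document j. The co-citation matrix
          C[i, j] counts documents that cite both i and j; the exponential diffusion
          kernel expm(beta * C) also captures indirect co-citation relationships."""
          A = np.asarray(adjacency, dtype=float)
          C = A.T @ A                   # direct co-citation counts
          np.fill_diagonal(C, 0.0)
          return expm(beta * C)         # kernel matrix over the document graph

      # Toy example: documents 0 and 1 both cite documents 2 and 3,
      # so 2 and 3 end up strongly related in the kernel.
      A = np.array([[0, 0, 1, 1],
                    [0, 0, 1, 1],
                    [0, 0, 0, 0],
                    [0, 0, 0, 0]])
      K = cocitation_kernel(A)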
  4. Chae, G.; Park, J.; Park, J.; Yeo, W.S.; Shi, C.: Linking and clustering artworks using social tags : revitalizing crowd-sourced information on cultural collections (2016) 0.01
    0.0053038225 = product of:
      0.015911467 = sum of:
        0.015911467 = product of:
          0.047734402 = sum of:
            0.047734402 = weight(_text_:network in 2852) [ClassicSimilarity], result of:
              0.047734402 = score(doc=2852,freq=2.0), product of:
                0.19402927 = queryWeight, product of:
                  4.4533744 = idf(docFreq=1398, maxDocs=44218)
                  0.043569047 = queryNorm
                0.2460165 = fieldWeight in 2852, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  4.4533744 = idf(docFreq=1398, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=2852)
          0.33333334 = coord(1/3)
      0.33333334 = coord(1/3)
    
    Abstract
    Social tagging is one of the most popular methods for collecting crowd-sourced information in galleries, libraries, archives, and museums (GLAMs). However, when the number of social tags grows rapidly, using them becomes problematic and, as a result, they are often left as a mass of data that cannot be used for practical purposes. To revitalize the use of this crowd-sourced information, we propose using social tags to link and cluster artworks, based on an experimental study of an online collection at the Gyeonggi Museum of Modern Art (GMoMA). We view social tagging as a folksonomy in which artworks are classified by keywords reflecting the crowd's various interpretations, and one artwork can belong to several different categories simultaneously. To leverage this strength of social tags, we used a clustering method called "link communities" to detect overlapping communities in a network of artworks constructed by computing similarities between all artwork pairs. We used this framework to identify semantic relationships and clusters of similar artworks. By comparing the clustering results with curators' manual classification results, we demonstrate the potential of social tagging data for automatically clustering artworks in a way that reflects the dynamic perspectives of crowds.
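    As a small illustration of the network-construction step described above (the link-communities detection itself is not reproduced here), a sketch that connects artworks whose tag sets overlap, weighting edges by Jaccard similarity; the toy tag data and the 0.1 edge threshold are illustrative assumptions:

      from itertools import combinations
      import networkx as nx

      def tag_similarity_network(tags_by_artwork, threshold=0.1):
          """Build a weighted artwork network; edge weight = Jaccard overlap of tag sets."""
          G = nx.Graph()
          G.add_nodes_from(tags_by_artwork)
          for a, b in combinations(tags_by_artwork, 2):
              ta, tb = set(tags_by_artwork[a]), set(tags_by_artwork[b])
              if not ta or not tb:
                  continue
              w = len(ta & tb) / len(ta | tb)
              if w >= threshold:
                  G.add_edge(a, b, weight=w)
          return G

      # Toy input; overlapping community detection (e.g. link communities) would then
      # be run on G to cluster the artworks.
      tags = {"artwork_1": ["portrait", "oil", "woman"],
              "artwork_2": ["portrait", "ink"],
              "artwork_3": ["landscape", "oil"]}
      G = tag_similarity_network(tags)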
  5. Liu, X.; Yu, S.; Janssens, F.; Glänzel, W.; Moreau, Y.; Moor, B. de: Weighted hybrid clustering by combining text mining and bibliometrics on a large-scale journal database (2010) 0.00
    0.003971059 = product of:
      0.011913176 = sum of:
        0.011913176 = product of:
          0.035739526 = sum of:
            0.035739526 = weight(_text_:29 in 3464) [ClassicSimilarity], result of:
              0.035739526 = score(doc=3464,freq=2.0), product of:
                0.15326229 = queryWeight, product of:
                  3.5176873 = idf(docFreq=3565, maxDocs=44218)
                  0.043569047 = queryNorm
                0.23319192 = fieldWeight in 3464, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5176873 = idf(docFreq=3565, maxDocs=44218)
                  0.046875 = fieldNorm(doc=3464)
          0.33333334 = coord(1/3)
      0.33333334 = coord(1/3)
    
    Date
    1. 6.2010 9:29:57
  6. Sommer, M.: Automatische Generierung von DDC-Notationen für Hochschulveröffentlichungen (2012) 0.00
    0.003971059 = product of:
      0.011913176 = sum of:
        0.011913176 = product of:
          0.035739526 = sum of:
            0.035739526 = weight(_text_:29 in 587) [ClassicSimilarity], result of:
              0.035739526 = score(doc=587,freq=2.0), product of:
                0.15326229 = queryWeight, product of:
                  3.5176873 = idf(docFreq=3565, maxDocs=44218)
                  0.043569047 = queryNorm
                0.23319192 = fieldWeight in 587, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5176873 = idf(docFreq=3565, maxDocs=44218)
                  0.046875 = fieldNorm(doc=587)
          0.33333334 = coord(1/3)
      0.33333334 = coord(1/3)
    
    Date
    29. 1.2013 15:44:43
  7. Zhu, W.Z.; Allen, R.B.: Document clustering using the LSI subspace signature model (2013) 0.00
    0.0039353366 = product of:
      0.011806009 = sum of:
        0.011806009 = product of:
          0.035418026 = sum of:
            0.035418026 = weight(_text_:22 in 690) [ClassicSimilarity], result of:
              0.035418026 = score(doc=690,freq=2.0), product of:
                0.15257138 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.043569047 = queryNorm
                0.23214069 = fieldWeight in 690, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.046875 = fieldNorm(doc=690)
          0.33333334 = coord(1/3)
      0.33333334 = coord(1/3)
    
    Date
    23. 3.2013 13:22:36
  8. Egbert, J.; Biber, D.; Davies, M.: Developing a bottom-up, user-based method of web register classification (2015) 0.00
    0.0039353366 = product of:
      0.011806009 = sum of:
        0.011806009 = product of:
          0.035418026 = sum of:
            0.035418026 = weight(_text_:22 in 2158) [ClassicSimilarity], result of:
              0.035418026 = score(doc=2158,freq=2.0), product of:
                0.15257138 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.043569047 = queryNorm
                0.23214069 = fieldWeight in 2158, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.046875 = fieldNorm(doc=2158)
          0.33333334 = coord(1/3)
      0.33333334 = coord(1/3)
    
    Date
    4. 8.2015 19:22:04
  9. Ma, Z.; Sun, A.; Cong, G.: On predicting the popularity of newly emerging hashtags in Twitter (2013) 0.00
    0.003309216 = product of:
      0.009927647 = sum of:
        0.009927647 = product of:
          0.029782942 = sum of:
            0.029782942 = weight(_text_:29 in 967) [ClassicSimilarity], result of:
              0.029782942 = score(doc=967,freq=2.0), product of:
                0.15326229 = queryWeight, product of:
                  3.5176873 = idf(docFreq=3565, maxDocs=44218)
                  0.043569047 = queryNorm
                0.19432661 = fieldWeight in 967, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5176873 = idf(docFreq=3565, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=967)
          0.33333334 = coord(1/3)
      0.33333334 = coord(1/3)
    
    Date
    25. 6.2013 19:05:29
  10. Golub, K.; Hansson, J.; Soergel, D.; Tudhope, D.: Managing classification in libraries : a methodological outline for evaluating automatic subject indexing and classification in Swedish library catalogues (2015) 0.00
    0.003309216 = product of:
      0.009927647 = sum of:
        0.009927647 = product of:
          0.029782942 = sum of:
            0.029782942 = weight(_text_:29 in 2300) [ClassicSimilarity], result of:
              0.029782942 = score(doc=2300,freq=2.0), product of:
                0.15326229 = queryWeight, product of:
                  3.5176873 = idf(docFreq=3565, maxDocs=44218)
                  0.043569047 = queryNorm
                0.19432661 = fieldWeight in 2300, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5176873 = idf(docFreq=3565, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=2300)
          0.33333334 = coord(1/3)
      0.33333334 = coord(1/3)
    
    Source
    Classification and authority control: expanding resource discovery: proceedings of the International UDC Seminar 2015, 29-30 October 2015, Lisbon, Portugal. Eds.: Slavic, A. and M.I. Cordeiro
  11. Liu, R.-L.: A passage extractor for classification of disease aspect information (2013) 0.00
    0.003279447 = product of:
      0.009838341 = sum of:
        0.009838341 = product of:
          0.029515022 = sum of:
            0.029515022 = weight(_text_:22 in 1107) [ClassicSimilarity], result of:
              0.029515022 = score(doc=1107,freq=2.0), product of:
                0.15257138 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.043569047 = queryNorm
                0.19345059 = fieldWeight in 1107, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=1107)
          0.33333334 = coord(1/3)
      0.33333334 = coord(1/3)
    
    Date
    28.10.2013 19:22:57
  12. Piros, A.: Automatic interpretation of complex UDC numbers : towards support for library systems (2015) 0.00
    0.0026473727 = product of:
      0.007942118 = sum of:
        0.007942118 = product of:
          0.023826351 = sum of:
            0.023826351 = weight(_text_:29 in 2301) [ClassicSimilarity], result of:
              0.023826351 = score(doc=2301,freq=2.0), product of:
                0.15326229 = queryWeight, product of:
                  3.5176873 = idf(docFreq=3565, maxDocs=44218)
                  0.043569047 = queryNorm
                0.15546128 = fieldWeight in 2301, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5176873 = idf(docFreq=3565, maxDocs=44218)
                  0.03125 = fieldNorm(doc=2301)
          0.33333334 = coord(1/3)
      0.33333334 = coord(1/3)
    
    Source
    Classification and authority control: expanding resource discovery: proceedings of the International UDC Seminar 2015, 29-30 October 2015, Lisbon, Portugal. Eds.: Slavic, A. and M.I. Cordeiro