Search (208 results, page 1 of 11)

  • theme_ss:"Automatisches Klassifizieren"
  1. Hotho, A.; Bloehdorn, S.: Data Mining 2004 : Text classification by boosting weak learners based on terms and concepts (2004) 0.04
    Abstract
    Document representations for text classification are typically based on the classical Bag-Of-Words paradigm. This approach comes with deficiencies that motivate the integration of features on a higher semantic level than single words. In this paper we propose an enhancement of the classical document representation through concepts extracted from background knowledge. Boosting is used for actual classification. Experimental evaluations on two well known text corpora support our approach through consistent improvement of the results.
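A minimal sketch of the kind of pipeline the abstract describes, assuming scikit-learn; the CONCEPTS lexicon and sample texts are hypothetical stand-ins for the background knowledge (such as WordNet) and the corpora used in the paper:

    # Boosting decision stumps over term features augmented with concept features.
    import scipy.sparse as sp
    from sklearn.ensemble import AdaBoostClassifier
    from sklearn.feature_extraction.text import CountVectorizer

    CONCEPTS = {"beef": "food", "pork": "food", "bond": "finance", "stock": "finance"}
    CONCEPT_IDS = {c: i for i, c in enumerate(sorted(set(CONCEPTS.values())))}

    def concept_matrix(texts):
        # count occurrences of each higher-level concept per text
        rows = []
        for text in texts:
            counts = [0] * len(CONCEPT_IDS)
            for token in text.lower().split():
                if token in CONCEPTS:
                    counts[CONCEPT_IDS[CONCEPTS[token]]] += 1
            rows.append(counts)
        return sp.csr_matrix(rows)

    texts = ["beef and pork exports rise", "stock and bond markets fall"]
    labels = ["commodities", "markets"]
    X = sp.hstack([CountVectorizer().fit_transform(texts), concept_matrix(texts)])

    # AdaBoost's default base learner is a depth-1 tree, i.e. a weak
    # learner over single term or concept features
    model = AdaBoostClassifier(n_estimators=50, random_state=0).fit(X, labels)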
    Content
Cf.: http://www.google.de/url?sa=t&rct=j&q=&esrc=s&source=web&cd=1&cad=rja&ved=0CEAQFjAA&url=http%3A%2F%2Fciteseerx.ist.psu.edu%2Fviewdoc%2Fdownload%3Fdoi%3D10.1.1.91.4940%26rep%3Drep1%26type%3Dpdf&ei=dOXrUMeIDYHDtQahsIGACg&usg=AFQjCNHFWVh6gNPvnOrOS9R3rkrXCNVD-A&sig2=5I2F5evRfMnsttSgFF9g7Q&bvm=bv.1357316858,d.Yms.
    Date
8.1.2013 10:22:32
    Type
    a
  2. Pfister, J.: Clustering von Patent-Dokumenten am Beispiel der Datenbanken des Fachinformationszentrums Karlsruhe (2006) 0.04
    Source
    Effektive Information Retrieval Verfahren in Theorie und Praxis: ausgewählte und erweiterte Beiträge des Vierten Hildesheimer Evaluierungs- und Retrievalworkshop (HIER 2005), Hildesheim, 20.7.2005. Hrsg.: T. Mandl u. C. Womser-Hacker
    Type
    a
  3. HaCohen-Kerner, Y. et al.: Classification using various machine learning methods and combinations of key-phrases and visual features (2016) 0.04
    Date
1.2.2016 18:25:22
    Source
    Semantic keyword-based search on structured data sources: First COST Action IC1302 International KEYSTONE Conference, IKC 2015, Coimbra, Portugal, September 8-9, 2015. Revised Selected Papers. Eds.: J. Cardoso et al
    Type
    a
  4. Panyr, J.: Automatische Klassifikation und Information Retrieval : Anwendung und Entwicklung komplexer Verfahren in Information-Retrieval-Systemen und ihre Evaluierung (1986) 0.03
    Footnote
Also published as dissertation, U Saarbrücken 1985.
  5. Wätjen, H.-J.; Diekmann, B.; Möller, G.; Carstensen, K.-U.: Bericht zum DFG-Projekt: GERHARD : German Harvest Automated Retrieval and Directory (1998) 0.03
  6. Egbert, J.; Biber, D.; Davies, M.: Developing a bottom-up, user-based method of web register classification (2015) 0.03
    Abstract
    This paper introduces a project to develop a reliable, cost-effective method for classifying Internet texts into register categories, and apply that approach to the analysis of a large corpus of web documents. To date, the project has proceeded in 2 key phases. First, we developed a bottom-up method for web register classification, asking end users of the web to utilize a decision-tree survey to code relevant situational characteristics of web documents, resulting in a bottom-up identification of register and subregister categories. We present details regarding the development and testing of this method through a series of 10 pilot studies. Then, in the second phase of our project we applied this procedure to a corpus of 53,000 web documents. An analysis of the results demonstrates the effectiveness of these methods for web register classification and provides a preliminary description of the types and distribution of registers on the web.
    Date
4.8.2015 19:22:04
    Type
    a
  7. Golub, K.; Hansson, J.; Soergel, D.; Tudhope, D.: Managing classification in libraries : a methodological outline for evaluating automatic subject indexing and classification in Swedish library catalogues (2015) 0.03
    Abstract
Subject terms play a crucial role in resource discovery but require substantial effort to produce. Automatic subject classification and indexing address problems of scale and sustainability and can be used to enrich existing bibliographic records, establish more connections across and between resources, and enhance consistency of bibliographic data. The paper aims to put forward a complex methodological framework to evaluate automatic classification tools for Swedish textual documents based on the Dewey Decimal Classification (DDC) recently introduced to Swedish libraries. Three major complementary approaches are suggested: a quality-built gold standard, retrieval effects, and domain analysis. The gold standard is built based on input from at least two catalogue librarians, end users expert in the subject, end users inexperienced in the subject, and automated tools. Retrieval effects are studied through a combination of assigned and free tasks, including factual and comprehensive types. The study also takes into consideration the different role and character of subject terms in various knowledge domains, such as scientific disciplines. As a theoretical framework, domain analysis is used and applied in relation to the implementation of DDC in Swedish libraries and chosen domains of knowledge within the DDC itself.
    Source
    Classification and authority control: expanding resource discovery: proceedings of the International UDC Seminar 2015, 29-30 October 2015, Lisbon, Portugal. Eds.: Slavic, A. u. M.I. Cordeiro
    Type
    a
  8. Panyr, J.: STEINADLER: ein Verfahren zur automatischen Deskribierung und zur automatischen thematischen Klassifikation (1978) 0.02
    Type
    a
  9. Ru, C.; Tang, J.; Li, S.; Xie, S.; Wang, T.: Using semantic similarity to reduce wrong labels in distant supervision for relation extraction (2018) 0.02
    Abstract
Distant supervision (DS) has the advantage of automatically generating large amounts of labelled training data and has been widely used for relation extraction. However, there are usually many wrong labels in the automatically labelled data in distant supervision (Riedel, Yao, & McCallum, 2010). This paper presents a novel method to reduce the wrong labels. The proposed method uses the semantic Jaccard with word embedding to measure the semantic similarity between the relation phrase in the knowledge base and the dependency phrases between two entities in a sentence to filter the wrong labels. In the process of reducing wrong labels, the semantic Jaccard algorithm selects a core dependency phrase to represent the candidate relation in a sentence, which can capture features for relation classification and avoid the negative impact from irrelevant term sequences that previous neural network models of relation extraction often suffer from. In the process of relation classification, the core dependency phrases are also used as the input of a convolutional neural network (CNN) for relation classification. The experimental results show that, compared with the methods using original DS data, the methods using filtered DS data performed much better in relation extraction. This indicates that the semantic similarity based method is effective in reducing wrong labels. The relation extraction performance of the CNN model using the core dependency phrases as input is the best of all, which indicates that using the core dependency phrases as input to the CNN is enough to capture the features for relation classification and to avoid the negative impact from irrelevant terms.
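For illustration, a minimal sketch of the "soft" Jaccard idea, with toy vectors standing in for the trained word embeddings the paper uses; the phrases, vectors, and threshold tau are hypothetical:

    # Two words match if the cosine similarity of their embeddings exceeds tau;
    # the semantic Jaccard is matched words over the union of both phrases.
    import numpy as np

    EMB = {
        "founded":     np.array([0.90, 0.10, 0.05]),
        "established": np.array([0.85, 0.15, 0.10]),
        "born":        np.array([0.10, 0.90, 0.05]),
        "in":          np.array([0.30, 0.30, 0.30]),
    }

    def cosine(u, v):
        return float(u @ v) / (np.linalg.norm(u) * np.linalg.norm(v))

    def soft_match(word, others, tau):
        return word in EMB and any(
            o in EMB and cosine(EMB[word], EMB[o]) >= tau for o in others)

    def semantic_jaccard(phrase_a, phrase_b, tau=0.9):
        a, b = phrase_a.split(), phrase_b.split()
        inter = sum(soft_match(w, b, tau) for w in a)
        union = len(a) + len(b) - inter
        return inter / union if union else 0.0

    # keep a distantly supervised instance only if the KB relation phrase
    # and the dependency phrase between the entities are close enough
    print(semantic_jaccard("founded in", "established in"))  # 1.0  -> keep
    print(semantic_jaccard("founded in", "born in"))         # ~0.33 -> filter out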
    Theme
    Semantisches Umfeld in Indexierung u. Retrieval
    Type
    a
  10. Reiner, U.: Automatische DDC-Klassifizierung von bibliografischen Titeldatensätzen (2009) 0.02
    Date
22.8.2009 12:54:24
  11. Reiner, U.: Automatische DDC-Klassifizierung bibliografischer Titeldatensätze der Deutschen Nationalbibliografie (2009) 0.02
    Abstract
Classifying objects (e.g. fauna, flora, texts) is a procedure based on human intelligence. In computer science, particularly in the field of Artificial Intelligence (AI), one question under investigation is the extent to which procedures that require human intelligence can be automated. It has turned out that solving everyday problems poses a greater challenge than solving specialised problems such as building a chess computer; "Rybka" has been the reigning computer-chess world champion since June 2007. To what extent everyday problems can be solved with AI methods remains, for the general case, an open question. Natural language processing, e.g. language understanding, plays an essential role in solving everyday problems. Realising "common sense" in a machine (in the Cyc knowledge base, in the form of facts and rules) has been Lenat's goal since 1984; regarding the AI flagship project "Cyc" there are Cyc optimists and Cyc pessimists. Understanding natural language (e.g. work titles, abstracts, prefaces, tables of contents) is also necessary when intellectually classifying bibliographic title records or online publications, in order to classify these text objects correctly. Since 2007 the Deutsche Nationalbibliothek has classified nearly all publications intellectually using the Dewey Decimal Classification (DDC).
At least since the World Wide Web came into existence, the number of publications to be classified has been growing faster than they can be indexed intellectually. Methods are therefore being sought to automate the classification of text objects, or at least to support intellectual classification. Methods for automatic document classification (information retrieval, IR) have existed since 1968, and for automatic text classification (ATC: Automated Text Categorization) since 1992. As ever more digital objects have become available on the World Wide Web, work on automatic text classification has increased markedly since about 1998, including, since 1996, work on automatic DDC and RVK classification of bibliographic title records and full-text documents. To our knowledge, these developments have so far been experimental systems rather than systems in continuous operation. The VZG project Colibri/DDC has also been concerned with automatic DDC classification, among other things, since 2006. The related studies and developments serve to answer the research question: "Is it possible to achieve a substantively coherent automatic DDC classification of all GVK-PLUS title records?"
    Date
22.1.2010 14:41:24
    Type
    a
  12. Kleinoeder, H.H.; Puzicha, J.: Automatische Katalogisierung am Beispiel einer Pilotanwendung (2002) 0.02
    Type
    a
  13. Ruiz, M.E.; Srinivasan, P.: Combining machine learning and hierarchical indexing structures for text categorization (2001) 0.02
    Abstract
This paper presents a method that exploits the hierarchical structure of an indexing vocabulary to guide the development and training of machine learning methods for automatic text categorization. We present the design of a hierarchical classifier based on the divide-and-conquer principle. The method is evaluated using backpropagation neural networks as the machine learning algorithm, which learn to assign MeSH categories to a subset of MEDLINE records. Comparisons with the traditional Rocchio algorithm adapted for text categorization, as well as with flat neural network classifiers, are provided. The results indicate that the use of hierarchical structures improves performance significantly.
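The divide-and-conquer design can be sketched as one small backpropagation network per internal node of the hierarchy, each deciding which child branch a document descends into. A minimal illustration assuming scikit-learn; the two-level toy tree and sample records are hypothetical stand-ins for MeSH and MEDLINE:

    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.neural_network import MLPClassifier

    TREE = {"root": ["disease", "drug"], "disease": [], "drug": []}
    docs = ["fever infection symptoms", "viral infection outbreak",
            "aspirin tablet dosage", "antibiotic tablet dosage"]
    child_at_root = ["disease", "disease", "drug", "drug"]

    vectorizer = TfidfVectorizer()
    X = vectorizer.fit_transform(docs)

    # with a deeper tree there is one classifier per internal node, each
    # trained only on the documents whose path passes through that node
    node_clf = {"root": MLPClassifier(hidden_layer_sizes=(8,), max_iter=3000,
                                      random_state=0).fit(X, child_at_root)}

    def classify(text, node="root"):
        while TREE[node]:  # descend until a leaf category is reached
            node = node_clf[node].predict(vectorizer.transform([text]))[0]
        return node

    print(classify("infection and fever"))  # expected: "disease"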
    Source
    Advances in classification research, vol.10: proceedings of the 10th ASIS SIG/CR Classification Research Workshop. Ed.: Albrechtsen, H. u. J.E. Mai
    Type
    a
  14. Bollmann, P.; Konrad, E.; Schneider, H.-J.; Zuse, H.: Anwendung automatischer Klassifikationsverfahren mit dem System FAKYR (1978) 0.02
    Type
    a
  15. Malo, P.; Sinha, A.; Wallenius, J.; Korhonen, P.: Concept-based document classification using Wikipedia and value function (2011) 0.02
    Abstract
    In this article, we propose a new concept-based method for document classification. The conceptual knowledge associated with the words is drawn from Wikipedia. The purpose is to utilize the abundant semantic relatedness information available in Wikipedia in an efficient value function-based query learning algorithm. The procedure learns the value function by solving a simple linear programming problem formulated using the training documents. The learning involves a step-wise iterative process that helps in generating a value function with an appropriate set of concepts (dimensions) chosen from a collection of concepts. Once the value function is formulated, it is utilized to make a decision between relevance and irrelevance. The value assigned to a particular document from the value function can be further used to rank the documents according to their relevance. Reuters newswire documents have been used to evaluate the efficacy of the procedure. An extensive comparison with other frameworks has been performed. The results are promising.
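The value-function learning step can be illustrated as a small linear program: require every relevant training document to outscore every irrelevant one by a margin and minimise the total slack. A sketch assuming SciPy; the concept vectors are hypothetical stand-ins for Wikipedia-derived features:

    # One constraint per (relevant, irrelevant) pair: w·(r - i) + slack_k >= 1.
    import numpy as np
    from scipy.optimize import linprog

    relevant   = np.array([[1.0, 0.2, 0.0], [0.8, 0.4, 0.1]])
    irrelevant = np.array([[0.1, 0.0, 0.9], [0.2, 0.1, 0.7]])

    pairs = [(r, i) for r in relevant for i in irrelevant]
    n_w, n_s = relevant.shape[1], len(pairs)

    c = np.concatenate([np.zeros(n_w), np.ones(n_s)])  # minimise total slack
    A_ub = np.zeros((n_s, n_w + n_s))
    b_ub = -np.ones(n_s)
    for k, (r, i) in enumerate(pairs):
        A_ub[k, :n_w] = -(r - i)   # rewritten as -(r - i)·w - slack_k <= -1
        A_ub[k, n_w + k] = -1.0
    bounds = [(-1.0, 1.0)] * n_w + [(0.0, None)] * n_s

    w = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=bounds).x[:n_w]
    value = lambda doc: float(w @ doc)  # rank documents by their learned value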
    Type
    a
16. Godby, C. J.; Stuler, J.: The Library of Congress Classification as a knowledge base for automatic subject categorization (2001) 0.02
    Abstract
This paper describes a set of experiments in adapting a subset of the Library of Congress Classification for use as a database for automatic classification. A high degree of concept integrity was obtained when subject headings were mapped from OCLC's WorldCat database and filtered using the log-likelihood statistic.
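The log-likelihood statistic referred to here is commonly computed as Dunning's likelihood ratio (G²) over a 2×2 term-versus-class contingency table; a minimal sketch with hypothetical counts:

    import math

    def _h(*counts):
        # sum of k * log(k / N) over the non-zero cells
        n = sum(counts)
        return sum(k * math.log(k / n) for k in counts if k > 0)

    def llr(k11, k12, k21, k22):
        """k11: term in class, k12: term elsewhere,
        k21: other terms in class, k22: other terms elsewhere."""
        return 2.0 * (_h(k11, k12, k21, k22)
                      - _h(k11 + k12, k21 + k22)   # row sums
                      - _h(k11 + k21, k12 + k22))  # column sums

    print(llr(110, 2442, 111, 29114))  # strong association   -> large G^2
    print(llr(10, 1000, 300, 30000))   # near-independence    -> G^2 near 0

Low-scoring subject-heading mappings can then be filtered out with a threshold on G².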
    Footnote
    Paper, IFLA Preconference "Subject Retrieval in a Networked Environment", Dublin, OH, August 2001.
  17. Humphrey, S.M.; Névéol, A.; Browne, A.; Gobeil, J.; Ruch, P.; Darmoni, S.J.: Comparing a rule-based versus statistical system for automatic categorization of MEDLINE documents according to biomedical specialty (2009) 0.02
    Abstract
Automatic document categorization is an important research problem in Information Science and Natural Language Processing. Many applications, including Word Sense Disambiguation and Information Retrieval in large collections, can benefit from such categorization. This paper focuses on automatic categorization of documents from the biomedical literature into broad discipline-based categories. Two different systems are described and contrasted: CISMeF, which uses rules based on human indexing of the documents by the Medical Subject Headings (MeSH) controlled vocabulary in order to assign metaterms (MTs), and Journal Descriptor Indexing (JDI), based on human categorization of about 4,000 journals and statistical associations between journal descriptors (JDs) and textwords in the documents. We evaluate and compare the performance of these systems against a gold standard of humanly assigned categories for 100 MEDLINE documents, using six measures selected from trec_eval. The results show that for five of the measures performance is comparable, and for one measure JDI is superior. We conclude that these results favor JDI, given the significantly greater intellectual overhead involved in human indexing and maintaining a rule base for mapping MeSH terms to MTs. We also note a JDI method that associates JDs with MeSH indexing rather than textwords, and it may be worthwhile to investigate whether this JDI method (statistical) and CISMeF (rule-based) might be combined and then evaluated to show whether they are complementary to one another.
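A toy sketch of the JDI side of the comparison: each textword carries a vector of association weights to journal descriptors, and a document's JD scores are the average over its words. All names and weights below are hypothetical, standing in for statistics estimated from a journal-categorised training collection:

    import numpy as np

    JDS = ["Cardiology", "Genetics"]
    ASSOC = {  # per-word association weights to each JD
        "heart":  np.array([0.90, 0.10]),
        "artery": np.array([0.80, 0.20]),
        "gene":   np.array([0.05, 0.95]),
    }

    def jd_scores(text):
        vectors = [ASSOC[w] for w in text.lower().split() if w in ASSOC]
        if not vectors:
            return {}
        return dict(zip(JDS, np.mean(vectors, axis=0).round(3)))

    print(jd_scores("heart artery gene"))  # Cardiology outranks Genetics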
    Type
    a
18. Lindholm, J.; Schönthal, T.; Jansson, K.: Experiences of harvesting Web resources in engineering using automatic classification (2003) 0.01
    Abstract
The authors describe the background and the work involved in setting up Engine-e, a Web index that uses automatic classification as a means for the selection of resources in engineering. Considerations in offering a robot-generated Web index as a successor to a manually indexed, quality-controlled subject gateway are also discussed.
    Type
    a
  19. Díaz, I.; Ranilla, J.; Montañes, E.; Fernández, J.; Combarro, E.F.: Improving performance of text categorization by combining filtering and support vector machines (2004) 0.01
    Abstract
Text categorization is the process of assigning documents to a set of previously fixed categories. A lot of research is going on with the goal of automating this time-consuming task. Several different algorithms have been applied, and Support Vector Machines (SVM) have shown very good results. In this report, we try to prove that a previous filtering of the words used by SVM in the classification can improve the overall performance. This hypothesis is systematically tested with three different measures of word relevance, on two different corpora (one of them considered in three different splits), and with both local and global vocabularies. The results show that filtering significantly improves the recall of the method and also has the effect of significantly improving the overall performance.
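The filter-then-SVM pipeline is straightforward to express; a minimal sketch assuming scikit-learn, with the chi-squared statistic as one possible word-relevance measure (the paper evaluates three such measures):

    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.feature_selection import SelectKBest, chi2
    from sklearn.pipeline import Pipeline
    from sklearn.svm import LinearSVC

    pipeline = Pipeline([
        ("tfidf", TfidfVectorizer()),           # non-negative term features, as chi2 requires
        ("filter", SelectKBest(chi2, k=1000)),  # keep the k most relevant words
        ("svm", LinearSVC()),
    ])
    # pipeline.fit(train_texts, train_labels); pipeline.predict(test_texts)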
    Type
    a
  20. Schulze, U.: Erfahrungen bei der Anwendung automatischer Klassifizierungsverfahren zur Inhaltsanalyse einer Dokumentenmenge (1978) 0.01
    Type
    a

Languages

  • e 169
  • d 36
  • a 1
  • chi 1

Types

  • a 178
  • el 29
  • r 4
  • m 3
  • x 3
  • s 2
  • d 1