Search (13 results, page 1 of 1)

  • × year_i:[2010 TO 2020}
  • × theme_ss:"Automatisches Klassifizieren"
  1. Yilmaz, T.; Ozcan, R.; Altingovde, I.S.; Ulusoy, Ö.: Improving educational web search for question-like queries through subject classification (2019) 0.04
    0.035790663 = product of:
      0.053685993 = sum of:
        0.037639882 = weight(_text_:resources in 5041) [ClassicSimilarity], result of:
          0.037639882 = score(doc=5041,freq=2.0), product of:
            0.18665522 = queryWeight, product of:
              3.650338 = idf(docFreq=3122, maxDocs=44218)
              0.051133685 = queryNorm
            0.20165458 = fieldWeight in 5041, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.650338 = idf(docFreq=3122, maxDocs=44218)
              0.0390625 = fieldNorm(doc=5041)
        0.016046109 = product of:
          0.032092217 = sum of:
            0.032092217 = weight(_text_:management in 5041) [ClassicSimilarity], result of:
              0.032092217 = score(doc=5041,freq=2.0), product of:
                0.17235184 = queryWeight, product of:
                  3.3706124 = idf(docFreq=4130, maxDocs=44218)
                  0.051133685 = queryNorm
                0.18620178 = fieldWeight in 5041, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.3706124 = idf(docFreq=4130, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=5041)
          0.5 = coord(1/2)
      0.6666667 = coord(2/3)
    
    Abstract
    Students use general web search engines as their primary source of research while trying to find answers to school-related questions. Although search engines are highly relevant for the general population, they may return results that are out of educational context. Another rising trend; social community question answering websites are the second choice for students who try to get answers from other peers online. We attempt discovering possible improvements in educational search by leveraging both of these information sources. For this purpose, we first implement a classifier for educational questions. This classifier is built by an ensemble method that employs several regular learning algorithms and retrieval based approaches that utilize external resources. We also build a query expander to facilitate classification. We further improve the classification using search engine results and obtain 83.5% accuracy. Although our work is entirely based on the Turkish language, the features could easily be mapped to other languages as well. In order to find out whether search engine ranking can be improved in the education domain using the classification model, we collect and label a set of query results retrieved from a general web search engine. We propose five ad-hoc methods to improve search ranking based on the idea that the query-document category relation is an indicator of relevance. We evaluate these methods for overall performance, varying query length and based on factoid and non-factoid queries. We show that some of the methods significantly improve the rankings in the education domain.
    Source
    Information processing and management. 56(2019) no.1, S.228-246
  2. Golub, K.; Hansson, J.; Soergel, D.; Tudhope, D.: Managing classification in libraries : a methodological outline for evaluating automatic subject indexing and classification in Swedish library catalogues (2015) 0.01
    0.012546628 = product of:
      0.037639882 = sum of:
        0.037639882 = weight(_text_:resources in 2300) [ClassicSimilarity], result of:
          0.037639882 = score(doc=2300,freq=2.0), product of:
            0.18665522 = queryWeight, product of:
              3.650338 = idf(docFreq=3122, maxDocs=44218)
              0.051133685 = queryNorm
            0.20165458 = fieldWeight in 2300, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.650338 = idf(docFreq=3122, maxDocs=44218)
              0.0390625 = fieldNorm(doc=2300)
      0.33333334 = coord(1/3)
    
    Abstract
    Subject terms play a crucial role in resource discovery but require substantial effort to produce. Automatic subject classification and indexing address problems of scale and sustainability and can be used to enrich existing bibliographic records, establish more connections across and between resources and enhance consistency of bibliographic data. The paper aims to put forward a complex methodological framework to evaluate automatic classification tools of Swedish textual documents based on the Dewey Decimal Classification (DDC) recently introduced to Swedish libraries. Three major complementary approaches are suggested: a quality-built gold standard, retrieval effects, domain analysis. The gold standard is built based on input from at least two catalogue librarians, end-users expert in the subject, end users inexperienced in the subject and automated tools. Retrieval effects are studied through a combination of assigned and free tasks, including factual and comprehensive types. The study also takes into consideration the different role and character of subject terms in various knowledge domains, such as scientific disciplines. As a theoretical framework, domain analysis is used and applied in relation to the implementation of DDC in Swedish libraries and chosen domains of knowledge within the DDC itself.
  3. Golub, K.; Soergel, D.; Buchanan, G.; Tudhope, D.; Lykke, M.; Hiom, D.: ¬A framework for evaluating automatic indexing or classification in the context of retrieval (2016) 0.01
    0.012546628 = product of:
      0.037639882 = sum of:
        0.037639882 = weight(_text_:resources in 3311) [ClassicSimilarity], result of:
          0.037639882 = score(doc=3311,freq=2.0), product of:
            0.18665522 = queryWeight, product of:
              3.650338 = idf(docFreq=3122, maxDocs=44218)
              0.051133685 = queryNorm
            0.20165458 = fieldWeight in 3311, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.650338 = idf(docFreq=3122, maxDocs=44218)
              0.0390625 = fieldNorm(doc=3311)
      0.33333334 = coord(1/3)
    
    Abstract
    Tools for automatic subject assignment help deal with scale and sustainability in creating and enriching metadata, establishing more connections across and between resources and enhancing consistency. Although some software vendors and experimental researchers claim the tools can replace manual subject indexing, hard scientific evidence of their performance in operating information environments is scarce. A major reason for this is that research is usually conducted in laboratory conditions, excluding the complexities of real-life systems and situations. The article reviews and discusses issues with existing evaluation approaches such as problems of aboutness and relevance assessments, implying the need to use more than a single "gold standard" method when evaluating indexing and retrieval, and proposes a comprehensive evaluation framework. The framework is informed by a systematic review of the literature on evaluation approaches: evaluating indexing quality directly through assessment by an evaluator or through comparison with a gold standard, evaluating the quality of computer-assisted indexing directly in the context of an indexing workflow, and evaluating indexing quality indirectly through analyzing retrieval performance.
  4. HaCohen-Kerner, Y. et al.: Classification using various machine learning methods and combinations of key-phrases and visual features (2016) 0.01
    0.011546515 = product of:
      0.034639545 = sum of:
        0.034639545 = product of:
          0.06927909 = sum of:
            0.06927909 = weight(_text_:22 in 2748) [ClassicSimilarity], result of:
              0.06927909 = score(doc=2748,freq=2.0), product of:
                0.17906146 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.051133685 = queryNorm
                0.38690117 = fieldWeight in 2748, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.078125 = fieldNorm(doc=2748)
          0.5 = coord(1/2)
      0.33333334 = coord(1/3)
    
    Date
    1. 2.2016 18:25:22
  5. Zhu, W.Z.; Allen, R.B.: Document clustering using the LSI subspace signature model (2013) 0.01
    0.0069279084 = product of:
      0.020783724 = sum of:
        0.020783724 = product of:
          0.04156745 = sum of:
            0.04156745 = weight(_text_:22 in 690) [ClassicSimilarity], result of:
              0.04156745 = score(doc=690,freq=2.0), product of:
                0.17906146 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.051133685 = queryNorm
                0.23214069 = fieldWeight in 690, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.046875 = fieldNorm(doc=690)
          0.5 = coord(1/2)
      0.33333334 = coord(1/3)
    
    Date
    23. 3.2013 13:22:36
  6. Egbert, J.; Biber, D.; Davies, M.: Developing a bottom-up, user-based method of web register classification (2015) 0.01
    0.0069279084 = product of:
      0.020783724 = sum of:
        0.020783724 = product of:
          0.04156745 = sum of:
            0.04156745 = weight(_text_:22 in 2158) [ClassicSimilarity], result of:
              0.04156745 = score(doc=2158,freq=2.0), product of:
                0.17906146 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.051133685 = queryNorm
                0.23214069 = fieldWeight in 2158, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.046875 = fieldNorm(doc=2158)
          0.5 = coord(1/2)
      0.33333334 = coord(1/3)
    
    Date
    4. 8.2015 19:22:04
  7. Liu, R.-L.: Context-based term frequency assessment for text classification (2010) 0.01
    0.0064184438 = product of:
      0.01925533 = sum of:
        0.01925533 = product of:
          0.03851066 = sum of:
            0.03851066 = weight(_text_:management in 3331) [ClassicSimilarity], result of:
              0.03851066 = score(doc=3331,freq=2.0), product of:
                0.17235184 = queryWeight, product of:
                  3.3706124 = idf(docFreq=4130, maxDocs=44218)
                  0.051133685 = queryNorm
                0.22344214 = fieldWeight in 3331, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.3706124 = idf(docFreq=4130, maxDocs=44218)
                  0.046875 = fieldNorm(doc=3331)
          0.5 = coord(1/2)
      0.33333334 = coord(1/3)
    
    Abstract
    Automatic text classification (TC) is essential for the management of information. To properly classify a document d, it is essential to identify the semantics of each term t in d, while the semantics heavily depend on context (neighboring terms) of t in d. Therefore, we present a technique CTFA (Context-based Term Frequency Assessment) that improves text classifiers by considering term contexts in test documents. The results of the term context recognition are used to assess term frequencies of terms, and hence CTFA may easily work with various kinds of text classifiers that base their TC decisions on term frequencies, without needing to modify the classifiers. Moreover, CTFA is efficient, and neither huge memory nor domain-specific knowledge is required. Empirical results show that CTFA successfully enhances performance of several kinds of text classifiers on different experimental data.
  8. Liu, R.-L.: ¬A passage extractor for classification of disease aspect information (2013) 0.01
    0.0057732575 = product of:
      0.017319772 = sum of:
        0.017319772 = product of:
          0.034639545 = sum of:
            0.034639545 = weight(_text_:22 in 1107) [ClassicSimilarity], result of:
              0.034639545 = score(doc=1107,freq=2.0), product of:
                0.17906146 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.051133685 = queryNorm
                0.19345059 = fieldWeight in 1107, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=1107)
          0.5 = coord(1/2)
      0.33333334 = coord(1/3)
    
    Date
    28.10.2013 19:22:57
  9. Yang, P.; Gao, W.; Tan, Q.; Wong, K.-F.: ¬A link-bridged topic model for cross-domain document classification (2013) 0.01
    0.005348703 = product of:
      0.016046109 = sum of:
        0.016046109 = product of:
          0.032092217 = sum of:
            0.032092217 = weight(_text_:management in 2706) [ClassicSimilarity], result of:
              0.032092217 = score(doc=2706,freq=2.0), product of:
                0.17235184 = queryWeight, product of:
                  3.3706124 = idf(docFreq=4130, maxDocs=44218)
                  0.051133685 = queryNorm
                0.18620178 = fieldWeight in 2706, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.3706124 = idf(docFreq=4130, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=2706)
          0.5 = coord(1/2)
      0.33333334 = coord(1/3)
    
    Source
    Information processing and management. 49(2013) no.6, S.1181-1193
  10. Borodin, Y.; Polishchuk, V.; Mahmud, J.; Ramakrishnan, I.V.; Stent, A.: Live and learn from mistakes : a lightweight system for document classification (2013) 0.01
    0.005348703 = product of:
      0.016046109 = sum of:
        0.016046109 = product of:
          0.032092217 = sum of:
            0.032092217 = weight(_text_:management in 2722) [ClassicSimilarity], result of:
              0.032092217 = score(doc=2722,freq=2.0), product of:
                0.17235184 = queryWeight, product of:
                  3.3706124 = idf(docFreq=4130, maxDocs=44218)
                  0.051133685 = queryNorm
                0.18620178 = fieldWeight in 2722, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.3706124 = idf(docFreq=4130, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=2722)
          0.5 = coord(1/2)
      0.33333334 = coord(1/3)
    
    Source
    Information processing and management. 49(2013) no.1, S.83-98
  11. Wang, H.; Hong, M.: Supervised Hebb rule based feature selection for text classification (2019) 0.01
    0.005348703 = product of:
      0.016046109 = sum of:
        0.016046109 = product of:
          0.032092217 = sum of:
            0.032092217 = weight(_text_:management in 5036) [ClassicSimilarity], result of:
              0.032092217 = score(doc=5036,freq=2.0), product of:
                0.17235184 = queryWeight, product of:
                  3.3706124 = idf(docFreq=4130, maxDocs=44218)
                  0.051133685 = queryNorm
                0.18620178 = fieldWeight in 5036, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.3706124 = idf(docFreq=4130, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=5036)
          0.5 = coord(1/2)
      0.33333334 = coord(1/3)
    
    Source
    Information processing and management. 56(2019) no.1, S.167-191
  12. Ru, C.; Tang, J.; Li, S.; Xie, S.; Wang, T.: Using semantic similarity to reduce wrong labels in distant supervision for relation extraction (2018) 0.01
    0.005348703 = product of:
      0.016046109 = sum of:
        0.016046109 = product of:
          0.032092217 = sum of:
            0.032092217 = weight(_text_:management in 5055) [ClassicSimilarity], result of:
              0.032092217 = score(doc=5055,freq=2.0), product of:
                0.17235184 = queryWeight, product of:
                  3.3706124 = idf(docFreq=4130, maxDocs=44218)
                  0.051133685 = queryNorm
                0.18620178 = fieldWeight in 5055, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.3706124 = idf(docFreq=4130, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=5055)
          0.5 = coord(1/2)
      0.33333334 = coord(1/3)
    
    Source
    Information processing and management. 54(2018) no.4, S.593-608
  13. Altinel, B.; Ganiz, M.C.: Semantic text classification : a survey of past and recent advances (2018) 0.00
    0.0042789625 = product of:
      0.0128368875 = sum of:
        0.0128368875 = product of:
          0.025673775 = sum of:
            0.025673775 = weight(_text_:management in 5051) [ClassicSimilarity], result of:
              0.025673775 = score(doc=5051,freq=2.0), product of:
                0.17235184 = queryWeight, product of:
                  3.3706124 = idf(docFreq=4130, maxDocs=44218)
                  0.051133685 = queryNorm
                0.14896142 = fieldWeight in 5051, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.3706124 = idf(docFreq=4130, maxDocs=44218)
                  0.03125 = fieldNorm(doc=5051)
          0.5 = coord(1/2)
      0.33333334 = coord(1/3)
    
    Source
    Information processing and management. 54(2018) no.6, S.1129-1153