Search (9 results, page 1 of 1)

  • × theme_ss:"Automatisches Klassifizieren"
  • × year_i:[2010 TO 2020}
  1. Yilmaz, T.; Ozcan, R.; Altingovde, I.S.; Ulusoy, Ö.: Improving educational web search for question-like queries through subject classification (2019) 0.10
    0.09705824 = product of:
      0.14558735 = sum of:
        0.09489272 = weight(_text_:search in 5041) [ClassicSimilarity], result of:
          0.09489272 = score(doc=5041,freq=16.0), product of:
            0.1747324 = queryWeight, product of:
              3.475677 = idf(docFreq=3718, maxDocs=44218)
              0.05027291 = queryNorm
            0.54307455 = fieldWeight in 5041, product of:
              4.0 = tf(freq=16.0), with freq of:
                16.0 = termFreq=16.0
              3.475677 = idf(docFreq=3718, maxDocs=44218)
              0.0390625 = fieldNorm(doc=5041)
        0.05069464 = product of:
          0.10138928 = sum of:
            0.10138928 = weight(_text_:engines in 5041) [ClassicSimilarity], result of:
              0.10138928 = score(doc=5041,freq=4.0), product of:
                0.25542772 = queryWeight, product of:
                  5.080822 = idf(docFreq=746, maxDocs=44218)
                  0.05027291 = queryNorm
                0.39693922 = fieldWeight in 5041, product of:
                  2.0 = tf(freq=4.0), with freq of:
                    4.0 = termFreq=4.0
                  5.080822 = idf(docFreq=746, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=5041)
          0.5 = coord(1/2)
      0.6666667 = coord(2/3)
    
    Abstract
    Students use general web search engines as their primary source of research while trying to find answers to school-related questions. Although search engines are highly relevant for the general population, they may return results that are out of educational context. Another rising trend; social community question answering websites are the second choice for students who try to get answers from other peers online. We attempt discovering possible improvements in educational search by leveraging both of these information sources. For this purpose, we first implement a classifier for educational questions. This classifier is built by an ensemble method that employs several regular learning algorithms and retrieval based approaches that utilize external resources. We also build a query expander to facilitate classification. We further improve the classification using search engine results and obtain 83.5% accuracy. Although our work is entirely based on the Turkish language, the features could easily be mapped to other languages as well. In order to find out whether search engine ranking can be improved in the education domain using the classification model, we collect and label a set of query results retrieved from a general web search engine. We propose five ad-hoc methods to improve search ranking based on the idea that the query-document category relation is an indicator of relevance. We evaluate these methods for overall performance, varying query length and based on factoid and non-factoid queries. We show that some of the methods significantly improve the rankings in the education domain.
  2. HaCohen-Kerner, Y. et al.: Classification using various machine learning methods and combinations of key-phrases and visual features (2016) 0.07
    0.06743714 = product of:
      0.10115571 = sum of:
        0.06709928 = weight(_text_:search in 2748) [ClassicSimilarity], result of:
          0.06709928 = score(doc=2748,freq=2.0), product of:
            0.1747324 = queryWeight, product of:
              3.475677 = idf(docFreq=3718, maxDocs=44218)
              0.05027291 = queryNorm
            0.3840117 = fieldWeight in 2748, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.475677 = idf(docFreq=3718, maxDocs=44218)
              0.078125 = fieldNorm(doc=2748)
        0.03405643 = product of:
          0.06811286 = sum of:
            0.06811286 = weight(_text_:22 in 2748) [ClassicSimilarity], result of:
              0.06811286 = score(doc=2748,freq=2.0), product of:
                0.17604718 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.05027291 = queryNorm
                0.38690117 = fieldWeight in 2748, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.078125 = fieldNorm(doc=2748)
          0.5 = coord(1/2)
      0.6666667 = coord(2/3)
    
    Date
    1. 2.2016 18:25:22
    Source
    Semantic keyword-based search on structured data sources: First COST Action IC1302 International KEYSTONE Conference, IKC 2015, Coimbra, Portugal, September 8-9, 2015. Revised Selected Papers. Eds.: J. Cardoso et al
  3. Barthel, S.; Tönnies, S.; Balke, W.-T.: Large-scale experiments for mathematical document classification (2013) 0.01
    0.011183213 = product of:
      0.03354964 = sum of:
        0.03354964 = weight(_text_:search in 1056) [ClassicSimilarity], result of:
          0.03354964 = score(doc=1056,freq=2.0), product of:
            0.1747324 = queryWeight, product of:
              3.475677 = idf(docFreq=3718, maxDocs=44218)
              0.05027291 = queryNorm
            0.19200584 = fieldWeight in 1056, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.475677 = idf(docFreq=3718, maxDocs=44218)
              0.0390625 = fieldNorm(doc=1056)
      0.33333334 = coord(1/3)
    
    Abstract
    The ever increasing amount of digitally available information is curse and blessing at the same time. On the one hand, users have increasingly large amounts of information at their fingertips. On the other hand, the assessment and refinement of web search results becomes more and more tiresome and difficult for non-experts in a domain. Therefore, established digital libraries offer specialized collections with a certain degree of quality. This quality can largely be attributed to the great effort invested into semantic enrichment of the provided documents e.g. by annotating their documents with respect to a domain-specific taxonomy. This process is still done manually in many domains, e.g. chemistry CAS, medicine MeSH, or mathematics MSC. But due to the growing amount of data, this manual task gets more and more time consuming and expensive. The only solution for this problem seems to employ automated classification algorithms, but from evaluations done in previous research, conclusions to a real world scenario are difficult to make. We therefore conducted a large scale feasibility study on a real world data set from one of the biggest mathematical digital libraries, i.e. Zentralblatt MATH, with special focus on its practical applicability.
  4. Fang, H.: Classifying research articles in multidisciplinary sciences journals into subject categories (2015) 0.01
    0.011183213 = product of:
      0.03354964 = sum of:
        0.03354964 = weight(_text_:search in 2194) [ClassicSimilarity], result of:
          0.03354964 = score(doc=2194,freq=2.0), product of:
            0.1747324 = queryWeight, product of:
              3.475677 = idf(docFreq=3718, maxDocs=44218)
              0.05027291 = queryNorm
            0.19200584 = fieldWeight in 2194, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.475677 = idf(docFreq=3718, maxDocs=44218)
              0.0390625 = fieldNorm(doc=2194)
      0.33333334 = coord(1/3)
    
    Abstract
    In the Thomson Reuters Web of Science database, the subject categories of a journal are applied to all articles in the journal. However, many articles in multidisciplinary Sciences journals may only be represented by a small number of subject categories. To provide more accurate information on the research areas of articles in such journals, we can classify articles in these journals into subject categories as defined by Web of Science based on their references. For an article in a multidisciplinary sciences journal, the method counts the subject categories in all of the article's references indexed by Web of Science, and uses the most numerous subject categories of the references to determine the most appropriate classification of the article. We used articles in an issue of Proceedings of the National Academy of Sciences (PNAS) to validate the correctness of the method by comparing the obtained results with the categories of the articles as defined by PNAS and their content. This study shows that the method provides more precise search results for the subject category of interest in bibliometric investigations through recognition of articles in multidisciplinary sciences journals whose work relates to a particular subject category.
  5. AlQenaei, Z.M.; Monarchi, D.E.: ¬The use of learning techniques to analyze the results of a manual classification system (2016) 0.01
    0.011183213 = product of:
      0.03354964 = sum of:
        0.03354964 = weight(_text_:search in 2836) [ClassicSimilarity], result of:
          0.03354964 = score(doc=2836,freq=2.0), product of:
            0.1747324 = queryWeight, product of:
              3.475677 = idf(docFreq=3718, maxDocs=44218)
              0.05027291 = queryNorm
            0.19200584 = fieldWeight in 2836, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.475677 = idf(docFreq=3718, maxDocs=44218)
              0.0390625 = fieldNorm(doc=2836)
      0.33333334 = coord(1/3)
    
    Abstract
    Classification is the process of assigning objects to pre-defined classes based on observations or characteristics of those objects, and there are many approaches to performing this task. The overall objective of this study is to demonstrate the use of two learning techniques to analyze the results of a manual classification system. Our sample consisted of 1,026 documents, from the ACM Computing Classification System, classified by their authors as belonging to one of the groups of the classification system: "H.3 Information Storage and Retrieval." A singular value decomposition of the documents' weighted term-frequency matrix was used to represent each document in a 50-dimensional vector space. The analysis of the representation using both supervised (decision tree) and unsupervised (clustering) techniques suggests that two pairs of the ACM classes are closely related to each other in the vector space. Class 1 (Content Analysis and Indexing) is closely related to Class 3 (Information Search and Retrieval), and Class 4 (Systems and Software) is closely related to Class 5 (Online Information Services). Further analysis was performed to test the diffusion of the words in the two classes using both cosine and Euclidean distance.
  6. Smiraglia, R.P.; Cai, X.: Tracking the evolution of clustering, machine learning, automatic indexing and automatic classification in knowledge organization (2017) 0.01
    0.011183213 = product of:
      0.03354964 = sum of:
        0.03354964 = weight(_text_:search in 3627) [ClassicSimilarity], result of:
          0.03354964 = score(doc=3627,freq=2.0), product of:
            0.1747324 = queryWeight, product of:
              3.475677 = idf(docFreq=3718, maxDocs=44218)
              0.05027291 = queryNorm
            0.19200584 = fieldWeight in 3627, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.475677 = idf(docFreq=3718, maxDocs=44218)
              0.0390625 = fieldNorm(doc=3627)
      0.33333334 = coord(1/3)
    
    Abstract
    A very important extension of the traditional domain of knowledge organization (KO) arises from attempts to incorporate techniques devised in the computer science domain for automatic concept extraction and for grouping, categorizing, clustering and otherwise organizing knowledge using mechanical means. Four specific terms have emerged to identify the most prevalent techniques: machine learning, clustering, automatic indexing, and automatic classification. Our study presents three domain analytical case analyses in search of answers. The first case relies on citations located using the ISKO-supported "Knowledge Organization Bibliography." The second case relies on works in both Web of Science and SCOPUS. Case three applies co-word analysis and citation analysis to the contents of the papers in the present special issue. We observe scholars involved in "clustering" and "automatic classification" who share common thematic emphases. But we have found no coherence, no common activity and no social semantics. We have not found a research front, or a common teleology within the KO domain. We also have found a lively group of authors who have succeeded in submitting papers to this special issue, and their work quite interestingly aligns with the case studies we report. There is an emphasis on KO for information retrieval; there is much work on clustering (which involves conceptual points within texts) and automatic classification (which involves semantic groupings at the meta-document level).
  7. Zhu, W.Z.; Allen, R.B.: Document clustering using the LSI subspace signature model (2013) 0.01
    0.0068112854 = product of:
      0.020433856 = sum of:
        0.020433856 = product of:
          0.040867712 = sum of:
            0.040867712 = weight(_text_:22 in 690) [ClassicSimilarity], result of:
              0.040867712 = score(doc=690,freq=2.0), product of:
                0.17604718 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.05027291 = queryNorm
                0.23214069 = fieldWeight in 690, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.046875 = fieldNorm(doc=690)
          0.5 = coord(1/2)
      0.33333334 = coord(1/3)
    
    Date
    23. 3.2013 13:22:36
  8. Egbert, J.; Biber, D.; Davies, M.: Developing a bottom-up, user-based method of web register classification (2015) 0.01
    0.0068112854 = product of:
      0.020433856 = sum of:
        0.020433856 = product of:
          0.040867712 = sum of:
            0.040867712 = weight(_text_:22 in 2158) [ClassicSimilarity], result of:
              0.040867712 = score(doc=2158,freq=2.0), product of:
                0.17604718 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.05027291 = queryNorm
                0.23214069 = fieldWeight in 2158, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.046875 = fieldNorm(doc=2158)
          0.5 = coord(1/2)
      0.33333334 = coord(1/3)
    
    Date
    4. 8.2015 19:22:04
  9. Liu, R.-L.: ¬A passage extractor for classification of disease aspect information (2013) 0.01
    0.0056760716 = product of:
      0.017028214 = sum of:
        0.017028214 = product of:
          0.03405643 = sum of:
            0.03405643 = weight(_text_:22 in 1107) [ClassicSimilarity], result of:
              0.03405643 = score(doc=1107,freq=2.0), product of:
                0.17604718 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.05027291 = queryNorm
                0.19345059 = fieldWeight in 1107, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=1107)
          0.5 = coord(1/2)
      0.33333334 = coord(1/3)
    
    Date
    28.10.2013 19:22:57