Search (26 results, page 1 of 2)

  • language_ss:"e"
  • theme_ss:"Automatisches Klassifizieren"
  1. Hotho, A.; Bloehdorn, S.: Data Mining 2004 : Text classification by boosting weak learners based on terms and concepts (2004) 0.23
    0.2307443 = product of:
      0.30765906 = sum of:
        0.07228978 = product of:
          0.21686934 = sum of:
            0.21686934 = weight(_text_:3a in 562) [ClassicSimilarity], result of:
              0.21686934 = score(doc=562,freq=2.0), product of:
                0.38587612 = queryWeight, product of:
                  8.478011 = idf(docFreq=24, maxDocs=44218)
                  0.045514934 = queryNorm
                0.56201804 = fieldWeight in 562, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  8.478011 = idf(docFreq=24, maxDocs=44218)
                  0.046875 = fieldNorm(doc=562)
          0.33333334 = coord(1/3)
        0.21686934 = weight(_text_:2f in 562) [ClassicSimilarity], result of:
          0.21686934 = score(doc=562,freq=2.0), product of:
            0.38587612 = queryWeight, product of:
              8.478011 = idf(docFreq=24, maxDocs=44218)
              0.045514934 = queryNorm
            0.56201804 = fieldWeight in 562, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              8.478011 = idf(docFreq=24, maxDocs=44218)
              0.046875 = fieldNorm(doc=562)
        0.018499935 = product of:
          0.03699987 = sum of:
            0.03699987 = weight(_text_:22 in 562) [ClassicSimilarity], result of:
              0.03699987 = score(doc=562,freq=2.0), product of:
                0.15938555 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.045514934 = queryNorm
                0.23214069 = fieldWeight in 562, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.046875 = fieldNorm(doc=562)
          0.5 = coord(1/2)
      0.75 = coord(3/4)
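The nested breakdown above is Lucene's ClassicSimilarity explain output: each term contribution is the product of a query weight (idf × queryNorm) and a field weight (tf × idf × fieldNorm), the contributions are summed, and the sum is scaled by the coord factor (here 3 of 4 query clauses matched, so 0.75). As a sanity check, the sketch below recomputes the 0.21686934 contribution of the first term using the standard ClassicSimilarity formulas (tf = sqrt(freq), idf = 1 + ln(maxDocs/(docFreq+1))); variable names are ours, the numbers come from the tree above.

```python
import math

# Values taken from the explain tree for weight(_text_:3a in 562)
freq, doc_freq, max_docs = 2.0, 24, 44218
query_norm, field_norm = 0.045514934, 0.046875

tf  = math.sqrt(freq)                              # ≈ 1.4142135
idf = 1.0 + math.log(max_docs / (doc_freq + 1.0))  # ≈ 8.478011
query_weight = idf * query_norm                    # ≈ 0.38587612
field_weight = tf * idf * field_norm               # ≈ 0.56201804
term_score   = query_weight * field_weight         # ≈ 0.21686934
print(term_score)
```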
    
    Content
    Cf.: http://www.google.de/url?sa=t&rct=j&q=&esrc=s&source=web&cd=1&cad=rja&ved=0CEAQFjAA&url=http%3A%2F%2Fciteseerx.ist.psu.edu%2Fviewdoc%2Fdownload%3Fdoi%3D10.1.1.91.4940%26rep%3Drep1%26type%3Dpdf&ei=dOXrUMeIDYHDtQahsIGACg&usg=AFQjCNHFWVh6gNPvnOrOS9R3rkrXCNVD-A&sig2=5I2F5evRfMnsttSgFF9g7Q&bvm=bv.1357316858,d.Yms.
    Date
    8. 1.2013 10:22:32
  2. Yao, H.; Etzkorn, L.H.; Virani, S.: Automated classification and retrieval of reusable software components (2008) 0.02
    0.017135125 = product of:
      0.0685405 = sum of:
        0.0685405 = product of:
          0.137081 = sum of:
            0.137081 = weight(_text_:software in 1382) [ClassicSimilarity], result of:
              0.137081 = score(doc=1382,freq=24.0), product of:
                0.18056466 = queryWeight, product of:
                  3.9671519 = idf(docFreq=2274, maxDocs=44218)
                  0.045514934 = queryNorm
                0.75917953 = fieldWeight in 1382, product of:
                  4.8989797 = tf(freq=24.0), with freq of:
                    24.0 = termFreq=24.0
                  3.9671519 = idf(docFreq=2274, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=1382)
          0.5 = coord(1/2)
      0.25 = coord(1/4)
    
    Abstract
    The authors describe their research which improves software reuse by using an automated approach to semantically search for and retrieve reusable software components in large software component repositories and on the World Wide Web (WWW). Using automation and smart (semantic) techniques, their approach speeds up the search and retrieval of reusable software components, while retaining good accuracy, and therefore improves the affordability of software reuse. Program understanding of software components and natural-language understanding of user queries were employed. The resulting semantic representations of the user queries were then matched against the semantic representations of the software components to find the components that best match each query. A proof-of-concept system was developed to test the authors' approach; its results were compared with those of human experts, and statistical analysis was performed on the collected experimental data. The results from these experiments demonstrate that this automated, semantic-based approach to reusable software component classification and retrieval succeeds when compared with the labor-intensive results from the experts, thus showing that the approach can significantly benefit software reuse classification and retrieval.
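The paper matches semantic representations of queries against semantic representations of component descriptions. As a rough, purely illustrative stand-in (not the authors' actual representation), the sketch below ranks component descriptions against a query by cosine similarity over TF-IDF vectors; all names and descriptions are hypothetical.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Hypothetical component descriptions; a real repository would hold thousands.
components = {
    "StackImpl":  "generic stack push pop peek collection",
    "CsvReader":  "read parse comma separated values file into records",
    "HttpClient": "send http get post requests handle responses",
}

def retrieve(query: str, top_k: int = 2):
    """Rank components by similarity between query and description vectors."""
    names = list(components)
    vec = TfidfVectorizer().fit(list(components.values()) + [query])
    doc_matrix = vec.transform(components.values())
    query_vec = vec.transform([query])
    scores = cosine_similarity(query_vec, doc_matrix)[0]
    return sorted(zip(names, scores), key=lambda p: p[1], reverse=True)[:top_k]

print(retrieve("component to parse a comma separated values file"))
```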
  3. Koch, T.; Ardö, A.: Automatic classification of full-text HTML-documents from one specific subject area : DESIRE II D3.6a, Working Paper 2 (2000) 0.01
    0.013708099 = product of:
      0.054832395 = sum of:
        0.054832395 = product of:
          0.10966479 = sum of:
            0.10966479 = weight(_text_:software in 1667) [ClassicSimilarity], result of:
              0.10966479 = score(doc=1667,freq=6.0), product of:
                0.18056466 = queryWeight, product of:
                  3.9671519 = idf(docFreq=2274, maxDocs=44218)
                  0.045514934 = queryNorm
                0.6073436 = fieldWeight in 1667, product of:
                  2.4494898 = tf(freq=6.0), with freq of:
                    6.0 = termFreq=6.0
                  3.9671519 = idf(docFreq=2274, maxDocs=44218)
                  0.0625 = fieldNorm(doc=1667)
          0.5 = coord(1/2)
      0.25 = coord(1/4)
    
    Content
    1 Introduction / 2 Method overview / 3 Ei thesaurus preprocessing / 4 Automatic classification process: 4.1 Matching -- 4.2 Weighting -- 4.3 Preparation for display / 5 Results of the classification process / 6 Evaluations / 7 Software / 8 Other applications / 9 Experiments with universal classification systems / References / Appendix A: Ei classification service: Software / Appendix B: Use of the classification software as subject filter in a WWW harvester.
  4. Adams, K.C.: Word wranglers : Automatic classification tools transform enterprise documents from "bags of words" into knowledge resources (2003) 0.01
    0.011060675 = product of:
      0.0442427 = sum of:
        0.0442427 = product of:
          0.0884854 = sum of:
            0.0884854 = weight(_text_:software in 1665) [ClassicSimilarity], result of:
              0.0884854 = score(doc=1665,freq=10.0), product of:
                0.18056466 = queryWeight, product of:
                  3.9671519 = idf(docFreq=2274, maxDocs=44218)
                  0.045514934 = queryNorm
                0.49004826 = fieldWeight in 1665, product of:
                  3.1622777 = tf(freq=10.0), with freq of:
                    10.0 = termFreq=10.0
                  3.9671519 = idf(docFreq=2274, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=1665)
          0.5 = coord(1/2)
      0.25 = coord(1/4)
    
    Abstract
    Taxonomies are an important part of any knowledge management (KM) system, and automatic classification software is emerging as a "killer app" for consumer and enterprise portals. A number of companies such as Inxight Software, Mohomine, Metacode, and others claim to interpret the semantic content of any textual document and automatically classify text on the fly. The promise that software could automatically produce a Yahoo-style directory is a siren call not many IT managers are able to resist. KM needs have grown more complex due to the increasing amount of digital information, the declining effectiveness of keyword searching, and heterogeneous document formats in corporate databases. This environment requires innovative KM tools, and automatic classification technology is an example of this new kind of software. These products can be divided into three categories according to their underlying technology - rules-based, catalog-by-example, and statistical clustering. Evolving trends in this market include framing classification as a cyborg (computer- and human-based) activity and the increasing use of extensible markup language (XML) and support vector machine (SVM) technology. In this article, we'll survey the rapidly changing automatic classification software market and examine the features and capabilities of leading classification products.
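Of the three product categories named in the abstract, the rules-based one is the simplest to picture: a taxonomy node is assigned whenever a hand-written rule over the document's terms fires. The toy sketch below illustrates the idea only; the rules and labels are hypothetical and not taken from any vendor's engine.

```python
# Toy rules-based classifier: each taxonomy node has a hand-written
# boolean rule over the set of words in a document.
RULES = {
    "Finance/Accounting": lambda w: "invoice" in w or ("audit" in w and "ledger" in w),
    "HR/Recruiting":      lambda w: "candidate" in w and "interview" in w,
    "IT/Security":        lambda w: "firewall" in w or "vulnerability" in w,
}

def classify(text: str) -> list[str]:
    words = set(text.lower().split())
    return [label for label, rule in RULES.items() if rule(words)]

print(classify("Quarterly audit found an unmatched ledger entry and a missing invoice"))
```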
  5. Montesi, M.; Navarrete, T.: Classifying web genres in context : A case study documenting the web genres used by a software engineer (2008) 0.01
    0.010281074 = product of:
      0.041124295 = sum of:
        0.041124295 = product of:
          0.08224859 = sum of:
            0.08224859 = weight(_text_:software in 2100) [ClassicSimilarity], result of:
              0.08224859 = score(doc=2100,freq=6.0), product of:
                0.18056466 = queryWeight, product of:
                  3.9671519 = idf(docFreq=2274, maxDocs=44218)
                  0.045514934 = queryNorm
                0.4555077 = fieldWeight in 2100, product of:
                  2.4494898 = tf(freq=6.0), with freq of:
                    6.0 = termFreq=6.0
                  3.9671519 = idf(docFreq=2274, maxDocs=44218)
                  0.046875 = fieldNorm(doc=2100)
          0.5 = coord(1/2)
      0.25 = coord(1/4)
    
    Abstract
    This case study analyzes the Internet-based resources that a software engineer uses in his daily work. Methodologically, we studied the web browser history of the participant, classifying all the web pages he had seen over a period of 12 days into web genres. We interviewed him before and after the analysis of the web browser history. In the first interview, he spoke about his general information behavior; in the second, he commented on each web genre, explaining why and how he used them. As a result, three approaches allow us to describe the set of 23 web genres obtained: (a) the purposes they serve for the participant; (b) the role they play in the various work and search phases; and (c) the way they are used in combination with each other. Further observations concern the way the participant assesses the quality of web-based resources, and his information behavior as a software engineer.
  6. Sebastiani, F.: Classification of text, automatic (2006) 0.01
    0.009793539 = product of:
      0.039174154 = sum of:
        0.039174154 = product of:
          0.07834831 = sum of:
            0.07834831 = weight(_text_:software in 5003) [ClassicSimilarity], result of:
              0.07834831 = score(doc=5003,freq=4.0), product of:
                0.18056466 = queryWeight, product of:
                  3.9671519 = idf(docFreq=2274, maxDocs=44218)
                  0.045514934 = queryNorm
                0.43390724 = fieldWeight in 5003, product of:
                  2.0 = tf(freq=4.0), with freq of:
                    4.0 = termFreq=4.0
                  3.9671519 = idf(docFreq=2274, maxDocs=44218)
                  0.0546875 = fieldNorm(doc=5003)
          0.5 = coord(1/2)
      0.25 = coord(1/4)
    
    Abstract
    Automatic text classification (ATC) is a discipline at the crossroads of information retrieval (IR), machine learning (ML), and computational linguistics (CL), and consists in the realization of text classifiers, i.e. software systems capable of assigning texts to one or more categories, or classes, from a predefined set. Applications range from the automated indexing of scientific articles, to e-mail routing, spam filtering, authorship attribution, and automated survey coding. This article will focus on the ML approach to ATC, whereby a software system (called the learner) automatically builds a classifier for the categories of interest by generalizing from a "training" set of pre-classified texts.
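The "learner" described here can be pictured as a standard supervised text-classification pipeline: vectorize a training set of pre-classified texts, fit a classifier, and apply it to new texts. A minimal sketch with scikit-learn follows; the toy data are hypothetical and the linear SVM is only one of the many learners used in the ATC literature.

```python
from sklearn.pipeline import make_pipeline
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.svm import LinearSVC

# Tiny pre-classified "training" set (hypothetical examples).
texts  = ["cheap meds online buy now", "meeting agenda for project review",
          "win a free prize click here", "minutes of the steering committee"]
labels = ["spam", "ham", "spam", "ham"]

learner = make_pipeline(TfidfVectorizer(), LinearSVC())
classifier = learner.fit(texts, labels)          # the learner builds the classifier
print(classifier.predict(["free meds prize"]))   # expected: ['spam']
```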
  7. Subramanian, S.; Shafer, K.E.: Clustering (2001) 0.01
    0.0092499675 = product of:
      0.03699987 = sum of:
        0.03699987 = product of:
          0.07399974 = sum of:
            0.07399974 = weight(_text_:22 in 1046) [ClassicSimilarity], result of:
              0.07399974 = score(doc=1046,freq=2.0), product of:
                0.15938555 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.045514934 = queryNorm
                0.46428138 = fieldWeight in 1046, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.09375 = fieldNorm(doc=1046)
          0.5 = coord(1/2)
      0.25 = coord(1/4)
    
    Date
    5. 5.2003 14:17:22
  8. HaCohen-Kerner, Y. et al.: Classification using various machine learning methods and combinations of key-phrases and visual features (2016) 0.01
    0.007708307 = product of:
      0.030833228 = sum of:
        0.030833228 = product of:
          0.061666455 = sum of:
            0.061666455 = weight(_text_:22 in 2748) [ClassicSimilarity], result of:
              0.061666455 = score(doc=2748,freq=2.0), product of:
                0.15938555 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.045514934 = queryNorm
                0.38690117 = fieldWeight in 2748, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.078125 = fieldNorm(doc=2748)
          0.5 = coord(1/2)
      0.25 = coord(1/4)
    
    Date
    1. 2.2016 18:25:22
  9. Ahmed, M.; Mukhopadhyay, M.; Mukhopadhyay, P.: Automated knowledge organization : AI ML based subject indexing system for libraries (2023) 0.01
    0.006995385 = product of:
      0.02798154 = sum of:
        0.02798154 = product of:
          0.05596308 = sum of:
            0.05596308 = weight(_text_:software in 977) [ClassicSimilarity], result of:
              0.05596308 = score(doc=977,freq=4.0), product of:
                0.18056466 = queryWeight, product of:
                  3.9671519 = idf(docFreq=2274, maxDocs=44218)
                  0.045514934 = queryNorm
                0.30993375 = fieldWeight in 977, product of:
                  2.0 = tf(freq=4.0), with freq of:
                    4.0 = termFreq=4.0
                  3.9671519 = idf(docFreq=2274, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=977)
          0.5 = coord(1/2)
      0.25 = coord(1/4)
    
    Abstract
    The research study as reported here is an attempt to explore the possibilities of an AI/ML-based semi-automated indexing system in a library setup to handle large volumes of documents. It uses the Python virtual environment to install and configure an open source AI environment (named Annif) to feed the LOD (Linked Open Data) dataset of Library of Congress Subject Headings (LCSH) as a standard KOS (Knowledge Organisation System). The framework deployed the Turtle format of LCSH after cleaning the file with Skosify, applied an array of backend algorithms (namely TF-IDF, Omikuji, and NN-Ensemble) to measure relative performance, and selected Snowball as an analyser. The training of Annif was conducted with a large set of bibliographic records populated with subject descriptors (MARC tag 650$a) and indexed by trained LIS professionals. The training dataset is first treated with MarcEdit to export it in a format suitable for OpenRefine, and then in OpenRefine it undergoes many steps to produce a bibliographic record set suitable to train Annif. The framework, after training, has been tested with a bibliographic dataset to measure indexing efficiencies, and finally, the automated indexing framework is integrated with data wrangling software (OpenRefine) to produce suggested headings on a mass scale. The entire framework is based on open-source software, open datasets, and open standards.
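A rough sketch of the idea behind a TF-IDF style backend for subject suggestion: represent each subject heading by the text of records already indexed with it, and suggest the headings whose profiles are most similar to a new document. This is an illustration of the concept only, not Annif's actual API or configuration; the headings and record texts below are hypothetical.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Hypothetical LCSH-style headings, each profiled by the text of records
# that trained LIS professionals already indexed with it (MARC 650$a).
profiles = {
    "Machine learning":      "supervised learning neural networks training models",
    "Libraries--Automation": "library automation integrated systems circulation opac",
    "Subject headings":      "controlled vocabulary lcsh indexing thesaurus terms",
}

vec = TfidfVectorizer().fit(profiles.values())
heading_matrix = vec.transform(profiles.values())

def suggest(document: str, limit: int = 2):
    """Return the headings whose profiles best match the document text."""
    sims = cosine_similarity(vec.transform([document]), heading_matrix)[0]
    return sorted(zip(profiles, sims), key=lambda p: p[1], reverse=True)[:limit]

print(suggest("training a model to assign lcsh terms to bibliographic records"))
```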
  10. Koch, T.; Ardö, A.; Brümmer, A.: The building and maintenance of robot based internet search services : A review of current indexing and data collection methods. Prepared to meet the requirements of Work Package 3 of EU Telematics for Research, project DESIRE. Version D3.11v0.3 (Draft version 3) (1996) 0.01
    0.005596308 = product of:
      0.022385232 = sum of:
        0.022385232 = product of:
          0.044770464 = sum of:
            0.044770464 = weight(_text_:software in 1669) [ClassicSimilarity], result of:
              0.044770464 = score(doc=1669,freq=4.0), product of:
                0.18056466 = queryWeight, product of:
                  3.9671519 = idf(docFreq=2274, maxDocs=44218)
                  0.045514934 = queryNorm
                0.24794699 = fieldWeight in 1669, product of:
                  2.0 = tf(freq=4.0), with freq of:
                    4.0 = termFreq=4.0
                  3.9671519 = idf(docFreq=2274, maxDocs=44218)
                  0.03125 = fieldNorm(doc=1669)
          0.5 = coord(1/2)
      0.25 = coord(1/4)
    
    Abstract
    After a short outline of problems, possibilities and difficulties of systematic information retrieval on the Internet and a description of efforts for development in this area, a specification of the terminology for this report is required. Although the process of retrieval is generally seen as an iterative process of browsing and information retrieval, and several important services on the net have taken this fact into consideration, the emphasis of this report lies on the general retrieval tools for the whole of the Internet. In order to be able to evaluate the differences, possibilities and restrictions of the different services, it is necessary to begin by organizing the existing varieties in a typological/taxonomical survey. The possibilities and weaknesses will be briefly compared and described for the most important services in the categories robot-based WWW-catalogues of different types, list- or form-based catalogues, and simultaneous or collected search services respectively. It will, however, for various reasons not be possible to rank them in order of "best" services. Still more important are the weaknesses and problems common to all attempts at indexing the Internet. The problems of the quality of the input, the technical performance and the general problem of indexing virtual hypertext are shown to be at least as difficult as the different aspects of harvesting, indexing and information retrieval. Some of the attempts made in the area of further development of retrieval services will be mentioned in relation to descriptions of the contents of documents and standardization efforts. Internet harvesting and indexing technology and retrieval software are thoroughly reviewed. Details about all services and software are listed in analytical forms in Annex 1-3.
  11. Piros, A.: Automatic interpretation of complex UDC numbers : towards support for library systems (2015) 0.01
    0.005596308 = product of:
      0.022385232 = sum of:
        0.022385232 = product of:
          0.044770464 = sum of:
            0.044770464 = weight(_text_:software in 2301) [ClassicSimilarity], result of:
              0.044770464 = score(doc=2301,freq=4.0), product of:
                0.18056466 = queryWeight, product of:
                  3.9671519 = idf(docFreq=2274, maxDocs=44218)
                  0.045514934 = queryNorm
                0.24794699 = fieldWeight in 2301, product of:
                  2.0 = tf(freq=4.0), with freq of:
                    4.0 = termFreq=4.0
                  3.9671519 = idf(docFreq=2274, maxDocs=44218)
                  0.03125 = fieldNorm(doc=2301)
          0.5 = coord(1/2)
      0.25 = coord(1/4)
    
    Abstract
    Analytico-synthetic and faceted classifications, such as Universal Decimal Classification (UDC), express content of documents with complex, pre-combined classification codes. Without classification authority control that would help manage and access structured notations, the use of UDC codes in searching and browsing is limited. Existing UDC parsing solutions are usually created for a particular database system or a specific task and are not widely applicable. The approach described in this paper provides a solution by which the analysis and interpretation of UDC notations can be stored in an intermediate format (in this case, XML) by automatic means without any data or information loss. Due to its richness, the output file can be converted into different formats, such as standard mark-up and data exchange formats or simple lists of the recommended entry points of a UDC number. The program can also be used to create authority records containing complex UDC numbers which can be comprehensively analysed in order to be retrieved effectively. The Java program, as well as the corresponding schema definition it employs, is under continuous development. The current version of the interpreter software is now available online for testing purposes at the following web site: http://interpreter-eto.rhcloud.com. The future plan is to implement conversion methods for standard formats and to create standard online interfaces in order to make it possible to use the features of the software as a service. This would result in the algorithm being able to be employed both in existing and future library systems to analyse UDC numbers without any significant programming effort.
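As a very rough illustration of what interpreting a composed UDC number into an intermediate XML format might involve, the toy sketch below splits a notation on the common UDC connector symbols and emits each component as an entry point. This is nothing like the coverage of the actual interpreter (no auxiliary tables, grouping, or ranges); the example notation is hypothetical.

```python
import re
import xml.etree.ElementTree as ET

CONNECTORS = {"+": "coordination", "/": "extension", ":": "relation"}

def to_xml(udc: str) -> str:
    """Toy interpretation: split a composed UDC number on top-level
    connector symbols and emit each component as an XML element."""
    root = ET.Element("udc", notation=udc)
    parts = re.split(r"([+/:])", udc)
    ET.SubElement(root, "component").text = parts[0]
    for symbol, component in zip(parts[1::2], parts[2::2]):
        el = ET.SubElement(root, "component", role=CONNECTORS[symbol])
        el.text = component
    return ET.tostring(root, encoding="unicode")

print(to_xml("004.8:025.4+616.8"))
```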
  12. Dubin, D.: Dimensions and discriminability (1998) 0.01
    0.005395815 = product of:
      0.02158326 = sum of:
        0.02158326 = product of:
          0.04316652 = sum of:
            0.04316652 = weight(_text_:22 in 2338) [ClassicSimilarity], result of:
              0.04316652 = score(doc=2338,freq=2.0), product of:
                0.15938555 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.045514934 = queryNorm
                0.2708308 = fieldWeight in 2338, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.0546875 = fieldNorm(doc=2338)
          0.5 = coord(1/2)
      0.25 = coord(1/4)
    
    Date
    22. 9.1997 19:16:05
  13. Automatic classification research at OCLC (2002) 0.01
    0.005395815 = product of:
      0.02158326 = sum of:
        0.02158326 = product of:
          0.04316652 = sum of:
            0.04316652 = weight(_text_:22 in 1563) [ClassicSimilarity], result of:
              0.04316652 = score(doc=1563,freq=2.0), product of:
                0.15938555 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.045514934 = queryNorm
                0.2708308 = fieldWeight in 1563, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.0546875 = fieldNorm(doc=1563)
          0.5 = coord(1/2)
      0.25 = coord(1/4)
    
    Date
    5. 5.2003 9:22:09
  14. Jenkins, C.: Automatic classification of Web resources using Java and Dewey Decimal Classification (1998) 0.01
    0.005395815 = product of:
      0.02158326 = sum of:
        0.02158326 = product of:
          0.04316652 = sum of:
            0.04316652 = weight(_text_:22 in 1673) [ClassicSimilarity], result of:
              0.04316652 = score(doc=1673,freq=2.0), product of:
                0.15938555 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.045514934 = queryNorm
                0.2708308 = fieldWeight in 1673, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.0546875 = fieldNorm(doc=1673)
          0.5 = coord(1/2)
      0.25 = coord(1/4)
    
    Date
    1. 8.1996 22:08:06
  15. Yoon, Y.; Lee, C.; Lee, G.G.: An effective procedure for constructing a hierarchical text classification system (2006) 0.01
    0.005395815 = product of:
      0.02158326 = sum of:
        0.02158326 = product of:
          0.04316652 = sum of:
            0.04316652 = weight(_text_:22 in 5273) [ClassicSimilarity], result of:
              0.04316652 = score(doc=5273,freq=2.0), product of:
                0.15938555 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.045514934 = queryNorm
                0.2708308 = fieldWeight in 5273, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.0546875 = fieldNorm(doc=5273)
          0.5 = coord(1/2)
      0.25 = coord(1/4)
    
    Date
    22. 7.2006 16:24:52
  16. Yi, K.: Automatic text classification using library classification schemes : trends, issues and challenges (2007) 0.01
    0.005395815 = product of:
      0.02158326 = sum of:
        0.02158326 = product of:
          0.04316652 = sum of:
            0.04316652 = weight(_text_:22 in 2560) [ClassicSimilarity], result of:
              0.04316652 = score(doc=2560,freq=2.0), product of:
                0.15938555 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.045514934 = queryNorm
                0.2708308 = fieldWeight in 2560, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.0546875 = fieldNorm(doc=2560)
          0.5 = coord(1/2)
      0.25 = coord(1/4)
    
    Date
    22. 9.2008 18:31:54
  17. AlQenaei, Z.M.; Monarchi, D.E.: ¬The use of learning techniques to analyze the results of a manual classification system (2016) 0.00
    0.0049464838 = product of:
      0.019785935 = sum of:
        0.019785935 = product of:
          0.03957187 = sum of:
            0.03957187 = weight(_text_:software in 2836) [ClassicSimilarity], result of:
              0.03957187 = score(doc=2836,freq=2.0), product of:
                0.18056466 = queryWeight, product of:
                  3.9671519 = idf(docFreq=2274, maxDocs=44218)
                  0.045514934 = queryNorm
                0.21915624 = fieldWeight in 2836, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.9671519 = idf(docFreq=2274, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=2836)
          0.5 = coord(1/2)
      0.25 = coord(1/4)
    
    Abstract
    Classification is the process of assigning objects to pre-defined classes based on observations or characteristics of those objects, and there are many approaches to performing this task. The overall objective of this study is to demonstrate the use of two learning techniques to analyze the results of a manual classification system. Our sample consisted of 1,026 documents, from the ACM Computing Classification System, classified by their authors as belonging to one of the groups of the classification system: "H.3 Information Storage and Retrieval." A singular value decomposition of the documents' weighted term-frequency matrix was used to represent each document in a 50-dimensional vector space. The analysis of the representation using both supervised (decision tree) and unsupervised (clustering) techniques suggests that two pairs of the ACM classes are closely related to each other in the vector space. Class 1 (Content Analysis and Indexing) is closely related to Class 3 (Information Search and Retrieval), and Class 4 (Systems and Software) is closely related to Class 5 (Online Information Services). Further analysis was performed to test the diffusion of the words in the two classes using both cosine and Euclidean distance.
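The core of the method, reducing the weighted term-frequency matrix with an SVD and then examining the documents in the low-dimensional space with supervised and unsupervised learners, can be sketched as follows. The toy corpus and class labels are hypothetical; the study used 1,026 ACM documents and a 50-dimensional space.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.decomposition import TruncatedSVD
from sklearn.cluster import KMeans

docs = ["indexing content analysis terms", "query search retrieval ranking",
        "software systems architecture", "online services digital library"]
labels = ["H.3.1", "H.3.3", "H.3.4", "H.3.5"]   # hypothetical ACM H.3 classes

X = TfidfVectorizer().fit_transform(docs)          # weighted term-frequency matrix
Z = TruncatedSVD(n_components=2).fit_transform(X)  # 2 dims here; 50 in the study

clusters = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(Z)
for label, cluster in zip(labels, clusters):
    print(label, "-> cluster", cluster)
```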
  18. Golub, K.; Soergel, D.; Buchanan, G.; Tudhope, D.; Lykke, M.; Hiom, D.: A framework for evaluating automatic indexing or classification in the context of retrieval (2016) 0.00
    0.0049464838 = product of:
      0.019785935 = sum of:
        0.019785935 = product of:
          0.03957187 = sum of:
            0.03957187 = weight(_text_:software in 3311) [ClassicSimilarity], result of:
              0.03957187 = score(doc=3311,freq=2.0), product of:
                0.18056466 = queryWeight, product of:
                  3.9671519 = idf(docFreq=2274, maxDocs=44218)
                  0.045514934 = queryNorm
                0.21915624 = fieldWeight in 3311, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.9671519 = idf(docFreq=2274, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=3311)
          0.5 = coord(1/2)
      0.25 = coord(1/4)
    
    Abstract
    Tools for automatic subject assignment help deal with scale and sustainability in creating and enriching metadata, establishing more connections across and between resources and enhancing consistency. Although some software vendors and experimental researchers claim the tools can replace manual subject indexing, hard scientific evidence of their performance in operating information environments is scarce. A major reason for this is that research is usually conducted in laboratory conditions, excluding the complexities of real-life systems and situations. The article reviews and discusses issues with existing evaluation approaches such as problems of aboutness and relevance assessments, implying the need to use more than a single "gold standard" method when evaluating indexing and retrieval, and proposes a comprehensive evaluation framework. The framework is informed by a systematic review of the literature on evaluation approaches: evaluating indexing quality directly through assessment by an evaluator or through comparison with a gold standard, evaluating the quality of computer-assisted indexing directly in the context of an indexing workflow, and evaluating indexing quality indirectly through analyzing retrieval performance.
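One of the evaluation approaches reviewed, comparing automatically assigned terms against a "gold standard" of manually assigned terms, reduces to set-overlap measures such as precision, recall and F1. A minimal sketch with hypothetical term sets:

```python
def indexing_quality(assigned: set[str], gold: set[str]) -> dict[str, float]:
    """Precision/recall/F1 of automatically assigned terms against a gold standard."""
    hits = len(assigned & gold)
    precision = hits / len(assigned) if assigned else 0.0
    recall = hits / len(gold) if gold else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}

print(indexing_quality({"classification", "automatic indexing", "metadata"},
                       {"automatic indexing", "subject cataloguing", "metadata"}))
```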
  19. Pech, G.; Delgado, C.; Sorella, S.P.: Classifying papers into subfields using Abstracts, Titles, Keywords and KeyWords Plus through pattern detection and optimization procedures : an application in Physics (2022) 0.00
    0.0049464838 = product of:
      0.019785935 = sum of:
        0.019785935 = product of:
          0.03957187 = sum of:
            0.03957187 = weight(_text_:software in 744) [ClassicSimilarity], result of:
              0.03957187 = score(doc=744,freq=2.0), product of:
                0.18056466 = queryWeight, product of:
                  3.9671519 = idf(docFreq=2274, maxDocs=44218)
                  0.045514934 = queryNorm
                0.21915624 = fieldWeight in 744, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.9671519 = idf(docFreq=2274, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=744)
          0.5 = coord(1/2)
      0.25 = coord(1/4)
    
    Abstract
    Classifying papers according to the fields of knowledge is critical to clearly understand the dynamics of scientific (sub)fields, their leading questions, and trends. Most studies rely on journal categories defined by popular databases such as WoS or Scopus, but some experts find that those categories may not correctly map the existing subfields nor identify the subfield of a specific article. This study addresses the classification problem using data from each paper (Abstract, Title, Keywords, and the KeyWords Plus) and the help of experts to identify the existing subfields and journals exclusive of each subfield. These "exclusive journals" are critical for obtaining, through a pattern detection procedure that uses machine learning techniques (from the software NVivo), a list of the frequent terms that are specific to each subfield. With that list of terms and with the help of optimization procedures, we can identify to which subfield each paper most likely belongs. This study can contribute to supporting scientific policy-makers, funding, and research institutions (via more accurate academic performance evaluations), to supporting editors in their task of redefining the scopes of journals, and to supporting popular databases in their processes of refining categories.
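A stripped-down version of the assignment step: once each subfield has a list of its specific frequent terms, a paper goes to the subfield whose terms it matches best. The sketch below uses hypothetical term lists and a simple count; the actual study adds optimization procedures on top of this.

```python
# Hypothetical subfield-specific frequent terms derived from "exclusive journals".
SUBFIELD_TERMS = {
    "particle physics": {"quark", "hadron", "collider", "lattice"},
    "condensed matter": {"superconductivity", "phonon", "magnetism", "lattice"},
    "astrophysics":     {"galaxy", "supernova", "redshift", "cosmology"},
}

def assign_subfield(paper_text: str) -> str:
    """Pick the subfield whose specific terms occur most often in the paper text."""
    words = set(paper_text.lower().split())
    scores = {sf: len(words & terms) for sf, terms in SUBFIELD_TERMS.items()}
    return max(scores, key=scores.get)

print(assign_subfield("lattice qcd study of hadron spectra at the collider"))
```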
  20. Liu, R.-L.: Context recognition for hierarchical text classification (2009) 0.00
    0.0046249838 = product of:
      0.018499935 = sum of:
        0.018499935 = product of:
          0.03699987 = sum of:
            0.03699987 = weight(_text_:22 in 2760) [ClassicSimilarity], result of:
              0.03699987 = score(doc=2760,freq=2.0), product of:
                0.15938555 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.045514934 = queryNorm
                0.23214069 = fieldWeight in 2760, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.046875 = fieldNorm(doc=2760)
          0.5 = coord(1/2)
      0.25 = coord(1/4)
    
    Date
    22. 3.2009 19:11:54