Search (26 results, page 1 of 2)

Hotho, A.; Bloehdorn, S.: Data Mining 2004 : Text classification by boosting weak learners based on terms and concepts (2004) 0.23

0.2307443 = product of:
  0.30765906 = sum of:
    0.07228978 = product of:
      0.21686934 = sum of:
        0.21686934 = weight(_text_:3a in 562) [ClassicSimilarity], result of:
          0.21686934 = score(doc=562,freq=2.0), product of:
            0.38587612 = queryWeight, product of:
              8.478011 = idf(docFreq=24, maxDocs=44218)
              0.045514934 = queryNorm
            0.56201804 = fieldWeight in 562, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              8.478011 = idf(docFreq=24, maxDocs=44218)
              0.046875 = fieldNorm(doc=562)
      0.33333334 = coord(1/3)
    0.21686934 = weight(_text_:2f in 562) [ClassicSimilarity], result of:
      0.21686934 = score(doc=562,freq=2.0), product of:
        0.38587612 = queryWeight, product of:
          8.478011 = idf(docFreq=24, maxDocs=44218)
          0.045514934 = queryNorm
        0.56201804 = fieldWeight in 562, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          8.478011 = idf(docFreq=24, maxDocs=44218)
          0.046875 = fieldNorm(doc=562)
    0.018499935 = product of:
      0.03699987 = sum of:
        0.03699987 = weight(_text_:22 in 562) [ClassicSimilarity], result of:
          0.03699987 = score(doc=562,freq=2.0), product of:
            0.15938555 = queryWeight, product of:
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.045514934 = queryNorm
            0.23214069 = fieldWeight in 562, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.046875 = fieldNorm(doc=562)
      0.5 = coord(1/2)
  0.75 = coord(3/4)
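
The nesting in the explanation for doc 562 can be retraced by hand. The sketch below is a minimal illustration, not the search engine's own code: it copies the three leaf scores from the tree above and applies the coord factors in the order the tree shows. The names `coord` and `doc_562_score` are made up for this example.

```python
# Leaf scores copied from the explanation tree for doc 562.
SCORE_3A = 0.21686934  # weight(_text_:3a in 562)
SCORE_2F = 0.21686934  # weight(_text_:2f in 562)
SCORE_22 = 0.03699987  # weight(_text_:22 in 562)


def coord(matched: int, total: int) -> float:
    """Coordination factor: share of a boolean query's clauses that matched."""
    return matched / total


def doc_562_score() -> float:
    """Combine the leaf scores the way the explanation tree nests them."""
    clause_a = SCORE_3A * coord(1, 3)  # nested query, 1 of 3 clauses matched
    clause_b = SCORE_2F                # plain term clause, no nested coord
    clause_c = SCORE_22 * coord(1, 2)  # nested query, 1 of 2 clauses matched
    # The outer query matched 3 of its 4 clauses.
    return (clause_a + clause_b + clause_c) * coord(3, 4)


if __name__ == "__main__":
    print(f"{doc_562_score():.7f}")
```

The result agrees with the 0.2307443 shown at the root of the tree, up to floating-point rounding.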

Content: Cf.: http://www.google.de/url?sa=t&rct=j&q=&esrc=s&source=web&cd=1&cad=rja&ved=0CEAQFjAA&url=http%3A%2F%2Fciteseerx.ist.psu.edu%2Fviewdoc%2Fdownload%3Fdoi%3D10.1.1.91.4940%26rep%3Drep1%26type%3Dpdf&ei=dOXrUMeIDYHDtQahsIGACg&usg=AFQjCNHFWVh6gNPvnOrOS9R3rkrXCNVD-A&sig2=5I2F5evRfMnsttSgFF9g7Q&bvm=bv.1357316858,d.Yms.
Date: 8. 1.2013 10:22:32

Yao, H.; Etzkorn, L.H.; Virani, S.: Automated classification and retrieval of reusable software components (2008) 0.02
```
0.017135125 = product of:
  0.0685405 = sum of:
    0.0685405 = product of:
      0.137081 = sum of:
        0.137081 = weight(_text_:software in 1382) [ClassicSimilarity], result of:
          0.137081 = score(doc=1382,freq=24.0), product of:
            0.18056466 = queryWeight, product of:
              3.9671519 = idf(docFreq=2274, maxDocs=44218)
              0.045514934 = queryNorm
            0.75917953 = fieldWeight in 1382, product of:
              4.8989797 = tf(freq=24.0), with freq of:
                24.0 = termFreq=24.0
              3.9671519 = idf(docFreq=2274, maxDocs=44218)
              0.0390625 = fieldNorm(doc=1382)
      0.5 = coord(1/2)
  0.25 = coord(1/4)
```
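As a cross-check, the leaf score for doc 1382 can be recomputed from the raw statistics in the tree. The sketch assumes the classic Lucene TF-IDF formulas, tf = sqrt(freq) and idf = 1 + ln(maxDocs / (docFreq + 1)), which reproduce the values listed here. The function names are illustrative, and Lucene works in 32-bit floats, so the last digits may differ.

```python
import math


def idf(doc_freq: int, max_docs: int) -> float:
    """Inverse document frequency (assumed classic formula)."""
    return 1.0 + math.log(max_docs / (doc_freq + 1))


def tf(freq: float) -> float:
    """Term frequency factor: square root of the raw count."""
    return math.sqrt(freq)


def term_score(freq: float, doc_freq: int, max_docs: int,
               field_norm: float, query_norm: float) -> float:
    """score = queryWeight * fieldWeight, as laid out in the explain tree."""
    term_idf = idf(doc_freq, max_docs)
    query_weight = term_idf * query_norm             # idf * queryNorm
    field_weight = tf(freq) * term_idf * field_norm  # tf * idf * fieldNorm
    return query_weight * field_weight


# Statistics for the term "software" in doc 1382, copied from the tree.
leaf = term_score(freq=24.0, doc_freq=2274, max_docs=44218,
                  field_norm=0.0390625, query_norm=0.045514934)
# Apply the two coord factors shown in the tree: coord(1/2) and coord(1/4).
total = leaf * 0.5 * 0.25

if __name__ == "__main__":
    print(f"leaf={leaf:.6f} total={total:.6f}")
```

Because idf enters both queryWeight and fieldWeight, it is effectively squared in the leaf score, which is why rare terms dominate the ranking.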
Abstract

The authors describe research that improves software reuse by using an automated approach to semantically search for and retrieve reusable software components in large software component repositories and on the World Wide Web (WWW). Using automation and smart (semantic) techniques, their approach speeds up the search and retrieval of reusable software components while retaining good accuracy, and therefore improves the affordability of software reuse. Program understanding of software components and natural language understanding of user queries were employed. The software component descriptions were then compared by matching the resulting semantic representations of the user queries to the semantic representations of the software components, in order to find the software components that best match the user queries. A proof-of-concept system was developed to test the authors' approach. The results of this proof-of-concept system were compared to those of human experts, and statistical analysis was performed on the collected experimental data. The results from these experiments demonstrate that this automated semantic-based approach to reusable software component classification and retrieval is successful when compared to the labor-intensive results from the experts, thus showing that this approach can significantly benefit software reuse classification and retrieval.
Montesi, M.; Navarrete, T.: Classifying web genres in context : A case study documenting the web genres used by a software engineer (2008) 0.01
```
0.010281074 = product of:
  0.041124295 = sum of:
    0.041124295 = product of:
      0.08224859 = sum of:
        0.08224859 = weight(_text_:software in 2100) [ClassicSimilarity], result of:
          0.08224859 = score(doc=2100,freq=6.0), product of:
            0.18056466 = queryWeight, product of:
              3.9671519 = idf(docFreq=2274, maxDocs=44218)
              0.045514934 = queryNorm
            0.4555077 = fieldWeight in 2100, product of:
              2.4494898 = tf(freq=6.0), with freq of:
                6.0 = termFreq=6.0
              3.9671519 = idf(docFreq=2274, maxDocs=44218)
              0.046875 = fieldNorm(doc=2100)
      0.5 = coord(1/2)
  0.25 = coord(1/4)
```
Abstract

This case study analyzes the Internet-based resources that a software engineer uses in his daily work. Methodologically, we studied the web browser history of the participant, classifying all the web pages he had seen over a period of 12 days into web genres. We interviewed him before and after the analysis of the web browser history. In the first interview, he spoke about his general information behavior; in the second, he commented on each web genre, explaining why and how he used them. As a result, three approaches allow us to describe the set of 23 web genres obtained: (a) the purposes they serve for the participant; (b) the role they play in the various work and search phases; and (c) the way they are used in combination with each other. Further observations concern the way the participant assesses the quality of web-based resources, and his information behavior as a software engineer.
Sebastiani, F.: Classification of text, automatic (2006) 0.01
```
0.009793539 = product of:
  0.039174154 = sum of:
    0.039174154 = product of:
      0.07834831 = sum of:
        0.07834831 = weight(_text_:software in 5003) [ClassicSimilarity], result of:
          0.07834831 = score(doc=5003,freq=4.0), product of:
            0.18056466 = queryWeight, product of:
              3.9671519 = idf(docFreq=2274, maxDocs=44218)
              0.045514934 = queryNorm
            0.43390724 = fieldWeight in 5003, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              3.9671519 = idf(docFreq=2274, maxDocs=44218)
              0.0546875 = fieldNorm(doc=5003)
      0.5 = coord(1/2)
  0.25 = coord(1/4)
```
Abstract

Automatic text classification (ATC) is a discipline at the crossroads of information retrieval (IR), machine learning (ML), and computational linguistics (CL), and consists in the realization of text classifiers, i.e. software systems capable of assigning texts to one or more categories, or classes, from a predefined set. Applications range from the automated indexing of scientific articles, to e-mail routing, spam filtering, authorship attribution, and automated survey coding. This article will focus on the ML approach to ATC, whereby a software system (called the learner) automatically builds a classifier for the categories of interest by generalizing from a "training" set of pre-classified texts.

Subramanian, S.; Shafer, K.E.: Clustering (2001) 0.01

0.0092499675 = product of:
  0.03699987 = sum of:
    0.03699987 = product of:
      0.07399974 = sum of:
        0.07399974 = weight(_text_:22 in 1046) [ClassicSimilarity], result of:
          0.07399974 = score(doc=1046,freq=2.0), product of:
            0.15938555 = queryWeight, product of:
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.045514934 = queryNorm
            0.46428138 = fieldWeight in 1046, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.09375 = fieldNorm(doc=1046)
      0.5 = coord(1/2)
  0.25 = coord(1/4)
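
Every explanation tree on this page follows the same pattern, so its arithmetic can be checked mechanically. The following sketch uses hypothetical helper names (`parse`, `check`) and assumes that indentation marks nesting, as in the listings here. It rebuilds the tree for doc 1046 and verifies each "product of" and "sum of" node against its children.

```python
import math
import re
from dataclasses import dataclass, field

EXPLAIN = """\
0.0092499675 = product of:
  0.03699987 = sum of:
    0.03699987 = product of:
      0.07399974 = sum of:
        0.07399974 = weight(_text_:22 in 1046) [ClassicSimilarity], result of:
          0.07399974 = score(doc=1046,freq=2.0), product of:
            0.15938555 = queryWeight, product of:
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.045514934 = queryNorm
            0.46428138 = fieldWeight in 1046, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.09375 = fieldNorm(doc=1046)
      0.5 = coord(1/2)
  0.25 = coord(1/4)
"""

LINE = re.compile(r"^(\s*)([0-9.eE+-]+) = (.*)$")


@dataclass
class Node:
    value: float
    label: str
    indent: int
    children: list = field(default_factory=list)


def parse(text: str) -> Node:
    """Build a tree from indented 'value = label' lines (single root assumed)."""
    root, stack = None, []
    for raw in text.splitlines():
        match = LINE.match(raw)
        if not match:
            continue
        node = Node(float(match.group(2)), match.group(3), len(match.group(1)))
        # Pop back to the nearest ancestor with smaller indentation.
        while stack and stack[-1].indent >= node.indent:
            stack.pop()
        if stack:
            stack[-1].children.append(node)
        else:
            root = node
        stack.append(node)
    return root


def check(node: Node, rel_tol: float = 1e-5) -> list:
    """Return labels of nodes whose value disagrees with their children."""
    bad = []
    values = [child.value for child in node.children]
    if node.label.endswith("product of:"):
        expected = math.prod(values)
    elif node.label.endswith("sum of:"):
        expected = sum(values)
    else:
        expected = None  # leaves, "result of:" and "with freq of:" are skipped
    if expected is not None and not math.isclose(node.value, expected, rel_tol=rel_tol):
        bad.append(node.label)
    for child in node.children:
        bad.extend(check(child, rel_tol))
    return bad


if __name__ == "__main__":
    print(check(parse(EXPLAIN)))
```

An empty list means every product and sum in the tree is internally consistent within the tolerance. The tolerance is loose on purpose, since the listed values are rounded 32-bit floats.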

Date: 5. 5.2003 14:17:22

Brückner, T.; Dambeck, H.: Sortierautomaten : Grundlagen der Textklassifizierung (2003) 0.01

0.007914375 = product of:
  0.0316575 = sum of:
    0.0316575 = product of:
      0.063315 = sum of:
        0.063315 = weight(_text_:software in 2398) [ClassicSimilarity], result of:
          0.063315 = score(doc=2398,freq=2.0), product of:
            0.18056466 = queryWeight, product of:
              3.9671519 = idf(docFreq=2274, maxDocs=44218)
              0.045514934 = queryNorm
            0.35064998 = fieldWeight in 2398, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.9671519 = idf(docFreq=2274, maxDocs=44218)
              0.0625 = fieldNorm(doc=2398)
      0.5 = coord(1/2)
  0.25 = coord(1/4)

Abstract: Invoice, cancellation, or change of address? Incoming letters and e-mails are increasingly sorted by software rather than laboriously by hand. The text classifiers work with surprising accuracy. They also search for similar texts and thus provide a quick overview. Their tools are linguistics, statistics, and logic.

HaCohen-Kerner, Y. et al.: Classification using various machine learning methods and combinations of key-phrases and visual features (2016) 0.01

0.007708307 = product of:
  0.030833228 = sum of:
    0.030833228 = product of:
      0.061666455 = sum of:
        0.061666455 = weight(_text_:22 in 2748) [ClassicSimilarity], result of:
          0.061666455 = score(doc=2748,freq=2.0), product of:
            0.15938555 = queryWeight, product of:
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.045514934 = queryNorm
            0.38690117 = fieldWeight in 2748, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.078125 = fieldNorm(doc=2748)
      0.5 = coord(1/2)
  0.25 = coord(1/4)

Date: 1. 2.2016 18:25:22

Ahmed, M.; Mukhopadhyay, M.; Mukhopadhyay, P.: Automated knowledge organization : AI ML based subject indexing system for libraries (2023) 0.01
```
0.006995385 = product of:
  0.02798154 = sum of:
    0.02798154 = product of:
      0.05596308 = sum of:
        0.05596308 = weight(_text_:software in 977) [ClassicSimilarity], result of:
          0.05596308 = score(doc=977,freq=4.0), product of:
            0.18056466 = queryWeight, product of:
              3.9671519 = idf(docFreq=2274, maxDocs=44218)
              0.045514934 = queryNorm
            0.30993375 = fieldWeight in 977, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              3.9671519 = idf(docFreq=2274, maxDocs=44218)
              0.0390625 = fieldNorm(doc=977)
      0.5 = coord(1/2)
  0.25 = coord(1/4)
```
Abstract

The research study as reported here is an attempt to explore the possibilities of an AI/ML-based semi-automated indexing system in a library setup to handle large volumes of documents. It uses the Python virtual environment to install and configure an open source AI environment (named Annif) to feed the LOD (Linked Open Data) dataset of Library of Congress Subject Headings (LCSH) as a standard KOS (Knowledge Organisation System). The framework deployed the Turtle format of LCSH after cleaning the file with Skosify, applied an array of backend algorithms (namely TF-IDF, Omikuji, and NN-Ensemble) to measure relative performance, and selected Snowball as an analyser. The training of Annif was conducted with a large set of bibliographic records populated with subject descriptors (MARC tag 650$a) and indexed by trained LIS professionals. The training dataset is first treated with MarcEdit to export it in a format suitable for OpenRefine, and then in OpenRefine it undergoes many steps to produce a bibliographic record set suitable to train Annif. The framework, after training, has been tested with a bibliographic dataset to measure indexing efficiencies, and finally, the automated indexing framework is integrated with data wrangling software (OpenRefine) to produce suggested headings on a mass scale. The entire framework is based on open-source software, open datasets, and open standards.
Kasprzik, A.: Automatisierte und semiautomatisierte Klassifizierung : eine Analyse aktueller Projekte (2014) 0.01
```
0.0059357807 = product of:
  0.023743123 = sum of:
    0.023743123 = product of:
      0.047486246 = sum of:
        0.047486246 = weight(_text_:software in 2470) [ClassicSimilarity], result of:
          0.047486246 = score(doc=2470,freq=2.0), product of:
            0.18056466 = queryWeight, product of:
              3.9671519 = idf(docFreq=2274, maxDocs=44218)
              0.045514934 = queryNorm
            0.2629875 = fieldWeight in 2470, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.9671519 = idf(docFreq=2274, maxDocs=44218)
              0.046875 = fieldNorm(doc=2470)
      0.5 = coord(1/2)
  0.25 = coord(1/4)
```
Abstract

The rapid growth in the volume of digitally available documents, coupled with the shortage of time and staff at academic libraries, suggests the use of semi-automatic or fully automatic methods for verbal and classificatory subject indexing. After a brief general introduction to the common methodology, this article examines a number of automated classification projects from the period 2007-2012 in the German-speaking region. A large share of the projects presented use machine learning methods from artificial intelligence, mostly work with adapted versions of commercial software, and generally refer to the Dewey Decimal Classification (DDC). The underlying data are metadata records, abstracts, tables of contents, and full texts in various data formats. The concluding analysis arranges the projects according to a number of different criteria and summarizes the current situation and the greatest challenges for automated classification methods.
Piros, A.: Automatic interpretation of complex UDC numbers : towards support for library systems (2015) 0.01
```
0.005596308 = product of:
  0.022385232 = sum of:
    0.022385232 = product of:
      0.044770464 = sum of:
        0.044770464 = weight(_text_:software in 2301) [ClassicSimilarity], result of:
          0.044770464 = score(doc=2301,freq=4.0), product of:
            0.18056466 = queryWeight, product of:
              3.9671519 = idf(docFreq=2274, maxDocs=44218)
              0.045514934 = queryNorm
            0.24794699 = fieldWeight in 2301, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              3.9671519 = idf(docFreq=2274, maxDocs=44218)
              0.03125 = fieldNorm(doc=2301)
      0.5 = coord(1/2)
  0.25 = coord(1/4)
```
Abstract

Analytico-synthetic and faceted classifications, such as Universal Decimal Classification (UDC), express the content of documents with complex, pre-combined classification codes. Without classification authority control that would help manage and access structured notations, the use of UDC codes in searching and browsing is limited. Existing UDC parsing solutions are usually created for a particular database system or a specific task and are not widely applicable. The approach described in this paper provides a solution by which the analysis and interpretation of UDC notations are stored in an intermediate format (in this case, XML) by automatic means without any data or information loss. Due to its richness, the output file can be converted into different formats, such as standard mark-up and data exchange formats or simple lists of the recommended entry points of a UDC number. The program can also be used to create authority records containing complex UDC numbers which can be comprehensively analysed in order to be retrieved effectively. The Java program, as well as the corresponding schema definition it employs, is under continuous development. The current version of the interpreter software is now available online for testing purposes at the following web site: http://interpreter-eto.rhcloud.com. The future plan is to implement conversion methods for standard formats and to create standard online interfaces in order to make it possible to use the features of the software as a service. This would allow the algorithm to be employed in both existing and future library systems to analyse UDC numbers without any significant programming effort.

Bock, H.-H.: Datenanalyse zur Strukturierung und Ordnung von Information (1989) 0.01

0.005395815 = product of:
  0.02158326 = sum of:
    0.02158326 = product of:
      0.04316652 = sum of:
        0.04316652 = weight(_text_:22 in 141) [ClassicSimilarity], result of:
          0.04316652 = score(doc=141,freq=2.0), product of:
            0.15938555 = queryWeight, product of:
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.045514934 = queryNorm
            0.2708308 = fieldWeight in 141, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.0546875 = fieldNorm(doc=141)
      0.5 = coord(1/2)
  0.25 = coord(1/4)

Pages: pp. 1-22

Dubin, D.: Dimensions and discriminability (1998) 0.01

0.005395815 = product of:
  0.02158326 = sum of:
    0.02158326 = product of:
      0.04316652 = sum of:
        0.04316652 = weight(_text_:22 in 2338) [ClassicSimilarity], result of:
          0.04316652 = score(doc=2338,freq=2.0), product of:
            0.15938555 = queryWeight, product of:
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.045514934 = queryNorm
            0.2708308 = fieldWeight in 2338, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.0546875 = fieldNorm(doc=2338)
      0.5 = coord(1/2)
  0.25 = coord(1/4)

Date: 22. 9.1997 19:16:05

Jenkins, C.: Automatic classification of Web resources using Java and Dewey Decimal Classification (1998) 0.01

0.005395815 = product of:
  0.02158326 = sum of:
    0.02158326 = product of:
      0.04316652 = sum of:
        0.04316652 = weight(_text_:22 in 1673) [ClassicSimilarity], result of:
          0.04316652 = score(doc=1673,freq=2.0), product of:
            0.15938555 = queryWeight, product of:
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.045514934 = queryNorm
            0.2708308 = fieldWeight in 1673, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.0546875 = fieldNorm(doc=1673)
      0.5 = coord(1/2)
  0.25 = coord(1/4)

Date: 1. 8.1996 22:08:06

Yoon, Y.; Lee, C.; Lee, G.G.: An effective procedure for constructing a hierarchical text classification system (2006) 0.01

0.005395815 = product of:
  0.02158326 = sum of:
    0.02158326 = product of:
      0.04316652 = sum of:
        0.04316652 = weight(_text_:22 in 5273) [ClassicSimilarity], result of:
          0.04316652 = score(doc=5273,freq=2.0), product of:
            0.15938555 = queryWeight, product of:
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.045514934 = queryNorm
            0.2708308 = fieldWeight in 5273, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.0546875 = fieldNorm(doc=5273)
      0.5 = coord(1/2)
  0.25 = coord(1/4)

Date: 22. 7.2006 16:24:52

Yi, K.: Automatic text classification using library classification schemes : trends, issues and challenges (2007) 0.01

0.005395815 = product of:
  0.02158326 = sum of:
    0.02158326 = product of:
      0.04316652 = sum of:
        0.04316652 = weight(_text_:22 in 2560) [ClassicSimilarity], result of:
          0.04316652 = score(doc=2560,freq=2.0), product of:
            0.15938555 = queryWeight, product of:
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.045514934 = queryNorm
            0.2708308 = fieldWeight in 2560, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.0546875 = fieldNorm(doc=2560)
      0.5 = coord(1/2)
  0.25 = coord(1/4)

Date: 22. 9.2008 18:31:54

AlQenaei, Z.M.; Monarchi, D.E.: The use of learning techniques to analyze the results of a manual classification system (2016) 0.00
```
0.0049464838 = product of:
  0.019785935 = sum of:
    0.019785935 = product of:
      0.03957187 = sum of:
        0.03957187 = weight(_text_:software in 2836) [ClassicSimilarity], result of:
          0.03957187 = score(doc=2836,freq=2.0), product of:
            0.18056466 = queryWeight, product of:
              3.9671519 = idf(docFreq=2274, maxDocs=44218)
              0.045514934 = queryNorm
            0.21915624 = fieldWeight in 2836, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.9671519 = idf(docFreq=2274, maxDocs=44218)
              0.0390625 = fieldNorm(doc=2836)
      0.5 = coord(1/2)
  0.25 = coord(1/4)
```
Abstract

Classification is the process of assigning objects to pre-defined classes based on observations or characteristics of those objects, and there are many approaches to performing this task. The overall objective of this study is to demonstrate the use of two learning techniques to analyze the results of a manual classification system. Our sample consisted of 1,026 documents, from the ACM Computing Classification System, classified by their authors as belonging to one of the groups of the classification system: "H.3 Information Storage and Retrieval." A singular value decomposition of the documents' weighted term-frequency matrix was used to represent each document in a 50-dimensional vector space. The analysis of the representation using both supervised (decision tree) and unsupervised (clustering) techniques suggests that two pairs of the ACM classes are closely related to each other in the vector space. Class 1 (Content Analysis and Indexing) is closely related to Class 3 (Information Search and Retrieval), and Class 4 (Systems and Software) is closely related to Class 5 (Online Information Services). Further analysis was performed to test the diffusion of the words in the two classes using both cosine and Euclidean distance.
Golub, K.; Soergel, D.; Buchanan, G.; Tudhope, D.; Lykke, M.; Hiom, D.: A framework for evaluating automatic indexing or classification in the context of retrieval (2016) 0.00
```
0.0049464838 = product of:
  0.019785935 = sum of:
    0.019785935 = product of:
      0.03957187 = sum of:
        0.03957187 = weight(_text_:software in 3311) [ClassicSimilarity], result of:
          0.03957187 = score(doc=3311,freq=2.0), product of:
            0.18056466 = queryWeight, product of:
              3.9671519 = idf(docFreq=2274, maxDocs=44218)
              0.045514934 = queryNorm
            0.21915624 = fieldWeight in 3311, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.9671519 = idf(docFreq=2274, maxDocs=44218)
              0.0390625 = fieldNorm(doc=3311)
      0.5 = coord(1/2)
  0.25 = coord(1/4)
```
Abstract

Tools for automatic subject assignment help deal with scale and sustainability in creating and enriching metadata, establishing more connections across and between resources and enhancing consistency. Although some software vendors and experimental researchers claim the tools can replace manual subject indexing, hard scientific evidence of their performance in operating information environments is scarce. A major reason for this is that research is usually conducted in laboratory conditions, excluding the complexities of real-life systems and situations. The article reviews and discusses issues with existing evaluation approaches such as problems of aboutness and relevance assessments, implying the need to use more than a single "gold standard" method when evaluating indexing and retrieval, and proposes a comprehensive evaluation framework. The framework is informed by a systematic review of the literature on evaluation approaches: evaluating indexing quality directly through assessment by an evaluator or through comparison with a gold standard, evaluating the quality of computer-assisted indexing directly in the context of an indexing workflow, and evaluating indexing quality indirectly through analyzing retrieval performance.
Pech, G.; Delgado, C.; Sorella, S.P.: Classifying papers into subfields using Abstracts, Titles, Keywords and KeyWords Plus through pattern detection and optimization procedures : an application in Physics (2022) 0.00
```
0.0049464838 = product of:
  0.019785935 = sum of:
    0.019785935 = product of:
      0.03957187 = sum of:
        0.03957187 = weight(_text_:software in 744) [ClassicSimilarity], result of:
          0.03957187 = score(doc=744,freq=2.0), product of:
            0.18056466 = queryWeight, product of:
              3.9671519 = idf(docFreq=2274, maxDocs=44218)
              0.045514934 = queryNorm
            0.21915624 = fieldWeight in 744, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.9671519 = idf(docFreq=2274, maxDocs=44218)
              0.0390625 = fieldNorm(doc=744)
      0.5 = coord(1/2)
  0.25 = coord(1/4)
```
Abstract

Classifying papers according to the fields of knowledge is critical to clearly understand the dynamics of scientific (sub)fields, their leading questions, and trends. Most studies rely on journal categories defined by popular databases such as WoS or Scopus, but some experts find that those categories may not correctly map the existing subfields nor identify the subfield of a specific article. This study addresses the classification problem using data from each paper (Abstract, Title, Keywords, and the KeyWords Plus) and the help of experts to identify the existing subfields and journals exclusive of each subfield. These "exclusive journals" are critical to obtain, through a pattern detection procedure that uses machine learning techniques (from software NVivo), a list of the frequent terms that are specific to each subfield. With that list of terms and with the help of optimization procedures, we can identify to which subfield each paper most likely belongs. This study can contribute to support scientific policy-makers, funding, and research institutions (via more accurate academic performance evaluations), to support editors in their tasks to redefine the scopes of journals, and to support popular databases in their processes of refining categories.

Liu, R.-L.: Context recognition for hierarchical text classification (2009) 0.00

0.0046249838 = product of:
  0.018499935 = sum of:
    0.018499935 = product of:
      0.03699987 = sum of:
        0.03699987 = weight(_text_:22 in 2760) [ClassicSimilarity], result of:
          0.03699987 = score(doc=2760,freq=2.0), product of:
            0.15938555 = queryWeight, product of:
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.045514934 = queryNorm
            0.23214069 = fieldWeight in 2760, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.046875 = fieldNorm(doc=2760)
      0.5 = coord(1/2)
  0.25 = coord(1/4)

Date: 22. 3.2009 19:11:54

Pfeffer, M.: Automatische Vergabe von RVK-Notationen mittels fallbasiertem Schließen (2009) 0.00

0.0046249838 = product of:
  0.018499935 = sum of:
    0.018499935 = product of:
      0.03699987 = sum of:
        0.03699987 = weight(_text_:22 in 3051) [ClassicSimilarity], result of:
          0.03699987 = score(doc=3051,freq=2.0), product of:
            0.15938555 = queryWeight, product of:
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.045514934 = queryNorm
            0.23214069 = fieldWeight in 3051, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.046875 = fieldNorm(doc=3051)
      0.5 = coord(1/2)
  0.25 = coord(1/4)

Date: 22. 8.2009 19:51:28
