Search (50 results, page 1 of 3)

Hotho, A.; Bloehdorn, S.: Data Mining 2004 : Text classification by boosting weak learners based on terms and concepts (2004) 0.10

0.10477377 = product of:
  0.13969836 = sum of:
    0.06800719 = product of:
      0.20402157 = sum of:
        0.20402157 = weight(_text_:3a in 562) [ClassicSimilarity], result of:
          0.20402157 = score(doc=562,freq=2.0), product of:
            0.36301607 = queryWeight, product of:
              8.478011 = idf(docFreq=24, maxDocs=44218)
              0.042818543 = queryNorm
            0.56201804 = fieldWeight in 562, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              8.478011 = idf(docFreq=24, maxDocs=44218)
              0.046875 = fieldNorm(doc=562)
      0.33333334 = coord(1/3)
    0.060088523 = weight(_text_:representation in 562) [ClassicSimilarity], result of:
      0.060088523 = score(doc=562,freq=2.0), product of:
        0.19700786 = queryWeight, product of:
          4.600994 = idf(docFreq=1206, maxDocs=44218)
          0.042818543 = queryNorm
        0.3050057 = fieldWeight in 562, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          4.600994 = idf(docFreq=1206, maxDocs=44218)
          0.046875 = fieldNorm(doc=562)
    0.011602643 = product of:
      0.034807928 = sum of:
        0.034807928 = weight(_text_:22 in 562) [ClassicSimilarity], result of:
          0.034807928 = score(doc=562,freq=2.0), product of:
            0.14994325 = queryWeight, product of:
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.042818543 = queryNorm
            0.23214069 = fieldWeight in 562, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.046875 = fieldNorm(doc=562)
      0.33333334 = coord(1/3)
  0.75 = coord(3/4)
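
The per-term arithmetic in the explanation above can be reproduced with a few lines of Python. This is a minimal sketch, not the search engine's own code: the function names `classic_idf`, `classic_tf`, and `term_score` are hypothetical, and the formulas (`tf = sqrt(freq)`, `idf = 1 + ln(maxDocs / (docFreq + 1))`) are assumed because they reproduce the values printed in the tree. The `queryNorm` and `fieldNorm` values are copied from the explanation rather than derived.

```python
import math


def classic_idf(doc_freq: int, max_docs: int) -> float:
    # Assumed idf variant: 1 + ln(maxDocs / (docFreq + 1)).
    return 1.0 + math.log(max_docs / (doc_freq + 1))


def classic_tf(freq: float) -> float:
    # Term frequency is damped with a square root.
    return math.sqrt(freq)


def term_score(freq: float, doc_freq: int, max_docs: int,
               query_norm: float, field_norm: float) -> float:
    idf = classic_idf(doc_freq, max_docs)
    query_weight = idf * query_norm                      # queryWeight
    field_weight = classic_tf(freq) * idf * field_norm   # fieldWeight
    return query_weight * field_weight


# Values taken from the "_text_:3a in 562" branch of the tree above.
score_3a = term_score(freq=2.0, doc_freq=24, max_docs=44218,
                      query_norm=0.042818543, field_norm=0.046875)
print(round(score_3a, 6))
```

Up to floating-point rounding, the result should agree with the `0.20402157` reported for the `3a` term before the `coord(1/3)` factor is applied.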

Abstract: Document representations for text classification are typically based on the classical Bag-Of-Words paradigm. This approach comes with deficiencies that motivate the integration of features on a higher semantic level than single words. In this paper we propose an enhancement of the classical document representation through concepts extracted from background knowledge. Boosting is used for actual classification. Experimental evaluations on two well known text corpora support our approach through consistent improvement of the results.
Content: Cf.: http://www.google.de/url?sa=t&rct=j&q=&esrc=s&source=web&cd=1&cad=rja&ved=0CEAQFjAA&url=http%3A%2F%2Fciteseerx.ist.psu.edu%2Fviewdoc%2Fdownload%3Fdoi%3D10.1.1.91.4940%26rep%3Drep1%26type%3Dpdf&ei=dOXrUMeIDYHDtQahsIGACg&usg=AFQjCNHFWVh6gNPvnOrOS9R3rkrXCNVD-A&sig2=5I2F5evRfMnsttSgFF9g7Q&bvm=bv.1357316858,d.Yms.
Date: 8. 1.2013 10:22:32

Dubin, D.: Dimensions and discriminability (1998) 0.04

0.041819844 = product of:
  0.08363969 = sum of:
    0.07010327 = weight(_text_:representation in 2338) [ClassicSimilarity], result of:
      0.07010327 = score(doc=2338,freq=2.0), product of:
        0.19700786 = queryWeight, product of:
          4.600994 = idf(docFreq=1206, maxDocs=44218)
          0.042818543 = queryNorm
        0.35583997 = fieldWeight in 2338, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          4.600994 = idf(docFreq=1206, maxDocs=44218)
          0.0546875 = fieldNorm(doc=2338)
    0.013536418 = product of:
      0.04060925 = sum of:
        0.04060925 = weight(_text_:22 in 2338) [ClassicSimilarity], result of:
          0.04060925 = score(doc=2338,freq=2.0), product of:
            0.14994325 = queryWeight, product of:
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.042818543 = queryNorm
            0.2708308 = fieldWeight in 2338, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.0546875 = fieldNorm(doc=2338)
      0.33333334 = coord(1/3)
  0.5 = coord(2/4)
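
The way the nested clauses combine into the final score can be checked by hand. The sketch below assumes the structure shown in the tree above: each `coord(m/n)` factor is simply `m / n`, inner products are multiplied by their own coord, the clause scores are summed, and the sum is multiplied by the top-level coord. The helper name `coord` and the variable names are illustrative only; the two term scores are copied from the explanation.

```python
def coord(matched: int, total: int) -> float:
    # Fraction of query clauses that matched the document.
    return matched / total


# Term scores copied from the explanation for doc 2338.
representation_score = 0.07010327
term_22_score = 0.04060925

# The "22" clause sits inside a nested query where 1 of 3 sub-clauses matched.
inner_clause = term_22_score * coord(1, 3)

# Two of the four top-level clauses matched.
final_score = (representation_score + inner_clause) * coord(2, 4)
print(round(final_score, 6))
```

The result should agree, up to rounding, with the `0.041819844` shown at the root of the tree, which is displayed as 0.04 next to the title.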

Abstract: Visualization interfaces can improve subject access by highlighting the inclusion of document representation components in similarity and discrimination relationships. Within a set of retrieved documents, what kinds of groupings can index terms and subject headings make explicit? The role of controlled vocabulary in classifying search output is examined.
Date: 22. 9.1997 19:16:05

Zhu, W.Z.; Allen, R.B.: Document clustering using the LSI subspace signature model (2013) 0.04
```
0.03584558 = product of:
  0.07169116 = sum of:
    0.060088523 = weight(_text_:representation in 690) [ClassicSimilarity], result of:
      0.060088523 = score(doc=690,freq=2.0), product of:
        0.19700786 = queryWeight, product of:
          4.600994 = idf(docFreq=1206, maxDocs=44218)
          0.042818543 = queryNorm
        0.3050057 = fieldWeight in 690, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          4.600994 = idf(docFreq=1206, maxDocs=44218)
          0.046875 = fieldNorm(doc=690)
    0.011602643 = product of:
      0.034807928 = sum of:
        0.034807928 = weight(_text_:22 in 690) [ClassicSimilarity], result of:
          0.034807928 = score(doc=690,freq=2.0), product of:
            0.14994325 = queryWeight, product of:
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.042818543 = queryNorm
            0.23214069 = fieldWeight in 690, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.046875 = fieldNorm(doc=690)
      0.33333334 = coord(1/3)
  0.5 = coord(2/4)
```
Abstract

We describe the latent semantic indexing subspace signature model (LSISSM) for semantic content representation of unstructured text. Grounded on singular value decomposition, the model represents terms and documents by the distribution signatures of their statistical contribution across the top-ranking latent concept dimensions. LSISSM matches term signatures with document signatures according to their mapping coherence between latent semantic indexing (LSI) term subspace and LSI document subspace. LSISSM does feature reduction and finds a low-rank approximation of scalable and sparse term-document matrices. Experiments demonstrate that this approach significantly improves the performance of major clustering algorithms such as standard K-means and self-organizing maps compared with the vector space model and the traditional LSI model. The unique contribution ranking mechanism in LSISSM also improves the initialization of standard K-means compared with random seeding procedure, which sometimes causes low efficiency and effectiveness of clustering. A two-stage initialization strategy based on LSISSM significantly reduces the running time of standard K-means procedures.

Date

23. 3.2013 13:22:36

Khoo, C.S.G.; Ng, K.; Ou, S.: ¬An exploratory study of human clustering of Web pages (2003) 0.02

0.023897056 = product of:
  0.04779411 = sum of:
    0.040059015 = weight(_text_:representation in 2741) [ClassicSimilarity], result of:
      0.040059015 = score(doc=2741,freq=2.0), product of:
        0.19700786 = queryWeight, product of:
          4.600994 = idf(docFreq=1206, maxDocs=44218)
          0.042818543 = queryNorm
        0.20333713 = fieldWeight in 2741, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          4.600994 = idf(docFreq=1206, maxDocs=44218)
          0.03125 = fieldNorm(doc=2741)
    0.0077350955 = product of:
      0.023205286 = sum of:
        0.023205286 = weight(_text_:22 in 2741) [ClassicSimilarity], result of:
          0.023205286 = score(doc=2741,freq=2.0), product of:
            0.14994325 = queryWeight, product of:
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.042818543 = queryNorm
            0.15476047 = fieldWeight in 2741, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.03125 = fieldNorm(doc=2741)
      0.33333334 = coord(1/3)
  0.5 = coord(2/4)

Date: 12. 9.2004 9:56:22
Source: Challenges in knowledge representation and organization for the 21st century: Integration of knowledge across boundaries. Proceedings of the 7th ISKO International Conference Granada, Spain, July 10-13, 2002. Ed.: M. López-Huertas

Koch, T.; Vizine-Goetz, D.: Automatic classification and content navigation support for Web services : DESIRE II cooperates with OCLC (1998) 0.02
```
0.017525818 = product of:
  0.07010327 = sum of:
    0.07010327 = weight(_text_:representation in 1568) [ClassicSimilarity], result of:
      0.07010327 = score(doc=1568,freq=2.0), product of:
        0.19700786 = queryWeight, product of:
          4.600994 = idf(docFreq=1206, maxDocs=44218)
          0.042818543 = queryNorm
        0.35583997 = fieldWeight in 1568, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          4.600994 = idf(docFreq=1206, maxDocs=44218)
          0.0546875 = fieldNorm(doc=1568)
  0.25 = coord(1/4)
```
Abstract

Emerging standards in knowledge representation and organization are preparing the way for distributed vocabulary support in Internet search services. NetLab researchers are exploring several innovative solutions for searching and browsing in the subject-based Internet gateway, Electronic Engineering Library, Sweden (EELS). The implementation of the EELS service is described, specifically, the generation of the robot-gathered database 'All' engineering and the automated application of the Ei thesaurus and classification scheme. NetLab and OCLC researchers are collaborating to investigate advanced solutions to automated classification in the DESIRE II context. A plan for furthering the development of distributed vocabulary support in Internet search services is offered.

Sebastiani, F.: Machine learning in automated text categorization (2002) 0.02
```
0.015022131 = product of:
  0.060088523 = sum of:
    0.060088523 = weight(_text_:representation in 3389) [ClassicSimilarity], result of:
      0.060088523 = score(doc=3389,freq=2.0), product of:
        0.19700786 = queryWeight, product of:
          4.600994 = idf(docFreq=1206, maxDocs=44218)
          0.042818543 = queryNorm
        0.3050057 = fieldWeight in 3389, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          4.600994 = idf(docFreq=1206, maxDocs=44218)
          0.046875 = fieldNorm(doc=3389)
  0.25 = coord(1/4)
```
Abstract

The automated categorization (or classification) of texts into predefined categories has witnessed a booming interest in the last 10 years, due to the increased availability of documents in digital form and the ensuing need to organize them. In the research community the dominant approach to this problem is based on machine learning techniques: a general inductive process automatically builds a classifier by learning, from a set of preclassified documents, the characteristics of the categories. The advantages of this approach over the knowledge engineering approach (consisting in the manual definition of a classifier by domain experts) are a very good effectiveness, considerable savings in terms of expert labor power, and straightforward portability to different domains. This survey discusses the main approaches to text categorization that fall within the machine learning paradigm. We will discuss in detail issues pertaining to three different problems, namely, document representation, classifier construction, and classifier evaluation.

AlQenaei, Z.M.; Monarchi, D.E.: ¬The use of learning techniques to analyze the results of a manual classification system (2016) 0.01
```
0.012518443 = product of:
  0.050073773 = sum of:
    0.050073773 = weight(_text_:representation in 2836) [ClassicSimilarity], result of:
      0.050073773 = score(doc=2836,freq=2.0), product of:
        0.19700786 = queryWeight, product of:
          4.600994 = idf(docFreq=1206, maxDocs=44218)
          0.042818543 = queryNorm
        0.25417143 = fieldWeight in 2836, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          4.600994 = idf(docFreq=1206, maxDocs=44218)
          0.0390625 = fieldNorm(doc=2836)
  0.25 = coord(1/4)
```
Abstract

Classification is the process of assigning objects to pre-defined classes based on observations or characteristics of those objects, and there are many approaches to performing this task. The overall objective of this study is to demonstrate the use of two learning techniques to analyze the results of a manual classification system. Our sample consisted of 1,026 documents, from the ACM Computing Classification System, classified by their authors as belonging to one of the groups of the classification system: "H.3 Information Storage and Retrieval." A singular value decomposition of the documents' weighted term-frequency matrix was used to represent each document in a 50-dimensional vector space. The analysis of the representation using both supervised (decision tree) and unsupervised (clustering) techniques suggests that two pairs of the ACM classes are closely related to each other in the vector space. Class 1 (Content Analysis and Indexing) is closely related to Class 3 (Information Search and Retrieval), and Class 4 (Systems and Software) is closely related to Class 5 (Online Information Services). Further analysis was performed to test the diffusion of the words in the two classes using both cosine and Euclidean distance.

Zhang, X.: Rough set theory based automatic text categorization (2005) 0.01

0.009446202 = product of:
  0.037784807 = sum of:
    0.037784807 = product of:
      0.11335442 = sum of:
        0.11335442 = weight(_text_:theory in 2822) [ClassicSimilarity], result of:
          0.11335442 = score(doc=2822,freq=6.0), product of:
            0.1780563 = queryWeight, product of:
              4.1583924 = idf(docFreq=1878, maxDocs=44218)
              0.042818543 = queryNorm
            0.63662124 = fieldWeight in 2822, product of:
              2.4494898 = tf(freq=6.0), with freq of:
                6.0 = termFreq=6.0
              4.1583924 = idf(docFreq=1878, maxDocs=44218)
              0.0625 = fieldNorm(doc=2822)
      0.33333334 = coord(1/3)
  0.25 = coord(1/4)

Abstract: The research report "Rough Set Theory Based Automatic Text Categorization and the Handling of Semantic Heterogeneity" by Xueying Zhang has been published in book form in English. In her work, Zhang developed a method based on rough set theory that establishes relationships between subject headings from different vocabularies. She was a staff member of the IZ from 2003 to 2005 and has been an Associate Professor at the Nanjing University of Science and Technology since October 2005.

Panyr, J.: STEINADLER: ein Verfahren zur automatischen Deskribierung und zur automatischen thematischen Klassifikation (1978) 0.01

0.0078053097 = product of:
  0.031221239 = sum of:
    0.031221239 = product of:
      0.093663715 = sum of:
        0.093663715 = weight(_text_:29 in 5169) [ClassicSimilarity], result of:
          0.093663715 = score(doc=5169,freq=2.0), product of:
            0.15062225 = queryWeight, product of:
              3.5176873 = idf(docFreq=3565, maxDocs=44218)
              0.042818543 = queryNorm
            0.6218451 = fieldWeight in 5169, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5176873 = idf(docFreq=3565, maxDocs=44218)
              0.125 = fieldNorm(doc=5169)
      0.33333334 = coord(1/3)
  0.25 = coord(1/4)

Source: Nachrichten für Dokumentation. 29(1978), S.92-96

Subramanian, S.; Shafer, K.E.: Clustering (2001) 0.01

0.0058013215 = product of:
  0.023205286 = sum of:
    0.023205286 = product of:
      0.069615856 = sum of:
        0.069615856 = weight(_text_:22 in 1046) [ClassicSimilarity], result of:
          0.069615856 = score(doc=1046,freq=2.0), product of:
            0.14994325 = queryWeight, product of:
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.042818543 = queryNorm
            0.46428138 = fieldWeight in 1046, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.09375 = fieldNorm(doc=1046)
      0.33333334 = coord(1/3)
  0.25 = coord(1/4)

Date: 5. 5.2003 14:17:22

Ingwersen, P.; Wormell, I.: Ranganathan in the perspective of advanced information retrieval (1992) 0.01

0.0054537673 = product of:
  0.021815069 = sum of:
    0.021815069 = product of:
      0.06544521 = sum of:
        0.06544521 = weight(_text_:theory in 7695) [ClassicSimilarity], result of:
          0.06544521 = score(doc=7695,freq=2.0), product of:
            0.1780563 = queryWeight, product of:
              4.1583924 = idf(docFreq=1878, maxDocs=44218)
              0.042818543 = queryNorm
            0.36755344 = fieldWeight in 7695, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              4.1583924 = idf(docFreq=1878, maxDocs=44218)
              0.0625 = fieldNorm(doc=7695)
      0.33333334 = coord(1/3)
  0.25 = coord(1/4)

Abstract: Examines Ranganathan's approach to knowledge organisation and its relevance to intellectual accessibility in libraries. Discusses the current and future developments of his methodology and theories in knowledge-based systems. Topics covered include: semi-automatic classification and structure of thesauri; user-intermediary interactions in information retrieval (IR); semantic value-theory and uncertainty principles in IR; and case grammar.

Reiner, U.: Automatische DDC-Klassifizierung von bibliografischen Titeldatensätzen (2009) 0.00

0.004834435 = product of:
  0.01933774 = sum of:
    0.01933774 = product of:
      0.05801322 = sum of:
        0.05801322 = weight(_text_:22 in 611) [ClassicSimilarity], result of:
          0.05801322 = score(doc=611,freq=2.0), product of:
            0.14994325 = queryWeight, product of:
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.042818543 = queryNorm
            0.38690117 = fieldWeight in 611, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.078125 = fieldNorm(doc=611)
      0.33333334 = coord(1/3)
  0.25 = coord(1/4)

Date: 22. 8.2009 12:54:24

HaCohen-Kerner, Y. et al.: Classification using various machine learning methods and combinations of key-phrases and visual features (2016) 0.00

0.004834435 = product of:
  0.01933774 = sum of:
    0.01933774 = product of:
      0.05801322 = sum of:
        0.05801322 = weight(_text_:22 in 2748) [ClassicSimilarity], result of:
          0.05801322 = score(doc=2748,freq=2.0), product of:
            0.14994325 = queryWeight, product of:
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.042818543 = queryNorm
            0.38690117 = fieldWeight in 2748, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.078125 = fieldNorm(doc=2748)
      0.33333334 = coord(1/3)
  0.25 = coord(1/4)

Date: 1. 2.2016 18:25:22

Losee, R.M.: Text windows and phrases differing by discipline, location in document, and syntactic structure (1996) 0.00
```
0.0047720466 = product of:
  0.019088186 = sum of:
    0.019088186 = product of:
      0.057264555 = sum of:
        0.057264555 = weight(_text_:theory in 6962) [ClassicSimilarity], result of:
          0.057264555 = score(doc=6962,freq=2.0), product of:
            0.1780563 = queryWeight, product of:
              4.1583924 = idf(docFreq=1878, maxDocs=44218)
              0.042818543 = queryNorm
            0.32160926 = fieldWeight in 6962, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              4.1583924 = idf(docFreq=1878, maxDocs=44218)
              0.0546875 = fieldNorm(doc=6962)
      0.33333334 = coord(1/3)
  0.25 = coord(1/4)
```
Abstract

Knowledge of window style, content, location, and grammatical structure may be used to classify documents as originating within a particular discipline or may be used to place a document on a theory vs. practice spectrum. Examines characteristics of phrases and text windows, including their number, location in documents, and grammatical construction, in addition to studying variations in these window characteristics across disciplines. Examines some of the linguistic regularities for individual disciplines, and suggests families of regularities that may prove helpful for the automatic classification of documents, as well as for information retrieval and filtering applications.

Huang, Y.-L.: ¬A theoretic and empirical research of cluster indexing for Mandarine Chinese full text document (1998) 0.00
```
0.0047720466 = product of:
  0.019088186 = sum of:
    0.019088186 = product of:
      0.057264555 = sum of:
        0.057264555 = weight(_text_:theory in 513) [ClassicSimilarity], result of:
          0.057264555 = score(doc=513,freq=2.0), product of:
            0.1780563 = queryWeight, product of:
              4.1583924 = idf(docFreq=1878, maxDocs=44218)
              0.042818543 = queryNorm
            0.32160926 = fieldWeight in 513, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              4.1583924 = idf(docFreq=1878, maxDocs=44218)
              0.0546875 = fieldNorm(doc=513)
      0.33333334 = coord(1/3)
  0.25 = coord(1/4)
```
Abstract

Since most popular commercialized systems for full text retrieval are designed with full text scanning and Boolean logic query mode, these systems use an oversimplified relationship between the indexing form and the content of the document. Reports the use of Singular Value Decomposition (SVD) to develop a Cluster Indexing Model (CIM) based on a Vector Space Model (VSM) in order to explore the index theory of cluster indexing for Chinese full text documents. From a series of experiments, it was found that the indexing performance of CIM is better than that of the traditional VSM, and is almost as effective as authority control of index terms.

Xu, Y.; Bernard, A.: Knowledge organization through statistical computation : a new approach (2009) 0.00
```
0.0040903254 = product of:
  0.016361302 = sum of:
    0.016361302 = product of:
      0.049083903 = sum of:
        0.049083903 = weight(_text_:theory in 3252) [ClassicSimilarity], result of:
          0.049083903 = score(doc=3252,freq=2.0), product of:
            0.1780563 = queryWeight, product of:
              4.1583924 = idf(docFreq=1878, maxDocs=44218)
              0.042818543 = queryNorm
            0.27566507 = fieldWeight in 3252, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              4.1583924 = idf(docFreq=1878, maxDocs=44218)
              0.046875 = fieldNorm(doc=3252)
      0.33333334 = coord(1/3)
  0.25 = coord(1/4)
```
Abstract

Knowledge organization (KO) is an interdisciplinary issue which includes some problems in knowledge classification such as how to classify newly emerged knowledge. With the great complexity and ambiguity of knowledge, it is sometimes inefficient to classify knowledge by logical reasoning. This paper proposes a statistical approach to knowledge organization in order to resolve the problems in classifying complex and mass knowledge. By integrating the classification process into a mathematical model, a knowledge classifier, based on the maximum entropy theory, is constructed and the experimental results show that the classification results acquired from the classifier are reliable. The approach proposed in this paper is quite formal and is not dependent on specific contexts, so it could easily be adapted to the use of knowledge classification in other domains within KO.

Teich, E.; Degaetano-Ortlieb, S.; Fankhauser, P.; Kermes, H.; Lapshinova-Koltunski, E.: ¬The linguistic construal of disciplinarity : a data-mining approach using register features (2016) 0.00
```
0.0040903254 = product of:
  0.016361302 = sum of:
    0.016361302 = product of:
      0.049083903 = sum of:
        0.049083903 = weight(_text_:theory in 3015) [ClassicSimilarity], result of:
          0.049083903 = score(doc=3015,freq=2.0), product of:
            0.1780563 = queryWeight, product of:
              4.1583924 = idf(docFreq=1878, maxDocs=44218)
              0.042818543 = queryNorm
            0.27566507 = fieldWeight in 3015, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              4.1583924 = idf(docFreq=1878, maxDocs=44218)
              0.046875 = fieldNorm(doc=3015)
      0.33333334 = coord(1/3)
  0.25 = coord(1/4)
```
Abstract

We analyze the linguistic evolution of selected scientific disciplines over a 30-year time span (1970s to 2000s). Our focus is on four highly specialized disciplines at the boundaries of computer science that emerged during that time: computational linguistics, bioinformatics, digital construction, and microelectronics. Our analysis is driven by the question whether these disciplines develop a distinctive language use, both individually and collectively, over the given time period. The data set is the English Scientific Text Corpus (scitex), which includes texts from the 1970s/1980s and early 2000s. Our theoretical basis is register theory. In terms of methods, we combine corpus-based methods of feature extraction (various aggregated features [part-of-speech based], n-grams, lexico-grammatical patterns) and automatic text classification. The results of our research are directly relevant to the study of linguistic variation and languages for specific purposes (LSP) and have implications for various natural language processing (NLP) tasks, for example, authorship attribution, text mining, or training NLP tools.

Savic, D.: Designing an expert system for classifying office documents (1994) 0.00

0.0039026549 = product of:
  0.015610619 = sum of:
    0.015610619 = product of:
      0.046831857 = sum of:
        0.046831857 = weight(_text_:29 in 2655) [ClassicSimilarity], result of:
          0.046831857 = score(doc=2655,freq=2.0), product of:
            0.15062225 = queryWeight, product of:
              3.5176873 = idf(docFreq=3565, maxDocs=44218)
              0.042818543 = queryNorm
            0.31092256 = fieldWeight in 2655, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5176873 = idf(docFreq=3565, maxDocs=44218)
              0.0625 = fieldNorm(doc=2655)
      0.33333334 = coord(1/3)
  0.25 = coord(1/4)

Source: Records management quarterly. 28(1994) no.3, S.20-29

Savic, D.: Automatic classification of office documents : review of available methods and techniques (1995) 0.00

0.0034148227 = product of:
  0.013659291 = sum of:
    0.013659291 = product of:
      0.040977873 = sum of:
        0.040977873 = weight(_text_:29 in 2219) [ClassicSimilarity], result of:
          0.040977873 = score(doc=2219,freq=2.0), product of:
            0.15062225 = queryWeight, product of:
              3.5176873 = idf(docFreq=3565, maxDocs=44218)
              0.042818543 = queryNorm
            0.27205724 = fieldWeight in 2219, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5176873 = idf(docFreq=3565, maxDocs=44218)
              0.0546875 = fieldNorm(doc=2219)
      0.33333334 = coord(1/3)
  0.25 = coord(1/4)

Source: Records management quarterly. 29(1995) no.4, S.3-18

Ruocco, A.S.; Frieder, O.: Clustering and classification of large document bases in a parallel environment (1997) 0.00

0.0034148227 = product of:
  0.013659291 = sum of:
    0.013659291 = product of:
      0.040977873 = sum of:
        0.040977873 = weight(_text_:29 in 1661) [ClassicSimilarity], result of:
          0.040977873 = score(doc=1661,freq=2.0), product of:
            0.15062225 = queryWeight, product of:
              3.5176873 = idf(docFreq=3565, maxDocs=44218)
              0.042818543 = queryNorm
            0.27205724 = fieldWeight in 1661, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5176873 = idf(docFreq=3565, maxDocs=44218)
              0.0546875 = fieldNorm(doc=1661)
      0.33333334 = coord(1/3)
  0.25 = coord(1/4)

Date: 29. 7.1998 17:45:02
