Search (115 results, page 1 of 6)

  • theme_ss:"Computerlinguistik"
  1. Hotho, A.; Bloehdorn, S.: Data Mining 2004 : Text classification by boosting weak learners based on terms and concepts (2004) 0.12
    
    Abstract
Document representations for text classification are typically based on the classical Bag-Of-Words paradigm. This approach comes with deficiencies that motivate the integration of features on a higher semantic level than single words. In this paper we propose an enhancement of the classical document representation through concepts extracted from background knowledge. Boosting is used for actual classification. Experimental evaluations on two well-known text corpora support our approach through consistent improvement of the results.
    Content
Cf.: http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.91.4940&rep=rep1&type=pdf
    Date
    8. 1.2013 10:22:32
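The setup sketched in this abstract - bag-of-words features enriched with concept features drawn from background knowledge, then boosted weak learners - can be illustrated roughly as follows. A minimal Python sketch, assuming scikit-learn is available; the concept_map, the toy documents and the labels are invented stand-ins for real background knowledge (e.g. a thesaurus) and real training data, not the paper's setup.

    # Rough sketch: enrich each document's bag of words with concept tokens taken
    # from a (here invented) background-knowledge map, then classify with boosted
    # weak learners (decision stumps).
    from sklearn.feature_extraction.text import CountVectorizer
    from sklearn.ensemble import AdaBoostClassifier

    concept_map = {  # hypothetical background knowledge
        "soccer": "CONCEPT_sport", "tennis": "CONCEPT_sport",
        "senate": "CONCEPT_politics", "election": "CONCEPT_politics",
    }

    def enrich(doc):
        """Append a concept token for every word covered by the background knowledge."""
        tokens = doc.lower().split()
        return " ".join(tokens + [concept_map[t] for t in tokens if t in concept_map])

    docs = ["soccer match tonight", "tennis final results", "senate election debate"]
    labels = ["sport", "sport", "politics"]

    vectorizer = CountVectorizer()
    X = vectorizer.fit_transform([enrich(d) for d in docs]).toarray()

    # AdaBoost over decision stumps stands in for the boosted weak learners.
    clf = AdaBoostClassifier(n_estimators=50).fit(X, labels)
    print(clf.predict(vectorizer.transform([enrich("tennis championship")]).toarray()))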
  2. Sebastiani, F.: Machine learning in automated text categorization (2002) 0.07
    
    Abstract
The automated categorization (or classification) of texts into predefined categories has witnessed a booming interest in the last 10 years, due to the increased availability of documents in digital form and the ensuing need to organize them. In the research community the dominant approach to this problem is based on machine learning techniques: a general inductive process automatically builds a classifier by learning, from a set of preclassified documents, the characteristics of the categories. The advantages of this approach over the knowledge engineering approach (consisting in the manual definition of a classifier by domain experts) are a very good effectiveness, considerable savings in terms of expert labor power, and straightforward portability to different domains. This survey discusses the main approaches to text categorization that fall within the machine learning paradigm. We will discuss in detail issues pertaining to three different problems, namely, document representation, classifier construction, and classifier evaluation.
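The "general inductive process" described here - build a classifier automatically from a set of preclassified documents, then apply it to new texts - looks roughly like the following minimal sketch. It assumes scikit-learn; the toy training set is invented, and multinomial naive Bayes merely stands in for the family of learning methods the survey covers.

    # Sketch of inductive text categorization: learn the characteristics of the
    # categories from preclassified documents, then categorize unseen texts.
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.naive_bayes import MultinomialNB
    from sklearn.pipeline import make_pipeline

    train_docs = [
        "stocks fell sharply on the exchange",      # finance
        "the central bank raised interest rates",   # finance
        "the striker scored twice in the final",    # sport
        "the team won the championship match",      # sport
    ]
    train_labels = ["finance", "finance", "sport", "sport"]

    model = make_pipeline(TfidfVectorizer(), MultinomialNB())
    model.fit(train_docs, train_labels)                        # the inductive step
    print(model.predict(["interest rates and bond markets"]))  # expected: ['finance'] on this toy data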
  3. Sebastiani, F.: A tutorial on automated text categorisation (1999) 0.07
    
    Abstract
The automated categorisation (or classification) of texts into topical categories has a long history, dating back at least to 1960. Until the late '80s, the dominant approach to the problem involved knowledge-engineering automatic categorisers, i.e. manually building a set of rules encoding expert knowledge on how to classify documents. In the '90s, with the booming production and availability of on-line documents, automated text categorisation has witnessed an increased and renewed interest. A newer paradigm based on machine learning has superseded the previous approach. Within this paradigm, a general inductive process automatically builds a classifier by "learning", from a set of previously classified documents, the characteristics of one or more categories; the advantages are a very good effectiveness, a considerable savings in terms of expert manpower, and domain independence. In this tutorial we look at the main approaches that have been taken towards automatic text categorisation within the general machine learning paradigm. Issues of document indexing, classifier construction, and classifier evaluation will be touched upon.
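Classifier evaluation, the last of the problems touched upon here and in the preceding survey, usually comes down to per-category precision, recall and F1 over a test set. A small self-contained Python sketch; the gold labels and system predictions are invented for illustration.

    # Per-category precision, recall and F1 for a categorization test run.
    gold = ["sport", "sport", "finance", "finance", "politics", "sport"]
    pred = ["sport", "finance", "finance", "finance", "politics", "politics"]

    def evaluate(gold, pred):
        results = {}
        for cat in sorted(set(gold) | set(pred)):
            tp = sum(1 for g, p in zip(gold, pred) if p == cat and g == cat)
            fp = sum(1 for g, p in zip(gold, pred) if p == cat and g != cat)
            fn = sum(1 for g, p in zip(gold, pred) if p != cat and g == cat)
            precision = tp / (tp + fp) if tp + fp else 0.0
            recall = tp / (tp + fn) if tp + fn else 0.0
            f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
            results[cat] = (precision, recall, f1)
        return results

    for cat, (p, r, f) in evaluate(gold, pred).items():
        print(f"{cat:10s} P={p:.2f} R={r:.2f} F1={f:.2f}")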
  4. Montgomery, C.A.: Linguistics and information science (1972) 0.04
    
    Abstract
This paper defines the relationship between linguistics and information science in terms of a common interest in natural language. The notion of automated processing of natural language - i.e., machine simulation of the language processing activities of a human - provides novel possibilities for interaction between linguists, who have a theoretical interest in such activities, and information scientists, who have more practical goals, e.g. simulating the language processing activities of an indexer with a machine. The concept of a natural language information system is introduced as a framework for reviewing automated language processing efforts by computational linguists and information scientists. In terms of this framework, the former have concentrated on automating the operations of the component for content analysis and representation, while the latter have emphasized the data management component. The complementary nature of these developments allows the postulation of an integrated approach to automated language processing. This approach, which is outlined in the final sections of the paper, incorporates current notions in linguistic theory and information science, as well as design features of recent computational linguistic models.
  5. Brill, E.: An overview of empirical natural language processing (1997) 0.04
    
    Abstract
    Introduces a special section reviewing recent research in empirical methods in speech recognition, syntactic parsing, semantic processing, information extraction and machine translation. Introduces a series of specialized articles on these topics and attempts to describe and explain the growing interest in using learning methods to aid the development of natural language processing systems
  6. Ruge, G.: Sprache und Computer : Wortbedeutung und Termassoziation. Methoden zur automatischen semantischen Klassifikation (1995) 0.03
    
    Content
Contains the following chapters: (1) Motivation; (2) Language philosophical foundations; (3) Structural comparison of extensions; (4) Earlier approaches towards term association; (5) Experiments; (6) Spreading-activation networks or memory models; (7) Perspective. Appendices: Heads and modifiers of 'car'. Glossary. Index. Language and computer. Word semantics and term association. Methods towards an automatic semantic classification
    Footnote
Review in: Knowledge organization 22(1995) no.3/4, pp.182-184 (M.T. Rolland)
  7. Morris, V.: Automated language identification of bibliographic resources (2020) 0.03
    
    Date
    2. 3.2020 19:04:22
    Source
Cataloging and classification quarterly. 58(2020) no.1, pp.1-27
  8. Andrushchenko, M.; Sandberg, K.; Turunen, R.; Marjanen, J.; Hatavara, M.; Kurunmäki, J.; Nummenmaa, T.; Hyvärinen, M.; Teräs, K.; Peltonen, J.; Nummenmaa, J.: Using parsed and annotated corpora to analyze parliamentarians' talk in Finland (2022) 0.03
    
    Abstract
We present a search system for grammatically analyzed corpora of Finnish parliamentary records and interviews with former parliamentarians, annotated with metadata of talk structure and involved parliamentarians, and discuss their use through carefully chosen digital humanities case studies. We first introduce the construction, contents, and principles of use of the corpora. Then we discuss the application of the search system and the corpora to study how politicians talk about power, how ideological terms are used in political speech, and how to identify narratives in the data. All case studies stem from questions in the humanities and the social sciences, but rely on the grammatically parsed corpora in both identifying and quantifying passages of interest. Finally, the paper discusses the role of natural language processing methods for questions in the (digital) humanities. It makes the claim that a digital humanities inquiry of parliamentary speech and interviews with politicians cannot rely only on computational humanities modeling, but needs to accommodate a range of perspectives starting with simple searches, quantitative exploration, and ending with modeling. Furthermore, the digital humanities need a more thorough discussion about how the utilization of tools from information science and technologies alters the research questions posed in the humanities.
    Series
    JASIST special issue on digital humanities (DH): C. Methodological innovations, challenges, and new interest in DH
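The case studies hinge on identifying and quantifying passages of interest in grammatically annotated speech. A minimal sketch of that kind of query over a tiny invented corpus, in which each token carries a surface form, a lemma and a part of speech plus speaker metadata; the real corpora and their annotation scheme are of course far richer.

    # Find and count utterances containing a lemma of interest ("valta", power),
    # grouped by speaker. All data below is invented toy material.
    from collections import Counter

    corpus = [
        {"speaker": "MP_A", "tokens": [("Valta", "valta", "NOUN"),
                                       ("kuuluu", "kuulua", "VERB"),
                                       ("kansalle", "kansa", "NOUN")]},
        {"speaker": "MP_B", "tokens": [("Hallituksen", "hallitus", "NOUN"),
                                       ("valtaa", "valta", "NOUN"),
                                       ("on", "olla", "AUX"),
                                       ("rajattava", "rajata", "VERB")]},
        {"speaker": "MP_A", "tokens": [("Talous", "talous", "NOUN"),
                                       ("kasvaa", "kasvaa", "VERB")]},
    ]

    def passages_with_lemma(corpus, lemma):
        for utterance in corpus:
            hits = [form for form, lem, pos in utterance["tokens"] if lem == lemma]
            if hits:
                yield utterance["speaker"], hits

    print(Counter(speaker for speaker, _ in passages_with_lemma(corpus, "valta")))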
  9. Chandrasekar, R.; Srinivas, B.: Automatic induction of rules for text simplification (1997) 0.03
    
    Abstract
    Explores methods to automatically transform sentences in order to make them simpler. These methods involve the use of a rule-based system, driven by the syntax of the text in the domain of interest. Hand-crafting rules for every domain is time-consuming and impractical. Describes an algorithm and an implementation by which generalized rules for simplification are automatically induced from annotated training materials using a novel partial parsing technique, which combines constituent structure and dependency information. The algorithm employs example-based generalisations on linguistically motivated structures
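For illustration, one hand-written rule of the kind the article induces automatically (splitting off an embedded relative clause). The paper learns such rules from annotated corpora via partial parsing, so this regex-based rule is only a stand-in for the idea, not the authors' algorithm.

    # Toy simplification rule: "NP, who/which REL, REST" -> "NP REST." + "NP REL."
    import re

    RULE = re.compile(r"^(?P<np>[A-Z][\w ]+?), (?:who|which) (?P<rel>[^,]+), (?P<rest>.+)$")

    def simplify(sentence):
        m = RULE.match(sentence)
        if not m:
            return [sentence]
        np, rel, rest = m.group("np"), m.group("rel"), m.group("rest")
        return [f"{np} {rest}", f"{np} {rel}."]

    print(simplify("The minister, who chairs the committee, rejected the proposal."))
    # -> ['The minister rejected the proposal.', 'The minister chairs the committee.']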
  10. Kajanan, S.; Bao, Y.; Datta, A.; VanderMeer, D.; Dutta, K.: Efficient automatic search query formulation using phrase-level analysis (2014) 0.03
    
    Abstract
    Over the past decade, the volume of information available digitally over the Internet has grown enormously. Technical developments in the area of search, such as Google's Page Rank algorithm, have proved so good at serving relevant results that Internet search has become integrated into daily human activity. One can endlessly explore topics of interest simply by querying and reading through the resulting links. Yet, although search engines are well known for providing relevant results based on users' queries, users do not always receive the results they are looking for. Google's Director of Research describes clickstream evidence of frustrated users repeatedly reformulating queries and searching through page after page of results. Given the general quality of search engine results, one must consider the possibility that the frustrated user's query is not effective; that is, it does not describe the essence of the user's interest. Indeed, extensive research into human search behavior has found that humans are not very effective at formulating good search queries that describe what they are interested in. Ideally, the user should simply point to a portion of text that sparked the user's interest, and a system should automatically formulate a search query that captures the essence of the text. In this paper, we describe an implemented system that provides this capability. We first describe how our work differs from existing work in automatic query formulation, and propose a new method for improved quantification of the relevance of candidate search terms drawn from input text using phrase-level analysis. We then propose an implementable method designed to provide relevant queries based on a user's text input. We demonstrate the quality of our results and performance of our system through experimental studies. Our results demonstrate that our system produces relevant search terms with roughly two-thirds precision and recall compared to search terms selected by experts, and that typical users find significantly more relevant results (31% more relevant) more quickly (64% faster) using our system than self-formulated search queries. Further, we show that our implementation can scale to request loads of up to 10 requests per second within current online responsiveness expectations (<2-second response times at the highest loads tested).
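A much simplified sketch of the underlying idea: score candidate terms from a user-selected passage by how much more prominent they are there than in a background corpus, and keep the best as the query. The article's system works at phrase level and with a real reference corpus; the word-level scoring and the invented background frequencies below are only stand-ins.

    # Turn a passage the user points at into a short search query by preferring
    # terms that are frequent in the passage but common nowhere else (toy data).
    from collections import Counter
    import math, re

    def tokens(text):
        return re.findall(r"[a-z]+", text.lower())

    # Invented reference frequencies for very common words.
    background = Counter({w: 500 for w in
        "the of and to a in is for it on with as by but do not their when often users results".split()})
    N = 10_000  # assumed size of the reference corpus

    passage = ("Search engines rank results with the PageRank algorithm, but users often "
               "reformulate queries when the results do not match their interest.")
    counts = Counter(tokens(passage))

    def score(term):  # frequent in the passage, rare in the background
        return counts[term] * math.log(N / (background[term] + 1))

    print(" ".join(sorted(counts, key=score, reverse=True)[:5]))  # e.g. "search engines rank pagerank algorithm"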
  11. Thelwall, M.; Price, L.: Language evolution and the spread of ideas on the Web : a procedure for identifying emergent hybrid word (2006) 0.03
    
    Abstract
    Word usage is of interest to linguists for its own sake as well as to social scientists and others who seek to track the spread of ideas, for example, in public debates over political decisions. The historical evolution of language can be analyzed with the tools of corpus linguistics through evolving corpora and the Web. But word usage statistics can only be gathered for known words. In this article, techniques are described and tested for identifying new words from the Web, focusing on the case when the words are related to a topic and have a hybrid form with a common sequence of letters. The results highlight the need to employ a combination of search techniques and show the wide potential of hybrid word family investigations in linguistics and social science.
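The core step of such a procedure can be sketched as follows: scan words harvested from the Web for forms that embed a known seed string but are absent from a reference word list, leaving candidate hybrid words for further checking. The harvested words and the reference list below are invented; the article's full procedure involves considerably more filtering.

    # Candidate hybrid words: contain the seed string but are not yet "known".
    import re

    seed = "blog"
    known_words = {"blog", "blogger", "web", "photo", "video"}
    harvested = ["blogosphere", "photoblog", "blogging", "vlog", "weblog", "blog", "moblog"]

    pattern = re.compile(rf"\w*{re.escape(seed)}\w*", re.IGNORECASE)
    candidates = sorted({w.lower() for w in harvested
                         if pattern.fullmatch(w) and w.lower() not in known_words})
    print(candidates)  # ['blogging', 'blogosphere', 'moblog', 'photoblog', 'weblog']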
  12. Cimiano, P.; Völker, J.; Studer, R.: Ontologies on demand? : a description of the state-of-the-art, applications, challenges and trends for ontology learning from text (2006) 0.03
    
    Abstract
    Ontologies are nowadays used for many applications requiring data, services and resources in general to be interoperable and machine understandable. Such applications are for example web service discovery and composition, information integration across databases, intelligent search, etc. The general idea is that data and services are semantically described with respect to ontologies, which are formal specifications of a domain of interest, and can thus be shared and reused in a way such that the shared meaning specified by the ontology remains formally the same across different parties and applications. As the cost of creating ontologies is relatively high, different proposals have emerged for learning ontologies from structured and unstructured resources. In this article we examine the maturity of techniques for ontology learning from textual resources, addressing the question whether the state-of-the-art is mature enough to produce ontologies 'on demand'.
  13. Costa-jussà, M.R.: How much hybridization does machine translation need? (2015) 0.03
    
    Abstract
Rule-based and corpus-based machine translation (MT) have coexisted for more than 20 years. Recently, boundaries between the two paradigms have narrowed and hybrid approaches are gaining interest from both academia and businesses. However, since hybrid approaches involve the multidisciplinary interaction of linguists, computer scientists, engineers, and information specialists, understandably a number of issues exist. While statistical methods currently dominate research work in MT, most commercial MT systems are technically hybrid systems. The research community should investigate the benefits and questions surrounding the hybridization of MT systems more actively. This paper discusses various issues related to hybrid MT including its origins, architectures, achievements, and frustrations experienced in the community. It can be said that both rule-based and corpus-based MT systems have benefited from hybridization when effectively integrated. In fact, many of the current rule/corpus-based MT approaches are already hybridized since they do include statistics/rules at some point.
  14. Ghazzawi, N.; Robichaud, B.; Drouin, P.; Sadat, F.: Automatic extraction of specialized verbal units (2018) 0.03
    
    Abstract
    This paper presents a methodology for the automatic extraction of specialized Arabic, English and French verbs of the field of computing. Since nominal terms are predominant in terminology, our interest is to explore to what extent verbs can also be part of a terminological analysis. Hence, our objective is to verify how an existing extraction tool will perform when it comes to specialized verbs in a given specialized domain. Furthermore, we want to investigate any particularities that a language can represent regarding verbal terms from the automatic extraction perspective. Our choice to operate on three different languages reflects our desire to see whether the chosen tool can perform better on one language compared to the others. Moreover, given that Arabic is a morphologically rich and complex language, we consider investigating the results yielded by the extraction tool. The extractor used for our experiment is TermoStat (Drouin 2003). So far, our results show that the extraction of verbs of computing represents certain differences in terms of quality and particularities of these units in this specialized domain between the languages under question.
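The general idea behind such extraction - a verb is a candidate term when it is markedly more frequent in the specialized (computing) corpus than in a reference corpus - can be sketched with a simple specificity ratio. This is not TermoStat's actual measure, and all counts below are invented.

    # Rank verb lemmas by domain specificity: relative frequency in the computing
    # corpus divided by relative frequency in a reference corpus (invented counts).
    domain_verbs = {"compile": 40, "install": 35, "run": 50, "say": 5, "configure": 30}
    reference_verbs = {"say": 900, "run": 200, "go": 800, "compile": 3, "install": 4, "configure": 2}

    domain_total = sum(domain_verbs.values())
    reference_total = sum(reference_verbs.values())

    def specificity(verb):
        d = domain_verbs[verb] / domain_total
        r = reference_verbs.get(verb, 0.5) / reference_total  # 0.5 smooths unseen verbs
        return d / r

    for verb in sorted(domain_verbs, key=specificity, reverse=True):
        print(f"{verb:10s} {specificity(verb):7.1f}")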
  15. Noever, D.; Ciolino, M.: The Turing deception (2022) 0.03
    
    Source
https://arxiv.org/abs/2212.06721
  16. Rahmstorf, G.: Concept structures for large vocabularies (1998) 0.02
    
    Abstract
A technology is described which supports the acquisition, visualisation and manipulation of large vocabularies with associated structures. It is used for dictionary production, terminology databases, thesauri, library classification systems, etc. Essential features of the technology are a lexicographic user interface, variable word description, unlimited list of word readings, a concept language, automatic transformations of formulas into graphic structures, structure manipulation operations and retransformation into formulas. The concept language includes notations for undefined concepts. The structure of defined concepts can be constructed interactively. The technology supports the generation of large vocabularies with structures representing word senses. Concept structures and ordering systems for indexing and retrieval can be constructed separately and connected by associating relations.
    Date
    30.12.2001 19:01:22
  17. Huo, W.: Automatic multi-word term extraction and its application to Web-page summarization (2012) 0.02
    
    Abstract
In this thesis we propose three new word association measures for multi-word term extraction. We combine these association measures with the LocalMaxs algorithm in our extraction model and compare the results of different multi-word term extraction methods. Our approach is language and domain independent and requires no training data. It can be applied to such tasks as text summarization, information retrieval, and document classification. We further explore the potential of using multi-word terms as an effective representation for general web-page summarization. We extract multi-word terms from human-written summaries in a large collection of web-pages, and generate the summaries by aligning document words with these multi-word terms. Our system applies machine translation technology to learn the aligning process from a training set and focuses on selecting high-quality multi-word terms from human-written summaries to generate suitable results for web-page summarization.
    Date
    10. 1.2013 19:22:47
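One of the ingredients described here, a word association measure over adjacent words, can be sketched with the Dice coefficient: pairs that recur and whose words rarely appear apart are kept as multi-word term candidates. The thesis combines several such measures with the LocalMaxs algorithm, which this sketch does not reproduce; the toy text is invented.

    # Score adjacent word pairs with the Dice association measure and keep
    # recurring, strongly associated pairs as multi-word term candidates.
    from collections import Counter
    import re

    text = ("information retrieval systems use multi word terms ; "
            "multi word terms improve text summarization and information retrieval")
    words = re.findall(r"\w+", text.lower())

    unigrams = Counter(words)
    bigrams = Counter(zip(words, words[1:]))

    def dice(pair):
        w1, w2 = pair
        return 2 * bigrams[pair] / (unigrams[w1] + unigrams[w2])

    candidates = [p for p in bigrams if bigrams[p] >= 2]  # frequency threshold filters noise
    for pair in sorted(candidates, key=dice, reverse=True):
        print(" ".join(pair), round(dice(pair), 2))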
  18. Chandrasekar, R.; Bangalore, S.: Glean : using syntactic information in document filtering (2002) 0.02
    
    Abstract
In today's networked world, a huge amount of data is available in machine-processable form. Likewise, there are any number of search engines and specialized information retrieval (IR) programs that seek to extract relevant information from these data repositories. Most IR systems and Web search engines have been designed for speed and tend to maximize the quantity of information (recall) rather than the relevance of the information (precision) to the query. As a result, search engine users get inundated with information for practically any query, and are forced to scan a large number of potentially relevant items to get to the information of interest. The Holy Grail of IR is to somehow retrieve those and only those documents pertinent to the user's query. Polysemy and synonymy - the fact that often there are several meanings for a word or phrase, and likewise, many ways to express a concept - make this a very hard task. While conventional IR systems provide usable solutions, there are a number of open problems to be solved, in areas such as syntactic processing, semantic analysis, and user modeling, before we develop systems that "understand" user queries and text collections. Meanwhile, we can use tools and techniques available today to improve the precision of retrieval. In particular, using the approach described in this article, we can approximate understanding using the syntactic structure and patterns of language use that is latent in documents to make IR more effective.
  19. Kettunen, K.; Kunttu, T.; Järvelin, K.: To stem or lemmatize a highly inflectional language in a probabilistic IR environment? (2005) 0.02
    
    Abstract
Purpose - To show that stem generation compares well with lemmatization as a morphological tool for a highly inflectional language for IR purposes in a best-match retrieval system. Design/methodology/approach - Effects of three different morphological methods - lemmatization, stemming and stem production - for Finnish are compared in a probabilistic IR environment (INQUERY). Evaluation is done using a four-point relevance scale which is partitioned differently in different test settings. Findings - Results show that stem production, a lighter method than morphological lemmatization, compares well with lemmatization in a best-match IR environment. Differences in performance between stem production and lemmatization are small and they are not statistically significant in most of the tested settings. It is also shown that a hitherto rather neglected method of morphological processing for Finnish, stemming, performs reasonably well although the stemmer used - a Porter stemmer implementation - is far from optimal for a morphologically complex language like Finnish. In another series of tests, the effects of compound splitting and derivational expansion of queries are tested. Practical implications - Usefulness of morphological lemmatization and stem generation for IR purposes can be estimated with many factors. On the average P-R level they seem to behave very close to each other in a probabilistic IR system. Thus, the choice of the used method with highly inflectional languages needs to be estimated along other dimensions too. Originality/value - Results are achieved using Finnish as an example of a highly inflectional language. The results are of interest for anyone who is interested in processing of morphological variation of a highly inflected language for IR purposes.
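What stemming or stem generation buys in a best-match system is that inflected forms of the same word collapse onto one index key, so a query in one form can match documents using another. A toy illustration in plain Python; the crude suffix stripper below merely stands in for the Porter-style stemmer and the stem generator compared in the article, and the Finnish word forms are just an example.

    # Toy suffix stripper standing in for a real Finnish stemmer or lemmatizer.
    SUFFIXES = ("issa", "assa", "ssa", "lla", "ja", "t", "n", "a")  # not real Finnish morphology

    def crude_stem(word):
        for suffix in sorted(SUFFIXES, key=len, reverse=True):
            if word.endswith(suffix) and len(word) - len(suffix) >= 5:
                return word[: -len(suffix)]
        return word

    forms = ["kirja", "kirjan", "kirjat", "kirjassa", "kirjalla"]  # inflected forms of 'book'
    print({form: crude_stem(form) for form in forms})
    # All five forms share the index key 'kirja'; without normalization each
    # inflection would be a separate index term and the query would have to
    # guess the right one.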
  20. Niemi, T.; Jämsen, J.: A query language for discovering semantic associations, part II : sample queries and query evaluation (2007) 0.02
    
    Abstract
In our query language introduced in Part I (Journal of the American Society for Information Science and Technology. 58(2007) no.11, pp.1559-1568) the user can formulate queries to find out (possibly complex) semantic relationships among entities. In this article we demonstrate the usage of our query language and discuss the new applications that it supports. We categorize several query types and give sample queries. The query types are categorized based on whether the entities specified in a query are known or unknown to the user in advance, and whether text information in documents is utilized. Natural language is used to represent the results of queries in order to facilitate correct interpretation by the user. We discuss briefly the issues related to the prototype implementation of the query language and show that an independent operation like Rho (Sheth et al., 2005; Anyanwu & Sheth, 2002, 2003), which presupposes entities of interest to be known in advance, is exceedingly inefficient in emulating the behavior of our query language. The discussion also covers potential problems and challenges for future work.

Years

Languages

  • e 95
  • d 20
  • f 1
  • m 1

Types

  • a 94
  • el 10
  • m 9
  • s 6
  • x 4
  • p 2
  • d 1