Search (81 results, page 1 of 5)

  • language_ss:"e"
  • theme_ss:"Computerlinguistik"
  • year_i:[2010 TO 2020}
  1. Huo, W.: Automatic multi-word term extraction and its application to Web-page summarization (2012) 0.12
    0.11821951 = product of:
      0.23643902 = sum of:
        0.0089855315 = weight(_text_:information in 563) [ClassicSimilarity], result of:
          0.0089855315 = score(doc=563,freq=2.0), product of:
            0.0772133 = queryWeight, product of:
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.043984205 = queryNorm
            0.116372846 = fieldWeight in 563, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.046875 = fieldNorm(doc=563)
        0.20957573 = weight(_text_:2f in 563) [ClassicSimilarity], result of:
          0.20957573 = score(doc=563,freq=2.0), product of:
            0.37289858 = queryWeight, product of:
              8.478011 = idf(docFreq=24, maxDocs=44218)
              0.043984205 = queryNorm
            0.56201804 = fieldWeight in 563, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              8.478011 = idf(docFreq=24, maxDocs=44218)
              0.046875 = fieldNorm(doc=563)
        0.017877758 = product of:
          0.035755515 = sum of:
            0.035755515 = weight(_text_:22 in 563) [ClassicSimilarity], result of:
              0.035755515 = score(doc=563,freq=2.0), product of:
                0.1540252 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.043984205 = queryNorm
                0.23214069 = fieldWeight in 563, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.046875 = fieldNorm(doc=563)
          0.5 = coord(1/2)
      0.5 = coord(3/6)
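    The score breakdown above is standard Lucene ClassicSimilarity "explain" output. As a reading aid (not part of the catalogue), the sketch below recomputes one leaf of that tree, the _text_:information weight in doc 563, from the values it reports: a term's contribution is queryWeight * fieldWeight, where queryWeight = idf * queryNorm and fieldWeight = tf * idf * fieldNorm with tf = sqrt(freq).

```python
# Minimal sketch: reproduce one leaf of the ClassicSimilarity explanation above.
# All numeric values are copied from the explain tree; nothing here is new data.
import math

def classic_leaf(freq, idf, query_norm, field_norm):
    """score = (idf * queryNorm) * (sqrt(freq) * idf * fieldNorm)"""
    tf = math.sqrt(freq)                    # 1.4142135 for freq = 2.0
    query_weight = idf * query_norm         # 0.0772133
    field_weight = tf * idf * field_norm    # 0.116372846
    return query_weight * field_weight

score = classic_leaf(freq=2.0, idf=1.7554779,
                     query_norm=0.043984205, field_norm=0.046875)
print(score)  # ~0.0089855315, the "weight(_text_:information in 563)" leaf
```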
    
    Abstract
    In this thesis we propose three new word association measures for multi-word term extraction. We combine these association measures with LocalMaxs algorithm in our extraction model and compare the results of different multi-word term extraction methods. Our approach is language and domain independent and requires no training data. It can be applied to such tasks as text summarization, information retrieval, and document classification. We further explore the potential of using multi-word terms as an effective representation for general web-page summarization. We extract multi-word terms from human written summaries in a large collection of web-pages, and generate the summaries by aligning document words with these multi-word terms. Our system applies machine translation technology to learn the aligning process from a training set and focuses on selecting high quality multi-word terms from human written summaries to generate suitable results for web-page summarization.
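    A minimal sketch of the kind of word-association ("glue") measure such extraction models use to rank multi-word candidates; the Dice coefficient and the toy text below are illustrative assumptions, not the three measures proposed in the thesis.

```python
# Hedged illustration: score adjacent word pairs with Dice's coefficient,
# a simple association ("glue") measure for multi-word term candidates.
from collections import Counter

def bigram_dice(tokens):
    """Score each adjacent word pair by 2*f(xy) / (f(x) + f(y))."""
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    return {(x, y): 2 * f_xy / (unigrams[x] + unigrams[y])
            for (x, y), f_xy in bigrams.items()}

tokens = ("multi word term extraction is applied to web page summarization "
          "and multi word term extraction needs no training data").split()
for pair, score in sorted(bigram_dice(tokens).items(), key=lambda kv: -kv[1])[:3]:
    print(pair, round(score, 2))  # cohesive pairs such as ('multi', 'word') rank highest
```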
    Content
    A Thesis presented to The University of Guelph in partial fulfilment of requirements for the degree of Master of Science in Computer Science. See: http://www.inf.ufrgs.br/~ceramisch/download_files/publications/2009/p01.pdf.
    Date
    10. 1.2013 19:22:47
  2. Fóris, A.: Network theory and terminology (2013) 0.05
    0.049350783 = product of:
      0.14805235 = sum of:
        0.13315421 = weight(_text_:networks in 1365) [ClassicSimilarity], result of:
          0.13315421 = score(doc=1365,freq=12.0), product of:
            0.20804176 = queryWeight, product of:
              4.72992 = idf(docFreq=1060, maxDocs=44218)
              0.043984205 = queryNorm
            0.640036 = fieldWeight in 1365, product of:
              3.4641016 = tf(freq=12.0), with freq of:
                12.0 = termFreq=12.0
              4.72992 = idf(docFreq=1060, maxDocs=44218)
              0.0390625 = fieldNorm(doc=1365)
        0.0148981325 = product of:
          0.029796265 = sum of:
            0.029796265 = weight(_text_:22 in 1365) [ClassicSimilarity], result of:
              0.029796265 = score(doc=1365,freq=2.0), product of:
                0.1540252 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.043984205 = queryNorm
                0.19345059 = fieldWeight in 1365, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=1365)
          0.5 = coord(1/2)
      0.33333334 = coord(2/6)
    
    Abstract
    The paper aims to present the relations of network theory and terminology. The model of scale-free networks, which has been recently developed and widely applied since, can be effectively used in terminology research as well. Operation based on the principle of networks is a universal characteristic of complex systems. Networks are governed by general laws. The model of scale-free networks can be viewed as a statistical-probability model, and it can be described with mathematical tools. Its main feature is that "everything is connected to everything else," that is, every node is reachable (in a few steps) starting from any other node; this phenomenon is called "the small world phenomenon." The existence of a linguistic network and the general laws of the operation of networks enable us to place issues of language use in the complex system of relations that reveal the deeper connections between phenomena with the help of networks embedded in each other. The realization of the metaphor that language also has a network structure is the basis of the classification methods of the terminological system, and likewise of the ways of creating terminology databases, which serve the purpose of providing easy and versatile accessibility to specialised knowledge.
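    The scale-free model mentioned above can be made concrete with a tiny preferential-attachment simulation; the sketch below is a generic Barabási-Albert-style illustration under assumed parameters, not taken from the paper.

```python
# Hedged sketch: new nodes attach to existing nodes with probability
# proportional to degree, producing a few highly connected hubs.
import random

def preferential_attachment(n_nodes=200, links_per_node=2, seed=42):
    random.seed(seed)
    degree = {0: 1, 1: 1}          # start from a single 0-1 edge
    targets = [0, 1]               # node ids repeated once per incident edge end
    for new in range(2, n_nodes):
        chosen = set()
        while len(chosen) < links_per_node:
            chosen.add(random.choice(targets))   # degree-proportional choice
        degree[new] = 0
        for old in chosen:
            degree[old] += 1
            degree[new] += 1
            targets.extend([old, new])
    return degree

deg = preferential_attachment()
print(max(deg.values()), sorted(deg.values())[len(deg) // 2])
# a few heavily linked hubs versus a small median degree: the scale-free signature
```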
    Date
    2. 9.2014 21:22:48
  3. Radev, D.R.; Joseph, M.T.; Gibson, B.; Muthukrishnan, P.: ¬A bibliometric and network analysis of the field of computational linguistics (2016) 0.03
    0.028862368 = product of:
      0.0865871 = sum of:
        0.010483121 = weight(_text_:information in 2764) [ClassicSimilarity], result of:
          0.010483121 = score(doc=2764,freq=2.0), product of:
            0.0772133 = queryWeight, product of:
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.043984205 = queryNorm
            0.13576832 = fieldWeight in 2764, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.0546875 = fieldNorm(doc=2764)
        0.07610398 = weight(_text_:networks in 2764) [ClassicSimilarity], result of:
          0.07610398 = score(doc=2764,freq=2.0), product of:
            0.20804176 = queryWeight, product of:
              4.72992 = idf(docFreq=1060, maxDocs=44218)
              0.043984205 = queryNorm
            0.36581108 = fieldWeight in 2764, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              4.72992 = idf(docFreq=1060, maxDocs=44218)
              0.0546875 = fieldNorm(doc=2764)
      0.33333334 = coord(2/6)
    
    Abstract
    The ACL Anthology is a large collection of research papers in computational linguistics. Citation data were obtained using text extraction from a collection of PDF files with significant manual postprocessing performed to clean up the results. Manual annotation of the references was then performed to complete the citation network. We analyzed the networks of paper citations, author citations, and author collaborations in an attempt to identify the most central papers and authors. The analysis includes general network statistics, PageRank, metrics across publication years and venues, the impact factor and h-index, as well as other measures.
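    The network measures named above (e.g. PageRank) can be illustrated with a toy power-iteration sketch over a hypothetical three-paper citation graph; the graph and damping factor are assumptions, not data from the ACL Anthology study.

```python
def pagerank(edges, damping=0.85, iterations=50):
    """Plain power-iteration PageRank over a directed citation graph."""
    nodes = {n for e in edges for n in e}
    out_links = {n: [t for (s, t) in edges if s == n] for n in nodes}
    rank = {n: 1.0 / len(nodes) for n in nodes}
    for _ in range(iterations):
        new_rank = {n: (1.0 - damping) / len(nodes) for n in nodes}
        for s in nodes:
            targets = out_links[s] or list(nodes)   # dangling papers spread evenly
            share = damping * rank[s] / len(targets)
            for t in targets:
                new_rank[t] += share
        rank = new_rank
    return rank

# paper A cites B and C, B cites C, C cites A
print(pagerank([("A", "B"), ("A", "C"), ("B", "C"), ("C", "A")]))
```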
    Source
    Journal of the Association for Information Science and Technology. 67(2016) no.3, S.683-706
  4. Vlachidis, A.; Binding, C.; Tudhope, D.; May, K.: Excavating grey literature : a case study on the rich indexing of archaeological documents via natural language-processing techniques and knowledge-based resources (2010) 0.02
    0.024858069 = product of:
      0.0745742 = sum of:
        0.01339484 = weight(_text_:information in 3948) [ClassicSimilarity], result of:
          0.01339484 = score(doc=3948,freq=10.0), product of:
            0.0772133 = queryWeight, product of:
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.043984205 = queryNorm
            0.1734784 = fieldWeight in 3948, product of:
              3.1622777 = tf(freq=10.0), with freq of:
                10.0 = termFreq=10.0
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.03125 = fieldNorm(doc=3948)
        0.061179362 = weight(_text_:united in 3948) [ClassicSimilarity], result of:
          0.061179362 = score(doc=3948,freq=2.0), product of:
            0.24675635 = queryWeight, product of:
              5.6101127 = idf(docFreq=439, maxDocs=44218)
              0.043984205 = queryNorm
            0.2479343 = fieldWeight in 3948, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              5.6101127 = idf(docFreq=439, maxDocs=44218)
              0.03125 = fieldNorm(doc=3948)
      0.33333334 = coord(2/6)
    
    Abstract
    Purpose - This paper sets out to discuss the use of information extraction (IE), a natural language-processing (NLP) technique to assist "rich" semantic indexing of diverse archaeological text resources. The focus of the research is to direct a semantic-aware "rich" indexing of diverse natural language resources with properties capable of satisfying information retrieval from online publications and datasets associated with the Semantic Technologies for Archaeological Resources (STAR) project. Design/methodology/approach - The paper proposes use of the English Heritage extension (CRM-EH) of the standard core ontology in cultural heritage, CIDOC CRM, and exploitation of domain thesauri resources for driving and enhancing an Ontology-Oriented Information Extraction process. The process of semantic indexing is based on a rule-based Information Extraction technique, which is facilitated by the General Architecture of Text Engineering (GATE) toolkit and expressed by Java Annotation Pattern Engine (JAPE) rules. Findings - Initial results suggest that the combination of information extraction with knowledge resources and standard conceptual models is capable of supporting semantic-aware term indexing. Additional efforts are required for further exploitation of the technique and adoption of formal evaluation methods for assessing the performance of the method in measurable terms. Originality/value - The value of the paper lies in the semantic indexing of 535 unpublished online documents often referred to as "Grey Literature", from the Archaeological Data Service OASIS corpus (Online AccesS to the Index of archaeological investigationS), with respect to the CRM ontological concepts E49.Time Appellation and P19.Physical Object.
    Footnote
    Contribution to a special issue: Content architecture: exploiting and managing diverse resources: proceedings of the first national conference of the United Kingdom chapter of the International Society for Knowledge Organization (ISKO)
  5. Helbig, H.: Knowledge representation and the semantics of natural language (2014) 0.02
    0.021649845 = product of:
      0.064949535 = sum of:
        0.01058955 = weight(_text_:information in 2396) [ClassicSimilarity], result of:
          0.01058955 = score(doc=2396,freq=4.0), product of:
            0.0772133 = queryWeight, product of:
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.043984205 = queryNorm
            0.13714671 = fieldWeight in 2396, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.0390625 = fieldNorm(doc=2396)
        0.054359984 = weight(_text_:networks in 2396) [ClassicSimilarity], result of:
          0.054359984 = score(doc=2396,freq=2.0), product of:
            0.20804176 = queryWeight, product of:
              4.72992 = idf(docFreq=1060, maxDocs=44218)
              0.043984205 = queryNorm
            0.26129362 = fieldWeight in 2396, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              4.72992 = idf(docFreq=1060, maxDocs=44218)
              0.0390625 = fieldNorm(doc=2396)
      0.33333334 = coord(2/6)
    
    Abstract
    Natural Language is not only the most important means of communication between human beings, it is also used over historical periods for the preservation of cultural achievements and their transmission from one generation to the other. During the last few decades, the flood of digitalized information has been growing tremendously. This tendency will continue with the globalisation of information societies and with the growing importance of national and international computer networks. This is one reason why the theoretical understanding and the automated treatment of communication processes based on natural language have such a decisive social and economic impact. In this context, the semantic representation of knowledge originally formulated in natural language plays a central part, because it connects all components of natural language processing systems, be they the automatic understanding of natural language (analysis), the rational reasoning over knowledge bases, or the generation of natural language expressions from formal representations. This book presents a method for the semantic representation of natural language expressions (texts, sentences, phrases, etc.) which can be used as a universal knowledge representation paradigm in the human sciences, like linguistics, cognitive psychology, or philosophy of language, as well as in computational linguistics and in artificial intelligence. It is also an attempt to close the gap between these disciplines, which to a large extent are still working separately.
  6. Levin, M.; Krawczyk, S.; Bethard, S.; Jurafsky, D.: Citation-based bootstrapping for large-scale author disambiguation (2012) 0.02
    0.020615976 = product of:
      0.061847925 = sum of:
        0.007487943 = weight(_text_:information in 246) [ClassicSimilarity], result of:
          0.007487943 = score(doc=246,freq=2.0), product of:
            0.0772133 = queryWeight, product of:
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.043984205 = queryNorm
            0.09697737 = fieldWeight in 246, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.0390625 = fieldNorm(doc=246)
        0.054359984 = weight(_text_:networks in 246) [ClassicSimilarity], result of:
          0.054359984 = score(doc=246,freq=2.0), product of:
            0.20804176 = queryWeight, product of:
              4.72992 = idf(docFreq=1060, maxDocs=44218)
              0.043984205 = queryNorm
            0.26129362 = fieldWeight in 246, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              4.72992 = idf(docFreq=1060, maxDocs=44218)
              0.0390625 = fieldNorm(doc=246)
      0.33333334 = coord(2/6)
    
    Abstract
    We present a new, two-stage, self-supervised algorithm for author disambiguation in large bibliographic databases. In the first "bootstrap" stage, a collection of high-precision features is used to bootstrap a training set with positive and negative examples of coreferring authors. A supervised feature-based classifier is then trained on the bootstrap clusters and used to cluster the authors in a larger unlabeled dataset. Our self-supervised approach shares the advantages of unsupervised approaches (no need for expensive hand labels) as well as supervised approaches (a rich set of features that can be discriminatively trained). The algorithm disambiguates 54,000,000 author instances in Thomson Reuters' Web of Knowledge with B3 F1 of .807. We analyze parameters and features, particularly those from citation networks, which have not been deeply investigated in author disambiguation. The most important citation feature is self-citation, which can be approximated without expensive extraction of the full network. For the supervised stage, the minor improvement due to other citation features (increasing F1 from .748 to .767) suggests they may not be worth the trouble of extracting from databases that don't already have them. A lean feature set without expensive abstract and title features performs 130 times faster with about equal F1.
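    As a hedged illustration of the "cheap" self-citation signal described above, approximated from reference strings rather than the full citation network; the record layout and surname matching are assumptions for the sketch, not the paper's feature definition.

```python
def self_citation(mention_a, mention_b):
    """True if either author mention cites a paper whose reference string contains the other's surname."""
    surname_a = mention_a["name"].split(",")[0].lower()
    surname_b = mention_b["name"].split(",")[0].lower()
    cites_b = any(surname_b in ref.lower() for ref in mention_a["references"])
    cites_a = any(surname_a in ref.lower() for ref in mention_b["references"])
    return cites_a or cites_b

a = {"name": "Smith, J.", "references": ["Smith J. and Lee K. (2008) Entity resolution."]}
b = {"name": "Smith, John", "references": ["Jones M. (2005) Record linkage."]}
print(self_citation(a, b))  # True: mention a cites a paper by a "Smith"
```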
    Source
    Journal of the American Society for Information Science and Technology. 63(2012) no.5, S.1030-1047
  7. Panicheva, P.; Cardiff, J.; Rosso, P.: Identifying subjective statements in news titles using a personal sense annotation framework (2013) 0.02
    0.018971303 = product of:
      0.05691391 = sum of:
        0.012707461 = weight(_text_:information in 968) [ClassicSimilarity], result of:
          0.012707461 = score(doc=968,freq=4.0), product of:
            0.0772133 = queryWeight, product of:
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.043984205 = queryNorm
            0.16457605 = fieldWeight in 968, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.046875 = fieldNorm(doc=968)
        0.044206448 = product of:
          0.088412896 = sum of:
            0.088412896 = weight(_text_:states in 968) [ClassicSimilarity], result of:
              0.088412896 = score(doc=968,freq=2.0), product of:
                0.24220218 = queryWeight, product of:
                  5.506572 = idf(docFreq=487, maxDocs=44218)
                  0.043984205 = queryNorm
                0.3650376 = fieldWeight in 968, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  5.506572 = idf(docFreq=487, maxDocs=44218)
                  0.046875 = fieldNorm(doc=968)
          0.5 = coord(1/2)
      0.33333334 = coord(2/6)
    
    Abstract
    Subjective language contains information about private states. The goal of subjective language identification is to determine that a private state is expressed, without considering its polarity or specific emotion. A component of word meaning, "Personal Sense," has clear potential in the field of subjective language identification, as it reflects a meaning of words in terms of unique personal experience and carries personal characteristics. In this paper we investigate how Personal Sense can be harnessed for the purpose of identifying subjectivity in news titles. In the process, we develop a new Personal Sense annotation framework for annotating and classifying subjectivity, polarity, and emotion. The Personal Sense framework yields high performance in a fine-grained subsentence subjectivity classification. Our experiments demonstrate lexico-syntactic features to be useful for the identification of subjectivity indicators and the targets that receive the subjective Personal Sense.
    Source
    Journal of the American Society for Information Science and Technology. 64(2013) no.7, S.1411-1422
  8. Lian, T.; Yu, C.; Wang, W.; Yuan, Q.; Hou, Z.: Doctoral dissertations on tourism in China : a co-word analysis (2016) 0.01
    0.009059997 = product of:
      0.054359984 = sum of:
        0.054359984 = weight(_text_:networks in 3178) [ClassicSimilarity], result of:
          0.054359984 = score(doc=3178,freq=2.0), product of:
            0.20804176 = queryWeight, product of:
              4.72992 = idf(docFreq=1060, maxDocs=44218)
              0.043984205 = queryNorm
            0.26129362 = fieldWeight in 3178, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              4.72992 = idf(docFreq=1060, maxDocs=44218)
              0.0390625 = fieldNorm(doc=3178)
      0.16666667 = coord(1/6)
    
    Abstract
    The aim of this paper is to map the foci of research in doctoral dissertations on tourism in China. In the paper, co-word analysis is applied, with keywords coming from six public dissertation databases, i.e. CDFD, Wanfang Data, NLC, CALIS, ISTIC, and NSTL, as well as some university libraries providing doctoral dissertations on tourism. Altogether we have examined 928 doctoral dissertations on tourism written between 1989 and 2013. Doctoral dissertations on tourism in China involve 36 first-level disciplines and 102 secondary-level disciplines. We collect the top 68 keywords of practical significance in tourism which are mentioned at least four times. These keywords are classified into 12 categories based on co-word analysis, including cluster analysis, strategic diagrams analysis, and social network analysis. According to the strategic diagram of the 12 categories, we find the mature and immature areas in tourism study. From the social networks, we can see the social network maps of the original co-occurrence matrix and the k-cores analysis of the binary matrix. The paper provides valuable insight into the study of tourism by analyzing doctoral dissertations on tourism in China.
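    The co-word step behind such an analysis can be sketched as a keyword co-occurrence count over records; the keyword lists below are invented examples, not data from the dissertation databases cited.

```python
# Hedged sketch: count how often keywords co-occur on the same record.
# The resulting matrix is the input to clustering, strategic diagrams and k-cores.
from collections import Counter
from itertools import combinations

records = [
    ["tourism", "sustainable development", "ecotourism"],
    ["tourism", "cultural heritage"],
    ["ecotourism", "sustainable development", "tourism"],
]

cooccurrence = Counter()
for keywords in records:
    for pair in combinations(sorted(set(keywords)), 2):
        cooccurrence[pair] += 1

for pair, count in cooccurrence.most_common(3):
    print(pair, count)
```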
  9. Lawrie, D.; Mayfield, J.; McNamee, P.; Oard, D.W.: Cross-language person-entity linking from 20 languages (2015) 0.01
    0.00895443 = product of:
      0.026863288 = sum of:
        0.0089855315 = weight(_text_:information in 1848) [ClassicSimilarity], result of:
          0.0089855315 = score(doc=1848,freq=2.0), product of:
            0.0772133 = queryWeight, product of:
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.043984205 = queryNorm
            0.116372846 = fieldWeight in 1848, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.046875 = fieldNorm(doc=1848)
        0.017877758 = product of:
          0.035755515 = sum of:
            0.035755515 = weight(_text_:22 in 1848) [ClassicSimilarity], result of:
              0.035755515 = score(doc=1848,freq=2.0), product of:
                0.1540252 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.043984205 = queryNorm
                0.23214069 = fieldWeight in 1848, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.046875 = fieldNorm(doc=1848)
          0.5 = coord(1/2)
      0.33333334 = coord(2/6)
    
    Abstract
    The goal of entity linking is to associate references to an entity that is found in unstructured natural language content to an authoritative inventory of known entities. This article describes the construction of 6 test collections for cross-language person-entity linking that together span 22 languages. Fully automated components were used together with 2 crowdsourced validation stages to affordably generate ground-truth annotations with an accuracy comparable to that of a completely manual process. The resulting test collections each contain between 642 (Arabic) and 2,361 (Romanian) person references in non-English texts for which the correct resolution in English Wikipedia is known, plus a similar number of references for which no correct resolution into English Wikipedia is believed to exist. Fully automated cross-language person-name linking experiments with 20 non-English languages yielded a resolution accuracy of between 0.84 (Serbian) and 0.98 (Romanian), which compares favorably with previously reported cross-language entity linking results for Spanish.
    Source
    Journal of the Association for Information Science and Technology. 66(2015) no.6, S.1106-1123
  10. Deventer, J.P. van; Kruger, C.J.; Johnson, R.D.: Delineating knowledge management through lexical analysis : a retrospective (2015) 0.01
    0.006502447 = product of:
      0.019507341 = sum of:
        0.009078649 = weight(_text_:information in 3807) [ClassicSimilarity], result of:
          0.009078649 = score(doc=3807,freq=6.0), product of:
            0.0772133 = queryWeight, product of:
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.043984205 = queryNorm
            0.11757882 = fieldWeight in 3807, product of:
              2.4494898 = tf(freq=6.0), with freq of:
                6.0 = termFreq=6.0
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.02734375 = fieldNorm(doc=3807)
        0.010428692 = product of:
          0.020857384 = sum of:
            0.020857384 = weight(_text_:22 in 3807) [ClassicSimilarity], result of:
              0.020857384 = score(doc=3807,freq=2.0), product of:
                0.1540252 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.043984205 = queryNorm
                0.1354154 = fieldWeight in 3807, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.02734375 = fieldNorm(doc=3807)
          0.5 = coord(1/2)
      0.33333334 = coord(2/6)
    
    Abstract
    Purpose - Academic authors tend to define terms that meet their own needs. Knowledge Management (KM) is a term that comes to mind and is examined in this study. Lexicographical research identified KM terms used by authors from 1996 to 2006 in academic outlets to define KM. Data were collected based on strict criteria which included that definitions should be unique instances. From 2006 onwards, these authors could not identify new unique instances of definitions with repetitive usage of such definition instances. Analysis revealed that KM is directly defined by People (Person and Organisation), Processes (Codify, Share, Leverage, and Process) and Contextualised Content (Information). The paper aims to discuss these issues. Design/methodology/approach - The aim of this paper is to add to the body of knowledge in the KM discipline and supply KM practitioners and scholars with insight into what is commonly regarded to be KM so as to reignite the debate on what one could consider as KM. The lexicon used by KM scholars was evaluated through the application of lexicographical research methods as extended through Knowledge Discovery and Text Analysis methods. Findings - By simplifying term relationships through the application of lexicographical research methods, as extended through Knowledge Discovery and Text Analysis methods, it was found that KM is directly defined by People (Person and Organisation), Processes (Codify, Share, Leverage, Process) and Contextualised Content (Information). One would therefore be able to indicate that KM, from an academic point of view, refers to people processing contextualised content.
    Date
    20. 1.2015 18:30:22
    Source
    Aslib journal of information management. 67(2015) no.2, S.203-229
  11. Engerer, V.: Exploring interdisciplinary relationships between linguistics and information retrieval from the 1960s to today (2017) 0.00
    0.004735791 = product of:
      0.028414747 = sum of:
        0.028414747 = weight(_text_:information in 3434) [ClassicSimilarity], result of:
          0.028414747 = score(doc=3434,freq=20.0), product of:
            0.0772133 = queryWeight, product of:
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.043984205 = queryNorm
            0.36800325 = fieldWeight in 3434, product of:
              4.472136 = tf(freq=20.0), with freq of:
                20.0 = termFreq=20.0
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.046875 = fieldNorm(doc=3434)
      0.16666667 = coord(1/6)
    
    Abstract
    This article explores how linguistics has influenced information retrieval (IR) and attempts to explain the impact of linguistics through an analysis of internal developments in information science generally, and IR in particular. It notes that information science/IR has been evolving from a case science into a fully fledged, "disciplined"/disciplinary science. The article establishes correspondences between linguistics and information science/IR using the three established IR paradigms (physical, cognitive, and computational) as a frame of reference. The current relationship between information science/IR and linguistics is elucidated through discussion of some recent information science publications dealing with linguistic topics and a novel technique, "keyword collocation analysis," is introduced. Insights from interdisciplinarity research and case theory are also discussed. It is demonstrated that the three stages of interdisciplinarity, namely multidisciplinarity, interdisciplinarity (in the narrow sense), and transdisciplinarity, can be linked to different phases of the information science/IR-linguistics relationship and connected to different ways of using linguistic theory in information science and IR.
    Source
    Journal of the Association for Information Science and Technology. 68(2017) no.3, S.660-680
  12. Ko, Y.: ¬A new term-weighting scheme for text classification using the odds of positive and negative class probabilities (2015) 0.00
    0.0042358204 = product of:
      0.025414921 = sum of:
        0.025414921 = weight(_text_:information in 2339) [ClassicSimilarity], result of:
          0.025414921 = score(doc=2339,freq=16.0), product of:
            0.0772133 = queryWeight, product of:
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.043984205 = queryNorm
            0.3291521 = fieldWeight in 2339, product of:
              4.0 = tf(freq=16.0), with freq of:
                16.0 = termFreq=16.0
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.046875 = fieldNorm(doc=2339)
      0.16666667 = coord(1/6)
    
    Abstract
    Text classification (TC) is a core technique for text mining and information retrieval. It has been applied to many applications in many different research and industrial areas. Term-weighting schemes assign an appropriate weight to each term to obtain a high TC performance. Although term weighting is one of the important modules for TC and TC has different peculiarities from those in information retrieval, many term-weighting schemes used in information retrieval, such as term frequency-inverse document frequency (tf-idf), have been used in TC in the same manner. The peculiarity of TC that differs most from information retrieval is the existence of class information. This article proposes a new term-weighting scheme that uses class information using positive and negative class distributions. As a result, the proposed scheme, log tf-TRR, consistently performs better than do other schemes using class information as well as traditional schemes such as tf-idf.
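    A hedged sketch of the general idea behind such class-aware weighting: scale a log-dampened term frequency by a log ratio of the term's positive-class to negative-class probability. The exact log tf-TRR formulation is defined in the article; the smoothing and names below are assumptions.

```python
# Illustrative odds-based class weighting (not the article's exact formula).
import math

def class_odds_weight(tf, pos_count, neg_count, pos_total, neg_total, smooth=1.0):
    p_pos = (pos_count + smooth) / (pos_total + smooth)   # P(term | positive class)
    p_neg = (neg_count + smooth) / (neg_total + smooth)   # P(term | negative class)
    return math.log(1.0 + tf) * math.log(p_pos / p_neg + 1.0)

# a term seen 30 times across 1,000 positive docs but twice across 1,000 negative docs
print(round(class_odds_weight(tf=3, pos_count=30, neg_count=2,
                              pos_total=1000, neg_total=1000), 3))
```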
    Source
    Journal of the Association for Information Science and Technology. 66(2015) no.12, S.2553-2565
  13. Multi-source, multilingual information extraction and summarization (2013) 0.00
    0.0037439712 = product of:
      0.022463826 = sum of:
        0.022463826 = weight(_text_:information in 978) [ClassicSimilarity], result of:
          0.022463826 = score(doc=978,freq=18.0), product of:
            0.0772133 = queryWeight, product of:
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.043984205 = queryNorm
            0.2909321 = fieldWeight in 978, product of:
              4.2426405 = tf(freq=18.0), with freq of:
                18.0 = termFreq=18.0
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.0390625 = fieldNorm(doc=978)
      0.16666667 = coord(1/6)
    
    Abstract
    Information extraction (IE) and text summarization (TS) are powerful technologies for finding relevant pieces of information in text and presenting them to the user in condensed form. The ongoing information explosion makes IE and TS critical for successful functioning within the information society. These technologies face particular challenges due to the inherent multi-source nature of the information explosion. The technologies must now handle not isolated texts or individual narratives, but rather large-scale repositories and streams (in general, in multiple languages) containing a multiplicity of perspectives, opinions, or commentaries on particular topics, entities or events. There is thus a need to adapt existing techniques and develop new ones to deal with these challenges. This volume contains a selection of papers that present a variety of methodologies for content identification and extraction, as well as for content fusion and regeneration. The chapters cover various aspects of the challenges, depending on the nature of the information sought (names vs. events) and the nature of the sources (news streams, image captions, scientific research papers, etc.). This volume aims to offer a broad and representative sample of studies from this very active research field.
    RSWK
    Natürlichsprachiges System / Information Extraction / Automatische Inhaltsanalyse / Zusammenfassung / Aufsatzsammlung
    Subject
    Natürlichsprachiges System / Information Extraction / Automatische Inhaltsanalyse / Zusammenfassung / Aufsatzsammlung
  14. Schmolz, H.: Anaphora resolution and text retrieval : a linguistic analysis of hypertexts (2015) 0.00
    0.0035298502 = product of:
      0.0211791 = sum of:
        0.0211791 = weight(_text_:information in 1172) [ClassicSimilarity], result of:
          0.0211791 = score(doc=1172,freq=4.0), product of:
            0.0772133 = queryWeight, product of:
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.043984205 = queryNorm
            0.27429342 = fieldWeight in 1172, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.078125 = fieldNorm(doc=1172)
      0.16666667 = coord(1/6)
    
    RSWK
    Englisch / Anapher <Syntax> / Hypertext / Information Retrieval / Korpus <Linguistik>
    Subject
    Englisch / Anapher <Syntax> / Hypertext / Information Retrieval / Korpus <Linguistik>
  15. Keselman, A.; Rosemblat, G.; Kilicoglu, H.; Fiszman, M.; Jin, H.; Shin, D.; Rindflesch, T.C.: Adapting semantic natural language processing technology to address information overload in influenza epidemic management (2010) 0.00
    0.0035298502 = product of:
      0.0211791 = sum of:
        0.0211791 = weight(_text_:information in 1312) [ClassicSimilarity], result of:
          0.0211791 = score(doc=1312,freq=16.0), product of:
            0.0772133 = queryWeight, product of:
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.043984205 = queryNorm
            0.27429342 = fieldWeight in 1312, product of:
              4.0 = tf(freq=16.0), with freq of:
                16.0 = termFreq=16.0
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.0390625 = fieldNorm(doc=1312)
      0.16666667 = coord(1/6)
    
    Abstract
    The explosion of disaster health information results in information overload among response professionals. The objective of this project was to determine the feasibility of applying semantic natural language processing (NLP) technology to addressing this overload. The project characterizes concepts and relationships commonly used in disaster health-related documents on influenza pandemics, as the basis for adapting an existing semantic summarizer to the domain. Methods include human review and semantic NLP analysis of a set of relevant documents. This is followed by a pilot test in which two information specialists use the adapted application for a realistic information-seeking task. According to the results, the ontology of influenza epidemics management can be described via a manageable number of semantic relationships that involve concepts from a limited number of semantic types. Test users demonstrate several ways to engage with the application to obtain useful information. This suggests that existing semantic NLP algorithms can be adapted to support information summarization and visualization in influenza epidemics and other disaster health areas. However, additional research is needed in the areas of terminology development (as many relevant relationships and terms are not part of existing standardized vocabularies), NLP, and user interface design.
    Source
    Journal of the American Society for Information Science and Technology. 61(2010) no.12, S.2531-2543
  16. Kocijan, K.: Visualizing natural language resources (2015) 0.00
    0.0035298502 = product of:
      0.0211791 = sum of:
        0.0211791 = weight(_text_:information in 2995) [ClassicSimilarity], result of:
          0.0211791 = score(doc=2995,freq=4.0), product of:
            0.0772133 = queryWeight, product of:
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.043984205 = queryNorm
            0.27429342 = fieldWeight in 2995, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.078125 = fieldNorm(doc=2995)
      0.16666667 = coord(1/6)
    
    Source
    Re:inventing information science in the networked society: Proceedings of the 14th International Symposium on Information Science, Zadar/Croatia, 19th-21st May 2015. Eds.: F. Pehar, C. Schloegl u. C. Wolff
  17. Babik, W.: Keywords as linguistic tools in information and knowledge organization (2017) 0.00
    0.0034943735 = product of:
      0.020966241 = sum of:
        0.020966241 = weight(_text_:information in 3510) [ClassicSimilarity], result of:
          0.020966241 = score(doc=3510,freq=8.0), product of:
            0.0772133 = queryWeight, product of:
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.043984205 = queryNorm
            0.27153665 = fieldWeight in 3510, product of:
              2.828427 = tf(freq=8.0), with freq of:
                8.0 = termFreq=8.0
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.0546875 = fieldNorm(doc=3510)
      0.16666667 = coord(1/6)
    
    Source
    Theorie, Semantik und Organisation von Wissen: Proceedings der 13. Tagung der Deutschen Sektion der Internationalen Gesellschaft für Wissensorganisation (ISKO) und dem 13. Internationalen Symposium der Informationswissenschaft der Higher Education Association for Information Science (HI) Potsdam (19.-20.03.2013): 'Theory, Information and Organization of Knowledge' / Proceedings der 14. Tagung der Deutschen Sektion der Internationalen Gesellschaft für Wissensorganisation (ISKO) und Natural Language & Information Systems (NLDB) Passau (16.06.2015): 'Lexical Resources for Knowledge Organization' / Proceedings des Workshops der Deutschen Sektion der Internationalen Gesellschaft für Wissensorganisation (ISKO) auf der SEMANTICS Leipzig (1.09.2014): 'Knowledge Organization and Semantic Web' / Proceedings des Workshops der Polnischen und Deutschen Sektion der Internationalen Gesellschaft für Wissensorganisation (ISKO) Cottbus (29.-30.09.2011): 'Economics of Knowledge Production and Organization'. Hrsg. von W. Babik, H.P. Ohly u. K. Weber
  18. Rosemblat, G.; Resnick, M.P.; Auston, I.; Shin, D.; Sneiderman, C.; Fiszman, M.; Rindflesch, T.C.: Extending SemRep to the public health domain (2013) 0.00
    0.00334871 = product of:
      0.02009226 = sum of:
        0.02009226 = weight(_text_:information in 2096) [ClassicSimilarity], result of:
          0.02009226 = score(doc=2096,freq=10.0), product of:
            0.0772133 = queryWeight, product of:
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.043984205 = queryNorm
            0.2602176 = fieldWeight in 2096, product of:
              3.1622777 = tf(freq=10.0), with freq of:
                10.0 = termFreq=10.0
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.046875 = fieldNorm(doc=2096)
      0.16666667 = coord(1/6)
    
    Abstract
    We describe the use of a domain-independent method to extend a natural language processing (NLP) application, SemRep (Rindflesch, Fiszman, & Libbus, 2005), based on the knowledge sources afforded by the Unified Medical Language System (UMLS®; Humphreys, Lindberg, Schoolman, & Barnett, 1998) to support the area of health promotion within the public health domain. Public health professionals require good information about successful health promotion policies and programs that might be considered for application within their own communities. Our effort seeks to improve access to relevant information for the public health profession, to help those in the field remain an information-savvy workforce. Natural language processing and semantic techniques hold promise to help public health professionals navigate the growing ocean of information by organizing and structuring this knowledge into a focused public health framework paired with a user-friendly visualization application as a way to summarize results of PubMed® searches in this field of knowledge.
    Source
    Journal of the American Society for Information Science and Technology. 64(2013) no.10, S.1963-1974
  19. Hoenkamp, E.; Bruza, P.: How everyday language can and will boost effective information retrieval (2015) 0.00
    0.0030569402 = product of:
      0.01834164 = sum of:
        0.01834164 = weight(_text_:information in 2123) [ClassicSimilarity], result of:
          0.01834164 = score(doc=2123,freq=12.0), product of:
            0.0772133 = queryWeight, product of:
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.043984205 = queryNorm
            0.23754507 = fieldWeight in 2123, product of:
              3.4641016 = tf(freq=12.0), with freq of:
                12.0 = termFreq=12.0
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.0390625 = fieldNorm(doc=2123)
      0.16666667 = coord(1/6)
    
    Abstract
    Typing 2 or 3 keywords into a browser has become an easy and efficient way to find information. Yet, typing even short queries becomes tedious on ever shrinking (virtual) keyboards. Meanwhile, speech processing is maturing rapidly, facilitating everyday language input. Also, wearable technology can inform users proactively by listening in on their conversations or processing their social media interactions. Given these developments, everyday language may soon become the new input of choice. We present an information retrieval (IR) algorithm specifically designed to accept everyday language. It integrates two paradigms of information retrieval, previously studied in isolation; one directed mainly at the surface structure of language, the other primarily at the underlying meaning. The integration was achieved by a Markov machine that encodes meaning by its transition graph, and surface structure by the language it generates. A rigorous evaluation of the approach showed, first, that it can compete with the quality of existing language models, second, that it is more effective the more verbose the input, and third, as a consequence, that it is promising for an imminent transition from keyword input, where the onus is on the user to formulate concise queries, to a modality where users can express their need for information in everyday language more freely, more informally, and more naturally.
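    A toy illustration (not the authors' model) of encoding surface structure in a transition graph: estimate bigram transition probabilities per document and score a verbose everyday-language query under each document's model.

```python
# Hedged sketch: bigram transition graphs as a crude document model for verbose input.
from collections import Counter, defaultdict
import math

def train_transitions(tokens):
    counts = defaultdict(Counter)
    for prev, nxt in zip(tokens, tokens[1:]):
        counts[prev][nxt] += 1
    return {p: {n: c / sum(fol.values()) for n, c in fol.items()}
            for p, fol in counts.items()}

def log_score(query, transitions, floor=1e-6):
    """Sum of log transition probabilities, with a floor for unseen transitions."""
    return sum(math.log(transitions.get(p, {}).get(n, floor))
               for p, n in zip(query, query[1:]))

doc_bike = "how to repair a flat bicycle tyre at home and how to repair a puncture".split()
doc_weather = "weather forecast for tomorrow with a chance of rain".split()
query = "how do i repair a flat tyre".split()
for name, doc in [("bike", doc_bike), ("weather", doc_weather)]:
    print(name, round(log_score(query, train_transitions(doc)), 1))
# the document whose transition graph better explains the query scores higher (bike)
```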
    Source
    Journal of the Association for Information Science and Technology. 66(2015) no.8, S.1546-1558
  20. Kim, S.; Ko, Y.; Oard, D.W.: Combining lexical and statistical translation evidence for cross-language information retrieval (2015) 0.00
    0.0029951772 = product of:
      0.017971063 = sum of:
        0.017971063 = weight(_text_:information in 1606) [ClassicSimilarity], result of:
          0.017971063 = score(doc=1606,freq=8.0), product of:
            0.0772133 = queryWeight, product of:
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.043984205 = queryNorm
            0.23274569 = fieldWeight in 1606, product of:
              2.828427 = tf(freq=8.0), with freq of:
                8.0 = termFreq=8.0
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.046875 = fieldNorm(doc=1606)
      0.16666667 = coord(1/6)
    
    Abstract
    This article explores how best to use lexical and statistical translation evidence together for cross-language information retrieval (CLIR). Lexical translation evidence is assembled from Wikipedia and from a large machine-readable dictionary, statistical translation evidence is drawn from parallel corpora, and evidence from co-occurrence in the document language provides a basis for limiting the adverse effect of translation ambiguity. Coverage statistics for NII Testbeds and Community for Information Access Research (NTCIR) queries confirm that these resources have complementary strengths. Experiments with translation evidence from a small parallel corpus indicate that even rather rough estimates of translation probabilities can yield further improvements over a strong technique for translation weighting based on using Jensen-Shannon divergence as a term-association measure. Finally, a novel approach to posttranslation query expansion using a random walk over the Wikipedia concept link graph is shown to yield further improvements over alternative techniques for posttranslation query expansion. Evaluation results on the NTCIR-5 English-Korean test collection show statistically significant improvements over strong baselines.
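    Jensen-Shannon divergence, used above as a term-association measure for translation weighting, can be sketched as follows; the two toy distributions (context profiles of a query term and a candidate translation) are illustrative assumptions, not NTCIR data.

```python
# Hedged sketch of Jensen-Shannon divergence between two term-context distributions.
import math

def kl(p, q):
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def jensen_shannon(p, q):
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

p = [0.5, 0.3, 0.2, 0.0]   # context profile of the source-language term
q = [0.4, 0.4, 0.1, 0.1]   # context profile of one candidate translation
print(round(jensen_shannon(p, q), 4))  # 0.0 = identical, 1.0 = disjoint (log base 2)
```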
    Source
    Journal of the Association for Information Science and Technology. 66(2015) no.1, S.23-39

Types

  • a 71
  • el 7
  • x 4
  • m 2
  • s 1