Search (785 results, page 1 of 40)

  • year_i:[2010 TO 2020}
  1. Huo, W.: Automatic multi-word term extraction and its application to Web-page summarization (2012) 0.16
    0.16028121 = product of:
      0.32056242 = sum of:
        0.32056242 = sum of:
          0.27689505 = weight(_text_:word in 563) [ClassicSimilarity], result of:
            0.27689505 = score(doc=563,freq=16.0), product of:
              0.28165168 = queryWeight, product of:
                5.2432623 = idf(docFreq=634, maxDocs=44218)
                0.05371688 = queryNorm
              0.9831117 = fieldWeight in 563, product of:
                4.0 = tf(freq=16.0), with freq of:
                  16.0 = termFreq=16.0
                5.2432623 = idf(docFreq=634, maxDocs=44218)
                0.046875 = fieldNorm(doc=563)
          0.043667372 = weight(_text_:22 in 563) [ClassicSimilarity], result of:
            0.043667372 = score(doc=563,freq=2.0), product of:
              0.18810736 = queryWeight, product of:
                3.5018296 = idf(docFreq=3622, maxDocs=44218)
                0.05371688 = queryNorm
              0.23214069 = fieldWeight in 563, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                3.5018296 = idf(docFreq=3622, maxDocs=44218)
                0.046875 = fieldNorm(doc=563)
      0.5 = coord(1/2)
    
    Abstract
    In this thesis we propose three new word association measures for multi-word term extraction. We combine these association measures with the LocalMaxs algorithm in our extraction model and compare the results of different multi-word term extraction methods. Our approach is language- and domain-independent and requires no training data. It can be applied to such tasks as text summarization, information retrieval, and document classification. We further explore the potential of using multi-word terms as an effective representation for general web-page summarization. We extract multi-word terms from human-written summaries in a large collection of web pages, and generate the summaries by aligning document words with these multi-word terms. Our system applies machine translation technology to learn the aligning process from a training set and focuses on selecting high-quality multi-word terms from human-written summaries to generate suitable results for web-page summarization.
    Date
    10. 1.2013 19:22:47
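    The relevance breakdowns shown for each result follow Lucene's ClassicSimilarity (TF-IDF) explanation format. As a minimal sketch, the snippet below recomputes the score of result 1 from the factors printed in its explanation; the function and variable names are illustrative and not part of any search-engine API.
    from math import sqrt

    def term_score(freq, idf, query_norm, field_norm):
        # One term's contribution: queryWeight * fieldWeight, as in ClassicSimilarity
        query_weight = idf * query_norm                 # e.g. 5.2432623 * 0.05371688 = 0.28165168
        field_weight = sqrt(freq) * idf * field_norm    # tf(freq) = sqrt(freq)
        return query_weight * field_weight

    # Factors copied from the explanation for doc 563 above
    w_word = term_score(16.0, 5.2432623, 0.05371688, 0.046875)  # ~0.27689505 for _text_:word
    w_22   = term_score(2.0, 3.5018296, 0.05371688, 0.046875)   # ~0.04366737 for _text_:22
    coord  = 0.5  # coord(1/2): one of the two top-level query clauses matched
    print((w_word + w_22) * coord)                               # ~0.16028121
    The same queryNorm (0.05371688) and idf values recur in the other explanations on this page, so any of the listed scores can be checked the same way.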
  2. Hajra, A. et al.: Enriching scientific publications from LOD repositories through word embeddings approach (2016) 0.12
    0.11797046 = product of:
      0.23594092 = sum of:
        0.23594092 = sum of:
          0.16316196 = weight(_text_:word in 3281) [ClassicSimilarity], result of:
            0.16316196 = score(doc=3281,freq=2.0), product of:
              0.28165168 = queryWeight, product of:
                5.2432623 = idf(docFreq=634, maxDocs=44218)
                0.05371688 = queryNorm
              0.5793041 = fieldWeight in 3281, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                5.2432623 = idf(docFreq=634, maxDocs=44218)
                0.078125 = fieldNorm(doc=3281)
          0.072778955 = weight(_text_:22 in 3281) [ClassicSimilarity], result of:
            0.072778955 = score(doc=3281,freq=2.0), product of:
              0.18810736 = queryWeight, product of:
                3.5018296 = idf(docFreq=3622, maxDocs=44218)
                0.05371688 = queryNorm
              0.38690117 = fieldWeight in 3281, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                3.5018296 = idf(docFreq=3622, maxDocs=44218)
                0.078125 = fieldNorm(doc=3281)
      0.5 = coord(1/2)
    
    Source
    Metadata and semantics research: 10th International Conference, MTSR 2016, Göttingen, Germany, November 22-25, 2016, Proceedings. Eds.: E. Garoufallou
  3. Xiong, C.: Knowledge based text representations for information retrieval (2016) 0.11
    0.11339873 = sum of:
      0.056877762 = product of:
        0.17063329 = sum of:
          0.17063329 = weight(_text_:3a in 5820) [ClassicSimilarity], result of:
            0.17063329 = score(doc=5820,freq=2.0), product of:
              0.4554123 = queryWeight, product of:
                8.478011 = idf(docFreq=24, maxDocs=44218)
                0.05371688 = queryNorm
              0.3746787 = fieldWeight in 5820, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                8.478011 = idf(docFreq=24, maxDocs=44218)
                0.03125 = fieldNorm(doc=5820)
        0.33333334 = coord(1/3)
      0.05652097 = product of:
        0.11304194 = sum of:
          0.11304194 = weight(_text_:word in 5820) [ClassicSimilarity], result of:
            0.11304194 = score(doc=5820,freq=6.0), product of:
              0.28165168 = queryWeight, product of:
                5.2432623 = idf(docFreq=634, maxDocs=44218)
                0.05371688 = queryNorm
              0.4013537 = fieldWeight in 5820, product of:
                2.4494898 = tf(freq=6.0), with freq of:
                  6.0 = termFreq=6.0
                5.2432623 = idf(docFreq=634, maxDocs=44218)
                0.03125 = fieldNorm(doc=5820)
        0.5 = coord(1/2)
    
    Abstract
    The successes of information retrieval (IR) in recent decades were built upon bag-of-words representations. Effective as it is, bag-of-words is only a shallow text understanding; there is a limited amount of information for document ranking in the word space. This dissertation goes beyond words and builds knowledge based text representations, which embed the external and carefully curated information from knowledge bases, and provide richer and structured evidence for more advanced information retrieval systems. This thesis research first builds query representations with entities associated with the query. Entities' descriptions are used by query expansion techniques that enrich the query with explanation terms. Then we present a general framework that represents a query with entities that appear in the query, are retrieved by the query, or frequently show up in the top retrieved documents. A latent space model is developed to jointly learn the connections from query to entities and the ranking of documents, modeling the external evidence from knowledge bases and internal ranking features cooperatively. To further improve the quality of relevant entities, a defining factor of our query representations, we introduce learning to rank to entity search and retrieve better entities from knowledge bases. In the document representation part, this thesis research also moves one step forward with a bag-of-entities model, in which documents are represented by their automatic entity annotations, and the ranking is performed in the entity space.
    This proposal includes plans to improve the quality of relevant entities with a co-learning framework that learns from both entity labels and document labels. We also plan to develop a hybrid ranking system that combines word-based and entity-based representations, with their uncertainties taken into account. Lastly, we plan to enrich the text representations with connections between entities. We propose several ways to infer entity graph representations for texts, and to rank documents using their structure representations. This dissertation overcomes the limitation of word-based representations with external and carefully curated information from knowledge bases. We believe this thesis research is a solid start towards the new generation of intelligent, semantic, and structured information retrieval.
    Content
    Submitted in partial fulfillment of the requirements for the degree of Doctor of Philosophy in Language and Information Technologies. Cf.: https://www.cs.cmu.edu/~cx/papers/knowledge_based_text_representation.pdf.
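    As a rough illustration of the bag-of-entities idea summarized in the abstract above, the sketch below represents a query and a document as multisets of entity annotations and ranks in entity space by overlap. The entity IDs and the linking step are hypothetical, and this is not the thesis' actual latent-space or learning-to-rank model.
    from collections import Counter

    def bag_of_entities(annotations):
        # Represent a text by the multiset of entity IDs annotated in it
        return Counter(annotations)

    def entity_overlap_score(query_entities, doc_entities):
        # Rank in entity space: total frequency of the query's entities in the document
        return sum(doc_entities[e] for e in query_entities)

    # Hypothetical annotations, e.g. produced by an entity linker over query and document text
    query = bag_of_entities(["InformationRetrieval"])
    doc   = bag_of_entities(["InformationRetrieval", "InformationRetrieval", "KnowledgeBase"])
    print(entity_overlap_score(query, doc))  # 2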
  4. Murphy, M.L.: Lexical meaning (2010) 0.11
    0.10685414 = product of:
      0.21370828 = sum of:
        0.21370828 = sum of:
          0.1845967 = weight(_text_:word in 998) [ClassicSimilarity], result of:
            0.1845967 = score(doc=998,freq=16.0), product of:
              0.28165168 = queryWeight, product of:
                5.2432623 = idf(docFreq=634, maxDocs=44218)
                0.05371688 = queryNorm
              0.6554078 = fieldWeight in 998, product of:
                4.0 = tf(freq=16.0), with freq of:
                  16.0 = termFreq=16.0
                5.2432623 = idf(docFreq=634, maxDocs=44218)
                0.03125 = fieldNorm(doc=998)
          0.029111583 = weight(_text_:22 in 998) [ClassicSimilarity], result of:
            0.029111583 = score(doc=998,freq=2.0), product of:
              0.18810736 = queryWeight, product of:
                3.5018296 = idf(docFreq=3622, maxDocs=44218)
                0.05371688 = queryNorm
              0.15476047 = fieldWeight in 998, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                3.5018296 = idf(docFreq=3622, maxDocs=44218)
                0.03125 = fieldNorm(doc=998)
      0.5 = coord(1/2)
    
    Abstract
    The ideal introduction for students of semantics, Lexical Meaning fills the gap left by more general semantics textbooks, providing the teacher and the student with insights into word meaning beyond the traditional overviews of lexical relations. The book explores the relationship between word meanings and syntax and semantics more generally. It provides a balanced overview of the main theoretical approaches, along with a lucid explanation of their relative strengths and weaknesses. After covering the main topics in lexical meaning, such as polysemy and sense relations, the textbook surveys the types of meanings represented by different word classes. It explains abstract concepts in clear language, using a wide range of examples, and includes linguistic puzzles in each chapter to encourage the student to practise using the concepts. 'Adopt-a-Word' exercises give students the chance to research a particular word, building a portfolio of specialist work on a single word.
    Content
    Contents (machine-generated note): Part I. Meaning and the Lexicon: 1. The lexicon - some preliminaries; 2. What do we mean by meaning?; 3. Components and prototypes; 4. Modern componential approaches - and some alternatives; Part II. Relations Among Words and Senses: 5. Meaning variation: polysemy, homonymy and vagueness; 6. Lexical and semantic relations; Part III. Word Classes and Semantic Types: 7. Ontological categories and word classes; 8. Nouns and countability; 9. Predication: verbs, events, and states; 10. Verbs and time; 11. Adjectives and properties.
    Date
    22. 7.2013 10:53:30
  5. Verwer, K.: Freiheit und Verantwortung bei Hans Jonas (2011) 0.09
    0.08531664 = product of:
      0.17063329 = sum of:
        0.17063329 = product of:
          0.5118998 = sum of:
            0.5118998 = weight(_text_:3a in 973) [ClassicSimilarity], result of:
              0.5118998 = score(doc=973,freq=2.0), product of:
                0.4554123 = queryWeight, product of:
                  8.478011 = idf(docFreq=24, maxDocs=44218)
                  0.05371688 = queryNorm
                1.1240361 = fieldWeight in 973, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  8.478011 = idf(docFreq=24, maxDocs=44218)
                  0.09375 = fieldNorm(doc=973)
          0.33333334 = coord(1/3)
      0.5 = coord(1/2)
    
    Content
    Cf.: http://creativechoice.org/doc/HansJonas.pdf.
  6. Zitt, M.; Lelu, A.; Bassecoulard, E.: Hybrid citation-word representations in science mapping : Portolan charts of research fields? (2011) 0.08
    0.07588121 = product of:
      0.15176243 = sum of:
        0.15176243 = sum of:
          0.11537294 = weight(_text_:word in 4130) [ClassicSimilarity], result of:
            0.11537294 = score(doc=4130,freq=4.0), product of:
              0.28165168 = queryWeight, product of:
                5.2432623 = idf(docFreq=634, maxDocs=44218)
                0.05371688 = queryNorm
              0.40962988 = fieldWeight in 4130, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                5.2432623 = idf(docFreq=634, maxDocs=44218)
                0.0390625 = fieldNorm(doc=4130)
          0.036389478 = weight(_text_:22 in 4130) [ClassicSimilarity], result of:
            0.036389478 = score(doc=4130,freq=2.0), product of:
              0.18810736 = queryWeight, product of:
                3.5018296 = idf(docFreq=3622, maxDocs=44218)
                0.05371688 = queryNorm
              0.19345059 = fieldWeight in 4130, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                3.5018296 = idf(docFreq=3622, maxDocs=44218)
                0.0390625 = fieldNorm(doc=4130)
      0.5 = coord(1/2)
    
    Abstract
    The mapping of scientific fields, based on principles established in the seventies, has recently shown remarkable development, and applications are now booming with progress in computing efficiency. We examine here the convergence of two thematic mapping approaches, citation-based and word-based, which rely on quite different sociological backgrounds. A corpus in the nanoscience field was broken down into research themes, using the same clustering technique on the two networks separately. The tool for comparison is the table of intersections of the M clusters (here M=50) built on either side. A classical visual exploitation of such contingency tables is based on correspondence analysis. We investigate a rearrangement of the intersection table (block modeling), resulting in a pseudo-map. The interest of this representation for confronting the two breakdowns is discussed. The amount of convergence found is, in our view, a strong argument in favor of the reliability of bibliometric mapping. However, the outcomes are not convergent to the degree that they can be substituted for each other. Differences highlight the complementarity between approaches based on different networks. In contrast with the strong informetric posture found in recent literature, where lexical and citation markers are considered miscible tokens, the framework proposed here does not mix the two elements at an early stage, in compliance with their contrasting logics.
    Date
    8. 1.2011 18:22:50
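    A minimal sketch of the comparison tool described in the abstract above: the M x M table of intersections between the citation-based and word-based cluster solutions. The cluster assignments here are toy data, and the clustering step itself is not shown.
    import numpy as np

    def intersection_table(citation_clusters, word_clusters, m):
        # Cell (i, j) counts documents assigned to citation cluster i and word cluster j
        table = np.zeros((m, m), dtype=int)
        for ci, wj in zip(citation_clusters, word_clusters):
            table[ci, wj] += 1
        return table

    # Six toy documents clustered two ways (cluster IDs 0..m-1); in the paper m = 50
    print(intersection_table([0, 0, 1, 1, 2, 2], [0, 1, 1, 1, 2, 2], m=3))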
  7. Besler, G.; Szulc, J.: Gottlob Frege's theory of definition as useful tool for knowledge organization : definition of 'context' - case study (2014) 0.08
    0.07588121 = product of:
      0.15176243 = sum of:
        0.15176243 = sum of:
          0.11537294 = weight(_text_:word in 1440) [ClassicSimilarity], result of:
            0.11537294 = score(doc=1440,freq=4.0), product of:
              0.28165168 = queryWeight, product of:
                5.2432623 = idf(docFreq=634, maxDocs=44218)
                0.05371688 = queryNorm
              0.40962988 = fieldWeight in 1440, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                5.2432623 = idf(docFreq=634, maxDocs=44218)
                0.0390625 = fieldNorm(doc=1440)
          0.036389478 = weight(_text_:22 in 1440) [ClassicSimilarity], result of:
            0.036389478 = score(doc=1440,freq=2.0), product of:
              0.18810736 = queryWeight, product of:
                3.5018296 = idf(docFreq=3622, maxDocs=44218)
                0.05371688 = queryNorm
              0.19345059 = fieldWeight in 1440, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                3.5018296 = idf(docFreq=3622, maxDocs=44218)
                0.0390625 = fieldNorm(doc=1440)
      0.5 = coord(1/2)
    
    Abstract
    The aim of this paper is to analyze Gottlob Frege's (1848-1925) theory of definition as a tool for knowledge organization. The objective is achieved by discussing the theory of definition, including the aims of definition, kinds of definition, the conditions of a correct definition, and what is undefinable. Frege indicated the following aims of defining: (1) to introduce a new word which has had no precise meaning until then; (2) to explain the meaning of a word; (3) to catch a thought. We present three kinds of definitions used by Frege: the contextual definition, the stipulative definition, and the piecemeal definition. In the history of the theory of definition, Frege was the first to formulate the conditions of a correct definition. According to Frege, not everything can be defined; what is logically simple cannot have a proper definition. The usability of Frege's theory of definition is illustrated in the case study. The definitions that serve as an example are definitions of 'context', a term used in different situations and meanings in the field of knowledge organization. The paper is rounded off by a discussion of how Frege's theory of definition can be useful for knowledge organization. To present Frege's theory of definition in view of the needs of knowledge organization, we start with the different ranges of knowledge organization.
    Source
    Knowledge organization in the 21st century: between historical patterns and future prospects. Proceedings of the Thirteenth International ISKO Conference 19-22 May 2014, Kraków, Poland. Ed.: Wieslaw Babik
  8. Jiang, Z.; Gu, Q.; Yin, Y.; Wang, J.; Chen, D.: GRAW+ : a two-view graph propagation method with word coupling for readability assessment (2019) 0.08
    0.07588121 = product of:
      0.15176243 = sum of:
        0.15176243 = sum of:
          0.11537294 = weight(_text_:word in 5218) [ClassicSimilarity], result of:
            0.11537294 = score(doc=5218,freq=4.0), product of:
              0.28165168 = queryWeight, product of:
                5.2432623 = idf(docFreq=634, maxDocs=44218)
                0.05371688 = queryNorm
              0.40962988 = fieldWeight in 5218, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                5.2432623 = idf(docFreq=634, maxDocs=44218)
                0.0390625 = fieldNorm(doc=5218)
          0.036389478 = weight(_text_:22 in 5218) [ClassicSimilarity], result of:
            0.036389478 = score(doc=5218,freq=2.0), product of:
              0.18810736 = queryWeight, product of:
                3.5018296 = idf(docFreq=3622, maxDocs=44218)
                0.05371688 = queryNorm
              0.19345059 = fieldWeight in 5218, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                3.5018296 = idf(docFreq=3622, maxDocs=44218)
                0.0390625 = fieldNorm(doc=5218)
      0.5 = coord(1/2)
    
    Abstract
    Existing methods for readability assessment usually construct inductive classification models to assess the readability of individual text documents based on extracted features, which have been demonstrated to be effective. However, they rarely make use of the interrelationship among documents on readability, which can help increase the accuracy of readability assessment. In this article, we adopt a graph-based classification method to model and utilize the relationship among documents using the coupled bag-of-words model. We propose a word coupling method to build the coupled bag-of-words model by estimating the correlation between words on reading difficulty. In addition, we propose a two-view graph propagation method to make use of both the coupled bag-of-words model and the linguistic features. Our method employs a graph merging operation to combine graphs built according to different views, and improves the label propagation by incorporating the ordinal relation among reading levels. Experiments were conducted on both English and Chinese data sets, and the results demonstrate both the effectiveness and the potential of the method.
    Date
    15. 4.2019 13:46:22
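    The method described above builds on label propagation over a document graph. The sketch below shows only that generic propagation step under simple assumptions (a dense similarity matrix, clamped labels); it is not the authors' GRAW+ method, which additionally couples bags of words, merges two graph views, and respects the ordinal relation among reading levels.
    import numpy as np

    def propagate_labels(W, Y, labeled_mask, iters=50):
        # W: (n, n) document-similarity matrix; Y: (n, k) one-hot labels, zero rows for unlabeled docs
        P = W / W.sum(axis=1, keepdims=True)   # row-normalize to transition probabilities
        F = Y.astype(float).copy()
        for _ in range(iters):
            F = P @ F                          # spread label mass along graph edges
            F[labeled_mask] = Y[labeled_mask]  # clamp the documents whose level is known
        return F.argmax(axis=1)                # predicted reading level per document

    # Tiny example: 3 documents, 2 reading levels, document 2 unlabeled
    W = np.array([[1.0, 0.2, 0.9], [0.2, 1.0, 0.3], [0.9, 0.3, 1.0]])
    Y = np.array([[1, 0], [0, 1], [0, 0]])
    print(propagate_labels(W, Y, labeled_mask=np.array([True, True, False])))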
  9. Kleineberg, M.: Context analysis and context indexing : formal pragmatics in knowledge organization (2014) 0.07
    0.07109721 = product of:
      0.14219442 = sum of:
        0.14219442 = product of:
          0.42658323 = sum of:
            0.42658323 = weight(_text_:3a in 1826) [ClassicSimilarity], result of:
              0.42658323 = score(doc=1826,freq=2.0), product of:
                0.4554123 = queryWeight, product of:
                  8.478011 = idf(docFreq=24, maxDocs=44218)
                  0.05371688 = queryNorm
                0.93669677 = fieldWeight in 1826, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  8.478011 = idf(docFreq=24, maxDocs=44218)
                  0.078125 = fieldNorm(doc=1826)
          0.33333334 = coord(1/3)
      0.5 = coord(1/2)
    
    Source
    http://digbib.ubka.uni-karlsruhe.de/volltexte/documents/3131107
  10. Doorn, M. van; Polman, K.: From classification to thesaurus ... and back? : subject indexing tools at the library of the Afrika-Studiecentrum Leiden (2010) 0.07
    0.070782274 = product of:
      0.14156455 = sum of:
        0.14156455 = sum of:
          0.09789718 = weight(_text_:word in 4062) [ClassicSimilarity], result of:
            0.09789718 = score(doc=4062,freq=2.0), product of:
              0.28165168 = queryWeight, product of:
                5.2432623 = idf(docFreq=634, maxDocs=44218)
                0.05371688 = queryNorm
              0.34758246 = fieldWeight in 4062, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                5.2432623 = idf(docFreq=634, maxDocs=44218)
                0.046875 = fieldNorm(doc=4062)
          0.043667372 = weight(_text_:22 in 4062) [ClassicSimilarity], result of:
            0.043667372 = score(doc=4062,freq=2.0), product of:
              0.18810736 = queryWeight, product of:
                3.5018296 = idf(docFreq=3622, maxDocs=44218)
                0.05371688 = queryNorm
              0.23214069 = fieldWeight in 4062, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                3.5018296 = idf(docFreq=3622, maxDocs=44218)
                0.046875 = fieldNorm(doc=4062)
      0.5 = coord(1/2)
    
    Abstract
    An African Studies Thesaurus was constructed for the purpose of subject indexing and retrieval in the Library of the African Studies Centre (ASC) in Leiden in 2001-2006. A word-based system was considered a more user-friendly alternative to the Universal Decimal Classification (UDC) codes which were used for subject access in the ASC catalogue at the time. In the process of thesaurus construction UDC codes were used as a starting point. In addition, when constructing the thesaurus, each descriptor was also assigned a UDC code from the recent edition of the UDC Master Reference File (MRF), thus replacing many of the old UDC codes used by then, some of which dated from the 1952 French edition. The presence of the UDC codes in the thesaurus leaves open the possibility of linking the thesaurus to different language versions of the UDC MRF in the future. In a parallel but separate operation each UDC code which had been assigned to an item in the library's catalogue was subsequently converted into one or more thesaurus descriptors.
    Date
    22. 7.2010 19:48:33
  11. Zhu, Q.; Kong, X.; Hong, S.; Li, J.; He, Z.: Global ontology research progress : a bibliometric analysis (2015) 0.07
    0.066521734 = product of:
      0.13304347 = sum of:
        0.13304347 = sum of:
          0.08158098 = weight(_text_:word in 2590) [ClassicSimilarity], result of:
            0.08158098 = score(doc=2590,freq=2.0), product of:
              0.28165168 = queryWeight, product of:
                5.2432623 = idf(docFreq=634, maxDocs=44218)
                0.05371688 = queryNorm
              0.28965205 = fieldWeight in 2590, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                5.2432623 = idf(docFreq=634, maxDocs=44218)
                0.0390625 = fieldNorm(doc=2590)
          0.05146249 = weight(_text_:22 in 2590) [ClassicSimilarity], result of:
            0.05146249 = score(doc=2590,freq=4.0), product of:
              0.18810736 = queryWeight, product of:
                3.5018296 = idf(docFreq=3622, maxDocs=44218)
                0.05371688 = queryNorm
              0.27358043 = fieldWeight in 2590, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                3.5018296 = idf(docFreq=3622, maxDocs=44218)
                0.0390625 = fieldNorm(doc=2590)
      0.5 = coord(1/2)
    
    Abstract
    Purpose - The purpose of this paper is to analyse the global scientific outputs of ontology research, an important emerging discipline that has huge potential to improve information understanding, organization, and management. Design/methodology/approach - This study collected literature published during 1900-2012 from the Web of Science database. The bibliometric analysis was performed from authorial, institutional, national, spatiotemporal, and topical aspects. Basic statistical analysis, visualization of geographic distribution, co-word analysis, and a new index were applied to the selected data. Findings - Characteristics of publication outputs suggested that ontology research has entered a soaring stage, along with increased participation and collaboration. The authors identified the leading authors, institutions, nations, and articles in ontology research. Authors were mostly from North America, Europe, and East Asia. The USA took the lead, while China grew fastest. Four major categories of frequently used keywords were identified: applications in the Semantic Web, applications in bioinformatics, philosophy theories, and common supporting technology. Semantic Web research played a core role, and gene ontology study was well developed. The study focus of ontology has shifted from philosophy to information science. Originality/value - This is the first study to quantify global research patterns and trends in ontology, which may provide a guide for future research. The new index provides an alternative way to evaluate the multidisciplinary influence of researchers.
    Date
    20. 1.2015 18:30:22
    17. 9.2018 18:22:23
  12. Leginus, M.; Zhai, C.X.; Dolog, P.: Personalized generation of word clouds from tweets (2016) 0.06
    0.06475291 = product of:
      0.12950581 = sum of:
        0.12950581 = product of:
          0.25901163 = sum of:
            0.25901163 = weight(_text_:word in 2886) [ClassicSimilarity], result of:
              0.25901163 = score(doc=2886,freq=14.0), product of:
                0.28165168 = queryWeight, product of:
                  5.2432623 = idf(docFreq=634, maxDocs=44218)
                  0.05371688 = queryNorm
                0.9196168 = fieldWeight in 2886, product of:
                  3.7416575 = tf(freq=14.0), with freq of:
                    14.0 = termFreq=14.0
                  5.2432623 = idf(docFreq=634, maxDocs=44218)
                  0.046875 = fieldNorm(doc=2886)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Abstract
    Active users of Twitter are often overwhelmed by the vast number of tweets. In this work we attempt to help users browse a large number of accumulated posts. We propose personalized word cloud generation as a means of navigation for users. Various past user activities, such as the user's published tweets, retweets, and tweets seen but not retweeted, are leveraged for enhanced personalization of word clouds. The best personalization results are attained with the user's past retweets. However, users' own past tweets are not as useful as retweets for personalization. Negative preferences derived from seen-but-not-retweeted tweets further enhance personalized word cloud generation. The ranking combination method outperforms the preranking approach and provides a general framework for the combined ranking of various kinds of past user information for enhanced word cloud generation. To better capture subtle differences between generated word clouds, we propose an evaluation of word clouds with a mean average precision measure.
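    The "ranking combination" framework mentioned above is sketched below in a very reduced form: several term rankings derived from different user signals are merged into one score per term, and the top terms form the cloud. The reciprocal-rank weighting and the signal weights are assumptions for illustration, not the combination actually used in the paper.
    from collections import defaultdict

    def combined_ranking(rankings, weights, k):
        # Merge several ranked term lists; earlier positions contribute more, and negative
        # weights (e.g. for seen-but-not-retweeted content) push terms down
        scores = defaultdict(float)
        for ranking, w in zip(rankings, weights):
            for pos, term in enumerate(ranking):
                scores[term] += w / (pos + 1)
        return sorted(scores, key=scores.get, reverse=True)[:k]

    retweets  = ["nlp", "ir", "twitter"]
    own_posts = ["ir", "maps"]
    seen_only = ["ads", "nlp"]
    print(combined_ranking([retweets, own_posts, seen_only], [1.0, 0.5, -0.3], k=3))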
  13. Rauber, A.: Digital preservation in data-driven science : on the importance of process capture, preservation and validation (2012) 0.06
    0.058228545 = product of:
      0.11645709 = sum of:
        0.11645709 = product of:
          0.34937125 = sum of:
            0.34937125 = weight(_text_:object's in 469) [ClassicSimilarity], result of:
              0.34937125 = score(doc=469,freq=2.0), product of:
                0.53207254 = queryWeight, product of:
                  9.905128 = idf(docFreq=5, maxDocs=44218)
                  0.05371688 = queryNorm
                0.65662336 = fieldWeight in 469, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  9.905128 = idf(docFreq=5, maxDocs=44218)
                  0.046875 = fieldNorm(doc=469)
          0.33333334 = coord(1/3)
      0.5 = coord(1/2)
    
    Abstract
    Current digital preservation is strongly biased towards data objects: digital files of document-style objects, or encapsulated and largely self-contained objects. To provide authenticity and provenance information, comprehensive metadata models are deployed to document information on an object's context. Yet, we claim that simply documenting an object's context may not be sufficient to ensure proper provenance and to fulfill the stated preservation goals. Specifically in e-Science and business settings, capturing, documenting and preserving entire processes may be necessary to meet the preservation goals. We thus present an approach for capturing, documenting and preserving processes, and means to assess their authenticity upon re-execution. We will discuss options as well as limitations and open challenges to achieve sound preservation, specifically within scientific processes.
  14. Leydesdorff, L.; Nerghes, A.: Co-word maps and topic modeling : a comparison using small and medium-sized corpora (N < 1,000) (2017) 0.05
    0.049957946 = product of:
      0.09991589 = sum of:
        0.09991589 = product of:
          0.19983178 = sum of:
            0.19983178 = weight(_text_:word in 3538) [ClassicSimilarity], result of:
              0.19983178 = score(doc=3538,freq=12.0), product of:
                0.28165168 = queryWeight, product of:
                  5.2432623 = idf(docFreq=634, maxDocs=44218)
                  0.05371688 = queryNorm
                0.7094997 = fieldWeight in 3538, product of:
                  3.4641016 = tf(freq=12.0), with freq of:
                    12.0 = termFreq=12.0
                  5.2432623 = idf(docFreq=634, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=3538)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Abstract
    Induced by "big data," "topic modeling" has become an attractive alternative to mapping co-words in terms of co-occurrences and co-absences using network techniques. Does topic modeling provide an alternative for co-word mapping in research practices using moderately sized document collections? We return to the word/document matrix using first a single text with a strong argument ("The Leiden Manifesto") and then upscale to a sample of moderate size (n?=?687) to study the pros and cons of the two approaches in terms of the resulting possibilities for making semantic maps that can serve an argument. The results from co-word mapping (using two different routines) versus topic modeling are significantly uncorrelated. Whereas components in the co-word maps can easily be designated, the topic models provide sets of words that are very differently organized. In these samples, the topic models seem to reveal similarities other than semantic ones (e.g., linguistic ones). In other words, topic modeling does not replace co-word mapping in small and medium-sized sets; but the paper leaves open the possibility that topic modeling would work well for the semantic mapping of large sets.
  15. Doval, Y.; Gómez-Rodríguez, C.: Comparing neural- and N-gram-based language models for word segmentation (2019) 0.05
    0.049957946 = product of:
      0.09991589 = sum of:
        0.09991589 = product of:
          0.19983178 = sum of:
            0.19983178 = weight(_text_:word in 4675) [ClassicSimilarity], result of:
              0.19983178 = score(doc=4675,freq=12.0), product of:
                0.28165168 = queryWeight, product of:
                  5.2432623 = idf(docFreq=634, maxDocs=44218)
                  0.05371688 = queryNorm
                0.7094997 = fieldWeight in 4675, product of:
                  3.4641016 = tf(freq=12.0), with freq of:
                    12.0 = termFreq=12.0
                  5.2432623 = idf(docFreq=634, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=4675)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Abstract
    Word segmentation is the task of inserting or deleting word boundary characters in order to separate character sequences that correspond to words in some language. In this article we propose an approach based on a beam search algorithm and a language model working at the byte/character level, the latter component implemented either as an n-gram model or a recurrent neural network. The resulting system analyzes the text input with no word boundaries one token at a time, which can be a character or a byte, and uses the information gathered by the language model to determine whether a boundary must be placed in the current position or not. Our aim is to use this system in a preprocessing step for a microtext normalization system. This means that it needs to cope effectively with the data sparsity present in this kind of text. We also strove to surpass the performance of two readily available word segmentation systems: the well-known and accessible Word Breaker by Microsoft, and the Python module WordSegment by Grant Jenks. The results show that we have met our objectives, and we hope to continue to improve both the precision and the efficiency of our system in the future.
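    The segmentation procedure described above (a character-level language model driving a beam search over boundary decisions) might look roughly like the sketch below. lm_logprob is a placeholder for the trained n-gram or recurrent model; treating a boundary as a space character and the beam bookkeeping are assumptions for illustration, not the authors' implementation.
    import heapq

    def segment(text, lm_logprob, beam_width=8):
        # lm_logprob(context, char) -> log-probability of char given the segmented context so far
        beams = [(0.0, "")]                 # (cumulative log-prob, segmented prefix)
        for ch in text:
            candidates = []
            for score, prefix in beams:
                # Option 1: continue the current word
                candidates.append((score + lm_logprob(prefix, ch), prefix + ch))
                # Option 2: place a word boundary (modeled as a space) before this character
                if prefix:
                    s = score + lm_logprob(prefix, " ") + lm_logprob(prefix + " ", ch)
                    candidates.append((s, prefix + " " + ch))
            beams = heapq.nlargest(beam_width, candidates)   # keep the best hypotheses
        return max(beams)[1]

    # Usage with any scoring function, e.g. a trained character n-gram model:
    # segment("wordsegmentation", my_char_lm.logprob)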
  16. Gödert, W.; Lepsky, K.: Informationelle Kompetenz : ein humanistischer Entwurf (2019) 0.05
    0.049768046 = product of:
      0.09953609 = sum of:
        0.09953609 = product of:
          0.29860827 = sum of:
            0.29860827 = weight(_text_:3a in 5955) [ClassicSimilarity], result of:
              0.29860827 = score(doc=5955,freq=2.0), product of:
                0.4554123 = queryWeight, product of:
                  8.478011 = idf(docFreq=24, maxDocs=44218)
                  0.05371688 = queryNorm
                0.65568775 = fieldWeight in 5955, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  8.478011 = idf(docFreq=24, maxDocs=44218)
                  0.0546875 = fieldNorm(doc=5955)
          0.33333334 = coord(1/3)
      0.5 = coord(1/2)
    
    Footnote
    Reviewed in: Philosophisch-ethische Rezensionen, 09.11.2019 (Jürgen Czogalla), at: https://philosophisch-ethische-rezensionen.de/rezension/Goedert1.html. In: B.I.T. online 23(2020) no.3, pp.345-347 (W. Sühl-Strohmenger) [at: https://www.b-i-t-online.de/heft/2020-03-rezensionen.pdf]. In: Open Password no.805, 14.08.2020 (H.-C. Hobohm) [at: https://www.password-online.de/?mailpoet_router&endpoint=view_in_browser&action=view&data=WzE0MywiOGI3NjZkZmNkZjQ1IiwwLDAsMTMxLDFd].
  17. Eversberg, B.: Allegro-Fortbildung 2015 (2015) 0.05
    0.04894859 = product of:
      0.09789718 = sum of:
        0.09789718 = product of:
          0.19579436 = sum of:
            0.19579436 = weight(_text_:word in 1123) [ClassicSimilarity], result of:
              0.19579436 = score(doc=1123,freq=2.0), product of:
                0.28165168 = queryWeight, product of:
                  5.2432623 = idf(docFreq=634, maxDocs=44218)
                  0.05371688 = queryNorm
                0.6951649 = fieldWeight in 1123, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  5.2432623 = idf(docFreq=634, maxDocs=44218)
                  0.09375 = fieldNorm(doc=1123)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Content
    Also available as a Word file at: http://www.allegro-c.de/fb/fb15.docx.
  18. Ding, W.; Chen, C.: Dynamic topic detection and tracking : a comparison of HDP, C-word, and cocitation methods (2014) 0.05
    0.04894859 = product of:
      0.09789718 = sum of:
        0.09789718 = product of:
          0.19579436 = sum of:
            0.19579436 = weight(_text_:word in 1502) [ClassicSimilarity], result of:
              0.19579436 = score(doc=1502,freq=8.0), product of:
                0.28165168 = queryWeight, product of:
                  5.2432623 = idf(docFreq=634, maxDocs=44218)
                  0.05371688 = queryNorm
                0.6951649 = fieldWeight in 1502, product of:
                  2.828427 = tf(freq=8.0), with freq of:
                    8.0 = termFreq=8.0
                  5.2432623 = idf(docFreq=634, maxDocs=44218)
                  0.046875 = fieldNorm(doc=1502)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Abstract
    Cocitation and co-word methods have long been used to detect and track emerging topics in scientific literature, but both have weaknesses. Recently, while many researchers have adopted generative probabilistic models for topic detection and tracking, few have compared generative probabilistic models with traditional cocitation and co-word methods in terms of their overall performance. In this article, we compare the performance of hierarchical Dirichlet process (HDP), a promising generative probabilistic model, with that of the 2 traditional topic detecting and tracking methods-cocitation analysis and co-word analysis. We visualize and explore the relationships between topics identified by the 3 methods in hierarchical edge bundling graphs and time flow graphs. Our result shows that HDP is more sensitive and reliable than the other 2 methods in both detecting and tracking emerging topics. Furthermore, we demonstrate the important topics and topic evolution trends in the literature of terrorism research with the HDP method.
  19. Xinglin, L.: Automatic summarization method based on compound word recognition (2015) 0.05
    0.04894859 = product of:
      0.09789718 = sum of:
        0.09789718 = product of:
          0.19579436 = sum of:
            0.19579436 = weight(_text_:word in 1841) [ClassicSimilarity], result of:
              0.19579436 = score(doc=1841,freq=8.0), product of:
                0.28165168 = queryWeight, product of:
                  5.2432623 = idf(docFreq=634, maxDocs=44218)
                  0.05371688 = queryNorm
                0.6951649 = fieldWeight in 1841, product of:
                  2.828427 = tf(freq=8.0), with freq of:
                    8.0 = termFreq=8.0
                  5.2432623 = idf(docFreq=634, maxDocs=44218)
                  0.046875 = fieldNorm(doc=1841)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Abstract
    After analyzing the main methods of automatic summarization in use today, we find that they all ignore the weight of unknown words in the sentence. To overcome this problem, a method for automatic summarization based on compound word recognition is proposed. In this method, compound words in the text are first identified and the word segmentation is corrected. A keyword set is then extracted from the Chinese documents, and sentence weights are calculated from the weights of that keyword set; because the weights of compound words are calculated with a separate weighting formula, a corresponding total weight can be determined for each sentence. Finally, the sentences with the highest weights are selected by percentage and output in their original order to form the summary. Experiments were conducted on the HIT IR-lab Text Summarization Corpus; the results show that the proposed method achieves a precision of 76.51%, and we conclude that it is applicable to automatic summarization with good results.
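    The weighting-and-selection step described above can be reduced to the sketch below: each sentence is scored by the summed weights of the keywords it contains, and the highest-weighted sentences are output in their original order. The keyword weights, the substring containment test, and the selection ratio are illustrative assumptions; the compound word recognition itself is not shown.
    def summarize(sentences, keyword_weights, ratio=0.3):
        # Score each sentence by the total weight of the keywords it contains
        scores = [sum(w for kw, w in keyword_weights.items() if kw in s.lower())
                  for s in sentences]
        n_keep = max(1, int(len(sentences) * ratio))
        ranked = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)
        keep = sorted(ranked[:n_keep])                  # restore original order
        return [sentences[i] for i in keep]

    keyword_weights = {"compound word": 3.0, "segmentation": 2.0, "summarization": 1.5}
    doc = ["Compound word recognition corrects the segmentation.",
           "The weather was fine.",
           "Sentence weights drive the summarization output."]
    print(summarize(doc, keyword_weights, ratio=0.5))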
  20. Borchers, D.: ¬Eine kleine Geschichte der Textverarbeitung (2019) 0.05
    0.04894859 = product of:
      0.09789718 = sum of:
        0.09789718 = product of:
          0.19579436 = sum of:
            0.19579436 = weight(_text_:word in 5422) [ClassicSimilarity], result of:
              0.19579436 = score(doc=5422,freq=2.0), product of:
                0.28165168 = queryWeight, product of:
                  5.2432623 = idf(docFreq=634, maxDocs=44218)
                  0.05371688 = queryNorm
                0.6951649 = fieldWeight in 5422, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  5.2432623 = idf(docFreq=634, maxDocs=44218)
                  0.09375 = fieldNorm(doc=5422)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Abstract
    Almost 70 years ago the first word processor began to replace the typewriter. The term was coined by a former fighter pilot.

Languages

  • e 588
  • d 187
  • a 1
  • hu 1

Types

  • a 686
  • el 72
  • m 49
  • s 18
  • x 16
  • r 7
  • b 5
  • i 1
  • z 1
