Search (126 results, page 1 of 7)

  • Filter: year_i:[2020 TO 2030}
  1. Lee, Y.-Y.; Ke, H.; Yen, T.-Y.; Huang, H.-H.; Chen, H.-H.: Combining and learning word embedding with WordNet for semantic relatedness and similarity measurement (2020) 0.05
    0.04894859 = product of:
      0.09789718 = sum of:
        0.09789718 = product of:
          0.19579436 = sum of:
            0.19579436 = weight(_text_:word in 5871) [ClassicSimilarity], result of:
              0.19579436 = score(doc=5871,freq=8.0), product of:
                0.28165168 = queryWeight, product of:
                  5.2432623 = idf(docFreq=634, maxDocs=44218)
                  0.05371688 = queryNorm
                0.6951649 = fieldWeight in 5871, product of:
                  2.828427 = tf(freq=8.0), with freq of:
                    8.0 = termFreq=8.0
                  5.2432623 = idf(docFreq=634, maxDocs=44218)
                  0.046875 = fieldNorm(doc=5871)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Abstract
    In this research, we propose 3 different approaches to measure the semantic relatedness between 2 words: (i) boost the performance of the GloVe word embedding model by removing or transforming abnormal dimensions; (ii) linearly combine the information extracted from WordNet and word embeddings; and (iii) utilize word embeddings and 12 types of linguistic information extracted from WordNet as features for Support Vector Regression. We conducted our experiments on 8 benchmark data sets and computed Spearman correlations between the outputs of our methods and the ground truth. We report our results together with 3 state-of-the-art approaches. The experimental results show that our method can outperform state-of-the-art approaches on all the selected English benchmark data sets.
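    The relevance figure attached to each result above is a Lucene ClassicSimilarity (TF-IDF) explain tree. As a sanity check on how those numbers combine, the sketch below recomputes entry 1's score from the constants printed in its tree; it is an illustration of Lucene's classic formula only, not part of the indexed record.

    ```python
    import math

    # Constants copied from the explain tree for entry 1 (term "word", doc 5871).
    freq = 8.0                 # termFreq
    idf = 5.2432623            # matches 1 + ln(maxDocs / (docFreq + 1)) = 1 + ln(44218 / 635)
    query_norm = 0.05371688
    field_norm = 0.046875

    tf = math.sqrt(freq)                       # 2.828427
    query_weight = idf * query_norm            # 0.28165168
    field_weight = tf * idf * field_norm       # 0.6951649
    raw_score = query_weight * field_weight    # 0.19579436

    # Two nested coord(1/2) factors halve the raw score twice.
    final_score = raw_score * 0.5 * 0.5
    print(f"{final_score:.8f}")                # ~0.04894859, the score shown for entry 1
    ```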
  2. Alipour, O.; Soheili, F.; Khasseh, A.A.: ¬A co-word analysis of global research on knowledge organization: 1900-2019 (2022) 0.05
    0.04894859 = product of:
      0.09789718 = sum of:
        0.09789718 = product of:
          0.19579436 = sum of:
            0.19579436 = weight(_text_:word in 1106) [ClassicSimilarity], result of:
              0.19579436 = score(doc=1106,freq=18.0), product of:
                0.28165168 = queryWeight, product of:
                  5.2432623 = idf(docFreq=634, maxDocs=44218)
                  0.05371688 = queryNorm
                0.6951649 = fieldWeight in 1106, product of:
                  4.2426405 = tf(freq=18.0), with freq of:
                    18.0 = termFreq=18.0
                  5.2432623 = idf(docFreq=634, maxDocs=44218)
                  0.03125 = fieldNorm(doc=1106)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Abstract
    The study's objective is to analyze the structure of knowledge organization studies conducted worldwide. This applied research was conducted with a scientometrics approach using co-word analysis. The research records consisted of all articles published in the journals Knowledge Organization and Cataloging & Classification Quarterly, together with keywords related to the field of knowledge organization indexed in Web of Science from 1900 to 2019; in total, 17,950 records in plain-text format were analyzed. The total number of keywords was 25,480, which was reduced to 12,478 keywords after modifications and removal of duplicates. Then, 115 keywords with a frequency of at least 18 were included in the final analysis, and the co-word network was drawn. BibExcel, UCINET, VOSviewer, and SPSS were used to build the matrices, analyze the co-word networks, and draw dendrograms; strategic diagrams were drawn in Excel. The keywords "information retrieval," "classification," and "ontology" are among the most frequently used keywords in knowledge organization articles. Findings revealed that "Ontology*Semantic Web", "Digital Library*Information Retrieval" and "Indexing*Information Retrieval" are the most frequent co-word pairs. The results of hierarchical clustering indicated that global research on knowledge organization consists of eight main thematic clusters; the largest covers "classification, indexing, and information retrieval," while the smallest deal with "data processing" and "theoretical concepts of information and knowledge organization." Cluster 1 (cataloging standards and knowledge organization) has the highest density, while Cluster 5 (classification, indexing, and information retrieval) has the highest centrality. According to the findings, the keyword "information retrieval" has played a significant role in knowledge organization studies, both as a single keyword and within co-word pairs, where its pairings mostly reflect related or broader topics. The results indicate that information retrieval is one of the main topics in knowledge organization, while its theoretical concepts have been neglected. In general, the co-word structure of knowledge organization research reflects the multiplicity of concepts and topics studied in this field globally.
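    For readers unfamiliar with the method described above, the core of a co-word analysis is a keyword co-occurrence count built from each record's keyword list. The sketch below is a minimal, generic illustration with invented keyword lists; it is not the authors' BibExcel/UCINET/VOSviewer workflow.

    ```python
    from collections import Counter
    from itertools import combinations

    # Hypothetical keyword lists, one per bibliographic record.
    records = [
        ["information retrieval", "classification", "indexing"],
        ["ontology", "semantic web", "information retrieval"],
        ["digital library", "information retrieval"],
    ]

    # Count co-occurring keyword pairs (the co-word links).
    pair_counts = Counter()
    for keywords in records:
        for a, b in combinations(sorted(set(keywords)), 2):
            pair_counts[(a, b)] += 1

    # The most frequent pairs correspond to the strongest co-word links.
    for (a, b), n in pair_counts.most_common(5):
        print(f"{a} * {b}: {n}")
    ```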
  3. Soni, S.; Lerman, K.; Eisenstein, J.: Follow the leader : documents on the leading edge of semantic change get more citations (2021) 0.05
    0.04560516 = product of:
      0.09121032 = sum of:
        0.09121032 = product of:
          0.18242064 = sum of:
            0.18242064 = weight(_text_:word in 169) [ClassicSimilarity], result of:
              0.18242064 = score(doc=169,freq=10.0), product of:
                0.28165168 = queryWeight, product of:
                  5.2432623 = idf(docFreq=634, maxDocs=44218)
                  0.05371688 = queryNorm
                0.6476817 = fieldWeight in 169, product of:
                  3.1622777 = tf(freq=10.0), with freq of:
                    10.0 = termFreq=10.0
                  5.2432623 = idf(docFreq=634, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=169)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Abstract
    Diachronic word embeddings, vector representations of words over time, offer remarkable insights into the evolution of language and provide a tool for quantifying sociocultural change from text documents. Prior work has used such embeddings to identify shifts in the meaning of individual words. However, simply knowing that a word has changed in meaning is insufficient to identify the instances of word usage that convey the historical meaning or the newer meaning. In this study, we link diachronic word embeddings to documents by situating those documents as leaders or laggards with respect to ongoing semantic changes. Specifically, we propose a novel method to quantify the degree of semantic progressiveness in each word usage, and then show how these usages can be aggregated to obtain scores for each document. We analyze two large collections of documents, representing legal opinions and scientific articles. Documents that are scored as semantically progressive receive a larger number of citations, indicating that they are especially influential. Our work thus provides a new technique for identifying lexical semantic leaders and demonstrates a new link between progressive use of language and influence in a citation network.
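    The abstract does not spell out how a usage's semantic progressiveness is computed, so the sketch below shows one plausible reading only: a usage counts as progressive if its context vector sits closer to the word's later-period embedding than to its earlier one, and a document's score is the average over its usages. All vectors here are random stand-ins.

    ```python
    import numpy as np

    def cosine(u, v):
        return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

    def usage_progressiveness(context_vec, old_vec, new_vec):
        # Positive if the usage's context aligns more with the word's recent
        # embedding than with its earlier one.
        return cosine(context_vec, new_vec) - cosine(context_vec, old_vec)

    def document_score(context_vecs, old_vec, new_vec):
        # Aggregate usage-level scores into a document-level score by averaging.
        return float(np.mean([usage_progressiveness(c, old_vec, new_vec)
                              for c in context_vecs]))

    # Toy vectors standing in for diachronic embeddings and usage contexts.
    rng = np.random.default_rng(0)
    old_vec, new_vec = rng.normal(size=50), rng.normal(size=50)
    contexts = rng.normal(size=(4, 50))
    print(document_score(contexts, old_vec, new_vec))
    ```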
  4. Noever, D.; Ciolino, M.: ¬The Turing deception (2022) 0.04
    0.04265832 = product of:
      0.08531664 = sum of:
        0.08531664 = product of:
          0.2559499 = sum of:
            0.2559499 = weight(_text_:3a in 862) [ClassicSimilarity], result of:
              0.2559499 = score(doc=862,freq=2.0), product of:
                0.4554123 = queryWeight, product of:
                  8.478011 = idf(docFreq=24, maxDocs=44218)
                  0.05371688 = queryNorm
                0.56201804 = fieldWeight in 862, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  8.478011 = idf(docFreq=24, maxDocs=44218)
                  0.046875 = fieldNorm(doc=862)
          0.33333334 = coord(1/3)
      0.5 = coord(1/2)
    
    Source
    https://arxiv.org/abs/2212.06721
  5. Dietz, K.: en.wikipedia.org > 6 Mio. Artikel (2020) 0.04
    0.035548605 = product of:
      0.07109721 = sum of:
        0.07109721 = product of:
          0.21329162 = sum of:
            0.21329162 = weight(_text_:3a in 5669) [ClassicSimilarity], result of:
              0.21329162 = score(doc=5669,freq=2.0), product of:
                0.4554123 = queryWeight, product of:
                  8.478011 = idf(docFreq=24, maxDocs=44218)
                  0.05371688 = queryNorm
                0.46834838 = fieldWeight in 5669, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  8.478011 = idf(docFreq=24, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=5669)
          0.33333334 = coord(1/3)
      0.5 = coord(1/2)
    
    Content
    "Die Englischsprachige Wikipedia verfügt jetzt über mehr als 6 Millionen Artikel. An zweiter Stelle kommt die deutschsprachige Wikipedia mit 2.3 Millionen Artikeln, an dritter Stelle steht die französischsprachige Wikipedia mit 2.1 Millionen Artikeln (via Researchbuzz: Firehose <https://rbfirehose.com/2020/01/24/techcrunch-wikipedia-now-has-more-than-6-million-articles-in-english/> und Techcrunch <https://techcrunch.com/2020/01/23/wikipedia-english-six-million-articles/?utm_source=feedburner&utm_medium=feed&utm_campaign=Feed%3A+Techcrunch+%28TechCrunch%29&guccounter=1&guce_referrer=aHR0cHM6Ly9yYmZpcmVob3NlLmNvbS8yMDIwLzAxLzI0L3RlY2hjcnVuY2gtd2lraXBlZGlhLW5vdy1oYXMtbW9yZS10aGFuLTYtbWlsbGlvbi1hcnRpY2xlcy1pbi1lbmdsaXNoLw&guce_referrer_sig=AQAAAK0zHfjdDZ_spFZBF_z-zDjtL5iWvuKDumFTzm4HvQzkUfE2pLXQzGS6FGB_y-VISdMEsUSvkNsg2U_NWQ4lwWSvOo3jvXo1I3GtgHpP8exukVxYAnn5mJspqX50VHIWFADHhs5AerkRn3hMRtf_R3F1qmEbo8EROZXp328HMC-o>). 250120 via digithek ch = #fineBlog s.a.: Angesichts der Veröffentlichung des 6-millionsten Artikels vergangene Woche in der englischsprachigen Wikipedia hat die Community-Zeitungsseite "Wikipedia Signpost" ein Moratorium bei der Veröffentlichung von Unternehmensartikeln gefordert. Das sei kein Vorwurf gegen die Wikimedia Foundation, aber die derzeitigen Maßnahmen, um die Enzyklopädie gegen missbräuchliches undeklariertes Paid Editing zu schützen, funktionierten ganz klar nicht. *"Da die ehrenamtlichen Autoren derzeit von Werbung in Gestalt von Wikipedia-Artikeln überwältigt werden, und da die WMF nicht in der Lage zu sein scheint, dem irgendetwas entgegenzusetzen, wäre der einzige gangbare Weg für die Autoren, fürs erste die Neuanlage von Artikeln über Unternehmen zu untersagen"*, schreibt der Benutzer Smallbones in seinem Editorial <https://en.wikipedia.org/wiki/Wikipedia:Wikipedia_Signpost/2020-01-27/From_the_editor> zur heutigen Ausgabe."
  6. Gabler, S.: Vergabe von DDC-Sachgruppen mittels eines Schlagwort-Thesaurus (2021) 0.04
    0.035548605 = product of:
      0.07109721 = sum of:
        0.07109721 = product of:
          0.21329162 = sum of:
            0.21329162 = weight(_text_:3a in 1000) [ClassicSimilarity], result of:
              0.21329162 = score(doc=1000,freq=2.0), product of:
                0.4554123 = queryWeight, product of:
                  8.478011 = idf(docFreq=24, maxDocs=44218)
                  0.05371688 = queryNorm
                0.46834838 = fieldWeight in 1000, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  8.478011 = idf(docFreq=24, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=1000)
          0.33333334 = coord(1/3)
      0.5 = coord(1/2)
    
    Content
    Master thesis, Master of Science (Library and Information Studies) (MSc), Universität Wien. Advisor: Christoph Steiner. See: https://www.researchgate.net/publication/371680244_Vergabe_von_DDC-Sachgruppen_mittels_eines_Schlagwort-Thesaurus. DOI: 10.25365/thesis.70030. See also the accompanying presentation: https://wiki.dnb.de/download/attachments/252121510/DA3%20Workshop-Gabler.pdf?version=1&modificationDate=1671093170000&api=v2.
  7. Lee, G.E.; Sun, A.: Understanding the stability of medical concept embeddings (2021) 0.04
    0.035325605 = product of:
      0.07065121 = sum of:
        0.07065121 = product of:
          0.14130242 = sum of:
            0.14130242 = weight(_text_:word in 159) [ClassicSimilarity], result of:
              0.14130242 = score(doc=159,freq=6.0), product of:
                0.28165168 = queryWeight, product of:
                  5.2432623 = idf(docFreq=634, maxDocs=44218)
                  0.05371688 = queryNorm
                0.5016921 = fieldWeight in 159, product of:
                  2.4494898 = tf(freq=6.0), with freq of:
                    6.0 = termFreq=6.0
                  5.2432623 = idf(docFreq=634, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=159)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Abstract
    Frequency is one of the major factors for training quality word embeddings. Several studies have recently discussed the stability of word embeddings in the general domain and suggested factors influencing that stability. In this work, we conduct a detailed analysis of the stability of concept embeddings in the medical domain, particularly in relation to concept frequency. The analysis reveals the surprisingly high stability of low-frequency concepts: low-frequency (<100) concepts have the same high stability as high-frequency (>1,000) concepts. To develop a deeper understanding of this finding, we propose a new factor, the noisiness of context words, which influences the stability of medical concept embeddings regardless of high or low frequency. We evaluate the proposed factor by showing its linear correlation with the stability of medical concept embeddings. The correlations are clear and consistent across various groups of medical concepts. Based on the linear relations, we make suggestions on ways to adjust the noisiness of context words to improve stability. Finally, we demonstrate that the linear relation of the proposed factor extends to word embedding stability in the general domain.
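    Embedding stability is commonly operationalized as the overlap between a concept's nearest neighbours across independently trained embedding spaces; the sketch below illustrates that generic measure with random stand-in matrices and is not necessarily the exact metric used in the paper.

    ```python
    import numpy as np

    def top_k_neighbors(vectors, idx, k=10):
        # Cosine nearest neighbours of concept `idx` within one embedding space.
        v = vectors / np.linalg.norm(vectors, axis=1, keepdims=True)
        sims = v @ v[idx]
        order = np.argsort(-sims)
        return set(order[1:k + 1])          # skip the concept itself

    def stability(space_a, space_b, idx, k=10):
        # Overlap of neighbour sets across two independently trained spaces.
        a = top_k_neighbors(space_a, idx, k)
        b = top_k_neighbors(space_b, idx, k)
        return len(a & b) / k

    rng = np.random.default_rng(1)
    run1 = rng.normal(size=(500, 100))      # toy stand-ins for two training runs
    run2 = run1 + rng.normal(scale=0.1, size=(500, 100))
    print(stability(run1, run2, idx=42))
    ```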
  8. Fan, W.-M.; Jeng, W.; Tang, M.-C.: Using data citation to define a knowledge domain : a case study of the Add-Health dataset (2023) 0.03
    0.028843235 = product of:
      0.05768647 = sum of:
        0.05768647 = product of:
          0.11537294 = sum of:
            0.11537294 = weight(_text_:word in 844) [ClassicSimilarity], result of:
              0.11537294 = score(doc=844,freq=4.0), product of:
                0.28165168 = queryWeight, product of:
                  5.2432623 = idf(docFreq=634, maxDocs=44218)
                  0.05371688 = queryNorm
                0.40962988 = fieldWeight in 844, product of:
                  2.0 = tf(freq=4.0), with freq of:
                    4.0 = termFreq=4.0
                  5.2432623 = idf(docFreq=634, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=844)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Abstract
    To date, most studies in scientometrics map and track the main topics in a knowledge domain by measuring publications in core journals or keyword searches in databases. The present study instead proposes a novel metric in which a knowledge domain is mapped and tracked via articles that cite the same openly accessible dataset. We retrieved 1,537 journal articles citing the National Longitudinal Study of Adolescent to Adult Health (Add-Health) as the basis for an investigation of the major research topics associated with this dataset and how they evolved over time. To identify the primary research interests associated with the dataset, co-word network modularity analysis was used. Another novel aspect of this study is that it juxtaposes the research topics identified by the co-word approach with those generated by topic modeling: an approach that complements network modularity analysis and allows for cross-referencing between the results of these two methods. Keyness analysis was also performed to identify significant keywords in different time periods, which enables tracing of research interests in Add-Health as they evolve. The methodological implications of using data citation as the basis for delineating a knowledge domain, and techniques for its mapping, are also discussed.
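    The keyness analysis mentioned above is typically computed with a log-likelihood ratio comparing a term's frequency in two (sub)corpora, for example two time periods; the sketch below implements that standard measure with made-up counts and is not the authors' pipeline.

    ```python
    import math

    def log_likelihood(a, b, c, d):
        """Dunning log-likelihood keyness for a term occurring `a` times in a
        corpus of `c` tokens and `b` times in a corpus of `d` tokens."""
        e1 = c * (a + b) / (c + d)          # expected frequency, corpus 1
        e2 = d * (a + b) / (c + d)          # expected frequency, corpus 2
        ll = 0.0
        if a > 0:
            ll += a * math.log(a / e1)
        if b > 0:
            ll += b * math.log(b / e2)
        return 2 * ll

    # Hypothetical counts: a keyword in early vs. late Add-Health papers.
    print(round(log_likelihood(a=12, b=85, c=40_000, d=60_000), 2))
    ```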
  9. Ali, C.B.; Haddad, H.; Slimani, Y.: Multi-word terms selection for information retrieval (2022) 0.03
    0.028843235 = product of:
      0.05768647 = sum of:
        0.05768647 = product of:
          0.11537294 = sum of:
            0.11537294 = weight(_text_:word in 900) [ClassicSimilarity], result of:
              0.11537294 = score(doc=900,freq=4.0), product of:
                0.28165168 = queryWeight, product of:
                  5.2432623 = idf(docFreq=634, maxDocs=44218)
                  0.05371688 = queryNorm
                0.40962988 = fieldWeight in 900, product of:
                  2.0 = tf(freq=4.0), with freq of:
                    4.0 = termFreq=4.0
                  5.2432623 = idf(docFreq=634, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=900)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Abstract
    Purpose: A number of approaches and algorithms have been proposed over the years as a basis for automatic indexing. Many of these approaches suffer from poor precision at low recall. The choice of indexing units has a great impact on search system effectiveness. The authors go beyond simple term indexing to propose a framework for multi-word term (MWT) filtering and indexing. Design/methodology/approach: The authors rely on ranking MWTs to filter them, keeping the most effective ones for the indexing process. The proposed model filters MWTs according to their ability to capture the document topic and to distinguish between different documents from the same collection. The authors rely on the hypothesis that the best MWTs are those that achieve the greatest association degree. The experiments are carried out with English and French language data sets. Findings: The results indicate that this approach achieves precision enhancements at low recall and performs better than more advanced models based on term dependencies. Originality/value: Different association measures are used and tested to select the MWTs that best describe the documents, enhancing precision among the first retrieved documents.
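    The "association degree" used to rank candidate multi-word terms can be illustrated with pointwise mutual information between the words of a candidate bigram; this generic sketch stands in for whichever association measures the authors actually evaluated.

    ```python
    import math
    from collections import Counter

    def pmi(bigram, unigram_counts, bigram_counts, n_tokens):
        # Pointwise mutual information of a two-word candidate term.
        w1, w2 = bigram
        p_xy = bigram_counts[bigram] / n_tokens
        p_x = unigram_counts[w1] / n_tokens
        p_y = unigram_counts[w2] / n_tokens
        return math.log2(p_xy / (p_x * p_y))

    tokens = ("information retrieval systems use information retrieval models "
              "for retrieval of information").split()
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    n = len(tokens)

    # Rank candidate multi-word terms by association strength.
    ranked = sorted(bigrams, key=lambda bg: pmi(bg, unigrams, bigrams, n), reverse=True)
    print(ranked[:3])
    ```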
  10. Safder, I.; Ali, M.; Aljohani, N.R.; Nawaz, R.; Hassan, S.-U.: Neural machine translation for in-text citation classification (2023) 0.03
    0.028843235 = product of:
      0.05768647 = sum of:
        0.05768647 = product of:
          0.11537294 = sum of:
            0.11537294 = weight(_text_:word in 1053) [ClassicSimilarity], result of:
              0.11537294 = score(doc=1053,freq=4.0), product of:
                0.28165168 = queryWeight, product of:
                  5.2432623 = idf(docFreq=634, maxDocs=44218)
                  0.05371688 = queryNorm
                0.40962988 = fieldWeight in 1053, product of:
                  2.0 = tf(freq=4.0), with freq of:
                    4.0 = termFreq=4.0
                  5.2432623 = idf(docFreq=634, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=1053)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Abstract
    The quality of scientific publications can be measured by quantitative indices such as the h-index, Source Normalized Impact per Paper, or g-index. However, these measures fail to explain the function or reasons for citations and the context of citations from citing publication to cited publication. We argue that citation context may be considered when calculating the impact of research work. However, mining citation context from unstructured full-text publications is a challenging task. In this paper, we compiled a data set comprising 9,518 citation contexts. We developed a deep learning-based architecture for citation context classification. Unlike feature-based state-of-the-art models, our proposed focal-loss and class-weight-aware BiLSTM model with pretrained GloVe embedding vectors uses citation context as input and outperforms them in multiclass citation context classification tasks. Our model improves on the baseline state of the art by achieving an F1 score of 0.80 with an accuracy of 0.81 for citation context classification. Moreover, we delve into the effects of using different word embeddings on the performance of the classification model and draw a comparison between fastText, GloVe, and spaCy pretrained word embeddings.
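    As a rough illustration of the model family described above, the following PyTorch skeleton builds a BiLSTM classifier over word embeddings, with class weighting standing in for the paper's focal loss; the dimensions, class count, and random data are invented, and in the paper's setting the embedding layer would be initialized with pretrained GloVe vectors.

    ```python
    import torch
    import torch.nn as nn

    class CitationContextClassifier(nn.Module):
        def __init__(self, vocab_size=5000, emb_dim=100, hidden=128, n_classes=4):
            super().__init__()
            # Randomly initialized here; pretrained GloVe vectors in practice.
            self.embedding = nn.Embedding(vocab_size, emb_dim, padding_idx=0)
            self.bilstm = nn.LSTM(emb_dim, hidden, batch_first=True, bidirectional=True)
            self.classifier = nn.Linear(2 * hidden, n_classes)

        def forward(self, token_ids):
            embedded = self.embedding(token_ids)          # (batch, seq, emb)
            _, (h_n, _) = self.bilstm(embedded)           # h_n: (2, batch, hidden)
            final = torch.cat([h_n[0], h_n[1]], dim=1)    # forward + backward states
            return self.classifier(final)

    # Toy batch: 8 citation contexts of 20 token ids, 4 context classes.
    model = CitationContextClassifier()
    x = torch.randint(1, 5000, (8, 20))
    y = torch.randint(0, 4, (8,))

    # Class weighting as a simple stand-in for the focal loss used in the paper.
    loss_fn = nn.CrossEntropyLoss(weight=torch.tensor([1.0, 2.0, 2.0, 4.0]))
    loss = loss_fn(model(x), y)
    loss.backward()
    print(float(loss))
    ```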
  11. Zhang, M.; Zhang, Y.: Professional organizations in Twittersphere : an empirical study of U.S. library and information science professional organizations-related Tweets (2020) 0.03
    0.028553344 = product of:
      0.05710669 = sum of:
        0.05710669 = product of:
          0.11421338 = sum of:
            0.11421338 = weight(_text_:word in 5775) [ClassicSimilarity], result of:
              0.11421338 = score(doc=5775,freq=2.0), product of:
                0.28165168 = queryWeight, product of:
                  5.2432623 = idf(docFreq=634, maxDocs=44218)
                  0.05371688 = queryNorm
                0.40551287 = fieldWeight in 5775, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  5.2432623 = idf(docFreq=634, maxDocs=44218)
                  0.0546875 = fieldNorm(doc=5775)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Abstract
    Twitter is utilized by many, including professional businesses and organizations; however, there are very few studies on how other entities interact with these organizations in the Twittersphere. This article presents a study that investigates tweets related to 5 major library and information science (LIS) professional organizations in the United States. The study applies a systematic tweet analysis framework, including descriptive analytics, network analytics, and co-word analysis of hashtags. The findings shed light on user engagement with LIS professional organizations and the trending discussion topics on Twitter, which is valuable for enabling more successful social media use and greater influence.
  12. ¬Der Student aus dem Computer (2023) 0.03
    0.025472634 = product of:
      0.050945267 = sum of:
        0.050945267 = product of:
          0.101890534 = sum of:
            0.101890534 = weight(_text_:22 in 1079) [ClassicSimilarity], result of:
              0.101890534 = score(doc=1079,freq=2.0), product of:
                0.18810736 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.05371688 = queryNorm
                0.5416616 = fieldWeight in 1079, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.109375 = fieldNorm(doc=1079)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Date
    27. 1.2023 16:22:55
  13. Wei, W.; Liu, Y.-P.; Wei, L-R.: Feature-level sentiment analysis based on rules and fine-grained domain ontology (2020) 0.02
    0.024474295 = product of:
      0.04894859 = sum of:
        0.04894859 = product of:
          0.09789718 = sum of:
            0.09789718 = weight(_text_:word in 5876) [ClassicSimilarity], result of:
              0.09789718 = score(doc=5876,freq=2.0), product of:
                0.28165168 = queryWeight, product of:
                  5.2432623 = idf(docFreq=634, maxDocs=44218)
                  0.05371688 = queryNorm
                0.34758246 = fieldWeight in 5876, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  5.2432623 = idf(docFreq=634, maxDocs=44218)
                  0.046875 = fieldNorm(doc=5876)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Abstract
    Mining product reviews and sentiment analysis are of great significance, whether for academic research purposes or for optimizing business strategies. We propose a feature-level sentiment analysis framework based on rule parsing and a fine-grained domain ontology for Chinese reviews. The fine-grained ontology is used to describe synonymous expressions of product features, which appear as varying word choices in online reviews. First, a semiautomatic construction method for the fine-grained ontology is developed using Word2Vec. Then, feature-level sentiment analysis that combines rule parsing and the fine-grained domain ontology is conducted to extract explicit and implicit features from product reviews. Finally, a domain sentiment dictionary and a context sentiment dictionary are established to identify sentiment polarities for the extracted feature-sentiment combinations. An experiment is conducted on product reviews crawled from Chinese e-commerce websites. The results demonstrate the effectiveness of our approach.
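    The semiautomatic ontology construction step relies on Word2Vec to surface alternative ways of naming a product feature. A minimal gensim sketch of that step is given below; the toy review sentences are invented, and a real pipeline would operate on segmented Chinese reviews.

    ```python
    from gensim.models import Word2Vec

    # Toy tokenized review sentences standing in for crawled product reviews.
    sentences = [
        ["the", "screen", "is", "bright", "and", "sharp"],
        ["great", "display", "very", "sharp", "colors"],
        ["battery", "life", "is", "short"],
        ["the", "display", "and", "screen", "quality", "impress"],
    ] * 50   # repeat so the toy vocabulary gets enough training signal

    model = Word2Vec(sentences, vector_size=50, window=3, min_count=1, epochs=20, seed=1)

    # Candidate synonymous expressions for the feature "screen", to be reviewed
    # by a human before being added to the fine-grained ontology.
    print(model.wv.most_similar("screen", topn=3))
    ```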
  14. Organisciak, P.; Schmidt, B.M.; Downie, J.S.: Giving shape to large digital libraries through exploratory data analysis (2022) 0.02
    0.024474295 = product of:
      0.04894859 = sum of:
        0.04894859 = product of:
          0.09789718 = sum of:
            0.09789718 = weight(_text_:word in 473) [ClassicSimilarity], result of:
              0.09789718 = score(doc=473,freq=2.0), product of:
                0.28165168 = queryWeight, product of:
                  5.2432623 = idf(docFreq=634, maxDocs=44218)
                  0.05371688 = queryNorm
                0.34758246 = fieldWeight in 473, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  5.2432623 = idf(docFreq=634, maxDocs=44218)
                  0.046875 = fieldNorm(doc=473)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Abstract
    The emergence of large multi-institutional digital libraries has opened the door to aggregate-level examinations of the published word. Such large-scale analysis offers a new way to pursue traditional problems in the humanities and social sciences, using digital methods to ask routine questions of large corpora. However, inquiry into multiple centuries of books is constrained by the burdens of scale, where statistical inference is technically complex and limited by hurdles to access and flexibility. This work examines the role that exploratory data analysis and visualization tools may play in understanding large bibliographic datasets. We present one such tool, HathiTrust+Bookworm, which allows multifaceted exploration of the multimillion-work HathiTrust Digital Library, and center it in the broader space of scholarly tools for exploratory data analysis.
  15. Berg, A.; Nelimarkka, M.: Do you see what I see? : measuring the semantic differences in image-recognition services' outputs (2023) 0.02
    0.024474295 = product of:
      0.04894859 = sum of:
        0.04894859 = product of:
          0.09789718 = sum of:
            0.09789718 = weight(_text_:word in 1070) [ClassicSimilarity], result of:
              0.09789718 = score(doc=1070,freq=2.0), product of:
                0.28165168 = queryWeight, product of:
                  5.2432623 = idf(docFreq=634, maxDocs=44218)
                  0.05371688 = queryNorm
                0.34758246 = fieldWeight in 1070, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  5.2432623 = idf(docFreq=634, maxDocs=44218)
                  0.046875 = fieldNorm(doc=1070)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Abstract
    As scholars increasingly undertake large-scale analysis of visual materials, advanced computational tools show promise for informing that process. One technique in the toolbox is image recognition, made readily accessible via Google Vision AI, Microsoft Azure Computer Vision, and Amazon's Rekognition service. However, concerns about such issues as bias factors and low reliability have led to warnings against research employing it. A systematic study of cross-service label agreement concretized such issues: using eight datasets, spanning professionally produced and user-generated images, the work showed that image-recognition services disagree on the most suitable labels for images. Beyond supporting caveats expressed in prior literature, the report articulates two mitigation strategies, both involving the use of multiple image-recognition services: Highly explorative research could include all the labels, accepting noisier but less restrictive analysis output. Alternatively, scholars may employ word-embedding-based approaches to identify concepts that are similar enough for their purposes, then focus on those labels filtered in.
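    The word-embedding-based mitigation described above amounts to treating two services' labels as the same concept when their vectors are sufficiently similar. The sketch below assumes pretrained vectors fetched via gensim's downloader; the model name and threshold are assumptions, not the authors' choices.

    ```python
    import gensim.downloader as api

    # Pretrained word vectors; the exact model choice here is an assumption.
    vectors = api.load("glove-wiki-gigaword-100")

    def labels_match(label_a, label_b, threshold=0.6):
        # Treat two services' labels as the same concept if their embeddings
        # are sufficiently similar.
        return vectors.similarity(label_a, label_b) >= threshold

    print(labels_match("dog", "puppy"))       # likely True
    print(labels_match("dog", "skyscraper"))  # likely False
    ```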
  16. Pepper, S.: ¬The typology and semantics of binominal lexemes : noun-noun compounds and their functional equivalents (2020) 0.02
    0.023074588 = product of:
      0.046149176 = sum of:
        0.046149176 = product of:
          0.09229835 = sum of:
            0.09229835 = weight(_text_:word in 104) [ClassicSimilarity], result of:
              0.09229835 = score(doc=104,freq=4.0), product of:
                0.28165168 = queryWeight, product of:
                  5.2432623 = idf(docFreq=634, maxDocs=44218)
                  0.05371688 = queryNorm
                0.3277039 = fieldWeight in 104, product of:
                  2.0 = tf(freq=4.0), with freq of:
                    4.0 = termFreq=4.0
                  5.2432623 = idf(docFreq=634, maxDocs=44218)
                  0.03125 = fieldNorm(doc=104)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Abstract
    The dissertation establishes 'binominal lexeme' as a comparative concept and discusses its cross-linguistic typology and semantics. Informally, a binominal lexeme is a noun-noun compound or functional equivalent; more precisely, it is a lexical item that consists primarily of two thing-morphs between which there exists an unstated semantic relation. Examples of binominals include Mandarin Chinese 铁路 (tielù) [iron road], French chemin de fer [way of iron] and Russian железная дорога (zeleznaja doroga) [iron:adjz road]. All of these combine a word denoting 'iron' and a word denoting 'road' or 'way' to denote the meaning railway. In each case, the unstated semantic relation is one of composition: a railway is conceptualized as a road that is composed (or made) of iron. However, three different morphosyntactic strategies are employed: compounding, prepositional phrase and relational adjective. This study explores the range of such strategies used by a worldwide sample of 106 languages to express a set of 100 meanings from various semantic domains, resulting in a classification consisting of nine different morphosyntactic types. The semantic relations found in the data are also explored and a classification called the Hatcher-Bourque system is developed that operates at two levels of granularity, together with a tool for classifying binominals, the Bourquifier. The classification is extended to other subfields of language, including metonymy and lexical semantics, and beyond language to the domain of knowledge representation, resulting in a proposal for a general model of associative relations called the PHAB model. The many findings of the research include universals concerning the recruitment of anchoring nominal modification strategies, a method for comparing non-binary typologies, the non-universality (despite its predominance) of compounding, and a scale of frequencies for semantic relations which may provide insights into the associative nature of human thought.
  17. Jaeger, L.: Wissenschaftler versus Wissenschaft (2020) 0.02
    0.021833686 = product of:
      0.043667372 = sum of:
        0.043667372 = product of:
          0.087334745 = sum of:
            0.087334745 = weight(_text_:22 in 4156) [ClassicSimilarity], result of:
              0.087334745 = score(doc=4156,freq=2.0), product of:
                0.18810736 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.05371688 = queryNorm
                0.46428138 = fieldWeight in 4156, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.09375 = fieldNorm(doc=4156)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Date
    2. 3.2020 14:08:22
  18. Ibrahim, G.M.; Taylor, M.: Krebszellen manipulieren Neurone : Gliome (2023) 0.02
    0.021833686 = product of:
      0.043667372 = sum of:
        0.043667372 = product of:
          0.087334745 = sum of:
            0.087334745 = weight(_text_:22 in 1203) [ClassicSimilarity], result of:
              0.087334745 = score(doc=1203,freq=2.0), product of:
                0.18810736 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.05371688 = queryNorm
                0.46428138 = fieldWeight in 1203, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.09375 = fieldNorm(doc=1203)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Source
    Spektrum der Wissenschaft. 2023, H.10, S.22-24
  19. Zhang, Y.; Zhang, C.; Li, J.: Joint modeling of characters, words, and conversation contexts for microblog keyphrase extraction (2020) 0.02
    0.020395245 = product of:
      0.04079049 = sum of:
        0.04079049 = product of:
          0.08158098 = sum of:
            0.08158098 = weight(_text_:word in 5816) [ClassicSimilarity], result of:
              0.08158098 = score(doc=5816,freq=2.0), product of:
                0.28165168 = queryWeight, product of:
                  5.2432623 = idf(docFreq=634, maxDocs=44218)
                  0.05371688 = queryNorm
                0.28965205 = fieldWeight in 5816, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  5.2432623 = idf(docFreq=634, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=5816)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Abstract
    Millions of messages are produced on microblog platforms every day, leading to the pressing need for automatic identification of key points from the massive texts. To absorb salient content from the vast bulk of microblog posts, this article focuses on the task of microblog keyphrase extraction. In previous work, most efforts treat messages as independent documents and might suffer from the data sparsity problem exhibited in short and informal microblog posts. In contrast, we propose to enrich contexts by exploiting conversations initialized by target posts and formed by their replies, which are generally centered on topics relevant to the target posts and therefore helpful for keyphrase identification. Concretely, we present a neural keyphrase extraction framework, which has 2 modules: a conversation context encoder and a keyphrase tagger. The conversation context encoder captures an indicative representation from the conversation context and feeds it into the keyphrase tagger, which extracts salient words from the target posts. The 2 modules were trained jointly to optimize the conversation context encoding and keyphrase extraction processes. In the conversation context encoder, we leverage hierarchical structures to capture indicative representations at both the word level and the message level. In both modules, we apply character-level representations, which enable the model to explore morphological features and handle the out-of-vocabulary problem caused by the informal language style of microblog messages. Extensive comparison results on real-life data sets indicate that our model outperforms state-of-the-art models from previous studies.
  20. Haggar, E.: Fighting fake news : exploring George Orwell's relationship to information literacy (2020) 0.02
    0.020395245 = product of:
      0.04079049 = sum of:
        0.04079049 = product of:
          0.08158098 = sum of:
            0.08158098 = weight(_text_:word in 5978) [ClassicSimilarity], result of:
              0.08158098 = score(doc=5978,freq=2.0), product of:
                0.28165168 = queryWeight, product of:
                  5.2432623 = idf(docFreq=634, maxDocs=44218)
                  0.05371688 = queryNorm
                0.28965205 = fieldWeight in 5978, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  5.2432623 = idf(docFreq=634, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=5978)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Abstract
    The purpose of this paper is to analyse George Orwell's diaries through an information literacy lens. Orwell is well known for his dedication to freedom of speech and objective truth, and his novel Nineteen Eighty-Four is often used as a lens through which to view the fake news phenomenon. This paper examines Orwell's diaries in relation to UNESCO's Five Laws of Media and Information Literacy to show how information literacy concepts can be traced in historical documents. Design/methodology/approach: This paper uses a content analysis method to explore Orwell's relationship to information literacy. Two of Orwell's political diaries from the period 1940-42 were coded for key themes related to the ways in which Orwell discusses and evaluates information and news. These themes were then compared to UNESCO's Five Laws of Media and Information Literacy. The textual analysis software NVivo 12 was used to perform keyword searches and word frequency queries on the digitised diaries. Findings: The findings show that while Orwell's diaries and the Five Laws do not share terminology, they do share ideas on bias and access to information. They also extend the history of information literacy research and practice by illustrating how concerns about the need to evaluate information sources are represented in historical literature. Originality/value: This paper combines historical research with textual analysis to bring a unique historical perspective to information literacy, demonstrating that "fake news" is not a recent phenomenon and that the tools to fight it may also lie in historical research.
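    The keyword searches and word-frequency queries run in NVivo 12 can be approximated in a few lines for anyone working from plain-text transcripts of the diaries; the file name and keyword list below are placeholders.

    ```python
    import re
    from collections import Counter

    # Placeholder path to a digitised diary transcript.
    with open("orwell_diary_1940.txt", encoding="utf-8") as f:
        text = f.read().lower()

    tokens = re.findall(r"[a-z']+", text)
    freq = Counter(tokens)

    # Word-frequency query: most common words overall.
    print(freq.most_common(20))

    # Keyword search: how often selected news-related terms appear.
    for keyword in ("news", "propaganda", "truth", "rumour"):
        print(keyword, freq[keyword])
    ```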

Languages

  • e 97
  • d 29

Types

  • a 119
  • el 20
  • m 2
  • p 2
  • x 2