Search (343 results, page 1 of 18)

Noever, D.; Ciolino, M.: The Turing deception (2022) 0.15

0.1487487 = product of:
  0.2974974 = sum of:
    0.07437435 = product of:
      0.22312303 = sum of:
        0.22312303 = weight(_text_:3a in 862) [ClassicSimilarity], result of:
          0.22312303 = score(doc=862,freq=2.0), product of:
            0.39700332 = queryWeight, product of:
              8.478011 = idf(docFreq=24, maxDocs=44218)
              0.046827413 = queryNorm
            0.56201804 = fieldWeight in 862, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              8.478011 = idf(docFreq=24, maxDocs=44218)
              0.046875 = fieldNorm(doc=862)
      0.33333334 = coord(1/3)
    0.22312303 = weight(_text_:2f in 862) [ClassicSimilarity], result of:
      0.22312303 = score(doc=862,freq=2.0), product of:
        0.39700332 = queryWeight, product of:
          8.478011 = idf(docFreq=24, maxDocs=44218)
          0.046827413 = queryNorm
        0.56201804 = fieldWeight in 862, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          8.478011 = idf(docFreq=24, maxDocs=44218)
          0.046875 = fieldNorm(doc=862)
  0.5 = coord(2/4)

Source: https%3A%2F%2Farxiv.org%2Fabs%2F2212.06721&usg=AOvVaw3i_9pZm9y_dQWoHi6uv0EN
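
The score breakdown above can be reproduced by hand. The following is a minimal sketch, assuming the standard Lucene ClassicSimilarity formulas (tf = sqrt(freq), idf = 1 + ln(maxDocs / (docFreq + 1))); the function names are illustrative and not part of any library, and the constants are copied from the explain tree for doc 862.

```python
import math

# Constants copied from the explain output for doc 862.
QUERY_NORM = 0.046827413
MAX_DOCS = 44218


def classic_idf(doc_freq: int, max_docs: int) -> float:
    """Inverse document frequency, assuming ClassicSimilarity's formula."""
    return 1.0 + math.log(max_docs / (doc_freq + 1))


def classic_tf(freq: float) -> float:
    """Term frequency factor: square root of the raw term count."""
    return math.sqrt(freq)


def term_score(freq: float, doc_freq: int, field_norm: float) -> float:
    """Score of one term = queryWeight * fieldWeight."""
    idf = classic_idf(doc_freq, MAX_DOCS)
    query_weight = idf * QUERY_NORM                    # idf * queryNorm
    field_weight = classic_tf(freq) * idf * field_norm  # tf * idf * fieldNorm
    return query_weight * field_weight


# Both matching terms (3a and 2f) have freq=2, docFreq=24, fieldNorm=0.046875.
score_3a = term_score(2.0, 24, 0.046875)
score_2f = term_score(2.0, 24, 0.046875)

# The 3a clause sits in a nested query where 1 of 3 clauses matched,
# and the outer query matched 2 of 4 clauses.
total = (score_3a * (1 / 3) + score_2f) * (2 / 4)
print(f"term score: {score_2f:.6f}, total: {total:.6f}")
```

The result should agree with the listed 0.1487487 to within floating-point rounding, since Lucene computes these values in single precision.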

Candela, G.: An automatic data quality approach to assess semantic data from cultural heritage institutions (2023) 0.06

0.055452086 = product of:
  0.11090417 = sum of:
    0.088698536 = weight(_text_:data in 997) [ClassicSimilarity], result of:
      0.088698536 = score(doc=997,freq=12.0), product of:
        0.14807065 = queryWeight, product of:
          3.1620505 = idf(docFreq=5088, maxDocs=44218)
          0.046827413 = queryNorm
        0.59902847 = fieldWeight in 997, product of:
          3.4641016 = tf(freq=12.0), with freq of:
            12.0 = termFreq=12.0
          3.1620505 = idf(docFreq=5088, maxDocs=44218)
          0.0546875 = fieldNorm(doc=997)
    0.022205638 = product of:
      0.044411276 = sum of:
        0.044411276 = weight(_text_:22 in 997) [ClassicSimilarity], result of:
          0.044411276 = score(doc=997,freq=2.0), product of:
            0.16398162 = queryWeight, product of:
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.046827413 = queryNorm
            0.2708308 = fieldWeight in 997, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.0546875 = fieldNorm(doc=997)
      0.5 = coord(1/2)
  0.5 = coord(2/4)

Abstract: In recent years, cultural heritage institutions have been exploring the benefits of applying Linked Open Data to their catalogs and digital materials. Innovative and creative methods have emerged to publish and reuse digital contents to promote computational access, such as the concepts of Labs and Collections as Data. Data quality has become a requirement for researchers and training methods based on artificial intelligence and machine learning. This article explores how the quality of Linked Open Data made available by cultural heritage institutions can be automatically assessed. The results obtained can be useful for other institutions who wish to publish and assess their collections.
Date: 22. 6.2023 18:23:31

Jia, J.: From data to knowledge : the relationships between vocabularies, linked data and knowledge graphs (2021) 0.05
```
0.054559413 = product of:
  0.10911883 = sum of:
    0.09325766 = weight(_text_:data in 106) [ClassicSimilarity], result of:
      0.09325766 = score(doc=106,freq=26.0), product of:
        0.14807065 = queryWeight, product of:
          3.1620505 = idf(docFreq=5088, maxDocs=44218)
          0.046827413 = queryNorm
        0.6298187 = fieldWeight in 106, product of:
          5.0990195 = tf(freq=26.0), with freq of:
            26.0 = termFreq=26.0
          3.1620505 = idf(docFreq=5088, maxDocs=44218)
          0.0390625 = fieldNorm(doc=106)
    0.01586117 = product of:
      0.03172234 = sum of:
        0.03172234 = weight(_text_:22 in 106) [ClassicSimilarity], result of:
          0.03172234 = score(doc=106,freq=2.0), product of:
            0.16398162 = queryWeight, product of:
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.046827413 = queryNorm
            0.19345059 = fieldWeight in 106, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.0390625 = fieldNorm(doc=106)
      0.5 = coord(1/2)
  0.5 = coord(2/4)
```
Abstract

Purpose The purpose of this paper is to identify the concepts, component parts and relationships between vocabularies, linked data and knowledge graphs (KGs) from the perspectives of data and knowledge transitions. Design/methodology/approach This paper uses conceptual analysis methods. This study focuses on distinguishing concepts and analyzing composition and intercorrelations to explore data and knowledge transitions. Findings Vocabularies are the cornerstone for accurately building understanding of the meaning of data. Vocabularies provide for a data-sharing model and play an important role in supporting the semantic expression of linked data and defining the schema layer; they are also used for entity recognition, alignment and linkage for KGs. KGs, which consist of a schema layer and a data layer, are presented as cubes that organically combine vocabularies, linked data and big data. Originality/value This paper first describes the composition of vocabularies, linked data and KGs. More importantly, this paper innovatively analyzes and summarizes the interrelatedness of these factors, which comes from frequent interactions between data and knowledge. The three factors empower each other and can ultimately empower the Semantic Web.

Date

22. 1.2021 14:24:32

Palsdottir, A.: Data literacy and management of research data : a prerequisite for the sharing of research data (2021) 0.05
```
0.05375583 = product of:
  0.10751166 = sum of:
    0.09482273 = weight(_text_:data in 183) [ClassicSimilarity], result of:
      0.09482273 = score(doc=183,freq=42.0), product of:
        0.14807065 = queryWeight, product of:
          3.1620505 = idf(docFreq=5088, maxDocs=44218)
          0.046827413 = queryNorm
        0.6403884 = fieldWeight in 183, product of:
          6.4807405 = tf(freq=42.0), with freq of:
            42.0 = termFreq=42.0
          3.1620505 = idf(docFreq=5088, maxDocs=44218)
          0.03125 = fieldNorm(doc=183)
    0.012688936 = product of:
      0.025377871 = sum of:
        0.025377871 = weight(_text_:22 in 183) [ClassicSimilarity], result of:
          0.025377871 = score(doc=183,freq=2.0), product of:
            0.16398162 = queryWeight, product of:
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.046827413 = queryNorm
            0.15476047 = fieldWeight in 183, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.03125 = fieldNorm(doc=183)
      0.5 = coord(1/2)
  0.5 = coord(2/4)
```
Abstract

Purpose The purpose of this paper is to investigate the knowledge and attitude about research data management, the use of data management methods and the perceived need for support, in relation to participants' field of research. Design/methodology/approach This is a quantitative study. Data were collected by an email survey and sent to 792 academic researchers and doctoral students. Total response rate was 18% (N = 139). The measurement instrument consisted of six sets of questions: about data management plans, the assignment of additional information to research data, about metadata, standard file naming systems, training at data management methods and the storing of research data. Findings The main finding is that knowledge about the procedures of data management is limited, and data management is not a normal practice in the researcher's work. They were, however, in general, of the opinion that the university should take the lead by recommending and offering access to the necessary tools of data management. Taken together, the results indicate that there is an urgent need to increase the researcher's understanding of the importance of data management that is based on professional knowledge and to provide them with resources and training that enables them to make effective and productive use of data management methods. Research limitations/implications The survey was sent to all members of the population but not a sample of it. Because of the response rate, the results cannot be generalized to all researchers at the university. Nevertheless, the findings may provide an important understanding about their research data procedures, in particular what characterizes their knowledge about data management and attitude towards it. 
Practical implications Awareness of these issues is essential for information specialists at academic libraries, together with other units within the universities, to be able to design infrastructures and develop services that suit the needs of the research community. The findings can be used, to develop data policies and services, based on professional knowledge of best practices and recognized standards that assist the research community at data management. Originality/value The study contributes to the existing literature about research data management by examining the results by participants' field of research. Recognition of the issues is critical in order for information specialists in collaboration with universities to design relevant infrastructures and services for academics and doctoral students that can promote their research data management.

Date

20. 1.2015 18:30:22

Cerda-Cosme, R.; Méndez, E.: Analysis of shared research data in Spanish scientific papers about COVID-19 : a first approach (2023) 0.05
```
0.05082287 = product of:
  0.10164574 = sum of:
    0.08578457 = weight(_text_:data in 916) [ClassicSimilarity], result of:
      0.08578457 = score(doc=916,freq=22.0), product of:
        0.14807065 = queryWeight, product of:
          3.1620505 = idf(docFreq=5088, maxDocs=44218)
          0.046827413 = queryNorm
        0.5793489 = fieldWeight in 916, product of:
          4.690416 = tf(freq=22.0), with freq of:
            22.0 = termFreq=22.0
          3.1620505 = idf(docFreq=5088, maxDocs=44218)
          0.0390625 = fieldNorm(doc=916)
    0.01586117 = product of:
      0.03172234 = sum of:
        0.03172234 = weight(_text_:22 in 916) [ClassicSimilarity], result of:
          0.03172234 = score(doc=916,freq=2.0), product of:
            0.16398162 = queryWeight, product of:
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.046827413 = queryNorm
            0.19345059 = fieldWeight in 916, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.0390625 = fieldNorm(doc=916)
      0.5 = coord(1/2)
  0.5 = coord(2/4)
```
Abstract

During the coronavirus pandemic, changes in the way science is done and shared occurred, which motivates meta-research to help understand science communication in crises and improve its effectiveness. The objective is to study how many Spanish scientific papers on COVID-19 published during 2020 share their research data. Qualitative and descriptive study applying nine attributes: (a) availability, (b) accessibility, (c) format, (d) licensing, (e) linkage, (f) funding, (g) editorial policy, (h) content, and (i) statistics. We analyzed 1,340 papers, 1,173 (87.5%) did not have research data. A total of 12.5% share their research data of which 2.1% share their data in repositories, 5% share their data through a simple request, 0.2% do not have permission to share their data, and 5.2% share their data as supplementary material. There is a small percentage that shares their research data; however, it demonstrates the researchers' poor knowledge on how to properly share their research data and their lack of knowledge on what is research data.

Date

21. 3.2023 19:22:02

Ilhan, A.; Fietkiewicz, K.J.: Data privacy-related behavior and concerns of activity tracking technology users from Germany and the USA (2021) 0.05
```
0.04882677 = product of:
  0.09765354 = sum of:
    0.08179237 = weight(_text_:data in 180) [ClassicSimilarity], result of:
      0.08179237 = score(doc=180,freq=20.0), product of:
        0.14807065 = queryWeight, product of:
          3.1620505 = idf(docFreq=5088, maxDocs=44218)
          0.046827413 = queryNorm
        0.5523875 = fieldWeight in 180, product of:
          4.472136 = tf(freq=20.0), with freq of:
            20.0 = termFreq=20.0
          3.1620505 = idf(docFreq=5088, maxDocs=44218)
          0.0390625 = fieldNorm(doc=180)
    0.01586117 = product of:
      0.03172234 = sum of:
        0.03172234 = weight(_text_:22 in 180) [ClassicSimilarity], result of:
          0.03172234 = score(doc=180,freq=2.0), product of:
            0.16398162 = queryWeight, product of:
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.046827413 = queryNorm
            0.19345059 = fieldWeight in 180, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.0390625 = fieldNorm(doc=180)
      0.5 = coord(1/2)
  0.5 = coord(2/4)
```
Abstract

Purpose This investigation aims to examine the differences and similarities between activity tracking technology users from two regions (the USA and Germany) in their intended privacy-related behavior. The focus lies on data handling after hypothetical discontinuance of use, data protection and privacy policy seeking, and privacy concerns. Design/methodology/approach The data was collected through an online survey in 2019. In order to identify significant differences between participants from Germany and the USA, the chi-squared test and the Mann-Whitney U test were applied. Findings The intensity of several privacy-related concerns was significantly different between the two groups. The majority of the participants did not inform themselves about the respective data privacy policies or terms and conditions before installing an activity tracking application. The majority of the German participants knew that they could request the deletion of all their collected data. In contrast, only 35% out of 68 participants from the US knew about this option. Research limitations/implications This study intends to raise awareness about managing the collected health and fitness data after stopping to use activity tracking technologies. Furthermore, to reduce privacy and security concerns, the involvement of the government, companies and users is necessary to handle and share data more considerably and in a sustainable way. Originality/value This study sheds light on users of activity tracking technologies from a broad perspective (here, participants from the USA and Germany). It incorporates not only concerns and the privacy paradox but (intended) user behavior, including seeking information on data protection and privacy policy and handling data after hypothetical discontinuance of use of the technology.

Date

20. 1.2015 18:30:22

Fang, Z.; Dudek, J.; Costas, R.: Facing the volatility of tweets in altmetric research (2022) 0.05

0.047419276 = product of:
  0.09483855 = sum of:
    0.06940313 = weight(_text_:data in 605) [ClassicSimilarity], result of:
      0.06940313 = score(doc=605,freq=10.0), product of:
        0.14807065 = queryWeight, product of:
          3.1620505 = idf(docFreq=5088, maxDocs=44218)
          0.046827413 = queryNorm
        0.46871632 = fieldWeight in 605, product of:
          3.1622777 = tf(freq=10.0), with freq of:
            10.0 = termFreq=10.0
          3.1620505 = idf(docFreq=5088, maxDocs=44218)
          0.046875 = fieldNorm(doc=605)
    0.025435425 = product of:
      0.05087085 = sum of:
        0.05087085 = weight(_text_:processing in 605) [ClassicSimilarity], result of:
          0.05087085 = score(doc=605,freq=2.0), product of:
            0.18956426 = queryWeight, product of:
              4.048147 = idf(docFreq=2097, maxDocs=44218)
              0.046827413 = queryNorm
            0.26835677 = fieldWeight in 605, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              4.048147 = idf(docFreq=2097, maxDocs=44218)
              0.046875 = fieldNorm(doc=605)
      0.5 = coord(1/2)
  0.5 = coord(2/4)

Abstract: The data re-collection for tweets from data snapshots is a common methodological step in Twitter-based research. Understanding better the volatility of tweets over time is important for validating the reliability of metrics based on Twitter data. We tracked a set of 37,918 original scholarly tweets mentioning COVID-19-related research daily for 56 days and captured the reasons for the changes in their availability over time. Results show that the proportion of unavailable tweets increased from 1.6 to 2.6% in the time window observed. Of the 1,323 tweets that became unavailable at some point in the period observed, 30.5% became available again afterwards. "Revived" tweets resulted mainly from the unprotecting, reactivating, or unsuspending of users' accounts. Our findings highlight the importance of noting this dynamic nature of Twitter data in altmetric research and testify to the challenges that this poses for the retrieval, processing, and interpretation of Twitter data about scientific papers.

Xiang, R.; Chersoni, E.; Lu, Q.; Huang, C.-R.; Li, W.; Long, Y.: Lexical data augmentation for sentiment analysis (2021) 0.05
```
0.047176752 = product of:
  0.094353504 = sum of:
    0.07315732 = weight(_text_:data in 392) [ClassicSimilarity], result of:
      0.07315732 = score(doc=392,freq=16.0), product of:
        0.14807065 = queryWeight, product of:
          3.1620505 = idf(docFreq=5088, maxDocs=44218)
          0.046827413 = queryNorm
        0.49407038 = fieldWeight in 392, product of:
          4.0 = tf(freq=16.0), with freq of:
            16.0 = termFreq=16.0
          3.1620505 = idf(docFreq=5088, maxDocs=44218)
          0.0390625 = fieldNorm(doc=392)
    0.021196188 = product of:
      0.042392377 = sum of:
        0.042392377 = weight(_text_:processing in 392) [ClassicSimilarity], result of:
          0.042392377 = score(doc=392,freq=2.0), product of:
            0.18956426 = queryWeight, product of:
              4.048147 = idf(docFreq=2097, maxDocs=44218)
              0.046827413 = queryNorm
            0.22363065 = fieldWeight in 392, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              4.048147 = idf(docFreq=2097, maxDocs=44218)
              0.0390625 = fieldNorm(doc=392)
      0.5 = coord(1/2)
  0.5 = coord(2/4)
```
Abstract

Machine learning methods, especially deep learning models, have achieved impressive performance in various natural language processing tasks including sentiment analysis. However, deep learning models are more demanding for training data. Data augmentation techniques are widely used to generate new instances based on modifications to existing data or relying on external knowledge bases to address annotated data scarcity, which hinders the full potential of machine learning techniques. This paper presents our work using part-of-speech (POS) focused lexical substitution for data augmentation (PLSDA) to enhance the performance of machine learning algorithms in sentiment analysis. We exploit POS information to identify words to be replaced and investigate different augmentation strategies to find semantically related substitutions when generating new instances. The choice of POS tags as well as a variety of strategies such as semantic-based substitution methods and sampling methods are discussed in detail. Performance evaluation focuses on the comparison between PLSDA and two previous lexical substitution-based data augmentation methods, one of which is thesaurus-based, and the other is lexicon manipulation based. Our approach is tested on five English sentiment analysis benchmarks: SST-2, MR, IMDB, Twitter, and AirRecord. Hyperparameters such as the candidate similarity threshold and number of newly generated instances are optimized. Results show that six classifiers (SVM, LSTM, BiLSTM-AT, bidirectional encoder representations from transformers [BERT], XLNet, and RoBERTa) trained with PLSDA achieve accuracy improvement of more than 0.6% comparing to two previous lexical substitution methods averaged on five benchmarks. Introducing POS constraint and well-designed augmentation strategies can improve the reliability of lexical data augmentation methods. Consequently, PLSDA significantly improves the performance of sentiment analysis algorithms.

Urs, S.R.; Minhaj, M.: Evolution of data science and its education in iSchools : an impressionistic study using curriculum analysis (2023) 0.05
```
0.047176752 = product of:
  0.094353504 = sum of:
    0.07315732 = weight(_text_:data in 960) [ClassicSimilarity], result of:
      0.07315732 = score(doc=960,freq=16.0), product of:
        0.14807065 = queryWeight, product of:
          3.1620505 = idf(docFreq=5088, maxDocs=44218)
          0.046827413 = queryNorm
        0.49407038 = fieldWeight in 960, product of:
          4.0 = tf(freq=16.0), with freq of:
            16.0 = termFreq=16.0
          3.1620505 = idf(docFreq=5088, maxDocs=44218)
          0.0390625 = fieldNorm(doc=960)
    0.021196188 = product of:
      0.042392377 = sum of:
        0.042392377 = weight(_text_:processing in 960) [ClassicSimilarity], result of:
          0.042392377 = score(doc=960,freq=2.0), product of:
            0.18956426 = queryWeight, product of:
              4.048147 = idf(docFreq=2097, maxDocs=44218)
              0.046827413 = queryNorm
            0.22363065 = fieldWeight in 960, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              4.048147 = idf(docFreq=2097, maxDocs=44218)
              0.0390625 = fieldNorm(doc=960)
      0.5 = coord(1/2)
  0.5 = coord(2/4)
```
Abstract

Data Science (DS) has emerged from the shadows of its parents, statistics and computer science, into an independent field since its origin nearly six decades ago. Its evolution and education have taken many sharp turns. We present an impressionistic study of the evolution of DS anchored to Kuhn's four stages of paradigm shifts. First, we construct the landscape of DS based on curriculum analysis of the 32 iSchools across the world offering graduate-level DS programs. Second, we paint the "field" as it emerges from the word frequency patterns, ranking, and clustering of course titles based on text mining. Third, we map the curriculum to the landscape of DS and project the same onto the Edison Data Science Framework (2017) and ACM Data Science Knowledge Areas (2021). Our study shows that the DS programs of iSchools align well with the field and correspond to the Knowledge Areas and skillsets. iSchool's DS curriculums exhibit a bias toward "data visualization" along with machine learning, data mining, natural language processing, and artificial intelligence; go light on statistics; slanted toward ontologies and health informatics; and surprisingly minimal thrust toward eScience/research data management, which we believe would add a distinctive iSchool flavor to the DS.

Footnote

Contribution to a special issue on "Data Science in the iField".

Dunsire, G.; Fritz, D.; Fritz, R.: Instructions, interfaces, and interoperable data : the RIMMF experience with RDA revisited (2020) 0.05

0.046197 = product of:
  0.092394 = sum of:
    0.06271934 = weight(_text_:data in 5751) [ClassicSimilarity], result of:
      0.06271934 = score(doc=5751,freq=6.0), product of:
        0.14807065 = queryWeight, product of:
          3.1620505 = idf(docFreq=5088, maxDocs=44218)
          0.046827413 = queryNorm
        0.42357713 = fieldWeight in 5751, product of:
          2.4494898 = tf(freq=6.0), with freq of:
            6.0 = termFreq=6.0
          3.1620505 = idf(docFreq=5088, maxDocs=44218)
          0.0546875 = fieldNorm(doc=5751)
    0.029674664 = product of:
      0.05934933 = sum of:
        0.05934933 = weight(_text_:processing in 5751) [ClassicSimilarity], result of:
          0.05934933 = score(doc=5751,freq=2.0), product of:
            0.18956426 = queryWeight, product of:
              4.048147 = idf(docFreq=2097, maxDocs=44218)
              0.046827413 = queryNorm
            0.3130829 = fieldWeight in 5751, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              4.048147 = idf(docFreq=2097, maxDocs=44218)
              0.0546875 = fieldNorm(doc=5751)
      0.5 = coord(1/2)
  0.5 = coord(2/4)

Abstract: This article presents a case study of RIMMF, a software tool developed to improve the orientation and training of catalogers who use Resource Description and Access (RDA) to maintain bibliographic data. The cataloging guidance and instructions of RDA are based on the Functional Requirements conceptual models that are now consolidated in the IFLA Library Reference Model, but many catalogers are applying RDA in systems that have evolved from inventory and text-processing applications developed from older metadata paradigms. The article describes how RIMMF interacts with the RDA Toolkit and RDA Registry to offer cataloger-friendly multilingual data input and editing interfaces.

Wu, P.F.: Veni, vidi, vici? : On the rise of scrape-and-report scholarship in online reviews research (2023) 0.04

0.042462487 = product of:
  0.08492497 = sum of:
    0.06271934 = weight(_text_:data in 896) [ClassicSimilarity], result of:
      0.06271934 = score(doc=896,freq=6.0), product of:
        0.14807065 = queryWeight, product of:
          3.1620505 = idf(docFreq=5088, maxDocs=44218)
          0.046827413 = queryNorm
        0.42357713 = fieldWeight in 896, product of:
          2.4494898 = tf(freq=6.0), with freq of:
            6.0 = termFreq=6.0
          3.1620505 = idf(docFreq=5088, maxDocs=44218)
          0.0546875 = fieldNorm(doc=896)
    0.022205638 = product of:
      0.044411276 = sum of:
        0.044411276 = weight(_text_:22 in 896) [ClassicSimilarity], result of:
          0.044411276 = score(doc=896,freq=2.0), product of:
            0.16398162 = queryWeight, product of:
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.046827413 = queryNorm
            0.2708308 = fieldWeight in 896, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.0546875 = fieldNorm(doc=896)
      0.5 = coord(1/2)
  0.5 = coord(2/4)

Abstract: JASIST has in recent years received many submissions reporting data analytics based on "Big Data" of online reviews scraped from various platforms. By outlining major issues in this type of scrape-and-report scholarship and providing a set of recommendations, this essay encourages online reviews researchers to look at Big Data with a critical eye and treat online reviews as a sociotechnical "thing" produced within the fabric of sociomaterial life.
Date: 22. 1.2023 18:33:53

Tang, X.-B.; Fu, W.-G.; Liu, Y.: Knowledge big graph fusing ontology with property graph : a case study of financial ownership network (2021) 0.04
```
0.039516065 = product of:
  0.07903213 = sum of:
    0.057835944 = weight(_text_:data in 234) [ClassicSimilarity], result of:
      0.057835944 = score(doc=234,freq=10.0), product of:
        0.14807065 = queryWeight, product of:
          3.1620505 = idf(docFreq=5088, maxDocs=44218)
          0.046827413 = queryNorm
        0.39059696 = fieldWeight in 234, product of:
          3.1622777 = tf(freq=10.0), with freq of:
            10.0 = termFreq=10.0
          3.1620505 = idf(docFreq=5088, maxDocs=44218)
          0.0390625 = fieldNorm(doc=234)
    0.021196188 = product of:
      0.042392377 = sum of:
        0.042392377 = weight(_text_:processing in 234) [ClassicSimilarity], result of:
          0.042392377 = score(doc=234,freq=2.0), product of:
            0.18956426 = queryWeight, product of:
              4.048147 = idf(docFreq=2097, maxDocs=44218)
              0.046827413 = queryNorm
            0.22363065 = fieldWeight in 234, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              4.048147 = idf(docFreq=2097, maxDocs=44218)
              0.0390625 = fieldNorm(doc=234)
      0.5 = coord(1/2)
  0.5 = coord(2/4)
```
Abstract

The scale of knowledge is growing rapidly in the big data environment, and traditional knowledge organization and services have faced the dilemma of semantic inaccuracy and untimeliness. From a knowledge fusion perspective, combining the precise semantic superiority of traditional ontology with the large-scale graph processing power and the predicate attribute expression ability of property graph, this paper presents an ontology and property graph fusion framework (OPGFF). The fusion process is divided into content layer fusion and constraint layer fusion. The result of the fusion, that is, the knowledge representation model is called knowledge big graph. In addition, this paper applies the knowledge big graph model to the ownership network in China's financial field and builds a financial ownership knowledge big graph. Furthermore, this paper designs and implements six consistency inference algorithms for finding contradictory data and filling in missing data in the financial ownership knowledge big graph, five of which are completely domain agnostic. The correctness and validity of the algorithms have been experimentally verified with actual data. The fusion OPGFF framework and the implementation method of the knowledge big graph could provide technical reference for big data knowledge organization and services.

Al-Khatib, K.; Ghosa, T.; Hou, Y.; Waard, A. de; Freitag, D.: Argument mining for scholarly document processing : taking stock and looking ahead (2021) 0.04

0.03908867 = product of:
  0.07817734 = sum of:
    0.036211025 = weight(_text_:data in 568) [ClassicSimilarity], result of:
      0.036211025 = score(doc=568,freq=2.0), product of:
        0.14807065 = queryWeight, product of:
          3.1620505 = idf(docFreq=5088, maxDocs=44218)
          0.046827413 = queryNorm
        0.24455236 = fieldWeight in 568, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          3.1620505 = idf(docFreq=5088, maxDocs=44218)
          0.0546875 = fieldNorm(doc=568)
    0.041966315 = product of:
      0.08393263 = sum of:
        0.08393263 = weight(_text_:processing in 568) [ClassicSimilarity], result of:
          0.08393263 = score(doc=568,freq=4.0), product of:
            0.18956426 = queryWeight, product of:
              4.048147 = idf(docFreq=2097, maxDocs=44218)
              0.046827413 = queryNorm
            0.4427661 = fieldWeight in 568, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              4.048147 = idf(docFreq=2097, maxDocs=44218)
              0.0546875 = fieldNorm(doc=568)
      0.5 = coord(1/2)
  0.5 = coord(2/4)

Abstract: Argument mining targets structures in natural language related to interpretation and persuasion. Most scholarly discourse involves interpreting experimental evidence and attempting to persuade other scientists to adopt the same conclusions, which could benefit from argument mining techniques. However, while various argument mining studies have addressed student essays and news articles, those that target scientific discourse are still scarce. This paper surveys existing work in argument mining of scholarly discourse, and provides an overview of current models, data, tasks, and applications. We identify a number of key challenges confronting argument mining in the scientific domain, and suggest some possible solutions and future directions.
Source: Proceedings of the Second Workshop on Scholarly Document Processing,

Das, S.; Paik, J.H.: Gender tagging of named entities using retrieval-assisted multi-context aggregation : an unsupervised approach (2023) 0.03

0.0314639 = product of:
  0.0629278 = sum of:
    0.043894395 = weight(_text_:data in 941) [ClassicSimilarity], result of:
      0.043894395 = score(doc=941,freq=4.0), product of:
        0.14807065 = queryWeight, product of:
          3.1620505 = idf(docFreq=5088, maxDocs=44218)
          0.046827413 = queryNorm
        0.29644224 = fieldWeight in 941, product of:
          2.0 = tf(freq=4.0), with freq of:
            4.0 = termFreq=4.0
          3.1620505 = idf(docFreq=5088, maxDocs=44218)
          0.046875 = fieldNorm(doc=941)
    0.019033402 = product of:
      0.038066804 = sum of:
        0.038066804 = weight(_text_:22 in 941) [ClassicSimilarity], result of:
          0.038066804 = score(doc=941,freq=2.0), product of:
            0.16398162 = queryWeight, product of:
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.046827413 = queryNorm
            0.23214069 = fieldWeight in 941, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.046875 = fieldNorm(doc=941)
      0.5 = coord(1/2)
  0.5 = coord(2/4)

Abstract: Inferring the gender of named entities present in a text has several practical applications in information sciences. Existing approaches toward name gender identification rely exclusively on using the gender distributions from labeled data. In the absence of such labeled data, these methods fail. In this article, we propose a two-stage model that is able to infer the gender of names present in text without requiring explicit name-gender labels. We use coreference resolution as the backbone for our proposed model. To aid coreference resolution where the existing contextual information does not suffice, we use a retrieval-assisted context aggregation framework. We demonstrate that state-of-the-art name gender inference is possible without supervision. Our proposed method matches or outperforms several supervised approaches and commercially used methods on five English language datasets from different domains.
Date: 22. 3.2023 12:00:14

Kang, M.: Dual paths to continuous online knowledge sharing : a repetitive behavior perspective (2020) 0.03
```
0.030330349 = product of:
  0.060660698 = sum of:
    0.04479953 = weight(_text_:data in 5985) [ClassicSimilarity], result of:
      0.04479953 = score(doc=5985,freq=6.0), product of:
        0.14807065 = queryWeight, product of:
          3.1620505 = idf(docFreq=5088, maxDocs=44218)
          0.046827413 = queryNorm
        0.30255508 = fieldWeight in 5985, product of:
          2.4494898 = tf(freq=6.0), with freq of:
            6.0 = termFreq=6.0
          3.1620505 = idf(docFreq=5088, maxDocs=44218)
          0.0390625 = fieldNorm(doc=5985)
    0.01586117 = product of:
      0.03172234 = sum of:
        0.03172234 = weight(_text_:22 in 5985) [ClassicSimilarity], result of:
          0.03172234 = score(doc=5985,freq=2.0), product of:
            0.16398162 = queryWeight, product of:
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.046827413 = queryNorm
            0.19345059 = fieldWeight in 5985, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.0390625 = fieldNorm(doc=5985)
      0.5 = coord(1/2)
  0.5 = coord(2/4)
```
Abstract: Purpose: Continuous knowledge sharing by active users, who are highly active in answering questions, is crucial to the sustenance of social question-and-answer (Q&A) sites. The purpose of this paper is to examine such knowledge sharing considering reason-based elaborate decision and habit-based automated cognitive processes. Design/methodology/approach: To verify the research hypotheses, survey data on subjective intentions and web-crawled data on objective behavior are utilized. The sample size is 337 with the response rate of 27.2 percent. Negative binomial and hierarchical linear regressions are used given the skewed distribution of the dependent variable (i.e. the number of answers). Findings: Both elaborate decision (linking satisfaction, intentions and continuance behavior) and automated cognitive processes (linking past and continuance behavior) are significant and substitutable. Research limitations/implications: By measuring both subjective intentions and objective behavior, it verifies a detailed mechanism linking continuance intentions, past behavior and continuous knowledge sharing. The significant influence of automated cognitive processes implies that online knowledge sharing is habitual for active users. Practical implications: Understanding that online knowledge sharing is habitual is imperative to maintaining continuous knowledge sharing by active users. Knowledge sharing trends should be monitored to check if the frequency of sharing decreases. Social Q&A sites should intervene to restore knowledge sharing behavior through personalized incentives. Originality/value: This is the first study utilizing both subjective intentions and objective behavior data in the context of online knowledge sharing. It also introduces habit-based automated cognitive processes to this context. This approach extends the current understanding of continuous online knowledge sharing behavior.
Date: 20. 1.2015 18:30:22

Hjoerland, B.: Science, Part I : basic conceptions of science and the scientific method (2021) 0.03
```
0.028887425 = product of:
  0.05777485 = sum of:
    0.03657866 = weight(_text_:data in 594) [ClassicSimilarity], result of:
      0.03657866 = score(doc=594,freq=4.0), product of:
        0.14807065 = queryWeight, product of:
          3.1620505 = idf(docFreq=5088, maxDocs=44218)
          0.046827413 = queryNorm
        0.24703519 = fieldWeight in 594, product of:
          2.0 = tf(freq=4.0), with freq of:
            4.0 = termFreq=4.0
          3.1620505 = idf(docFreq=5088, maxDocs=44218)
          0.0390625 = fieldNorm(doc=594)
    0.021196188 = product of:
      0.042392377 = sum of:
        0.042392377 = weight(_text_:processing in 594) [ClassicSimilarity], result of:
          0.042392377 = score(doc=594,freq=2.0), product of:
            0.18956426 = queryWeight, product of:
              4.048147 = idf(docFreq=2097, maxDocs=44218)
              0.046827413 = queryNorm
            0.22363065 = fieldWeight in 594, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              4.048147 = idf(docFreq=2097, maxDocs=44218)
              0.0390625 = fieldNorm(doc=594)
      0.5 = coord(1/2)
  0.5 = coord(2/4)
```
Abstract: This article is the first in a trilogy about the concept "science". Section 1 considers the historical development of the meaning of the term science and shows its close relation to the terms "knowledge" and "philosophy". Section 2 presents four historic phases in the basic conceptualizations of science: (1) science as representing absolutely certain knowledge based on deductive proof; (2) science as representing absolutely certain knowledge based on "the scientific method"; (3) science as representing fallible knowledge based on "the scientific method"; (4) science without a belief in "the scientific method" as constitutive, hence the question about the nature of science becomes dramatic. Section 3 presents four basic understandings of the scientific method: rationalism, which gives priority to a priori thinking; empiricism, which gives priority to the collection, description, and processing of data in a neutral way; historicism, which gives priority to the interpretation of data in the light of "paradigm"; and pragmatism, which emphasizes the analysis of the purposes, consequences, and the interests of knowledge. The second article in the trilogy focuses on different fields studying science, while the final article presents further developments in the concept of science and the general conclusion. Overall, the trilogy illuminates the most important tensions in different conceptualizations of science, argues for the role of information science and knowledge organization in the study of science, and suggests how "science" should be understood as an object of research in these fields.

Morrison, H.; Borges, L.; Zhao, X.; Kakou, T.L.; Shanbhoug, A.N.: Change and growth in open access journal publishing and charging trends 2011-2021 (2022) 0.03
```
0.028887425 = product of:
  0.05777485 = sum of:
    0.03657866 = weight(_text_:data in 741) [ClassicSimilarity], result of:
      0.03657866 = score(doc=741,freq=4.0), product of:
        0.14807065 = queryWeight, product of:
          3.1620505 = idf(docFreq=5088, maxDocs=44218)
          0.046827413 = queryNorm
        0.24703519 = fieldWeight in 741, product of:
          2.0 = tf(freq=4.0), with freq of:
            4.0 = termFreq=4.0
          3.1620505 = idf(docFreq=5088, maxDocs=44218)
          0.0390625 = fieldNorm(doc=741)
    0.021196188 = product of:
      0.042392377 = sum of:
        0.042392377 = weight(_text_:processing in 741) [ClassicSimilarity], result of:
          0.042392377 = score(doc=741,freq=2.0), product of:
            0.18956426 = queryWeight, product of:
              4.048147 = idf(docFreq=2097, maxDocs=44218)
              0.046827413 = queryNorm
            0.22363065 = fieldWeight in 741, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              4.048147 = idf(docFreq=2097, maxDocs=44218)
              0.0390625 = fieldNorm(doc=741)
      0.5 = coord(1/2)
  0.5 = coord(2/4)
```
Abstract: This study examines trends in open access article processing charges (APCs) from 2011 to 2021, building on a 2011 study by Solomon and Björk. Two methods are employed, a modified replica and a status update of the 2011 journals. Data are drawn from multiple sources and datasets are available as open data. Most journals do not charge APCs; this has not changed. The global average per-journal APC increased slightly, from 906 to 958 USD, while the per-article average increased from 904 to 1,626 USD, indicating that authors choose to publish in more expensive journals. Publisher size, type, impact metrics and subject affect charging tendencies, average APC, and pricing trends. Half the journals from the 2011 sample are no longer listed in DOAJ in 2021, due to ceased publication or publisher de-listing. Conclusions include a caution about the potential of the APC model to increase costs beyond inflation. The university sector may be the most promising approach to economically sustainable no-fee OA journals. Universities publish many OA journals, nearly half of OA articles, tend not to charge APCs and when APCs are charged, the prices are very low on average.

Favato Barcelos, P.P.; Sales, T.P.; Fumagalli, M.; Guizzardi, G.; Valle Sousa, I.; Fonseca, C.M.; Romanenko, E.; Kritz, J.: ¬A FAIR model catalog for ontology-driven conceptual modeling research (2022) 0.03
```
0.028887425 = product of:
  0.05777485 = sum of:
    0.03657866 = weight(_text_:data in 756) [ClassicSimilarity], result of:
      0.03657866 = score(doc=756,freq=4.0), product of:
        0.14807065 = queryWeight, product of:
          3.1620505 = idf(docFreq=5088, maxDocs=44218)
          0.046827413 = queryNorm
        0.24703519 = fieldWeight in 756, product of:
          2.0 = tf(freq=4.0), with freq of:
            4.0 = termFreq=4.0
          3.1620505 = idf(docFreq=5088, maxDocs=44218)
          0.0390625 = fieldNorm(doc=756)
    0.021196188 = product of:
      0.042392377 = sum of:
        0.042392377 = weight(_text_:processing in 756) [ClassicSimilarity], result of:
          0.042392377 = score(doc=756,freq=2.0), product of:
            0.18956426 = queryWeight, product of:
              4.048147 = idf(docFreq=2097, maxDocs=44218)
              0.046827413 = queryNorm
            0.22363065 = fieldWeight in 756, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              4.048147 = idf(docFreq=2097, maxDocs=44218)
              0.0390625 = fieldNorm(doc=756)
      0.5 = coord(1/2)
  0.5 = coord(2/4)
```
Abstract: Conceptual models are artifacts representing conceptualizations of particular domains. Hence, multi-domain model catalogs serve as empirical sources of knowledge and insights about specific domains, about the use of a modeling language's constructs, as well as about the patterns and anti-patterns recurrent in the models of that language crosscutting different domains. However, to support domain and language learning, model reuse, knowledge discovery for humans, and reliable automated processing and analysis by machines, these catalogs must be built following generally accepted quality requirements for scientific data management. Especially, all scientific (meta)data, including models, should be created using the FAIR principles (Findability, Accessibility, Interoperability, and Reusability). In this paper, we report on the construction of a FAIR model catalog for Ontology-Driven Conceptual Modeling research, a trending paradigm lying at the intersection of conceptual modeling and ontology engineering in which the Unified Foundational Ontology (UFO) and OntoUML emerged among the most adopted technologies. In this initial release, the catalog includes over a hundred models, developed in a variety of contexts and domains. The paper also discusses the research implications for (ontology-driven) conceptual modeling of such a resource.

Li, W.; Zheng, Y.; Zhan, Y.; Feng, R.; Zhang, T.; Fan, W.: Cross-modal retrieval with dual multi-angle self-attention (2021) 0.03
```
0.028236724 = product of:
  0.05647345 = sum of:
    0.031038022 = weight(_text_:data in 67) [ClassicSimilarity], result of:
      0.031038022 = score(doc=67,freq=2.0), product of:
        0.14807065 = queryWeight, product of:
          3.1620505 = idf(docFreq=5088, maxDocs=44218)
          0.046827413 = queryNorm
        0.2096163 = fieldWeight in 67, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          3.1620505 = idf(docFreq=5088, maxDocs=44218)
          0.046875 = fieldNorm(doc=67)
    0.025435425 = product of:
      0.05087085 = sum of:
        0.05087085 = weight(_text_:processing in 67) [ClassicSimilarity], result of:
          0.05087085 = score(doc=67,freq=2.0), product of:
            0.18956426 = queryWeight, product of:
              4.048147 = idf(docFreq=2097, maxDocs=44218)
              0.046827413 = queryNorm
            0.26835677 = fieldWeight in 67, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              4.048147 = idf(docFreq=2097, maxDocs=44218)
              0.046875 = fieldNorm(doc=67)
      0.5 = coord(1/2)
  0.5 = coord(2/4)
```
Abstract: In recent years, cross-modal retrieval has been a popular research topic in both computer vision and natural language processing. There is a huge semantic gap between different modalities on account of their heterogeneous properties, and establishing correlations among data of different modalities remains highly challenging. In this work, we propose a novel end-to-end framework named Dual Multi-Angle Self-Attention (DMASA) for cross-modal retrieval. Multiple self-attention mechanisms are applied to extract fine-grained features for both images and texts from different angles. We then integrate coarse-grained and fine-grained features into a multimodal embedding space, in which the similarity degrees between images and texts can be directly compared. Moreover, we propose a special multistage training strategy, in which the preceding stage can provide a good initial value for the succeeding stage and make our framework work better. Very promising experimental results over the state-of-the-art methods can be achieved on three benchmark datasets of Flickr8k, Flickr30k, and MSCOCO.

Brito, M. de: Social affects engineering and ethics (2023) 0.03

0.028236724 = product of:
  0.05647345 = sum of:
    0.031038022 = weight(_text_:data in 1135) [ClassicSimilarity], result of:
      0.031038022 = score(doc=1135,freq=2.0), product of:
        0.14807065 = queryWeight, product of:
          3.1620505 = idf(docFreq=5088, maxDocs=44218)
          0.046827413 = queryNorm
        0.2096163 = fieldWeight in 1135, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          3.1620505 = idf(docFreq=5088, maxDocs=44218)
          0.046875 = fieldNorm(doc=1135)
    0.025435425 = product of:
      0.05087085 = sum of:
        0.05087085 = weight(_text_:processing in 1135) [ClassicSimilarity], result of:
          0.05087085 = score(doc=1135,freq=2.0), product of:
            0.18956426 = queryWeight, product of:
              4.048147 = idf(docFreq=2097, maxDocs=44218)
              0.046827413 = queryNorm
            0.26835677 = fieldWeight in 1135, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              4.048147 = idf(docFreq=2097, maxDocs=44218)
              0.046875 = fieldNorm(doc=1135)
      0.5 = coord(1/2)
  0.5 = coord(2/4)

Abstract: This text proposes a multidisciplinary reflection on the subject of ethics, based on philosophical approaches, using Spinoza's work, Ethics, as a foundation. The power of Spinoza's geometric reasoning and deterministic logic, compatible with formal grammars and programming languages, provides a favorable framework for this purpose. In an information society characterized by an abundance of data and a diversity of perspectives, complex thinking is an essential tool for developing an ethical construct that can deal with the uncertainty and contradictions in the field. Acknowledging the natural complexity of ethics in interpersonal relationships, the use of AI techniques appears unavoidable. Artificial intelligence in KOS offers the potential for processing complex questions through the formal modeling of concepts in ethical discourse. By formalizing problems, we hope to unleash the potential of ethical analysis; by addressing complexity analysis, we propose a mechanism for understanding problems and empowering solutions.
