Search (13 results, page 1 of 1)

  • author_ss:"Wang, S."
  1. Wang, S.; Ma, Y.; Mao, J.; Bai, Y.; Liang, Z.; Li, G.: Quantifying scientific breakthroughs by a novel disruption indicator based on knowledge entities (2023) 0.02
    0.018446533 = product of:
      0.04611633 = sum of:
        0.0068111527 = weight(_text_:a in 882) [ClassicSimilarity], result of:
          0.0068111527 = score(doc=882,freq=8.0), product of:
            0.053464882 = queryWeight, product of:
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.046368346 = queryNorm
            0.12739488 = fieldWeight in 882, product of:
              2.828427 = tf(freq=8.0), with freq of:
                8.0 = termFreq=8.0
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.0390625 = fieldNorm(doc=882)
        0.039305177 = sum of:
          0.007893822 = weight(_text_:information in 882) [ClassicSimilarity], result of:
            0.007893822 = score(doc=882,freq=2.0), product of:
              0.08139861 = queryWeight, product of:
                1.7554779 = idf(docFreq=20772, maxDocs=44218)
                0.046368346 = queryNorm
              0.09697737 = fieldWeight in 882, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                1.7554779 = idf(docFreq=20772, maxDocs=44218)
                0.0390625 = fieldNorm(doc=882)
          0.031411353 = weight(_text_:22 in 882) [ClassicSimilarity], result of:
            0.031411353 = score(doc=882,freq=2.0), product of:
              0.16237405 = queryWeight, product of:
                3.5018296 = idf(docFreq=3622, maxDocs=44218)
                0.046368346 = queryNorm
              0.19345059 = fieldWeight in 882, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                3.5018296 = idf(docFreq=3622, maxDocs=44218)
                0.0390625 = fieldNorm(doc=882)
      0.4 = coord(2/5)
    
    Abstract
     Compared to previous studies that generally detect scientific breakthroughs based on citation patterns, this article proposes a knowledge entity-based disruption indicator by quantifying the change of knowledge directly created and inspired by scientific breakthroughs to their evolutionary trajectories. Two groups of analytic units, including MeSH terms and their co-occurrences, are employed independently by the indicator to measure the change of knowledge. The effectiveness of the proposed indicators was evaluated against four datasets of scientific breakthroughs derived from four recognition trials. In terms of identifying scientific breakthroughs, the proposed disruption indicator based on MeSH co-occurrences outperforms that based on MeSH terms and three earlier citation-based disruption indicators. It is also shown that in our indicator, measuring the change of knowledge inspired by the focal paper in its evolutionary trajectory is a larger contributor than measuring the change created by the focal paper. Our study not only offers empirical insights into the conceptual understanding of scientific breakthroughs but also provides a practical disruption indicator for scientists and science management agencies searching for valuable research.
    Date
    22. 1.2023 18:37:33
    Source
    Journal of the Association for Information Science and Technology. 74(2023) no.2, S.150-167
    Type
    a
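     The relevance figures shown with each hit follow Lucene's ClassicSimilarity (TF-IDF) explain output. As a reading aid, the following is a minimal Python sketch that reproduces the first leaf of entry 1's score tree from the numbers printed there (term "a", freq=8.0, docFreq=37942, maxDocs=44218, queryNorm=0.046368346, fieldNorm=0.0390625); the function name and rounding are illustrative, not part of the retrieval system.

       import math

       def classic_similarity_leaf(freq, doc_freq, max_docs, query_norm, field_norm):
           """Recompute one weight(...) leaf of a Lucene ClassicSimilarity explain tree."""
           tf = math.sqrt(freq)                                # 2.828427 for freq=8.0
           idf = 1.0 + math.log(max_docs / (doc_freq + 1))     # 1.153047 for docFreq=37942
           query_weight = idf * query_norm                     # 0.053464882
           field_weight = tf * idf * field_norm                # 0.12739488
           return query_weight * field_weight                  # 0.0068111527

       # Leaf for term "a" in doc 882 (entry 1); the full tree then sums the leaves
       # and applies the coordination factor coord(2/5) = 0.4.
       score = classic_similarity_leaf(8.0, 37942, 44218, 0.046368346, 0.0390625)
       print(round(score, 10))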
  2. Cai, F.; Wang, S.; Rijke, M.de: Behavior-based personalization in web search (2017) 0.01
    0.007183318 = product of:
      0.017958295 = sum of:
        0.010897844 = weight(_text_:a in 3527) [ClassicSimilarity], result of:
          0.010897844 = score(doc=3527,freq=32.0), product of:
            0.053464882 = queryWeight, product of:
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.046368346 = queryNorm
            0.20383182 = fieldWeight in 3527, product of:
              5.656854 = tf(freq=32.0), with freq of:
                32.0 = termFreq=32.0
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.03125 = fieldNorm(doc=3527)
        0.0070604496 = product of:
          0.014120899 = sum of:
            0.014120899 = weight(_text_:information in 3527) [ClassicSimilarity], result of:
              0.014120899 = score(doc=3527,freq=10.0), product of:
                0.08139861 = queryWeight, product of:
                  1.7554779 = idf(docFreq=20772, maxDocs=44218)
                  0.046368346 = queryNorm
                0.1734784 = fieldWeight in 3527, product of:
                  3.1622777 = tf(freq=10.0), with freq of:
                    10.0 = termFreq=10.0
                  1.7554779 = idf(docFreq=20772, maxDocs=44218)
                  0.03125 = fieldNorm(doc=3527)
          0.5 = coord(1/2)
      0.4 = coord(2/5)
    
    Abstract
    Personalized search approaches tailor search results to users' current interests, so as to help improve the likelihood of a user finding relevant documents for their query. Previous work on personalized search focuses on using the content of the user's query and of the documents clicked to model the user's preference. In this paper we focus on a different type of signal: We investigate the use of behavioral information for the purpose of search personalization. That is, we consider clicks and dwell time for reranking an initially retrieved list of documents. In particular, we (i) investigate the impact of distributions of users and queries on document reranking; (ii) estimate the relevance of a document for a query at 2 levels, at the query-level and at the word-level, to alleviate the problem of sparseness; and (iii) perform an experimental evaluation both for users seen during the training period and for users not seen during training. For the latter, we explore the use of information from similar users who have been seen during the training period. We use the dwell time on clicked documents to estimate a document's relevance to a query, and perform Bayesian probabilistic matrix factorization to generate a relevance distribution of a document over queries. Our experiments show that: (i) for personalized ranking, behavioral information helps to improve retrieval effectiveness; and (ii) given a query, merging information inferred from behavior of a particular user and from behaviors of other users with a user-dependent adaptive weight outperforms any combination with a fixed weight.
    Footnote
    A preliminary version of this paper was published in the proceedings of SIGIR '14. In this extension, we (i) extend the behavioral personalization search model introduced there to deal with queries issued by new users for whom long-term search logs are unavailable; (ii) examine the impact of sparseness on the performance of our model by considering both word-level and query-level modeling, as we find that the word-document relevance matrix is less sparse than the query-document relevance matrix; (iii) investigate the effectiveness of our behavior-based reranking model with and without assuming a uniform distribution of users as users may behave differently; (iv) include more related work and provide a detailed discussion of the experimental results.
    Source
    Journal of the Association for Information Science and Technology. 68(2017) no.4, S.855-868
    Type
    a
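     Entry 2 describes reranking an initially retrieved list with behavioral signals and merging a user-specific relevance estimate with one inferred from other users via a user-dependent adaptive weight. Below is a minimal sketch of that merge step, assuming the two relevance estimates (e.g. derived from dwell times) are already available; the names and the linear mixing rule are illustrative, not the authors' implementation.

       def rerank(doc_ids, user_rel, global_rel, alpha):
           """Rerank documents by mixing user-specific and population-level
           relevance with a user-dependent adaptive weight alpha in [0, 1]."""
           def score(doc_id):
               return alpha * user_rel.get(doc_id, 0.0) + (1 - alpha) * global_rel.get(doc_id, 0.0)
           return sorted(doc_ids, key=score, reverse=True)

       # Toy dwell-time-derived estimates for one user vs. all users
       user_rel = {"d1": 0.9, "d2": 0.1}
       global_rel = {"d1": 0.3, "d2": 0.7, "d3": 0.5}
       print(rerank(["d1", "d2", "d3"], user_rel, global_rel, alpha=0.6))  # ['d1', 'd2', 'd3']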
  3. Ren, P.; Chen, Z.; Ma, J.; Zhang, Z.; Si, L.; Wang, S.: Detecting temporal patterns of user queries (2017) 0.01
    0.005948606 = product of:
      0.014871514 = sum of:
        0.008173384 = weight(_text_:a in 3315) [ClassicSimilarity], result of:
          0.008173384 = score(doc=3315,freq=8.0), product of:
            0.053464882 = queryWeight, product of:
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.046368346 = queryNorm
            0.15287387 = fieldWeight in 3315, product of:
              2.828427 = tf(freq=8.0), with freq of:
                8.0 = termFreq=8.0
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.046875 = fieldNorm(doc=3315)
        0.0066981306 = product of:
          0.013396261 = sum of:
            0.013396261 = weight(_text_:information in 3315) [ClassicSimilarity], result of:
              0.013396261 = score(doc=3315,freq=4.0), product of:
                0.08139861 = queryWeight, product of:
                  1.7554779 = idf(docFreq=20772, maxDocs=44218)
                  0.046368346 = queryNorm
                0.16457605 = fieldWeight in 3315, product of:
                  2.0 = tf(freq=4.0), with freq of:
                    4.0 = termFreq=4.0
                  1.7554779 = idf(docFreq=20772, maxDocs=44218)
                  0.046875 = fieldNorm(doc=3315)
          0.5 = coord(1/2)
      0.4 = coord(2/5)
    
    Abstract
    Query classification is an important part of exploring the characteristics of web queries. Existing studies are mainly based on Broder's classification scheme and classify user queries into navigational, informational, and transactional categories according to users' information needs. In this article, we present a novel classification scheme from the perspective of queries' temporal patterns. Queries' temporal patterns are inherent time series patterns of the search volumes of queries that reflect the evolution of the popularity of a query over time. By analyzing the temporal patterns of queries, search engines can more deeply understand the users' search intents and thus improve performance. Furthermore, we extract three groups of features based on the queries' search volume time series and use a support vector machine (SVM) to automatically detect the temporal patterns of user queries. Extensive experiments on the Million Query Track data sets of the Text REtrieval Conference (TREC) demonstrate the effectiveness of our approach.
    Source
    Journal of the Association for Information Science and Technology. 68(2017) no.1, S.113-128
    Type
    a
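     To make the detection step in entry 3 concrete, here is a minimal sketch that trains an SVM on features extracted from query search-volume time series; scikit-learn's SVC stands in for whatever SVM implementation the authors used, and the toy features, data and labels are invented for illustration.

       import numpy as np
       from sklearn.svm import SVC

       def simple_features(volume):
           """Illustrative summary features of a query's search-volume time series."""
           v = np.asarray(volume, dtype=float)
           return [v.mean(), v.std(), v.max() / (v.mean() + 1e-9), float(np.argmax(v)) / len(v)]

       # Toy training data: a stable query vs. a spiking (event-driven) query
       X = [simple_features([10, 11, 9, 10, 12, 10]),
            simple_features([1, 1, 50, 2, 1, 1])]
       y = ["stable", "spike"]

       clf = SVC(kernel="rbf").fit(X, y)
       print(clf.predict([simple_features([2, 1, 60, 3, 2, 1])]))  # ['spike']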
  4. Wang, S.; Koopman, R.: Second life for authority records (2015) 0.01
    0.0054569542 = product of:
      0.013642386 = sum of:
        0.008173384 = weight(_text_:a in 2303) [ClassicSimilarity], result of:
          0.008173384 = score(doc=2303,freq=18.0), product of:
            0.053464882 = queryWeight, product of:
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.046368346 = queryNorm
            0.15287387 = fieldWeight in 2303, product of:
              4.2426405 = tf(freq=18.0), with freq of:
                18.0 = termFreq=18.0
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.03125 = fieldNorm(doc=2303)
        0.0054690014 = product of:
          0.010938003 = sum of:
            0.010938003 = weight(_text_:information in 2303) [ClassicSimilarity], result of:
              0.010938003 = score(doc=2303,freq=6.0), product of:
                0.08139861 = queryWeight, product of:
                  1.7554779 = idf(docFreq=20772, maxDocs=44218)
                  0.046368346 = queryNorm
                0.1343758 = fieldWeight in 2303, product of:
                  2.4494898 = tf(freq=6.0), with freq of:
                    6.0 = termFreq=6.0
                  1.7554779 = idf(docFreq=20772, maxDocs=44218)
                  0.03125 = fieldNorm(doc=2303)
          0.5 = coord(1/2)
      0.4 = coord(2/5)
    
    Abstract
     Authority control is a standard practice in the library community that provides consistent, unique, and unambiguous reference to entities such as persons, places, concepts, etc. The ideal way of referring to authority records through unique identifiers is in line with the current linked data principle. When presenting a bibliographic record, the linked authority records are expanded with the authoritative information. This way, any update in the authority records will not affect the indexing of the bibliographic records. The structural information in the authority files can also be leveraged to expand the user's query to retrieve bibliographic records associated with all the variations, narrower terms or related terms. However, in many digital libraries, especially large-scale aggregations such as WorldCat and Europeana, name strings are often used instead of authority record identifiers. This is also partly due to the lack of global authority records that are valid across countries and cultural heritage domains. But even when there are global authority systems, they are not applied at scale. For example, in WorldCat, only 15% of the records have DDC and 3% have UDC codes; less than 40% of the records have one or more topical terms catalogued in the 650 MARC field, many of which are too general (such as "sports" or "literature") to be useful for retrieving bibliographic records. Therefore, when a user query is based on a Dewey code, the results usually have high precision but the recall is much lower than it should be; and a search on a general topical term returns millions of hits without even being complete. All these practices make it difficult to leverage the key benefits of authority files. This is also true for authority files that have been transformed into linked data and enriched with mapping information. There are practical reasons for using name strings instead of identifiers. One is indexing and query response performance. The future infrastructure design should take the performance into account while embracing the benefit of linking instead of copying, without introducing extra complexity to users. Notwithstanding all the restrictions, we argue that large-scale aggregations also bring new opportunities for better exploiting the benefits of authority records. It is possible to use machine learning techniques to automatically link bibliographic records to authority records based on the manual input of cataloguers. Text mining and visualization techniques can offer a contextual view of authority records, which in turn can be used to retrieve missing or mis-catalogued records. In this talk, we will describe such opportunities in more detail.
    Source
     Classification and authority control: expanding resource discovery: proceedings of the International UDC Seminar 2015, 29-30 October 2015, Lisbon, Portugal. Eds.: Slavic, A. and M.I. Cordeiro
    Type
    a
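     Entry 4 argues that the structural information in authority files can be used to expand a query with variants, narrower and related terms. A minimal sketch of that idea follows, assuming an authority record is available as a simple dictionary; the record content and field names are invented for illustration.

       def expand_query(term, authority):
           """Expand a query term with variants, narrower and related terms
           from a linked authority record, if one exists."""
           record = authority.get(term.lower())
           if record is None:
               return [term]
           return ([term] + record.get("variants", [])
                   + record.get("narrower", []) + record.get("related", []))

       # Hypothetical authority record for a general topical term
       authority = {"sports": {"variants": ["sport"],
                               "narrower": ["football", "athletics"],
                               "related": ["physical education"]}}
       print(expand_query("Sports", authority))
       # ['Sports', 'sport', 'football', 'athletics', 'physical education']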
  5. Cui, C.; Ma, J.; Lian, T.; Chen, Z.; Wang, S.: Improving image annotation via ranking-oriented neighbor search and learning-based keyword propagation (2015) 0.01
    0.005182888 = product of:
      0.012957219 = sum of:
        0.009010308 = weight(_text_:a in 1609) [ClassicSimilarity], result of:
          0.009010308 = score(doc=1609,freq=14.0), product of:
            0.053464882 = queryWeight, product of:
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.046368346 = queryNorm
            0.1685276 = fieldWeight in 1609, product of:
              3.7416575 = tf(freq=14.0), with freq of:
                14.0 = termFreq=14.0
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.0390625 = fieldNorm(doc=1609)
        0.003946911 = product of:
          0.007893822 = sum of:
            0.007893822 = weight(_text_:information in 1609) [ClassicSimilarity], result of:
              0.007893822 = score(doc=1609,freq=2.0), product of:
                0.08139861 = queryWeight, product of:
                  1.7554779 = idf(docFreq=20772, maxDocs=44218)
                  0.046368346 = queryNorm
                0.09697737 = fieldWeight in 1609, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  1.7554779 = idf(docFreq=20772, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=1609)
          0.5 = coord(1/2)
      0.4 = coord(2/5)
    
    Abstract
    Automatic image annotation plays a critical role in modern keyword-based image retrieval systems. For this task, the nearest-neighbor-based scheme works in two phases: first, it finds the most similar neighbors of a new image from the set of labeled images; then, it propagates the keywords associated with the neighbors to the new image. In this article, we propose a novel approach for image annotation, which simultaneously improves both phases of the nearest-neighbor-based scheme. In the phase of neighbor search, different from existing work discovering the nearest neighbors with the predicted distance, we introduce a ranking-oriented neighbor search mechanism (RNSM), where the ordering of labeled images is optimized directly without going through the intermediate step of distance prediction. In the phase of keyword propagation, different from existing work using simple heuristic rules to select the propagated keywords, we present a learning-based keyword propagation strategy (LKPS), where a scoring function is learned to evaluate the relevance of keywords based on their multiple relations with the nearest neighbors. Extensive experiments on the Corel 5K data set and the MIR Flickr data set demonstrate the effectiveness of our approach.
    Source
    Journal of the Association for Information Science and Technology. 66(2015) no.1, S.82-98
    Type
    a
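     Entry 5 builds on the standard two-phase nearest-neighbor annotation scheme: find the most similar labeled images, then propagate their keywords. The sketch below shows only that generic baseline with a similarity-weighted vote; the paper's RNSM and LKPS components are not reproduced here, and the similarity function and data are placeholders.

       from collections import defaultdict

       def propagate_keywords(query_features, labeled, similarity, k=3, top=2):
           """Baseline annotation: neighbor search, then keyword propagation."""
           neighbours = sorted(labeled, reverse=True,
                               key=lambda item: similarity(query_features, item["features"]))[:k]
           votes = defaultdict(float)
           for nb in neighbours:
               weight = similarity(query_features, nb["features"])
               for keyword in nb["keywords"]:
                   votes[keyword] += weight          # similarity-weighted vote
           return sorted(votes, key=votes.get, reverse=True)[:top]

       # Toy 2-D "features" and a trivial similarity (negative squared distance)
       labeled = [{"features": (1.0, 0.0), "keywords": ["beach", "sea"]},
                  {"features": (0.0, 1.0), "keywords": ["forest"]}]
       sim = lambda a, b: -((a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2)
       print(propagate_keywords((0.9, 0.1), labeled, sim))  # ['beach', 'sea']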
  6. Gwizdka, J.; Hosseini, R.; Cole, M.; Wang, S.: Temporal dynamics of eye-tracking and EEG during reading and relevance decisions (2017) 0.01
    0.005093954 = product of:
      0.012734884 = sum of:
        0.005898632 = weight(_text_:a in 3822) [ClassicSimilarity], result of:
          0.005898632 = score(doc=3822,freq=6.0), product of:
            0.053464882 = queryWeight, product of:
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.046368346 = queryNorm
            0.11032722 = fieldWeight in 3822, product of:
              2.4494898 = tf(freq=6.0), with freq of:
                6.0 = termFreq=6.0
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.0390625 = fieldNorm(doc=3822)
        0.006836252 = product of:
          0.013672504 = sum of:
            0.013672504 = weight(_text_:information in 3822) [ClassicSimilarity], result of:
              0.013672504 = score(doc=3822,freq=6.0), product of:
                0.08139861 = queryWeight, product of:
                  1.7554779 = idf(docFreq=20772, maxDocs=44218)
                  0.046368346 = queryNorm
                0.16796975 = fieldWeight in 3822, product of:
                  2.4494898 = tf(freq=6.0), with freq of:
                    6.0 = termFreq=6.0
                  1.7554779 = idf(docFreq=20772, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=3822)
          0.5 = coord(1/2)
      0.4 = coord(2/5)
    
    Abstract
     Assessment of text relevance is an important aspect of human-information interaction. For many search sessions it is essential for achieving the task goal. This work investigates text relevance decision dynamics in a question-answering task by direct measurement of eye movement using eye-tracking and brain activity using electroencephalography (EEG). The EEG measurements are correlated with the user's goal-directed attention allocation revealed by their eye movements. In a within-subject lab experiment (N = 24), participants read short news stories of varied relevance. Eye movement and EEG features were calculated in three epochs of reading each news story (early, middle, final) and for periods where relevant words were read. Perceived relevance classification models were learned for each epoch. The results show that reading epochs where relevant words were processed could be distinguished from other epochs. The classification models show increasing divergence in processing relevant vs. irrelevant documents after the initial epoch. This suggests differences in cognitive processes used to assess texts of varied relevance levels and provides evidence for the potential to detect these differences in information search sessions using eye tracking and EEG.
    Source
    Journal of the Association for Information Science and Technology. 68(2017) no.10, S.2299-2312
    Type
    a
  7. Wang, S.; Koopman, R.: Embed first, then predict (2019) 0.00
    0.004915534 = product of:
      0.012288835 = sum of:
        0.008341924 = weight(_text_:a in 5400) [ClassicSimilarity], result of:
          0.008341924 = score(doc=5400,freq=12.0), product of:
            0.053464882 = queryWeight, product of:
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.046368346 = queryNorm
            0.15602624 = fieldWeight in 5400, product of:
              3.4641016 = tf(freq=12.0), with freq of:
                12.0 = termFreq=12.0
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.0390625 = fieldNorm(doc=5400)
        0.003946911 = product of:
          0.007893822 = sum of:
            0.007893822 = weight(_text_:information in 5400) [ClassicSimilarity], result of:
              0.007893822 = score(doc=5400,freq=2.0), product of:
                0.08139861 = queryWeight, product of:
                  1.7554779 = idf(docFreq=20772, maxDocs=44218)
                  0.046368346 = queryNorm
                0.09697737 = fieldWeight in 5400, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  1.7554779 = idf(docFreq=20772, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=5400)
          0.5 = coord(1/2)
      0.4 = coord(2/5)
    
    Abstract
    Automatic subject prediction is a desirable feature for modern digital library systems, as manual indexing can no longer cope with the rapid growth of digital collections. It is also desirable to be able to identify a small set of entities (e.g., authors, citations, bibliographic records) which are most relevant to a query. This gets more difficult when the amount of data increases dramatically. Data sparsity and model scalability are the major challenges to solving this type of extreme multilabel classification problem automatically. In this paper, we propose to address this problem in two steps: we first embed different types of entities into the same semantic space, where similarity could be computed easily; second, we propose a novel non-parametric method to identify the most relevant entities in addition to direct semantic similarities. We show how effectively this approach predicts even very specialised subjects, which are associated with few documents in the training set and are more problematic for a classifier.
    Footnote
     Contribution to a special issue: Research Information Systems and Science Classifications; including papers from "Trajectories for Research: Fathoming the Promise of the NARCIS Classification," 27-28 September 2018, The Hague, The Netherlands.
    Type
    a
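     The two steps summarised in entry 7 (embed all entity types into one semantic space, then predict by similarity) can be illustrated with a minimal sketch: subjects are ranked for a new document by cosine similarity between the document vector and precomputed subject vectors. The embeddings below are made up, and the plain similarity ranking stands in for, rather than reproduces, the authors' non-parametric method.

       import numpy as np

       def predict_subjects(doc_vec, subject_vecs, top=2):
           """Rank subject labels by cosine similarity to a document embedding."""
           def cosine(a, b):
               return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))
           return sorted(subject_vecs, reverse=True,
                         key=lambda s: cosine(doc_vec, subject_vecs[s]))[:top]

       # Hypothetical shared embedding space for documents and subjects
       subject_vecs = {"machine learning": np.array([0.9, 0.1, 0.0]),
                       "library science":  np.array([0.1, 0.9, 0.2]),
                       "chemistry":        np.array([0.0, 0.1, 0.9])}
       doc_vec = np.array([0.7, 0.3, 0.1])
       print(predict_subjects(doc_vec, subject_vecs))  # ['machine learning', 'library science']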
  8. Xie, I.; Babu, R.; Lee, H.S.; Wang, S.; Lee, T.H.: Orientation tactics and associated factors in the digital library environment : comparison between blind and sighted users (2021) 0.00
    0.004592163 = product of:
      0.011480408 = sum of:
        0.005898632 = weight(_text_:a in 307) [ClassicSimilarity], result of:
          0.005898632 = score(doc=307,freq=6.0), product of:
            0.053464882 = queryWeight, product of:
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.046368346 = queryNorm
            0.11032722 = fieldWeight in 307, product of:
              2.4494898 = tf(freq=6.0), with freq of:
                6.0 = termFreq=6.0
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.0390625 = fieldNorm(doc=307)
        0.0055817757 = product of:
          0.011163551 = sum of:
            0.011163551 = weight(_text_:information in 307) [ClassicSimilarity], result of:
              0.011163551 = score(doc=307,freq=4.0), product of:
                0.08139861 = queryWeight, product of:
                  1.7554779 = idf(docFreq=20772, maxDocs=44218)
                  0.046368346 = queryNorm
                0.13714671 = fieldWeight in 307, product of:
                  2.0 = tf(freq=4.0), with freq of:
                    4.0 = termFreq=4.0
                  1.7554779 = idf(docFreq=20772, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=307)
          0.5 = coord(1/2)
      0.4 = coord(2/5)
    
    Abstract
    This is the first study that compares types of orientation tactics that blind and sighted users applied in their initial interactions with a digital library (DL) and the associated factors. Multiple methods were employed for data collection: questionnaires, think-aloud protocols, and transaction logs. The paper identifies seven types of orientation tactics applied by the two groups of users. While sighted users focused on skimming DL content, blind users concentrated on exploring DL structure. Moreover, the authors discovered 13 types of system, user, and interaction factors that led to the use of orientation tactics. More system factors than user factors affect blind users' tactics in browsing DL structures. The findings of this study support the social model that the sight-centered design of DLs, rather than blind users' disability, prohibits them from effectively interacting with a DL. Simultaneously, the results reveal the limitation of existing interactive information retrieval models that do not take people with disabilities into consideration. DL design implications are discussed based on the identified factors.
    Source
    Journal of the Association for Information Science and Technology. 72(2021) no.8, S.995-1010
    Type
    a
  9. Zhang, L.; Wang, S.; Liu, B.: Deep learning for sentiment analysis : a survey (2018) 0.00
    0.0021795689 = product of:
      0.010897844 = sum of:
        0.010897844 = weight(_text_:a in 4092) [ClassicSimilarity], result of:
          0.010897844 = score(doc=4092,freq=8.0), product of:
            0.053464882 = queryWeight, product of:
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.046368346 = queryNorm
            0.20383182 = fieldWeight in 4092, product of:
              2.828427 = tf(freq=8.0), with freq of:
                8.0 = termFreq=8.0
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.0625 = fieldNorm(doc=4092)
      0.2 = coord(1/5)
    
    Abstract
     Deep learning has emerged as a powerful machine learning technique that learns multiple layers of representations or features of the data and produces state-of-the-art prediction results. Along with the success of deep learning in many other application domains, deep learning has also been widely applied to sentiment analysis in recent years. This paper first gives an overview of deep learning and then provides a comprehensive survey of its current applications in sentiment analysis.
    Type
    a
  10. Hollink, L.; Assem, M. van; Wang, S.; Isaac, A.; Schreiber, G.: Two variations on ontology alignment evaluation : methodological issues (2008) 0.00
    0.0018276243 = product of:
      0.009138121 = sum of:
        0.009138121 = weight(_text_:a in 4645) [ClassicSimilarity], result of:
          0.009138121 = score(doc=4645,freq=10.0), product of:
            0.053464882 = queryWeight, product of:
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.046368346 = queryNorm
            0.1709182 = fieldWeight in 4645, product of:
              3.1622777 = tf(freq=10.0), with freq of:
                10.0 = termFreq=10.0
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.046875 = fieldNorm(doc=4645)
      0.2 = coord(1/5)
    
    Abstract
    Evaluation of ontology alignments is in practice done in two ways: (1) assessing individual correspondences and (2) comparing the alignment to a reference alignment. However, this type of evaluation does not guarantee that an application which uses the alignment will perform well. In this paper, we contribute to the current ontology alignment evaluation practices by proposing two alternative evaluation methods that take into account some characteristics of a usage scenario without doing a full-fledged end-to-end evaluation. We compare different evaluation approaches in three case studies, focussing on methodological issues. Each case study considers an alignment between a different pair of ontologies, ranging from rich and well-structured to small and poorly structured. This enables us to conclude on the use of different evaluation approaches in different settings.
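     For the second of the two evaluation practices named in entry 10 (comparing an alignment to a reference alignment), a minimal precision/recall sketch follows; correspondences are modelled as plain (source, target) pairs, which ignores relation types and confidence values, and the data is invented.

       def evaluate_alignment(found, reference):
           """Precision, recall and F1 of an alignment against a reference alignment."""
           found, reference = set(found), set(reference)
           correct = len(found & reference)
           precision = correct / len(found) if found else 0.0
           recall = correct / len(reference) if reference else 0.0
           f1 = 2 * precision * recall / (precision + recall) if correct else 0.0
           return precision, recall, f1

       found = [("cats", "felines"), ("dogs", "plants")]
       reference = [("cats", "felines"), ("dogs", "canines")]
       print(evaluate_alignment(found, reference))  # (0.5, 0.5, 0.5)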
  11. Wang, S.; Isaac, A.; Schlobach, S.; Meij, L. van der; Schopman, B.: Instance-based semantic interoperability in the cultural heritage (2012) 0.00
    0.0018020617 = product of:
      0.009010308 = sum of:
        0.009010308 = weight(_text_:a in 125) [ClassicSimilarity], result of:
          0.009010308 = score(doc=125,freq=14.0), product of:
            0.053464882 = queryWeight, product of:
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.046368346 = queryNorm
            0.1685276 = fieldWeight in 125, product of:
              3.7416575 = tf(freq=14.0), with freq of:
                14.0 = termFreq=14.0
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.0390625 = fieldNorm(doc=125)
      0.2 = coord(1/5)
    
    Abstract
     This paper gives a comprehensive overview of the problem of Semantic Interoperability in the Cultural Heritage domain, with a particular focus on solutions centered around extensional, i.e., instance-based, ontology matching methods. It presents three typical scenarios requiring interoperability, one with homogeneous collections, one with heterogeneous collections, and one with a multilingual collection. It discusses two different ways to evaluate potential alignments, one based on the application of re-indexing, one using a reference alignment. To these scenarios we apply extensional matching with different similarity measures, which gives interesting insights. Finally, we firmly position our work in the Cultural Heritage context through an extensive discussion of the relevance for, and issues related to, this specific field. The findings are as unspectacular as expected but nevertheless important: the provided methods can really improve interoperability in a number of important cases, but they are not universal solutions to all related problems. This paper will provide a solid foundation for any future work on Semantic Interoperability in the Cultural Heritage domain, in particular for anybody intending to apply extensional methods.
    Type
    a
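     The instance-based (extensional) matching at the core of entry 11 can be sketched as follows: two concepts from different vocabularies are proposed as a match when the sets of objects annotated with them overlap strongly, here measured with the Jaccard coefficient over a dually indexed collection. The data and threshold are illustrative only.

       def jaccard(a, b):
           """Overlap of the instance sets of two concepts."""
           a, b = set(a), set(b)
           return len(a & b) / len(a | b) if a | b else 0.0

       def match_concepts(vocab_a, vocab_b, threshold=0.5):
           """Propose alignments between concepts whose instance sets overlap strongly."""
           return [(ca, cb, round(jaccard(ia, ib), 2))
                   for ca, ia in vocab_a.items()
                   for cb, ib in vocab_b.items()
                   if jaccard(ia, ib) >= threshold]

       # Toy dually indexed collections from two vocabularies
       vocab_a = {"Maritime history": {"obj1", "obj2", "obj3"}}
       vocab_b = {"Seefahrt": {"obj2", "obj3", "obj4"}, "Chemie": {"obj9"}}
       print(match_concepts(vocab_a, vocab_b))  # [('Maritime history', 'Seefahrt', 0.5)]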
  12. Isaac, A.; Wang, S.; Zinn, C.; Matthezing, H.; Meij, L. van der; Schlobach, S.: Evaluating thesaurus alignments for semantic interoperability in the library domain (2009) 0.00
    0.001541188 = product of:
      0.00770594 = sum of:
        0.00770594 = weight(_text_:a in 1650) [ClassicSimilarity], result of:
          0.00770594 = score(doc=1650,freq=4.0), product of:
            0.053464882 = queryWeight, product of:
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.046368346 = queryNorm
            0.14413087 = fieldWeight in 1650, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.0625 = fieldNorm(doc=1650)
      0.2 = coord(1/5)
    
    Type
    a
  13. Wang, S.; Isaac, A.; Schopman, B.; Schlobach, S.; Meij, L. van der: Matching multilingual subject vocabularies (2009) 0.00
    0.001155891 = product of:
      0.005779455 = sum of:
        0.005779455 = weight(_text_:a in 3035) [ClassicSimilarity], result of:
          0.005779455 = score(doc=3035,freq=4.0), product of:
            0.053464882 = queryWeight, product of:
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.046368346 = queryNorm
            0.10809815 = fieldWeight in 3035, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              1.153047 = idf(docFreq=37942, maxDocs=44218)
              0.046875 = fieldNorm(doc=3035)
      0.2 = coord(1/5)
    
    Abstract
     Most libraries and other cultural heritage institutions use controlled knowledge organisation systems, such as thesauri, to describe their collections. Unfortunately, as most of these institutions use different such systems, unified access to heterogeneous collections is difficult. Things are even worse in an international context when concepts have labels in different languages. In order to overcome the multilingual interoperability problem between European Libraries, extensive work has been done to manually map concepts from different knowledge organisation systems, which is a tedious and expensive process. Within the TELplus project, we developed and evaluated methods to automatically discover these mappings, using different ontology matching techniques. In experiments on the major French, English and German subject heading lists Rameau, LCSH and SWD, we show that we can automatically produce mappings of surprisingly good quality, even when using relatively naive translation and matching methods.
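     The "relatively naive translation and matching methods" mentioned in entry 13 can be sketched as below: labels of one vocabulary are translated with a simple bilingual word list and then matched against the other vocabulary by normalised string equality. The word list and vocabularies are invented, and real systems would rely on proper translation resources and fuzzier matching.

       def match_multilingual(vocab_de, vocab_en, de_to_en):
           """Naively map German subject headings to English ones via dictionary lookup."""
           norm = lambda s: s.strip().lower()
           english_index = {norm(label): label for label in vocab_en}
           mappings = []
           for de_label in vocab_de:
               translated = de_to_en.get(norm(de_label))
               if translated and norm(translated) in english_index:
                   mappings.append((de_label, english_index[norm(translated)]))
           return mappings

       # Hypothetical SWD-style and LCSH-style labels plus a toy bilingual word list
       vocab_de = ["Bibliothek", "Chemie"]
       vocab_en = ["Libraries", "Physics"]
       de_to_en = {"bibliothek": "libraries", "chemie": "chemistry"}
       print(match_multilingual(vocab_de, vocab_en, de_to_en))  # [('Bibliothek', 'Libraries')]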