Search (413 results, page 21 of 21)

  • theme_ss:"Retrievalstudien"
  1. Voorbij, H.: Title keywords and subject descriptors : a comparison of subject search entries of books in the humanities and social sciences (1998) 0.00
    0.0028845975 = product of:
      0.008653793 = sum of:
        0.008653793 = product of:
          0.025961377 = sum of:
            0.025961377 = weight(_text_:online in 4721) [ClassicSimilarity], result of:
              0.025961377 = score(doc=4721,freq=2.0), product of:
                0.1548489 = queryWeight, product of:
                  3.0349014 = idf(docFreq=5778, maxDocs=44218)
                  0.051022716 = queryNorm
                0.16765618 = fieldWeight in 4721, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.0349014 = idf(docFreq=5778, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=4721)
          0.33333334 = coord(1/3)
      0.33333334 = coord(1/3)
    
    Abstract
    In order to compare the value of subject descriptors and title keywords as entries to subject searches, two studies were carried out. Both studies concentrated on monographs in the humanities and social sciences, held by the online public access catalogue of the National Library of the Netherlands. In the first study, a comparison was made by subject librarians between the subject descriptors and the title keywords of 475 records. They could express their opinion on a scale from 1 (descriptor is exactly or almost the same as word in title) to 7 (descriptor does not appear in title at all). It was concluded that 37 per cent of the records are considerably enhanced by a subject descriptor, and 49 per cent slightly or considerably enhanced. In the second study, subject librarians performed subject searches using title keywords and subject descriptors on the same topic. The relative recall amounted to 48 per cent and 86 per cent respectively. Failure analysis revealed the reasons why so many records that were found by subject descriptors were not found by title keywords. First, although completely meaningless titles hardly ever appear, the title of a publication does not always offer sufficient clues for title keyword searching. In those cases, descriptors may enhance the record of a publication. A second and even more important task of subject descriptors is controlling the vocabulary. Many relevant titles cannot be retrieved by title keyword searching because of the wide diversity of ways of expressing a topic. Descriptors take away the burden of vocabulary control from the user.
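
    The scoring tree attached to each hit is Lucene "explain" output for the classic TF-IDF similarity. As a minimal sketch (assuming Lucene's ClassicSimilarity formulas, with the constants read directly off the tree for hit 1), the leaf weight is queryWeight × fieldWeight, scaled by one coord factor per level at which only 1 of 3 query clauses matched:

```python
import math

# Constants read off the explanation tree of hit 1 (term "online", doc 4721).
freq = 2.0                # termFreq: occurrences of "online" in the field
doc_freq = 5778           # docFreq: documents containing "online"
max_docs = 44218          # maxDocs: documents in the index
query_norm = 0.051022716  # queryNorm
field_norm = 0.0390625    # fieldNorm(doc=4721)

tf = math.sqrt(freq)                               # 1.4142135
idf = 1.0 + math.log(max_docs / (doc_freq + 1.0))  # 3.0349014
query_weight = idf * query_norm                    # 0.1548489
field_weight = tf * idf * field_norm               # 0.16765618
weight = query_weight * field_weight               # 0.025961377

# Two coord(1/3) factors: 1 of 3 query clauses matched at each level.
score = weight * (1 / 3) * (1 / 3)                 # 0.0028845975
print(f"weight={weight:.9f}  score={score:.10f}")
```

    The same arithmetic with docFreq=5836 (term "retrieval") reproduces the 0.0028656456 shown for the following hits.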
  2. Park, S.: Usability, user preferences, effectiveness, and user behaviors when searching individual and integrated full-text databases : implications for digital libraries (2000) 0.00
    0.0028656456 = product of:
      0.008596936 = sum of:
        0.008596936 = product of:
          0.025790809 = sum of:
            0.025790809 = weight(_text_:retrieval in 4591) [ClassicSimilarity], result of:
              0.025790809 = score(doc=4591,freq=2.0), product of:
                0.15433937 = queryWeight, product of:
                  3.024915 = idf(docFreq=5836, maxDocs=44218)
                  0.051022716 = queryNorm
                0.16710453 = fieldWeight in 4591, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.024915 = idf(docFreq=5836, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=4591)
          0.33333334 = coord(1/3)
      0.33333334 = coord(1/3)
    
    Abstract
    This article addresses a crucial issue in the digital library environment: how to support effective interaction of users with heterogeneous and distributed information resources. In particular, this study compared usability, user preference, effectiveness, and searching behaviors in systems that implement interaction with multiple databases as if they were one (integrated interaction) and in systems that present each database separately through a common interface, in an experiment in the TREC environment. 28 volunteers were recruited from the graduate students of the School of Communication, Information & Library Studies at Rutgers University. Significantly more subjects preferred the common interface to the integrated interface, mainly because they could have more control over database selection. Subjects were also more satisfied with the results from the common interface, and performed better with the common interface than with the integrated interface. Overall, it appears that for this population, interacting with databases through a common interface is preferable on all grounds to interacting with databases through an integrated interface. These results suggest that: (1) the general assumption of the information retrieval (IR) literature that an integrated interaction is best needs to be revisited; (2) it is important to allow for more user control in the distributed environment; (3) for digital library purposes, it is important to characterize different databases to support user choice for integration; and (4) certain users prefer control over database selection while still opting for results to be merged.
  3. Boros, E.; Kantor, P.B.; Neu, D.J.: Pheromonic representation of user quests by digital structures (1999) 0.00
    0.0028656456 = product of:
      0.008596936 = sum of:
        0.008596936 = product of:
          0.025790809 = sum of:
            0.025790809 = weight(_text_:retrieval in 6684) [ClassicSimilarity], result of:
              0.025790809 = score(doc=6684,freq=2.0), product of:
                0.15433937 = queryWeight, product of:
                  3.024915 = idf(docFreq=5836, maxDocs=44218)
                  0.051022716 = queryNorm
                0.16710453 = fieldWeight in 6684, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.024915 = idf(docFreq=5836, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=6684)
          0.33333334 = coord(1/3)
      0.33333334 = coord(1/3)
    
    Abstract
    In a novel approach to information finding in networked environments, each user's specific purpose or "quest" can be represented in numerous ways. The most familiar is a list of keywords, or a natural language sentence or paragraph. More effective is an extended text that has been judged as to relevance. This forms the basis of relevance feedback, as it is used in information retrieval. In the "Ant World" project (Ant World, 1999; Kantor et al., 1999b; Kantor et al., 1999a), the items to be retrieved are not documents, but rather quests, represented by entire collections of judged documents. In order to save space and time we have developed methods for representing these complex entities in a short string of about 1,000 bytes, which we call a "Digital Information Pheromone" (DIP). The principles for determining the DIP for a given quest, and for matching DIPs to each other, are presented. The effectiveness of this scheme is explored with some applications to the large judged collections of TREC documents.
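
    The abstract does not give the authors' actual DIP construction. As an illustrative stand-in only, the sketch below compresses a quest's term bag into a fixed 1,000-byte signature by feature hashing and matches signatures by cosine similarity, the same compress-then-match shape the paper describes (all names here are hypothetical):

```python
import hashlib
import math

DIP_BYTES = 1000  # the roughly 1,000-byte budget mentioned in the abstract

def make_dip(terms):
    """Hash a quest's judged-document terms into a fixed-size byte signature."""
    sig = bytearray(DIP_BYTES)
    for term in terms:
        slot = int.from_bytes(hashlib.sha1(term.encode()).digest()[:4], "big") % DIP_BYTES
        sig[slot] = min(255, sig[slot] + 1)  # saturating count per hashed slot
    return bytes(sig)

def match_dips(a, b):
    """Cosine similarity between two signatures, as a stand-in matching rule."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

q1 = make_dip("relevance feedback judged documents trec".split())
q2 = make_dip("judged documents relevance".split())
print(f"{match_dips(q1, q2):.3f}")  # higher for quests sharing vocabulary
```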
  4. Debole, F.; Sebastiani, F.: An analysis of the relative hardness of Reuters-21578 subsets (2005) 0.00
    0.0028656456 = product of:
      0.008596936 = sum of:
        0.008596936 = product of:
          0.025790809 = sum of:
            0.025790809 = weight(_text_:retrieval in 3456) [ClassicSimilarity], result of:
              0.025790809 = score(doc=3456,freq=2.0), product of:
                0.15433937 = queryWeight, product of:
                  3.024915 = idf(docFreq=5836, maxDocs=44218)
                  0.051022716 = queryNorm
                0.16710453 = fieldWeight in 3456, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.024915 = idf(docFreq=5836, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=3456)
          0.33333334 = coord(1/3)
      0.33333334 = coord(1/3)
    
    Abstract
    The existence, public availability, and widespread acceptance of a standard benchmark for a given information retrieval (IR) task are beneficial to research on this task, because they allow different researchers to experimentally compare their own systems by comparing the results they have obtained on this benchmark. The Reuters-21578 test collection, together with its earlier variants, has been such a standard benchmark for the text categorization (TC) task throughout the last 10 years. However, the benefits that this has brought about have somehow been limited by the fact that different researchers have "carved" different subsets out of this collection and tested their systems on one of these subsets only; systems that have been tested on different Reuters-21578 subsets are thus not readily comparable. In this article, we present a systematic, comparative experimental study of the three subsets of Reuters-21578 that have been most popular among TC researchers. The results we obtain allow us to determine the relative hardness of these subsets, thus establishing an indirect means for comparing TC systems that have been, or will be, tested on these different subsets.
  5. Hansen, P.; Karlgren, J.: Effects of foreign language and task scenario on relevance assessment (2005) 0.00
    0.0028656456 = product of:
      0.008596936 = sum of:
        0.008596936 = product of:
          0.025790809 = sum of:
            0.025790809 = weight(_text_:retrieval in 4393) [ClassicSimilarity], result of:
              0.025790809 = score(doc=4393,freq=2.0), product of:
                0.15433937 = queryWeight, product of:
                  3.024915 = idf(docFreq=5836, maxDocs=44218)
                  0.051022716 = queryNorm
                0.16710453 = fieldWeight in 4393, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.024915 = idf(docFreq=5836, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=4393)
          0.33333334 = coord(1/3)
      0.33333334 = coord(1/3)
    
    Abstract
    Purpose - This paper aims to investigate how readers assess the relevance of retrieved documents in a foreign language they know well compared with their native language, and whether work-task scenario descriptions have an effect on the assessment process. Design/methodology/approach - Queries, test collections, and relevance assessments were used from the 2002 Interactive CLEF. Swedish first-language speakers, fluent in English, were given simulated information-seeking scenarios and presented with retrieval results in both languages. Twenty-eight subjects in four groups were asked to rate the retrieved text documents by relevance. A two-level work-task scenario description framework was developed and applied to facilitate the study of context effects on the assessment process. Findings - Relevance assessment takes longer in a foreign language than in the user's first language. The quality of assessments, by comparison with pre-assessed results, is inferior to that of assessments made in the users' first language. Work-task scenario descriptions had an effect on the assessment process, both as measured by access time and by subjects' self-reports. However, no effects on the results by traditional relevance ranking were detectable. This may be an argument for extending the traditional IR experimental topical relevance measures to cater for context effects. Originality/value - An extended two-level work-task scenario description framework was developed and applied. Contextual aspects had an effect on the relevance assessment process. English texts took longer to assess than Swedish ones and were assessed less well, especially for the most difficult queries. The IR research field needs to close this gap and to design information access systems with users' language competence in mind.
  6. Mettrop, W.; Nieuwenhuysen, P.: Internet search engines : fluctuations in document accessibility (2001) 0.00
    0.0028656456 = product of:
      0.008596936 = sum of:
        0.008596936 = product of:
          0.025790809 = sum of:
            0.025790809 = weight(_text_:retrieval in 4481) [ClassicSimilarity], result of:
              0.025790809 = score(doc=4481,freq=2.0), product of:
                0.15433937 = queryWeight, product of:
                  3.024915 = idf(docFreq=5836, maxDocs=44218)
                  0.051022716 = queryNorm
                0.16710453 = fieldWeight in 4481, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.024915 = idf(docFreq=5836, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=4481)
          0.33333334 = coord(1/3)
      0.33333334 = coord(1/3)
    
    Abstract
    An empirical investigation of the consistency of retrieval through Internet search engines is reported. Thirteen engines are evaluated: AltaVista, EuroFerret, Excite, HotBot, InfoSeek, Lycos, MSN, NorthernLight, Snap, WebCrawler and three national Dutch engines: Ilse, Search.nl and Vindex. The focus is on a characteristic related to size: the degree of consistency with which an engine retrieves documents. Does an engine always present the same relevant documents that are, or were, available in its databases? We observed and identified three types of fluctuations in the result sets of several kinds of searches, many of them significant. These should be taken into account by users who apply an Internet search engine, for instance to retrieve as many relevant documents as possible, to retrieve a document that was already found in a previous search, or to perform scientometric/bibliometric measurements. The fluctuations should also be considered as a complication in other research on the behaviour and performance of Internet search engines. In conclusion: in view of the increasing importance of the Internet as a publication/communication medium, the fluctuations in the result sets of Internet search engines can no longer be neglected.
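
    The paper's own fluctuation measures are not reproduced in the abstract. One simple, hypothetical way to quantify how consistently an engine returns the same documents across repeated identical queries is the ratio of always-returned documents to ever-returned documents:

```python
def result_set_consistency(runs):
    """Fraction of all documents ever returned for a query that appear in
    every repetition of that query; 1.0 means perfectly stable results."""
    union = set().union(*runs)
    common = set(runs[0]).intersection(*runs[1:])
    return len(common) / len(union) if union else 1.0

# Three repetitions of the same query against the same engine:
runs = [{"url_a", "url_b", "url_c"}, {"url_a", "url_b"}, {"url_a", "url_b", "url_d"}]
print(result_set_consistency(runs))  # 2 of 4 documents always returned -> 0.5
```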
  7. López-Pujalte, C.; Guerrero-Bote, V.P.; Moya-Anegón, F. de: Order-based fitness functions for genetic algorithms applied to relevance feedback (2003) 0.00
    0.0028656456 = product of:
      0.008596936 = sum of:
        0.008596936 = product of:
          0.025790809 = sum of:
            0.025790809 = weight(_text_:retrieval in 5154) [ClassicSimilarity], result of:
              0.025790809 = score(doc=5154,freq=2.0), product of:
                0.15433937 = queryWeight, product of:
                  3.024915 = idf(docFreq=5836, maxDocs=44218)
                  0.051022716 = queryNorm
                0.16710453 = fieldWeight in 5154, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.024915 = idf(docFreq=5836, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=5154)
          0.33333334 = coord(1/3)
      0.33333334 = coord(1/3)
    
    Abstract
    López-Pujalte and Guerrero-Bote test a relevance feedback genetic algorithm while varying its order-based fitness functions, generating a function based upon the Ide dec-hi method as a baseline. Using the non-zero weighted term types assigned to the query, and to the initially retrieved set of documents, as genes, a chromosome of equal length is created for each. The algorithm is provided with the chromosomes for judged relevant documents, for judged irrelevant documents, and for the irrelevant documents with their terms negated. The algorithm uses random selection of all possible genes, but gives greater likelihood to those with higher fitness values. When the fittest chromosome of a previous population is eliminated it is restored, while the least fit of the new population is eliminated in its stead. A crossover probability of .8 and a mutation probability of .2 were used with 20 generations. Three fitness functions were utilized: the Horng and Yeh function, which takes into account the position of relevant documents, and two new functions, one based on accumulating the cosine similarity for retrieved documents, the other on stored fixed-recall-interval precisions. The Cranfield collection was used with the first 15 documents retrieved from 33 queries chosen to have at least 3 relevant documents in the first 15 and at least 5 relevant documents not initially retrieved. Precision was calculated at fixed recall levels using the residual collection method, which removes viewed documents. One of the three functions improved the original retrieval by 127 percent, while the Ide dec-hi method provided a 120 percent improvement.
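
    As a rough sketch of the approach described (assuming a position-based fitness in the spirit of Horng and Yeh rather than their exact formula, and with invented helper names), a chromosome of term weights ranks the documents by cosine similarity and is rewarded for placing judged-relevant documents early; the paper's crossover probability of .8, mutation probability of .2, 20 generations, and elitist restoration of the fittest chromosome are kept:

```python
import numpy as np

rng = np.random.default_rng(42)

def rank_documents(chromosome, doc_vectors):
    """Rank documents by cosine similarity to the chromosome's term weights."""
    q = chromosome / (np.linalg.norm(chromosome) + 1e-12)
    d = doc_vectors / (np.linalg.norm(doc_vectors, axis=1, keepdims=True) + 1e-12)
    return np.argsort(-(d @ q))          # best-first document indices

def order_based_fitness(chromosome, doc_vectors, relevant):
    """Position-based fitness: reward relevant documents ranked early."""
    ranking = rank_documents(chromosome, doc_vectors)
    return sum(1.0 / (pos + 1) for pos, doc in enumerate(ranking) if doc in relevant)

def evolve(population, doc_vectors, relevant, generations=20, p_cross=0.8, p_mut=0.2):
    """Elitist GA loop using the paper's crossover/mutation rates."""
    for _ in range(generations):
        fit = np.array([order_based_fitness(c, doc_vectors, relevant) for c in population])
        elite = population[int(fit.argmax())].copy()          # remember the fittest
        probs = fit / fit.sum() if fit.sum() > 0 else np.full(len(fit), 1 / len(fit))
        children = []
        while len(children) < len(population):
            i, j = rng.choice(len(population), size=2, p=probs)
            a, b = population[i].copy(), population[j].copy()
            if rng.random() < p_cross:                        # one-point crossover
                cut = rng.integers(1, len(a))
                a[cut:], b[cut:] = b[cut:].copy(), a[cut:].copy()
            for c in (a, b):
                if rng.random() < p_mut:                      # mutate one gene weight
                    c[rng.integers(len(c))] = rng.random()
            children += [a, b]
        children[0] = elite                                   # restore the fittest
        population = children[: len(population)]
    return population

# Toy run: 8 chromosomes over a 10-term vocabulary, 15 documents, docs 0-2 relevant.
docs = rng.random((15, 10))
pop = [rng.random(10) for _ in range(8)]
final = evolve(pop, docs, {0, 1, 2})
best = max(final, key=lambda c: order_based_fitness(c, docs, {0, 1, 2}))
```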
  8. Kekäläinen, J.; Järvelin, K.: Using graded relevance assessments in IR evaluation (2002) 0.00
    0.0028656456 = product of:
      0.008596936 = sum of:
        0.008596936 = product of:
          0.025790809 = sum of:
            0.025790809 = weight(_text_:retrieval in 5225) [ClassicSimilarity], result of:
              0.025790809 = score(doc=5225,freq=2.0), product of:
                0.15433937 = queryWeight, product of:
                  3.024915 = idf(docFreq=5836, maxDocs=44218)
                  0.051022716 = queryNorm
                0.16710453 = fieldWeight in 5225, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.024915 = idf(docFreq=5836, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=5225)
          0.33333334 = coord(1/3)
      0.33333334 = coord(1/3)
    
    Abstract
    Kekäläinen and Järvelin use what they term generalized, nonbinary recall and precision measures, where recall is the sum of the relevance scores of the retrieved documents divided by the sum of the relevance scores of all documents in the database, and precision is the sum of the relevance scores of the retrieved documents divided by the number of retrieved documents; the relevance scores are real numbers between zero and one. Using the InQuery system and a text database of 53,893 newspaper articles, with 30 queries selected from those for which four relevance categories were available to provide recall measures, search results were evaluated by four judges. Searches were done by average key term weight, by Boolean expression, and by average term weight where the terms are grouped by a synonym operator, in each case with and without expansion of the original terms. Use of higher standards of relevance appears to increase the superiority of the best method. Some methods do a better job of getting the highly relevant documents but do not increase retrieval of marginal ones. There is evidence that generalized precision provides more equitable results, while binary precision gives undeserved merit to some methods. Generally, graded relevance measures seem to provide additional insight into IR evaluation.
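
    Transcribing the definitions in the abstract directly (relevance scores are reals in [0, 1]):

```python
def generalized_precision(retrieved_scores):
    """Sum of retrieved documents' relevance scores over the number retrieved."""
    return sum(retrieved_scores) / len(retrieved_scores)

def generalized_recall(retrieved_scores, all_scores):
    """Sum of retrieved documents' relevance scores over the sum of the
    relevance scores of all documents in the database."""
    return sum(retrieved_scores) / sum(all_scores)

# A retrieved set of 3 documents from a 5-document database:
retrieved = [1.0, 0.5, 0.0]
database = [1.0, 0.5, 0.0, 0.75, 0.25]
print(generalized_precision(retrieved))         # 0.5
print(generalized_recall(retrieved, database))  # 1.5 / 2.5 = 0.6
```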
  9. Ruthven, I.; Baillie, M.; Elsweiler, D.: The relative effects of knowledge, interest and confidence in assessing relevance (2007) 0.00
    0.0028656456 = product of:
      0.008596936 = sum of:
        0.008596936 = product of:
          0.025790809 = sum of:
            0.025790809 = weight(_text_:retrieval in 835) [ClassicSimilarity], result of:
              0.025790809 = score(doc=835,freq=2.0), product of:
                0.15433937 = queryWeight, product of:
                  3.024915 = idf(docFreq=5836, maxDocs=44218)
                  0.051022716 = queryNorm
                0.16710453 = fieldWeight in 835, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.024915 = idf(docFreq=5836, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=835)
          0.33333334 = coord(1/3)
      0.33333334 = coord(1/3)
    
    Abstract
    Purpose - The purpose of this paper is to examine how different aspects of an assessor's context, in particular their knowledge of a search topic, their interest in the search topic and their confidence in assessing relevance for a topic, affect the relevance judgements made and the assessor's ability to predict which documents they will assess as being relevant. Design/methodology/approach - The study was conducted as part of the Text REtrieval Conference (TREC) HARD track. Using a specially constructed questionnaire, information was sought on TREC assessors' personal context and, using the TREC assessments gathered, the responses were correlated to the questionnaire questions and the final relevance decisions. Findings - This study found that each of the three factors (interest, knowledge and confidence) had an effect on how many documents were assessed as relevant and on the balance between how many documents were marked as marginally or highly relevant. These factors are also shown to affect an assessor's ability to predict what information they will finally mark as being relevant. Research limitations/implications - The major limitation is that the research was conducted within the TREC initiative. This means that we can report on results but cannot report on discussions with the assessors. The research implications are numerous, but bear mainly on the effect of personal context on the outcomes of a user study. Practical implications - One major consequence is that we should take more account of how we construct search tasks for IIR evaluation, so as to create tasks that are interesting and relevant to experimental subjects. Originality/value - The paper examines different search variables within one study to compare the relative effects of these variables on the search outcomes.
  10. Vegt, A. van der; Zuccon, G.; Koopman, B.: Do better search engines really equate to better clinical decisions? : If not, why not? (2021) 0.00
    0.0028656456 = product of:
      0.008596936 = sum of:
        0.008596936 = product of:
          0.025790809 = sum of:
            0.025790809 = weight(_text_:retrieval in 150) [ClassicSimilarity], result of:
              0.025790809 = score(doc=150,freq=2.0), product of:
                0.15433937 = queryWeight, product of:
                  3.024915 = idf(docFreq=5836, maxDocs=44218)
                  0.051022716 = queryNorm
                0.16710453 = fieldWeight in 150, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.024915 = idf(docFreq=5836, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=150)
          0.33333334 = coord(1/3)
      0.33333334 = coord(1/3)
    
    Abstract
    Previous research has found that improved search engine effectiveness (evaluated using a batch-style approach) does not always translate to significant improvements in user task performance; however, these prior studies focused on simple recall- and precision-based search tasks. We investigated the same relationship, but for the realistic, complex search tasks required in clinical decision making. One hundred and nine clinicians and final year medical students answered 16 clinical questions. Although the search engine did improve answer accuracy by 20 percentage points, there was no significant difference when participants used a more effective, state-of-the-art search engine. We also found that the search engine effectiveness difference, identified in the lab, was diminished by around 70% when the search engines were used with real users. Despite the aid of the search engine, half of the clinical questions were answered incorrectly. We further identified the relative contribution of search engine effectiveness to overall end-task success. We found that the ability to interpret documents correctly was a much more important factor impacting task success. If these findings are representative, information retrieval research may need to reorient its emphasis towards helping users to better understand information, rather than just finding it for them.
  11. Cooper, M.D.; Chen, H.-M.: Predicting the relevance of a library catalog search (2001) 0.00
    0.002307678 = product of:
      0.006923034 = sum of:
        0.006923034 = product of:
          0.0207691 = sum of:
            0.0207691 = weight(_text_:online in 6519) [ClassicSimilarity], result of:
              0.0207691 = score(doc=6519,freq=2.0), product of:
                0.1548489 = queryWeight, product of:
                  3.0349014 = idf(docFreq=5778, maxDocs=44218)
                  0.051022716 = queryNorm
                0.13412495 = fieldWeight in 6519, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.0349014 = idf(docFreq=5778, maxDocs=44218)
                  0.03125 = fieldNorm(doc=6519)
          0.33333334 = coord(1/3)
      0.33333334 = coord(1/3)
    
    Abstract
    Relevance has been a difficult concept to define, let alone measure. In this paper, a simple operational definition of relevance is proposed for a Web-based library catalog: whether or not during a search session the user saves, prints, mails, or downloads a citation. If one of those actions is performed, the session is considered relevant to the user. An analysis is presented illustrating the advantages and disadvantages of this definition. With this definition and good transaction logging, it is possible to ascertain the relevance of a session. This was done for 905,970 sessions conducted with the University of California's Melvyl online catalog. Next, a methodology was developed to try to predict the relevance of a session. A number of variables were defined that characterize a session, none of which used any demographic information about the user. The values of the variables were computed for the sessions. Principal components analysis was used to extract a new set of variables out of the original set. A stratified random sampling technique was used to form ten strata such that each stratum of 90,570 sessions contained the same proportion of relevant to nonrelevant sessions. Logistic regression was used to ascertain the regression coefficients for nine of the ten strata. Then, the coefficients were used to predict the relevance of the sessions in the missing stratum. Overall, 17.85% of the sessions were determined to be relevant. The predicted number of relevant sessions for all ten strata was 11%, a 6.85% difference. The authors believe that the methodology can be further refined and the prediction improved. This methodology could also have significant application in improving user searching and in predicting electronic commerce buying decisions without the use of personal demographic data.
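
    A sketch of the pipeline with synthetic stand-in data (the feature matrix is hypothetical; the paper's actual session variables and strata construction differ):

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold

# Hypothetical stand-in data: one row of non-demographic features per session,
# a binary label for whether the user saved/printed/mailed/downloaded a citation.
rng = np.random.default_rng(0)
X = rng.normal(size=(10_000, 12))
y = (rng.random(10_000) < 0.1785).astype(int)  # ~17.85% relevant, as in the paper

Z = PCA(n_components=5).fit_transform(X)       # extract a new set of variables

# Ten strata with equal relevant/nonrelevant proportions; fit on nine strata,
# predict relevance for the held-out one.
skf = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)
predicted_rates = []
for train, test in skf.split(Z, y):
    model = LogisticRegression(max_iter=1000).fit(Z[train], y[train])
    predicted_rates.append(model.predict(Z[test]).mean())
print(f"mean predicted relevant-session rate: {np.mean(predicted_rates):.2%}")
```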
  12. Bateman, J.: Modelling the importance of end-user relevance criteria (1999) 0.00
    0.0022925164 = product of:
      0.006877549 = sum of:
        0.006877549 = product of:
          0.020632647 = sum of:
            0.020632647 = weight(_text_:retrieval in 6606) [ClassicSimilarity], result of:
              0.020632647 = score(doc=6606,freq=2.0), product of:
                0.15433937 = queryWeight, product of:
                  3.024915 = idf(docFreq=5836, maxDocs=44218)
                  0.051022716 = queryNorm
                0.13368362 = fieldWeight in 6606, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.024915 = idf(docFreq=5836, maxDocs=44218)
                  0.03125 = fieldNorm(doc=6606)
          0.33333334 = coord(1/3)
      0.33333334 = coord(1/3)
    
    Abstract
    In most information retrieval research, the concept of relevance has been defined a priori as a single variable by the researcher, and the meaning of relevance to the end-user who is making relevance judgments has not been taken into account. However, a number of criteria that users employ in making relevance judgments have been identified (Schamber, 1991; Barry, 1993). Understanding these criteria and their importance to end-users can help researchers better understand end-user evaluation behavior. This study reports end-users' ratings of the relative importance of 40 relevance criteria as used in their own information-seeking situations, and examines relationships between the criteria they rated most important. Data were collected from 210 graduate students who were instructed in a mail survey to rate 40 relevance criteria by importance in their selection of the most valuable information source for a recent or current paper or project. The criteria were selected from previous studies in which open-ended interviews were used to elicit criteria from end-users making judgments in their own information-seeking situations (Schamber, 1991; Su, 1992; Barry, 1993). A model of relevance with three constructs that contribute to the concept of relevance was proposed using the eleven criteria that survey respondents rated as most important (75 or more on a scale of 0 to 100). The development of this model was guided by similarities in criteria and criteria groupings from previous research (Barry & Schamber, 1998). Confirmatory factor analysis was used to confirm the model and verify that the constructs would produce reliable subscale scores. The three constructs are information quality, information credibility, and information completeness. Second-order factor analysis indicated that these constructs explain 48% of positive relevance judgments for these respondents. Three additional constructs, information availability, information topicality, and information currency, are also suggested. The constructs developed from this analysis are thought to underlie the concept of relevance for this group of users.
  13. Leiva-Mederos, A.; Senso, J.A.; Hidalgo-Delgado, Y.; Hipola, P.: Working framework of semantic interoperability for CRIS with heterogeneous data sources (2017) 0.00
    0.0022925164 = product of:
      0.006877549 = sum of:
        0.006877549 = product of:
          0.020632647 = sum of:
            0.020632647 = weight(_text_:retrieval in 3706) [ClassicSimilarity], result of:
              0.020632647 = score(doc=3706,freq=2.0), product of:
                0.15433937 = queryWeight, product of:
                  3.024915 = idf(docFreq=5836, maxDocs=44218)
                  0.051022716 = queryNorm
                0.13368362 = fieldWeight in 3706, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.024915 = idf(docFreq=5836, maxDocs=44218)
                  0.03125 = fieldNorm(doc=3706)
          0.33333334 = coord(1/3)
      0.33333334 = coord(1/3)
    
    Abstract
    Purpose - Information from Current Research Information Systems (CRIS) is stored in different formats, in platforms that are not compatible, or even in independent networks. It would be helpful to have a well-defined methodology to allow for management data processing from a single site, so as to take advantage of the capacity to link disperse data found in different systems, platforms, sources and/or formats. Based on functionalities and materials of the VLIR project, the purpose of this paper is to present a model that provides for interoperability by means of semantic alignment techniques and metadata crosswalks, and facilitates the fusion of information stored in diverse sources. Design/methodology/approach - After reviewing the state of the art regarding the diverse mechanisms for achieving semantic interoperability, the paper analyzes the following: the specific coverage of the data sets (type of data, thematic coverage and geographic coverage); the technical specifications needed to retrieve and analyze a distribution of the data set (format, protocol, etc.); the conditions of re-utilization (copyright and licenses); and the "dimensions" included in the data set as well as the semantics of these dimensions (the syntax and the taxonomies of reference). The semantic interoperability framework presented here implements semantic alignment and metadata crosswalks to convert information from three different systems (ABCD, Moodle and DSpace) and integrate all the databases in a single RDF file. Findings - The paper also includes an evaluation based on comparing, by means of recall and precision calculations, the proposed model with identical consultations made on Open Archives Initiative and SQL, in order to estimate its efficiency. The results were satisfactory, because the semantic interoperability facilitates the exact retrieval of information. Originality/value - The proposed model enhances management of the syntactic and semantic interoperability of the CRIS system designed. In a real setting of use it achieves very positive results.
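
    A minimal sketch of the crosswalk idea, assuming the rdflib package and invented field mappings for the three systems; the paper's actual alignment techniques are considerably richer:

```python
from rdflib import Graph, Literal, Namespace, URIRef
from rdflib.namespace import DC, RDF

EX = Namespace("http://example.org/cris/")

# Invented crosswalks: system-specific field names mapped onto Dublin Core.
CROSSWALKS = {
    "dspace": {"dc.title": DC.title, "dc.contributor.author": DC.creator},
    "abcd":   {"245a": DC.title, "100a": DC.creator},
    "moodle": {"fullname": DC.title, "author": DC.creator},
}

def merge_records(records):
    """Fuse records from the three source systems into a single RDF graph."""
    g = Graph()
    g.bind("dc", DC)
    for system, rec_id, fields in records:
        subject = URIRef(EX[f"{system}/{rec_id}"])
        g.add((subject, RDF.type, EX.Record))
        for field, value in fields.items():
            prop = CROSSWALKS[system].get(field)
            if prop is not None:                 # skip fields with no mapping
                g.add((subject, prop, Literal(value)))
    return g

records = [
    ("dspace", "1", {"dc.title": "Semantic interoperability for CRIS",
                     "dc.contributor.author": "Doe, J."}),
    ("abcd", "7", {"245a": "Heterogeneous data fusion", "100a": "Roe, R."}),
]
print(merge_records(records).serialize(format="turtle"))
```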

Types

  • a 378
  • s 15
  • el 8
  • m 8
  • r 6
  • x 5
  • p 2
  • d 1