Search (30 results, page 1 of 2)

Salton, G.; Buckley, C.; Allan, J.: Automatic structuring of text files (1992) 0.08

0.08050447 = product of:
  0.1207567 = sum of:
    0.06780831 = weight(_text_:electronic in 6507) [ClassicSimilarity], result of:
      0.06780831 = score(doc=6507,freq=2.0), product of:
        0.19623034 = queryWeight, product of:
          3.9095051 = idf(docFreq=2409, maxDocs=44218)
          0.05019314 = queryNorm
        0.34555468 = fieldWeight in 6507, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          3.9095051 = idf(docFreq=2409, maxDocs=44218)
          0.0625 = fieldNorm(doc=6507)
    0.052948397 = product of:
      0.10589679 = sum of:
        0.10589679 = weight(_text_:publishing in 6507) [ClassicSimilarity], result of:
          0.10589679 = score(doc=6507,freq=2.0), product of:
            0.24522576 = queryWeight, product of:
              4.885643 = idf(docFreq=907, maxDocs=44218)
              0.05019314 = queryNorm
            0.4318339 = fieldWeight in 6507, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              4.885643 = idf(docFreq=907, maxDocs=44218)
              0.0625 = fieldNorm(doc=6507)
      0.5 = coord(1/2)
  0.6666667 = coord(2/3)

Source: Electronic publishing. 5(1992) no.1, S.1-17
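
The arithmetic in the explain tree above can be recomputed by hand. The following is a minimal sketch, not Lucene code: it assumes tf = sqrt(freq) and idf = 1 + ln(maxDocs / (docFreq + 1)), which reproduce the tf and idf values printed in the tree. The function names are illustrative only. The 0.5 and 2/3 factors are the coord(1/2) and coord(2/3) values shown for doc 6507.

```python
import math

def idf(doc_freq: int, max_docs: int) -> float:
    # Assumed idf formula; it matches the idf(docFreq=..., maxDocs=...) lines in the tree.
    return 1.0 + math.log(max_docs / (doc_freq + 1))

def term_score(freq: float, doc_freq: int, max_docs: int,
               query_norm: float, field_norm: float) -> float:
    # score = queryWeight * fieldWeight, as in the explain output.
    term_idf = idf(doc_freq, max_docs)
    query_weight = term_idf * query_norm                   # idf * queryNorm
    field_weight = math.sqrt(freq) * term_idf * field_norm  # tf * idf * fieldNorm
    return query_weight * field_weight

MAX_DOCS = 44218
QUERY_NORM = 0.05019314

# Values copied from the explain tree for doc 6507.
electronic = term_score(2.0, 2409, MAX_DOCS, QUERY_NORM, 0.0625)
publishing = term_score(2.0, 907, MAX_DOCS, QUERY_NORM, 0.0625)

# The "publishing" clause sits in a nested query with coord(1/2);
# the outer query applies coord(2/3).
total = (electronic + 0.5 * publishing) * (2 / 3)
print(round(total, 6))
```

The result agrees with the listed score of 0.08050447 to within floating-point rounding; Lucene computes in single precision, so the last digits may differ.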

Stankovic, R. et al.: Indexing of textual databases based on lexical resources : a case study for Serbian (2016) 0.07

0.06679192 = product of:
  0.20037577 = sum of:
    0.20037577 = sum of:
      0.132371 = weight(_text_:publishing in 2759) [ClassicSimilarity], result of:
        0.132371 = score(doc=2759,freq=2.0), product of:
          0.24522576 = queryWeight, product of:
            4.885643 = idf(docFreq=907, maxDocs=44218)
            0.05019314 = queryNorm
          0.53979236 = fieldWeight in 2759, product of:
            1.4142135 = tf(freq=2.0), with freq of:
              2.0 = termFreq=2.0
            4.885643 = idf(docFreq=907, maxDocs=44218)
            0.078125 = fieldNorm(doc=2759)
      0.06800478 = weight(_text_:22 in 2759) [ClassicSimilarity], result of:
        0.06800478 = score(doc=2759,freq=2.0), product of:
          0.17576782 = queryWeight, product of:
            3.5018296 = idf(docFreq=3622, maxDocs=44218)
            0.05019314 = queryNorm
          0.38690117 = fieldWeight in 2759, product of:
            1.4142135 = tf(freq=2.0), with freq of:
              2.0 = termFreq=2.0
            3.5018296 = idf(docFreq=3622, maxDocs=44218)
            0.078125 = fieldNorm(doc=2759)
  0.33333334 = coord(1/3)

Date: 1. 2.2016 18:25:22
Imprint: Basel : Springer International Publishing

Ward, M.L.: ¬The future of the human indexer (1996) 0.05

0.04750511 = product of:
  0.071257666 = sum of:
    0.050856233 = weight(_text_:electronic in 7244) [ClassicSimilarity], result of:
      0.050856233 = score(doc=7244,freq=2.0), product of:
        0.19623034 = queryWeight, product of:
          3.9095051 = idf(docFreq=2409, maxDocs=44218)
          0.05019314 = queryNorm
        0.259166 = fieldWeight in 7244, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          3.9095051 = idf(docFreq=2409, maxDocs=44218)
          0.046875 = fieldNorm(doc=7244)
    0.020401431 = product of:
      0.040802862 = sum of:
        0.040802862 = weight(_text_:22 in 7244) [ClassicSimilarity], result of:
          0.040802862 = score(doc=7244,freq=2.0), product of:
            0.17576782 = queryWeight, product of:
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.05019314 = queryNorm
            0.23214069 = fieldWeight in 7244, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.046875 = fieldNorm(doc=7244)
      0.5 = coord(1/2)
  0.6666667 = coord(2/3)

Abstract: Considers the principles of indexing and the intellectual skills involved in order to determine what automatic indexing systems would be required in order to supplant or complement the human indexer. Good indexing requires: considerable prior knowledge of the literature; judgement as to what to index and what depth to index; reading skills; abstracting skills; and classification skills. Illustrates these features with a detailed description of abstracting and indexing processes involved in generating entries for the mechanical engineering database POWERLINK. Briefly assesses the possibility of replacing human indexers with specialist indexing software, with particular reference to the Object Analyzer from the InTEXT automatic indexing system and using the criteria described for human indexers. At present, it is unlikely that the automatic indexer will replace the human indexer, but when more primary texts are available in electronic form, it may be a useful productivity tool for dealing with large quantities of low grade texts (should they be wanted in the database).
Date: 9. 2.1997 18:44:22

Thiel, T.J.: Automated indexing of information stored on optical disk electronic document image management systems (1994) 0.04

0.03955485 = product of:
  0.11866455 = sum of:
    0.11866455 = weight(_text_:electronic in 1260) [ClassicSimilarity], result of:
      0.11866455 = score(doc=1260,freq=2.0), product of:
        0.19623034 = queryWeight, product of:
          3.9095051 = idf(docFreq=2409, maxDocs=44218)
          0.05019314 = queryNorm
        0.6047207 = fieldWeight in 1260, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          3.9095051 = idf(docFreq=2409, maxDocs=44218)
          0.109375 = fieldNorm(doc=1260)
  0.33333334 = coord(1/3)

Silvester, J.P.; Genuardi, M.T.: Machine-aided indexing from the analysis of natural language text (1994) 0.03

0.033904158 = product of:
  0.101712465 = sum of:
    0.101712465 = weight(_text_:electronic in 2989) [ClassicSimilarity], result of:
      0.101712465 = score(doc=2989,freq=2.0), product of:
        0.19623034 = queryWeight, product of:
          3.9095051 = idf(docFreq=2409, maxDocs=44218)
          0.05019314 = queryNorm
        0.518332 = fieldWeight in 2989, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          3.9095051 = idf(docFreq=2409, maxDocs=44218)
          0.09375 = fieldNorm(doc=2989)
  0.33333334 = coord(1/3)

Source: Challenges in indexing electronic text and images. Ed.: R. Fidel et al

Smith, P.J.; Normore, L.F.; Denning, R.; Johnson, W.P.: Computerized tools to support document analysis (1994) 0.03

0.033904158 = product of:
  0.101712465 = sum of:
    0.101712465 = weight(_text_:electronic in 2990) [ClassicSimilarity], result of:
      0.101712465 = score(doc=2990,freq=2.0), product of:
        0.19623034 = queryWeight, product of:
          3.9095051 = idf(docFreq=2409, maxDocs=44218)
          0.05019314 = queryNorm
        0.518332 = fieldWeight in 2990, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          3.9095051 = idf(docFreq=2409, maxDocs=44218)
          0.09375 = fieldNorm(doc=2990)
  0.33333334 = coord(1/3)

Source: Challenges in indexing electronic text and images. Ed.: R. Fidel et al

Harman, D.: Automatic indexing (1994) 0.02

0.02260277 = product of:
  0.06780831 = sum of:
    0.06780831 = weight(_text_:electronic in 7729) [ClassicSimilarity], result of:
      0.06780831 = score(doc=7729,freq=2.0), product of:
        0.19623034 = queryWeight, product of:
          3.9095051 = idf(docFreq=2409, maxDocs=44218)
          0.05019314 = queryNorm
        0.34555468 = fieldWeight in 7729, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          3.9095051 = idf(docFreq=2409, maxDocs=44218)
          0.0625 = fieldNorm(doc=7729)
  0.33333334 = coord(1/3)

Source: Challenges in indexing electronic text and images. Ed.: R. Fidel et al

Renouf, A.: Sticking to the text : a corpus linguist's view of language (1993) 0.02
```
0.019777425 = product of:
  0.059332274 = sum of:
    0.059332274 = weight(_text_:electronic in 2314) [ClassicSimilarity], result of:
      0.059332274 = score(doc=2314,freq=2.0), product of:
        0.19623034 = queryWeight, product of:
          3.9095051 = idf(docFreq=2409, maxDocs=44218)
          0.05019314 = queryNorm
        0.30236036 = fieldWeight in 2314, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          3.9095051 = idf(docFreq=2409, maxDocs=44218)
          0.0546875 = fieldNorm(doc=2314)
  0.33333334 = coord(1/3)
```
Abstract

Corpus linguistics is the study of large, computer-held bodies of text. Some corpus linguists are concerned with language description for its own sake. On the corpus-linguistic continuum, the study of raw ASCII text is situated at one end, and the study of heavily pre-coded text at the other. Discusses the use of word frequency to identify changes in the lexicon; word repetition and word positioning in automatic abstracting; and word clusters in automatic text retrieval. Compares the machine extract with manual abstracts. Abstractors and indexers may find themselves taking the original wording of the text more into account as the focus moves towards the electronic medium and away from the hard copy.

Shafer, K.: Scorpion Project explores using Dewey to organize the Web (1996) 0.02
```
0.019777425 = product of:
  0.059332274 = sum of:
    0.059332274 = weight(_text_:electronic in 6750) [ClassicSimilarity], result of:
      0.059332274 = score(doc=6750,freq=2.0), product of:
        0.19623034 = queryWeight, product of:
          3.9095051 = idf(docFreq=2409, maxDocs=44218)
          0.05019314 = queryNorm
        0.30236036 = fieldWeight in 6750, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          3.9095051 = idf(docFreq=2409, maxDocs=44218)
          0.0546875 = fieldNorm(doc=6750)
  0.33333334 = coord(1/3)
```
Abstract

As the amount of accessible information on the WWW increases, so will the cost of accessing it, even if search services remain free, due to the increasing amount of time users will have to spend to find needed items. Considers what the seemingly unorganized Web and the organized world of libraries can offer each other. The OCLC Scorpion Project is attempting to combine indexing and cataloguing, specifically focusing on building tools for automatic subject recognition using the techniques of library science and information retrieval. If subject headings or concept domains can be automatically assigned to electronic items, improved filtering tools for searching can be produced.

Voorhees, E.M.: Implementing agglomerative hierarchic clustering algorithms for use in document retrieval (1986) 0.02

0.018134607 = product of:
  0.05440382 = sum of:
    0.05440382 = product of:
      0.10880764 = sum of:
        0.10880764 = weight(_text_:22 in 402) [ClassicSimilarity], result of:
          0.10880764 = score(doc=402,freq=2.0), product of:
            0.17576782 = queryWeight, product of:
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.05019314 = queryNorm
            0.61904186 = fieldWeight in 402, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.125 = fieldNorm(doc=402)
      0.5 = coord(1/2)
  0.33333334 = coord(1/3)

Source: Information processing and management. 22(1986) no.6, S.465-476

Dow Jones unveils knowledge indexing system (1997) 0.02

0.017649466 = product of:
  0.052948397 = sum of:
    0.052948397 = product of:
      0.10589679 = sum of:
        0.10589679 = weight(_text_:publishing in 751) [ClassicSimilarity], result of:
          0.10589679 = score(doc=751,freq=2.0), product of:
            0.24522576 = queryWeight, product of:
              4.885643 = idf(docFreq=907, maxDocs=44218)
              0.05019314 = queryNorm
            0.4318339 = fieldWeight in 751, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              4.885643 = idf(docFreq=907, maxDocs=44218)
              0.0625 = fieldNorm(doc=751)
      0.5 = coord(1/2)
  0.33333334 = coord(1/3)

Abstract: Dow Jones Interactive Publishing has developed a sophisticated automatic knowledge indexing system that will allow searchers of the Dow Jones News / Retrieval service to get highly targeted results from a search in the service's Publications Library. Instead of relying on a thesaurus of company names, the new system uses a combination of that basic algorithm plus unique rules based on the editorial styles of individual publications in the Library. Dow Jones has also announced its acceptance of the definitions of 'selected full text' and 'full text' from Bibliodata's Fulltext Sources Online directory.

Galvez, C.; Moya-Anegón, F. de: ¬An evaluation of conflation accuracy using finite-state transducers (2006) 0.02
```
0.016952079 = product of:
  0.050856233 = sum of:
    0.050856233 = weight(_text_:electronic in 5599) [ClassicSimilarity], result of:
      0.050856233 = score(doc=5599,freq=2.0), product of:
        0.19623034 = queryWeight, product of:
          3.9095051 = idf(docFreq=2409, maxDocs=44218)
          0.05019314 = queryNorm
        0.259166 = fieldWeight in 5599, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          3.9095051 = idf(docFreq=2409, maxDocs=44218)
          0.046875 = fieldNorm(doc=5599)
  0.33333334 = coord(1/3)
```
Abstract

Purpose - To evaluate the accuracy of conflation methods based on finite-state transducers (FSTs). Design/methodology/approach - Incorrectly lemmatized and stemmed forms may lead to the retrieval of inappropriate documents. Experimental studies to date have focused on retrieval performance, but very few on conflation performance. The process of normalization we used involved a linguistic toolbox that allowed us to construct, through graphic interfaces, electronic dictionaries represented internally by FSTs. The lexical resources developed were applied to a Spanish test corpus for merging term variants in canonical lemmatized forms. Conflation performance was evaluated in terms of an adaptation of recall and precision measures, based on accuracy and coverage, not actual retrieval. The results were compared with those obtained using a Spanish version of the Porter algorithm. Findings - The conclusion is that the main strength of lemmatization is its accuracy, whereas its main limitation is the underanalysis of variant forms. Originality/value - The report outlines the potential of transducers in their application to normalization processes.

Hlava, M.M.K.: Automatic indexing : comparing rule-based and statistics-based indexing systems (2005) 0.02

0.01586778 = product of:
  0.047603343 = sum of:
    0.047603343 = product of:
      0.095206685 = sum of:
        0.095206685 = weight(_text_:22 in 6265) [ClassicSimilarity], result of:
          0.095206685 = score(doc=6265,freq=2.0), product of:
            0.17576782 = queryWeight, product of:
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.05019314 = queryNorm
            0.5416616 = fieldWeight in 6265, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.109375 = fieldNorm(doc=6265)
      0.5 = coord(1/2)
  0.33333334 = coord(1/3)

Source: Information outlook. 9(2005) no.8, S.22-23

Kanan, T.; Fox, E.A.: Automated arabic text classification with P-Stemmer, machine learning, and a tailored news article taxonomy (2016) 0.01
```
0.014126732 = product of:
  0.042380195 = sum of:
    0.042380195 = weight(_text_:electronic in 3151) [ClassicSimilarity], result of:
      0.042380195 = score(doc=3151,freq=2.0), product of:
        0.19623034 = queryWeight, product of:
          3.9095051 = idf(docFreq=2409, maxDocs=44218)
          0.05019314 = queryNorm
        0.21597168 = fieldWeight in 3151, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          3.9095051 = idf(docFreq=2409, maxDocs=44218)
          0.0390625 = fieldNorm(doc=3151)
  0.33333334 = coord(1/3)
```
Abstract

Arabic news articles in electronic collections are difficult to study. Browsing by category is rarely supported. Although helpful machine-learning methods have been applied successfully to similar situations for English news articles, limited research has been completed to yield suitable solutions for Arabic news. In connection with a Qatar National Research Fund (QNRF)-funded project to build digital library community and infrastructure in Qatar, we developed software for browsing a collection of about 237,000 Arabic news articles, which should be applicable to other Arabic news collections. We designed a simple taxonomy for Arabic news stories that is suitable for the needs of Qatar and other nations, is compatible with the subject codes of the International Press Telecommunications Council, and was enhanced with the aid of a librarian expert as well as five Arabic-speaking volunteers. We developed tailored stemming (i.e., a new Arabic light stemmer called P-Stemmer) and automatic classification methods (the best being binary Support Vector Machines classifiers) to work with the taxonomy. Using evaluation techniques commonly used in the information retrieval community, including 10-fold cross-validation and the Wilcoxon signed-rank test, we showed that our approach to stemming and classification is superior to state-of-the-art techniques.

Rasmussen, E.M.: Indexing and retrieval for the Web (2002) 0.01
```
0.013984751 = product of:
  0.041954253 = sum of:
    0.041954253 = weight(_text_:electronic in 4285) [ClassicSimilarity], result of:
      0.041954253 = score(doc=4285,freq=4.0), product of:
        0.19623034 = queryWeight, product of:
          3.9095051 = idf(docFreq=2409, maxDocs=44218)
          0.05019314 = queryNorm
        0.21380106 = fieldWeight in 4285, product of:
          2.0 = tf(freq=4.0), with freq of:
            4.0 = termFreq=4.0
          3.9095051 = idf(docFreq=2409, maxDocs=44218)
          0.02734375 = fieldNorm(doc=4285)
  0.33333334 = coord(1/3)
```
Abstract

The introduction and growth of the World Wide Web (WWW, or Web) have resulted in a profound change in the way individuals and organizations access information. In terms of volume, nature, and accessibility, the characteristics of electronic information are significantly different from those of even five or six years ago. Control of, and access to, this flood of information rely heavily on automated techniques for indexing and retrieval. According to Gudivada, Raghavan, Grosky, and Kasanagottu (1997, p. 58), "The ability to search and retrieve information from the Web efficiently and effectively is an enabling technology for realizing its full potential." Almost 93 percent of those surveyed consider the Web an "indispensable" Internet technology, second only to e-mail (Graphic, Visualization & Usability Center, 1998). Although there are other ways of locating information on the Web (browsing or following directory structures), 85 percent of users identify Web pages by means of a search engine (Graphic, Visualization & Usability Center, 1998). A more recent study conducted by the Stanford Institute for the Quantitative Study of Society confirms the finding that searching for information is second only to e-mail as an Internet activity (Nie & Ebring, 2000, online). In fact, Nie and Ebring conclude, "... the Internet today is a giant public library with a decidedly commercial tilt. The most widespread use of the Internet today is as an information search utility for products, travel, hobbies, and general information. Virtually all users interviewed responded that they engaged in one or more of these information gathering activities."
Techniques for automated indexing and information retrieval (IR) have been developed, tested, and refined over the past 40 years, and are well documented (see, for example, Agosti & Smeaton, 1996; Baeza-Yates & Ribeiro-Neto, 1999a; Frakes & Baeza-Yates, 1992; Korfhage, 1997; Salton, 1989; Witten, Moffat, & Bell, 1999). With the introduction of the Web, and the capability to index and retrieve via search engines, these techniques have been extended to a new environment. They have been adopted, altered, and in some cases extended to include new methods. "In short, search engines are indispensable for searching the Web, they employ a variety of relatively advanced IR techniques, and there are some peculiar aspects of search engines that make searching the Web different than more conventional information retrieval" (Gordon & Pathak, 1999, p. 145). The environment for information retrieval on the World Wide Web differs from that of "conventional" information retrieval in a number of fundamental ways. The collection is very large and changes continuously, with pages being added, deleted, and altered. Wide variability between the size, structure, focus, quality, and usefulness of documents makes Web documents much more heterogeneous than a typical electronic document collection. The wide variety of document types includes images, video, audio, and scripts, as well as many different document languages. Duplication of documents and sites is common. Documents are interconnected through networks of hyperlinks. Because of the size and dynamic nature of the Web, preprocessing all documents requires considerable resources and is often not feasible, certainly not on the frequent basis required to ensure currency. Query length is usually much shorter than in other environments (only a few words), and user behavior differs from that in other environments. These differences make the Web a novel environment for information retrieval (Baeza-Yates & Ribeiro-Neto, 1999b; Bharat & Henzinger, 1998; Huang, 2000).

Munkelt, J.; Schaer, P.; Lepsky, K.: Towards an IR test collection for the German National Library (2018) 0.01
```
0.0132371 = product of:
  0.0397113 = sum of:
    0.0397113 = product of:
      0.0794226 = sum of:
        0.0794226 = weight(_text_:publishing in 4311) [ClassicSimilarity], result of:
          0.0794226 = score(doc=4311,freq=2.0), product of:
            0.24522576 = queryWeight, product of:
              4.885643 = idf(docFreq=907, maxDocs=44218)
              0.05019314 = queryNorm
            0.32387543 = fieldWeight in 4311, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              4.885643 = idf(docFreq=907, maxDocs=44218)
              0.046875 = fieldNorm(doc=4311)
      0.5 = coord(1/2)
  0.33333334 = coord(1/3)
```
Abstract

Automatic content indexing is one of the innovations that are increasingly changing the way libraries work. In theory, it promises a cataloguing service that would hardly be possible with humans in terms of speed, quantity and maybe quality. The German National Library (DNB) has also recognised this potential and is increasingly relying on the automatic indexing of their catalogue content. The DNB took a major step in this direction in 2017, which was announced in two papers. The announcement was rather restrained, but the content of the papers is all the more explosive for the library community: Since September 2017, the DNB has discontinued the intellectual indexing of series B and H and has switched to an automatic process for these series. The subject indexing of online publications (series O) has been purely automatic since 2010; from September 2017, monographs and periodicals published outside the publishing industry and university publications will no longer be indexed by people. This raises the question: What is the quality of the automatic indexing compared to the manual work or in other words to which degree can the automatic indexing replace people without a significant drop in regards to quality?

Biebricher, N.; Fuhr, N.; Lustig, G.; Schwantner, M.; Knorz, G.: ¬The automatic indexing system AIR/PHYS : from research to application (1988) 0.01

0.011334131 = product of:
  0.03400239 = sum of:
    0.03400239 = product of:
      0.06800478 = sum of:
        0.06800478 = weight(_text_:22 in 1952) [ClassicSimilarity], result of:
          0.06800478 = score(doc=1952,freq=2.0), product of:
            0.17576782 = queryWeight, product of:
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.05019314 = queryNorm
            0.38690117 = fieldWeight in 1952, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.078125 = fieldNorm(doc=1952)
      0.5 = coord(1/2)
  0.33333334 = coord(1/3)

Date: 16. 8.1998 12:51:22

Kutschekmanesch, S.; Lutes, B.; Moelle, K.; Thiel, U.; Tzeras, K.: Automated multilingual indexing : a synthesis of rule-based and thesaurus-based methods (1998) 0.01

0.011334131 = product of:
  0.03400239 = sum of:
    0.03400239 = product of:
      0.06800478 = sum of:
        0.06800478 = weight(_text_:22 in 4157) [ClassicSimilarity], result of:
          0.06800478 = score(doc=4157,freq=2.0), product of:
            0.17576782 = queryWeight, product of:
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.05019314 = queryNorm
            0.38690117 = fieldWeight in 4157, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.078125 = fieldNorm(doc=4157)
      0.5 = coord(1/2)
  0.33333334 = coord(1/3)

Source: Information und Märkte: 50. Deutscher Dokumentartag 1998, Kongreß der Deutschen Gesellschaft für Dokumentation e.V. (DGD), Rheinische Friedrich-Wilhelms-Universität Bonn, 22.-24. September 1998. Hrsg. von Marlies Ockenfeld u. Gerhard J. Mantwill

Strobel, S.; Marín-Arraiza, P.: Metadata for scientific audiovisual media : current practices and perspectives of the TIB / AV-portal (2015) 0.01
```
0.011030916 = product of:
  0.03309275 = sum of:
    0.03309275 = product of:
      0.0661855 = sum of:
        0.0661855 = weight(_text_:publishing in 3667) [ClassicSimilarity], result of:
          0.0661855 = score(doc=3667,freq=2.0), product of:
            0.24522576 = queryWeight, product of:
              4.885643 = idf(docFreq=907, maxDocs=44218)
              0.05019314 = queryNorm
            0.26989618 = fieldWeight in 3667, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              4.885643 = idf(docFreq=907, maxDocs=44218)
              0.0390625 = fieldNorm(doc=3667)
      0.5 = coord(1/2)
  0.33333334 = coord(1/3)
```
Abstract

Descriptive metadata play a key role in finding relevant search results in large amounts of unstructured data. However, current scientific audiovisual media are provided with little metadata, which makes them hard to find, let alone individual sequences. In this paper, the TIB / AV-Portal is presented as a use case where methods concerning the automatic generation of metadata, a semantic search and cross-lingual retrieval (German/English) have already been applied. These methods result in a better discoverability of the scientific audiovisual media hosted in the portal. Text, speech, and image content of the video are automatically indexed by specialised GND (Gemeinsame Normdatei) subject headings. A semantic search is established based on properties of the GND ontology. The cross-lingual retrieval uses English 'translations' that were derived by an ontology mapping (DBpedia i. a.). Further ways of increasing the discoverability and reuse of the metadata are publishing them as Linked Open Data and interlinking them with other data sets.

Tsujii, J.-I.: Automatic acquisition of semantic collocation from corpora (1995) 0.01

0.0090673035 = product of:
  0.02720191 = sum of:
    0.02720191 = product of:
      0.05440382 = sum of:
        0.05440382 = weight(_text_:22 in 4709) [ClassicSimilarity], result of:
          0.05440382 = score(doc=4709,freq=2.0), product of:
            0.17576782 = queryWeight, product of:
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.05019314 = queryNorm
            0.30952093 = fieldWeight in 4709, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.0625 = fieldNorm(doc=4709)
      0.5 = coord(1/2)
  0.33333334 = coord(1/3)

Date: 31. 7.1996 9:22:19
