Search (5 results, page 1 of 1)

  • theme_ss:"Metadaten"
  • theme_ss:"Automatisches Indexieren"
  1. Wolfekuhler, M.R.; Punch, W.F.: Finding salient features for personal Web pages categories (1997) 0.02
    0.019223861 = product of:
      0.05767158 = sum of:
        0.010478153 = weight(_text_:of in 2673) [ClassicSimilarity], result of:
          0.010478153 = score(doc=2673,freq=4.0), product of:
            0.061262865 = queryWeight, product of:
              1.5637573 = idf(docFreq=25162, maxDocs=44218)
              0.03917671 = queryNorm
            0.17103596 = fieldWeight in 2673, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              1.5637573 = idf(docFreq=25162, maxDocs=44218)
              0.0546875 = fieldNorm(doc=2673)
        0.028615767 = weight(_text_:systems in 2673) [ClassicSimilarity], result of:
          0.028615767 = score(doc=2673,freq=2.0), product of:
            0.12039685 = queryWeight, product of:
              3.0731742 = idf(docFreq=5561, maxDocs=44218)
              0.03917671 = queryNorm
            0.23767869 = fieldWeight in 2673, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.0731742 = idf(docFreq=5561, maxDocs=44218)
              0.0546875 = fieldNorm(doc=2673)
        0.018577661 = product of:
          0.037155323 = sum of:
            0.037155323 = weight(_text_:22 in 2673) [ClassicSimilarity], result of:
              0.037155323 = score(doc=2673,freq=2.0), product of:
                0.13719016 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.03917671 = queryNorm
                0.2708308 = fieldWeight in 2673, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.0546875 = fieldNorm(doc=2673)
          0.5 = coord(1/2)
      0.33333334 = coord(3/9)
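
The breakdown above follows the TF-IDF pattern of Lucene's ClassicSimilarity. As a minimal sketch, assuming the ClassicSimilarity defaults tf = sqrt(freq) and idf = 1 + ln(maxDocs / (docFreq + 1)), the weight of the term "of" in document 2673 can be recomputed from the figures shown. The function names are illustrative only, and queryNorm and fieldNorm are copied from the listing rather than derived.

```python
import math


def tf(freq: float) -> float:
    # Assumed ClassicSimilarity default: square root of the term frequency.
    return math.sqrt(freq)


def idf(doc_freq: int, max_docs: int) -> float:
    # Assumed ClassicSimilarity default: 1 + ln(maxDocs / (docFreq + 1)).
    return 1.0 + math.log(max_docs / (doc_freq + 1))


def term_weight(freq, doc_freq, max_docs, query_norm, field_norm):
    # weight = queryWeight * fieldWeight, as in the explain tree.
    term_idf = idf(doc_freq, max_docs)
    query_weight = term_idf * query_norm
    field_weight = tf(freq) * term_idf * field_norm
    return query_weight * field_weight


# Values for the term "of" in doc 2673, copied from the listing.
weight_of = term_weight(
    freq=4.0,
    doc_freq=25162,
    max_docs=44218,
    query_norm=0.03917671,
    field_norm=0.0546875,
)
print(f"{weight_of:.9f}")  # should be close to the 0.010478153 shown above
```

Lucene computes these values in single precision, so the last digits may differ slightly from the double-precision result here.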
    
    Abstract
    Examines techniques that discover features in sets of pre-categorized documents, such that similar documents can be found on the WWW. Examines techniques which will classify training examples with high accuracy, then explains why this is not necessarily useful. Describes a method for extracting word clusters from the raw document features. Results show that the clustering technique is successful in discovering word groups in personal Web pages which can be used to find similar information on the WWW.
    Date
    1. 8.1996 22:08:06
    Footnote
    Contribution to a special issue of papers from the 6th International World Wide Web conference, held 7-11 Apr 1997, Santa Clara, California
    Source
    Computer networks and ISDN systems. 29(1997) no.8, S.1147-1156
  2. Yang, T.-H.; Hsieh, Y.-L.; Liu, S.-H.; Chang, Y.-C.; Hsu, W.-L.: A flexible template generation and matching method with applications for publication reference metadata extraction (2021) 0.01
    0.0113586085 = product of:
      0.05111374 = sum of:
        0.041947264 = weight(_text_:applications in 63) [ClassicSimilarity], result of:
          0.041947264 = score(doc=63,freq=2.0), product of:
            0.17247584 = queryWeight, product of:
              4.4025097 = idf(docFreq=1471, maxDocs=44218)
              0.03917671 = queryNorm
            0.2432066 = fieldWeight in 63, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              4.4025097 = idf(docFreq=1471, maxDocs=44218)
              0.0390625 = fieldNorm(doc=63)
        0.009166474 = weight(_text_:of in 63) [ClassicSimilarity], result of:
          0.009166474 = score(doc=63,freq=6.0), product of:
            0.061262865 = queryWeight, product of:
              1.5637573 = idf(docFreq=25162, maxDocs=44218)
              0.03917671 = queryNorm
            0.1496253 = fieldWeight in 63, product of:
              2.4494898 = tf(freq=6.0), with freq of:
                6.0 = termFreq=6.0
              1.5637573 = idf(docFreq=25162, maxDocs=44218)
              0.0390625 = fieldNorm(doc=63)
      0.22222222 = coord(2/9)
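
The top-level score in each tree is the sum of the matching clause weights multiplied by a coordination factor, coord(matched/total). A minimal sketch for document 63, using the two clause weights copied from the listing; the helper names are hypothetical, and the total of 9 query clauses is read off the coord(2/9) line.

```python
def coord(matched: int, total: int) -> float:
    # Fraction of query clauses that matched the document.
    return matched / total


def combined_score(clause_weights, total_clauses):
    # Sum of the matching clause weights, scaled by the coordination factor.
    return coord(len(clause_weights), total_clauses) * sum(clause_weights)


# Clause weights for "applications" and "of" in doc 63, from the listing.
score_63 = combined_score([0.041947264, 0.009166474], total_clauses=9)
print(f"{score_63:.9f}")  # should be close to the 0.0113586085 shown above
```

The same pattern applies to nested clauses: in result 1, the "22" clause is first scaled by coord(1/2) before entering the outer sum.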
    
    Abstract
    Conventional rule-based approaches use exact template matching to capture linguistic information and necessarily need to enumerate all variations. We propose a novel flexible template generation and matching scheme called the principle-based approach (PBA) based on sequence alignment, and employ it for reference metadata extraction (RME) to demonstrate its effectiveness. The main contributions of this research are threefold. First, we propose an automatic template generation that can capture prominent patterns using the dominating set algorithm. Second, we devise an alignment-based template-matching technique that uses a logistic regression model, which makes it more general and flexible than pure rule-based approaches. Last, we apply PBA to RME on extensive cross-domain corpora and demonstrate its robustness and generality. Experiments reveal that the same set of templates produced by the PBA framework not only delivers consistent performance on various unseen domains, but also surpasses hand-crafted knowledge (templates). We use four independent journal style test sets and one conference style test set in the experiments. When compared to renowned machine learning methods, such as conditional random fields (CRF), as well as recent deep learning methods (i.e., bi-directional long short-term memory with a CRF layer, Bi-LSTM-CRF), PBA has the best performance for all datasets.
    Source
    Journal of the Association for Information Science and Technology. 72(2021) no.1, S.32-45
  3. Husevag, A.-S.R.: Named entities in indexing : a case study of TV subtitles and metadata records (2016) 0.01
    0.007171934 = product of:
      0.032273702 = sum of:
        0.011833867 = weight(_text_:of in 3105) [ClassicSimilarity], result of:
          0.011833867 = score(doc=3105,freq=10.0), product of:
            0.061262865 = queryWeight, product of:
              1.5637573 = idf(docFreq=25162, maxDocs=44218)
              0.03917671 = queryNorm
            0.19316542 = fieldWeight in 3105, product of:
              3.1622777 = tf(freq=10.0), with freq of:
                10.0 = termFreq=10.0
              1.5637573 = idf(docFreq=25162, maxDocs=44218)
              0.0390625 = fieldNorm(doc=3105)
        0.020439833 = weight(_text_:systems in 3105) [ClassicSimilarity], result of:
          0.020439833 = score(doc=3105,freq=2.0), product of:
            0.12039685 = queryWeight, product of:
              3.0731742 = idf(docFreq=5561, maxDocs=44218)
              0.03917671 = queryNorm
            0.1697705 = fieldWeight in 3105, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.0731742 = idf(docFreq=5561, maxDocs=44218)
              0.0390625 = fieldNorm(doc=3105)
      0.22222222 = coord(2/9)
    
    Abstract
    This paper explores the possible role of named entities in an automatic indexing process, based on text in subtitles. This is done by analyzing entity types, name density and name frequencies in subtitles and metadata records from different TV programs. The name density in metadata records is much higher than the name density in subtitles, and named entities with high frequencies in the subtitles are more likely to be mentioned in the metadata records. Personal names, geographical names and names of organizations were the most prominent entity types in both the news subtitles and news metadata, while persons, works and locations are the most prominent in culture programs.
    Source
    Proceedings of the 15th European Networked Knowledge Organization Systems Workshop (NKOS 2016) co-located with the 20th International Conference on Theory and Practice of Digital Libraries 2016 (TPDL 2016), Hannover, Germany, September 9, 2016. Ed. by Philipp Mayr et al. [http://ceur-ws.org/Vol-1676/=urn:nbn:de:0074-1676-5]
  4. Wolfe, EW.: A case study in automated metadata enhancement : Natural Language Processing in the humanities (2019) 0.00
    0.0020165213 = product of:
      0.018148692 = sum of:
        0.018148692 = weight(_text_:of in 5236) [ClassicSimilarity], result of:
          0.018148692 = score(doc=5236,freq=12.0), product of:
            0.061262865 = queryWeight, product of:
              1.5637573 = idf(docFreq=25162, maxDocs=44218)
              0.03917671 = queryNorm
            0.29624295 = fieldWeight in 5236, product of:
              3.4641016 = tf(freq=12.0), with freq of:
                12.0 = termFreq=12.0
              1.5637573 = idf(docFreq=25162, maxDocs=44218)
              0.0546875 = fieldNorm(doc=5236)
      0.11111111 = coord(1/9)
    
    Abstract
    The Black Book Interactive Project at the University of Kansas (KU) is developing an expanded corpus of novels by African American authors, with an emphasis on lesser known writers and a goal of expanding research in this field. Using a custom metadata schema with an emphasis on race-related elements, each novel is analyzed for a variety of elements such as literary style, targeted content analysis, historical context, and other areas. Librarians at KU have worked to develop a variety of computational text analysis processes designed to assist with specific aspects of this metadata collection, including text mining and natural language processing, automated subject extraction based on word sense disambiguation, harvesting data from Wikidata, and other actions.
  5. Strobel, S.; Marín-Arraiza, P.: Metadata for scientific audiovisual media : current practices and perspectives of the TIB / AV-portal (2015) 0.00
    0.0016631988 = product of:
      0.014968789 = sum of:
        0.014968789 = weight(_text_:of in 3667) [ClassicSimilarity], result of:
          0.014968789 = score(doc=3667,freq=16.0), product of:
            0.061262865 = queryWeight, product of:
              1.5637573 = idf(docFreq=25162, maxDocs=44218)
              0.03917671 = queryNorm
            0.24433708 = fieldWeight in 3667, product of:
              4.0 = tf(freq=16.0), with freq of:
                16.0 = termFreq=16.0
              1.5637573 = idf(docFreq=25162, maxDocs=44218)
              0.0390625 = fieldNorm(doc=3667)
      0.11111111 = coord(1/9)
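
Results 4 and 5 both match only the term "of", yet document 5236 (freq=12) outranks document 3667 (freq=16). In the listing the difference comes from fieldNorm, which in ClassicSimilarity shrinks as the field gets longer. A minimal sketch of the comparison, assuming fieldWeight = sqrt(freq) * idf * fieldNorm and using values copied from the two trees; the dictionary layout is illustrative only.

```python
import math

IDF_OF = 1.5637573  # idf of "of", copied from the listing

# (freq, fieldNorm) per document id, copied from the explain trees.
docs = {
    5236: (12.0, 0.0546875),
    3667: (16.0, 0.0390625),
}

# fieldWeight = tf * idf * fieldNorm, with tf assumed to be sqrt(freq).
field_weights = {
    doc_id: math.sqrt(freq) * IDF_OF * field_norm
    for doc_id, (freq, field_norm) in docs.items()
}

# Rank by field weight, highest first.
ranking = sorted(field_weights, key=field_weights.get, reverse=True)
for doc_id in ranking:
    print(doc_id, f"{field_weights[doc_id]:.8f}")
```

The higher term frequency in document 3667 raises tf only from about 3.46 to 4.0, which does not offset its smaller fieldNorm.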
    
    Abstract
    Descriptive metadata play a key role in finding relevant search results in large amounts of unstructured data. However, current scientific audiovisual media are provided with little metadata, which makes them hard to find, let alone individual sequences. In this paper, the TIB / AV-Portal is presented as a use case where methods concerning the automatic generation of metadata, a semantic search and cross-lingual retrieval (German/English) have already been applied. These methods result in a better discoverability of the scientific audiovisual media hosted in the portal. Text, speech, and image content of the video are automatically indexed by specialised GND (Gemeinsame Normdatei) subject headings. A semantic search is established based on properties of the GND ontology. The cross-lingual retrieval uses English 'translations' that were derived by an ontology mapping (DBpedia, among others). Further ways of increasing the discoverability and reuse of the metadata are publishing them as Linked Open Data and interlinking them with other data sets.