Search (110 results, page 1 of 6)

  • theme_ss:"Metadaten"
  1. Heidorn, P.B.; Wei, Q.: Automatic metadata extraction from museum specimen labels (2008) 0.07
    0.07409342 = product of:
      0.14818683 = sum of:
        0.14818683 = sum of:
          0.113330126 = weight(_text_:learning in 2624) [ClassicSimilarity], result of:
            0.113330126 = score(doc=2624,freq=8.0), product of:
              0.22973695 = queryWeight, product of:
                4.464877 = idf(docFreq=1382, maxDocs=44218)
                0.05145426 = queryNorm
              0.49330387 = fieldWeight in 2624, product of:
                2.828427 = tf(freq=8.0), with freq of:
                  8.0 = termFreq=8.0
                4.464877 = idf(docFreq=1382, maxDocs=44218)
                0.0390625 = fieldNorm(doc=2624)
          0.03485671 = weight(_text_:22 in 2624) [ClassicSimilarity], result of:
            0.03485671 = score(doc=2624,freq=2.0), product of:
              0.18018405 = queryWeight, product of:
                3.5018296 = idf(docFreq=3622, maxDocs=44218)
                0.05145426 = queryNorm
              0.19345059 = fieldWeight in 2624, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                3.5018296 = idf(docFreq=3622, maxDocs=44218)
                0.0390625 = fieldNorm(doc=2624)
      0.5 = coord(1/2)
    
    Abstract
    This paper describes the information properties of museum specimen labels and machine learning tools to automatically extract Darwin Core (DwC) and other metadata from these labels processed through Optical Character Recognition (OCR). The DwC is a metadata profile describing the core set of access points for search and retrieval of natural history collections and observation databases. Using the HERBIS Learning System (HLS) we extract 74 independent elements from these labels. The automated text extraction tools are provided as a web service so that users can reference digital images of specimens and receive back an extended Darwin Core XML representation of the content of the label. This automated extraction task is made more difficult by the high variability of museum label formats, OCR errors, and the open-class nature of some elements. In this paper we introduce our overall system architecture and variability-robust solutions, including the application of Hidden Markov and Naïve Bayes machine learning models, data cleaning, use of field element identifiers, and specialist learning models. The techniques developed here could be adapted to any metadata extraction situation with noisy text and weakly ordered elements.
    Source
    Metadata for semantic and social applications : proceedings of the International Conference on Dublin Core and Metadata Applications, Berlin, 22 - 26 September 2008, DC 2008: Berlin, Germany / ed. by Jane Greenberg and Wolfgang Klas
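    The relevance scores in this listing come from Lucene's ClassicSimilarity (tf-idf) model, as the explain tree under the first result shows. A minimal sketch that reproduces result 1's score from the values Lucene reports (the helper names are mine, not Lucene's):

    ```python
    import math

    # Values from the explain tree for result 1 (doc 2624).
    QUERY_NORM = 0.05145426  # queryNorm

    def term_score(freq: float, idf: float, field_norm: float) -> float:
        """One term's contribution: queryWeight * fieldWeight,
        i.e. (idf * queryNorm) * (sqrt(freq) * idf * fieldNorm)."""
        query_weight = idf * QUERY_NORM
        field_weight = math.sqrt(freq) * idf * field_norm
        return query_weight * field_weight

    # _text_:learning -> freq=8.0, idf=4.464877, fieldNorm=0.0390625
    w_learning = term_score(8.0, 4.464877, 0.0390625)  # ≈ 0.113330126
    # _text_:22       -> freq=2.0, idf=3.5018296, fieldNorm=0.0390625
    w_22 = term_score(2.0, 3.5018296, 0.0390625)       # ≈ 0.03485671
    # coord(1/2) halves the summed clause scores.
    score = 0.5 * (w_learning + w_22)                  # ≈ 0.07409342
    print(score)
    ```

    The same arithmetic, with different freq/fieldNorm values, yields every score shown on this page.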
  2. Lubas, R.L.; Wolfe, R.H.W.; Fleischman, M.: Creating metadata practices for MIT's OpenCourseWare Project (2004) 0.06
    Abstract
    The MIT libraries were called upon to recommend a metadata scheme for the resources contained in MIT's OpenCourseWare (OCW) project. The resources in OCW needed descriptive, structural, and technical metadata. The SCORM standard, which uses IEEE Learning Object Metadata for its descriptive standard, was selected for its focus on educational objects. However, it was clear that the Libraries would need to recommend how the standard would be applied and adapted to accommodate needs that were not addressed in the standard's specifications. The newly formed MIT Libraries Metadata Unit adapted established practices from AACR2 and MARC traditions when facing situations in which there were no precedents to follow.
    Source
    Library hi tech. 22(2004) no.2, S.138-143
  3. Ilik, V.; Storlien, J.; Olivarez, J.: Metadata makeover (2014) 0.06
    Abstract
    Catalogers have become fluent in information technology such as web design skills, HyperText Markup Language (HTML), Cascading Style Sheets (CSS), eXtensible Markup Language (XML), and programming languages. The knowledge gained from learning information technology can be used to experiment with methods of transforming one metadata schema into another using various software solutions. This paper will discuss the use of eXtensible Stylesheet Language Transformations (XSLT) for repurposing, editing, and reformatting metadata. Catalogers have the requisite skills for working with any metadata schema, and if they are excluded from metadata work, libraries are wasting a valuable human resource.
    Date
    10. 9.2000 17:38:22
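    The XSLT-based schema transformation this abstract describes can be sketched in miniature. A hedged illustration only: the record, the element names, and the third-party lxml dependency are my choices, not the authors':

    ```python
    # Rename a simplified Dublin Core-style title into a MODS-style
    # titleInfo via XSLT. Requires the third-party lxml package.
    from lxml import etree

    record = etree.XML("<record><title>Metadata makeover</title></record>")

    stylesheet = etree.XML(
        '<xsl:stylesheet version="1.0" '
        'xmlns:xsl="http://www.w3.org/1999/XSL/Transform">'
        '<xsl:template match="record">'
        '<mods><titleInfo><xsl:value-of select="title"/></titleInfo></mods>'
        '</xsl:template>'
        '</xsl:stylesheet>'
    )

    transform = etree.XSLT(stylesheet)
    result = transform(record)
    print(etree.tostring(result).decode())
    ```

    Real crosswalks (e.g. MARCXML to MODS) follow the same pattern with namespace-aware templates for each mapped field.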
  4. Sutton, S.A.; Golder, D.: Achievement Standards Network (ASN) : an application profile for mapping K-12 educational resources to achievement (2008) 0.05
    Abstract
    This paper describes metadata development of an application profile for the National Science Digital Library (NSDL) Achievement Standards Network (ASN) in the United States. The ASN is a national repository of machine-readable achievement standards modeled in RDF that shape teaching and learning in the various states. We describe the nature of the ASN metadata and the various uses to which that metadata is applied including the alignment of the standards of one state to those of another and the correlation of those standards to educational resources in support of resource discovery and retrieval.
    Source
    Metadata for semantic and social applications : proceedings of the International Conference on Dublin Core and Metadata Applications, Berlin, 22 - 26 September 2008, DC 2008: Berlin, Germany / ed. by Jane Greenberg and Wolfgang Klas
  5. Bueno-de-la-Fuente, G.; Hernández-Pérez, T.; Rodríguez-Mateos, D.; Méndez-Rodríguez, E.M.; Martín-Galán, B.: Study on the use of metadata for digital learning objects in University Institutional Repositories (MODERI) (2009) 0.04
    Abstract
    Metadata is a core issue for the creation of repositories. Different institutional repositories have chosen and use different metadata models, elements, and values for describing the range of digital objects they store. Thus, this paper analyzes the current use of metadata describing those Learning Objects that some open higher education institutions' repositories include in their collections. The goal of this work is to identify and analyze the different metadata models being used to describe educational features of those specific digital educational objects (such as audience, type of educational material, learning objectives, etc.). Also discussed is the concept and typology of Learning Objects (LO) through their use in University Repositories. We will also examine the usefulness of specifically describing those learning objects, setting them apart from other kinds of documents included in the repository, mainly scholarly publications and research results of the Higher Education institution.
  6. Aulik, J.L.; Burt, H.A.; Gruby, E.; Morgan, A.; O'Halloran, C.: Online mentoring : a student experience at Dominican University (2002) 0.03
    Abstract
    This paper explores the online learning experience of seven students in the Graduate School of Library and Information Science at Dominican University. In a class entitled Metadata for Internet Resources, the students developed a distance learning relationship with professional catalogers. Student assignments included posting bibliographic records on the WebBoard™ for mentor input. In an online exchange, the mentors responded by posting their suggestions for improving student records. The interaction between students and mentors is discussed, as is the educational value of distance learning.
  7. Kamke, H.-U.; Zimmermann, K.: Metadaten und Online-Learning (2003) 0.03
    Abstract
    Metadata serve to describe online sources, and online learning needs subject-specific materials. This article examines the intersection of the two: it gives an overview of metadata standards in the education sector and presents examples of portals that offer (freely available) materials together with a metadata set.
  8. Jimenez, V.O.R.: Nuevas perspectivas para la catalogacion : metadatos ver MARC (1999) 0.03
    Date
    30. 3.2002 19:45:22
    Source
    Revista Española de Documentación Científica. 22(1999) no.2, S.198-219
  9. Slavic, A.: General library classification in learning material metadata : the application in IMS/LOM and CDMES metadata schemas (2003) 0.03
    Abstract
    This paper analyses the approach to resource discovery in the educational domain and stresses this community's need for a subject approach to information. The use of both general (Dublin Core) and domain-specific (IEEE Learning Object Metadata/IMS Metadata) metadata schemas for learning resource discovery suggests that library classification could be used for subject description. There are several reasons why this indexing language might be suitable for the indexing of education resources. The paper will explain the reasoning behind the application of Universal Decimal Classification in the EASEL (Educator's Access to Services in the Electronic Landscape - http://www.fdgroup.com/easel) project. EASEL deploys two Dublin Core and several different application profiles of LOM, i.e. IMS Metadata, and this paper will explain how these two types of metadata support the use of classification.
  10. Frodl, C.: International Conference on Dublin Core and Metadata Applications (2007) 0.03
    Abstract
    From 3 to 6 October 2006 the International Conference on Dublin Core and Metadata Applications took place in Manzanillo (Mexico) under the overall theme "Metadata for Knowledge and Learning". 250 participants from 24 nations took part, predominantly from the South American region.
  11. Andresen, L.: Metadata in Denmark (2000) 0.03
    Date
    16. 7.2000 20:58:22
  12. MARC and metadata : METS, MODS, and MARCXML: current and future implications (2004) 0.03
    Source
    Library hi tech. 22(2004) no.1
  13. Moen, W.E.: The metadata approach to accessing government information (2001) 0.02
    Date
    28. 3.2002 9:22:34
  14. MARC and metadata : METS, MODS, and MARCXML: current and future implications (2004) 0.02
    Source
    Library hi tech. 22(2004) no.1
  15. MARC and metadata : METS, MODS, and MARCXML: current and future implications part 2 (2004) 0.02
    Source
    Library hi tech. 22(2004) no.2
  16. Maule, R.W.: Cognitive maps, AI agents and personalized virtual environments in Internet learning experiences (1998) 0.02
  17. Broughton, V.: Automatic metadata generation : Digital resource description without human intervention (2007) 0.02
    Date
    22. 9.2007 15:41:14
  18. Liu, X.; Qin, J.: An interactive metadata model for structural, descriptive, and referential representation of scholarly output (2014) 0.02
    Abstract
    The scientific metadata model proposed in this article encompasses both classical descriptive metadata such as those defined in the Dublin Core Metadata Element Set (DC) and the innovative structural and referential metadata properties that go beyond the classical model. Structural metadata capture the structural vocabulary in research publications; referential metadata include not only citations but also data about other types of scholarly output that is based on or related to the same publication. The article describes the structural, descriptive, and referential (SDR) elements of the metadata model and explains the underlying assumptions and justifications for each major component in the model. ScholarWiki, an experimental system developed as a proof of concept, was built over the wiki platform to allow user interaction with the metadata and the editing, deleting, and adding of metadata. By allowing and encouraging scholars (both as authors and as users) to participate in the knowledge and metadata editing and enhancing process, the larger community will benefit from more accurate and effective information retrieval. The ScholarWiki system utilizes machine-learning techniques that automatically produce self-enhanced metadata by learning from the structural metadata that scholars contribute, adding intelligence that automatically enhances and updates the publication metadata wiki pages.
  19. Yang, T.-H.; Hsieh, Y.-L.; Liu, S.-H.; Chang, Y.-C.; Hsu, W.-L.: A flexible template generation and matching method with applications for publication reference metadata extraction (2021) 0.02
    Abstract
    Conventional rule-based approaches use exact template matching to capture linguistic information and necessarily need to enumerate all variations. We propose a novel flexible template generation and matching scheme called the principle-based approach (PBA) based on sequence alignment, and employ it for reference metadata extraction (RME) to demonstrate its effectiveness. The main contributions of this research are threefold. First, we propose an automatic template generation that can capture prominent patterns using the dominating set algorithm. Second, we devise an alignment-based template-matching technique that uses a logistic regression model, which makes it more general and flexible than pure rule-based approaches. Last, we apply PBA to RME on extensive cross-domain corpora and demonstrate its robustness and generality. Experiments reveal that the same set of templates produced by the PBA framework not only deliver consistent performance on various unseen domains, but also surpass hand-crafted knowledge (templates). We use four independent journal style test sets and one conference style test set in the experiments. When compared to renowned machine learning methods, such as conditional random fields (CRF), as well as recent deep learning methods (i.e., bi-directional long short-term memory with a CRF layer, Bi-LSTM-CRF), PBA has the best performance for all datasets.
  20. Laparra, E.; Binford-Walsh, A.; Emerson, K.; Miller, M.L.; López-Hoffman, L.; Currim, F.; Bethard, S.: Addressing structural hurdles for metadata extraction from environmental impact statements (2023) 0.02
    Abstract
    Natural language processing techniques can be used to analyze the linguistic content of a document to extract missing pieces of metadata. However, accurate metadata extraction may not depend solely on the linguistics, but also on structural problems such as extremely large documents, unordered multi-file documents, and inconsistency in manually labeled metadata. In this work, we start from two standard machine learning solutions to extract pieces of metadata from Environmental Impact Statements, environmental policy documents that are regularly produced under the US National Environmental Policy Act of 1969. We present a series of experiments where we evaluate how these standard approaches are affected by different issues derived from real-world data. We find that metadata extraction can be strongly influenced by nonlinguistic factors such as document length and volume ordering and that the standard machine learning solutions often do not scale well to long documents. We demonstrate how such solutions can be better adapted to these scenarios, and conclude with suggestions for other NLP practitioners cataloging large document collections.

Languages

  • e 98
  • d 10
  • sp 1

Types

  • a 100
  • el 6
  • s 6
  • m 5
  • b 2