Search (256 results, page 1 of 13)

  • × type_ss:"a"
  • × type_ss:"el"
  • × year_i:[2010 TO 2020}
  1. Assem, M. van: Converting and integrating vocabularies for the Semantic Web (2010) 0.06
    0.06416068 = product of:
      0.14436153 = sum of:
        0.08878562 = weight(_text_:applications in 4639) [ClassicSimilarity], result of:
          0.08878562 = score(doc=4639,freq=14.0), product of:
            0.17247584 = queryWeight, product of:
              4.4025097 = idf(docFreq=1471, maxDocs=44218)
              0.03917671 = queryNorm
            0.51477134 = fieldWeight in 4639, product of:
              3.7416575 = tf(freq=14.0), with freq of:
                14.0 = termFreq=14.0
              4.4025097 = idf(docFreq=1471, maxDocs=44218)
              0.03125 = fieldNorm(doc=4639)
        0.011975031 = weight(_text_:of in 4639) [ClassicSimilarity], result of:
          0.011975031 = score(doc=4639,freq=16.0), product of:
            0.061262865 = queryWeight, product of:
              1.5637573 = idf(docFreq=25162, maxDocs=44218)
              0.03917671 = queryNorm
            0.19546966 = fieldWeight in 4639, product of:
              4.0 = tf(freq=16.0), with freq of:
                16.0 = termFreq=16.0
              1.5637573 = idf(docFreq=25162, maxDocs=44218)
              0.03125 = fieldNorm(doc=4639)
        0.016351866 = weight(_text_:systems in 4639) [ClassicSimilarity], result of:
          0.016351866 = score(doc=4639,freq=2.0), product of:
            0.12039685 = queryWeight, product of:
              3.0731742 = idf(docFreq=5561, maxDocs=44218)
              0.03917671 = queryNorm
            0.1358164 = fieldWeight in 4639, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.0731742 = idf(docFreq=5561, maxDocs=44218)
              0.03125 = fieldNorm(doc=4639)
        0.027249003 = weight(_text_:software in 4639) [ClassicSimilarity], result of:
          0.027249003 = score(doc=4639,freq=2.0), product of:
            0.15541996 = queryWeight, product of:
              3.9671519 = idf(docFreq=2274, maxDocs=44218)
              0.03917671 = queryNorm
            0.17532499 = fieldWeight in 4639, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.9671519 = idf(docFreq=2274, maxDocs=44218)
              0.03125 = fieldNorm(doc=4639)
      0.44444445 = coord(4/9)
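     The breakdown above is standard Lucene ClassicSimilarity explain output. As a rough sketch (in Python, with the numbers copied from the first "applications" clause), each term clause multiplies a query weight (idf x queryNorm) by a field weight (tf x idf x fieldNorm):
     
       import math
       
       query_norm = 0.03917671
       idf = 4.4025097            # idf(docFreq=1471, maxDocs=44218)
       tf = math.sqrt(14.0)       # ClassicSimilarity: tf(freq) = sqrt(freq)
       field_norm = 0.03125
       
       query_weight = idf * query_norm         # 0.17247584
       field_weight = tf * idf * field_norm    # 0.51477134
       print(query_weight * field_weight)      # ~0.08878562, the clause score
     
     Summing the four clause scores (0.14436153) and multiplying by coord(4/9) reproduces the document score of 0.06416068.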
    
    Abstract
     This thesis focuses on conversion of vocabularies for representation and integration of collections on the Semantic Web. A secondary focus is how to represent metadata schemas (RDF Schemas representing metadata element sets) such that they interoperate with vocabularies. The primary domain in which we operate is that of cultural heritage collections. The background worldview in which a solution is sought is that of the Semantic Web research paradigm with its associated theories, methods, tools and use cases. In other words, we assume the Semantic Web is in principle able to provide the context to realize interoperable collections. Interoperability is dependent on the interplay between representations and the applications that use them. We mean applications in the widest sense, such as "search" and "annotation". These applications or tasks are often present in software applications, such as the E-Culture application. It is therefore necessary that application requirements on the vocabulary representation are met. This leads us to formulate the following problem statement: HOW CAN EXISTING VOCABULARIES BE MADE AVAILABLE TO SEMANTIC WEB APPLICATIONS?
     We refine the problem statement into three research questions. The first two focus on the problem of conversion of a vocabulary to a Semantic Web representation from its original format. Conversion of a vocabulary to a representation in a Semantic Web language is necessary to make the vocabulary available to Semantic Web applications. In the last question we focus on integration of collection metadata schemas in a way that allows for vocabulary representations as produced by our methods. Academic dissertation for the degree of Doctor at the Vrije Universiteit Amsterdam, Dutch Research School for Information and Knowledge Systems.
  2. Rehurek, R.; Sojka, P.: Software framework for topic modelling with large corpora (2010) 0.05
    0.046727955 = product of:
      0.14018387 = sum of:
        0.050336715 = weight(_text_:applications in 1058) [ClassicSimilarity], result of:
          0.050336715 = score(doc=1058,freq=2.0), product of:
            0.17247584 = queryWeight, product of:
              4.4025097 = idf(docFreq=1471, maxDocs=44218)
              0.03917671 = queryNorm
            0.2918479 = fieldWeight in 1058, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              4.4025097 = idf(docFreq=1471, maxDocs=44218)
              0.046875 = fieldNorm(doc=1058)
        0.019052157 = weight(_text_:of in 1058) [ClassicSimilarity], result of:
          0.019052157 = score(doc=1058,freq=18.0), product of:
            0.061262865 = queryWeight, product of:
              1.5637573 = idf(docFreq=25162, maxDocs=44218)
              0.03917671 = queryNorm
            0.3109903 = fieldWeight in 1058, product of:
              4.2426405 = tf(freq=18.0), with freq of:
                18.0 = termFreq=18.0
              1.5637573 = idf(docFreq=25162, maxDocs=44218)
              0.046875 = fieldNorm(doc=1058)
        0.070794985 = weight(_text_:software in 1058) [ClassicSimilarity], result of:
          0.070794985 = score(doc=1058,freq=6.0), product of:
            0.15541996 = queryWeight, product of:
              3.9671519 = idf(docFreq=2274, maxDocs=44218)
              0.03917671 = queryNorm
            0.4555077 = fieldWeight in 1058, product of:
              2.4494898 = tf(freq=6.0), with freq of:
                6.0 = termFreq=6.0
              3.9671519 = idf(docFreq=2274, maxDocs=44218)
              0.046875 = fieldNorm(doc=1058)
      0.33333334 = coord(3/9)
    
    Abstract
     Large corpora are ubiquitous in today's world and memory quickly becomes the limiting factor in practical applications of the Vector Space Model (VSM). In this paper, we identify a gap in existing implementations of many of the popular algorithms, which is their scalability and ease of use. We describe a Natural Language Processing software framework which is based on the idea of document streaming, i.e. processing corpora document after document, in a memory-independent fashion. Within this framework, we implement several popular algorithms for topical inference, including Latent Semantic Analysis and Latent Dirichlet Allocation, in a way that makes them completely independent of the training corpus size. Particular emphasis is placed on straightforward and intuitive framework design, so that modifications and extensions of the methods and/or their application by interested practitioners are effortless. We demonstrate the usefulness of our approach on a real-world scenario of computing document similarities within an existing digital library, DML-CZ.
    Content
     For the software, see: http://radimrehurek.com/gensim/index.html. For a demo, see: http://dml.cz/handle/10338.dmlcz/100785/SimilarArticles.
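     As an illustration of the document-streaming approach described above, a minimal sketch using the gensim package (the corpus here is a made-up toy; in practice documents would be streamed from disk rather than held in a list):
     
       from gensim import corpora, models
       
       documents = [
           "human machine interface for lab computer applications",
           "a survey of user opinion of computer system response time",
           "the generation of random binary unordered trees",
       ]
       texts = [doc.lower().split() for doc in documents]
       dictionary = corpora.Dictionary(texts)
       # doc2bow turns each document into a sparse bag-of-words vector; any
       # iterable works here, so the corpus never needs to fit in memory.
       corpus = [dictionary.doc2bow(text) for text in texts]
       lsi = models.LsiModel(corpus, id2word=dictionary, num_topics=2)
       print(lsi[corpus[0]])   # topic weights for the first document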
  3. Gómez-Pérez, A.; Corcho, O.: Ontology languages for the Semantic Web (2015) 0.04
    0.04371041 = product of:
      0.13113123 = sum of:
        0.08389453 = weight(_text_:applications in 3297) [ClassicSimilarity], result of:
          0.08389453 = score(doc=3297,freq=8.0), product of:
            0.17247584 = queryWeight, product of:
              4.4025097 = idf(docFreq=1471, maxDocs=44218)
              0.03917671 = queryNorm
            0.4864132 = fieldWeight in 3297, product of:
              2.828427 = tf(freq=8.0), with freq of:
                8.0 = termFreq=8.0
              4.4025097 = idf(docFreq=1471, maxDocs=44218)
              0.0390625 = fieldNorm(doc=3297)
        0.011833867 = weight(_text_:of in 3297) [ClassicSimilarity], result of:
          0.011833867 = score(doc=3297,freq=10.0), product of:
            0.061262865 = queryWeight, product of:
              1.5637573 = idf(docFreq=25162, maxDocs=44218)
              0.03917671 = queryNorm
            0.19316542 = fieldWeight in 3297, product of:
              3.1622777 = tf(freq=10.0), with freq of:
                10.0 = termFreq=10.0
              1.5637573 = idf(docFreq=25162, maxDocs=44218)
              0.0390625 = fieldNorm(doc=3297)
        0.03540283 = weight(_text_:systems in 3297) [ClassicSimilarity], result of:
          0.03540283 = score(doc=3297,freq=6.0), product of:
            0.12039685 = queryWeight, product of:
              3.0731742 = idf(docFreq=5561, maxDocs=44218)
              0.03917671 = queryNorm
            0.29405114 = fieldWeight in 3297, product of:
              2.4494898 = tf(freq=6.0), with freq of:
                6.0 = termFreq=6.0
              3.0731742 = idf(docFreq=5561, maxDocs=44218)
              0.0390625 = fieldNorm(doc=3297)
      0.33333334 = coord(3/9)
    
    Abstract
     Ontologies have proven to be an essential element in many applications. They are used in agent systems, knowledge management systems, and e-commerce platforms. They are also used in natural language generation, intelligent information integration, semantic-based access to the Internet, and information extraction from texts, in addition to many other applications where they explicitly declare the knowledge embedded in them. However, not only are ontologies useful for applications in which knowledge plays a key role, but they can also trigger a major change in current Web contents. This change is leading to the third generation of the Web, known as the Semantic Web, which has been defined as "the conceptual structuring of the Web in an explicit machine-readable way." [1] This definition does not differ too much from the one used for defining an ontology: "An ontology is an explicit, machine-readable specification of a shared conceptualization." [2] In fact, new ontology-based applications and knowledge architectures are developing for this new Web. A common claim for all of these approaches is the need for languages to represent the semantic information that this Web requires, solving the heterogeneous data exchange in this heterogeneous environment. Here, we don't decide which language is best for the Semantic Web. Rather, our goal is to help developers find the most suitable language for their representation needs. The authors analyze the most representative ontology languages created for the Web and compare them using a common framework.
    Source
     IEEE intelligent systems 2002, Jan./Feb., pp.54-60
  4. Stoykova, V.; Petkova, E.: Automatic extraction of mathematical terms for precalculus (2012) 0.04
    0.039748333 = product of:
      0.11924499 = sum of:
        0.05872617 = weight(_text_:applications in 156) [ClassicSimilarity], result of:
          0.05872617 = score(doc=156,freq=2.0), product of:
            0.17247584 = queryWeight, product of:
              4.4025097 = idf(docFreq=1471, maxDocs=44218)
              0.03917671 = queryNorm
            0.34048924 = fieldWeight in 156, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              4.4025097 = idf(docFreq=1471, maxDocs=44218)
              0.0546875 = fieldNorm(doc=156)
        0.0128330635 = weight(_text_:of in 156) [ClassicSimilarity], result of:
          0.0128330635 = score(doc=156,freq=6.0), product of:
            0.061262865 = queryWeight, product of:
              1.5637573 = idf(docFreq=25162, maxDocs=44218)
              0.03917671 = queryNorm
            0.20947541 = fieldWeight in 156, product of:
              2.4494898 = tf(freq=6.0), with freq of:
                6.0 = termFreq=6.0
              1.5637573 = idf(docFreq=25162, maxDocs=44218)
              0.0546875 = fieldNorm(doc=156)
        0.047685754 = weight(_text_:software in 156) [ClassicSimilarity], result of:
          0.047685754 = score(doc=156,freq=2.0), product of:
            0.15541996 = queryWeight, product of:
              3.9671519 = idf(docFreq=2274, maxDocs=44218)
              0.03917671 = queryNorm
            0.30681872 = fieldWeight in 156, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.9671519 = idf(docFreq=2274, maxDocs=44218)
              0.0546875 = fieldNorm(doc=156)
      0.33333334 = coord(3/9)
    
    Abstract
     In this work, we present the results of research evaluating a methodology for extracting mathematical terms for precalculus using techniques for semantically-oriented statistical search. We use a corpus-based approach and a combination of different statistically-based techniques for extracting keywords, collocations and co-occurrences, incorporated in the Sketch Engine software. We evaluate the collocation candidate terms for the basic concept function(s) and validate the related methodology against definitions of precalculus domain conceptual terms. Finally, we offer a hierarchical representation of the conceptual terms and discuss the results with respect to their possible applications.
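     To illustrate the kind of statistics such collocation extraction rests on, here is a generic pointwise-mutual-information toy (not the Sketch Engine implementation; the sample text is made up):
     
       import math
       from collections import Counter
       
       tokens = ("the value of the function f of x equals "
                 "the limit of the function").split()
       unigrams = Counter(tokens)
       bigrams = Counter(zip(tokens, tokens[1:]))
       n = len(tokens)
       
       def pmi(w1, w2):
           # higher PMI = the pair co-occurs more often than chance predicts
           p_xy = bigrams[(w1, w2)] / (n - 1)
           return math.log2(p_xy / ((unigrams[w1] / n) * (unigrams[w2] / n)))
       
       print(sorted(bigrams, key=lambda b: -pmi(*b))[:3])  # strongest pairs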
  5. Shen, M.; Liu, D.-R.; Huang, Y.-S.: Extracting semantic relations to enrich domain ontologies (2012) 0.03
    0.034636453 = product of:
      0.10390935 = sum of:
        0.05872617 = weight(_text_:applications in 267) [ClassicSimilarity], result of:
          0.05872617 = score(doc=267,freq=2.0), product of:
            0.17247584 = queryWeight, product of:
              4.4025097 = idf(docFreq=1471, maxDocs=44218)
              0.03917671 = queryNorm
            0.34048924 = fieldWeight in 267, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              4.4025097 = idf(docFreq=1471, maxDocs=44218)
              0.0546875 = fieldNorm(doc=267)
        0.016567415 = weight(_text_:of in 267) [ClassicSimilarity], result of:
          0.016567415 = score(doc=267,freq=10.0), product of:
            0.061262865 = queryWeight, product of:
              1.5637573 = idf(docFreq=25162, maxDocs=44218)
              0.03917671 = queryNorm
            0.2704316 = fieldWeight in 267, product of:
              3.1622777 = tf(freq=10.0), with freq of:
                10.0 = termFreq=10.0
              1.5637573 = idf(docFreq=25162, maxDocs=44218)
              0.0546875 = fieldNorm(doc=267)
        0.028615767 = weight(_text_:systems in 267) [ClassicSimilarity], result of:
          0.028615767 = score(doc=267,freq=2.0), product of:
            0.12039685 = queryWeight, product of:
              3.0731742 = idf(docFreq=5561, maxDocs=44218)
              0.03917671 = queryNorm
            0.23767869 = fieldWeight in 267, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.0731742 = idf(docFreq=5561, maxDocs=44218)
              0.0546875 = fieldNorm(doc=267)
      0.33333334 = coord(3/9)
    
    Abstract
     Domain ontologies facilitate the organization, sharing and reuse of domain knowledge, and enable various vertical domain applications to operate successfully. Most methods for automatically constructing ontologies focus on taxonomic relations, such as is-kind-of and is-part-of relations. However, much of the domain-specific semantics is ignored. This work proposes a semi-unsupervised approach for extracting semantic relations from domain-specific text documents. The approach effectively utilizes text mining and existing taxonomic relations in domain ontologies to discover candidate keywords that can represent semantic relations. A preliminary experiment on the natural science domain (Taiwan K9 education) indicates that the proposed method yields valuable recommendations. This work enriches domain ontologies by adding distilled semantics.
    Source
    Journal of Intelligent Information Systems
  6. Arenas, M.; Cuenca Grau, B.; Kharlamov, E.; Marciuska, S.; Zheleznyakov, D.: Faceted search over ontology-enhanced RDF data (2014) 0.03
    0.029688384 = product of:
      0.08906515 = sum of:
        0.050336715 = weight(_text_:applications in 2207) [ClassicSimilarity], result of:
          0.050336715 = score(doc=2207,freq=2.0), product of:
            0.17247584 = queryWeight, product of:
              4.4025097 = idf(docFreq=1471, maxDocs=44218)
              0.03917671 = queryNorm
            0.2918479 = fieldWeight in 2207, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              4.4025097 = idf(docFreq=1471, maxDocs=44218)
              0.046875 = fieldNorm(doc=2207)
        0.014200641 = weight(_text_:of in 2207) [ClassicSimilarity], result of:
          0.014200641 = score(doc=2207,freq=10.0), product of:
            0.061262865 = queryWeight, product of:
              1.5637573 = idf(docFreq=25162, maxDocs=44218)
              0.03917671 = queryNorm
            0.23179851 = fieldWeight in 2207, product of:
              3.1622777 = tf(freq=10.0), with freq of:
                10.0 = termFreq=10.0
              1.5637573 = idf(docFreq=25162, maxDocs=44218)
              0.046875 = fieldNorm(doc=2207)
        0.0245278 = weight(_text_:systems in 2207) [ClassicSimilarity], result of:
          0.0245278 = score(doc=2207,freq=2.0), product of:
            0.12039685 = queryWeight, product of:
              3.0731742 = idf(docFreq=5561, maxDocs=44218)
              0.03917671 = queryNorm
            0.2037246 = fieldWeight in 2207, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.0731742 = idf(docFreq=5561, maxDocs=44218)
              0.046875 = fieldNorm(doc=2207)
      0.33333334 = coord(3/9)
    
    Abstract
    An increasing number of applications rely on RDF, OWL2, and SPARQL for storing and querying data. SPARQL, however, is not targeted towards end-users, and suitable query interfaces are needed. Faceted search is a prominent approach for end-user data access, and several RDF-based faceted search systems have been developed. There is, however, a lack of rigorous theoretical underpinning for faceted search in the context of RDF and OWL2. In this paper, we provide such solid foundations. We formalise faceted interfaces for this context, identify a fragment of first-order logic capturing the underlying queries, and study the complexity of answering such queries for RDF and OWL2 profiles. We then study interface generation and update, and devise efficiently implementable algorithms. Finally, we have implemented and tested our faceted search algorithms for scalability, with encouraging results.
  7. Mitchell, J.S.; Zeng, M.L.; Zumer, M.: Modeling classification systems in multicultural and multilingual contexts (2012) 0.03
    0.027655158 = product of:
      0.08296547 = sum of:
        0.017962547 = weight(_text_:of in 1967) [ClassicSimilarity], result of:
          0.017962547 = score(doc=1967,freq=16.0), product of:
            0.061262865 = queryWeight, product of:
              1.5637573 = idf(docFreq=25162, maxDocs=44218)
              0.03917671 = queryNorm
            0.2932045 = fieldWeight in 1967, product of:
              4.0 = tf(freq=16.0), with freq of:
                16.0 = termFreq=16.0
              1.5637573 = idf(docFreq=25162, maxDocs=44218)
              0.046875 = fieldNorm(doc=1967)
        0.042483397 = weight(_text_:systems in 1967) [ClassicSimilarity], result of:
          0.042483397 = score(doc=1967,freq=6.0), product of:
            0.12039685 = queryWeight, product of:
              3.0731742 = idf(docFreq=5561, maxDocs=44218)
              0.03917671 = queryNorm
            0.35286134 = fieldWeight in 1967, product of:
              2.4494898 = tf(freq=6.0), with freq of:
                6.0 = termFreq=6.0
              3.0731742 = idf(docFreq=5561, maxDocs=44218)
              0.046875 = fieldNorm(doc=1967)
        0.022519529 = product of:
          0.045039058 = sum of:
            0.045039058 = weight(_text_:22 in 1967) [ClassicSimilarity], result of:
              0.045039058 = score(doc=1967,freq=4.0), product of:
                0.13719016 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.03917671 = queryNorm
                0.32829654 = fieldWeight in 1967, product of:
                  2.0 = tf(freq=4.0), with freq of:
                    4.0 = termFreq=4.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.046875 = fieldNorm(doc=1967)
          0.5 = coord(1/2)
      0.33333334 = coord(3/9)
    
    Abstract
     This paper reports on the second part of an initiative of the authors on researching classification systems with the conceptual model defined by the Functional Requirements for Subject Authority Data (FRSAD) final report. In an earlier study, the authors explored whether the FRSAD conceptual model could be extended beyond subject authority data to model classification data. The focus of the current study is to determine if classification data modeled using FRSAD can be used to solve real-world discovery problems in multicultural and multilingual contexts. The paper discusses the relationships between entities (same type or different types) in the context of classification systems that involve multiple translations and/or multicultural implementations. Results of two case studies are presented in detail: (a) two instances of the DDC (DDC 22 in English, and the Swedish-English mixed translation of DDC 22), and (b) the Chinese Library Classification. The use cases of conceptual models in practice are also discussed.
  8. Mayo, D.; Bowers, K.: ¬The devil's shoehorn : a case study of EAD to ArchivesSpace migration at a large university (2017) 0.03
    0.026281446 = product of:
      0.07884434 = sum of:
        0.015876798 = weight(_text_:of in 3373) [ClassicSimilarity], result of:
          0.015876798 = score(doc=3373,freq=18.0), product of:
            0.061262865 = queryWeight, product of:
              1.5637573 = idf(docFreq=25162, maxDocs=44218)
              0.03917671 = queryNorm
            0.25915858 = fieldWeight in 3373, product of:
              4.2426405 = tf(freq=18.0), with freq of:
                18.0 = termFreq=18.0
              1.5637573 = idf(docFreq=25162, maxDocs=44218)
              0.0390625 = fieldNorm(doc=3373)
        0.02890629 = weight(_text_:systems in 3373) [ClassicSimilarity], result of:
          0.02890629 = score(doc=3373,freq=4.0), product of:
            0.12039685 = queryWeight, product of:
              3.0731742 = idf(docFreq=5561, maxDocs=44218)
              0.03917671 = queryNorm
            0.24009174 = fieldWeight in 3373, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              3.0731742 = idf(docFreq=5561, maxDocs=44218)
              0.0390625 = fieldNorm(doc=3373)
        0.034061253 = weight(_text_:software in 3373) [ClassicSimilarity], result of:
          0.034061253 = score(doc=3373,freq=2.0), product of:
            0.15541996 = queryWeight, product of:
              3.9671519 = idf(docFreq=2274, maxDocs=44218)
              0.03917671 = queryNorm
            0.21915624 = fieldWeight in 3373, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.9671519 = idf(docFreq=2274, maxDocs=44218)
              0.0390625 = fieldNorm(doc=3373)
      0.33333334 = coord(3/9)
    
    Abstract
     A band of archivists and IT professionals at Harvard took on a project to convert nearly two million descriptions of archival collection components from marked-up text into the ArchivesSpace archival metadata management system. Starting in the mid-1990s, Harvard was an alpha implementer of EAD, an SGML (later XML) text markup language for electronic inventories, indexes, and finding aids that archivists use to wend their way through the sometimes quirky filing systems that bureaucracies establish for their records or the utter chaos in which some individuals keep their personal archives. These pathfinder documents, designed to cope with messy reality, can themselves be difficult to classify. Portions of them are rigorously structured, while other parts are narrative. Early documents predate the establishment of the standard; many feature idiosyncratic encoding that had been through several machine conversions, while others were freshly encoded and fairly consistent. In this paper, we will cover the practical and technical challenges involved in preparing a large (900 MiB) corpus of XML for ingest into an open-source archival information system (ArchivesSpace). This case study will give an overview of the project, discuss problem discovery and problem solving, address the technical challenges, analysis, solutions, and decisions, and provide information on the tools produced and lessons learned. The authors of this piece are Kate Bowers, Collections Services Archivist for Metadata, Systems, and Standards at the Harvard University Archives, and Dave Mayo, a Digital Library Software Engineer for Harvard's Library and Technology Services. Kate was heavily involved in both metadata analysis and later problem solving, while Dave was the sole full-time developer assigned to the migration project.
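     For a sense of scale, a generic sketch (not the Harvard project's actual tooling) of how a large XML corpus can be inspected without loading whole files into memory, using lxml's iterparse; the file name and element tag are placeholders:
     
       from lxml import etree
       
       def component_levels(path):
           # stream over EAD <c> component elements one at a time
           for _, elem in etree.iterparse(path, tag="c"):
               yield elem.get("level")
               elem.clear()   # release the parsed subtree to keep memory flat
       
       for level in component_levels("finding_aid.xml"):
           print(level)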
  9. Klic, L.; Miller, M.; Nelson, J.K.; Germann, J.E.: Approaching the largest 'API' : extracting information from the Internet with Python (2018) 0.03
    0.025856502 = product of:
      0.11635426 = sum of:
        0.0089812735 = weight(_text_:of in 4239) [ClassicSimilarity], result of:
          0.0089812735 = score(doc=4239,freq=4.0), product of:
            0.061262865 = queryWeight, product of:
              1.5637573 = idf(docFreq=25162, maxDocs=44218)
              0.03917671 = queryNorm
            0.14660224 = fieldWeight in 4239, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              1.5637573 = idf(docFreq=25162, maxDocs=44218)
              0.046875 = fieldNorm(doc=4239)
        0.107372984 = product of:
          0.21474597 = sum of:
            0.21474597 = weight(_text_:packages in 4239) [ClassicSimilarity], result of:
              0.21474597 = score(doc=4239,freq=6.0), product of:
                0.2706874 = queryWeight, product of:
                  6.9093957 = idf(docFreq=119, maxDocs=44218)
                  0.03917671 = queryNorm
                0.7933357 = fieldWeight in 4239, product of:
                  2.4494898 = tf(freq=6.0), with freq of:
                    6.0 = termFreq=6.0
                  6.9093957 = idf(docFreq=119, maxDocs=44218)
                  0.046875 = fieldNorm(doc=4239)
          0.5 = coord(1/2)
      0.22222222 = coord(2/9)
    
    Abstract
     This article explores the need for libraries to algorithmically access and manipulate the world's largest API: the Internet. The billions of pages on the 'Internet API' (HTTP, HTML, CSS, XPath, DOM, etc.) are easily accessible and manipulable. Libraries can assist in creating meaning through the datafication of information on the World Wide Web. Because most information is created for human consumption, some programming is required for automated extraction. Python is an easy-to-learn programming language with extensive packages and community support for web page automation. Four packages (Urllib, Selenium, BeautifulSoup, Scrapy) in Python can automate almost any web page for projects of all sizes. An example warrant data project is explained to illustrate how well Python packages can manipulate web pages to create meaning through assembling custom datasets.
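     A minimal sketch using two of the packages named above (urllib from the standard library and BeautifulSoup from the third-party bs4 package); the URL is a placeholder:
     
       from urllib.request import urlopen
       from bs4 import BeautifulSoup
       
       html = urlopen("https://example.org/").read()
       soup = BeautifulSoup(html, "html.parser")
       # assemble a small dataset of (link target, anchor text) pairs
       links = [(a.get("href"), a.get_text(strip=True))
                for a in soup.find_all("a")]
       print(links[:10])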
  10. Harlow, C.: Data munging tools in Preparation for RDF : Catmandu and LODRefine (2015) 0.03
    0.02546304 = product of:
      0.07638912 = sum of:
        0.041947264 = weight(_text_:applications in 2277) [ClassicSimilarity], result of:
          0.041947264 = score(doc=2277,freq=2.0), product of:
            0.17247584 = queryWeight, product of:
              4.4025097 = idf(docFreq=1471, maxDocs=44218)
              0.03917671 = queryNorm
            0.2432066 = fieldWeight in 2277, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              4.4025097 = idf(docFreq=1471, maxDocs=44218)
              0.0390625 = fieldNorm(doc=2277)
        0.0140020205 = weight(_text_:of in 2277) [ClassicSimilarity], result of:
          0.0140020205 = score(doc=2277,freq=14.0), product of:
            0.061262865 = queryWeight, product of:
              1.5637573 = idf(docFreq=25162, maxDocs=44218)
              0.03917671 = queryNorm
            0.22855641 = fieldWeight in 2277, product of:
              3.7416575 = tf(freq=14.0), with freq of:
                14.0 = termFreq=14.0
              1.5637573 = idf(docFreq=25162, maxDocs=44218)
              0.0390625 = fieldNorm(doc=2277)
        0.020439833 = weight(_text_:systems in 2277) [ClassicSimilarity], result of:
          0.020439833 = score(doc=2277,freq=2.0), product of:
            0.12039685 = queryWeight, product of:
              3.0731742 = idf(docFreq=5561, maxDocs=44218)
              0.03917671 = queryNorm
            0.1697705 = fieldWeight in 2277, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.0731742 = idf(docFreq=5561, maxDocs=44218)
              0.0390625 = fieldNorm(doc=2277)
      0.33333334 = coord(3/9)
    
    Abstract
     Data munging, or the work of remediating, enhancing and transforming library datasets for new or improved uses, has become more important and staff-inclusive in many library technology discussions and projects. Many times we know how we want our data to look, as well as how we want our data to act in discovery interfaces or when exposed, but we are uncertain how to make the data we have into the data we want. This article introduces and compares two library data munging tools that can help: LODRefine (OpenRefine with the DERI RDF Extension) and Catmandu. The strengths and best practices of each tool are discussed in the context of metadata munging use cases for an institution's metadata migration workflow. There is a focus on Linked Open Data modeling and transformation applications of each tool, in particular how metadataists, catalogers, and programmers can create metadata quality reports, enhance existing data with LOD sets, and transform that data to an RDF model. Integration of these tools with other systems and projects, the use of domain-specific transformation languages, and the expansion of vocabulary reconciliation services are mentioned.
  11. Voß, J.: Classification of knowledge organization systems with Wikidata (2016) 0.03
    0.025069844 = product of:
      0.07520953 = sum of:
        0.016802425 = weight(_text_:of in 3082) [ClassicSimilarity], result of:
          0.016802425 = score(doc=3082,freq=14.0), product of:
            0.061262865 = queryWeight, product of:
              1.5637573 = idf(docFreq=25162, maxDocs=44218)
              0.03917671 = queryNorm
            0.2742677 = fieldWeight in 3082, product of:
              3.7416575 = tf(freq=14.0), with freq of:
                14.0 = termFreq=14.0
              1.5637573 = idf(docFreq=25162, maxDocs=44218)
              0.046875 = fieldNorm(doc=3082)
        0.042483397 = weight(_text_:systems in 3082) [ClassicSimilarity], result of:
          0.042483397 = score(doc=3082,freq=6.0), product of:
            0.12039685 = queryWeight, product of:
              3.0731742 = idf(docFreq=5561, maxDocs=44218)
              0.03917671 = queryNorm
            0.35286134 = fieldWeight in 3082, product of:
              2.4494898 = tf(freq=6.0), with freq of:
                6.0 = termFreq=6.0
              3.0731742 = idf(docFreq=5561, maxDocs=44218)
              0.046875 = fieldNorm(doc=3082)
        0.015923709 = product of:
          0.031847417 = sum of:
            0.031847417 = weight(_text_:22 in 3082) [ClassicSimilarity], result of:
              0.031847417 = score(doc=3082,freq=2.0), product of:
                0.13719016 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.03917671 = queryNorm
                0.23214069 = fieldWeight in 3082, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.046875 = fieldNorm(doc=3082)
          0.5 = coord(1/2)
      0.33333334 = coord(3/9)
    
    Abstract
     This paper presents a crowd-sourced classification of knowledge organization systems based on the open knowledge base Wikidata. The focus is less on the current result, which is rather preliminary, than on the environment and process of categorization in Wikidata and the extraction of KOS from the collaborative database. Benefits and disadvantages are summarized and discussed for application to the knowledge organization of other subject areas with Wikidata.
    Pages
     pp.15-22
    Source
    Proceedings of the 15th European Networked Knowledge Organization Systems Workshop (NKOS 2016) co-located with the 20th International Conference on Theory and Practice of Digital Libraries 2016 (TPDL 2016), Hannover, Germany, September 9, 2016. Edi. by Philipp Mayr et al. [http://ceur-ws.org/Vol-1676/=urn:nbn:de:0074-1676-5]
  12. Durno, J.: Digital archaeology and/or forensics : working with floppy disks from the 1980s (2016) 0.02
    0.022857988 = product of:
      0.10286094 = sum of:
        0.008467626 = weight(_text_:of in 3196) [ClassicSimilarity], result of:
          0.008467626 = score(doc=3196,freq=2.0), product of:
            0.061262865 = queryWeight, product of:
              1.5637573 = idf(docFreq=25162, maxDocs=44218)
              0.03917671 = queryNorm
            0.13821793 = fieldWeight in 3196, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              1.5637573 = idf(docFreq=25162, maxDocs=44218)
              0.0625 = fieldNorm(doc=3196)
        0.09439332 = weight(_text_:software in 3196) [ClassicSimilarity], result of:
          0.09439332 = score(doc=3196,freq=6.0), product of:
            0.15541996 = queryWeight, product of:
              3.9671519 = idf(docFreq=2274, maxDocs=44218)
              0.03917671 = queryNorm
            0.6073436 = fieldWeight in 3196, product of:
              2.4494898 = tf(freq=6.0), with freq of:
                6.0 = termFreq=6.0
              3.9671519 = idf(docFreq=2274, maxDocs=44218)
              0.0625 = fieldNorm(doc=3196)
      0.22222222 = coord(2/9)
    
    Abstract
    While software originating from the domain of digital forensics has demonstrated utility for data recovery from contemporary storage media, it is not as effective for working with floppy disks from the 1980s. This paper details alternative strategies for recovering data from floppy disks employing software originating from the software preservation and retro-computing communities. Imaging hardware, storage formats and processing workflows are also discussed.
  13. Blanco, E.; Cankaya, H.C.; Moldovan, D.: Composition of semantic relations : model and applications (2010) 0.02
    0.022137502 = product of:
      0.09961876 = sum of:
        0.083051346 = weight(_text_:applications in 4761) [ClassicSimilarity], result of:
          0.083051346 = score(doc=4761,freq=4.0), product of:
            0.17247584 = queryWeight, product of:
              4.4025097 = idf(docFreq=1471, maxDocs=44218)
              0.03917671 = queryNorm
            0.4815245 = fieldWeight in 4761, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              4.4025097 = idf(docFreq=1471, maxDocs=44218)
              0.0546875 = fieldNorm(doc=4761)
        0.016567415 = weight(_text_:of in 4761) [ClassicSimilarity], result of:
          0.016567415 = score(doc=4761,freq=10.0), product of:
            0.061262865 = queryWeight, product of:
              1.5637573 = idf(docFreq=25162, maxDocs=44218)
              0.03917671 = queryNorm
            0.2704316 = fieldWeight in 4761, product of:
              3.1622777 = tf(freq=10.0), with freq of:
                10.0 = termFreq=10.0
              1.5637573 = idf(docFreq=25162, maxDocs=44218)
              0.0546875 = fieldNorm(doc=4761)
      0.22222222 = coord(2/9)
    
    Abstract
    This paper presents a framework for combining semantic relations extracted from text to reveal even more semantics that otherwise would be missed. A set of 26 relations is introduced, with their arguments defined on an ontology of sorts. A semantic parser is used to extract these relations from noun phrases and verb argument structures. The method was successfully used in two applications: rapid customization of semantic relations to arbitrary domains and recognizing entailments.
    Source
    Proceedings of the 23rd International Conference on Computational Linguistics (COLING 2010), Poster Volume, Beijing, China. Ed.: Chu-Ren Huang and Dan Jurafsky
  14. Belpassi, E.: ¬The application software RIMMF : RDA thinking in action (2016) 0.02
    0.021899873 = product of:
      0.09854943 = sum of:
        0.016802425 = weight(_text_:of in 2959) [ClassicSimilarity], result of:
          0.016802425 = score(doc=2959,freq=14.0), product of:
            0.061262865 = queryWeight, product of:
              1.5637573 = idf(docFreq=25162, maxDocs=44218)
              0.03917671 = queryNorm
            0.2742677 = fieldWeight in 2959, product of:
              3.7416575 = tf(freq=14.0), with freq of:
                14.0 = termFreq=14.0
              1.5637573 = idf(docFreq=25162, maxDocs=44218)
              0.046875 = fieldNorm(doc=2959)
        0.08174701 = weight(_text_:software in 2959) [ClassicSimilarity], result of:
          0.08174701 = score(doc=2959,freq=8.0), product of:
            0.15541996 = queryWeight, product of:
              3.9671519 = idf(docFreq=2274, maxDocs=44218)
              0.03917671 = queryNorm
            0.525975 = fieldWeight in 2959, product of:
              2.828427 = tf(freq=8.0), with freq of:
                8.0 = termFreq=8.0
              3.9671519 = idf(docFreq=2274, maxDocs=44218)
              0.046875 = fieldNorm(doc=2959)
      0.22222222 = coord(2/9)
    
    Abstract
     The RIMMF software grew out of the need to visualize and realize records according to the RDA guidelines. The article describes the software's structure and features in the creation of an r-ball, that is, a small database populated by records of bibliographic and authority resources enriched by relationships between and among the entities involved. It first introduces the need that led to RIMMF, then proceeds to a functional analysis of the software, describing the main steps of building the r-ball and emphasizing the issues raised. The results highlight some critical aspects, but above all the wide scope of possible developments that open the horizon of cultural heritage institutions to the web perspective. The conclusions outline the RDF/linked-data development of RIMMF in the near future.
  15. Lange, C.: Ontologies and languages for representing mathematical knowledge on the Semantic Web (2011) 0.02
    0.021254174 = product of:
      0.06376252 = sum of:
        0.013388492 = weight(_text_:of in 135) [ClassicSimilarity], result of:
          0.013388492 = score(doc=135,freq=20.0), product of:
            0.061262865 = queryWeight, product of:
              1.5637573 = idf(docFreq=25162, maxDocs=44218)
              0.03917671 = queryNorm
            0.21854173 = fieldWeight in 135, product of:
              4.472136 = tf(freq=20.0), with freq of:
                20.0 = termFreq=20.0
              1.5637573 = idf(docFreq=25162, maxDocs=44218)
              0.03125 = fieldNorm(doc=135)
        0.023125032 = weight(_text_:systems in 135) [ClassicSimilarity], result of:
          0.023125032 = score(doc=135,freq=4.0), product of:
            0.12039685 = queryWeight, product of:
              3.0731742 = idf(docFreq=5561, maxDocs=44218)
              0.03917671 = queryNorm
            0.19207339 = fieldWeight in 135, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              3.0731742 = idf(docFreq=5561, maxDocs=44218)
              0.03125 = fieldNorm(doc=135)
        0.027249003 = weight(_text_:software in 135) [ClassicSimilarity], result of:
          0.027249003 = score(doc=135,freq=2.0), product of:
            0.15541996 = queryWeight, product of:
              3.9671519 = idf(docFreq=2274, maxDocs=44218)
              0.03917671 = queryNorm
            0.17532499 = fieldWeight in 135, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.9671519 = idf(docFreq=2274, maxDocs=44218)
              0.03125 = fieldNorm(doc=135)
      0.33333334 = coord(3/9)
    
    Abstract
     Mathematics is a ubiquitous foundation of science, technology, and engineering. Specific areas, such as numeric and symbolic computation or logics, enjoy considerable software support. Working mathematicians have recently started to adopt Web 2.0 environments, such as blogs and wikis, but these systems lack machine support for knowledge organization and reuse, and they are disconnected from tools such as computer algebra systems or interactive proof assistants. We argue that such scenarios will benefit from Semantic Web technology. Conversely, mathematics is still underrepresented on the Web of [Linked] Data. There are mathematics-related Linked Data, for example statistical government data or scientific publication databases, but their mathematical semantics has not yet been modeled. We argue that the services for the Web of Data will benefit from a deeper representation of mathematical knowledge. Mathematical knowledge comprises logical and functional structures (formulæ, statements, and theories), a mixture of rigorous natural language and symbolic notation in documents, application-specific metadata, and discussions about conceptualizations, formalizations, proofs, and (counter-)examples. Our review of approaches to representing these structures covers ontologies for mathematical problems, proofs, interlinked scientific publications, scientific discourse, as well as mathematical metadata vocabularies and domain knowledge from pure and applied mathematics. Many fields of mathematics have not yet been implemented as proper Semantic Web ontologies; however, we show that MathML and OpenMath, the standard XML-based exchange languages for mathematical knowledge, can be fully integrated with RDF representations in order to contribute existing mathematical knowledge to the Web of Data. We conclude with a roadmap for getting the mathematical Web of Data started: what datasets to publish, how to interlink them, and how to take advantage of these new connections.
  16. Bauckhage, C.: Marginalizing over the PageRank damping factor (2014) 0.02
    0.020995347 = product of:
      0.09447906 = sum of:
        0.08389453 = weight(_text_:applications in 928) [ClassicSimilarity], result of:
          0.08389453 = score(doc=928,freq=2.0), product of:
            0.17247584 = queryWeight, product of:
              4.4025097 = idf(docFreq=1471, maxDocs=44218)
              0.03917671 = queryNorm
            0.4864132 = fieldWeight in 928, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              4.4025097 = idf(docFreq=1471, maxDocs=44218)
              0.078125 = fieldNorm(doc=928)
        0.010584532 = weight(_text_:of in 928) [ClassicSimilarity], result of:
          0.010584532 = score(doc=928,freq=2.0), product of:
            0.061262865 = queryWeight, product of:
              1.5637573 = idf(docFreq=25162, maxDocs=44218)
              0.03917671 = queryNorm
            0.17277241 = fieldWeight in 928, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              1.5637573 = idf(docFreq=25162, maxDocs=44218)
              0.078125 = fieldNorm(doc=928)
      0.22222222 = coord(2/9)
    
    Abstract
    In this note, we show how to marginalize over the damping parameter of the PageRank equation so as to obtain a parameter-free version known as TotalRank. Our discussion is meant as a reference and intended to provide a guided tour towards an interesting result that has applications in information retrieval and classification.
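     Numerically, the idea can be approximated by averaging PageRank over a grid of damping values (a sketch using the networkx library and a toy graph, not the paper's closed-form derivation):
     
       import networkx as nx
       import numpy as np
       
       G = nx.DiGraph([(1, 2), (2, 3), (3, 1), (3, 2)])
       alphas = np.linspace(0.01, 0.99, 99)
       ranks = [nx.pagerank(G, alpha=a) for a in alphas]
       # approximate the integral of PageRank over the damping factor
       total = {n: float(np.mean([r[n] for r in ranks])) for n in G}
       print(total)   # parameter-free, TotalRank-style scores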
  17. Takhirov, N.; Aalberg, T.; Duchateau, F.; Zumer, M.: FRBR-ML: a FRBR-based framework for semantic interoperability (2012) 0.02
    0.020370431 = product of:
      0.061111294 = sum of:
        0.03355781 = weight(_text_:applications in 134) [ClassicSimilarity], result of:
          0.03355781 = score(doc=134,freq=2.0), product of:
            0.17247584 = queryWeight, product of:
              4.4025097 = idf(docFreq=1471, maxDocs=44218)
              0.03917671 = queryNorm
            0.19456528 = fieldWeight in 134, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              4.4025097 = idf(docFreq=1471, maxDocs=44218)
              0.03125 = fieldNorm(doc=134)
        0.011201616 = weight(_text_:of in 134) [ClassicSimilarity], result of:
          0.011201616 = score(doc=134,freq=14.0), product of:
            0.061262865 = queryWeight, product of:
              1.5637573 = idf(docFreq=25162, maxDocs=44218)
              0.03917671 = queryNorm
            0.18284513 = fieldWeight in 134, product of:
              3.7416575 = tf(freq=14.0), with freq of:
                14.0 = termFreq=14.0
              1.5637573 = idf(docFreq=25162, maxDocs=44218)
              0.03125 = fieldNorm(doc=134)
        0.016351866 = weight(_text_:systems in 134) [ClassicSimilarity], result of:
          0.016351866 = score(doc=134,freq=2.0), product of:
            0.12039685 = queryWeight, product of:
              3.0731742 = idf(docFreq=5561, maxDocs=44218)
              0.03917671 = queryNorm
            0.1358164 = fieldWeight in 134, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.0731742 = idf(docFreq=5561, maxDocs=44218)
              0.03125 = fieldNorm(doc=134)
      0.33333334 = coord(3/9)
    
    Abstract
    Metadata related to cultural items such as literature, music and movies is a valuable resource that is currently exploited in many applications and services based on semantic web technologies. A vast amount of such information has been created by memory institutions in the last decades using different standard or ad hoc schemas, and a main challenge is to make this legacy data accessible as reusable semantic data. On one hand, this is a syntactic problem that can be solved by transforming to formats that are compatible with the tools and services used for semantic aware services. On the other hand, this is a semantic problem. Simply transforming from one format to another does not automatically enable semantic interoperability and legacy data often needs to be reinterpreted as well as transformed. The conceptual model in the Functional Requirements for Bibliographic Records, initially developed as a conceptual framework for library standards and systems, is a major step towards a shared semantic model of the products of artistic and intellectual endeavor of mankind. The model is generally accepted as sufficiently generic to serve as a conceptual framework for a broad range of cultural heritage metadata. Unfortunately, the existing large body of legacy data makes a transition to this model difficult. For instance, most bibliographic data is still only available in various MARC-based formats which is hard to render into reusable and meaningful semantic data. Making legacy bibliographic data accessible as semantic data is a complex problem that includes interpreting and transforming the information. In this article, we present our work on transforming and enhancing legacy bibliographic information into a representation where the structure and semantics of the FRBR model is explicit.
  18. Perovsek, M.; Kranjca, J.; Erjaveca, T.; Cestnika, B.; Lavraca, N.: TextFlows : a visual programming platform for text mining and natural language processing (2016) 0.02
    0.01981098 = product of:
      0.08914941 = sum of:
        0.07118686 = weight(_text_:applications in 2697) [ClassicSimilarity], result of:
          0.07118686 = score(doc=2697,freq=4.0), product of:
            0.17247584 = queryWeight, product of:
              4.4025097 = idf(docFreq=1471, maxDocs=44218)
              0.03917671 = queryNorm
            0.41273528 = fieldWeight in 2697, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              4.4025097 = idf(docFreq=1471, maxDocs=44218)
              0.046875 = fieldNorm(doc=2697)
        0.017962547 = weight(_text_:of in 2697) [ClassicSimilarity], result of:
          0.017962547 = score(doc=2697,freq=16.0), product of:
            0.061262865 = queryWeight, product of:
              1.5637573 = idf(docFreq=25162, maxDocs=44218)
              0.03917671 = queryNorm
            0.2932045 = fieldWeight in 2697, product of:
              4.0 = tf(freq=16.0), with freq of:
                16.0 = termFreq=16.0
              1.5637573 = idf(docFreq=25162, maxDocs=44218)
              0.046875 = fieldNorm(doc=2697)
      0.22222222 = coord(2/9)
    
    Abstract
    Text mining and natural language processing are fast growing areas of research, with numerous applications in business, science and creative industries. This paper presents TextFlows, a web-based text mining and natural language processing platform supporting workflow construction, sharing and execution. The platform enables visual construction of text mining workflows through a web browser, and the execution of the constructed workflows on a processing cloud. This makes TextFlows an adaptable infrastructure for the construction and sharing of text processing workflows, which can be reused in various applications. The paper presents the implemented text mining and language processing modules, and describes some precomposed workflows. Their features are demonstrated on three use cases: comparison of document classifiers and of different part-of-speech taggers on a text categorization problem, and outlier detection in document corpora.
    Source
    Science of computer programming. In Press, 2016
  19. Zhang, L.; Wang, S.; Liu, B.: Deep learning for sentiment analysis : a survey (2018) 0.02
    0.019523773 = product of:
      0.08785698 = sum of:
        0.06711562 = weight(_text_:applications in 4092) [ClassicSimilarity], result of:
          0.06711562 = score(doc=4092,freq=2.0), product of:
            0.17247584 = queryWeight, product of:
              4.4025097 = idf(docFreq=1471, maxDocs=44218)
              0.03917671 = queryNorm
            0.38913056 = fieldWeight in 4092, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              4.4025097 = idf(docFreq=1471, maxDocs=44218)
              0.0625 = fieldNorm(doc=4092)
        0.020741362 = weight(_text_:of in 4092) [ClassicSimilarity], result of:
          0.020741362 = score(doc=4092,freq=12.0), product of:
            0.061262865 = queryWeight, product of:
              1.5637573 = idf(docFreq=25162, maxDocs=44218)
              0.03917671 = queryNorm
            0.33856338 = fieldWeight in 4092, product of:
              3.4641016 = tf(freq=12.0), with freq of:
                12.0 = termFreq=12.0
              1.5637573 = idf(docFreq=25162, maxDocs=44218)
              0.0625 = fieldNorm(doc=4092)
      0.22222222 = coord(2/9)
    
    Abstract
    Deep learning has emerged as a powerful machine learning technique that learns multiple layers of representations or features of the data and produces state-of-the-art prediction results. Along with the success of deep learning in many other application domains, deep learning is also popularly used in sentiment analysis in recent years. This paper first gives an overview of deep learning and then provides a comprehensive survey of its current applications in sentiment analysis.
  20. Gödert, W.: Detecting multiword phrases in mathematical text corpora (2012) 0.02
    0.017089166 = product of:
      0.07690124 = sum of:
        0.022403233 = weight(_text_:of in 466) [ClassicSimilarity], result of:
          0.022403233 = score(doc=466,freq=14.0), product of:
            0.061262865 = queryWeight, product of:
              1.5637573 = idf(docFreq=25162, maxDocs=44218)
              0.03917671 = queryNorm
            0.36569026 = fieldWeight in 466, product of:
              3.7416575 = tf(freq=14.0), with freq of:
                14.0 = termFreq=14.0
              1.5637573 = idf(docFreq=25162, maxDocs=44218)
              0.0625 = fieldNorm(doc=466)
        0.054498006 = weight(_text_:software in 466) [ClassicSimilarity], result of:
          0.054498006 = score(doc=466,freq=2.0), product of:
            0.15541996 = queryWeight, product of:
              3.9671519 = idf(docFreq=2274, maxDocs=44218)
              0.03917671 = queryNorm
            0.35064998 = fieldWeight in 466, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.9671519 = idf(docFreq=2274, maxDocs=44218)
              0.0625 = fieldNorm(doc=466)
      0.22222222 = coord(2/9)
    
    Abstract
     We present an approach for detecting multiword phrases in mathematical text corpora. The method used is based on characteristic features of mathematical terminology. It makes use of a software tool named Lingo, which identifies words by means of previously defined dictionaries for specific word classes such as adjectives, personal names, or nouns. The detection of multiword groups is done algorithmically. Possible advantages of the method for indexing and information retrieval are discussed, as well as conclusions for applying dictionary-based methods of automatic indexing instead of stemming procedures.
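     As a toy illustration of the dictionary-based idea (not the Lingo tool itself; the word lists and sentence are made up), candidate multiword terms can be flagged wherever an adjective from a class dictionary directly precedes a noun:
     
       ADJECTIVES = {"linear", "partial", "differential", "continuous"}
       NOUNS = {"equation", "function", "operator", "space"}
       
       def candidate_phrases(tokens):
           # keep adjacent adjective+noun pairs as candidate multiword terms
           return [f"{a} {n}" for a, n in zip(tokens, tokens[1:])
                   if a in ADJECTIVES and n in NOUNS]
       
       print(candidate_phrases(
           "every continuous function on a linear space".split()))
       # ['continuous function', 'linear space']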

Languages

  • e 185
  • d 59
  • i 6
  • f 2
  • a 1
  • el 1
  • es 1