Search (4 results, page 1 of 1)

Golub, K.: Automated subject classification of textual documents in the context of Web-based hierarchical browsing (2011) 0.06
```
0.06093827 = product of:
  0.0914074 = sum of:
    0.06606405 = weight(_text_:wide in 4558) [ClassicSimilarity], result of:
      0.06606405 = score(doc=4558,freq=2.0), product of:
        0.22492146 = queryWeight, product of:
          4.4307585 = idf(docFreq=1430, maxDocs=44218)
          0.050763648 = queryNorm
        0.29372054 = fieldWeight in 4558, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          4.4307585 = idf(docFreq=1430, maxDocs=44218)
          0.046875 = fieldNorm(doc=4558)
    0.025343355 = product of:
      0.05068671 = sum of:
        0.05068671 = weight(_text_:web in 4558) [ClassicSimilarity], result of:
          0.05068671 = score(doc=4558,freq=4.0), product of:
            0.1656677 = queryWeight, product of:
              3.2635105 = idf(docFreq=4597, maxDocs=44218)
              0.050763648 = queryNorm
            0.3059541 = fieldWeight in 4558, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              3.2635105 = idf(docFreq=4597, maxDocs=44218)
              0.046875 = fieldNorm(doc=4558)
      0.5 = coord(1/2)
  0.6666667 = coord(2/3)
```
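The explain tree above is plain arithmetic. In Lucene's ClassicSimilarity, each matching term contributes queryWeight × fieldWeight. Each Boolean clause is then scaled by coord(matched/total). The sketch below re-multiplies the numbers printed for doc 4558 to show how the 0.06 score arises. It takes queryNorm and fieldNorm exactly as printed, because queryNorm depends on the full query, which this listing does not show. The helper name `term_weight` is hypothetical.

```python
# Values copied from the explain output for doc 4558; nothing here is recomputed
# from raw index statistics.
QUERY_NORM = 0.050763648
FIELD_NORM = 0.046875


def term_weight(tf: float, idf: float, query_norm: float, field_norm: float) -> float:
    """Hypothetical helper: score of one term = queryWeight * fieldWeight."""
    query_weight = idf * query_norm        # queryWeight
    field_weight = tf * idf * field_norm   # fieldWeight
    return query_weight * field_weight


wide = term_weight(tf=1.4142135, idf=4.4307585, query_norm=QUERY_NORM, field_norm=FIELD_NORM)
web = term_weight(tf=2.0, idf=3.2635105, query_norm=QUERY_NORM, field_norm=FIELD_NORM)

web_clause = web * (1 / 2)              # coord(1/2): 1 of 2 terms in the inner clause matched
total = (wide + web_clause) * (2 / 3)   # coord(2/3): 2 of 3 top-level clauses matched
print(f"{total:.6f}")                   # prints 0.060938
```

Small differences in the last digits are expected, because Lucene computes these values in single precision.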
Abstract

While automated methods for information organization have been around for several decades now, exponential growth of the World Wide Web has put them at the forefront of research in different communities, within which several approaches can be identified: 1) machine learning (algorithms that allow computers to improve their performance based on learning from pre-existing data); 2) document clustering (algorithms for unsupervised document organization and automated topic extraction); and 3) string matching (algorithms that match given strings within larger text). Here the aim was to automatically organize textual documents into hierarchical structures for subject browsing. The string-matching approach was tested using a controlled vocabulary (containing pre-selected and pre-defined authorized terms, each corresponding to only one concept). The results imply that an appropriate controlled vocabulary, with a sufficient number of entry terms designating classes, could in itself be a solution for automated classification. Then, if the same controlled vocabulary had an appropriate hierarchical structure, it would at the same time provide a good browsing structure for the collection of automatically classified documents.
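The idea described in the abstract can be sketched in a few steps. Entry terms in a controlled vocabulary each point to one class. A document is assigned to the classes whose entry terms occur in it. The vocabulary's hierarchy then provides the browsing path. The vocabulary, class codes, hierarchy, and the whole-word, case-insensitive matching rule below are all hypothetical. This is not the method evaluated in the paper, which the abstract does not detail.

```python
import re
from collections import Counter

# Hypothetical controlled vocabulary: each entry term designates exactly one class.
ENTRY_TERMS = {
    "information retrieval": "IR",
    "search engine": "IR",
    "subject indexing": "IDX",
    "thesaurus": "KOS",
    "classification scheme": "KOS",
}

# Hypothetical class hierarchy (child -> parent), reused as the browsing structure.
PARENT = {"IR": "LIS", "IDX": "LIS", "KOS": "IDX", "LIS": None}


def classify(text: str) -> Counter:
    """Count whole-word, case-insensitive entry-term matches per class."""
    lowered = text.lower()
    hits = Counter()
    for term, cls in ENTRY_TERMS.items():
        pattern = r"\b" + re.escape(term) + r"\b"
        hits[cls] += len(re.findall(pattern, lowered))
    return +hits  # unary plus drops classes with zero matches


def browsing_path(cls: str) -> list[str]:
    """Walk up the hierarchy to produce a root-to-class browsing path."""
    path = []
    while cls is not None:
        path.append(cls)
        cls = PARENT[cls]
    return list(reversed(path))


doc = "A thesaurus supports subject indexing; a search engine supports information retrieval."
print(classify(doc))          # Counter({'IR': 2, 'IDX': 1, 'KOS': 1})
print(browsing_path("KOS"))   # ['LIS', 'IDX', 'KOS']
```

A real system would probably add term weighting and a cut-off for how many classes to assign. Those choices are left out here.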
Golub, K.: Automatic subject indexing of text (2019) 0.02
```
0.018351128 = product of:
  0.055053383 = sum of:
    0.055053383 = weight(_text_:wide in 5268) [ClassicSimilarity], result of:
      0.055053383 = score(doc=5268,freq=2.0), product of:
        0.22492146 = queryWeight, product of:
          4.4307585 = idf(docFreq=1430, maxDocs=44218)
          0.050763648 = queryNorm
        0.24476713 = fieldWeight in 5268, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          4.4307585 = idf(docFreq=1430, maxDocs=44218)
          0.0390625 = fieldNorm(doc=5268)
  0.33333334 = coord(1/3)
```
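The idf and tf lines in these explain trees can be recomputed from the raw statistics they print. The formulas idf = 1 + ln(maxDocs / (docFreq + 1)) and tf = √freq reproduce the values shown here. Treat them as the behaviour of the Lucene version that produced this output, since other versions may differ slightly. fieldNorm is taken as printed, because it is stored in a lossy encoding. The function names are hypothetical.

```python
import math


def classic_idf(doc_freq: int, max_docs: int) -> float:
    """idf = 1 + ln(maxDocs / (docFreq + 1)), as implied by the explain output."""
    return 1.0 + math.log(max_docs / (doc_freq + 1))


def classic_tf(freq: float) -> float:
    """tf = sqrt(termFreq)."""
    return math.sqrt(freq)


MAX_DOCS = 44218
idfs = {term: classic_idf(df, MAX_DOCS) for term, df in [("wide", 1430), ("web", 4597), ("22", 3622)]}

# fieldWeight for "wide" in doc 5268, using the stored fieldNorm from the explain output.
field_weight = classic_tf(2.0) * idfs["wide"] * 0.0390625
```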
Abstract

Automatic subject indexing addresses problems of scale and sustainability and can at the same time be used to enrich existing metadata records, establish more connections across and between resources from various metadata and resource collections, and enhance consistency of the metadata. In this work, automatic subject indexing focuses on assigning index terms or classes from established knowledge organization systems (KOSs) for subject indexing, such as thesauri, subject headings systems and classification systems. The following major approaches are discussed, in terms of their similarities and differences, advantages and disadvantages for automatic assigned indexing from KOSs: "text categorization," "document clustering," and "document classification." Text categorization is perhaps the most widespread machine-learning approach, with what seems to be generally good reported performance. Document clustering automatically both creates groups of related documents and extracts names of subjects depicting the group at hand. Document classification re-uses the intellectual effort invested into creating a KOS for subject indexing, and even simple string-matching algorithms have been reported to achieve good results, because one concept can be described using a number of different terms, including equivalent, related, narrower and broader terms. Finally, applicability of automatic subject indexing to operative information systems and challenges of evaluation are outlined, suggesting the need for more research.
Johansson, S.; Golub, K.: LibraryThing for libraries : how tag moderation and size limitations affect tag clouds (2019) 0.01
```
0.0070398217 = product of:
  0.021119464 = sum of:
    0.021119464 = product of:
      0.04223893 = sum of:
        0.04223893 = weight(_text_:web in 5398) [ClassicSimilarity], result of:
          0.04223893 = score(doc=5398,freq=4.0), product of:
            0.1656677 = queryWeight, product of:
              3.2635105 = idf(docFreq=4597, maxDocs=44218)
              0.050763648 = queryNorm
            0.25496176 = fieldWeight in 5398, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              3.2635105 = idf(docFreq=4597, maxDocs=44218)
              0.0390625 = fieldNorm(doc=5398)
      0.5 = coord(1/2)
  0.33333334 = coord(1/3)
```
Abstract

The aim of this study is to analyse differences between tags on LibraryThing's web page and tag clouds in their "LibraryThing for Libraries" service, and assess if, and how, the LibraryThing tag moderation and limitations to the size of the tag cloud in the library catalogue affect the description of the information resource. An e-mail survey was conducted with personnel at LibraryThing, and the results were compared against tags for twenty different fiction books, collected from two different library catalogues with disparate tag cloud sizes, and LibraryThing's web page. The data were analysed using a modified version of Golder and Huberman's tag categories (2006). The results show that while LibraryThing claims to only remove the inherently personal tags, several other types of tags are found to have been discarded as well. Occasionally a certain type of tag is included for one book and excluded for another. The comparison between the two tag cloud sizes suggests that the larger tag clouds provide a more pronounced picture regarding the contents of the book, but at the cost of an increase in the number of tags with synonymous or redundant information.

Golub, K.; Tudhope, D.; Zeng, M.L.; Zumer, M.: Terminology registries for knowledge organization systems : functionality, use, and attributes (2014) 0.01
```
0.006877774 = product of:
  0.020633321 = sum of:
    0.020633321 = product of:
      0.041266643 = sum of:
        0.041266643 = weight(_text_:22 in 1347) [ClassicSimilarity], result of:
          0.041266643 = score(doc=1347,freq=2.0), product of:
            0.17776565 = queryWeight, product of:
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.050763648 = queryNorm
            0.23214069 = fieldWeight in 1347, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.046875 = fieldNorm(doc=1347)
      0.5 = coord(1/2)
  0.33333334 = coord(1/3)
```
Date: 22. 8.2014 17:12:54