Document (#42483)

Author
Short, M.
Title
Text mining and subject analysis for fiction; or, using machine learning and information extraction to assign subject headings to dime novels
Source
Cataloging and classification quarterly. 57(2019) no.5, pp.315-336
Year
2019
Abstract
This article describes multiple experiments in text mining at Northern Illinois University that were undertaken to improve the efficiency and accuracy of cataloging. It focuses narrowly on subject analysis of dime novels, a format of inexpensive fiction that was popular in the United States between 1860 and 1915. NIU holds more than 55,000 dime novels in its collections, which it is in the process of comprehensively digitizing. Classification, keyword extraction, named-entity recognition, clustering, and topic modeling are discussed as means of assigning subject headings to improve their discoverability by researchers and to increase the productivity of digitization workflows.
Content
Cf.: https://doi.org/10.1080/01639374.2019.1653413.
Theme
Fiction
Automatic indexing
Data mining
Content analysis

Similar documents (content)

  1. Wolfe, EW.: a case study in automated metadata enhancement : Natural Language Processing in the humanities (2019) 0.16
    0.16323689 = sum of:
      0.16323689 = product of:
        0.6801537 = sum of:
          0.028798556 = weight(abstract_txt:analysis in 237) [ClassicSimilarity], result of:
            0.028798556 = score(doc=237,freq=2.0), product of:
              0.07095563 = queryWeight, product of:
                1.1628172 = boost
                3.67349 = idf(docFreq=2984, maxDocs=43254)
                0.016611028 = queryNorm
              0.40586713 = fieldWeight in 237, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                3.67349 = idf(docFreq=2984, maxDocs=43254)
                0.078125 = fieldNorm(doc=237)
          0.03858468 = weight(abstract_txt:text in 237) [ClassicSimilarity], result of:
            0.03858468 = score(doc=237,freq=2.0), product of:
              0.086234875 = queryWeight, product of:
                1.2819158 = boost
                4.049738 = idf(docFreq=2048, maxDocs=43254)
                0.016611028 = queryNorm
              0.44743705 = fieldWeight in 237, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.049738 = idf(docFreq=2048, maxDocs=43254)
                0.078125 = fieldNorm(doc=237)
          0.09782594 = weight(abstract_txt:mining in 237) [ClassicSimilarity], result of:
            0.09782594 = score(doc=237,freq=1.0), product of:
              0.20201597 = queryWeight, product of:
                1.9620537 = boost
                6.1983814 = idf(docFreq=238, maxDocs=43254)
                0.016611028 = queryNorm
              0.48424855 = fieldWeight in 237, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.1983814 = idf(docFreq=238, maxDocs=43254)
                0.078125 = fieldNorm(doc=237)
          0.09842525 = weight(abstract_txt:extraction in 237) [ClassicSimilarity], result of:
            0.09842525 = score(doc=237,freq=1.0), product of:
              0.2028402 = queryWeight, product of:
                1.9660522 = boost
                6.2110133 = idf(docFreq=235, maxDocs=43254)
                0.016611028 = queryNorm
              0.48523542 = fieldWeight in 237, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.2110133 = idf(docFreq=235, maxDocs=43254)
                0.078125 = fieldNorm(doc=237)
          0.049115762 = weight(abstract_txt:subject in 237) [ClassicSimilarity], result of:
            0.049115762 = score(doc=237,freq=1.0), product of:
              0.16078305 = queryWeight, product of:
                2.4754443 = boost
                3.9101245 = idf(docFreq=2355, maxDocs=43254)
                0.016611028 = queryNorm
              0.30547848 = fieldWeight in 237, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.9101245 = idf(docFreq=2355, maxDocs=43254)
                0.078125 = fieldNorm(doc=237)
          0.36740354 = weight(abstract_txt:novels in 237) [ClassicSimilarity], result of:
            0.36740354 = score(doc=237,freq=1.0), product of:
              0.558739 = queryWeight, product of:
                3.996393 = boost
                8.416748 = idf(docFreq=25, maxDocs=43254)
                0.016611028 = queryNorm
              0.65755844 = fieldWeight in 237, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                8.416748 = idf(docFreq=25, maxDocs=43254)
                0.078125 = fieldNorm(doc=237)
        0.24 = coord(6/25)
    
  2. Sauperl, A.: Four views of a novel : characteristics of novels as described by publishers, librarians, literary theorists, and readers (2013) 0.15
    0.14838357 = sum of:
      0.14838357 = product of:
        0.9273974 = sum of:
          0.067613855 = weight(abstract_txt:headings in 3417) [ClassicSimilarity], result of:
            0.067613855 = score(doc=3417,freq=1.0), product of:
              0.13984683 = queryWeight, product of:
                1.6324668 = boost
                5.1571736 = idf(docFreq=676, maxDocs=43254)
                0.016611028 = queryNorm
              0.48348504 = fieldWeight in 3417, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.1571736 = idf(docFreq=676, maxDocs=43254)
                0.09375 = fieldNorm(doc=3417)
          0.15292685 = weight(abstract_txt:fiction in 3417) [ClassicSimilarity], result of:
            0.15292685 = score(doc=3417,freq=1.0), product of:
              0.24096353 = queryWeight, product of:
                2.1428595 = boost
                6.7695704 = idf(docFreq=134, maxDocs=43254)
                0.016611028 = queryNorm
              0.63464725 = fieldWeight in 3417, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.7695704 = idf(docFreq=134, maxDocs=43254)
                0.09375 = fieldNorm(doc=3417)
          0.08335221 = weight(abstract_txt:subject in 3417) [ClassicSimilarity], result of:
            0.08335221 = score(doc=3417,freq=2.0), product of:
              0.16078305 = queryWeight, product of:
                2.4754443 = boost
                3.9101245 = idf(docFreq=2355, maxDocs=43254)
                0.016611028 = queryNorm
              0.51841414 = fieldWeight in 3417, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                3.9101245 = idf(docFreq=2355, maxDocs=43254)
                0.09375 = fieldNorm(doc=3417)
          0.62350446 = weight(abstract_txt:novels in 3417) [ClassicSimilarity], result of:
            0.62350446 = score(doc=3417,freq=2.0), product of:
              0.558739 = queryWeight, product of:
                3.996393 = boost
                8.416748 = idf(docFreq=25, maxDocs=43254)
                0.016611028 = queryNorm
              1.1159136 = fieldWeight in 3417, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                8.416748 = idf(docFreq=25, maxDocs=43254)
                0.09375 = fieldNorm(doc=3417)
        0.16 = coord(4/25)
    
  3. Mowery, R.L.: Spanish subject headings in ILLINET online (1995) 0.08
    0.08367049 = sum of:
      0.08367049 = product of:
        0.5229406 = sum of:
          0.1261759 = weight(abstract_txt:illinois in 2498) [ClassicSimilarity], result of:
            0.1261759 = score(doc=2498,freq=1.0), product of:
              0.13888136 = queryWeight, product of:
                1.1503367 = boost
                7.2681255 = idf(docFreq=81, maxDocs=43254)
                0.016611028 = queryNorm
              0.9085157 = fieldWeight in 2498, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.2681255 = idf(docFreq=81, maxDocs=43254)
                0.125 = fieldNorm(doc=2498)
          0.12948093 = weight(abstract_txt:assign in 2498) [ClassicSimilarity], result of:
            0.12948093 = score(doc=2498,freq=1.0), product of:
              0.14129612 = queryWeight, product of:
                1.1602943 = boost
                7.3310394 = idf(docFreq=76, maxDocs=43254)
                0.016611028 = queryNorm
              0.9163799 = fieldWeight in 2498, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.3310394 = idf(docFreq=76, maxDocs=43254)
                0.125 = fieldNorm(doc=2498)
          0.1561475 = weight(abstract_txt:headings in 2498) [ClassicSimilarity], result of:
            0.1561475 = score(doc=2498,freq=3.0), product of:
              0.13984683 = queryWeight, product of:
                1.6324668 = boost
                5.1571736 = idf(docFreq=676, maxDocs=43254)
                0.016611028 = queryNorm
              1.1165608 = fieldWeight in 2498, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                5.1571736 = idf(docFreq=676, maxDocs=43254)
                0.125 = fieldNorm(doc=2498)
          0.11113628 = weight(abstract_txt:subject in 2498) [ClassicSimilarity], result of:
            0.11113628 = score(doc=2498,freq=2.0), product of:
              0.16078305 = queryWeight, product of:
                2.4754443 = boost
                3.9101245 = idf(docFreq=2355, maxDocs=43254)
                0.016611028 = queryNorm
              0.69121885 = fieldWeight in 2498, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                3.9101245 = idf(docFreq=2355, maxDocs=43254)
                0.125 = fieldNorm(doc=2498)
        0.16 = coord(4/25)
    
  4. Ekvall, I.-L.; Larsson, S.: EDVIN - a search system for fiction based on the experience of users' needs (1997) 0.08
    0.08350755 = sum of:
      0.08350755 = product of:
        0.69589627 = sum of:
          0.15292685 = weight(abstract_txt:fiction in 3832) [ClassicSimilarity], result of:
            0.15292685 = score(doc=3832,freq=1.0), product of:
              0.24096353 = queryWeight, product of:
                2.1428595 = boost
                6.7695704 = idf(docFreq=134, maxDocs=43254)
                0.016611028 = queryNorm
              0.63464725 = fieldWeight in 3832, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.7695704 = idf(docFreq=134, maxDocs=43254)
                0.09375 = fieldNorm(doc=3832)
          0.1020852 = weight(abstract_txt:subject in 3832) [ClassicSimilarity], result of:
            0.1020852 = score(doc=3832,freq=3.0), product of:
              0.16078305 = queryWeight, product of:
                2.4754443 = boost
                3.9101245 = idf(docFreq=2355, maxDocs=43254)
                0.016611028 = queryNorm
              0.6349251 = fieldWeight in 3832, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                3.9101245 = idf(docFreq=2355, maxDocs=43254)
                0.09375 = fieldNorm(doc=3832)
          0.44088426 = weight(abstract_txt:novels in 3832) [ClassicSimilarity], result of:
            0.44088426 = score(doc=3832,freq=1.0), product of:
              0.558739 = queryWeight, product of:
                3.996393 = boost
                8.416748 = idf(docFreq=25, maxDocs=43254)
                0.016611028 = queryNorm
              0.7890701 = fieldWeight in 3832, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                8.416748 = idf(docFreq=25, maxDocs=43254)
                0.09375 = fieldNorm(doc=3832)
        0.12 = coord(3/25)
    
  5. Andersson, R.; Holst, E.: Indexes and other depictions of fictions : a new model for analysis empirically tested (1996) 0.08
    0.07733751 = sum of:
      0.07733751 = product of:
        0.6444793 = sum of:
          0.056344874 = weight(abstract_txt:headings in 1474) [ClassicSimilarity], result of:
            0.056344874 = score(doc=1474,freq=1.0), product of:
              0.13984683 = queryWeight, product of:
                1.6324668 = boost
                5.1571736 = idf(docFreq=676, maxDocs=43254)
                0.016611028 = queryNorm
              0.40290418 = fieldWeight in 1474, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.1571736 = idf(docFreq=676, maxDocs=43254)
                0.078125 = fieldNorm(doc=1474)
          0.22073087 = weight(abstract_txt:fiction in 1474) [ClassicSimilarity], result of:
            0.22073087 = score(doc=1474,freq=3.0), product of:
              0.24096353 = queryWeight, product of:
                2.1428595 = boost
                6.7695704 = idf(docFreq=134, maxDocs=43254)
                0.016611028 = queryNorm
              0.91603434 = fieldWeight in 1474, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                6.7695704 = idf(docFreq=134, maxDocs=43254)
                0.078125 = fieldNorm(doc=1474)
          0.36740354 = weight(abstract_txt:novels in 1474) [ClassicSimilarity], result of:
            0.36740354 = score(doc=1474,freq=1.0), product of:
              0.558739 = queryWeight, product of:
                3.996393 = boost
                8.416748 = idf(docFreq=25, maxDocs=43254)
                0.016611028 = queryNorm
              0.65755844 = fieldWeight in 1474, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                8.416748 = idf(docFreq=25, maxDocs=43254)
                0.078125 = fieldNorm(doc=1474)
        0.12 = coord(3/25)
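
How the similarity scores are built up: the explain trees above follow the classic Lucene TF-IDF scheme (ClassicSimilarity), in which each matching term contributes queryWeight × fieldWeight and the sum is scaled by the coord factor (matched query terms / total query terms). The sketch below recomputes the score of the first similar document (doc 237) from the figures printed in its tree. It is only an illustration, not code taken from the database: the function names are hypothetical, the boost, queryNorm, and fieldNorm values are copied from the listing rather than derived, and the formulas tf = sqrt(freq) and idf = 1 + ln(maxDocs / (docFreq + 1)) are the standard ClassicSimilarity definitions.

```python
import math

def idf(doc_freq: int, max_docs: int) -> float:
    """ClassicSimilarity inverse document frequency."""
    return 1.0 + math.log(max_docs / (doc_freq + 1))

def term_score(freq: float, doc_freq: int, max_docs: int,
               boost: float, query_norm: float, field_norm: float) -> float:
    """Contribution of one matching term: queryWeight * fieldWeight."""
    term_idf = idf(doc_freq, max_docs)
    query_weight = boost * term_idf * query_norm
    field_weight = math.sqrt(freq) * term_idf * field_norm  # tf = sqrt(freq)
    return query_weight * field_weight

# Values copied from the explain output for doc 237 (not derived here).
MAX_DOCS = 43254
QUERY_NORM = 0.016611028
FIELD_NORM = 0.078125

# (term, termFreq, docFreq, boost)
TERMS = [
    ("analysis",   2.0, 2984, 1.1628172),
    ("text",       2.0, 2048, 1.2819158),
    ("mining",     1.0,  238, 1.9620537),
    ("extraction", 1.0,  235, 1.9660522),
    ("subject",    1.0, 2355, 2.4754443),
    ("novels",     1.0,   25, 3.996393),
]

term_sum = sum(
    term_score(freq, doc_freq, MAX_DOCS, boost, QUERY_NORM, FIELD_NORM)
    for _, freq, doc_freq, boost in TERMS
)
coord = 6 / 25            # 6 of the 25 query terms matched
score = term_sum * coord  # final similarity score for doc 237

print(f"sum of term scores: {term_sum:.6f}")
print(f"final score:        {score:.6f}")
```

Because Lucene works in single-precision floats, the recomputed values may differ from the listing in the last digits. The rare term "novels" (docFreq=25) dominates every score in the list, since its idf enters both the query weight and the field weight.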