Document (#42482)

Author
Short, M.
Title
Text mining and subject analysis for fiction; or, using machine learning and information extraction to assign subject headings to dime novels
Source
Cataloging and classification quarterly. 57(2019) no.5, p.315-336
Year
2019
Abstract
This article describes multiple experiments in text mining at Northern Illinois University that were undertaken to improve the efficiency and accuracy of cataloging. It focuses narrowly on subject analysis of dime novels, a format of inexpensive fiction that was popular in the United States between 1860 and 1915. NIU holds more than 55,000 dime novels in its collections, which it is in the process of comprehensively digitizing. Classification, keyword extraction, named-entity recognition, clustering, and topic modeling are discussed as means of assigning subject headings to improve their discoverability by researchers and to increase the productivity of digitization workflows.
Content
Cf.: https://doi.org/10.1080/01639374.2019.1653413.
Theme
Fiction
Automatic indexing
Data Mining
Content analysis

Similar documents (content)

  1. Wolfe, EW.: Natural Language Processing in the humanities : a case study in automated metadata enhancement (2019) 0.16
    0.16251522 = sum of:
      0.16251522 = product of:
        0.6771468 = sum of:
          0.028607687 = weight(abstract_txt:analysis in 5236) [ClassicSimilarity], result of:
            0.028607687 = score(doc=5236,freq=2.0), product of:
              0.07087013 = queryWeight, product of:
                1.1642256 = boost
                3.6535451 = idf(docFreq=3112, maxDocs=44218)
                0.016661406 = queryNorm
              0.40366352 = fieldWeight in 5236, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                3.6535451 = idf(docFreq=3112, maxDocs=44218)
                0.078125 = fieldNorm(doc=5236)
          0.038790897 = weight(abstract_txt:text in 5236) [ClassicSimilarity], result of:
            0.038790897 = score(doc=5236,freq=2.0), product of:
              0.08682163 = queryWeight, product of:
                1.288604 = boost
                4.0438666 = idf(docFreq=2106, maxDocs=44218)
                0.016661406 = queryNorm
              0.44678837 = fieldWeight in 5236, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.0438666 = idf(docFreq=2106, maxDocs=44218)
                0.078125 = fieldNorm(doc=5236)
          0.0976844 = weight(abstract_txt:mining in 5236) [ClassicSimilarity], result of:
            0.0976844 = score(doc=5236,freq=1.0), product of:
              0.20247352 = queryWeight, product of:
                1.9678394 = boost
                6.1754265 = idf(docFreq=249, maxDocs=44218)
                0.016661406 = queryNorm
              0.4824552 = fieldWeight in 5236, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.1754265 = idf(docFreq=249, maxDocs=44218)
                0.078125 = fieldNorm(doc=5236)
          0.0984518 = weight(abstract_txt:extraction in 5236) [ClassicSimilarity], result of:
            0.0984518 = score(doc=5236,freq=1.0), product of:
              0.20353255 = queryWeight, product of:
                1.972979 = boost
                6.1915555 = idf(docFreq=245, maxDocs=44218)
                0.016661406 = queryNorm
              0.48371527 = fieldWeight in 5236, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.1915555 = idf(docFreq=245, maxDocs=44218)
                0.078125 = fieldNorm(doc=5236)
          0.04947557 = weight(abstract_txt:subject in 5236) [ClassicSimilarity], result of:
            0.04947557 = score(doc=5236,freq=1.0), product of:
              0.16208965 = queryWeight, product of:
                2.489993 = boost
                3.9070187 = idf(docFreq=2415, maxDocs=44218)
                0.016661406 = queryNorm
              0.30523583 = fieldWeight in 5236, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.9070187 = idf(docFreq=2415, maxDocs=44218)
                0.078125 = fieldNorm(doc=5236)
          0.3641364 = weight(abstract_txt:novels in 5236) [ClassicSimilarity], result of:
            0.3641364 = score(doc=5236,freq=1.0), product of:
              0.55721724 = queryWeight, product of:
                3.99819 = boost
                8.364683 = idf(docFreq=27, maxDocs=44218)
                0.016661406 = queryNorm
              0.6534909 = fieldWeight in 5236, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                8.364683 = idf(docFreq=27, maxDocs=44218)
                0.078125 = fieldNorm(doc=5236)
        0.24 = coord(6/25)
    
  2. Sauperl, A.: Four views of a novel : characteristics of novels as described by publishers, librarians, literary theorists, and readers (2013) 0.15
    0.14774518 = sum of:
      0.14774518 = product of:
        0.9234074 = sum of:
          0.06804813 = weight(abstract_txt:headings in 1952) [ClassicSimilarity], result of:
            0.06804813 = score(doc=1952,freq=1.0), product of:
              0.14089903 = queryWeight, product of:
                1.6415704 = boost
                5.1515374 = idf(docFreq=695, maxDocs=44218)
                0.016661406 = queryNorm
              0.48295665 = fieldWeight in 1952, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.1515374 = idf(docFreq=695, maxDocs=44218)
                0.09375 = fieldNorm(doc=1952)
          0.15343656 = weight(abstract_txt:fiction in 1952) [ClassicSimilarity], result of:
            0.15343656 = score(doc=1952,freq=1.0), product of:
              0.2422794 = queryWeight, product of:
                2.1526022 = boost
                6.7552447 = idf(docFreq=139, maxDocs=44218)
                0.016661406 = queryNorm
              0.6333042 = fieldWeight in 1952, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.7552447 = idf(docFreq=139, maxDocs=44218)
                0.09375 = fieldNorm(doc=1952)
          0.08396282 = weight(abstract_txt:subject in 1952) [ClassicSimilarity], result of:
            0.08396282 = score(doc=1952,freq=2.0), product of:
              0.16208965 = queryWeight, product of:
                2.489993 = boost
                3.9070187 = idf(docFreq=2415, maxDocs=44218)
                0.016661406 = queryNorm
              0.5180024 = fieldWeight in 1952, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                3.9070187 = idf(docFreq=2415, maxDocs=44218)
                0.09375 = fieldNorm(doc=1952)
          0.61795986 = weight(abstract_txt:novels in 1952) [ClassicSimilarity], result of:
            0.61795986 = score(doc=1952,freq=2.0), product of:
              0.55721724 = queryWeight, product of:
                3.99819 = boost
                8.364683 = idf(docFreq=27, maxDocs=44218)
                0.016661406 = queryNorm
              1.1090107 = fieldWeight in 1952, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                8.364683 = idf(docFreq=27, maxDocs=44218)
                0.09375 = fieldNorm(doc=1952)
        0.16 = coord(4/25)
    
  3. Moulaison-Sandy, H.; Adkins, D.; Bossaller, J.; Cho, H.: An automated approach to describing fiction : a methodology to use book reviews to identify affect (2021) 0.15
    0.14534457 = sum of:
      0.14534457 = product of:
        0.6056024 = sum of:
          0.024274427 = weight(abstract_txt:analysis in 710) [ClassicSimilarity], result of:
            0.024274427 = score(doc=710,freq=1.0), product of:
              0.07087013 = queryWeight, product of:
                1.1642256 = boost
                3.6535451 = idf(docFreq=3112, maxDocs=44218)
                0.016661406 = queryNorm
              0.34251985 = fieldWeight in 710, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.6535451 = idf(docFreq=3112, maxDocs=44218)
                0.09375 = fieldNorm(doc=710)
          0.046549074 = weight(abstract_txt:text in 710) [ClassicSimilarity], result of:
            0.046549074 = score(doc=710,freq=2.0), product of:
              0.08682163 = queryWeight, product of:
                1.288604 = boost
                4.0438666 = idf(docFreq=2106, maxDocs=44218)
                0.016661406 = queryNorm
              0.53614604 = fieldWeight in 710, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.0438666 = idf(docFreq=2106, maxDocs=44218)
                0.09375 = fieldNorm(doc=710)
          0.06804813 = weight(abstract_txt:headings in 710) [ClassicSimilarity], result of:
            0.06804813 = score(doc=710,freq=1.0), product of:
              0.14089903 = queryWeight, product of:
                1.6415704 = boost
                5.1515374 = idf(docFreq=695, maxDocs=44218)
                0.016661406 = queryNorm
              0.48295665 = fieldWeight in 710, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.1515374 = idf(docFreq=695, maxDocs=44218)
                0.09375 = fieldNorm(doc=710)
          0.16577592 = weight(abstract_txt:mining in 710) [ClassicSimilarity], result of:
            0.16577592 = score(doc=710,freq=2.0), product of:
              0.20247352 = queryWeight, product of:
                1.9678394 = boost
                6.1754265 = idf(docFreq=249, maxDocs=44218)
                0.016661406 = queryNorm
              0.8187536 = fieldWeight in 710, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                6.1754265 = idf(docFreq=249, maxDocs=44218)
                0.09375 = fieldNorm(doc=710)
          0.21699205 = weight(abstract_txt:fiction in 710) [ClassicSimilarity], result of:
            0.21699205 = score(doc=710,freq=2.0), product of:
              0.2422794 = queryWeight, product of:
                2.1526022 = boost
                6.7552447 = idf(docFreq=139, maxDocs=44218)
                0.016661406 = queryNorm
              0.8956273 = fieldWeight in 710, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                6.7552447 = idf(docFreq=139, maxDocs=44218)
                0.09375 = fieldNorm(doc=710)
          0.08396282 = weight(abstract_txt:subject in 710) [ClassicSimilarity], result of:
            0.08396282 = score(doc=710,freq=2.0), product of:
              0.16208965 = queryWeight, product of:
                2.489993 = boost
                3.9070187 = idf(docFreq=2415, maxDocs=44218)
                0.016661406 = queryNorm
              0.5180024 = fieldWeight in 710, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                3.9070187 = idf(docFreq=2415, maxDocs=44218)
                0.09375 = fieldNorm(doc=710)
        0.24 = coord(6/25)
    
  4. Becnel, K.; Moeller, R.A.: Graphic novels in the school library : questions of cataloging, classification, and arrangement (2022) 0.10
    0.100272186 = sum of:
      0.100272186 = product of:
        0.83560157 = sum of:
          0.12657869 = weight(abstract_txt:fiction in 1107) [ClassicSimilarity], result of:
            0.12657869 = score(doc=1107,freq=2.0), product of:
              0.2422794 = queryWeight, product of:
                2.1526022 = boost
                6.7552447 = idf(docFreq=139, maxDocs=44218)
                0.016661406 = queryNorm
              0.52244925 = fieldWeight in 1107, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                6.7552447 = idf(docFreq=139, maxDocs=44218)
                0.0546875 = fieldNorm(doc=1107)
          0.0346329 = weight(abstract_txt:subject in 1107) [ClassicSimilarity], result of:
            0.0346329 = score(doc=1107,freq=1.0), product of:
              0.16208965 = queryWeight, product of:
                2.489993 = boost
                3.9070187 = idf(docFreq=2415, maxDocs=44218)
                0.016661406 = queryNorm
              0.21366508 = fieldWeight in 1107, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.9070187 = idf(docFreq=2415, maxDocs=44218)
                0.0546875 = fieldNorm(doc=1107)
          0.67438996 = weight(abstract_txt:novels in 1107) [ClassicSimilarity], result of:
            0.67438996 = score(doc=1107,freq=7.0), product of:
              0.55721724 = queryWeight, product of:
                3.99819 = boost
                8.364683 = idf(docFreq=27, maxDocs=44218)
                0.016661406 = queryNorm
              1.210282 = fieldWeight in 1107, product of:
                2.6457512 = tf(freq=7.0), with freq of:
                  7.0 = termFreq=7.0
                8.364683 = idf(docFreq=27, maxDocs=44218)
                0.0546875 = fieldNorm(doc=1107)
        0.12 = coord(3/25)
    
  5. Lowe, D.B.; Dollinger, I.; Koster, T.; Herbert, B.E.: Text mining for type of research classification (2021) 0.10
    0.09719097 = sum of:
      0.09719097 = product of:
        0.48595482 = sum of:
          0.094296984 = weight(abstract_txt:workflows in 720) [ClassicSimilarity], result of:
            0.094296984 = score(doc=720,freq=1.0), product of:
              0.15696637 = queryWeight, product of:
                1.2251629 = boost
                7.689554 = idf(docFreq=54, maxDocs=44218)
                0.016661406 = queryNorm
              0.6007464 = fieldWeight in 720, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.689554 = idf(docFreq=54, maxDocs=44218)
                0.078125 = fieldNorm(doc=720)
          0.027429305 = weight(abstract_txt:text in 720) [ClassicSimilarity], result of:
            0.027429305 = score(doc=720,freq=1.0), product of:
              0.08682163 = queryWeight, product of:
                1.288604 = boost
                4.0438666 = idf(docFreq=2106, maxDocs=44218)
                0.016661406 = queryNorm
              0.3159271 = fieldWeight in 720, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.0438666 = idf(docFreq=2106, maxDocs=44218)
                0.078125 = fieldNorm(doc=720)
          0.14442277 = weight(abstract_txt:discoverability in 720) [ClassicSimilarity], result of:
            0.14442277 = score(doc=720,freq=1.0), product of:
              0.2085604 = queryWeight, product of:
                1.4122334 = boost
                8.863674 = idf(docFreq=16, maxDocs=44218)
                0.016661406 = queryNorm
              0.69247454 = fieldWeight in 720, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                8.863674 = idf(docFreq=16, maxDocs=44218)
                0.078125 = fieldNorm(doc=720)
          0.050611414 = weight(abstract_txt:improve in 720) [ClassicSimilarity], result of:
            0.050611414 = score(doc=720,freq=1.0), product of:
              0.1306122 = queryWeight, product of:
                1.5805105 = boost
                4.9599204 = idf(docFreq=842, maxDocs=44218)
                0.016661406 = queryNorm
              0.3874938 = fieldWeight in 720, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.9599204 = idf(docFreq=842, maxDocs=44218)
                0.078125 = fieldNorm(doc=720)
          0.16919436 = weight(abstract_txt:mining in 720) [ClassicSimilarity], result of:
            0.16919436 = score(doc=720,freq=3.0), product of:
              0.20247352 = queryWeight, product of:
                1.9678394 = boost
                6.1754265 = idf(docFreq=249, maxDocs=44218)
                0.016661406 = queryNorm
              0.8356369 = fieldWeight in 720, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                6.1754265 = idf(docFreq=249, maxDocs=44218)
                0.078125 = fieldNorm(doc=720)
        0.2 = coord(5/25)
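
The score breakdowns above all follow the same arithmetic, which can be reproduced with a short sketch. The formulas here are inferred from the numbers in the listing: tf is the square root of termFreq, idf is 1 + ln(maxDocs / (docFreq + 1)), queryWeight is boost * idf * queryNorm, fieldWeight is tf * idf * fieldNorm, and the document score is the sum of the term scores multiplied by coord. The function and variable names are illustrative and not part of any library. Small differences in the last digits are expected, since the listing appears to use single-precision values.

```python
import math

# Constants copied from the listing above.
MAX_DOCS = 44218          # maxDocs in every idf(...) line
QUERY_NORM = 0.016661406  # queryNorm in every queryWeight block
QUERY_TERMS = 25          # denominator of coord(n/25)


def idf(doc_freq, max_docs=MAX_DOCS):
    """Inverse document frequency, as inferred from the listed values."""
    return 1.0 + math.log(max_docs / (doc_freq + 1))


def term_score(boost, doc_freq, freq, field_norm):
    """Score of one matching term: queryWeight * fieldWeight."""
    term_idf = idf(doc_freq)
    query_weight = boost * term_idf * QUERY_NORM
    field_weight = math.sqrt(freq) * term_idf * field_norm
    return query_weight * field_weight


def document_score(matches, field_norm):
    """Sum of term scores, scaled by coord = matched terms / query terms."""
    total = sum(term_score(b, df, f, field_norm) for b, df, f in matches)
    return total * len(matches) / QUERY_TERMS


# (boost, docFreq, termFreq) for the six terms matched in doc 5236 (entry 1).
WOLFE_MATCHES = [
    (1.1642256, 3112, 2.0),  # analysis
    (1.288604, 2106, 2.0),   # text
    (1.9678394, 249, 1.0),   # mining
    (1.972979, 245, 1.0),    # extraction
    (2.489993, 2415, 1.0),   # subject
    (3.99819, 27, 1.0),      # novels
]

wolfe_score = document_score(WOLFE_MATCHES, field_norm=0.078125)
print(f"doc 5236: {wolfe_score:.8f}")  # listing reports 0.16251522
```

The breakdown also shows why "novels" dominates every ranking: it occurs in only 27 of 44,218 documents, so its idf (8.36) enters the term score twice, once through queryWeight and once through fieldWeight.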