Document (#39698)

Author
Perovsek, M.
Kranjc, J.
Erjavec, T.
Cestnik, B.
Lavrac, N.
Title
TextFlows : a visual programming platform for text mining and natural language processing
Source
Science of computer programming. In Press, 2016
Year
2016
Abstract
Text mining and natural language processing are fast-growing areas of research, with numerous applications in business, science and creative industries. This paper presents TextFlows, a web-based text mining and natural language processing platform supporting workflow construction, sharing and execution. The platform enables visual construction of text mining workflows through a web browser, and the execution of the constructed workflows on a processing cloud. This makes TextFlows an adaptable infrastructure for the construction and sharing of text processing workflows, which can be reused in various applications. The paper presents the implemented text mining and language processing modules, and describes some precomposed workflows. Their features are demonstrated on three use cases: comparison of document classifiers and of different part-of-speech taggers on a text categorization problem, and outlier detection in document corpora.
Content
Cf.: http://www.sciencedirect.com/science/article/pii/S0167642316000113. See also: http://textflows.org.
Theme
Computational linguistics
Object
TextFlows

Similar documents (content)

  1. Barrio, P.; Gravano, L.: Sampling strategies for information extraction over the deep web (2017) 0.24
    0.23580001 = sum of:
      0.23580001 = product of:
        0.73687506 = sum of:
          0.013493652 = weight(abstract_txt:paper in 3412) [ClassicSimilarity], result of:
            0.013493652 = score(doc=3412,freq=2.0), product of:
              0.05031825 = queryWeight, product of:
                1.009635 = boost
                3.467376 = idf(docFreq=3749, maxDocs=44218)
                0.014373422 = queryNorm
              0.26816618 = fieldWeight in 3412, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                3.467376 = idf(docFreq=3749, maxDocs=44218)
                0.0546875 = fieldNorm(doc=3412)
          0.04048186 = weight(abstract_txt:document in 3412) [ClassicSimilarity], result of:
            0.04048186 = score(doc=3412,freq=5.0), product of:
              0.07711984 = queryWeight, product of:
                1.2499272 = boost
                4.2926083 = idf(docFreq=1642, maxDocs=44218)
                0.014373422 = queryNorm
              0.5249215 = fieldWeight in 3412, product of:
                2.236068 = tf(freq=5.0), with freq of:
                  5.0 = termFreq=5.0
                4.2926083 = idf(docFreq=1642, maxDocs=44218)
                0.0546875 = fieldNorm(doc=3412)
          0.063632175 = weight(abstract_txt:natural in 3412) [ClassicSimilarity], result of:
            0.063632175 = score(doc=3412,freq=2.0), product of:
              0.16197728 = queryWeight, product of:
                2.218576 = boost
                5.0794845 = idf(docFreq=747, maxDocs=44218)
                0.014373422 = queryNorm
              0.39284632 = fieldWeight in 3412, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                5.0794845 = idf(docFreq=747, maxDocs=44218)
                0.0546875 = fieldNorm(doc=3412)
          0.18051298 = weight(abstract_txt:execution in 3412) [ClassicSimilarity], result of:
            0.18051298 = score(doc=3412,freq=2.0), product of:
              0.2835599 = queryWeight, product of:
                2.3967571 = boost
                8.231152 = idf(docFreq=31, maxDocs=44218)
                0.014373422 = queryNorm
              0.6365956 = fieldWeight in 3412, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                8.231152 = idf(docFreq=31, maxDocs=44218)
                0.0546875 = fieldNorm(doc=3412)
          0.047351748 = weight(abstract_txt:language in 3412) [ClassicSimilarity], result of:
            0.047351748 = score(doc=3412,freq=2.0), product of:
              0.14639957 = queryWeight, product of:
                2.435491 = boost
                4.1820874 = idf(docFreq=1834, maxDocs=44218)
                0.014373422 = queryNorm
              0.32344183 = fieldWeight in 3412, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.1820874 = idf(docFreq=1834, maxDocs=44218)
                0.0546875 = fieldNorm(doc=3412)
          0.1401584 = weight(abstract_txt:text in 3412) [ClassicSimilarity], result of:
            0.1401584 = score(doc=3412,freq=7.0), product of:
              0.23954397 = queryWeight, product of:
                4.1212435 = boost
                4.0438666 = idf(docFreq=2106, maxDocs=44218)
                0.014373422 = queryNorm
              0.5851051 = fieldWeight in 3412, product of:
                2.6457512 = tf(freq=7.0), with freq of:
                  7.0 = termFreq=7.0
                4.0438666 = idf(docFreq=2106, maxDocs=44218)
                0.0546875 = fieldNorm(doc=3412)
          0.11648687 = weight(abstract_txt:processing in 3412) [ClassicSimilarity], result of:
            0.11648687 = score(doc=3412,freq=2.0), product of:
              0.30539662 = queryWeight, product of:
                4.308185 = boost
                4.931848 = idf(docFreq=866, maxDocs=44218)
                0.014373422 = queryNorm
              0.38142815 = fieldWeight in 3412, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.931848 = idf(docFreq=866, maxDocs=44218)
                0.0546875 = fieldNorm(doc=3412)
          0.13475738 = weight(abstract_txt:mining in 3412) [ClassicSimilarity], result of:
            0.13475738 = score(doc=3412,freq=1.0), product of:
              0.39902264 = queryWeight, product of:
                4.4954214 = boost
                6.1754265 = idf(docFreq=249, maxDocs=44218)
                0.014373422 = queryNorm
              0.33771864 = fieldWeight in 3412, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.1754265 = idf(docFreq=249, maxDocs=44218)
                0.0546875 = fieldNorm(doc=3412)
        0.32 = coord(8/25)
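
The per-term entries above follow the pattern of Lucene's ClassicSimilarity (TF-IDF) explain output. As a minimal sketch, assuming the usual ClassicSimilarity definitions tf = sqrt(freq) and idf = 1 + ln(maxDocs / (docFreq + 1)), the following Python recomputes the term scores for doc 3412 and their sum. The function and variable names are illustrative and not part of Lucene; queryNorm, the boosts and fieldNorm are copied from the listing rather than derived.

```python
import math

QUERY_NORM = 0.014373422  # copied from the listing, not derived here
MAX_DOCS = 44218          # copied from the listing


def idf(doc_freq, max_docs=MAX_DOCS):
    """Inverse document frequency, assuming ClassicSimilarity's formula."""
    return 1.0 + math.log(max_docs / (doc_freq + 1))


def term_score(freq, doc_freq, boost, field_norm):
    """Score of one term: queryWeight * fieldWeight."""
    tf = math.sqrt(freq)
    term_idf = idf(doc_freq)
    query_weight = boost * term_idf * QUERY_NORM
    field_weight = tf * term_idf * field_norm
    return query_weight * field_weight


# (term, freq, docFreq, boost) for doc 3412, copied from the listing
TERMS_3412 = [
    ("paper", 2.0, 3749, 1.009635),
    ("document", 5.0, 1642, 1.2499272),
    ("natural", 2.0, 747, 2.218576),
    ("execution", 2.0, 31, 2.3967571),
    ("language", 2.0, 1834, 2.435491),
    ("text", 7.0, 2106, 4.1212435),
    ("processing", 2.0, 866, 4.308185),
    ("mining", 1.0, 249, 4.4954214),
]
FIELD_NORM_3412 = 0.0546875

# Sum of the eight term scores, before the coord factor is applied
term_sum_3412 = sum(
    term_score(freq, doc_freq, boost, FIELD_NORM_3412)
    for _, freq, doc_freq, boost in TERMS_3412
)
print(round(term_sum_3412, 4))
```

Small differences in the last digits against the listing are expected, since the listed values look like single-precision floats while Python uses double precision.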
    
  2. Haravu, L.J.; Neelameghan, A.: Text mining and data mining in knowledge organization and discovery : the making of knowledge-based products (2003) 0.21
    0.21225888 = sum of:
      0.21225888 = product of:
        0.7580674 = sum of:
          0.020805407 = weight(abstract_txt:presents in 5653) [ClassicSimilarity], result of:
            0.020805407 = score(doc=5653,freq=1.0), product of:
              0.077405535 = queryWeight, product of:
                1.2522402 = boost
                4.300552 = idf(docFreq=1629, maxDocs=44218)
                0.014373422 = queryNorm
              0.2687845 = fieldWeight in 5653, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.300552 = idf(docFreq=1629, maxDocs=44218)
                0.0625 = fieldNorm(doc=5653)
          0.027896939 = weight(abstract_txt:applications in 5653) [ClassicSimilarity], result of:
            0.027896939 = score(doc=5653,freq=1.0), product of:
              0.09412223 = queryWeight, product of:
                1.3808539 = boost
                4.7422485 = idf(docFreq=1047, maxDocs=44218)
                0.014373422 = queryNorm
              0.29639053 = fieldWeight in 5653, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.7422485 = idf(docFreq=1047, maxDocs=44218)
                0.0625 = fieldNorm(doc=5653)
          0.051422566 = weight(abstract_txt:natural in 5653) [ClassicSimilarity], result of:
            0.051422566 = score(doc=5653,freq=1.0), product of:
              0.16197728 = queryWeight, product of:
                2.218576 = boost
                5.0794845 = idf(docFreq=747, maxDocs=44218)
                0.014373422 = queryNorm
              0.31746778 = fieldWeight in 5653, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.0794845 = idf(docFreq=747, maxDocs=44218)
                0.0625 = fieldNorm(doc=5653)
          0.03826599 = weight(abstract_txt:language in 5653) [ClassicSimilarity], result of:
            0.03826599 = score(doc=5653,freq=1.0), product of:
              0.14639957 = queryWeight, product of:
                2.435491 = boost
                4.1820874 = idf(docFreq=1834, maxDocs=44218)
                0.014373422 = queryNorm
              0.26138046 = fieldWeight in 5653, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.1820874 = idf(docFreq=1834, maxDocs=44218)
                0.0625 = fieldNorm(doc=5653)
          0.14829883 = weight(abstract_txt:text in 5653) [ClassicSimilarity], result of:
            0.14829883 = score(doc=5653,freq=6.0), product of:
              0.23954397 = queryWeight, product of:
                4.1212435 = boost
                4.0438666 = idf(docFreq=2106, maxDocs=44218)
                0.014373422 = queryNorm
              0.6190881 = fieldWeight in 5653, product of:
                2.4494898 = tf(freq=6.0), with freq of:
                  6.0 = termFreq=6.0
                4.0438666 = idf(docFreq=2106, maxDocs=44218)
                0.0625 = fieldNorm(doc=5653)
          0.094135605 = weight(abstract_txt:processing in 5653) [ClassicSimilarity], result of:
            0.094135605 = score(doc=5653,freq=1.0), product of:
              0.30539662 = queryWeight, product of:
                4.308185 = boost
                4.931848 = idf(docFreq=866, maxDocs=44218)
                0.014373422 = queryNorm
              0.3082405 = fieldWeight in 5653, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.931848 = idf(docFreq=866, maxDocs=44218)
                0.0625 = fieldNorm(doc=5653)
          0.3772421 = weight(abstract_txt:mining in 5653) [ClassicSimilarity], result of:
            0.3772421 = score(doc=5653,freq=6.0), product of:
              0.39902264 = queryWeight, product of:
                4.4954214 = boost
                6.1754265 = idf(docFreq=249, maxDocs=44218)
                0.014373422 = queryNorm
              0.94541526 = fieldWeight in 5653, product of:
                2.4494898 = tf(freq=6.0), with freq of:
                  6.0 = termFreq=6.0
                6.1754265 = idf(docFreq=249, maxDocs=44218)
                0.0625 = fieldNorm(doc=5653)
        0.28 = coord(7/25)
    
  3. Tonkin, E.L.; Tourte, G.J.L.: Working with text : tools, techniques and approaches for text mining (2016) 0.21
    0.21145192 = sum of:
      0.21145192 = product of:
        0.8810497 = sum of:
          0.027896939 = weight(abstract_txt:applications in 4019) [ClassicSimilarity], result of:
            0.027896939 = score(doc=4019,freq=1.0), product of:
              0.09412223 = queryWeight, product of:
                1.3808539 = boost
                4.7422485 = idf(docFreq=1047, maxDocs=44218)
                0.014373422 = queryNorm
              0.29639053 = fieldWeight in 4019, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.7422485 = idf(docFreq=1047, maxDocs=44218)
                0.0625 = fieldNorm(doc=4019)
          0.051422566 = weight(abstract_txt:natural in 4019) [ClassicSimilarity], result of:
            0.051422566 = score(doc=4019,freq=1.0), product of:
              0.16197728 = queryWeight, product of:
                2.218576 = boost
                5.0794845 = idf(docFreq=747, maxDocs=44218)
                0.014373422 = queryNorm
              0.31746778 = fieldWeight in 4019, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.0794845 = idf(docFreq=747, maxDocs=44218)
                0.0625 = fieldNorm(doc=4019)
          0.05411628 = weight(abstract_txt:language in 4019) [ClassicSimilarity], result of:
            0.05411628 = score(doc=4019,freq=2.0), product of:
              0.14639957 = queryWeight, product of:
                2.435491 = boost
                4.1820874 = idf(docFreq=1834, maxDocs=44218)
                0.014373422 = queryNorm
              0.3696478 = fieldWeight in 4019, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.1820874 = idf(docFreq=1834, maxDocs=44218)
                0.0625 = fieldNorm(doc=4019)
          0.19145297 = weight(abstract_txt:text in 4019) [ClassicSimilarity], result of:
            0.19145297 = score(doc=4019,freq=10.0), product of:
              0.23954397 = queryWeight, product of:
                4.1212435 = boost
                4.0438666 = idf(docFreq=2106, maxDocs=44218)
                0.014373422 = queryNorm
              0.79923934 = fieldWeight in 4019, product of:
                3.1622777 = tf(freq=10.0), with freq of:
                  10.0 = termFreq=10.0
                4.0438666 = idf(docFreq=2106, maxDocs=44218)
                0.0625 = fieldNorm(doc=4019)
          0.094135605 = weight(abstract_txt:processing in 4019) [ClassicSimilarity], result of:
            0.094135605 = score(doc=4019,freq=1.0), product of:
              0.30539662 = queryWeight, product of:
                4.308185 = boost
                4.931848 = idf(docFreq=866, maxDocs=44218)
                0.014373422 = queryNorm
              0.3082405 = fieldWeight in 4019, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.931848 = idf(docFreq=866, maxDocs=44218)
                0.0625 = fieldNorm(doc=4019)
          0.4620253 = weight(abstract_txt:mining in 4019) [ClassicSimilarity], result of:
            0.4620253 = score(doc=4019,freq=9.0), product of:
              0.39902264 = queryWeight, product of:
                4.4954214 = boost
                6.1754265 = idf(docFreq=249, maxDocs=44218)
                0.014373422 = queryNorm
              1.1578925 = fieldWeight in 4019, product of:
                3.0 = tf(freq=9.0), with freq of:
                  9.0 = termFreq=9.0
                6.1754265 = idf(docFreq=249, maxDocs=44218)
                0.0625 = fieldNorm(doc=4019)
        0.24 = coord(6/25)
    
  4. Teich, E.; Degaetano-Ortlieb, S.; Fankhauser, P.; Kermes, H.; Lapshinova-Koltunski, E.: The linguistic construal of disciplinarity : a data-mining approach using register features (2016) 0.20
    0.1976123 = sum of:
      0.1976123 = product of:
        0.7057582 = sum of:
          0.052976508 = weight(abstract_txt:speech in 3015) [ClassicSimilarity], result of:
            0.052976508 = score(doc=3015,freq=1.0), product of:
              0.09872491 = queryWeight, product of:
                6.8685737 = idf(docFreq=124, maxDocs=44218)
                0.014373422 = queryNorm
              0.5366073 = fieldWeight in 3015, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.8685737 = idf(docFreq=124, maxDocs=44218)
                0.078125 = fieldNorm(doc=3015)
          0.06427821 = weight(abstract_txt:natural in 3015) [ClassicSimilarity], result of:
            0.06427821 = score(doc=3015,freq=1.0), product of:
              0.16197728 = queryWeight, product of:
                2.218576 = boost
                5.0794845 = idf(docFreq=747, maxDocs=44218)
                0.014373422 = queryNorm
              0.39683473 = fieldWeight in 3015, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.0794845 = idf(docFreq=747, maxDocs=44218)
                0.078125 = fieldNorm(doc=3015)
          0.079599224 = weight(abstract_txt:construction in 3015) [ClassicSimilarity], result of:
            0.079599224 = score(doc=3015,freq=1.0), product of:
              0.1867888 = queryWeight, product of:
                2.382444 = boost
                5.4546638 = idf(docFreq=513, maxDocs=44218)
                0.014373422 = queryNorm
              0.4261456 = fieldWeight in 3015, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.4546638 = idf(docFreq=513, maxDocs=44218)
                0.078125 = fieldNorm(doc=3015)
          0.06764535 = weight(abstract_txt:language in 3015) [ClassicSimilarity], result of:
            0.06764535 = score(doc=3015,freq=2.0), product of:
              0.14639957 = queryWeight, product of:
                2.435491 = boost
                4.1820874 = idf(docFreq=1834, maxDocs=44218)
                0.014373422 = queryNorm
              0.46205974 = fieldWeight in 3015, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.1820874 = idf(docFreq=1834, maxDocs=44218)
                0.078125 = fieldNorm(doc=3015)
          0.13107888 = weight(abstract_txt:text in 3015) [ClassicSimilarity], result of:
            0.13107888 = score(doc=3015,freq=3.0), product of:
              0.23954397 = queryWeight, product of:
                4.1212435 = boost
                4.0438666 = idf(docFreq=2106, maxDocs=44218)
                0.014373422 = queryNorm
              0.54720175 = fieldWeight in 3015, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                4.0438666 = idf(docFreq=2106, maxDocs=44218)
                0.078125 = fieldNorm(doc=3015)
          0.11766951 = weight(abstract_txt:processing in 3015) [ClassicSimilarity], result of:
            0.11766951 = score(doc=3015,freq=1.0), product of:
              0.30539662 = queryWeight, product of:
                4.308185 = boost
                4.931848 = idf(docFreq=866, maxDocs=44218)
                0.014373422 = queryNorm
              0.38530064 = fieldWeight in 3015, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.931848 = idf(docFreq=866, maxDocs=44218)
                0.078125 = fieldNorm(doc=3015)
          0.19251055 = weight(abstract_txt:mining in 3015) [ClassicSimilarity], result of:
            0.19251055 = score(doc=3015,freq=1.0), product of:
              0.39902264 = queryWeight, product of:
                4.4954214 = boost
                6.1754265 = idf(docFreq=249, maxDocs=44218)
                0.014373422 = queryNorm
              0.4824552 = fieldWeight in 3015, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.1754265 = idf(docFreq=249, maxDocs=44218)
                0.078125 = fieldNorm(doc=3015)
        0.28 = coord(7/25)
    
  5. Taylor, S.L.: Integrating natural language understanding with document structure analysis (1994) 0.19
    0.18588866 = sum of:
      0.18588866 = product of:
        0.77453613 = sum of:
          0.06939748 = weight(abstract_txt:document in 1794) [ClassicSimilarity], result of:
            0.06939748 = score(doc=1794,freq=5.0), product of:
              0.07711984 = queryWeight, product of:
                1.2499272 = boost
                4.2926083 = idf(docFreq=1642, maxDocs=44218)
                0.014373422 = queryNorm
              0.8998654 = fieldWeight in 1794, product of:
                2.236068 = tf(freq=5.0), with freq of:
                  5.0 = termFreq=5.0
                4.2926083 = idf(docFreq=1642, maxDocs=44218)
                0.09375 = fieldNorm(doc=1794)
          0.041845407 = weight(abstract_txt:applications in 1794) [ClassicSimilarity], result of:
            0.041845407 = score(doc=1794,freq=1.0), product of:
              0.09412223 = queryWeight, product of:
                1.3808539 = boost
                4.7422485 = idf(docFreq=1047, maxDocs=44218)
                0.014373422 = queryNorm
              0.4445858 = fieldWeight in 1794, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.7422485 = idf(docFreq=1047, maxDocs=44218)
                0.09375 = fieldNorm(doc=1794)
          0.10908373 = weight(abstract_txt:natural in 1794) [ClassicSimilarity], result of:
            0.10908373 = score(doc=1794,freq=2.0), product of:
              0.16197728 = queryWeight, product of:
                2.218576 = boost
                5.0794845 = idf(docFreq=747, maxDocs=44218)
                0.014373422 = queryNorm
              0.6734508 = fieldWeight in 1794, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                5.0794845 = idf(docFreq=747, maxDocs=44218)
                0.09375 = fieldNorm(doc=1794)
          0.081174426 = weight(abstract_txt:language in 1794) [ClassicSimilarity], result of:
            0.081174426 = score(doc=1794,freq=2.0), product of:
              0.14639957 = queryWeight, product of:
                2.435491 = boost
                4.1820874 = idf(docFreq=1834, maxDocs=44218)
                0.014373422 = queryNorm
              0.55447173 = fieldWeight in 1794, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.1820874 = idf(docFreq=1834, maxDocs=44218)
                0.09375 = fieldNorm(doc=1794)
          0.15729466 = weight(abstract_txt:text in 1794) [ClassicSimilarity], result of:
            0.15729466 = score(doc=1794,freq=3.0), product of:
              0.23954397 = queryWeight, product of:
                4.1212435 = boost
                4.0438666 = idf(docFreq=2106, maxDocs=44218)
                0.014373422 = queryNorm
              0.6566421 = fieldWeight in 1794, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                4.0438666 = idf(docFreq=2106, maxDocs=44218)
                0.09375 = fieldNorm(doc=1794)
          0.31574044 = weight(abstract_txt:processing in 1794) [ClassicSimilarity], result of:
            0.31574044 = score(doc=1794,freq=5.0), product of:
              0.30539662 = queryWeight, product of:
                4.308185 = boost
                4.931848 = idf(docFreq=866, maxDocs=44218)
                0.014373422 = queryNorm
              1.0338701 = fieldWeight in 1794, product of:
                2.236068 = tf(freq=5.0), with freq of:
                  5.0 = termFreq=5.0
                4.931848 = idf(docFreq=866, maxDocs=44218)
                0.09375 = fieldNorm(doc=1794)
        0.24 = coord(6/25)
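
The final score of each hit in the list is the sum of its term scores multiplied by the coord factor (matched query terms divided by the 25 query terms). The sketch below, with values copied from the listing and illustrative names that are not part of Lucene, applies that step to all five hits and sorts them. It shows how coord can change the order: doc 4019 has the largest term sum but matches only 6 of 25 terms.

```python
# (doc id, sum of term scores, number of matched query terms),
# all copied from the listing
HITS = [
    (3412, 0.73687506, 8),
    (5653, 0.7580674, 7),
    (4019, 0.8810497, 6),
    (3015, 0.7057582, 7),
    (1794, 0.77453613, 6),
]
QUERY_TERMS = 25  # denominator of every coord(n/25) line


def final_score(term_sum, matched, total=QUERY_TERMS):
    """Apply the coord factor to the summed term scores."""
    return term_sum * (matched / total)


scores = {doc: final_score(term_sum, matched) for doc, term_sum, matched in HITS}

# Rank by final score, highest first
ranking = sorted(scores, key=scores.get, reverse=True)
for doc in ranking:
    print(doc, round(scores[doc], 2))
```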