Search (274 results, page 1 of 14)

  • theme_ss:"Computerlinguistik"
  1. Hotho, A.; Bloehdorn, S.: Data Mining 2004 : Text classification by boosting weak learners based on terms and concepts (2004) 0.33
    0.3290356 = 0.4375 (coord 7/16) · 0.7520814 (sum of seven term weights)
      Each term weight = queryWeight · fieldWeight = (idf · queryNorm) · (tf · idf · fieldNorm).
      E.g. weight(_text_:2f in 562) = (8.478011 · 0.032090448) · (1.4142135 · 8.478011 · 0.046875) = 0.27206317 · 0.56201804 = 0.1529044, with tf = sqrt(2) for freq=2.0, idf from docFreq=24 of maxDocs=44218, and fieldNorm = 0.046875.
    
    Content
    Cf.: http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.91.4940&rep=rep1&type=pdf.
    Date
    8. 1.2013 10:22:32
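
    The score detail above follows Lucene's ClassicSimilarity. A minimal sketch that reproduces the 0.1529044 term weight from the listed values (assuming tf = sqrt(freq), idf = 1 + ln(maxDocs/(docFreq+1)), and queryWeight = idf · queryNorm):

      import math

      # Reconstructs one term weight of the score detail shown for result 1.
      def term_score(freq, doc_freq, max_docs, query_norm, field_norm):
          tf = math.sqrt(freq)                           # 1.4142135 for freq=2.0
          idf = 1 + math.log(max_docs / (doc_freq + 1))  # 8.478011 for docFreq=24
          query_weight = idf * query_norm                # 0.27206317
          field_weight = tf * idf * field_norm           # 0.56201804 (fieldWeight)
          return query_weight * field_weight

      print(term_score(2.0, 24, 44218, 0.032090448, 0.046875))  # ~0.1529044
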
  2. Noever, D.; Ciolino, M.: The Turing deception (2022) 0.28
    
    Source
    https://arxiv.org/abs/2212.06721
  3. Huo, W.: Automatic multi-word term extraction and its application to Web-page summarization (2012) 0.20
    
    Content
    A Thesis presented to The University of Guelph in partial fulfilment of requirements for the degree of Master of Science in Computer Science. Cf.: http://www.inf.ufrgs.br/~ceramisch/download_files/publications/2009/p01.pdf.
    Date
    10. 1.2013 19:22:47
  4. Luo, L.; Ju, J.; Li, Y.-F.; Haffari, G.; Xiong, B.; Pan, S.: ChatRule: mining logical rules with large language models for knowledge graph reasoning (2023) 0.04
    
    Abstract
    Logical rules are essential for uncovering the logical connections between relations, which can improve reasoning performance and provide interpretable results on knowledge graphs (KGs). Although there have been many efforts to mine meaningful logical rules over KGs, existing methods suffer from computationally intensive searches over the rule space and a lack of scalability for large-scale KGs. Besides, they often ignore the semantics of relations, which are crucial for uncovering logical connections. Recently, large language models (LLMs) have shown impressive performance in natural language processing and various applications, owing to their emergent abilities and generalizability. In this paper, we propose a novel framework, ChatRule, unleashing the power of large language models for mining logical rules over knowledge graphs. Specifically, the framework is initiated with an LLM-based rule generator, leveraging both the semantic and structural information of KGs to prompt LLMs to generate logical rules. To refine the generated rules, a rule ranking module estimates rule quality by incorporating facts from existing KGs. Finally, a rule validator harnesses the reasoning ability of LLMs to validate the logical correctness of the ranked rules through chain-of-thought reasoning. ChatRule is evaluated on four large-scale KGs with respect to different rule-quality metrics and downstream tasks, showing the effectiveness and scalability of our method.
    Date
    23.11.2023 19:07:22
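
    A schematic sketch of the three-stage pipeline the abstract describes (generate, rank, validate). The call_llm and support stubs and the prompt wording are placeholders, not the paper's actual implementation:

      def call_llm(prompt: str) -> str:
          # Placeholder: the paper's prompts and model choice are not reproduced here.
          raise NotImplementedError

      def support(rule: str, kg_facts: set) -> int:
          # Placeholder: count existing KG facts consistent with the rule.
          raise NotImplementedError

      def chatrule(kg_relations: list, kg_facts: set) -> list:
          # 1. Rule generator: prompt the LLM with semantic and structural cues from the KG.
          drafts = call_llm(f"Propose logical rules over: {kg_relations}").splitlines()
          # 2. Rule ranking: estimate rule quality from facts in the existing KG.
          ranked = sorted(drafts, key=lambda r: support(r, kg_facts), reverse=True)
          # 3. Rule validator: chain-of-thought check of logical correctness.
          return [r for r in ranked
                  if "yes" in call_llm(f"Check step by step: is '{r}' sound?").lower()]
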
  5. Ahmad, F.; Yusoff, M.; Sembok, T.M.T.: Experiments with a stemming algorithm for Malay words (1996) 0.03
    
    Abstract
    Stemming is used in information retrieval systems to reduce variant word forms to common roots in order to improve retrieval effectiveness. As in other languages, there is a need for an effective stemming algorithm for the indexing and retrieval of Malay documents. The Malay stemming algorithm developed by Othman is studied and new versions are proposed to enhance its performance. The improvements relate to the order in which the dictionary is looked up, the order in which the morphological rules are applied, and the number of rules.
    Source
    Journal of the American Society for Information Science. 47(1996) no.12, S.909-918
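
    A minimal sketch of the dictionary-checked affix stripping the abstract describes; the affix lists and the root-word dictionary are illustrative stand-ins, not Othman's actual rule set:

      # Illustrative Malay affixes; the real rule set is larger and its order matters.
      PREFIXES = ["meng", "mem", "men", "me", "ber", "di"]
      SUFFIXES = ["kan", "an", "i"]
      DICTIONARY = {"ajar", "baca", "makan"}  # stand-in root-word dictionary

      def stem(word: str) -> str:
          if word in DICTIONARY:             # dictionary lookup comes first
              return word
          for p in PREFIXES:                 # rule order affects the result
              if word.startswith(p) and word[len(p):] in DICTIONARY:
                  return word[len(p):]
          for s in SUFFIXES:
              if word.endswith(s) and word[:-len(s)] in DICTIONARY:
                  return word[:-len(s)]
          return word

      print(stem("diajar"))  # -> "ajar"
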
  6. Whitelock, P.; Kilby, K.: Linguistic and computational techniques in machine translation system design : 2nd ed (1995) 0.03
    
    Date
    26. 7.2002 21:21:16
  7. Losee, R.M.: Learning syntactic rules and tags with genetic algorithms for information retrieval and filtering : an empirical basis for grammatical rules (1996) 0.02
    
    Abstract
    The grammars of natural languages may be learned by using genetic algorithms that reproduce and mutate grammatical rules and parts-of-speech tags, improving the quality of later generations of grammatical components. Syntactic rules are randomly generated and then evolve; those rules resulting in improved parsing and occasionally improved filtering performance are allowed to propagate further. The LUST system learns the characteristics of the language or sublanguage used in document abstracts by learning from the document rankings obtained from the parsed abstracts. Unlike the application of traditional linguistic rules to retrieval and filtering applications, LUST develops grammatical structures and tags without the prior imposition of some common grammatical assumptions (e.g. part-of-speech assumptions), producing grammars that are empirically based and optimized for this particular application.
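
    A toy genetic-algorithm loop in the spirit of the abstract; the integer rule encoding and the fitness stub (which would measure parsing/filtering performance) are illustrative only:

      import random

      def fitness(rules):
          # Stand-in: would score retrieval/filtering quality of parses under these rules.
          return -abs(sum(rules) - 10)

      def mutate(rules):
          i = random.randrange(len(rules))
          return rules[:i] + [random.randint(0, 5)] + rules[i + 1:]

      def evolve(pop_size=20, generations=50, rule_len=5):
          pop = [[random.randint(0, 5) for _ in range(rule_len)] for _ in range(pop_size)]
          for _ in range(generations):
              pop.sort(key=fitness, reverse=True)
              survivors = pop[: pop_size // 2]   # better-performing rules propagate
              pop = survivors + [mutate(random.choice(survivors))
                                 for _ in range(pop_size - len(survivors))]
          return max(pop, key=fitness)
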
  8. Driscoll, J.R.; Rajala, D.A.; Shaffer, W.H.: The operation and performance of an artificially intelligent keywording system (1991) 0.02
    
    Abstract
    Presents a new approach to text analysis for automating the key phrase indexing process, using artificial intelligence techniques. This mimics the behaviour of human experts by using a rule base consisting of insertion and deletion rules generated by subject-matter experts. The insertion rules are based on the idea that some phrases found in a text imply or trigger other phrases. The deletion rules apply to semantically ambiguous phrases where text presence alone does not determine appropriateness as a key phrase. The insertion and deletion rules are used to transform a list of found phrases to a list of key phrases for indexing a document. Statistical data are provided to demonstrate the performance of this expert rule based system
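
    A minimal sketch of the insertion/deletion rule transform described above; the example rules are invented for illustration:

      # Hypothetical rules: a found phrase triggers other phrases (insertion),
      # and semantically ambiguous phrases are dropped (deletion).
      INSERTION_RULES = {"interest rate": ["banking", "finance"]}
      DELETION_RULES = {"interest"}  # ambiguous on its own

      def key_phrases(found_phrases):
          phrases = set(found_phrases)
          for phrase in found_phrases:
              phrases.update(INSERTION_RULES.get(phrase, []))
          return phrases - DELETION_RULES

      print(sorted(key_phrases(["interest", "interest rate"])))
      # -> ['banking', 'finance', 'interest rate']
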
  9. Hutchins, J.: A new era in machine translation research (1995) 0.02
    
    Abstract
    In the 1980s the dominant framework for machine translation research was the approach based on essentially linguistic rules. Describes the new approaches of the 1990s, which are based on large text corpora, the alignment of bilingual texts, the use of statistical methods, and the use of parallel corpora for example-based translation. Most systems are now designed for specialized applications, such as systems restricted to controlled languages, to a sublanguage or to a specific domain, to a particular organization or to a particular user type. In addition, the field is widening, with research under way on speech translation, on systems for monolingual users not knowing target languages, on systems for multilingual generation directly from structured databases, and in general on uses other than those traditionally associated with translation services.
    Date
    8. 4.1996 11:08:26
  10. Melucci, M.; Orio, N.: Design, implementation, and evaluation of a methodology for automatic stemmer generation (2007) 0.02
    
    Abstract
    The authors describe a statistical approach based on hidden Markov models (HMMs) for generating stemmers automatically. The proposed approach requires little effort to insert new languages into the system, even if only minimal linguistic knowledge is available. This is a key advantage, especially for digital libraries, which are often developed for a specific institution or government, because the program can manage a great number of documents written in local languages. The evaluation described in the article shows that the stemmers implemented by means of HMMs are as effective as those based on linguistic rules.
    Source
    Journal of the American Society for Information Science and Technology. 58(2007) no.5, S.673-686
  11. Chandrasekar, R.; Srinivas, B.: Automatic induction of rules for text simplification (1997) 0.02
    
    Abstract
    Explores methods to automatically transform sentences in order to make them simpler. These methods involve the use of a rule-based system, driven by the syntax of the text in the domain of interest. Hand-crafting rules for every domain is time-consuming and impractical. Describes an algorithm and an implementation by which generalized rules for simplification are automatically induced from annotated training materials using a novel partial parsing technique, which combines constituent structure and dependency information. The algorithm employs example-based generalisations on linguistically motivated structures
  12. Karakos, A.: Greeklish : an experimental interface for automatic transliteration (2003) 0.02
    
    Abstract
    "Transliteration" in linguistics means the system of conveying as nearly as possible by means of one set of letters or characters the pronunciation of the words in languages written and printed in a totally different script. This term may be applied to a transcription in Latin letters of Greek, Hebrew, or the Slavonic languages written in the Cyrillic alphabet. We present in this article Greeklish, a Windows application that automatically produces English to Greek transliteration and back-transliteration (retransliteration). This transliteration is based an an algorithm with a table of associations between the two character sets. This table can be modified by the user so that it can cover personal preferences or formal present and future rules. The novelty of this system is its speed of operation, its simplicity, and its ease of use. Our examples use a Greek to Latin (English) alphabet mapping, but the Greeklish application can easily use any X to Latin mapping, where X is any non-Latin alphabet.
    Source
    Journal of the American Society for Information Science and technology. 54(2003) no.11, S.1069-1074
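
    A minimal sketch of the table-driven transliteration described above; the mapping is a small illustrative subset, not the application's full user-editable table:

      # Illustrative Greek-to-Latin associations; the real table is user-modifiable.
      GREEK_TO_LATIN = {"α": "a", "β": "b", "γ": "g", "δ": "d", "ε": "e",
                        "θ": "th", "λ": "l", "ο": "o", "σ": "s", "ς": "s"}

      def transliterate(text: str, table: dict) -> str:
          out, i = [], 0
          while i < len(text):
              # Prefer the longest matching key (e.g. a digraph before a single letter).
              for key in sorted(table, key=len, reverse=True):
                  if text.startswith(key, i):
                      out.append(table[key])
                      i += len(key)
                      break
              else:
                  out.append(text[i])  # unmapped characters pass through
                  i += 1
          return "".join(out)

      print(transliterate("θεος", GREEK_TO_LATIN))  # -> "theos"
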
  13. Ahmed, F.; Nürnberger, A.: Evaluation of n-gram conflation approaches for Arabic text retrieval (2009) 0.02
    
    Abstract
    In this paper we present a language-independent approach for conflation that does not depend on predefined rules or prior knowledge of the target language. The proposed unsupervised method is based on an enhancement of the pure n-gram model that can group related words based on various string-similarity measures, while restricting the search to specific locations of the target word by taking into account the order of n-grams. We show that the method is effective to achieve high score similarities for all word-form variations and reduces the ambiguity, i.e., obtains a higher precision and recall, compared to pure n-gram-based approaches for English, Portuguese, and Arabic. The proposed method is especially suited for conflation approaches in Arabic, since Arabic is a highly inflectional language. Therefore, we present in addition an adaptive user interface for Arabic text retrieval called araSearch. araSearch serves as a metasearch interface to existing search engines. The system is able to extend a query using the proposed conflation approach such that additional results for relevant subwords can be found automatically.
    Source
    Journal of the American Society for Information Science and Technology. 60(2009) no.7, S.1448-1465
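
    A minimal sketch of grouping word-form variants by character n-gram overlap (Dice coefficient); the order-sensitive restriction described in the abstract is not reproduced here:

      def ngrams(word: str, n: int = 2) -> set:
          return {word[i:i + n] for i in range(len(word) - n + 1)}

      def dice(a: str, b: str) -> float:
          ga, gb = ngrams(a), ngrams(b)
          return 2 * len(ga & gb) / (len(ga) + len(gb))

      def conflate(words, threshold=0.5):
          groups = []
          for w in words:
              for g in groups:
                  if dice(w, g[0]) >= threshold:  # compare against a representative
                      g.append(w)
                      break
              else:
                  groups.append([w])
          return groups

      print(conflate(["connect", "connected", "connection", "kitten"]))
      # -> [['connect', 'connected', 'connection'], ['kitten']]
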
  14. Lee, K.H.; Ng, M.K.M.; Lu, Q.: Text segmentation for Chinese spell checking (1999) 0.02
    
    Abstract
    Chinese spell checking is different from its counterparts for Western languages because Chinese words in texts are not separated by spaces. Chinese spell checking in this article refers to how to identify the misuse of characters in text composition. In other words, it is error correction at the word level rather than at the character level. Before Chinese sentences are spell checked, the text is segmented into semantic units. Error detection can then be carried out on the segmented text based on thesaurus and grammar rules. Segmentation is not a trivial process due to ambiguities in the Chinese language and errors in texts. Because it is not practical to define all Chinese words in a dictionary, words not predefined must also be dealt with. The number of word combinations increases exponentially with the length of the sentence. In this article, a Block-of-Combinations (BOC) segmentation method based on frequency of word usage is proposed to reduce the word combinations from exponential growth to linear growth. From experiments carried out on Hong Kong newspapers, BOC can correctly solve 10% more ambiguities than the Maximum Match segmentation method. To make the segmentation more suitable for spell checking, user interaction is also suggested
    Source
    Journal of the American Society for Information Science. 50(1999) no.9, S.751-759
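
    A minimal sketch of the Maximum Match baseline mentioned in the abstract (the proposed BOC method additionally weights combinations by word-usage frequency); the dictionary is a stand-in:

      DICT = {"中国", "人民", "中", "国", "人", "民"}  # stand-in word list

      def max_match(text: str, max_len: int = 4) -> list:
          # Greedy longest-match segmentation from left to right.
          words, i = [], 0
          while i < len(text):
              for length in range(min(max_len, len(text) - i), 0, -1):
                  if text[i:i + length] in DICT:
                      words.append(text[i:i + length])
                      i += length
                      break
              else:
                  words.append(text[i])  # unknown character: emit as-is
                  i += 1
          return words

      print(max_match("中国人民"))  # -> ['中国', '人民']
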
  15. Pinker, S.: Wörter und Regeln : Die Natur der Sprache (2000) 0.01
    
    Content
    Original title: Words and rules (1999)
    Date
    19. 7.2002 14:22:31
  16. Xianghao, G.; Yixin, Z.; Li, Y.: A new method of news text understanding and abstracting based on speech acts theory (1998) 0.01
    
    Abstract
    Presents a method for the automated analysis and comprehension of foreign affairs news produced by a Chinese news agency. Notes that the development of the method was preceded by a study of the structuring rules of the news. Describes how an abstract of the news story is produced automatically from the analysis. Stresses the main aim of the work, which is to use speech act theory to analyse and classify sentences.
  17. Sharada, B.A.: Rules derivation for Kannada based indexing language using transformational grammar (1998) 0.01
    
  18. Olsen, K.A.; Williams, J.G.: Spelling and grammar checking using the Web as a text repository (2004) 0.01
    
    Abstract
    Natural languages are both complex and dynamic. They are in part formalized through dictionaries and grammar. Dictionaries attempt to provide definitions and examples of various usages for all the words in a language. Grammar, on the other hand, is the system of rules that defines the structure of a language and is concerned with the correct use and application of the language in speaking or writing. The fact that these two mechanisms lag behind the language as currently used is not a serious problem for those living in a language culture and talking their native language. However, the correct choice of words, expressions, and word relationships is much more difficult when speaking or writing in a foreign language. The basics of the grammar of a language may have been learned in school decades ago, and even then there were always several choices for the correct expression of an idea, fact, opinion, or emotion. Although many different parts of speech and their relationships can make for difficult language decisions, prepositions tend to be problematic for nonnative speakers of English, and, in reality, prepositions are a major problem in most languages. Does a speaker or writer say "in the West Coast" or "on the West Coast," or perhaps "at the West Coast"? In Norwegian, we are "in" a city, but "at" a place. But the distinction between cities and places is vague. To be absolutely correct, one really has to learn the right preposition for every single place. A simplistic way of resolving these language issues is to ask a native speaker. But even native speakers may disagree about the right choice of words. If there is disagreement, then one will have to ask more than one native speaker, treat his/her response as a vote for a particular choice, and perhaps choose the majority choice as the best possible alternative. In real life, such a procedure may be impossible or impractical, but in the electronic world, as we shall see, this is quite easy to achieve. Using the vast text repository of the Web, we may get a significant voting base for even the most detailed and distinct phrases. We shall start by introducing a set of examples to present our idea of using the text repository on the Web to aid in making the best word selection, especially for the use of prepositions. Then we will present a more general discussion of the possibilities and limitations of using the Web as an aid for correct writing.
    Source
    Journal of the American Society for Information Science and Technology. 55(2004) no.11, S.1020-1023
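
    A sketch of the hit-count voting idea from the abstract; web_hits is a stub, since any concrete search API is outside the scope of this note:

      def web_hits(phrase: str) -> int:
          # Stub: would return a search engine's result count for the exact phrase.
          raise NotImplementedError

      def best_preposition(template: str, candidates=("in", "on", "at")) -> str:
          # Each candidate phrase is "voted on" by its frequency in the Web text repository.
          counts = {p: web_hits(template.format(p)) for p in candidates}
          return max(counts, key=counts.get)

      # best_preposition("{} the West Coast")  # would pick the majority usage
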
  19. Levin, M.; Krawczyk, S.; Bethard, S.; Jurafsky, D.: Citation-based bootstrapping for large-scale author disambiguation (2012) 0.01
    
    Abstract
    We present a new, two-stage, self-supervised algorithm for author disambiguation in large bibliographic databases. In the first "bootstrap" stage, a collection of high-precision features is used to bootstrap a training set with positive and negative examples of coreferring authors. A supervised feature-based classifier is then trained on the bootstrap clusters and used to cluster the authors in a larger unlabeled dataset. Our self-supervised approach shares the advantages of unsupervised approaches (no need for expensive hand labels) as well as supervised approaches (a rich set of features that can be discriminatively trained). The algorithm disambiguates 54,000,000 author instances in Thomson Reuters' Web of Knowledge with B3 F1 of .807. We analyze parameters and features, particularly those from citation networks, which have not been deeply investigated in author disambiguation. The most important citation feature is self-citation, which can be approximated without expensive extraction of the full network. For the supervised stage, the minor improvement due to other citation features (increasing F1 from .748 to .767) suggests they may not be worth the trouble of extracting from databases that don't already have them. A lean feature set without expensive abstract and title features performs 130 times faster with about equal F1.
    Source
    Journal of the American Society for Information Science and Technology. 63(2012) no.5, S.1030-1047
  20. Litkowski, K.C.: Category development based on semantic principles (1997) 0.01
    
    Abstract
    Describes the beginnings of computerized information retrieval and text analysis, particularly from the perspective of the use of thesauri and cataloguing systems. Describes formalisations of linguistic principles in the development of formal grammars and semantics. Presents the principles for category development, based on research in linguistic formalism that continues with ever richer grammars and semantic formalisms. Describes the progress of these formalisms in the examination of the categories used in the Minnesota Contextual Content Analysis approach. Describes current research toward an integration of semantic principles into content analysis abstraction procedures for characterising the category of any text.

Types

  • a 235
  • m 25
  • el 13
  • s 9
  • x 4
  • p 3
  • d 1
