Search (103 results, page 2 of 6)

Mesquita, L.A.P.; Souza, R.R.; Baracho Porto, R.M.A.: Noun phrases in automatic indexing : a structural analysis of the distribution of relevant terms in doctoral theses (2014) 0.01
```
0.008542268 = product of:
  0.06833814 = sum of:
    0.033013538 = weight(_text_:author in 1442) [ClassicSimilarity], result of:
      0.033013538 = score(doc=1442,freq=2.0), product of:
        0.15482868 = queryWeight, product of:
          4.824759 = idf(docFreq=964, maxDocs=44218)
          0.032090448 = queryNorm
        0.21322623 = fieldWeight in 1442, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          4.824759 = idf(docFreq=964, maxDocs=44218)
          0.03125 = fieldNorm(doc=1442)
    0.035324603 = sum of:
      0.017933354 = weight(_text_:ed in 1442) [ClassicSimilarity], result of:
        0.017933354 = score(doc=1442,freq=2.0), product of:
          0.11411327 = queryWeight, product of:
            3.5559888 = idf(docFreq=3431, maxDocs=44218)
            0.032090448 = queryNorm
          0.15715398 = fieldWeight in 1442, product of:
            1.4142135 = tf(freq=2.0), with freq of:
              2.0 = termFreq=2.0
            3.5559888 = idf(docFreq=3431, maxDocs=44218)
            0.03125 = fieldNorm(doc=1442)
      0.017391251 = weight(_text_:22 in 1442) [ClassicSimilarity], result of:
        0.017391251 = score(doc=1442,freq=2.0), product of:
          0.11237528 = queryWeight, product of:
            3.5018296 = idf(docFreq=3622, maxDocs=44218)
            0.032090448 = queryNorm
          0.15476047 = fieldWeight in 1442, product of:
            1.4142135 = tf(freq=2.0), with freq of:
              2.0 = termFreq=2.0
            3.5018296 = idf(docFreq=3622, maxDocs=44218)
            0.03125 = fieldNorm(doc=1442)
  0.125 = coord(2/16)
```
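The score explanation above can be checked by hand. The following is a minimal sketch, assuming the formulas that the listed numbers imply: `tf = sqrt(freq)`, `idf = 1 + ln(maxDocs / (docFreq + 1))`, `queryWeight = idf * queryNorm`, and `fieldWeight = tf * idf * fieldNorm`. The helper names `idf` and `term_score` are hypothetical and not part of any library; `queryNorm` and `fieldNorm` are simply copied from the listing rather than derived.

```python
import math

# Constants copied from the explanation for doc 1442.
MAX_DOCS = 44218
QUERY_NORM = 0.032090448


def idf(doc_freq: int, max_docs: int = MAX_DOCS) -> float:
    """Inverse document frequency: 1 + ln(maxDocs / (docFreq + 1))."""
    return 1.0 + math.log(max_docs / (doc_freq + 1))


def term_score(freq: float, doc_freq: int, field_norm: float) -> float:
    """queryWeight * fieldWeight for a single matching term."""
    term_idf = idf(doc_freq)
    query_weight = term_idf * QUERY_NORM
    field_weight = math.sqrt(freq) * term_idf * field_norm  # tf = sqrt(freq)
    return query_weight * field_weight


# (freq, docFreq, fieldNorm) per matching term, read off the listing.
matches = {
    "author": (2.0, 964, 0.03125),
    "ed": (2.0, 3431, 0.03125),
    "22": (2.0, 3622, 0.03125),
}
term_scores = {term: term_score(*args) for term, args in matches.items()}

# coord(2/16): 2 of the 16 top-level query clauses matched.
total = sum(term_scores.values()) * (2 / 16)

for term, value in term_scores.items():
    print(f"{term}: {value:.9f}")
print(f"total: {total:.9f}")
```

The printed values should agree with the listing to about six decimal places; small differences in the last digits are expected because the search engine works in single precision while this sketch uses double precision.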
Abstract

The main objective of this research was to analyze whether relevant terms show a characteristic distribution over a scientific text that could serve as a criterion for automatic indexing. The terms considered in this study were only full noun phrases contained in the texts themselves. The corpus comprised 98 doctoral theses from the eight areas of knowledge at a single university. Initially, 20 full noun phrases were automatically extracted from each text as candidates for the most relevant terms, and the author of each text assigned a relevance value from 0 (not relevant) to 6 (highly relevant) to each of the 20 noun phrases sent. Only 22.1% of the noun phrases were considered not relevant. The relevance values assigned by the authors were associated with the positions of the terms in the text. Each full noun phrase found in the text was considered a valid linear position. The results show the values of this distribution for two types of position: linear, with values consolidated into ten equal consecutive parts; and structural, considering parts of the text (such as introduction, development and conclusion). As a result of considerable importance, all areas of knowledge related to the Natural Sciences showed one characteristic behavior in the distribution of relevant terms, and all areas related to the Social Sciences showed another characteristic behavior, distinct from that of the Natural Sciences. The difference in distribution behavior between the Natural and Social Sciences can be clearly visualized through graphs. All behaviors, including the general behavior of all areas of knowledge together, were characterized by polynomial equations and can be applied in the future as criteria for automatic indexing.
To date, this work is novel for two reasons: it presents a method for characterizing the distribution of relevant terms in a scientific text, and, through this method, it points out a quantitative difference between the Natural and Social Sciences.

Source

Knowledge organization in the 21st century: between historical patterns and future prospects. Proceedings of the Thirteenth International ISKO Conference 19-22 May 2014, Kraków, Poland. Ed.: Wieslaw Babik

Newman, D.J.; Block, S.: Probabilistic topic decomposition of an eighteenth-century American newspaper (2006) 0.01

0.008148067 = product of:
  0.06518453 = sum of:
    0.04996719 = weight(_text_:american in 5291) [ClassicSimilarity], result of:
      0.04996719 = score(doc=5291,freq=6.0), product of:
        0.10940785 = queryWeight, product of:
          3.4093587 = idf(docFreq=3973, maxDocs=44218)
          0.032090448 = queryNorm
        0.4567057 = fieldWeight in 5291, product of:
          2.4494898 = tf(freq=6.0), with freq of:
            6.0 = termFreq=6.0
          3.4093587 = idf(docFreq=3973, maxDocs=44218)
          0.0546875 = fieldNorm(doc=5291)
    0.015217344 = product of:
      0.030434689 = sum of:
        0.030434689 = weight(_text_:22 in 5291) [ClassicSimilarity], result of:
          0.030434689 = score(doc=5291,freq=2.0), product of:
            0.11237528 = queryWeight, product of:
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.032090448 = queryNorm
            0.2708308 = fieldWeight in 5291, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.0546875 = fieldNorm(doc=5291)
      0.5 = coord(1/2)
  0.125 = coord(2/16)

Abstract: We use a probabilistic mixture decomposition method to determine topics in the Pennsylvania Gazette, a major colonial U.S. newspaper from 1728-1800. We assess the value of several topic decomposition techniques for historical research and compare the accuracy and efficacy of various methods. After determining the topics covered by the 80,000 articles and advertisements in the entire 18th century run of the Gazette, we calculate how the prevalence of those topics changed over time, and give historically relevant examples of our findings. This approach reveals important information about the content of this colonial newspaper, and suggests the value of such approaches to a more complete understanding of early American print culture and society.
Date: 22. 7.2006 17:32:00
Source: Journal of the American Society for Information Science and Technology. 57(2006) no.6, S.753-767

Croft, W.B.: Automatic indexing : file organization and display for information retrieval (1989) 0.01

0.007953616 = product of:
  0.06362893 = sum of:
    0.04121224 = weight(_text_:american in 2412) [ClassicSimilarity], result of:
      0.04121224 = score(doc=2412,freq=2.0), product of:
        0.10940785 = queryWeight, product of:
          3.4093587 = idf(docFreq=3973, maxDocs=44218)
          0.032090448 = queryNorm
        0.3766845 = fieldWeight in 2412, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          3.4093587 = idf(docFreq=3973, maxDocs=44218)
          0.078125 = fieldNorm(doc=2412)
    0.022416692 = product of:
      0.044833384 = sum of:
        0.044833384 = weight(_text_:ed in 2412) [ClassicSimilarity], result of:
          0.044833384 = score(doc=2412,freq=2.0), product of:
            0.11411327 = queryWeight, product of:
              3.5559888 = idf(docFreq=3431, maxDocs=44218)
              0.032090448 = queryNorm
            0.39288494 = fieldWeight in 2412, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5559888 = idf(docFreq=3431, maxDocs=44218)
              0.078125 = fieldNorm(doc=2412)
      0.5 = coord(1/2)
  0.125 = coord(2/16)

Source: Indexing: the state of our knowledge and the state of our ignorance. Proceedings of the 20th Annual Meeting of the American Society of Indexers, New York City, May 13, 1988. Ed.: B.H. Weinberg

Biebricher, N.; Fuhr, N.; Lustig, G.; Schwantner, M.; Knorz, G.: The automatic indexing system AIR/PHYS : from research to application (1988) 0.01

0.006680132 = product of:
  0.10688211 = sum of:
    0.10688211 = sum of:
      0.06340399 = weight(_text_:ed in 1952) [ClassicSimilarity], result of:
        0.06340399 = score(doc=1952,freq=4.0), product of:
          0.11411327 = queryWeight, product of:
            3.5559888 = idf(docFreq=3431, maxDocs=44218)
            0.032090448 = queryNorm
          0.55562323 = fieldWeight in 1952, product of:
            2.0 = tf(freq=4.0), with freq of:
              4.0 = termFreq=4.0
            3.5559888 = idf(docFreq=3431, maxDocs=44218)
            0.078125 = fieldNorm(doc=1952)
      0.043478128 = weight(_text_:22 in 1952) [ClassicSimilarity], result of:
        0.043478128 = score(doc=1952,freq=2.0), product of:
          0.11237528 = queryWeight, product of:
            3.5018296 = idf(docFreq=3622, maxDocs=44218)
            0.032090448 = queryNorm
          0.38690117 = fieldWeight in 1952, product of:
            1.4142135 = tf(freq=2.0), with freq of:
              2.0 = termFreq=2.0
            3.5018296 = idf(docFreq=3622, maxDocs=44218)
            0.078125 = fieldNorm(doc=1952)
  0.0625 = coord(1/16)

Date: 16. 8.1998 12:51:22
Footnote: Reprinted in: Readings in information retrieval. Ed.: K. Sparck Jones and P. Willett. San Francisco: Morgan Kaufmann 1997. S.513-517.
Source: Proceedings of the 11th annual conference on research and development in information retrieval. Ed.: Y. Chiaramella

Salton, G.; Buckley, C.: Approaches to global text analysis (1990) 0.01

0.005567532 = product of:
  0.044540256 = sum of:
    0.028848568 = weight(_text_:american in 4901) [ClassicSimilarity], result of:
      0.028848568 = score(doc=4901,freq=2.0), product of:
        0.10940785 = queryWeight, product of:
          3.4093587 = idf(docFreq=3973, maxDocs=44218)
          0.032090448 = queryNorm
        0.26367915 = fieldWeight in 4901, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          3.4093587 = idf(docFreq=3973, maxDocs=44218)
          0.0546875 = fieldNorm(doc=4901)
    0.015691686 = product of:
      0.031383373 = sum of:
        0.031383373 = weight(_text_:ed in 4901) [ClassicSimilarity], result of:
          0.031383373 = score(doc=4901,freq=2.0), product of:
            0.11411327 = queryWeight, product of:
              3.5559888 = idf(docFreq=3431, maxDocs=44218)
              0.032090448 = queryNorm
            0.27501947 = fieldWeight in 4901, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5559888 = idf(docFreq=3431, maxDocs=44218)
              0.0546875 = fieldNorm(doc=4901)
      0.5 = coord(1/2)
  0.125 = coord(2/16)

Source: ASIS'90: Information in the year 2000, from research to applications. Proc. of the 53rd Annual Meeting of the American Society for Information Science, Toronto, Canada, 4.-8.11.1990. Ed. by Diana Henderson

Warner, A.J.: A linguistic approach to the automated hierarchical organization of phrases (1990) 0.01

0.005567532 = product of:
  0.044540256 = sum of:
    0.028848568 = weight(_text_:american in 4902) [ClassicSimilarity], result of:
      0.028848568 = score(doc=4902,freq=2.0), product of:
        0.10940785 = queryWeight, product of:
          3.4093587 = idf(docFreq=3973, maxDocs=44218)
          0.032090448 = queryNorm
        0.26367915 = fieldWeight in 4902, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          3.4093587 = idf(docFreq=3973, maxDocs=44218)
          0.0546875 = fieldNorm(doc=4902)
    0.015691686 = product of:
      0.031383373 = sum of:
        0.031383373 = weight(_text_:ed in 4902) [ClassicSimilarity], result of:
          0.031383373 = score(doc=4902,freq=2.0), product of:
            0.11411327 = queryWeight, product of:
              3.5559888 = idf(docFreq=3431, maxDocs=44218)
              0.032090448 = queryNorm
            0.27501947 = fieldWeight in 4902, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5559888 = idf(docFreq=3431, maxDocs=44218)
              0.0546875 = fieldNorm(doc=4902)
      0.5 = coord(1/2)
  0.125 = coord(2/16)

Source: ASIS'90: Information in the year 2000, from research to applications. Proc. of the 53rd Annual Meeting of the American Society for Information Science, Toronto, Canada, 4.-8.11.1990. Ed. by Diana Henderson

Correa, C.A.; Kobashi, N.Y.: A hybrid model of automatic indexing based on paraconsistent logic 0.00

0.0049976474 = product of:
  0.03998118 = sum of:
    0.026531162 = weight(_text_:26 in 3537) [ClassicSimilarity], result of:
      0.026531162 = score(doc=3537,freq=2.0), product of:
        0.113328174 = queryWeight, product of:
          3.5315237 = idf(docFreq=3516, maxDocs=44218)
          0.032090448 = queryNorm
        0.23410915 = fieldWeight in 3537, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          3.5315237 = idf(docFreq=3516, maxDocs=44218)
          0.046875 = fieldNorm(doc=3537)
    0.013450016 = product of:
      0.026900033 = sum of:
        0.026900033 = weight(_text_:ed in 3537) [ClassicSimilarity], result of:
          0.026900033 = score(doc=3537,freq=2.0), product of:
            0.11411327 = queryWeight, product of:
              3.5559888 = idf(docFreq=3431, maxDocs=44218)
              0.032090448 = queryNorm
            0.23573098 = fieldWeight in 3537, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5559888 = idf(docFreq=3431, maxDocs=44218)
              0.046875 = fieldNorm(doc=3537)
      0.5 = coord(1/2)
  0.125 = coord(2/16)

Source: Paradigms and conceptual systems in knowledge organization: Proceedings of the Eleventh International ISKO conference, Rome, 23-26 February 2010, ed. Claudio Gnoli, Indeks, Frankfurt M

Koch, T.: Experiments with automatic classification of WAIS databases and indexing of WWW : some results from the Nordic WAIS/WWW project (1994) 0.00

0.004886008 = product of:
  0.078176126 = sum of:
    0.078176126 = weight(_text_:2nd in 7209) [ClassicSimilarity], result of:
      0.078176126 = score(doc=7209,freq=2.0), product of:
        0.18010403 = queryWeight, product of:
          5.6123877 = idf(docFreq=438, maxDocs=44218)
          0.032090448 = queryNorm
        0.43406096 = fieldWeight in 7209, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          5.6123877 = idf(docFreq=438, maxDocs=44218)
          0.0546875 = fieldNorm(doc=7209)
  0.0625 = coord(1/16)

Source: Internet world and document delivery world international 94: Proceedings of the 2nd Annual Conference, London, May 1994

Garfield, E.; Sher, I.H.: KeyWords Plus: algorithmic derivative indexing (1993) 0.00

0.004121224 = product of:
  0.06593958 = sum of:
    0.06593958 = weight(_text_:american in 4341) [ClassicSimilarity], result of:
      0.06593958 = score(doc=4341,freq=2.0), product of:
        0.10940785 = queryWeight, product of:
          3.4093587 = idf(docFreq=3973, maxDocs=44218)
          0.032090448 = queryNorm
        0.60269517 = fieldWeight in 4341, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          3.4093587 = idf(docFreq=3973, maxDocs=44218)
          0.125 = fieldNorm(doc=4341)
  0.0625 = coord(1/16)

Source: Journal of the American Society for Information Science. 44(1993) no.5, S.298-299

Dattola, R.T.: FIRST: Flexible information retrieval system for text (1979) 0.00

0.004121224 = product of:
  0.06593958 = sum of:
    0.06593958 = weight(_text_:american in 5172) [ClassicSimilarity], result of:
      0.06593958 = score(doc=5172,freq=2.0), product of:
        0.10940785 = queryWeight, product of:
          3.4093587 = idf(docFreq=3973, maxDocs=44218)
          0.032090448 = queryNorm
        0.60269517 = fieldWeight in 5172, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          3.4093587 = idf(docFreq=3973, maxDocs=44218)
          0.125 = fieldNorm(doc=5172)
  0.0625 = coord(1/16)

Source: Journal of the American Society for Information Science. 30(1979), S.9-14

Damerau, F.J.: An experiment in automatic indexing (1965) 0.00

0.004121224 = product of:
  0.06593958 = sum of:
    0.06593958 = weight(_text_:american in 5464) [ClassicSimilarity], result of:
      0.06593958 = score(doc=5464,freq=2.0), product of:
        0.10940785 = queryWeight, product of:
          3.4093587 = idf(docFreq=3973, maxDocs=44218)
          0.032090448 = queryNorm
        0.60269517 = fieldWeight in 5464, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          3.4093587 = idf(docFreq=3973, maxDocs=44218)
          0.125 = fieldNorm(doc=5464)
  0.0625 = coord(1/16)

Source: American documentation. 16(1965), S.283-289

Bookstein, A.; Swanson, D.R.: Probabilistic models for automatic indexing (1974) 0.00

0.004121224 = product of:
  0.06593958 = sum of:
    0.06593958 = weight(_text_:american in 5466) [ClassicSimilarity], result of:
      0.06593958 = score(doc=5466,freq=2.0), product of:
        0.10940785 = queryWeight, product of:
          3.4093587 = idf(docFreq=3973, maxDocs=44218)
          0.032090448 = queryNorm
        0.60269517 = fieldWeight in 5466, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          3.4093587 = idf(docFreq=3973, maxDocs=44218)
          0.125 = fieldNorm(doc=5466)
  0.0625 = coord(1/16)

Source: Journal of the American Society for Information Science. 25(1974), S.312-318

Croft, W.B.: Clustering large files of documents using the single link method (1977) 0.00

0.004121224 = product of:
  0.06593958 = sum of:
    0.06593958 = weight(_text_:american in 5489) [ClassicSimilarity], result of:
      0.06593958 = score(doc=5489,freq=2.0), product of:
        0.10940785 = queryWeight, product of:
          3.4093587 = idf(docFreq=3973, maxDocs=44218)
          0.032090448 = queryNorm
        0.60269517 = fieldWeight in 5489, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          3.4093587 = idf(docFreq=3973, maxDocs=44218)
          0.125 = fieldNorm(doc=5489)
  0.0625 = coord(1/16)

Source: Journal of the American Society for Information Science. 28(1977), S.341-344

Plaunt, C.; Norgard, B.A.: An association-based method for automatic indexing with a controlled vocabulary (1998) 0.00

0.0039344565 = product of:
  0.031475652 = sum of:
    0.02060612 = weight(_text_:american in 1794) [ClassicSimilarity], result of:
      0.02060612 = score(doc=1794,freq=2.0), product of:
        0.10940785 = queryWeight, product of:
          3.4093587 = idf(docFreq=3973, maxDocs=44218)
          0.032090448 = queryNorm
        0.18834224 = fieldWeight in 1794, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          3.4093587 = idf(docFreq=3973, maxDocs=44218)
          0.0390625 = fieldNorm(doc=1794)
    0.010869532 = product of:
      0.021739064 = sum of:
        0.021739064 = weight(_text_:22 in 1794) [ClassicSimilarity], result of:
          0.021739064 = score(doc=1794,freq=2.0), product of:
            0.11237528 = queryWeight, product of:
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.032090448 = queryNorm
            0.19345059 = fieldWeight in 1794, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.0390625 = fieldNorm(doc=1794)
      0.5 = coord(1/2)
  0.125 = coord(2/16)

Date: 11. 9.2000 19:53:22
Source: Journal of the American Society for Information Science. 49(1998) no.10, S.888-902

Keller, A.: Attitudes among German- and English-speaking librarians toward (automatic) subject indexing (2015) 0.00
```
0.0036108557 = product of:
  0.05777369 = sum of:
    0.05777369 = weight(_text_:author in 2629) [ClassicSimilarity], result of:
      0.05777369 = score(doc=2629,freq=2.0), product of:
        0.15482868 = queryWeight, product of:
          4.824759 = idf(docFreq=964, maxDocs=44218)
          0.032090448 = queryNorm
        0.3731459 = fieldWeight in 2629, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          4.824759 = idf(docFreq=964, maxDocs=44218)
          0.0546875 = fieldNorm(doc=2629)
  0.0625 = coord(1/16)
```
Abstract

The survey described in this article investigates the attitudes of librarians in German- and English-speaking countries toward subject indexing in general, and automatic subject indexing in particular. The results show great similarity between attitudes in both language areas. Respondents agree that the current quality standards should be upheld and dismiss critical voices claiming that subject indexing has lost relevance. With regard to automatic subject indexing, respondents demonstrate considerable skepticism, both with regard to the likely timeframe and the expected quality of such systems. The author considers how this low acceptance poses a difficulty for those involved in change management.

Salton, G.: A new comparison between conventional indexing (MEDLARS) and automatic text processing (SMART) (1972) 0.00

0.003606071 = product of:
  0.057697136 = sum of:
    0.057697136 = weight(_text_:american in 2325) [ClassicSimilarity], result of:
      0.057697136 = score(doc=2325,freq=2.0), product of:
        0.10940785 = queryWeight, product of:
          3.4093587 = idf(docFreq=3973, maxDocs=44218)
          0.032090448 = queryNorm
        0.5273583 = fieldWeight in 2325, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          3.4093587 = idf(docFreq=3973, maxDocs=44218)
          0.109375 = fieldNorm(doc=2325)
  0.0625 = coord(1/16)

Source: Journal of the American Society for Information Science. 23(1972), S.75-84

Griffiths, A.; Luckhurst, H.C.; Willett, P.: Using interdocument similarity information in document retrieval systems (1986) 0.00

0.003606071 = product of:
  0.057697136 = sum of:
    0.057697136 = weight(_text_:american in 2415) [ClassicSimilarity], result of:
      0.057697136 = score(doc=2415,freq=2.0), product of:
        0.10940785 = queryWeight, product of:
          3.4093587 = idf(docFreq=3973, maxDocs=44218)
          0.032090448 = queryNorm
        0.5273583 = fieldWeight in 2415, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          3.4093587 = idf(docFreq=3973, maxDocs=44218)
          0.109375 = fieldNorm(doc=2415)
  0.0625 = coord(1/16)

Source: Journal of the American Society for Information Science. 37(1986) no.1, S.3-11

Hlava, M.M.: Automatic indexing : a matter of degree (2002) 0.00

0.003606071 = product of:
  0.057697136 = sum of:
    0.057697136 = weight(_text_:american in 2501) [ClassicSimilarity], result of:
      0.057697136 = score(doc=2501,freq=2.0), product of:
        0.10940785 = queryWeight, product of:
          3.4093587 = idf(docFreq=3973, maxDocs=44218)
          0.032090448 = queryNorm
        0.5273583 = fieldWeight in 2501, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          3.4093587 = idf(docFreq=3973, maxDocs=44218)
          0.109375 = fieldNorm(doc=2501)
  0.0625 = coord(1/16)

Source: Bulletin of the American Society for Information Science. 28(2002) no.1, S.12-15

Blank, I.; Rokach, L.; Shani, G.: Leveraging metadata to recommend keywords for academic papers (2016) 0.00
```
0.0034759352 = product of:
  0.055614963 = sum of:
    0.055614963 = weight(_text_:descriptive in 3232) [ClassicSimilarity], result of:
      0.055614963 = score(doc=3232,freq=2.0), product of:
        0.17974061 = queryWeight, product of:
          5.601063 = idf(docFreq=443, maxDocs=44218)
          0.032090448 = queryNorm
        0.3094179 = fieldWeight in 3232, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          5.601063 = idf(docFreq=443, maxDocs=44218)
          0.0390625 = fieldNorm(doc=3232)
  0.0625 = coord(1/16)
```
Abstract

Users of research databases, such as CiteSeerX, Google Scholar, and Microsoft Academic, often search for papers using a set of keywords. Unfortunately, many authors avoid listing sufficient keywords for their papers. As such, these applications may need to automatically associate good descriptive keywords with papers. When the full text of the paper is available this problem has been thoroughly studied. In many cases, however, due to copyright limitations, research databases do not have access to the full text. On the other hand, such databases typically maintain metadata, such as the title and abstract and the citation network of each paper. In this paper we study the problem of predicting which keywords are appropriate for a research paper, using different methods based on the citation network and available metadata. Our main goal is to provide search engines with the ability to extract keywords from the available metadata. However, our system can also be used for other applications, such as recommending keywords to the authors of new papers. We create a data set of research papers, and their citation network, keywords, and other metadata, containing over 470K papers and more than 2 million keywords. We compare our methods with predicting keywords using the title and abstract, in offline experiments and in a user study, concluding that the citation network provides much better predictions.

Lepsky, K.: Automatische Indexierung in der Inhaltserschließung (1998) 0.00

0.0033163952 = product of:
  0.053062323 = sum of:
    0.053062323 = weight(_text_:26 in 1283) [ClassicSimilarity], result of:
      0.053062323 = score(doc=1283,freq=2.0), product of:
        0.113328174 = queryWeight, product of:
          3.5315237 = idf(docFreq=3516, maxDocs=44218)
          0.032090448 = queryNorm
        0.4682183 = fieldWeight in 1283, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          3.5315237 = idf(docFreq=3516, maxDocs=44218)
          0.09375 = fieldNorm(doc=1283)
  0.0625 = coord(1/16)

Date: 11.12.2015 11:37:26
