Search (9942 results, page 1 of 498)

  1. Liu, Y.-H.; Dantzig, P.; Sachs, M.; Corey, J.T.; Hinnebusch, M.T.; Damashek, M.; Cohen, J.: Visualizing document classification : a search aid for the digital library (2000) 0.23
    0.2281493 = product of:
      0.30419907 = sum of:
        0.004478925 = product of:
          0.0179157 = sum of:
            0.0179157 = weight(_text_:based in 4431) [ClassicSimilarity], result of:
              0.0179157 = score(doc=4431,freq=2.0), product of:
                0.1345459 = queryWeight, product of:
                  3.0129938 = idf(docFreq=5906, maxDocs=44218)
                  0.044655222 = queryNorm
                0.13315678 = fieldWeight in 4431, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.0129938 = idf(docFreq=5906, maxDocs=44218)
                  0.03125 = fieldNorm(doc=4431)
          0.25 = coord(1/4)
        0.060764134 = weight(_text_:term in 4431) [ClassicSimilarity], result of:
          0.060764134 = score(doc=4431,freq=4.0), product of:
            0.20836261 = queryWeight, product of:
              4.66603 = idf(docFreq=1130, maxDocs=44218)
              0.044655222 = queryNorm
            0.29162687 = fieldWeight in 4431, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              4.66603 = idf(docFreq=1130, maxDocs=44218)
              0.03125 = fieldNorm(doc=4431)
        0.238956 = weight(_text_:freuqency in 4431) [ClassicSimilarity], result of:
          0.238956 = score(doc=4431,freq=2.0), product of:
            0.49137446 = queryWeight, product of:
              11.00374 = idf(docFreq=1, maxDocs=44218)
              0.044655222 = queryNorm
            0.4863012 = fieldWeight in 4431, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              11.00374 = idf(docFreq=1, maxDocs=44218)
              0.03125 = fieldNorm(doc=4431)
      0.75 = coord(3/4)
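    The nested breakdown above appears to be Lucene ClassicSimilarity explain() output: each matching query term contributes queryWeight x fieldWeight, and coord() rescales the sum by the fraction of query clauses that matched. A minimal sketch reproducing the "weight(_text_:based in 4431)" branch, assuming the classic tf-idf formulas (tf = sqrt(freq), idf = ln(maxDocs/(docFreq+1)) + 1); the constants are copied verbatim from the listing:
      import math

      # Constants copied from the "weight(_text_:based in 4431)" branch above.
      tf, idf = 1.4142135, 3.0129938           # tf = sqrt(freq), with freq = 2.0
      query_norm, field_norm = 0.044655222, 0.03125

      # idf is (approximately) ln(maxDocs / (docFreq + 1)) + 1:
      print(math.log(44218 / (5906 + 1)) + 1)   # ~3.013

      query_weight = idf * query_norm           # 0.1345459  = queryWeight
      field_weight = tf * idf * field_norm      # 0.13315678 = fieldWeight
      print(query_weight * field_weight)        # ~0.0179157 = this term's score
      # coord(3/4) = 0.75 then scales the sum of the three term scores, since
      # three of the four query clauses matched document 4431.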
    
    Abstract
    The recent explosion of the Internet and the WWW has made digital libraries popular. Easy access to a digital library is provided by commercially available Web browsers, which provide a user-friendly interface. To retrieve documents of interest, the user is provided with a search interface that may consist of only one input field and one push button. Most users type in a single keyword, click the button, and hope for the best. The result of a query using this kind of search interface can consist of a large unordered set of documents, or a ranked list of documents based on the frequency of the keywords. Both lists can contain articles unrelated to the user's inquiry unless a sophisticated search was performed and the user knows exactly what to look for. More sophisticated algorithms for ranking the search results according to how well they meet the user's needs, as expressed in the search input, may help. However, what is desperately needed are software tools that can analyze the search result and manipulate large hierarchies of data graphically. In this article we describe the design of a language-independent document classification system being developed to help users of the Florida Center for Library Automation analyze search query results. Easy access through the Web is provided, as well as a graphical user interface to display the classification results. We also describe the use of this system to retrieve and analyze sets of documents from public Web sites.
    Content
    "We use the term 'classification' to denote the general process of identifying the subject matter of a document. We use the term 'clustering' to refer to the process of forming groups (clusters) of documents with related topics and subtopics, and visualizing those clusters"
  2. Liu, R.-L.: Context-based term frequency assessment for text classification (2010) 0.20
    0.20405504 = product of:
      0.2720734 = sum of:
        0.011636588 = product of:
          0.04654635 = sum of:
            0.04654635 = weight(_text_:based in 3331) [ClassicSimilarity], result of:
              0.04654635 = score(doc=3331,freq=6.0), product of:
                0.1345459 = queryWeight, product of:
                  3.0129938 = idf(docFreq=5906, maxDocs=44218)
                  0.044655222 = queryNorm
                0.34595144 = fieldWeight in 3331, product of:
                  2.4494898 = tf(freq=6.0), with freq of:
                    6.0 = termFreq=6.0
                  3.0129938 = idf(docFreq=5906, maxDocs=44218)
                  0.046875 = fieldNorm(doc=3331)
          0.25 = coord(1/4)
        0.1822924 = weight(_text_:term in 3331) [ClassicSimilarity], result of:
          0.1822924 = score(doc=3331,freq=16.0), product of:
            0.20836261 = queryWeight, product of:
              4.66603 = idf(docFreq=1130, maxDocs=44218)
              0.044655222 = queryNorm
            0.8748806 = fieldWeight in 3331, product of:
              4.0 = tf(freq=16.0), with freq of:
                16.0 = termFreq=16.0
              4.66603 = idf(docFreq=1130, maxDocs=44218)
              0.046875 = fieldNorm(doc=3331)
        0.078144394 = product of:
          0.15628879 = sum of:
            0.15628879 = weight(_text_:assessment in 3331) [ClassicSimilarity], result of:
              0.15628879 = score(doc=3331,freq=6.0), product of:
                0.24654238 = queryWeight, product of:
                  5.52102 = idf(docFreq=480, maxDocs=44218)
                  0.044655222 = queryNorm
                0.63392264 = fieldWeight in 3331, product of:
                  2.4494898 = tf(freq=6.0), with freq of:
                    6.0 = termFreq=6.0
                  5.52102 = idf(docFreq=480, maxDocs=44218)
                  0.046875 = fieldNorm(doc=3331)
          0.5 = coord(1/2)
      0.75 = coord(3/4)
    
    Abstract
    Automatic text classification (TC) is essential for the management of information. To properly classify a document d, it is essential to identify the semantics of each term t in d, while the semantics heavily depend on the context (neighboring terms) of t in d. Therefore, we present a technique CTFA (Context-based Term Frequency Assessment) that improves text classifiers by considering term contexts in test documents. The results of the term context recognition are used to assess the term frequencies, and hence CTFA may easily work with various kinds of text classifiers that base their TC decisions on term frequencies, without needing to modify the classifiers. Moreover, CTFA is efficient, and neither huge memory nor domain-specific knowledge is required. Empirical results show that CTFA successfully enhances the performance of several kinds of text classifiers on different experimental data.
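    The abstract does not give the CTFA formulas, so the following is only a toy sketch of the general idea of boosting a raw term frequency when category-indicative cue words appear near the term; the function name context_boosted_tf, the window size, and the cue words are invented for illustration and are not the authors' algorithm:
      def context_boosted_tf(tokens, term, cue_words, window=3, boost=0.5):
          """Count occurrences of `term`, weighting each occurrence higher when
          category-indicative cue words appear in its context window."""
          weight = 0.0
          for i, tok in enumerate(tokens):
              if tok != term:
                  continue
              context = tokens[max(0, i - window):i] + tokens[i + 1:i + 1 + window]
              hits = sum(1 for c in context if c in cue_words)
              weight += 1.0 + boost * hits   # plain tf plus a context bonus
          return weight

      doc = "the bank raised the interest rate while the river bank flooded".split()
      print(context_boosted_tf(doc, "bank", {"interest", "rate", "raised"}))  # 3.0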
    Object
    Context-based Term Frequency Assessment
  3. Losee, R.M.: Determining information retrieval and filtering performance without experimentation (1995) 0.13
    0.13454795 = product of:
      0.17939726 = sum of:
        0.007838119 = product of:
          0.031352475 = sum of:
            0.031352475 = weight(_text_:based in 3368) [ClassicSimilarity], result of:
              0.031352475 = score(doc=3368,freq=2.0), product of:
                0.1345459 = queryWeight, product of:
                  3.0129938 = idf(docFreq=5906, maxDocs=44218)
                  0.044655222 = queryNorm
                0.23302436 = fieldWeight in 3368, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.0129938 = idf(docFreq=5906, maxDocs=44218)
                  0.0546875 = fieldNorm(doc=3368)
          0.25 = coord(1/4)
        0.15038356 = weight(_text_:term in 3368) [ClassicSimilarity], result of:
          0.15038356 = score(doc=3368,freq=8.0), product of:
            0.20836261 = queryWeight, product of:
              4.66603 = idf(docFreq=1130, maxDocs=44218)
              0.044655222 = queryNorm
            0.72173965 = fieldWeight in 3368, product of:
              2.828427 = tf(freq=8.0), with freq of:
                8.0 = termFreq=8.0
              4.66603 = idf(docFreq=1130, maxDocs=44218)
              0.0546875 = fieldNorm(doc=3368)
        0.02117558 = product of:
          0.04235116 = sum of:
            0.04235116 = weight(_text_:22 in 3368) [ClassicSimilarity], result of:
              0.04235116 = score(doc=3368,freq=2.0), product of:
                0.15637498 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.044655222 = queryNorm
                0.2708308 = fieldWeight in 3368, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.0546875 = fieldNorm(doc=3368)
          0.5 = coord(1/2)
      0.75 = coord(3/4)
    
    Abstract
    The performance of an information retrieval or text and media filtering system may be determined through analytic methods as well as by traditional simulation or experimental methods. These analytic methods can provide precise statements about expected performance. They can thus determine which of 2 similarly performing systems is superior. For both a single query term and a multiple query term retrieval model, a model for comparing the performance of different probabilistic retrieval methods is developed. This method may be used in computing the average search length for a query, given only knowledge of database parameter values. Describes predictive models for inverse document frequency, binary independence, and relevance feedback based retrieval and filtering. Simulations illustrate how the single term model performs, and sample performance predictions are given for single term and multiple term problems.
    Date
    22. 2.1996 13:14:10
  4. Chew, S.W.; Khoo, K.S.G.: Comparison of drug information on consumer drug review sites versus authoritative health information websites (2016) 0.12
    0.12356427 = product of:
      0.16475236 = sum of:
        0.005598656 = product of:
          0.022394624 = sum of:
            0.022394624 = weight(_text_:based in 2643) [ClassicSimilarity], result of:
              0.022394624 = score(doc=2643,freq=2.0), product of:
                0.1345459 = queryWeight, product of:
                  3.0129938 = idf(docFreq=5906, maxDocs=44218)
                  0.044655222 = queryNorm
                0.16644597 = fieldWeight in 2643, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.0129938 = idf(docFreq=5906, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=2643)
          0.25 = coord(1/4)
        0.05370841 = weight(_text_:term in 2643) [ClassicSimilarity], result of:
          0.05370841 = score(doc=2643,freq=2.0), product of:
            0.20836261 = queryWeight, product of:
              4.66603 = idf(docFreq=1130, maxDocs=44218)
              0.044655222 = queryNorm
            0.25776416 = fieldWeight in 2643, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              4.66603 = idf(docFreq=1130, maxDocs=44218)
              0.0390625 = fieldNorm(doc=2643)
        0.1054453 = sum of:
          0.07519447 = weight(_text_:assessment in 2643) [ClassicSimilarity], result of:
            0.07519447 = score(doc=2643,freq=2.0), product of:
              0.24654238 = queryWeight, product of:
                5.52102 = idf(docFreq=480, maxDocs=44218)
                0.044655222 = queryNorm
              0.30499613 = fieldWeight in 2643, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                5.52102 = idf(docFreq=480, maxDocs=44218)
                0.0390625 = fieldNorm(doc=2643)
          0.03025083 = weight(_text_:22 in 2643) [ClassicSimilarity], result of:
            0.03025083 = score(doc=2643,freq=2.0), product of:
              0.15637498 = queryWeight, product of:
                3.5018296 = idf(docFreq=3622, maxDocs=44218)
                0.044655222 = queryNorm
              0.19345059 = fieldWeight in 2643, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                3.5018296 = idf(docFreq=3622, maxDocs=44218)
                0.0390625 = fieldNorm(doc=2643)
      0.75 = coord(3/4)
    
    Abstract
    Large amounts of health-related information of different types are available on the web. In addition to authoritative health information sites maintained by government health departments and healthcare institutions, there are many social media sites carrying user-contributed information. This study sought to identify the types of drug information available on consumer-contributed drug review sites when compared with authoritative drug information websites. Content analysis was performed on the information available for nine drugs on three authoritative sites (RxList, eMC, and PDRhealth) as well as three drug review sites (WebMD, RateADrug, and PatientsLikeMe). The types of information found on authoritative sites but rarely on drug review sites include pharmacology, special population considerations, contraindications, and drug interactions. Types of information found only on drug review sites include drug efficacy, drug resistance experienced by long-term users, cost of the drug in relation to insurance coverage, availability of generic forms, comparison with other similar drugs and with other versions of the drug, difficulty in using the drug, and advice on coping with side effects. Drug efficacy ratings by users were found to be different across the three sites. Side effects were vividly described in context, with user assessment of severity based on discomfort and effect on their lives.
    Date
    22. 1.2016 12:24:05
  5. Vossen, G.A.: Strategic knowledge acquisition (1996) 0.12
    0.11532679 = product of:
      0.15376906 = sum of:
        0.0067183874 = product of:
          0.02687355 = sum of:
            0.02687355 = weight(_text_:based in 915) [ClassicSimilarity], result of:
              0.02687355 = score(doc=915,freq=2.0), product of:
                0.1345459 = queryWeight, product of:
                  3.0129938 = idf(docFreq=5906, maxDocs=44218)
                  0.044655222 = queryNorm
                0.19973516 = fieldWeight in 915, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.0129938 = idf(docFreq=5906, maxDocs=44218)
                  0.046875 = fieldNorm(doc=915)
          0.25 = coord(1/4)
        0.12890019 = weight(_text_:term in 915) [ClassicSimilarity], result of:
          0.12890019 = score(doc=915,freq=8.0), product of:
            0.20836261 = queryWeight, product of:
              4.66603 = idf(docFreq=1130, maxDocs=44218)
              0.044655222 = queryNorm
            0.618634 = fieldWeight in 915, product of:
              2.828427 = tf(freq=8.0), with freq of:
                8.0 = termFreq=8.0
              4.66603 = idf(docFreq=1130, maxDocs=44218)
              0.046875 = fieldNorm(doc=915)
        0.018150497 = product of:
          0.036300994 = sum of:
            0.036300994 = weight(_text_:22 in 915) [ClassicSimilarity], result of:
              0.036300994 = score(doc=915,freq=2.0), product of:
                0.15637498 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.044655222 = queryNorm
                0.23214069 = fieldWeight in 915, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.046875 = fieldNorm(doc=915)
          0.5 = coord(1/2)
      0.75 = coord(3/4)
    
    Abstract
    In the competitive equation for the future, economies become knowledge-based. Therefore, in Knowledge Intensive Firms (KIFs) the strategic management of knowledge becomes increasingly important. In this paper three important conditions for efficient and effective knowledge acquisition are identified: co-ordination, communication and long-term contracts. Research by the author showed that co-ordination is a relatively important condition for small and medium-sized industrial KIFs. For larger national and multinational industrial KIFs, communication and long-term contracts are relatively important conditions. Because of the lack of time for co-ordination and communication, a small or medium-sized KIF should welcome an external knowledge broker as intermediary. Because knowledge is more than R&D, a larger industrial KIF should adopt an approach to strategic knowledge management with an internal knowledge broker, who is responsible for co-ordination, communication and establishing long-term contracts. Furthermore, a Strategic Knowledge Network is an option in KIFs and between KIFs and partners for effective and efficient co-ordination, communication and long-term cont(r)acts.
    Source
    Knowledge management: organization competence and methodolgy. Proceedings of the Fourth International ISMICK Symposium, 21-22 October 1996, Netherlands. Ed.: J.F. Schreinemakers
  6. Haustein, S.; Sugimoto, C.; Larivière, V.: Social media in scholarly communication : Guest editorial (2015) 0.11
    0.11076471 = product of:
      0.14768629 = sum of:
        0.005818294 = product of:
          0.023273176 = sum of:
            0.023273176 = weight(_text_:based in 3809) [ClassicSimilarity], result of:
              0.023273176 = score(doc=3809,freq=6.0), product of:
                0.1345459 = queryWeight, product of:
                  3.0129938 = idf(docFreq=5906, maxDocs=44218)
                  0.044655222 = queryNorm
                0.17297572 = fieldWeight in 3809, product of:
                  2.4494898 = tf(freq=6.0), with freq of:
                    6.0 = termFreq=6.0
                  3.0129938 = idf(docFreq=5906, maxDocs=44218)
                  0.0234375 = fieldNorm(doc=3809)
          0.25 = coord(1/4)
        0.0455731 = weight(_text_:term in 3809) [ClassicSimilarity], result of:
          0.0455731 = score(doc=3809,freq=4.0), product of:
            0.20836261 = queryWeight, product of:
              4.66603 = idf(docFreq=1130, maxDocs=44218)
              0.044655222 = queryNorm
            0.21872015 = fieldWeight in 3809, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              4.66603 = idf(docFreq=1130, maxDocs=44218)
              0.0234375 = fieldNorm(doc=3809)
        0.096294895 = sum of:
          0.078144394 = weight(_text_:assessment in 3809) [ClassicSimilarity], result of:
            0.078144394 = score(doc=3809,freq=6.0), product of:
              0.24654238 = queryWeight, product of:
                5.52102 = idf(docFreq=480, maxDocs=44218)
                0.044655222 = queryNorm
              0.31696132 = fieldWeight in 3809, product of:
                2.4494898 = tf(freq=6.0), with freq of:
                  6.0 = termFreq=6.0
                5.52102 = idf(docFreq=480, maxDocs=44218)
                0.0234375 = fieldNorm(doc=3809)
          0.018150497 = weight(_text_:22 in 3809) [ClassicSimilarity], result of:
            0.018150497 = score(doc=3809,freq=2.0), product of:
              0.15637498 = queryWeight, product of:
                3.5018296 = idf(docFreq=3622, maxDocs=44218)
                0.044655222 = queryNorm
              0.116070345 = fieldWeight in 3809, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                3.5018296 = idf(docFreq=3622, maxDocs=44218)
                0.0234375 = fieldNorm(doc=3809)
      0.75 = coord(3/4)
    
    Abstract
    One of the solutions to help scientists filter the most relevant publications and, thus, to stay current on developments in their fields during the transition from "little science" to "big science", was the introduction of citation indexing as a Wellsian "World Brain" (Garfield, 1964) of scientific information: It is too much to expect a research worker to spend an inordinate amount of time searching for the bibliographic descendants of antecedent papers. It would not be excessive to demand that the thorough scholar check all papers that have cited or criticized such papers, if they could be located quickly. The citation index makes this check practicable (Garfield, 1955, p. 108). In retrospect, citation indexing can be perceived as a pre-social web version of crowdsourcing, as it is based on the concept that the community of citing authors outperforms indexers in highlighting cognitive links between papers, particularly on the level of specific ideas and concepts (Garfield, 1983). Over the last 50 years, citation analysis and, more generally, bibliometric methods have developed from information retrieval tools to research evaluation metrics, where they are presumed to make scientific funding more efficient and effective (Moed, 2006). However, the dominance of bibliometric indicators in research evaluation has also led to significant goal displacement (Merton, 1957) and the oversimplification of notions of "research productivity" and "scientific quality", creating adverse effects such as salami publishing, honorary authorships, citation cartels, and misuse of indicators (Binswanger, 2015; Cronin and Sugimoto, 2014; Frey and Osterloh, 2006; Haustein and Larivière, 2015; Weingart, 2005).
    Furthermore, the rise of the web, and subsequently, the social web, has challenged the quasi-monopolistic status of the journal as the main form of scholarly communication and citation indices as the primary assessment mechanisms. Scientific communication is becoming more open, transparent, and diverse: publications are increasingly open access; manuscripts, presentations, code, and data are shared online; research ideas and results are discussed and criticized openly on blogs; and new peer review experiments, with open post publication assessment by anonymous or non-anonymous referees, are underway. The diversification of scholarly production and assessment, paired with the increasing speed of the communication process, leads to an increased information overload (Bawden and Robinson, 2008), demanding new filters. The concept of altmetrics, short for alternative (to citation) metrics, was created out of an attempt to provide a filter (Priem et al., 2010) and to steer against the oversimplification of the measurement of scientific success solely on the basis of number of journal articles published and citations received, by considering a wider range of research outputs and metrics (Piwowar, 2013). Although the term altmetrics was introduced in a tweet in 2010 (Priem, 2010), the idea of capturing traces - "polymorphous mentioning" (Cronin et al., 1998, p. 1320) - of scholars and their documents on the web to measure "impact" of science in a broader manner than citations was introduced years before, largely in the context of webometrics (Almind and Ingwersen, 1997; Thelwall et al., 2005):
    There will soon be a critical mass of web-based digital objects and usage statistics on which to model scholars' communication behaviors - publishing, posting, blogging, scanning, reading, downloading, glossing, linking, citing, recommending, acknowledging - and with which to track their scholarly influence and impact, broadly conceived and broadly felt (Cronin, 2005, p. 196). A decade after Cronin's prediction and five years after the coining of altmetrics, the time seems ripe to reflect upon the role of social media in scholarly communication. This Special Issue does so by providing an overview of current research on the indicators and metrics grouped under the umbrella term of altmetrics, on their relationships with traditional indicators of scientific activity, and on the uses that are made of the various social media platforms - on which these indicators are based - by scientists of various disciplines.
    Date
    20. 1.2015 18:30:22
  7. Wong, S.K.M.; Yao, Y.Y.: ¬An information-theoretic measure of term specificity (1992) 0.11
    0.10501176 = product of:
      0.21002352 = sum of:
        0.011084774 = product of:
          0.044339094 = sum of:
            0.044339094 = weight(_text_:based in 4807) [ClassicSimilarity], result of:
              0.044339094 = score(doc=4807,freq=4.0), product of:
                0.1345459 = queryWeight, product of:
                  3.0129938 = idf(docFreq=5906, maxDocs=44218)
                  0.044655222 = queryNorm
                0.3295462 = fieldWeight in 4807, product of:
                  2.0 = tf(freq=4.0), with freq of:
                    4.0 = termFreq=4.0
                  3.0129938 = idf(docFreq=5906, maxDocs=44218)
                  0.0546875 = fieldNorm(doc=4807)
          0.25 = coord(1/4)
        0.19893874 = weight(_text_:term in 4807) [ClassicSimilarity], result of:
          0.19893874 = score(doc=4807,freq=14.0), product of:
            0.20836261 = queryWeight, product of:
              4.66603 = idf(docFreq=1130, maxDocs=44218)
              0.044655222 = queryNorm
            0.9547718 = fieldWeight in 4807, product of:
              3.7416575 = tf(freq=14.0), with freq of:
                14.0 = termFreq=14.0
              4.66603 = idf(docFreq=1130, maxDocs=44218)
              0.0546875 = fieldNorm(doc=4807)
      0.5 = coord(2/4)
    
    Abstract
    The inverse document frequency (IDF) and signal-noise ratio (S/N) approaches are term weighting schemes based on term specificity. However, the existing justifications for these methods are still somewhat inconclusive and sometimes even based on incompatible assumptions. Introduces an information-theoretic measure of term specificity. Shows that the IDF weighting scheme can be derived from the proposed approach by assuming that the frequency of occurrence of each index term is uniform within the set of documents containing the term. The information-theoretic interpretation of term specificity also establishes the relationship between the IDF and S/N methods.
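    For reference, a small sketch of the two weighting schemes the abstract compares, using one common formulation of IDF and of the signal value computed from a term's per-document frequencies (the toy frequencies are invented; this is not the authors' derivation):
      import math

      def idf(n_docs, doc_freq):
          return math.log2(n_docs / doc_freq)

      def signal(term_freqs):
          """Signal value of a term from its per-document frequencies."""
          total = sum(term_freqs)
          noise = sum((f / total) * math.log2(total / f) for f in term_freqs if f > 0)
          return math.log2(total) - noise

      # Toy collection of four documents: a skewed, rarer term vs. an evenly spread one.
      skewed = [9, 1, 0, 0]
      even   = [3, 3, 3, 3]

      print(idf(4, 2), signal(skewed))  # higher idf and signal: more specific term
      print(idf(4, 4), signal(even))    # lower idf and signal: broader term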
  8. Huffman, G.D.; Vital, D.A.; Bivins, R.G.: Generating indices with lexical association methods : term uniqueness (1990) 0.10
    0.10278254 = product of:
      0.13704339 = sum of:
        0.007917695 = product of:
          0.03167078 = sum of:
            0.03167078 = weight(_text_:based in 4152) [ClassicSimilarity], result of:
              0.03167078 = score(doc=4152,freq=4.0), product of:
                0.1345459 = queryWeight, product of:
                  3.0129938 = idf(docFreq=5906, maxDocs=44218)
                  0.044655222 = queryNorm
                0.23539014 = fieldWeight in 4152, product of:
                  2.0 = tf(freq=4.0), with freq of:
                    4.0 = termFreq=4.0
                  3.0129938 = idf(docFreq=5906, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=4152)
          0.25 = coord(1/4)
        0.075955175 = weight(_text_:term in 4152) [ClassicSimilarity], result of:
          0.075955175 = score(doc=4152,freq=4.0), product of:
            0.20836261 = queryWeight, product of:
              4.66603 = idf(docFreq=1130, maxDocs=44218)
              0.044655222 = queryNorm
            0.3645336 = fieldWeight in 4152, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              4.66603 = idf(docFreq=1130, maxDocs=44218)
              0.0390625 = fieldNorm(doc=4152)
        0.05317052 = product of:
          0.10634104 = sum of:
            0.10634104 = weight(_text_:assessment in 4152) [ClassicSimilarity], result of:
              0.10634104 = score(doc=4152,freq=4.0), product of:
                0.24654238 = queryWeight, product of:
                  5.52102 = idf(docFreq=480, maxDocs=44218)
                  0.044655222 = queryNorm
                0.43132967 = fieldWeight in 4152, product of:
                  2.0 = tf(freq=4.0), with freq of:
                    4.0 = termFreq=4.0
                  5.52102 = idf(docFreq=480, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=4152)
          0.5 = coord(1/2)
      0.75 = coord(3/4)
    
    Abstract
    A software system has been developed which orders citations retrieved from an online database in terms of relevancy. The system resulted from an effort generated by NASA's Technology Utilization Program to create new advanced software tools to largely automate the process of determining the relevancy of database citations retrieved to support large technology transfer studies. The ranking is based on the generation of an enriched vocabulary using lexical association methods, a user assessment of the vocabulary, and a combination of the user assessment and the lexical metric. One of the key elements in relevancy ranking is the enriched vocabulary - the terms must be both unique and descriptive. This paper examines term uniqueness. Six lexical association methods were employed to generate characteristic word indices. Limited subsets of the terms - the highest 20, 40, 60 and 7.5% of the uniqueness words - were compared and uniqueness factors developed. Computational times were also measured. It was found that methods based on occurrences and signal produced virtually the same terms. The limited subsets of terms produced by the exact and centroid discrimination values were also nearly identical. Unique term sets were produced by the occurrence, variance and discrimination value (centroid) methods. An end-user evaluation showed that the generated terms were largely distinct and had values of word precision which were consistent with values of the search precision.
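    The paper's uniqueness factors are not defined in the abstract; the sketch below only illustrates one plausible way of comparing the top fraction of two ranked term lists, with invented term lists and a simple overlap measure standing in for the actual factor:
      def top_fraction(ranked_terms, fraction):
          k = max(1, int(len(ranked_terms) * fraction))
          return set(ranked_terms[:k])

      def overlap(method_a, method_b, fraction=0.2):
          """Share of top-ranked terms two methods have in common (Jaccard)."""
          a, b = top_fraction(method_a, fraction), top_fraction(method_b, fraction)
          return len(a & b) / len(a | b)

      # Invented ranked term lists standing in for three weighting methods.
      occurrence = ["laser", "alloy", "sensor", "thermal", "coating", "substrate"]
      signal     = ["laser", "alloy", "sensor", "coating", "thermal", "substrate"]
      centroid   = ["fatigue", "sensor", "laser", "weld", "alloy", "ceramic"]

      print(overlap(occurrence, signal))    # high: "virtually the same terms"
      print(overlap(occurrence, centroid))  # low: a distinct term set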
  9. Crestani, F.; Rijsbergen, C.J. van: Information retrieval by imaging (1996) 0.10
    0.1023748 = product of:
      0.13649973 = sum of:
        0.0067183874 = product of:
          0.02687355 = sum of:
            0.02687355 = weight(_text_:based in 6967) [ClassicSimilarity], result of:
              0.02687355 = score(doc=6967,freq=2.0), product of:
                0.1345459 = queryWeight, product of:
                  3.0129938 = idf(docFreq=5906, maxDocs=44218)
                  0.044655222 = queryNorm
                0.19973516 = fieldWeight in 6967, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.0129938 = idf(docFreq=5906, maxDocs=44218)
                  0.046875 = fieldNorm(doc=6967)
          0.25 = coord(1/4)
        0.11163084 = weight(_text_:term in 6967) [ClassicSimilarity], result of:
          0.11163084 = score(doc=6967,freq=6.0), product of:
            0.20836261 = queryWeight, product of:
              4.66603 = idf(docFreq=1130, maxDocs=44218)
              0.044655222 = queryNorm
            0.5357528 = fieldWeight in 6967, product of:
              2.4494898 = tf(freq=6.0), with freq of:
                6.0 = termFreq=6.0
              4.66603 = idf(docFreq=1130, maxDocs=44218)
              0.046875 = fieldNorm(doc=6967)
        0.018150497 = product of:
          0.036300994 = sum of:
            0.036300994 = weight(_text_:22 in 6967) [ClassicSimilarity], result of:
              0.036300994 = score(doc=6967,freq=2.0), product of:
                0.15637498 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.044655222 = queryNorm
                0.23214069 = fieldWeight in 6967, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.046875 = fieldNorm(doc=6967)
          0.5 = coord(1/2)
      0.75 = coord(3/4)
    
    Abstract
    Explains briefly what constitutes the imaging process and how imaging can be used in information retrieval. Proposes an approach based on the concept of 'a term is a possible world', which enables the exploitation of term-to-term relationships, which are estimated using an information-theoretic measure. Reports results of an evaluation exercise to compare the performance of imaging retrieval, using possible-world semantics, with a benchmark, using the Cranfield 2 document collection to measure precision and recall. Initially, the performance of imaging retrieval was seen to be better, but statistical analysis proved that the difference was not significant. The problem with imaging retrieval lies in the amount of computation that needs to be performed at run time, and a later experiment investigated the possibility of reducing this amount. Notes lines of further investigation.
    Source
    Information retrieval: new systems and current research. Proceedings of the 16th Research Colloquium of the British Computer Society Information Retrieval Specialist Group, Drymen, Scotland, 22-23 Mar 94. Ed.: R. Leon
  10. Seo, H.-C.; Kim, S.-B.; Rim, H.-C.; Myaeng, S.-H.: Improving query translation in English-Korean Cross-language information retrieval (2005) 0.10
    0.1023748 = product of:
      0.13649973 = sum of:
        0.0067183874 = product of:
          0.02687355 = sum of:
            0.02687355 = weight(_text_:based in 1023) [ClassicSimilarity], result of:
              0.02687355 = score(doc=1023,freq=2.0), product of:
                0.1345459 = queryWeight, product of:
                  3.0129938 = idf(docFreq=5906, maxDocs=44218)
                  0.044655222 = queryNorm
                0.19973516 = fieldWeight in 1023, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.0129938 = idf(docFreq=5906, maxDocs=44218)
                  0.046875 = fieldNorm(doc=1023)
          0.25 = coord(1/4)
        0.11163084 = weight(_text_:term in 1023) [ClassicSimilarity], result of:
          0.11163084 = score(doc=1023,freq=6.0), product of:
            0.20836261 = queryWeight, product of:
              4.66603 = idf(docFreq=1130, maxDocs=44218)
              0.044655222 = queryNorm
            0.5357528 = fieldWeight in 1023, product of:
              2.4494898 = tf(freq=6.0), with freq of:
                6.0 = termFreq=6.0
              4.66603 = idf(docFreq=1130, maxDocs=44218)
              0.046875 = fieldNorm(doc=1023)
        0.018150497 = product of:
          0.036300994 = sum of:
            0.036300994 = weight(_text_:22 in 1023) [ClassicSimilarity], result of:
              0.036300994 = score(doc=1023,freq=2.0), product of:
                0.15637498 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.044655222 = queryNorm
                0.23214069 = fieldWeight in 1023, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.046875 = fieldNorm(doc=1023)
          0.5 = coord(1/2)
      0.75 = coord(3/4)
    
    Abstract
    Query translation is a viable method for cross-language information retrieval (CLIR), but it suffers from translation ambiguities caused by multiple translations of individual query terms. Previous research has employed various methods for disambiguation, including the method of selecting an individual target query term from multiple candidates by comparing their statistical associations with the candidate translations of other query terms. This paper proposes a new method where we examine all combinations of target query term translations corresponding to the source query terms, instead of looking at the candidates for each query term and selecting the best one at a time. The goodness value for a combination of target query terms is computed based on the association value between each pair of the terms in the combination. We tested our method using the NTCIR-3 English-Korean CLIR test collection. The results show some improvements regardless of the association measures we used.
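    A toy sketch of the combination-based disambiguation idea described above: score every combination of candidate translations by the sum of pairwise association values and keep the best one. The candidate lists, the association table, and the function names are invented; the paper's actual association measures are not reproduced here:
      from itertools import combinations, product

      def best_translation(candidates, assoc):
          """Pick one target term per source term so that the summed pairwise
          association of the chosen combination is maximal."""
          best, best_score = None, float("-inf")
          for combo in product(*candidates):
              score = sum(assoc(a, b) for a, b in combinations(combo, 2))
              if score > best_score:
                  best, best_score = combo, score
          return best

      # Invented candidate translations and a toy association table.
      candidates = [["bank_fin", "bank_river"], ["loan"], ["rate"]]
      table = {("bank_fin", "loan"): 0.8, ("bank_fin", "rate"): 0.7,
               ("bank_river", "loan"): 0.1, ("bank_river", "rate"): 0.05,
               ("loan", "rate"): 0.6}
      assoc = lambda a, b: table.get((a, b), table.get((b, a), 0.0))
      print(best_translation(candidates, assoc))  # ('bank_fin', 'loan', 'rate')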
    Date
    26.12.2007 20:22:38
  11. Witschel, H.F.: Global term weights in distributed environments (2008) 0.10
    0.1023748 = product of:
      0.13649973 = sum of:
        0.0067183874 = product of:
          0.02687355 = sum of:
            0.02687355 = weight(_text_:based in 2096) [ClassicSimilarity], result of:
              0.02687355 = score(doc=2096,freq=2.0), product of:
                0.1345459 = queryWeight, product of:
                  3.0129938 = idf(docFreq=5906, maxDocs=44218)
                  0.044655222 = queryNorm
                0.19973516 = fieldWeight in 2096, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.0129938 = idf(docFreq=5906, maxDocs=44218)
                  0.046875 = fieldNorm(doc=2096)
          0.25 = coord(1/4)
        0.11163084 = weight(_text_:term in 2096) [ClassicSimilarity], result of:
          0.11163084 = score(doc=2096,freq=6.0), product of:
            0.20836261 = queryWeight, product of:
              4.66603 = idf(docFreq=1130, maxDocs=44218)
              0.044655222 = queryNorm
            0.5357528 = fieldWeight in 2096, product of:
              2.4494898 = tf(freq=6.0), with freq of:
                6.0 = termFreq=6.0
              4.66603 = idf(docFreq=1130, maxDocs=44218)
              0.046875 = fieldNorm(doc=2096)
        0.018150497 = product of:
          0.036300994 = sum of:
            0.036300994 = weight(_text_:22 in 2096) [ClassicSimilarity], result of:
              0.036300994 = score(doc=2096,freq=2.0), product of:
                0.15637498 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.044655222 = queryNorm
                0.23214069 = fieldWeight in 2096, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.046875 = fieldNorm(doc=2096)
          0.5 = coord(1/2)
      0.75 = coord(3/4)
    
    Abstract
    This paper examines the estimation of global term weights (such as IDF) in information retrieval scenarios where a global view on the collection is not available. In particular, the two options of either sampling documents or of using a reference corpus independent of the target retrieval collection are compared using standard IR test collections. In addition, the possibility of pruning term lists based on frequency is evaluated. The results show that very good retrieval performance can be reached when just the most frequent terms of a collection - an "extended stop word list" - are known and all terms which are not in that list are treated equally. However, the list cannot always be fully estimated from a general-purpose reference corpus, but some "domain-specific stop words" need to be added. A good solution for achieving this is to mix estimates from small samples of the target retrieval collection with ones derived from a reference corpus.
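    A minimal sketch of the "extended stop word list" idea, assuming a simple IDF estimate from a document sample and a flat default weight for every term outside the list; names and numbers are illustrative only:
      import math
      from collections import Counter

      def global_weights_from_sample(sample_docs, n_collection, top_k=100):
          """Estimate IDF-style weights from a sample: keep the top_k most frequent
          terms (the "extended stop word list") and give every other term one flat,
          high default weight."""
          n_sample = len(sample_docs)
          df = Counter(t for doc in sample_docs for t in set(doc))
          est = {t: math.log(n_sample / f)    # = log(N / (N * f / n_sample))
                 for t, f in df.most_common(top_k)}
          default = math.log(n_collection)    # as if the term occurred only once
          return lambda term: est.get(term, default)

      docs = [["term", "weight", "idf"], ["term", "retrieval"], ["sampling", "term"]]
      w = global_weights_from_sample(docs, n_collection=1_000_000, top_k=3)
      print(round(w("term"), 3), round(w("unseen-word"), 3))  # 0.0 vs 13.816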
    Date
    1. 8.2008 9:44:22
  12. Zhu, W.Z.; Allen, R.B.: Document clustering using the LSI subspace signature model (2013) 0.10
    0.1023748 = product of:
      0.13649973 = sum of:
        0.0067183874 = product of:
          0.02687355 = sum of:
            0.02687355 = weight(_text_:based in 690) [ClassicSimilarity], result of:
              0.02687355 = score(doc=690,freq=2.0), product of:
                0.1345459 = queryWeight, product of:
                  3.0129938 = idf(docFreq=5906, maxDocs=44218)
                  0.044655222 = queryNorm
                0.19973516 = fieldWeight in 690, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.0129938 = idf(docFreq=5906, maxDocs=44218)
                  0.046875 = fieldNorm(doc=690)
          0.25 = coord(1/4)
        0.11163084 = weight(_text_:term in 690) [ClassicSimilarity], result of:
          0.11163084 = score(doc=690,freq=6.0), product of:
            0.20836261 = queryWeight, product of:
              4.66603 = idf(docFreq=1130, maxDocs=44218)
              0.044655222 = queryNorm
            0.5357528 = fieldWeight in 690, product of:
              2.4494898 = tf(freq=6.0), with freq of:
                6.0 = termFreq=6.0
              4.66603 = idf(docFreq=1130, maxDocs=44218)
              0.046875 = fieldNorm(doc=690)
        0.018150497 = product of:
          0.036300994 = sum of:
            0.036300994 = weight(_text_:22 in 690) [ClassicSimilarity], result of:
              0.036300994 = score(doc=690,freq=2.0), product of:
                0.15637498 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.044655222 = queryNorm
                0.23214069 = fieldWeight in 690, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.046875 = fieldNorm(doc=690)
          0.5 = coord(1/2)
      0.75 = coord(3/4)
    
    Abstract
    We describe the latent semantic indexing subspace signature model (LSISSM) for semantic content representation of unstructured text. Grounded on singular value decomposition, the model represents terms and documents by the distribution signatures of their statistical contribution across the top-ranking latent concept dimensions. LSISSM matches term signatures with document signatures according to their mapping coherence between the latent semantic indexing (LSI) term subspace and the LSI document subspace. LSISSM performs feature reduction and finds a low-rank approximation of scalable and sparse term-document matrices. Experiments demonstrate that this approach significantly improves the performance of major clustering algorithms such as standard K-means and self-organizing maps compared with the vector space model and the traditional LSI model. The unique contribution ranking mechanism in LSISSM also improves the initialization of standard K-means compared with the random seeding procedure, which sometimes causes low efficiency and effectiveness of clustering. A two-stage initialization strategy based on LSISSM significantly reduces the running time of standard K-means procedures.
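    A toy illustration of the underlying LSI machinery (SVD of a term-document matrix, with term and document "signatures" taken as their normalised spread over the top-k latent dimensions); this is a simplified stand-in, not the exact LSISSM formulation:
      import numpy as np

      # Toy term-document matrix (rows = terms, columns = documents).
      A = np.array([[2., 0., 1., 0.],
                    [1., 1., 0., 0.],
                    [0., 2., 0., 1.],
                    [0., 0., 1., 2.]])

      U, s, Vt = np.linalg.svd(A, full_matrices=False)
      k = 2                                   # keep the top-k latent dimensions

      term_sig = np.abs(U[:, :k] * s[:k])     # each term's spread over the latent dims
      doc_sig = np.abs(Vt[:k, :].T * s[:k])   # each document's spread over the latent dims
      term_sig /= term_sig.sum(axis=1, keepdims=True)   # normalise to distributions
      doc_sig /= doc_sig.sum(axis=1, keepdims=True)

      # Documents can then be matched to terms whose signatures they resemble,
      # e.g. by cosine similarity between signature rows.
      print(doc_sig.round(2))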
    Date
    23. 3.2013 13:22:36
  13. Dillon, M.; Jul, E.: Cataloging Internet resources : the convergence of libraries and Internet resources (1996) 0.10
    0.10151319 = product of:
      0.13535093 = sum of:
        0.007838119 = product of:
          0.031352475 = sum of:
            0.031352475 = weight(_text_:based in 6737) [ClassicSimilarity], result of:
              0.031352475 = score(doc=6737,freq=2.0), product of:
                0.1345459 = queryWeight, product of:
                  3.0129938 = idf(docFreq=5906, maxDocs=44218)
                  0.044655222 = queryNorm
                0.23302436 = fieldWeight in 6737, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.0129938 = idf(docFreq=5906, maxDocs=44218)
                  0.0546875 = fieldNorm(doc=6737)
          0.25 = coord(1/4)
        0.106337234 = weight(_text_:term in 6737) [ClassicSimilarity], result of:
          0.106337234 = score(doc=6737,freq=4.0), product of:
            0.20836261 = queryWeight, product of:
              4.66603 = idf(docFreq=1130, maxDocs=44218)
              0.044655222 = queryNorm
            0.510347 = fieldWeight in 6737, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              4.66603 = idf(docFreq=1130, maxDocs=44218)
              0.0546875 = fieldNorm(doc=6737)
        0.02117558 = product of:
          0.04235116 = sum of:
            0.04235116 = weight(_text_:22 in 6737) [ClassicSimilarity], result of:
              0.04235116 = score(doc=6737,freq=2.0), product of:
                0.15637498 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.044655222 = queryNorm
                0.2708308 = fieldWeight in 6737, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.0546875 = fieldNorm(doc=6737)
          0.5 = coord(1/2)
      0.75 = coord(3/4)
    
    Abstract
    Reviews issues related to the cataloguing of Internet resources and considers short-term and long-term directions for cataloguing and the general provision of library services for remotely accessible, electronic information resources. Discusses the strengths and weaknesses of using a library catalogue model to improve access to Internet resources. Based on experience gained through 2 OCLC Internet cataloguing projects, recommends continued application of library cataloguing standards and methods for Internet resources, with the expectation that catalogues, cataloguing and libraries in general will continue to evolve. Points to problems inherent in the MARC field 856.
    Series
    Cataloging and classification quarterly; vol.22, nos.3/4
  14. Ruge, G.; Schwarz, C.: Term association and computational linguistics (1991) 0.10
    0.09862436 = product of:
      0.19724873 = sum of:
        0.011197312 = product of:
          0.044789247 = sum of:
            0.044789247 = weight(_text_:based in 2310) [ClassicSimilarity], result of:
              0.044789247 = score(doc=2310,freq=2.0), product of:
                0.1345459 = queryWeight, product of:
                  3.0129938 = idf(docFreq=5906, maxDocs=44218)
                  0.044655222 = queryNorm
                0.33289194 = fieldWeight in 2310, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.0129938 = idf(docFreq=5906, maxDocs=44218)
                  0.078125 = fieldNorm(doc=2310)
          0.25 = coord(1/4)
        0.18605141 = weight(_text_:term in 2310) [ClassicSimilarity], result of:
          0.18605141 = score(doc=2310,freq=6.0), product of:
            0.20836261 = queryWeight, product of:
              4.66603 = idf(docFreq=1130, maxDocs=44218)
              0.044655222 = queryNorm
            0.8929213 = fieldWeight in 2310, product of:
              2.4494898 = tf(freq=6.0), with freq of:
                6.0 = termFreq=6.0
              4.66603 = idf(docFreq=1130, maxDocs=44218)
              0.078125 = fieldNorm(doc=2310)
      0.5 = coord(2/4)
    
    Abstract
    Most systems for term association are statistically based. In general they exploit term co-occurrences. A critical overview of statistical approaches in this field is given. A new approach based on a linguistic analysis of large amounts of textual data is outlined.
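    A minimal example of the statistical, co-occurrence-based style of term association the abstract refers to, here using document-level co-occurrence and the Dice coefficient (the toy corpus and the choice of measure are illustrative assumptions):
      from collections import Counter
      from itertools import combinations

      docs = [
          "information retrieval term association".split(),
          "term association statistics cooccurrence".split(),
          "syntactic analysis of noun phrases".split(),
      ]

      term_df = Counter(t for d in docs for t in set(d))
      pair_df = Counter(p for d in docs for p in combinations(sorted(set(d)), 2))

      def dice(a, b):
          """Dice coefficient on document-level co-occurrence counts."""
          pair = tuple(sorted((a, b)))
          return 2 * pair_df[pair] / (term_df[a] + term_df[b])

      print(dice("term", "association"))  # 1.0: always co-occur in this toy corpus
      print(dice("term", "syntactic"))    # 0.0: never co-occur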
  15. Buzydlowski, J.W.; White, H.D.; Lin, X.: Term Co-occurrence Analysis as an Interface for Digital Libraries (2002) 0.10
    0.095887676 = product of:
      0.19177535 = sum of:
        0.12890019 = weight(_text_:term in 1339) [ClassicSimilarity], result of:
          0.12890019 = score(doc=1339,freq=2.0), product of:
            0.20836261 = queryWeight, product of:
              4.66603 = idf(docFreq=1130, maxDocs=44218)
              0.044655222 = queryNorm
            0.618634 = fieldWeight in 1339, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              4.66603 = idf(docFreq=1130, maxDocs=44218)
              0.09375 = fieldNorm(doc=1339)
        0.062875174 = product of:
          0.12575035 = sum of:
            0.12575035 = weight(_text_:22 in 1339) [ClassicSimilarity], result of:
              0.12575035 = score(doc=1339,freq=6.0), product of:
                0.15637498 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.044655222 = queryNorm
                0.804159 = fieldWeight in 1339, product of:
                  2.4494898 = tf(freq=6.0), with freq of:
                    6.0 = termFreq=6.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.09375 = fieldNorm(doc=1339)
          0.5 = coord(1/2)
      0.5 = coord(2/4)
    
    Date
    22. 2.2003 17:25:39
    22. 2.2003 18:16:22
  16. Giachanou, A.; Rosso, P.; Crestani, F.: ¬The impact of emotional signals on credibility assessment (2021) 0.09
    0.09332055 = product of:
      0.1244274 = sum of:
        0.005598656 = product of:
          0.022394624 = sum of:
            0.022394624 = weight(_text_:based in 328) [ClassicSimilarity], result of:
              0.022394624 = score(doc=328,freq=2.0), product of:
                0.1345459 = queryWeight, product of:
                  3.0129938 = idf(docFreq=5906, maxDocs=44218)
                  0.044655222 = queryNorm
                0.16644597 = fieldWeight in 328, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.0129938 = idf(docFreq=5906, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=328)
          0.25 = coord(1/4)
        0.05370841 = weight(_text_:term in 328) [ClassicSimilarity], result of:
          0.05370841 = score(doc=328,freq=2.0), product of:
            0.20836261 = queryWeight, product of:
              4.66603 = idf(docFreq=1130, maxDocs=44218)
              0.044655222 = queryNorm
            0.25776416 = fieldWeight in 328, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              4.66603 = idf(docFreq=1130, maxDocs=44218)
              0.0390625 = fieldNorm(doc=328)
        0.06512033 = product of:
          0.13024066 = sum of:
            0.13024066 = weight(_text_:assessment in 328) [ClassicSimilarity], result of:
              0.13024066 = score(doc=328,freq=6.0), product of:
                0.24654238 = queryWeight, product of:
                  5.52102 = idf(docFreq=480, maxDocs=44218)
                  0.044655222 = queryNorm
                0.5282689 = fieldWeight in 328, product of:
                  2.4494898 = tf(freq=6.0), with freq of:
                    6.0 = termFreq=6.0
                  5.52102 = idf(docFreq=480, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=328)
          0.5 = coord(1/2)
      0.75 = coord(3/4)
    
    Abstract
    Fake news is considered one of the main threats to our society. The aim of fake news is usually to confuse readers and trigger intense emotions in them, in an attempt to spread through social networks. Even though recent studies have explored the effectiveness of different linguistic patterns for fake news detection, the role of emotional signals has not yet been explored. In this paper, we focus on extracting emotional signals from claims and evaluating their effectiveness for credibility assessment. First, we explore different methodologies for extracting the emotional signals that can be triggered in users when they read a claim. Then, we present emoCred, a model based on a long short-term memory (LSTM) network that incorporates emotional signals extracted from the text of the claims to differentiate between credible and non-credible ones. In addition, we perform an analysis to understand which emotional signals and which terms are the most useful for the different credibility classes. We conduct extensive experiments and a thorough analysis on real-world datasets. Our results indicate the importance of incorporating emotional signals in the credibility assessment problem.
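    A toy sketch of the feature-extraction step described above: counting emotion-lexicon hits in a claim to produce emotional signals that a downstream classifier (such as the LSTM-based emoCred) could consume. The mini lexicon and the example claim are invented for illustration:
      # Hypothetical mini emotion lexicon; a real system would use a resource
      # such as an emotion word list or a trained emotion classifier.
      LEXICON = {
          "fear":  {"threat", "danger", "panic"},
          "anger": {"outrage", "scandal", "corrupt"},
          "joy":   {"miracle", "amazing", "cure"},
      }

      def emotion_signals(text):
          """Return normalised per-emotion counts for a claim."""
          tokens = text.lower().split()
          total = max(len(tokens), 1)
          return {emo: sum(tok in words for tok in tokens) / total
                  for emo, words in LEXICON.items()}

      claim = "Miracle cure hides the real danger behind this scandal"
      print(emotion_signals(claim))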
  17. Brooks, T.A.: Relevance auras : macro patterns and micro scatter (2001) 0.09
    0.09236436 = product of:
      0.18472873 = sum of:
        0.13155821 = weight(_text_:term in 1591) [ClassicSimilarity], result of:
          0.13155821 = score(doc=1591,freq=12.0), product of:
            0.20836261 = queryWeight, product of:
              4.66603 = idf(docFreq=1130, maxDocs=44218)
              0.044655222 = queryNorm
            0.6313907 = fieldWeight in 1591, product of:
              3.4641016 = tf(freq=12.0), with freq of:
                12.0 = termFreq=12.0
              4.66603 = idf(docFreq=1130, maxDocs=44218)
              0.0390625 = fieldNorm(doc=1591)
        0.05317052 = product of:
          0.10634104 = sum of:
            0.10634104 = weight(_text_:assessment in 1591) [ClassicSimilarity], result of:
              0.10634104 = score(doc=1591,freq=4.0), product of:
                0.24654238 = queryWeight, product of:
                  5.52102 = idf(docFreq=480, maxDocs=44218)
                  0.044655222 = queryNorm
                0.43132967 = fieldWeight in 1591, product of:
                  2.0 = tf(freq=4.0), with freq of:
                    4.0 = termFreq=4.0
                  5.52102 = idf(docFreq=480, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=1591)
          0.5 = coord(1/2)
      0.5 = coord(2/4)
    
    Abstract
    Empirical analysis of relevance assessments can illuminate how different groups of readers perceive the relationship between bibliographic records and index terms. This experiment harvested relevance assessments from two groups: engineering students (hereafter "engineers") and library school students ("librarians"). These groups assessed the relevance relationships between bibliographic records and index terms for three literatures: engineering, psychology and education. Assessment included the indexer-selected term (the topically relevant term) as well as broader, narrower and related terms. Figures 1-8 (pages 27-35) show these terms arranged as two-dimensional term domains. Positive relevance assessments plotted across the two-dimensional term domains revealed regular patterns, here called "relevance auras". A relevance aura is a penumbra of positive relevance, emanating from bibliographic records across a term domain of broader, narrower and related index terms. This experiment attempted to compare the relevance auras produced by engineers and librarians at both a macro and a micro level of aggregation. Relevance auras appeared in data aggregated across reader groups and literatures. Micro analyses of individual records, however, showed that relevance auras were ragged or did not develop. Agreement in relevance assessment appears on an individual term basis and often independently of the formation of a relevance aura.
  18. Dang, E.K.F.; Luk, R.W.P.; Allan, J.: Beyond bag-of-words : bigram-enhanced context-dependent term weights (2014) 0.09
    0.0854112 = product of:
      0.1708224 = sum of:
        0.009697157 = product of:
          0.038788628 = sum of:
            0.038788628 = weight(_text_:based in 1283) [ClassicSimilarity], result of:
              0.038788628 = score(doc=1283,freq=6.0), product of:
                0.1345459 = queryWeight, product of:
                  3.0129938 = idf(docFreq=5906, maxDocs=44218)
                  0.044655222 = queryNorm
                0.28829288 = fieldWeight in 1283, product of:
                  2.4494898 = tf(freq=6.0), with freq of:
                    6.0 = termFreq=6.0
                  3.0129938 = idf(docFreq=5906, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=1283)
          0.25 = coord(1/4)
        0.16112524 = weight(_text_:term in 1283) [ClassicSimilarity], result of:
          0.16112524 = score(doc=1283,freq=18.0), product of:
            0.20836261 = queryWeight, product of:
              4.66603 = idf(docFreq=1130, maxDocs=44218)
              0.044655222 = queryNorm
            0.7732925 = fieldWeight in 1283, product of:
              4.2426405 = tf(freq=18.0), with freq of:
                18.0 = termFreq=18.0
              4.66603 = idf(docFreq=1130, maxDocs=44218)
              0.0390625 = fieldNorm(doc=1283)
      0.5 = coord(2/4)
    
    Abstract
    While term independence is a widely held assumption in most of the established information retrieval approaches, it is clearly not true and various works in the past have investigated a relaxation of the assumption. One approach is to use n-grams in document representation instead of unigrams. However, the majority of early works on n-grams obtained only modest performance improvement. On the other hand, the use of information based on supporting terms or "contexts" of queries has been found to be promising. In particular, recent studies showed that using new context-dependent term weights improved the performance of relevance feedback (RF) retrieval compared with using traditional bag-of-words BM25 term weights. Calculation of the new term weights requires an estimation of the local probability of relevance of each query term occurrence. In previous studies, the estimation of this probability was based on unigrams that occur in the neighborhood of a query term. We explore an integration of the n-gram and context approaches by computing context-dependent term weights based on a mixture of unigrams and bigrams. Extensive experiments are performed using the title queries of the Text Retrieval Conference (TREC)-6, TREC-7, TREC-8, and TREC-2005 collections, for RF with relevance judgment of either the top 10 or top 20 documents of an initial retrieval. We identify some crucial elements needed in the use of bigrams in our methods, such as proper inverse document frequency (IDF) weighting of the bigrams and noise reduction by pruning bigrams with large document frequency values. We show that enhancing context-dependent term weights with bigrams is effective in further improving retrieval performance.
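    A small sketch of two of the "crucial elements" mentioned above, bigram IDF weighting and pruning of high-document-frequency bigrams, on an invented toy corpus (the threshold and weighting are illustrative, not the paper's exact settings):
      import math
      from collections import Counter

      docs = [
          "context dependent term weights improve relevance feedback".split(),
          "bag of words term weights ignore context".split(),
          "context dependent term weights help relevance feedback".split(),
      ]

      def bigrams(tokens):
          return list(zip(tokens, tokens[1:]))

      n = len(docs)
      bigram_df = Counter(b for d in docs for b in set(bigrams(d)))

      # Noise reduction: drop bigrams occurring in too many documents, then
      # weight the survivors by IDF.
      max_df = 2
      kept = {b: math.log(n / df) for b, df in bigram_df.items() if df <= max_df}
      print(sorted(kept.items(), key=lambda kv: -kv[1])[:3])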
  19. Jiang, Z.; Gu, Q.; Yin, Y.; Wang, J.; Chen, D.: GRAW+ : a two-view graph propagation method with word coupling for readability assessment (2019) 0.08
    0.0842046 = product of:
      0.1684092 = sum of:
        0.007917695 = product of:
          0.03167078 = sum of:
            0.03167078 = weight(_text_:based in 5218) [ClassicSimilarity], result of:
              0.03167078 = score(doc=5218,freq=4.0), product of:
                0.1345459 = queryWeight, product of:
                  3.0129938 = idf(docFreq=5906, maxDocs=44218)
                  0.044655222 = queryNorm
                0.23539014 = fieldWeight in 5218, product of:
                  2.0 = tf(freq=4.0), with freq of:
                    4.0 = termFreq=4.0
                  3.0129938 = idf(docFreq=5906, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=5218)
          0.25 = coord(1/4)
        0.1604915 = sum of:
          0.13024066 = weight(_text_:assessment in 5218) [ClassicSimilarity], result of:
            0.13024066 = score(doc=5218,freq=6.0), product of:
              0.24654238 = queryWeight, product of:
                5.52102 = idf(docFreq=480, maxDocs=44218)
                0.044655222 = queryNorm
              0.5282689 = fieldWeight in 5218, product of:
                2.4494898 = tf(freq=6.0), with freq of:
                  6.0 = termFreq=6.0
                5.52102 = idf(docFreq=480, maxDocs=44218)
                0.0390625 = fieldNorm(doc=5218)
          0.03025083 = weight(_text_:22 in 5218) [ClassicSimilarity], result of:
            0.03025083 = score(doc=5218,freq=2.0), product of:
              0.15637498 = queryWeight, product of:
                3.5018296 = idf(docFreq=3622, maxDocs=44218)
                0.044655222 = queryNorm
              0.19345059 = fieldWeight in 5218, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                3.5018296 = idf(docFreq=3622, maxDocs=44218)
                0.0390625 = fieldNorm(doc=5218)
      0.5 = coord(2/4)
    
    Abstract
    Existing methods for readability assessment usually construct inductive classification models to assess the readability of singular text documents based on extracted features, which have been demonstrated to be effective. However, they rarely make use of the interrelationship among documents on readability, which can help increase the accuracy of readability assessment. In this article, we adopt a graph-based classification method to model and utilize the relationship among documents using the coupled bag-of-words model. We propose a word coupling method to build the coupled bag-of-words model by estimating the correlation between words on reading difficulty. In addition, we propose a two-view graph propagation method to make use of both the coupled bag-of-words model and the linguistic features. Our method employs a graph merging operation to combine graphs built according to different views, and improves the label propagation by incorporating the ordinal relation among reading levels. Experiments were conducted on both English and Chinese data sets, and the results demonstrate both effectiveness and potential of the method.
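    A toy sketch of graph-based label propagation over a document similarity graph, the general mechanism behind the method described above; the graph, the seed labels, and the damping factor are invented, and the two-view coupling and ordinal refinement of GRAW+ are not reproduced:
      import numpy as np

      # Similarity graph over five documents (rows normalised into a transition matrix).
      W = np.array([[0, 1, 1, 0, 0],
                    [1, 0, 1, 0, 0],
                    [1, 1, 0, 1, 0],
                    [0, 0, 1, 0, 1],
                    [0, 0, 0, 1, 0]], dtype=float)
      P = W / W.sum(axis=1, keepdims=True)

      # Two reading levels; documents 0 and 4 are labelled seeds, the rest are not.
      Y = np.array([[1, 0], [0, 0], [0, 0], [0, 0], [0, 1]], dtype=float)
      F, alpha = Y.copy(), 0.8

      for _ in range(50):                      # iterate until approximately stable
          F = alpha * P @ F + (1 - alpha) * Y  # propagate while clamping the seeds

      print(F.argmax(axis=1))                  # predicted level for each document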
    Date
    15. 4.2019 13:46:22
  20. Arents, H.C.; Bogaerts, W.F.L.: Concept-based retrieval of hypermedia information : from term indexing to semantic hyperindexing (1993) 0.08
    0.083029896 = product of:
      0.16605979 = sum of:
        0.015676238 = product of:
          0.06270495 = sum of:
            0.06270495 = weight(_text_:based in 4715) [ClassicSimilarity], result of:
              0.06270495 = score(doc=4715,freq=2.0), product of:
                0.1345459 = queryWeight, product of:
                  3.0129938 = idf(docFreq=5906, maxDocs=44218)
                  0.044655222 = queryNorm
                0.46604872 = fieldWeight in 4715, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.0129938 = idf(docFreq=5906, maxDocs=44218)
                  0.109375 = fieldNorm(doc=4715)
          0.25 = coord(1/4)
        0.15038356 = weight(_text_:term in 4715) [ClassicSimilarity], result of:
          0.15038356 = score(doc=4715,freq=2.0), product of:
            0.20836261 = queryWeight, product of:
              4.66603 = idf(docFreq=1130, maxDocs=44218)
              0.044655222 = queryNorm
            0.72173965 = fieldWeight in 4715, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              4.66603 = idf(docFreq=1130, maxDocs=44218)
              0.109375 = fieldNorm(doc=4715)
      0.5 = coord(2/4)
    
