Search (116 results, page 1 of 6)

Ayadi, H.; Torjmen-Khemakhem, M.; Daoud, M.; Huang, J.X.; Jemaa, M.B.: Mining correlations between medically dependent features and image retrieval models for query classification (2017) 0.06

0.062148638 = product of:
  0.124297276 = sum of:
    0.055226028 = weight(_text_:retrieval in 3607) [ClassicSimilarity], result of:
      0.055226028 = score(doc=3607,freq=14.0), product of:
        0.124912694 = queryWeight, product of:
          3.024915 = idf(docFreq=5836, maxDocs=44218)
          0.041294612 = queryNorm
        0.442117 = fieldWeight in 3607, product of:
          3.7416575 = tf(freq=14.0), with freq of:
            14.0 = termFreq=14.0
          3.024915 = idf(docFreq=5836, maxDocs=44218)
          0.0390625 = fieldNorm(doc=3607)
    0.04277933 = weight(_text_:use in 3607) [ClassicSimilarity], result of:
      0.04277933 = score(doc=3607,freq=8.0), product of:
        0.12644777 = queryWeight, product of:
          3.0620887 = idf(docFreq=5623, maxDocs=44218)
          0.041294612 = queryNorm
        0.3383162 = fieldWeight in 3607, product of:
          2.828427 = tf(freq=8.0), with freq of:
            8.0 = termFreq=8.0
          3.0620887 = idf(docFreq=5623, maxDocs=44218)
          0.0390625 = fieldNorm(doc=3607)
    0.0167351 = weight(_text_:of in 3607) [ClassicSimilarity], result of:
      0.0167351 = score(doc=3607,freq=18.0), product of:
        0.06457475 = queryWeight, product of:
          1.5637573 = idf(docFreq=25162, maxDocs=44218)
          0.041294612 = queryNorm
        0.25915858 = fieldWeight in 3607, product of:
          4.2426405 = tf(freq=18.0), with freq of:
            18.0 = termFreq=18.0
          1.5637573 = idf(docFreq=25162, maxDocs=44218)
          0.0390625 = fieldNorm(doc=3607)
    0.00955682 = product of:
      0.01911364 = sum of:
        0.01911364 = weight(_text_:on in 3607) [ClassicSimilarity], result of:
          0.01911364 = score(doc=3607,freq=6.0), product of:
            0.090823986 = queryWeight, product of:
              2.199415 = idf(docFreq=13325, maxDocs=44218)
              0.041294612 = queryNorm
            0.21044704 = fieldWeight in 3607, product of:
              2.4494898 = tf(freq=6.0), with freq of:
                6.0 = termFreq=6.0
              2.199415 = idf(docFreq=13325, maxDocs=44218)
              0.0390625 = fieldNorm(doc=3607)
      0.5 = coord(1/2)
  0.5 = coord(4/8)

Abstract: The abundance of medical resources has encouraged the development of systems that allow for efficient searches of information in large medical image data sets. State-of-the-art image retrieval models are classified into three categories: content-based (visual) models, textual models, and combined models. Content-based models use visual features to answer image queries, textual image retrieval models use word matching to answer textual queries, and combined image retrieval models, use both textual and visual features to answer queries. Nevertheless, most of previous works in this field have used the same image retrieval model independently of the query type. In this article, we define a list of generic and specific medical query features and exploit them in an association rule mining technique to discover correlations between query features and image retrieval models. Based on these rules, we propose to use an associative classifier (NaiveClass) to find the best suitable retrieval model given a new textual query. We also propose a second associative classifier (SmartClass) to select the most appropriate default class for the query. Experiments are performed on Medical ImageCLEF queries from 2008 to 2012 to evaluate the impact of the proposed query features on the classification performance. The results show that combining our proposed specific and generic query features is effective in query classification.
Source: Journal of the Association for Information Science and Technology. 68(2017) no.5, S.1323-1334

Liu, Y.; Zhang, M.; Cen, R.; Ru, L.; Ma, S.: Data cleansing for Web information retrieval using query independent features (2007) 0.05

0.05097526 = product of:
  0.10195052 = sum of:
    0.055226028 = weight(_text_:retrieval in 607) [ClassicSimilarity], result of:
      0.055226028 = score(doc=607,freq=14.0), product of:
        0.124912694 = queryWeight, product of:
          3.024915 = idf(docFreq=5836, maxDocs=44218)
          0.041294612 = queryNorm
        0.442117 = fieldWeight in 607, product of:
          3.7416575 = tf(freq=14.0), with freq of:
            14.0 = termFreq=14.0
          3.024915 = idf(docFreq=5836, maxDocs=44218)
          0.0390625 = fieldNorm(doc=607)
    0.021389665 = weight(_text_:use in 607) [ClassicSimilarity], result of:
      0.021389665 = score(doc=607,freq=2.0), product of:
        0.12644777 = queryWeight, product of:
          3.0620887 = idf(docFreq=5623, maxDocs=44218)
          0.041294612 = queryNorm
        0.1691581 = fieldWeight in 607, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          3.0620887 = idf(docFreq=5623, maxDocs=44218)
          0.0390625 = fieldNorm(doc=607)
    0.015778005 = weight(_text_:of in 607) [ClassicSimilarity], result of:
      0.015778005 = score(doc=607,freq=16.0), product of:
        0.06457475 = queryWeight, product of:
          1.5637573 = idf(docFreq=25162, maxDocs=44218)
          0.041294612 = queryNorm
        0.24433708 = fieldWeight in 607, product of:
          4.0 = tf(freq=16.0), with freq of:
            16.0 = termFreq=16.0
          1.5637573 = idf(docFreq=25162, maxDocs=44218)
          0.0390625 = fieldNorm(doc=607)
    0.00955682 = product of:
      0.01911364 = sum of:
        0.01911364 = weight(_text_:on in 607) [ClassicSimilarity], result of:
          0.01911364 = score(doc=607,freq=6.0), product of:
            0.090823986 = queryWeight, product of:
              2.199415 = idf(docFreq=13325, maxDocs=44218)
              0.041294612 = queryNorm
            0.21044704 = fieldWeight in 607, product of:
              2.4494898 = tf(freq=6.0), with freq of:
                6.0 = termFreq=6.0
              2.199415 = idf(docFreq=13325, maxDocs=44218)
              0.0390625 = fieldNorm(doc=607)
      0.5 = coord(1/2)
  0.5 = coord(4/8)

Abstract: Understanding what kinds of Web pages are the most useful for Web search engine users is a critical task in Web information retrieval (IR). Most previous works used hyperlink analysis algorithms to solve this problem. However, little research has been focused on query-independent Web data cleansing for Web IR. In this paper, we first provide analysis of the differences between retrieval target pages and ordinary ones based on more than 30 million Web pages obtained from both the Text Retrieval Conference (TREC) and a widely used Chinese search engine, SOGOU (www.sogou.com). We further propose a learning-based data cleansing algorithm for reducing Web pages that are unlikely to be useful for user requests. We found that there exists a large proportion of low-quality Web pages in both the English and the Chinese Web page corpus, and retrieval target pages can be identified using query-independent features and cleansing algorithms. The experimental results showed that our algorithm is effective in reducing a large portion of Web pages with a small loss in retrieval target pages. It makes it possible for Web IR tools to meet a large fraction of users' needs with only a small part of pages on the Web. These results may help Web search engines make better use of their limited storage and computation resources to improve search performance.
Footnote: Beitrag eines Themenschwerpunktes "Mining Web resources for enhancing information retrieval"
Source: Journal of the American Society for Information Science and Technology. 58(2007) no.12, S.1884-1898

Sarnikar, S.; Zhang, Z.; Zhao, J.L.: Query-performance prediction for effective query routing in domain-specific repositories (2014) 0.05

0.047877792 = product of:
  0.095755585 = sum of:
    0.04338471 = weight(_text_:retrieval in 1326) [ClassicSimilarity], result of:
      0.04338471 = score(doc=1326,freq=6.0), product of:
        0.124912694 = queryWeight, product of:
          3.024915 = idf(docFreq=5836, maxDocs=44218)
          0.041294612 = queryNorm
        0.34732026 = fieldWeight in 1326, product of:
          2.4494898 = tf(freq=6.0), with freq of:
            6.0 = termFreq=6.0
          3.024915 = idf(docFreq=5836, maxDocs=44218)
          0.046875 = fieldNorm(doc=1326)
    0.025667597 = weight(_text_:use in 1326) [ClassicSimilarity], result of:
      0.025667597 = score(doc=1326,freq=2.0), product of:
        0.12644777 = queryWeight, product of:
          3.0620887 = idf(docFreq=5623, maxDocs=44218)
          0.041294612 = queryNorm
        0.20298971 = fieldWeight in 1326, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          3.0620887 = idf(docFreq=5623, maxDocs=44218)
          0.046875 = fieldNorm(doc=1326)
    0.02008212 = weight(_text_:of in 1326) [ClassicSimilarity], result of:
      0.02008212 = score(doc=1326,freq=18.0), product of:
        0.06457475 = queryWeight, product of:
          1.5637573 = idf(docFreq=25162, maxDocs=44218)
          0.041294612 = queryNorm
        0.3109903 = fieldWeight in 1326, product of:
          4.2426405 = tf(freq=18.0), with freq of:
            18.0 = termFreq=18.0
          1.5637573 = idf(docFreq=25162, maxDocs=44218)
          0.046875 = fieldNorm(doc=1326)
    0.006621159 = product of:
      0.013242318 = sum of:
        0.013242318 = weight(_text_:on in 1326) [ClassicSimilarity], result of:
          0.013242318 = score(doc=1326,freq=2.0), product of:
            0.090823986 = queryWeight, product of:
              2.199415 = idf(docFreq=13325, maxDocs=44218)
              0.041294612 = queryNorm
            0.14580199 = fieldWeight in 1326, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              2.199415 = idf(docFreq=13325, maxDocs=44218)
              0.046875 = fieldNorm(doc=1326)
      0.5 = coord(1/2)
  0.5 = coord(4/8)

Abstract: The effective use of corporate memory is becoming increasingly important because every aspect of e-business requires access to information repositories. Unfortunately, less-than-satisfying effectiveness in state-of-the-art information-retrieval techniques is well known, even for some of the best search engines such as Google. In this study, the authors resolve this retrieval ineffectiveness problem by developing a new framework for predicting query performance, which is the first step toward better retrieval effectiveness. Specifically, they examine the relationship between query performance and query context. A query context consists of the query itself, the document collection, and the interaction between the two. The authors first analyze the characteristics of query context and develop various features for predicting query performance. Then, they propose a context-sensitive model for predicting query performance based on the characteristics of the query and the document collection. Finally, they validate this model with respect to five real-world collections of documents and demonstrate its utility in routing queries to the correct repository with high accuracy.
Source: Journal of the Association for Information Science and Technology. 65(2014) no.8, S.1597-1614

Biskri, I.; Rompré, L.: Using association rules for query reformulation (2012) 0.05

0.046379514 = product of:
  0.09275903 = sum of:
    0.035423465 = weight(_text_:retrieval in 92) [ClassicSimilarity], result of:
      0.035423465 = score(doc=92,freq=4.0), product of:
        0.124912694 = queryWeight, product of:
          3.024915 = idf(docFreq=5836, maxDocs=44218)
          0.041294612 = queryNorm
        0.2835858 = fieldWeight in 92, product of:
          2.0 = tf(freq=4.0), with freq of:
            4.0 = termFreq=4.0
          3.024915 = idf(docFreq=5836, maxDocs=44218)
          0.046875 = fieldNorm(doc=92)
    0.025667597 = weight(_text_:use in 92) [ClassicSimilarity], result of:
      0.025667597 = score(doc=92,freq=2.0), product of:
        0.12644777 = queryWeight, product of:
          3.0620887 = idf(docFreq=5623, maxDocs=44218)
          0.041294612 = queryNorm
        0.20298971 = fieldWeight in 92, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          3.0620887 = idf(docFreq=5623, maxDocs=44218)
          0.046875 = fieldNorm(doc=92)
    0.025046807 = weight(_text_:of in 92) [ClassicSimilarity], result of:
      0.025046807 = score(doc=92,freq=28.0), product of:
        0.06457475 = queryWeight, product of:
          1.5637573 = idf(docFreq=25162, maxDocs=44218)
          0.041294612 = queryNorm
        0.38787308 = fieldWeight in 92, product of:
          5.2915025 = tf(freq=28.0), with freq of:
            28.0 = termFreq=28.0
          1.5637573 = idf(docFreq=25162, maxDocs=44218)
          0.046875 = fieldNorm(doc=92)
    0.006621159 = product of:
      0.013242318 = sum of:
        0.013242318 = weight(_text_:on in 92) [ClassicSimilarity], result of:
          0.013242318 = score(doc=92,freq=2.0), product of:
            0.090823986 = queryWeight, product of:
              2.199415 = idf(docFreq=13325, maxDocs=44218)
              0.041294612 = queryNorm
            0.14580199 = fieldWeight in 92, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              2.199415 = idf(docFreq=13325, maxDocs=44218)
              0.046875 = fieldNorm(doc=92)
      0.5 = coord(1/2)
  0.5 = coord(4/8)

Abstract: In this paper the authors will present research on the combination of two methods of data mining: text classification and maximal association rules. Text classification has been the focus of interest of many researchers for a long time. However, the results take the form of lists of words (classes) that people often do not know what to do with. The use of maximal association rules induced a number of advantages: (1) the detection of dependencies and correlations between the relevant units of information (words) of different classes, (2) the extraction of hidden knowledge, often relevant, from a large volume of data. The authors will show how this combination can improve the process of information retrieval.
Source: Next generation search engines: advanced models for information retrieval. Eds.: C. Jouis, u.a

Matson, L.D.; Bonski, D.J.: Do digital libraries need librarians? (1997) 0.04

0.044872273 = product of:
  0.089744546 = sum of:
    0.03422346 = weight(_text_:use in 1737) [ClassicSimilarity], result of:
      0.03422346 = score(doc=1737,freq=2.0), product of:
        0.12644777 = queryWeight, product of:
          3.0620887 = idf(docFreq=5623, maxDocs=44218)
          0.041294612 = queryNorm
        0.27065295 = fieldWeight in 1737, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          3.0620887 = idf(docFreq=5623, maxDocs=44218)
          0.0625 = fieldNorm(doc=1737)
    0.017850775 = weight(_text_:of in 1737) [ClassicSimilarity], result of:
      0.017850775 = score(doc=1737,freq=8.0), product of:
        0.06457475 = queryWeight, product of:
          1.5637573 = idf(docFreq=25162, maxDocs=44218)
          0.041294612 = queryNorm
        0.27643585 = fieldWeight in 1737, product of:
          2.828427 = tf(freq=8.0), with freq of:
            8.0 = termFreq=8.0
          1.5637573 = idf(docFreq=25162, maxDocs=44218)
          0.0625 = fieldNorm(doc=1737)
    0.015290912 = product of:
      0.030581824 = sum of:
        0.030581824 = weight(_text_:on in 1737) [ClassicSimilarity], result of:
          0.030581824 = score(doc=1737,freq=6.0), product of:
            0.090823986 = queryWeight, product of:
              2.199415 = idf(docFreq=13325, maxDocs=44218)
              0.041294612 = queryNorm
            0.33671528 = fieldWeight in 1737, product of:
              2.4494898 = tf(freq=6.0), with freq of:
                6.0 = termFreq=6.0
              2.199415 = idf(docFreq=13325, maxDocs=44218)
              0.0625 = fieldNorm(doc=1737)
      0.5 = coord(1/2)
    0.0223794 = product of:
      0.0447588 = sum of:
        0.0447588 = weight(_text_:22 in 1737) [ClassicSimilarity], result of:
          0.0447588 = score(doc=1737,freq=2.0), product of:
            0.1446067 = queryWeight, product of:
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.041294612 = queryNorm
            0.30952093 = fieldWeight in 1737, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.0625 = fieldNorm(doc=1737)
      0.5 = coord(1/2)
  0.5 = coord(4/8)

Abstract: Defines digital libraries and discusses the effects of new technology on librarians. Examines the different viewpoints of librarians and information technologists on digital libraries. Describes the development of a digital library at the National Drug Intelligence Center, USA, which was carried out in collaboration with information technology experts. The system is based on Web enabled search technology to find information, data visualization and data mining to visualize it and use of SGML as an information standard to store it
Date: 22.11.1998 18:57:22

Wang, F.L.; Yang, C.C.: Mining Web data for Chinese segmentation (2007) 0.04

0.039792337 = product of:
  0.07958467 = sum of:
    0.029519552 = weight(_text_:retrieval in 604) [ClassicSimilarity], result of:
      0.029519552 = score(doc=604,freq=4.0), product of:
        0.124912694 = queryWeight, product of:
          3.024915 = idf(docFreq=5836, maxDocs=44218)
          0.041294612 = queryNorm
        0.23632148 = fieldWeight in 604, product of:
          2.0 = tf(freq=4.0), with freq of:
            4.0 = termFreq=4.0
          3.024915 = idf(docFreq=5836, maxDocs=44218)
          0.0390625 = fieldNorm(doc=604)
    0.021389665 = weight(_text_:use in 604) [ClassicSimilarity], result of:
      0.021389665 = score(doc=604,freq=2.0), product of:
        0.12644777 = queryWeight, product of:
          3.0620887 = idf(docFreq=5623, maxDocs=44218)
          0.041294612 = queryNorm
        0.1691581 = fieldWeight in 604, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          3.0620887 = idf(docFreq=5623, maxDocs=44218)
          0.0390625 = fieldNorm(doc=604)
    0.02087234 = weight(_text_:of in 604) [ClassicSimilarity], result of:
      0.02087234 = score(doc=604,freq=28.0), product of:
        0.06457475 = queryWeight, product of:
          1.5637573 = idf(docFreq=25162, maxDocs=44218)
          0.041294612 = queryNorm
        0.32322758 = fieldWeight in 604, product of:
          5.2915025 = tf(freq=28.0), with freq of:
            28.0 = termFreq=28.0
          1.5637573 = idf(docFreq=25162, maxDocs=44218)
          0.0390625 = fieldNorm(doc=604)
    0.007803111 = product of:
      0.015606222 = sum of:
        0.015606222 = weight(_text_:on in 604) [ClassicSimilarity], result of:
          0.015606222 = score(doc=604,freq=4.0), product of:
            0.090823986 = queryWeight, product of:
              2.199415 = idf(docFreq=13325, maxDocs=44218)
              0.041294612 = queryNorm
            0.1718293 = fieldWeight in 604, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              2.199415 = idf(docFreq=13325, maxDocs=44218)
              0.0390625 = fieldNorm(doc=604)
      0.5 = coord(1/2)
  0.5 = coord(4/8)

Abstract: Modern information retrieval systems use keywords within documents as indexing terms for search of relevant documents. As Chinese is an ideographic character-based language, the words in the texts are not delimited by white spaces. Indexing of Chinese documents is impossible without a proper segmentation algorithm. Many Chinese segmentation algorithms have been proposed in the past. Traditional segmentation algorithms cannot operate without a large dictionary or a large corpus of training data. Nowadays, the Web has become the largest corpus that is ideal for Chinese segmentation. Although most search engines have problems in segmenting texts into proper words, they maintain huge databases of documents and frequencies of character sequences in the documents. Their databases are important potential resources for segmentation. In this paper, we propose a segmentation algorithm by mining Web data with the help of search engines. On the other hand, the Romanized pinyin of Chinese language indicates boundaries of words in the text. Our algorithm is the first to utilize the Romanized pinyin to segmentation. It is the first unified segmentation algorithm for the Chinese language from different geographical areas, and it is also domain independent because of the nature of the Web. Experiments have been conducted on the datasets of a recent Chinese segmentation competition. The results show that our algorithm outperforms the traditional algorithms in terms of precision and recall. Moreover, our algorithm can effectively deal with the problems of segmentation ambiguity, new word (unknown word) detection, and stop words.
Footnote: Beitrag eines Themenschwerpunktes "Mining Web resources for enhancing information retrieval"
Source: Journal of the American Society for Information Science and Technology. 58(2007) no.12, S.1820-1837

Chen, Y.-L.; Liu, Y.-H.; Ho, W.-L.: ¬A text mining approach to assist the general public in the retrieval of legal documents (2013) 0.04

0.03965332 = product of:
  0.07930664 = sum of:
    0.035423465 = weight(_text_:retrieval in 521) [ClassicSimilarity], result of:
      0.035423465 = score(doc=521,freq=4.0), product of:
        0.124912694 = queryWeight, product of:
          3.024915 = idf(docFreq=5836, maxDocs=44218)
          0.041294612 = queryNorm
        0.2835858 = fieldWeight in 521, product of:
          2.0 = tf(freq=4.0), with freq of:
            4.0 = termFreq=4.0
          3.024915 = idf(docFreq=5836, maxDocs=44218)
          0.046875 = fieldNorm(doc=521)
    0.025667597 = weight(_text_:use in 521) [ClassicSimilarity], result of:
      0.025667597 = score(doc=521,freq=2.0), product of:
        0.12644777 = queryWeight, product of:
          3.0620887 = idf(docFreq=5623, maxDocs=44218)
          0.041294612 = queryNorm
        0.20298971 = fieldWeight in 521, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          3.0620887 = idf(docFreq=5623, maxDocs=44218)
          0.046875 = fieldNorm(doc=521)
    0.011594418 = weight(_text_:of in 521) [ClassicSimilarity], result of:
      0.011594418 = score(doc=521,freq=6.0), product of:
        0.06457475 = queryWeight, product of:
          1.5637573 = idf(docFreq=25162, maxDocs=44218)
          0.041294612 = queryNorm
        0.17955035 = fieldWeight in 521, product of:
          2.4494898 = tf(freq=6.0), with freq of:
            6.0 = termFreq=6.0
          1.5637573 = idf(docFreq=25162, maxDocs=44218)
          0.046875 = fieldNorm(doc=521)
    0.006621159 = product of:
      0.013242318 = sum of:
        0.013242318 = weight(_text_:on in 521) [ClassicSimilarity], result of:
          0.013242318 = score(doc=521,freq=2.0), product of:
            0.090823986 = queryWeight, product of:
              2.199415 = idf(docFreq=13325, maxDocs=44218)
              0.041294612 = queryNorm
            0.14580199 = fieldWeight in 521, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              2.199415 = idf(docFreq=13325, maxDocs=44218)
              0.046875 = fieldNorm(doc=521)
      0.5 = coord(1/2)
  0.5 = coord(4/8)

Abstract: Applying text mining techniques to legal issues has been an emerging research topic in recent years. Although some previous studies focused on assisting professionals in the retrieval of related legal documents, they did not take into account the general public and their difficulty in describing legal problems in professional legal terms. Because this problem has not been addressed by previous research, this study aims to design a text-mining-based method that allows the general public to use everyday vocabulary to search for and retrieve criminal judgments. The experimental results indicate that our method can help the general public, who are not familiar with professional legal terms, to acquire relevant criminal judgments more accurately and effectively.
Source: Journal of the American Society for Information Science and Technology. 64(2013) no.2, S.280-290

Hofstede, A.H.M. ter; Proper, H.A.; Van der Weide, T.P.: Exploiting fact verbalisation in conceptual information modelling (1997) 0.04

0.038957372 = product of:
  0.077914745 = sum of:
    0.029945528 = weight(_text_:use in 2908) [ClassicSimilarity], result of:
      0.029945528 = score(doc=2908,freq=2.0), product of:
        0.12644777 = queryWeight, product of:
          3.0620887 = idf(docFreq=5623, maxDocs=44218)
          0.041294612 = queryNorm
        0.23682132 = fieldWeight in 2908, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          3.0620887 = idf(docFreq=5623, maxDocs=44218)
          0.0546875 = fieldNorm(doc=2908)
    0.020662563 = weight(_text_:of in 2908) [ClassicSimilarity], result of:
      0.020662563 = score(doc=2908,freq=14.0), product of:
        0.06457475 = queryWeight, product of:
          1.5637573 = idf(docFreq=25162, maxDocs=44218)
          0.041294612 = queryNorm
        0.31997898 = fieldWeight in 2908, product of:
          3.7416575 = tf(freq=14.0), with freq of:
            14.0 = termFreq=14.0
          1.5637573 = idf(docFreq=25162, maxDocs=44218)
          0.0546875 = fieldNorm(doc=2908)
    0.007724685 = product of:
      0.01544937 = sum of:
        0.01544937 = weight(_text_:on in 2908) [ClassicSimilarity], result of:
          0.01544937 = score(doc=2908,freq=2.0), product of:
            0.090823986 = queryWeight, product of:
              2.199415 = idf(docFreq=13325, maxDocs=44218)
              0.041294612 = queryNorm
            0.17010231 = fieldWeight in 2908, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              2.199415 = idf(docFreq=13325, maxDocs=44218)
              0.0546875 = fieldNorm(doc=2908)
      0.5 = coord(1/2)
    0.019581974 = product of:
      0.039163947 = sum of:
        0.039163947 = weight(_text_:22 in 2908) [ClassicSimilarity], result of:
          0.039163947 = score(doc=2908,freq=2.0), product of:
            0.1446067 = queryWeight, product of:
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.041294612 = queryNorm
            0.2708308 = fieldWeight in 2908, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.0546875 = fieldNorm(doc=2908)
      0.5 = coord(1/2)
  0.5 = coord(4/8)

Abstract: Focuses on the information modelling side of conceptual modelling. Deals with the exploitation of fact verbalisations after finishing the actual information system. Verbalisations are used as input for the design of the so-called information model. Exploits these verbalisation in 4 directions: considers their use for a conceptual query language, the verbalisation of instances, the description of the contents of a database and for the verbalisation of queries in a computer supported query environment. Provides an example session with an envisioned tool for end user query formulations that exploits the verbalisation
Source: Information systems. 22(1997) nos.5/6, S.349-385

Berry, M.W.; Esau, R.; Kiefer, B.: ¬The use of text mining techniques in electronic discovery for legal matters (2012) 0.03

0.033537634 = product of:
  0.08943369 = sum of:
    0.035423465 = weight(_text_:retrieval in 91) [ClassicSimilarity], result of:
      0.035423465 = score(doc=91,freq=4.0), product of:
        0.124912694 = queryWeight, product of:
          3.024915 = idf(docFreq=5836, maxDocs=44218)
          0.041294612 = queryNorm
        0.2835858 = fieldWeight in 91, product of:
          2.0 = tf(freq=4.0), with freq of:
            4.0 = termFreq=4.0
          3.024915 = idf(docFreq=5836, maxDocs=44218)
          0.046875 = fieldNorm(doc=91)
    0.036299463 = weight(_text_:use in 91) [ClassicSimilarity], result of:
      0.036299463 = score(doc=91,freq=4.0), product of:
        0.12644777 = queryWeight, product of:
          3.0620887 = idf(docFreq=5623, maxDocs=44218)
          0.041294612 = queryNorm
        0.2870708 = fieldWeight in 91, product of:
          2.0 = tf(freq=4.0), with freq of:
            4.0 = termFreq=4.0
          3.0620887 = idf(docFreq=5623, maxDocs=44218)
          0.046875 = fieldNorm(doc=91)
    0.017710768 = weight(_text_:of in 91) [ClassicSimilarity], result of:
      0.017710768 = score(doc=91,freq=14.0), product of:
        0.06457475 = queryWeight, product of:
          1.5637573 = idf(docFreq=25162, maxDocs=44218)
          0.041294612 = queryNorm
        0.2742677 = fieldWeight in 91, product of:
          3.7416575 = tf(freq=14.0), with freq of:
            14.0 = termFreq=14.0
          1.5637573 = idf(docFreq=25162, maxDocs=44218)
          0.046875 = fieldNorm(doc=91)
  0.375 = coord(3/8)

Abstract: Electronic discovery (eDiscovery) is the process of collecting and analyzing electronic documents to determine their relevance to a legal matter. Office technology has advanced and eased the requirements necessary to create a document. As such, the volume of data has outgrown the manual processes previously used to make relevance judgments. Methods of text mining and information retrieval have been put to use in eDiscovery to help tame the volume of data; however, the results have been uneven. This chapter looks at the historical bias of the collection process. The authors examine how tools like classifiers, latent semantic analysis, and non-negative matrix factorization deal with nuances of the collection process.
Source: Next generation search engines: advanced models for information retrieval. Eds.: C. Jouis, u.a

Hallonsten, O.; Holmberg, D.: Analyzing structural stratification in the Swedish higher education system : data contextualization with policy-history analysis (2013) 0.03

0.031286977 = product of:
  0.062573954 = sum of:
    0.021389665 = weight(_text_:use in 668) [ClassicSimilarity], result of:
      0.021389665 = score(doc=668,freq=2.0), product of:
        0.12644777 = queryWeight, product of:
          3.0620887 = idf(docFreq=5623, maxDocs=44218)
          0.041294612 = queryNorm
        0.1691581 = fieldWeight in 668, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          3.0620887 = idf(docFreq=5623, maxDocs=44218)
          0.0390625 = fieldNorm(doc=668)
    0.017640345 = weight(_text_:of in 668) [ClassicSimilarity], result of:
      0.017640345 = score(doc=668,freq=20.0), product of:
        0.06457475 = queryWeight, product of:
          1.5637573 = idf(docFreq=25162, maxDocs=44218)
          0.041294612 = queryNorm
        0.27317715 = fieldWeight in 668, product of:
          4.472136 = tf(freq=20.0), with freq of:
            20.0 = termFreq=20.0
          1.5637573 = idf(docFreq=25162, maxDocs=44218)
          0.0390625 = fieldNorm(doc=668)
    0.00955682 = product of:
      0.01911364 = sum of:
        0.01911364 = weight(_text_:on in 668) [ClassicSimilarity], result of:
          0.01911364 = score(doc=668,freq=6.0), product of:
            0.090823986 = queryWeight, product of:
              2.199415 = idf(docFreq=13325, maxDocs=44218)
              0.041294612 = queryNorm
            0.21044704 = fieldWeight in 668, product of:
              2.4494898 = tf(freq=6.0), with freq of:
                6.0 = termFreq=6.0
              2.199415 = idf(docFreq=13325, maxDocs=44218)
              0.0390625 = fieldNorm(doc=668)
      0.5 = coord(1/2)
    0.013987125 = product of:
      0.02797425 = sum of:
        0.02797425 = weight(_text_:22 in 668) [ClassicSimilarity], result of:
          0.02797425 = score(doc=668,freq=2.0), product of:
            0.1446067 = queryWeight, product of:
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.041294612 = queryNorm
            0.19345059 = fieldWeight in 668, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.0390625 = fieldNorm(doc=668)
      0.5 = coord(1/2)
  0.5 = coord(4/8)

Abstract: 20th century massification of higher education and research in academia is said to have produced structurally stratified higher education systems in many countries. Most manifestly, the research mission of universities appears to be divisive. Authors have claimed that the Swedish system, while formally unified, has developed into a binary state, and statistics seem to support this conclusion. This article makes use of a comprehensive statistical data source on Swedish higher education institutions to illustrate stratification, and uses literature on Swedish research policy history to contextualize the statistics. Highlighting the opportunities as well as constraints of the data, the article argues that there is great merit in combining statistics with a qualitative analysis when studying the structural characteristics of national higher education systems. Not least the article shows that it is an over-simplification to describe the Swedish system as binary; the stratification is more complex. On basis of the analysis, the article also argues that while global trends certainly influence national developments, higher education systems have country-specific features that may enrich the understanding of how systems evolve and therefore should be analyzed as part of a broader study of the increasingly globalized academic system.
Date: 22. 3.2013 19:43:01
Source: Journal of the American Society for Information Science and Technology. 64(2013) no.3, S.574-586

Lihui, C.; Lian, C.W.: Using Web structure and summarisation techniques for Web content mining (2005) 0.03

0.031269874 = product of:
  0.06253975 = sum of:
    0.020873476 = weight(_text_:retrieval in 1046) [ClassicSimilarity], result of:
      0.020873476 = score(doc=1046,freq=2.0), product of:
        0.124912694 = queryWeight, product of:
          3.024915 = idf(docFreq=5836, maxDocs=44218)
          0.041294612 = queryNorm
        0.16710453 = fieldWeight in 1046, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          3.024915 = idf(docFreq=5836, maxDocs=44218)
          0.0390625 = fieldNorm(doc=1046)
    0.021389665 = weight(_text_:use in 1046) [ClassicSimilarity], result of:
      0.021389665 = score(doc=1046,freq=2.0), product of:
        0.12644777 = queryWeight, product of:
          3.0620887 = idf(docFreq=5623, maxDocs=44218)
          0.041294612 = queryNorm
        0.1691581 = fieldWeight in 1046, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          3.0620887 = idf(docFreq=5623, maxDocs=44218)
          0.0390625 = fieldNorm(doc=1046)
    0.014758972 = weight(_text_:of in 1046) [ClassicSimilarity], result of:
      0.014758972 = score(doc=1046,freq=14.0), product of:
        0.06457475 = queryWeight, product of:
          1.5637573 = idf(docFreq=25162, maxDocs=44218)
          0.041294612 = queryNorm
        0.22855641 = fieldWeight in 1046, product of:
          3.7416575 = tf(freq=14.0), with freq of:
            14.0 = termFreq=14.0
          1.5637573 = idf(docFreq=25162, maxDocs=44218)
          0.0390625 = fieldNorm(doc=1046)
    0.0055176322 = product of:
      0.0110352645 = sum of:
        0.0110352645 = weight(_text_:on in 1046) [ClassicSimilarity], result of:
          0.0110352645 = score(doc=1046,freq=2.0), product of:
            0.090823986 = queryWeight, product of:
              2.199415 = idf(docFreq=13325, maxDocs=44218)
              0.041294612 = queryNorm
            0.121501654 = fieldWeight in 1046, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              2.199415 = idf(docFreq=13325, maxDocs=44218)
              0.0390625 = fieldNorm(doc=1046)
      0.5 = coord(1/2)
  0.5 = coord(4/8)

Abstract: The dynamic nature and size of the Internet can result in difficulty finding relevant information. Most users typically express their information need via short queries to search engines and they often have to physically sift through the search results based on relevance ranking set by the search engines, making the process of relevance judgement time-consuming. In this paper, we describe a novel representation technique which makes use of the Web structure together with summarisation techniques to better represent knowledge in actual Web Documents. We named the proposed technique as Semantic Virtual Document (SVD). We will discuss how the proposed SVD can be used together with a suitable clustering algorithm to achieve an automatic content-based categorization of similar Web Documents. The auto-categorization facility as well as a "Tree-like" Graphical User Interface (GUI) for post-retrieval document browsing enhances the relevance judgement process for Internet users. Furthermore, we will introduce how our cluster-biased automatic query expansion technique can be used to overcome the ambiguity of short queries typically given by users. We will outline our experimental design to evaluate the effectiveness of the proposed SVD for representation and present a prototype called iSEARCH (Intelligent SEarch And Review of Cluster Hierarchy) for Web content mining. Our results confirm, quantify and extend previous research using Web structure and summarisation techniques, introducing novel techniques for knowledge representation to enhance Web content mining.

Sánchez, D.; Chamorro-Martínez, J.; Vila, M.A.: Modelling subjectivity in visual perception of orientation for image retrieval (2003) 0.03

0.028369166 = product of:
  0.07565111 = sum of:
    0.050096344 = weight(_text_:retrieval in 1067) [ClassicSimilarity], result of:
      0.050096344 = score(doc=1067,freq=8.0), product of:
        0.124912694 = queryWeight, product of:
          3.024915 = idf(docFreq=5836, maxDocs=44218)
          0.041294612 = queryNorm
        0.40105087 = fieldWeight in 1067, product of:
          2.828427 = tf(freq=8.0), with freq of:
            8.0 = termFreq=8.0
          3.024915 = idf(docFreq=5836, maxDocs=44218)
          0.046875 = fieldNorm(doc=1067)
    0.018933605 = weight(_text_:of in 1067) [ClassicSimilarity], result of:
      0.018933605 = score(doc=1067,freq=16.0), product of:
        0.06457475 = queryWeight, product of:
          1.5637573 = idf(docFreq=25162, maxDocs=44218)
          0.041294612 = queryNorm
        0.2932045 = fieldWeight in 1067, product of:
          4.0 = tf(freq=16.0), with freq of:
            16.0 = termFreq=16.0
          1.5637573 = idf(docFreq=25162, maxDocs=44218)
          0.046875 = fieldNorm(doc=1067)
    0.006621159 = product of:
      0.013242318 = sum of:
        0.013242318 = weight(_text_:on in 1067) [ClassicSimilarity], result of:
          0.013242318 = score(doc=1067,freq=2.0), product of:
            0.090823986 = queryWeight, product of:
              2.199415 = idf(docFreq=13325, maxDocs=44218)
              0.041294612 = queryNorm
            0.14580199 = fieldWeight in 1067, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              2.199415 = idf(docFreq=13325, maxDocs=44218)
              0.046875 = fieldNorm(doc=1067)
      0.5 = coord(1/2)
  0.375 = coord(3/8)

Abstract: In this paper we combine computer vision and data mining techniques to model high-level concepts for image retrieval, on the basis of basic perceptual features of the human visual system. High-level concepts related to these features are learned and represented by means of a set of fuzzy association rules. The concepts so acquired can be used for image retrieval with the advantage that it is not needed to provide an image as a query. Instead, a query is formulated by using the labels that identify the learned concepts as search terms, and the retrieval process calculates the relevance of an image to the query by an inference mechanism. An additional feature of our methodology is that it can capture user's subjectivity. For that purpose, fuzzy sets theory is employed to measure user's assessments about the fulfillment of a concept by an image.

Saz, J.T.: Perspectivas en recuperacion y explotacion de informacion electronica : el 'data mining' (1997) 0.03

0.027039845 = product of:
  0.07210625 = sum of:
    0.04174695 = weight(_text_:retrieval in 3723) [ClassicSimilarity], result of:
      0.04174695 = score(doc=3723,freq=2.0), product of:
        0.124912694 = queryWeight, product of:
          3.024915 = idf(docFreq=5836, maxDocs=44218)
          0.041294612 = queryNorm
        0.33420905 = fieldWeight in 3723, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          3.024915 = idf(docFreq=5836, maxDocs=44218)
          0.078125 = fieldNorm(doc=3723)
    0.019324033 = weight(_text_:of in 3723) [ClassicSimilarity], result of:
      0.019324033 = score(doc=3723,freq=6.0), product of:
        0.06457475 = queryWeight, product of:
          1.5637573 = idf(docFreq=25162, maxDocs=44218)
          0.041294612 = queryNorm
        0.2992506 = fieldWeight in 3723, product of:
          2.4494898 = tf(freq=6.0), with freq of:
            6.0 = termFreq=6.0
          1.5637573 = idf(docFreq=25162, maxDocs=44218)
          0.078125 = fieldNorm(doc=3723)
    0.0110352645 = product of:
      0.022070529 = sum of:
        0.022070529 = weight(_text_:on in 3723) [ClassicSimilarity], result of:
          0.022070529 = score(doc=3723,freq=2.0), product of:
            0.090823986 = queryWeight, product of:
              2.199415 = idf(docFreq=13325, maxDocs=44218)
              0.041294612 = queryNorm
            0.24300331 = fieldWeight in 3723, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              2.199415 = idf(docFreq=13325, maxDocs=44218)
              0.078125 = fieldNorm(doc=3723)
      0.5 = coord(1/2)
  0.375 = coord(3/8)

Abstract: Presents the concept and the techniques identified by the term data mining. Explains the principles and phases of developing a data mining process, and the main types of data mining tools
Footnote: Übers. des Titels: Perspectives on the retrieval and exploitation of electronic information: data mining

Lam, W.; Yang, C.C.; Menczer, F.: Introduction to the special topic section on mining Web resources for enhancing information retrieval (2007) 0.03

0.025587654 = product of:
  0.06823374 = sum of:
    0.041327372 = weight(_text_:retrieval in 600) [ClassicSimilarity], result of:
      0.041327372 = score(doc=600,freq=4.0), product of:
        0.124912694 = queryWeight, product of:
          3.024915 = idf(docFreq=5836, maxDocs=44218)
          0.041294612 = queryNorm
        0.33085006 = fieldWeight in 600, product of:
          2.0 = tf(freq=4.0), with freq of:
            4.0 = termFreq=4.0
          3.024915 = idf(docFreq=5836, maxDocs=44218)
          0.0546875 = fieldNorm(doc=600)
    0.013526822 = weight(_text_:of in 600) [ClassicSimilarity], result of:
      0.013526822 = score(doc=600,freq=6.0), product of:
        0.06457475 = queryWeight, product of:
          1.5637573 = idf(docFreq=25162, maxDocs=44218)
          0.041294612 = queryNorm
        0.20947541 = fieldWeight in 600, product of:
          2.4494898 = tf(freq=6.0), with freq of:
            6.0 = termFreq=6.0
          1.5637573 = idf(docFreq=25162, maxDocs=44218)
          0.0546875 = fieldNorm(doc=600)
    0.013379549 = product of:
      0.026759097 = sum of:
        0.026759097 = weight(_text_:on in 600) [ClassicSimilarity], result of:
          0.026759097 = score(doc=600,freq=6.0), product of:
            0.090823986 = queryWeight, product of:
              2.199415 = idf(docFreq=13325, maxDocs=44218)
              0.041294612 = queryNorm
            0.29462588 = fieldWeight in 600, product of:
              2.4494898 = tf(freq=6.0), with freq of:
                6.0 = termFreq=6.0
              2.199415 = idf(docFreq=13325, maxDocs=44218)
              0.0546875 = fieldNorm(doc=600)
      0.5 = coord(1/2)
  0.375 = coord(3/8)

Abstract: The amount of information on the Web has been expanding at an enormous pace. There are a variety of Web documents in different genres, such as news, reports, reviews. Traditionally, the information displayed on Web sites has been static. Recently, there are many Web sites offering content that is dynamically generated and frequently updated. It is also common for Web sites to contain information in different languages since many countries adopt more than one language. Moreover, content may exist in multimedia formats including text, images, video, and audio.
Footnote: Einführung in einen Themenschwerpunkt "Mining Web resources for enhancing information retrieval"
Source: Journal of the American Society for Information Science and Technology. 58(2007) no.12, S.1791-1792

Gaizauskas, R.; Wilks, Y.: Information extraction : beyond document retrieval (1998) 0.02

0.023704888 = product of:
  0.063213035 = sum of:
    0.035423465 = weight(_text_:retrieval in 4716) [ClassicSimilarity], result of:
      0.035423465 = score(doc=4716,freq=4.0), product of:
        0.124912694 = queryWeight, product of:
          3.024915 = idf(docFreq=5836, maxDocs=44218)
          0.041294612 = queryNorm
        0.2835858 = fieldWeight in 4716, product of:
          2.0 = tf(freq=4.0), with freq of:
            4.0 = termFreq=4.0
          3.024915 = idf(docFreq=5836, maxDocs=44218)
          0.046875 = fieldNorm(doc=4716)
    0.021168415 = weight(_text_:of in 4716) [ClassicSimilarity], result of:
      0.021168415 = score(doc=4716,freq=20.0), product of:
        0.06457475 = queryWeight, product of:
          1.5637573 = idf(docFreq=25162, maxDocs=44218)
          0.041294612 = queryNorm
        0.32781258 = fieldWeight in 4716, product of:
          4.472136 = tf(freq=20.0), with freq of:
            20.0 = termFreq=20.0
          1.5637573 = idf(docFreq=25162, maxDocs=44218)
          0.046875 = fieldNorm(doc=4716)
    0.006621159 = product of:
      0.013242318 = sum of:
        0.013242318 = weight(_text_:on in 4716) [ClassicSimilarity], result of:
          0.013242318 = score(doc=4716,freq=2.0), product of:
            0.090823986 = queryWeight, product of:
              2.199415 = idf(docFreq=13325, maxDocs=44218)
              0.041294612 = queryNorm
            0.14580199 = fieldWeight in 4716, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              2.199415 = idf(docFreq=13325, maxDocs=44218)
              0.046875 = fieldNorm(doc=4716)
      0.5 = coord(1/2)
  0.375 = coord(3/8)

Abstract: In this paper we give a synoptic view of the growth of the text processing technology of informatione xtraction (IE) whose function is to extract information about a pre-specified set of entities, relations or events from natural language texts and to record this information in structured representations called templates. Here we describe the nature of the IE task, review the history of the area from its origins in AI work in the 1960s and 70s till the present, discuss the techniques being used to carry out the task, describe application areas where IE systems are or are about to be at work, and conclude with a discussion of the challenges facing the area. What emerges is a picture of an exciting new text processing technology with a host of new applications, both on its own and in conjunction with other technologies, such as information retrieval, machine translation and data mining
Source: Journal of documentation. 54(1998) no.1, S.70-105

Baeza-Yates, R.; Hurtado, C.; Mendoza, M.: Improving search engines by query clustering (2007) 0.02

0.022803668 = product of:
  0.060809784 = sum of:
    0.029222867 = weight(_text_:retrieval in 601) [ClassicSimilarity], result of:
      0.029222867 = score(doc=601,freq=2.0), product of:
        0.124912694 = queryWeight, product of:
          3.024915 = idf(docFreq=5836, maxDocs=44218)
          0.041294612 = queryNorm
        0.23394634 = fieldWeight in 601, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          3.024915 = idf(docFreq=5836, maxDocs=44218)
          0.0546875 = fieldNorm(doc=601)
    0.020662563 = weight(_text_:of in 601) [ClassicSimilarity], result of:
      0.020662563 = score(doc=601,freq=14.0), product of:
        0.06457475 = queryWeight, product of:
          1.5637573 = idf(docFreq=25162, maxDocs=44218)
          0.041294612 = queryNorm
        0.31997898 = fieldWeight in 601, product of:
          3.7416575 = tf(freq=14.0), with freq of:
            14.0 = termFreq=14.0
          1.5637573 = idf(docFreq=25162, maxDocs=44218)
          0.0546875 = fieldNorm(doc=601)
    0.010924355 = product of:
      0.02184871 = sum of:
        0.02184871 = weight(_text_:on in 601) [ClassicSimilarity], result of:
          0.02184871 = score(doc=601,freq=4.0), product of:
            0.090823986 = queryWeight, product of:
              2.199415 = idf(docFreq=13325, maxDocs=44218)
              0.041294612 = queryNorm
            0.24056101 = fieldWeight in 601, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              2.199415 = idf(docFreq=13325, maxDocs=44218)
              0.0546875 = fieldNorm(doc=601)
      0.5 = coord(1/2)
  0.375 = coord(3/8)

Abstract: In this paper, we present a framework for clustering Web search engine queries whose aim is to identify groups of queries used to search for similar information on the Web. The framework is based on a novel term vector model of queries that integrates user selections and the content of selected documents extracted from the logs of a search engine. The query representation obtained allows us to treat query clustering similarly to standard document clustering. We study the application of the clustering framework to two problems: relevance ranking boosting and query recommendation. Finally, we evaluate with experiments the effectiveness of our approach.
Footnote: Beitrag eines Themenschwerpunktes "Mining Web resources for enhancing information retrieval"
Source: Journal of the American Society for Information Science and Technology. 58(2007) no.12, S.1793-1804

Tonkin, E.L.; Tourte, G.J.L.: Working with text. tools, techniques and approaches for text mining (2016) 0.02

0.022735914 = product of:
  0.060629103 = sum of:
    0.037047986 = weight(_text_:use in 4019) [ClassicSimilarity], result of:
      0.037047986 = score(doc=4019,freq=6.0), product of:
        0.12644777 = queryWeight, product of:
          3.0620887 = idf(docFreq=5623, maxDocs=44218)
          0.041294612 = queryNorm
        0.29299045 = fieldWeight in 4019, product of:
          2.4494898 = tf(freq=6.0), with freq of:
            6.0 = termFreq=6.0
          3.0620887 = idf(docFreq=5623, maxDocs=44218)
          0.0390625 = fieldNorm(doc=4019)
    0.015778005 = weight(_text_:of in 4019) [ClassicSimilarity], result of:
      0.015778005 = score(doc=4019,freq=16.0), product of:
        0.06457475 = queryWeight, product of:
          1.5637573 = idf(docFreq=25162, maxDocs=44218)
          0.041294612 = queryNorm
        0.24433708 = fieldWeight in 4019, product of:
          4.0 = tf(freq=16.0), with freq of:
            16.0 = termFreq=16.0
          1.5637573 = idf(docFreq=25162, maxDocs=44218)
          0.0390625 = fieldNorm(doc=4019)
    0.007803111 = product of:
      0.015606222 = sum of:
        0.015606222 = weight(_text_:on in 4019) [ClassicSimilarity], result of:
          0.015606222 = score(doc=4019,freq=4.0), product of:
            0.090823986 = queryWeight, product of:
              2.199415 = idf(docFreq=13325, maxDocs=44218)
              0.041294612 = queryNorm
            0.1718293 = fieldWeight in 4019, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              2.199415 = idf(docFreq=13325, maxDocs=44218)
              0.0390625 = fieldNorm(doc=4019)
      0.5 = coord(1/2)
  0.375 = coord(3/8)

Abstract: What is text mining, and how can it be used? What relevance do these methods have to everyday work in information science and the digital humanities? How does one develop competences in text mining? Working with Text provides a series of cross-disciplinary perspectives on text mining and its applications. As text mining raises legal and ethical issues, the legal background of text mining and the responsibilities of the engineer are discussed in this book. Chapters provide an introduction to the use of the popular GATE text mining package with data drawn from social media, the use of text mining to support semantic search, the development of an authority system to support content tagging, and recent techniques in automatic language evaluation. Focused studies describe text mining on historical texts, automated indexing using constrained vocabularies, and the use of natural language processing to explore the climate science literature. Interviews are included that offer a glimpse into the real-life experience of working within commercial and academic text mining.

Ma, Z.; Sun, A.; Cong, G.: On predicting the popularity of newly emerging hashtags in Twitter (2013) 0.02

0.022096936 = product of:
  0.058925163 = sum of:
    0.030249555 = weight(_text_:use in 967) [ClassicSimilarity], result of:
      0.030249555 = score(doc=967,freq=4.0), product of:
        0.12644777 = queryWeight, product of:
          3.0620887 = idf(docFreq=5623, maxDocs=44218)
          0.041294612 = queryNorm
        0.23922569 = fieldWeight in 967, product of:
          2.0 = tf(freq=4.0), with freq of:
            4.0 = termFreq=4.0
          3.0620887 = idf(docFreq=5623, maxDocs=44218)
          0.0390625 = fieldNorm(doc=967)
    0.017640345 = weight(_text_:of in 967) [ClassicSimilarity], result of:
      0.017640345 = score(doc=967,freq=20.0), product of:
        0.06457475 = queryWeight, product of:
          1.5637573 = idf(docFreq=25162, maxDocs=44218)
          0.041294612 = queryNorm
        0.27317715 = fieldWeight in 967, product of:
          4.472136 = tf(freq=20.0), with freq of:
            20.0 = termFreq=20.0
          1.5637573 = idf(docFreq=25162, maxDocs=44218)
          0.0390625 = fieldNorm(doc=967)
    0.0110352645 = product of:
      0.022070529 = sum of:
        0.022070529 = weight(_text_:on in 967) [ClassicSimilarity], result of:
          0.022070529 = score(doc=967,freq=8.0), product of:
            0.090823986 = queryWeight, product of:
              2.199415 = idf(docFreq=13325, maxDocs=44218)
              0.041294612 = queryNorm
            0.24300331 = fieldWeight in 967, product of:
              2.828427 = tf(freq=8.0), with freq of:
                8.0 = termFreq=8.0
              2.199415 = idf(docFreq=13325, maxDocs=44218)
              0.0390625 = fieldNorm(doc=967)
      0.5 = coord(1/2)
  0.375 = coord(3/8)

Abstract: Because of Twitter's popularity and the viral nature of information dissemination on Twitter, predicting which Twitter topics will become popular in the near future becomes a task of considerable economic importance. Many Twitter topics are annotated by hashtags. In this article, we propose methods to predict the popularity of new hashtags on Twitter by formulating the problem as a classification task. We use five standard classification models (i.e., Naïve bayes, k-nearest neighbors, decision trees, support vector machines, and logistic regression) for prediction. The main challenge is the identification of effective features for describing new hashtags. We extract 7 content features from a hashtag string and the collection of tweets containing the hashtag and 11 contextual features from the social graph formed by users who have adopted the hashtag. We conducted experiments on a Twitter data set consisting of 31 million tweets from 2 million Singapore-based users. The experimental results show that the standard classifiers using the extracted features significantly outperform the baseline methods that do not use these features. Among the five classifiers, the logistic regression model performs the best in terms of the Micro-F1 measure. We also observe that contextual features are more effective than content features.
Source: Journal of the American Society for Information Science and Technology. 64(2013) no.7, S.1399-1410

Ku, L.-W.; Chen, H.-H.: Mining opinions from the Web : beyond relevance retrieval (2007) 0.02
```
0.021879721 = product of:
  0.05834592 = sum of:
    0.036153924 = weight(_text_:retrieval in 605) [ClassicSimilarity], result of:
      0.036153924 = score(doc=605,freq=6.0), product of:
        0.124912694 = queryWeight, product of:
          3.024915 = idf(docFreq=5836, maxDocs=44218)
          0.041294612 = queryNorm
        0.28943354 = fieldWeight in 605, product of:
          2.4494898 = tf(freq=6.0), with freq of:
            6.0 = termFreq=6.0
          3.024915 = idf(docFreq=5836, maxDocs=44218)
          0.0390625 = fieldNorm(doc=605)
    0.011156735 = weight(_text_:of in 605) [ClassicSimilarity], result of:
      0.011156735 = score(doc=605,freq=8.0), product of:
        0.06457475 = queryWeight, product of:
          1.5637573 = idf(docFreq=25162, maxDocs=44218)
          0.041294612 = queryNorm
        0.17277241 = fieldWeight in 605, product of:
          2.828427 = tf(freq=8.0), with freq of:
            8.0 = termFreq=8.0
          1.5637573 = idf(docFreq=25162, maxDocs=44218)
          0.0390625 = fieldNorm(doc=605)
    0.0110352645 = product of:
      0.022070529 = sum of:
        0.022070529 = weight(_text_:on in 605) [ClassicSimilarity], result of:
          0.022070529 = score(doc=605,freq=8.0), product of:
            0.090823986 = queryWeight, product of:
              2.199415 = idf(docFreq=13325, maxDocs=44218)
              0.041294612 = queryNorm
            0.24300331 = fieldWeight in 605, product of:
              2.828427 = tf(freq=8.0), with freq of:
                8.0 = termFreq=8.0
              2.199415 = idf(docFreq=13325, maxDocs=44218)
              0.0390625 = fieldNorm(doc=605)
      0.5 = coord(1/2)
  0.375 = coord(3/8)
```
Abstract

Documents discussing public affairs, common themes, interesting products, and so on, are reported and distributed on the Web. Positive and negative opinions embedded in documents are useful references and feedbacks for governments to improve their services, for companies to market their products, and for customers to purchase their objects. Web opinion mining aims to extract, summarize, and track various aspects of subjective information on the Web. Mining subjective information enables traditional information retrieval (IR) systems to retrieve more data from human viewpoints and provide information with finer granularity. Opinion extraction identifies opinion holders, extracts the relevant opinion sentences, and decides their polarities. Opinion summarization recognizes the major events embedded in documents and summarizes the supportive and the nonsupportive evidence. Opinion tracking captures subjective information from various genres and monitors the developments of opinions from spatial and temporal dimensions. To demonstrate and evaluate the proposed opinion mining algorithms, news and bloggers' articles are adopted. Documents in the evaluation corpora are tagged in different granularities from words, sentences to documents. In the experiments, positive and negative sentiment words and their weights are mined on the basis of Chinese word structures. The f-measure is 73.18% and 63.75% for verbs and nouns, respectively. Utilizing the sentiment words mined together with topical words, we achieve f-measure 62.16% at the sentence level and 74.37% at the document level.

Footnote

Beitrag eines Themenschwerpunktes "Mining Web resources for enhancing information retrieval"

Source

Journal of the American Society for Information Science and Technology. 58(2007) no.12, S.1838-1850
O'Brien, H.L.; Lebow, M.: Mixed-methods approach to measuring user experience in online news interactions (2013) 0.02
```
0.021780247 = product of:
  0.05808066 = sum of:
    0.030249555 = weight(_text_:use in 1001) [ClassicSimilarity], result of:
      0.030249555 = score(doc=1001,freq=4.0), product of:
        0.12644777 = queryWeight, product of:
          3.0620887 = idf(docFreq=5623, maxDocs=44218)
          0.041294612 = queryNorm
        0.23922569 = fieldWeight in 1001, product of:
          2.0 = tf(freq=4.0), with freq of:
            4.0 = termFreq=4.0
          3.0620887 = idf(docFreq=5623, maxDocs=44218)
          0.0390625 = fieldNorm(doc=1001)
    0.02231347 = weight(_text_:of in 1001) [ClassicSimilarity], result of:
      0.02231347 = score(doc=1001,freq=32.0), product of:
        0.06457475 = queryWeight, product of:
          1.5637573 = idf(docFreq=25162, maxDocs=44218)
          0.041294612 = queryNorm
        0.34554482 = fieldWeight in 1001, product of:
          5.656854 = tf(freq=32.0), with freq of:
            32.0 = termFreq=32.0
          1.5637573 = idf(docFreq=25162, maxDocs=44218)
          0.0390625 = fieldNorm(doc=1001)
    0.0055176322 = product of:
      0.0110352645 = sum of:
        0.0110352645 = weight(_text_:on in 1001) [ClassicSimilarity], result of:
          0.0110352645 = score(doc=1001,freq=2.0), product of:
            0.090823986 = queryWeight, product of:
              2.199415 = idf(docFreq=13325, maxDocs=44218)
              0.041294612 = queryNorm
            0.121501654 = fieldWeight in 1001, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              2.199415 = idf(docFreq=13325, maxDocs=44218)
              0.0390625 = fieldNorm(doc=1001)
      0.5 = coord(1/2)
  0.375 = coord(3/8)
```
Abstract

When it comes to evaluating online information experiences, what metrics matter? We conducted a study in which 30 people browsed and selected content within an online news website. Data collected included psychometric scales (User Engagement, Cognitive Absorption, System Usability Scales), self-reported interest in news content, and performance metrics (i.e., reading time, browsing time, total time, number of pages visited, and use of recommended links); a subset of the participants had their physiological responses recorded during the interaction (i.e., heart rate, electrodermal activity, electrocmytogram). Findings demonstrated the concurrent validity of the psychometric scales and interest ratings and revealed that increased time on tasks, number of pages visited, and use of recommended links were not necessarily indicative of greater self-reported engagement, cognitive absorption, or perceived usability. Positive ratings of news content were associated with lower physiological activity. The implications of this research are twofold. First, we propose that user experience is a useful framework for studying online information interactions and will result in a broader conceptualization of information interaction and its evaluation. Second, we advocate a mixed-methods approach to measurement that employs a suite of metrics capable of capturing the pragmatic (e.g., usability) and hedonic (e.g., fun, engagement) aspects of information interactions. We underscore the importance of using multiple measures in information research, because our results emphasize that performance and physiological data must be interpreted in the context of users' subjective experiences.

Source

Journal of the American Society for Information Science and Technology. 64(2013) no.8, S.1543-1556

Search (116 results, page 1 of 6)

Authors

Years

Languages

Themes

Subjects

Classifications