Search (46 results, page 1 of 3)

Bath, P.A.: Data mining in health and medical information (2003) 0.02

0.023819726 = product of:
  0.047639452 = sum of:
    0.028983762 = weight(_text_:information in 4263) [ClassicSimilarity], result of:
      0.028983762 = score(doc=4263,freq=10.0), product of:
        0.083537094 = queryWeight, product of:
          1.7554779 = idf(docFreq=20772, maxDocs=44218)
          0.047586527 = queryNorm
        0.3469568 = fieldWeight in 4263, product of:
          3.1622777 = tf(freq=10.0), with freq of:
            10.0 = termFreq=10.0
          1.7554779 = idf(docFreq=20772, maxDocs=44218)
          0.0625 = fieldNorm(doc=4263)
    0.01865569 = product of:
      0.03731138 = sum of:
        0.03731138 = weight(_text_:technology in 4263) [ClassicSimilarity], result of:
          0.03731138 = score(doc=4263,freq=2.0), product of:
            0.1417311 = queryWeight, product of:
              2.978387 = idf(docFreq=6114, maxDocs=44218)
              0.047586527 = queryNorm
            0.2632547 = fieldWeight in 4263, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              2.978387 = idf(docFreq=6114, maxDocs=44218)
              0.0625 = fieldNorm(doc=4263)
      0.5 = coord(1/2)
  0.5 = coord(2/4)
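
The explain tree above can be checked by hand. The following is a minimal sketch, assuming the standard ClassicSimilarity formulas (tf = sqrt(freq), idf = 1 + ln(maxDocs / (docFreq + 1))); the function and variable names are illustrative and are not part of the search system. All numeric inputs are copied from the tree for doc 4263.

```python
import math

def idf(doc_freq, max_docs):
    # ClassicSimilarity inverse document frequency
    return 1.0 + math.log(max_docs / (doc_freq + 1))

def term_score(freq, doc_freq, max_docs, query_norm, field_norm):
    term_idf = idf(doc_freq, max_docs)
    query_weight = term_idf * query_norm                     # idf * queryNorm
    field_weight = math.sqrt(freq) * term_idf * field_norm   # tf * idf * fieldNorm
    return query_weight * field_weight

# Values taken from the explain output for doc 4263
MAX_DOCS = 44218
QUERY_NORM = 0.047586527
FIELD_NORM = 0.0625

info = term_score(10.0, 20772, MAX_DOCS, QUERY_NORM, FIELD_NORM)
# The "technology" clause sits in a nested query with coord(1/2)
tech = term_score(2.0, 6114, MAX_DOCS, QUERY_NORM, FIELD_NORM) * 0.5
# The outer query matched 2 of 4 clauses: coord(2/4)
total = (info + tech) * 0.5

print(f"information={info:.9f} technology={tech:.9f} total={total:.9f}")
```

Small differences in the last digits are expected, because Lucene computes these values in single precision.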

Abstract: Data mining (DM) is part of a process by which information can be extracted from data or databases and used to inform decision making in a variety of contexts (Benoit, 2002; Michalski, Bratka & Kubat, 1997). DM includes a range of tools and methods for extracting information; their use in the commercial sector for knowledge extraction and discovery has been one of the main driving forces in their development (Adriaans & Zantinge, 1996; Benoit, 2002). DM has been developed and applied in numerous areas. This review describes its use in analyzing health and medical information.
Source: Annual review of information science and technology. 38(2004), S.331-370

Lam, W.; Yang, C.C.; Menczer, F.: Introduction to the special topic section on mining Web resources for enhancing information retrieval (2007) 0.02

0.022052541 = product of:
  0.044105083 = sum of:
    0.027781356 = weight(_text_:information in 600) [ClassicSimilarity], result of:
      0.027781356 = score(doc=600,freq=12.0), product of:
        0.083537094 = queryWeight, product of:
          1.7554779 = idf(docFreq=20772, maxDocs=44218)
          0.047586527 = queryNorm
        0.3325631 = fieldWeight in 600, product of:
          3.4641016 = tf(freq=12.0), with freq of:
            12.0 = termFreq=12.0
          1.7554779 = idf(docFreq=20772, maxDocs=44218)
          0.0546875 = fieldNorm(doc=600)
    0.016323728 = product of:
      0.032647457 = sum of:
        0.032647457 = weight(_text_:technology in 600) [ClassicSimilarity], result of:
          0.032647457 = score(doc=600,freq=2.0), product of:
            0.1417311 = queryWeight, product of:
              2.978387 = idf(docFreq=6114, maxDocs=44218)
              0.047586527 = queryNorm
            0.23034787 = fieldWeight in 600, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              2.978387 = idf(docFreq=6114, maxDocs=44218)
              0.0546875 = fieldNorm(doc=600)
      0.5 = coord(1/2)
  0.5 = coord(2/4)
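
Doc 600 contains "information" more often than doc 4263 (12 occurrences against 10), yet its clause scores lower. A short sketch of why, assuming that fieldNorm carries ClassicSimilarity's length normalization so that longer fields receive smaller values; the helper name is illustrative, and the numbers are copied from the two explain trees.

```python
import math

def field_weight(freq, idf, field_norm):
    # fieldWeight = tf * idf * fieldNorm, with tf = sqrt(freq)
    return math.sqrt(freq) * idf * field_norm

IDF_INFORMATION = 1.7554779  # identical for both documents (same term, same index)

doc_4263 = field_weight(10.0, IDF_INFORMATION, 0.0625)
doc_600 = field_weight(12.0, IDF_INFORMATION, 0.0546875)

# The smaller fieldNorm outweighs the two extra occurrences
print(f"doc 4263: {doc_4263:.7f}  doc 600: {doc_600:.7f}")
```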

Abstract: The amount of information on the Web has been expanding at an enormous pace. There are a variety of Web documents in different genres, such as news, reports, and reviews. Traditionally, the information displayed on Web sites has been static. Recently, many Web sites have begun offering content that is dynamically generated and frequently updated. It is also common for Web sites to contain information in different languages since many countries adopt more than one language. Moreover, content may exist in multimedia formats including text, images, video, and audio.
Footnote: Introduction to the special topic section "Mining Web resources for enhancing information retrieval"
Source: Journal of the American Society for Information Science and Technology. 58(2007) no.12, S.1791-1792

Baeza-Yates, R.; Hurtado, C.; Mendoza, M.: Improving search engines by query clustering (2007) 0.02

0.017984057 = product of:
  0.035968114 = sum of:
    0.019644385 = weight(_text_:information in 601) [ClassicSimilarity], result of:
      0.019644385 = score(doc=601,freq=6.0), product of:
        0.083537094 = queryWeight, product of:
          1.7554779 = idf(docFreq=20772, maxDocs=44218)
          0.047586527 = queryNorm
        0.23515764 = fieldWeight in 601, product of:
          2.4494898 = tf(freq=6.0), with freq of:
            6.0 = termFreq=6.0
          1.7554779 = idf(docFreq=20772, maxDocs=44218)
          0.0546875 = fieldNorm(doc=601)
    0.016323728 = product of:
      0.032647457 = sum of:
        0.032647457 = weight(_text_:technology in 601) [ClassicSimilarity], result of:
          0.032647457 = score(doc=601,freq=2.0), product of:
            0.1417311 = queryWeight, product of:
              2.978387 = idf(docFreq=6114, maxDocs=44218)
              0.047586527 = queryNorm
            0.23034787 = fieldWeight in 601, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              2.978387 = idf(docFreq=6114, maxDocs=44218)
              0.0546875 = fieldNorm(doc=601)
      0.5 = coord(1/2)
  0.5 = coord(2/4)

Abstract: In this paper, we present a framework for clustering Web search engine queries whose aim is to identify groups of queries used to search for similar information on the Web. The framework is based on a novel term vector model of queries that integrates user selections and the content of selected documents extracted from the logs of a search engine. The query representation obtained allows us to treat query clustering similarly to standard document clustering. We study the application of the clustering framework to two problems: relevance ranking boosting and query recommendation. Finally, we evaluate with experiments the effectiveness of our approach.
Footnote: Contribution to the special topic section "Mining Web resources for enhancing information retrieval"
Source: Journal of the American Society for Information Science and Technology. 58(2007) no.12, S.1793-1804

Zhou, L.; Chaovalit, P.: Ontology-supported polarity mining (2008) 0.02

0.017984057 = product of:
  0.035968114 = sum of:
    0.019644385 = weight(_text_:information in 1343) [ClassicSimilarity], result of:
      0.019644385 = score(doc=1343,freq=6.0), product of:
        0.083537094 = queryWeight, product of:
          1.7554779 = idf(docFreq=20772, maxDocs=44218)
          0.047586527 = queryNorm
        0.23515764 = fieldWeight in 1343, product of:
          2.4494898 = tf(freq=6.0), with freq of:
            6.0 = termFreq=6.0
          1.7554779 = idf(docFreq=20772, maxDocs=44218)
          0.0546875 = fieldNorm(doc=1343)
    0.016323728 = product of:
      0.032647457 = sum of:
        0.032647457 = weight(_text_:technology in 1343) [ClassicSimilarity], result of:
          0.032647457 = score(doc=1343,freq=2.0), product of:
            0.1417311 = queryWeight, product of:
              2.978387 = idf(docFreq=6114, maxDocs=44218)
              0.047586527 = queryNorm
            0.23034787 = fieldWeight in 1343, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              2.978387 = idf(docFreq=6114, maxDocs=44218)
              0.0546875 = fieldNorm(doc=1343)
      0.5 = coord(1/2)
  0.5 = coord(2/4)

Abstract: Polarity mining provides an in-depth analysis of semantic orientations of text information. Motivated by its success in the area of topic mining, we propose an ontology-supported polarity mining (OSPM) approach. The approach aims to enhance polarity mining with ontology by providing detailed topic-specific information. OSPM was evaluated in the movie review domain using both supervised and unsupervised techniques. Results revealed that OSPM outperformed the baseline method without ontology support. The findings of this study not only advance the state of polarity mining research but also shed light on future research directions.
Source: Journal of the American Society for Information Science and Technology. 59(2008) no.1, S.98-110

Chen, H.; Chau, M.: Web mining : machine learning for Web applications (2003) 0.02
```
0.017864795 = product of:
  0.03572959 = sum of:
    0.021737823 = weight(_text_:information in 4242) [ClassicSimilarity], result of:
      0.021737823 = score(doc=4242,freq=10.0), product of:
        0.083537094 = queryWeight, product of:
          1.7554779 = idf(docFreq=20772, maxDocs=44218)
          0.047586527 = queryNorm
        0.2602176 = fieldWeight in 4242, product of:
          3.1622777 = tf(freq=10.0), with freq of:
            10.0 = termFreq=10.0
          1.7554779 = idf(docFreq=20772, maxDocs=44218)
          0.046875 = fieldNorm(doc=4242)
    0.013991767 = product of:
      0.027983533 = sum of:
        0.027983533 = weight(_text_:technology in 4242) [ClassicSimilarity], result of:
          0.027983533 = score(doc=4242,freq=2.0), product of:
            0.1417311 = queryWeight, product of:
              2.978387 = idf(docFreq=6114, maxDocs=44218)
              0.047586527 = queryNorm
            0.19744103 = fieldWeight in 4242, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              2.978387 = idf(docFreq=6114, maxDocs=44218)
              0.046875 = fieldNorm(doc=4242)
      0.5 = coord(1/2)
  0.5 = coord(2/4)
```
Abstract: With more than two billion pages created by millions of Web page authors and organizations, the World Wide Web is a tremendously rich knowledge base. The knowledge comes not only from the content of the pages themselves, but also from the unique characteristics of the Web, such as its hyperlink structure and its diversity of content and languages. Analysis of these characteristics often reveals interesting patterns and new knowledge. Such knowledge can be used to improve users' efficiency and effectiveness in searching for information on the Web, and also for applications unrelated to the Web, such as support for decision making or business management. The Web's size and its unstructured and dynamic content, as well as its multilingual nature, make the extraction of useful knowledge a challenging research problem. Furthermore, the Web generates a large amount of data in other formats that contain valuable information. For example, Web server logs' information about user access patterns can be used for information personalization or improving Web page design.
Source: Annual review of information science and technology. 38(2004), S.289-330

Perugini, S.; Ramakrishnan, N.: Mining Web functional dependencies for flexible information access (2007) 0.02

0.016717333 = product of:
  0.033434667 = sum of:
    0.0194429 = weight(_text_:information in 602) [ClassicSimilarity], result of:
      0.0194429 = score(doc=602,freq=8.0), product of:
        0.083537094 = queryWeight, product of:
          1.7554779 = idf(docFreq=20772, maxDocs=44218)
          0.047586527 = queryNorm
        0.23274569 = fieldWeight in 602, product of:
          2.828427 = tf(freq=8.0), with freq of:
            8.0 = termFreq=8.0
          1.7554779 = idf(docFreq=20772, maxDocs=44218)
          0.046875 = fieldNorm(doc=602)
    0.013991767 = product of:
      0.027983533 = sum of:
        0.027983533 = weight(_text_:technology in 602) [ClassicSimilarity], result of:
          0.027983533 = score(doc=602,freq=2.0), product of:
            0.1417311 = queryWeight, product of:
              2.978387 = idf(docFreq=6114, maxDocs=44218)
              0.047586527 = queryNorm
            0.19744103 = fieldWeight in 602, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              2.978387 = idf(docFreq=6114, maxDocs=44218)
              0.046875 = fieldNorm(doc=602)
      0.5 = coord(1/2)
  0.5 = coord(2/4)

Abstract: We present an approach to enhancing information access through Web structure mining in contrast to traditional approaches involving usage mining. Specifically, we mine the hardwired hierarchical hyperlink structure of Web sites to identify patterns of term-term co-occurrences we call Web functional dependencies (FDs). Intuitively, a Web FD x -> y declares that all paths through a site involving a hyperlink labeled x also contain a hyperlink labeled y. The complete set of FDs satisfied by a site helps characterize (flexible and expressive) interaction paradigms supported by a site, where a paradigm is the set of explorable sequences therein. We describe algorithms for mining FDs and results from mining several hierarchical Web sites and present several interface designs that can exploit such FDs to provide compelling user experiences.
Footnote: Contribution to the special topic section "Mining Web resources for enhancing information retrieval"
Source: Journal of the American Society for Information Science and Technology. 58(2007) no.12, S.1805-1819
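
The definition of a Web FD quoted in the abstract above can be illustrated with a small check. This is only a sketch of the definition, not the authors' mining algorithm; the site paths and link labels are hypothetical.

```python
def fd_holds(paths, x, y):
    """Return True if every path that contains label x also contains label y.

    If no path contains x, the dependency holds vacuously.
    """
    return all(y in path for path in paths if x in path)

# Hypothetical root-to-leaf paths through a hierarchical site,
# each given as the sequence of hyperlink labels followed
paths = [
    ["products", "laptops", "reviews"],
    ["products", "phones", "reviews"],
    ["support", "contact"],
]

print(fd_holds(paths, "laptops", "products"))  # every "laptops" path passes through "products"
print(fd_holds(paths, "products", "laptops"))  # the "phones" path is a counterexample
```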

Ku, L.-W.; Chen, H.-H.: Mining opinions from the Web : beyond relevance retrieval (2007) 0.02
```
0.016546793 = product of:
  0.033093587 = sum of:
    0.02143378 = weight(_text_:information in 605) [ClassicSimilarity], result of:
      0.02143378 = score(doc=605,freq=14.0), product of:
        0.083537094 = queryWeight, product of:
          1.7554779 = idf(docFreq=20772, maxDocs=44218)
          0.047586527 = queryNorm
        0.256578 = fieldWeight in 605, product of:
          3.7416575 = tf(freq=14.0), with freq of:
            14.0 = termFreq=14.0
          1.7554779 = idf(docFreq=20772, maxDocs=44218)
          0.0390625 = fieldNorm(doc=605)
    0.011659805 = product of:
      0.02331961 = sum of:
        0.02331961 = weight(_text_:technology in 605) [ClassicSimilarity], result of:
          0.02331961 = score(doc=605,freq=2.0), product of:
            0.1417311 = queryWeight, product of:
              2.978387 = idf(docFreq=6114, maxDocs=44218)
              0.047586527 = queryNorm
            0.16453418 = fieldWeight in 605, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              2.978387 = idf(docFreq=6114, maxDocs=44218)
              0.0390625 = fieldNorm(doc=605)
      0.5 = coord(1/2)
  0.5 = coord(2/4)
```
Abstract: Documents discussing public affairs, common themes, interesting products, and so on, are reported and distributed on the Web. Positive and negative opinions embedded in documents are useful references and feedback for governments to improve their services, for companies to market their products, and for customers to purchase their objects. Web opinion mining aims to extract, summarize, and track various aspects of subjective information on the Web. Mining subjective information enables traditional information retrieval (IR) systems to retrieve more data from human viewpoints and provide information with finer granularity. Opinion extraction identifies opinion holders, extracts the relevant opinion sentences, and decides their polarities. Opinion summarization recognizes the major events embedded in documents and summarizes the supportive and the nonsupportive evidence. Opinion tracking captures subjective information from various genres and monitors the developments of opinions from spatial and temporal dimensions. To demonstrate and evaluate the proposed opinion mining algorithms, news and bloggers' articles are adopted. Documents in the evaluation corpora are tagged in different granularities from words, sentences to documents. In the experiments, positive and negative sentiment words and their weights are mined on the basis of Chinese word structures. The f-measure is 73.18% and 63.75% for verbs and nouns, respectively. Utilizing the sentiment words mined together with topical words, we achieve f-measure 62.16% at the sentence level and 74.37% at the document level.
Footnote: Contribution to the special topic section "Mining Web resources for enhancing information retrieval"
Source: Journal of the American Society for Information Science and Technology. 58(2007) no.12, S.1838-1850

Wu, T.; Pottenger, W.M.: A semi-supervised active learning algorithm for information extraction from textual data (2005) 0.02
```
0.015751816 = product of:
  0.031503633 = sum of:
    0.019843826 = weight(_text_:information in 3237) [ClassicSimilarity], result of:
      0.019843826 = score(doc=3237,freq=12.0), product of:
        0.083537094 = queryWeight, product of:
          1.7554779 = idf(docFreq=20772, maxDocs=44218)
          0.047586527 = queryNorm
        0.23754507 = fieldWeight in 3237, product of:
          3.4641016 = tf(freq=12.0), with freq of:
            12.0 = termFreq=12.0
          1.7554779 = idf(docFreq=20772, maxDocs=44218)
          0.0390625 = fieldNorm(doc=3237)
    0.011659805 = product of:
      0.02331961 = sum of:
        0.02331961 = weight(_text_:technology in 3237) [ClassicSimilarity], result of:
          0.02331961 = score(doc=3237,freq=2.0), product of:
            0.1417311 = queryWeight, product of:
              2.978387 = idf(docFreq=6114, maxDocs=44218)
              0.047586527 = queryNorm
            0.16453418 = fieldWeight in 3237, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              2.978387 = idf(docFreq=6114, maxDocs=44218)
              0.0390625 = fieldNorm(doc=3237)
      0.5 = coord(1/2)
  0.5 = coord(2/4)
```
Abstract: In this article we present a semi-supervised active learning algorithm for pattern discovery in information extraction from textual data. The patterns are reduced regular expressions composed of various characteristics of features useful in information extraction. Our major contribution is a semi-supervised learning algorithm that extracts information from a set of examples labeled as relevant or irrelevant to a given attribute. The approach is semi-supervised because it does not require precise labeling of the exact location of features in the training data. This significantly reduces the effort needed to develop a training set. An active learning algorithm is used to assist the semi-supervised learning algorithm to further reduce the training set development effort. The active learning algorithm is seeded with a single positive example of a given attribute. The context of the seed is used to automatically identify candidates for additional positive examples of the given attribute. Candidate examples are manually pruned during the active learning phase, and our semi-supervised learning algorithm automatically discovers reduced regular expressions for each attribute. We have successfully applied this learning technique in the extraction of textual features from police incident reports, university crime reports, and patents. The performance of our algorithm compares favorably with competitive extraction systems being used in criminal justice information systems.
Source: Journal of the American Society for Information Science and Technology. 56(2005) no.3, S.258-271

Benoit, G.: Data mining (2002) 0.02
```
0.015414905 = product of:
  0.03082981 = sum of:
    0.016838044 = weight(_text_:information in 4296) [ClassicSimilarity], result of:
      0.016838044 = score(doc=4296,freq=6.0), product of:
        0.083537094 = queryWeight, product of:
          1.7554779 = idf(docFreq=20772, maxDocs=44218)
          0.047586527 = queryNorm
        0.20156369 = fieldWeight in 4296, product of:
          2.4494898 = tf(freq=6.0), with freq of:
            6.0 = termFreq=6.0
          1.7554779 = idf(docFreq=20772, maxDocs=44218)
          0.046875 = fieldNorm(doc=4296)
    0.013991767 = product of:
      0.027983533 = sum of:
        0.027983533 = weight(_text_:technology in 4296) [ClassicSimilarity], result of:
          0.027983533 = score(doc=4296,freq=2.0), product of:
            0.1417311 = queryWeight, product of:
              2.978387 = idf(docFreq=6114, maxDocs=44218)
              0.047586527 = queryNorm
            0.19744103 = fieldWeight in 4296, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              2.978387 = idf(docFreq=6114, maxDocs=44218)
              0.046875 = fieldNorm(doc=4296)
      0.5 = coord(1/2)
  0.5 = coord(2/4)
```
Abstract: Data mining (DM) is a multistaged process of extracting previously unanticipated knowledge from large databases, and applying the results to decision making. Data mining tools detect patterns from the data and infer associations and rules from them. The extracted information may then be applied to prediction or classification models by identifying relations within the data records or between databases. Those patterns and rules can then guide decision making and forecast the effects of those decisions. However, this definition may be applied equally to "knowledge discovery in databases" (KDD). Indeed, in the recent literature of DM and KDD, a source of confusion has emerged, making it difficult to determine the exact parameters of both. KDD is sometimes viewed as the broader discipline, of which data mining is merely a component, specifically pattern extraction, evaluation, and cleansing methods (Raghavan, Deogun, & Sever, 1998, p. 397). Thuraisingham (1999, p. 2) remarked that "knowledge discovery," "pattern discovery," "data dredging," "information extraction," and "knowledge mining" are all employed as synonyms for DM. Trybula, in his ARIST chapter on text mining, observed that the "existing work [in KDD] is confusing because the terminology is inconsistent and poorly defined."
Source: Annual review of information science and technology. 36(2002), S.265-312

Li, J.; Zhang, P.; Cao, J.: External concept support for group support systems through Web mining (2009) 0.02

0.015414905 = product of:
  0.03082981 = sum of:
    0.016838044 = weight(_text_:information in 2806) [ClassicSimilarity], result of:
      0.016838044 = score(doc=2806,freq=6.0), product of:
        0.083537094 = queryWeight, product of:
          1.7554779 = idf(docFreq=20772, maxDocs=44218)
          0.047586527 = queryNorm
        0.20156369 = fieldWeight in 2806, product of:
          2.4494898 = tf(freq=6.0), with freq of:
            6.0 = termFreq=6.0
          1.7554779 = idf(docFreq=20772, maxDocs=44218)
          0.046875 = fieldNorm(doc=2806)
    0.013991767 = product of:
      0.027983533 = sum of:
        0.027983533 = weight(_text_:technology in 2806) [ClassicSimilarity], result of:
          0.027983533 = score(doc=2806,freq=2.0), product of:
            0.1417311 = queryWeight, product of:
              2.978387 = idf(docFreq=6114, maxDocs=44218)
              0.047586527 = queryNorm
            0.19744103 = fieldWeight in 2806, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              2.978387 = idf(docFreq=6114, maxDocs=44218)
              0.046875 = fieldNorm(doc=2806)
      0.5 = coord(1/2)
  0.5 = coord(2/4)

Abstract: External information plays an important role in group decision-making processes, yet research about external information support for Group Support Systems (GSS) has been lacking. In this study, we propose an approach to build a concept space to provide external concept support for GSS users. Built on a Web mining algorithm, the approach can mine a concept space from the Web and retrieve related concepts from the concept space based on users' comments in a real-time manner. We conduct two experiments to evaluate the quality of the proposed approach and the effectiveness of the external concept support provided by this approach. The experiment results indicate that the concept space mined from the Web contained qualified concepts to stimulate divergent thinking. The results also demonstrate that external concept support in GSS greatly enhanced group productivity for idea generation tasks.
Source: Journal of the American Society for Information Science and Technology. 60(2009) no.5, S.1057-1070

Shi, X.; Yang, C.C.: Mining related queries from Web search engine query logs using an improved association rule mining model (2007) 0.01
```
0.014887327 = product of:
  0.029774655 = sum of:
    0.01811485 = weight(_text_:information in 597) [ClassicSimilarity], result of:
      0.01811485 = score(doc=597,freq=10.0), product of:
        0.083537094 = queryWeight, product of:
          1.7554779 = idf(docFreq=20772, maxDocs=44218)
          0.047586527 = queryNorm
        0.21684799 = fieldWeight in 597, product of:
          3.1622777 = tf(freq=10.0), with freq of:
            10.0 = termFreq=10.0
          1.7554779 = idf(docFreq=20772, maxDocs=44218)
          0.0390625 = fieldNorm(doc=597)
    0.011659805 = product of:
      0.02331961 = sum of:
        0.02331961 = weight(_text_:technology in 597) [ClassicSimilarity], result of:
          0.02331961 = score(doc=597,freq=2.0), product of:
            0.1417311 = queryWeight, product of:
              2.978387 = idf(docFreq=6114, maxDocs=44218)
              0.047586527 = queryNorm
            0.16453418 = fieldWeight in 597, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              2.978387 = idf(docFreq=6114, maxDocs=44218)
              0.0390625 = fieldNorm(doc=597)
      0.5 = coord(1/2)
  0.5 = coord(2/4)
```
Abstract: With the overwhelming volume of information, the task of finding relevant information on a given topic on the Web is becoming increasingly difficult. Web search engines hence become one of the most popular solutions available on the Web. However, it has never been easy for novice users to organize and represent their information needs using simple queries. Users have to keep modifying their input queries until they get expected results. Therefore, it is often desirable for search engines to give suggestions on related queries to users. Besides, by identifying those related queries, search engines can potentially perform optimizations on their systems, such as query expansion and file indexing. In this work we propose a method that suggests a list of related queries given an initial input query. The related queries are based on the query log of previously submitted queries by human users, which can be identified using an enhanced model of association rules. Users can utilize the suggested related queries to tune or redirect the search process. Our method not only discovers the related queries, but also ranks them according to the degree of their relatedness. Unlike many other rival techniques, it also performs reasonably well on less frequent input queries.
Footnote: Contribution to the special topic section "Mining Web resources for enhancing information retrieval"
Source: Journal of the American Society for Information Science and Technology. 58(2007) no.12, S.1871-1883

Liu, Y.; Zhang, M.; Cen, R.; Ru, L.; Ma, S.: Data cleansing for Web information retrieval using query independent features (2007) 0.01
```
0.01393111 = product of:
  0.02786222 = sum of:
    0.016202414 = weight(_text_:information in 607) [ClassicSimilarity], result of:
      0.016202414 = score(doc=607,freq=8.0), product of:
        0.083537094 = queryWeight, product of:
          1.7554779 = idf(docFreq=20772, maxDocs=44218)
          0.047586527 = queryNorm
        0.19395474 = fieldWeight in 607, product of:
          2.828427 = tf(freq=8.0), with freq of:
            8.0 = termFreq=8.0
          1.7554779 = idf(docFreq=20772, maxDocs=44218)
          0.0390625 = fieldNorm(doc=607)
    0.011659805 = product of:
      0.02331961 = sum of:
        0.02331961 = weight(_text_:technology in 607) [ClassicSimilarity], result of:
          0.02331961 = score(doc=607,freq=2.0), product of:
            0.1417311 = queryWeight, product of:
              2.978387 = idf(docFreq=6114, maxDocs=44218)
              0.047586527 = queryNorm
            0.16453418 = fieldWeight in 607, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              2.978387 = idf(docFreq=6114, maxDocs=44218)
              0.0390625 = fieldNorm(doc=607)
      0.5 = coord(1/2)
  0.5 = coord(2/4)
```
Abstract: Understanding what kinds of Web pages are the most useful for Web search engine users is a critical task in Web information retrieval (IR). Most previous works used hyperlink analysis algorithms to solve this problem. However, little research has been focused on query-independent Web data cleansing for Web IR. In this paper, we first provide analysis of the differences between retrieval target pages and ordinary ones based on more than 30 million Web pages obtained from both the Text Retrieval Conference (TREC) and a widely used Chinese search engine, SOGOU (www.sogou.com). We further propose a learning-based data cleansing algorithm for reducing Web pages that are unlikely to be useful for user requests. We found that there exists a large proportion of low-quality Web pages in both the English and the Chinese Web page corpus, and retrieval target pages can be identified using query-independent features and cleansing algorithms. The experimental results showed that our algorithm is effective in reducing a large portion of Web pages with a small loss in retrieval target pages. It makes it possible for Web IR tools to meet a large fraction of users' needs with only a small part of pages on the Web. These results may help Web search engines make better use of their limited storage and computation resources to improve search performance.
Footnote: Contribution to the special topic section "Mining Web resources for enhancing information retrieval"
Source: Journal of the American Society for Information Science and Technology. 58(2007) no.12, S.1884-1898

Wang, F.L.; Yang, C.C.: Mining Web data for Chinese segmentation (2007) 0.01
```
0.012845755 = product of:
  0.02569151 = sum of:
    0.0140317045 = weight(_text_:information in 604) [ClassicSimilarity], result of:
      0.0140317045 = score(doc=604,freq=6.0), product of:
        0.083537094 = queryWeight, product of:
          1.7554779 = idf(docFreq=20772, maxDocs=44218)
          0.047586527 = queryNorm
        0.16796975 = fieldWeight in 604, product of:
          2.4494898 = tf(freq=6.0), with freq of:
            6.0 = termFreq=6.0
          1.7554779 = idf(docFreq=20772, maxDocs=44218)
          0.0390625 = fieldNorm(doc=604)
    0.011659805 = product of:
      0.02331961 = sum of:
        0.02331961 = weight(_text_:technology in 604) [ClassicSimilarity], result of:
          0.02331961 = score(doc=604,freq=2.0), product of:
            0.1417311 = queryWeight, product of:
              2.978387 = idf(docFreq=6114, maxDocs=44218)
              0.047586527 = queryNorm
            0.16453418 = fieldWeight in 604, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              2.978387 = idf(docFreq=6114, maxDocs=44218)
              0.0390625 = fieldNorm(doc=604)
      0.5 = coord(1/2)
  0.5 = coord(2/4)
```
Abstract

Modern information retrieval systems use keywords within documents as indexing terms for search of relevant documents. As Chinese is an ideographic character-based language, the words in the texts are not delimited by white spaces. Indexing of Chinese documents is impossible without a proper segmentation algorithm. Many Chinese segmentation algorithms have been proposed in the past. Traditional segmentation algorithms cannot operate without a large dictionary or a large corpus of training data. Nowadays, the Web has become the largest corpus that is ideal for Chinese segmentation. Although most search engines have problems in segmenting texts into proper words, they maintain huge databases of documents and frequencies of character sequences in the documents. Their databases are important potential resources for segmentation. In this paper, we propose a segmentation algorithm by mining Web data with the help of search engines. On the other hand, the Romanized pinyin of Chinese language indicates boundaries of words in the text. Our algorithm is the first to utilize the Romanized pinyin to segmentation. It is the first unified segmentation algorithm for the Chinese language from different geographical areas, and it is also domain independent because of the nature of the Web. Experiments have been conducted on the datasets of a recent Chinese segmentation competition. The results show that our algorithm outperforms the traditional algorithms in terms of precision and recall. Moreover, our algorithm can effectively deal with the problems of segmentation ambiguity, new word (unknown word) detection, and stop words.

Footnote

Contribution to a special topic section on "Mining Web resources for enhancing information retrieval"

Source

Journal of the American Society for Information Science and Technology. 58(2007) no.12, S.1820-1837

Liu, Y.; Huang, X.; An, A.: Personalized recommendation with adaptive mixture of markov models (2007) 0.01
```
0.012845755 = product of:
  0.02569151 = sum of:
    0.0140317045 = weight(_text_:information in 606) [ClassicSimilarity], result of:
      0.0140317045 = score(doc=606,freq=6.0), product of:
        0.083537094 = queryWeight, product of:
          1.7554779 = idf(docFreq=20772, maxDocs=44218)
          0.047586527 = queryNorm
        0.16796975 = fieldWeight in 606, product of:
          2.4494898 = tf(freq=6.0), with freq of:
            6.0 = termFreq=6.0
          1.7554779 = idf(docFreq=20772, maxDocs=44218)
          0.0390625 = fieldNorm(doc=606)
    0.011659805 = product of:
      0.02331961 = sum of:
        0.02331961 = weight(_text_:technology in 606) [ClassicSimilarity], result of:
          0.02331961 = score(doc=606,freq=2.0), product of:
            0.1417311 = queryWeight, product of:
              2.978387 = idf(docFreq=6114, maxDocs=44218)
              0.047586527 = queryNorm
            0.16453418 = fieldWeight in 606, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              2.978387 = idf(docFreq=6114, maxDocs=44218)
              0.0390625 = fieldNorm(doc=606)
      0.5 = coord(1/2)
  0.5 = coord(2/4)
```
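The arithmetic in the explain tree above can be checked by hand. The following is a minimal sketch, assuming the standard Lucene ClassicSimilarity formulas (tf = sqrt(freq), idf = 1 + ln(maxDocs / (docFreq + 1))). The function names `classic_idf` and `term_score` are hypothetical helpers introduced here for illustration and are not part of any library. All numeric inputs are copied from the tree for doc 606.

```python
import math


def classic_idf(doc_freq: int, max_docs: int) -> float:
    """Inverse document frequency as assumed for ClassicSimilarity."""
    return 1.0 + math.log(max_docs / (doc_freq + 1))


def term_score(freq: float, doc_freq: int, max_docs: int,
               query_norm: float, field_norm: float) -> float:
    """Score of one term: queryWeight * fieldWeight."""
    idf = classic_idf(doc_freq, max_docs)
    query_weight = idf * query_norm                      # idf * queryNorm
    field_weight = math.sqrt(freq) * idf * field_norm    # tf * idf * fieldNorm
    return query_weight * field_weight


# Values taken from the explain tree for doc 606.
QUERY_NORM = 0.047586527
MAX_DOCS = 44218
FIELD_NORM = 0.0390625

# "information": freq=6, docFreq=20772
information = term_score(6.0, 20772, MAX_DOCS, QUERY_NORM, FIELD_NORM)

# "technology": freq=2, docFreq=6114, scaled by coord(1/2) of its sub-query
technology = term_score(2.0, 6114, MAX_DOCS, QUERY_NORM, FIELD_NORM) * 0.5

# Sum of the matching clauses, scaled by coord(2/4) of the outer query
total = (information + technology) * 0.5

print(f"information = {information:.9f}")
print(f"technology  = {technology:.9f}")
print(f"total       = {total:.9f}")
```

Under these assumptions the three printed values should agree with the figures in the tree (0.0140317045, 0.011659805 and 0.012845755) up to floating-point rounding, since the original values were computed in single precision.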
Abstract

With more and more information available on the Internet, the task of making personalized recommendations to assist the user's navigation has become increasingly important. Considering there might be millions of users with different backgrounds accessing a Web site every day, it is infeasible to build a separate recommendation system for each user. To address this problem, clustering techniques can first be employed to discover user groups. Then, user navigation patterns for each group can be discovered, to allow the adaptation of a Web site to the interest of each individual group. In this paper, we propose to model user access sequences as stochastic processes, and a mixture of Markov models based approach is taken to cluster users and to capture the sequential relationships inherent in user access histories. Several important issues that arise in constructing the Markov models are also addressed. The first issue lies in the complexity of the mixture of Markov models. To improve the efficiency of building/maintaining the mixture of Markov models, we develop a lightweight adaptive algorithm to update the model parameters without recomputing model parameters from scratch. The second issue concerns the proper selection of training data for building the mixture of Markov models. We investigate two different training data selection strategies and perform extensive experiments to compare their effectiveness on a real dataset that is generated by a Web-based knowledge management system, Livelink.

Footnote

Contribution to a special topic section on "Mining Web resources for enhancing information retrieval"

Source

Journal of the American Society for Information Science and Technology. 58(2007) no.12, S.1851-1870

Chen, C.-C.; Chen, A.-P.: Using data mining technology to provide a recommendation service in the digital library (2007) 0.01
```
0.012845755 = product of:
  0.02569151 = sum of:
    0.0140317045 = weight(_text_:information in 2533) [ClassicSimilarity], result of:
      0.0140317045 = score(doc=2533,freq=6.0), product of:
        0.083537094 = queryWeight, product of:
          1.7554779 = idf(docFreq=20772, maxDocs=44218)
          0.047586527 = queryNorm
        0.16796975 = fieldWeight in 2533, product of:
          2.4494898 = tf(freq=6.0), with freq of:
            6.0 = termFreq=6.0
          1.7554779 = idf(docFreq=20772, maxDocs=44218)
          0.0390625 = fieldNorm(doc=2533)
    0.011659805 = product of:
      0.02331961 = sum of:
        0.02331961 = weight(_text_:technology in 2533) [ClassicSimilarity], result of:
          0.02331961 = score(doc=2533,freq=2.0), product of:
            0.1417311 = queryWeight, product of:
              2.978387 = idf(docFreq=6114, maxDocs=44218)
              0.047586527 = queryNorm
            0.16453418 = fieldWeight in 2533, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              2.978387 = idf(docFreq=6114, maxDocs=44218)
              0.0390625 = fieldNorm(doc=2533)
      0.5 = coord(1/2)
  0.5 = coord(2/4)
```
Abstract

Purpose - Since library storage has been increasing day by day, it is difficult for readers to find the books which interest them as well as representative booklists. How to utilize meaningful information effectively to improve the service quality of the digital library appears to be very important. The purpose of this paper is to provide a recommendation system architecture to promote digital library services in electronic libraries.

Design/methodology/approach - In the proposed architecture, a two-phase data mining process used by association rule and clustering methods is designed to generate a recommendation system. The process considers not only the relationship of a cluster of users but also the associations among the information accessed.

Findings - The process considered not only the relationship of a cluster of users but also the associations among the information accessed. With the advanced filter, the recommendation supported by the proposed system architecture would be closely served to meet users' needs.

Originality/value - This paper not only constructs a recommendation service for readers to search books from the web but takes the initiative in finding the most suitable books for readers as well. Furthermore, library managers are expected to purchase core and hot books from a limited budget to maintain and satisfy the requirements of readers along with promoting digital library services.

Fenstermacher, K.D.; Ginsburg, M.: Client-side monitoring for Web mining (2003) 0.01

0.011856608 = product of:
  0.023713216 = sum of:
    0.00972145 = weight(_text_:information in 1611) [ClassicSimilarity], result of:
      0.00972145 = score(doc=1611,freq=2.0), product of:
        0.083537094 = queryWeight, product of:
          1.7554779 = idf(docFreq=20772, maxDocs=44218)
          0.047586527 = queryNorm
        0.116372846 = fieldWeight in 1611, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          1.7554779 = idf(docFreq=20772, maxDocs=44218)
          0.046875 = fieldNorm(doc=1611)
    0.013991767 = product of:
      0.027983533 = sum of:
        0.027983533 = weight(_text_:technology in 1611) [ClassicSimilarity], result of:
          0.027983533 = score(doc=1611,freq=2.0), product of:
            0.1417311 = queryWeight, product of:
              2.978387 = idf(docFreq=6114, maxDocs=44218)
              0.047586527 = queryNorm
            0.19744103 = fieldWeight in 1611, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              2.978387 = idf(docFreq=6114, maxDocs=44218)
              0.046875 = fieldNorm(doc=1611)
      0.5 = coord(1/2)
  0.5 = coord(2/4)

Source: Journal of the American Society for Information Science and Technology. 54(2003) no.7, S.625-637

Srinivasan, P.: Text mining : generating hypotheses from MEDLINE (2004) 0.01

0.011856608 = product of:
  0.023713216 = sum of:
    0.00972145 = weight(_text_:information in 2225) [ClassicSimilarity], result of:
      0.00972145 = score(doc=2225,freq=2.0), product of:
        0.083537094 = queryWeight, product of:
          1.7554779 = idf(docFreq=20772, maxDocs=44218)
          0.047586527 = queryNorm
        0.116372846 = fieldWeight in 2225, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          1.7554779 = idf(docFreq=20772, maxDocs=44218)
          0.046875 = fieldNorm(doc=2225)
    0.013991767 = product of:
      0.027983533 = sum of:
        0.027983533 = weight(_text_:technology in 2225) [ClassicSimilarity], result of:
          0.027983533 = score(doc=2225,freq=2.0), product of:
            0.1417311 = queryWeight, product of:
              2.978387 = idf(docFreq=6114, maxDocs=44218)
              0.047586527 = queryNorm
            0.19744103 = fieldWeight in 2225, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              2.978387 = idf(docFreq=6114, maxDocs=44218)
              0.046875 = fieldNorm(doc=2225)
      0.5 = coord(1/2)
  0.5 = coord(2/4)

Source: Journal of the American Society for Information Science and Technology. 55(2004) no.5, S.396-413

Gluck, M.: Multimedia exploratory data analysis for geospatial data mining : the case for augmented seriation (2001) 0.01

0.011856608 = product of:
  0.023713216 = sum of:
    0.00972145 = weight(_text_:information in 5214) [ClassicSimilarity], result of:
      0.00972145 = score(doc=5214,freq=2.0), product of:
        0.083537094 = queryWeight, product of:
          1.7554779 = idf(docFreq=20772, maxDocs=44218)
          0.047586527 = queryNorm
        0.116372846 = fieldWeight in 5214, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          1.7554779 = idf(docFreq=20772, maxDocs=44218)
          0.046875 = fieldNorm(doc=5214)
    0.013991767 = product of:
      0.027983533 = sum of:
        0.027983533 = weight(_text_:technology in 5214) [ClassicSimilarity], result of:
          0.027983533 = score(doc=5214,freq=2.0), product of:
            0.1417311 = queryWeight, product of:
              2.978387 = idf(docFreq=6114, maxDocs=44218)
              0.047586527 = queryNorm
            0.19744103 = fieldWeight in 5214, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              2.978387 = idf(docFreq=6114, maxDocs=44218)
              0.046875 = fieldNorm(doc=5214)
      0.5 = coord(1/2)
  0.5 = coord(2/4)

Source: Journal of the American Society for Information Science and Technology. 52(2001) no.8, S.686-696

Whittle, M.; Eaglestone, B.; Ford, N.; Gillet, V.J.; Madden, A.: Data mining of search engine logs (2007) 0.01

0.011856608 = product of:
  0.023713216 = sum of:
    0.00972145 = weight(_text_:information in 1330) [ClassicSimilarity], result of:
      0.00972145 = score(doc=1330,freq=2.0), product of:
        0.083537094 = queryWeight, product of:
          1.7554779 = idf(docFreq=20772, maxDocs=44218)
          0.047586527 = queryNorm
        0.116372846 = fieldWeight in 1330, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          1.7554779 = idf(docFreq=20772, maxDocs=44218)
          0.046875 = fieldNorm(doc=1330)
    0.013991767 = product of:
      0.027983533 = sum of:
        0.027983533 = weight(_text_:technology in 1330) [ClassicSimilarity], result of:
          0.027983533 = score(doc=1330,freq=2.0), product of:
            0.1417311 = queryWeight, product of:
              2.978387 = idf(docFreq=6114, maxDocs=44218)
              0.047586527 = queryNorm
            0.19744103 = fieldWeight in 1330, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              2.978387 = idf(docFreq=6114, maxDocs=44218)
              0.046875 = fieldNorm(doc=1330)
      0.5 = coord(1/2)
  0.5 = coord(2/4)

Source: Journal of the American Society for Information Science and Technology. 58(2007) no.14, S.2382-2400

Thelwall, M.; Wilkinson, D.; Uppal, S.: Data mining emotion in social network communication : gender differences in MySpace (2009) 0.01

0.011856608 = product of:
  0.023713216 = sum of:
    0.00972145 = weight(_text_:information in 3322) [ClassicSimilarity], result of:
      0.00972145 = score(doc=3322,freq=2.0), product of:
        0.083537094 = queryWeight, product of:
          1.7554779 = idf(docFreq=20772, maxDocs=44218)
          0.047586527 = queryNorm
        0.116372846 = fieldWeight in 3322, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          1.7554779 = idf(docFreq=20772, maxDocs=44218)
          0.046875 = fieldNorm(doc=3322)
    0.013991767 = product of:
      0.027983533 = sum of:
        0.027983533 = weight(_text_:technology in 3322) [ClassicSimilarity], result of:
          0.027983533 = score(doc=3322,freq=2.0), product of:
            0.1417311 = queryWeight, product of:
              2.978387 = idf(docFreq=6114, maxDocs=44218)
              0.047586527 = queryNorm
            0.19744103 = fieldWeight in 3322, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              2.978387 = idf(docFreq=6114, maxDocs=44218)
              0.046875 = fieldNorm(doc=3322)
      0.5 = coord(1/2)
  0.5 = coord(2/4)

Source: Journal of the American Society for Information Science and Technology. 61(2010) no.1, S.190-199
