Search (2 results, page 1 of 1)

Liu, Y.; Zhang, M.; Cen, R.; Ru, L.; Ma, S.: Data cleansing for Web information retrieval using query independent features (2007) 0.03
```
0.025496587 = product of:
  0.10198635 = sum of:
    0.05617869 = weight(_text_:storage in 607) [ClassicSimilarity], result of:
      0.05617869 = score(doc=607,freq=2.0), product of:
        0.1866346 = queryWeight, product of:
          5.4488444 = idf(docFreq=516, maxDocs=44218)
          0.034252144 = queryNorm
        0.30100897 = fieldWeight in 607, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          5.4488444 = idf(docFreq=516, maxDocs=44218)
          0.0390625 = fieldNorm(doc=607)
    0.045807663 = weight(_text_:retrieval in 607) [ClassicSimilarity], result of:
      0.045807663 = score(doc=607,freq=14.0), product of:
        0.10360982 = queryWeight, product of:
          3.024915 = idf(docFreq=5836, maxDocs=44218)
          0.034252144 = queryNorm
        0.442117 = fieldWeight in 607, product of:
          3.7416575 = tf(freq=14.0), with freq of:
            14.0 = termFreq=14.0
          3.024915 = idf(docFreq=5836, maxDocs=44218)
          0.0390625 = fieldNorm(doc=607)
  0.25 = coord(2/8)
```
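The explain tree above can be reproduced by hand. The following is a minimal sketch, assuming the standard Lucene ClassicSimilarity formulas (tf = sqrt(freq), idf = 1 + ln(maxDocs / (docFreq + 1))). The function names `classic_idf` and `term_score` are hypothetical helpers, not part of any library. `queryNorm` and `fieldNorm` are copied from the listing because they depend on the full query and on the indexed field length, neither of which is shown here.

```python
import math

def classic_idf(doc_freq, max_docs):
    # ClassicSimilarity inverse document frequency.
    return 1.0 + math.log(max_docs / (doc_freq + 1))

def term_score(freq, doc_freq, max_docs, query_norm, field_norm):
    # score = queryWeight * fieldWeight, as in the explain tree.
    idf = classic_idf(doc_freq, max_docs)
    query_weight = idf * query_norm                     # idf * queryNorm
    field_weight = math.sqrt(freq) * idf * field_norm   # tf * idf * fieldNorm
    return query_weight * field_weight

# Values taken from the explain output for doc 607.
QUERY_NORM = 0.034252144
MAX_DOCS = 44218
FIELD_NORM_607 = 0.0390625

storage = term_score(2.0, 516, MAX_DOCS, QUERY_NORM, FIELD_NORM_607)
retrieval = term_score(14.0, 5836, MAX_DOCS, QUERY_NORM, FIELD_NORM_607)

# coord(2/8): two of the eight query clauses matched this document.
doc_607_score = (storage + retrieval) * (2 / 8)
print(doc_607_score)
```

The printed value should land close to the 0.025496587 in the listing. Small differences in the last digits are expected, since Lucene computes in 32-bit floats and this sketch uses Python's 64-bit floats.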
Abstract

Understanding what kinds of Web pages are the most useful for Web search engine users is a critical task in Web information retrieval (IR). Most previous work used hyperlink analysis algorithms to solve this problem. However, little research has focused on query-independent Web data cleansing for Web IR. In this paper, we first provide an analysis of the differences between retrieval target pages and ordinary ones based on more than 30 million Web pages obtained from both the Text Retrieval Conference (TREC) and a widely used Chinese search engine, SOGOU (www.sogou.com). We further propose a learning-based data cleansing algorithm for filtering out Web pages that are unlikely to be useful for user requests. We found that there is a large proportion of low-quality Web pages in both the English and the Chinese Web page corpora, and that retrieval target pages can be identified using query-independent features and cleansing algorithms. The experimental results showed that our algorithm is effective in removing a large portion of Web pages with only a small loss in retrieval target pages. This makes it possible for Web IR tools to meet a large fraction of users' needs with only a small part of the pages on the Web. These results may help Web search engines make better use of their limited storage and computation resources to improve search performance.

Footnote

Contribution to a special topic section "Mining Web resources for enhancing information retrieval"
Lim, S.C.J.; Liu, Y.; Lee, W.B.: A methodology for building a semantically annotated multi-faceted ontology for product family modelling (2011) 0.01
```
0.014698472 = product of:
  0.058793887 = sum of:
    0.044942953 = weight(_text_:storage in 1485) [ClassicSimilarity], result of:
      0.044942953 = score(doc=1485,freq=2.0), product of:
        0.1866346 = queryWeight, product of:
          5.4488444 = idf(docFreq=516, maxDocs=44218)
          0.034252144 = queryNorm
        0.24080718 = fieldWeight in 1485, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          5.4488444 = idf(docFreq=516, maxDocs=44218)
          0.03125 = fieldNorm(doc=1485)
    0.013850937 = weight(_text_:retrieval in 1485) [ClassicSimilarity], result of:
      0.013850937 = score(doc=1485,freq=2.0), product of:
        0.10360982 = queryWeight, product of:
          3.024915 = idf(docFreq=5836, maxDocs=44218)
          0.034252144 = queryNorm
        0.13368362 = fieldWeight in 1485, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          3.024915 = idf(docFreq=5836, maxDocs=44218)
          0.03125 = fieldNorm(doc=1485)
  0.25 = coord(2/8)
```
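Both records match "storage" with the same frequency (2.0), so the gap between their per-term scores comes entirely from `fieldNorm`. The short check below is a sketch using only numbers copied from the two explain trees; the variable names are illustrative. The closing length estimate assumes ClassicSimilarity's raw length norm of 1/sqrt(number of terms), which Lucene stores in a lossy one-byte encoding, so the estimates are rough.

```python
# Values copied from the two explain trees for the term "storage".
storage_score_607 = 0.05617869
storage_score_1485 = 0.044942953
field_norm_607 = 0.0390625
field_norm_1485 = 0.03125

# With tf, idf and queryNorm identical, the scores scale with fieldNorm.
score_ratio = storage_score_1485 / storage_score_607
norm_ratio = field_norm_1485 / field_norm_607

print(score_ratio, norm_ratio)

# Rough field lengths implied by norm = 1 / sqrt(num_terms).
# Assumption: the stored norm is quantized, so these are only estimates.
approx_terms_607 = 1.0 / field_norm_607 ** 2
approx_terms_1485 = 1.0 / field_norm_1485 ** 2
print(approx_terms_607, approx_terms_1485)
```

Under these assumptions, the second record's indexed text is the longer of the two, which is why the same two occurrences of "storage" count for less there.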
Abstract

Product family design is one of the prevailing approaches to realizing mass customization. With the increasing number of product offerings targeted at different market segments, information management in product family design, that is, the efficient and effective storage, sharing and timely retrieval of design information, has become more complicated and challenging. Product family modelling schemas reported in the literature generally stress the component aspects of a product family and its analysis, with limited capability to model complex inter-relations between physical components and other required information in different semantic orientations, such as manufacturing, material and marketing. To tackle this problem, ontology-based representation has been identified as a promising solution for redesigning product platforms, especially in a semantically rich environment. However, ontology development in design engineering demands a great deal of time and human effort to process complex information. When a large variety of products is available, particularly in the consumer market, a more efficient method for building a product family ontology that incorporates multi-faceted semantic information is therefore highly desirable. In this study, we propose a methodology for building a semantically annotated multi-faceted ontology for product family modelling that is able to automatically suggest semantically related annotations based on the design and manufacturing repository. The six steps of building such an ontology are discussed in detail: formation of product family taxonomy; extraction of entities; faceted unit generation and concept identification; facet modelling and semantic annotation; formation of a semantically annotated multi-faceted product family ontology (MFPFO); and ontology validation and evaluation. Using a family of laptop computers as an illustrative example, we demonstrate how our methodology can be deployed step by step to create a semantically annotated MFPFO. Finally, we briefly discuss future research issues as well as interesting applications that can be further pursued based on the MFPFO developed.
