Search (7 results, page 1 of 1)

  • author_ss:"Berry, M.W."
  1. Wang, P.; Berry, M.W.; Yang, Y.: Mining longitudinal Web queries : trends and patterns (2003) 0.00
    0.0023678814 = product of:
      0.0047357627 = sum of:
        0.0047357627 = product of:
          0.009471525 = sum of:
            0.009471525 = weight(_text_:a in 6561) [ClassicSimilarity], result of:
              0.009471525 = score(doc=6561,freq=8.0), product of:
                0.053105544 = queryWeight, product of:
                  1.153047 = idf(docFreq=37942, maxDocs=44218)
                  0.046056706 = queryNorm
                0.17835285 = fieldWeight in 6561, product of:
                  2.828427 = tf(freq=8.0), with freq of:
                    8.0 = termFreq=8.0
                  1.153047 = idf(docFreq=37942, maxDocs=44218)
                  0.0546875 = fieldNorm(doc=6561)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
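    The explain tree above is Lucene's ClassicSimilarity breakdown, and its leaf values recombine by simple arithmetic. A minimal Python sketch recomputing the final score, with every constant copied from the tree (the tf and idf formulas are Lucene's standard ClassicSimilarity definitions):

        import math

        # Constants copied from the explain tree for document 6561
        freq       = 8.0          # termFreq of "a" in the field
        idf        = 1.153047    # ln(maxDocs / (docFreq + 1)) + 1 = ln(44218 / 37943) + 1
        query_norm = 0.046056706
        field_norm = 0.0546875

        tf           = math.sqrt(freq)              # 2.828427
        field_weight = tf * idf * field_norm        # 0.17835285
        query_weight = idf * query_norm             # 0.053105544 (the queryWeight line)
        weight       = query_weight * field_weight  # 0.009471525
        score        = weight * 0.5 * 0.5           # two coord(1/2) factors
        print(score)                                # 0.0023678814

    The same arithmetic, with freq and fieldNorm swapped out, reproduces the scores of the remaining six entries.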
    
    Abstract
This project analyzed 541,920 user queries submitted to and executed in an academic Website during a four-year period (May 1997 to May 2001) using a relational database. The purpose of the study is three-fold: (1) to understand Web users' query behavior; (2) to identify problems encountered by these Web users; (3) to develop appropriate techniques for optimization of query analysis and mining. The linguistic analyses focus on query structures, lexicon, and word associations using statistical measures such as the Zipf distribution and mutual information (both sketched in code after this entry). A data model with the finest granularity is used for data storage and iterative analyses. Patterns and trends of querying behavior are identified and compared with previous studies.
    Type
    a
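    The abstract above names two statistical measures, the Zipf distribution and mutual information. A minimal sketch of both, assuming simple whitespace tokenization of the query log (the function names and the tokenization are illustrative, not the paper's):

        import math
        from collections import Counter

        def rank_frequency(queries):
            """Tabulate (rank, word, freq); under a Zipf distribution,
            log(freq) decreases roughly linearly with log(rank)."""
            counts = Counter(w for q in queries for w in q.lower().split())
            return [(rank, word, freq)
                    for rank, (word, freq) in enumerate(counts.most_common(), 1)]

        def pmi(pair_freq, x_freq, y_freq, n_pairs, n_words):
            """Pointwise mutual information of an adjacent word pair (x, y):
            PMI = log2( P(x, y) / (P(x) * P(y)) )."""
            return math.log2((pair_freq / n_pairs) /
                             ((x_freq / n_words) * (y_freq / n_words)))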
  2. Martin, D.I.; Berry, M.W.: Latent Semantic Indexing (2009) 0.00
    0.002269176 = product of:
      0.004538352 = sum of:
        0.004538352 = product of:
          0.009076704 = sum of:
            0.009076704 = weight(_text_:a in 3834) [ClassicSimilarity], result of:
              0.009076704 = score(doc=3834,freq=10.0), product of:
                0.053105544 = queryWeight, product of:
                  1.153047 = idf(docFreq=37942, maxDocs=44218)
                  0.046056706 = queryNorm
                0.1709182 = fieldWeight in 3834, product of:
                  3.1622777 = tf(freq=10.0), with freq of:
                    10.0 = termFreq=10.0
                  1.153047 = idf(docFreq=37942, maxDocs=44218)
                  0.046875 = fieldNorm(doc=3834)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Abstract
Latent Semantic Indexing (LSI) is a proven, successful indexing and retrieval method. It is based on an automated, mathematical technique known as the singular value decomposition (SVD). Given a large information database, LSI uses the SVD to create a "semantic space" for the document collection in which both terms and documents are represented. It does this by producing a reduced-dimensional vector space in which the underlying, or "latent," semantic structure in the pattern of word usage across the document collection emerges. Similarities between terms, between terms and documents, or between documents in the collection are then based on semantic content, not on individual terms. This ability to extract the meaning of terms and documents has given LSI success in many different applications. (A minimal SVD construction is sketched after this entry.)
    Type
    a
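    A minimal sketch of the LSI construction the abstract describes, assuming a dense NumPy SVD (real collections call for sparse, truncated solvers; the variable names are illustrative):

        import numpy as np

        def build_lsi_space(A, k):
            """Truncated SVD A ~ U_k S_k V_k^T of a term-by-document matrix A.
            Rows of term_vecs and doc_vecs place terms and documents in the
            same k-dimensional "semantic space"."""
            U, s, Vt = np.linalg.svd(A, full_matrices=False)
            U_k, s_k, Vt_k = U[:, :k], s[:k], Vt[:k, :]
            term_vecs = U_k * s_k        # one row per term, scaled by singular values
            doc_vecs  = Vt_k.T * s_k     # one row per document
            return term_vecs, doc_vecs, U_k, s_k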
  3. Berry, M.W.; Dumais, S.T.; O'Brien, G.W.: Using linear algebra for intelligent information retrieval (1995) 0.00
    0.002269176 = product of:
      0.004538352 = sum of:
        0.004538352 = product of:
          0.009076704 = sum of:
            0.009076704 = weight(_text_:a in 2206) [ClassicSimilarity], result of:
              0.009076704 = score(doc=2206,freq=10.0), product of:
                0.053105544 = queryWeight, product of:
                  1.153047 = idf(docFreq=37942, maxDocs=44218)
                  0.046056706 = queryNorm
                0.1709182 = fieldWeight in 2206, product of:
                  3.1622777 = tf(freq=10.0), with freq of:
                    10.0 = termFreq=10.0
                  1.153047 = idf(docFreq=37942, maxDocs=44218)
                  0.046875 = fieldNorm(doc=2206)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Abstract
Currently, most approaches to retrieving textual materials from scientific databases depend on a lexical match between words in users' requests and those in or assigned to documents in a database. Because of the tremendous diversity in the words people use to describe the same document, lexical methods are necessarily incomplete and imprecise. Using the singular value decomposition (SVD), one can take advantage of the implicit higher-order structure in the association of terms with documents by computing the SVD of large, sparse term-by-document matrices. Terms and documents represented by the 200-300 largest singular vectors are then matched against user queries. We call this retrieval method Latent Semantic Indexing (LSI) because the subspace represents important associative relationships between terms and documents that are not evident in individual documents. LSI is a completely automatic yet intelligent indexing method, widely applicable, and a promising way to improve users... (The query-matching step is sketched after this entry.)
    Type
    a
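    The matching step the abstract describes (queries compared against terms and documents represented by the 200-300 largest singular vectors) can be sketched as standard LSI query folding followed by cosine ranking. A hedged sketch, reusing U_k, s_k, and doc_vecs from the previous example:

        import numpy as np

        def fold_in_query(q, U_k, s_k):
            """Project a raw query term vector into the LSI subspace:
            q_hat = S_k^{-1} U_k^T q."""
            return (U_k.T @ q) / s_k

        def rank_by_cosine(q_hat, doc_vecs):
            """Return document indices ordered by cosine similarity to the query."""
            sims = (doc_vecs @ q_hat) / (
                np.linalg.norm(doc_vecs, axis=1) * np.linalg.norm(q_hat))
            return np.argsort(-sims)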
4. Berry, M.W.; Esau, R.; Kiefer, B.: The use of text mining techniques in electronic discovery for legal matters (2012) 0.00
    0.001757696 = product of:
      0.003515392 = sum of:
        0.003515392 = product of:
          0.007030784 = sum of:
            0.007030784 = weight(_text_:a in 91) [ClassicSimilarity], result of:
              0.007030784 = score(doc=91,freq=6.0), product of:
                0.053105544 = queryWeight, product of:
                  1.153047 = idf(docFreq=37942, maxDocs=44218)
                  0.046056706 = queryNorm
                0.13239266 = fieldWeight in 91, product of:
                  2.4494898 = tf(freq=6.0), with freq of:
                    6.0 = termFreq=6.0
                  1.153047 = idf(docFreq=37942, maxDocs=44218)
                  0.046875 = fieldNorm(doc=91)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Abstract
Electronic discovery (eDiscovery) is the process of collecting and analyzing electronic documents to determine their relevance to a legal matter. Advances in office technology have made documents ever easier to create, and the volume of data has outgrown the manual processes previously used to make relevance judgments. Methods of text mining and information retrieval have been put to use in eDiscovery to help tame the volume of data; however, the results have been uneven. This chapter looks at the historical bias of the collection process and examines how tools such as classifiers, latent semantic analysis, and non-negative matrix factorization deal with the nuances of that process. (Non-negative matrix factorization is sketched after this entry.)
    Type
    a
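    Of the three tools named above, non-negative matrix factorization is the most compact to sketch. A minimal Lee-Seung multiplicative-update version (the iteration count, seed, and initialization are illustrative assumptions, not the chapter's choices):

        import numpy as np

        def nmf(A, k, iters=200, eps=1e-9):
            """Factor a non-negative term-by-document matrix A ~ W @ H.
            Columns of W act as topic vectors; column j of H gives the
            topic mixture of document j."""
            rng = np.random.default_rng(0)
            W = rng.random((A.shape[0], k))
            H = rng.random((k, A.shape[1]))
            for _ in range(iters):
                H *= (W.T @ A) / (W.T @ W @ H + eps)   # multiplicative updates keep
                W *= (A @ H.T) / (W @ H @ H.T + eps)   # all entries non-negative
            return W, H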
  5. Courtois, M.P.; Berry, M.W.: Results ranking in Web search engines (1999) 0.00
    0.0016913437 = product of:
      0.0033826875 = sum of:
        0.0033826875 = product of:
          0.006765375 = sum of:
            0.006765375 = weight(_text_:a in 3726) [ClassicSimilarity], result of:
              0.006765375 = score(doc=3726,freq=2.0), product of:
                0.053105544 = queryWeight, product of:
                  1.153047 = idf(docFreq=37942, maxDocs=44218)
                  0.046056706 = queryNorm
                0.12739488 = fieldWeight in 3726, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  1.153047 = idf(docFreq=37942, maxDocs=44218)
                  0.078125 = fieldNorm(doc=3726)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Type
    a
  6. Britt, B.L.; Berry, M.W.; Browne, M.; Merrell, M.A.; Kolpack, J.: Document classification techniques for automated technology readiness level analysis (2008) 0.00
    0.0014351527 = product of:
      0.0028703054 = sum of:
        0.0028703054 = product of:
          0.005740611 = sum of:
            0.005740611 = weight(_text_:a in 1384) [ClassicSimilarity], result of:
              0.005740611 = score(doc=1384,freq=4.0), product of:
                0.053105544 = queryWeight, product of:
                  1.153047 = idf(docFreq=37942, maxDocs=44218)
                  0.046056706 = queryNorm
                0.10809815 = fieldWeight in 1384, product of:
                  2.0 = tf(freq=4.0), with freq of:
                    4.0 = termFreq=4.0
                  1.153047 = idf(docFreq=37942, maxDocs=44218)
                  0.046875 = fieldNorm(doc=1384)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Abstract
The overhead of assessing technology readiness for deployment and investment purposes can be costly to both large and small businesses. Recent advances in the automatic interpretation of technology readiness levels (TRLs) of a given technology can substantially reduce the risk and associated cost of bringing these new technologies to market. Using vector-space information-retrieval models, such as latent semantic indexing, it is feasible to group similar technology descriptions by exploiting the latent structure of term usage within textual documents. Once the documents have been semantically clustered (or grouped), they can be classified based on the TRL scores of (known) nearest-neighbor documents; a sketch of this step follows the entry. Three automated (no human curation) strategies for assigning TRLs to documents are discussed, with accuracies as high as 86% achieved for two-class problems.
    Type
    a
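    The classification step the abstract describes, assigning a TRL from the scores of known nearest-neighbor documents, can be sketched as a cosine k-nearest-neighbor majority vote in the reduced space (k = 5 and the function name are illustrative assumptions):

        import numpy as np

        def knn_trl(doc_vec, labeled_vecs, trl_labels, k=5):
            """Majority TRL among the k most cosine-similar labeled documents,
            all vectors living in the reduced (LSI) space."""
            sims = (labeled_vecs @ doc_vec) / (
                np.linalg.norm(labeled_vecs, axis=1) * np.linalg.norm(doc_vec))
            nearest = np.argsort(-sims)[:k]
            votes = [trl_labels[i] for i in nearest]
            return max(set(votes), key=votes.count)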
  7. Berry, M.W.; Browne, M.: Understanding search engines : mathematical modeling and text retrieval (2005) 0.00
    6.765375E-4 = product of:
      0.001353075 = sum of:
        0.001353075 = product of:
          0.00270615 = sum of:
            0.00270615 = weight(_text_:a in 7) [ClassicSimilarity], result of:
              0.00270615 = score(doc=7,freq=2.0), product of:
                0.053105544 = queryWeight, product of:
                  1.153047 = idf(docFreq=37942, maxDocs=44218)
                  0.046056706 = queryNorm
                0.050957955 = fieldWeight in 7, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  1.153047 = idf(docFreq=37942, maxDocs=44218)
                  0.03125 = fieldNorm(doc=7)
          0.5 = coord(1/2)
      0.5 = coord(1/2)
    
    Abstract
The second edition of Understanding Search Engines: Mathematical Modeling and Text Retrieval follows the basic premise of the first edition by discussing many of the key design issues for building search engines and emphasizing the important role that applied mathematics can play in improving information retrieval. The authors discuss important data structures, algorithms, and software, as well as user-centered issues such as interfaces, manual indexing, and document preparation. Significant changes bring the text up to date on current information-retrieval methods: for example, the addition of a new chapter on the link-structure algorithms used in search engines such as Google (a toy example follows this entry). The chapter on user interfaces has been rewritten to focus specifically on search engine usability. In addition, the authors have added new recommendations for further reading, expanded the bibliography, and updated and streamlined the index to make it more reader-friendly.
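    The new chapter mentioned above covers link-structure algorithms such as Google's PageRank. A toy power-iteration sketch, assuming the conventional damping factor of 0.85 and one common treatment of dangling pages (the book's own presentation may differ):

        import numpy as np

        def pagerank(adj, damping=0.85, tol=1e-10):
            """Power iteration on a column-stochastic link matrix;
            adj[i, j] = 1 when page j links to page i."""
            n = adj.shape[0]
            out = adj.sum(axis=0)
            M = adj / np.where(out == 0, 1.0, out)  # normalize columns by out-degree
            M[:, out == 0] = 1.0 / n                # dangling pages link everywhere
            r = np.full(n, 1.0 / n)
            while True:
                r_next = damping * (M @ r) + (1.0 - damping) / n
                if np.abs(r_next - r).sum() < tol:
                    return r_next
                r = r_next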