Search (45 results, page 1 of 3)

Amir, A.; Feldman, R.; Kashi, R.: A new and versatile method for association generation (1997) 0.01

0.0055932454 = product of:
  0.044745963 = sum of:
    0.044745963 = product of:
      0.06711894 = sum of:
        0.033711098 = weight(_text_:29 in 1270) [ClassicSimilarity], result of:
          0.033711098 = score(doc=1270,freq=2.0), product of:
            0.108422816 = queryWeight, product of:
              3.5176873 = idf(docFreq=3565, maxDocs=44218)
              0.030822188 = queryNorm
            0.31092256 = fieldWeight in 1270, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5176873 = idf(docFreq=3565, maxDocs=44218)
              0.0625 = fieldNorm(doc=1270)
        0.03340785 = weight(_text_:22 in 1270) [ClassicSimilarity], result of:
          0.03340785 = score(doc=1270,freq=2.0), product of:
            0.10793405 = queryWeight, product of:
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.030822188 = queryNorm
            0.30952093 = fieldWeight in 1270, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.0625 = fieldNorm(doc=1270)
      0.6666667 = coord(2/3)
  0.125 = coord(1/8)

Date: 5. 4.1996 15:29:15
Source: Information systems. 22(1997) nos.5/6, S.333-347
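
The score breakdown above can be reproduced by hand. The following is a minimal Python sketch, assuming the classic Lucene TF-IDF formulas (tf = sqrt(freq), idf = 1 + ln(maxDocs / (docFreq + 1))), which agree with the tf and idf values printed in the tree. The function names `idf` and `term_weight` are illustrative and not part of any library; the constants are copied from the explain output for doc 1270.

```python
import math

def idf(doc_freq, max_docs):
    # Assumed classic TF-IDF form; reproduces idf(docFreq=3565, maxDocs=44218) = 3.5176873
    return 1.0 + math.log(max_docs / (doc_freq + 1))

def term_weight(freq, doc_freq, max_docs, query_norm, field_norm):
    # weight = queryWeight * fieldWeight, as in the explain tree
    tf = math.sqrt(freq)                       # 1.4142135 for freq=2.0
    term_idf = idf(doc_freq, max_docs)
    query_weight = term_idf * query_norm       # idf * queryNorm
    field_weight = tf * term_idf * field_norm  # tf * idf * fieldNorm
    return query_weight * field_weight

MAX_DOCS = 44218
QUERY_NORM = 0.030822188

# Terms "29" and "22" in doc 1270, both with freq=2.0 and fieldNorm=0.0625
w29 = term_weight(2.0, 3565, MAX_DOCS, QUERY_NORM, 0.0625)
w22 = term_weight(2.0, 3622, MAX_DOCS, QUERY_NORM, 0.0625)

# Sum the matching terms, then apply coord(2/3) and coord(1/8)
score = (w29 + w22) * (2 / 3) * (1 / 8)
print(round(w29, 6), round(w22, 6), round(score, 7))
```

Small differences in the last digits are expected, since the catalog output appears to use single-precision floats while Python uses double precision.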

Qiu, X.Y.; Srinivasan, P.; Hu, Y.: Supervised learning models to predict firm performance with annual reports : an empirical study (2014) 0.01

0.0051744715 = product of:
  0.020697886 = sum of:
    0.012270111 = product of:
      0.03681033 = sum of:
        0.03681033 = weight(_text_:problem in 1205) [ClassicSimilarity], result of:
          0.03681033 = score(doc=1205,freq=2.0), product of:
            0.13082431 = queryWeight, product of:
              4.244485 = idf(docFreq=1723, maxDocs=44218)
              0.030822188 = queryNorm
            0.28137225 = fieldWeight in 1205, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              4.244485 = idf(docFreq=1723, maxDocs=44218)
              0.046875 = fieldNorm(doc=1205)
      0.33333334 = coord(1/3)
    0.008427775 = product of:
      0.025283325 = sum of:
        0.025283325 = weight(_text_:29 in 1205) [ClassicSimilarity], result of:
          0.025283325 = score(doc=1205,freq=2.0), product of:
            0.108422816 = queryWeight, product of:
              3.5176873 = idf(docFreq=3565, maxDocs=44218)
              0.030822188 = queryNorm
            0.23319192 = fieldWeight in 1205, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5176873 = idf(docFreq=3565, maxDocs=44218)
              0.046875 = fieldNorm(doc=1205)
      0.33333334 = coord(1/3)
  0.25 = coord(2/8)

Abstract: Text mining and machine learning methodologies have been applied toward knowledge discovery in several domains, such as biomedicine and business. Interestingly, in the business domain, the text mining and machine learning community has minimally explored company annual reports with their mandatory disclosures. In this study, we explore the question "How can annual reports be used to predict change in company performance from one year to the next?" from a text mining perspective. Our article contributes a systematic study of the potential of company mandatory disclosures using a computational viewpoint in the following aspects: (a) We characterize our research problem along distinct dimensions to gain a reasonably comprehensive understanding of the capacity of supervised learning methods in predicting change in company performance using annual reports, and (b) our findings from unbiased systematic experiments provide further evidence about the economic incentives faced by analysts in their stock recommendations and speculations on analysts having access to more information in producing earnings forecast.
Date: 29. 1.2014 16:46:40
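
When a hit matches clauses in more than one sub-query, as doc 1205 does, the tree nests one level deeper: each sub-query sums its matching term weights and scales by its own coord factor, and the outer query does the same over the sub-query results. This is a small Python sketch of that combination step, taking the leaf weights from the explain output as given; the helper name `combine` is hypothetical.

```python
def combine(clause_scores, matched, total):
    # Sum of matching clause scores, scaled by coord(matched/total)
    return sum(clause_scores) * matched / total

# Leaf weights copied from the explain tree for doc 1205
w_problem = 0.03681033
w_29 = 0.025283325

inner_problem = combine([w_problem], 1, 3)  # coord(1/3)
inner_29 = combine([w_29], 1, 3)            # coord(1/3)

# Two of the eight top-level clauses matched: coord(2/8)
score = combine([inner_problem, inner_29], 2, 8)
print(round(score, 8))
```

The coord factors explain why this record ranks below the first hit even though its term "problem" has a higher idf: each of its sub-queries matched only one of three clauses.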

Hofstede, A.H.M. ter; Proper, H.A.; Van der Weide, T.P.: Exploiting fact verbalisation in conceptual information modelling (1997) 0.00

0.00489409 = product of:
  0.03915272 = sum of:
    0.03915272 = product of:
      0.05872908 = sum of:
        0.029497212 = weight(_text_:29 in 2908) [ClassicSimilarity], result of:
          0.029497212 = score(doc=2908,freq=2.0), product of:
            0.108422816 = queryWeight, product of:
              3.5176873 = idf(docFreq=3565, maxDocs=44218)
              0.030822188 = queryNorm
            0.27205724 = fieldWeight in 2908, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5176873 = idf(docFreq=3565, maxDocs=44218)
              0.0546875 = fieldNorm(doc=2908)
        0.029231867 = weight(_text_:22 in 2908) [ClassicSimilarity], result of:
          0.029231867 = score(doc=2908,freq=2.0), product of:
            0.10793405 = queryWeight, product of:
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.030822188 = queryNorm
            0.2708308 = fieldWeight in 2908, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.0546875 = fieldNorm(doc=2908)
      0.6666667 = coord(2/3)
  0.125 = coord(1/8)

Date: 5. 4.1996 15:29:15
Source: Information systems. 22(1997) nos.5/6, S.349-385

Ma, Z.; Sun, A.; Cong, G.: On predicting the popularity of newly emerging hashtags in Twitter (2013) 0.00
```
0.0043120594 = product of:
  0.017248238 = sum of:
    0.010225092 = product of:
      0.030675275 = sum of:
        0.030675275 = weight(_text_:problem in 967) [ClassicSimilarity], result of:
          0.030675275 = score(doc=967,freq=2.0), product of:
            0.13082431 = queryWeight, product of:
              4.244485 = idf(docFreq=1723, maxDocs=44218)
              0.030822188 = queryNorm
            0.23447686 = fieldWeight in 967, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              4.244485 = idf(docFreq=1723, maxDocs=44218)
              0.0390625 = fieldNorm(doc=967)
      0.33333334 = coord(1/3)
    0.007023146 = product of:
      0.021069437 = sum of:
        0.021069437 = weight(_text_:29 in 967) [ClassicSimilarity], result of:
          0.021069437 = score(doc=967,freq=2.0), product of:
            0.108422816 = queryWeight, product of:
              3.5176873 = idf(docFreq=3565, maxDocs=44218)
              0.030822188 = queryNorm
            0.19432661 = fieldWeight in 967, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5176873 = idf(docFreq=3565, maxDocs=44218)
              0.0390625 = fieldNorm(doc=967)
      0.33333334 = coord(1/3)
  0.25 = coord(2/8)
```
Abstract

Because of Twitter's popularity and the viral nature of information dissemination on Twitter, predicting which Twitter topics will become popular in the near future becomes a task of considerable economic importance. Many Twitter topics are annotated by hashtags. In this article, we propose methods to predict the popularity of new hashtags on Twitter by formulating the problem as a classification task. We use five standard classification models (i.e., naïve Bayes, k-nearest neighbors, decision trees, support vector machines, and logistic regression) for prediction. The main challenge is the identification of effective features for describing new hashtags. We extract 7 content features from a hashtag string and the collection of tweets containing the hashtag and 11 contextual features from the social graph formed by users who have adopted the hashtag. We conducted experiments on a Twitter data set consisting of 31 million tweets from 2 million Singapore-based users. The experimental results show that the standard classifiers using the extracted features significantly outperform the baseline methods that do not use these features. Among the five classifiers, the logistic regression model performs the best in terms of the Micro-F1 measure. We also observe that contextual features are more effective than content features.

Date: 25. 6.2013 19:05:29

Budzik, J.; Hammond, K.J.; Birnbaum, L.: Information access in context (2001) 0.00

0.0024581011 = product of:
  0.01966481 = sum of:
    0.01966481 = product of:
      0.058994424 = sum of:
        0.058994424 = weight(_text_:29 in 3835) [ClassicSimilarity], result of:
          0.058994424 = score(doc=3835,freq=2.0), product of:
            0.108422816 = queryWeight, product of:
              3.5176873 = idf(docFreq=3565, maxDocs=44218)
              0.030822188 = queryNorm
            0.5441145 = fieldWeight in 3835, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5176873 = idf(docFreq=3565, maxDocs=44218)
              0.109375 = fieldNorm(doc=3835)
      0.33333334 = coord(1/3)
  0.125 = coord(1/8)

Date: 29. 3.2002 17:31:17

Chowdhury, G.G.: Template mining for information extraction from digital documents (1999) 0.00

0.002435989 = product of:
  0.019487912 = sum of:
    0.019487912 = product of:
      0.058463734 = sum of:
        0.058463734 = weight(_text_:22 in 4577) [ClassicSimilarity], result of:
          0.058463734 = score(doc=4577,freq=2.0), product of:
            0.10793405 = queryWeight, product of:
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.030822188 = queryNorm
            0.5416616 = fieldWeight in 4577, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.109375 = fieldNorm(doc=4577)
      0.33333334 = coord(1/3)
  0.125 = coord(1/8)

Date: 2. 4.2000 18:01:22

Wong, M.L.; Leung, K.S.; Cheng, J.C.Y.: Discovering knowledge from noisy databases using genetic programming (2000) 0.00
```
0.0021690698 = product of:
  0.017352559 = sum of:
    0.017352559 = product of:
      0.052057672 = sum of:
        0.052057672 = weight(_text_:problem in 4863) [ClassicSimilarity], result of:
          0.052057672 = score(doc=4863,freq=4.0), product of:
            0.13082431 = queryWeight, product of:
              4.244485 = idf(docFreq=1723, maxDocs=44218)
              0.030822188 = queryNorm
            0.39792046 = fieldWeight in 4863, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              4.244485 = idf(docFreq=1723, maxDocs=44218)
              0.046875 = fieldNorm(doc=4863)
      0.33333334 = coord(1/3)
  0.125 = coord(1/8)
```
Abstract

In data mining, we emphasize the need for learning from huge, incomplete, and imperfect data sets. To handle noise in the problem domain, existing learning systems avoid overfitting the imperfect training examples by excluding insignificant patterns. The problem is that these systems use a limiting attribute-value language for representing the training examples and the induced knowledge. Moreover, some important patterns are ignored because they are statistically insignificant. In this article, we present a framework (LOGENPRO) that combines genetic programming and inductive logic programming to induce knowledge represented in various knowledge representation formalisms from noisy databases. Moreover, the system is applied to one real-life medical database. The knowledge discovered provides insights into and allows better understanding of the medical domains.

Witten, I.H.; Frank, E.: Data Mining : Praktische Werkzeuge und Techniken für das maschinelle Lernen (2000) 0.00

0.0021069439 = product of:
  0.01685555 = sum of:
    0.01685555 = product of:
      0.05056665 = sum of:
        0.05056665 = weight(_text_:29 in 6833) [ClassicSimilarity], result of:
          0.05056665 = score(doc=6833,freq=2.0), product of:
            0.108422816 = queryWeight, product of:
              3.5176873 = idf(docFreq=3565, maxDocs=44218)
              0.030822188 = queryNorm
            0.46638384 = fieldWeight in 6833, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5176873 = idf(docFreq=3565, maxDocs=44218)
              0.09375 = fieldNorm(doc=6833)
      0.33333334 = coord(1/3)
  0.125 = coord(1/8)

Date: 27. 1.1996 10:29:55

Keim, D.A.: Data Mining mit bloßem Auge (2002) 0.00

0.0021069439 = product of:
  0.01685555 = sum of:
    0.01685555 = product of:
      0.05056665 = sum of:
        0.05056665 = weight(_text_:29 in 1086) [ClassicSimilarity], result of:
          0.05056665 = score(doc=1086,freq=2.0), product of:
            0.108422816 = queryWeight, product of:
              3.5176873 = idf(docFreq=3565, maxDocs=44218)
              0.030822188 = queryNorm
            0.46638384 = fieldWeight in 1086, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5176873 = idf(docFreq=3565, maxDocs=44218)
              0.09375 = fieldNorm(doc=1086)
      0.33333334 = coord(1/3)
  0.125 = coord(1/8)

Date: 31.12.1996 19:29:41

Kruse, R.; Borgelt, C.: Suche im Datendschungel (2002) 0.00

0.0021069439 = product of:
  0.01685555 = sum of:
    0.01685555 = product of:
      0.05056665 = sum of:
        0.05056665 = weight(_text_:29 in 1087) [ClassicSimilarity], result of:
          0.05056665 = score(doc=1087,freq=2.0), product of:
            0.108422816 = queryWeight, product of:
              3.5176873 = idf(docFreq=3565, maxDocs=44218)
              0.030822188 = queryNorm
            0.46638384 = fieldWeight in 1087, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5176873 = idf(docFreq=3565, maxDocs=44218)
              0.09375 = fieldNorm(doc=1087)
      0.33333334 = coord(1/3)
  0.125 = coord(1/8)

Date: 31.12.1996 19:29:41

Wrobel, S.: Lern- und Entdeckungsverfahren (2002) 0.00

0.0021069439 = product of:
  0.01685555 = sum of:
    0.01685555 = product of:
      0.05056665 = sum of:
        0.05056665 = weight(_text_:29 in 1105) [ClassicSimilarity], result of:
          0.05056665 = score(doc=1105,freq=2.0), product of:
            0.108422816 = queryWeight, product of:
              3.5176873 = idf(docFreq=3565, maxDocs=44218)
              0.030822188 = queryNorm
            0.46638384 = fieldWeight in 1105, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5176873 = idf(docFreq=3565, maxDocs=44218)
              0.09375 = fieldNorm(doc=1105)
      0.33333334 = coord(1/3)
  0.125 = coord(1/8)

Date: 31.12.1996 19:29:41

KDD : techniques and applications (1998) 0.00

0.0020879905 = product of:
  0.016703924 = sum of:
    0.016703924 = product of:
      0.05011177 = sum of:
        0.05011177 = weight(_text_:22 in 6783) [ClassicSimilarity], result of:
          0.05011177 = score(doc=6783,freq=2.0), product of:
            0.10793405 = queryWeight, product of:
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.030822188 = queryNorm
            0.46428138 = fieldWeight in 6783, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.09375 = fieldNorm(doc=6783)
      0.33333334 = coord(1/3)
  0.125 = coord(1/8)

Footnote: A special issue of selected papers from the Pacific-Asia Conference on Knowledge Discovery and Data Mining (PAKDD'97), held in Singapore, 22-23 Feb 1997

Fayyad, U.M.; Djorgovski, S.G.; Weir, N.: From digitized images to online catalogs : data mining a sky survey (1996) 0.00

0.0020450184 = product of:
  0.016360147 = sum of:
    0.016360147 = product of:
      0.04908044 = sum of:
        0.04908044 = weight(_text_:problem in 6625) [ClassicSimilarity], result of:
          0.04908044 = score(doc=6625,freq=2.0), product of:
            0.13082431 = queryWeight, product of:
              4.244485 = idf(docFreq=1723, maxDocs=44218)
              0.030822188 = queryNorm
            0.375163 = fieldWeight in 6625, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              4.244485 = idf(docFreq=1723, maxDocs=44218)
              0.0625 = fieldNorm(doc=6625)
      0.33333334 = coord(1/3)
  0.125 = coord(1/8)

Abstract: Offers a data mining approach based on machine learning classification methods to the problem of automated cataloguing of online databases of digital images resulting from sky surveys. The SKICAT system automates the reduction and analysis of 3 terabytes of images expected to contain about 2 billion sky objects. It offers a solution to problems associated with the analysis of large data sets in science.

Kong, S.; Ye, F.; Feng, L.; Zhao, Z.: Towards the prediction problems of bursting hashtags on Twitter (2015) 0.00
```
0.001789391 = product of:
  0.014315128 = sum of:
    0.014315128 = product of:
      0.042945385 = sum of:
        0.042945385 = weight(_text_:problem in 2338) [ClassicSimilarity], result of:
          0.042945385 = score(doc=2338,freq=2.0), product of:
            0.13082431 = queryWeight, product of:
              4.244485 = idf(docFreq=1723, maxDocs=44218)
              0.030822188 = queryNorm
            0.3282676 = fieldWeight in 2338, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              4.244485 = idf(docFreq=1723, maxDocs=44218)
              0.0546875 = fieldNorm(doc=2338)
      0.33333334 = coord(1/3)
  0.125 = coord(1/8)
```
Abstract

Hundreds of thousands of hashtags are generated every day on Twitter. Only a few will burst and become trending topics. In this article, we provide the definition of a bursting hashtag and conduct a systematic study of a series of challenging prediction problems that span the entire life cycles of bursting hashtags. Around the problem of "how to build a system to predict bursting hashtags," we explore different types of features and present machine learning solutions. On real data sets from Twitter, experiments are conducted to evaluate the effectiveness of the proposed solutions and the contributions of features.

Borgelt, C.; Kruse, R.: Unsicheres Wissen nutzen (2002) 0.00

0.0017557865 = product of:
  0.014046292 = sum of:
    0.014046292 = product of:
      0.042138875 = sum of:
        0.042138875 = weight(_text_:29 in 1104) [ClassicSimilarity], result of:
          0.042138875 = score(doc=1104,freq=2.0), product of:
            0.108422816 = queryWeight, product of:
              3.5176873 = idf(docFreq=3565, maxDocs=44218)
              0.030822188 = queryNorm
            0.38865322 = fieldWeight in 1104, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5176873 = idf(docFreq=3565, maxDocs=44218)
              0.078125 = fieldNorm(doc=1104)
      0.33333334 = coord(1/3)
  0.125 = coord(1/8)

Date: 31.12.1996 19:29:41

Deogun, J.S.: Feature selection and effective classifiers (1998) 0.00
```
0.0015337638 = product of:
  0.012270111 = sum of:
    0.012270111 = product of:
      0.03681033 = sum of:
        0.03681033 = weight(_text_:problem in 2911) [ClassicSimilarity], result of:
          0.03681033 = score(doc=2911,freq=2.0), product of:
            0.13082431 = queryWeight, product of:
              4.244485 = idf(docFreq=1723, maxDocs=44218)
              0.030822188 = queryNorm
            0.28137225 = fieldWeight in 2911, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              4.244485 = idf(docFreq=1723, maxDocs=44218)
              0.046875 = fieldNorm(doc=2911)
      0.33333334 = coord(1/3)
  0.125 = coord(1/8)
```
Abstract

Develops and analyzes 4 algorithms for feature selection in the context of rough set methodology. Develops the notion of accuracy of classification that can be used for upper or lower classification methods and defines the feature selection problem. Presents a discussion of upper classifiers and develops 4 feature selection heuristics and discusses the family of stepwise backward selection algorithms. Analyzes the worst case time complexity in all algorithms presented. Discusses details of the experiments and results of using a family of stepwise backward selection learning data sets and a duodenal ulcer data set. Includes the experimental setup and results of comparison of lower classifiers and upper classifiers on the duodenal ulcer data set. Discusses extended decision tables.

Chen, H.; Chau, M.: Web mining : machine learning for Web applications (2003) 0.00
```
0.0015337638 = product of:
  0.012270111 = sum of:
    0.012270111 = product of:
      0.03681033 = sum of:
        0.03681033 = weight(_text_:problem in 4242) [ClassicSimilarity], result of:
          0.03681033 = score(doc=4242,freq=2.0), product of:
            0.13082431 = queryWeight, product of:
              4.244485 = idf(docFreq=1723, maxDocs=44218)
              0.030822188 = queryNorm
            0.28137225 = fieldWeight in 4242, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              4.244485 = idf(docFreq=1723, maxDocs=44218)
              0.046875 = fieldNorm(doc=4242)
      0.33333334 = coord(1/3)
  0.125 = coord(1/8)
```
Abstract

With more than two billion pages created by millions of Web page authors and organizations, the World Wide Web is a tremendously rich knowledge base. The knowledge comes not only from the content of the pages themselves, but also from the unique characteristics of the Web, such as its hyperlink structure and its diversity of content and languages. Analysis of these characteristics often reveals interesting patterns and new knowledge. Such knowledge can be used to improve users' efficiency and effectiveness in searching for information on the Web, and also for applications unrelated to the Web, such as support for decision making or business management. The Web's size and its unstructured and dynamic content, as well as its multilingual nature, make the extraction of useful knowledge a challenging research problem. Furthermore, the Web generates a large amount of data in other formats that contain valuable information. For example, Web server logs' information about user access patterns can be used for information personalization or improving Web page design.

Dang, X.H.; Ong, K.-L.: Knowledge discovery in data streams (2009) 0.00
```
0.0015337638 = product of:
  0.012270111 = sum of:
    0.012270111 = product of:
      0.03681033 = sum of:
        0.03681033 = weight(_text_:problem in 3829) [ClassicSimilarity], result of:
          0.03681033 = score(doc=3829,freq=2.0), product of:
            0.13082431 = queryWeight, product of:
              4.244485 = idf(docFreq=1723, maxDocs=44218)
              0.030822188 = queryNorm
            0.28137225 = fieldWeight in 3829, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              4.244485 = idf(docFreq=1723, maxDocs=44218)
              0.046875 = fieldNorm(doc=3829)
      0.33333334 = coord(1/3)
  0.125 = coord(1/8)
```
Abstract

Knowing what to do with the massive amount of data collected has always been an ongoing issue for many organizations. While data mining has been touted to be the solution, it has failed to deliver the impact despite its successes in many areas. One reason is that data mining algorithms were not designed for the real world, i.e., they usually assume a static view of the data and a stable execution environment where resources are abundant. The reality however is that data are constantly changing and the execution environment is dynamic. Hence, it becomes difficult for data mining to truly deliver timely and relevant results. Recently, the processing of stream data has received much attention. What is interesting is that the methodology to design stream-based algorithms may well be the solution to the above problem. In this entry, we discuss this issue and present an overview of recent works.

Chen, Y.-L.; Liu, Y.-H.; Ho, W.-L.: A text mining approach to assist the general public in the retrieval of legal documents (2013) 0.00
```
0.0015337638 = product of:
  0.012270111 = sum of:
    0.012270111 = product of:
      0.03681033 = sum of:
        0.03681033 = weight(_text_:problem in 521) [ClassicSimilarity], result of:
          0.03681033 = score(doc=521,freq=2.0), product of:
            0.13082431 = queryWeight, product of:
              4.244485 = idf(docFreq=1723, maxDocs=44218)
              0.030822188 = queryNorm
            0.28137225 = fieldWeight in 521, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              4.244485 = idf(docFreq=1723, maxDocs=44218)
              0.046875 = fieldNorm(doc=521)
      0.33333334 = coord(1/3)
  0.125 = coord(1/8)
```
Abstract

Applying text mining techniques to legal issues has been an emerging research topic in recent years. Although some previous studies focused on assisting professionals in the retrieval of related legal documents, they did not take into account the general public and their difficulty in describing legal problems in professional legal terms. Because this problem has not been addressed by previous research, this study aims to design a text-mining-based method that allows the general public to use everyday vocabulary to search for and retrieve criminal judgments. The experimental results indicate that our method can help the general public, who are not familiar with professional legal terms, to acquire relevant criminal judgments more accurately and effectively.

Sarnikar, S.; Zhang, Z.; Zhao, J.L.: Query-performance prediction for effective query routing in domain-specific repositories (2014) 0.00
```
0.0015337638 = product of:
  0.012270111 = sum of:
    0.012270111 = product of:
      0.03681033 = sum of:
        0.03681033 = weight(_text_:problem in 1326) [ClassicSimilarity], result of:
          0.03681033 = score(doc=1326,freq=2.0), product of:
            0.13082431 = queryWeight, product of:
              4.244485 = idf(docFreq=1723, maxDocs=44218)
              0.030822188 = queryNorm
            0.28137225 = fieldWeight in 1326, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              4.244485 = idf(docFreq=1723, maxDocs=44218)
              0.046875 = fieldNorm(doc=1326)
      0.33333334 = coord(1/3)
  0.125 = coord(1/8)
```
Abstract

The effective use of corporate memory is becoming increasingly important because every aspect of e-business requires access to information repositories. Unfortunately, less-than-satisfying effectiveness in state-of-the-art information-retrieval techniques is well known, even for some of the best search engines such as Google. In this study, the authors resolve this retrieval ineffectiveness problem by developing a new framework for predicting query performance, which is the first step toward better retrieval effectiveness. Specifically, they examine the relationship between query performance and query context. A query context consists of the query itself, the document collection, and the interaction between the two. The authors first analyze the characteristics of query context and develop various features for predicting query performance. Then, they propose a context-sensitive model for predicting query performance based on the characteristics of the query and the document collection. Finally, they validate this model with respect to five real-world collections of documents and demonstrate its utility in routing queries to the correct repository with high accuracy.
