Search (55 results, page 1 of 3)

  • Active filter: theme_ss:"Automatisches Klassifizieren"
  1. Adams, K.C.: Word wranglers : Automatic classification tools transform enterprise documents from "bags of words" into knowledge resources (2003) 0.06
    0.05613823 = product of:
      0.11227646 = sum of:
        0.10035812 = weight(_text_:markup in 1665) [ClassicSimilarity], result of:
          0.10035812 = score(doc=1665,freq=2.0), product of:
            0.27638784 = queryWeight, product of:
              6.572923 = idf(docFreq=167, maxDocs=44218)
              0.042049456 = queryNorm
            0.36310613 = fieldWeight in 1665, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              6.572923 = idf(docFreq=167, maxDocs=44218)
              0.0390625 = fieldNorm(doc=1665)
        0.011918336 = product of:
          0.03575501 = sum of:
            0.03575501 = weight(_text_:language in 1665) [ClassicSimilarity], result of:
              0.03575501 = score(doc=1665,freq=2.0), product of:
                0.16497234 = queryWeight, product of:
                  3.9232929 = idf(docFreq=2376, maxDocs=44218)
                  0.042049456 = queryNorm
                0.21673335 = fieldWeight in 1665, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.9232929 = idf(docFreq=2376, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=1665)
          0.33333334 = coord(1/3)
      0.5 = coord(2/4)
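    The tree above is Lucene's ClassicSimilarity explain output. A short Python sketch (constants copied from the tree; the function name is illustrative only) reproduces the arithmetic and shows how tf, idf, the norms, and the coord factors combine into the displayed score:
      import math

      # Classic Lucene TF-IDF, following the explain tree above.
      def term_score(freq, doc_freq, max_docs, query_norm, field_norm):
          idf = 1.0 + math.log(max_docs / (doc_freq + 1))   # e.g. 6.572923 for "markup"
          tf = math.sqrt(freq)                              # 1.4142135 for freq=2
          query_weight = idf * query_norm                   # queryWeight
          field_weight = tf * idf * field_norm              # fieldWeight
          return query_weight * field_weight

      markup = term_score(2.0, 167, 44218, 0.042049456, 0.0390625)     # ~0.10035812
      language = term_score(2.0, 2376, 44218, 0.042049456, 0.0390625)  # ~0.03575501
      score = (markup + language * (1 / 3)) * (2 / 4)                  # coord(1/3), coord(2/4)
      print(round(score, 8))                                           # ~0.05613823, the displayed score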
    
    Abstract
    Taxonomies are an important part of any knowledge management (KM) system, and automatic classification software is emerging as a "killer app" for consumer and enterprise portals. A number of companies such as Inxight Software, Mohomine, Metacode, and others claim to interpret the semantic content of any textual document and automatically classify text on the fly. The promise that software could automatically produce a Yahoo-style directory is a siren call not many IT managers are able to resist. KM needs have grown more complex due to the increasing amount of digital information, the declining effectiveness of keyword searching, and heterogeneous document formats in corporate databases. This environment requires innovative KM tools, and automatic classification technology is an example of this new kind of software. These products can be divided into three categories according to their underlying technology: rules-based, catalog-by-example, and statistical clustering. Evolving trends in this market include framing classification as a cyborg (computer- and human-based) activity and the increasing use of extensible markup language (XML) and support vector machine (SVM) technology. In this article, we'll survey the rapidly changing automatic classification software market and examine the features and capabilities of leading classification products.
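    As a toy illustration of the first of the three product categories named above, a rules-based classifier can be as simple as hand-written keyword rules that map a document onto a taxonomy node (the rules and categories below are invented for the example, not any vendor's product):
      # Minimal rules-based classifier: keyword rules per taxonomy node (toy rules only).
      RULES = {
          "finance": {"invoice", "earnings", "revenue"},
          "hr": {"vacancy", "hiring", "benefits"},
          "engineering": {"specification", "prototype", "tolerance"},
      }

      def classify(text):
          words = set(text.lower().split())
          best = max(RULES, key=lambda cat: len(RULES[cat] & words))  # most rule overlap
          return best if RULES[best] & words else "unclassified"

      print(classify("Quarterly earnings and revenue forecast"))  # -> finance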
  2. Kwon, O.W.; Lee, J.H.: Text categorization based on k-nearest neighbor approach for web site classification (2003) 0.05
    0.054969758 = product of:
      0.109939516 = sum of:
        0.10035812 = weight(_text_:markup in 1070) [ClassicSimilarity], result of:
          0.10035812 = score(doc=1070,freq=2.0), product of:
            0.27638784 = queryWeight, product of:
              6.572923 = idf(docFreq=167, maxDocs=44218)
              0.042049456 = queryNorm
            0.36310613 = fieldWeight in 1070, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              6.572923 = idf(docFreq=167, maxDocs=44218)
              0.0390625 = fieldNorm(doc=1070)
        0.009581393 = product of:
          0.028744178 = sum of:
            0.028744178 = weight(_text_:29 in 1070) [ClassicSimilarity], result of:
              0.028744178 = score(doc=1070,freq=2.0), product of:
                0.14791684 = queryWeight, product of:
                  3.5176873 = idf(docFreq=3565, maxDocs=44218)
                  0.042049456 = queryNorm
                0.19432661 = fieldWeight in 1070, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5176873 = idf(docFreq=3565, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=1070)
          0.33333334 = coord(1/3)
      0.5 = coord(2/4)
    
    Abstract
    Automatic categorization is a viable method to deal with the scaling problem on the World Wide Web. For Web site classification, this paper proposes using the Web pages linked with the home page, rather than the home page alone as in previous research. To implement our proposed method, we derive a scheme for Web site classification based on the k-nearest neighbor (k-NN) approach. It consists of three phases: Web page selection (connectivity analysis), Web page classification, and Web site classification. Given a Web site, the Web page selection chooses several representative Web pages using connectivity analysis. The k-NN classifier next classifies each of the selected Web pages. Finally, the classified Web pages are combined into a classification of the entire Web site. To improve performance, we supplement the k-NN approach with a feature selection method and a term weighting scheme using markup tags, and also reformulate its document-document similarity measure. In our experiments on a Korean commercial Web directory, the proposed system, using both a home page and its linked pages, improved the micro-averaged breakeven point by 30.02% compared with an ordinary classification that uses the home page only.
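    A minimal sketch of the k-NN page-classification and site-voting idea (not the authors' implementation; scikit-learn and the toy data are assumptions):
      from collections import Counter
      from sklearn.feature_extraction.text import TfidfVectorizer
      from sklearn.neighbors import KNeighborsClassifier

      # Labelled training pages (toy data).
      train_pages = ["stock quotes and market news", "recipes and cooking tips",
                     "football scores and league tables"]
      train_labels = ["finance", "food", "sports"]

      vectorizer = TfidfVectorizer()
      knn = KNeighborsClassifier(n_neighbors=1).fit(vectorizer.fit_transform(train_pages), train_labels)

      # Representative pages chosen from one site (connectivity analysis assumed already done).
      site_pages = ["latest stock market news", "quarterly earnings and stock quotes"]
      page_labels = knn.predict(vectorizer.transform(site_pages))
      print(Counter(page_labels).most_common(1)[0][0])  # majority vote -> site label ("finance")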
    Date
    27.12.2007 17:32:29
  3. Hotho, A.; Bloehdorn, S.: Data Mining 2004 : Text classification by boosting weak learners based on terms and concepts (2004) 0.04
    0.039089963 = product of:
      0.078179926 = sum of:
        0.066785686 = product of:
          0.20035705 = sum of:
            0.20035705 = weight(_text_:3a in 562) [ClassicSimilarity], result of:
              0.20035705 = score(doc=562,freq=2.0), product of:
                0.35649577 = queryWeight, product of:
                  8.478011 = idf(docFreq=24, maxDocs=44218)
                  0.042049456 = queryNorm
                0.56201804 = fieldWeight in 562, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  8.478011 = idf(docFreq=24, maxDocs=44218)
                  0.046875 = fieldNorm(doc=562)
          0.33333334 = coord(1/3)
        0.011394242 = product of:
          0.034182724 = sum of:
            0.034182724 = weight(_text_:22 in 562) [ClassicSimilarity], result of:
              0.034182724 = score(doc=562,freq=2.0), product of:
                0.14725003 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.042049456 = queryNorm
                0.23214069 = fieldWeight in 562, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.046875 = fieldNorm(doc=562)
          0.33333334 = coord(1/3)
      0.5 = coord(2/4)
    
    Content
    Cf.: http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.91.4940&rep=rep1&type=pdf
    Date
    8. 1.2013 10:22:32
  4. Ibekwe-SanJuan, F.; SanJuan, E.: From term variants to research topics (2002) 0.01
    0.010749864 = product of:
      0.042999458 = sum of:
        0.042999458 = product of:
          0.064499184 = sum of:
            0.03575501 = weight(_text_:language in 1853) [ClassicSimilarity], result of:
              0.03575501 = score(doc=1853,freq=2.0), product of:
                0.16497234 = queryWeight, product of:
                  3.9232929 = idf(docFreq=2376, maxDocs=44218)
                  0.042049456 = queryNorm
                0.21673335 = fieldWeight in 1853, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.9232929 = idf(docFreq=2376, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=1853)
            0.028744178 = weight(_text_:29 in 1853) [ClassicSimilarity], result of:
              0.028744178 = score(doc=1853,freq=2.0), product of:
                0.14791684 = queryWeight, product of:
                  3.5176873 = idf(docFreq=3565, maxDocs=44218)
                  0.042049456 = queryNorm
                0.19432661 = fieldWeight in 1853, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5176873 = idf(docFreq=3565, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=1853)
          0.6666667 = coord(2/3)
      0.25 = coord(1/4)
    
    Abstract
    In a scientific and technological watch (STW) task, an expert user needs to survey the evolution of research topics in his area of specialisation in order to detect interesting changes. The majority of methods proposing evaluation metrics for STW (bibliometrics and scientometrics studies) rely solely on statistical data analysis methods (co-citation analysis, co-word analysis). Such methods usually work on structured databases where the units of analysis (words, keywords) have already been attributed to documents by human indexers. The advent of huge amounts of unstructured textual data has made it necessary to integrate natural language processing (NLP) techniques to first extract meaningful units from texts. We propose a method for STW which is NLP-oriented. The method not only analyses texts linguistically in order to extract terms from them, but also uses linguistic relations (syntactic variations) as the basis for clustering. Terms and variation relations are formalised as weighted di-graphs, which the clustering algorithm CPCL (Classification by Preferential Clustered Link) seeks to reduce in order to produce classes. These classes ideally represent the research topics present in the corpus. The results of the classification are then validated by an expert in STW.
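    A loose sketch of the underlying data structure only (CPCL itself is not reimplemented here; networkx and the toy relations are assumptions): term variants as a weighted digraph, with weakly connected components standing in for the clusters that become candidate research topics:
      import networkx as nx

      # (source term, variant term, weight of the syntactic-variation relation) - toy data
      variations = [
          ("information retrieval", "information retrieval system", 2.0),
          ("information retrieval system", "distributed information retrieval system", 1.0),
          ("text classification", "automatic text classification", 1.5),
      ]

      G = nx.DiGraph()
      for source, variant, weight in variations:
          G.add_edge(source, variant, weight=weight)

      # Each weakly connected component is treated as one candidate research topic.
      for topic in nx.weakly_connected_components(G):
          print(sorted(topic))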
    Source
    Knowledge organization. 29(2002) nos.3/4, S.181-197
  5. Panyr, J.: STEINADLER: ein Verfahren zur automatischen Deskribierung und zur automatischen thematischen Klassifikation (1978) 0.01
    0.007665114 = product of:
      0.030660456 = sum of:
        0.030660456 = product of:
          0.09198137 = sum of:
            0.09198137 = weight(_text_:29 in 5169) [ClassicSimilarity], result of:
              0.09198137 = score(doc=5169,freq=2.0), product of:
                0.14791684 = queryWeight, product of:
                  3.5176873 = idf(docFreq=3565, maxDocs=44218)
                  0.042049456 = queryNorm
                0.6218451 = fieldWeight in 5169, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5176873 = idf(docFreq=3565, maxDocs=44218)
                  0.125 = fieldNorm(doc=5169)
          0.33333334 = coord(1/3)
      0.25 = coord(1/4)
    
    Source
    Nachrichten für Dokumentation. 29(1978), S.92-96
  6. Peng, F.; Huang, X.: Machine learning for Asian language text classification (2007) 0.01
    0.0066625527 = product of:
      0.02665021 = sum of:
        0.02665021 = product of:
          0.07995063 = sum of:
            0.07995063 = weight(_text_:language in 831) [ClassicSimilarity], result of:
              0.07995063 = score(doc=831,freq=10.0), product of:
                0.16497234 = queryWeight, product of:
                  3.9232929 = idf(docFreq=2376, maxDocs=44218)
                  0.042049456 = queryNorm
                0.48463053 = fieldWeight in 831, product of:
                  3.1622777 = tf(freq=10.0), with freq of:
                    10.0 = termFreq=10.0
                  3.9232929 = idf(docFreq=2376, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=831)
          0.33333334 = coord(1/3)
      0.25 = coord(1/4)
    
    Abstract
    Purpose - The purpose of this research is to compare several machine learning techniques on the task of Asian language text classification, for languages such as Chinese and Japanese where no word boundary information is available in written text. The paper advocates a simple language modeling based approach for this task. Design/methodology/approach - Naïve Bayes, maximum entropy model, support vector machines, and language modeling approaches were implemented and applied to Chinese and Japanese text classification. To investigate the influence of word segmentation, different word segmentation approaches were applied to Chinese text, and a segmentation-based approach was compared with a non-segmentation-based approach. Findings - There were two findings: the experiments show that statistical language modeling can significantly outperform standard techniques, given the same set of features; and it was found that classification with word-level features normally yields improved classification performance, but that classification performance is not monotonically related to segmentation accuracy. In particular, classification performance may initially improve with increased segmentation accuracy, but eventually stops improving, and can in fact even decrease, after a certain level of segmentation accuracy. Practical implications - Applying the findings to real web text classification is ongoing work. Originality/value - The paper is very relevant to Chinese and Japanese information processing, e.g. webpage classification and web search.
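    A loose, segmentation-free sketch in the spirit of the language-modeling approach (character n-gram features with a Naive Bayes model; not the authors' models, and the toy data are assumptions):
      from sklearn.feature_extraction.text import CountVectorizer
      from sklearn.naive_bayes import MultinomialNB
      from sklearn.pipeline import make_pipeline

      train_texts = ["天気予報と気温", "株価と為替レート", "サッカーの試合結果"]   # toy Japanese snippets
      train_labels = ["weather", "finance", "sports"]

      clf = make_pipeline(
          CountVectorizer(analyzer="char", ngram_range=(1, 2)),  # characters only, no word segmentation
          MultinomialNB(),
      )
      clf.fit(train_texts, train_labels)
      print(clf.predict(["明日の天気"]))  # likely ['weather'] on this toy data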
  7. Subramanian, S.; Shafer, K.E.: Clustering (2001) 0.01
    0.005697121 = product of:
      0.022788484 = sum of:
        0.022788484 = product of:
          0.06836545 = sum of:
            0.06836545 = weight(_text_:22 in 1046) [ClassicSimilarity], result of:
              0.06836545 = score(doc=1046,freq=2.0), product of:
                0.14725003 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.042049456 = queryNorm
                0.46428138 = fieldWeight in 1046, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.09375 = fieldNorm(doc=1046)
          0.33333334 = coord(1/3)
      0.25 = coord(1/4)
    
    Date
    5. 5.2003 14:17:22
  8. Malenica, M.; Smuc, T.; Snajder, J.; Basic, B.D.: Language morphology offset : text classification on a Croatian-English parallel corpus (2008) 0.01
    0.005056522 = product of:
      0.020226087 = sum of:
        0.020226087 = product of:
          0.06067826 = sum of:
            0.06067826 = weight(_text_:language in 2035) [ClassicSimilarity], result of:
              0.06067826 = score(doc=2035,freq=4.0), product of:
                0.16497234 = queryWeight, product of:
                  3.9232929 = idf(docFreq=2376, maxDocs=44218)
                  0.042049456 = queryNorm
                0.3678087 = fieldWeight in 2035, product of:
                  2.0 = tf(freq=4.0), with freq of:
                    4.0 = termFreq=4.0
                  3.9232929 = idf(docFreq=2376, maxDocs=44218)
                  0.046875 = fieldNorm(doc=2035)
          0.33333334 = coord(1/3)
      0.25 = coord(1/4)
    
    Abstract
    We investigate how, and to what extent, the morphological complexity of a language influences text classification using support vector machines (SVM). The Croatian-English parallel corpus provides the basis for a direct comparison of two languages of radically different morphological complexity. We quantified, compared, and statistically tested the effects of morphological normalisation on SVM classifier performance in a series of parallel experiments on both languages, carried out over a wide range of feature subset sizes obtained by different feature selection methods and with different levels of morphological normalisation. We also quantified the trade-off between feature space size and performance for different levels of morphological normalisation, and compared the results for both languages. Our experiments show that the improvements in SVM classifier performance are statistically significant; they are greater for small and medium numbers of features, especially for Croatian, whereas for large numbers of features the improvements are rather small and may be negligible in practice for both languages.
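    A rough sketch of the experimental setup only (scikit-learn, chi-square feature selection, and the toy Croatian snippets are assumptions; lowercasing stands in for morphological normalisation): vary the feature subset size and compare SVM performance:
      from sklearn.feature_extraction.text import CountVectorizer
      from sklearn.feature_selection import SelectKBest, chi2
      from sklearn.pipeline import make_pipeline
      from sklearn.svm import LinearSVC

      train_texts = ["burze i kiša nad gradom", "vlada je izglasala novi zakon",
                     "utakmica je završila pobjedom", "sabor raspravlja o proračunu"]
      train_labels = ["weather", "politics", "sports", "politics"]

      for k in (3, 6, 12):  # different feature subset sizes
          model = make_pipeline(
              CountVectorizer(lowercase=True),   # crude stand-in for normalisation
              SelectKBest(chi2, k=k),
              LinearSVC(),
          )
          model.fit(train_texts, train_labels)
          print(k, model.score(train_texts, train_labels))  # training accuracy only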
  9. Sojka, P.; Lee, M.; Rehurek, R.; Hatlapatka, R.; Kucbel, M.; Bouche, T.; Goutorbe, C.; Anghelache, R.; Wojciechowski, K.: Toolset for entity and semantic associations : Final Release (2013) 0.01
    0.005056522 = product of:
      0.020226087 = sum of:
        0.020226087 = product of:
          0.06067826 = sum of:
            0.06067826 = weight(_text_:language in 1057) [ClassicSimilarity], result of:
              0.06067826 = score(doc=1057,freq=4.0), product of:
                0.16497234 = queryWeight, product of:
                  3.9232929 = idf(docFreq=2376, maxDocs=44218)
                  0.042049456 = queryNorm
                0.3678087 = fieldWeight in 1057, product of:
                  2.0 = tf(freq=4.0), with freq of:
                    4.0 = termFreq=4.0
                  3.9232929 = idf(docFreq=2376, maxDocs=44218)
                  0.046875 = fieldNorm(doc=1057)
          0.33333334 = coord(1/3)
      0.25 = coord(1/4)
    
    Abstract
    In this document we describe the final release of the toolset for entity and semantic associations, integrating two versions (language-dependent and language-independent) of Unsupervised Document Similarity implemented by MU (using the gensim tool) and Citation Indexing, Resolution and Matching (UJF/CMD). We give a brief description of the tools and the rationale behind the decisions made, and provide an elementary evaluation. The tools are integrated into the main project result, the EuDML website, and deliver the functionality needed for exploratory searching and browsing of the collected documents. EuDML users and content providers thus benefit from millions of algorithmically generated similarity and citation links, produced using state-of-the-art machine learning and matching methods.
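    The document-similarity component is built on the gensim tool named above; a minimal sketch of unsupervised TF-IDF similarity with gensim follows (toy documents; this is not the EuDML pipeline itself):
      from gensim import corpora, models, similarities

      docs = ["theory of elliptic curves",
              "elliptic curves over finite fields",
              "numerical methods for partial differential equations"]
      texts = [d.lower().split() for d in docs]

      dictionary = corpora.Dictionary(texts)
      corpus = [dictionary.doc2bow(t) for t in texts]
      tfidf = models.TfidfModel(corpus)                 # language-independent term weighting
      index = similarities.MatrixSimilarity(tfidf[corpus], num_features=len(dictionary))

      query = tfidf[dictionary.doc2bow("rational points on elliptic curves".split())]
      print(list(index[query]))   # cosine similarity of the query to each document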
  10. Teich, E.; Degaetano-Ortlieb, S.; Fankhauser, P.; Kermes, H.; Lapshinova-Koltunski, E.: The linguistic construal of disciplinarity : a data-mining approach using register features (2016) 0.01
    0.005056522 = product of:
      0.020226087 = sum of:
        0.020226087 = product of:
          0.06067826 = sum of:
            0.06067826 = weight(_text_:language in 3015) [ClassicSimilarity], result of:
              0.06067826 = score(doc=3015,freq=4.0), product of:
                0.16497234 = queryWeight, product of:
                  3.9232929 = idf(docFreq=2376, maxDocs=44218)
                  0.042049456 = queryNorm
                0.3678087 = fieldWeight in 3015, product of:
                  2.0 = tf(freq=4.0), with freq of:
                    4.0 = termFreq=4.0
                  3.9232929 = idf(docFreq=2376, maxDocs=44218)
                  0.046875 = fieldNorm(doc=3015)
          0.33333334 = coord(1/3)
      0.25 = coord(1/4)
    
    Abstract
    We analyze the linguistic evolution of selected scientific disciplines over a 30-year time span (1970s to 2000s). Our focus is on four highly specialized disciplines at the boundaries of computer science that emerged during that time: computational linguistics, bioinformatics, digital construction, and microelectronics. Our analysis is driven by the question of whether these disciplines develop a distinctive language use, both individually and collectively, over the given time period. The data set is the English Scientific Text Corpus (scitex), which includes texts from the 1970s/1980s and early 2000s. Our theoretical basis is register theory. In terms of methods, we combine corpus-based feature extraction (various aggregated part-of-speech-based features, n-grams, lexico-grammatical patterns) and automatic text classification. The results of our research are directly relevant to the study of linguistic variation and languages for specific purposes (LSP) and have implications for various natural language processing (NLP) tasks, for example authorship attribution, text mining, or training NLP tools.
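    A rough sketch of the register-feature idea (the feature definitions, toy texts, and labels are assumptions, not the scitex pipeline): count a few simple lexico-grammatical markers per text and train a classifier to separate disciplines:
      import re
      from sklearn.feature_extraction import DictVectorizer
      from sklearn.linear_model import LogisticRegression
      from sklearn.pipeline import make_pipeline

      def register_features(text):
          tokens = re.findall(r"[a-z]+", text.lower())
          n = max(len(tokens), 1)
          return {  # relative frequencies of crude register markers
              "nominalisations": sum(t.endswith(("tion", "ment", "ity")) for t in tokens) / n,
              "passives": len(re.findall(r"\b(?:is|are|was|were)\s+\w+ed\b", text.lower())) / n,
              "first_person": sum(t in {"we", "our", "i"} for t in tokens) / n,
          }

      texts = ["We present an implementation of a parser and we evaluate it.",
               "The alignment of the sequences was computed and the mutation was identified.",
               "We propose a translation model and our evaluation shows an improvement.",
               "Expression of the protein was measured after stimulation of the cells."]
      labels = ["comp_ling", "bioinformatics", "comp_ling", "bioinformatics"]

      clf = make_pipeline(DictVectorizer(), LogisticRegression())
      clf.fit([register_features(t) for t in texts], labels)
      print(clf.predict([register_features("We describe our tagger and its evaluation.")]))  # likely comp_ling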
  11. Reiner, U.: Automatische DDC-Klassifizierung von bibliografischen Titeldatensätzen (2009) 0.00
    0.0047476008 = product of:
      0.018990403 = sum of:
        0.018990403 = product of:
          0.056971207 = sum of:
            0.056971207 = weight(_text_:22 in 611) [ClassicSimilarity], result of:
              0.056971207 = score(doc=611,freq=2.0), product of:
                0.14725003 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.042049456 = queryNorm
                0.38690117 = fieldWeight in 611, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.078125 = fieldNorm(doc=611)
          0.33333334 = coord(1/3)
      0.25 = coord(1/4)
    
    Date
    22. 8.2009 12:54:24
  12. HaCohen-Kerner, Y. et al.: Classification using various machine learning methods and combinations of key-phrases and visual features (2016) 0.00
    0.0047476008 = product of:
      0.018990403 = sum of:
        0.018990403 = product of:
          0.056971207 = sum of:
            0.056971207 = weight(_text_:22 in 2748) [ClassicSimilarity], result of:
              0.056971207 = score(doc=2748,freq=2.0), product of:
                0.14725003 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.042049456 = queryNorm
                0.38690117 = fieldWeight in 2748, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.078125 = fieldNorm(doc=2748)
          0.33333334 = coord(1/3)
      0.25 = coord(1/4)
    
    Date
    1. 2.2016 18:25:22
  13. Meder, N.: Artificial intelligence as a tool of classification, or: the network of language games as cognitive paradigm (1985) 0.00
    0.0041714176 = product of:
      0.01668567 = sum of:
        0.01668567 = product of:
          0.05005701 = sum of:
            0.05005701 = weight(_text_:language in 7694) [ClassicSimilarity], result of:
              0.05005701 = score(doc=7694,freq=2.0), product of:
                0.16497234 = queryWeight, product of:
                  3.9232929 = idf(docFreq=2376, maxDocs=44218)
                  0.042049456 = queryNorm
                0.30342668 = fieldWeight in 7694, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.9232929 = idf(docFreq=2376, maxDocs=44218)
                  0.0546875 = fieldNorm(doc=7694)
          0.33333334 = coord(1/3)
      0.25 = coord(1/4)
    
  14. Sebastiani, F.: Classification of text, automatic (2006) 0.00
    0.0041714176 = product of:
      0.01668567 = sum of:
        0.01668567 = product of:
          0.05005701 = sum of:
            0.05005701 = weight(_text_:language in 5003) [ClassicSimilarity], result of:
              0.05005701 = score(doc=5003,freq=2.0), product of:
                0.16497234 = queryWeight, product of:
                  3.9232929 = idf(docFreq=2376, maxDocs=44218)
                  0.042049456 = queryNorm
                0.30342668 = fieldWeight in 5003, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.9232929 = idf(docFreq=2376, maxDocs=44218)
                  0.0546875 = fieldNorm(doc=5003)
          0.33333334 = coord(1/3)
      0.25 = coord(1/4)
    
    Source
    Encyclopedia of language and linguistics. 2nd ed. Ed.: K. Brown. Vol. 14
  15. Savic, D.: Designing an expert system for classifying office documents (1994) 0.00
    0.003832557 = product of:
      0.015330228 = sum of:
        0.015330228 = product of:
          0.045990683 = sum of:
            0.045990683 = weight(_text_:29 in 2655) [ClassicSimilarity], result of:
              0.045990683 = score(doc=2655,freq=2.0), product of:
                0.14791684 = queryWeight, product of:
                  3.5176873 = idf(docFreq=3565, maxDocs=44218)
                  0.042049456 = queryNorm
                0.31092256 = fieldWeight in 2655, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5176873 = idf(docFreq=3565, maxDocs=44218)
                  0.0625 = fieldNorm(doc=2655)
          0.33333334 = coord(1/3)
      0.25 = coord(1/4)
    
    Source
    Records management quarterly. 28(1994) no.3, S.20-29
  16. Cosh, K.J.; Burns, R.; Daniel, T.: Content clouds : classifying content in Web 2.0 (2008) 0.00
    0.0035755006 = product of:
      0.014302002 = sum of:
        0.014302002 = product of:
          0.042906005 = sum of:
            0.042906005 = weight(_text_:language in 2013) [ClassicSimilarity], result of:
              0.042906005 = score(doc=2013,freq=2.0), product of:
                0.16497234 = queryWeight, product of:
                  3.9232929 = idf(docFreq=2376, maxDocs=44218)
                  0.042049456 = queryNorm
                0.26008 = fieldWeight in 2013, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.9232929 = idf(docFreq=2376, maxDocs=44218)
                  0.046875 = fieldNorm(doc=2013)
          0.33333334 = coord(1/3)
      0.25 = coord(1/4)
    
    Abstract
    Purpose - With increasing amounts of user-generated content being produced electronically in the form of wikis, blogs, forums, etc., the purpose of this paper is to investigate a new approach to classifying ad hoc content. Design/methodology/approach - The approach applies natural language processing (NLP) tools to automatically extract the content of some text, visualizing the results in a content cloud. Findings - Content clouds share the visual simplicity of a tag cloud, but display the details of an article at a different level of abstraction, providing a complementary classification. Research limitations/implications - The paper provides the general approach to creating a content cloud. In the future, the process can be refined and enhanced by further evaluation of results. Further work is also required to better identify closely related articles. Practical implications - Being able to automatically classify the content generated by web users will enable others to find more appropriate content. Originality/value - The approach is original. Other researchers have produced a cloud simply by using skip lists to filter unwanted words; this paper's approach improves on this by applying appropriate NLP techniques.
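    A small sketch of the content-cloud idea (not the authors' exact pipeline; NLTK is an assumption and the resource names may differ across NLTK versions): part-of-speech tag the text and keep noun frequencies, rather than merely filtering a word list:
      from collections import Counter
      import nltk

      nltk.download("punkt", quiet=True)                        # tokenizer model
      nltk.download("averaged_perceptron_tagger", quiet=True)   # POS tagger model

      text = ("Users generate content in wikis, blogs and forums. "
              "Classifying this content helps other users find relevant articles.")

      tokens = nltk.word_tokenize(text.lower())
      nouns = [tok for tok, tag in nltk.pos_tag(tokens) if tag.startswith("NN")]
      print(Counter(nouns).most_common(5))   # (term, weight) pairs for the cloud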
  17. Barbu, E.: What kind of knowledge is in Wikipedia? : unsupervised extraction of properties for similar concepts (2014) 0.00
    0.0035755006 = product of:
      0.014302002 = sum of:
        0.014302002 = product of:
          0.042906005 = sum of:
            0.042906005 = weight(_text_:language in 1547) [ClassicSimilarity], result of:
              0.042906005 = score(doc=1547,freq=2.0), product of:
                0.16497234 = queryWeight, product of:
                  3.9232929 = idf(docFreq=2376, maxDocs=44218)
                  0.042049456 = queryNorm
                0.26008 = fieldWeight in 1547, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.9232929 = idf(docFreq=2376, maxDocs=44218)
                  0.046875 = fieldNorm(doc=1547)
          0.33333334 = coord(1/3)
      0.25 = coord(1/4)
    
    Abstract
    This article presents a novel method for extracting knowledge from Wikipedia and a classification schema for annotating the extracted knowledge. Unlike the majority of approaches in the literature, we use the raw Wikipedia text for knowledge acquisition. The main assumption made is that the concepts classified under the same node in a taxonomy are described in a comparable way in Wikipedia. The annotation of the extracted knowledge is done at two levels: ontological and logical. The extracted properties are evaluated in the traditional way, that is, by computing the precision of the extraction procedure and in a clustering task. The second method of evaluation is seldom used in the natural language processing community, but it is regularly employed in cognitive psychology.
  18. Search Engines and Beyond : Developing efficient knowledge management systems, April 19-20 1999, Boston, Mass (1999) 0.00
    0.0033710147 = product of:
      0.013484059 = sum of:
        0.013484059 = product of:
          0.040452175 = sum of:
            0.040452175 = weight(_text_:language in 2596) [ClassicSimilarity], result of:
              0.040452175 = score(doc=2596,freq=4.0), product of:
                0.16497234 = queryWeight, product of:
                  3.9232929 = idf(docFreq=2376, maxDocs=44218)
                  0.042049456 = queryNorm
                0.2452058 = fieldWeight in 2596, product of:
                  2.0 = tf(freq=4.0), with freq of:
                    4.0 = termFreq=4.0
                  3.9232929 = idf(docFreq=2376, maxDocs=44218)
                  0.03125 = fieldNorm(doc=2596)
          0.33333334 = coord(1/3)
      0.25 = coord(1/4)
    
    Content
    Ramana Rao (Inxight, Palo Alto, CA): 7 ± 2 Insights on achieving Effective Information Access
    Session One: Updates and a twelve month perspective
      Danny Sullivan (Search Engine Watch, US / England): Portalization and other search trends
      Carol Tenopir (University of Tennessee): Search realities faced by end users and professional searchers
    Session Two: Today's search engines and beyond
      Daniel Hoogterp (Retrieval Technologies, McLean, VA): Effective presentation and utilization of search techniques
      Rick Kenny (Fulcrum Technologies, Ontario, Canada): Beyond document clustering: The knowledge impact statement
      Gary Stock (Ingenius, Kalamazoo, MI): Automated change monitoring
      Gary Culliss (Direct Hit, Wellesley Hills, MA): User popularity ranked search engines
      Byron Dom (IBM, CA): Automatically finding the best pages on the World Wide Web (CLEVER)
      Peter Tomassi (LookSmart, San Francisco, CA): Adding human intellect to search technology
    Session Three: Panel discussion: Human v automated categorization and editing
      Ev Brenner (New York, NY), Chairman; James Callan (University of Massachusetts, MA); Marc Krellenstein (Northern Light Technology, Cambridge, MA); Dan Miller (Ask Jeeves, Berkeley, CA)
    Session Four: Updates and a twelve month perspective
      Steve Arnold (AIT, Harrods Creek, KY): Review: The leading edge in search and retrieval software
      Ellen Voorhees (NIST, Gaithersburg, MD): TREC update
    Session Five: Search engines now and beyond
      Intelligent agents - John Snyder (Muscat, Cambridge, England): Practical issues behind intelligent agents
      Text summarization - Therese Firmin (Dept of Defense, Ft George G. Meade, MD): The TIPSTER/SUMMAC evaluation of automatic text summarization systems
      Cross-language searching - Elizabeth Liddy (TextWise, Syracuse, NY): A conceptual interlingua approach to cross-language retrieval
      Video search and retrieval - Armon Amir (IBM, Almaden, CA): CueVideo: Modular system for automatic indexing and browsing of video/audio
      Speech recognition - Michael Witbrock (Lycos, Waltham, MA): Retrieval of spoken documents
      Visualization - James A. Wise (Integral Visuals, Richland, WA): Information visualization in the new millennium: Emerging science or passing fashion?
      Text mining - David Evans (Claritech, Pittsburgh, PA): Text mining - towards decision support
  19. Savic, D.: Automatic classification of office documents : review of available methods and techniques (1995) 0.00
    0.0033534872 = product of:
      0.013413949 = sum of:
        0.013413949 = product of:
          0.040241845 = sum of:
            0.040241845 = weight(_text_:29 in 2219) [ClassicSimilarity], result of:
              0.040241845 = score(doc=2219,freq=2.0), product of:
                0.14791684 = queryWeight, product of:
                  3.5176873 = idf(docFreq=3565, maxDocs=44218)
                  0.042049456 = queryNorm
                0.27205724 = fieldWeight in 2219, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5176873 = idf(docFreq=3565, maxDocs=44218)
                  0.0546875 = fieldNorm(doc=2219)
          0.33333334 = coord(1/3)
      0.25 = coord(1/4)
    
    Source
    Records management quarterly. 29(1995) no.4, S.3-18
  20. Ruocco, A.S.; Frieder, O.: Clustering and classification of large document bases in a parallel environment (1997) 0.00
    0.0033534872 = product of:
      0.013413949 = sum of:
        0.013413949 = product of:
          0.040241845 = sum of:
            0.040241845 = weight(_text_:29 in 1661) [ClassicSimilarity], result of:
              0.040241845 = score(doc=1661,freq=2.0), product of:
                0.14791684 = queryWeight, product of:
                  3.5176873 = idf(docFreq=3565, maxDocs=44218)
                  0.042049456 = queryNorm
                0.27205724 = fieldWeight in 1661, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5176873 = idf(docFreq=3565, maxDocs=44218)
                  0.0546875 = fieldNorm(doc=1661)
          0.33333334 = coord(1/3)
      0.25 = coord(1/4)
    
    Date
    29. 7.1998 17:45:02

Languages

  • e 48
  • d 7

Types

  • a 49
  • el 6
  • x 1