Search (34 results, page 1 of 2)

  • theme_ss:"Automatisches Klassifizieren"
  1. Hotho, A.; Bloehdorn, S.: Data Mining 2004 : Text classification by boosting weak learners based on terms and concepts (2004) 0.12
    0.11799447 = product of:
      0.23598894 = sum of:
        0.055449657 = product of:
          0.16634896 = sum of:
            0.16634896 = weight(_text_:3a in 562) [ClassicSimilarity], result of:
              0.16634896 = score(doc=562,freq=2.0), product of:
                0.2959851 = queryWeight, product of:
                  8.478011 = idf(docFreq=24, maxDocs=44218)
                  0.03491209 = queryNorm
                0.56201804 = fieldWeight in 562, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  8.478011 = idf(docFreq=24, maxDocs=44218)
                  0.046875 = fieldNorm(doc=562)
          0.33333334 = coord(1/3)
        0.16634896 = weight(_text_:2f in 562) [ClassicSimilarity], result of:
          0.16634896 = score(doc=562,freq=2.0), product of:
            0.2959851 = queryWeight, product of:
              8.478011 = idf(docFreq=24, maxDocs=44218)
              0.03491209 = queryNorm
            0.56201804 = fieldWeight in 562, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              8.478011 = idf(docFreq=24, maxDocs=44218)
              0.046875 = fieldNorm(doc=562)
        0.014190319 = product of:
          0.028380638 = sum of:
            0.028380638 = weight(_text_:22 in 562) [ClassicSimilarity], result of:
              0.028380638 = score(doc=562,freq=2.0), product of:
                0.1222562 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.03491209 = queryNorm
                0.23214069 = fieldWeight in 562, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.046875 = fieldNorm(doc=562)
          0.5 = coord(1/2)
      0.5 = coord(3/6)
    
    Content
    Cf.: http://www.google.de/url?sa=t&rct=j&q=&esrc=s&source=web&cd=1&cad=rja&ved=0CEAQFjAA&url=http%3A%2F%2Fciteseerx.ist.psu.edu%2Fviewdoc%2Fdownload%3Fdoi%3D10.1.1.91.4940%26rep%3Drep1%26type%3Dpdf&ei=dOXrUMeIDYHDtQahsIGACg&usg=AFQjCNHFWVh6gNPvnOrOS9R3rkrXCNVD-A&sig2=5I2F5evRfMnsttSgFF9g7Q&bvm=bv.1357316858,d.Yms.
    Date
    8. 1.2013 10:22:32
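    The score breakdown shown for this record is Lucene's ClassicSimilarity (TF-IDF) explanation: each matching term contributes queryWeight × fieldWeight, where queryWeight = idf × queryNorm, fieldWeight = sqrt(freq) × idf × fieldNorm, and coord() scales a sum by the fraction of query clauses that matched. A minimal Python sketch (not part of the retrieval system; the constants are simply copied from the breakdown above) reproduces the 0.12 score of document 562:

        from math import log, sqrt

        def idf(doc_freq, max_docs):
            # ClassicSimilarity idf, as labelled in the explanation above
            return 1.0 + log(max_docs / (doc_freq + 1))

        QUERY_NORM = 0.03491209   # queryNorm from the breakdown
        FIELD_NORM = 0.046875     # fieldNorm(doc=562)

        def term_score(freq, doc_freq, max_docs=44218):
            term_idf = idf(doc_freq, max_docs)              # e.g. 8.478011 for docFreq=24
            query_weight = term_idf * QUERY_NORM            # e.g. 0.2959851
            field_weight = sqrt(freq) * term_idf * FIELD_NORM
            return query_weight * field_weight

        w_3a = term_score(freq=2.0, doc_freq=24)    # ~0.16634896
        w_2f = term_score(freq=2.0, doc_freq=24)    # ~0.16634896
        w_22 = term_score(freq=2.0, doc_freq=3622)  # ~0.02838064

        # inner coord(1/3) and coord(1/2), outer coord(3/6)
        total = (w_3a / 3 + w_2f + w_22 / 2) * 0.5
        print(round(total, 8))                      # ~0.11799447
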
  2. Denoyer, L.; Gallinari, P.: Bayesian network model for semi-structured document classification (2004) 0.02
    0.023941021 = product of:
      0.07182306 = sum of:
        0.03787318 = weight(_text_:searching in 995) [ClassicSimilarity], result of:
          0.03787318 = score(doc=995,freq=2.0), product of:
            0.14122958 = queryWeight, product of:
              4.0452914 = idf(docFreq=2103, maxDocs=44218)
              0.03491209 = queryNorm
            0.26816747 = fieldWeight in 995, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              4.0452914 = idf(docFreq=2103, maxDocs=44218)
              0.046875 = fieldNorm(doc=995)
        0.03394988 = product of:
          0.06789976 = sum of:
            0.06789976 = weight(_text_:etc in 995) [ClassicSimilarity], result of:
              0.06789976 = score(doc=995,freq=2.0), product of:
                0.18910104 = queryWeight, product of:
                  5.4164915 = idf(docFreq=533, maxDocs=44218)
                  0.03491209 = queryNorm
                0.35906604 = fieldWeight in 995, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  5.4164915 = idf(docFreq=533, maxDocs=44218)
                  0.046875 = fieldNorm(doc=995)
          0.5 = coord(1/2)
      0.33333334 = coord(2/6)
    
    Abstract
    Recently, a new community has started to emerge around the development of new information research methods for searching and analyzing semi-structured and XML-like documents. The goal is to handle both content and structural information, and to deal with different types of information content (text, image, etc.). We consider here the task of structured document classification. We propose a generative model, based on Bayesian networks, able to handle both structure and content. We then show how to transform this generative model into a discriminant classifier using the Fisher kernel method. The model is then extended for dealing with different types of content information (here text and images). The model was tested on three databases: the classical WebKB corpus composed of HTML pages, the new INEX corpus, which has become a reference in the field of ad-hoc retrieval for XML documents, and a multimedia corpus of Web pages.
  3. Vizine-Goetz, D.: NetLab / OCLC collaboration seeks to improve Web searching (1999) 0.01
    0.010520328 = product of:
      0.06312197 = sum of:
        0.06312197 = weight(_text_:searching in 4180) [ClassicSimilarity], result of:
          0.06312197 = score(doc=4180,freq=2.0), product of:
            0.14122958 = queryWeight, product of:
              4.0452914 = idf(docFreq=2103, maxDocs=44218)
              0.03491209 = queryNorm
            0.44694576 = fieldWeight in 4180, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              4.0452914 = idf(docFreq=2103, maxDocs=44218)
              0.078125 = fieldNorm(doc=4180)
      0.16666667 = coord(1/6)
    
  4. Golub, K.; Lykke, M.: Automated classification of web pages in hierarchical browsing (2009) 0.01
    0.009110872 = product of:
      0.05466523 = sum of:
        0.05466523 = weight(_text_:searching in 3614) [ClassicSimilarity], result of:
          0.05466523 = score(doc=3614,freq=6.0), product of:
            0.14122958 = queryWeight, product of:
              4.0452914 = idf(docFreq=2103, maxDocs=44218)
              0.03491209 = queryNorm
            0.38706642 = fieldWeight in 3614, product of:
              2.4494898 = tf(freq=6.0), with freq of:
                6.0 = termFreq=6.0
              4.0452914 = idf(docFreq=2103, maxDocs=44218)
              0.0390625 = fieldNorm(doc=3614)
      0.16666667 = coord(1/6)
    
    Abstract
    Purpose - The purpose of this study is twofold: to investigate whether it is meaningful to use the Engineering Index (Ei) classification scheme for browsing, and then, if proven useful, to investigate the performance of an automated classification algorithm based on the Ei classification scheme. Design/methodology/approach - A user study was conducted in which users solved four controlled searching tasks. The users browsed the Ei classification scheme in order to examine the suitability of the classification scheme for browsing. The classification algorithm was evaluated by the users, who judged the correctness of the automatically assigned classes. Findings - The study showed that the Ei classification scheme is suited for browsing. Automatically assigned classes were on average partly correct, with some classes working better than others. Success of browsing was shown to be correlated with, and dependent on, classification correctness. Research limitations/implications - Further research should address problems of disparate evaluations of one and the same web page. Reasons behind browsing failures in the Ei classification scheme also need further investigation. Practical implications - Improvements for browsing were identified: describing class captions and/or listing their subclasses from the start; allowing searching for words from class captions with synonym search (easily provided for Ei since the classes are mapped to thesauri terms); and, when searching for class captions, returning the hierarchical tree expanded around the class in whose caption the search term is found. The need for improvements of classification schemes was also indicated. Originality/value - A user-based evaluation of automated subject classification in the context of browsing has not been conducted before; hence the study also presents new findings concerning methodology.
  5. Reiner, U.: DDC-based search in the data of the German National Bibliography (2008) 0.01
    0.008926794 = product of:
      0.053560764 = sum of:
        0.053560764 = weight(_text_:searching in 2166) [ClassicSimilarity], result of:
          0.053560764 = score(doc=2166,freq=4.0), product of:
            0.14122958 = queryWeight, product of:
              4.0452914 = idf(docFreq=2103, maxDocs=44218)
              0.03491209 = queryNorm
            0.37924606 = fieldWeight in 2166, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              4.0452914 = idf(docFreq=2103, maxDocs=44218)
              0.046875 = fieldNorm(doc=2166)
      0.16666667 = coord(1/6)
    
    Abstract
    In 2004, the German National Library began to classify title records of the German National Bibliography according to subject groups based on the divisions of the Dewey Decimal Classification (DDC). Since 2006, all titles of the main series of the German National Bibliography have been classified in strict compliance with the DDC. On this basis, an enhanced DDC-based search can be realized - e.g., searching the data of the German National Bibliography for title records using number components of synthesized classification numbers or searching for DDC numbers using unclassified title records. This paper gives an account of the current research and development of the DDC-based search. The work is conducted in the VZG project Colibri, which focuses on the automatic analysis of DDC-synthesized numbers and the automatic classification of bibliographic title records.
  6. Choi, B.; Peng, X.: Dynamic and hierarchical classification of Web pages (2004) 0.01
    0.008926794 = product of:
      0.053560764 = sum of:
        0.053560764 = weight(_text_:searching in 2555) [ClassicSimilarity], result of:
          0.053560764 = score(doc=2555,freq=4.0), product of:
            0.14122958 = queryWeight, product of:
              4.0452914 = idf(docFreq=2103, maxDocs=44218)
              0.03491209 = queryNorm
            0.37924606 = fieldWeight in 2555, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              4.0452914 = idf(docFreq=2103, maxDocs=44218)
              0.046875 = fieldNorm(doc=2555)
      0.16666667 = coord(1/6)
    
    Abstract
    Automatic classification of Web pages is an effective way to organise the vast amount of information and to assist in retrieving relevant information from the Internet. Although many automatic classification systems have been proposed, most of them ignore the conflict between the fixed number of categories and the growing number of Web pages being added into the systems. They also require searching through all existing categories to make any classification. This article proposes a dynamic and hierarchical classification system that is capable of adding new categories as required, organising the Web pages into a tree structure, and classifying Web pages by searching through only one path of the tree. The proposed single-path search technique reduces the search complexity from O(n) to O(log(n)). Test results show that the system improves the accuracy of classification by 6 percent in comparison to related systems. The dynamic-category expansion technique also achieves satisfactory results for adding new categories into the system as required.
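    A minimal sketch of the single-path idea described above (hypothetical code, not the authors' system; the toy overlap scorer merely stands in for whatever per-category classifier is used): at each level only the children of the current node are scored and the best one is followed, so a page is classified along a single root-to-leaf path instead of being matched against all n categories.

        # Hypothetical sketch of single-path hierarchical classification.
        class Node:
            def __init__(self, label, children=None):
                self.label = label
                self.children = children or []

        def score(page_terms, label):
            # toy scorer: term overlap with the category label
            return len(page_terms & set(label.lower().split()))

        def classify(page_terms, root):
            node, path = root, [root.label]
            while node.children:
                # descend into the best-scoring child only
                node = max(node.children, key=lambda c: score(page_terms, c.label))
                path.append(node.label)
            return path  # work grows with depth x branching, not with the total number of categories

        tree = Node("root", [
            Node("computer science", [Node("information retrieval"), Node("databases")]),
            Node("chemistry", [Node("organic chemistry"), Node("inorganic chemistry")]),
        ])
        print(classify({"web", "computer", "information", "retrieval"}, tree))
        # -> ['root', 'computer science', 'information retrieval']
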
  7. Rose, J.R.; Gasteiger, J.: HORACE: an automatic system for the hierarchical classification of chemical reactions (1994) 0.01
    0.0073642298 = product of:
      0.044185378 = sum of:
        0.044185378 = weight(_text_:searching in 7696) [ClassicSimilarity], result of:
          0.044185378 = score(doc=7696,freq=2.0), product of:
            0.14122958 = queryWeight, product of:
              4.0452914 = idf(docFreq=2103, maxDocs=44218)
              0.03491209 = queryNorm
            0.31286204 = fieldWeight in 7696, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              4.0452914 = idf(docFreq=2103, maxDocs=44218)
              0.0546875 = fieldNorm(doc=7696)
      0.16666667 = coord(1/6)
    
    Abstract
    Describes an automatic classification system for classifying chemical reactions. A detailed study of the classification of chemical reactions, based on topological and physicochemical features, is followed by an analysis of the hierarchical classification produced by the HORACE algorithm (Hierarchical Organization of Reactions through Attribute and Condition Eduction), which combines both approaches in a synergistic manner. The searching and updating of reaction hierarchies is demonstrated with the hierarchies produced for 2 data sets by the HORACE algorithm. Shows that reaction hierarchies provide efficient access to reaction information and indicate the main reaction types for a given reaction scheme, define the scope of a reaction type, enable searchers to find unusual reactions, and can help in locating the reactions most relevant for a given problem.
  8. Koch, T.; Vizine-Goetz, D.: Automatic classification and content navigation support for Web services : DESIRE II cooperates with OCLC (1998) 0.01
    0.0073642298 = product of:
      0.044185378 = sum of:
        0.044185378 = weight(_text_:searching in 1568) [ClassicSimilarity], result of:
          0.044185378 = score(doc=1568,freq=2.0), product of:
            0.14122958 = queryWeight, product of:
              4.0452914 = idf(docFreq=2103, maxDocs=44218)
              0.03491209 = queryNorm
            0.31286204 = fieldWeight in 1568, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              4.0452914 = idf(docFreq=2103, maxDocs=44218)
              0.0546875 = fieldNorm(doc=1568)
      0.16666667 = coord(1/6)
    
    Abstract
    Emerging standards in knowledge representation and organization are preparing the way for distributed vocabulary support in Internet search services. NetLab researchers are exploring several innovative solutions for searching and browsing in the subject-based Internet gateway, Electronic Engineering Library, Sweden (EELS). The implementation of the EELS service is described, specifically the generation of the robot-gathered database 'All Engineering' and the automated application of the Ei thesaurus and classification scheme. NetLab and OCLC researchers are collaborating to investigate advanced solutions to automated classification in the DESIRE II context. A plan for furthering the development of distributed vocabulary support in Internet search services is offered.
  9. Sojka, P.; Lee, M.; Rehurek, R.; Hatlapatka, R.; Kucbel, M.; Bouche, T.; Goutorbe, C.; Anghelache, R.; Wojciechowski, K.: Toolset for entity and semantic associations : Final Release (2013) 0.01
    0.0063121966 = product of:
      0.03787318 = sum of:
        0.03787318 = weight(_text_:searching in 1057) [ClassicSimilarity], result of:
          0.03787318 = score(doc=1057,freq=2.0), product of:
            0.14122958 = queryWeight, product of:
              4.0452914 = idf(docFreq=2103, maxDocs=44218)
              0.03491209 = queryNorm
            0.26816747 = fieldWeight in 1057, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              4.0452914 = idf(docFreq=2103, maxDocs=44218)
              0.046875 = fieldNorm(doc=1057)
      0.16666667 = coord(1/6)
    
    Abstract
    In this document we describe the final release of the toolset for entity and semantic associations, integrating two versions (language-dependent and language-independent) of Unsupervised Document Similarity implemented by MU (using the gensim tool) and Citation Indexing, Resolution and Matching (UJF/CMD). We give a brief description of the tools and the rationale behind decisions made, and provide an elementary evaluation. The tools are integrated into the main project result, the EuDML website, and they deliver the needed functionality for exploratory searching and browsing of the collected documents. EuDML users and content providers thus benefit from millions of algorithmically generated similarity and citation links, developed using state-of-the-art machine learning and matching methods.
  10. Wu, M.; Liu, Y.-H.; Brownlee, R.; Zhang, X.: Evaluating utility and automatic classification of subject metadata from Research Data Australia (2021) 0.01
    0.0063121966 = product of:
      0.03787318 = sum of:
        0.03787318 = weight(_text_:searching in 453) [ClassicSimilarity], result of:
          0.03787318 = score(doc=453,freq=2.0), product of:
            0.14122958 = queryWeight, product of:
              4.0452914 = idf(docFreq=2103, maxDocs=44218)
              0.03491209 = queryNorm
            0.26816747 = fieldWeight in 453, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              4.0452914 = idf(docFreq=2103, maxDocs=44218)
              0.046875 = fieldNorm(doc=453)
      0.16666667 = coord(1/6)
    
    Abstract
    In this paper, we present a case study of how well subject metadata (comprising headings from an international classification scheme) has been deployed in a national data catalogue, and how often data seekers use subject metadata when searching for data. Through an analysis of user search behaviour as recorded in search logs, we find evidence that users utilise the subject metadata for data discovery. Since approximately half of the records ingested by the catalogue did not include subject metadata at the time of harvest, we experimented with automatic subject classification approaches in order to enrich these records and to provide additional support for user search and data discovery. Our results show that automatic methods work well for well-represented categories of subject metadata, and these categories tend to have features that distinguish them from the other categories. Our findings raise implications for data catalogue providers; they should invest more effort to enhance the quality of data records by providing an adequate description of these records for under-represented subject categories.
  11. Dolin, R.; Agrawal, D.; El Abbadi, A.; Pearlman, J.: Using automated classification for summarizing and selecting heterogeneous information sources (1998) 0.01
    0.0056583136 = product of:
      0.03394988 = sum of:
        0.03394988 = product of:
          0.06789976 = sum of:
            0.06789976 = weight(_text_:etc in 316) [ClassicSimilarity], result of:
              0.06789976 = score(doc=316,freq=2.0), product of:
                0.18910104 = queryWeight, product of:
                  5.4164915 = idf(docFreq=533, maxDocs=44218)
                  0.03491209 = queryNorm
                0.35906604 = fieldWeight in 316, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  5.4164915 = idf(docFreq=533, maxDocs=44218)
                  0.046875 = fieldNorm(doc=316)
          0.5 = coord(1/2)
      0.16666667 = coord(1/6)
    
    Abstract
    Information retrieval over the Internet increasingly requires the filtering of thousands of heterogeneous information sources. Important sources of information include not only traditional databases with structured data and queries, but also increasing numbers of non-traditional, semi- or unstructured collections such as Web sites, FTP archives, etc. As the number and variability of sources increases, new ways of automatically summarizing, discovering, and selecting collections relevant to a user's query are needed. One such method involves the use of classification schemes, such as the Library of Congress Classification (LCC) [10], within which a collection may be represented based on its content, irrespective of the structure of the actual data or documents. For such a system to be useful in a large-scale distributed environment, it must be easy to use for both collection managers and users. As a result, it must be possible to classify documents automatically within a classification scheme. Furthermore, there must be a straightforward and intuitive interface with which the user may use the scheme to assist in information retrieval (IR).
  12. Cosh, K.J.; Burns, R.; Daniel, T.: Content clouds : classifying content in Web 2.0 (2008) 0.01
    0.0056583136 = product of:
      0.03394988 = sum of:
        0.03394988 = product of:
          0.06789976 = sum of:
            0.06789976 = weight(_text_:etc in 2013) [ClassicSimilarity], result of:
              0.06789976 = score(doc=2013,freq=2.0), product of:
                0.18910104 = queryWeight, product of:
                  5.4164915 = idf(docFreq=533, maxDocs=44218)
                  0.03491209 = queryNorm
                0.35906604 = fieldWeight in 2013, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  5.4164915 = idf(docFreq=533, maxDocs=44218)
                  0.046875 = fieldNorm(doc=2013)
          0.5 = coord(1/2)
      0.16666667 = coord(1/6)
    
    Abstract
    Purpose - With increasing amounts of user-generated content being produced electronically in the form of wikis, blogs, forums, etc., the purpose of this paper is to investigate a new approach to classifying ad hoc content. Design/methodology/approach - The approach applies natural language processing (NLP) tools to automatically extract the content of some text, visualizing the results in a content cloud. Findings - Content clouds share the visual simplicity of a tag cloud, but display the details of an article at a different level of abstraction, providing a complementary classification. Research limitations/implications - Provides the general approach to creating a content cloud. In the future, the process can be refined and enhanced by further evaluation of results. Further work is also required to better identify closely related articles. Practical implications - Being able to automatically classify the content generated by web users will enable others to find more appropriate content. Originality/value - The approach is original. Other researchers have produced a cloud simply by using skiplists to filter unwanted words; this paper's approach improves on this by applying appropriate NLP techniques.
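    As a rough, simplified stand-in for such a pipeline (plain stop-word filtering and frequency counting rather than the fuller NLP techniques the paper applies), the sketch below turns an article into the term weights a front end could render as a content cloud:

        # Simplified, hypothetical content-cloud sketch; the paper's approach
        # uses NLP tools to pick content-bearing terms instead of a stop-word list.
        import re
        from collections import Counter

        STOPWORDS = {"the", "a", "an", "of", "and", "to", "in", "is", "are",
                     "for", "on", "with", "this", "that", "by", "it", "as", "be"}

        def content_cloud(text, top_n=10):
            words = re.findall(r"[a-z]+", text.lower())
            counts = Counter(w for w in words if w not in STOPWORDS and len(w) > 2)
            total = sum(counts.values()) or 1
            # term -> relative weight, which a renderer can map to font sizes
            return {term: n / total for term, n in counts.most_common(top_n)}

        article = ("Automatic classification of user generated content helps users "
                   "find relevant content in wikis, blogs and forums.")
        print(content_cloud(article))
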
  13. Adams, K.C.: Word wranglers : Automatic classification tools transform enterprise documents from "bags of words" into knowledge resources (2003) 0.01
    0.005260164 = product of:
      0.031560984 = sum of:
        0.031560984 = weight(_text_:searching in 1665) [ClassicSimilarity], result of:
          0.031560984 = score(doc=1665,freq=2.0), product of:
            0.14122958 = queryWeight, product of:
              4.0452914 = idf(docFreq=2103, maxDocs=44218)
              0.03491209 = queryNorm
            0.22347288 = fieldWeight in 1665, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              4.0452914 = idf(docFreq=2103, maxDocs=44218)
              0.0390625 = fieldNorm(doc=1665)
      0.16666667 = coord(1/6)
    
    Abstract
    Taxonomies are an important part of any knowledge management (KM) system, and automatic classification software is emerging as a "killer app" for consumer and enterprise portals. A number of companies such as Inxight Software, Mohomine, Metacode, and others claim to interpret the semantic content of any textual document and automatically classify text on the fly. The promise that software could automatically produce a Yahoo-style directory is a siren call not many IT managers are able to resist. KM needs have grown more complex due to the increasing amount of digital information, the declining effectiveness of keyword searching, and heterogeneous document formats in corporate databases. This environment requires innovative KM tools, and automatic classification technology is an example of this new kind of software. These products can be divided into three categories according to their underlying technology - rules-based, catalog-by-example, and statistical clustering. Evolving trends in this market include framing classification as a cyborg (computer- and human-based) activity and the increasing use of extensible markup language (XML) and support vector machine (SVM) technology. In this article, we'll survey the rapidly changing automatic classification software market and examine the features and capabilities of leading classification products.
  14. Subramanian, S.; Shafer, K.E.: Clustering (2001) 0.00
    0.0047301063 = product of:
      0.028380638 = sum of:
        0.028380638 = product of:
          0.056761276 = sum of:
            0.056761276 = weight(_text_:22 in 1046) [ClassicSimilarity], result of:
              0.056761276 = score(doc=1046,freq=2.0), product of:
                0.1222562 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.03491209 = queryNorm
                0.46428138 = fieldWeight in 1046, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.09375 = fieldNorm(doc=1046)
          0.5 = coord(1/2)
      0.16666667 = coord(1/6)
    
    Date
    5. 5.2003 14:17:22
  15. Lim, C.S.; Lee, K.J.; Kim, G.C.: Multiple sets of features for automatic genre classification of web documents (2005) 0.00
    0.004715261 = product of:
      0.028291566 = sum of:
        0.028291566 = product of:
          0.056583133 = sum of:
            0.056583133 = weight(_text_:etc in 1048) [ClassicSimilarity], result of:
              0.056583133 = score(doc=1048,freq=2.0), product of:
                0.18910104 = queryWeight, product of:
                  5.4164915 = idf(docFreq=533, maxDocs=44218)
                  0.03491209 = queryNorm
                0.2992217 = fieldWeight in 1048, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  5.4164915 = idf(docFreq=533, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=1048)
          0.5 = coord(1/2)
      0.16666667 = coord(1/6)
    
    Abstract
    With the increase of information on the Web, it is difficult to find the desired information quickly among the documents retrieved by a search engine. One way to solve this problem is to classify web documents according to various criteria. Most document classification has been focused on the subject or topic of a document. A genre or style is another view of a document, different from its subject or topic. The genre is also a criterion for classifying documents. In this paper, we suggest multiple sets of features to classify genres of web documents. The basic set of features, which has been proposed in previous studies, is acquired from the textual properties of documents, such as the number of sentences, the number of occurrences of a certain word, etc. However, web documents are different from textual documents in that they contain URLs and HTML tags within the pages. We introduce new sets of features specific to web documents, which are extracted from URLs and HTML tags. The present work is an attempt to evaluate the performance of the proposed sets of features and to discuss their characteristics. Finally, we conclude which set of features is appropriate for automatic genre classification of web documents.
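    A hypothetical illustration of the kind of URL- and HTML-tag-based features the abstract describes, alongside one textual feature (the paper's actual feature sets are richer and differ in detail):

        # Hypothetical genre-feature extractor; only illustrates the idea of
        # combining URL, HTML-tag and textual features for web documents.
        import re
        from urllib.parse import urlparse

        def genre_features(url, html):
            path = urlparse(url).path.lower()
            tags = re.findall(r"<\s*([a-zA-Z][a-zA-Z0-9]*)", html)   # opening tags only
            text = re.sub(r"<[^>]+>", " ", html)                     # crude tag stripping
            return {
                "url_depth": path.count("/"),
                "url_has_tilde": int("~" in url),                    # often marks personal pages
                "url_extension": path.rsplit(".", 1)[-1] if "." in path else "",
                "n_links": sum(1 for t in tags if t.lower() == "a"),
                "n_images": sum(1 for t in tags if t.lower() == "img"),
                "n_forms": sum(1 for t in tags if t.lower() == "form"),
                "n_sentences": len(re.findall(r"[.!?]+", text)),     # textual feature
            }

        print(genre_features("http://example.org/~user/index.html",
                             "<html><body><a href='x'>home</a> Hello world.</body></html>"))
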
  16. Search Engines and Beyond : Developing efficient knowledge management systems, April 19-20 1999, Boston, Mass (1999) 0.00
    0.004208131 = product of:
      0.025248786 = sum of:
        0.025248786 = weight(_text_:searching in 2596) [ClassicSimilarity], result of:
          0.025248786 = score(doc=2596,freq=2.0), product of:
            0.14122958 = queryWeight, product of:
              4.0452914 = idf(docFreq=2103, maxDocs=44218)
              0.03491209 = queryNorm
            0.1787783 = fieldWeight in 2596, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              4.0452914 = idf(docFreq=2103, maxDocs=44218)
              0.03125 = fieldNorm(doc=2596)
      0.16666667 = coord(1/6)
    
    Content
    Ramana Rao (Inxight, Palo Alto, CA): 7 ± 2 Insights on achieving Effective Information Access
    Session One: Updates and a twelve month perspective
      Danny Sullivan (Search Engine Watch, US / England): Portalization and other search trends
      Carol Tenopir (University of Tennessee): Search realities faced by end users and professional searchers
    Session Two: Today's search engines and beyond
      Daniel Hoogterp (Retrieval Technologies, McLean, VA): Effective presentation and utilization of search techniques
      Rick Kenny (Fulcrum Technologies, Ontario, Canada): Beyond document clustering: The knowledge impact statement
      Gary Stock (Ingenius, Kalamazoo, MI): Automated change monitoring
      Gary Culliss (Direct Hit, Wellesley Hills, MA): User popularity ranked search engines
      Byron Dom (IBM, CA): Automatically finding the best pages on the World Wide Web (CLEVER)
      Peter Tomassi (LookSmart, San Francisco, CA): Adding human intellect to search technology
    Session Three: Panel discussion: Human v automated categorization and editing
      Ev Brenner (New York, NY) - Chairman
      James Callan (University of Massachusetts, MA)
      Marc Krellenstein (Northern Light Technology, Cambridge, MA)
      Dan Miller (Ask Jeeves, Berkeley, CA)
    Session Four: Updates and a twelve month perspective
      Steve Arnold (AIT, Harrods Creek, KY): Review: The leading edge in search and retrieval software
      Ellen Voorhees (NIST, Gaithersburg, MD): TREC update
    Session Five: Search engines now and beyond
      Intelligent Agents - John Snyder (Muscat, Cambridge, England): Practical issues behind intelligent agents
      Text summarization - Therese Firmin (Dept of Defense, Ft George G. Meade, MD): The TIPSTER/SUMMAC evaluation of automatic text summarization systems
      Cross language searching - Elizabeth Liddy (TextWise, Syracuse, NY): A conceptual interlingua approach to cross-language retrieval
      Video search and retrieval - Armon Amir (IBM, Almaden, CA): CueVideo: Modular system for automatic indexing and browsing of video/audio
      Speech recognition - Michael Witbrock (Lycos, Waltham, MA): Retrieval of spoken documents
      Visualization - James A. Wise (Integral Visuals, Richland, WA): Information visualization in the new millennium: Emerging science or passing fashion?
      Text mining - David Evans (Claritech, Pittsburgh, PA): Text mining - towards decision support
  17. Piros, A.: Automatic interpretation of complex UDC numbers : towards support for library systems (2015) 0.00
    0.004208131 = product of:
      0.025248786 = sum of:
        0.025248786 = weight(_text_:searching in 2301) [ClassicSimilarity], result of:
          0.025248786 = score(doc=2301,freq=2.0), product of:
            0.14122958 = queryWeight, product of:
              4.0452914 = idf(docFreq=2103, maxDocs=44218)
              0.03491209 = queryNorm
            0.1787783 = fieldWeight in 2301, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              4.0452914 = idf(docFreq=2103, maxDocs=44218)
              0.03125 = fieldNorm(doc=2301)
      0.16666667 = coord(1/6)
    
    Abstract
    Analytico-synthetic and faceted classifications, such as the Universal Decimal Classification (UDC), express the content of documents with complex, pre-combined classification codes. Without classification authority control that would help manage and access structured notations, the use of UDC codes in searching and browsing is limited. Existing UDC parsing solutions are usually created for a particular database system or a specific task and are not widely applicable. The approach described in this paper provides a solution by which the analysis and interpretation of UDC notations can be stored in an intermediate format (in this case, in XML) by automatic means without any data or information loss. Due to its richness, the output file can be converted into different formats, such as standard mark-up and data exchange formats or simple lists of the recommended entry points of a UDC number. The program can also be used to create authority records containing complex UDC numbers, which can be comprehensively analysed in order to be retrieved effectively. The Java program, as well as the corresponding schema definition it employs, is under continuous development. The current version of the interpreter software is now available online for testing purposes at the following web site: http://interpreter-eto.rhcloud.com. The future plan is to implement conversion methods for standard formats and to create standard online interfaces in order to make it possible to use the features of the software as a service. This would allow the algorithm to be employed in both existing and future library systems to analyse UDC numbers without any significant programming effort.
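    A minimal sketch of the first step such an interpreter has to take (hypothetical code, not the Java program described above): splitting a pre-combined UDC notation at its top-level connector symbols while keeping bracketed groups intact. The real interpreter covers far more of the UDC grammar (auxiliaries, ranges, etc.) and writes its analysis to XML.

        # Hypothetical sketch: split a complex UDC notation at top-level connectors
        # (+ addition, / extension, : relation, :: order-fixing relation),
        # treating a [...] group as a single component.
        def split_udc(notation):
            parts, token, depth, i = [], "", 0, 0
            while i < len(notation):
                ch = notation[i]
                if ch == "[":
                    depth += 1
                elif ch == "]":
                    depth -= 1
                if depth == 0 and notation.startswith("::", i):
                    parts += [token, "::"]
                    token, i = "", i + 2
                    continue
                if depth == 0 and ch in "+/:":
                    parts += [token, ch]
                    token = ""
                else:
                    token += ch
                i += 1
            parts.append(token)
            return [p for p in parts if p]

        print(split_udc("[004.8:159.9]:025.4"))
        # -> ['[004.8:159.9]', ':', '025.4']
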
  18. Reiner, U.: Automatische DDC-Klassifizierung von bibliografischen Titeldatensätzen (2009) 0.00
    0.0039417557 = product of:
      0.023650533 = sum of:
        0.023650533 = product of:
          0.047301065 = sum of:
            0.047301065 = weight(_text_:22 in 611) [ClassicSimilarity], result of:
              0.047301065 = score(doc=611,freq=2.0), product of:
                0.1222562 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.03491209 = queryNorm
                0.38690117 = fieldWeight in 611, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.078125 = fieldNorm(doc=611)
          0.5 = coord(1/2)
      0.16666667 = coord(1/6)
    
    Date
    22. 8.2009 12:54:24
  19. HaCohen-Kerner, Y. et al.: Classification using various machine learning methods and combinations of key-phrases and visual features (2016) 0.00
    0.0039417557 = product of:
      0.023650533 = sum of:
        0.023650533 = product of:
          0.047301065 = sum of:
            0.047301065 = weight(_text_:22 in 2748) [ClassicSimilarity], result of:
              0.047301065 = score(doc=2748,freq=2.0), product of:
                0.1222562 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.03491209 = queryNorm
                0.38690117 = fieldWeight in 2748, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.078125 = fieldNorm(doc=2748)
          0.5 = coord(1/2)
      0.16666667 = coord(1/6)
    
    Date
    1. 2.2016 18:25:22
  20. Dolin, R.; Agrawal, D.; El Abbadi, A.; Pearlman, J.: Using automated classification for summarizing and selecting heterogeneous information sources (1998) 0.00
    0.0028291568 = product of:
      0.01697494 = sum of:
        0.01697494 = product of:
          0.03394988 = sum of:
            0.03394988 = weight(_text_:etc in 1253) [ClassicSimilarity], result of:
              0.03394988 = score(doc=1253,freq=2.0), product of:
                0.18910104 = queryWeight, product of:
                  5.4164915 = idf(docFreq=533, maxDocs=44218)
                  0.03491209 = queryNorm
                0.17953302 = fieldWeight in 1253, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  5.4164915 = idf(docFreq=533, maxDocs=44218)
                  0.0234375 = fieldNorm(doc=1253)
          0.5 = coord(1/2)
      0.16666667 = coord(1/6)
    
    Abstract
    Information retrieval over the Internet increasingly requires the filtering of thousands of heterogeneous information sources. Important sources of information include not only traditional databases with structured data and queries, but also increasing numbers of non-traditional, semi- or unstructured collections such as Web sites, FTP archives, etc. As the number and variability of sources increases, new ways of automatically summarizing, discovering, and selecting collections relevant to a user's query are needed. One such method involves the use of classification schemes, such as the Library of Congress Classification (LCC), within which a collection may be represented based on its content, irrespective of the structure of the actual data or documents. For such a system to be useful in a large-scale distributed environment, it must be easy to use for both collection managers and users. As a result, it must be possible to classify documents automatically within a classification scheme. Furthermore, there must be a straightforward and intuitive interface with which the user may use the scheme to assist in information retrieval (IR).
    Our work with the Alexandria Digital Library (ADL) Project focuses on geo-referenced information, whether text, maps, aerial photographs, or satellite images. As a result, we have emphasized techniques which work with both text and non-text, such as combined textual and graphical queries, multi-dimensional indexing, and IR methods which are not solely dependent on words or phrases. Part of this work involves locating relevant online sources of information. In particular, we have designed and are currently testing aspects of an architecture, Pharos, which we believe will scale up to 1,000,000 heterogeneous sources.
    Pharos accommodates heterogeneity in content and format, both among multiple sources as well as within a single source. That is, we consider sources to include Web sites, FTP archives, newsgroups, and full digital libraries; all of these systems can include a wide variety of content and multimedia data formats.
    Pharos is based on the use of hierarchical classification schemes. These include not only well-known 'subject' (or 'concept') based schemes such as the Dewey Decimal System and the LCC, but also, for example, geographic classifications, which might be constructed as layers of smaller and smaller hierarchical longitude/latitude boxes. Pharos is designed to work with sophisticated queries which utilize subjects, geographical locations, temporal specifications, and other types of information domains.
    The Pharos architecture requires that hierarchically structured collection metadata be extracted so that it can be partitioned in such a way as to greatly enhance scalability. Automated classification is important to Pharos because it allows information sources to automatically extract the requisite collection metadata that must be distributed.
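    A minimal sketch of the geographic classification idea mentioned above, i.e. layers of smaller and smaller longitude/latitude boxes (hypothetical quadtree-style code, not part of Pharos):

        # Hypothetical sketch: classify a coordinate into a hierarchy of
        # nested latitude/longitude boxes; each level halves the box.
        def geo_classes(lat, lon, levels=4):
            lat_lo, lat_hi, lon_lo, lon_hi = -90.0, 90.0, -180.0, 180.0
            path = []
            for _ in range(levels):
                lat_mid = (lat_lo + lat_hi) / 2
                lon_mid = (lon_lo + lon_hi) / 2
                ns = "N" if lat >= lat_mid else "S"
                ew = "E" if lon >= lon_mid else "W"
                path.append(ns + ew)   # quadrant label at this level
                lat_lo, lat_hi = (lat_mid, lat_hi) if lat >= lat_mid else (lat_lo, lat_mid)
                lon_lo, lon_hi = (lon_mid, lon_hi) if lon >= lon_mid else (lon_lo, lon_mid)
            return path  # the root-to-leaf path is the hierarchical geographic class

        print(geo_classes(34.4, -119.7))   # e.g. ['NW', 'SW', 'NE', 'NW']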