Search (13 results, page 1 of 1)

Seki, K.; Mostafa, J.: Gene ontology annotation as text categorization : an empirical study (2008) 0.00
```
0.0039902087 = product of:
  0.02793146 = sum of:
    0.02793146 = weight(_text_:with in 2123) [ClassicSimilarity], result of:
      0.02793146 = score(doc=2123,freq=10.0), product of:
        0.09383348 = queryWeight, product of:
          2.409771 = idf(docFreq=10797, maxDocs=44218)
          0.038938753 = queryNorm
        0.2976705 = fieldWeight in 2123, product of:
          3.1622777 = tf(freq=10.0), with freq of:
            10.0 = termFreq=10.0
          2.409771 = idf(docFreq=10797, maxDocs=44218)
          0.0390625 = fieldNorm(doc=2123)
  0.14285715 = coord(1/7)
```
Abstract

Gene ontology (GO) consists of three structured controlled vocabularies, i.e., GO domains, developed for describing attributes of gene products, and its annotation is crucial to provide a common gateway to access different model organism databases. This paper explores an effective application of text categorization methods to this highly practical problem in biology. As a first step, we attempt to tackle the automatic GO annotation task posed in the Text Retrieval Conference (TREC) 2004 Genomics Track. Given a pair of genes and an article reference where the genes appear, the task simulates assigning GO domain codes. We approach the problem with careful consideration of the specialized terminology and pay special attention to various forms of gene synonyms, so as to exhaustively locate the occurrences of the target gene. We extract the words around the spotted gene occurrences and use them to represent the gene for GO domain code annotation. We regard the task as a text categorization problem and adopt a variant of kNN with supervised term weighting schemes, making our method among the top-performing systems in the TREC official evaluation. Furthermore, we investigate different feature selection policies in conjunction with the treatment of terms associated with negative instances. Our experiments reveal that round-robin feature space allocation combined with the elimination of negative terms substantially improves performance as GO terms become more specific.
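The score explanation shown for this record (doc 2123) can be recomputed by hand. The following is a minimal sketch, assuming the classic Lucene TF-IDF definitions tf = sqrt(freq) and idf = 1 + ln(maxDocs / (docFreq + 1)), which are consistent with the values printed in the explanation. The function name `classic_similarity_score` is hypothetical and not part of any library; `queryNorm`, `fieldNorm`, and `coord` are taken as given from the explain output rather than derived.

```python
import math


def classic_similarity_score(freq, doc_freq, max_docs, query_norm, field_norm, coord):
    """Recompute a single-term score from the values in an explain tree.

    Hypothetical helper for illustration; mirrors the structure
    score = coord * queryWeight * fieldWeight shown in the listing.
    """
    tf = math.sqrt(freq)                              # tf(freq)
    idf = 1.0 + math.log(max_docs / (doc_freq + 1))   # idf(docFreq, maxDocs)
    query_weight = idf * query_norm                   # queryWeight
    field_weight = tf * idf * field_norm              # fieldWeight
    return coord * query_weight * field_weight


# Values copied from the explanation for doc 2123 (term "with").
score = classic_similarity_score(
    freq=10.0,
    doc_freq=10797,
    max_docs=44218,
    query_norm=0.038938753,
    field_norm=0.0390625,
    coord=1 / 7,
)
print(f"{score:.7f}")
```

The result should agree with the listed 0.0039902087 up to floating-point rounding, since the search engine computes in single precision. Because `queryWeight` and `idf` are identical for every record matching the same term, the ranking within this result list is driven only by term frequency and `fieldNorm`.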
Mostafa, J.: Digital image representation and access (1994) 0.00
```
0.0035330812 = product of:
  0.024731567 = sum of:
    0.024731567 = weight(_text_:with in 1102) [ClassicSimilarity], result of:
      0.024731567 = score(doc=1102,freq=4.0), product of:
        0.09383348 = queryWeight, product of:
          2.409771 = idf(docFreq=10797, maxDocs=44218)
          0.038938753 = queryNorm
        0.2635687 = fieldWeight in 1102, product of:
          2.0 = tf(freq=4.0), with freq of:
            4.0 = termFreq=4.0
          2.409771 = idf(docFreq=10797, maxDocs=44218)
          0.0546875 = fieldNorm(doc=1102)
  0.14285715 = coord(1/7)
```
Abstract

State of the art review of techniques used to generate, store and retrieve digital images. Explains basic terms and concepts related to image representation and describes the differences between bilevel, greyscale, and colour images. Introduces additional image related data, specifically colour standards, correction values, resolution parameters and lookup tables. Illustrates the use of data compression techniques and various image data formats that have been used. Identifies 4 branches of imaging research related to data indexing and modelling: verbal indexing; visual surrogates; image indexing; and data structures. Concludes with a discussion of the state of the art in networking technology with consideration of image distribution, local system requirements and data integrity.
Mostafa, J.; Quiroga, L.M.; Palakal, M.: Filtering medical documents using automated and human classification methods (1998) 0.00
```
0.0030283553 = product of:
  0.021198487 = sum of:
    0.021198487 = weight(_text_:with in 2326) [ClassicSimilarity], result of:
      0.021198487 = score(doc=2326,freq=4.0), product of:
        0.09383348 = queryWeight, product of:
          2.409771 = idf(docFreq=10797, maxDocs=44218)
          0.038938753 = queryNorm
        0.22591603 = fieldWeight in 2326, product of:
          2.0 = tf(freq=4.0), with freq of:
            4.0 = termFreq=4.0
          2.409771 = idf(docFreq=10797, maxDocs=44218)
          0.046875 = fieldNorm(doc=2326)
  0.14285715 = coord(1/7)
```
Abstract

The goal of this research is to clarify the role of document classification in information filtering. An important function of classification, in managing computational complexity, is described and illustrated in the context of an existing filtering system. A parameter called classification homogeneity is presented for analyzing unsupervised automated classification by employing human classification as a control. Two significant components of the automated classification approach, vocabulary discovery and classification scheme generation, are described in detail. Results of classification performance revealed considerable variability in the homogeneity of automatically produced classes. Based on the classification performance, different types of interest profiles were created. Subsequently, these profiles were used to perform filtering sessions. The filtering results showed that with increasing homogeneity, filtering performance improves, and, conversely, with decreasing homogeneity, filtering performance degrades.
Mukhopadhyay, S.; Peng, S.; Raje, R.; Palakal, M.; Mostafa, J.: Multi-agent information classification using dynamic acquaintance lists (2003) 0.00
```
0.0030283553 = product of:
  0.021198487 = sum of:
    0.021198487 = weight(_text_:with in 1755) [ClassicSimilarity], result of:
      0.021198487 = score(doc=1755,freq=4.0), product of:
        0.09383348 = queryWeight, product of:
          2.409771 = idf(docFreq=10797, maxDocs=44218)
          0.038938753 = queryNorm
        0.22591603 = fieldWeight in 1755, product of:
          2.0 = tf(freq=4.0), with freq of:
            4.0 = termFreq=4.0
          2.409771 = idf(docFreq=10797, maxDocs=44218)
          0.046875 = fieldNorm(doc=1755)
  0.14285715 = coord(1/7)
```
Abstract

There has been considerable interest in recent years in providing automated information services, such as information classification, by means of a society of collaborative agents. These agents augment each other's knowledge structures (e.g., the vocabularies) and assist each other in providing efficient information services to a human user. However, when the number of agents present in the society increases, exhaustive communication and collaboration among agents result in a large communication overhead and increased delays in response time. This paper introduces a method to achieve selective interaction with a relatively small number of potentially useful agents, based on simple agent modeling and acquaintance lists. The key idea presented here is that the acquaintance list of an agent, representing a small number of other agents to be collaborated with, is dynamically adjusted. The best acquaintances are automatically discovered using a learning algorithm, based on the past history of collaboration. Experimental results are presented to demonstrate that such dynamically learned acquaintance lists can lead to high quality of classification, while significantly reducing the delay in response time.
Lam, W.; Mostafa, J.: Modeling user interest shift using a Bayesian approach (2001) 0.00
```
0.0024982654 = product of:
  0.017487857 = sum of:
    0.017487857 = weight(_text_:with in 2658) [ClassicSimilarity], result of:
      0.017487857 = score(doc=2658,freq=2.0), product of:
        0.09383348 = queryWeight, product of:
          2.409771 = idf(docFreq=10797, maxDocs=44218)
          0.038938753 = queryNorm
        0.1863712 = fieldWeight in 2658, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          2.409771 = idf(docFreq=10797, maxDocs=44218)
          0.0546875 = fieldNorm(doc=2658)
  0.14285715 = coord(1/7)
```
Abstract

We investigate the modeling of changes in user interest in information filtering systems. A new technique for tracking user interest shifts based on a Bayesian approach is developed. The interest tracker is integrated into a profile learning module of a filtering system. We present an analytical study to establish the rate of convergence for the profile learning with and without the user interest tracking component. We examine the relationship among degree of shift, cost of detection error, and time needed for detection. To study the effect of different patterns of interest shift on system performance we also conducted several filtering experiments. Generally, the findings show that the Bayesian approach is a feasible and effective technique for modeling user interest shift.
Zhang, J.; Mostafa, J.; Tripathy, H.: Information retrieval by semantic analysis and visualization of the concept space of D-Lib® magazine (2002) 0.00
```
0.0021855272 = product of:
  0.01529869 = sum of:
    0.01529869 = weight(_text_:with in 1211) [ClassicSimilarity], result of:
      0.01529869 = score(doc=1211,freq=12.0), product of:
        0.09383348 = queryWeight, product of:
          2.409771 = idf(docFreq=10797, maxDocs=44218)
          0.038938753 = queryNorm
        0.16304085 = fieldWeight in 1211, product of:
          3.4641016 = tf(freq=12.0), with freq of:
            12.0 = termFreq=12.0
          2.409771 = idf(docFreq=10797, maxDocs=44218)
          0.01953125 = fieldNorm(doc=1211)
  0.14285715 = coord(1/7)
```
Abstract

In this article we present a method for retrieving documents from a digital library through a visual interface based on automatically generated concepts. We used a vocabulary generation algorithm to generate a set of concepts for the digital library and a technique called the max-min distance technique to cluster them. Additionally, the concepts were visualized in a spring embedding graph layout to depict the semantic relationship among them. The resulting graph layout serves as an aid to users for retrieving documents. An online archive containing the contents of D-Lib Magazine from July 1995 to May 2002 was used to test the utility of an implemented retrieval and visualization system. We believe that the method developed and tested can be applied to many different domains to help users get a better understanding of online document collections and to minimize users' cognitive load during execution of search tasks. Over the past few years, the volume of information available through the World Wide Web has been expanding exponentially. Never has so much information been so readily available and shared among so many people. Unfortunately, the unstructured nature and huge volume of information accessible over networks have made it hard for users to sift through and find relevant information. To deal with this problem, information retrieval (IR) techniques have gained more intensive attention from both industrial and academic researchers. Numerous IR techniques have been developed to help deal with the information overload problem. These techniques concentrate on mathematical models and algorithms for retrieval. Popular IR models such as the Boolean model, the vector-space model, the probabilistic model and their variants are well established.
From the user's perspective, however, it is still difficult to use current information retrieval systems. Users frequently have problems expressing their information needs and translating those needs into queries. This is partly due to the fact that information needs cannot be expressed appropriately in system terms. It is not unusual for users to input search terms that are different from the index terms information systems use. Various methods have been proposed to help users choose search terms and articulate queries. One widely used approach is to incorporate into the information system a thesaurus-like component that represents both the important concepts in a particular subject area and the semantic relationships among those concepts. Unfortunately, the development and use of thesauri is not without its own problems. The thesaurus employed in a specific information system has often been developed for a general subject area and needs significant enhancement to be tailored to the information system where it is to be used. This thesaurus development process, if done manually, is both time consuming and labor intensive. Usage of a thesaurus in searching is complex and may raise barriers for the user. For illustration purposes, let us consider two scenarios of thesaurus usage. In the first scenario the user inputs a search term and the thesaurus then displays a matching set of related terms. Without an overview of the thesaurus - and without the ability to see the matching terms in the context of other terms - it may be difficult to assess the quality of the related terms in order to select the correct term. In the second scenario the user browses the whole thesaurus, which is organized as an alphabetically ordered list. The problem with this approach is that the list may be long, and it does not show users the global semantic relationships among all the listed terms.
Nevertheless, because thesaurus use has been shown to improve retrieval, for our method we integrate functions in the search interface that permit users to explore built-in search vocabularies to improve retrieval from digital libraries. Our method automatically generates the terms and their semantic relationships representing relevant topics covered in a digital library. We call these generated terms the "concepts", and the generated terms and their semantic relationships we call the "concept space". Additionally, we used a visualization technique to display the concept space and allow users to interact with this space. The automatically generated term set is considered to be more representative of the subject area in a corpus than an "externally" imposed thesaurus, and our method has the potential of saving a significant amount of time and labor for those who have been manually creating thesauri as well. Information visualization is an emerging discipline and has developed very quickly in the last decade. With growing volumes of documents and associated complexities, information visualization has become increasingly important. Researchers have found information visualization to be an effective way to use and understand information while minimizing a user's cognitive load. Our work was based on an algorithmic approach of concept discovery and association. Concepts are discovered using an algorithm based on an automated thesaurus generation procedure. Subsequently, similarities among terms are computed using the cosine measure, and the associations among terms are established using a method known as max-min distance clustering. The concept space is then visualized in a spring embedding graph, which roughly shows the semantic relationships among concepts in a 2-D visual representation. The semantic space of the visualization is used as a medium for users to retrieve the desired documents.
In the remainder of this article, we present our algorithmic approach of concept generation and clustering, followed by description of the visualization technique and interactive interface. The paper ends with key conclusions and discussions on future work.
Mostafa, J.; Dillon, A.: Design and evaluation of a user interface supporting multiple image query models (1996) 0.00
```
0.0021413704 = product of:
  0.014989593 = sum of:
    0.014989593 = weight(_text_:with in 7432) [ClassicSimilarity], result of:
      0.014989593 = score(doc=7432,freq=2.0), product of:
        0.09383348 = queryWeight, product of:
          2.409771 = idf(docFreq=10797, maxDocs=44218)
          0.038938753 = queryNorm
        0.15974675 = fieldWeight in 7432, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          2.409771 = idf(docFreq=10797, maxDocs=44218)
          0.046875 = fieldNorm(doc=7432)
  0.14285715 = coord(1/7)
```
Abstract

For effective access to images, the design of the database interface must be based on principles that match the actual querying needs of users. Analysis of this design problem reveals that the query language must support utilization of both visual and verbal clues. The ViewFinder interface, designed as a client to a database server, supports querying based on both types of clues. Presents details of ViewFinder design. Describes results of usability analysis performed on ViewFinder with a group of 18 users. High search success rates were achieved (greater than 80%) through both types of querying means (visual and verbal). Users generally used more verbal clues than visual clues in searches.
Mostafa, J.: Document search interface design : background and introduction to special topic section (2004) 0.00
```
0.0021413704 = product of:
  0.014989593 = sum of:
    0.014989593 = weight(_text_:with in 2503) [ClassicSimilarity], result of:
      0.014989593 = score(doc=2503,freq=2.0), product of:
        0.09383348 = queryWeight, product of:
          2.409771 = idf(docFreq=10797, maxDocs=44218)
          0.038938753 = queryNorm
        0.15974675 = fieldWeight in 2503, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          2.409771 = idf(docFreq=10797, maxDocs=44218)
          0.046875 = fieldNorm(doc=2503)
  0.14285715 = coord(1/7)
```
Abstract

A library user searching for high-quality and authoritative information today is confronted with thousands of resources that cover a wide variety of topics. The heterogeneity factor alone can be a major obstacle for the user to select appropriate resources to search. Depending on the information need, the user may have to navigate among resources that are in different formats (bibliographic versus full-text), are stored in different media (text versus images), have different levels of coverage (news versus scholarly reports), or are published in different languages. Beyond the heterogeneity factor, the user faces specific challenges related to the search experience itself. These factors and their impact on searching can be best described using a four-phase framework, namely: formulation, action, presentation, and refinement (Shneiderman, Byrd, & Croft, 1998). Certain key functions for document search interfaces are described below in the context of these four phases. Following the description, highlights from the contributed papers are discussed.
Mukhopadhyay, S.; Peng, S.; Raje, R.; Mostafa, J.; Palakal, M.: Distributed multi-agent information filtering : a comparative study (2005) 0.00
```
0.0021413704 = product of:
  0.014989593 = sum of:
    0.014989593 = weight(_text_:with in 3559) [ClassicSimilarity], result of:
      0.014989593 = score(doc=3559,freq=2.0), product of:
        0.09383348 = queryWeight, product of:
          2.409771 = idf(docFreq=10797, maxDocs=44218)
          0.038938753 = queryNorm
        0.15974675 = fieldWeight in 3559, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          2.409771 = idf(docFreq=10797, maxDocs=44218)
          0.046875 = fieldNorm(doc=3559)
  0.14285715 = coord(1/7)
```
Abstract

Information filtering is a technique to identify, in large collections, information that is relevant according to some criteria (e.g., a user's personal interests, or a research project objective). As such, it is a key technology for providing efficient user services in any large-scale information infrastructure, e.g., digital libraries. To provide large-scale information filtering services, both computational and knowledge management issues need to be addressed. A centralized (single-agent) approach to information filtering suffers from serious drawbacks in terms of speed, accuracy, and economic considerations, and becomes unrealistic even for medium-scale applications. In this article, we discuss two distributed (multiagent) information filtering approaches, which are distributed with respect to knowledge or functionality, to overcome the limitations of single-agent centralized information filtering. Large-scale experimental studies involving the well-known TREC data set are also presented to illustrate the advantages of distributed filtering as well as to compare the different distributed approaches.
Mongin, L.; Fu, Y.Y.; Mostafa, J.: Open Archives data Service prototype and automated subject indexing using D-Lib archive content as a testbed (2003) 0.00
```
0.0021413704 = product of:
  0.014989593 = sum of:
    0.014989593 = weight(_text_:with in 1167) [ClassicSimilarity], result of:
      0.014989593 = score(doc=1167,freq=2.0), product of:
        0.09383348 = queryWeight, product of:
          2.409771 = idf(docFreq=10797, maxDocs=44218)
          0.038938753 = queryNorm
        0.15974675 = fieldWeight in 1167, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          2.409771 = idf(docFreq=10797, maxDocs=44218)
          0.046875 = fieldNorm(doc=1167)
  0.14285715 = coord(1/7)
```
Abstract

The Indiana University School of Library and Information Science opened a new research laboratory in January 2003: the Indiana University School of Library and Information Science Information Processing Laboratory [IU IP Lab]. The purpose of the new laboratory is to facilitate collaboration between scientists in the department in the areas of information retrieval (IR) and information visualization (IV) research. The lab has several areas of focus. These include grid and cluster computing, and a standard Java-based software platform to support plug and play research datasets, a selection of standard IR modules and standard IV algorithms. Future development includes software to enable researchers to contribute datasets, IR algorithms, and visualization algorithms into the standard environment. We decided early on to use OAI-PMH as a resource discovery tool because it is consistent with our mission.
Quiroga, L.M.; Mostafa, J.: An experiment in building profiles in information filtering : the role of context of user relevance feedback (2002) 0.00
```
0.0017844755 = product of:
  0.012491328 = sum of:
    0.012491328 = weight(_text_:with in 2579) [ClassicSimilarity], result of:
      0.012491328 = score(doc=2579,freq=2.0), product of:
        0.09383348 = queryWeight, product of:
          2.409771 = idf(docFreq=10797, maxDocs=44218)
          0.038938753 = queryNorm
        0.1331223 = fieldWeight in 2579, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          2.409771 = idf(docFreq=10797, maxDocs=44218)
          0.0390625 = fieldNorm(doc=2579)
  0.14285715 = coord(1/7)
```
Abstract

An experiment was conducted to see how relevance feedback could be used to build and adjust profiles to improve the performance of filtering systems. Data was collected during the system interaction of 18 graduate students with SIFTER (Smart Information Filtering Technology for Electronic Resources), a filtering system that ranks incoming information based on users' profiles. The data set came from a collection of 6000 records concerning consumer health. In the first phase of the study, three different modes of profile acquisition were compared. The explicit mode allowed users to directly specify the profile; the implicit mode utilized relevance feedback to create and refine the profile; and the combined mode allowed users to initialize the profile and to continuously refine it using relevance feedback. Filtering performance, measured in terms of Normalized Precision, showed that the three approaches were significantly different (α = 0.05 and p = 0.012). The explicit mode of profile acquisition consistently produced superior results. Exclusive reliance on relevance feedback in the implicit mode resulted in inferior performance. The low performance obtained by the implicit acquisition mode motivated the second phase of the study, which aimed to clarify the role of context in relevance feedback judgments. An inductive content analysis of thinking aloud protocols showed dimensions that were highly situational, establishing the importance context plays in feedback relevance assessments. Results suggest the need for better representation of documents, profiles, and relevance feedback mechanisms that incorporate dimensions identified in this research.
Zhang, Y.; Wu, D.; Hagen, L.; Song, I.-Y.; Mostafa, J.; Oh, S.; Anderson, T.; Shah, C.; Bishop, B.W.; Hopfgartner, F.; Eckert, K.; Federer, L.; Saltz, J.S.: Data science curriculum in the iField (2023) 0.00
```
0.0017844755 = product of:
  0.012491328 = sum of:
    0.012491328 = weight(_text_:with in 964) [ClassicSimilarity], result of:
      0.012491328 = score(doc=964,freq=2.0), product of:
        0.09383348 = queryWeight, product of:
          2.409771 = idf(docFreq=10797, maxDocs=44218)
          0.038938753 = queryNorm
        0.1331223 = fieldWeight in 964, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          2.409771 = idf(docFreq=10797, maxDocs=44218)
          0.0390625 = fieldNorm(doc=964)
  0.14285715 = coord(1/7)
```
Abstract

Many disciplines, including the broad Field of Information (iField), offer Data Science (DS) programs. There have been significant efforts exploring an individual discipline's identity and unique contributions to the broader DS education landscape. To advance DS education in the iField, the iSchool Data Science Curriculum Committee (iDSCC) was formed and charged with building and recommending a DS education framework for iSchools. This paper reports on the research process and findings of a series of studies to address important questions: What is the iField identity in the multidisciplinary DS education landscape? What is the status of DS education in iField schools? What knowledge and skills should be included in the core curriculum for iField DS education? What are the jobs available for DS graduates from the iField? What are the differences between graduate-level and undergraduate-level DS education? Answers to these questions will not only distinguish an iField approach to DS education but also define critical components of DS curriculum. The results will inform individual DS programs in the iField to develop curriculum to support undergraduate and graduate DS education in their local context.

Mostafa, J.: Bessere Suchmaschinen für das Web (2006) 0.00

```
7.536662E-4 = product of:
  0.0052756635 = sum of:
    0.0052756635 = product of:
      0.010551327 = sum of:
        0.010551327 = weight(_text_:22 in 4871) [ClassicSimilarity], result of:
          0.010551327 = score(doc=4871,freq=2.0), product of:
            0.13635688 = queryWeight, product of:
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.038938753 = queryNorm
            0.07738023 = fieldWeight in 4871, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.015625 = fieldNorm(doc=4871)
      0.5 = coord(1/2)
  0.14285715 = coord(1/7)
```

Date: 22. 1.2006 18:34:49
