Search (17 results, page 1 of 1)

Zhang, J.; An, L.; Tang, T.; Hong, Y.: Visual health subject directory analysis based on users' traversal activities (2009) 0.01
```
0.0067155454 = product of:
  0.047008816 = sum of:
    0.047008816 = weight(_text_:based in 3112) [ClassicSimilarity], result of:
      0.047008816 = score(doc=3112,freq=8.0), product of:
        0.11767787 = queryWeight, product of:
          3.0129938 = idf(docFreq=5906, maxDocs=44218)
          0.03905679 = queryNorm
        0.39947033 = fieldWeight in 3112, product of:
          2.828427 = tf(freq=8.0), with freq of:
            8.0 = termFreq=8.0
          3.0129938 = idf(docFreq=5906, maxDocs=44218)
          0.046875 = fieldNorm(doc=3112)
  0.14285715 = coord(1/7)
```
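The score breakdown above is Lucene `explain` output for `ClassicSimilarity`, Lucene's TF-IDF model. Below is a minimal Python sketch that recomputes the tree for doc 3112 from its leaf values. It assumes the standard ClassicSimilarity definitions: tf = sqrt(freq) and idf = 1 + ln(maxDocs / (docFreq + 1)). The function names are illustrative only. `queryNorm` and `fieldNorm` are copied from the output rather than derived.

```python
import math


def classic_tf(freq: float) -> float:
    """ClassicSimilarity tf: square root of the raw term frequency."""
    return math.sqrt(freq)


def classic_idf(doc_freq: int, max_docs: int) -> float:
    """ClassicSimilarity idf: 1 + ln(maxDocs / (docFreq + 1))."""
    return 1.0 + math.log(max_docs / (doc_freq + 1))


def explain_single_term(freq, doc_freq, max_docs, query_norm, field_norm,
                        matched_clauses, total_clauses):
    """Recompute a single-term explain tree like the one above.

    query_norm and field_norm are copied from the explain output: queryNorm
    depends on all clauses of the query, and fieldNorm is a lossily encoded
    index-time length norm, so neither is derived here.
    """
    idf = classic_idf(doc_freq, max_docs)
    query_weight = idf * query_norm                      # queryWeight
    field_weight = classic_tf(freq) * idf * field_norm   # fieldWeight
    weight = query_weight * field_weight                 # weight(_text_:based ...)
    coord = matched_clauses / total_clauses              # coord(1/7)
    return {"idf": idf, "queryWeight": query_weight,
            "fieldWeight": field_weight, "weight": weight,
            "score": weight * coord}


# Inputs copied from the explain output for doc 3112.
result = explain_single_term(freq=8.0, doc_freq=5906, max_docs=44218,
                             query_norm=0.03905679, field_norm=0.046875,
                             matched_clauses=1, total_clauses=7)
for name, value in result.items():
    print(f"{name:<12}{value:.7f}")
```

Lucene computes these values in single precision, so the recomputed values should agree with the printed ones only to about six or seven significant digits. The same function reproduces the other single-term entries when you change `freq` and `field_norm`. For example, doc 2376 uses `freq=10.0` and `field_norm=0.0390625`.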
Abstract

Concerns about health issues cover a wide spectrum. Consumer health information, which has become more available on the Internet, plays an extremely important role in addressing these concerns. A subject directory as an information organization and browsing mechanism is widely used in consumer health-related Websites. In this study we employed the information visualization technique Self-Organizing Map (SOM) in combination with a new U-matrix algorithm to analyze health subject clusters through a Web transaction log. An experimental study was conducted to test the proposed methods. The findings show that the clusters identified from the same cells based on path-length-1 outperformed both the clusters from the adjacent cells based on path-length-1 and the clusters from the same cells based on path-length-2 in the visual SOM display. The U-matrix method successfully distinguished the irrelevant subjects situated in the adjacent cells with different colors in the SOM display. The findings of this study lead to a better understanding of the health-related subject relationship from the users' traversal perspective.
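The abstract does not describe its new U-matrix algorithm. The sketch below therefore shows only the conventional U-matrix that it refines, computed from an already trained map. Its assumptions are a rectangular grid, 4-connected neighbours, and Euclidean distance. It is not the authors' method, and the toy map values are hypothetical.

```python
import math


def u_matrix(weights):
    """Conventional U-matrix of a trained rectangular SOM.

    weights[r][c] is the weight (codebook) vector of map cell (r, c).
    Each output cell holds the mean Euclidean distance between that cell's
    vector and the vectors of its 4-connected neighbours: high values mark
    cluster borders, low values mark cells inside a cluster.
    """
    rows, cols = len(weights), len(weights[0])
    umat = [[0.0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            dists = [
                math.dist(weights[r][c], weights[nr][nc])
                for nr, nc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1))
                if 0 <= nr < rows and 0 <= nc < cols
            ]
            umat[r][c] = sum(dists) / len(dists) if dists else 0.0
    return umat


# Toy 1 x 3 map (hypothetical values): cells 0 and 1 are similar, cell 2 is not.
toy_map = [[(0.0, 0.0), (0.1, 0.0), (1.0, 0.0)]]
print(u_matrix(toy_map))
```

Cells 1 and 2 receive higher values than cell 0 because a large jump in weight vectors separates them. A colour scale over these values is what lets a U-matrix display set apart dissimilar subjects in adjacent cells.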
Zhang, J.; Wolfram, D.; Wang, P.; Hong, Y.; Gillis, R.: Visualization of health-subject analysis based on query term co-occurrences (2008) 0.01
```
0.00625684 = product of:
  0.04379788 = sum of:
    0.04379788 = weight(_text_:based in 2376) [ClassicSimilarity], result of:
      0.04379788 = score(doc=2376,freq=10.0), product of:
        0.11767787 = queryWeight, product of:
          3.0129938 = idf(docFreq=5906, maxDocs=44218)
          0.03905679 = queryNorm
        0.37218451 = fieldWeight in 2376, product of:
          3.1622777 = tf(freq=10.0), with freq of:
            10.0 = termFreq=10.0
          3.0129938 = idf(docFreq=5906, maxDocs=44218)
          0.0390625 = fieldNorm(doc=2376)
  0.14285715 = coord(1/7)
```
Abstract

A multidimensional-scaling approach is used to analyze frequently used medical-topic terms in queries submitted to a Web-based consumer health information system. Based on a year-long transaction log file, five medical focus keywords (stomach, hip, stroke, depression, and cholesterol) and their co-occurring query terms are analyzed. An overlap-coefficient similarity measure and a conversion measure are used to calculate the proximity of terms to one another based on their co-occurrences in queries. The impact of the dimensionality of the visual configuration, the cutoff point of term co-occurrence for inclusion in the analysis, and the Minkowski metric power k on the stress value are discussed. A visual clustering of groups of terms based on the proximity within each focus-keyword group is also conducted. Term distributions within each visual configuration are characterized and are compared with formal medical vocabulary. This investigation reveals that there are significant differences between consumer health query-term usage and more formal medical terminology used by medical professionals when describing the same medical subject. Future directions are discussed.
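The overlap coefficient and the Minkowski metric named in the abstract are standard measures. The sketch below computes term-term overlap from co-occurrence in queries, plus the Minkowski distance with power k used in an MDS configuration. It assumes each query is a set of terms, and the log and terms are hypothetical. The abstract does not define the paper's conversion measure, so 1 - similarity stands in for it here. The MDS step itself is omitted.

```python
from collections import defaultdict


def term_postings(query_log):
    """Map each term to the set of query IDs whose query contains it."""
    postings = defaultdict(set)
    for query_id, terms in query_log.items():
        for term in terms:
            postings[term].add(query_id)
    return postings


def overlap_coefficient(a: set, b: set) -> float:
    """Overlap coefficient |A & B| / min(|A|, |B|) of two query-ID sets."""
    if not a or not b:
        return 0.0
    return len(a & b) / min(len(a), len(b))


def minkowski(p, q, k: float) -> float:
    """Minkowski distance with power k (k=1 city-block, k=2 Euclidean)."""
    return sum(abs(x - y) ** k for x, y in zip(p, q)) ** (1.0 / k)


# Hypothetical four-query log; real input would be the year-long transaction log.
log = {
    1: {"stroke", "symptoms"},
    2: {"stroke", "recovery"},
    3: {"stroke", "symptoms", "women"},
    4: {"symptoms", "depression"},
}
postings = term_postings(log)
sim = overlap_coefficient(postings["stroke"], postings["symptoms"])
dissim = 1.0 - sim  # assumed stand-in for the paper's similarity-to-distance conversion
print(round(sim, 3), round(dissim, 3))
print(minkowski((0, 0), (3, 4), 1), minkowski((0, 0), (3, 4), 2))
```

The last line prints 7.0 and 5.0. Changing k changes how the same coordinate differences are combined, which is why the choice of k can affect the stress value of an MDS configuration.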
Wolfram, D.; Zhang, J.: An investigation of the influence of indexing exhaustivity and term distributions on a document space (2002) 0.01
```
0.005596288 = product of:
  0.039174013 = sum of:
    0.039174013 = weight(_text_:based in 5238) [ClassicSimilarity], result of:
      0.039174013 = score(doc=5238,freq=8.0), product of:
        0.11767787 = queryWeight, product of:
          3.0129938 = idf(docFreq=5906, maxDocs=44218)
          0.03905679 = queryNorm
        0.33289194 = fieldWeight in 5238, product of:
          2.828427 = tf(freq=8.0), with freq of:
            8.0 = termFreq=8.0
          3.0129938 = idf(docFreq=5906, maxDocs=44218)
          0.0390625 = fieldNorm(doc=5238)
  0.14285715 = coord(1/7)
```
Abstract

Wolfram and Zhang are interested in the effect of different indexing exhaustivity (by which they mean the number of terms chosen), different index term distributions, and different term weighting methods on the resulting document cluster organization. The Distance Angle Retrieval Environment (DARE), which provides a two-dimensional display of retrieved documents, was used to represent the document clusters based upon a document's distance from the searcher's main interest and on the angle formed by the document, a point representing a minor interest, and the point representing the main interest. If the centroid and the origin of the document space are assigned as the major and minor points, the average distance between documents and the centroid can be measured, providing an indication of cluster organization in the form of a size-normalized similarity measure. Using 500 records from NTIS and nine models created by intersecting low, observed, and high exhaustivity levels (based upon a negative binomial distribution) with shallow, observed, and steep term distributions (based upon a Zipf distribution), simulation runs were performed using inverse document frequency, inter-document term frequency, and inverse document frequency based upon both inter- and intra-document frequencies. Low exhaustivity and shallow distributions result in a denser document space and less effective retrieval. High exhaustivity and steeper distributions result in a more diffuse space.
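The density indicator described here can be sketched as the mean Euclidean distance between document vectors and their centroid. The sketch assumes plain term-weight vectors and Euclidean distance. It applies no size normalization, because the abstract mentions a size-normalized measure without defining it. The two toy spaces are hypothetical, not NTIS data.

```python
import math


def centroid(docs):
    """Component-wise mean of the document vectors."""
    return [sum(column) / len(docs) for column in zip(*docs)]


def mean_distance_to_centroid(docs):
    """Average Euclidean distance from each document vector to the centroid.

    Lower values indicate a denser document space, higher values a more
    diffuse one.
    """
    c = centroid(docs)
    return sum(math.dist(d, c) for d in docs) / len(docs)


# Two hypothetical 2-term document spaces.
dense = [(1, 1), (1, 2), (2, 1), (2, 2)]
diffuse = [(0, 0), (0, 3), (3, 0), (3, 3)]
print(mean_distance_to_centroid(dense), mean_distance_to_centroid(diffuse))
```

Both spaces share the same centroid, (1.5, 1.5), but the second spreads its documents three times as far from it. That is the kind of dense-versus-diffuse contrast the abstract reports across exhaustivity levels and term distributions.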
Zhang, J.; Zhai, S.; Liu, H.; Stevenson, J.A.: Social network analysis on a topic-based navigation guidance system in a public health portal (2016) 0.00
```
0.0048465277 = product of:
  0.033925693 = sum of:
    0.033925693 = weight(_text_:based in 2887) [ClassicSimilarity], result of:
      0.033925693 = score(doc=2887,freq=6.0), product of:
        0.11767787 = queryWeight, product of:
          3.0129938 = idf(docFreq=5906, maxDocs=44218)
          0.03905679 = queryNorm
        0.28829288 = fieldWeight in 2887, product of:
          2.4494898 = tf(freq=6.0), with freq of:
            6.0 = termFreq=6.0
          3.0129938 = idf(docFreq=5906, maxDocs=44218)
          0.0390625 = fieldNorm(doc=2887)
  0.14285715 = coord(1/7)
```
Abstract

We investigated a topic-based navigation guidance system in the World Health Organization portal, compared the link connection network and the semantic connection network derived from the guidance system, analyzed the characteristics of the two networks from the perspective of node centrality (in_closeness, out_closeness, betweenness, in_degree, and out_degree), and provided suggestions to optimize and enhance the topic-based navigation guidance system. A mixed research method that combines the social network analysis method, clustering analysis method, and inferential analysis methods was used. The clustering analysis results of the link connection network were quite different from those of the semantic connection network. There were significant differences between the link connection network and the semantic connection network in terms of density and centrality. Inferential analysis results show that there were no strong correlations between the centrality of a node and its topic information characteristics. Suggestions for enhancing the navigation guidance system are discussed in detail. Future research directions, such as applying the same research method presented in this study to other similar public health portals, are also included.
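The degree and closeness centralities listed in this abstract can be sketched for a directed link network. The sketch uses an unweighted adjacency-set representation and hypothetical topic nodes. It defines closeness as the number of reachable nodes divided by the total hop distance to them. This is only one of several conventions, and tools such as UCINET or networkx may normalize differently. Betweenness is omitted for brevity.

```python
from collections import deque


def bfs_distances(adj, source):
    """Hop counts from source to every node reachable along edge direction."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj.get(u, ()):
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def reverse_graph(adj):
    """Flip every directed edge u -> v into v -> u."""
    rev = {}
    for u, targets in adj.items():
        rev.setdefault(u, set())
        for v in targets:
            rev.setdefault(v, set()).add(u)
    return rev


def closeness(adj, node):
    """Reachable nodes divided by the summed distance to them (0.0 if none)."""
    dists = [d for n, d in bfs_distances(adj, node).items() if n != node]
    return len(dists) / sum(dists) if dists else 0.0


def centralities(adj):
    """In/out degree and in/out closeness for every node of a directed graph."""
    rev = reverse_graph(adj)
    return {
        n: {
            "out_degree": len(adj.get(n, ())),
            "in_degree": len(rev[n]),
            "out_closeness": closeness(adj, n),
            "in_closeness": closeness(rev, n),
        }
        for n in rev  # rev contains every node, including pure link targets
    }


# Hypothetical three-topic link network: A -> B, A -> C, B -> C, C -> A.
links = {"A": {"B", "C"}, "B": {"C"}, "C": {"A"}}
for node, scores in sorted(centralities(links).items()):
    print(node, scores)
```

In-closeness is computed on the reversed graph, so it measures how easily a node is reached rather than how easily it reaches others. A semantic connection network would be passed in the same way, with semantic links in place of hyperlinks.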
Li, D.; Luo, Z.; Ding, Y.; Tang, J.; Sun, G.G.-Z.; Dai, X.; Du, J.; Zhang, J.; Kong, S.: User-level microblogging recommendation incorporating social influence (2017) 0.00
```
0.0048465277 = product of:
  0.033925693 = sum of:
    0.033925693 = weight(_text_:based in 3426) [ClassicSimilarity], result of:
      0.033925693 = score(doc=3426,freq=6.0), product of:
        0.11767787 = queryWeight, product of:
          3.0129938 = idf(docFreq=5906, maxDocs=44218)
          0.03905679 = queryNorm
        0.28829288 = fieldWeight in 3426, product of:
          2.4494898 = tf(freq=6.0), with freq of:
            6.0 = termFreq=6.0
          3.0129938 = idf(docFreq=5906, maxDocs=44218)
          0.0390625 = fieldNorm(doc=3426)
  0.14285715 = coord(1/7)
```
Abstract

With the information overload of user-generated content in microblogging, users find it extremely challenging to browse and find valuable information in their first attempt. In this paper we propose a microblogging recommendation algorithm, TSI-MR (Topic-Level Social Influence-based Microblogging Recommendation), which can significantly improve users' microblogging experiences. The main innovation of this proposed algorithm is that we consider social influences and their indirect structural relationships, which are largely based on social status theory, from the topic level. The primary advantage of this approach is that it can build an accurate description of latent relationships between two users with weak connections, which can improve the performance of the model; furthermore, it can solve sparsity problems of training data to a certain extent. The realization of the model is mainly based on Factor Graph. We also applied a distributed strategy to further improve the efficiency of the model. Finally, we use data from Tencent Weibo, one of the most popular microblogging services in China, to evaluate our methods. The results show that incorporating social influence can improve microblogging performance considerably, and outperform the baseline methods.
Zhang, J.; Nguyen, T.: WebStar: a visualization model for hyperlink structures (2005) 0.00
```
0.0047486075 = product of:
  0.03324025 = sum of:
    0.03324025 = weight(_text_:based in 1056) [ClassicSimilarity], result of:
      0.03324025 = score(doc=1056,freq=4.0), product of:
        0.11767787 = queryWeight, product of:
          3.0129938 = idf(docFreq=5906, maxDocs=44218)
          0.03905679 = queryNorm
        0.28246817 = fieldWeight in 1056, product of:
          2.0 = tf(freq=4.0), with freq of:
            4.0 = termFreq=4.0
          3.0129938 = idf(docFreq=5906, maxDocs=44218)
          0.046875 = fieldNorm(doc=1056)
  0.14285715 = coord(1/7)
```
Abstract

The authors introduce an information visualization model, WebStar, for hyperlink-based information systems. Hyperlinks within a hyperlink-based document can be visualized in a two-dimensional visual space. All links are projected within a display sphere in the visual space. The relationship between a specified central document and its hyperlinked documents is visually presented in the visual space. In addition, users are able to define a group of subjects and to observe relevance between each subject and all hyperlinked documents via movement of that subject around the display sphere center. WebStar allows users to dynamically change an interest center during navigation. A retrieval mechanism is developed to control retrieved results in the visual space. Impact of movement of a subject on the visual document distribution is analyzed. An ambiguity problem caused by projection is discussed. Potential applications of this visualization model in information retrieval are included. Future research directions on the topic are addressed.
Wolfram, D.; Wang, P.; Zhang, J.: Identifying Web search session patterns using cluster analysis : a comparison of three search environments (2009) 0.00
```
0.0047486075 = product of:
  0.03324025 = sum of:
    0.03324025 = weight(_text_:based in 2796) [ClassicSimilarity], result of:
      0.03324025 = score(doc=2796,freq=4.0), product of:
        0.11767787 = queryWeight, product of:
          3.0129938 = idf(docFreq=5906, maxDocs=44218)
          0.03905679 = queryNorm
        0.28246817 = fieldWeight in 2796, product of:
          2.0 = tf(freq=4.0), with freq of:
            4.0 = termFreq=4.0
          3.0129938 = idf(docFreq=5906, maxDocs=44218)
          0.046875 = fieldNorm(doc=2796)
  0.14285715 = coord(1/7)
```
Abstract

Session characteristics taken from large transaction logs of three Web search environments (academic Web site, public search engine, consumer health information portal) were modeled using cluster analysis to determine whether coherent session groups emerged for each environment and whether the types of session groups are similar across the three environments. The analysis revealed three distinct clusters of session behaviors common to each environment: hit-and-run sessions on focused topics, relatively brief sessions on popular topics, and sustained sessions using obscure terms with greater query modification. The findings also revealed shifts in session characteristics over time for one of the datasets, away from hit-and-run sessions toward more popular search topics. A better understanding of session characteristics can help system designers to develop more responsive systems to support search features that cater to identifiable groups of searchers based on their search behaviors. For example, the system may identify struggling searchers based on session behaviors that match those identified in the current study to provide context-sensitive help.
Zhang, J.: A representational analysis of relational information displays (1996) 0.00
```
0.00447703 = product of:
  0.03133921 = sum of:
    0.03133921 = weight(_text_:based in 6403) [ClassicSimilarity], result of:
      0.03133921 = score(doc=6403,freq=2.0), product of:
        0.11767787 = queryWeight, product of:
          3.0129938 = idf(docFreq=5906, maxDocs=44218)
          0.03905679 = queryNorm
        0.26631355 = fieldWeight in 6403, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          3.0129938 = idf(docFreq=5906, maxDocs=44218)
          0.0625 = fieldNorm(doc=6403)
  0.14285715 = coord(1/7)
```
Abstract

Analyzes graphic and tabular displays under a common, unified form, relational information displays (RIDs), which are displays that represent relations between dimensions. A representational taxonomy is developed that classifies all RIDs and serves as a framework for systematic studies of RIDs. Develops a taxonomy of RIDs which can classify the majority of dimension-based display tasks and analyzes the relation between representations of displays and structures of tasks in terms of a mapping principle.
Zhang, J.; Korfhage, R.R.: A distance and angle similarity measure method (1999) 0.00
```
0.00447703 = product of:
  0.03133921 = sum of:
    0.03133921 = weight(_text_:based in 3915) [ClassicSimilarity], result of:
      0.03133921 = score(doc=3915,freq=2.0), product of:
        0.11767787 = queryWeight, product of:
          3.0129938 = idf(docFreq=5906, maxDocs=44218)
          0.03905679 = queryNorm
        0.26631355 = fieldWeight in 3915, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          3.0129938 = idf(docFreq=5906, maxDocs=44218)
          0.0625 = fieldNorm(doc=3915)
  0.14285715 = coord(1/7)
```
Abstract

This article presents a distance and angle similarity measure. The integrated similarity measure takes the strengths of both the distance and direction of measured documents into account. This article analyzes the features of the similarity measure by comparing it with the traditional distance-based similarity measure and the cosine measure, providing the iso-similarity contour, and investigating the impacts of the parameters and variables on the new similarity measure. It also outlines further research issues on the topic.

Zhang, J.; Korfhage, R.R.: DARE: Distance and Angle Retrieval Environment : A tale of the two measures (1999) 0.00
```
0.00447703 = product of:
  0.03133921 = sum of:
    0.03133921 = weight(_text_:based in 3916) [ClassicSimilarity], result of:
      0.03133921 = score(doc=3916,freq=2.0), product of:
        0.11767787 = queryWeight, product of:
          3.0129938 = idf(docFreq=5906, maxDocs=44218)
          0.03905679 = queryNorm
        0.26631355 = fieldWeight in 3916, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          3.0129938 = idf(docFreq=5906, maxDocs=44218)
          0.0625 = fieldNorm(doc=3916)
  0.14285715 = coord(1/7)
```

Abstract

This article presents a visualization tool for information retrieval. Some retrieval evaluation models are interpreted in the two-dimensional space comprising direction and distance. The two different similarity measures, angle and distance, are displayed in the visual space. A new retrieval means based on the visual retrieval tool, the controlling bar, is developed for searching.
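The two-measure display described here can be sketched by mapping each document to a (distance, angle) pair relative to a reference point such as the query. The sketch places Euclidean distance on the X axis and the angle between the vectors at the origin, in degrees, on the Y axis. DARE's actual scaling and the controlling bar are not modeled, and the vectors are hypothetical.

```python
import math


def dare_coordinates(doc, reference):
    """Return (distance, angle) display coordinates of doc relative to reference.

    distance: Euclidean distance between the two vectors (X axis).
    angle:    angle in degrees between the vectors at the origin (Y axis).
    Both vectors are assumed to be non-zero.
    """
    distance = math.dist(doc, reference)
    dot = sum(a * b for a, b in zip(doc, reference))
    cosine = dot / (math.hypot(*doc) * math.hypot(*reference))
    cosine = max(-1.0, min(1.0, cosine))  # clamp rounding error before acos
    return distance, math.degrees(math.acos(cosine))


query = (1.0, 1.0)
for doc in [(2.0, 2.0), (1.0, 0.0)]:
    print(doc, dare_coordinates(doc, query))
```

The document (2, 2) points in the query's direction, with an angle of essentially 0 degrees, but lies farther away at a distance of about 1.414. The document (1, 0) is closer, at a distance of 1, but 45 degrees off. A two-axis display makes this disagreement between the two measures visible.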

Zhang, J.; Zhao, Y.: A user term visualization analysis based on a social question and answer log (2013) 0.00
```
0.003957173 = product of:
  0.02770021 = sum of:
    0.02770021 = weight(_text_:based in 2715) [ClassicSimilarity], result of:
      0.02770021 = score(doc=2715,freq=4.0), product of:
        0.11767787 = queryWeight, product of:
          3.0129938 = idf(docFreq=5906, maxDocs=44218)
          0.03905679 = queryNorm
        0.23539014 = fieldWeight in 2715, product of:
          2.0 = tf(freq=4.0), with freq of:
            4.0 = termFreq=4.0
          3.0129938 = idf(docFreq=5906, maxDocs=44218)
          0.0390625 = fieldNorm(doc=2715)
  0.14285715 = coord(1/7)
```
Abstract

The authors of this paper investigate consumers' diabetes-related terms based on a log from the Yahoo! Answers social question and answer (Q&A) forum, ascertain characteristics of and relationships among terms related to diabetes from the consumers' perspective, and reveal users' diabetes information-seeking patterns. In this study, the log analysis method, the data coding method, and the multidimensional scaling (MDS) visualization analysis method were used. The visual analyses were conducted at two levels: term analysis within a category and category analysis among the categories in the schema. The findings show that the average number of words per question was 128.63, the average number of sentences per question was 8.23, the average number of words per response was 254.83, and the average number of sentences per response was 16.01. There were 12 categories (Cause & Pathophysiology, Sign & Symptom, Diagnosis & Test, Organ & Body Part, Complication & Related Disease, Medication, Treatment, Education & Info Resource, Affect, Social & Culture, Lifestyle, and Nutrient) in the diabetes-related schema that emerged from the data coding analysis. The analyses at the two levels show that terms and categories were clustered and that patterns were revealed. Future research directions are also included.
Gao, J.; Zhang, J.: Clustered SVD strategies in latent semantic indexing (2005) 0.00
```
0.0039174017 = product of:
  0.02742181 = sum of:
    0.02742181 = weight(_text_:based in 1166) [ClassicSimilarity], result of:
      0.02742181 = score(doc=1166,freq=2.0), product of:
        0.11767787 = queryWeight, product of:
          3.0129938 = idf(docFreq=5906, maxDocs=44218)
          0.03905679 = queryNorm
        0.23302436 = fieldWeight in 1166, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          3.0129938 = idf(docFreq=5906, maxDocs=44218)
          0.0546875 = fieldNorm(doc=1166)
  0.14285715 = coord(1/7)
```
Abstract

The text retrieval method using latent semantic indexing (LSI) technique with truncated singular value decomposition (SVD) has been intensively studied in recent years. The SVD reduces the noise contained in the original representation of the term-document matrix and improves the information retrieval accuracy. Recent studies indicate that SVD is mostly useful for small homogeneous data collections. For large inhomogeneous datasets, the performance of the SVD-based text retrieval technique may deteriorate. We propose to partition a large inhomogeneous dataset into several smaller ones with clustered structure, on which we apply the truncated SVD. Our experimental results show that the clustered SVD strategies may enhance the retrieval accuracy and reduce the computing and storage costs.
Wolfram, D.; Zhang, J.: The influence of indexing practices and weighting algorithms on document spaces (2008) 0.00
```
0.0033577727 = product of:
  0.023504408 = sum of:
    0.023504408 = weight(_text_:based in 1963) [ClassicSimilarity], result of:
      0.023504408 = score(doc=1963,freq=2.0), product of:
        0.11767787 = queryWeight, product of:
          3.0129938 = idf(docFreq=5906, maxDocs=44218)
          0.03905679 = queryNorm
        0.19973516 = fieldWeight in 1963, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          3.0129938 = idf(docFreq=5906, maxDocs=44218)
          0.046875 = fieldNorm(doc=1963)
  0.14285715 = coord(1/7)
```
Abstract

Index modeling and computer simulation techniques are used to examine the influence of indexing frequency distributions, indexing exhaustivity distributions, and three weighting methods on hypothetical document spaces in a vector-based information retrieval (IR) system. The way documents are indexed plays an important role in retrieval. The authors demonstrate the influence of different indexing characteristics on document space density (DSD) changes and document space discriminative capacity for IR. Document environments that contain a relatively higher percentage of infrequently occurring terms provide lower density outcomes than do environments where a higher percentage of frequently occurring terms exists. Different indexing exhaustivity levels, however, have little influence on the document space densities. A weighting algorithm that favors higher weights for infrequently occurring terms results in the lowest overall document space densities, which allows documents to be more readily differentiated from one another. This in turn can positively influence IR. The authors also discuss the influence on outcomes using two methods of normalization of term weights (i.e., means and ranges) for the different weighting methods.
Zhang, J.; Yu, Q.; Zheng, F.; Long, C.; Lu, Z.; Duan, Z.: Comparing keywords plus of WOS and author keywords : a case study of patient adherence research (2016) 0.00
```
0.0033577727 = product of:
  0.023504408 = sum of:
    0.023504408 = weight(_text_:based in 2857) [ClassicSimilarity], result of:
      0.023504408 = score(doc=2857,freq=2.0), product of:
        0.11767787 = queryWeight, product of:
          3.0129938 = idf(docFreq=5906, maxDocs=44218)
          0.03905679 = queryNorm
        0.19973516 = fieldWeight in 2857, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          3.0129938 = idf(docFreq=5906, maxDocs=44218)
          0.046875 = fieldNorm(doc=2857)
  0.14285715 = coord(1/7)
```
Abstract

Bibliometric analysis based on literature in the Web of Science (WOS) has become an increasingly popular method for visualizing the structure of scientific fields. Keywords Plus and Author Keywords are commonly selected as units of analysis, despite the limited research evidence demonstrating the effectiveness of Keywords Plus. This study was conceived to evaluate the efficacy of Keywords Plus as a parameter for capturing the content and scientific concepts presented in articles. Using scientific papers about patient adherence that were retrieved from WOS, a comparative assessment of Keywords Plus and Author Keywords was performed at the scientific field level and the document level, respectively. Our search yielded more Keywords Plus terms than Author Keywords, and the Keywords Plus terms were more broadly descriptive. Keywords Plus is as effective as Author Keywords in terms of bibliometric analysis investigating the knowledge structure of scientific fields, but it is less comprehensive in representing an article's content.
Zhang, J.; Wolfram, D.: Visualization of term discrimination analysis (2001) 0.00
```
0.002798144 = product of:
  0.019587006 = sum of:
    0.019587006 = weight(_text_:based in 5210) [ClassicSimilarity], result of:
      0.019587006 = score(doc=5210,freq=2.0), product of:
        0.11767787 = queryWeight, product of:
          3.0129938 = idf(docFreq=5906, maxDocs=44218)
          0.03905679 = queryNorm
        0.16644597 = fieldWeight in 5210, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          3.0129938 = idf(docFreq=5906, maxDocs=44218)
          0.0390625 = fieldNorm(doc=5210)
  0.14285715 = coord(1/7)
```
Abstract

Zhang and Wolfram compute the discrimination value for terms as the difference between the centroid value of all terms in the corpus and that value without the term in question, and suggest that selection be made by comparing density changes with a visualization tool. The Distance Angle Retrieval Environment (DARE) visually projects a document or term space by presenting distance similarity on the X axis and angular similarity on the Y axis. Thus a document icon appearing close to the X axis would be relevant to reference points in terms of a distance similarity measure, while those close to the Y axis are relevant to reference points in terms of an angle-based measure. Using 450 Associated Press news reports indexed by 44 distinct terms, the removal of the term "Yeltsin" causes the cluster to fall on the Y axis, indicating a good discriminator. For an angular measure (cosine, say), movement along the X axis to the left signals good discrimination, while movement to the right signals poor discrimination. A term density space could also be used. Most terms are shown to be indifferent discriminators. Different measures result in different choices of good and poor discriminators, as does the use of a term space rather than a document space. The visualization approach is clearly feasible and provides some additional insights not found in the computation of a discrimination value.
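The discrimination value described here follows the classic space-density model of term discrimination, which can be sketched numerically. The sketch defines density as the mean cosine similarity of documents to their centroid. Its sign convention is density without the term minus density with it, and the toy matrix is hypothetical. Zhang and Wolfram judge these density changes visually in DARE rather than by this single number.

```python
import math


def cosine(u, v):
    """Cosine similarity; 0.0 if either vector is all zeros."""
    nu, nv = math.hypot(*u), math.hypot(*v)
    if nu == 0 or nv == 0:
        return 0.0
    return sum(a * b for a, b in zip(u, v)) / (nu * nv)


def space_density(docs):
    """Mean cosine similarity between each document and the centroid."""
    centroid = [sum(col) / len(docs) for col in zip(*docs)]
    return sum(cosine(d, centroid) for d in docs) / len(docs)


def discrimination_value(docs, term):
    """Density without the term minus density with it.

    > 0: removing the term packs documents together (good discriminator).
    < 0: removing it spreads them apart (poor discriminator).
    ~ 0: indifferent discriminator.
    """
    without = [[w for i, w in enumerate(d) if i != term] for d in docs]
    return space_density(without) - space_density(docs)


# Hypothetical document-term matrix; columns are terms 0..2.
docs = [[1, 1, 0], [1, 0, 1], [1, 1, 0]]
for t in range(3):
    print(t, round(discrimination_value(docs, t), 4))
```

Term 0 occurs in every document and receives a negative value. Terms 1 and 2 occur in only some documents and receive positive values.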
Zhang, J.; Mostafa, J.; Tripathy, H.: Information retrieval by semantic analysis and visualization of the concept space of D-Lib® magazine (2002) 0.00
```
0.0024232639 = product of:
  0.016962847 = sum of:
    0.016962847 = weight(_text_:based in 1211) [ClassicSimilarity], result of:
      0.016962847 = score(doc=1211,freq=6.0), product of:
        0.11767787 = queryWeight, product of:
          3.0129938 = idf(docFreq=5906, maxDocs=44218)
          0.03905679 = queryNorm
        0.14414644 = fieldWeight in 1211, product of:
          2.4494898 = tf(freq=6.0), with freq of:
            6.0 = termFreq=6.0
          3.0129938 = idf(docFreq=5906, maxDocs=44218)
          0.01953125 = fieldNorm(doc=1211)
  0.14285715 = coord(1/7)
```
Abstract

In this article we present a method for retrieving documents from a digital library through a visual interface based on automatically generated concepts. We used a vocabulary generation algorithm to generate a set of concepts for the digital library and a technique called the max-min distance technique to cluster them. Additionally, the concepts were visualized in a spring embedding graph layout to depict the semantic relationship among them. The resulting graph layout serves as an aid to users for retrieving documents. An online archive containing the contents of D-Lib Magazine from July 1995 to May 2002 was used to test the utility of an implemented retrieval and visualization system. We believe that the method developed and tested can be applied to many different domains to help users get a better understanding of online document collections and to minimize users' cognitive load during execution of search tasks.
Over the past few years, the volume of information available through the World Wide Web has been expanding exponentially. Never has so much information been so readily available and shared among so many people. Unfortunately, the unstructured nature and huge volume of information accessible over networks have made it hard for users to sift through and find relevant information. To deal with this problem, information retrieval (IR) techniques have gained more intensive attention from both industrial and academic researchers. Numerous IR techniques have been developed to help deal with the information overload problem. These techniques concentrate on mathematical models and algorithms for retrieval. Popular IR models such as the Boolean model, the vector-space model, the probabilistic model and their variants are well established.
Nevertheless, because thesaurus use has been shown to improve retrieval, for our method we integrate functions in the search interface that permit users to explore built-in search vocabularies to improve retrieval from digital libraries. Our method automatically generates the terms and their semantic relationships representing relevant topics covered in a digital library. We call these generated terms the "concepts", and the generated terms and their semantic relationships we call the "concept space". Additionally, we used a visualization technique to display the concept space and allow users to interact with this space. The automatically generated term set is considered to be more representative of the subject area in a corpus than an "externally" imposed thesaurus, and our method has the potential of saving a significant amount of time and labor for those who have been manually creating thesauri as well. Information visualization is an emerging discipline that has developed very quickly in the last decade. With growing volumes of documents and associated complexities, information visualization has become increasingly important. Researchers have found information visualization to be an effective way to use and understand information while minimizing a user's cognitive load. Our work was based on an algorithmic approach of concept discovery and association. Concepts are discovered using an algorithm based on an automated thesaurus generation procedure. Subsequently, similarities among terms are computed using the cosine measure, and the associations among terms are established using a method known as max-min distance clustering. The concept space is then visualized in a spring embedding graph, which roughly shows the semantic relationships among concepts in a 2-D visual representation. The semantic space of the visualization is used as a medium for users to retrieve the desired documents.
In the remainder of this article, we present our algorithmic approach of concept generation and clustering, followed by description of the visualization technique and interactive interface. The paper ends with key conclusions and discussions on future work.
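The concept-association step described in this abstract, cosine similarity followed by max-min distance clustering, can be sketched as follows. The abstract does not give the stopping rule, so the sketch assumes the textbook max-min variant. In that variant, the vector farthest from its nearest center becomes a new center as long as that distance exceeds theta times the mean distance between existing centers. Distance is 1 - cosine, and both theta = 0.5 and the concept vectors are hypothetical. The spring-embedding layout is not shown.

```python
import math
from itertools import combinations


def cosine_distance(u, v):
    """1 - cosine similarity (vectors assumed non-zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    return 1.0 - dot / (math.hypot(*u) * math.hypot(*v))


def max_min_cluster(vectors, theta=0.5):
    """Max-min distance clustering; returns (center indices, cluster label per vector)."""
    n = len(vectors)

    def dist(i, j):
        return cosine_distance(vectors[i], vectors[j])

    centers = [0]  # the first vector seeds the first cluster
    while len(centers) < n:
        # Distance from every vector to its nearest existing center.
        nearest = [min(dist(i, c) for c in centers) for i in range(n)]
        candidate = max(range(n), key=lambda i: nearest[i])
        if len(centers) == 1:
            threshold = 1e-12  # any non-duplicate vector becomes the second center
        else:
            pairs = list(combinations(centers, 2))
            threshold = theta * sum(dist(a, b) for a, b in pairs) / len(pairs)
        if nearest[candidate] <= threshold:
            break  # every vector is close enough to an existing center
        centers.append(candidate)
    labels = [min(range(len(centers)), key=lambda k: dist(i, centers[k]))
              for i in range(n)]
    return centers, labels


# Hypothetical 2-D concept vectors.
concepts = [(1.0, 0.0), (0.9, 0.1), (0.0, 1.0), (0.1, 0.9), (0.8, 0.6)]
print(max_min_cluster(concepts))
```

A smaller theta admits more centers. On these toy vectors, theta=0.1 adds the fifth vector as a third center, whereas the default 0.5 keeps two clusters.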

Zhang, J.; Zeng, M.L.: A new similarity measure for subject hierarchical structures (2014) 0.00
```
0.0018898771 = product of:
  0.013229139 = sum of:
    0.013229139 = product of:
      0.026458278 = sum of:
        0.026458278 = weight(_text_:22 in 1778) [ClassicSimilarity], result of:
          0.026458278 = score(doc=1778,freq=2.0), product of:
            0.13677022 = queryWeight, product of:
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.03905679 = queryNorm
            0.19345059 = fieldWeight in 1778, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.5018296 = idf(docFreq=3622, maxDocs=44218)
              0.0390625 = fieldNorm(doc=1778)
      0.5 = coord(1/2)
  0.14285715 = coord(1/7)
```

Date: 8. 4.2015 16:22:13
