Search (60 results, page 1 of 3)

  • theme_ss:"Retrievalalgorithmen"
  1. Witschel, H.F.: Global term weights in distributed environments (2008) 0.08
    0.077810265 = product of:
      0.116715394 = sum of:
        0.096151136 = weight(_text_:reference in 2096) [ClassicSimilarity], result of:
          0.096151136 = score(doc=2096,freq=6.0), product of:
            0.205834 = queryWeight, product of:
              4.0683694 = idf(docFreq=2055, maxDocs=44218)
              0.050593734 = queryNorm
            0.4671295 = fieldWeight in 2096, product of:
              2.4494898 = tf(freq=6.0), with freq of:
                6.0 = termFreq=6.0
              4.0683694 = idf(docFreq=2055, maxDocs=44218)
              0.046875 = fieldNorm(doc=2096)
        0.020564256 = product of:
          0.041128512 = sum of:
            0.041128512 = weight(_text_:22 in 2096) [ClassicSimilarity], result of:
              0.041128512 = score(doc=2096,freq=2.0), product of:
                0.17717063 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.050593734 = queryNorm
                0.23214069 = fieldWeight in 2096, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.046875 = fieldNorm(doc=2096)
          0.5 = coord(1/2)
      0.6666667 = coord(2/3)
    
    Abstract
    This paper examines the estimation of global term weights (such as IDF) in information retrieval scenarios where a global view on the collection is not available. In particular, the two options of either sampling documents or of using a reference corpus independent of the target retrieval collection are compared using standard IR test collections. In addition, the possibility of pruning term lists based on frequency is evaluated. The results show that very good retrieval performance can be reached when just the most frequent terms of a collection - an "extended stop word list" - are known and all terms which are not in that list are treated equally. However, the list cannot always be fully estimated from a general-purpose reference corpus, but some "domain-specific stop words" need to be added. A good solution for achieving this is to mix estimates from small samples of the target retrieval collection with ones derived from a reference corpus.
    Date
    1. 8.2008 9:44:22
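    The nested score breakdowns in this listing are Lucene "explain" trees for the ClassicSimilarity (TF-IDF) ranking. As a reading aid, a minimal sketch that reproduces the score of this entry from the numbers shown above, assuming the standard ClassicSimilarity form (tf = sqrt(termFreq), per-term weight = queryWeight x fieldWeight, coord = fraction of matching clauses):

      import math

      def classic_similarity(freq, idf, query_norm, field_norm):
          # One term's contribution: queryWeight (idf * queryNorm) times
          # fieldWeight (tf * idf * fieldNorm), as in the explain tree above.
          query_weight = idf * query_norm
          field_weight = math.sqrt(freq) * idf * field_norm
          return query_weight * field_weight

      # Numbers copied from the explain tree for doc 2096 (result no. 1)
      w_reference = classic_similarity(6.0, 4.0683694, 0.050593734, 0.046875)  # ~0.0961511
      w_22        = classic_similarity(2.0, 3.5018296, 0.050593734, 0.046875)  # ~0.0411285

      # coord factors taken from the tree: the inner clause matched 1 of 2,
      # the outer query matched 2 of 3 clauses
      total = (w_reference + w_22 * 0.5) * (2.0 / 3.0)
      print(round(total, 9))  # ~0.077810265, the document score shown above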
  2. Guerrero-Bote, V.P.; Moya Anegón, F. de; Herrero Solana, V.: Document organization using Kohonen's algorithm (2002) 0.07
    0.073703736 = product of:
      0.1105556 = sum of:
        0.074017175 = weight(_text_:reference in 2564) [ClassicSimilarity], result of:
          0.074017175 = score(doc=2564,freq=2.0), product of:
            0.205834 = queryWeight, product of:
              4.0683694 = idf(docFreq=2055, maxDocs=44218)
              0.050593734 = queryNorm
            0.35959643 = fieldWeight in 2564, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              4.0683694 = idf(docFreq=2055, maxDocs=44218)
              0.0625 = fieldNorm(doc=2564)
        0.036538422 = product of:
          0.073076844 = sum of:
            0.073076844 = weight(_text_:database in 2564) [ClassicSimilarity], result of:
              0.073076844 = score(doc=2564,freq=2.0), product of:
                0.20452234 = queryWeight, product of:
                  4.042444 = idf(docFreq=2109, maxDocs=44218)
                  0.050593734 = queryNorm
                0.35730496 = fieldWeight in 2564, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  4.042444 = idf(docFreq=2109, maxDocs=44218)
                  0.0625 = fieldNorm(doc=2564)
          0.5 = coord(1/2)
      0.6666667 = coord(2/3)
    
    Abstract
    The classification of documents from a bibliographic database is a task that is linked to processes of information retrieval based on partial matching. A method is described of vectorizing reference documents from LISA which permits their topological organization using Kohonen's algorithm. As an example a map is generated of 202 documents from LISA, and an analysis is made of the possibilities of this type of neural network with respect to the development of information retrieval systems based on graphical browsing.
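    The topological organization referred to here is a self-organizing (Kohonen) map. A minimal sketch of the training loop on toy document vectors (illustrative only; the vector dimension and map size are made up, not the authors' LISA setup):

      import numpy as np

      rng = np.random.default_rng(0)
      docs = rng.random((202, 50))        # 202 toy document vectors (the example size in the abstract)
      grid_w, grid_h, dim = 10, 10, 50
      weights = rng.random((grid_w * grid_h, dim))
      coords = np.array([(x, y) for x in range(grid_w) for y in range(grid_h)], dtype=float)

      def train(docs, weights, epochs=20, lr0=0.5, radius0=5.0):
          for epoch in range(epochs):
              lr = lr0 * (1 - epoch / epochs)                        # decaying learning rate
              radius = max(radius0 * (1 - epoch / epochs), 1.0)
              for d in docs:
                  bmu = np.argmin(((weights - d) ** 2).sum(axis=1))  # best-matching unit
                  dist = ((coords - coords[bmu]) ** 2).sum(axis=1)
                  influence = np.exp(-dist / (2 * radius ** 2))
                  weights += lr * influence[:, None] * (d - weights) # pull neighbourhood toward doc
          return weights

      weights = train(docs, weights)
      # Each document is then assigned to its best-matching unit, yielding a 2-D map
      # that can be browsed graphically.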
  3. Joss, M.W.; Wszola, S.: ¬The engines that can : text search and retrieval software, their strategies, and vendors (1996) 0.05
    0.05071809 = product of:
      0.07607713 = sum of:
        0.055512875 = weight(_text_:reference in 5123) [ClassicSimilarity], result of:
          0.055512875 = score(doc=5123,freq=2.0), product of:
            0.205834 = queryWeight, product of:
              4.0683694 = idf(docFreq=2055, maxDocs=44218)
              0.050593734 = queryNorm
            0.2696973 = fieldWeight in 5123, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              4.0683694 = idf(docFreq=2055, maxDocs=44218)
              0.046875 = fieldNorm(doc=5123)
        0.020564256 = product of:
          0.041128512 = sum of:
            0.041128512 = weight(_text_:22 in 5123) [ClassicSimilarity], result of:
              0.041128512 = score(doc=5123,freq=2.0), product of:
                0.17717063 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.050593734 = queryNorm
                0.23214069 = fieldWeight in 5123, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.046875 = fieldNorm(doc=5123)
          0.5 = coord(1/2)
      0.6666667 = coord(2/3)
    
    Abstract
    Traces the development of text searching and retrieval software designed to cope with the increasing demands made by the storage and handling of large amounts of data on high-capacity media, from CD-ROM to multi-gigabyte storage media and online information services, with particular reference to the need to handle graphics as well as conventional ASCII text. Includes details of: Boolean searching, fuzzy searching and matching; relevance ranking; proximity searching; and improved strategies for dealing with text searching in very large databases. Concludes that the best searching tools for CD-ROM publishers are those optimized for searching and retrieval on CD-ROM. CD-ROM drives have relatively slow random seek times compared with hard discs, and so the software most appropriate to the medium is that which can effectively arrange the indexes and text on the CD-ROM to avoid continuous random-access searching. Lists and reviews a selection of software packages designed to achieve the sort of results required for rapid CD-ROM searching.
    Date
    12. 9.1996 13:56:22
  4. Bauckhage, C.: Marginalizing over the PageRank damping factor (2014) 0.03
    0.03084049 = product of:
      0.09252147 = sum of:
        0.09252147 = weight(_text_:reference in 928) [ClassicSimilarity], result of:
          0.09252147 = score(doc=928,freq=2.0), product of:
            0.205834 = queryWeight, product of:
              4.0683694 = idf(docFreq=2055, maxDocs=44218)
              0.050593734 = queryNorm
            0.44949555 = fieldWeight in 928, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              4.0683694 = idf(docFreq=2055, maxDocs=44218)
              0.078125 = fieldNorm(doc=928)
      0.33333334 = coord(1/3)
    
    Abstract
    In this note, we show how to marginalize over the damping parameter of the PageRank equation so as to obtain a parameter-free version known as TotalRank. Our discussion is meant as a reference and intended to provide a guided tour towards an interesting result that has applications in information retrieval and classification.
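    The parameter-free version is obtained by integrating PageRank over the damping factor. A rough numerical illustration on a toy graph (the closed-form result of the paper is not reproduced here; the grid average merely approximates the integral):

      import numpy as np

      # Hypothetical 4-page link graph: A[i, j] = 1 if page i links to page j
      A = np.array([[0, 1, 1, 0],
                    [0, 0, 1, 0],
                    [1, 0, 0, 1],
                    [0, 0, 1, 0]], dtype=float)
      P = A / A.sum(axis=1, keepdims=True)      # row-stochastic transition matrix

      def pagerank(P, d, iters=100):
          n = P.shape[0]
          r = np.full(n, 1.0 / n)
          for _ in range(iters):
              r = (1 - d) / n + d * r @ P       # standard PageRank iteration
          return r

      # Marginalize over the damping factor by averaging PageRank on a grid of values
      ds = np.linspace(0.01, 0.99, 99)
      total_rank = np.mean([pagerank(P, d) for d in ds], axis=0)
      print(total_rank)                         # a damping-free ranking of the four pages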
  5. Shiri, A.A.; Revie, C.: Query expansion behavior within a thesaurus-enhanced search environment : a user-centered evaluation (2006) 0.03
    0.02664893 = product of:
      0.079946786 = sum of:
        0.079946786 = sum of:
          0.045673028 = weight(_text_:database in 56) [ClassicSimilarity], result of:
            0.045673028 = score(doc=56,freq=2.0), product of:
              0.20452234 = queryWeight, product of:
                4.042444 = idf(docFreq=2109, maxDocs=44218)
                0.050593734 = queryNorm
              0.2233156 = fieldWeight in 56, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.042444 = idf(docFreq=2109, maxDocs=44218)
                0.0390625 = fieldNorm(doc=56)
          0.034273762 = weight(_text_:22 in 56) [ClassicSimilarity], result of:
            0.034273762 = score(doc=56,freq=2.0), product of:
              0.17717063 = queryWeight, product of:
                3.5018296 = idf(docFreq=3622, maxDocs=44218)
                0.050593734 = queryNorm
              0.19345059 = fieldWeight in 56, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                3.5018296 = idf(docFreq=3622, maxDocs=44218)
                0.0390625 = fieldNorm(doc=56)
      0.33333334 = coord(1/3)
    
    Abstract
    The study reported here investigated the query expansion behavior of end-users interacting with a thesaurus-enhanced search system on the Web. Two groups, namely academic staff and postgraduate students, were recruited into this study. Data were collected from 90 searches performed by 30 users using the OVID interface to the CAB abstracts database. Data-gathering techniques included questionnaires, screen capturing software, and interviews. The results presented here relate to issues of search-topic and search-term characteristics, number and types of expanded queries, usefulness of thesaurus terms, and behavioral differences between academic staff and postgraduate students in their interaction. The key conclusions drawn were that (a) academic staff chose more narrow and synonymous terms than did postgraduate students, who generally selected broader and related terms; (b) topic complexity affected users' interaction with the thesaurus in that complex topics required more query expansion and search term selection; (c) users' prior topic-search experience appeared to have a significant effect on their selection and evaluation of thesaurus terms; (d) in 50% of the searches where additional terms were suggested from the thesaurus, users stated that they had not been aware of the terms at the beginning of the search; this observation was particularly noticeable in the case of postgraduate students.
    Date
    22. 7.2006 16:32:43
  6. Jones, G.; Robertson, A.M.; Willett, P.: ¬An introduction to genetic algorithms and to their use in information retrieval (1994) 0.02
    0.024672393 = product of:
      0.074017175 = sum of:
        0.074017175 = weight(_text_:reference in 7415) [ClassicSimilarity], result of:
          0.074017175 = score(doc=7415,freq=2.0), product of:
            0.205834 = queryWeight, product of:
              4.0683694 = idf(docFreq=2055, maxDocs=44218)
              0.050593734 = queryNorm
            0.35959643 = fieldWeight in 7415, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              4.0683694 = idf(docFreq=2055, maxDocs=44218)
              0.0625 = fieldNorm(doc=7415)
      0.33333334 = coord(1/3)
    
    Abstract
    This paper provides an introduction to genetic algorithms, a new approach to the investigation of computationally intensive problems that may be insoluble using conventional, deterministic approaches. A genetic algorithm takes an initial set of possible starting solutions and then iteratively improves these solutions using operators that are analogous to those involved in Darwinian evolution. The approach is illustrated by reference to several problems in information retrieval.
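    As a generic illustration of the mechanism described (selection, crossover and mutation applied iteratively to a population of candidate solutions; this is a toy bit-string problem, not one of the IR applications surveyed in the paper):

      import random

      random.seed(0)
      LENGTH = 20                                  # toy problem: evolve a bit-string of all ones

      def fitness(ind):
          return sum(ind)                          # number of 1-bits

      def crossover(a, b):
          cut = random.randrange(1, LENGTH)
          return a[:cut] + b[cut:]

      def mutate(ind, rate=0.05):
          return [1 - g if random.random() < rate else g for g in ind]

      population = [[random.randint(0, 1) for _ in range(LENGTH)] for _ in range(30)]
      for generation in range(50):
          population.sort(key=fitness, reverse=True)
          if fitness(population[0]) == LENGTH:
              break
          parents = population[:10]                # selection: keep the fittest
          offspring = [mutate(crossover(random.choice(parents), random.choice(parents)))
                       for _ in range(20)]
          population = parents + offspring
      print(generation, fitness(population[0]))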
  7. Pfeifer, U.; Pennekamp, S.: Incremental processing of vague queries in interactive retrieval systems (1997) 0.02
    0.024672393 = product of:
      0.074017175 = sum of:
        0.074017175 = weight(_text_:reference in 735) [ClassicSimilarity], result of:
          0.074017175 = score(doc=735,freq=2.0), product of:
            0.205834 = queryWeight, product of:
              4.0683694 = idf(docFreq=2055, maxDocs=44218)
              0.050593734 = queryNorm
            0.35959643 = fieldWeight in 735, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              4.0683694 = idf(docFreq=2055, maxDocs=44218)
              0.0625 = fieldNorm(doc=735)
      0.33333334 = coord(1/3)
    
    Abstract
    The application of information retrieval techniques in interactive environments requires systems capable of efficiently processing vague queries. To reach reasonable response times, new data structures and algorithms have to be developed. In this paper we describe an approach taking advantage of the conditions of interactive usage and special access paths. To have a reference, we investigated text queries and compared our algorithms to the well-known 'Buckley/Lewit' algorithm. We achieved significant improvements for the response times.
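    A generic sketch of incremental, term-at-a-time scoring with a bounded result list, the kind of processing the abstract refers to (illustrative only; this is neither the authors' algorithm nor the Buckley/Lewit method they compare against, and the postings data are made up):

      import heapq
      from collections import defaultdict

      # Hypothetical postings lists: term -> [(doc_id, weight), ...]
      postings = {
          "vague":   [(1, 0.8), (3, 0.4), (7, 0.6)],
          "query":   [(1, 0.5), (2, 0.9), (3, 0.3)],
          "ranking": [(2, 0.2), (7, 0.7)],
      }

      def top_k(query_terms, k=2):
          scores = defaultdict(float)
          current = []
          for term in query_terms:                 # process one query term at a time
              for doc_id, w in postings.get(term, []):
                  scores[doc_id] += w              # accumulate partial scores
              # after every term a provisional best-k list already exists,
              # which is what makes incremental, interactive display possible
              current = heapq.nlargest(k, scores.items(), key=lambda x: x[1])
          return current

      print(top_k(["vague", "query", "ranking"]))  # the two best documents (1 and 7) on this toy data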
  8. Watters, C.; Amoudi, A.: Geosearcher : location-based ranking of search engine results (2003) 0.02
    0.02180752 = product of:
      0.06542256 = sum of:
        0.06542256 = weight(_text_:reference in 5152) [ClassicSimilarity], result of:
          0.06542256 = score(doc=5152,freq=4.0), product of:
            0.205834 = queryWeight, product of:
              4.0683694 = idf(docFreq=2055, maxDocs=44218)
              0.050593734 = queryNorm
            0.31784135 = fieldWeight in 5152, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              4.0683694 = idf(docFreq=2055, maxDocs=44218)
              0.0390625 = fieldNorm(doc=5152)
      0.33333334 = coord(1/3)
    
    Abstract
    Watters and Amoudi describe GeoSearcher, a prototype ranking program that arranges search engine results along a geo-spatial dimension without the provision of geo-spatial meta-tags or the use of geo-spatial feature extraction. GeoSearcher uses URL analysis, IptoLL, Whois, and the Getty Thesaurus of Geographic Names to determine site location. It accepts the first 200 sites returned by a search engine, identifies their coordinates, calculates their distance from a reference point, and ranks them in ascending order by this value. For any retrieved site the system checks whether it has already been located in the current session, then sends the domain name to Whois to obtain a two-letter country code and an area code. With no success, the name is stripped one level and resent. If this fails, the top-level domain is tested for being a country code. Any remaining unmatched names go to IptoLL. Distance is calculated using the center point of the geographic area and a provided reference location. A test run on a set of 100 URLs from a search was successful in locating 90 sites. Eighty-three pages could be manually found, and 68 had sufficient information to verify location determination. Of these, 65 (95%) had been assigned reasonably correct geographic locations. A random set of URLs used instead of a search result yielded 80% success.
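    The core ranking step (great-circle distance from a reference point, then an ascending sort) can be sketched as follows; the reference location, site names and coordinates are invented for illustration:

      from math import radians, sin, cos, asin, sqrt

      def haversine(lat1, lon1, lat2, lon2):
          # Great-circle distance in kilometres between two (lat, lon) points
          lat1, lon1, lat2, lon2 = map(radians, (lat1, lon1, lat2, lon2))
          a = sin((lat2 - lat1) / 2) ** 2 + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
          return 2 * 6371 * asin(sqrt(a))

      reference = (44.65, -63.57)                  # hypothetical user location
      sites = [("siteA.example", 45.42, -75.70),   # made-up resolved site coordinates
               ("siteB.example", 51.51, -0.13),
               ("siteC.example", 40.71, -74.01)]

      ranked = sorted(sites, key=lambda s: haversine(*reference, s[1], s[2]))
      for name, lat, lon in ranked:
          print(name, round(haversine(*reference, lat, lon), 1), "km")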
  9. Van der Veer Martens, B.; Fleet, C. van: Opening the black box of "relevance work" : a domain analysis (2012) 0.02
    0.018504292 = product of:
      0.055512875 = sum of:
        0.055512875 = weight(_text_:reference in 247) [ClassicSimilarity], result of:
          0.055512875 = score(doc=247,freq=2.0), product of:
            0.205834 = queryWeight, product of:
              4.0683694 = idf(docFreq=2055, maxDocs=44218)
              0.050593734 = queryNorm
            0.2696973 = fieldWeight in 247, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              4.0683694 = idf(docFreq=2055, maxDocs=44218)
              0.046875 = fieldNorm(doc=247)
      0.33333334 = coord(1/3)
    
    Abstract
    In response to Hjørland's recent call for a reconceptualization of the foundations of relevance, we suggest that the sociocognitive aspects of intermediation by information agencies, such as archives and libraries, are a necessary and unexplored part of the infrastructure of the subject knowledge domains central to his recommended "view of relevance informed by a social paradigm" (2010, p. 217). From a comparative analysis of documents from 39 graduate-level introductory courses in archives, reference, and strategic/competitive intelligence taught in 13 American Library Association-accredited library and information science (LIS) programs, we identify four defining sociocognitive dimensions of "relevance work" in information agencies within Hjørland's proposed framework for relevance: tasks, time, systems, and assessors. This study is intended to supply sociocognitive content from within the relevance work domain to support further domain analytic research, and to emphasize the importance of intermediary relevance work for all subject knowledge domains.
  10. Voorhees, E.M.: Implementing agglomerative hierarchic clustering algorithms for use in document retrieval (1986) 0.02
    0.01827934 = product of:
      0.05483802 = sum of:
        0.05483802 = product of:
          0.10967604 = sum of:
            0.10967604 = weight(_text_:22 in 402) [ClassicSimilarity], result of:
              0.10967604 = score(doc=402,freq=2.0), product of:
                0.17717063 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.050593734 = queryNorm
                0.61904186 = fieldWeight in 402, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.125 = fieldNorm(doc=402)
          0.5 = coord(1/2)
      0.33333334 = coord(1/3)
    
    Source
    Information processing and management. 22(1986) no.6, S.465-476
  11. Ojala, M.: Commands that RANKle (1997) 0.02
    0.017224379 = product of:
      0.051673137 = sum of:
        0.051673137 = product of:
          0.10334627 = sum of:
            0.10334627 = weight(_text_:database in 428) [ClassicSimilarity], result of:
              0.10334627 = score(doc=428,freq=4.0), product of:
                0.20452234 = queryWeight, product of:
                  4.042444 = idf(docFreq=2109, maxDocs=44218)
                  0.050593734 = queryNorm
                0.5053055 = fieldWeight in 428, product of:
                  2.0 = tf(freq=4.0), with freq of:
                    4.0 = termFreq=4.0
                  4.042444 = idf(docFreq=2109, maxDocs=44218)
                  0.0625 = fieldNorm(doc=428)
          0.5 = coord(1/2)
      0.33333334 = coord(1/3)
    
    Abstract
    Examines the RANK command on DIALOG using a statistical analysis of articles in DATABASE as an example. The RANK command was used to find authors, company names, and the length of articles. Use of the command revealed a number of complexities and some problematic indexing on the part of the database producers. The LEXIS-NEXIS RANK command was also used, but it fulfils a different function from the command of the same name in DIALOG.
  12. Smeaton, A.F.; Rijsbergen, C.J. van: ¬The retrieval effects of query expansion on a feedback document retrieval system (1983) 0.02
    0.015994422 = product of:
      0.047983266 = sum of:
        0.047983266 = product of:
          0.09596653 = sum of:
            0.09596653 = weight(_text_:22 in 2134) [ClassicSimilarity], result of:
              0.09596653 = score(doc=2134,freq=2.0), product of:
                0.17717063 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.050593734 = queryNorm
                0.5416616 = fieldWeight in 2134, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.109375 = fieldNorm(doc=2134)
          0.5 = coord(1/2)
      0.33333334 = coord(1/3)
    
    Date
    30. 3.2001 13:32:22
  13. Back, J.: ¬An evaluation of relevancy ranking techniques used by Internet search engines (2000) 0.02
    0.015994422 = product of:
      0.047983266 = sum of:
        0.047983266 = product of:
          0.09596653 = sum of:
            0.09596653 = weight(_text_:22 in 3445) [ClassicSimilarity], result of:
              0.09596653 = score(doc=3445,freq=2.0), product of:
                0.17717063 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.050593734 = queryNorm
                0.5416616 = fieldWeight in 3445, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.109375 = fieldNorm(doc=3445)
          0.5 = coord(1/2)
      0.33333334 = coord(1/3)
    
    Date
    25. 8.2005 17:42:22
  14. Jacso, P.: Testing the calculation of a realistic h-index in Google Scholar, Scopus, and Web of Science for F. W. Lancaster (2008) 0.02
    0.015420245 = product of:
      0.046260733 = sum of:
        0.046260733 = weight(_text_:reference in 5586) [ClassicSimilarity], result of:
          0.046260733 = score(doc=5586,freq=2.0), product of:
            0.205834 = queryWeight, product of:
              4.0683694 = idf(docFreq=2055, maxDocs=44218)
              0.050593734 = queryNorm
            0.22474778 = fieldWeight in 5586, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              4.0683694 = idf(docFreq=2055, maxDocs=44218)
              0.0390625 = fieldNorm(doc=5586)
      0.33333334 = coord(1/3)
    
    Abstract
    This paper focuses on the practical limitations in the content and software of the databases that are used to calculate the h-index for assessing the publishing productivity and impact of researchers. To celebrate F. W. Lancaster's biological age of seventy-five, and "scientific age" of forty-five, this paper discusses the related features of Google Scholar, Scopus, and Web of Science (WoS), and demonstrates in the latter how a much more realistic and fair h-index can be computed for F. W. Lancaster than the one produced automatically. The cited reference index of the 1945-2007 edition of WoS has, in my estimate, over a hundred million "orphan references" that have no counterpart master records to be attached to, and "stray references" that cite papers which do have master records but cannot be identified by the matching algorithm because of errors of omission and commission in the references of the citing works. Browsing and searching this index can bring up hundreds of additional cited references given to the works of an accomplished author that are ignored in the automatic process of calculating the h-index. The partially manual process doubled the h-index value for F. W. Lancaster from 13 to 26, which is a much more realistic value for an information scientist and professor of his stature.
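    The automatic part of the computation is simple once the citation counts are complete; the paper's point is that the underlying reference data, not this step, is the problem. A minimal sketch:

      def h_index(citation_counts):
          # h = largest h such that at least h papers have >= h citations each
          counts = sorted(citation_counts, reverse=True)
          h = 0
          for rank, c in enumerate(counts, start=1):
              if c >= rank:
                  h = rank
              else:
                  break
          return h

      # Hypothetical citation counts; recovering "orphan" and "stray" references
      # raises individual counts and can raise the index sharply (13 -> 26 in the paper).
      print(h_index([45, 40, 38, 33, 30, 28, 25, 22, 20, 18, 16, 15, 13, 9, 7]))  # -> 13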
  15. Wolff, J.G.: ¬A scalable technique for best-match retrieval of sequential information using metrics-guided search (1994) 0.02
    0.0150713315 = product of:
      0.045213994 = sum of:
        0.045213994 = product of:
          0.09042799 = sum of:
            0.09042799 = weight(_text_:database in 5334) [ClassicSimilarity], result of:
              0.09042799 = score(doc=5334,freq=4.0), product of:
                0.20452234 = queryWeight, product of:
                  4.042444 = idf(docFreq=2109, maxDocs=44218)
                  0.050593734 = queryNorm
                0.44214234 = fieldWeight in 5334, product of:
                  2.0 = tf(freq=4.0), with freq of:
                    4.0 = termFreq=4.0
                  4.042444 = idf(docFreq=2109, maxDocs=44218)
                  0.0546875 = fieldNorm(doc=5334)
          0.5 = coord(1/2)
      0.33333334 = coord(1/3)
    
    Abstract
    Describes a new technique for retrieving information by finding the best match or matches between a textual query and a textual database. The technique uses principles of beam search with a measure of probability to guide the search and prune the search tree. Unlike many methods for comparing strings, the method gives a set of alternative matches, graded by the quality of the matching. The new technique is embodied in a software simulation, SP21, which runs on a conventional computer. Presents examples showing best-match retrieval of information from a textual database. Presents analytic and empirical evidence on the performance of the technique. It lends itself well to parallel processing. Discusses planned developments.
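    A rough sketch of metrics-guided search with pruning for best partial matches between a query string and stored strings (a generic beam search with an ad-hoc score, not the SP21 system itself):

      def beam_match(query, text, beam_width=5):
          # A state is (t_pos, score): position reached in the text and the quality
          # of the partial alignment.  Each query symbol is either matched further
          # along in the text (reward) or skipped (penalty); after every step only
          # the beam_width best states are kept.
          states = [(0, 0.0)]
          for ch in query:
              expanded = []
              for t_pos, score in states:
                  hit = text.find(ch, t_pos)
                  if hit != -1:
                      expanded.append((hit + 1, score + 1.0))   # match the symbol
                  expanded.append((t_pos, score - 0.5))         # skip the symbol
              states = sorted(expanded, key=lambda s: -s[1])[:beam_width]  # prune
          return max(score for _, score in states)

      database = ["information retrieval", "informal review", "retrieval of sequences"]
      query = "infrmation retrival"                              # misspelt query
      for text in sorted(database, key=lambda t: -beam_match(query, t)):
          print(round(beam_match(query, text), 1), text)         # graded list of matches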
  16. Fuhr, N.: Ranking-Experimente mit gewichteter Indexierung (1986) 0.01
    0.013709504 = product of:
      0.041128512 = sum of:
        0.041128512 = product of:
          0.082257025 = sum of:
            0.082257025 = weight(_text_:22 in 58) [ClassicSimilarity], result of:
              0.082257025 = score(doc=58,freq=2.0), product of:
                0.17717063 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.050593734 = queryNorm
                0.46428138 = fieldWeight in 58, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.09375 = fieldNorm(doc=58)
          0.5 = coord(1/2)
      0.33333334 = coord(1/3)
    
    Date
    14. 6.2015 22:12:44
  17. Fuhr, N.: Rankingexperimente mit gewichteter Indexierung (1986) 0.01
    0.013709504 = product of:
      0.041128512 = sum of:
        0.041128512 = product of:
          0.082257025 = sum of:
            0.082257025 = weight(_text_:22 in 2051) [ClassicSimilarity], result of:
              0.082257025 = score(doc=2051,freq=2.0), product of:
                0.17717063 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.050593734 = queryNorm
                0.46428138 = fieldWeight in 2051, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.09375 = fieldNorm(doc=2051)
          0.5 = coord(1/2)
      0.33333334 = coord(1/3)
    
    Date
    14. 6.2015 22:12:56
  18. Calegari, S.; Sanchez, E.: Object-fuzzy concept network : an enrichment of ontologies in semantic information retrieval (2008) 0.01
    0.0131846685 = product of:
      0.039554004 = sum of:
        0.039554004 = product of:
          0.07910801 = sum of:
            0.07910801 = weight(_text_:database in 2393) [ClassicSimilarity], result of:
              0.07910801 = score(doc=2393,freq=6.0), product of:
                0.20452234 = queryWeight, product of:
                  4.042444 = idf(docFreq=2109, maxDocs=44218)
                  0.050593734 = queryNorm
                0.38679397 = fieldWeight in 2393, product of:
                  2.4494898 = tf(freq=6.0), with freq of:
                    6.0 = termFreq=6.0
                  4.042444 = idf(docFreq=2109, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=2393)
          0.5 = coord(1/2)
      0.33333334 = coord(1/3)
    
    Abstract
    This article shows how a fuzzy ontology-based approach can improve semantic document retrieval. After formally defining a fuzzy ontology and a fuzzy knowledge base, a special type of new fuzzy relationship called (semantic) correlation, which links the concepts or entities in a fuzzy ontology, is discussed. These correlations, first assigned by experts, are updated after querying or when a document has been inserted into a database. Moreover, in order to define a dynamic knowledge of a domain adapting itself to the context, it is shown how to handle a tradeoff between the correct definition of an object, taken in the ontology structure, and the actual meaning assigned by individuals. The notion of a fuzzy concept network is extended, incorporating database objects so that entities and documents can similarly be represented in the network. An information retrieval (IR) algorithm using an object-fuzzy concept network (O-FCN) is introduced and described. This algorithm allows us to derive a unique path among the entities involved in the query to obtain maximal semantic associations in the knowledge domain. Finally, the study has been validated by querying a database using fuzzy recall, fuzzy precision, and coefficient variant measures in the crisp and fuzzy cases.
  19. Aigrain, P.; Longueville, V.: ¬A model for the evaluation of expansion techniques in information retrieval systems (1994) 0.01
    0.012918284 = product of:
      0.03875485 = sum of:
        0.03875485 = product of:
          0.0775097 = sum of:
            0.0775097 = weight(_text_:database in 5331) [ClassicSimilarity], result of:
              0.0775097 = score(doc=5331,freq=4.0), product of:
                0.20452234 = queryWeight, product of:
                  4.042444 = idf(docFreq=2109, maxDocs=44218)
                  0.050593734 = queryNorm
                0.37897915 = fieldWeight in 5331, product of:
                  2.0 = tf(freq=4.0), with freq of:
                    4.0 = termFreq=4.0
                  4.042444 = idf(docFreq=2109, maxDocs=44218)
                  0.046875 = fieldNorm(doc=5331)
          0.5 = coord(1/2)
      0.33333334 = coord(1/3)
    
    Abstract
    We describe an evaluation model for expansion systems in information retrieval, that is, systems expanding a user selection of documents in order to provide the user with a larger set of documents sharing the same or related characteristics. Our model leads to a test protocol and practical estimates of the efficiency of an expansion system, provided that it is possible for a sample of users to exhaustively scan the content of a subset of the database in order to decide which documents would have been selected by an 'ideal' expansion system. This condition is met only by databases whose unit contents can be quickly apprehended, such as still-image databases or synthetic bibliographical references. We compare our model with other types of possible indicators, and discuss the precision to which our measure can be estimated, using data from experimentation with an image database system developed by our research team.
  20. Henzinger, M.R.: Link analysis in Web information retrieval (2000) 0.01
    0.012336196 = product of:
      0.037008587 = sum of:
        0.037008587 = weight(_text_:reference in 801) [ClassicSimilarity], result of:
          0.037008587 = score(doc=801,freq=2.0), product of:
            0.205834 = queryWeight, product of:
              4.0683694 = idf(docFreq=2055, maxDocs=44218)
              0.050593734 = queryNorm
            0.17979822 = fieldWeight in 801, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              4.0683694 = idf(docFreq=2055, maxDocs=44218)
              0.03125 = fieldNorm(doc=801)
      0.33333334 = coord(1/3)
    
    Content
    The goal of information retrieval is to find all documents relevant for a user query in a collection of documents. Decades of research in information retrieval were successful in developing and refining techniques that are solely word-based (see e.g., [2]). With the advent of the web, new sources of information became available, one of them being the hyperlinks between documents and records of user behavior. To be precise, hypertexts (i.e., collections of documents connected by hyperlinks) have existed and have been studied for a long time. What was new was the large number of hyperlinks created by independent individuals. Hyperlinks provide a valuable source of information for web information retrieval, as we will show in this article. This area of information retrieval is commonly called link analysis. Why would one expect hyperlinks to be useful? A hyperlink is a reference to a web page B that is contained in a web page A. When the hyperlink is clicked on in a web browser, the browser displays page B. This functionality alone is not helpful for web information retrieval. However, the way hyperlinks are typically used by authors of web pages can give them valuable information content. Typically, authors create links because they think they will be useful for the readers of the pages. Thus, links are usually either navigational aids that, for example, bring the reader back to the homepage of the site, or links that point to pages whose content augments the content of the current page. The second kind of link tends to point to high-quality pages that might be on the same topic as the page containing the link.

Languages

  • e 55
  • d 5

Types

  • a 55
  • m 3
  • el 2
  • r 1
  • s 1