Search (2989 results, page 1 of 150)

  • year_i:[2000 TO 2010}
  1. Hotho, A.; Bloehdorn, S.: Data Mining 2004 : Text classification by boosting weak learners based on terms and concepts (2004) 0.15
    0.1495964 = sum of:
      0.081033945 = product of:
        0.24310184 = sum of:
          0.24310184 = weight(_text_:3a in 562) [ClassicSimilarity], result of:
            0.24310184 = score(doc=562,freq=2.0), product of:
              0.43255165 = queryWeight, product of:
                8.478011 = idf(docFreq=24, maxDocs=44218)
                0.051020417 = queryNorm
              0.56201804 = fieldWeight in 562, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                8.478011 = idf(docFreq=24, maxDocs=44218)
                0.046875 = fieldNorm(doc=562)
        0.33333334 = coord(1/3)
      0.04782477 = weight(_text_:data in 562) [ClassicSimilarity], result of:
        0.04782477 = score(doc=562,freq=4.0), product of:
          0.16132914 = queryWeight, product of:
            3.1620505 = idf(docFreq=5088, maxDocs=44218)
            0.051020417 = queryNorm
          0.29644224 = fieldWeight in 562, product of:
            2.0 = tf(freq=4.0), with freq of:
              4.0 = termFreq=4.0
            3.1620505 = idf(docFreq=5088, maxDocs=44218)
            0.046875 = fieldNorm(doc=562)
      0.020737685 = product of:
        0.04147537 = sum of:
          0.04147537 = weight(_text_:22 in 562) [ClassicSimilarity], result of:
            0.04147537 = score(doc=562,freq=2.0), product of:
              0.1786648 = queryWeight, product of:
                3.5018296 = idf(docFreq=3622, maxDocs=44218)
                0.051020417 = queryNorm
              0.23214069 = fieldWeight in 562, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                3.5018296 = idf(docFreq=3622, maxDocs=44218)
                0.046875 = fieldNorm(doc=562)
        0.5 = coord(1/2)
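
    The tree above is Lucene's ClassicSimilarity (TF-IDF) "explain" output. As a reading aid, the following minimal Python sketch reproduces the "_text_:3a" leg of the calculation; it assumes the standard ClassicSimilarity formulas (tf = sqrt(termFreq), idf = ln(maxDocs / (docFreq + 1)) + 1, fieldWeight = tf * idf * fieldNorm, term score = queryWeight * fieldWeight) and is not code taken from this search application. The queryNorm and fieldNorm values are copied from the explanation itself, since they are normalization factors computed by the engine.

      import math

      def idf(doc_freq: int, max_docs: int) -> float:
          # ClassicSimilarity idf: ln(maxDocs / (docFreq + 1)) + 1
          return math.log(max_docs / (doc_freq + 1)) + 1

      def term_score(freq: float, doc_freq: int, max_docs: int,
                     query_norm: float, field_norm: float) -> float:
          # queryWeight = idf * queryNorm; fieldWeight = sqrt(freq) * idf * fieldNorm
          i = idf(doc_freq, max_docs)
          query_weight = i * query_norm
          field_weight = math.sqrt(freq) * i * field_norm
          return query_weight * field_weight

      # The "weight(_text_:3a in 562)" leg shown above:
      s = term_score(freq=2.0, doc_freq=24, max_docs=44218,
                     query_norm=0.051020417, field_norm=0.046875)
      print(f"{s:.6f}")            # ~0.243102 (weight of _text_:3a)
      print(f"{s * (1 / 3):.6f}")  # ~0.081034 after coord(1/3)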
    
    Content
    Cf.: http://www.google.de/url?sa=t&rct=j&q=&esrc=s&source=web&cd=1&cad=rja&ved=0CEAQFjAA&url=http%3A%2F%2Fciteseerx.ist.psu.edu%2Fviewdoc%2Fdownload%3Fdoi%3D10.1.1.91.4940%26rep%3Drep1%26type%3Dpdf&ei=dOXrUMeIDYHDtQahsIGACg&usg=AFQjCNHFWVh6gNPvnOrOS9R3rkrXCNVD-A&sig2=5I2F5evRfMnsttSgFF9g7Q&bvm=bv.1357316858,d.Yms.
    Date
    8. 1.2013 10:22:32
    Source
    Proceedings of the 4th IEEE International Conference on Data Mining (ICDM 2004), 1-4 November 2004, Brighton, UK
  2. Näppilä, T.; Järvelin, K.; Niemi, T.: A tool for data cube construction from structurally heterogeneous XML documents (2008) 0.11
    0.113244854 = product of:
      0.16986728 = sum of:
        0.08911619 = weight(_text_:data in 1369) [ClassicSimilarity], result of:
          0.08911619 = score(doc=1369,freq=20.0), product of:
            0.16132914 = queryWeight, product of:
              3.1620505 = idf(docFreq=5088, maxDocs=44218)
              0.051020417 = queryNorm
            0.5523875 = fieldWeight in 1369, product of:
              4.472136 = tf(freq=20.0), with freq of:
                20.0 = termFreq=20.0
              3.1620505 = idf(docFreq=5088, maxDocs=44218)
              0.0390625 = fieldNorm(doc=1369)
        0.08075108 = sum of:
          0.04618826 = weight(_text_:processing in 1369) [ClassicSimilarity], result of:
            0.04618826 = score(doc=1369,freq=2.0), product of:
              0.20653816 = queryWeight, product of:
                4.048147 = idf(docFreq=2097, maxDocs=44218)
                0.051020417 = queryNorm
              0.22363065 = fieldWeight in 1369, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.048147 = idf(docFreq=2097, maxDocs=44218)
                0.0390625 = fieldNorm(doc=1369)
          0.03456281 = weight(_text_:22 in 1369) [ClassicSimilarity], result of:
            0.03456281 = score(doc=1369,freq=2.0), product of:
              0.1786648 = queryWeight, product of:
                3.5018296 = idf(docFreq=3622, maxDocs=44218)
                0.051020417 = queryNorm
              0.19345059 = fieldWeight in 1369, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                3.5018296 = idf(docFreq=3622, maxDocs=44218)
                0.0390625 = fieldNorm(doc=1369)
      0.6666667 = coord(2/3)
    
    Abstract
    Data cubes for OLAP (On-Line Analytical Processing) often need to be constructed from data located in several distributed and autonomous information sources. Such a data integration process is challenging due to semantic, syntactic, and structural heterogeneity among the data. While XML (Extensible Markup Language) is the de facto standard for data exchange, the three types of heterogeneity remain. Moreover, popular path-oriented XML query languages, such as XQuery, require the user to know the structure of the documents to be processed in considerable detail and are thus effectively impractical in many real-world data integration tasks. Several Lowest Common Ancestor (LCA)-based XML query evaluation strategies have recently been introduced to provide a more structure-independent way to access XML documents. We show, however, that for certain - not uncommon - types of XML documents this approach leads to undesirable results. This article introduces a novel high-level data extraction primitive that utilizes the purpose-built Smallest Possible Context (SPC) query evaluation strategy. We demonstrate, through a system prototype for OLAP data cube construction and a sample application in informetrics, that our approach has real advantages in data integration.
    Date
    9. 2.2008 17:22:42
  3. Between data science and applied data analysis : Proceedings of the 26th Annual Conference of the Gesellschaft für Klassifikation e.V., University of Mannheim, July 22-24, 2002 (2003) 0.11
    0.105747774 = product of:
      0.15862165 = sum of:
        0.11714628 = weight(_text_:data in 4606) [ClassicSimilarity], result of:
          0.11714628 = score(doc=4606,freq=6.0), product of:
            0.16132914 = queryWeight, product of:
              3.1620505 = idf(docFreq=5088, maxDocs=44218)
              0.051020417 = queryNorm
            0.7261322 = fieldWeight in 4606, product of:
              2.4494898 = tf(freq=6.0), with freq of:
                6.0 = termFreq=6.0
              3.1620505 = idf(docFreq=5088, maxDocs=44218)
              0.09375 = fieldNorm(doc=4606)
        0.04147537 = product of:
          0.08295074 = sum of:
            0.08295074 = weight(_text_:22 in 4606) [ClassicSimilarity], result of:
              0.08295074 = score(doc=4606,freq=2.0), product of:
                0.1786648 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.051020417 = queryNorm
                0.46428138 = fieldWeight in 4606, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.09375 = fieldNorm(doc=4606)
          0.5 = coord(1/2)
      0.6666667 = coord(2/3)
    
    Series
    Studies in classification, data analysis, and knowledge organization
  4. Wackerow, J.: The Data Documentation Initiative (DDI) (2008) 0.10
    0.10393649 = product of:
      0.15590474 = sum of:
        0.08598673 = weight(_text_:data in 2662) [ClassicSimilarity], result of:
          0.08598673 = score(doc=2662,freq=38.0), product of:
            0.16132914 = queryWeight, product of:
              3.1620505 = idf(docFreq=5088, maxDocs=44218)
              0.051020417 = queryNorm
            0.5329895 = fieldWeight in 2662, product of:
              6.164414 = tf(freq=38.0), with freq of:
                38.0 = termFreq=38.0
              3.1620505 = idf(docFreq=5088, maxDocs=44218)
              0.02734375 = fieldNorm(doc=2662)
        0.069918014 = sum of:
          0.045724045 = weight(_text_:processing in 2662) [ClassicSimilarity], result of:
            0.045724045 = score(doc=2662,freq=4.0), product of:
              0.20653816 = queryWeight, product of:
                4.048147 = idf(docFreq=2097, maxDocs=44218)
                0.051020417 = queryNorm
              0.22138305 = fieldWeight in 2662, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                4.048147 = idf(docFreq=2097, maxDocs=44218)
                0.02734375 = fieldNorm(doc=2662)
          0.024193967 = weight(_text_:22 in 2662) [ClassicSimilarity], result of:
            0.024193967 = score(doc=2662,freq=2.0), product of:
              0.1786648 = queryWeight, product of:
                3.5018296 = idf(docFreq=3622, maxDocs=44218)
                0.051020417 = queryNorm
              0.1354154 = fieldWeight in 2662, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                3.5018296 = idf(docFreq=3622, maxDocs=44218)
                0.02734375 = fieldNorm(doc=2662)
      0.6666667 = coord(2/3)
    
    Abstract
    The Data Documentation Initiative (DDI) is an international effort to establish an XML-based standard for the compilation, presentation, and exchange of documentation for datasets in the social and behavioral sciences. The most recent version 3.0 of the DDI supports a rich and structured set of metadata elements that not only fully informs a potential data analyst about a given dataset but also facilitates computer processing of the data. Moreover, data producers will find that by adopting the DDI standard they can produce better and more complete documentation as a natural step in designing and fielding computer-assisted interviewing. DDI 3.0 embraces the full life cycle of the data from conception, through development of the data collection instrument, collection and cleaning of data, production of data products, distribution, preservation, and reuse or analysis of the data. DDI 3.0 is designed to facilitate sharing schemes for concepts, questions, coding, and variables within organizations or throughout the social science research community. Comparison through direct inheritance, as in the case of comparison-by-design, or through the mapping of items like variables or categories allows capture of the harmonization processes used in creating integrated files in a uniform and machine-actionable way. DDI 3.0 provides the structural support needed to facilitate comparative survey work in a way that was previously unavailable in an open, non-proprietary system. A specific DDI module allows for the capture and expression of native Dublin Core elements (DCMES), used either as references or as descriptions of a particular set of metadata. This module uses the simple Dublin Core namespace represented as XML Schema following the guidelines for implementing Dublin Core in XML. In DDI, the Dublin Core is not used as the primary citation mechanism - this module is included to support applications which understand the Dublin Core XML, but which do not understand DDI. This module is used wherever citations are permitted within DDI 3.0 (like citations of a study description or of other material). DDI 3.0 is aligned with other metadata standards as well: with SDMX (time-series data) for exchanging aggregate data, with ISO/IEC 11179 (metadata registry) for building data registries such as question, variable, and concept banks, and with FGDC and ISO 19115 (geographic standards) for supporting GIS users. DDI 3.0 is described in a conceptual model which is also expressed in the Unified Modeling Language (UML). Modular XML Schemas are derived from the conceptual model. Many elements support computer processing - that is, the standard goes beyond being "human readable" and moves toward the goal of being "machine-actionable". The final release of DDI 3.0 was published on April 28, 2008. The standard was developed by the DDI Alliance, an international group encompassing data archives and research institutions from several countries in Western Europe and North America. Earlier versions of DDI provide examples of institutions and applications: the Inter-university Consortium for Political and Social Research (ICPSR) Data Catalog, the Council of European Social Science Data Services (CESSDA) Data Portal, the Dataverse Network, the International Household Survey Network (IHSN), NESSTAR Software for publishing data on the Web and online analysis, and the Microdata Management Toolkit (by the World Bank Data Group for IHSN).
    Source
    Metadata for semantic and social applications : proceedings of the International Conference on Dublin Core and Metadata Applications, Berlin, 22 - 26 September 2008, DC 2008: Berlin, Germany / ed. by Jane Greenberg and Wolfgang Klas
  5. Trotman, A.: Searching structured documents (2004) 0.10
    0.10166995 = product of:
      0.15250492 = sum of:
        0.03945342 = weight(_text_:data in 2538) [ClassicSimilarity], result of:
          0.03945342 = score(doc=2538,freq=2.0), product of:
            0.16132914 = queryWeight, product of:
              3.1620505 = idf(docFreq=5088, maxDocs=44218)
              0.051020417 = queryNorm
            0.24455236 = fieldWeight in 2538, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.1620505 = idf(docFreq=5088, maxDocs=44218)
              0.0546875 = fieldNorm(doc=2538)
        0.113051504 = sum of:
          0.06466357 = weight(_text_:processing in 2538) [ClassicSimilarity], result of:
            0.06466357 = score(doc=2538,freq=2.0), product of:
              0.20653816 = queryWeight, product of:
                4.048147 = idf(docFreq=2097, maxDocs=44218)
                0.051020417 = queryNorm
              0.3130829 = fieldWeight in 2538, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.048147 = idf(docFreq=2097, maxDocs=44218)
                0.0546875 = fieldNorm(doc=2538)
          0.048387934 = weight(_text_:22 in 2538) [ClassicSimilarity], result of:
            0.048387934 = score(doc=2538,freq=2.0), product of:
              0.1786648 = queryWeight, product of:
                3.5018296 = idf(docFreq=3622, maxDocs=44218)
                0.051020417 = queryNorm
              0.2708308 = fieldWeight in 2538, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                3.5018296 = idf(docFreq=3622, maxDocs=44218)
                0.0546875 = fieldNorm(doc=2538)
      0.6666667 = coord(2/3)
    
    Abstract
    Structured document interchange formats such as XML and SGML are ubiquitous; however, information retrieval systems supporting structured searching are not. Structured searching can result in increased precision. A search for the author "Smith" in an unstructured corpus of documents specializing in iron-working could have a lower precision than a structured search for "Smith as author" in the same corpus. Analysis of XML retrieval languages identifies additional functionality that must be supported, including searching at, and broken across, multiple nodes in the document tree. A data structure is developed to support structured document searching. Application of this structure to information retrieval is then demonstrated. Document ranking is examined and adapted specifically for structured searching.
    Date
    14. 8.2004 10:39:22
    Source
    Information processing and management. 40(2004) no.4, S.619-632
  6. Carvalho, J.R. de; Cordeiro, M.I.; Lopes, A.; Vieira, M.: Meta-information about MARC : an XML framework for validation, explanation and help systems (2004) 0.10
    0.10166995 = product of:
      0.15250492 = sum of:
        0.03945342 = weight(_text_:data in 2848) [ClassicSimilarity], result of:
          0.03945342 = score(doc=2848,freq=2.0), product of:
            0.16132914 = queryWeight, product of:
              3.1620505 = idf(docFreq=5088, maxDocs=44218)
              0.051020417 = queryNorm
            0.24455236 = fieldWeight in 2848, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.1620505 = idf(docFreq=5088, maxDocs=44218)
              0.0546875 = fieldNorm(doc=2848)
        0.113051504 = sum of:
          0.06466357 = weight(_text_:processing in 2848) [ClassicSimilarity], result of:
            0.06466357 = score(doc=2848,freq=2.0), product of:
              0.20653816 = queryWeight, product of:
                4.048147 = idf(docFreq=2097, maxDocs=44218)
                0.051020417 = queryNorm
              0.3130829 = fieldWeight in 2848, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.048147 = idf(docFreq=2097, maxDocs=44218)
                0.0546875 = fieldNorm(doc=2848)
          0.048387934 = weight(_text_:22 in 2848) [ClassicSimilarity], result of:
            0.048387934 = score(doc=2848,freq=2.0), product of:
              0.1786648 = queryWeight, product of:
                3.5018296 = idf(docFreq=3622, maxDocs=44218)
                0.051020417 = queryNorm
              0.2708308 = fieldWeight in 2848, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                3.5018296 = idf(docFreq=3622, maxDocs=44218)
                0.0546875 = fieldNorm(doc=2848)
      0.6666667 = coord(2/3)
    
    Abstract
    This article proposes a schema for meta-information about MARC that can express at a fairly comprehensive level the syntactic and semantic aspects of MARC formats in XML, including not only rules but also all texts and examples that are conveyed by MARC documentation. It can be thought of as an XML version of the MARC or UNIMARC manuals, for both machine and human usage. The article explains how such a schema can be the central piece of a more complete framework, to be used in conjunction with "slim" record formats, providing a rich environment for the automated processing of bibliographic data.
    Source
    Library hi tech. 22(2004) no.2, S.131-137
  7. Moore, R.W.: Management of very large distributed shared collections (2009) 0.10
    0.09930443 = product of:
      0.14895664 = sum of:
        0.12476268 = weight(_text_:data in 3845) [ClassicSimilarity], result of:
          0.12476268 = score(doc=3845,freq=20.0), product of:
            0.16132914 = queryWeight, product of:
              3.1620505 = idf(docFreq=5088, maxDocs=44218)
              0.051020417 = queryNorm
            0.7733425 = fieldWeight in 3845, product of:
              4.472136 = tf(freq=20.0), with freq of:
                20.0 = termFreq=20.0
              3.1620505 = idf(docFreq=5088, maxDocs=44218)
              0.0546875 = fieldNorm(doc=3845)
        0.024193967 = product of:
          0.048387934 = sum of:
            0.048387934 = weight(_text_:22 in 3845) [ClassicSimilarity], result of:
              0.048387934 = score(doc=3845,freq=2.0), product of:
                0.1786648 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.051020417 = queryNorm
                0.2708308 = fieldWeight in 3845, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.0546875 = fieldNorm(doc=3845)
          0.5 = coord(1/2)
      0.6666667 = coord(2/3)
    
    Abstract
    Large scientific collections may be managed as data grids for sharing data, digital libraries for publishing data, persistent archives for preserving data, or as real-time data repositories for sensor data. Despite the multiple types of data management objectives, it is possible to build each system from generic software infrastructure. This entry examines the requirements driving the management of large data collections, the concepts on which current data management systems are based, and the current research initiatives for managing distributed data collections.
    Date
    27. 8.2011 14:22:57
  8. Nirenburg, S.; Raskin, V.: Ontological semantics (2004) 0.10
    0.09571361 = product of:
      0.14357041 = sum of:
        0.07890684 = weight(_text_:data in 1437) [ClassicSimilarity], result of:
          0.07890684 = score(doc=1437,freq=8.0), product of:
            0.16132914 = queryWeight, product of:
              3.1620505 = idf(docFreq=5088, maxDocs=44218)
              0.051020417 = queryNorm
            0.48910472 = fieldWeight in 1437, product of:
              2.828427 = tf(freq=8.0), with freq of:
                8.0 = termFreq=8.0
              3.1620505 = idf(docFreq=5088, maxDocs=44218)
              0.0546875 = fieldNorm(doc=1437)
        0.06466357 = product of:
          0.12932713 = sum of:
            0.12932713 = weight(_text_:processing in 1437) [ClassicSimilarity], result of:
              0.12932713 = score(doc=1437,freq=8.0), product of:
                0.20653816 = queryWeight, product of:
                  4.048147 = idf(docFreq=2097, maxDocs=44218)
                  0.051020417 = queryNorm
                0.6261658 = fieldWeight in 1437, product of:
                  2.828427 = tf(freq=8.0), with freq of:
                    8.0 = termFreq=8.0
                  4.048147 = idf(docFreq=2097, maxDocs=44218)
                  0.0546875 = fieldNorm(doc=1437)
          0.5 = coord(1/2)
      0.6666667 = coord(2/3)
    
    LCSH
    Semantics / Data processing
    Discourse analysis / Data processing
    Subject
    Semantics / Data processing
    Discourse analysis / Data processing
  9. Larkey, L.S.; Connell, M.E.: Structured queries, language modelling, and relevance modelling in cross-language information retrieval (2005) 0.10
    0.09516283 = product of:
      0.14274424 = sum of:
        0.028181016 = weight(_text_:data in 1022) [ClassicSimilarity], result of:
          0.028181016 = score(doc=1022,freq=2.0), product of:
            0.16132914 = queryWeight, product of:
              3.1620505 = idf(docFreq=5088, maxDocs=44218)
              0.051020417 = queryNorm
            0.17468026 = fieldWeight in 1022, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.1620505 = idf(docFreq=5088, maxDocs=44218)
              0.0390625 = fieldNorm(doc=1022)
        0.11456323 = sum of:
          0.080000415 = weight(_text_:processing in 1022) [ClassicSimilarity], result of:
            0.080000415 = score(doc=1022,freq=6.0), product of:
              0.20653816 = queryWeight, product of:
                4.048147 = idf(docFreq=2097, maxDocs=44218)
                0.051020417 = queryNorm
              0.38733965 = fieldWeight in 1022, product of:
                2.4494898 = tf(freq=6.0), with freq of:
                  6.0 = termFreq=6.0
                4.048147 = idf(docFreq=2097, maxDocs=44218)
                0.0390625 = fieldNorm(doc=1022)
          0.03456281 = weight(_text_:22 in 1022) [ClassicSimilarity], result of:
            0.03456281 = score(doc=1022,freq=2.0), product of:
              0.1786648 = queryWeight, product of:
                3.5018296 = idf(docFreq=3622, maxDocs=44218)
                0.051020417 = queryNorm
              0.19345059 = fieldWeight in 1022, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                3.5018296 = idf(docFreq=3622, maxDocs=44218)
                0.0390625 = fieldNorm(doc=1022)
      0.6666667 = coord(2/3)
    
    Abstract
    Two probabilistic approaches to cross-lingual retrieval are in wide use today, those based on probabilistic models of relevance, as exemplified by INQUERY, and those based on language modeling. INQUERY, as a query net model, allows the easy incorporation of query operators, including a synonym operator, which has proven to be extremely useful in cross-language information retrieval (CLIR), in an approach often called structured query translation. In contrast, language models incorporate translation probabilities into a unified framework. We compare the two approaches on Arabic and Spanish data sets, using two kinds of bilingual dictionaries--one derived from a conventional dictionary, and one derived from a parallel corpus. We find that structured query processing gives slightly better results when queries are not expanded. On the other hand, when queries are expanded, language modeling gives better results, but only when using a probabilistic dictionary derived from a parallel corpus. We pursue two additional issues inherent in the comparison of structured query processing with language modeling. The first concerns query expansion, and the second is the role of translation probabilities. We compare conventional expansion techniques (pseudo-relevance feedback) with relevance modeling, a new IR approach which fits into the formal framework of language modeling. We find that relevance modeling and pseudo-relevance feedback achieve comparable levels of retrieval and that good translation probabilities confer a small but significant advantage.
    Date
    26.12.2007 20:22:11
    Source
    Information processing and management. 41(2005) no.3, S.457-474
  10. Kurth, M.; Ruddy, D.; Rupp, N.: Repurposing MARC metadata : using digital project experience to develop a metadata management design (2004) 0.09
    0.08714567 = product of:
      0.1307185 = sum of:
        0.033817217 = weight(_text_:data in 4748) [ClassicSimilarity], result of:
          0.033817217 = score(doc=4748,freq=2.0), product of:
            0.16132914 = queryWeight, product of:
              3.1620505 = idf(docFreq=5088, maxDocs=44218)
              0.051020417 = queryNorm
            0.2096163 = fieldWeight in 4748, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.1620505 = idf(docFreq=5088, maxDocs=44218)
              0.046875 = fieldNorm(doc=4748)
        0.09690128 = sum of:
          0.055425912 = weight(_text_:processing in 4748) [ClassicSimilarity], result of:
            0.055425912 = score(doc=4748,freq=2.0), product of:
              0.20653816 = queryWeight, product of:
                4.048147 = idf(docFreq=2097, maxDocs=44218)
                0.051020417 = queryNorm
              0.26835677 = fieldWeight in 4748, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.048147 = idf(docFreq=2097, maxDocs=44218)
                0.046875 = fieldNorm(doc=4748)
          0.04147537 = weight(_text_:22 in 4748) [ClassicSimilarity], result of:
            0.04147537 = score(doc=4748,freq=2.0), product of:
              0.1786648 = queryWeight, product of:
                3.5018296 = idf(docFreq=3622, maxDocs=44218)
                0.051020417 = queryNorm
              0.23214069 = fieldWeight in 4748, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                3.5018296 = idf(docFreq=3622, maxDocs=44218)
                0.046875 = fieldNorm(doc=4748)
      0.6666667 = coord(2/3)
    
    Abstract
    Metadata and information technology staff in libraries that are building digital collections typically extract and manipulate MARC metadata sets to provide access to digital content via non-MARC schemes. Metadata processing in these libraries involves defining the relationships between metadata schemes, moving metadata between schemes, and coordinating the intellectual activity and physical resources required to create and manipulate metadata. Actively managing the non-MARC metadata resources used to build digital collections is something most of these libraries have only begun to do. This article proposes strategies for managing MARC metadata repurposing efforts as the first step in a coordinated approach to library metadata management. Guided by lessons learned from Cornell University library mapping and transformation activities, the authors apply the literature of data resource management to library metadata management and propose a model for managing MARC metadata repurposing processes through the implementation of a metadata management design.
    Source
    Library hi tech. 22(2004) no.2, S.144-152
  11. Mingers, J.; Burrell, Q.L.: Modeling citation behavior in Management Science journals (2006) 0.09
    0.08714567 = product of:
      0.1307185 = sum of:
        0.033817217 = weight(_text_:data in 994) [ClassicSimilarity], result of:
          0.033817217 = score(doc=994,freq=2.0), product of:
            0.16132914 = queryWeight, product of:
              3.1620505 = idf(docFreq=5088, maxDocs=44218)
              0.051020417 = queryNorm
            0.2096163 = fieldWeight in 994, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.1620505 = idf(docFreq=5088, maxDocs=44218)
              0.046875 = fieldNorm(doc=994)
        0.09690128 = sum of:
          0.055425912 = weight(_text_:processing in 994) [ClassicSimilarity], result of:
            0.055425912 = score(doc=994,freq=2.0), product of:
              0.20653816 = queryWeight, product of:
                4.048147 = idf(docFreq=2097, maxDocs=44218)
                0.051020417 = queryNorm
              0.26835677 = fieldWeight in 994, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.048147 = idf(docFreq=2097, maxDocs=44218)
                0.046875 = fieldNorm(doc=994)
          0.04147537 = weight(_text_:22 in 994) [ClassicSimilarity], result of:
            0.04147537 = score(doc=994,freq=2.0), product of:
              0.1786648 = queryWeight, product of:
                3.5018296 = idf(docFreq=3622, maxDocs=44218)
                0.051020417 = queryNorm
              0.23214069 = fieldWeight in 994, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                3.5018296 = idf(docFreq=3622, maxDocs=44218)
                0.046875 = fieldNorm(doc=994)
      0.6666667 = coord(2/3)
    
    Abstract
    Citation rates are becoming increasingly important in judging the research quality of journals, institutions and departments, and individual faculty. This paper looks at the pattern of citations across different management science journals and over time. A stochastic model is proposed which views the generating mechanism of citations as a gamma mixture of Poisson processes generating overall a negative binomial distribution. This is tested empirically with a large sample of papers published in 1990 from six management science journals and found to fit well. The model is extended to include obsolescence, i.e., that the citation rate for a paper varies over its cited lifetime. This leads to the additional citations distribution which shows that future citations are a linear function of past citations with a time-dependent and decreasing slope. This is also verified empirically in a way that allows different obsolescence functions to be fitted to the data. Conclusions concerning the predictability of future citations, and future research in this area are discussed.
    Date
    26.12.2007 19:22:05
    Source
    Information processing and management. 42(2006) no.6, S.1451-1464
  12. Dobrev, P.; Kalaydjiev, O.; Angelova, G.: From conceptual structures to semantic interoperability of content (2007) 0.09
    0.086374685 = product of:
      0.12956202 = sum of:
        0.04881095 = weight(_text_:data in 4607) [ClassicSimilarity], result of:
          0.04881095 = score(doc=4607,freq=6.0), product of:
            0.16132914 = queryWeight, product of:
              3.1620505 = idf(docFreq=5088, maxDocs=44218)
              0.051020417 = queryNorm
            0.30255508 = fieldWeight in 4607, product of:
              2.4494898 = tf(freq=6.0), with freq of:
                6.0 = termFreq=6.0
              3.1620505 = idf(docFreq=5088, maxDocs=44218)
              0.0390625 = fieldNorm(doc=4607)
        0.08075108 = sum of:
          0.04618826 = weight(_text_:processing in 4607) [ClassicSimilarity], result of:
            0.04618826 = score(doc=4607,freq=2.0), product of:
              0.20653816 = queryWeight, product of:
                4.048147 = idf(docFreq=2097, maxDocs=44218)
                0.051020417 = queryNorm
              0.22363065 = fieldWeight in 4607, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.048147 = idf(docFreq=2097, maxDocs=44218)
                0.0390625 = fieldNorm(doc=4607)
          0.03456281 = weight(_text_:22 in 4607) [ClassicSimilarity], result of:
            0.03456281 = score(doc=4607,freq=2.0), product of:
              0.1786648 = queryWeight, product of:
                3.5018296 = idf(docFreq=3622, maxDocs=44218)
                0.051020417 = queryNorm
              0.19345059 = fieldWeight in 4607, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                3.5018296 = idf(docFreq=3622, maxDocs=44218)
                0.0390625 = fieldNorm(doc=4607)
      0.6666667 = coord(2/3)
    
    Abstract
    Smart applications behave intelligently because they understand at least partially the context where they operate. To do this, they need not only a formal domain model but also formal descriptions of the data they process and their own operational behaviour. Interoperability of smart applications is based on formalised definitions of all their data and processes. This paper studies the semantic interoperability of data in the case of eLearning and describes an experiment and its assessment. New content is imported into a knowledge-based learning environment without real updates of the original domain model, which is encoded as a knowledge base of conceptual graphs. A component called mediator enables the import by assigning dummy metadata annotations for the imported items. However, some functionality of the original system is lost, when processing the imported content, due to the lack of proper metadata annotation which cannot be associated fully automatically. So the paper presents an interoperability scenario when appropriate content items are viewed from the perspective of the original world and can be (partially) reused there.
    Source
    Conceptual structures: knowledge architectures for smart applications: 15th International Conference on Conceptual Structures, ICCS 2007, Sheffield, UK, July 22 - 27, 2007 ; proceedings. Eds.: U. Priss u.a
  13. Dang, X.H.; Ong, K.-L.: Knowledge discovery in data streams (2009) 0.09
    0.08610974 = product of:
      0.1291646 = sum of:
        0.10145165 = weight(_text_:data in 3829) [ClassicSimilarity], result of:
          0.10145165 = score(doc=3829,freq=18.0), product of:
            0.16132914 = queryWeight, product of:
              3.1620505 = idf(docFreq=5088, maxDocs=44218)
              0.051020417 = queryNorm
            0.6288489 = fieldWeight in 3829, product of:
              4.2426405 = tf(freq=18.0), with freq of:
                18.0 = termFreq=18.0
              3.1620505 = idf(docFreq=5088, maxDocs=44218)
              0.046875 = fieldNorm(doc=3829)
        0.027712956 = product of:
          0.055425912 = sum of:
            0.055425912 = weight(_text_:processing in 3829) [ClassicSimilarity], result of:
              0.055425912 = score(doc=3829,freq=2.0), product of:
                0.20653816 = queryWeight, product of:
                  4.048147 = idf(docFreq=2097, maxDocs=44218)
                  0.051020417 = queryNorm
                0.26835677 = fieldWeight in 3829, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  4.048147 = idf(docFreq=2097, maxDocs=44218)
                  0.046875 = fieldNorm(doc=3829)
          0.5 = coord(1/2)
      0.6666667 = coord(2/3)
    
    Abstract
    Knowing what to do with the massive amount of data collected has always been an ongoing issue for many organizations. While data mining has been touted to be the solution, it has failed to deliver the impact despite its successes in many areas. One reason is that data mining algorithms were not designed for the real world, i.e., they usually assume a static view of the data and a stable execution environment where resources are abundant. The reality, however, is that data are constantly changing and the execution environment is dynamic. Hence, it becomes difficult for data mining to truly deliver timely and relevant results. Recently, the processing of stream data has received much attention. What is interesting is that the methodology to design stream-based algorithms may well be the solution to the above problem. In this entry, we discuss this issue and present an overview of recent works.
    Theme
    Data Mining
  14. Jacquemin, C.: Spotting and discovering terms through natural language processing (2001) 0.08
    0.082744 = product of:
      0.124116 = sum of:
        0.06301467 = weight(_text_:data in 119) [ClassicSimilarity], result of:
          0.06301467 = score(doc=119,freq=10.0), product of:
            0.16132914 = queryWeight, product of:
              3.1620505 = idf(docFreq=5088, maxDocs=44218)
              0.051020417 = queryNorm
            0.39059696 = fieldWeight in 119, product of:
              3.1622777 = tf(freq=10.0), with freq of:
                10.0 = termFreq=10.0
              3.1620505 = idf(docFreq=5088, maxDocs=44218)
              0.0390625 = fieldNorm(doc=119)
        0.06110133 = product of:
          0.12220266 = sum of:
            0.12220266 = weight(_text_:processing in 119) [ClassicSimilarity], result of:
              0.12220266 = score(doc=119,freq=14.0), product of:
                0.20653816 = queryWeight, product of:
                  4.048147 = idf(docFreq=2097, maxDocs=44218)
                  0.051020417 = queryNorm
                0.5916711 = fieldWeight in 119, product of:
                  3.7416575 = tf(freq=14.0), with freq of:
                    14.0 = termFreq=14.0
                  4.048147 = idf(docFreq=2097, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=119)
          0.5 = coord(1/2)
      0.6666667 = coord(2/3)
    
    Abstract
    In this book Christian Jacquemin shows how the power of natural language processing (NLP) can be used to advance text indexing and information retrieval (IR). Jacquemin's novel tool is FASTR, a parser that normalizes terms and recognizes term variants. Since there are more meanings in a language than there are words, FASTR uses a metagrammar composed of shallow linguistic transformations that describe the morphological, syntactic, semantic, and pragmatic variations of words and terms. The acquired parsed terms can then be applied for precise retrieval and assembly of information. The use of a corpus-based unification grammar to define, recognize, and combine term variants from their base forms allows for intelligent information access to, or "linguistic data tuning" of, heterogeneous texts. FASTR can be used to do automatic controlled indexing, to carry out content-based Web searches through conceptually related alternative query formulations, to abstract scientific and technical extracts, and even to translate and collect terms from multilingual material. Jacquemin provides a comprehensive account of the method and implementation of this innovative retrieval technique for text processing.
    LCSH
    Language and languages / Variation / Data processing
    Terms and phrases / Data processing
    Subject
    Language and languages / Variation / Data processing
    Terms and phrases / Data processing
  15. Blosser, J.; Michaelson, R.; Routh, R.; Xia, P.: Defining the landscape of Web resources : Concluding Report of the BAER Web Resources Sub-Group (2000) 0.08
    0.08235585 = product of:
      0.12353377 = sum of:
        0.03188318 = weight(_text_:data in 1447) [ClassicSimilarity], result of:
          0.03188318 = score(doc=1447,freq=4.0), product of:
            0.16132914 = queryWeight, product of:
              3.1620505 = idf(docFreq=5088, maxDocs=44218)
              0.051020417 = queryNorm
            0.19762816 = fieldWeight in 1447, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              3.1620505 = idf(docFreq=5088, maxDocs=44218)
              0.03125 = fieldNorm(doc=1447)
        0.09165059 = sum of:
          0.06400034 = weight(_text_:processing in 1447) [ClassicSimilarity], result of:
            0.06400034 = score(doc=1447,freq=6.0), product of:
              0.20653816 = queryWeight, product of:
                4.048147 = idf(docFreq=2097, maxDocs=44218)
                0.051020417 = queryNorm
              0.30987173 = fieldWeight in 1447, product of:
                2.4494898 = tf(freq=6.0), with freq of:
                  6.0 = termFreq=6.0
                4.048147 = idf(docFreq=2097, maxDocs=44218)
                0.03125 = fieldNorm(doc=1447)
          0.027650248 = weight(_text_:22 in 1447) [ClassicSimilarity], result of:
            0.027650248 = score(doc=1447,freq=2.0), product of:
              0.1786648 = queryWeight, product of:
                3.5018296 = idf(docFreq=3622, maxDocs=44218)
                0.051020417 = queryNorm
              0.15476047 = fieldWeight in 1447, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                3.5018296 = idf(docFreq=3622, maxDocs=44218)
                0.03125 = fieldNorm(doc=1447)
      0.6666667 = coord(2/3)
    
    Abstract
    The BAER Web Resources Group was charged in October 1999 with defining and describing the parameters of electronic resources that do not clearly belong to the categories being defined by the BAER Digital Group or the BAER Electronic Journals Group. After some difficulty identifying precisely which resources fell under the Group's charge, we finally named the following types of resources for our consideration: web sites, electronic texts, indexes, databases and abstracts, online reference resources, and networked and non-networked CD-ROMs. Electronic resources are a vast and growing collection that touches nearly every department within the Library. It is unrealistic to think one department can effectively administer all aspects of the collection. The Group then began to focus on the concern of bibliographic access to these varied resources, and to define parameters for handling or processing them within the Library. Some key elements became evident as the work progressed:
      * Selection process of resources to be acquired for the collection
      * Duplication of effort
      * Use of CORC
      * Resource Finder design
      * Maintenance of Resource Finder
      * CD-ROMs not networked
      * Communications
      * Voyager search limitations
    An unexpected collaboration with the Web Development Committee on the Resource Finder helped to steer the Group to more detailed descriptions of bibliographic access. This collaboration included development of data elements for the Resource Finder database, and some discussions on Library staff processing of the resources. The Web Resources Group invited expert testimony to help the Group broaden its view to envision public use of the resources and discuss concerns related to technical services processing. The first testimony came from members of the Resource Finder Committee. Some background information on the Web Development Resource Finder Committee was shared. The second testimony was from librarians who select electronic texts. Three main themes were addressed: accessing CD-ROMs; the issue of including non-networked CD-ROMs in the Resource Finder; and some special concerns about electronic texts. The third testimony came from librarians who select indexes and abstracts and also provide Reference services. Appendices to this report include minutes of the meetings with the experts (Appendix A), a list of proposed data elements to be used in the Resource Finder (Appendix B), and recommendations made to the Resource Finder Committee (Appendix C). Below are summaries of the key elements.
    Date
    21. 4.2002 10:22:31
  16. Decimal Classification Editorial Policy Committee (2002) 0.08
    0.08216565 = product of:
      0.12324847 = sum of:
        0.028181016 = weight(_text_:data in 236) [ClassicSimilarity], result of:
          0.028181016 = score(doc=236,freq=2.0), product of:
            0.16132914 = queryWeight, product of:
              3.1620505 = idf(docFreq=5088, maxDocs=44218)
              0.051020417 = queryNorm
            0.17468026 = fieldWeight in 236, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.1620505 = idf(docFreq=5088, maxDocs=44218)
              0.0390625 = fieldNorm(doc=236)
        0.09506746 = sum of:
          0.04618826 = weight(_text_:processing in 236) [ClassicSimilarity], result of:
            0.04618826 = score(doc=236,freq=2.0), product of:
              0.20653816 = queryWeight, product of:
                4.048147 = idf(docFreq=2097, maxDocs=44218)
                0.051020417 = queryNorm
              0.22363065 = fieldWeight in 236, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.048147 = idf(docFreq=2097, maxDocs=44218)
                0.0390625 = fieldNorm(doc=236)
          0.048879195 = weight(_text_:22 in 236) [ClassicSimilarity], result of:
            0.048879195 = score(doc=236,freq=4.0), product of:
              0.1786648 = queryWeight, product of:
                3.5018296 = idf(docFreq=3622, maxDocs=44218)
                0.051020417 = queryNorm
              0.27358043 = fieldWeight in 236, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                3.5018296 = idf(docFreq=3622, maxDocs=44218)
                0.0390625 = fieldNorm(doc=236)
      0.6666667 = coord(2/3)
    
    Abstract
    The Decimal Classification Editorial Policy Committee (EPC) held its Meeting 117 at the Library Dec. 3-5, 2001, with chair Andrea Stamm (Northwestern University) presiding. Through its actions at this meeting, significant progress was made toward publication of DDC unabridged Edition 22 in mid-2003 and Abridged Edition 14 in early 2004. For Edition 22, the committee approved the revisions to two major segments of the classification: Table 2 through 55 Iran (the first half of the geographic area table) and 900 History and geography. EPC approved updates to several parts of the classification it had already considered: 004-006 Data processing, Computer science; 340 Law; 370 Education; 510 Mathematics; 610 Medicine; Table 3 issues concerning treatment of scientific and technical themes, with folklore, arts, and printing ramifications at 398.2 - 398.3, 704.94, and 758; Table 5 and Table 6 Ethnic Groups and Languages (portions concerning American native peoples and languages); and tourism issues at 647.9 and 790. Reports on the results of testing the approved 200 Religion and 305-306 Social groups schedules were received, as was a progress report on revision work for the manual being done by Ross Trotter (British Library, retired). Revisions for Abridged Edition 14 that received committee approval included 010 Bibliography; 070 Journalism; 150 Psychology; 370 Education; 380 Commerce, communications, and transportation; 621 Applied physics; 624 Civil engineering; and 629.8 Automatic control engineering. At the meeting the committee received print versions of _DC&_ numbers 4 and 5. Primarily for the use of Dewey translators, these cumulations list changes, substantive and cosmetic, to DDC Edition 21 and Abridged Edition 13 for the period October 1999 - December 2001. EPC will hold its Meeting 118 at the Library May 15-17, 2002.
  17. Intelligent information processing and web mining : Proceedings of the International IIS: IIPWM'03 Conference held in Zakopane, Poland, June 2-5, 2003 (2003) 0.08
    0.082040235 = product of:
      0.123060346 = sum of:
        0.06763443 = weight(_text_:data in 4642) [ClassicSimilarity], result of:
          0.06763443 = score(doc=4642,freq=2.0), product of:
            0.16132914 = queryWeight, product of:
              3.1620505 = idf(docFreq=5088, maxDocs=44218)
              0.051020417 = queryNorm
            0.4192326 = fieldWeight in 4642, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.1620505 = idf(docFreq=5088, maxDocs=44218)
              0.09375 = fieldNorm(doc=4642)
        0.055425912 = product of:
          0.110851824 = sum of:
            0.110851824 = weight(_text_:processing in 4642) [ClassicSimilarity], result of:
              0.110851824 = score(doc=4642,freq=2.0), product of:
                0.20653816 = queryWeight, product of:
                  4.048147 = idf(docFreq=2097, maxDocs=44218)
                  0.051020417 = queryNorm
                0.53671354 = fieldWeight in 4642, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  4.048147 = idf(docFreq=2097, maxDocs=44218)
                  0.09375 = fieldNorm(doc=4642)
          0.5 = coord(1/2)
      0.6666667 = coord(2/3)
    
    Theme
    Data Mining
  18. Miller, D.R.; Clarke, K.S.: Putting XML to work in the library : tools for improving access and management (2004) 0.08
    0.082040235 = product of:
      0.123060346 = sum of:
        0.06763443 = weight(_text_:data in 1438) [ClassicSimilarity], result of:
          0.06763443 = score(doc=1438,freq=8.0), product of:
            0.16132914 = queryWeight, product of:
              3.1620505 = idf(docFreq=5088, maxDocs=44218)
              0.051020417 = queryNorm
            0.4192326 = fieldWeight in 1438, product of:
              2.828427 = tf(freq=8.0), with freq of:
                8.0 = termFreq=8.0
              3.1620505 = idf(docFreq=5088, maxDocs=44218)
              0.046875 = fieldNorm(doc=1438)
        0.055425912 = product of:
          0.110851824 = sum of:
            0.110851824 = weight(_text_:processing in 1438) [ClassicSimilarity], result of:
              0.110851824 = score(doc=1438,freq=8.0), product of:
                0.20653816 = queryWeight, product of:
                  4.048147 = idf(docFreq=2097, maxDocs=44218)
                  0.051020417 = queryNorm
                0.53671354 = fieldWeight in 1438, product of:
                  2.828427 = tf(freq=8.0), with freq of:
                    8.0 = termFreq=8.0
                  4.048147 = idf(docFreq=2097, maxDocs=44218)
                  0.046875 = fieldNorm(doc=1438)
          0.5 = coord(1/2)
      0.6666667 = coord(2/3)
    
    LCSH
    Libraries / Data processing
    Cataloging / Data processing
    Subject
    Libraries / Data processing
    Cataloging / Data processing
  19. Wang, H.; Liu, Q.; Penin, T.; Fu, L.; Zhang, L.; Tran, T.; Yu, Y.; Pan, Y.: Semplore: a scalable IR approach to search the Web of Data (2009) 0.08
    0.07812328 = product of:
      0.117184915 = sum of:
        0.08947196 = weight(_text_:data in 1638) [ClassicSimilarity], result of:
          0.08947196 = score(doc=1638,freq=14.0), product of:
            0.16132914 = queryWeight, product of:
              3.1620505 = idf(docFreq=5088, maxDocs=44218)
              0.051020417 = queryNorm
            0.55459267 = fieldWeight in 1638, product of:
              3.7416575 = tf(freq=14.0), with freq of:
                14.0 = termFreq=14.0
              3.1620505 = idf(docFreq=5088, maxDocs=44218)
              0.046875 = fieldNorm(doc=1638)
        0.027712956 = product of:
          0.055425912 = sum of:
            0.055425912 = weight(_text_:processing in 1638) [ClassicSimilarity], result of:
              0.055425912 = score(doc=1638,freq=2.0), product of:
                0.20653816 = queryWeight, product of:
                  4.048147 = idf(docFreq=2097, maxDocs=44218)
                  0.051020417 = queryNorm
                0.26835677 = fieldWeight in 1638, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  4.048147 = idf(docFreq=2097, maxDocs=44218)
                  0.046875 = fieldNorm(doc=1638)
          0.5 = coord(1/2)
      0.6666667 = coord(2/3)
    
    Abstract
    The Web of Data keeps growing rapidly. However, the full exploitation of this large amount of structured data faces numerous challenges like usability, scalability, imprecise information needs and data change. We present Semplore, an IR-based system that aims at addressing these issues. Semplore supports intuitive faceted search and complex queries both on text and structured data. It combines imprecise keyword search and precise structured query in a unified ranking scheme. Scalable query processing is supported by leveraging inverted indexes traditionally used in IR systems. This is combined with a novel block-based index structure to support efficient index update when data changes. The experimental results show that Semplore is an efficient and effective system for searching the Web of Data and can be used as a basic infrastructure for Web-scale Semantic Web search engines.
  20. Losee, R.M.: Browsing mixed structured and unstructured data (2006) 0.08
    0.07812328 = product of:
      0.117184915 = sum of:
        0.08947196 = weight(_text_:data in 173) [ClassicSimilarity], result of:
          0.08947196 = score(doc=173,freq=14.0), product of:
            0.16132914 = queryWeight, product of:
              3.1620505 = idf(docFreq=5088, maxDocs=44218)
              0.051020417 = queryNorm
            0.55459267 = fieldWeight in 173, product of:
              3.7416575 = tf(freq=14.0), with freq of:
                14.0 = termFreq=14.0
              3.1620505 = idf(docFreq=5088, maxDocs=44218)
              0.046875 = fieldNorm(doc=173)
        0.027712956 = product of:
          0.055425912 = sum of:
            0.055425912 = weight(_text_:processing in 173) [ClassicSimilarity], result of:
              0.055425912 = score(doc=173,freq=2.0), product of:
                0.20653816 = queryWeight, product of:
                  4.048147 = idf(docFreq=2097, maxDocs=44218)
                  0.051020417 = queryNorm
                0.26835677 = fieldWeight in 173, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  4.048147 = idf(docFreq=2097, maxDocs=44218)
                  0.046875 = fieldNorm(doc=173)
          0.5 = coord(1/2)
      0.6666667 = coord(2/3)
    
    Abstract
    Both structured and unstructured data, as well as structured data representing several different types of tuples, may be integrated into a single list for browsing or retrieval. Data may be arranged in the Gray code order of the features and metadata, producing optimal ordering for browsing. We provide several metrics for evaluating the performance of systems supporting browsing, given some constraints. Metadata and indexing terms are used for sorting keys and attributes for structured data, as well as for semi-structured or unstructured documents, images, media, etc. Economic and information theoretic models are suggested that enable the ordering to adapt to user preferences. Different relational structures and unstructured data may be integrated into a single, optimal ordering for browsing or for displaying tables in digital libraries, database management systems, or information retrieval systems. Adaptive displays of data are discussed.
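    As an aside, the Gray-code ordering mentioned in this abstract can be illustrated with a short sketch. This is an illustrative reconstruction of the general technique under assumed binary features (the record names and feature bits below are made up), not the paper's implementation: records are sorted by the rank of their feature bit-vector in reflected-binary Gray code order, so that adjacent codes in the full sequence differ in exactly one feature.

      def gray_rank(bits: int) -> int:
          # Rank of a bit-vector interpreted as a reflected-binary Gray code.
          rank = 0
          while bits:
              rank ^= bits
              bits >>= 1
          return rank

      # Hypothetical records with three binary features (e.g. has_abstract, has_source, has_date).
      records = [("doc A", 0b101), ("doc B", 0b001), ("doc C", 0b111), ("doc D", 0b100)]
      for name, features in sorted(records, key=lambda r: gray_rank(r[1])):
          print(name, format(features, "03b"))
      # Output order: doc B (001), doc C (111), doc A (101), doc D (100).
      # In the full Gray sequence adjacent codes differ in one bit; sorting a
      # subset by Gray rank keeps similar feature vectors near each other.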
    Source
    Information processing and management. 42(2006) no.2, S.440-452

Types

  • a 2550
  • m 294
  • el 167
  • s 115
  • b 26
  • x 24
  • i 9
  • n 7
  • r 7
