Search (45 results, page 1 of 3)

  • × theme_ss:"Automatisches Indexieren"
  • × type_ss:"a"
  • × year_i:[2010 TO 2020}
  1. Greiner-Petter, A.; Schubotz, M.; Cohl, H.S.; Gipp, B.: Semantic preserving bijective mappings for expressions involving special functions between computer algebra systems and document preparation systems (2019) 0.03
    0.026805215 = product of:
      0.05361043 = sum of:
        0.0091654705 = weight(_text_:information in 5499) [ClassicSimilarity], result of:
          0.0091654705 = score(doc=5499,freq=4.0), product of:
            0.083537094 = queryWeight, product of:
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.047586527 = queryNorm
            0.10971737 = fieldWeight in 5499, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.03125 = fieldNorm(doc=5499)
        0.04444496 = sum of:
          0.01865569 = weight(_text_:technology in 5499) [ClassicSimilarity], result of:
            0.01865569 = score(doc=5499,freq=2.0), product of:
              0.1417311 = queryWeight, product of:
                2.978387 = idf(docFreq=6114, maxDocs=44218)
                0.047586527 = queryNorm
              0.13162735 = fieldWeight in 5499, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                2.978387 = idf(docFreq=6114, maxDocs=44218)
                0.03125 = fieldNorm(doc=5499)
          0.02578927 = weight(_text_:22 in 5499) [ClassicSimilarity], result of:
            0.02578927 = score(doc=5499,freq=2.0), product of:
              0.16663991 = queryWeight, product of:
                3.5018296 = idf(docFreq=3622, maxDocs=44218)
                0.047586527 = queryNorm
              0.15476047 = fieldWeight in 5499, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                3.5018296 = idf(docFreq=3622, maxDocs=44218)
                0.03125 = fieldNorm(doc=5499)
      0.5 = coord(2/4)
    
    Abstract
    Purpose Modern mathematicians and scientists of math-related disciplines often use Document Preparation Systems (DPS) to write and Computer Algebra Systems (CAS) to calculate mathematical expressions. Usually, they translate the expressions manually between DPS and CAS. This process is time-consuming and error-prone. The purpose of this paper is to automate this translation. This paper uses Maple and Mathematica as the CAS, and LaTeX as the DPS. Design/methodology/approach Bruce Miller at the National Institute of Standards and Technology (NIST) developed a collection of special LaTeX macros that create links from mathematical symbols to their definitions in the NIST Digital Library of Mathematical Functions (DLMF). The authors are using these macros to perform rule-based translations between the formulae in the DLMF and CAS. Moreover, the authors develop software to ease the creation of new rules and to discover inconsistencies. Findings The authors created 396 mappings and translated 58.8 percent of DLMF formulae (2,405 expressions) successfully between Maple and DLMF. For a significant percentage, the special function definitions in Maple and the DLMF were different. An atomic symbol in one system maps to a composite expression in the other system. The translator was also successfully used for automatic verification of mathematical online compendia and CAS. The evaluation techniques discovered two errors in the DLMF and one defect in Maple. Originality/value This paper introduces the first translation tool for special functions between LaTeX and CAS. The approach improves error-prone manual translations and can be used to verify mathematical online compendia and CAS.
    Date
    20. 1.2015 18:30:22
    Footnote
    Beitrag in einem Special Issue: Information Science in the German-speaking Countries.
    Source
    Aslib journal of information management. 71(2019) no.3, S.415-439
  2. Fauzi, F.; Belkhatir, M.: Multifaceted conceptual image indexing on the world wide web (2013) 0.02
    0.017864795 = product of:
      0.03572959 = sum of:
        0.021737823 = weight(_text_:information in 2721) [ClassicSimilarity], result of:
          0.021737823 = score(doc=2721,freq=10.0), product of:
            0.083537094 = queryWeight, product of:
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.047586527 = queryNorm
            0.2602176 = fieldWeight in 2721, product of:
              3.1622777 = tf(freq=10.0), with freq of:
                10.0 = termFreq=10.0
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.046875 = fieldNorm(doc=2721)
        0.013991767 = product of:
          0.027983533 = sum of:
            0.027983533 = weight(_text_:technology in 2721) [ClassicSimilarity], result of:
              0.027983533 = score(doc=2721,freq=2.0), product of:
                0.1417311 = queryWeight, product of:
                  2.978387 = idf(docFreq=6114, maxDocs=44218)
                  0.047586527 = queryNorm
                0.19744103 = fieldWeight in 2721, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  2.978387 = idf(docFreq=6114, maxDocs=44218)
                  0.046875 = fieldNorm(doc=2721)
          0.5 = coord(1/2)
      0.5 = coord(2/4)
    
    Abstract
    In this paper, we describe a user-centered design of an automated multifaceted concept-based indexing framework which analyzes the semantics of the Web image contextual information and classifies it into five broad semantic concept facets: signal, object, abstract, scene, and relational; and identifies the semantic relationships between the concepts. An important aspect of our indexing model is that it relates to the users' levels of image descriptions. Also, a major contribution relies on the fact that the classification is performed automatically with the raw image contextual information extracted from any general webpage and is not solely based on image tags like state-of-the-art solutions. Human Language Technology techniques and an external knowledge base are used to analyze the information both syntactically and semantically. Experimental results on a human-annotated Web image collection and corresponding contextual information indicate that our method outperforms empirical frameworks employing tf-idf and location-based tf-idf weighting schemes as well as n-gram indexing in a recall/precision based evaluation framework.
    Source
    Information processing and management. 49(2013) no.2, S.420-440
  3. Lu, K.; Mao, J.; Li, G.: Toward effective automated weighted subject indexing : a comparison of different approaches in different environments (2018) 0.02
    0.016345935 = product of:
      0.03269187 = sum of:
        0.016202414 = weight(_text_:information in 4292) [ClassicSimilarity], result of:
          0.016202414 = score(doc=4292,freq=8.0), product of:
            0.083537094 = queryWeight, product of:
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.047586527 = queryNorm
            0.19395474 = fieldWeight in 4292, product of:
              2.828427 = tf(freq=8.0), with freq of:
                8.0 = termFreq=8.0
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.0390625 = fieldNorm(doc=4292)
        0.016489455 = product of:
          0.03297891 = sum of:
            0.03297891 = weight(_text_:technology in 4292) [ClassicSimilarity], result of:
              0.03297891 = score(doc=4292,freq=4.0), product of:
                0.1417311 = queryWeight, product of:
                  2.978387 = idf(docFreq=6114, maxDocs=44218)
                  0.047586527 = queryNorm
                0.23268649 = fieldWeight in 4292, product of:
                  2.0 = tf(freq=4.0), with freq of:
                    4.0 = termFreq=4.0
                  2.978387 = idf(docFreq=6114, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=4292)
          0.5 = coord(1/2)
      0.5 = coord(2/4)
    
    Abstract
    Subject indexing plays an important role in supporting subject access to information resources. Current subject indexing systems do not make adequate distinctions on the importance of assigned subject descriptors. Assigning numeric weights to subject descriptors to distinguish their importance to the documents can strengthen the role of subject metadata. Automated methods are more cost-effective. This study compares different automated weighting methods in different environments. Two evaluation methods were used to assess the performance. Experiments on three datasets in the biomedical domain suggest the performance of different weighting methods depends on whether it is an abstract or full text environment. Mutual information with bag-of-words representation shows the best average performance in the full text environment, while cosine with bag-of-words representation is the best in an abstract environment. The cosine measure has relatively consistent and robust performance. A direct weighting method, IDF (Inverse Document Frequency), can produce quick and reasonable estimates of the weights. Bag-of-words representation generally outperforms the concept-based representation. Further improvement in performance can be obtained by using the learning-to-rank method to integrate different weighting methods. This study follows up Lu and Mao (Journal of the Association for Information Science and Technology, 66, 1776-1784, 2015), in which an automated weighted subject indexing method was proposed and validated. The findings from this study contribute to more effective weighted subject indexing.
    Source
    Journal of the Association for Information Science and Technology. 69(2018) no.1, S.121-133
  4. Williams, R.V.: Hans Peter Luhn and Herbert M. Ohlman : their roles in the origins of keyword-in-context/permutation automatic indexing (2010) 0.02
    0.015808811 = product of:
      0.031617623 = sum of:
        0.012961932 = weight(_text_:information in 3440) [ClassicSimilarity], result of:
          0.012961932 = score(doc=3440,freq=2.0), product of:
            0.083537094 = queryWeight, product of:
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.047586527 = queryNorm
            0.1551638 = fieldWeight in 3440, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.0625 = fieldNorm(doc=3440)
        0.01865569 = product of:
          0.03731138 = sum of:
            0.03731138 = weight(_text_:technology in 3440) [ClassicSimilarity], result of:
              0.03731138 = score(doc=3440,freq=2.0), product of:
                0.1417311 = queryWeight, product of:
                  2.978387 = idf(docFreq=6114, maxDocs=44218)
                  0.047586527 = queryNorm
                0.2632547 = fieldWeight in 3440, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  2.978387 = idf(docFreq=6114, maxDocs=44218)
                  0.0625 = fieldNorm(doc=3440)
          0.5 = coord(1/2)
      0.5 = coord(2/4)
    
    Source
    Journal of the American Society for Information Science and Technology. 61(2010) no.4, S.835-849
  5. Vlachidis, A.; Tudhope, D.: ¬A knowledge-based approach to information extraction for semantic interoperability in the archaeology domain (2016) 0.01
    0.01393111 = product of:
      0.02786222 = sum of:
        0.016202414 = weight(_text_:information in 2895) [ClassicSimilarity], result of:
          0.016202414 = score(doc=2895,freq=8.0), product of:
            0.083537094 = queryWeight, product of:
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.047586527 = queryNorm
            0.19395474 = fieldWeight in 2895, product of:
              2.828427 = tf(freq=8.0), with freq of:
                8.0 = termFreq=8.0
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.0390625 = fieldNorm(doc=2895)
        0.011659805 = product of:
          0.02331961 = sum of:
            0.02331961 = weight(_text_:technology in 2895) [ClassicSimilarity], result of:
              0.02331961 = score(doc=2895,freq=2.0), product of:
                0.1417311 = queryWeight, product of:
                  2.978387 = idf(docFreq=6114, maxDocs=44218)
                  0.047586527 = queryNorm
                0.16453418 = fieldWeight in 2895, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  2.978387 = idf(docFreq=6114, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=2895)
          0.5 = coord(1/2)
      0.5 = coord(2/4)
    
    Abstract
    The article presents a method for automatic semantic indexing of archaeological grey-literature reports using empirical (rule-based) Information Extraction techniques in combination with domain-specific knowledge organization systems. The semantic annotation system (OPTIMA) performs the tasks of Named Entity Recognition, Relation Extraction, Negation Detection, and Word-Sense Disambiguation using hand-crafted rules and terminological resources for associating contextual abstractions with classes of the standard ontology CIDOC Conceptual Reference Model (CRM) for cultural heritage and its archaeological extension, CRM-EH. Relation Extraction (RE) performance benefits from a syntactic-based definition of RE patterns derived from domain oriented corpus analysis. The evaluation also shows clear benefit in the use of assistive natural language processing (NLP) modules relating to Word-Sense Disambiguation, Negation Detection, and Noun Phrase Validation, together with controlled thesaurus expansion. The semantic indexing results demonstrate the capacity of rule-based Information Extraction techniques to deliver interoperable semantic abstractions (semantic annotations) with respect to the CIDOC CRM and archaeological thesauri. Major contributions include recognition of relevant entities using shallow parsing NLP techniques driven by a complimentary use of ontological and terminological domain resources and empirical derivation of context-driven RE rules for the recognition of semantic relationships from phrases of unstructured text.
    Source
    Journal of the Association for Information Science and Technology. 67(2016) no.5, S.1138-1152
  6. Golub, K.; Soergel, D.; Buchanan, G.; Tudhope, D.; Lykke, M.; Hiom, D.: ¬A framework for evaluating automatic indexing or classification in the context of retrieval (2016) 0.01
    0.012845755 = product of:
      0.02569151 = sum of:
        0.0140317045 = weight(_text_:information in 3311) [ClassicSimilarity], result of:
          0.0140317045 = score(doc=3311,freq=6.0), product of:
            0.083537094 = queryWeight, product of:
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.047586527 = queryNorm
            0.16796975 = fieldWeight in 3311, product of:
              2.4494898 = tf(freq=6.0), with freq of:
                6.0 = termFreq=6.0
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.0390625 = fieldNorm(doc=3311)
        0.011659805 = product of:
          0.02331961 = sum of:
            0.02331961 = weight(_text_:technology in 3311) [ClassicSimilarity], result of:
              0.02331961 = score(doc=3311,freq=2.0), product of:
                0.1417311 = queryWeight, product of:
                  2.978387 = idf(docFreq=6114, maxDocs=44218)
                  0.047586527 = queryNorm
                0.16453418 = fieldWeight in 3311, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  2.978387 = idf(docFreq=6114, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=3311)
          0.5 = coord(1/2)
      0.5 = coord(2/4)
    
    Abstract
    Tools for automatic subject assignment help deal with scale and sustainability in creating and enriching metadata, establishing more connections across and between resources and enhancing consistency. Although some software vendors and experimental researchers claim the tools can replace manual subject indexing, hard scientific evidence of their performance in operating information environments is scarce. A major reason for this is that research is usually conducted in laboratory conditions, excluding the complexities of real-life systems and situations. The article reviews and discusses issues with existing evaluation approaches such as problems of aboutness and relevance assessments, implying the need to use more than a single "gold standard" method when evaluating indexing and retrieval, and proposes a comprehensive evaluation framework. The framework is informed by a systematic review of the literature on evaluation approaches: evaluating indexing quality directly through assessment by an evaluator or through comparison with a gold standard, evaluating the quality of computer-assisted indexing directly in the context of an indexing workflow, and evaluating indexing quality indirectly through analyzing retrieval performance.
    Series
    Advances in information science
    Source
    Journal of the Association for Information Science and Technology. 67(2016) no.1, S.3-16
  7. Cui, H.; Boufford, D.; Selden, P.: Semantic annotation of biosystematics literature without training examples (2010) 0.01
    0.011856608 = product of:
      0.023713216 = sum of:
        0.00972145 = weight(_text_:information in 3422) [ClassicSimilarity], result of:
          0.00972145 = score(doc=3422,freq=2.0), product of:
            0.083537094 = queryWeight, product of:
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.047586527 = queryNorm
            0.116372846 = fieldWeight in 3422, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.046875 = fieldNorm(doc=3422)
        0.013991767 = product of:
          0.027983533 = sum of:
            0.027983533 = weight(_text_:technology in 3422) [ClassicSimilarity], result of:
              0.027983533 = score(doc=3422,freq=2.0), product of:
                0.1417311 = queryWeight, product of:
                  2.978387 = idf(docFreq=6114, maxDocs=44218)
                  0.047586527 = queryNorm
                0.19744103 = fieldWeight in 3422, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  2.978387 = idf(docFreq=6114, maxDocs=44218)
                  0.046875 = fieldNorm(doc=3422)
          0.5 = coord(1/2)
      0.5 = coord(2/4)
    
    Source
    Journal of the American Society for Information Science and Technology. 61(2010) no.3, S.522-542
  8. Zhitomirsky-Geffet, M.; Prebor, G.; Bloch, O.: Improving proverb search and retrieval with a generic multidimensional ontology (2017) 0.01
    0.011856608 = product of:
      0.023713216 = sum of:
        0.00972145 = weight(_text_:information in 3320) [ClassicSimilarity], result of:
          0.00972145 = score(doc=3320,freq=2.0), product of:
            0.083537094 = queryWeight, product of:
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.047586527 = queryNorm
            0.116372846 = fieldWeight in 3320, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.046875 = fieldNorm(doc=3320)
        0.013991767 = product of:
          0.027983533 = sum of:
            0.027983533 = weight(_text_:technology in 3320) [ClassicSimilarity], result of:
              0.027983533 = score(doc=3320,freq=2.0), product of:
                0.1417311 = queryWeight, product of:
                  2.978387 = idf(docFreq=6114, maxDocs=44218)
                  0.047586527 = queryNorm
                0.19744103 = fieldWeight in 3320, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  2.978387 = idf(docFreq=6114, maxDocs=44218)
                  0.046875 = fieldNorm(doc=3320)
          0.5 = coord(1/2)
      0.5 = coord(2/4)
    
    Source
    Journal of the Association for Information Science and Technology. 68(2017) no.1, S.141-153
  9. Beckmann, R.; Hinrichs, I.; Janßen, M.; Milmeister, G.; Schäuble, P.: ¬Der Digitale Assistent DA-3 : Eine Plattform für die Inhaltserschließung (2019) 0.01
    0.011856608 = product of:
      0.023713216 = sum of:
        0.00972145 = weight(_text_:information in 5408) [ClassicSimilarity], result of:
          0.00972145 = score(doc=5408,freq=2.0), product of:
            0.083537094 = queryWeight, product of:
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.047586527 = queryNorm
            0.116372846 = fieldWeight in 5408, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.046875 = fieldNorm(doc=5408)
        0.013991767 = product of:
          0.027983533 = sum of:
            0.027983533 = weight(_text_:technology in 5408) [ClassicSimilarity], result of:
              0.027983533 = score(doc=5408,freq=2.0), product of:
                0.1417311 = queryWeight, product of:
                  2.978387 = idf(docFreq=6114, maxDocs=44218)
                  0.047586527 = queryNorm
                0.19744103 = fieldWeight in 5408, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  2.978387 = idf(docFreq=6114, maxDocs=44218)
                  0.046875 = fieldNorm(doc=5408)
          0.5 = coord(1/2)
      0.5 = coord(2/4)
    
    Abstract
    Der "Digitale Assistent" DA-3 ist ein webbasiertes Tool zur maschinellen Unterstützung der intellektuellen verbalen und klassifikatorischen Inhaltserschließung. Im Frühjahr 2016 wurde einer breiteren Fachöffentlichkeit die zunächst für den Einsatz im IBS|BW-Konsortium konzipierte Vorgängerversion DA-2 vorgestellt. Die Community nahm die Entwicklung vor dem Hintergrund der strategischen Diskussionen um zukunftsfähige Verfahren der Inhaltserschließung mit großem Interesse auf. Inzwischen wird das Tool in einem auf drei Jahre angelegten Kooperationsprojekt zwischen der Firma Eurospider Information Technology, dem IBS|BW-Konsortium, der Staatsbibliothek zu Berlin und den beiden Verbundzentralen VZG und BSZ zu einem zentralen und leistungsstarken Service weiterentwickelt. Die ersten Anwenderbibliotheken in SWB und GBV setzen den DA-3 während dieser Projektphase bereits erfolgreich ein, am Ende ist die Überführung in den Routinebetrieb vorgesehen. Der Beitrag beschreibt den derzeitigen Stand und Nutzen des Projekts im Kontext der aktuellen Rahmenbedingungen, stellt ausführlich die Funktionalitäten des DA-3 vor, gibt einen kleinen Einblick hinter die Kulissen der Projektpartner und einen Ausblick auf kommende Entwicklungsschritte.
  10. Vilares, D.; Alonso, M.A.; Gómez-Rodríguez, C.: On the usefulness of lexical and syntactic processing in polarity classification of Twitter messages (2015) 0.01
    0.011558321 = product of:
      0.023116643 = sum of:
        0.011456838 = weight(_text_:information in 2161) [ClassicSimilarity], result of:
          0.011456838 = score(doc=2161,freq=4.0), product of:
            0.083537094 = queryWeight, product of:
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.047586527 = queryNorm
            0.13714671 = fieldWeight in 2161, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.0390625 = fieldNorm(doc=2161)
        0.011659805 = product of:
          0.02331961 = sum of:
            0.02331961 = weight(_text_:technology in 2161) [ClassicSimilarity], result of:
              0.02331961 = score(doc=2161,freq=2.0), product of:
                0.1417311 = queryWeight, product of:
                  2.978387 = idf(docFreq=6114, maxDocs=44218)
                  0.047586527 = queryNorm
                0.16453418 = fieldWeight in 2161, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  2.978387 = idf(docFreq=6114, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=2161)
          0.5 = coord(1/2)
      0.5 = coord(2/4)
    
    Abstract
    Millions of micro texts are published every day on Twitter. Identifying the sentiment present in them can be helpful for measuring the frame of mind of the public, their satisfaction with respect to a product, or their support of a social event. In this context, polarity classification is a subfield of sentiment analysis focused on determining whether the content of a text is objective or subjective, and in the latter case, if it conveys a positive or a negative opinion. Most polarity detection techniques tend to take into account individual terms in the text and even some degree of linguistic knowledge, but they do not usually consider syntactic relations between words. This article explores how relating lexical, syntactic, and psychometric information can be helpful to perform polarity classification on Spanish tweets. We provide an evaluation for both shallow and deep linguistic perspectives. Empirical results show an improved performance of syntactic approaches over pure lexical models when using large training sets to create a classifier, but this tendency is reversed when small training collections are used.
    Source
    Journal of the Association for Information Science and Technology. 66(2015) no.9, S.1799-1816
  11. Kanan, T.; Fox, E.A.: Automated arabic text classification with P-Stemmer, machine learning, and a tailored news article taxonomy (2016) 0.01
    0.011558321 = product of:
      0.023116643 = sum of:
        0.011456838 = weight(_text_:information in 3151) [ClassicSimilarity], result of:
          0.011456838 = score(doc=3151,freq=4.0), product of:
            0.083537094 = queryWeight, product of:
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.047586527 = queryNorm
            0.13714671 = fieldWeight in 3151, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.0390625 = fieldNorm(doc=3151)
        0.011659805 = product of:
          0.02331961 = sum of:
            0.02331961 = weight(_text_:technology in 3151) [ClassicSimilarity], result of:
              0.02331961 = score(doc=3151,freq=2.0), product of:
                0.1417311 = queryWeight, product of:
                  2.978387 = idf(docFreq=6114, maxDocs=44218)
                  0.047586527 = queryNorm
                0.16453418 = fieldWeight in 3151, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  2.978387 = idf(docFreq=6114, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=3151)
          0.5 = coord(1/2)
      0.5 = coord(2/4)
    
    Abstract
    Arabic news articles in electronic collections are difficult to study. Browsing by category is rarely supported. Although helpful machine-learning methods have been applied successfully to similar situations for English news articles, limited research has been completed to yield suitable solutions for Arabic news. In connection with a Qatar National Research Fund (QNRF)-funded project to build digital library community and infrastructure in Qatar, we developed software for browsing a collection of about 237,000 Arabic news articles, which should be applicable to other Arabic news collections. We designed a simple taxonomy for Arabic news stories that is suitable for the needs of Qatar and other nations, is compatible with the subject codes of the International Press Telecommunications Council, and was enhanced with the aid of a librarian expert as well as five Arabic-speaking volunteers. We developed tailored stemming (i.e., a new Arabic light stemmer called P-Stemmer) and automatic classification methods (the best being binary Support Vector Machines classifiers) to work with the taxonomy. Using evaluation techniques commonly used in the information retrieval community, including 10-fold cross-validation and the Wilcoxon signed-rank test, we showed that our approach to stemming and classification is superior to state-of-the-art techniques.
    Source
    Journal of the Association for Information Science and Technology. 67(2016) no.11, S.2667-2683
  12. Lu, K.; Mao, J.: ¬An automatic approach to weighted subject indexing : an empirical study in the biomedical domain (2015) 0.01
    0.011558321 = product of:
      0.023116643 = sum of:
        0.011456838 = weight(_text_:information in 4005) [ClassicSimilarity], result of:
          0.011456838 = score(doc=4005,freq=4.0), product of:
            0.083537094 = queryWeight, product of:
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.047586527 = queryNorm
            0.13714671 = fieldWeight in 4005, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.0390625 = fieldNorm(doc=4005)
        0.011659805 = product of:
          0.02331961 = sum of:
            0.02331961 = weight(_text_:technology in 4005) [ClassicSimilarity], result of:
              0.02331961 = score(doc=4005,freq=2.0), product of:
                0.1417311 = queryWeight, product of:
                  2.978387 = idf(docFreq=6114, maxDocs=44218)
                  0.047586527 = queryNorm
                0.16453418 = fieldWeight in 4005, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  2.978387 = idf(docFreq=6114, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=4005)
          0.5 = coord(1/2)
      0.5 = coord(2/4)
    
    Abstract
    Subject indexing is an intellectually intensive process that has many inherent uncertainties. Existing manual subject indexing systems generally produce binary outcomes for whether or not to assign an indexing term. This does not sufficiently reflect the extent to which the indexing terms are associated with the documents. On the other hand, the idea of probabilistic or weighted indexing was proposed a long time ago and has seen success in capturing uncertainties in the automatic indexing process. One hurdle to overcome in implementing weighted indexing in manual subject indexing systems is the practical burden that could be added to the already intensive indexing process. This study proposes a method to infer automatically the associations between subject terms and documents through text mining. By uncovering the connections between MeSH descriptors and document text, we are able to derive the weights of MeSH descriptors manually assigned to documents. Our initial results suggest that the inference method is feasible and promising. The study has practical implications for improving subject indexing practice and providing better support for information retrieval.
    Source
    Journal of the Association for Information Science and Technology. 66(2015) no.9, S.1776-1784
  13. Martins, E.F.; Belém, F.M.; Almeida, J.M.; Gonçalves, M.A.: On cold start for associative tag recommendation (2016) 0.01
    0.0098805055 = product of:
      0.019761011 = sum of:
        0.008101207 = weight(_text_:information in 2494) [ClassicSimilarity], result of:
          0.008101207 = score(doc=2494,freq=2.0), product of:
            0.083537094 = queryWeight, product of:
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.047586527 = queryNorm
            0.09697737 = fieldWeight in 2494, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.0390625 = fieldNorm(doc=2494)
        0.011659805 = product of:
          0.02331961 = sum of:
            0.02331961 = weight(_text_:technology in 2494) [ClassicSimilarity], result of:
              0.02331961 = score(doc=2494,freq=2.0), product of:
                0.1417311 = queryWeight, product of:
                  2.978387 = idf(docFreq=6114, maxDocs=44218)
                  0.047586527 = queryNorm
                0.16453418 = fieldWeight in 2494, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  2.978387 = idf(docFreq=6114, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=2494)
          0.5 = coord(1/2)
      0.5 = coord(2/4)
    
    Source
    Journal of the Association for Information Science and Technology. 67(2016) no.1, S.83-105
  14. Blank, I.; Rokach, L.; Shani, G.: Leveraging metadata to recommend keywords for academic papers (2016) 0.01
    0.0098805055 = product of:
      0.019761011 = sum of:
        0.008101207 = weight(_text_:information in 3232) [ClassicSimilarity], result of:
          0.008101207 = score(doc=3232,freq=2.0), product of:
            0.083537094 = queryWeight, product of:
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.047586527 = queryNorm
            0.09697737 = fieldWeight in 3232, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.0390625 = fieldNorm(doc=3232)
        0.011659805 = product of:
          0.02331961 = sum of:
            0.02331961 = weight(_text_:technology in 3232) [ClassicSimilarity], result of:
              0.02331961 = score(doc=3232,freq=2.0), product of:
                0.1417311 = queryWeight, product of:
                  2.978387 = idf(docFreq=6114, maxDocs=44218)
                  0.047586527 = queryNorm
                0.16453418 = fieldWeight in 3232, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  2.978387 = idf(docFreq=6114, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=3232)
          0.5 = coord(1/2)
      0.5 = coord(2/4)
    
    Source
    Journal of the Association for Information Science and Technology. 67(2016) no.12, S.3073-3091
  15. Martins, A.L.; Souza, R.R.; Ribeiro de Mello, H.: ¬The use of noun phrases in information retrieval : proposing a mechanism for automatic classification (2014) 0.01
    0.009687801 = product of:
      0.019375602 = sum of:
        0.006480966 = weight(_text_:information in 1441) [ClassicSimilarity], result of:
          0.006480966 = score(doc=1441,freq=2.0), product of:
            0.083537094 = queryWeight, product of:
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.047586527 = queryNorm
            0.0775819 = fieldWeight in 1441, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.03125 = fieldNorm(doc=1441)
        0.012894635 = product of:
          0.02578927 = sum of:
            0.02578927 = weight(_text_:22 in 1441) [ClassicSimilarity], result of:
              0.02578927 = score(doc=1441,freq=2.0), product of:
                0.16663991 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.047586527 = queryNorm
                0.15476047 = fieldWeight in 1441, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.03125 = fieldNorm(doc=1441)
          0.5 = coord(1/2)
      0.5 = coord(2/4)
    
    Source
    Knowledge organization in the 21st century: between historical patterns and future prospects. Proceedings of the Thirteenth International ISKO Conference 19-22 May 2014, Kraków, Poland. Ed.: Wieslaw Babik
  16. Hauer, M.: Tiefenindexierung im Bibliothekskatalog : 17 Jahre intelligentCAPTURE (2019) 0.01
    0.009670976 = product of:
      0.038683902 = sum of:
        0.038683902 = product of:
          0.077367805 = sum of:
            0.077367805 = weight(_text_:22 in 5629) [ClassicSimilarity], result of:
              0.077367805 = score(doc=5629,freq=2.0), product of:
                0.16663991 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.047586527 = queryNorm
                0.46428138 = fieldWeight in 5629, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.09375 = fieldNorm(doc=5629)
          0.5 = coord(1/2)
      0.25 = coord(1/4)
    
    Source
    B.I.T.online. 22(2019) H.2, S.163-166
  17. Kajanan, S.; Bao, Y.; Datta, A.; VanderMeer, D.; Dutta, K.: Efficient automatic search query formulation using phrase-level analysis (2014) 0.01
    0.009246658 = product of:
      0.018493315 = sum of:
        0.0091654705 = weight(_text_:information in 1264) [ClassicSimilarity], result of:
          0.0091654705 = score(doc=1264,freq=4.0), product of:
            0.083537094 = queryWeight, product of:
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.047586527 = queryNorm
            0.10971737 = fieldWeight in 1264, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.03125 = fieldNorm(doc=1264)
        0.009327845 = product of:
          0.01865569 = sum of:
            0.01865569 = weight(_text_:technology in 1264) [ClassicSimilarity], result of:
              0.01865569 = score(doc=1264,freq=2.0), product of:
                0.1417311 = queryWeight, product of:
                  2.978387 = idf(docFreq=6114, maxDocs=44218)
                  0.047586527 = queryNorm
                0.13162735 = fieldWeight in 1264, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  2.978387 = idf(docFreq=6114, maxDocs=44218)
                  0.03125 = fieldNorm(doc=1264)
          0.5 = coord(1/2)
      0.5 = coord(2/4)
    
    Abstract
    Over the past decade, the volume of information available digitally over the Internet has grown enormously. Technical developments in the area of search, such as Google's Page Rank algorithm, have proved so good at serving relevant results that Internet search has become integrated into daily human activity. One can endlessly explore topics of interest simply by querying and reading through the resulting links. Yet, although search engines are well known for providing relevant results based on users' queries, users do not always receive the results they are looking for. Google's Director of Research describes clickstream evidence of frustrated users repeatedly reformulating queries and searching through page after page of results. Given the general quality of search engine results, one must consider the possibility that the frustrated user's query is not effective; that is, it does not describe the essence of the user's interest. Indeed, extensive research into human search behavior has found that humans are not very effective at formulating good search queries that describe what they are interested in. Ideally, the user should simply point to a portion of text that sparked the user's interest, and a system should automatically formulate a search query that captures the essence of the text. In this paper, we describe an implemented system that provides this capability. We first describe how our work differs from existing work in automatic query formulation, and propose a new method for improved quantification of the relevance of candidate search terms drawn from input text using phrase-level analysis. We then propose an implementable method designed to provide relevant queries based on a user's text input. We demonstrate the quality of our results and performance of our system through experimental studies. Our results demonstrate that our system produces relevant search terms with roughly two-thirds precision and recall compared to search terms selected by experts, and that typical users find significantly more relevant results (31% more relevant) more quickly (64% faster) using our system than self-formulated search queries. Further, we show that our implementation can scale to request loads of up to 10 requests per second within current online responsiveness expectations (<2-second response times at the highest loads tested).
    Source
    Journal of the Association for Information Science and Technology. 65(2014) no.5, S.1058-1075
  18. Stankovic, R. et al.: Indexing of textual databases based on lexical resources : a case study for Serbian (2016) 0.01
    0.008059147 = product of:
      0.032236587 = sum of:
        0.032236587 = product of:
          0.064473175 = sum of:
            0.064473175 = weight(_text_:22 in 2759) [ClassicSimilarity], result of:
              0.064473175 = score(doc=2759,freq=2.0), product of:
                0.16663991 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.047586527 = queryNorm
                0.38690117 = fieldWeight in 2759, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.078125 = fieldNorm(doc=2759)
          0.5 = coord(1/2)
      0.25 = coord(1/4)
    
    Date
    1. 2.2016 18:25:22
  19. Chung, E.-K.; Miksa, S.; Hastings, S.K.: ¬A framework of automatic subject term assignment for text categorization : an indexing conception-based approach (2010) 0.01
    0.007904406 = product of:
      0.015808811 = sum of:
        0.006480966 = weight(_text_:information in 3434) [ClassicSimilarity], result of:
          0.006480966 = score(doc=3434,freq=2.0), product of:
            0.083537094 = queryWeight, product of:
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.047586527 = queryNorm
            0.0775819 = fieldWeight in 3434, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.03125 = fieldNorm(doc=3434)
        0.009327845 = product of:
          0.01865569 = sum of:
            0.01865569 = weight(_text_:technology in 3434) [ClassicSimilarity], result of:
              0.01865569 = score(doc=3434,freq=2.0), product of:
                0.1417311 = queryWeight, product of:
                  2.978387 = idf(docFreq=6114, maxDocs=44218)
                  0.047586527 = queryNorm
                0.13162735 = fieldWeight in 3434, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  2.978387 = idf(docFreq=6114, maxDocs=44218)
                  0.03125 = fieldNorm(doc=3434)
          0.5 = coord(1/2)
      0.5 = coord(2/4)
    
    Source
    Journal of the American Society for Information Science and Technology. 61(2010) no.4, S.688-699
  20. Willis, C.; Losee, R.M.: ¬A random walk on an ontology : using thesaurus structure for automatic subject indexing (2013) 0.01
    0.007904406 = product of:
      0.015808811 = sum of:
        0.006480966 = weight(_text_:information in 1016) [ClassicSimilarity], result of:
          0.006480966 = score(doc=1016,freq=2.0), product of:
            0.083537094 = queryWeight, product of:
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.047586527 = queryNorm
            0.0775819 = fieldWeight in 1016, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.03125 = fieldNorm(doc=1016)
        0.009327845 = product of:
          0.01865569 = sum of:
            0.01865569 = weight(_text_:technology in 1016) [ClassicSimilarity], result of:
              0.01865569 = score(doc=1016,freq=2.0), product of:
                0.1417311 = queryWeight, product of:
                  2.978387 = idf(docFreq=6114, maxDocs=44218)
                  0.047586527 = queryNorm
                0.13162735 = fieldWeight in 1016, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  2.978387 = idf(docFreq=6114, maxDocs=44218)
                  0.03125 = fieldNorm(doc=1016)
          0.5 = coord(1/2)
      0.5 = coord(2/4)
    
    Source
    Journal of the American Society for Information Science and Technology. 64(2013) no.7, S.1330-1344

Languages

  • e 33
  • d 12