Search (82 results, page 1 of 5)

  • × theme_ss:"Automatisches Indexieren"
  1. Junger, U.: Möglichkeiten und Probleme automatischer Erschließungsverfahren in Bibliotheken : Bericht vom KASCADE-Workshop in der Universitäts- und Landesbibliothek Düsseldorf (1999) 0.05
    0.046928823 = product of:
      0.14078647 = sum of:
        0.119892076 = weight(_text_:forschung in 3645) [ClassicSimilarity], result of:
          0.119892076 = score(doc=3645,freq=2.0), product of:
            0.1858777 = queryWeight, product of:
              4.8649335 = idf(docFreq=926, maxDocs=44218)
              0.038207654 = queryNorm
            0.64500517 = fieldWeight in 3645, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              4.8649335 = idf(docFreq=926, maxDocs=44218)
              0.09375 = fieldNorm(doc=3645)
        0.020894397 = product of:
          0.06268319 = sum of:
            0.06268319 = weight(_text_:29 in 3645) [ClassicSimilarity], result of:
              0.06268319 = score(doc=3645,freq=2.0), product of:
                0.13440257 = queryWeight, product of:
                  3.5176873 = idf(docFreq=3565, maxDocs=44218)
                  0.038207654 = queryNorm
                0.46638384 = fieldWeight in 3645, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5176873 = idf(docFreq=3565, maxDocs=44218)
                  0.09375 = fieldNorm(doc=3645)
          0.33333334 = coord(1/3)
      0.33333334 = coord(2/6)
    
    Date
    23.10.1996 17:26:29
    Source
    Bibliothek: Forschung und Praxis. 23(1999) H.1, S.88-90
  2. Fuhr, N.: Rankingexperimente mit gewichteter Indexierung (1986) 0.05
    0.04686617 = product of:
      0.1405985 = sum of:
        0.119892076 = weight(_text_:forschung in 2051) [ClassicSimilarity], result of:
          0.119892076 = score(doc=2051,freq=2.0), product of:
            0.1858777 = queryWeight, product of:
              4.8649335 = idf(docFreq=926, maxDocs=44218)
              0.038207654 = queryNorm
            0.64500517 = fieldWeight in 2051, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              4.8649335 = idf(docFreq=926, maxDocs=44218)
              0.09375 = fieldNorm(doc=2051)
        0.020706438 = product of:
          0.062119313 = sum of:
            0.062119313 = weight(_text_:22 in 2051) [ClassicSimilarity], result of:
              0.062119313 = score(doc=2051,freq=2.0), product of:
                0.13379669 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.038207654 = queryNorm
                0.46428138 = fieldWeight in 2051, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.09375 = fieldNorm(doc=2051)
          0.33333334 = coord(1/3)
      0.33333334 = coord(2/6)
    
    Date
    14. 6.2015 22:12:56
    Source
    Automatische Indexierung zwischen Forschung und Anwendung, Hrsg.: G. Lustig
  3. Probst, M.; Mittelbach, J.: Maschinelle Indexierung in der Sacherschließung wissenschaftlicher Bibliotheken (2006) 0.03
    0.031244114 = product of:
      0.09373234 = sum of:
        0.07992805 = weight(_text_:forschung in 1755) [ClassicSimilarity], result of:
          0.07992805 = score(doc=1755,freq=2.0), product of:
            0.1858777 = queryWeight, product of:
              4.8649335 = idf(docFreq=926, maxDocs=44218)
              0.038207654 = queryNorm
            0.43000343 = fieldWeight in 1755, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              4.8649335 = idf(docFreq=926, maxDocs=44218)
              0.0625 = fieldNorm(doc=1755)
        0.013804292 = product of:
          0.041412875 = sum of:
            0.041412875 = weight(_text_:22 in 1755) [ClassicSimilarity], result of:
              0.041412875 = score(doc=1755,freq=2.0), product of:
                0.13379669 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.038207654 = queryNorm
                0.30952093 = fieldWeight in 1755, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.0625 = fieldNorm(doc=1755)
          0.33333334 = coord(1/3)
      0.33333334 = coord(2/6)
    
    Date
    22. 3.2008 12:35:19
    Source
    Bibliothek: Forschung und Praxis. 30(2006) H.2, S.168-176
  4. Wang, S.; Koopman, R.: Embed first, then predict (2019) 0.03
    0.029132461 = product of:
      0.08739738 = sum of:
        0.07869138 = weight(_text_:propose in 5400) [ClassicSimilarity], result of:
          0.07869138 = score(doc=5400,freq=4.0), product of:
            0.19617504 = queryWeight, product of:
              5.1344433 = idf(docFreq=707, maxDocs=44218)
              0.038207654 = queryNorm
            0.40112838 = fieldWeight in 5400, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              5.1344433 = idf(docFreq=707, maxDocs=44218)
              0.0390625 = fieldNorm(doc=5400)
        0.008706 = product of:
          0.026117997 = sum of:
            0.026117997 = weight(_text_:29 in 5400) [ClassicSimilarity], result of:
              0.026117997 = score(doc=5400,freq=2.0), product of:
                0.13440257 = queryWeight, product of:
                  3.5176873 = idf(docFreq=3565, maxDocs=44218)
                  0.038207654 = queryNorm
                0.19432661 = fieldWeight in 5400, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5176873 = idf(docFreq=3565, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=5400)
          0.33333334 = coord(1/3)
      0.33333334 = coord(2/6)
    
    Abstract
    Automatic subject prediction is a desirable feature for modern digital library systems, as manual indexing can no longer cope with the rapid growth of digital collections. It is also desirable to be able to identify a small set of entities (e.g., authors, citations, bibliographic records) which are most relevant to a query. This gets more difficult when the amount of data increases dramatically. Data sparsity and model scalability are the major challenges to solving this type of extreme multilabel classification problem automatically. In this paper, we propose to address this problem in two steps: we first embed different types of entities into the same semantic space, where similarity could be computed easily; second, we propose a novel non-parametric method to identify the most relevant entities in addition to direct semantic similarities. We show how effectively this approach predicts even very specialised subjects, which are associated with few documents in the training set and are more problematic for a classifier.
    Date
    29. 9.2019 12:18:42
  5. Renz, M.: Automatische Inhaltserschließung im Zeichen von Wissensmanagement (2001) 0.03
    0.0273386 = product of:
      0.0820158 = sum of:
        0.06993704 = weight(_text_:forschung in 5671) [ClassicSimilarity], result of:
          0.06993704 = score(doc=5671,freq=2.0), product of:
            0.1858777 = queryWeight, product of:
              4.8649335 = idf(docFreq=926, maxDocs=44218)
              0.038207654 = queryNorm
            0.376253 = fieldWeight in 5671, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              4.8649335 = idf(docFreq=926, maxDocs=44218)
              0.0546875 = fieldNorm(doc=5671)
        0.0120787565 = product of:
          0.036236268 = sum of:
            0.036236268 = weight(_text_:22 in 5671) [ClassicSimilarity], result of:
              0.036236268 = score(doc=5671,freq=2.0), product of:
                0.13379669 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.038207654 = queryNorm
                0.2708308 = fieldWeight in 5671, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.0546875 = fieldNorm(doc=5671)
          0.33333334 = coord(1/3)
      0.33333334 = coord(2/6)
    
    Abstract
    Methoden der automatischen Inhaltserschließung werden seit mehr als 30 Jahren entwickelt, ohne in luD-Kreisen auf merkliche Akzeptanz zu stoßen. Gegenwärtig führen jedoch die steigende Informationsflut und der Bedarf an effizienten Zugriffsverfahren im Informations- und Wissensmanagement in breiten Anwenderkreisen zu einem wachsenden Interesse an diesen Methoden, zu verstärkten Anstrengungen in Forschung und Entwicklung und zu neuen Produkten. In diesem Beitrag werden verschiedene Ansätze zu intelligentem und inhaltsbasiertem Retrieval und zur automatischen Inhaltserschließung diskutiert sowie kommerziell vertriebene Softwarewerkzeuge und Lösungen präsentiert. Abschließend wird festgestellt, dass in naher Zukunft mit einer zunehmenden Automatisierung von bestimmten Komponenten des Informations- und Wissensmanagements zu rechnen ist, indem Software-Werkzeuge zur automatischen Inhaltserschließung in den Workflow integriert werden
    Date
    22. 3.2001 13:14:48
  6. Li, W.; Wong, K.-F.; Yuan, C.: Toward automatic Chinese temporal information extraction (2001) 0.02
    0.021449735 = product of:
      0.064349204 = sum of:
        0.055643205 = weight(_text_:propose in 6029) [ClassicSimilarity], result of:
          0.055643205 = score(doc=6029,freq=2.0), product of:
            0.19617504 = queryWeight, product of:
              5.1344433 = idf(docFreq=707, maxDocs=44218)
              0.038207654 = queryNorm
            0.2836406 = fieldWeight in 6029, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              5.1344433 = idf(docFreq=707, maxDocs=44218)
              0.0390625 = fieldNorm(doc=6029)
        0.008706 = product of:
          0.026117997 = sum of:
            0.026117997 = weight(_text_:29 in 6029) [ClassicSimilarity], result of:
              0.026117997 = score(doc=6029,freq=2.0), product of:
                0.13440257 = queryWeight, product of:
                  3.5176873 = idf(docFreq=3565, maxDocs=44218)
                  0.038207654 = queryNorm
                0.19432661 = fieldWeight in 6029, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5176873 = idf(docFreq=3565, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=6029)
          0.33333334 = coord(1/3)
      0.33333334 = coord(2/6)
    
    Abstract
    Over the past few years, temporal information processing and temporal database management have increasingly become hot topics. Nevertheless, only a few researchers have investigated these areas in the Chinese language. This lays down the objective of our research: to exploit Chinese language processing techniques for temporal information extraction and concept reasoning. In this article, we first study the mechanism for expressing time in Chinese. On the basis of the study, we then design a general frame structure for maintaining the extracted temporal concepts and propose a system for extracting time-dependent information from Hong Kong financial news. In the system, temporal knowledge is represented by different types of temporal concepts (TTC) and different temporal relations, including absolute and relative relations, which are used to correlate between action times and reference times. In analyzing a sentence, the algorithm first determines the situation related to the verb. This in turn will identify the type of temporal concept associated with the verb. After that, the relevant temporal information is extracted and the temporal relations are derived. These relations link relevant concept frames together in chronological order, which in turn provide the knowledge to fulfill users' queries, e.g., for question-answering (i.e., Q&A) applications
    Date
    29. 9.2001 14:02:50
  7. Biebricher, P.; Fuhr, N.; Niewelt, B.: ¬Der AIR-Retrievaltest (1986) 0.02
    0.016651679 = product of:
      0.099910066 = sum of:
        0.099910066 = weight(_text_:forschung in 4040) [ClassicSimilarity], result of:
          0.099910066 = score(doc=4040,freq=2.0), product of:
            0.1858777 = queryWeight, product of:
              4.8649335 = idf(docFreq=926, maxDocs=44218)
              0.038207654 = queryNorm
            0.5375043 = fieldWeight in 4040, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              4.8649335 = idf(docFreq=926, maxDocs=44218)
              0.078125 = fieldNorm(doc=4040)
      0.16666667 = coord(1/6)
    
    Source
    Automatische Indexierung zwischen Forschung und Anwendung, Hrsg.: G. Lustig
  8. Automatische Indexierung zwischen Forschung und Anwendung (1986) 0.02
    0.01648432 = product of:
      0.09890591 = sum of:
        0.09890591 = weight(_text_:forschung in 953) [ClassicSimilarity], result of:
          0.09890591 = score(doc=953,freq=4.0), product of:
            0.1858777 = queryWeight, product of:
              4.8649335 = idf(docFreq=926, maxDocs=44218)
              0.038207654 = queryNorm
            0.5321021 = fieldWeight in 953, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              4.8649335 = idf(docFreq=926, maxDocs=44218)
              0.0546875 = fieldNorm(doc=953)
      0.16666667 = coord(1/6)
    
    Abstract
    Die automatische Indexierung von Dokumenten für das Information Retrieval, d. h. die automatische Charakterisierung von Dokumentinhalten mittels Deskriptoren (Schlagwörtern) ist bereits seit über 25 Jahren ein Gebiet theoretischer und experimenteller Forschung. Dagegen wurde erst im Oktober 1985 mit der Anwendung der automatischen Indexierung in der Inputproduktion für ein großes Retrievalsystem begonnen. Es handelt sich um die Indexierung englischer Referatetexte für die Physik-Datenbasis des Informationszentrums Energie, Physik, Mathematik GmbH in Karlsruhe. In dem vorliegenden Buch beschreiben Mitarbeiter der Technischen Hochschule Darmstadt ihre Forschungs- und Entwicklungsarbeiten, die zu dieser Pilotanwendung geführt haben.
  9. Grummann, M.: Sind Verfahren zur maschinellen Indexierung für Literaturbestände Öffentlicher Bibliotheken geeignet? : Retrievaltests von indexierten ekz-Daten mit der Software IDX (2000) 0.01
    0.013321342 = product of:
      0.07992805 = sum of:
        0.07992805 = weight(_text_:forschung in 1879) [ClassicSimilarity], result of:
          0.07992805 = score(doc=1879,freq=2.0), product of:
            0.1858777 = queryWeight, product of:
              4.8649335 = idf(docFreq=926, maxDocs=44218)
              0.038207654 = queryNorm
            0.43000343 = fieldWeight in 1879, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              4.8649335 = idf(docFreq=926, maxDocs=44218)
              0.0625 = fieldNorm(doc=1879)
      0.16666667 = coord(1/6)
    
    Source
    Bibliothek: Forschung und Praxis. 24(2000) H.3, S.297-318
  10. Yang, T.-H.; Hsieh, Y.-L.; Liu, S.-H.; Chang, Y.-C.; Hsu, W.-L.: ¬A flexible template generation and matching method with applications for publication reference metadata extraction (2021) 0.01
    0.01311523 = product of:
      0.07869138 = sum of:
        0.07869138 = weight(_text_:propose in 63) [ClassicSimilarity], result of:
          0.07869138 = score(doc=63,freq=4.0), product of:
            0.19617504 = queryWeight, product of:
              5.1344433 = idf(docFreq=707, maxDocs=44218)
              0.038207654 = queryNorm
            0.40112838 = fieldWeight in 63, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              5.1344433 = idf(docFreq=707, maxDocs=44218)
              0.0390625 = fieldNorm(doc=63)
      0.16666667 = coord(1/6)
    
    Abstract
    Conventional rule-based approaches use exact template matching to capture linguistic information and necessarily need to enumerate all variations. We propose a novel flexible template generation and matching scheme called the principle-based approach (PBA) based on sequence alignment, and employ it for reference metadata extraction (RME) to demonstrate its effectiveness. The main contributions of this research are threefold. First, we propose an automatic template generation that can capture prominent patterns using the dominating set algorithm. Second, we devise an alignment-based template-matching technique that uses a logistic regression model, which makes it more general and flexible than pure rule-based approaches. Last, we apply PBA to RME on extensive cross-domain corpora and demonstrate its robustness and generality. Experiments reveal that the same set of templates produced by the PBA framework not only deliver consistent performance on various unseen domains, but also surpass hand-crafted knowledge (templates). We use four independent journal style test sets and one conference style test set in the experiments. When compared to renowned machine learning methods, such as conditional random fields (CRF), as well as recent deep learning methods (i.e., bi-directional long short-term memory with a CRF layer, Bi-LSTM-CRF), PBA has the best performance for all datasets.
  11. Gábor, K.; Zargayouna, H.; Tellier, I.; Buscaldi, D.; Charnois, T.: ¬A typology of semantic relations dedicated to scientific literature analysis (2016) 0.01
    0.012983414 = product of:
      0.077900484 = sum of:
        0.077900484 = weight(_text_:propose in 2933) [ClassicSimilarity], result of:
          0.077900484 = score(doc=2933,freq=2.0), product of:
            0.19617504 = queryWeight, product of:
              5.1344433 = idf(docFreq=707, maxDocs=44218)
              0.038207654 = queryNorm
            0.3970968 = fieldWeight in 2933, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              5.1344433 = idf(docFreq=707, maxDocs=44218)
              0.0546875 = fieldNorm(doc=2933)
      0.16666667 = coord(1/6)
    
    Abstract
    We propose a method for improving access to scientific literature by analyzing the content of research papers beyond citation links and topic tracking. Our model relies on a typology of explicit semantic relations. These relations are instantiated in the abstract/introduction part of the papers and can be identified automatically using textual data and external ontologies. Preliminary results show a promising precision in unsupervised relationship classification.
  12. Medelyan, O.; Witten, I.H.: Domain-independent automatic keyphrase indexing with small training sets (2008) 0.01
    0.011128641 = product of:
      0.06677184 = sum of:
        0.06677184 = weight(_text_:propose in 1871) [ClassicSimilarity], result of:
          0.06677184 = score(doc=1871,freq=2.0), product of:
            0.19617504 = queryWeight, product of:
              5.1344433 = idf(docFreq=707, maxDocs=44218)
              0.038207654 = queryNorm
            0.3403687 = fieldWeight in 1871, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              5.1344433 = idf(docFreq=707, maxDocs=44218)
              0.046875 = fieldNorm(doc=1871)
      0.16666667 = coord(1/6)
    
    Abstract
    Keyphrases are widely used in both physical and digital libraries as a brief, but precise, summary of documents. They help organize material based on content, provide thematic access, represent search results, and assist with navigation. Manual assignment is expensive because trained human indexers must reach an understanding of the document and select appropriate descriptors according to defined cataloging rules. We propose a new method that enhances automatic keyphrase extraction by using semantic information about terms and phrases gleaned from a domain-specific thesaurus. The key advantage of the new approach is that it performs well with very little training data. We evaluate it on a large set of manually indexed documents in the domain of agriculture, compare its consistency with a group of six professional indexers, and explore its performance on smaller collections of documents in other domains and of French and Spanish documents.
  13. Mansour, N.; Haraty, R.A.; Daher, W.; Houri, M.: ¬An auto-indexing method for Arabic text (2008) 0.01
    0.011128641 = product of:
      0.06677184 = sum of:
        0.06677184 = weight(_text_:propose in 2103) [ClassicSimilarity], result of:
          0.06677184 = score(doc=2103,freq=2.0), product of:
            0.19617504 = queryWeight, product of:
              5.1344433 = idf(docFreq=707, maxDocs=44218)
              0.038207654 = queryNorm
            0.3403687 = fieldWeight in 2103, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              5.1344433 = idf(docFreq=707, maxDocs=44218)
              0.046875 = fieldNorm(doc=2103)
      0.16666667 = coord(1/6)
    
    Abstract
    This work addresses the information retrieval problem of auto-indexing Arabic documents. Auto-indexing a text document refers to automatically extracting words that are suitable for building an index for the document. In this paper, we propose an auto-indexing method for Arabic text documents. This method is mainly based on morphological analysis and on a technique for assigning weights to words. The morphological analysis uses a number of grammatical rules to extract stem words that become candidate index words. The weight assignment technique computes weights for these words relative to the container document. The weight is based on how spread is the word in a document and not only on its rate of occurrence. The candidate index words are then sorted in descending order by weight so that information retrievers can select the more important index words. We empirically verify the usefulness of our method using several examples. For these examples, we obtained an average recall of 46% and an average precision of 64%.
  14. Snajder, J.; Dalbelo Basic, B.D.; Tadic, M.: Automatic acquisition of inflectional lexica for morphological normalisation (2008) 0.01
    0.011128641 = product of:
      0.06677184 = sum of:
        0.06677184 = weight(_text_:propose in 2910) [ClassicSimilarity], result of:
          0.06677184 = score(doc=2910,freq=2.0), product of:
            0.19617504 = queryWeight, product of:
              5.1344433 = idf(docFreq=707, maxDocs=44218)
              0.038207654 = queryNorm
            0.3403687 = fieldWeight in 2910, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              5.1344433 = idf(docFreq=707, maxDocs=44218)
              0.046875 = fieldNorm(doc=2910)
      0.16666667 = coord(1/6)
    
    Abstract
    Due to natural language morphology, words can take on various morphological forms. Morphological normalisation - often used in information retrieval and text mining systems - conflates morphological variants of a word to a single representative form. In this paper, we describe an approach to lexicon-based inflectional normalisation. This approach is in between stemming and lemmatisation, and is suitable for morphological normalisation of inflectionally complex languages. To eliminate the immense effort required to compile the lexicon by hand, we focus on the problem of acquiring automatically an inflectional morphological lexicon from raw corpora. We propose a convenient and highly expressive morphology representation formalism on which the acquisition procedure is based. Our approach is applied to the morphologically complex Croatian language, but it should be equally applicable to other languages of similar morphological complexity. Experimental results show that our approach can be used to acquire a lexicon whose linguistic quality allows for rather good normalisation performance.
  15. Kajanan, S.; Bao, Y.; Datta, A.; VanderMeer, D.; Dutta, K.: Efficient automatic search query formulation using phrase-level analysis (2014) 0.01
    0.010492183 = product of:
      0.0629531 = sum of:
        0.0629531 = weight(_text_:propose in 1264) [ClassicSimilarity], result of:
          0.0629531 = score(doc=1264,freq=4.0), product of:
            0.19617504 = queryWeight, product of:
              5.1344433 = idf(docFreq=707, maxDocs=44218)
              0.038207654 = queryNorm
            0.3209027 = fieldWeight in 1264, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              5.1344433 = idf(docFreq=707, maxDocs=44218)
              0.03125 = fieldNorm(doc=1264)
      0.16666667 = coord(1/6)
    
    Abstract
    Over the past decade, the volume of information available digitally over the Internet has grown enormously. Technical developments in the area of search, such as Google's Page Rank algorithm, have proved so good at serving relevant results that Internet search has become integrated into daily human activity. One can endlessly explore topics of interest simply by querying and reading through the resulting links. Yet, although search engines are well known for providing relevant results based on users' queries, users do not always receive the results they are looking for. Google's Director of Research describes clickstream evidence of frustrated users repeatedly reformulating queries and searching through page after page of results. Given the general quality of search engine results, one must consider the possibility that the frustrated user's query is not effective; that is, it does not describe the essence of the user's interest. Indeed, extensive research into human search behavior has found that humans are not very effective at formulating good search queries that describe what they are interested in. Ideally, the user should simply point to a portion of text that sparked the user's interest, and a system should automatically formulate a search query that captures the essence of the text. In this paper, we describe an implemented system that provides this capability. We first describe how our work differs from existing work in automatic query formulation, and propose a new method for improved quantification of the relevance of candidate search terms drawn from input text using phrase-level analysis. We then propose an implementable method designed to provide relevant queries based on a user's text input. We demonstrate the quality of our results and performance of our system through experimental studies. Our results demonstrate that our system produces relevant search terms with roughly two-thirds precision and recall compared to search terms selected by experts, and that typical users find significantly more relevant results (31% more relevant) more quickly (64% faster) using our system than self-formulated search queries. Further, we show that our implementation can scale to request loads of up to 10 requests per second within current online responsiveness expectations (<2-second response times at the highest loads tested).
  16. Kasprzik, A.: Aufbau eines produktiven Dienstes für die automatisierte Inhaltserschließung an der ZBW : ein Status- und Erfahrungsbericht. (2023) 0.01
    0.009419611 = product of:
      0.056517664 = sum of:
        0.056517664 = weight(_text_:forschung in 935) [ClassicSimilarity], result of:
          0.056517664 = score(doc=935,freq=4.0), product of:
            0.1858777 = queryWeight, product of:
              4.8649335 = idf(docFreq=926, maxDocs=44218)
              0.038207654 = queryNorm
            0.30405834 = fieldWeight in 935, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              4.8649335 = idf(docFreq=926, maxDocs=44218)
              0.03125 = fieldNorm(doc=935)
      0.16666667 = coord(1/6)
    
    Abstract
    Die ZBW - Leibniz-Informationszentrum Wirtschaft betreibt seit 2016 eigene angewandte Forschung im Bereich Machine Learning mit dem Zweck, praktikable Lösungen für eine automatisierte oder maschinell unterstützte Inhaltserschließung zu entwickeln. 2020 begann ein Team an der ZBW die Konzeption und Implementierung einer Softwarearchitektur, die es ermöglichte, diese prototypischen Lösungen in einen produktiven Dienst zu überführen und mit den bestehenden Nachweis- und Informationssystemen zu verzahnen. Sowohl die angewandte Forschung als auch die für dieses Vorhaben ("AutoSE") notwendige Softwareentwicklung sind direkt im Bibliotheksbereich der ZBW angesiedelt, werden kontinuierlich anhand des State of the Art vorangetrieben und profitieren von einem engen Austausch mit den Verantwortlichen für die intellektuelle Inhaltserschließung. Dieser Beitrag zeigt die Meilensteine auf, die das AutoSE-Team in zwei Jahren in Bezug auf den Aufbau und die Integration der Software erreicht hat, und skizziert, welche bis zum Ende der Pilotphase (2024) noch ausstehen. Die Architektur basiert auf Open-Source-Software und die eingesetzten Machine-Learning-Komponenten werden im Rahmen einer internationalen Zusammenarbeit im engen Austausch mit der Finnischen Nationalbibliothek (NLF) weiterentwickelt und zur Nachnutzung in dem von der NLF entwickelten Open-Source-Werkzeugkasten Annif aufbereitet. Das Betriebsmodell des AutoSE-Dienstes sieht regelmäßige Überprüfungen sowohl einzelner Komponenten als auch des Produktionsworkflows als Ganzes vor und erlaubt eine fortlaufende Weiterentwicklung der Architektur. Eines der Ergebnisse, das bis zum Ende der Pilotphase vorliegen soll, ist die Dokumentation der Anforderungen an einen dauerhaften produktiven Betrieb des Dienstes, damit die Ressourcen dafür im Rahmen eines tragfähigen Modells langfristig gesichert werden können. Aus diesem Praxisbeispiel lässt sich ableiten, welche Bedingungen gegeben sein müssen, um Machine-Learning-Lösungen wie die in Annif enthaltenen erfolgreich an einer Institution für die Inhaltserschließung einsetzen zu können.
  17. Martins, E.F.; Belém, F.M.; Almeida, J.M.; Gonçalves, M.A.: On cold start for associative tag recommendation (2016) 0.01
    0.009273868 = product of:
      0.055643205 = sum of:
        0.055643205 = weight(_text_:propose in 2494) [ClassicSimilarity], result of:
          0.055643205 = score(doc=2494,freq=2.0), product of:
            0.19617504 = queryWeight, product of:
              5.1344433 = idf(docFreq=707, maxDocs=44218)
              0.038207654 = queryNorm
            0.2836406 = fieldWeight in 2494, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              5.1344433 = idf(docFreq=707, maxDocs=44218)
              0.0390625 = fieldNorm(doc=2494)
      0.16666667 = coord(1/6)
    
    Abstract
    Tag recommendation strategies that exploit term co-occurrence patterns with tags previously assigned to the target object have consistently produced state-of-the-art results. However, such techniques work only for objects with previously assigned tags. Here we focus on tag recommendation for objects with no tags, a variation of the well-known \textit{cold start} problem. We start by evaluating state-of-the-art co-occurrence based methods in cold start. Our results show that the effectiveness of these methods suffers in this situation. Moreover, we show that employing various automatic filtering strategies to generate an initial tag set that enables the use of co-occurrence patterns produces only marginal improvements. We then propose a new approach that exploits both positive and negative user feedback to iteratively select input tags along with a genetic programming strategy to learn the recommendation function. Our experimental results indicate that extending the methods to include user relevance feedback leads to gains in precision of up to 58% over the best baseline in cold start scenarios and gains of up to 43% over the best baseline in objects that contain some initial tags (i.e., no cold start). We also show that our best relevance-feedback-driven strategy performs well even in scenarios that lack user cooperation (i.e., users may refuse to provide feedback) and user reliability (i.e., users may provide the wrong feedback).
  18. Ma, N.; Zheng, H.T.; Xiao, X.: ¬An ontology-based latent semantic indexing approach using long short-term memory networks (2017) 0.01
    0.009273868 = product of:
      0.055643205 = sum of:
        0.055643205 = weight(_text_:propose in 3810) [ClassicSimilarity], result of:
          0.055643205 = score(doc=3810,freq=2.0), product of:
            0.19617504 = queryWeight, product of:
              5.1344433 = idf(docFreq=707, maxDocs=44218)
              0.038207654 = queryNorm
            0.2836406 = fieldWeight in 3810, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              5.1344433 = idf(docFreq=707, maxDocs=44218)
              0.0390625 = fieldNorm(doc=3810)
      0.16666667 = coord(1/6)
    
    Abstract
    Nowadays, online data shows an astonishing increase and the issue of semantic indexing remains an open question. Ontologies and knowledge bases have been widely used to optimize performance. However, researchers are placing increased emphasis on internal relations of ontologies but neglect latent semantic relations between ontologies and documents. They generally annotate instances mentioned in documents, which are related to concepts in ontologies. In this paper, we propose an Ontology-based Latent Semantic Indexing approach utilizing Long Short-Term Memory networks (LSTM-OLSI). We utilize an importance-aware topic model to extract document-level semantic features and leverage ontologies to extract word-level contextual features. Then we encode the above two levels of features and match their embedding vectors utilizing LSTM networks. Finally, the experimental results reveal that LSTM-OLSI outperforms existing techniques and demonstrates deep comprehension of instances and articles.
  19. Toepfer, M.; Seifert, C.: Content-based quality estimation for automatic subject indexing of short texts under precision and recall constraints 0.01
    0.009273868 = product of:
      0.055643205 = sum of:
        0.055643205 = weight(_text_:propose in 4309) [ClassicSimilarity], result of:
          0.055643205 = score(doc=4309,freq=2.0), product of:
            0.19617504 = queryWeight, product of:
              5.1344433 = idf(docFreq=707, maxDocs=44218)
              0.038207654 = queryNorm
            0.2836406 = fieldWeight in 4309, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              5.1344433 = idf(docFreq=707, maxDocs=44218)
              0.0390625 = fieldNorm(doc=4309)
      0.16666667 = coord(1/6)
    
    Abstract
    Semantic annotations have to satisfy quality constraints to be useful for digital libraries, which is particularly challenging on large and diverse datasets. Confidence scores of multi-label classification methods typically refer only to the relevance of particular subjects, disregarding indicators of insufficient content representation at the document-level. Therefore, we propose a novel approach that detects documents rather than concepts where quality criteria are met. Our approach uses a deep, multi-layered regression architecture, which comprises a variety of content-based indicators. We evaluated multiple configurations using text collections from law and economics, where the available content is restricted to very short texts. Notably, we demonstrate that the proposed quality estimation technique can determine subsets of the previously unseen data where considerable gains in document-level recall can be achieved, while upholding precision at the same time. Hence, the approach effectively performs a filtering that ensures high data quality standards in operative information retrieval systems.
  20. Zhang, Y.; Zhang, C.; Li, J.: Joint modeling of characters, words, and conversation contexts for microblog keyphrase extraction (2020) 0.01
    0.009273868 = product of:
      0.055643205 = sum of:
        0.055643205 = weight(_text_:propose in 5816) [ClassicSimilarity], result of:
          0.055643205 = score(doc=5816,freq=2.0), product of:
            0.19617504 = queryWeight, product of:
              5.1344433 = idf(docFreq=707, maxDocs=44218)
              0.038207654 = queryNorm
            0.2836406 = fieldWeight in 5816, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              5.1344433 = idf(docFreq=707, maxDocs=44218)
              0.0390625 = fieldNorm(doc=5816)
      0.16666667 = coord(1/6)
    
    Abstract
    Millions of messages are produced on microblog platforms every day, leading to the pressing need for automatic identification of key points from the massive texts. To absorb salient content from the vast bulk of microblog posts, this article focuses on the task of microblog keyphrase extraction. In previous work, most efforts treat messages as independent documents and might suffer from the data sparsity problem exhibited in short and informal microblog posts. On the contrary, we propose to enrich contexts via exploiting conversations initialized by target posts and formed by their replies, which are generally centered around relevant topics to the target posts and therefore helpful for keyphrase identification. Concretely, we present a neural keyphrase extraction framework, which has 2 modules: a conversation context encoder and a keyphrase tagger. The conversation context encoder captures indicative representation from their conversation contexts and feeds the representation into the keyphrase tagger, and the keyphrase tagger extracts salient words from target posts. The 2 modules were trained jointly to optimize the conversation context encoding and keyphrase extraction processes. In the conversation context encoder, we leverage hierarchical structures to capture the word-level indicative representation and message-level indicative representation hierarchically. In both of the modules, we apply character-level representations, which enables the model to explore morphological features and deal with the out-of-vocabulary problem caused by the informal language style of microblog messages. Extensive comparison results on real-life data sets indicate that our model outperforms state-of-the-art models from previous studies.

Years

Languages

  • e 43
  • d 35
  • nl 2
  • ja 1
  • ru 1
  • More… Less…

Types

  • a 71
  • el 8
  • x 5
  • m 3
  • d 1
  • p 1
  • s 1
  • More… Less…