Search (304 results, page 16 of 16)

MacDougall, S.: Rethinking indexing : the impact of the Internet (1996) 0.00
```
1.7658525E-4 = product of:
  0.003531705 = sum of:
    0.003531705 = weight(_text_:in in 704) [ClassicSimilarity], result of:
      0.003531705 = score(doc=704,freq=2.0), product of:
        0.039165888 = queryWeight, product of:
          1.3602545 = idf(docFreq=30841, maxDocs=44218)
          0.02879306 = queryNorm
        0.09017298 = fieldWeight in 704, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          1.3602545 = idf(docFreq=30841, maxDocs=44218)
          0.046875 = fieldNorm(doc=704)
  0.05 = coord(1/20)
```
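The score explanation above can be checked by hand. The following is a minimal sketch of the arithmetic, assuming the classic TF-IDF formulas tf = sqrt(freq) and idf = 1 + ln(maxDocs / (docFreq + 1)), which reproduce the tf and idf values listed; the function name `classic_similarity_score` is hypothetical and is not part of any search library.

```python
import math

def classic_similarity_score(freq, doc_freq, max_docs, query_norm, field_norm, coord):
    """Recompute a single-term score from the factors shown in the explanation.

    Assumed formulas (chosen because they match the listed intermediate values):
      tf  = sqrt(freq)
      idf = 1 + ln(max_docs / (doc_freq + 1))
    """
    tf = math.sqrt(freq)
    idf = 1.0 + math.log(max_docs / (doc_freq + 1))
    query_weight = idf * query_norm        # "queryWeight" line
    field_weight = tf * idf * field_norm   # "fieldWeight" line
    return query_weight * field_weight * coord

# Factors copied from the explanation for doc 704.
score = classic_similarity_score(
    freq=2.0,
    doc_freq=30841,
    max_docs=44218,
    query_norm=0.02879306,
    field_norm=0.046875,
    coord=1 / 20,
)
print(f"{score:.7e}")
```

The printed value should come out close to the 1.7658525E-4 listed above, up to floating-point rounding. Only fieldNorm differs between the records on this page, which is why the last two entries score slightly lower.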
Abstract

Considers the challenge to professional indexers posed by the Internet. Indexing and searching on the Internet appears to have taken a retrograde step, as well-developed and efficient information retrieval techniques have been replaced by cruder techniques, involving automatic keyword indexing and frequency ranking, leading to large retrieval sets and low precision. This is made worse by the apparent acceptance of this poor performance by Internet users and the feeling, on the part of indexers, that they are being bypassed by the producers of these hyperlinked menus and search engines. Key issues are: how far 'human' indexing will still be required in the Internet environment; how indexing techniques will have to change to stay relevant; and the future role of indexers. The challenge facing indexers is to adapt their skills to suit the online environment and to convince publishers of the need for efficient indexes on the Internet.
Hmeidi, I.; Kanaan, G.; Evens, M.: Design and implementation of automatic indexing for information retrieval with Arabic documents (1997) 0.00
```
1.7658525E-4 = product of:
  0.003531705 = sum of:
    0.003531705 = weight(_text_:in in 1660) [ClassicSimilarity], result of:
      0.003531705 = score(doc=1660,freq=2.0), product of:
        0.039165888 = queryWeight, product of:
          1.3602545 = idf(docFreq=30841, maxDocs=44218)
          0.02879306 = queryNorm
        0.09017298 = fieldWeight in 1660, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          1.3602545 = idf(docFreq=30841, maxDocs=44218)
          0.046875 = fieldNorm(doc=1660)
  0.05 = coord(1/20)
```
Abstract

A corpus of 242 abstracts of Arabic documents on computer science and information systems was put together, using the Proceedings of the Saudi Arabian National Conferences as a source. Reports on the design and building of an automatic information retrieval system from scratch to handle Arabic data. Both automatic and manual indexing techniques were implemented. Experiments using measures of recall and precision have demonstrated that automatic indexing is at least as effective as manual indexing and more effective in some cases. Automatic indexing is both cheaper and faster. Results suggest that a wider coverage of the literature can be achieved with less money while producing results as good as those of manual indexing. Compares the retrieval results using words as index terms versus stems and roots, and confirms the results obtained by Al-Kharashi and Abu-Salem with smaller corpora that root indexing is more effective than word indexing.
Yang, T.-H.; Hsieh, Y.-L.; Liu, S.-H.; Chang, Y.-C.; Hsu, W.-L.: A flexible template generation and matching method with applications for publication reference metadata extraction (2021) 0.00
```
1.4715438E-4 = product of:
  0.0029430876 = sum of:
    0.0029430876 = weight(_text_:in in 63) [ClassicSimilarity], result of:
      0.0029430876 = score(doc=63,freq=2.0), product of:
        0.039165888 = queryWeight, product of:
          1.3602545 = idf(docFreq=30841, maxDocs=44218)
          0.02879306 = queryNorm
        0.07514416 = fieldWeight in 63, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          1.3602545 = idf(docFreq=30841, maxDocs=44218)
          0.0390625 = fieldNorm(doc=63)
  0.05 = coord(1/20)
```
Abstract

Conventional rule-based approaches use exact template matching to capture linguistic information and necessarily need to enumerate all variations. We propose a novel flexible template generation and matching scheme called the principle-based approach (PBA) based on sequence alignment, and employ it for reference metadata extraction (RME) to demonstrate its effectiveness. The main contributions of this research are threefold. First, we propose an automatic template generation that can capture prominent patterns using the dominating set algorithm. Second, we devise an alignment-based template-matching technique that uses a logistic regression model, which makes it more general and flexible than pure rule-based approaches. Last, we apply PBA to RME on extensive cross-domain corpora and demonstrate its robustness and generality. Experiments reveal that the same set of templates produced by the PBA framework not only delivers consistent performance on various unseen domains, but also surpasses hand-crafted knowledge (templates). We use four independent journal-style test sets and one conference-style test set in the experiments. When compared to renowned machine learning methods, such as conditional random fields (CRF), as well as recent deep learning methods (i.e., bi-directional long short-term memory with a CRF layer, Bi-LSTM-CRF), PBA has the best performance for all datasets.
Ahmed, M.: Automatic indexing for agriculture : designing a framework by deploying Agrovoc, Agris and Annif (2023) 0.00
```
1.4715438E-4 = product of:
  0.0029430876 = sum of:
    0.0029430876 = weight(_text_:in in 1024) [ClassicSimilarity], result of:
      0.0029430876 = score(doc=1024,freq=2.0), product of:
        0.039165888 = queryWeight, product of:
          1.3602545 = idf(docFreq=30841, maxDocs=44218)
          0.02879306 = queryNorm
        0.07514416 = fieldWeight in 1024, product of:
          1.4142135 = tf(freq=2.0), with freq of:
            2.0 = termFreq=2.0
          1.3602545 = idf(docFreq=30841, maxDocs=44218)
          0.0390625 = fieldNorm(doc=1024)
  0.05 = coord(1/20)
```
Abstract

There are several ways to employ machine learning for automating subject indexing. One popular strategy is to utilize a supervised learning algorithm to train a model on a set of documents that have been manually indexed by subject matter using a standard vocabulary. The resulting model can then predict the subject of new and previously unseen documents by identifying patterns learned from the training data. To do this, the first step is to gather a large dataset of documents and manually assign each document a set of subject keywords/descriptors from a controlled vocabulary (e.g., from Agrovoc). Next, the dataset (obtained from Agris) can be divided into (i) a training dataset and (ii) a test dataset. The training dataset is used to train the model, while the test dataset is used to evaluate the model's performance. Machine learning can be a powerful tool for automating the process of subject indexing. This research is an attempt to apply Annif (http://annif.org/), an open-source AI/ML framework, to autogenerate subject keywords/descriptors for documentary resources in the domain of agriculture. The training dataset is obtained from Agris, which applies the Agrovoc thesaurus as a vocabulary tool (https://www.fao.org/agris/download).
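The train/test division described in this abstract can be sketched as follows. This is an illustration only: the helper `split_dataset`, the 80/20 ratio, the fixed seed, and the placeholder records are all assumptions, not details from the paper and not part of Annif's API.

```python
import random

def split_dataset(records, test_fraction=0.2, seed=0):
    """Shuffle indexed records and split them into (train, test) lists.

    test_fraction and seed are illustrative defaults, not values from the source.
    """
    shuffled = list(records)               # copy so the caller's list is untouched
    random.Random(seed).shuffle(shuffled)  # seeded shuffle for a repeatable split
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]

# Placeholder records: (document id, manually assigned descriptors).
records = [(f"doc-{i}", ["placeholder descriptor"]) for i in range(10)]
train, test = split_dataset(records)
print(len(train), len(test))
```

The training portion would be used to fit the model and the held-out portion to evaluate its predicted descriptors against the manually assigned ones.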
