Search (86 results, page 5 of 5)

Fernández-Reyes, F.C.; Hermosillo-Valadez, J.; Montes-y-Gómez, M.: A prospect-guided global query expansion strategy using word embeddings (2018) 0.00
```
0.0022056228 = product of:
  0.006616868 = sum of:
    0.006616868 = product of:
      0.013233736 = sum of:
        0.013233736 = weight(_text_:of in 5090) [ClassicSimilarity], result of:
          0.013233736 = score(doc=5090,freq=10.0), product of:
            0.06850986 = queryWeight, product of:
              1.5637573 = idf(docFreq=25162, maxDocs=44218)
              0.043811057 = queryNorm
            0.19316542 = fieldWeight in 5090, product of:
              3.1622777 = tf(freq=10.0), with freq of:
                10.0 = termFreq=10.0
              1.5637573 = idf(docFreq=25162, maxDocs=44218)
              0.0390625 = fieldNorm(doc=5090)
      0.5 = coord(1/2)
  0.33333334 = coord(1/3)
```
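Each explain tree in this list has the same shape, so it can be recomputed by hand. The sketch below does that for doc 5090. It assumes the ClassicSimilarity definitions tf = sqrt(freq) and idf = 1 + ln(maxDocs / (docFreq + 1)), which reproduce the tf of 3.1622777 and the idf of 1.5637573 shown above. queryNorm and fieldNorm are copied from the output rather than recomputed, and the helper names are hypothetical.

```python
import math

def classic_tf(freq: float) -> float:
    # ClassicSimilarity: tf is the square root of the raw term frequency.
    return math.sqrt(freq)

def classic_idf(doc_freq: int, max_docs: int) -> float:
    # ClassicSimilarity: idf = 1 + ln(maxDocs / (docFreq + 1)).
    return 1.0 + math.log(max_docs / (doc_freq + 1))

def classic_score(freq, doc_freq, max_docs, query_norm, field_norm, coords):
    """Recompute an explain tree like the one above (hypothetical helper).

    `coords` lists the coord(m/n) factors from innermost to outermost.
    """
    idf = classic_idf(doc_freq, max_docs)
    query_weight = idf * query_norm                     # queryWeight
    field_weight = classic_tf(freq) * idf * field_norm  # fieldWeight
    score = query_weight * field_weight                 # weight(_text_:of ...)
    for c in coords:
        score *= c                                      # coord(1/2), coord(1/3)
    return score

# Values copied from the explain output for doc 5090.
s = classic_score(freq=10.0, doc_freq=25162, max_docs=44218,
                  query_norm=0.043811057, field_norm=0.0390625,
                  coords=[1 / 2, 1 / 3])
print(f"{s:.7f}")
```

The result agrees with the 0.0022056228 at the top of the tree to the seven decimals printed. Lucene computes in 32-bit floats, so trailing digits can differ slightly. The coord() factors only scale the score by the fraction of query clauses that matched at each level.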
Abstract

The effectiveness of query expansion methods depends essentially on identifying good candidates, or prospects, that are semantically related to the query terms. Word embeddings have recently been used in an attempt to address this problem. Nevertheless, query disambiguation is still necessary: the semantic relatedness of each word in the corpus is modeled, but choosing the right expansion terms from the standpoint of the un-modeled query semantics remains an open issue. In this paper we propose a novel query expansion method using word embeddings that models the global query semantics from the standpoint of prospect vocabulary terms. The proposed method explores query-vocabulary semantic closeness in such a way that new terms, semantically related to more relevant topics, are elicited and added as a function of the query as a whole. The method includes candidate pooling strategies that address disambiguation issues without using exogenous resources. We tested our method with three topic sets over CLEF corpora and compared it across different Information Retrieval models and against another expansion technique that also uses word embeddings. Our experiments indicate that our method achieves significant results that outperform the baselines, improving both recall and precision metrics without relevance feedback.
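To illustrate the general idea of scoring expansion candidates against the query as a whole rather than term by term, here is a minimal sketch that ranks vocabulary terms by cosine similarity to the centroid of the query-term embeddings. This is a common baseline, not the prospect-guided method or the pooling strategies proposed in the paper. The toy 3-dimensional vectors and the `expand_query` name are made up for illustration.

```python
import math

def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def expand_query(query, embeddings, k=2):
    """Rank non-query vocabulary terms by similarity to the query centroid.

    Hypothetical helper: `embeddings` maps term -> vector. Candidates are
    scored against the whole query (its centroid), not term by term.
    """
    vecs = [embeddings[t] for t in query if t in embeddings]
    if not vecs:
        return []
    dim = len(vecs[0])
    centroid = [sum(v[i] for v in vecs) / len(vecs) for i in range(dim)]
    scored = [(cosine(centroid, vec), term)
              for term, vec in embeddings.items() if term not in query]
    scored.sort(reverse=True)
    return [term for _, term in scored[:k]]

# Made-up 3-d vectors; "bank" sits between a finance sense and a river sense.
toy = {
    "bank":   [0.7, 0.7, 0.0],
    "loan":   [0.9, 0.1, 0.0],
    "credit": [0.95, 0.05, 0.0],
    "river":  [0.1, 0.9, 0.1],
    "shore":  [0.05, 0.95, 0.1],
    "pasta":  [0.0, 0.0, 1.0],
}

print(expand_query(["bank", "loan"], toy, k=1))
print(expand_query(["bank", "river"], toy, k=1))
```

With these vectors the ambiguous term "bank" is expanded with "credit" when paired with "loan" and with "shore" when paired with "river". This is a small illustration of why scoring against the whole query can help with ambiguity. A real system would use trained embeddings over the full corpus vocabulary.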
Surfing versus Drilling for knowledge in science : When should you use your computer? When should you use your brain? (2018) 0.00
```
0.0020877826 = product of:
  0.0062633473 = sum of:
    0.0062633473 = product of:
      0.012526695 = sum of:
        0.012526695 = weight(_text_:of in 4564) [ClassicSimilarity], result of:
          0.012526695 = score(doc=4564,freq=14.0), product of:
            0.06850986 = queryWeight, product of:
              1.5637573 = idf(docFreq=25162, maxDocs=44218)
              0.043811057 = queryNorm
            0.18284513 = fieldWeight in 4564, product of:
              3.7416575 = tf(freq=14.0), with freq of:
                14.0 = termFreq=14.0
              1.5637573 = idf(docFreq=25162, maxDocs=44218)
              0.03125 = fieldNorm(doc=4564)
      0.5 = coord(1/2)
  0.33333334 = coord(1/3)
```
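The fieldNorm factors in these trees take only a few distinct values (0.0390625, 0.03125, 0.046875). Assuming a pre-7.0 Lucene index, which the coord() and queryNorm factors point to, ClassicSimilarity stored the length norm boost / sqrt(numTerms) in a single byte using SmallFloat.floatToByte315. As a result, nearby document lengths share one quantized value. The Python port below is a sketch of that encoding for illustration. The Python function names are hypothetical, and the logic should be checked against the SmallFloat source of the Lucene version in use.

```python
import math
import struct

def float_to_byte315(f: float) -> int:
    """Quantize a float to one byte (3 mantissa bits, zero exponent 15).

    Sketch of Lucene's SmallFloat.floatToByte315 (assumed pre-7.0 behavior);
    returns an unsigned value in 0..255.
    """
    bits = struct.unpack(">i", struct.pack(">f", f))[0]  # raw float32 bits
    small = bits >> (24 - 3)
    if small <= (63 - 15) << 3:
        return 0 if bits <= 0 else 1   # underflow: zero or smallest value
    if small >= ((63 - 15) << 3) + 0x100:
        return 255                     # overflow: largest value
    return small - ((63 - 15) << 3)

def byte315_to_float(b: int) -> float:
    """Inverse mapping, sketch of SmallFloat.byte315ToFloat."""
    if b == 0:
        return 0.0
    bits = (b & 0xFF) << (24 - 3)
    bits += (63 - 15) << 24
    return struct.unpack(">f", struct.pack(">I", bits))[0]

def field_norm(num_terms: int, boost: float = 1.0) -> float:
    # lengthNorm = boost / sqrt(numTerms), then stored lossily in one byte.
    return byte315_to_float(float_to_byte315(boost / math.sqrt(num_terms)))

for n in (400, 600, 1000):
    print(n, field_norm(n))
```

Under this encoding, fields of 400, 600 and 1000 terms map to 0.046875, 0.0390625 and 0.03125, the three fieldNorm values that appear in this result list. The actual field lengths of these documents are not shown in the output.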
Abstract

For this second Special Issue of Infozine, we have invited students, teachers, researchers, and software developers to share their opinions on one aspect or another of this broad topic: how to balance drilling (for depth) vs. surfing (for breadth) in scientific learning, teaching, research, and software design, and how the modern digital-liberal system affects our ability to strike this balance. This special issue is meant to provide a wide and unbiased spectrum of possible viewpoints on the topic, helping readers to define their own position and information-use behavior lucidly.

Content

- Editorial: Surfing versus Drilling for Knowledge in Science: When should you use your computer? When should you use your brain? Blaise Pascal: Les deux infinis - The two infinities / Philippe Hünenberger and Oliver Renn
- "Surfing" vs. "drilling" in the modern scientific world / Antonio Loprieno
- Of millimeter paper and machine learning / Philippe Hünenberger
- From one to many, from breadth to depth - industrializing research / Janne Soetbeer
- "Deep drilling" requires "surfing" / Gerd Folkers and Laura Folkers
- Surfing vs. drilling in science: A delicate balance / Alzbeta Kubincová
- Digital trends in academia - for the sake of critical thinking or comfort? / Leif-Thore Deck
- I diagnose, therefore I am a Doctor? Will drilling computer software replace human doctors in the future? / Yi Zheng
- Surfing versus drilling in fundamental research / Wilfred van Gunsteren
- Using brain vs. brute force in computational studies of biological systems / Arieh Warshel
- Laboratory literature boards in the digital age / Jeffrey Bode
- Research strategies in computational chemistry / Sereina Riniker
- Surfing on the hype waves or drilling deep for knowledge? A perspective from industry / Nadine Schneider and Nikolaus Stiefl
- The use and purpose of articles and scientists / Philip Mark Lund
- Can you look at papers like artwork? / Oliver Renn
- Dynamite fishing in the data swamp / Frank Perabo
- Streetlights, augmented intelligence, and information discovery / Jeffrey Saffer and Vicki Burnett
- "Yes Dave. Happy to do that for you." Why AI, machine learning, and blockchain will lead to deeper "drilling" / Michiel Kolman and Sjors de Heuvel
- Trends in scientific document search / Stefan Geißler
- Power tools for text mining / Jane Reed
- Publishing and patenting: Navigating the differences to ensure search success / Paul Peters
Ma, N.; Zheng, H.T.; Xiao, X.: An ontology-based latent semantic indexing approach using long short-term memory networks (2017) 0.00
```
0.001972769 = product of:
  0.0059183068 = sum of:
    0.0059183068 = product of:
      0.0118366135 = sum of:
        0.0118366135 = weight(_text_:of in 3810) [ClassicSimilarity], result of:
          0.0118366135 = score(doc=3810,freq=8.0), product of:
            0.06850986 = queryWeight, product of:
              1.5637573 = idf(docFreq=25162, maxDocs=44218)
              0.043811057 = queryNorm
            0.17277241 = fieldWeight in 3810, product of:
              2.828427 = tf(freq=8.0), with freq of:
                8.0 = termFreq=8.0
              1.5637573 = idf(docFreq=25162, maxDocs=44218)
              0.0390625 = fieldNorm(doc=3810)
      0.5 = coord(1/2)
  0.33333334 = coord(1/3)
```
Abstract

Nowadays, the volume of online data is growing astonishingly, and semantic indexing remains an open question. Ontologies and knowledge bases have been widely used to optimize performance. However, researchers place increasing emphasis on the internal relations of ontologies but neglect the latent semantic relations between ontologies and documents. They generally annotate the instances mentioned in documents that are related to concepts in ontologies. In this paper, we propose an Ontology-based Latent Semantic Indexing approach utilizing Long Short-Term Memory networks (LSTM-OLSI). We utilize an importance-aware topic model to extract document-level semantic features and leverage ontologies to extract word-level contextual features. We then encode these two levels of features and match their embedding vectors using LSTM networks. Experimental results show that LSTM-OLSI outperforms existing techniques and demonstrates deep comprehension of instances and articles.
Huang, L.; Milne, D.; Frank, E.; Witten, I.H.: Learning a concept-based document similarity measure (2012) 0.00
```
0.0016739499 = product of:
  0.0050218496 = sum of:
    0.0050218496 = product of:
      0.010043699 = sum of:
        0.010043699 = weight(_text_:of in 372) [ClassicSimilarity], result of:
          0.010043699 = score(doc=372,freq=4.0), product of:
            0.06850986 = queryWeight, product of:
              1.5637573 = idf(docFreq=25162, maxDocs=44218)
              0.043811057 = queryNorm
            0.14660224 = fieldWeight in 372, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              1.5637573 = idf(docFreq=25162, maxDocs=44218)
              0.046875 = fieldNorm(doc=372)
      0.5 = coord(1/2)
  0.33333334 = coord(1/3)
```
Abstract

Document similarity measures are crucial components of many text-analysis tasks, including information retrieval, document classification, and document clustering. Conventional measures are brittle: They estimate the surface overlap between documents based on the words they mention and ignore deeper semantic connections. We propose a new measure that assesses similarity at both the lexical and semantic levels, and learns from human judgments how to combine them by using machine-learning techniques. Experiments show that the new measure produces values for documents that are more consistent with people's judgments than people are with each other. We also use it to classify and cluster large document sets covering different genres and topics, and find that it improves both classification and clustering performance.

Source

Journal of the American Society for Information Science and Technology. 63(2012) no.8, pp.1593-1608
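The abstract above describes assessing similarity at a lexical and a semantic level and learning from human judgments how to combine them. The following is only a sketch of that combination step under simplifying assumptions. Each document is already represented as a bag of words plus a bag of concepts (how the paper obtains concepts is not covered here). Both levels are compared with cosine similarity, and a single blending weight is fitted by least squares to a few made-up human ratings. The paper's actual features and learning method are not reproduced; all names and data here are hypothetical.

```python
import math
from collections import Counter

def cosine(a, b):
    """Cosine similarity between two sparse bags (Counters)."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def fit_weight(pairs, ratings):
    """Least-squares w for rating ~ w*lex + (1-w)*sem, clamped to [0, 1]."""
    num = den = 0.0
    for (lex, sem), r in zip(pairs, ratings):
        d = lex - sem
        num += d * (r - sem)
        den += d * d
    w = num / den if den else 0.5
    return min(1.0, max(0.0, w))

def combined(doc_a, doc_b, w):
    """Blend lexical (word) and semantic (concept) cosine with weight w."""
    lex = cosine(doc_a[0], doc_b[0])
    sem = cosine(doc_a[1], doc_b[1])
    return w * lex + (1 - w) * sem

# Hypothetical training data: (lexical sim, concept sim) -> human rating.
pairs = [(0.0, 1.0), (0.8, 0.9), (0.5, 0.1)]
ratings = [0.9, 0.9, 0.2]
w = fit_weight(pairs, ratings)

# Each document: (word bag, concept bag); the concept labels are made up.
d1 = (Counter("car engine repair".split()), Counter(["Automobile", "Maintenance"]))
d2 = (Counter("vehicle motor fix".split()), Counter(["Automobile", "Maintenance"]))
print(f"w = {w:.3f}, similarity = {combined(d1, d2, w):.3f}")
```

The two example documents share no words, so a purely lexical measure scores them 0. The concept level is what lets the blended score treat them as related.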
Zhang, W.; Yoshida, T.; Tang, X.: A comparative study of TF*IDF, LSI and multi-words for text classification (2011) 0.00
```
0.0016739499 = product of:
  0.0050218496 = sum of:
    0.0050218496 = product of:
      0.010043699 = sum of:
        0.010043699 = weight(_text_:of in 1165) [ClassicSimilarity], result of:
          0.010043699 = score(doc=1165,freq=4.0), product of:
            0.06850986 = queryWeight, product of:
              1.5637573 = idf(docFreq=25162, maxDocs=44218)
              0.043811057 = queryNorm
            0.14660224 = fieldWeight in 1165, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              1.5637573 = idf(docFreq=25162, maxDocs=44218)
              0.046875 = fieldNorm(doc=1165)
      0.5 = coord(1/2)
  0.33333334 = coord(1/3)
```
Abstract

One of the main themes in text mining is text representation, which is fundamental and indispensable for text-based intelligent information processing. Generally, text representation includes two tasks: indexing and weighting. This paper comparatively studies TF*IDF, LSI and multi-words for text representation. We used a Chinese and an English document collection to evaluate the three methods in information retrieval and text categorization. Experimental results demonstrate that in text categorization, LSI performs better than the other methods on both document collections. LSI also produced the best performance in retrieving English documents. This outcome shows that LSI has both favorable semantic and statistical quality, contrary to the claim that LSI cannot produce discriminative power for indexing.
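Of the three representations compared in this abstract, LSI is the least self-explanatory, so here is a minimal sketch of it. The code builds a TF*IDF-weighted term-by-document matrix and keeps only the top k singular directions from `numpy.linalg.svd`. The toy corpus, k = 2 and the raw-tf times log(N/df) weighting are assumptions for illustration, not the paper's setup.

```python
import math
import numpy as np

docs = [
    "car engine repair",
    "automobile engine maintenance",
    "car automobile dealer",
    "stock market crash",
    "market stock prices fall",
]

def tfidf_matrix(texts):
    """Term-by-document matrix with raw tf times log(N / df) weights."""
    tokenized = [t.split() for t in texts]
    vocab = sorted({w for toks in tokenized for w in toks})
    n = len(texts)
    df = {w: sum(w in toks for toks in tokenized) for w in vocab}
    m = np.zeros((len(vocab), n))
    for j, toks in enumerate(tokenized):
        for w in toks:
            m[vocab.index(w), j] += 1.0
    idf = np.array([math.log(n / df[w]) for w in vocab])
    return vocab, m * idf[:, None]

def lsi_doc_vectors(m, k):
    """Project documents into a k-dimensional latent space via truncated SVD."""
    u, s, vt = np.linalg.svd(m, full_matrices=False)
    return (np.diag(s[:k]) @ vt[:k]).T  # one row per document

def cos(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

vocab, m = tfidf_matrix(docs)
lsi = lsi_doc_vectors(m, k=2)
print("tf*idf cos(doc0, doc1):", round(cos(m[:, 0], m[:, 1]), 3))
print("LSI    cos(doc0, doc1):", round(cos(lsi[0], lsi[1]), 3))
```

In this toy corpus doc0 and doc1 share only the word "engine", so their TF*IDF cosine is about 0.2. In the two-dimensional LSI space the three vehicle documents map to the same direction, so their cosine becomes 1 while they stay orthogonal to the stock-market documents. Real collections are not this cleanly separable, so this only illustrates the mechanism.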

Renker, L.: Exploration von Textkorpora : Topic Models als Grundlage der Interaktion [Exploring text corpora: topic models as the basis for interaction] (2015) 0.00

```
9.863845E-4 = product of:
  0.0029591534 = sum of:
    0.0029591534 = product of:
      0.0059183068 = sum of:
        0.0059183068 = weight(_text_:of in 2380) [ClassicSimilarity], result of:
          0.0059183068 = score(doc=2380,freq=2.0), product of:
            0.06850986 = queryWeight, product of:
              1.5637573 = idf(docFreq=25162, maxDocs=44218)
              0.043811057 = queryNorm
            0.086386204 = fieldWeight in 2380, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              1.5637573 = idf(docFreq=25162, maxDocs=44218)
              0.0390625 = fieldNorm(doc=2380)
      0.5 = coord(1/2)
  0.33333334 = coord(1/3)
```

Footnote: Master's thesis submitted for the academic degree of Master of Science (M.Sc.) at the Fachhochschule Köln, Fakultät für Informatik und Ingenieurswissenschaften (Faculty of Computer Science and Engineering Science), in the Medieninformatik (Media Informatics) degree program.
