Search (31 results, page 1 of 2)

  • year_i:[2010 TO 2020}
  • theme_ss:"Computerlinguistik"
  1. Huo, W.: Automatic multi-word term extraction and its application to Web-page summarization (2012) 0.41
    Abstract
    In this thesis we propose three new word association measures for multi-word term extraction. We combine these association measures with LocalMaxs algorithm in our extraction model and compare the results of different multi-word term extraction methods. Our approach is language and domain independent and requires no training data. It can be applied to such tasks as text summarization, information retrieval, and document classification. We further explore the potential of using multi-word terms as an effective representation for general web-page summarization. We extract multi-word terms from human written summaries in a large collection of web-pages, and generate the summaries by aligning document words with these multi-word terms. Our system applies machine translation technology to learn the aligning process from a training set and focuses on selecting high quality multi-word terms from human written summaries to generate suitable results for web-page summarization.
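    A minimal sketch of association-measure-based candidate extraction, for illustration only: it scores adjacent word pairs by pointwise mutual information over a toy corpus. It is not the thesis's own measures or its LocalMaxs implementation, and the corpus and frequency threshold are invented for the example.

import math
import re
from collections import Counter

def pmi_bigrams(text, min_count=2):
    """Score adjacent word pairs by pointwise mutual information (PMI).

    A toy stand-in for the association measures discussed in the abstract;
    a real system would add LocalMaxs-style filtering and longer n-grams."""
    words = re.findall(r"[a-z]+", text.lower())
    unigrams = Counter(words)
    bigrams = Counter(zip(words, words[1:]))
    n_uni = sum(unigrams.values())
    n_bi = sum(bigrams.values())
    scores = {}
    for (w1, w2), c in bigrams.items():
        if c < min_count:
            continue
        p_joint = c / n_bi
        p_w1 = unigrams[w1] / n_uni
        p_w2 = unigrams[w2] / n_uni
        scores[(w1, w2)] = math.log2(p_joint / (p_w1 * p_w2))
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

sample = ("information retrieval systems use multi word terms . "
          "multi word terms improve web page summarization . "
          "information retrieval and web page summarization benefit from terms .")
for pair, score in pmi_bigrams(sample)[:5]:
    print(" ".join(pair), round(score, 2))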
    Content
    A Thesis presented to The University of Guelph in partial fulfilment of requirements for the degree of Master of Science in Computer Science. Cf.: http://www.inf.ufrgs.br/~ceramisch/download_files/publications/2009/p01.pdf.
    Date
    10.01.2013 19:22:47
  2. Reyes Ayala, B.; Knudson, R.; Chen, J.; Cao, G.; Wang, X.: Metadata records machine translation combining multi-engine outputs with limited parallel data (2018) 0.01
    Abstract
    One way to facilitate Multilingual Information Access (MLIA) for digital libraries is to generate multilingual metadata records by applying Machine Translation (MT) techniques. Current online MT services are available and affordable, but are not always effective for creating multilingual metadata records. In this study, we implemented 3 different MT strategies and evaluated their performance when translating English metadata records to Chinese and Spanish. These strategies included combining MT results from 3 online MT systems (Google, Bing, and Yahoo!) with and without additional linguistic resources, such as manually-generated parallel corpora, and metadata records in the two target languages obtained from international partners. The open-source statistical MT platform Moses was applied to design and implement the three translation strategies. Human evaluation of the MT results using adequacy and fluency demonstrated that two of the strategies produced higher quality translations than individual online MT systems for both languages. Especially, adding small, manually-generated parallel corpora of metadata records significantly improved translation performance. Our study suggested an effective and efficient MT approach for providing multilingual services for digital collections.
  3. Menge-Sonnentag, R.: Google veröffentlicht einen Parser für natürliche Sprache (2016) 0.01
    Abstract
    SyntaxNet breaks sentences down into their grammatical constituents and determines the syntactic relations between the words. The framework is open source and implemented as a TensorFlow model. A natural-language parser is a piece of software that decomposes sentences into their grammatical constituents. This decomposition is necessary so that computers can understand commands or translate texts. Digital assistants such as Microsoft's Cortana, Apple's Siri and Google Now use parsers to carry out sentences like "Set the alarm for 5 o'clock!" correctly. SyntaxNet is such a parser, which Google has released as a TensorFlow model. Developers can build their own models, and SyntaxNet ships with a pre-trained parser for English that its creators have named Parsey McParseface.
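    To make such a grammatical decomposition concrete, the sketch below prints a dependency parse of a short command sentence. It uses spaCy purely as a stand-in for brevity (an assumption; SyntaxNet itself is distributed as a TensorFlow model with its own tooling) and assumes the small English model has been installed.

import spacy

# Assumes the model has been installed: python -m spacy download en_core_web_sm
nlp = spacy.load("en_core_web_sm")

doc = nlp("Set the alarm for 5 o'clock.")
for token in doc:
    # token.dep_ is the syntactic relation; token.head is the governing word
    print(f"{token.text:10} {token.dep_:10} -> {token.head.text}")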
  4. Muneer, I.; Sharjeel, M.; Iqbal, M.; Adeel Nawab, R.M.; Rayson, P.: CLEU - A Cross-language english-urdu corpus and benchmark for text reuse experiments (2019) 0.01
    Abstract
    Text reuse is becoming a serious issue in many fields, and research shows that it is much harder to detect when it occurs across languages. The recent rise in multi-lingual content on the Web has increased cross-language text reuse to an unprecedented scale. Although researchers have proposed methods to detect it, one major drawback is the unavailability of large-scale gold standard evaluation resources built on real cases. To overcome this problem, we propose a cross-language sentence/passage level text reuse corpus for the English-Urdu language pair. The Cross-Language English-Urdu Corpus (CLEU) has source text in English, whereas the derived text is in Urdu. It contains in total 3,235 sentence/passage pairs manually tagged into three categories: near copy, paraphrased copy, and independently written. Further, as a second contribution, we evaluate the Translation plus Mono-lingual Analysis method using three sets of experiments on the proposed dataset to highlight its usefulness. Evaluation results (f1 = 0.732 for binary, f1 = 0.552 for ternary classification) indicate that it is harder to detect cross-language real cases of text reuse, especially when the language pairs have unrelated scripts. The corpus is a useful benchmark resource for the future development and assessment of cross-language text reuse detection systems for the English-Urdu language pair.
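    The Translation plus Mono-lingual Analysis idea can be illustrated with a deliberately crude sketch: assuming the Urdu passage has already been machine-translated into English, a monolingual overlap score is thresholded into the three reuse categories. The overlap measure and thresholds are invented for the example and are not those used in the paper.

import re

def word_overlap(source_text, candidate_text):
    """Jaccard word overlap between the English source passage and the
    (already machine-translated) candidate passage."""
    src = set(re.findall(r"[a-z']+", source_text.lower()))
    cand = set(re.findall(r"[a-z']+", candidate_text.lower()))
    if not src or not cand:
        return 0.0
    return len(src & cand) / len(src | cand)

def classify_reuse(score, near_copy=0.6, paraphrase=0.3):
    # Thresholds are invented for this sketch, not taken from the CLEU paper.
    if score >= near_copy:
        return "near copy"
    if score >= paraphrase:
        return "paraphrased copy"
    return "independently written"

source = "The president announced new economic reforms on Monday."
candidate = "On Monday the president announced a set of new economic reforms."
s = word_overlap(source, candidate)
print(round(s, 2), classify_reuse(s))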
  5. Belbachir, F.; Boughanem, M.: Using language models to improve opinion detection (2018) 0.01
    Abstract
    Opinion mining is one of the most important research tasks in the information retrieval research community. With the huge volume of opinionated data available on the Web, approaches must be developed to differentiate opinion from fact. In this paper, we present a lexicon-based approach for opinion retrieval. Generally, opinion retrieval consists of two stages: relevance to the query and opinion detection. In our work, we focus on the second stage, which itself focuses on detecting opinionated documents. We compare the document to be analyzed with opinionated sources that contain subjective information. We hypothesize that a document with a strong similarity to opinionated sources is more likely to be opinionated itself. Typical lexicon-based approaches treat and choose their opinion sources according to their test collection, then calculate the opinion score based on the frequency of subjective terms in the document. In our work, we use different open opinion collections without any specific treatment and consider them as a reference collection. We then use language models to determine opinion scores. The analyzed document and the reference collection are represented by different language models (i.e., Dirichlet, Jelinek-Mercer and two-stage models). These language models are generally used in information retrieval to represent the relationship between documents and queries. However, in our study, we modify these language models to represent opinionated documents. We carry out several experiments using Text REtrieval Conference (TREC) Blogs 06 as our analysis collection and the Internet Movie Database (IMDB), Multi-Perspective Question Answering (MPQA) and CHESLY as our reference collections. To improve opinion detection, we study the impact of using different language models to represent the document and reference collection, alongside different combinations of opinion and retrieval scores. We then use this data to deduce the best opinion detection models. Using the best models, our approach improves on the best baseline of TREC Blog (baseline4) by 30%.
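    A minimal sketch of the language-model idea described above, under simplifying bag-of-words assumptions: the opinionated reference collection defines a unigram model, Jelinek-Mercer smoothing interpolates it with a background model, and a document's log-likelihood under the smoothed model serves as its opinion score. The smoothing weight and toy texts are illustrative, not the paper's settings.

import math
import re
from collections import Counter

def unigram_counts(texts):
    counts = Counter(w for t in texts for w in re.findall(r"[a-z']+", t.lower()))
    return counts, sum(counts.values())

def jm_opinion_score(document, reference_texts, background_texts, lam=0.7):
    """Log-likelihood of `document` under a Jelinek-Mercer smoothed unigram
    model of the opinionated reference collection; a higher (less negative)
    score means the document looks more like the reference collection."""
    ref_counts, ref_total = unigram_counts(reference_texts)
    bg_counts, bg_total = unigram_counts(background_texts)
    score = 0.0
    for w in re.findall(r"[a-z']+", document.lower()):
        p_ref = ref_counts[w] / ref_total
        p_bg = (bg_counts[w] + 1) / (bg_total + len(bg_counts) + 1)  # add-one smoothed background
        score += math.log(lam * p_ref + (1 - lam) * p_bg)
    return score

reference = ["i really loved this film , it was wonderful and moving",
             "terrible plot , i hated every minute of it"]
background = ["the film was released in cinemas worldwide",
              "the committee published its annual report"]
print(jm_opinion_score("i loved the wonderful plot", reference, background))
print(jm_opinion_score("the committee released the report", reference, background))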
  6. Ko, Y.: ¬A new term-weighting scheme for text classification using the odds of positive and negative class probabilities (2015) 0.01
    Abstract
    Text classification (TC) is a core technique for text mining and information retrieval. It has been applied to many applications in many different research and industrial areas. Term-weighting schemes assign an appropriate weight to each term to obtain a high TC performance. Although term weighting is one of the important modules for TC and TC has different peculiarities from those in information retrieval, many term-weighting schemes used in information retrieval, such as term frequency-inverse document frequency (tf-idf), have been used in TC in the same manner. The peculiarity of TC that differs most from information retrieval is the existence of class information. This article proposes a new term-weighting scheme that uses class information using positive and negative class distributions. As a result, the proposed scheme, log tf-TRR, consistently performs better than do other schemes using class information as well as traditional schemes such as tf-idf.
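    The exact log tf-TRR formula is defined in the article itself; the sketch below only illustrates the contrast it draws, computing a plain tf-idf weight alongside a class-aware weight based on the log odds of a term occurring in positive- versus negative-class documents (toy data invented for the example).

import math

def tf_idf(term, doc_tokens, all_docs):
    tf = doc_tokens.count(term)
    df = sum(1 for d in all_docs if term in d)
    idf = math.log(len(all_docs) / (1 + df))
    return tf * idf

def log_class_odds(term, pos_docs, neg_docs):
    """Log of the odds that `term` occurs in positive- vs. negative-class
    documents (add-one smoothed); class-aware weighting in the spirit of,
    but not identical to, log tf-TRR."""
    p_pos = (sum(1 for d in pos_docs if term in d) + 1) / (len(pos_docs) + 2)
    p_neg = (sum(1 for d in neg_docs if term in d) + 1) / (len(neg_docs) + 2)
    return math.log(p_pos / p_neg)

pos = [["great", "product", "works"], ["great", "value"], ["works", "well"]]
neg = [["broken", "on", "arrival"], ["refund", "requested"], ["broken", "again"]]
docs = pos + neg
print("tf-idf of 'great':         ", round(tf_idf("great", pos[0], docs), 3))
print("log class odds of 'great': ", round(log_class_odds("great", pos, neg), 3))
print("log class odds of 'broken':", round(log_class_odds("broken", pos, neg), 3))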
  7. Endres-Niggemeyer, B.: Thinkie: Lautes Denken mit Spracherkennung (mobil) (2013) 0.00
    Abstract
    Thinking aloud is a well-established method for studying cognitive processes. It is used in many disciplines, for example to reveal what experiences users have when interacting with computer interfaces. After a brief explanation of thinking aloud, the app Thinkie is presented. Thinkie is a mobile solution for thinking aloud on iPhone and iPad. The test person records the audio on the iPhone. The speech recognition software Siri (http://www.apple.com/de/ios/siri/) transcribes it. In parallel, the session is filmed on the iPad or another device. On the iPad, the transcript can then be edited and interpreted with the video in view. Thinkie transfers the text files via a cloud collection; the videos are transferred with iTunes. Thinkie is not yet ready for practical use, because the sequences that Siri can process are still too short. That will change.
  8. Rettinger, A.; Schumilin, A.; Thoma, S.; Ell, B.: Learning a cross-lingual semantic representation of relations expressed in text (2015) 0.00
    Series
    Information Systems and Applications, incl. Internet/Web, and HCI; vol. 9088
    Source
    The Semantic Web: latest advances and new domains. 12th European Semantic Web Conference, ESWC 2015, Portoroz, Slovenia, May 31 - June 4, 2015. Proceedings. Eds.: F. Gandon et al.
  9. Collovini de Abreu, S.; Vieira, R.: RelP: Portuguese open relation extraction (2017) 0.00
    Abstract
    Natural language texts are valuable data sources in many human activities. NLP techniques are being widely used in order to help find the right information for specific needs. In this paper, we present one such technique: relation extraction from texts. This task aims at identifying and classifying semantic relations that occur between entities in a text. For example, the sentence "Roberto Marinho is the founder of Rede Globo" expresses a relation occurring between "Roberto Marinho" and "Rede Globo." This work presents a system for Portuguese Open Relation Extraction, named RelP, which extracts any relation descriptor that describes an explicit relation between named entities in the organisation domain by applying Conditional Random Fields. For implementing RelP, we define the representation scheme, features based on previous work, and a reference corpus. RelP achieved state-of-the-art results for open relation extraction; the F-measure was around 60% for relations between the named entities person, organisation and place. For a better understanding of the output, we present a way of organizing the extracted relation descriptors. This organization can be useful to classify relation types, to cluster the entities involved in a common relation and to populate datasets.
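    A generic BIO-tagging sketch of relation-descriptor extraction with Conditional Random Fields, using the sklearn-crfsuite package; it is not RelP's implementation, and the feature set, sentences and labels are invented toy data.

import sklearn_crfsuite  # pip install sklearn-crfsuite

def token_features(sentence, i):
    """Very small per-token feature set; RelP's actual features (POS tags,
    chunks, entity tags, etc.) are richer."""
    word = sentence[i]
    return {
        "word.lower": word.lower(),
        "is_title": word.istitle(),
        "prev": sentence[i - 1].lower() if i > 0 else "<s>",
        "next": sentence[i + 1].lower() if i < len(sentence) - 1 else "</s>",
    }

# Toy training data: relation-descriptor spans are tagged B-REL/I-REL, the rest O.
sentences = [
    ["Roberto", "Marinho", "is", "the", "founder", "of", "Rede", "Globo"],
    ["Maria", "Silva", "works", "as", "director", "of", "Acme", "Corp"],
]
labels = [
    ["O", "O", "B-REL", "I-REL", "I-REL", "I-REL", "O", "O"],
    ["O", "O", "B-REL", "I-REL", "I-REL", "I-REL", "O", "O"],
]
X = [[token_features(s, i) for i in range(len(s))] for s in sentences]

crf = sklearn_crfsuite.CRF(algorithm="lbfgs", c1=0.1, c2=0.1, max_iterations=50)
crf.fit(X, labels)

test = ["Ana", "Souza", "is", "the", "president", "of", "Banco", "Azul"]
pred = crf.predict([[token_features(test, i) for i in range(len(test))]])[0]
print(list(zip(test, pred)))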
  10. Multi-source, multilingual information extraction and summarization (2013) 0.00
    Abstract
    Information extraction (IE) and text summarization (TS) are powerful technologies for finding relevant pieces of information in text and presenting them to the user in condensed form. The ongoing information explosion makes IE and TS critical for successful functioning within the information society. These technologies face particular challenges due to the inherent multi-source nature of the information explosion. The technologies must now handle not isolated texts or individual narratives, but rather large-scale repositories and streams, in general in multiple languages, containing a multiplicity of perspectives, opinions, or commentaries on particular topics, entities or events. There is thus a need to adapt existing techniques and develop new ones to deal with these challenges. This volume contains a selection of papers that present a variety of methodologies for content identification and extraction, as well as for content fusion and regeneration. The chapters cover various aspects of the challenges, depending on the nature of the information sought (names vs. events) and the nature of the sources (news streams vs. image captions vs. scientific research papers, etc.). This volume aims to offer a broad and representative sample of studies from this very active research field.
  11. Liu, P.J.; Saleh, M.; Pot, E.; Goodrich, B.; Sepassi, R.; Kaiser, L.; Shazeer, N.: Generating Wikipedia by summarizing long sequences (2018) 0.00
    Abstract
    We show that generating English Wikipedia articles can be approached as a multi-document summarization of source documents. We use extractive summarization to coarsely identify salient information and a neural abstractive model to generate the article. For the abstractive model, we introduce a decoder-only architecture that can scalably attend to very long sequences, much longer than typical encoder-decoder architectures used in sequence transduction. We show that this model can generate fluent, coherent multi-sentence paragraphs and even whole Wikipedia articles. When given reference documents, we show it can extract relevant factual information as reflected in perplexity, ROUGE scores and human evaluations.
  12. Rajasurya, S.; Muralidharan, T.; Devi, S.; Swamynathan, S.: Semantic information retrieval using ontology in university domain (2012) 0.00
    Abstract
    Today's conventional search engines hardly provide content that is truly relevant to the user's search query, because the context and semantics of the user's request are not analyzed to the full extent. This is where the need for semantic web search (SWS) arises. SWS is an emerging area of web search that combines Natural Language Processing and Artificial Intelligence. The objective of the work presented here is to design, develop and implement a semantic search engine, SIEU (Semantic Information Extraction in University Domain), confined to the university domain. SIEU uses an ontology as the knowledge base for the information retrieval process. It is not a mere keyword search: it works one layer above what Google or any other search engine retrieves by analyzing just the keywords, since the query is analyzed both syntactically and semantically. The developed system retrieves web results more relevant to the user query through keyword expansion, and the accuracy of the results is enhanced because the query is analyzed semantically. The system will be of use to developers and researchers who work on the web. The Google results are re-ranked and optimized to provide the relevant links; for ranking, an algorithm is applied that fetches more apt results for the user query.
  13. Rozinajová, V.; Macko, P.: Using natural language to search linked data (2017) 0.00
    Abstract
    There are many endeavors aiming to offer users more effective ways of getting relevant information from the web. One of them is represented by the concept of Linked Data, which provides interconnected data sources. But querying these types of data is difficult not only for conventional web users but also for experts in this field. Therefore, a more comfortable way of posing user queries would be of great value. One direction could be to allow the user to use natural language. To make this task easier we have proposed a method for translating natural language queries to SPARQL queries. It is based on the sentence structure, utilizing dependencies between the words in user queries. The dependencies are used to map the query to the semantic web structure, which is in the next step translated to a SPARQL query. According to our first experiments we are able to answer a significant group of user queries.
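    The abstract does not spell out the mapping rules, so the following only illustrates the end product: a hand-written SPARQL query that a question such as "Which professors teach Information Retrieval?" might be translated into, executed with rdflib over a tiny invented university graph.

from rdflib import Graph, Namespace
from rdflib.namespace import RDF

EX = Namespace("http://example.org/university#")
g = Graph()

# A tiny invented university graph.
g.add((EX.Smith, RDF.type, EX.Professor))
g.add((EX.Smith, EX.teaches, EX.InformationRetrieval))
g.add((EX.Jones, RDF.type, EX.Professor))
g.add((EX.Jones, EX.teaches, EX.Databases))

# A possible SPARQL translation of "Which professors teach Information Retrieval?"
query = """
PREFIX ex: <http://example.org/university#>
SELECT ?prof WHERE {
    ?prof a ex:Professor ;
          ex:teaches ex:InformationRetrieval .
}
"""
for row in g.query(query):
    print(row.prof)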
    Series
    Information Systems and Applications, incl. Internet/Web, and HCI; 10151
  14. Wong, W.; Liu, W.; Bennamoun, M.: Ontology learning from text : a look back and into the future (2010) 0.00
    Abstract
    Ontologies are often viewed as the answer to the need for interoperable semantics in modern information systems. The explosion of textual information on the "Read/Write" Web, coupled with the increasing demand for ontologies to power the Semantic Web, has made (semi-)automatic ontology learning from text a very promising research area. This, together with the advanced state of related areas such as natural language processing, has fuelled research into ontology learning over the past decade. This survey looks at how far we have come since the turn of the millennium, and discusses the remaining challenges that will define the research directions in this area in the near future.
  15. Die Bibel als Stilkompass (2019) 0.00
    Footnote
    Cf.: R. Soc. Open Sci. 5, 171920, 2018.
  16. Muresan, S.; Klavans, J.L.: Inducing terminologies from text : a case study for the consumer health domain (2013) 0.00
    Abstract
    Specialized medical ontologies and terminologies, such as SNOMED CT and the Unified Medical Language System (UMLS), have been successfully leveraged in medical information systems to provide a standard web-accessible medium for interoperability, access, and reuse. However, these clinically oriented terminologies and ontologies cannot provide sufficient support when integrated into consumer-oriented applications, because these applications must "understand" both technical and lay vocabulary. The latter is not part of these specialized terminologies and ontologies. In this article, we propose a two-step approach for building consumer health terminologies from text: 1) automatic extraction of definitions from consumer-oriented articles and web documents, which reflects language in use, rather than relying solely on dictionaries, and 2) learning to map definitions expressed in natural language to terminological knowledge by inducing a syntactic-semantic grammar rather than using hand-written patterns or grammars. We present quantitative and qualitative evaluations of our two-step approach, which show that our framework could be used to induce consumer health terminologies from text.
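    Step 1 of the two-step approach (finding candidate definitions in consumer-oriented text) can be mimicked with a crude "X is a Y" lexical pattern; the article instead induces a syntactic-semantic grammar rather than relying on hand-written patterns, so the regex below is purely a toy.

import re

# Toy "TERM is/are a/an GENUS ..." pattern for candidate definition sentences.
DEF_PATTERN = re.compile(
    r"^(?P<term>[A-Z][\w -]+?)\s+(?:is|are)\s+(?:a|an|the)?\s*(?P<genus>[^,.]+)")

text = ("Hypertension is a condition in which blood pressure is persistently high. "
        "Aspirin is a medicine used to relieve pain. "
        "Many patients search the web for advice.")

for sentence in re.split(r"(?<=[.!?])\s+", text):
    m = DEF_PATTERN.match(sentence)
    if m:
        print(f"term: {m.group('term')!r:18} genus phrase: {m.group('genus')!r}")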
  17. Perovsek, M.; Kranjca, J.; Erjaveca, T.; Cestnika, B.; Lavraca, N.: TextFlows : a visual programming platform for text mining and natural language processing (2016) 0.00
    Abstract
    Text mining and natural language processing are fast growing areas of research, with numerous applications in business, science and creative industries. This paper presents TextFlows, a web-based text mining and natural language processing platform supporting workflow construction, sharing and execution. The platform enables visual construction of text mining workflows through a web browser, and the execution of the constructed workflows on a processing cloud. This makes TextFlows an adaptable infrastructure for the construction and sharing of text processing workflows, which can be reused in various applications. The paper presents the implemented text mining and language processing modules, and describes some precomposed workflows. Their features are demonstrated on three use cases: comparison of document classifiers and of different part-of-speech taggers on a text categorization problem, and outlier detection in document corpora.
  18. Clark, M.; Kim, Y.; Kruschwitz, U.; Song, D.; Albakour, D.; Dignum, S.; Beresi, U.C.; Fasli, M.; De Roeck, A.: Automatically structuring domain knowledge from text : an overview of current research (2012) 0.00
    Abstract
    This paper presents an overview of automatic methods for building domain knowledge structures (domain models) from text collections. Applications of domain models have a long history within knowledge engineering and artificial intelligence. In the last couple of decades they have surfaced noticeably as a useful tool within natural language processing, information retrieval and semantic web technology. Inspired by the ubiquitous propagation of domain model structures that are emerging in several research disciplines, we give an overview of the current research landscape and some techniques and approaches. We will also discuss trade-offs between different approaches and point to some recent trends.
    Content
    Contribution in a special issue "Soft Approaches to IA on the Web". Cf.: doi:10.1016/j.ipm.2011.07.002.
  19. Ramisch, C.: Multiword expressions acquisition : a generic and open framework (2015) 0.00
  20. Babik, W.: Keywords as linguistic tools in information and knowledge organization (2017) 0.00
    Source
    Theorie, Semantik und Organisation von Wissen: Proceedings der 13. Tagung der Deutschen Sektion der Internationalen Gesellschaft für Wissensorganisation (ISKO) und dem 13. Internationalen Symposium der Informationswissenschaft der Higher Education Association for Information Science (HI) Potsdam (19.-20.03.2013): 'Theory, Information and Organization of Knowledge' / Proceedings der 14. Tagung der Deutschen Sektion der Internationalen Gesellschaft für Wissensorganisation (ISKO) und Natural Language & Information Systems (NLDB) Passau (16.06.2015): 'Lexical Resources for Knowledge Organization' / Proceedings des Workshops der Deutschen Sektion der Internationalen Gesellschaft für Wissensorganisation (ISKO) auf der SEMANTICS Leipzig (1.09.2014): 'Knowledge Organization and Semantic Web' / Proceedings des Workshops der Polnischen und Deutschen Sektion der Internationalen Gesellschaft für Wissensorganisation (ISKO) Cottbus (29.-30.09.2011): 'Economics of Knowledge Production and Organization'. Hrsg. von W. Babik, H.P. Ohly u. K. Weber

Languages

  • e 25
  • d 6

Types

  • a 24
  • el 8
  • m 2
  • s 1
  • x 1