Search (261 results, page 1 of 14)

  • Filter: type_ss:"el"
  1. Kleineberg, M.: Context analysis and context indexing : formal pragmatics in knowledge organization (2014) 0.24
    0.23925105 = product of:
      0.4785021 = sum of:
        0.11962552 = product of:
          0.35887656 = sum of:
            0.35887656 = weight(_text_:3a in 1826) [ClassicSimilarity], result of:
              0.35887656 = score(doc=1826,freq=2.0), product of:
                0.38312992 = queryWeight, product of:
                  8.478011 = idf(docFreq=24, maxDocs=44218)
                  0.045191016 = queryNorm
                0.93669677 = fieldWeight in 1826, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  8.478011 = idf(docFreq=24, maxDocs=44218)
                  0.078125 = fieldNorm(doc=1826)
          0.33333334 = coord(1/3)
        0.35887656 = weight(_text_:2f in 1826) [ClassicSimilarity], result of:
          0.35887656 = score(doc=1826,freq=2.0), product of:
            0.38312992 = queryWeight, product of:
              8.478011 = idf(docFreq=24, maxDocs=44218)
              0.045191016 = queryNorm
            0.93669677 = fieldWeight in 1826, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              8.478011 = idf(docFreq=24, maxDocs=44218)
              0.078125 = fieldNorm(doc=1826)
      0.5 = coord(2/4)
    
    Source
    http://www.google.de/url?sa=t&rct=j&q=&esrc=s&source=web&cd=5&ved=0CDQQFjAE&url=http%3A%2F%2Fdigbib.ubka.uni-karlsruhe.de%2Fvolltexte%2Fdocuments%2F3131107&ei=HzFWVYvGMsiNsgGTyoFI&usg=AFQjCNE2FHUeR9oQTQlNC4TPedv4Mo3DaQ&sig2=Rlzpr7a3BLZZkqZCXXN_IA&bvm=bv.93564037,d.bGg&cad=rja
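    The indented blocks under each hit are Lucene explain() traces for the TF-IDF ranking (ClassicSimilarity). As a worked example, the sketch below reassembles hit 1's score from the constants shown in its trace; the formula names follow Lucene's classic scoring and are an assumption about the engine behind this page.

```python
# Reproducing the 0.23925105 score of hit 1 from its explain() trace.
# ClassicSimilarity: each matching term contributes queryWeight * fieldWeight,
# with queryWeight = idf * queryNorm and fieldWeight = sqrt(freq) * idf * fieldNorm;
# coord() factors scale for the fraction of query clauses that matched.
from math import sqrt, isclose

idf = 8.478011           # idf(docFreq=24, maxDocs=44218)
query_norm = 0.045191016
field_norm = 0.078125    # fieldNorm(doc=1826)
tf = sqrt(2.0)           # tf(freq=2.0)

query_weight = idf * query_norm            # ~0.38312992 (queryWeight in the trace)
field_weight = tf * idf * field_norm       # ~0.93669677 (fieldWeight in the trace)
term_score = query_weight * field_weight   # ~0.35887656 per matching term

# "_text_:3a" matched 1 of 3 clauses of its sub-query (coord 1/3);
# 2 of 4 top-level clauses matched (coord 2/4).
score = ((term_score * 1 / 3) + term_score) * 0.5
assert isclose(score, 0.23925105, rel_tol=1e-4)
print(score)
```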
  2. Popper, K.R.: Three worlds : the Tanner lecture on human values. Delivered at the University of Michigan, April 7, 1978 (1978) 0.19
    0.19140083 = product of:
      0.38280165 = sum of:
        0.09570041 = product of:
          0.28710124 = sum of:
            0.28710124 = weight(_text_:3a in 230) [ClassicSimilarity], result of:
              0.28710124 = score(doc=230,freq=2.0), product of:
                0.38312992 = queryWeight, product of:
                  8.478011 = idf(docFreq=24, maxDocs=44218)
                  0.045191016 = queryNorm
                0.7493574 = fieldWeight in 230, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  8.478011 = idf(docFreq=24, maxDocs=44218)
                  0.0625 = fieldNorm(doc=230)
          0.33333334 = coord(1/3)
        0.28710124 = weight(_text_:2f in 230) [ClassicSimilarity], result of:
          0.28710124 = score(doc=230,freq=2.0), product of:
            0.38312992 = queryWeight, product of:
              8.478011 = idf(docFreq=24, maxDocs=44218)
              0.045191016 = queryNorm
            0.7493574 = fieldWeight in 230, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              8.478011 = idf(docFreq=24, maxDocs=44218)
              0.0625 = fieldNorm(doc=230)
      0.5 = coord(2/4)
    
    Source
    https://tannerlectures.utah.edu/_documents/a-to-z/p/popper80.pdf
  3. Shala, E.: ¬Die Autonomie des Menschen und der Maschine : gegenwärtige Definitionen von Autonomie zwischen philosophischem Hintergrund und technologischer Umsetzbarkeit (2014) 0.12
    0.11962552 = product of:
      0.23925105 = sum of:
        0.05981276 = product of:
          0.17943828 = sum of:
            0.17943828 = weight(_text_:3a in 4388) [ClassicSimilarity], result of:
              0.17943828 = score(doc=4388,freq=2.0), product of:
                0.38312992 = queryWeight, product of:
                  8.478011 = idf(docFreq=24, maxDocs=44218)
                  0.045191016 = queryNorm
                0.46834838 = fieldWeight in 4388, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  8.478011 = idf(docFreq=24, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=4388)
          0.33333334 = coord(1/3)
        0.17943828 = weight(_text_:2f in 4388) [ClassicSimilarity], result of:
          0.17943828 = score(doc=4388,freq=2.0), product of:
            0.38312992 = queryWeight, product of:
              8.478011 = idf(docFreq=24, maxDocs=44218)
              0.045191016 = queryNorm
            0.46834838 = fieldWeight in 4388, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              8.478011 = idf(docFreq=24, maxDocs=44218)
              0.0390625 = fieldNorm(doc=4388)
      0.5 = coord(2/4)
    
    Footnote
    See: https://www.google.de/url?sa=t&rct=j&q=&esrc=s&source=web&cd=2&cad=rja&uact=8&ved=2ahUKEwizweHljdbcAhVS16QKHXcFD9QQFjABegQICRAB&url=https%3A%2F%2Fwww.researchgate.net%2Fpublication%2F271200105_Die_Autonomie_des_Menschen_und_der_Maschine_-_gegenwartige_Definitionen_von_Autonomie_zwischen_philosophischem_Hintergrund_und_technologischer_Umsetzbarkeit_Redigierte_Version_der_Magisterarbeit_Karls&usg=AOvVaw06orrdJmFF2xbCCp_hL26q.
  4. Spero, S.: Dashed suspicious (2008) 0.04
    0.04378526 = product of:
      0.17514104 = sum of:
        0.17514104 = weight(_text_:graphic in 2626) [ClassicSimilarity], result of:
          0.17514104 = score(doc=2626,freq=2.0), product of:
            0.29924196 = queryWeight, product of:
              6.6217136 = idf(docFreq=159, maxDocs=44218)
              0.045191016 = queryNorm
            0.5852823 = fieldWeight in 2626, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              6.6217136 = idf(docFreq=159, maxDocs=44218)
              0.0625 = fieldNorm(doc=2626)
      0.25 = coord(1/4)
    
    Content
    "This is the latest version of the Doorbell -> Mammal graph; it shows the direct and indirect broader terms of doorbells in LCSH. This incarnation of the graphic adds one new piece of visual information that seems to be very very suggestive. Dashed lines are used to indicate broader term references that have never been validated since BT and NT references were automatically generated from the old SA (See Also) links in 1988."
  5. Tozer, J.: How long is the perfect book? : Bigger really is better. What the numbers say (2019) 0.04
    0.04378526 = product of:
      0.17514104 = sum of:
        0.17514104 = weight(_text_:graphic in 4686) [ClassicSimilarity], result of:
          0.17514104 = score(doc=4686,freq=2.0), product of:
            0.29924196 = queryWeight, product of:
              6.6217136 = idf(docFreq=159, maxDocs=44218)
              0.045191016 = queryNorm
            0.5852823 = fieldWeight in 4686, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              6.6217136 = idf(docFreq=159, maxDocs=44218)
              0.0625 = fieldNorm(doc=4686)
      0.25 = coord(1/4)
    
    Source
    https://www.1843magazine.com/data-graphic/what-the-numbers-say/how-long-is-the-perfect-book
  6. Graphic details : a scientific study of the importance of diagrams to science (2016) 0.04
    0.037431013 = product of:
      0.074862026 = sum of:
        0.06567789 = weight(_text_:graphic in 3035) [ClassicSimilarity], result of:
          0.06567789 = score(doc=3035,freq=2.0), product of:
            0.29924196 = queryWeight, product of:
              6.6217136 = idf(docFreq=159, maxDocs=44218)
              0.045191016 = queryNorm
            0.21948087 = fieldWeight in 3035, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              6.6217136 = idf(docFreq=159, maxDocs=44218)
              0.0234375 = fieldNorm(doc=3035)
        0.009184138 = product of:
          0.018368276 = sum of:
            0.018368276 = weight(_text_:22 in 3035) [ClassicSimilarity], result of:
              0.018368276 = score(doc=3035,freq=2.0), product of:
                0.15825124 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.045191016 = queryNorm
                0.116070345 = fieldWeight in 3035, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.0234375 = fieldNorm(doc=3035)
          0.5 = coord(1/2)
      0.5 = coord(2/4)
    
    Content
    As the team describe in a paper posted (http://arxiv.org/abs/1605.04951) on arXiv, they found that figures did indeed matter, but not all in the same way. An average paper in PubMed Central has about one diagram for every three pages and gets 1.67 citations. Papers with more diagrams per page and, to a lesser extent, plots per page tended to be more influential (on average, a paper accrued two more citations for every extra diagram per page, and one more for every extra plot per page). By contrast, including photographs and equations seemed to decrease the chances of a paper being cited by others. That agrees with a study from 2012, whose authors counted (by hand) the number of mathematical expressions in over 600 biology papers and found that each additional equation per page reduced the number of citations a paper received by 22%. This does not mean that researchers should rush to include more diagrams in their next paper. Dr Howe has not shown what is behind the effect, which may merely be one of correlation, rather than causation. It could, for example, be that papers with lots of diagrams tend to be those that illustrate new concepts, and thus start a whole new field of inquiry. Such papers will certainly be cited a lot. On the other hand, the presence of equations really might reduce citations. Biologists (as are most of those who write and read the papers in PubMed Central) are notoriously maths-averse. If that is the case, looking in a physics archive would probably produce a different result.
  7. Fowler, R.H.; Wilson, B.A.; Fowler, W.A.L.: Information navigator : an information system using associative networks for display and retrieval (1992) 0.03
    0.032838944 = product of:
      0.13135578 = sum of:
        0.13135578 = weight(_text_:graphic in 919) [ClassicSimilarity], result of:
          0.13135578 = score(doc=919,freq=2.0), product of:
            0.29924196 = queryWeight, product of:
              6.6217136 = idf(docFreq=159, maxDocs=44218)
              0.045191016 = queryNorm
            0.43896174 = fieldWeight in 919, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              6.6217136 = idf(docFreq=159, maxDocs=44218)
              0.046875 = fieldNorm(doc=919)
      0.25 = coord(1/4)
    
    Abstract
    Document retrieval is a highly interactive process dealing with large amounts of information. Visual representations can provide both a means for managing the complexity of large information structures and an interface style well suited to interactive manipulation. The system we have designed utilizes visually displayed graphic structures and a direct manipulation interface style to supply an integrated environment for retrieval. A common visually displayed network structure is used for query, document content, and term relations. A query can be modified through direct manipulation of its visual form by incorporating terms from any other information structure the system displays. An associative thesaurus of terms and an inter-document network provide information about a document collection that can complement other retrieval aids. Visualization of these large data structures makes use of fisheye views and overview diagrams to help overcome some of the inherent difficulties of orientation and navigation in large information structures.
  8. Bartczak, J.; Glendon, I.: Python, Google Sheets, and the Thesaurus for Graphic Materials for efficient metadata project workflows (2017) 0.03
    0.032838944 = product of:
      0.13135578 = sum of:
        0.13135578 = weight(_text_:graphic in 3893) [ClassicSimilarity], result of:
          0.13135578 = score(doc=3893,freq=2.0), product of:
            0.29924196 = queryWeight, product of:
              6.6217136 = idf(docFreq=159, maxDocs=44218)
              0.045191016 = queryNorm
            0.43896174 = fieldWeight in 3893, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              6.6217136 = idf(docFreq=159, maxDocs=44218)
              0.046875 = fieldNorm(doc=3893)
      0.25 = coord(1/4)
    
  9. Si, L.: Encoding formats and consideration of requirements for mapping (2007) 0.02
    0.024838142 = product of:
      0.09935257 = sum of:
        0.09935257 = sum of:
          0.056493253 = weight(_text_:methods in 540) [ClassicSimilarity], result of:
            0.056493253 = score(doc=540,freq=2.0), product of:
              0.18168657 = queryWeight, product of:
                4.0204134 = idf(docFreq=2156, maxDocs=44218)
                0.045191016 = queryNorm
              0.31093797 = fieldWeight in 540, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.0204134 = idf(docFreq=2156, maxDocs=44218)
                0.0546875 = fieldNorm(doc=540)
          0.042859312 = weight(_text_:22 in 540) [ClassicSimilarity], result of:
            0.042859312 = score(doc=540,freq=2.0), product of:
              0.15825124 = queryWeight, product of:
                3.5018296 = idf(docFreq=3622, maxDocs=44218)
                0.045191016 = queryNorm
              0.2708308 = fieldWeight in 540, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                3.5018296 = idf(docFreq=3622, maxDocs=44218)
                0.0546875 = fieldNorm(doc=540)
      0.25 = coord(1/4)
    
    Abstract
    With the increasing requirement of establishing semantic mappings between different vocabularies, further development of these encoding formats is becoming more and more important. For this reason, four types of knowledge representation formats were assessed: MARC21 for Classification Data in XML, Zthes XML Schema, XTM (XML Topic Map), and SKOS (Simple Knowledge Organisation System). This paper explores the potential of adapting these representation formats to support different semantic mapping methods, and discusses the implication of extending them to represent more complex KOS.
    Date
    26.12.2011 13:22:27
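    The abstract above assesses formats for expressing inter-vocabulary mappings. As an illustrative sketch (not from the paper), this is how a single SKOS mapping statement could be produced with the rdflib Python library; the concept URIs are made-up placeholders.

```python
# Hypothetical example: one semantic mapping between two KOS concepts expressed
# in SKOS, one of the four representation formats assessed in the paper.
from rdflib import Graph, URIRef
from rdflib.namespace import SKOS

g = Graph()
source = URIRef("http://example.org/kosA/concept/0815")   # placeholder URI
target = URIRef("http://example.org/kosB/concept/4711")   # placeholder URI

# skos:exactMatch asserts the two concepts can be used interchangeably;
# SKOS also offers closeMatch, broadMatch, narrowMatch and relatedMatch.
g.add((source, SKOS.exactMatch, target))

print(g.serialize(format="turtle"))
```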
  10. Monireh, E.; Sarker, M.K.; Bianchi, F.; Hitzler, P.; Doran, D.; Xie, N.: Reasoning over RDF knowledge bases using deep learning (2018) 0.02
    0.021920148 = product of:
      0.08768059 = sum of:
        0.08768059 = sum of:
          0.057066802 = weight(_text_:methods in 4553) [ClassicSimilarity], result of:
            0.057066802 = score(doc=4553,freq=4.0), product of:
              0.18168657 = queryWeight, product of:
                4.0204134 = idf(docFreq=2156, maxDocs=44218)
                0.045191016 = queryNorm
              0.31409478 = fieldWeight in 4553, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                4.0204134 = idf(docFreq=2156, maxDocs=44218)
                0.0390625 = fieldNorm(doc=4553)
          0.030613795 = weight(_text_:22 in 4553) [ClassicSimilarity], result of:
            0.030613795 = score(doc=4553,freq=2.0), product of:
              0.15825124 = queryWeight, product of:
                3.5018296 = idf(docFreq=3622, maxDocs=44218)
                0.045191016 = queryNorm
              0.19345059 = fieldWeight in 4553, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                3.5018296 = idf(docFreq=3622, maxDocs=44218)
                0.0390625 = fieldNorm(doc=4553)
      0.25 = coord(1/4)
    
    Abstract
    Semantic Web knowledge representation standards, and in particular RDF and OWL, often come endowed with a formal semantics which is considered to be of fundamental importance for the field. Reasoning, i.e., the drawing of logical inferences from knowledge expressed in such standards, is traditionally based on logical deductive methods and algorithms which can be proven to be sound and complete and terminating, i.e. correct in a very strong sense. For various reasons, though, in particular the scalability issues arising from the ever increasing amounts of Semantic Web data available and the inability of deductive algorithms to deal with noise in the data, it has been argued that alternative means of reasoning should be investigated which bear high promise for high scalability and better robustness. From this perspective, deductive algorithms can be considered the gold standard regarding correctness against which alternative methods need to be tested. In this paper, we show that it is possible to train a Deep Learning system on RDF knowledge graphs, such that it is able to perform reasoning over new RDF knowledge graphs, with high precision and recall compared to the deductive gold standard.
    Date
    16.11.2018 14:22:01
  11. Zanibbi, R.; Yuan, B.: Keyword and image-based retrieval for mathematical expressions (2011) 0.02
    0.021289835 = product of:
      0.08515934 = sum of:
        0.08515934 = sum of:
          0.048422787 = weight(_text_:methods in 3449) [ClassicSimilarity], result of:
            0.048422787 = score(doc=3449,freq=2.0), product of:
              0.18168657 = queryWeight, product of:
                4.0204134 = idf(docFreq=2156, maxDocs=44218)
                0.045191016 = queryNorm
              0.26651827 = fieldWeight in 3449, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.0204134 = idf(docFreq=2156, maxDocs=44218)
                0.046875 = fieldNorm(doc=3449)
          0.03673655 = weight(_text_:22 in 3449) [ClassicSimilarity], result of:
            0.03673655 = score(doc=3449,freq=2.0), product of:
              0.15825124 = queryWeight, product of:
                3.5018296 = idf(docFreq=3622, maxDocs=44218)
                0.045191016 = queryNorm
              0.23214069 = fieldWeight in 3449, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                3.5018296 = idf(docFreq=3622, maxDocs=44218)
                0.046875 = fieldNorm(doc=3449)
      0.25 = coord(1/4)
    
    Abstract
    Two new methods for retrieving mathematical expressions using conventional keyword search and expression images are presented. An expression-level TF-IDF (term frequency-inverse document frequency) approach is used for keyword search, where queries and indexed expressions are represented by keywords taken from LaTeX strings. TF-IDF is computed at the level of individual expressions rather than documents to increase the precision of matching. The second retrieval technique is a form of Content-Based Image Retrieval (CBIR). Expressions are segmented into connected components, and then components in the query expression and each expression in the collection are matched using contour and density features, aspect ratios, and relative positions. In an experiment using ten randomly sampled queries from a corpus of over 22,000 expressions, precision-at-k (k = 20) for the keyword-based approach was higher (keyword: µ = 84.0, s = 19.0; image-based: µ = 32.0, s = 30.7), but for a few of the queries better results were obtained using a combination of the two techniques.
    Date
    22. 2.2017 12:53:49
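    The first technique above computes TF-IDF over individual expressions rather than whole documents. A minimal sketch of that idea with scikit-learn, using made-up LaTeX strings as the indexed "documents"; tokenization and data are illustrative only.

```python
# Sketch: expression-level TF-IDF. Each indexed unit is a single LaTeX expression,
# tokenized into its symbols, rather than the document that contains it.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

expressions = [            # toy collection of pre-tokenized LaTeX expressions
    r"\frac { a } { b } + c",
    r"\sqrt { x ^ 2 + y ^ 2 }",
    r"\sum _ { i } x _ i ^ 2",
]
query = [r"x ^ 2 + y ^ 2"]

# token_pattern r"\S+" keeps every whitespace-separated LaTeX token as a term
vec = TfidfVectorizer(token_pattern=r"\S+", lowercase=False)
index = vec.fit_transform(expressions)
scores = cosine_similarity(vec.transform(query), index)[0]
print(sorted(zip(scores, expressions), reverse=True)[0])  # best-matching expression
```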
  12. Boldi, P.; Santini, M.; Vigna, S.: PageRank as a function of the damping factor (2005) 0.02
    0.01774153 = product of:
      0.07096612 = sum of:
        0.07096612 = sum of:
          0.040352322 = weight(_text_:methods in 2564) [ClassicSimilarity], result of:
            0.040352322 = score(doc=2564,freq=2.0), product of:
              0.18168657 = queryWeight, product of:
                4.0204134 = idf(docFreq=2156, maxDocs=44218)
                0.045191016 = queryNorm
              0.22209854 = fieldWeight in 2564, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.0204134 = idf(docFreq=2156, maxDocs=44218)
                0.0390625 = fieldNorm(doc=2564)
          0.030613795 = weight(_text_:22 in 2564) [ClassicSimilarity], result of:
            0.030613795 = score(doc=2564,freq=2.0), product of:
              0.15825124 = queryWeight, product of:
                3.5018296 = idf(docFreq=3622, maxDocs=44218)
                0.045191016 = queryNorm
              0.19345059 = fieldWeight in 2564, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                3.5018296 = idf(docFreq=3622, maxDocs=44218)
                0.0390625 = fieldNorm(doc=2564)
      0.25 = coord(1/4)
    
    Abstract
    PageRank is defined as the stationary state of a Markov chain. The chain is obtained by perturbing the transition matrix induced by a web graph with a damping factor alpha that spreads uniformly part of the rank. The choice of alpha is eminently empirical, and in most cases the original suggestion alpha=0.85 by Brin and Page is still used. Recently, however, the behaviour of PageRank with respect to changes in alpha was discovered to be useful in link-spam detection. Moreover, an analytical justification of the value chosen for alpha is still missing. In this paper, we give the first mathematical analysis of PageRank when alpha changes. In particular, we show that, contrary to popular belief, for real-world graphs values of alpha close to 1 do not give a more meaningful ranking. Then, we give closed-form formulae for PageRank derivatives of any order, and an extension of the Power Method that approximates them with convergence O(t^k alpha^t) for the k-th derivative. Finally, we show a tight connection between iterated computation and analytical behaviour by proving that the k-th iteration of the Power Method gives exactly the PageRank value obtained using a Maclaurin polynomial of degree k. The latter result paves the way towards the application of analytical methods to the study of PageRank.
    Date
    16. 1.2016 10:22:28
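    The abstract above treats PageRank as the stationary distribution of a damped random walk. Below is a small sketch of the standard power iteration with damping factor alpha on a toy graph (not the authors' derivative computations); the graph and iteration count are made up for illustration.

```python
# Power iteration for PageRank on a tiny directed graph with damping factor alpha:
# p_{k+1} = alpha * M @ p_k + (1 - alpha) * v, where M is the column-stochastic
# transition matrix and v the uniform teleport vector.
import numpy as np

links = {0: [1, 2], 1: [2], 2: [0], 3: [2]}   # toy web graph: node -> outlinks
n = 4
M = np.zeros((n, n))
for src, outs in links.items():
    for dst in outs:
        M[dst, src] = 1.0 / len(outs)          # every node has outlinks, so no
                                               # dangling-node correction is needed
alpha = 0.85                        # the customary Brin/Page value discussed above
v = np.full(n, 1.0 / n)             # uniform teleportation vector
p = np.full(n, 1.0 / n)
for _ in range(100):                # iterate until (practically) stationary
    p = alpha * (M @ p) + (1 - alpha) * v

print(np.round(p, 4), p.sum())      # PageRank vector; sums to 1 on this graph
```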
  13. Roy, W.; Gray, C.: Preparing existing metadata for repository batch import : a recipe for a fickle food (2018) 0.02
    0.01774153 = product of:
      0.07096612 = sum of:
        0.07096612 = sum of:
          0.040352322 = weight(_text_:methods in 4550) [ClassicSimilarity], result of:
            0.040352322 = score(doc=4550,freq=2.0), product of:
              0.18168657 = queryWeight, product of:
                4.0204134 = idf(docFreq=2156, maxDocs=44218)
                0.045191016 = queryNorm
              0.22209854 = fieldWeight in 4550, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.0204134 = idf(docFreq=2156, maxDocs=44218)
                0.0390625 = fieldNorm(doc=4550)
          0.030613795 = weight(_text_:22 in 4550) [ClassicSimilarity], result of:
            0.030613795 = score(doc=4550,freq=2.0), product of:
              0.15825124 = queryWeight, product of:
                3.5018296 = idf(docFreq=3622, maxDocs=44218)
                0.045191016 = queryNorm
              0.19345059 = fieldWeight in 4550, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                3.5018296 = idf(docFreq=3622, maxDocs=44218)
                0.0390625 = fieldNorm(doc=4550)
      0.25 = coord(1/4)
    
    Abstract
    In 2016, the University of Waterloo began offering a mediated copyright review and deposit service to support the growth of our institutional repository UWSpace. This resulted in the need to batch import large lists of published works into the institutional repository quickly and accurately. A range of methods have been proposed for harvesting publications metadata en masse, but many technological solutions can easily become detached from a workflow that is both reproducible for support staff and applicable to a range of situations. Many repositories offer the capacity for batch upload via CSV, so our method provides a template Python script that leverages the Habanero library for populating CSV files with existing metadata retrieved from the CrossRef API. In our case, we have combined this with useful metadata contained in a TSV file downloaded from Web of Science in order to enrich our metadata as well. The appeal of this 'low-maintenance' method is that it provides more robust options for gathering metadata semi-automatically, and only requires the user's ability to access Web of Science and the Python program, while still remaining flexible enough for local customizations.
    Date
    10.11.2018 16:27:22
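    The workflow above populates CSV rows with metadata fetched from the CrossRef API via the Habanero library. A rough sketch of that step, assuming a plain list of DOIs as input; the DOIs and column names are illustrative placeholders, not the authors' actual template.

```python
# Sketch: fetch CrossRef metadata for a list of DOIs with habanero and write a CSV
# suitable for repository batch import. Replace the placeholder DOIs with real ones.
import csv
from habanero import Crossref

dois = ["10.1000/example-doi-1", "10.1000/example-doi-2"]   # placeholder DOIs
cr = Crossref()

with open("batch_import.csv", "w", newline="", encoding="utf-8") as fh:
    writer = csv.writer(fh)
    writer.writerow(["doi", "title", "container_title", "year"])
    for doi in dois:
        msg = cr.works(ids=doi)["message"]          # CrossRef work record for one DOI
        writer.writerow([
            doi,
            (msg.get("title") or [""])[0],
            (msg.get("container-title") or [""])[0],
            (msg.get("issued", {}).get("date-parts", [[None]])[0] or [None])[0],
        ])
```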
  14. Dietz, K.: en.wikipedia.org > 6 Mio. Artikel (2020) 0.01
    0.01495319 = product of:
      0.05981276 = sum of:
        0.05981276 = product of:
          0.17943828 = sum of:
            0.17943828 = weight(_text_:3a in 5669) [ClassicSimilarity], result of:
              0.17943828 = score(doc=5669,freq=2.0), product of:
                0.38312992 = queryWeight, product of:
                  8.478011 = idf(docFreq=24, maxDocs=44218)
                  0.045191016 = queryNorm
                0.46834838 = fieldWeight in 5669, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  8.478011 = idf(docFreq=24, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=5669)
          0.33333334 = coord(1/3)
      0.25 = coord(1/4)
    
    Content
    "Die Englischsprachige Wikipedia verfügt jetzt über mehr als 6 Millionen Artikel. An zweiter Stelle kommt die deutschsprachige Wikipedia mit 2.3 Millionen Artikeln, an dritter Stelle steht die französischsprachige Wikipedia mit 2.1 Millionen Artikeln (via Researchbuzz: Firehose <https://rbfirehose.com/2020/01/24/techcrunch-wikipedia-now-has-more-than-6-million-articles-in-english/> und Techcrunch <https://techcrunch.com/2020/01/23/wikipedia-english-six-million-articles/?utm_source=feedburner&utm_medium=feed&utm_campaign=Feed%3A+Techcrunch+%28TechCrunch%29&guccounter=1&guce_referrer=aHR0cHM6Ly9yYmZpcmVob3NlLmNvbS8yMDIwLzAxLzI0L3RlY2hjcnVuY2gtd2lraXBlZGlhLW5vdy1oYXMtbW9yZS10aGFuLTYtbWlsbGlvbi1hcnRpY2xlcy1pbi1lbmdsaXNoLw&guce_referrer_sig=AQAAAK0zHfjdDZ_spFZBF_z-zDjtL5iWvuKDumFTzm4HvQzkUfE2pLXQzGS6FGB_y-VISdMEsUSvkNsg2U_NWQ4lwWSvOo3jvXo1I3GtgHpP8exukVxYAnn5mJspqX50VHIWFADHhs5AerkRn3hMRtf_R3F1qmEbo8EROZXp328HMC-o>). 250120 via digithek ch = #fineBlog s.a.: Angesichts der Veröffentlichung des 6-millionsten Artikels vergangene Woche in der englischsprachigen Wikipedia hat die Community-Zeitungsseite "Wikipedia Signpost" ein Moratorium bei der Veröffentlichung von Unternehmensartikeln gefordert. Das sei kein Vorwurf gegen die Wikimedia Foundation, aber die derzeitigen Maßnahmen, um die Enzyklopädie gegen missbräuchliches undeklariertes Paid Editing zu schützen, funktionierten ganz klar nicht. *"Da die ehrenamtlichen Autoren derzeit von Werbung in Gestalt von Wikipedia-Artikeln überwältigt werden, und da die WMF nicht in der Lage zu sein scheint, dem irgendetwas entgegenzusetzen, wäre der einzige gangbare Weg für die Autoren, fürs erste die Neuanlage von Artikeln über Unternehmen zu untersagen"*, schreibt der Benutzer Smallbones in seinem Editorial <https://en.wikipedia.org/wiki/Wikipedia:Wikipedia_Signpost/2020-01-27/From_the_editor> zur heutigen Ausgabe."
  15. Yang, Y.; Liu, X.: ¬A re-examination of text categorization methods (1999) 0.01
    0.014123313 = product of:
      0.056493253 = sum of:
        0.056493253 = product of:
          0.112986505 = sum of:
            0.112986505 = weight(_text_:methods in 3386) [ClassicSimilarity], result of:
              0.112986505 = score(doc=3386,freq=8.0), product of:
                0.18168657 = queryWeight, product of:
                  4.0204134 = idf(docFreq=2156, maxDocs=44218)
                  0.045191016 = queryNorm
                0.62187594 = fieldWeight in 3386, product of:
                  2.828427 = tf(freq=8.0), with freq of:
                    8.0 = termFreq=8.0
                  4.0204134 = idf(docFreq=2156, maxDocs=44218)
                  0.0546875 = fieldNorm(doc=3386)
          0.5 = coord(1/2)
      0.25 = coord(1/4)
    
    Abstract
    This paper reports a controlled study with statistical significance tests on five text categorization methods: the Support Vector Machines (SVM), a k-Nearest Neighbor (kNN) classifier, a neural network (NNet) approach, the Linear Least-squares Fit (LLSF) mapping and a Naive Bayes (NB) classifier. We focus on the robustness of these methods in dealing with a skewed category distribution, and their performance as a function of the training-set category frequency. Our results show that SVM, kNN and LLSF significantly outperform NNet and NB when the number of positive training instances per category is small (fewer than ten), and that all the methods perform comparably when the categories are sufficiently common (over 300 instances).
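    A compact way to reproduce the spirit of such a comparison today (not the paper's original setup or data) is a scikit-learn pipeline over a public corpus, for example:

```python
# Sketch: compare three of the classifier families studied above (SVM, kNN, NB)
# on a small public text corpus; LLSF and the neural network are omitted.
from sklearn.datasets import fetch_20newsgroups
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC
from sklearn.neighbors import KNeighborsClassifier
from sklearn.naive_bayes import MultinomialNB
from sklearn.metrics import f1_score

train = fetch_20newsgroups(subset="train", remove=("headers", "footers", "quotes"))
test = fetch_20newsgroups(subset="test", remove=("headers", "footers", "quotes"))

for name, clf in [("SVM", LinearSVC()),
                  ("kNN", KNeighborsClassifier()),
                  ("NB", MultinomialNB())]:
    model = make_pipeline(TfidfVectorizer(), clf).fit(train.data, train.target)
    macro_f1 = f1_score(test.target, model.predict(test.data), average="macro")
    print(f"{name}: macro-F1 = {macro_f1:.3f}")   # macro-averaging stresses rare categories
```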
  16. Ramisch, C.; Schreiner, P.; Idiart, M.; Villavicencio, A.: ¬An evaluation of methods for the extraction of multiword expressions (20xx) 0.01
    0.013978455 = product of:
      0.05591382 = sum of:
        0.05591382 = product of:
          0.11182764 = sum of:
            0.11182764 = weight(_text_:methods in 962) [ClassicSimilarity], result of:
              0.11182764 = score(doc=962,freq=6.0), product of:
                0.18168657 = queryWeight, product of:
                  4.0204134 = idf(docFreq=2156, maxDocs=44218)
                  0.045191016 = queryNorm
                0.6154976 = fieldWeight in 962, product of:
                  2.4494898 = tf(freq=6.0), with freq of:
                    6.0 = termFreq=6.0
                  4.0204134 = idf(docFreq=2156, maxDocs=44218)
                  0.0625 = fieldNorm(doc=962)
          0.5 = coord(1/2)
      0.25 = coord(1/4)
    
    Abstract
    This paper focuses on the evaluation of some methods for the automatic acquisition of Multiword Expressions (MWEs). First, we investigate the hypothesis that MWEs can be detected solely by the distinct statistical properties of their component words, regardless of their type, comparing three statistical measures: Mutual Information, Chi-squared and Permutation Entropy. Moreover, we also look at the impact that the addition of type-specific linguistic information has on the performance of these methods.
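    One of the association measures compared above, Mutual Information (here in its pointwise form over bigram counts), can be sketched in a few lines; the toy corpus is made up for illustration and is not from the paper.

```python
# Sketch: rank candidate two-word expressions by pointwise mutual information (PMI),
# one of the purely statistical measures evaluated in the paper.
from collections import Counter
from math import log2

corpus = ("the ad hoc committee met ad hoc to discuss the ad hoc rules "
          "the committee met to discuss the rules").split()

unigrams = Counter(corpus)
bigrams = Counter(zip(corpus, corpus[1:]))
n = len(corpus)

def pmi(w1, w2):
    # PMI(w1, w2) = log2( P(w1 w2) / (P(w1) * P(w2)) )
    return log2((bigrams[(w1, w2)] / (n - 1)) / ((unigrams[w1] / n) * (unigrams[w2] / n)))

ranked = sorted(bigrams, key=lambda b: pmi(*b), reverse=True)
print(ranked[:3])   # "ad hoc" should surface near the top as an MWE candidate
```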
  17. Suominen, O.; Koskenniemi, I.: Annif Analyzer Shootout : comparing text lemmatization methods for automated subject indexing (2022) 0.01
    0.013345277 = product of:
      0.053381108 = sum of:
        0.053381108 = product of:
          0.106762215 = sum of:
            0.106762215 = weight(_text_:methods in 658) [ClassicSimilarity], result of:
              0.106762215 = score(doc=658,freq=14.0), product of:
                0.18168657 = queryWeight, product of:
                  4.0204134 = idf(docFreq=2156, maxDocs=44218)
                  0.045191016 = queryNorm
                0.5876176 = fieldWeight in 658, product of:
                  3.7416575 = tf(freq=14.0), with freq of:
                    14.0 = termFreq=14.0
                  4.0204134 = idf(docFreq=2156, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=658)
          0.5 = coord(1/2)
      0.25 = coord(1/4)
    
    Abstract
    Automated text classification is an important function for many AI systems relevant to libraries, including automated subject indexing and classification. When implemented using the traditional natural language processing (NLP) paradigm, one key part of the process is the normalization of words using stemming or lemmatization, which reduces the amount of linguistic variation and often improves the quality of classification. In this paper, we compare the output of seven different text lemmatization algorithms as well as two baseline methods. We measure how the choice of method affects the quality of text classification using example corpora in three languages. The experiments have been performed using the open source Annif toolkit for automated subject indexing and classification, but should generalize also to other NLP toolkits and similar text classification tasks. The results show that lemmatization methods in most cases outperform baseline methods in text classification particularly for Finnish and Swedish text, but not English, where baseline methods are most effective. The differences between lemmatization methods are quite small. The systematic comparison will help optimize text classification pipelines and inform the further development of the Annif toolkit to incorporate a wider choice of normalization methods.
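    As a toy illustration of why the choice of analyzer matters (this is not Annif's own code), the sketch below runs the same English words through a lowercase baseline, a Snowball stemmer and a WordNet lemmatizer using NLTK:

```python
# Sketch: compare the normalization output of a baseline, a stemmer and a lemmatizer.
# Requires: pip install nltk (the WordNet data is downloaded below).
import nltk
from nltk.stem import SnowballStemmer, WordNetLemmatizer

nltk.download("wordnet", quiet=True)
nltk.download("omw-1.4", quiet=True)

words = ["libraries", "indexing", "classified", "corpora"]
stemmer = SnowballStemmer("english")
lemmatizer = WordNetLemmatizer()

for w in words:
    print(f"{w:12} baseline={w.lower():12} "
          f"stem={stemmer.stem(w):12} lemma={lemmatizer.lemmatize(w, pos='n'):12}")

# In a full experiment, each normalized corpus would feed the same text classifier
# and the resulting quality (e.g. F1) would be compared, as the paper does with Annif.
```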
  18. Information als Rohstoff für Innovation : Programm der Bundesregierung 1996-2000 (1996) 0.01
    0.012245518 = product of:
      0.048982073 = sum of:
        0.048982073 = product of:
          0.097964145 = sum of:
            0.097964145 = weight(_text_:22 in 5449) [ClassicSimilarity], result of:
              0.097964145 = score(doc=5449,freq=2.0), product of:
                0.15825124 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.045191016 = queryNorm
                0.61904186 = fieldWeight in 5449, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.125 = fieldNorm(doc=5449)
          0.5 = coord(1/2)
      0.25 = coord(1/4)
    
    Date
    22. 2.1997 19:26:34
  19. Ask me[@sk.me]: your global information guide : der Wegweiser durch die Informationswelten (1996) 0.01
    0.012245518 = product of:
      0.048982073 = sum of:
        0.048982073 = product of:
          0.097964145 = sum of:
            0.097964145 = weight(_text_:22 in 5837) [ClassicSimilarity], result of:
              0.097964145 = score(doc=5837,freq=2.0), product of:
                0.15825124 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.045191016 = queryNorm
                0.61904186 = fieldWeight in 5837, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.125 = fieldNorm(doc=5837)
          0.5 = coord(1/2)
      0.25 = coord(1/4)
    
    Date
    30.11.1996 13:22:37
  20. Kosmos Weltatlas 2000 : Der Kompass für das 21. Jahrhundert. Inklusive Welt-Routenplaner (1999) 0.01
    0.012245518 = product of:
      0.048982073 = sum of:
        0.048982073 = product of:
          0.097964145 = sum of:
            0.097964145 = weight(_text_:22 in 4085) [ClassicSimilarity], result of:
              0.097964145 = score(doc=4085,freq=2.0), product of:
                0.15825124 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.045191016 = queryNorm
                0.61904186 = fieldWeight in 4085, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.125 = fieldNorm(doc=4085)
          0.5 = coord(1/2)
      0.25 = coord(1/4)
    
    Date
    7.11.1999 18:22:39

Languages

  • e 165
  • d 87
  • el 3
  • a 2
  • nl 1
  • sp 1

Types

  • a 135
  • i 10
  • m 5
  • r 5
  • x 5
  • s 3
  • b 2
  • n 1