Document (#33184)

Author
Crane, G.
Jones, A.
Title
Text, information, knowledge and the evolving record of humanity
Source
D-Lib magazine. 12(2006) no.3, x S
Year
2006
Abstract
Consider a sentence such as "the current price of tea in China is 35 cents per pound." In a library with millions of books we might find many statements of the above form that we could capture today with relatively simple rules: rather than pursuing every variation of a statement, programs can wait, like predators at a water hole, for their informational prey to reappear in a standard linguistic pattern. We can make inferences from sentences such as "NAME1 born at NAME2 in DATE" that NAME1 more likely than not represents a person and NAME2 a place and then convert the statement into a proposition about a person born at a given place and time. The changing price of tea in China, pedestrian birth and death dates, or other basic statements may not be truth and beauty in the Phaedrus, but a digital library that could plot the prices of various commodities in different markets over time, plot the various lifetimes of individuals, or extract and classify many events would be very useful. Services such as the Syllabus Finder and H-Bot (which Dan Cohen describes elsewhere in this issue of D-Lib) represent examples of information extraction already in use. H-Bot, in particular, builds on our evolving ability to extract information from very large corpora such as the billions of web pages available through the Google API. Aside from identifying higher order statements, however, users also want to search and browse named entities: they want to read about "C. P. E. Bach" rather than his father "Johann Sebastian" or about "Cambridge, Maryland", without hearing about "Cambridge, Massachusetts", Cambridge in the UK or any of the other Cambridges scattered around the world. Named entity identification is a well-established area with an ongoing literature.
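The pattern-matching idea described above ("NAME1 born at NAME2 in DATE") can be illustrated with a minimal sketch. The regular expression, the `BirthFact` type, and the function name are assumptions made for illustration; they are not taken from the article or from any of the tools it mentions, and a real system would need far more robust name recognition.

```python
import re
from typing import List, NamedTuple


class BirthFact(NamedTuple):
    """A proposition of the form: person born at place in year."""
    person: str
    place: str
    year: int


# Hypothetical approximation of a name: one or more capitalized words.
NAME = r"[A-Z][\w.'-]*(?: [A-Z][\w.'-]*)*"

# Surface pattern: "<NAME1> [was] born at|in <NAME2> in <four-digit year>".
BIRTH_PATTERN = re.compile(
    rf"(?P<person>{NAME}) (?:was )?born (?:at|in) (?P<place>{NAME}) in (?P<year>\d{{4}})"
)


def extract_births(text: str) -> List[BirthFact]:
    """Return one BirthFact for every match of the surface pattern in text."""
    return [
        BirthFact(m["person"], m["place"], int(m["year"]))
        for m in BIRTH_PATTERN.finditer(text)
    ]


sample = (
    "Johann Sebastian Bach was born at Eisenach in 1685. "
    "Carl Philipp Emanuel Bach was born at Weimar in 1714."
)
facts = extract_births(sample)
```

The pattern only waits for one fixed phrasing; sentences that express the same fact differently are simply skipped, which is the trade-off the abstract describes.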
The Natural Language Processing Research Group at the University of Sheffield has developed its open source General Architecture for Text Engineering (GATE) for years, while IBM's Unstructured Information Management Architecture (UIMA) is "available as open source software to provide a common foundation for industry and academia." Powerful tools are thus freely available and more demanding users can draw upon published literature to develop their own systems. Major search engines such as Google and Yahoo also integrate increasingly sophisticated tools to categorize and identify places. The software resources are rich and expanding. The reference works on which these systems depend, however, are ill-suited for historical analysis. First, simple gazetteers and similar authority lists quickly grow too big for useful information extraction. They provide us with potential entities against which to match textual references, but existing electronic reference works assume that human readers can use their knowledge of geography and of the immediate context to pick the right Boston from the Bostons in the Getty Thesaurus of Geographic Names (TGN). With the crucial exception of geographic location, the TGN records do not provide any machine readable clues: we cannot tell which Bostons are large or small. If we are analyzing a document published in 1818, we cannot filter out those places that did not yet exist or that had different names: "Jefferson Davis" is not the name of a parish in Louisiana (tgn,2000880) or a county in Mississippi (tgn,2001118) until after the Civil War.
Although the Alexandria Digital Library provides far richer data than the TGN (5.9 vs. 1.3 million names), its added size lowers, rather than increases, the accuracy of most geographic name identification systems for historical documents: most of the extra 4.6 million names cover low frequency entities that rarely occur in any particular corpus. The TGN is sufficiently comprehensive to provide quite enough noise: we find place names that are used over and over (there are almost one hundred Washingtons) and semantically ambiguous (e.g., is Washington a person or a place?). Comprehensive knowledge sources emphasize recall but lower precision. We need data with which to determine which "Tribune" or "John Brown" a particular passage denotes. Secondly and paradoxically, our reference works may not be comprehensive enough. Human actors come and go over time. Organizations appear and vanish. Even places can change their names or vanish. The TGN does associate the obsolete name Siam with the nation of Thailand (tgn,1000142) - but also with towns named Siam in Iowa (tgn,2035651), Tennessee (tgn,2101519), and Ohio (tgn,2662003). Prussia appears but as a general region (tgn,7016786), with no indication when or if it was a sovereign nation. And if places do point to the same object over time, that object may have very different significance over time: in the foundational works of Western historiography, Herodotus reminds us that the great cities of the past may be small today, and the small cities of today great tomorrow (Hdt. 1.5), while Thucydides stresses that we cannot estimate the past significance of a place by its appearance today (Thuc. 1.10). In other words, we need to know the population figures for the various Washingtons in 1870 if we are analyzing documents from 1870. The foundations have been laid for reference works that provide machine actionable information about entities at particular times in history. 
The Alexandria Digital Library Gazetteer Content Standard represents a sophisticated framework with which to create such resources: places can be associated with temporal information about their foundation (e.g., Washington, DC, founded on 16 July 1790), changes in names for the same location (e.g., Saint Petersburg to Leningrad and back again), population figures at various times and similar historically contingent data. But if we have the software and the data structures, we do not yet have substantial amounts of historical content such as plentiful digital gazetteers, encyclopedias, lexica, grammars and other reference works to illustrate many periods and, even if we do, those resources may not be in a useful form: raw OCR output of a complex lexicon or gazetteer may have so many errors and have captured so little of the underlying structure that the digital resource is useless as a knowledge base. Put another way, human beings are still much better than machines at reading and interpreting the contents of page images. While people, places, and dates are probably the most important core entities, we will find a growing set of objects that we need to identify and track across collections, and each of these categories of objects will require its own knowledge sources. The following section enumerates and briefly describes some existing categories of documents that we need to mine for knowledge. This brief survey focuses on the format of print sources (e.g., highly structured textual "database" vs. unstructured text) to illustrate some of the challenges involved in converting our published knowledge into semantically annotated, machine actionable form.
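The kind of time-aware gazetteer lookup the abstract calls for can be sketched as follows. This is only an illustration under stated assumptions: the `NameRecord` structure, the `place-1` identifier, and the `candidates` function are hypothetical and do not reproduce the ADL Gazetteer Content Standard or the TGN data model, and the year ranges are sample values for the Saint Petersburg example rather than records from either source.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class NameRecord:
    place_id: str             # hypothetical identifier, not a TGN or ADL id
    name: str
    valid_from: int           # first year in which the name was in use
    valid_to: Optional[int]   # last year in use; None means still current

    def in_use(self, year: int) -> bool:
        """True if this name was attached to the place in the given year."""
        return self.valid_from <= year and (
            self.valid_to is None or year <= self.valid_to
        )


# Illustrative sample records for a single place under successive names.
GAZETTEER = [
    NameRecord("place-1", "Saint Petersburg", 1703, 1914),
    NameRecord("place-1", "Petrograd", 1914, 1924),
    NameRecord("place-1", "Leningrad", 1924, 1991),
    NameRecord("place-1", "Saint Petersburg", 1991, None),
]


def candidates(name: str, year: int,
               records: List[NameRecord] = GAZETTEER) -> List[NameRecord]:
    """Keep only the records whose name matches and was valid in that year."""
    return [r for r in records if r.name == name and r.in_use(year)]


# A document dated 1818 cannot refer to a place by a name adopted later.
matches_1818 = candidates("Leningrad", 1818)
```

The same filter could carry population figures per period, so that a matcher could prefer the larger of several same-named places for the document's date.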
Footnote
Cf.: http://dlib.ukoln.ac.uk/dlib/march06/jones/03jones.html.
Theme
Information

Similar documents (author)

  1. Mimno, D.; Crane, G.; Jones, A.: Hierarchical catalog records : implementing a FRBR catalog (2005) 4.36
    4.359 = sum of:
      4.359 = sum of:
        1.2398229 = weight(author_txt:jones in 3184) [ClassicSimilarity], result of:
          1.2398229 = score(doc=3184,freq=1.0), product of:
            0.4755606 = queryWeight, product of:
              6.9522038 = idf(docFreq=109, maxDocs=42306)
              0.068404295 = queryNorm
            2.6070764 = fieldWeight in 3184, product of:
              1.0 = tf(freq=1.0), with freq of:
                1.0 = termFreq=1.0
              6.9522038 = idf(docFreq=109, maxDocs=42306)
              0.375 = fieldNorm(doc=3184)
        3.1191773 = weight(author_txt:crane in 3184) [ClassicSimilarity], result of:
          3.1191773 = score(doc=3184,freq=1.0), product of:
            0.8796829 = queryWeight, product of:
              1.3600665 = boost
              9.45546 = idf(docFreq=8, maxDocs=42306)
              0.068404295 = queryNorm
            3.5457973 = fieldWeight in 3184, product of:
              1.0 = tf(freq=1.0), with freq of:
                1.0 = termFreq=1.0
              9.45546 = idf(docFreq=8, maxDocs=42306)
              0.375 = fieldNorm(doc=3184)
    
  2. Crane, D.: Creating services for the digital library (1996) 2.60
    2.5993145 = sum of:
      2.5993145 = product of:
        5.198629 = sum of:
          5.198629 = weight(author_txt:crane in 137) [ClassicSimilarity], result of:
            5.198629 = score(doc=137,freq=1.0), product of:
              0.8796829 = queryWeight, product of:
                1.3600665 = boost
                9.45546 = idf(docFreq=8, maxDocs=42306)
                0.068404295 = queryNorm
              5.9096622 = fieldWeight in 137, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                9.45546 = idf(docFreq=8, maxDocs=42306)
                0.625 = fieldNorm(doc=137)
        0.5 = coord(1/2)
    
  3. Crane, D.: Information needs and uses (1971) 2.60
    2.5993145 = sum of:
      2.5993145 = product of:
        5.198629 = sum of:
          5.198629 = weight(author_txt:crane in 248) [ClassicSimilarity], result of:
            5.198629 = score(doc=248,freq=1.0), product of:
              0.8796829 = queryWeight, product of:
                1.3600665 = boost
                9.45546 = idf(docFreq=8, maxDocs=42306)
                0.068404295 = queryNorm
              5.9096622 = fieldWeight in 248, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                9.45546 = idf(docFreq=8, maxDocs=42306)
                0.625 = fieldNorm(doc=248)
        0.5 = coord(1/2)
    
  4. Crane, G.: What do you do with a million books? (2006) 2.60
    2.5993145 = sum of:
      2.5993145 = product of:
        5.198629 = sum of:
          5.198629 = weight(author_txt:crane in 3181) [ClassicSimilarity], result of:
            5.198629 = score(doc=3181,freq=1.0), product of:
              0.8796829 = queryWeight, product of:
                1.3600665 = boost
                9.45546 = idf(docFreq=8, maxDocs=42306)
                0.068404295 = queryNorm
              5.9096622 = fieldWeight in 3181, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                9.45546 = idf(docFreq=8, maxDocs=42306)
                0.625 = fieldNorm(doc=3181)
        0.5 = coord(1/2)
    
  5. Crane, G.: The Perseus Project and beyond : how building a digital library challenges the humanities and technology (1998) 2.60
    2.5993145 = sum of:
      2.5993145 = product of:
        5.198629 = sum of:
          5.198629 = weight(author_txt:crane in 3252) [ClassicSimilarity], result of:
            5.198629 = score(doc=3252,freq=1.0), product of:
              0.8796829 = queryWeight, product of:
                1.3600665 = boost
                9.45546 = idf(docFreq=8, maxDocs=42306)
                0.068404295 = queryNorm
              5.9096622 = fieldWeight in 3252, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                9.45546 = idf(docFreq=8, maxDocs=42306)
                0.625 = fieldNorm(doc=3252)
        0.5 = coord(1/2)
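
The score breakdowns above follow a TF-IDF pattern that can be checked by hand. The sketch below is a reconstruction from the numbers shown, not the catalog's own code: it assumes tf = sqrt(freq), idf = 1 + ln(maxDocs / (docFreq + 1)), queryWeight = boost x idf x queryNorm, and fieldWeight = tf x idf x fieldNorm, which is consistent with every figure listed. The function names are hypothetical.

```python
import math


def idf(doc_freq: int, max_docs: int) -> float:
    """Inverse document frequency as implied by the listed values."""
    return 1.0 + math.log(max_docs / (doc_freq + 1))


def term_score(freq: float, doc_freq: int, max_docs: int,
               query_norm: float, field_norm: float,
               boost: float = 1.0) -> float:
    """score = queryWeight * fieldWeight for one matching term."""
    i = idf(doc_freq, max_docs)
    query_weight = boost * i * query_norm
    field_weight = math.sqrt(freq) * i * field_norm
    return query_weight * field_weight


MAX_DOCS = 42306
QUERY_NORM = 0.068404295

# Entry 1: both author terms match, so the two term scores are summed.
jones = term_score(1.0, 109, MAX_DOCS, QUERY_NORM, 0.375)
crane = term_score(1.0, 8, MAX_DOCS, QUERY_NORM, 0.375, boost=1.3600665)
entry_1 = jones + crane

# Entries 2 to 5: only "crane" matches, so coord(1/2) halves the sum.
entry_2 = 0.5 * term_score(1.0, 8, MAX_DOCS, QUERY_NORM, 0.625,
                           boost=1.3600665)
```

Under these assumptions the computed values agree with the listed 1.2398229, 3.1191773, and 2.5993145 to about three decimal places; small differences are expected because the listed figures appear to be single-precision.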
    

Similar documents (content)

  1. Hill, L.L.; Frew, J.; Zheng, Q.: Geographic names : the implementation of a gazetteer in a georeferenced digital library (1999) 0.55
    0.55357385 = sum of:
      0.55357385 = product of:
        1.1532788 = sum of:
          0.2221725 = weight(abstract_txt:geographic in 3241) [ClassicSimilarity], result of:
            0.2221725 = score(doc=3241,freq=15.0), product of:
              0.1867563 = queryWeight, product of:
                1.0287837 = boost
                6.552818 = idf(docFreq=163, maxDocs=42306)
                0.027702764 = queryNorm
              1.1896386 = fieldWeight in 3241, product of:
                3.8729835 = tf(freq=15.0), with freq of:
                  15.0 = termFreq=15.0
                6.552818 = idf(docFreq=163, maxDocs=42306)
                0.046875 = fieldNorm(doc=3241)
          0.022711325 = weight(abstract_txt:provide in 3241) [ClassicSimilarity], result of:
            0.022711325 = score(doc=3241,freq=1.0), product of:
              0.11938692 = queryWeight, product of:
                1.0619136 = boost
                4.058303 = idf(docFreq=1986, maxDocs=42306)
                0.027702764 = queryNorm
              0.19023295 = fieldWeight in 3241, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.058303 = idf(docFreq=1986, maxDocs=42306)
                0.046875 = fieldNorm(doc=3241)
          0.02892906 = weight(abstract_txt:digital in 3241) [ClassicSimilarity], result of:
            0.02892906 = score(doc=3241,freq=1.0), product of:
              0.14028718 = queryWeight, product of:
                1.1511178 = boost
                4.399214 = idf(docFreq=1412, maxDocs=42306)
                0.027702764 = queryNorm
              0.20621315 = fieldWeight in 3241, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.399214 = idf(docFreq=1412, maxDocs=42306)
                0.046875 = fieldNorm(doc=3241)
          0.029883046 = weight(abstract_txt:reference in 3241) [ClassicSimilarity], result of:
            0.029883046 = score(doc=3241,freq=1.0), product of:
              0.14335461 = queryWeight, product of:
                1.1636347 = boost
                4.447049 = idf(docFreq=1346, maxDocs=42306)
                0.027702764 = queryNorm
              0.20845543 = fieldWeight in 3241, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.447049 = idf(docFreq=1346, maxDocs=42306)
                0.046875 = fieldNorm(doc=3241)
          0.025410727 = weight(abstract_txt:about in 3241) [ClassicSimilarity], result of:
            0.025410727 = score(doc=3241,freq=1.0), product of:
              0.13673097 = queryWeight, product of:
                1.2449012 = boost
                3.964687 = idf(docFreq=2181, maxDocs=42306)
                0.027702764 = queryNorm
              0.1858447 = fieldWeight in 3241, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.964687 = idf(docFreq=2181, maxDocs=42306)
                0.046875 = fieldNorm(doc=3241)
          0.027815724 = weight(abstract_txt:such in 3241) [ClassicSimilarity], result of:
            0.027815724 = score(doc=3241,freq=2.0), product of:
              0.12134486 = queryWeight, product of:
                1.2667342 = boost
                3.4579027 = idf(docFreq=3621, maxDocs=42306)
                0.027702764 = queryNorm
              0.2292287 = fieldWeight in 3241, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                3.4579027 = idf(docFreq=3621, maxDocs=42306)
                0.046875 = fieldNorm(doc=3241)
          0.05313035 = weight(abstract_txt:place in 3241) [ClassicSimilarity], result of:
            0.05313035 = score(doc=3241,freq=1.0), product of:
              0.21038924 = queryWeight, product of:
                1.4096866 = boost
                5.387383 = idf(docFreq=525, maxDocs=42306)
                0.027702764 = queryNorm
              0.25253358 = fieldWeight in 3241, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.387383 = idf(docFreq=525, maxDocs=42306)
                0.046875 = fieldNorm(doc=3241)
          0.026960248 = weight(abstract_txt:with in 3241) [ClassicSimilarity], result of:
            0.026960248 = score(doc=3241,freq=5.0), product of:
              0.10180331 = queryWeight, product of:
                1.4544644 = boost
                2.5265954 = idf(docFreq=9191, maxDocs=42306)
                0.027702764 = queryNorm
              0.26482683 = fieldWeight in 3241, product of:
                2.236068 = tf(freq=5.0), with freq of:
                  5.0 = termFreq=5.0
                2.5265954 = idf(docFreq=9191, maxDocs=42306)
                0.046875 = fieldNorm(doc=3241)
          0.14527698 = weight(abstract_txt:name in 3241) [ClassicSimilarity], result of:
            0.14527698 = score(doc=3241,freq=5.0), product of:
              0.2405856 = queryWeight, product of:
                1.5074594 = boost
                5.76104 = idf(docFreq=361, maxDocs=42306)
                0.027702764 = queryNorm
              0.6038474 = fieldWeight in 3241, product of:
                2.236068 = tf(freq=5.0), with freq of:
                  5.0 = termFreq=5.0
                5.76104 = idf(docFreq=361, maxDocs=42306)
                0.046875 = fieldNorm(doc=3241)
          0.020049907 = weight(abstract_txt:that in 3241) [ClassicSimilarity], result of:
            0.020049907 = score(doc=3241,freq=2.0), product of:
              0.12576711 = queryWeight, product of:
                1.8877957 = boost
                2.4048555 = idf(docFreq=10381, maxDocs=42306)
                0.027702764 = queryNorm
              0.15942091 = fieldWeight in 3241, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                2.4048555 = idf(docFreq=10381, maxDocs=42306)
                0.046875 = fieldNorm(doc=3241)
          0.31546795 = weight(abstract_txt:names in 3241) [ClassicSimilarity], result of:
            0.31546795 = score(doc=3241,freq=11.0), product of:
              0.34701136 = queryWeight, product of:
                2.1421342 = boost
                5.8475494 = idf(docFreq=331, maxDocs=42306)
                0.027702764 = queryNorm
              0.9090998 = fieldWeight in 3241, product of:
                3.3166249 = tf(freq=11.0), with freq of:
                  11.0 = termFreq=11.0
                5.8475494 = idf(docFreq=331, maxDocs=42306)
                0.046875 = fieldNorm(doc=3241)
          0.23547097 = weight(abstract_txt:places in 3241) [ClassicSimilarity], result of:
            0.23547097 = score(doc=3241,freq=3.0), product of:
              0.4182539 = queryWeight, product of:
                2.177316 = boost
                6.9341855 = idf(docFreq=111, maxDocs=42306)
                0.027702764 = queryNorm
              0.5629857 = fieldWeight in 3241, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                6.9341855 = idf(docFreq=111, maxDocs=42306)
                0.046875 = fieldNorm(doc=3241)
        0.48 = coord(12/25)
    
  2. Lutz, R.; Green, S.: Data stewardship : the care and handling of named entries (1999) 0.49
    0.4945472 = sum of:
      0.4945472 = product of:
        0.9510523 = sum of:
          0.052683234 = weight(abstract_txt:person in 711) [ClassicSimilarity], result of:
            0.052683234 = score(doc=711,freq=1.0), product of:
              0.17645222 = queryWeight, product of:
                6.3694806 = idf(docFreq=196, maxDocs=42306)
                0.027702764 = queryNorm
              0.2985694 = fieldWeight in 711, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.3694806 = idf(docFreq=196, maxDocs=42306)
                0.046875 = fieldNorm(doc=711)
          0.013899315 = weight(abstract_txt:have in 711) [ClassicSimilarity], result of:
            0.013899315 = score(doc=711,freq=1.0), product of:
              0.091450155 = queryWeight, product of:
                1.0181075 = boost
                3.2424083 = idf(docFreq=4492, maxDocs=42306)
                0.027702764 = queryNorm
              0.15198788 = fieldWeight in 711, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.2424083 = idf(docFreq=4492, maxDocs=42306)
                0.046875 = fieldNorm(doc=711)
          0.01707746 = weight(abstract_txt:which in 711) [ClassicSimilarity], result of:
            0.01707746 = score(doc=711,freq=2.0), product of:
              0.08765499 = queryWeight, product of:
                1.0766218 = boost
                2.938938 = idf(docFreq=6085, maxDocs=42306)
                0.027702764 = queryNorm
              0.19482589 = fieldWeight in 711, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                2.938938 = idf(docFreq=6085, maxDocs=42306)
                0.046875 = fieldNorm(doc=711)
          0.06720019 = weight(abstract_txt:named in 711) [ClassicSimilarity], result of:
            0.06720019 = score(doc=711,freq=1.0), product of:
              0.20753556 = queryWeight, product of:
                1.0845078 = boost
                6.907752 = idf(docFreq=114, maxDocs=42306)
                0.027702764 = queryNorm
              0.32380086 = fieldWeight in 711, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.907752 = idf(docFreq=114, maxDocs=42306)
                0.046875 = fieldNorm(doc=711)
          0.029883046 = weight(abstract_txt:reference in 711) [ClassicSimilarity], result of:
            0.029883046 = score(doc=711,freq=1.0), product of:
              0.14335461 = queryWeight, product of:
                1.1636347 = boost
                4.447049 = idf(docFreq=1346, maxDocs=42306)
                0.027702764 = queryNorm
              0.20845543 = fieldWeight in 711, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.447049 = idf(docFreq=1346, maxDocs=42306)
                0.046875 = fieldNorm(doc=711)
          0.024525922 = weight(abstract_txt:than in 711) [ClassicSimilarity], result of:
            0.024525922 = score(doc=711,freq=1.0), product of:
              0.13353826 = queryWeight, product of:
                1.230281 = boost
                3.9181254 = idf(docFreq=2285, maxDocs=42306)
                0.027702764 = queryNorm
              0.18366213 = fieldWeight in 711, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.9181254 = idf(docFreq=2285, maxDocs=42306)
                0.046875 = fieldNorm(doc=711)
          0.019668689 = weight(abstract_txt:such in 711) [ClassicSimilarity], result of:
            0.019668689 = score(doc=711,freq=1.0), product of:
              0.12134486 = queryWeight, product of:
                1.2667342 = boost
                3.4579027 = idf(docFreq=3621, maxDocs=42306)
                0.027702764 = queryNorm
              0.16208918 = fieldWeight in 711, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.4579027 = idf(docFreq=3621, maxDocs=42306)
                0.046875 = fieldNorm(doc=711)
          0.0317158 = weight(abstract_txt:over in 711) [ClassicSimilarity], result of:
            0.0317158 = score(doc=711,freq=1.0), product of:
              0.15850365 = queryWeight, product of:
                1.3403589 = boost
                4.268695 = idf(docFreq=1609, maxDocs=42306)
                0.027702764 = queryNorm
              0.20009507 = fieldWeight in 711, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.268695 = idf(docFreq=1609, maxDocs=42306)
                0.046875 = fieldNorm(doc=711)
          0.01205699 = weight(abstract_txt:with in 711) [ClassicSimilarity], result of:
            0.01205699 = score(doc=711,freq=1.0), product of:
              0.10180331 = queryWeight, product of:
                1.4544644 = boost
                2.5265954 = idf(docFreq=9191, maxDocs=42306)
                0.027702764 = queryNorm
              0.11843416 = fieldWeight in 711, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                2.5265954 = idf(docFreq=9191, maxDocs=42306)
                0.046875 = fieldNorm(doc=711)
          0.19490954 = weight(abstract_txt:name in 711) [ClassicSimilarity], result of:
            0.19490954 = score(doc=711,freq=9.0), product of:
              0.2405856 = queryWeight, product of:
                1.5074594 = boost
                5.76104 = idf(docFreq=361, maxDocs=42306)
                0.027702764 = queryNorm
              0.81014633 = fieldWeight in 711, product of:
                3.0 = tf(freq=9.0), with freq of:
                  9.0 = termFreq=9.0
                5.76104 = idf(docFreq=361, maxDocs=42306)
                0.046875 = fieldNorm(doc=711)
          0.06945362 = weight(abstract_txt:entities in 711) [ClassicSimilarity], result of:
            0.06945362 = score(doc=711,freq=1.0), product of:
              0.2515311 = queryWeight, product of:
                1.5413691 = boost
                5.8906326 = idf(docFreq=317, maxDocs=42306)
                0.027702764 = queryNorm
              0.2761234 = fieldWeight in 711, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.8906326 = idf(docFreq=317, maxDocs=42306)
                0.046875 = fieldNorm(doc=711)
          0.037509937 = weight(abstract_txt:that in 711) [ClassicSimilarity], result of:
            0.037509937 = score(doc=711,freq=7.0), product of:
              0.12576711 = queryWeight, product of:
                1.8877957 = boost
                2.4048555 = idf(docFreq=10381, maxDocs=42306)
                0.027702764 = queryNorm
              0.29824919 = fieldWeight in 711, product of:
                2.6457512 = tf(freq=7.0), with freq of:
                  7.0 = termFreq=7.0
                2.4048555 = idf(docFreq=10381, maxDocs=42306)
                0.046875 = fieldNorm(doc=711)
          0.38046864 = weight(abstract_txt:names in 711) [ClassicSimilarity], result of:
            0.38046864 = score(doc=711,freq=16.0), product of:
              0.34701136 = queryWeight, product of:
                2.1421342 = boost
                5.8475494 = idf(docFreq=331, maxDocs=42306)
                0.027702764 = queryNorm
              1.0964155 = fieldWeight in 711, product of:
                4.0 = tf(freq=16.0), with freq of:
                  16.0 = termFreq=16.0
                5.8475494 = idf(docFreq=331, maxDocs=42306)
                0.046875 = fieldNorm(doc=711)
        0.52 = coord(13/25)
    
  3. Shaw, R.; Buckland, M.: Open identification and linking of the four Ws (2008) 0.36
    0.3627778 = sum of:
      0.3627778 = product of:
        0.6478175 = sum of:
          0.043902695 = weight(abstract_txt:person in 485) [ClassicSimilarity], result of:
            0.043902695 = score(doc=485,freq=1.0), product of:
              0.17645222 = queryWeight, product of:
                6.3694806 = idf(docFreq=196, maxDocs=42306)
                0.027702764 = queryNorm
              0.24880783 = fieldWeight in 485, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.3694806 = idf(docFreq=196, maxDocs=42306)
                0.0390625 = fieldNorm(doc=485)
          0.011582764 = weight(abstract_txt:have in 485) [ClassicSimilarity], result of:
            0.011582764 = score(doc=485,freq=1.0), product of:
              0.091450155 = queryWeight, product of:
                1.0181075 = boost
                3.2424083 = idf(docFreq=4492, maxDocs=42306)
                0.027702764 = queryNorm
              0.12665658 = fieldWeight in 485, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.2424083 = idf(docFreq=4492, maxDocs=42306)
                0.0390625 = fieldNorm(doc=485)
          0.026765553 = weight(abstract_txt:provide in 485) [ClassicSimilarity], result of:
            0.026765553 = score(doc=485,freq=2.0), product of:
              0.11938692 = queryWeight, product of:
                1.0619136 = boost
                4.058303 = idf(docFreq=1986, maxDocs=42306)
                0.027702764 = queryNorm
              0.22419168 = fieldWeight in 485, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.058303 = idf(docFreq=1986, maxDocs=42306)
                0.0390625 = fieldNorm(doc=485)
          0.014231219 = weight(abstract_txt:which in 485) [ClassicSimilarity], result of:
            0.014231219 = score(doc=485,freq=2.0), product of:
              0.08765499 = queryWeight, product of:
                1.0766218 = boost
                2.938938 = idf(docFreq=6085, maxDocs=42306)
                0.027702764 = queryNorm
              0.16235492 = fieldWeight in 485, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                2.938938 = idf(docFreq=6085, maxDocs=42306)
                0.0390625 = fieldNorm(doc=485)
          0.02410755 = weight(abstract_txt:digital in 485) [ClassicSimilarity], result of:
            0.02410755 = score(doc=485,freq=1.0), product of:
              0.14028718 = queryWeight, product of:
                1.1511178 = boost
                4.399214 = idf(docFreq=1412, maxDocs=42306)
                0.027702764 = queryNorm
              0.17184429 = fieldWeight in 485, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.399214 = idf(docFreq=1412, maxDocs=42306)
                0.0390625 = fieldNorm(doc=485)
          0.020438269 = weight(abstract_txt:than in 485) [ClassicSimilarity], result of:
            0.020438269 = score(doc=485,freq=1.0), product of:
              0.13353826 = queryWeight, product of:
                1.230281 = boost
                3.9181254 = idf(docFreq=2285, maxDocs=42306)
                0.027702764 = queryNorm
              0.15305178 = fieldWeight in 485, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.9181254 = idf(docFreq=2285, maxDocs=42306)
                0.0390625 = fieldNorm(doc=485)
          0.021175604 = weight(abstract_txt:about in 485) [ClassicSimilarity], result of:
            0.021175604 = score(doc=485,freq=1.0), product of:
              0.13673097 = queryWeight, product of:
                1.2449012 = boost
                3.964687 = idf(docFreq=2181, maxDocs=42306)
                0.027702764 = queryNorm
              0.15487058 = fieldWeight in 485, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.964687 = idf(docFreq=2181, maxDocs=42306)
                0.0390625 = fieldNorm(doc=485)
          0.023179771 = weight(abstract_txt:such in 485) [ClassicSimilarity], result of:
            0.023179771 = score(doc=485,freq=2.0), product of:
              0.12134486 = queryWeight, product of:
                1.2667342 = boost
                3.4579027 = idf(docFreq=3621, maxDocs=42306)
                0.027702764 = queryNorm
              0.19102393 = fieldWeight in 485, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                3.4579027 = idf(docFreq=3621, maxDocs=42306)
                0.0390625 = fieldNorm(doc=485)
          0.026429834 = weight(abstract_txt:over in 485) [ClassicSimilarity], result of:
            0.026429834 = score(doc=485,freq=1.0), product of:
              0.15850365 = queryWeight, product of:
                1.3403589 = boost
                4.268695 = idf(docFreq=1609, maxDocs=42306)
                0.027702764 = queryNorm
              0.1667459 = fieldWeight in 485, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.268695 = idf(docFreq=1609, maxDocs=42306)
                0.0390625 = fieldNorm(doc=485)
          0.044275288 = weight(abstract_txt:place in 485) [ClassicSimilarity], result of:
            0.044275288 = score(doc=485,freq=1.0), product of:
              0.21038924 = queryWeight, product of:
                1.4096866 = boost
                5.387383 = idf(docFreq=525, maxDocs=42306)
                0.027702764 = queryNorm
              0.21044464 = fieldWeight in 485, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.387383 = idf(docFreq=525, maxDocs=42306)
                0.0390625 = fieldNorm(doc=485)
          0.017402766 = weight(abstract_txt:with in 485) [ClassicSimilarity], result of:
            0.017402766 = score(doc=485,freq=3.0), product of:
              0.10180331 = queryWeight, product of:
                1.4544644 = boost
                2.5265954 = idf(docFreq=9191, maxDocs=42306)
                0.027702764 = queryNorm
              0.17094499 = fieldWeight in 485, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                2.5265954 = idf(docFreq=9191, maxDocs=42306)
                0.0390625 = fieldNorm(doc=485)
          0.057878017 = weight(abstract_txt:entities in 485) [ClassicSimilarity], result of:
            0.057878017 = score(doc=485,freq=1.0), product of:
              0.2515311 = queryWeight, product of:
                1.5413691 = boost
                5.8906326 = idf(docFreq=317, maxDocs=42306)
                0.027702764 = queryNorm
              0.23010284 = fieldWeight in 485, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.8906326 = idf(docFreq=317, maxDocs=42306)
                0.0390625 = fieldNorm(doc=485)
          0.016708255 = weight(abstract_txt:that in 485) [ClassicSimilarity], result of:
            0.016708255 = score(doc=485,freq=2.0), product of:
              0.12576711 = queryWeight, product of:
                1.8877957 = boost
                2.4048555 = idf(docFreq=10381, maxDocs=42306)
                0.027702764 = queryNorm
              0.13285075 = fieldWeight in 485, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                2.4048555 = idf(docFreq=10381, maxDocs=42306)
                0.0390625 = fieldNorm(doc=485)
          0.29973987 = weight(abstract_txt:places in 485) [ClassicSimilarity], result of:
            0.29973987 = score(doc=485,freq=7.0), product of:
              0.4182539 = queryWeight, product of:
                2.177316 = boost
                6.9341855 = idf(docFreq=111, maxDocs=42306)
                0.027702764 = queryNorm
              0.7166457 = fieldWeight in 485, product of:
                2.6457512 = tf(freq=7.0), with freq of:
                  7.0 = termFreq=7.0
                6.9341855 = idf(docFreq=111, maxDocs=42306)
                0.0390625 = fieldNorm(doc=485)
        0.56 = coord(14/25)
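
The per-term arithmetic in the tree above can be checked by hand. The following is a minimal sketch, not the search engine's own code: it assumes the classic TF-IDF form that the numbers appear to follow (tf as the square root of the term frequency, idf as 1 + ln(maxDocs / (docFreq + 1))), and the function names `classic_idf` and `term_score` are hypothetical. The inputs are copied from the `abstract_txt:places in 485` entry.

```python
import math


def classic_idf(doc_freq, max_docs):
    # Assumed idf form, inferred from the values printed in the tree:
    # idf(docFreq=111, maxDocs=42306) is reported as 6.9341855.
    return 1.0 + math.log(max_docs / (doc_freq + 1.0))


def term_score(freq, doc_freq, max_docs, boost, query_norm, field_norm):
    idf = classic_idf(doc_freq, max_docs)
    # queryWeight = boost * idf * queryNorm
    query_weight = boost * idf * query_norm
    # fieldWeight = tf * idf * fieldNorm, with tf = sqrt(termFreq)
    field_weight = math.sqrt(freq) * idf * field_norm
    # score = queryWeight * fieldWeight
    return query_weight * field_weight


# Values taken from the "places in 485" entry.
places_485 = term_score(
    freq=7.0,
    doc_freq=111,
    max_docs=42306,
    boost=2.177316,
    query_norm=0.027702764,
    field_norm=0.0390625,
)
print(places_485)
```

The result agrees with the reported 0.29973987 up to single-precision rounding. Because idf enters both queryWeight and fieldWeight, a rare term such as "places" contributes far more than a common one such as "that", even at similar boosts.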
    
  4. Bishop, B.W.; Moulaison, H.L.; Burwell, C.L.: Geographic knowledge organization : critical cartographic cataloging and place-names in the geoweb (2015) 0.34
    0.34454924 = sum of:
      0.34454924 = product of:
        0.86137307 = sum of:
          0.018532421 = weight(abstract_txt:have in 4201) [ClassicSimilarity], result of:
            0.018532421 = score(doc=4201,freq=1.0), product of:
              0.091450155 = queryWeight, product of:
                1.0181075 = boost
                3.2424083 = idf(docFreq=4492, maxDocs=42306)
                0.027702764 = queryNorm
              0.20265052 = fieldWeight in 4201, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.2424083 = idf(docFreq=4492, maxDocs=42306)
                0.0625 = fieldNorm(doc=4201)
          0.07648625 = weight(abstract_txt:geographic in 4201) [ClassicSimilarity], result of:
            0.07648625 = score(doc=4201,freq=1.0), product of:
              0.1867563 = queryWeight, product of:
                1.0287837 = boost
                6.552818 = idf(docFreq=163, maxDocs=42306)
                0.027702764 = queryNorm
              0.4095511 = fieldWeight in 4201, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.552818 = idf(docFreq=163, maxDocs=42306)
                0.0625 = fieldNorm(doc=4201)
          0.089600265 = weight(abstract_txt:named in 4201) [ClassicSimilarity], result of:
            0.089600265 = score(doc=4201,freq=1.0), product of:
              0.20753556 = queryWeight, product of:
                1.0845078 = boost
                6.907752 = idf(docFreq=114, maxDocs=42306)
                0.027702764 = queryNorm
              0.4317345 = fieldWeight in 4201, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.907752 = idf(docFreq=114, maxDocs=42306)
                0.0625 = fieldNorm(doc=4201)
          0.04196398 = weight(abstract_txt:knowledge in 4201) [ClassicSimilarity], result of:
            0.04196398 = score(doc=4201,freq=2.0), product of:
              0.13176085 = queryWeight, product of:
                1.319982 = boost
                3.6032572 = idf(docFreq=3131, maxDocs=42306)
                0.027702764 = queryNorm
              0.31848595 = fieldWeight in 4201, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                3.6032572 = idf(docFreq=3131, maxDocs=42306)
                0.0625 = fieldNorm(doc=4201)
          0.14168093 = weight(abstract_txt:place in 4201) [ClassicSimilarity], result of:
            0.14168093 = score(doc=4201,freq=4.0), product of:
              0.21038924 = queryWeight, product of:
                1.4096866 = boost
                5.387383 = idf(docFreq=525, maxDocs=42306)
                0.027702764 = queryNorm
              0.6734229 = fieldWeight in 4201, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                5.387383 = idf(docFreq=525, maxDocs=42306)
                0.0625 = fieldNorm(doc=4201)
          0.016075986 = weight(abstract_txt:with in 4201) [ClassicSimilarity], result of:
            0.016075986 = score(doc=4201,freq=1.0), product of:
              0.10180331 = queryWeight, product of:
                1.4544644 = boost
                2.5265954 = idf(docFreq=9191, maxDocs=42306)
                0.027702764 = queryNorm
              0.15791221 = fieldWeight in 4201, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                2.5265954 = idf(docFreq=9191, maxDocs=42306)
                0.0625 = fieldNorm(doc=4201)
          0.15004143 = weight(abstract_txt:name in 4201) [ClassicSimilarity], result of:
            0.15004143 = score(doc=4201,freq=3.0), product of:
              0.2405856 = queryWeight, product of:
                1.5074594 = boost
                5.76104 = idf(docFreq=361, maxDocs=42306)
                0.027702764 = queryNorm
              0.6236509 = fieldWeight in 4201, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                5.76104 = idf(docFreq=361, maxDocs=42306)
                0.0625 = fieldNorm(doc=4201)
          0.018903233 = weight(abstract_txt:that in 4201) [ClassicSimilarity], result of:
            0.018903233 = score(doc=4201,freq=1.0), product of:
              0.12576711 = queryWeight, product of:
                1.8877957 = boost
                2.4048555 = idf(docFreq=10381, maxDocs=42306)
                0.027702764 = queryNorm
              0.15030347 = fieldWeight in 4201, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                2.4048555 = idf(docFreq=10381, maxDocs=42306)
                0.0625 = fieldNorm(doc=4201)
          0.12682287 = weight(abstract_txt:names in 4201) [ClassicSimilarity], result of:
            0.12682287 = score(doc=4201,freq=1.0), product of:
              0.34701136 = queryWeight, product of:
                2.1421342 = boost
                5.8475494 = idf(docFreq=331, maxDocs=42306)
                0.027702764 = queryNorm
              0.36547184 = fieldWeight in 4201, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.8475494 = idf(docFreq=331, maxDocs=42306)
                0.0625 = fieldNorm(doc=4201)
          0.18126564 = weight(abstract_txt:places in 4201) [ClassicSimilarity], result of:
            0.18126564 = score(doc=4201,freq=1.0), product of:
              0.4182539 = queryWeight, product of:
                2.177316 = boost
                6.9341855 = idf(docFreq=111, maxDocs=42306)
                0.027702764 = queryNorm
              0.4333866 = fieldWeight in 4201, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.9341855 = idf(docFreq=111, maxDocs=42306)
                0.0625 = fieldNorm(doc=4201)
        0.4 = coord(10/25)
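
The headline score of an entry is the sum of its per-term scores scaled by the coordination factor. The sketch below reproduces that last step for document 4201, under the assumption that coord is simply the number of matched query terms divided by the number of query terms (10 of 25 here); the variable names are illustrative only, and the per-term scores are copied from the tree above.

```python
# Per-term scores for doc 4201, in the order they appear in the tree.
term_scores_4201 = [
    0.018532421,  # have
    0.07648625,   # geographic
    0.089600265,  # named
    0.04196398,   # knowledge
    0.14168093,   # place
    0.016075986,  # with
    0.15004143,   # name
    0.018903233,  # that
    0.12682287,   # names
    0.18126564,   # places
]

total_query_terms = 25  # denominator shown in coord(10/25)
matched_terms = len(term_scores_4201)

# coord rewards documents that match more of the query's terms.
coord = matched_terms / total_query_terms
raw_sum = sum(term_scores_4201)
final_score = coord * raw_sum

print(coord, raw_sum, final_score)
```

The sum comes to about 0.8614 and the scaled total to about 0.3445, matching the 0.86137307 and 0.34454924 lines at the top of this entry. The same step explains why the preceding entry, with coord(14/25) = 0.56, can outrank documents that have a larger raw sum but fewer matching terms.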
    
  5. Shaalan, K.; Raza, H.: NERA: Named Entity Recognition for Arabic (2009) 0.33
    0.32938212 = sum of:
      0.32938212 = product of:
        0.8234553 = sum of:
          0.061463773 = weight(abstract_txt:person in 773) [ClassicSimilarity], result of:
            0.061463773 = score(doc=773,freq=1.0), product of:
              0.17645222 = queryWeight, product of:
                6.3694806 = idf(docFreq=196, maxDocs=42306)
                0.027702764 = queryNorm
              0.34833097 = fieldWeight in 773, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.3694806 = idf(docFreq=196, maxDocs=42306)
                0.0546875 = fieldNorm(doc=773)
          0.022932703 = weight(abstract_txt:have in 773) [ClassicSimilarity], result of:
            0.022932703 = score(doc=773,freq=2.0), product of:
              0.091450155 = queryWeight, product of:
                1.0181075 = boost
                3.2424083 = idf(docFreq=4492, maxDocs=42306)
                0.027702764 = queryNorm
              0.25076723 = fieldWeight in 773, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                3.2424083 = idf(docFreq=4492, maxDocs=42306)
                0.0546875 = fieldNorm(doc=773)
          0.014088188 = weight(abstract_txt:which in 773) [ClassicSimilarity], result of:
            0.014088188 = score(doc=773,freq=1.0), product of:
              0.08765499 = queryWeight, product of:
                1.0766218 = boost
                2.938938 = idf(docFreq=6085, maxDocs=42306)
                0.027702764 = queryNorm
              0.16072316 = fieldWeight in 773, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                2.938938 = idf(docFreq=6085, maxDocs=42306)
                0.0546875 = fieldNorm(doc=773)
          0.20742752 = weight(abstract_txt:named in 773) [ClassicSimilarity], result of:
            0.20742752 = score(doc=773,freq=7.0), product of:
              0.20753556 = queryWeight, product of:
                1.0845078 = boost
                6.907752 = idf(docFreq=114, maxDocs=42306)
                0.027702764 = queryNorm
              0.99947935 = fieldWeight in 773, product of:
                2.6457512 = tf(freq=7.0), with freq of:
                  7.0 = termFreq=7.0
                6.907752 = idf(docFreq=114, maxDocs=42306)
                0.0546875 = fieldNorm(doc=773)
          0.028838571 = weight(abstract_txt:time in 773) [ClassicSimilarity], result of:
            0.028838571 = score(doc=773,freq=1.0), product of:
              0.12632221 = queryWeight, product of:
                1.092322 = boost
                4.1745143 = idf(docFreq=1768, maxDocs=42306)
                0.027702764 = queryNorm
              0.22829375 = fieldWeight in 773, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.1745143 = idf(docFreq=1768, maxDocs=42306)
                0.0546875 = fieldNorm(doc=773)
          0.029645847 = weight(abstract_txt:about in 773) [ClassicSimilarity], result of:
            0.029645847 = score(doc=773,freq=1.0), product of:
              0.13673097 = queryWeight, product of:
                1.2449012 = boost
                3.964687 = idf(docFreq=2181, maxDocs=42306)
                0.027702764 = queryNorm
              0.21681882 = fieldWeight in 773, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.964687 = idf(docFreq=2181, maxDocs=42306)
                0.0546875 = fieldNorm(doc=773)
          0.16948982 = weight(abstract_txt:name in 773) [ClassicSimilarity], result of:
            0.16948982 = score(doc=773,freq=5.0), product of:
              0.2405856 = queryWeight, product of:
                1.5074594 = boost
                5.76104 = idf(docFreq=361, maxDocs=42306)
                0.027702764 = queryNorm
              0.70448864 = fieldWeight in 773, product of:
                2.236068 = tf(freq=5.0), with freq of:
                  5.0 = termFreq=5.0
                5.76104 = idf(docFreq=361, maxDocs=42306)
                0.0546875 = fieldNorm(doc=773)
          0.16205846 = weight(abstract_txt:entities in 773) [ClassicSimilarity], result of:
            0.16205846 = score(doc=773,freq=4.0), product of:
              0.2515311 = queryWeight, product of:
                1.5413691 = boost
                5.8906326 = idf(docFreq=317, maxDocs=42306)
                0.027702764 = queryNorm
              0.64428794 = fieldWeight in 773, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                5.8906326 = idf(docFreq=317, maxDocs=42306)
                0.0546875 = fieldNorm(doc=773)
          0.016540328 = weight(abstract_txt:that in 773) [ClassicSimilarity], result of:
            0.016540328 = score(doc=773,freq=1.0), product of:
              0.12576711 = queryWeight, product of:
                1.8877957 = boost
                2.4048555 = idf(docFreq=10381, maxDocs=42306)
                0.027702764 = queryNorm
              0.13151553 = fieldWeight in 773, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                2.4048555 = idf(docFreq=10381, maxDocs=42306)
                0.0546875 = fieldNorm(doc=773)
          0.11097002 = weight(abstract_txt:names in 773) [ClassicSimilarity], result of:
            0.11097002 = score(doc=773,freq=1.0), product of:
              0.34701136 = queryWeight, product of:
                2.1421342 = boost
                5.8475494 = idf(docFreq=331, maxDocs=42306)
                0.027702764 = queryNorm
              0.31978786 = fieldWeight in 773, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.8475494 = idf(docFreq=331, maxDocs=42306)
                0.0546875 = fieldNorm(doc=773)
        0.4 = coord(10/25)