Search (30 results, page 1 of 2)

  • theme_ss:"Data Mining"
  1. Fong, A.C.M.: Mining a Web citation database for document clustering (2002) 0.04
    0.03617032 = product of:
      0.108510956 = sum of:
        0.108510956 = product of:
          0.21702191 = sum of:
            0.21702191 = weight(_text_:2002 in 3940) [ClassicSimilarity], result of:
              0.21702191 = score(doc=3940,freq=5.0), product of:
                0.20701107 = queryWeight, product of:
                  4.28654 = idf(docFreq=1652, maxDocs=44218)
                  0.048293278 = queryNorm
                1.0483589 = fieldWeight in 3940, product of:
                  2.236068 = tf(freq=5.0), with freq of:
                    5.0 = termFreq=5.0
                  4.28654 = idf(docFreq=1652, maxDocs=44218)
                  0.109375 = fieldNorm(doc=3940)
          0.5 = coord(1/2)
      0.33333334 = coord(1/3)
    
    Source
    Applied artificial intelligence. 16(2002) no.4, S.283-292
    Year
    2002
  2. Keim, D.A.: Data Mining mit bloßem Auge (2002) 0.03
    0.031003129 = product of:
      0.09300938 = sum of:
        0.09300938 = product of:
          0.18601876 = sum of:
            0.18601876 = weight(_text_:2002 in 1086) [ClassicSimilarity], result of:
              0.18601876 = score(doc=1086,freq=5.0), product of:
                0.20701107 = queryWeight, product of:
                  4.28654 = idf(docFreq=1652, maxDocs=44218)
                  0.048293278 = queryNorm
                0.8985933 = fieldWeight in 1086, product of:
                  2.236068 = tf(freq=5.0), with freq of:
                    5.0 = termFreq=5.0
                  4.28654 = idf(docFreq=1652, maxDocs=44218)
                  0.09375 = fieldNorm(doc=1086)
          0.5 = coord(1/2)
      0.33333334 = coord(1/3)
    
    Source
    Spektrum der Wissenschaft. 2002, H.11, S.88-91
    Year
    2002
  3. Kruse, R.; Borgelt, C.: Suche im Datendschungel (2002) 0.03
    0.031003129 = product of:
      0.09300938 = sum of:
        0.09300938 = product of:
          0.18601876 = sum of:
            0.18601876 = weight(_text_:2002 in 1087) [ClassicSimilarity], result of:
              0.18601876 = score(doc=1087,freq=5.0), product of:
                0.20701107 = queryWeight, product of:
                  4.28654 = idf(docFreq=1652, maxDocs=44218)
                  0.048293278 = queryNorm
                0.8985933 = fieldWeight in 1087, product of:
                  2.236068 = tf(freq=5.0), with freq of:
                    5.0 = termFreq=5.0
                  4.28654 = idf(docFreq=1652, maxDocs=44218)
                  0.09375 = fieldNorm(doc=1087)
          0.5 = coord(1/2)
      0.33333334 = coord(1/3)
    
    Source
    Spektrum der Wissenschaft. 2002, H.11, S.80-81
    Year
    2002
  4. Wrobel, S.: Lern- und Entdeckungsverfahren (2002) 0.03
    0.031003129 = product of:
      0.09300938 = sum of:
        0.09300938 = product of:
          0.18601876 = sum of:
            0.18601876 = weight(_text_:2002 in 1105) [ClassicSimilarity], result of:
              0.18601876 = score(doc=1105,freq=5.0), product of:
                0.20701107 = queryWeight, product of:
                  4.28654 = idf(docFreq=1652, maxDocs=44218)
                  0.048293278 = queryNorm
                0.8985933 = fieldWeight in 1105, product of:
                  2.236068 = tf(freq=5.0), with freq of:
                    5.0 = termFreq=5.0
                  4.28654 = idf(docFreq=1652, maxDocs=44218)
                  0.09375 = fieldNorm(doc=1105)
          0.5 = coord(1/2)
      0.33333334 = coord(1/3)
    
    Source
    Spektrum der Wissenschaft. 2002, H.11, S.85-87
    Year
    2002
  5. Handbuch Web Mining im Marketing : Konzepte, Systeme, Fallstudien (2002) 0.03
    0.028017405 = product of:
      0.08405221 = sum of:
        0.08405221 = product of:
          0.16810443 = sum of:
            0.16810443 = weight(_text_:2002 in 6106) [ClassicSimilarity], result of:
              0.16810443 = score(doc=6106,freq=3.0), product of:
                0.20701107 = queryWeight, product of:
                  4.28654 = idf(docFreq=1652, maxDocs=44218)
                  0.048293278 = queryNorm
                0.81205523 = fieldWeight in 6106, product of:
                  1.7320508 = tf(freq=3.0), with freq of:
                    3.0 = termFreq=3.0
                  4.28654 = idf(docFreq=1652, maxDocs=44218)
                  0.109375 = fieldNorm(doc=6106)
          0.5 = coord(1/2)
      0.33333334 = coord(1/3)
    
    Year
    2002
  6. Borgelt, C.; Kruse, R.: Unsicheres Wissen nutzen (2002) 0.03
    0.025835939 = product of:
      0.077507816 = sum of:
        0.077507816 = product of:
          0.15501563 = sum of:
            0.15501563 = weight(_text_:2002 in 1104) [ClassicSimilarity], result of:
              0.15501563 = score(doc=1104,freq=5.0), product of:
                0.20701107 = queryWeight, product of:
                  4.28654 = idf(docFreq=1652, maxDocs=44218)
                  0.048293278 = queryNorm
                0.74882776 = fieldWeight in 1104, product of:
                  2.236068 = tf(freq=5.0), with freq of:
                    5.0 = termFreq=5.0
                  4.28654 = idf(docFreq=1652, maxDocs=44218)
                  0.078125 = fieldNorm(doc=1104)
          0.5 = coord(1/2)
      0.33333334 = coord(1/3)
    
    Source
    Spektrum der Wissenschaft. 2002, H.11, S.82-84
    Year
    2002
  7. Tiefschürfen in Datenbanken (2002) 0.02
    0.020668752 = product of:
      0.062006254 = sum of:
        0.062006254 = product of:
          0.12401251 = sum of:
            0.12401251 = weight(_text_:2002 in 996) [ClassicSimilarity], result of:
              0.12401251 = score(doc=996,freq=5.0), product of:
                0.20701107 = queryWeight, product of:
                  4.28654 = idf(docFreq=1652, maxDocs=44218)
                  0.048293278 = queryNorm
                0.5990622 = fieldWeight in 996, product of:
                  2.236068 = tf(freq=5.0), with freq of:
                    5.0 = termFreq=5.0
                  4.28654 = idf(docFreq=1652, maxDocs=44218)
                  0.0625 = fieldNorm(doc=996)
          0.5 = coord(1/2)
      0.33333334 = coord(1/3)
    
    Source
    Spektrum der Wissenschaft. 2002, H.11, S.80-91
    Year
    2002
  8. Medien-Informationsmanagement : Archivarische, dokumentarische, betriebswirtschaftliche, rechtliche und Berufsbild-Aspekte ; [Frühjahrstagung der Fachgruppe 7 im Jahr 2000 in Weimar und Folgetagung 2001 in Köln] (2003) 0.02
    0.020408092 = product of:
      0.061224274 = sum of:
        0.061224274 = sum of:
          0.04159506 = weight(_text_:2002 in 1833) [ClassicSimilarity], result of:
            0.04159506 = score(doc=1833,freq=4.0), product of:
              0.20701107 = queryWeight, product of:
                4.28654 = idf(docFreq=1652, maxDocs=44218)
                0.048293278 = queryNorm
              0.20093156 = fieldWeight in 1833, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                4.28654 = idf(docFreq=1652, maxDocs=44218)
                0.0234375 = fieldNorm(doc=1833)
          0.019629216 = weight(_text_:22 in 1833) [ClassicSimilarity], result of:
            0.019629216 = score(doc=1833,freq=2.0), product of:
              0.16911483 = queryWeight, product of:
                3.5018296 = idf(docFreq=3622, maxDocs=44218)
                0.048293278 = queryNorm
              0.116070345 = fieldWeight in 1833, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                3.5018296 = idf(docFreq=3622, maxDocs=44218)
                0.0234375 = fieldNorm(doc=1833)
      0.33333334 = coord(1/3)
    
    Classification
    P96.A72M43 2002
    Date
    11. 5.2008 19:49:22
    LCC
    P96.A72M43 2002
  9. Lischka, K.: Spurensuche im Datenwust : Data-Mining-Software fahndet nach kriminellen Mitarbeitern, guten Kunden - und bald vielleicht auch nach Terroristen (2002) 0.02
    0.01855053 = product of:
      0.05565159 = sum of:
        0.05565159 = sum of:
          0.036022376 = weight(_text_:2002 in 1178) [ClassicSimilarity], result of:
            0.036022376 = score(doc=1178,freq=3.0), product of:
              0.20701107 = queryWeight, product of:
                4.28654 = idf(docFreq=1652, maxDocs=44218)
                0.048293278 = queryNorm
              0.17401183 = fieldWeight in 1178, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                4.28654 = idf(docFreq=1652, maxDocs=44218)
                0.0234375 = fieldNorm(doc=1178)
          0.019629216 = weight(_text_:22 in 1178) [ClassicSimilarity], result of:
            0.019629216 = score(doc=1178,freq=2.0), product of:
              0.16911483 = queryWeight, product of:
                3.5018296 = idf(docFreq=3622, maxDocs=44218)
                0.048293278 = queryNorm
              0.116070345 = fieldWeight in 1178, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                3.5018296 = idf(docFreq=3622, maxDocs=44218)
                0.0234375 = fieldNorm(doc=1178)
      0.33333334 = coord(1/3)
    
    Content
    Whether you are a terrorist planning an attack on the United States, a cashier pocketing banknotes from the till, or a shopper who particularly likes spending money on certain products, data-mining software makes no distinction. Such programs analyze huge volumes of data and reach statistical judgments. Using these methods, researchers at the "Information Awaren[...]" in the United States now want to find traces of terrorists in the databases of government agencies and of private companies such as credit card firms. The annual budget for the various research projects is 200 million dollars. That such software works in practice is shown by the rising sales of vendors of so-called customer relationship management (CRM) software. According to the market research firm IDC, the potential for analytical CRM applications grew by 22 percent worldwide last year, and in Germany it is expected to keep growing at 14.1 percent a year until 2006. And this despite a weak economy, or precisely because of it. For just as data mining is meant to help the US government find terrorists, CRM programs today decide which customers are profitable for a company, and which ones will be in the future, as Manuela Schnaubelt, spokeswoman for the CRM vendor SAP, explains: "Customer valuation is a central component of analytical CRM. It enables companies to focus on the customers who are important and right for them. In addition, companies can use special scoring procedures to determine which customers will contribute to the company's success over the long term, and to what extent." The consequences of these assessments are not always positive for those affected: attractive customers benefit from individual special offers and extra attention, while others may be left waiting in the telephone service queue until the more profitable customers have been dealt with.
    This is how what SAP spokeswoman Schnaubelt describes in the abstract might look in practice: "In many companies, customer valuation is carried out with the classic ABC analysis, in which customers are categorized on the basis of data such as revenue. A customers, as particularly important customers, are looked after differently from C customers." Even closer to the planned use of data mining to hunt terrorists is an application that many companies already use successfully today: tracking down employees who commit fraud. Werner Sülzer of the large CRM vendor NCR Teradata describes the possibilities as follows: "Today practically every offender, whether employee, customer, or supplier, leaves data traces behind when committing white-collar crimes. The priority must be to condense individual traces into patterns of behavior and offender profiles. This is achieved by means of central data warehouses and highly developed search and analysis tools." Companies are reluctant to talk about concrete successes, meaning dismissals of criminal employees, after deploying such programs. Matthias Wilke of the "Beratungsstelle für Technologiefolgen und Qualifizierung" (BTQ) of the trade union Verdi knows of a case from Switzerland. There the retail chain "Pick Pay" uses the program "Lord Lose Prevention". Two months after its introduction, embezzlement worth about 200,000 francs had reportedly been uncovered. That cost more than 50 suspected cashiers their jobs.
    Year
    2002
  10. Bath, P.A.: Data mining in health and medical information (2003) 0.02
    0.018486694 = product of:
      0.055460077 = sum of:
        0.055460077 = product of:
          0.11092015 = sum of:
            0.11092015 = weight(_text_:2002 in 4263) [ClassicSimilarity], result of:
              0.11092015 = score(doc=4263,freq=4.0), product of:
                0.20701107 = queryWeight, product of:
                  4.28654 = idf(docFreq=1652, maxDocs=44218)
                  0.048293278 = queryNorm
                0.5358175 = fieldWeight in 4263, product of:
                  2.0 = tf(freq=4.0), with freq of:
                    4.0 = termFreq=4.0
                  4.28654 = idf(docFreq=1652, maxDocs=44218)
                  0.0625 = fieldNorm(doc=4263)
          0.5 = coord(1/2)
      0.33333334 = coord(1/3)
    
    Abstract
    Data mining (DM) is part of a process by which information can be extracted from data or databases and used to inform decision making in a variety of contexts (Benoit, 2002; Michalski, Bratko & Kubat, 1997). DM includes a range of tools and methods for extracting information; their use in the commercial sector for knowledge extraction and discovery has been one of the main driving forces in their development (Adriaans & Zantinge, 1996; Benoit, 2002). DM has been developed and applied in numerous areas. This review describes its use in analyzing health and medical information.
  11. Information visualization in data mining and knowledge discovery (2002) 0.02
    0.016589846 = product of:
      0.049769536 = sum of:
        0.049769536 = sum of:
          0.03668339 = weight(_text_:2002 in 1789) [ClassicSimilarity], result of:
            0.03668339 = score(doc=1789,freq=7.0), product of:
              0.20701107 = queryWeight, product of:
                4.28654 = idf(docFreq=1652, maxDocs=44218)
                0.048293278 = queryNorm
              0.17720498 = fieldWeight in 1789, product of:
                2.6457512 = tf(freq=7.0), with freq of:
                  7.0 = termFreq=7.0
                4.28654 = idf(docFreq=1652, maxDocs=44218)
                0.015625 = fieldNorm(doc=1789)
          0.013086145 = weight(_text_:22 in 1789) [ClassicSimilarity], result of:
            0.013086145 = score(doc=1789,freq=2.0), product of:
              0.16911483 = queryWeight, product of:
                3.5018296 = idf(docFreq=3622, maxDocs=44218)
                0.048293278 = queryNorm
              0.07738023 = fieldWeight in 1789, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                3.5018296 = idf(docFreq=3622, maxDocs=44218)
                0.015625 = fieldNorm(doc=1789)
      0.33333334 = coord(1/3)
    
    Classification
    TK7882.I6I635 2002
    Date
    23. 3.2008 19:10:22
    LCC
    TK7882.I6I635 2002
    Year
    2002
  12. Benoit, G.: Data mining (2002) 0.02
    0.015501564 = product of:
      0.04650469 = sum of:
        0.04650469 = product of:
          0.09300938 = sum of:
            0.09300938 = weight(_text_:2002 in 4296) [ClassicSimilarity], result of:
              0.09300938 = score(doc=4296,freq=5.0), product of:
                0.20701107 = queryWeight, product of:
                  4.28654 = idf(docFreq=1652, maxDocs=44218)
                  0.048293278 = queryNorm
                0.44929665 = fieldWeight in 4296, product of:
                  2.236068 = tf(freq=5.0), with freq of:
                    5.0 = termFreq=5.0
                  4.28654 = idf(docFreq=1652, maxDocs=44218)
                  0.046875 = fieldNorm(doc=4296)
          0.5 = coord(1/2)
      0.33333334 = coord(1/3)
    
    Source
    Annual review of information science and technology. 36(2002), S.265-312
    Year
    2002
  13. Raan, A.F.J. van; Noyons, E.C.M.: Discovery of patterns of scientific and technological development and knowledge transfer (2002) 0.02
    0.015284747 = product of:
      0.04585424 = sum of:
        0.04585424 = product of:
          0.09170848 = sum of:
            0.09170848 = weight(_text_:2002 in 3603) [ClassicSimilarity], result of:
              0.09170848 = score(doc=3603,freq=7.0), product of:
                0.20701107 = queryWeight, product of:
                  4.28654 = idf(docFreq=1652, maxDocs=44218)
                  0.048293278 = queryNorm
                0.44301245 = fieldWeight in 3603, product of:
                  2.6457512 = tf(freq=7.0), with freq of:
                    7.0 = termFreq=7.0
                  4.28654 = idf(docFreq=1652, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=3603)
          0.5 = coord(1/2)
      0.33333334 = coord(1/3)
    
    Abstract
    This paper addresses a bibliometric methodology to discover the structure of the scientific 'landscape' in order to gain detailed insight into the development of R&D fields, their interaction, and the transfer of knowledge between them. This methodology is suited to visualizing the position of R&D activities in relation to interdisciplinary R&D developments, and particularly in relation to socio-economic problems. Furthermore, it allows the identification of the major actors. It even provides the possibility of foresight. We describe a first approach to applying bibliometric mapping as an instrument to investigate characteristics of knowledge transfer. In this paper we discuss the creation of 'maps of science' with the help of advanced bibliometric methods. This 'bibliometric cartography' can be seen as a specific type of data mining, applied to large amounts of scientific publications. As an example we describe the mapping of the field of neuroscience, one of the largest and fastest-growing fields in the life sciences. The number of publications covered by this database is about 80,000 per year, and the period covered is 1995-1998. Research is currently under way to update the mapping for the years 1999-2002. This paper addresses the main lines of the methodology and its application in the study of knowledge transfer.
    Source
    Gaining insight from research information (CRIS2002): Proceedings of the 6th International Conference on Current Research Information Systems, University of Kassel, August 29 - 31, 2002. Eds: W. Adamczak and A. Nase
    Year
    2002
  14. Chowdhury, G.G.: Template mining for information extraction from digital documents (1999) 0.02
    0.015267169 = product of:
      0.045801505 = sum of:
        0.045801505 = product of:
          0.09160301 = sum of:
            0.09160301 = weight(_text_:22 in 4577) [ClassicSimilarity], result of:
              0.09160301 = score(doc=4577,freq=2.0), product of:
                0.16911483 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.048293278 = queryNorm
                0.5416616 = fieldWeight in 4577, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.109375 = fieldNorm(doc=4577)
          0.5 = coord(1/2)
      0.33333334 = coord(1/3)
    
    Date
    2. 4.2000 18:01:22
  15. KDD : techniques and applications (1998) 0.01
    0.013086144 = product of:
      0.03925843 = sum of:
        0.03925843 = product of:
          0.07851686 = sum of:
            0.07851686 = weight(_text_:22 in 6783) [ClassicSimilarity], result of:
              0.07851686 = score(doc=6783,freq=2.0), product of:
                0.16911483 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.048293278 = queryNorm
                0.46428138 = fieldWeight in 6783, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.09375 = fieldNorm(doc=6783)
          0.5 = coord(1/2)
      0.33333334 = coord(1/3)
    
    Footnote
    A special issue of selected papers from the Pacific-Asia Conference on Knowledge Discovery and Data Mining (PAKDD'97), held in Singapore, 22-23 Feb 1997
  16. Classification, automation, and new media : Proceedings of the 24th Annual Conference of the Gesellschaft für Klassifikation e.V., University of Passau, March 15 - 17, 2000 (2002) 0.01
    0.010006215 = product of:
      0.030018646 = sum of:
        0.030018646 = product of:
          0.060037293 = sum of:
            0.060037293 = weight(_text_:2002 in 5997) [ClassicSimilarity], result of:
              0.060037293 = score(doc=5997,freq=3.0), product of:
                0.20701107 = queryWeight, product of:
                  4.28654 = idf(docFreq=1652, maxDocs=44218)
                  0.048293278 = queryNorm
                0.29001972 = fieldWeight in 5997, product of:
                  1.7320508 = tf(freq=3.0), with freq of:
                    3.0 = termFreq=3.0
                  4.28654 = idf(docFreq=1652, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=5997)
          0.5 = coord(1/2)
      0.33333334 = coord(1/3)
    
    Year
    2002
  17. Matson, L.D.; Bonski, D.J.: Do digital libraries need librarians? (1997) 0.01
    0.008724097 = product of:
      0.02617229 = sum of:
        0.02617229 = product of:
          0.05234458 = sum of:
            0.05234458 = weight(_text_:22 in 1737) [ClassicSimilarity], result of:
              0.05234458 = score(doc=1737,freq=2.0), product of:
                0.16911483 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.048293278 = queryNorm
                0.30952093 = fieldWeight in 1737, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.0625 = fieldNorm(doc=1737)
          0.5 = coord(1/2)
      0.33333334 = coord(1/3)
    
    Date
    22.11.1998 18:57:22
  18. Lusti, M.: Data Warehousing and Data Mining : Eine Einführung in entscheidungsunterstützende Systeme (1999) 0.01
    0.008724097 = product of:
      0.02617229 = sum of:
        0.02617229 = product of:
          0.05234458 = sum of:
            0.05234458 = weight(_text_:22 in 4261) [ClassicSimilarity], result of:
              0.05234458 = score(doc=4261,freq=2.0), product of:
                0.16911483 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.048293278 = queryNorm
                0.30952093 = fieldWeight in 4261, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.0625 = fieldNorm(doc=4261)
          0.5 = coord(1/2)
      0.33333334 = coord(1/3)
    
    Date
    17. 7.2002 19:22:06
  19. Amir, A.; Feldman, R.; Kashi, R.: A new and versatile method for association generation (1997) 0.01
    0.008724097 = product of:
      0.02617229 = sum of:
        0.02617229 = product of:
          0.05234458 = sum of:
            0.05234458 = weight(_text_:22 in 1270) [ClassicSimilarity], result of:
              0.05234458 = score(doc=1270,freq=2.0), product of:
                0.16911483 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.048293278 = queryNorm
                0.30952093 = fieldWeight in 1270, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.0625 = fieldNorm(doc=1270)
          0.5 = coord(1/2)
      0.33333334 = coord(1/3)
    
    Source
    Information systems. 22(1997) nos.5/6, S.333-347
  20. Suakkaphong, N.; Zhang, Z.; Chen, H.: Disease named entity recognition using semisupervised learning and conditional random fields (2011) 0.01
    0.008170041 = product of:
      0.024510123 = sum of:
        0.024510123 = product of:
          0.049020246 = sum of:
            0.049020246 = weight(_text_:2002 in 4367) [ClassicSimilarity], result of:
              0.049020246 = score(doc=4367,freq=2.0), product of:
                0.20701107 = queryWeight, product of:
                  4.28654 = idf(docFreq=1652, maxDocs=44218)
                  0.048293278 = queryNorm
                0.2368001 = fieldWeight in 4367, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  4.28654 = idf(docFreq=1652, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=4367)
          0.5 = coord(1/2)
      0.33333334 = coord(1/3)
    
    Abstract
    Information extraction is an important text-mining task that aims at extracting prespecified types of information from large text collections and making them available in structured representations such as databases. In the biomedical domain, information extraction can be applied to help biologists make the most of their digital-literature archives. Currently, there are large amounts of biomedical literature that contain rich information about biomedical substances. Extracting such knowledge requires a good named entity recognition technique. In this article, we combine conditional random fields (CRFs), a state-of-the-art sequence-labeling algorithm, with two semisupervised learning techniques, bootstrapping and feature sampling, to recognize disease names from biomedical literature. Two data-processing strategies for each technique were also analyzed: one processing unlabeled data partitions sequentially and the other processing them in a round-robin fashion. The experimental results showed the advantage of semisupervised learning techniques given limited labeled training data. Specifically, CRFs with bootstrapping implemented in sequential fashion outperformed strictly supervised CRFs for disease name recognition. The project was supported by NIH/NLM Grant R33 LM07299-01, 2002-2005.
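Each explain tree above follows Lucene's ClassicSimilarity (TF-IDF) scoring. The Python sketch below recomputes three of the listed scores from the statistics printed in the trees: result 1 (only "2002" matches, so the inner clause gets coord(1/2)), result 8 (both "2002" and "22" match, so the inner coord is 2/2 = 1 and is not printed), and result 14 (only "22" matches). It assumes idf = 1 + ln(maxDocs / (docFreq + 1)) and tf = sqrt(freq), which reproduce the printed idf and tf values. The query itself is not shown on this page; the assumption that it has three top-level clauses, one of them a two-term clause over "2002" and "22", is inferred from the coord factors. The names `term_weight` and `document_score` are illustrative only.

```python
import math
from typing import Sequence

# Collection statistics and query norm copied from the explain trees above.
MAX_DOCS = 44218
QUERY_NORM = 0.048293278


def term_weight(freq: float, doc_freq: int, field_norm: float) -> float:
    """Score of one weight(...) node: queryWeight * fieldWeight."""
    idf = 1.0 + math.log(MAX_DOCS / (doc_freq + 1))    # about 4.28654 for "2002"
    query_weight = idf * QUERY_NORM                    # idf * queryNorm
    field_weight = math.sqrt(freq) * idf * field_norm  # tf * idf * fieldNorm
    return query_weight * field_weight


def document_score(weights: Sequence[float], clause_terms: int = 2,
                   top_clauses: int = 3) -> float:
    """Sum matching term weights, then apply the inner and outer coord factors.

    Assumes (as every tree on this page shows) that exactly one top-level
    clause matched, so the outer factor is coord(1/top_clauses).
    """
    inner = sum(weights) * len(weights) / clause_terms  # coord(matched/2)
    return inner / top_clauses                          # coord(1/3)


# Result 1 (doc 3940): "2002", freq 5, fieldNorm 0.109375.
r1 = document_score([term_weight(5, 1652, 0.109375)])
# Result 8 (doc 1833): "2002" freq 4 and "22" freq 2, fieldNorm 0.0234375.
r8 = document_score([term_weight(4, 1652, 0.0234375),
                     term_weight(2, 3622, 0.0234375)])
# Result 14 (doc 4577): "22", freq 2, fieldNorm 0.109375.
r14 = document_score([term_weight(2, 3622, 0.109375)])

for label, score in (("1", r1), ("8", r8), ("14", r14)):
    print(f"result {label}: {score:.8f} (listed as {score:.2f})")
```

The recomputed values agree with the printed scores (0.03617032, 0.020408092, 0.015267169) up to floating-point rounding; Lucene computes in 32-bit floats, so the last digits can differ. The coord factors are what let a record matching both terms (result 8) outscore one with a much shorter field that matches only "22" (result 14).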
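Results 1 to 4, 6 and 7 all match "2002" with freq 5 and share the same idf and queryNorm, so their order is decided entirely by fieldNorm: result 2's score is exactly 6/7 of result 1's (0.09375 / 0.109375). The coarse fieldNorm values (7/64, 6/64, 5/64, 4/64, ...) come from how Lucene versions before 7.0 store the length norm, 1/sqrt(number of terms in the field), in a single byte via `SmallFloat.floatToByte315`; the coord() factors in these trees suggest such a pre-7.0 index. The Python below is a reimplementation sketch of that encoding, assuming no index-time boost; the field lengths in the example are hypothetical, since the explain output does not report them.

```python
import math
import struct

# Reimplementation sketch of Lucene's SmallFloat byte315 encoding, used for
# norms by ClassicSimilarity before Lucene 7.0 (exponent offset 15).
_ZERO_EXP = (63 - 15) << 3  # = 384


def _float_bits(f: float) -> int:
    """IEEE-754 single-precision bit pattern of f as a signed 32-bit int."""
    return struct.unpack(">i", struct.pack(">f", f))[0]


def float_to_byte315(f: float) -> int:
    """Encode f as an unsigned byte; low mantissa bits are truncated."""
    bits = _float_bits(f)
    small = bits >> 21
    if small <= _ZERO_EXP:
        return 0 if bits <= 0 else 1  # underflow: zero or smallest value
    if small >= _ZERO_EXP + 0x100:
        return 0xFF                   # overflow: largest value
    return small - _ZERO_EXP


def byte315_to_float(b: int) -> float:
    """Decode a byte produced by float_to_byte315."""
    if b == 0:
        return 0.0
    bits = ((b & 0xFF) << 21) + (_ZERO_EXP << 21)
    return struct.unpack(">f", struct.pack(">i", bits))[0]


def field_norm(num_terms: int, boost: float = 1.0) -> float:
    """Length norm boost / sqrt(numTerms) after the lossy byte round trip."""
    return byte315_to_float(float_to_byte315(boost / math.sqrt(num_terms)))


if __name__ == "__main__":
    for n in (64, 80, 100):
        print(n, round(1 / math.sqrt(n), 6), field_norm(n))
```

Under this sketch an 80-term field stores 0.109375 and a 100-term field stores 0.09375, even though the exact norms are about 0.1118 and 0.1. Because the stored value is truncated to a handful of discrete steps, many records with different field lengths end up sharing the same fieldNorm, and therefore the same score, as results 2 to 4 do here.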

Languages

  • e (English) 17
  • d (German) 13
