Search (22 results, page 1 of 2)

  • theme_ss:"Data Mining"
  • type_ss:"a"
  • year_i:[2000 TO 2010}
  1. Thelwall, M.; Wilkinson, D.; Uppal, S.: Data mining emotion in social network communication : gender differences in MySpace (2009) 0.01
    0.011383288 = product of:
      0.0910663 = sum of:
        0.0910663 = weight(_text_:network in 3322) [ClassicSimilarity], result of:
          0.0910663 = score(doc=3322,freq=6.0), product of:
            0.17809492 = queryWeight, product of:
              4.4533744 = idf(docFreq=1398, maxDocs=44218)
              0.039991006 = queryNorm
            0.51133573 = fieldWeight in 3322, product of:
              2.4494898 = tf(freq=6.0), with freq of:
                6.0 = termFreq=6.0
              4.4533744 = idf(docFreq=1398, maxDocs=44218)
              0.046875 = fieldNorm(doc=3322)
      0.125 = coord(1/8)
    
    Abstract
    Despite the rapid growth in social network sites and in data mining for emotion (sentiment analysis), little research has tied the two together, and none has had social science goals. This article examines the extent to which emotion is present in MySpace comments, using a combination of data mining and content analysis, and exploring age and gender. A random sample of 819 public comments to or from U.S. users was manually classified for strength of positive and negative emotion. Two thirds of the comments expressed positive emotion, but a minority (20%) contained negative emotion, confirming that MySpace is an extraordinarily emotion-rich environment. Females are likely to give and receive more positive comments than are males, but there is no difference for negative comments. It is thus possible that females are more successful social network site users partly because of their greater ability to textually harness positive affect.
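    The relevance value shown for each record is standard Lucene/Solr ClassicSimilarity (TF-IDF) "explain" output. The following minimal sketch reproduces the arithmetic of the tree shown above for record 1; all constants are taken from that tree, while the variable names and the script itself are ours and not part of the retrieval system.

      # Reproduces the ClassicSimilarity explain tree for doc 3322, term "network" in field _text_.
      # In ClassicSimilarity: idf = 1 + ln(maxDocs / (docFreq + 1)) and tf = sqrt(termFreq).
      import math

      idf = 4.4533744            # idf(docFreq=1398, maxDocs=44218) = 1 + ln(44218 / 1399)
      query_norm = 0.039991006   # queryNorm
      term_freq = 6.0            # termFreq of "network" in doc 3322
      field_norm = 0.046875      # fieldNorm(doc=3322)
      coord = 1 / 8              # coord(1/8): one of eight query clauses matched

      tf = math.sqrt(term_freq)              # 2.4494898
      query_weight = idf * query_norm        # 0.17809492
      field_weight = tf * idf * field_norm   # 0.51133573
      score = coord * query_weight * field_weight
      print(round(score, 9))                 # ~0.011383288, the value listed for record 1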
  2. Liu, W.; Weichselbraun, A.; Scharl, A.; Chang, E.: Semi-automatic ontology extension using spreading activation (2005) 0.01
    0.010843484 = product of:
      0.08674787 = sum of:
        0.08674787 = weight(_text_:network in 3028) [ClassicSimilarity], result of:
          0.08674787 = score(doc=3028,freq=4.0), product of:
            0.17809492 = queryWeight, product of:
              4.4533744 = idf(docFreq=1398, maxDocs=44218)
              0.039991006 = queryNorm
            0.48708782 = fieldWeight in 3028, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              4.4533744 = idf(docFreq=1398, maxDocs=44218)
              0.0546875 = fieldNorm(doc=3028)
      0.125 = coord(1/8)
    
    Abstract
    This paper describes a system to semi-automatically extend and refine ontologies by mining textual data from the Web sites of international online media. Expanding a seed ontology creates a semantic network through co-occurrence analysis, trigger phrase analysis, and disambiguation based on the WordNet lexical dictionary. Spreading activation then processes this semantic network to find the most probable candidates for inclusion in an extended ontology. Approaches to identifying hierarchical relationships such as subsumption, head noun analysis and WordNet consultation are used to confirm and classify the found relationships. Using a seed ontology on "climate change" as an example, this paper demonstrates how spreading activation improves the result by naturally integrating the mentioned methods.
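    The abstract above centres on spreading activation over a co-occurrence-based semantic network. As a rough illustration of that single step (not the authors' system), the sketch below propagates activation from a seed concept along weighted edges; the toy graph, decay factor, threshold, and iteration count are invented for the example.

      # Minimal spreading-activation sketch: activation flows from seed concepts along
      # weighted co-occurrence edges; strongly activated nodes become candidates for
      # extending the ontology.
      def spread_activation(graph, seeds, decay=0.7, threshold=0.1, iterations=3):
          activation = {node: 0.0 for node in graph}
          for seed in seeds:
              activation[seed] = 1.0
          for _ in range(iterations):
              updated = dict(activation)
              for node, edges in graph.items():
                  if activation[node] < threshold:
                      continue  # too weakly activated to propagate further
                  for neighbour, weight in edges.items():
                      updated[neighbour] += activation[node] * weight * decay
              activation = updated
          return sorted(activation.items(), key=lambda item: -item[1])

      # Toy co-occurrence network around the "climate change" seed (illustrative values only).
      graph = {
          "climate change": {"emission": 0.8, "temperature": 0.6},
          "emission": {"climate change": 0.8, "carbon": 0.7},
          "temperature": {"climate change": 0.6},
          "carbon": {"emission": 0.7},
      }
      print(spread_activation(graph, seeds=["climate change"]))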
  3. Keim, D.A.: Datenvisualisierung und Data Mining (2004) 0.01
    0.0052157952 = product of:
      0.041726362 = sum of:
        0.041726362 = weight(_text_:computer in 2931) [ClassicSimilarity], result of:
          0.041726362 = score(doc=2931,freq=4.0), product of:
            0.1461475 = queryWeight, product of:
              3.6545093 = idf(docFreq=3109, maxDocs=44218)
              0.039991006 = queryNorm
            0.28550854 = fieldWeight in 2931, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              3.6545093 = idf(docFreq=3109, maxDocs=44218)
              0.0390625 = fieldNorm(doc=2931)
      0.125 = coord(1/8)
    
    Abstract
    The rapid technological development of the last two decades makes it possible today to store enormous volumes of data persistently on computers. Researchers at the University of California, Berkeley have calculated that roughly 1 exabyte (= 1 million terabytes) of data is generated every year, a large share of it in digital form. This means that more data will be generated in the next three years than in all of human history before. The data are often recorded automatically by sensors and monitoring systems; everyday activities such as paying by credit card or using the telephone are thus logged by computers. Usually all available parameters are stored, which produces high-dimensional data sets. The data are collected because they contain valuable information that can provide a competitive advantage. Finding that valuable information in the large data volumes, however, is no easy task. Today's database management systems can display only small subsets of these enormous data sets: if the data are output in textual form, at most a few hundred lines fit on the screen. With millions of records, that is merely a drop in the bucket.
  4. Fenstermacher, K.D.; Ginsburg, M.: Client-side monitoring for Web mining (2003) 0.00
    0.004425749 = product of:
      0.035405993 = sum of:
        0.035405993 = weight(_text_:computer in 1611) [ClassicSimilarity], result of:
          0.035405993 = score(doc=1611,freq=2.0), product of:
            0.1461475 = queryWeight, product of:
              3.6545093 = idf(docFreq=3109, maxDocs=44218)
              0.039991006 = queryNorm
            0.24226204 = fieldWeight in 1611, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.6545093 = idf(docFreq=3109, maxDocs=44218)
              0.046875 = fieldNorm(doc=1611)
      0.125 = coord(1/8)
    
    Abstract
    "Garbage in, garbage out" is a well-known phrase in computer analysis, and one that comes to mind when mining Web data to draw conclusions about Web users. The challenge is that data analysts wish to infer patterns of client-side behavior from server-side data. However, because only a fraction of the user's actions ever reaches the Web server, analysts must rely an incomplete data. In this paper, we propose a client-side monitoring system that is unobtrusive and supports flexible data collection. Moreover, the proposed framework encompasses client-side applications beyond the Web browser. Expanding monitoring beyond the browser to incorporate standard office productivity tools enables analysts to derive a much richer and more accurate picture of user behavior an the Web.
  5. Gluck, M.: Multimedia exploratory data analysis for geospatial data mining : the case for augmented seriation (2001) 0.00
    0.004425749 = product of:
      0.035405993 = sum of:
        0.035405993 = weight(_text_:computer in 5214) [ClassicSimilarity], result of:
          0.035405993 = score(doc=5214,freq=2.0), product of:
            0.1461475 = queryWeight, product of:
              3.6545093 = idf(docFreq=3109, maxDocs=44218)
              0.039991006 = queryNorm
            0.24226204 = fieldWeight in 5214, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.6545093 = idf(docFreq=3109, maxDocs=44218)
              0.046875 = fieldNorm(doc=5214)
      0.125 = coord(1/8)
    
    Abstract
    To prevent type-one error, statisticians tend to accept the possibility of type-two error, which leads to the rejection of hypotheses later shown to be true. In both Exploratory Data Analysis and data mining the emphasis is more appropriately on the elimination of type-two error. Thus EDA methods, including its visualization tools, may be appropriate for Data Mining. Seriation creates a matrix of observations and variables, where the cells contain an icon whose size represents its value, and permits the movement of rows and columns in order to visually discern patterns. Augmented Seriation, a method of data mining, adds computer graphics, sound, color, and extra dimensions to the matrix so that the analyst has different modalities for pattern observation. Gluck has developed software for such analysis.
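    Seriation as described above is essentially a reordering of an observation-by-variable matrix until block patterns become visible. The sketch below uses a deliberately crude heuristic (sorting rows and columns by their totals) purely to illustrate the matrix manipulation; Gluck's Augmented Seriation additionally maps cell values to icon size, colour, sound, and interactive reordering, none of which is reproduced here.

      # Toy seriation: reorder rows (observations) and columns (variables) by their sums
      # so that large values cluster in one corner and patterns become easier to spot.
      def seriate(matrix):
          n_cols = len(matrix[0])
          row_order = sorted(range(len(matrix)), key=lambda r: -sum(matrix[r]))
          col_order = sorted(range(n_cols), key=lambda c: -sum(row[c] for row in matrix))
          return [[matrix[r][c] for c in col_order] for r in row_order]

      # Illustrative observation x variable matrix; cell values would drive icon size.
      data = [
          [1, 5, 2],
          [4, 9, 6],
          [0, 3, 1],
      ]
      for row in seriate(data):
          print(row)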
  6. Sánchez, D.; Chamorro-Martínez, J.; Vila, M.A.: Modelling subjectivity in visual perception of orientation for image retrieval (2003) 0.00
    0.004425749 = product of:
      0.035405993 = sum of:
        0.035405993 = weight(_text_:computer in 1067) [ClassicSimilarity], result of:
          0.035405993 = score(doc=1067,freq=2.0), product of:
            0.1461475 = queryWeight, product of:
              3.6545093 = idf(docFreq=3109, maxDocs=44218)
              0.039991006 = queryNorm
            0.24226204 = fieldWeight in 1067, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.6545093 = idf(docFreq=3109, maxDocs=44218)
              0.046875 = fieldNorm(doc=1067)
      0.125 = coord(1/8)
    
    Abstract
    In this paper we combine computer vision and data mining techniques to model high-level concepts for image retrieval, on the basis of basic perceptual features of the human visual system. High-level concepts related to these features are learned and represented by means of a set of fuzzy association rules. The concepts so acquired can be used for image retrieval, with the advantage that there is no need to provide an image as a query. Instead, a query is formulated by using the labels that identify the learned concepts as search terms, and the retrieval process calculates the relevance of an image to the query by an inference mechanism. An additional feature of our methodology is that it can capture the user's subjectivity. For that purpose, fuzzy set theory is employed to measure the user's assessments about the fulfillment of a concept by an image.
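    The retrieval step outlined above scores an image against a concept label through fuzzy memberships and fuzzy association rules. The snippet below is only a generic illustration of that idea with one made-up rule and a trapezoidal membership function; the feature, parameters, and rule confidence are not taken from the paper.

      # Hypothetical fuzzy rule: IF the dominant edge angle is "horizontal"
      # THEN the image fulfils the concept "horizontal orientation".
      def trapezoid(x, a, b, c, d):
          """Membership degree of x in a trapezoidal fuzzy set (a <= b <= c <= d)."""
          if x < a or x > d:
              return 0.0
          if b <= x <= c:
              return 1.0
          if x < b:
              return (x - a) / (b - a)   # rising edge
          return (d - x) / (d - c)       # falling edge

      def relevance_to_concept(edge_angle_deg, rule_confidence=0.9):
          # "Horizontal" modelled as angles near 0 degrees (illustrative parameters).
          membership = trapezoid(abs(edge_angle_deg), 0.0, 0.0, 10.0, 25.0)
          return membership * rule_confidence   # product-style rule firing

      print(relevance_to_concept(5.0))    # strongly horizontal image
      print(relevance_to_concept(40.0))   # not horizontal at all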
  7. Hereth, J.; Stumme, G.; Wille, R.; Wille, U.: Conceptual knowledge discovery and data analysis (2000) 0.00
    0.0036881242 = product of:
      0.029504994 = sum of:
        0.029504994 = weight(_text_:computer in 5083) [ClassicSimilarity], result of:
          0.029504994 = score(doc=5083,freq=2.0), product of:
            0.1461475 = queryWeight, product of:
              3.6545093 = idf(docFreq=3109, maxDocs=44218)
              0.039991006 = queryNorm
            0.20188503 = fieldWeight in 5083, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.6545093 = idf(docFreq=3109, maxDocs=44218)
              0.0390625 = fieldNorm(doc=5083)
      0.125 = coord(1/8)
    
    Series
    Lecture notes in computer science; vol.1867: Lecture notes on artificial intelligence
  8. Lam, W.; Yang, C.C.; Menczer, F.: Introduction to the special topic section on mining Web resources for enhancing information retrieval (2007) 0.00
    0.0036427265 = product of:
      0.029141812 = sum of:
        0.029141812 = product of:
          0.058283623 = sum of:
            0.058283623 = weight(_text_:resources in 600) [ClassicSimilarity], result of:
              0.058283623 = score(doc=600,freq=4.0), product of:
                0.14598069 = queryWeight, product of:
                  3.650338 = idf(docFreq=3122, maxDocs=44218)
                  0.039991006 = queryNorm
                0.39925572 = fieldWeight in 600, product of:
                  2.0 = tf(freq=4.0), with freq of:
                    4.0 = termFreq=4.0
                  3.650338 = idf(docFreq=3122, maxDocs=44218)
                  0.0546875 = fieldNorm(doc=600)
          0.5 = coord(1/2)
      0.125 = coord(1/8)
    
    Footnote
    Introduction to the special topic section "Mining Web resources for enhancing information retrieval"
  9. Cohen, D.J.: From Babel to knowledge : data mining large digital collections (2006) 0.00
    0.0029504993 = product of:
      0.023603994 = sum of:
        0.023603994 = weight(_text_:computer in 1178) [ClassicSimilarity], result of:
          0.023603994 = score(doc=1178,freq=2.0), product of:
            0.1461475 = queryWeight, product of:
              3.6545093 = idf(docFreq=3109, maxDocs=44218)
              0.039991006 = queryNorm
            0.16150802 = fieldWeight in 1178, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.6545093 = idf(docFreq=3109, maxDocs=44218)
              0.03125 = fieldNorm(doc=1178)
      0.125 = coord(1/8)
    
    Abstract
    In Jorge Luis Borges's curious short story The Library of Babel, the narrator describes an endless collection of books stored from floor to ceiling in a labyrinth of countless hexagonal rooms. The pages of the library's books seem to contain random sequences of letters and spaces; occasionally a few intelligible words emerge in the sea of paper and ink. Nevertheless, readers diligently, and exasperatingly, scan the shelves for coherent passages. The narrator himself has wandered numerous rooms in search of enlightenment, but with resignation he simply awaits his death and burial - which Borges explains (with signature dark humor) consists of being tossed unceremoniously over the library's banister. Borges's nightmare, of course, is a cursed vision of the research methods of disciplines such as literature, history, and philosophy, where the careful reading of books, one after the other, is supposed to lead inexorably to knowledge and understanding. Computer scientists would approach Borges's library far differently. Employing the information theory that forms the basis for search engines and other computerized techniques for assessing in one fell swoop large masses of documents, they would quickly realize the collection's incoherence through sampling and statistical methods - and wisely start looking for the library's exit. These computational methods, which allow us to find patterns, determine relationships, categorize documents, and extract information from massive corpuses, will form the basis for new tools for research in the humanities and other disciplines in the coming decade. For the past three years I have been experimenting with how to provide such end-user tools - that is, tools that harness the power of vast electronic collections while hiding much of their complicated technical plumbing. In particular, I have made extensive use of the application programming interfaces (APIs) the leading search engines provide for programmers to query their databases directly (from server to server without using their web interfaces). In addition, I have explored how one might extract information from large digital collections, from the well-curated lexicographic database WordNet to the democratic (and poorly curated) online reference work Wikipedia. While processing these digital corpuses is currently an imperfect science, even now useful tools can be created by combining various collections and methods for searching and analyzing them. And more importantly, these nascent services suggest a future in which information can be gleaned from, and sense can be made out of, even imperfect digital libraries of enormous scale. A brief examination of two approaches to data mining large digital collections hints at this future, while also providing some lessons about how to get there.
  10. Wang, W.M.; Cheung, C.F.; Lee, W.B.; Kwok, S.K.: Mining knowledge from natural language texts using fuzzy associated concept mapping (2008) 0.00
    0.0029504993 = product of:
      0.023603994 = sum of:
        0.023603994 = weight(_text_:computer in 2121) [ClassicSimilarity], result of:
          0.023603994 = score(doc=2121,freq=2.0), product of:
            0.1461475 = queryWeight, product of:
              3.6545093 = idf(docFreq=3109, maxDocs=44218)
              0.039991006 = queryNorm
            0.16150802 = fieldWeight in 2121, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.6545093 = idf(docFreq=3109, maxDocs=44218)
              0.03125 = fieldNorm(doc=2121)
      0.125 = coord(1/8)
    
    Abstract
    Natural Language Processing (NLP) techniques have been successfully used to automatically extract information from unstructured text through a detailed analysis of their content, often to satisfy particular information needs. In this paper, an automatic concept map construction technique, Fuzzy Association Concept Mapping (FACM), is proposed for the conversion of abstracted short texts into concept maps. The approach consists of a linguistic module and a recommendation module. The linguistic module is a text mining method that does not require the user to have any prior knowledge of NLP techniques. It incorporates rule-based reasoning (RBR) and case-based reasoning (CBR) for anaphora resolution. It aims at extracting the propositions in text so as to construct a concept map automatically. The recommendation module is arrived at by adopting fuzzy set theories. It is an interactive process which provides suggestions of propositions for further human refinement of the automatically generated concept maps. The suggested propositions are relationships among the concepts which are not explicitly found in the paragraphs. This technique helps to stimulate individual reflection and generate new knowledge. Evaluation was carried out by using the Science Citation Index (SCI) abstract database and CNET News as test data, both well-known sources whose text quality is assured. Experimental results show that the automatically generated concept maps conform to the outputs generated manually by domain experts, since the degree of difference between them is proportionally small. The method provides users with the ability to convert scientific and short texts into a structured format which can be easily processed by computer. Moreover, it provides knowledge workers with extra time to re-think their written text and to view their knowledge from another angle.
  11. Wang, F.L.; Yang, C.C.: Mining Web data for Chinese segmentation (2007) 0.00
    0.0026019474 = product of:
      0.02081558 = sum of:
        0.02081558 = product of:
          0.04163116 = sum of:
            0.04163116 = weight(_text_:resources in 604) [ClassicSimilarity], result of:
              0.04163116 = score(doc=604,freq=4.0), product of:
                0.14598069 = queryWeight, product of:
                  3.650338 = idf(docFreq=3122, maxDocs=44218)
                  0.039991006 = queryNorm
                0.28518265 = fieldWeight in 604, product of:
                  2.0 = tf(freq=4.0), with freq of:
                    4.0 = termFreq=4.0
                  3.650338 = idf(docFreq=3122, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=604)
          0.5 = coord(1/2)
      0.125 = coord(1/8)
    
    Abstract
    Modern information retrieval systems use keywords within documents as indexing terms for search of relevant documents. As Chinese is an ideographic character-based language, the words in the texts are not delimited by white spaces. Indexing of Chinese documents is impossible without a proper segmentation algorithm. Many Chinese segmentation algorithms have been proposed in the past. Traditional segmentation algorithms cannot operate without a large dictionary or a large corpus of training data. Nowadays, the Web has become the largest corpus that is ideal for Chinese segmentation. Although most search engines have problems in segmenting texts into proper words, they maintain huge databases of documents and frequencies of character sequences in the documents. Their databases are important potential resources for segmentation. In this paper, we propose a segmentation algorithm by mining Web data with the help of search engines. On the other hand, the Romanized pinyin of the Chinese language indicates the boundaries of words in the text. Our algorithm is the first to utilize Romanized pinyin for segmentation. It is the first unified segmentation algorithm for the Chinese language from different geographical areas, and it is also domain independent because of the nature of the Web. Experiments have been conducted on the datasets of a recent Chinese segmentation competition. The results show that our algorithm outperforms the traditional algorithms in terms of precision and recall. Moreover, our algorithm can effectively deal with the problems of segmentation ambiguity, new word (unknown word) detection, and stop words.
    Footnote
    Contribution to the special topic section "Mining Web resources for enhancing information retrieval"
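    The core idea in the abstract above, choosing the segmentation whose words are best supported by character-sequence frequencies from a search engine's database, can be sketched as a short dynamic program. The frequency table, corpus-size constant, and example string below are invented stand-ins, and the sketch omits the pinyin cues the authors also exploit.

      import math

      # Invented "Web frequency" counts for candidate character sequences.
      FREQ = {"中国": 5_000_000, "人民": 3_000_000, "中": 9_000_000,
              "国": 8_000_000, "人": 9_500_000, "民": 4_000_000}
      TOTAL = 1_000_000_000   # assumed corpus size for turning counts into probabilities

      def segment(text, max_len=4):
          """Dynamic program: maximise the summed log-probability of the chosen words."""
          n = len(text)
          best = [(-math.inf, None)] * (n + 1)   # (score, position of the previous cut)
          best[0] = (0.0, None)
          for end in range(1, n + 1):
              for start in range(max(0, end - max_len), end):
                  word = text[start:end]
                  prob = FREQ.get(word, 1) / TOTAL   # unseen sequences get a tiny probability
                  score = best[start][0] + math.log(prob)
                  if score > best[end][0]:
                      best[end] = (score, start)
          words, pos = [], n        # recover the segmentation from the back-pointers
          while pos > 0:
              start = best[pos][1]
              words.append(text[start:pos])
              pos = start
          return list(reversed(words))

      print(segment("中国人民"))   # ['中国', '人民'] under the toy counts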
  12. Liu, Y.; Zhang, M.; Cen, R.; Ru, L.; Ma, S.: Data cleansing for Web information retrieval using query independent features (2007) 0.00
    0.0026019474 = product of:
      0.02081558 = sum of:
        0.02081558 = product of:
          0.04163116 = sum of:
            0.04163116 = weight(_text_:resources in 607) [ClassicSimilarity], result of:
              0.04163116 = score(doc=607,freq=4.0), product of:
                0.14598069 = queryWeight, product of:
                  3.650338 = idf(docFreq=3122, maxDocs=44218)
                  0.039991006 = queryNorm
                0.28518265 = fieldWeight in 607, product of:
                  2.0 = tf(freq=4.0), with freq of:
                    4.0 = termFreq=4.0
                  3.650338 = idf(docFreq=3122, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=607)
          0.5 = coord(1/2)
      0.125 = coord(1/8)
    
    Abstract
    Understanding what kinds of Web pages are the most useful for Web search engine users is a critical task in Web information retrieval (IR). Most previous works used hyperlink analysis algorithms to solve this problem. However, little research has been focused on query-independent Web data cleansing for Web IR. In this paper, we first provide analysis of the differences between retrieval target pages and ordinary ones based on more than 30 million Web pages obtained from both the Text Retrieval Conference (TREC) and a widely used Chinese search engine, SOGOU (www.sogou.com). We further propose a learning-based data cleansing algorithm for reducing Web pages that are unlikely to be useful for user requests. We found that there exists a large proportion of low-quality Web pages in both the English and the Chinese Web page corpus, and retrieval target pages can be identified using query-independent features and cleansing algorithms. The experimental results showed that our algorithm is effective in reducing a large portion of Web pages with a small loss in retrieval target pages. It makes it possible for Web IR tools to meet a large fraction of users' needs with only a small part of pages on the Web. These results may help Web search engines make better use of their limited storage and computation resources to improve search performance.
    Footnote
    Contribution to the special topic section "Mining Web resources for enhancing information retrieval"
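    The cleansing step described above amounts to training a classifier on query-independent, page-level features and discarding pages that are unlikely ever to be retrieval targets. The schematic below uses invented feature names, invented training rows, and an off-the-shelf logistic regression; the actual features and learning method in the paper differ.

      # Schematic query-independent cleansing: score pages by features that do not depend
      # on any particular query, then keep only the pages predicted to be retrieval targets.
      from sklearn.linear_model import LogisticRegression

      # Hypothetical features per page: [document length, in-link count, URL depth]
      X_train = [
          [5200, 340, 1],   # portal-style page that was a retrieval target
          [4800, 120, 2],   # content page that was a retrieval target
          [90,     0, 6],   # near-empty deep page, never a target
          [150,    1, 5],   # soft-404-like page, never a target
      ]
      y_train = [1, 1, 0, 0]   # 1 = retrieval target, 0 = low-quality page

      model = LogisticRegression(max_iter=1000)
      model.fit(X_train, y_train)

      corpus = {"pageA": [3000, 80, 2], "pageB": [60, 0, 7]}
      keep = {url for url, feats in corpus.items()
              if model.predict_proba([feats])[0][1] >= 0.5}
      print(keep)   # pages retained after cleansing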
  13. Seidenfaden, U.: Schürfen in Datenbergen : Data-Mining soll möglichst viel Information zu Tage fördern (2001) 0.00
    0.0025816867 = product of:
      0.020653494 = sum of:
        0.020653494 = weight(_text_:computer in 6923) [ClassicSimilarity], result of:
          0.020653494 = score(doc=6923,freq=2.0), product of:
            0.1461475 = queryWeight, product of:
              3.6545093 = idf(docFreq=3109, maxDocs=44218)
              0.039991006 = queryNorm
            0.14131951 = fieldWeight in 6923, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.6545093 = idf(docFreq=3109, maxDocs=44218)
              0.02734375 = fieldNorm(doc=6923)
      0.125 = coord(1/8)
    
    Content
    "Fast alles wird heute per Computer erfasst. Kaum einer überblickt noch die enormen Datenmengen, die sich in Unternehmen, Universitäten und Verwaltung ansammeln. Allein in den öffentlich zugänglichen Datenbanken der Genforscher fallen pro Woche rund 4,5 Gigabyte an neuer Information an. "Vom potentiellen Wissen in den Datenbanken wird bislang aber oft nur ein Teil genutzt", meint Stefan Wrobel vom Lehrstuhl für Wissensentdeckung und Maschinelles Lernen der Otto-von-Guericke-Universität in Magdeburg. Sein Doktorand Mark-Andre Krogel hat soeben mit einem neuen Verfahren zur Datenbankrecherche in San Francisco einen inoffiziellen Weltmeister-Titel in der Disziplin "Data-Mining" gewonnen. Dieser Daten-Bergbau arbeitet im Unterschied zur einfachen Datenbankabfrage, die sich einfacher statistischer Methoden bedient, zusätzlich mit künstlicher Intelligenz und Visualisierungsverfahren, um Querverbindungen zu finden. "Das erleichtert die Suche nach verborgenen Zusammenhängen im Datenmaterial ganz erheblich", so Wrobel. Die Wirtschaft setzt Data-Mining bereits ein, um das Kundenverhalten zu untersuchen und vorherzusagen. "Stellen sie sich ein Unternehmen mit einer breiten Produktpalette und einem großen Kundenstamm vor", erklärt Wrobel. "Es kann seinen Erfolg maximieren, wenn es Marketing-Post zielgerichtet an seine Kunden verschickt. Wer etwa gerade einen PC gekauft hat, ist womöglich auch an einem Drucker oder Scanner interessiert." In einigen Jahren könnte ein Analysemodul den Manager eines Unternehmens selbständig informieren, wenn ihm etwas Ungewöhnliches aufgefallen ist. Das muss nicht immer positiv für den Kunden sein. Data-Mining ließe sich auch verwenden, um die Lebensdauer von Geschäftsbeziehungen zu prognostizieren. Für Kunden mit geringen Kaufinteressen würden Reklamationen dann längere Bearbeitungszeiten nach sich ziehen. Im konkreten Projekt von Mark-Andre Krogel ging es um die Vorhersage von Protein-Funktionen. Proteine sind Eiweißmoleküle, die fast alle Stoffwechselvorgänge im menschlichen Körper steuern. Sie sind daher die primären Ziele von Wirkstoffen zur Behandlung von Erkrankungen. Das erklärt das große Interesse der Pharmaindustrie. Experimentelle Untersuchungen, die Aufschluss über die Aufgaben der über 100 000 Eiweißmoleküle im menschlichen Körper geben können, sind mit einem hohen Zeitaufwand verbunden. Die Forscher möchten deshalb die Zeit verkürzen, indem sie das vorhandene Datenmaterial mit Hilfe von Data-Mining auswerten. Aus der im Humangenomprojekt bereits entschlüsselten Abfolge der Erbgut-Bausteine lässt sich per Datenbankanalyse die Aneinanderreihung bestimmter Aminosäuren zu einem Protein vorhersagen. Andere Datenbanken wiederum enthalten Informationen, welche Struktur ein Protein mit einer bestimmten vorgegebenen Funktion haben könnte. Aus bereits bekannten Strukturelementen versuchen die Genforscher dann, auf die mögliche Funktion eines bislang noch unbekannten Eiweißmoleküls zu schließen.- Fakten Verschmelzen - Bei diesem theoretischen Ansatz kommt es darauf an, die in Datenbanken enthaltenen Informationen so zu verknüpfen, dass die Ergebnisse mit hoher Wahrscheinlichkeit mit der Realität übereinstimmen. "Im Rahmen des Wettbewerbs erhielten wir Tabellen als Vorgabe, in denen Gene und Chromosomen nach bestimmten Gesichtspunkten klassifiziert waren", erläutert Krogel. Von einigen Genen war bekannt, welche Proteine sie produzieren und welche Aufgabe diese Eiweißmoleküle besitzen. 
Diese Beispiele dienten dem von Krogel entwickelten Programm dann als Hilfe, für andere Gene vorherzusagen, welche Funktionen die von ihnen erzeugten Proteine haben. "Die Genauigkeit der Vorhersage lag bei den gestellten Aufgaben bei über 90 Prozent", stellt Krogel fest. Allerdings könne man in der Praxis nicht davon ausgehen, dass alle Informationen aus verschiedenen Datenbanken in einem einheitlichen Format vorliegen. Es gebe verschiedene Abfragesprachen der Datenbanken, und die Bezeichnungen von Eiweißmolekülen mit gleicher Aufgabe seien oftmals uneinheitlich. Die Magdeburger Informatiker arbeiten deshalb in der DFG-Forschergruppe "Informationsfusion" an Methoden, um die verschiedenen Datenquellen besser zu erschließen."
  14. Baeza-Yates, R.; Hurtado, C.; Mendoza, M.: Improving search engines by query clustering (2007) 0.00
    0.0025757966 = product of:
      0.020606373 = sum of:
        0.020606373 = product of:
          0.041212745 = sum of:
            0.041212745 = weight(_text_:resources in 601) [ClassicSimilarity], result of:
              0.041212745 = score(doc=601,freq=2.0), product of:
                0.14598069 = queryWeight, product of:
                  3.650338 = idf(docFreq=3122, maxDocs=44218)
                  0.039991006 = queryNorm
                0.28231642 = fieldWeight in 601, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.650338 = idf(docFreq=3122, maxDocs=44218)
                  0.0546875 = fieldNorm(doc=601)
          0.5 = coord(1/2)
      0.125 = coord(1/8)
    
    Footnote
    Contribution to the special topic section "Mining Web resources for enhancing information retrieval"
  15. Perugini, S.; Ramakrishnan, N.: Mining Web functional dependencies for flexible information access (2007) 0.00
    0.0022078257 = product of:
      0.017662605 = sum of:
        0.017662605 = product of:
          0.03532521 = sum of:
            0.03532521 = weight(_text_:resources in 602) [ClassicSimilarity], result of:
              0.03532521 = score(doc=602,freq=2.0), product of:
                0.14598069 = queryWeight, product of:
                  3.650338 = idf(docFreq=3122, maxDocs=44218)
                  0.039991006 = queryNorm
                0.2419855 = fieldWeight in 602, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.650338 = idf(docFreq=3122, maxDocs=44218)
                  0.046875 = fieldNorm(doc=602)
          0.5 = coord(1/2)
      0.125 = coord(1/8)
    
    Footnote
    Contribution to the special topic section "Mining Web resources for enhancing information retrieval"
  16. Schwartz, F.; Fang, Y.C.: Citation data analysis on hydrogeology (2007) 0.00
    0.0020815579 = product of:
      0.016652463 = sum of:
        0.016652463 = product of:
          0.033304926 = sum of:
            0.033304926 = weight(_text_:resources in 433) [ClassicSimilarity], result of:
              0.033304926 = score(doc=433,freq=4.0), product of:
                0.14598069 = queryWeight, product of:
                  3.650338 = idf(docFreq=3122, maxDocs=44218)
                  0.039991006 = queryNorm
                0.22814612 = fieldWeight in 433, product of:
                  2.0 = tf(freq=4.0), with freq of:
                    4.0 = termFreq=4.0
                  3.650338 = idf(docFreq=3122, maxDocs=44218)
                  0.03125 = fieldNorm(doc=433)
          0.5 = coord(1/2)
      0.125 = coord(1/8)
    
    Abstract
    This article explores the status of research in hydrogeology using data mining techniques. First we try to explain what citation analysis is and review some of the previous work on citation analysis. The main idea in this article is to address some common issues about citation numbers and the use of these data. To validate the use of citation numbers, we compare the citation patterns for Water Resources Research papers in the 1980s with those in the 1990s. The citation growths for highly cited authors from the 1980s are used to examine whether it is possible to predict the citation patterns for highly-cited authors in the 1990s. If the citation data prove to be steady and stable, these numbers then can be used to explore the evolution of science in hydrogeology. The famous quotation, "If you are not the lead dog, the scenery never changes," attributed to Lee Iacocca, points to the importance of an entrepreneurial spirit in all forms of endeavor. In the case of hydrogeological research, impact analysis makes it clear how important it is to be a pioneer. Statistical correlation coefficients are used to retrieve papers among a collection of 2,847 papers before and after 1991 sharing the same topics with 273 papers in 1991 in Water Resources Research. The numbers of papers before and after 1991 are then plotted against various levels of citations for papers in 1991 to compare the distributions of paper population before and after that year. The similarity metrics based on word counts can ensure that the "before" papers are like ancestors and "after" papers are descendants in the same type of research. This exercise gives us an idea of how many papers are populated before and after 1991 (1991 is chosen based on balanced numbers of papers before and after that year). In addition, the impact of papers is measured in terms of citation presented as "percentile," a relative measure based on rankings in one year, in order to minimize the effect of time.
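    The "percentile" impact measure mentioned above is a relative ranking of a paper's citation count among the papers of a single year. One straightforward way to compute it is sketched below; the function name and the sample counts are invented, and ties are resolved with a simple "cited no more often than" rule.

      # Percentile rank of each paper's citation count within one publication year.
      def percentile_ranks(citations):
          """Percentile = share of papers in the year cited no more often than this one."""
          values = list(citations.values())
          n = len(values)
          return {paper: 100.0 * sum(1 for c in values if c <= count) / n
                  for paper, count in citations.items()}

      # Invented citation counts for a handful of 1991 papers.
      counts_1991 = {"paper_a": 120, "paper_b": 45, "paper_c": 45, "paper_d": 3}
      print(percentile_ranks(counts_1991))
      # paper_a -> 100.0 (most cited), paper_b/c -> 75.0, paper_d -> 25.0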
  17. Shi, X.; Yang, C.C.: Mining related queries from Web search engine query logs using an improved association rule mining model (2007) 0.00
    0.0018398546 = product of:
      0.014718837 = sum of:
        0.014718837 = product of:
          0.029437674 = sum of:
            0.029437674 = weight(_text_:resources in 597) [ClassicSimilarity], result of:
              0.029437674 = score(doc=597,freq=2.0), product of:
                0.14598069 = queryWeight, product of:
                  3.650338 = idf(docFreq=3122, maxDocs=44218)
                  0.039991006 = queryNorm
                0.20165458 = fieldWeight in 597, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.650338 = idf(docFreq=3122, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=597)
          0.5 = coord(1/2)
      0.125 = coord(1/8)
    
    Footnote
    Contribution to the special topic section "Mining Web resources for enhancing information retrieval"
  18. Ku, L.-W.; Chen, H.-H.: Mining opinions from the Web : beyond relevance retrieval (2007) 0.00
    0.0018398546 = product of:
      0.014718837 = sum of:
        0.014718837 = product of:
          0.029437674 = sum of:
            0.029437674 = weight(_text_:resources in 605) [ClassicSimilarity], result of:
              0.029437674 = score(doc=605,freq=2.0), product of:
                0.14598069 = queryWeight, product of:
                  3.650338 = idf(docFreq=3122, maxDocs=44218)
                  0.039991006 = queryNorm
                0.20165458 = fieldWeight in 605, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.650338 = idf(docFreq=3122, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=605)
          0.5 = coord(1/2)
      0.125 = coord(1/8)
    
    Footnote
    Contribution to the special topic section "Mining Web resources for enhancing information retrieval"
  19. Liu, Y.; Huang, X.; An, A.: Personalized recommendation with adaptive mixture of markov models (2007) 0.00
    0.0018398546 = product of:
      0.014718837 = sum of:
        0.014718837 = product of:
          0.029437674 = sum of:
            0.029437674 = weight(_text_:resources in 606) [ClassicSimilarity], result of:
              0.029437674 = score(doc=606,freq=2.0), product of:
                0.14598069 = queryWeight, product of:
                  3.650338 = idf(docFreq=3122, maxDocs=44218)
                  0.039991006 = queryNorm
                0.20165458 = fieldWeight in 606, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.650338 = idf(docFreq=3122, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=606)
          0.5 = coord(1/2)
      0.125 = coord(1/8)
    
    Footnote
    Contribution to the special topic section "Mining Web resources for enhancing information retrieval"
  20. Peters, G.; Gaese, V.: ¬Das DocCat-System in der Textdokumentation von G+J (2003) 0.00
    0.0013545573 = product of:
      0.010836459 = sum of:
        0.010836459 = product of:
          0.021672918 = sum of:
            0.021672918 = weight(_text_:22 in 1507) [ClassicSimilarity], result of:
              0.021672918 = score(doc=1507,freq=2.0), product of:
                0.1400417 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.039991006 = queryNorm
                0.15476047 = fieldWeight in 1507, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.03125 = fieldNorm(doc=1507)
          0.5 = coord(1/2)
      0.125 = coord(1/8)
    
    Date
    22. 4.2003 11:45:36