Search (2041 results, page 2 of 103)

  • language_ss:"e"
  • year_i:[2010 TO 2020}
  1. Haustein, S.; Sugimoto, C.; Larivière, V.: Social media in scholarly communication : Guest editorial (2015) 0.12
    Score 0.1164408 = coord(3/4) × [based 0.0061 (tf 6) + term 0.0479 (tf 4) + assessment 0.0821 (tf 6) + "22" 0.0191 (tf 2)] (Lucene ClassicSimilarity explain for doc 3809, condensed)
    
    Abstract
    One of the solutions to help scientists filter the most relevant publications and, thus, to stay current on developments in their fields during the transition from "little science" to "big science", was the introduction of citation indexing as a Wellsian "World Brain" (Garfield, 1964) of scientific information: It is too much to expect a research worker to spend an inordinate amount of time searching for the bibliographic descendants of antecedent papers. It would not be excessive to demand that the thorough scholar check all papers that have cited or criticized such papers, if they could be located quickly. The citation index makes this check practicable (Garfield, 1955, p. 108). In retrospect, citation indexing can be perceived as a pre-social web version of crowdsourcing, as it is based on the concept that the community of citing authors outperforms indexers in highlighting cognitive links between papers, particularly on the level of specific ideas and concepts (Garfield, 1983). Over the last 50 years, citation analysis and, more generally, bibliometric methods have developed from information retrieval tools to research evaluation metrics, where they are presumed to make scientific funding more efficient and effective (Moed, 2006). However, the dominance of bibliometric indicators in research evaluation has also led to significant goal displacement (Merton, 1957) and the oversimplification of notions of "research productivity" and "scientific quality", creating adverse effects such as salami publishing, honorary authorships, citation cartels, and misuse of indicators (Binswanger, 2015; Cronin and Sugimoto, 2014; Frey and Osterloh, 2006; Haustein and Larivière, 2015; Weingart, 2005).
    Furthermore, the rise of the web, and subsequently, the social web, has challenged the quasi-monopolistic status of the journal as the main form of scholarly communication and citation indices as the primary assessment mechanisms. Scientific communication is becoming more open, transparent, and diverse: publications are increasingly open access; manuscripts, presentations, code, and data are shared online; research ideas and results are discussed and criticized openly on blogs; and new peer review experiments, with open post-publication assessment by anonymous or non-anonymous referees, are underway. The diversification of scholarly production and assessment, paired with the increasing speed of the communication process, leads to increased information overload (Bawden and Robinson, 2008), demanding new filters. The concept of altmetrics, short for alternative (to citation) metrics, was created out of an attempt to provide such a filter (Priem et al., 2010) and to counter the oversimplification of measuring scientific success solely on the basis of the number of journal articles published and citations received, by considering a wider range of research outputs and metrics (Piwowar, 2013). Although the term altmetrics was introduced in a tweet in 2010 (Priem, 2010), the idea of capturing traces - "polymorphous mentioning" (Cronin et al., 1998, p. 1320) - of scholars and their documents on the web to measure the "impact" of science in a broader manner than citations was introduced years before, largely in the context of webometrics (Almind and Ingwersen, 1997; Thelwall et al., 2005):
    There will soon be a critical mass of web-based digital objects and usage statistics on which to model scholars' communication behaviors - publishing, posting, blogging, scanning, reading, downloading, glossing, linking, citing, recommending, acknowledging - and with which to track their scholarly influence and impact, broadly conceived and broadly felt (Cronin, 2005, p. 196). A decade after Cronin's prediction and five years after the coining of altmetrics, the time seems ripe to reflect upon the role of social media in scholarly communication. This Special Issue does so by providing an overview of current research on the indicators and metrics grouped under the umbrella term of altmetrics, on their relationships with traditional indicators of scientific activity, and on the uses that are made of the various social media platforms - on which these indicators are based - by scientists of various disciplines.
    Date
    20. 1.2015 18:30:22
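The condensed score lines shown for each hit come from Lucene's ClassicSimilarity explain output. As a rough illustration, not this catalogue's actual code, the sketch below recomputes one such clause; the statistics (tf=6, docFreq=5906, maxDocs=44218, queryNorm, fieldNorm) are the ones reported for the "based" clause of this first hit.

```python
import math

def classic_similarity_clause(tf, doc_freq, max_docs, query_norm, field_norm):
    """Recompute one weight(_text_:term) clause of a Lucene ClassicSimilarity
    explain tree: queryWeight = idf * queryNorm, fieldWeight = sqrt(tf) * idf
    * fieldNorm, and the clause score is queryWeight * fieldWeight."""
    idf = 1.0 + math.log(max_docs / (doc_freq + 1))   # 3.0129938 for "based"
    query_weight = idf * query_norm                   # 0.14144066
    field_weight = math.sqrt(tf) * idf * field_norm   # 0.17297572
    return query_weight * field_weight

# Statistics taken from the "based" clause of hit 1 (doc 3809):
print(classic_similarity_clause(tf=6, doc_freq=5906, max_docs=44218,
                                query_norm=0.04694356, field_norm=0.0234375))
# ~0.0245, matching the original weight(_text_:based in 3809) value
```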
  2. Negm, E.; AbdelRahman, S.; Bahgat, R.: PREFCA: a portal retrieval engine based on formal concept analysis (2017) 0.12
    Score 0.11629958 = coord(3/4) × [based 0.0082 (tf 6) + term 0.0452 (tf 2) + frequency 0.1017 (tf 4)] (explain for doc 3291, condensed)
    
    Abstract
    The web is a network of linked sites whereby each site is either a physical portal or a standalone page. In the former case, the portal provides an access point to its embedded web pages, which coherently present a specific topic. In the latter case, there are millions of standalone web pages scattered throughout the web that share the same topic and could be conceptually linked together to form virtual portals. Search engines have been developed to help users reach the appropriate pages in an efficient and effective manner. All current search engine techniques rely on the web page as the basic atomic search unit. They ignore the conceptual links among the retrieved pages that reveal implicit web-related meanings. However, a semantic model of the whole portal may contain more semantic information than a model of scattered individual pages. In addition, user queries can be poor and contain imprecise terms that do not reflect the real user intention. Consequently, retrieving the standalone individual pages that are directly related to the query may not satisfy the user's need. In this paper, we propose PREFCA, a Portal Retrieval Engine based on Formal Concept Analysis that relies on the portal as the main search unit. PREFCA consists of three phases: first, the information extraction phase, which extracts the portal's semantic data; second, the formal concept analysis phase, which uses formal concept analysis to discover the conceptual links among the portal and its attributes; and finally, the information retrieval phase, in which we propose a portal ranking method to retrieve ranked pairs of portals and embedded pages. Additionally, we apply network analysis rules to output some portal characteristics. We evaluated PREFCA using two data sets, namely the Forum for Information Retrieval Evaluation 2010 and ClueWeb09 (category B) test data, for physical and virtual portals respectively. PREFCA achieves higher F-measure accuracy, better Mean Average Precision ranking, and comparable network analysis and efficiency results compared with other search engine approaches, namely Term Frequency-Inverse Document Frequency (TF-IDF), Latent Semantic Analysis (LSA), and BM25. It also gains high Mean Average Precision in comparison with learning-to-rank techniques. Moreover, PREFCA achieves better reach time than Carrot, a well-known topic-based search engine.
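PREFCA's own implementation is not shown here; as a minimal sketch of the formal concept analysis step it builds on, the snippet below derives the formal concepts (closed pairs of pages and shared attributes) of a tiny, invented page-attribute context. Page names and attributes are illustrative assumptions.

```python
from itertools import combinations

# Toy binary context: which attributes (terms) occur on which portal pages.
context = {
    "page1": {"fruit", "growing"},
    "page2": {"fruit", "retrieval"},
    "page3": {"retrieval", "ranking"},
}

def common_attributes(pages):
    sets = [context[p] for p in pages]
    return set.intersection(*sets) if sets else set()

def pages_having(attrs):
    return {p for p, a in context.items() if attrs <= a}

# A formal concept is a pair (extent, intent) with extent = pages_having(intent)
# and intent = common_attributes(extent).
concepts = set()
for r in range(1, len(context) + 1):
    for pages in combinations(context, r):
        intent = frozenset(common_attributes(set(pages)))
        extent = frozenset(pages_having(intent))
        concepts.add((extent, intent))

for extent, intent in sorted(concepts, key=lambda c: len(c[0])):
    print(sorted(extent), sorted(intent))
```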
  3. Wei, C.-P.; Lee, Y.-H.; Chiang, Y.-S.; Chen, C.-T.; Yang, C.C.C.: Exploiting temporal characteristics of features for effectively discovering event episodes from news corpora (2014) 0.12
    Score 0.11603433 = coord(3/4) × [based 0.0083 (tf 4) + term 0.0565 (tf 2) + frequency 0.0899 (tf 2)] (explain for doc 1225, condensed)
    
    Abstract
    An organization performing environmental scanning generally monitors or tracks various events concerning its external environment. One of the major resources for environmental scanning is online news documents, which are readily accessible on news websites or infomediaries. However, the proliferation of the World Wide Web, which increases information sources and improves information circulation, has vastly expanded the amount of information to be scanned. Thus, it is essential to develop an effective event episode discovery mechanism to organize news documents pertaining to an event of interest. In this study, we propose two new metrics, Term Frequency × Inverse Document Frequency-Tempo (TF×IDF-Tempo) and TF×Enhanced-IDF-Tempo, and develop a temporal-based event episode discovery (TEED) technique that uses the proposed metrics for feature selection and document representation. Using a traditional TF×IDF-based hierarchical agglomerative clustering technique as a performance benchmark, our empirical evaluation reveals that the proposed TEED technique outperforms its benchmark, as measured by cluster recall and cluster precision. In addition, the use of TF×Enhanced-IDF-Tempo significantly improves the effectiveness of event episode discovery when compared with the use of TF×IDF-Tempo.
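The exact TF×IDF-Tempo and TF×Enhanced-IDF-Tempo formulas are defined in the paper and not reproduced here. As a hedged stand-in for the general idea of temporal feature weighting, the sketch below computes a tf-idf weight whose IDF statistics are restricted to documents from the same time window; the window logic, corpus, and smoothing are assumptions.

```python
import math
from collections import Counter

# docs: (time slice, tokens); the window restriction is an illustrative assumption.
docs = [
    (1, "quake hits city center".split()),
    (1, "aid teams reach region".split()),
    (2, "city election results".split()),
    (2, "city budget debate".split()),
]

def idf(term, corpus):
    n = sum(1 for _, toks in corpus if term in toks)
    return math.log((len(corpus) + 1) / (n + 1)) + 1

def tfidf_tempo(term, doc_index, window):
    t, toks = docs[doc_index]
    tf = Counter(toks)[term]
    same_window = [d for d in docs if abs(d[0] - t) <= window]
    return tf * idf(term, same_window)      # IDF from the temporal window only

print(tfidf_tempo("city", 0, window=0))     # ~1.41: "city" is rare in its time slice
print(tfidf_tempo("city", 0, window=9))     # ~1.22: less distinctive corpus-wide
```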
  4. Wu, M.; Hawking, D.; Turpin, A.; Scholer, F.: Using anchor text for homepage and topic distillation search tasks (2012) 0.11
    Score 0.11420593 = coord(3/4) × [based 0.0059 (tf 2) + term 0.0565 (tf 2) + frequency 0.0899 (tf 2)] (explain for doc 257, condensed)
    
    Abstract
    Past work suggests that anchor text is a good source of evidence that can be used to improve web searching. Two approaches for making use of this evidence include fusing search results from an anchor text representation and the original text representation based on a document's relevance score or rank position, and combining term frequency from both representations during the retrieval process. Although these approaches have each been tested and compared against baselines, different evaluations have used different baselines; no consistent work enables rigorous cross-comparison between these methods. The purpose of this work is threefold. First, we survey existing fusion methods of using anchor text in search. Second, we compare these methods with common testbeds and web search tasks, with the aim of identifying the most effective fusion method. Third, we try to correlate search performance with the characteristics of a test collection. Our experimental results show that the best performing method in each category can significantly improve search results over a common baseline. However, there is no single technique that consistently outperforms competing approaches across different collections and search tasks.
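The two fusion families surveyed (combining relevance scores versus rank positions from an anchor-text run and a full-text run) can be sketched as follows; CombSUM-style score fusion and reciprocal rank fusion are used here as generic stand-ins, and the run data and the constant k are invented.

```python
def combsum(runs):
    """Score-based fusion: sum each document's (normalised) scores across runs."""
    fused = {}
    for run in runs:
        top = max(run.values()) or 1.0
        for doc, score in run.items():
            fused[doc] = fused.get(doc, 0.0) + score / top
    return sorted(fused, key=fused.get, reverse=True)

def rrf(runs, k=60):
    """Rank-based fusion (reciprocal rank fusion)."""
    fused = {}
    for run in runs:
        ranking = sorted(run, key=run.get, reverse=True)
        for rank, doc in enumerate(ranking, start=1):
            fused[doc] = fused.get(doc, 0.0) + 1.0 / (k + rank)
    return sorted(fused, key=fused.get, reverse=True)

fulltext_run = {"d1": 2.1, "d2": 1.4, "d3": 0.9}   # scores from the full-text index
anchor_run   = {"d2": 3.0, "d3": 0.5}              # scores from the anchor-text index
print(combsum([fulltext_run, anchor_run]))
print(rrf([fulltext_run, anchor_run]))
```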
  5. Amolochitis, E.; Christou, I.T.; Tan, Z.-H.; Prasad, R.: ¬A heuristic hierarchical scheme for academic search and retrieval (2013) 0.11
    Score 0.11420593 = coord(3/4) × [based 0.0059 (tf 2) + term 0.0565 (tf 2) + frequency 0.0899 (tf 2)] (explain for doc 2711, condensed)
    
    Abstract
    We present PubSearch, a hybrid heuristic scheme for re-ranking academic papers retrieved from standard digital libraries such as the ACM Portal. The scheme is based on the hierarchical combination of a custom implementation of the term frequency heuristic, a time-depreciated citation score, and a graph-theoretically computed score that relates the paper's index terms to each other. We designed and developed a meta-search engine that submits user queries to standard digital repositories of academic publications and re-ranks the repository results using the hierarchical heuristic scheme. We evaluate our proposed re-ranking scheme via user feedback against the results of ACM Portal on a total of 58 different user queries specified by 15 different users. The results show that our proposed scheme significantly outperforms ACM Portal in terms of retrieval precision as measured by the most common metrics in information retrieval, including Normalized Discounted Cumulative Gain (NDCG) and Expected Reciprocal Rank (ERR), as well as a newly introduced lexicographic rule (LEX) for ranking search results. In particular, PubSearch outperforms ACM Portal by more than 77% in terms of ERR, by more than 11% in terms of NDCG, and by more than 907.5% in terms of LEX. We also re-rank the top-10 results of a subset of the original 58 user queries produced by Google Scholar, Microsoft Academic Search, and ArnetMiner; the results show that PubSearch also compares very well against these search engines. The proposed scheme can be easily plugged into any existing search engine for the retrieval of academic publications.
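PubSearch's exact heuristics are not reproduced here. As a rough sketch of a hierarchical combination of a term-frequency score, a time-depreciated citation score, and an index-term relatedness score, the code below breaks ties level by level; the exponential decay, the rounding granularity, and all parameter values are assumptions rather than the authors' definitions.

```python
def time_depreciated_citations(citation_years, now=2013, half_life=5.0):
    """Assumed exponential decay: older citations count for less."""
    return sum(0.5 ** ((now - y) / half_life) for y in citation_years)

def rank_papers(papers):
    """Hierarchical (lexicographic) combination: term-frequency score first,
    then decayed citations, then an index-term relatedness score."""
    def key(p):
        return (round(p["tf_score"], 1),
                round(time_depreciated_citations(p["citation_years"]), 1),
                p["term_relatedness"])
    return sorted(papers, key=key, reverse=True)

papers = [
    {"id": "A", "tf_score": 0.82, "citation_years": [2005, 2011, 2012], "term_relatedness": 0.4},
    {"id": "B", "tf_score": 0.81, "citation_years": [2012, 2012, 2013], "term_relatedness": 0.7},
]
print([p["id"] for p in rank_papers(papers)])   # B overtakes A on fresher citations
```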
  6. AlQenaei, Z.M.; Monarchi, D.E.: ¬The use of learning techniques to analyze the results of a manual classification system (2016) 0.11
    Score 0.11420593 = coord(3/4) × [based 0.0059 (tf 2) + term 0.0565 (tf 2) + frequency 0.0899 (tf 2)] (explain for doc 2836, condensed)
    
    Abstract
    Classification is the process of assigning objects to pre-defined classes based on observations or characteristics of those objects, and there are many approaches to performing this task. The overall objective of this study is to demonstrate the use of two learning techniques to analyze the results of a manual classification system. Our sample consisted of 1,026 documents, from the ACM Computing Classification System, classified by their authors as belonging to one of the groups of the classification system: "H.3 Information Storage and Retrieval." A singular value decomposition of the documents' weighted term-frequency matrix was used to represent each document in a 50-dimensional vector space. The analysis of the representation using both supervised (decision tree) and unsupervised (clustering) techniques suggests that two pairs of the ACM classes are closely related to each other in the vector space. Class 1 (Content Analysis and Indexing) is closely related to Class 3 (Information Search and Retrieval), and Class 4 (Systems and Software) is closely related to Class 5 (Online Information Services). Further analysis was performed to test the diffusion of the words in the two classes using both cosine and Euclidean distance.
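The representation step described in the abstract, an SVD of a weighted term-frequency matrix keeping only the top dimensions, can be sketched as follows; the toy matrix and the choice of 2 dimensions instead of the study's 50 are illustrative assumptions.

```python
import numpy as np

# Rows = documents, columns = terms; entries are (toy) weighted term frequencies.
A = np.array([
    [2.0, 1.0, 0.0, 0.0],
    [1.5, 0.5, 0.0, 0.0],
    [0.0, 0.0, 1.0, 2.0],
], dtype=float)

U, s, Vt = np.linalg.svd(A, full_matrices=False)
k = 2                              # the study keeps 50 dimensions; 2 suffices here
doc_vectors = U[:, :k] * s[:k]     # each document as a k-dimensional vector

def cosine(u, v):
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

print(cosine(doc_vectors[0], doc_vectors[1]))  # same topic -> near 1
print(cosine(doc_vectors[0], doc_vectors[2]))  # different topic -> near 0
```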
  7. Vivanco, L.; Bartolomé, B.; San Martín, M.; Martínez, A.: Bibliometric analysis of the use of the term preembryo in scientific literature (2011) 0.11
    Score 0.11411409 = coord(2/4) × [term 0.1383 (tf 12) + frequency 0.0899 (tf 2)] (explain for doc 4454, condensed)
    
    Abstract
    Our objective was to determine the prevalence of the term preembryo in the scientific literature using a bibliometric study in the Web of Science database. We retrieved data from the Web of Science from 1986 to 2005, covering a range of 20 years since the term was first published. Searches for the terms embryo, blastocyst, preimplantation embryo, and preembryo were performed. Then, Boolean operators were applied to measure associations between terms. Finally, statistical assessments were made to compare the use of each term in the scientific literature, and in specific areas where preembryo is most used. From a total of 93,019 registers, 90,888 corresponded to embryo; 8,366 to blastocyst; 2,397 to preimplantation embryo; and 172 to preembryo. The use frequency for preembryo was 2:1000. The term preembryo showed a lower cumulative impact factor (343) in comparison with the others (25,448; 5,530; and 546; respectively) in the highest scored journal category. We conclude that the term preembryo is not used in the scientific community, probably because it is confusing or inadequate. The authors suggest that its use in the scientific literature should be avoided in future publications. The bibliometric analysis confirms this statement. While preembryo hardly ever is used, terms such as preimplantation embryo and blastocyst have gained wide acceptance in publications from the same areas of study.
  8. Zhou, D.; Lawless, S.; Wu, X.; Zhao, W.; Liu, J.: ¬A study of user profile representation for personalized cross-language information retrieval (2016) 0.11
    Score 0.11355135 = coord(3/4) × [based 0.0083 (tf 4) + frequency 0.1272 (tf 4) + "22" 0.0159 (tf 2)] (explain for doc 3167, condensed)
    
    Abstract
    Purpose - With an increase in the amount of multilingual content on the World Wide Web, users are often striving to access information provided in a language of which they are non-native speakers. The purpose of this paper is to present a comprehensive study of user profile representation techniques and investigate their use in personalized cross-language information retrieval (CLIR) systems through the means of personalized query expansion. Design/methodology/approach - The user profiles consist of weighted terms computed by using frequency-based methods such as tf-idf and BM25, as well as various latent semantic models trained on monolingual documents and cross-lingual comparable documents. This paper also proposes an automatic evaluation method for comparing various user profile generation techniques and query expansion methods. Findings - Experimental results suggest that latent semantic-weighted user profile representation techniques are superior to frequency-based methods, and are particularly suitable for users with a sufficient amount of historical data. The study also confirmed that user profiles represented by latent semantic models trained on a cross-lingual level gained better performance than the models trained on a monolingual level. Originality/value - Previous studies on personalized information retrieval systems have primarily investigated user profiles and personalization strategies on a monolingual level. The effect of utilizing such monolingual profiles for personalized CLIR remains unclear. The current study fills the gap by a comprehensive study of user profile representation for personalized CLIR and a novel personalized CLIR evaluation methodology to ensure repeatable and controlled experiments can be conducted.
    Date
    20. 1.2015 18:30:22
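A minimal sketch of the frequency-based profile variant discussed above (tf-idf weights over a user's search history) and of query expansion with the top profile terms; the latent semantic variants from the paper are not shown, and the weighting formula, history, and expansion size are assumptions.

```python
import math
from collections import Counter

# A user's (toy) monolingual history; each entry is a tokenized document.
history = [
    "cross language retrieval evaluation".split(),
    "personalized retrieval user profile".split(),
    "query expansion cross language".split(),
]

def tfidf_profile(history):
    """Weight each term of the history by term frequency times a smoothed IDF."""
    df = Counter(t for doc in history for t in set(doc))
    n = len(history)
    profile = Counter()
    for doc in history:
        for t, f in Counter(doc).items():
            profile[t] += f * math.log((n + 1) / (df[t] + 1) + 1)
    return profile

def expand(query_terms, profile, k=2):
    """Append the k highest-weighted profile terms not already in the query."""
    extra = [t for t, _ in profile.most_common() if t not in query_terms][:k]
    return query_terms + extra

profile = tfidf_profile(history)
print(expand(["retrieval"], profile))
```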
  9. Altinel, B.; Ganiz, M.C.: Semantic text classification : a survey of past and recent advances (2018) 0.11
    Score 0.10798191 = coord(3/4) × [based 0.0082 (tf 6) + term 0.0639 (tf 4) + frequency 0.0719 (tf 2)] (explain for doc 5051, condensed)
    
    Abstract
    Automatic text classification is the task of organizing documents into pre-determined classes, generally using machine learning algorithms. Generally speaking, it is one of the most important methods to organize and make use of the gigantic amounts of information that exist in unstructured textual format. Text classification is a widely studied research area of language processing and text mining. In traditional text classification, a document is represented as a bag of words in which the words, in other words the terms, are cut off from their finer context, i.e. their location in a sentence or in a document. Only the broader context of the document is used, with some type of term frequency information, in the vector space. Consequently, the semantics of words that can be inferred from the finer context of their location in a sentence and their relations with neighboring words are usually ignored. However, the meaning of words and the semantic connections between words, documents, and even classes are obviously important, since methods that capture semantics generally reach better classification performance. Several surveys have been published to analyze diverse approaches for traditional text classification methods. Most of these surveys cover the application of different semantic term relatedness methods in text classification to a certain degree. However, they do not specifically target semantic text classification algorithms and their advantages over traditional text classification. In order to fill this gap, we undertake a comprehensive discussion of semantic text classification vs. traditional text classification. This survey explores the past and recent advancements in semantic text classification and attempts to organize existing approaches under five fundamental categories: domain knowledge-based approaches, corpus-based approaches, deep learning-based approaches, word/character sequence-enhanced approaches, and linguistically enriched approaches. Furthermore, this survey highlights the advantages of semantic text classification algorithms over traditional text classification algorithms.
  10. Zhu, W.Z.; Allen, R.B.: Document clustering using the LSI subspace signature model (2013) 0.11
    Score 0.10762094 = coord(3/4) × [based 0.0071 (tf 2) + term 0.1174 (tf 6) + "22" 0.0191 (tf 2)] (explain for doc 690, condensed)
    
    Abstract
    We describe the latent semantic indexing subspace signature model (LSISSM) for semantic content representation of unstructured text. Grounded on singular value decomposition, the model represents terms and documents by the distribution signatures of their statistical contribution across the top-ranking latent concept dimensions. LSISSM matches term signatures with document signatures according to their mapping coherence between latent semantic indexing (LSI) term subspace and LSI document subspace. LSISSM does feature reduction and finds a low-rank approximation of scalable and sparse term-document matrices. Experiments demonstrate that this approach significantly improves the performance of major clustering algorithms such as standard K-means and self-organizing maps compared with the vector space model and the traditional LSI model. The unique contribution ranking mechanism in LSISSM also improves the initialization of standard K-means compared with random seeding procedure, which sometimes causes low efficiency and effectiveness of clustering. A two-stage initialization strategy based on LSISSM significantly reduces the running time of standard K-means procedures.
    Date
    23. 3.2013 13:22:36
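A hedged sketch of the LSI step and of a "contribution distribution" of each term and document over the top latent dimensions, which is roughly what the subspace signatures build on; the toy matrix and the normalisation used here are assumptions, not the paper's exact definitions.

```python
import numpy as np

# Term-document matrix (rows = terms, columns = documents), toy counts.
X = np.array([
    [3, 0, 1, 0],
    [2, 0, 0, 0],
    [0, 2, 0, 3],
    [0, 1, 0, 2],
], dtype=float)

U, s, Vt = np.linalg.svd(X, full_matrices=False)
k = 2
term_loadings = np.abs(U[:, :k] * s[:k])      # contribution to the top-k LSI dims
doc_loadings = np.abs(Vt[:k, :].T * s[:k])

# Signature = distribution of a term's (or document's) contribution over the dims.
term_sig = term_loadings / term_loadings.sum(axis=1, keepdims=True)
doc_sig = doc_loadings / doc_loadings.sum(axis=1, keepdims=True)
print(np.round(term_sig, 2))
print(np.round(doc_sig, 2))
```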
  11. Wang, F.; Wolfram, D.: Assessment of journal similarity based on citing discipline analysis (2015) 0.10
    Score 0.10150333 = coord(3/4) × [based 0.0059 (tf 2) + frequency 0.0899 (tf 2) + assessment 0.0395 (tf 2)] (explain for doc 1849, condensed)
    
    Abstract
    This study compares the range of disciplines of citing journal articles to determine how closely related journals assigned to the same Web of Science research area are. The frequency distribution of disciplines by citing articles provides a signature for a cited journal that permits it to be compared with other journals using similarity comparison techniques. As an initial exploration, citing discipline data for 40 high-impact-factor journals assigned to the "information science and library science" category of the Web of Science were compared across 5 time periods. Similarity relationships were determined using multidimensional scaling and hierarchical cluster analysis to compare the outcomes produced by the proposed citing discipline and established cocitation methods. The maps and clustering outcomes reveal that a number of journals in allied areas of the information science and library science category may not be very closely related to each other or may not be appropriately situated in the category studied. The citing discipline similarity data resulted in similar outcomes with the cocitation data but with some notable differences. Because the citing discipline method relies on a citing perspective different from cocitations, it may provide a complementary way to compare journal similarity that is less labor intensive than cocitation analysis.
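The core comparison described above, treating each journal's frequency distribution of citing disciplines as a signature and comparing signatures, can be sketched with cosine similarity (the study then feeds such similarities into multidimensional scaling and clustering); the journal names and counts below are invented.

```python
import math

# Frequency of citing articles by discipline (toy counts) for three journals.
signatures = {
    "J_InfoSci": {"LIS": 120, "CS": 80, "Management": 10},
    "J_Library": {"LIS": 140, "Education": 30, "CS": 15},
    "J_MedInfo": {"Medicine": 90, "CS": 60, "LIS": 20},
}

def cosine(a, b):
    keys = set(a) | set(b)
    dot = sum(a.get(k, 0) * b.get(k, 0) for k in keys)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb)

names = list(signatures)
for i, x in enumerate(names):
    for y in names[i + 1:]:
        print(x, y, round(cosine(signatures[x], signatures[y]), 3))
```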
  12. Zhao, D.; Strotmann, A.: Dimensions and uncertainties of author citation rankings : lessons learned from frequency-weighted in-text citation counting (2016) 0.10
    Score 0.096987605 = coord(2/4) × [based 0.0071 (tf 2) + frequency 0.1869 (tf 6)] (explain for doc 2774, condensed)
    
    Abstract
    In-text frequency-weighted citation counting has been seen as a particularly promising solution to the well-known problem of citation analysis that it treats all citations equally, be they crucial to the citing paper or perfunctory. But what is a good weighting scheme? We compare 12 different in-text citation frequency-weighting schemes in the field of library and information science (LIS) and explore author citation impact patterns based on their performance in these schemes. Our results show that the ranks of authors vary widely with different weighting schemes that favor or are biased against common citation impact patterns-substantiated, applied, or noted. These variations separate LIS authors quite clearly into groups with these impact patterns. With consensus rank limits, the hard upper and lower bounds for reasonable author ranks that they provide suggest that author citation ranks may be subject to something like an uncertainty principle.
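A minimal sketch of the general idea of in-text frequency-weighted citation counting: a cited author is credited once per citing paper under classic counting, but in proportion to in-text mentions under frequency weighting. The two weighting functions shown are generic illustrations, not any of the paper's 12 schemes, and the data are invented.

```python
import math
from collections import defaultdict

# citing_paper -> {cited_author: number of in-text mentions}
in_text_mentions = {
    "p1": {"Smith": 5, "Jones": 1},
    "p2": {"Smith": 1, "Lee": 3},
    "p3": {"Jones": 1, "Lee": 1},
}

def author_scores(weight):
    scores = defaultdict(float)
    for mentions in in_text_mentions.values():
        for author, m in mentions.items():
            scores[author] += weight(m)
    return dict(scores)

print(author_scores(lambda m: 1))                 # classic: every citation counts once
print(author_scores(lambda m: m))                 # raw in-text frequency weighting
print(author_scores(lambda m: 1 + math.log(m)))   # dampened frequency weighting
```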
  13. Alzahrani, S.; Palade, V.; Salim, N.; Abraham, A.: Using structural information and citation evidence to detect significant plagiarism cases in scientific publications (2012) 0.09
    Score 0.093949854 = coord(3/4) × [based 0.0082 (tf 6) + term 0.0452 (tf 2) + frequency 0.0719 (tf 2)] (explain for doc 4982, condensed)
    
    Abstract
    In plagiarism detection (PD) systems, two important problems should be considered: the problem of retrieving candidate documents that are globally similar to a document q under investigation, and the problem of side-by-side comparison of q and its candidates to pinpoint plagiarized fragments in detail. In this article, the authors investigate the usage of structural information of scientific publications in both problems, and the consideration of citation evidence in the second problem. Three statistical measures namely Inverse Generic Class Frequency, Spread, and Depth are introduced to assign a degree of importance (i.e., weight) to structural components in scientific articles. A term-weighting scheme is adjusted to incorporate component-weight factors, which is used to improve the retrieval of potential sources of plagiarism. A plagiarism screening process is applied based on a measure of resemblance, in which component-weight factors are exploited to ignore less or nonsignificant plagiarism cases. Using the notion of citation evidence, parts with proper citation evidence are excluded, and remaining cases are suspected and used to calculate the similarity index. The authors compare their approach to two flat-based baselines, TF-IDF weighting with a Cosine coefficient, and shingling with a Jaccard coefficient. In both baselines, they use different comparison units with overlapping measures for plagiarism screening. They conducted extensive experiments using a dataset of 15,412 documents divided into 8,657 source publications and 6,755 suspicious queries, which included 18,147 plagiarism cases inserted automatically. Component-weight factors are assessed using precision, recall, and F-measure averaged over a 10-fold cross-validation and compared using the ANOVA statistical test. Results from structural-based candidate retrieval and plagiarism detection are evaluated statistically against the flat baselines using paired-t tests on 10-fold cross-validation runs, which demonstrate the efficacy achieved by the proposed framework. An empirical study on the system's response shows that structural information, unlike existing plagiarism detectors, helps to flag significant plagiarism cases, improve the similarity index, and provide human-like plagiarism screening results.
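A hedged sketch of a structure-aware resemblance measure in the spirit described above: word 3-gram (shingle) overlap is computed per document component and combined with component weights. The weights, the Jaccard-based resemblance, and the omission of the citation-evidence filtering step are all simplifying assumptions.

```python
def shingles(text, n=3):
    toks = text.lower().split()
    return {tuple(toks[i:i + n]) for i in range(len(toks) - n + 1)}

def jaccard(a, b):
    return len(a & b) / len(a | b) if a | b else 0.0

# Component weights (illustrative): methods text matters more than references.
weights = {"abstract": 0.2, "methods": 0.6, "references": 0.2}

def resemblance(query_doc, candidate_doc):
    """Weighted, per-component shingle overlap between two structured documents."""
    score = 0.0
    for comp, w in weights.items():
        score += w * jaccard(shingles(query_doc.get(comp, "")),
                             shingles(candidate_doc.get(comp, "")))
    return score

q = {"abstract": "we study plagiarism detection in scientific publications",
     "methods": "shingles are compared using a jaccard coefficient over components"}
c = {"abstract": "plagiarism detection in scientific publications is studied here",
     "methods": "shingles are compared using a jaccard coefficient over components"}
print(round(resemblance(q, c), 3))
```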
  14. Liu, X.; Zhang, J.; Guo, C.: Full-text citation analysis : a new method to enhance scholarly networks (2013) 0.09
    Score 0.09181927 = coord(2/4) × [term 0.0565 (tf 2) + frequency 0.1272 (tf 4)] (explain for doc 1044, condensed)
    
    Abstract
    In this article, we use innovative full-text citation analysis along with supervised topic modeling and network-analysis algorithms to enhance classical bibliometric analysis and publication/author/venue ranking. By utilizing citation contexts extracted from a large number of full-text publications, each citation or publication is represented by a probability distribution over a set of predefined topics, where each topic is labeled by an author-contributed keyword. We then used publication/citation topic distribution to generate a citation graph with vertex prior and edge transitioning probability distributions. The publication importance score for each given topic is calculated by PageRank with edge and vertex prior distributions. To evaluate this work, we sampled 104 topics (labeled with keywords) in review papers. The cited publications of each review paper are assumed to be "important publications" for the target topic (keyword), and we use these cited publications to validate our topic-ranking result and to compare different publication-ranking lists. Evaluation results show that full-text citation and publication content prior topic distribution, along with the classical PageRank algorithm, can significantly enhance bibliometric analysis and scientific publication ranking performance, compared with term frequency-inverse document frequency (tf-idf), language model, BM25, PageRank, and PageRank + language model (p < .001), for academic information retrieval (IR) systems.
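A minimal sketch of PageRank with a non-uniform teleport (vertex prior) distribution, the mechanism the article uses to push topic-specific probability mass through the citation graph; the graph and priors are invented, and edge transition priors and dangling-node handling are omitted.

```python
def pagerank_with_prior(edges, prior, damping=0.85, iters=50):
    """Power iteration with teleportation to a topic-specific vertex prior."""
    nodes = set(prior)
    out = {n: [v for u, v in edges if u == n] for n in nodes}
    rank = dict(prior)
    for _ in range(iters):
        new = {}
        for n in nodes:
            incoming = sum(rank[u] / len(out[u]) for u in nodes if n in out[u])
            new[n] = (1 - damping) * prior[n] + damping * incoming
        rank = new
    return rank

edges = [("A", "B"), ("A", "C"), ("B", "C"), ("C", "A")]   # toy citation graph
prior = {"A": 0.6, "B": 0.2, "C": 0.2}                     # topic-specific vertex prior
print({k: round(v, 3) for k, v in pagerank_with_prior(edges, prior).items()})
```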
  15. Jiang, Z.; Gu, Q.; Yin, Y.; Wang, J.; Chen, D.: GRAW+ : a two-view graph propagation method with word coupling for readability assessment (2019) 0.09
    Score 0.08851962 = coord(2/4) × [based 0.0083 (tf 4) + assessment 0.1369 (tf 6) + "22" 0.0318 (tf 2)] (explain for doc 5218, condensed)
    
    Abstract
    Existing methods for readability assessment usually construct inductive classification models to assess the readability of singular text documents based on extracted features, which have been demonstrated to be effective. However, they rarely make use of the interrelationship among documents on readability, which can help increase the accuracy of readability assessment. In this article, we adopt a graph-based classification method to model and utilize the relationship among documents using the coupled bag-of-words model. We propose a word coupling method to build the coupled bag-of-words model by estimating the correlation between words on reading difficulty. In addition, we propose a two-view graph propagation method to make use of both the coupled bag-of-words model and the linguistic features. Our method employs a graph merging operation to combine graphs built according to different views, and improves the label propagation by incorporating the ordinal relation among reading levels. Experiments were conducted on both English and Chinese data sets, and the results demonstrate both effectiveness and potential of the method.
    Date
    15. 4.2019 13:46:22
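A generic sketch of graph-based label propagation as used for readability assessment: documents are nodes, edges carry similarity weights, and the reading levels of a few seed documents are propagated to the rest. The similarity graph and the propagation rule are illustrative assumptions, not GRAW+'s coupled bag-of-words construction.

```python
# Nodes are documents; weighted edges encode similarity (toy values).
sim = {
    ("d1", "d2"): 0.9, ("d2", "d3"): 0.8, ("d3", "d4"): 0.7, ("d1", "d4"): 0.1,
}
labels = {"d1": 1.0, "d4": 3.0}      # seed reading levels (1 = easy, 3 = hard)
nodes = {"d1", "d2", "d3", "d4"}

def neighbours(n):
    for (a, b), w in sim.items():
        if a == n:
            yield b, w
        elif b == n:
            yield a, w

scores = {n: labels.get(n, 2.0) for n in nodes}   # unlabeled docs start at the midpoint
for _ in range(30):
    new = {}
    for n in nodes:
        if n in labels:                           # clamp the seed documents
            new[n] = labels[n]
            continue
        nbrs = list(neighbours(n))
        total = sum(w for _, w in nbrs)
        new[n] = sum(w * scores[m] for m, w in nbrs) / total if total else scores[n]
    scores = new
print({n: round(v, 2) for n, v in scores.items()})
```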
  16. Gil-Leiva, I.: SISA-automatic indexing system for scientific articles : experiments with location heuristics rules versus TF-IDF rules (2017) 0.09
    Score 0.087833405 = coord(2/4) × [term 0.0678 (tf 2) + frequency 0.1079 (tf 2)] (explain for doc 3622, condensed)
    
    Abstract
    Indexing is contextualized and a brief description is provided of some of the most used automatic indexing systems. We describe SISA, a system which uses location heuristics rules, statistical rules like term frequency (TF) or TF-IDF to obtain automatic or semi-automatic indexing, depending on the user's preference. The aim of this research is to ascertain which rules (location heuristics rules or TF-IDF rules) provide the best indexing terms. SISA is used to obtain the automatic indexing of 200 scientific articles on fruit growing written in Portuguese. It uses, on the one hand, location heuristics rules founded on the value of certain parts of the articles for indexing such as titles, abstracts, keywords, headings, first paragraph, conclusions and references and, on the other, TF-IDF rules. The indexing is then evaluated to ascertain retrieval performance through recall, precision and f-measure. Automatic indexing of the articles with location heuristics rules provided the best results with the evaluation measures.
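A small sketch of the TF-IDF rule family for selecting index terms from an article, evaluated with the measures named in the abstract (precision, recall, F-measure) against a manually assigned term set; the corpus, the cut-off k, and the gold terms are invented.

```python
import math
from collections import Counter

corpus = [
    "apple orchard pruning improves fruit yield".split(),
    "irrigation schedule for apple orchard soils".split(),
    "citrus pests and biological control methods".split(),
]

def tfidf_index_terms(doc, corpus, k=3):
    """Pick the k terms of a document with the highest tf-idf weight."""
    scores = {}
    for t, f in Counter(doc).items():
        df = sum(1 for d in corpus if t in d)
        scores[t] = f * math.log(len(corpus) / df)
    return {t for t, _ in sorted(scores.items(), key=lambda x: -x[1])[:k]}

def prf(assigned, gold):
    tp = len(assigned & gold)
    p = tp / len(assigned) if assigned else 0.0
    r = tp / len(gold) if gold else 0.0
    f = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f

assigned = tfidf_index_terms(corpus[0], corpus)
gold = {"pruning", "fruit", "yield"}               # hypothetical manual indexing
print(assigned, [round(x, 2) for x in prf(assigned, gold)])
```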
  17. Yi, K.: Harnessing collective intelligence in social tagging using Delicious (2012) 0.08
    0.083785795 = product of:
      0.11171439 = sum of:
        0.005885557 = product of:
          0.023542227 = sum of:
            0.023542227 = weight(_text_:based in 515) [ClassicSimilarity], result of:
              0.023542227 = score(doc=515,freq=2.0), product of:
                0.14144066 = queryWeight, product of:
                  3.0129938 = idf(docFreq=5906, maxDocs=44218)
                  0.04694356 = queryNorm
                0.16644597 = fieldWeight in 515, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.0129938 = idf(docFreq=5906, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=515)
          0.25 = coord(1/4)
        0.08992833 = weight(_text_:frequency in 515) [ClassicSimilarity], result of:
          0.08992833 = score(doc=515,freq=2.0), product of:
            0.27643865 = queryWeight, product of:
              5.888745 = idf(docFreq=332, maxDocs=44218)
              0.04694356 = queryNorm
            0.32531026 = fieldWeight in 515, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              5.888745 = idf(docFreq=332, maxDocs=44218)
              0.0390625 = fieldNorm(doc=515)
        0.015900511 = product of:
          0.031801023 = sum of:
            0.031801023 = weight(_text_:22 in 515) [ClassicSimilarity], result of:
              0.031801023 = score(doc=515,freq=2.0), product of:
                0.16438834 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.04694356 = queryNorm
                0.19345059 = fieldWeight in 515, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=515)
          0.5 = coord(1/2)
      0.75 = coord(3/4)
    
    Abstract
    A new collaborative approach to information organization and sharing has recently arisen, known as collaborative tagging or social indexing. A key element of collaborative tagging is the concept of collective intelligence (CI), a shared intelligence among all participants. This research investigates the phenomenon of social tagging in the context of CI, with the aim of serving as a stepping-stone towards the mining of truly valuable social tags for web resources. The study focuses on assessing and evaluating the degree of CI embedded in social tagging over time in terms of two parameter values: the number of participants and the top-frequency ranking window. Five different metrics were adopted for assessing the similarity between ranking lists: overlapList, overlapRank, Footrule, Fagin's measure, and the Inverse Rank measure. The results demonstrate that a substantial degree of CI is most likely to be achieved once somewhere between the first 200 and 400 people have participated in tagging, and that a target degree of CI can be projected by controlling these two factors together with the selection of a similarity metric. The study also tests some experimental conditions for detecting social tags with a high degree of CI. The results can be applied to filtering social tags based on CI; filtered social tags may be used for the metadata creation of tagged resources and possibly for the retrieval of tagged resources. (Illustrative implementations of two of the ranking-list measures follow this entry.)
    Date
    25.12.2012 15:22:37
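    As an illustration of two of the five ranking-list measures named above (not the study's own implementation), the following sketch computes a simple overlap score and Spearman's Footrule for two top-k tag lists; treating tags missing from one list as ranked at position k + 1 is an assumption, not necessarily the convention used in the study.

      def overlap(list_a, list_b):
          """Fraction of tags shared by two top-k ranking lists (order ignored)."""
          shared = set(list_a) & set(list_b)
          return len(shared) / max(len(list_a), len(list_b))

      def footrule(list_a, list_b):
          """Spearman's Footrule for two top-k tag lists.

          Tags absent from one list are assumed to sit at rank k + 1, a common
          convention for partial lists.
          """
          k = max(len(list_a), len(list_b))
          rank_a = {tag: i + 1 for i, tag in enumerate(list_a)}
          rank_b = {tag: i + 1 for i, tag in enumerate(list_b)}
          tags = set(list_a) | set(list_b)
          return sum(abs(rank_a.get(t, k + 1) - rank_b.get(t, k + 1)) for t in tags)

      # Toy example: top tags after an early and a later group of taggers.
      early = ["web", "design", "css", "tools"]
      late = ["design", "web", "tools", "inspiration"]
      print(overlap(early, late), footrule(early, late))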
  18. Paltoglou, G.: Sentiment-based event detection in Twitter (2016) 0.08
    0.08297726 = product of:
      0.16595452 = sum of:
        0.010194084 = product of:
          0.040776335 = sum of:
            0.040776335 = weight(_text_:based in 3010) [ClassicSimilarity], result of:
              0.040776335 = score(doc=3010,freq=6.0), product of:
                0.14144066 = queryWeight, product of:
                  3.0129938 = idf(docFreq=5906, maxDocs=44218)
                  0.04694356 = queryNorm
                0.28829288 = fieldWeight in 3010, product of:
                  2.4494898 = tf(freq=6.0), with freq of:
                    6.0 = termFreq=6.0
                  3.0129938 = idf(docFreq=5906, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=3010)
          0.25 = coord(1/4)
        0.15576044 = weight(_text_:frequency in 3010) [ClassicSimilarity], result of:
          0.15576044 = score(doc=3010,freq=6.0), product of:
            0.27643865 = queryWeight, product of:
              5.888745 = idf(docFreq=332, maxDocs=44218)
              0.04694356 = queryNorm
            0.5634539 = fieldWeight in 3010, product of:
              2.4494898 = tf(freq=6.0), with freq of:
                6.0 = termFreq=6.0
              5.888745 = idf(docFreq=332, maxDocs=44218)
              0.0390625 = fieldNorm(doc=3010)
      0.5 = coord(2/4)
    
    Abstract
    The main focus of this article is to examine whether sentiment analysis can be successfully used for "event detection," that is, detecting significant events that occur in the world. Most solutions to this problem are based on increases or spikes in the frequency of terms in social media. In our case, we explore whether sudden changes in the positivity or negativity typically associated with keywords can be exploited for this purpose. A data set that contains several million Twitter messages over a 1-month time span is presented, and experimental results demonstrate that sentiment analysis can be successfully utilized for this purpose. Further experiments study the sensitivity of both frequency- and sentiment-based solutions to a number of parameters. Concretely, we show that the number of tweets used for event detection is an important factor, while the number of days used to extract token frequency or sentiment averages is not. Lastly, we present results focusing on detecting local events and conclude that all approaches are dependent on the level of coverage that such events receive in social media.
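    As a rough illustration of spike-style event detection on either a frequency or a sentiment series (not the article's actual method), the following sketch flags days whose value deviates strongly, in z-score terms, from a short preceding window; the window length and threshold are assumptions.

      import statistics

      def detect_events(daily_values, window=7, z_threshold=3.0):
          """Flag days whose value deviates sharply from the preceding window.

          daily_values : per-day series, e.g. a keyword's tweet frequency or its
                         mean sentiment score.
          Returns the indices of days flagged as candidate events.
          """
          events = []
          for day in range(window, len(daily_values)):
              history = daily_values[day - window:day]
              mean = statistics.mean(history)
              stdev = statistics.pstdev(history) or 1e-9  # avoid division by zero
              z = (daily_values[day] - mean) / stdev
              if abs(z) >= z_threshold:
                  events.append(day)
          return events

      # Toy example: a sudden burst on day 10.
      series = [5, 6, 5, 7, 6, 5, 6, 5, 6, 5, 40, 6, 5]
      print(detect_events(series))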
  19. Lievers, W.B.; Pilkey, A.K.: Characterizing the frequency of repeated citations : the effects of journal, subject area, and self-citation (2012) 0.08
    0.082041934 = product of:
      0.16408387 = sum of:
        0.008323434 = product of:
          0.033293735 = sum of:
            0.033293735 = weight(_text_:based in 2725) [ClassicSimilarity], result of:
              0.033293735 = score(doc=2725,freq=4.0), product of:
                0.14144066 = queryWeight, product of:
                  3.0129938 = idf(docFreq=5906, maxDocs=44218)
                  0.04694356 = queryNorm
                0.23539014 = fieldWeight in 2725, product of:
                  2.0 = tf(freq=4.0), with freq of:
                    4.0 = termFreq=4.0
                  3.0129938 = idf(docFreq=5906, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=2725)
          0.25 = coord(1/4)
        0.15576044 = weight(_text_:frequency in 2725) [ClassicSimilarity], result of:
          0.15576044 = score(doc=2725,freq=6.0), product of:
            0.27643865 = queryWeight, product of:
              5.888745 = idf(docFreq=332, maxDocs=44218)
              0.04694356 = queryNorm
            0.5634539 = fieldWeight in 2725, product of:
              2.4494898 = tf(freq=6.0), with freq of:
                6.0 = termFreq=6.0
              5.888745 = idf(docFreq=332, maxDocs=44218)
              0.0390625 = fieldNorm(doc=2725)
      0.5 = coord(2/4)
    
    Abstract
    Previous studies have repeatedly demonstrated that the relevance of a citing document is related to the number of times the source document is cited within it. Despite the ease with which electronic documents would permit the incorporation of this information into citation-based document search and retrieval systems, the possibilities of repeated citations remain untapped. Part of this under-utilization may be due to the fact that very little is known about the pattern of repeated citations in the scholarly literature or how this pattern varies as a function of journal, academic discipline or self-citation. The current research addresses these unanswered questions in order to facilitate the future incorporation of repeated-citation information into document search and retrieval systems. Using data mining of electronic texts, the citation characteristics of nine different journals, covering three different academic fields (economics, computing, and medicine & biology), were characterized. It was found that the frequency (f) with which a reference is cited N or more times within a document is consistent across the sampled journals and academic fields. Self-citation causes an increase in this frequency, and the effect becomes more pronounced for large N. The objectivity and automatability of repeated citations, and their insensitivity to journal and discipline, present powerful opportunities for improving citation-based document search.
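    As an illustration of the kind of tally underlying this analysis (not the authors' pipeline), the following sketch counts how often each reference is cited within one document and reports, for each N, the fraction of references cited N or more times; the input format is an assumption.

      from collections import Counter

      def repeated_citation_profile(in_text_citations, max_n=5):
          """Summarize repeated citations within a single document.

          in_text_citations : sequence of reference identifiers, one entry per
                              in-text citation occurrence (illustrative format).
          Returns {N: fraction of references cited N or more times}.
          """
          counts = Counter(in_text_citations)
          n_refs = len(counts)
          return {
              n: sum(1 for c in counts.values() if c >= n) / n_refs
              for n in range(1, max_n + 1)
          }

      # Toy example: reference "r2" is cited three times, "r1" twice.
      citations = ["r1", "r2", "r3", "r2", "r1", "r2", "r4"]
      print(repeated_citation_profile(citations))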
  20. Zhao, D.; Strotmann, A.; Cappello, A.: In-text function of author self-citations : implications for research evaluation practice (2018) 0.08
    0.07983806 = product of:
      0.15967612 = sum of:
        0.0070626684 = product of:
          0.028250674 = sum of:
            0.028250674 = weight(_text_:based in 4347) [ClassicSimilarity], result of:
              0.028250674 = score(doc=4347,freq=2.0), product of:
                0.14144066 = queryWeight, product of:
                  3.0129938 = idf(docFreq=5906, maxDocs=44218)
                  0.04694356 = queryNorm
                0.19973516 = fieldWeight in 4347, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.0129938 = idf(docFreq=5906, maxDocs=44218)
                  0.046875 = fieldNorm(doc=4347)
          0.25 = coord(1/4)
        0.15261345 = weight(_text_:frequency in 4347) [ClassicSimilarity], result of:
          0.15261345 = score(doc=4347,freq=4.0), product of:
            0.27643865 = queryWeight, product of:
              5.888745 = idf(docFreq=332, maxDocs=44218)
              0.04694356 = queryNorm
            0.55206984 = fieldWeight in 4347, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              5.888745 = idf(docFreq=332, maxDocs=44218)
              0.046875 = fieldNorm(doc=4347)
      0.5 = coord(2/4)
    
    Abstract
    Author self-citations were examined as to their function, frequency, and location in the full text of research articles and compared with external citations. The function analysis was based on manual coding of a small dataset in the field of library and information studies, whereas the analyses of frequency and location used both this small dataset and a large dataset from PubMed Central. Strong evidence was found that self-citations are more likely to serve as substantial citations in a text than are external citations. This finding challenges previous studies that assumed self-citations should be discounted or even removed, and it suggests that, if anything, self-citations should be given more weight in citation analysis.
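    As a minimal illustration of how a reference might be flagged as a self-citation (real studies use far more careful author-name disambiguation), the following sketch treats a reference as a self-citation when the citing and cited author lists share at least one normalized name; the normalization step is an assumption.

      def is_self_citation(citing_authors, cited_authors):
          """Flag a reference as a self-citation if any author name appears in
          both lists after simple lower-case normalization (illustrative only)."""
          citing = {name.strip().lower() for name in citing_authors}
          cited = {name.strip().lower() for name in cited_authors}
          return bool(citing & cited)

      # Toy example: one shared author, so this counts as a self-citation.
      print(is_self_citation(["Zhao, D.", "Strotmann, A."],
                             ["Strotmann, A.", "Bubela, T."]))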

Types

  • a 1893
  • el 132
  • m 84
  • s 40
  • x 13
  • b 4
  • r 4
  • i 1
  • p 1
