Search (12 results, page 1 of 1)

  • Filter: author_ss:"Gonçalves, M.A."
  • Filter: year_i:[2010 TO 2020}
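  Note: the two filters above are Solr-style filter queries. In this range syntax, year_i:[2010 TO 2020} includes 2010 but excludes 2020 (a square bracket marks an inclusive bound, a curly brace an exclusive one). As a minimal, hedged sketch, the search could be reproduced against a Solr endpoint roughly as follows; the host, core name, and use of Python's requests library are illustrative assumptions, not this site's documented API:

    import requests

    params = {
        "q": "*:*",
        "fq": [
            'author_ss:"Gonçalves, M.A."',  # author facet
            "year_i:[2010 TO 2020}",        # inclusive lower, exclusive upper bound
        ],
        "rows": 12,
        "wt": "json",
    }
    # Hypothetical endpoint; a list-valued "fq" yields repeated fq parameters.
    r = requests.get("http://localhost:8983/solr/biblio/select", params=params)
    print(r.json()["response"]["numFound"])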
  1. Belém, F.M.; Almeida, J.M.; Gonçalves, M.A.: A survey on tag recommendation methods : a review (2017) 0.02
    0.015274854 = product of:
      0.053461988 = sum of:
        0.02465703 = weight(_text_:web in 3524) [ClassicSimilarity], result of:
          0.02465703 = score(doc=3524,freq=4.0), product of:
            0.09670874 = queryWeight, product of:
              3.2635105 = idf(docFreq=4597, maxDocs=44218)
              0.029633347 = queryNorm
            0.25496176 = fieldWeight in 3524, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              3.2635105 = idf(docFreq=4597, maxDocs=44218)
              0.0390625 = fieldNorm(doc=3524)
        0.0071344664 = weight(_text_:information in 3524) [ClassicSimilarity], result of:
          0.0071344664 = score(doc=3524,freq=4.0), product of:
            0.052020688 = queryWeight, product of:
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.029633347 = queryNorm
            0.13714671 = fieldWeight in 3524, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.0390625 = fieldNorm(doc=3524)
        0.014978974 = weight(_text_:retrieval in 3524) [ClassicSimilarity], result of:
          0.014978974 = score(doc=3524,freq=2.0), product of:
            0.08963835 = queryWeight, product of:
              3.024915 = idf(docFreq=5836, maxDocs=44218)
              0.029633347 = queryNorm
            0.16710453 = fieldWeight in 3524, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              3.024915 = idf(docFreq=5836, maxDocs=44218)
              0.0390625 = fieldNorm(doc=3524)
        0.0066915164 = product of:
          0.020074548 = sum of:
            0.020074548 = weight(_text_:22 in 3524) [ClassicSimilarity], result of:
              0.020074548 = score(doc=3524,freq=2.0), product of:
                0.103770934 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.029633347 = queryNorm
                0.19345059 = fieldWeight in 3524, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=3524)
          0.33333334 = coord(1/3)
      0.2857143 = coord(4/14)
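    The tree above is Lucene's ClassicSimilarity (TF-IDF) explanation. Each matching-term leaf scores queryWeight x fieldWeight, where queryWeight = idf x queryNorm and fieldWeight = tf x idf x fieldNorm; the document score is the sum of the leaves scaled by a coordination factor (here 4 of the 14 query clauses matched). A minimal sketch reproducing the figures for the _text_:web leaf:

      import math

      # Figures copied from the _text_:web leaf for doc 3524 above.
      freq, doc_freq, max_docs = 4.0, 4597, 44218
      query_norm, field_norm = 0.029633347, 0.0390625

      tf = math.sqrt(freq)                             # 2.0
      idf = 1.0 + math.log(max_docs / (doc_freq + 1))  # 3.2635105
      query_weight = idf * query_norm                  # 0.09670874
      field_weight = tf * idf * field_norm             # 0.25496176
      leaf_score = query_weight * field_weight         # 0.02465703

      # Document score: sum of the four matching leaves times coord(4/14).
      # The _text_:22 leaf already carries its own nested coord(1/3) factor.
      leaves = [0.02465703, 0.0071344664, 0.014978974, 0.0066915164]
      doc_score = sum(leaves) * (4 / 14)               # 0.015274854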
    
    Abstract
    Tags (keywords freely assigned by users to describe web content) have become highly popular in Web 2.0 applications because of the strong incentives and the ease with which users can create and describe their own content. This increase in tag popularity has led to a vast literature on tag recommendation methods. These methods aim at assisting users in the tagging process, possibly increasing the quality of the generated tags and, consequently, improving the quality of the information retrieval (IR) services that rely on tags as data sources. Despite the numerous and diverse previous studies on tag recommendation, to our knowledge no previous work has summarized and organized them into a single survey article. In this article, we propose a taxonomy for tag recommendation methods, classifying them according to the target of the recommendations, their objectives, exploited data sources, and underlying techniques. Moreover, we provide a critical overview of these methods, pointing out their advantages and disadvantages. Finally, we describe the main open challenges related to the field, such as tag ambiguity, cold start, and evaluation issues.
    Date
    16.11.2017 13:30:22
    Source
    Journal of the Association for Information Science and Technology. 68(2017) no.4, S.830-844
  2. Dalip, D.H.; Gonçalves, M.A.; Cristo, M.; Calado, P.: A general multiview framework for assessing the quality of collaboratively created content on Web 2.0 (2017) 0.01
    0.0077985805 = product of:
      0.036393374 = sum of:
        0.02465703 = weight(_text_:web in 3343) [ClassicSimilarity], result of:
          0.02465703 = score(doc=3343,freq=4.0), product of:
            0.09670874 = queryWeight, product of:
              3.2635105 = idf(docFreq=4597, maxDocs=44218)
              0.029633347 = queryNorm
            0.25496176 = fieldWeight in 3343, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              3.2635105 = idf(docFreq=4597, maxDocs=44218)
              0.0390625 = fieldNorm(doc=3343)
        0.0050448296 = weight(_text_:information in 3343) [ClassicSimilarity], result of:
          0.0050448296 = score(doc=3343,freq=2.0), product of:
            0.052020688 = queryWeight, product of:
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.029633347 = queryNorm
            0.09697737 = fieldWeight in 3343, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.0390625 = fieldNorm(doc=3343)
        0.0066915164 = product of:
          0.020074548 = sum of:
            0.020074548 = weight(_text_:22 in 3343) [ClassicSimilarity], result of:
              0.020074548 = score(doc=3343,freq=2.0), product of:
                0.103770934 = queryWeight, product of:
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.029633347 = queryNorm
                0.19345059 = fieldWeight in 3343, product of:
                  1.4142135 = tf(freq=2.0), with freq of:
                    2.0 = termFreq=2.0
                  3.5018296 = idf(docFreq=3622, maxDocs=44218)
                  0.0390625 = fieldNorm(doc=3343)
          0.33333334 = coord(1/3)
      0.21428572 = coord(3/14)
    
    Date
    16.11.2017 13:04:22
    Object
    Web 2.0
    Source
    Journal of the Association for Information Science and Technology. 68(2017) no.2, S.286-308
  3. Moura, E.S. de; Fernandes, D.; Ribeiro-Neto, B.; Silva, A.S. da; Gonçalves, M.A.: Using structural information to improve search in Web collections (2010) 0.01
    0.00639995 = product of:
      0.04479965 = sum of:
        0.036238287 = weight(_text_:web in 4119) [ClassicSimilarity], result of:
          0.036238287 = score(doc=4119,freq=6.0), product of:
            0.09670874 = queryWeight, product of:
              3.2635105 = idf(docFreq=4597, maxDocs=44218)
              0.029633347 = queryNorm
            0.37471575 = fieldWeight in 4119, product of:
              2.4494898 = tf(freq=6.0), with freq of:
                6.0 = termFreq=6.0
              3.2635105 = idf(docFreq=4597, maxDocs=44218)
              0.046875 = fieldNorm(doc=4119)
        0.00856136 = weight(_text_:information in 4119) [ClassicSimilarity], result of:
          0.00856136 = score(doc=4119,freq=4.0), product of:
            0.052020688 = queryWeight, product of:
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.029633347 = queryNorm
            0.16457605 = fieldWeight in 4119, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.046875 = fieldNorm(doc=4119)
      0.14285715 = coord(2/14)
    
    Abstract
    In this work, we investigate the problem of using the block structure of Web pages to improve ranking results. Starting with basic intuitions provided by the concepts of term frequency (TF) and inverse document frequency (IDF), we propose nine block-weight functions to distinguish the impact of term occurrences inside page blocks, instead of inside whole pages. These are then used to compute a modified BM25 ranking function. Using four distinct Web collections, we ran extensive experiments to compare our block-weight ranking formulas with two baselines: (a) a BM25 ranking applied to full pages, and (b) a BM25 ranking that takes into account the best blocks. Our results suggest that our block-weight ranking method is superior to all baselines across all collections we used, with average gains in precision ranging from 5 to 20%.
    Source
    Journal of the American Society for Information Science and Technology. 61(2010) no.12, S.2503-2513
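    A minimal sketch of the block-weighting idea from the abstract above: term frequency is accumulated per page block, scaled by a block weight, and only then fed into BM25. The single weight function and parameter values here are illustrative assumptions; the paper itself proposes and compares nine block-weight functions.

      import math

      def bm25_idf(df: int, n_docs: int) -> float:
          # One common BM25 idf variant.
          return math.log(1.0 + (n_docs - df + 0.5) / (df + 0.5))

      def block_weighted_tf(term: str, blocks: list[tuple[str, float]]) -> float:
          # blocks: (block_text, block_weight) pairs. Counting per weighted
          # block, instead of over the whole page, is the core idea.
          return sum(w * text.split().count(term) for text, w in blocks)

      def bm25_block_score(term: str, blocks: list[tuple[str, float]],
                           df: int, n_docs: int, doc_len: int, avg_len: float,
                           k1: float = 1.2, b: float = 0.75) -> float:
          tf = block_weighted_tf(term, blocks)
          return bm25_idf(df, n_docs) * tf * (k1 + 1) / (
              tf + k1 * (1 - b + b * doc_len / avg_len))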
  4. Pereira, D.A.; Ribeiro-Neto, B.; Ziviani, N.; Laender, A.H.F.; Gonçalves, M.A.: A generic Web-based entity resolution framework (2011) 0.01
    0.006000682 = product of:
      0.04200477 = sum of:
        0.034870304 = weight(_text_:web in 4450) [ClassicSimilarity], result of:
          0.034870304 = score(doc=4450,freq=8.0), product of:
            0.09670874 = queryWeight, product of:
              3.2635105 = idf(docFreq=4597, maxDocs=44218)
              0.029633347 = queryNorm
            0.36057037 = fieldWeight in 4450, product of:
              2.828427 = tf(freq=8.0), with freq of:
                8.0 = termFreq=8.0
              3.2635105 = idf(docFreq=4597, maxDocs=44218)
              0.0390625 = fieldNorm(doc=4450)
        0.0071344664 = weight(_text_:information in 4450) [ClassicSimilarity], result of:
          0.0071344664 = score(doc=4450,freq=4.0), product of:
            0.052020688 = queryWeight, product of:
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.029633347 = queryNorm
            0.13714671 = fieldWeight in 4450, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.0390625 = fieldNorm(doc=4450)
      0.14285715 = coord(2/14)
    
    Abstract
    Web data repositories usually contain references to thousands of real-world entities from multiple sources. It is not uncommon that multiple entities share the same label (polysemes) and that distinct label variations are associated with the same entity (synonyms), which frequently leads to ambiguous interpretations. Further, spelling variants, acronyms, abbreviated forms, and misspellings compound the problem. Solving it requires identifying which labels correspond to the same real-world entity, a process known as entity resolution. One approach to the entity resolution problem is to associate an authority identifier and a list of variant forms with each entity, a data structure known as an authority file. In this work, we propose a generic framework for implementing a method for generating authority files. Our method uses information from the Web to improve the quality of the authority file and, because of that, is referred to as WER (Web-based Entity Resolution). Our contribution here is threefold: (a) we discuss how to implement the WER framework, which is flexible and easy to adapt to new domains; (b) we run extended experimentation with our WER framework to show that it outperforms selected baselines; and (c) we compare the results of a specialized solution for author name resolution with those produced by the generic WER framework, and show that the WER results remain competitive.
    Source
    Journal of the American Society for Information Science and Technology. 62(2011) no.5, S.919-932
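    The authority file described in the abstract above pairs an authority identifier with the variant forms of each entity, so resolution reduces to a reverse lookup once the file exists. A minimal sketch (identifiers and variants are made up for illustration):

      authority_file = {
          "ent:0001": ["Gonçalves, M.A.", "Goncalves, Marcos André"],
          "ent:0002": ["Laender, A.H.F.", "Laender, Alberto H. F."],
      }

      # Reverse index: variant label -> authority identifier.
      variant_index = {v: auth_id
                       for auth_id, variants in authority_file.items()
                       for v in variants}

      def resolve(label: str) -> str | None:
          # Building the file is the hard part, which WER addresses;
          # given the file, entity resolution is a dictionary lookup.
          return variant_index.get(label)

      print(resolve("Laender, Alberto H. F."))  # -> ent:0002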
  5. Cavalcante Dourado, Í.; Galante, R.; Gonçalves, M.A.; Silva Torres, R. de: Bag of textual graphs (BoTG) : a general graph-based text representation model (2019) 0.00
    0.0047255447 = product of:
      0.033078812 = sum of:
        0.0071344664 = weight(_text_:information in 5291) [ClassicSimilarity], result of:
          0.0071344664 = score(doc=5291,freq=4.0), product of:
            0.052020688 = queryWeight, product of:
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.029633347 = queryNorm
            0.13714671 = fieldWeight in 5291, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.0390625 = fieldNorm(doc=5291)
        0.025944345 = weight(_text_:retrieval in 5291) [ClassicSimilarity], result of:
          0.025944345 = score(doc=5291,freq=6.0), product of:
            0.08963835 = queryWeight, product of:
              3.024915 = idf(docFreq=5836, maxDocs=44218)
              0.029633347 = queryNorm
            0.28943354 = fieldWeight in 5291, product of:
              2.4494898 = tf(freq=6.0), with freq of:
                6.0 = termFreq=6.0
              3.024915 = idf(docFreq=5836, maxDocs=44218)
              0.0390625 = fieldNorm(doc=5291)
      0.14285715 = coord(2/14)
    
    Abstract
    Text representation models are the fundamental basis for information retrieval and text mining tasks. Although different text models have been proposed, they typically target specific task aspects in isolation, such as time efficiency, accuracy, or applicability to different scenarios. Here we present Bag of Textual Graphs (BoTG), a general text representation model that addresses these three requirements at the same time. The proposed representation is based on a graph-based scheme that encodes term proximity and term ordering, and maps text documents into an efficient vector space that addresses all these aspects and provides discriminative textual patterns. Extensive experiments are conducted in two experimental scenarios (classification and retrieval), considering multiple well-known text collections. We also compare our model against several methods from the literature. Experimental results demonstrate that our model is generic enough to handle different tasks and collections. It is also more efficient than widely used state-of-the-art methods in textual classification and retrieval tasks, with competitive effectiveness, sometimes with gains by large margins.
    Source
    Journal of the Association for Information Science and Technology. 70(2019) no.8, S.817-829
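    The abstract above says BoTG encodes term proximity and term ordering in a graph before projecting documents into a vector space. A minimal sketch of the graph-construction half (the window size and edge weighting are assumptions, and the "bag" quantization step is omitted):

      from collections import Counter

      def textual_graph(tokens: list[str], window: int = 2) -> Counter:
          # Directed edges from each term to the next `window` terms capture
          # both proximity and ordering; multiplicity serves as edge weight.
          edges: Counter = Counter()
          for i in range(len(tokens)):
              for j in range(i + 1, min(i + 1 + window, len(tokens))):
                  edges[(tokens[i], tokens[j])] += 1
          return edges

      g = textual_graph("graphs encode term proximity and term ordering".split())
      print(g[("term", "proximity")])  # 1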
  6. Ferreira, A.A.; Veloso, A.; Gonçalves, M.A.; Laender, A.H.F.: Self-training author name disambiguation for information scarce scenarios (2014) 0.00
    8.826613E-4 = product of:
      0.012357258 = sum of:
        0.012357258 = weight(_text_:information in 1292) [ClassicSimilarity], result of:
          0.012357258 = score(doc=1292,freq=12.0), product of:
            0.052020688 = queryWeight, product of:
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.029633347 = queryNorm
            0.23754507 = fieldWeight in 1292, product of:
              3.4641016 = tf(freq=12.0), with freq of:
                12.0 = termFreq=12.0
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.0390625 = fieldNorm(doc=1292)
      0.071428575 = coord(1/14)
    
    Abstract
    We present a novel three-step self-training method for author name disambiguation, SAND (Self-training Associative Name Disambiguator), which requires no manual labeling and no parameterization (in real-world scenarios) and is particularly suitable for the common situation in which only the most basic information about a citation record is available (i.e., author names, and work and venue titles). During the first step, real-world heuristics on coauthors are able to produce highly pure (although fragmented) clusters. The most representative of these clusters are then selected to serve as training data for the third, supervised author assignment step. The third step exploits a state-of-the-art transductive disambiguation method capable of detecting unseen authors not included in any training example and of incorporating reliable predictions into the training data. Experiments conducted with standard public collections, using the minimum set of attributes present in a citation, demonstrate that our proposed method outperforms all representative unsupervised author grouping disambiguation methods and is very competitive with fully supervised author assignment methods. Thus, unlike other bootstrapping methods that explore privileged, hard-to-obtain information such as self-citations and personal information, our proposed method produces top-notch performance with no (manual) training data or parameterization and in the presence of scarce information.
    Source
    Journal of the Association for Information Science and Technology. 65(2014) no.6, S.1257-1278
  7. Cota, R.G.; Ferreira, A.A.; Nascimento, C.; Gonçalves, M.A.; Laender, A.H.F.: An unsupervised heuristic-based hierarchical method for name disambiguation in bibliographic citations (2010) 0.00
    7.2068995E-4 = product of:
      0.010089659 = sum of:
        0.010089659 = weight(_text_:information in 3986) [ClassicSimilarity], result of:
          0.010089659 = score(doc=3986,freq=8.0), product of:
            0.052020688 = queryWeight, product of:
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.029633347 = queryNorm
            0.19395474 = fieldWeight in 3986, product of:
              2.828427 = tf(freq=8.0), with freq of:
                8.0 = termFreq=8.0
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.0390625 = fieldNorm(doc=3986)
      0.071428575 = coord(1/14)
    
    Abstract
    Name ambiguity in the context of bibliographic citations is a difficult problem which, despite the many efforts from the research community, still has a lot of room for improvement. In this article, we present a heuristic-based hierarchical clustering method to deal with this problem. The method successively fuses clusters of citations of similar author names based on several heuristics and similarity measures on the components of the citations (e.g., coauthor names, work title, and publication venue title). During the disambiguation task, the information about fused clusters is aggregated, providing more information for the next round of fusion. In order to demonstrate the effectiveness of our method, we ran a series of experiments on two different collections extracted from real-world digital libraries and compared our method, under two metrics, with four representative methods described in the literature. We present comparisons of results using each considered attribute separately (i.e., coauthor names, work title, and publication venue title) with the author name attribute and using all attributes together. These results show that our unsupervised method, when using all attributes, performs competitively against all other methods, under both metrics, losing only in one case against a supervised method, whose result was very close to ours. Moreover, such results are achieved without the burden of any training and without using any privileged information such as knowing a priori the correct number of clusters.
    Source
    Journal of the American Society for Information Science and Technology. 61(2010) no.9, S.1853-1870
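    A minimal, single-heuristic sketch of the cluster fusion described in the abstract above: clusters are merged whenever their aggregated coauthor sets are similar enough, and each fusion makes more information available to later rounds. The similarity measure and threshold are illustrative assumptions; the actual method chains several heuristics over coauthor names, work titles, and venue titles.

      def jaccard(a: set, b: set) -> float:
          return len(a & b) / len(a | b) if a | b else 0.0

      def fuse(clusters: list[set], threshold: float = 0.3) -> list[set]:
          # Repeatedly merge the first sufficiently similar pair of
          # coauthor sets until no further fusion is possible.
          merged = True
          while merged:
              merged = False
              for i in range(len(clusters)):
                  for j in range(i + 1, len(clusters)):
                      if jaccard(clusters[i], clusters[j]) >= threshold:
                          clusters[i] |= clusters.pop(j)
                          merged = True
                          break
                  if merged:
                      break
          return clusters

      print(len(fuse([{"silva, a", "laender, a"},
                      {"laender, a", "cota, r"},
                      {"melo, p"}])))  # 2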
  8. Melo, P.F.; Dalip, D.H.; Junior, M.M.; Gonçalves, M.A.; Benevenuto, F.: 10SENT : a stable sentiment analysis method based on the combination of off-the-shelf approaches (2019) 0.00
    5.0960475E-4 = product of:
      0.0071344664 = sum of:
        0.0071344664 = weight(_text_:information in 4990) [ClassicSimilarity], result of:
          0.0071344664 = score(doc=4990,freq=4.0), product of:
            0.052020688 = queryWeight, product of:
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.029633347 = queryNorm
            0.13714671 = fieldWeight in 4990, product of:
              2.0 = tf(freq=4.0), with freq of:
                4.0 = termFreq=4.0
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.0390625 = fieldNorm(doc=4990)
      0.071428575 = coord(1/14)
    
    Abstract
    Sentiment analysis has become a very important tool for the analysis of social media data. Several methods have been developed, covering distinct aspects of the problem with disparate strategies. However, no single technique fits well in all cases or for all data sources. Supervised approaches may be able to adapt to specific situations, but they require manually labeled training data, which are very cumbersome and expensive to acquire, especially for a new application. In this context, we propose to combine several popular and effective state-of-the-practice sentiment analysis methods by means of an unsupervised bootstrapped strategy. One of our main goals is to reduce the large variability (low stability) of the unsupervised methods across different domains. The experimental results demonstrate that our combined method (a.k.a. 10SENT) improves the effectiveness of the classification task, considering thirteen different data sets. It also tackles the key problem of cross-domain low stability and produces the best (or close to the best) results in almost all considered contexts, without any additional costs (e.g., manual labeling). Finally, we also investigate a transfer learning approach for sentiment analysis to gather additional (unsupervised) information for the proposed approach, and we show the potential of this technique to improve our results.
    Source
    Journal of the Association for Information Science and Technology. 70(2019) no.3, S.242-255
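    10SENT combines ten off-the-shelf sentiment methods. A minimal sketch of the combination step as a plain majority vote; the actual method goes further, bootstrapping a learner from the agreement among base methods, and the base methods below are hypothetical stand-ins:

      from collections import Counter
      from typing import Callable

      def combine(methods: list[Callable[[str], str]], text: str) -> str:
          # Majority vote over the base predictions; a real system would
          # need an explicit tie-breaking policy.
          votes = Counter(m(text) for m in methods)
          return votes.most_common(1)[0][0]

      lexicon = lambda t: "positive" if "good" in t else "negative"   # stand-in
      emoticon = lambda t: "positive" if ":)" in t else "negative"    # stand-in
      print(combine([lexicon, emoticon, lexicon], "good results"))    # positive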
  9. Santana, A.F.; Gonçalves, M.A.; Laender, A.H.F.; Ferreira, A.A.: Incremental author name disambiguation by exploiting domain-specific heuristics (2017) 0.00
    4.32414E-4 = product of:
      0.0060537956 = sum of:
        0.0060537956 = weight(_text_:information in 3587) [ClassicSimilarity], result of:
          0.0060537956 = score(doc=3587,freq=2.0), product of:
            0.052020688 = queryWeight, product of:
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.029633347 = queryNorm
            0.116372846 = fieldWeight in 3587, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.046875 = fieldNorm(doc=3587)
      0.071428575 = coord(1/14)
    
    Source
    Journal of the Association for Information Science and Technology. 68(2017) no.4, S.931-945
  10. Silva, R.M.; Gonçalves, M.A.; Veloso, A.: A two-stage active learning method for learning to rank (2014) 0.00
    3.6034497E-4 = product of:
      0.0050448296 = sum of:
        0.0050448296 = weight(_text_:information in 1184) [ClassicSimilarity], result of:
          0.0050448296 = score(doc=1184,freq=2.0), product of:
            0.052020688 = queryWeight, product of:
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.029633347 = queryNorm
            0.09697737 = fieldWeight in 1184, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.0390625 = fieldNorm(doc=1184)
      0.071428575 = coord(1/14)
    
    Source
    Journal of the Association for Information Science and Technology. 65(2014) no.1, S.109-128
  11. Martins, E.F.; Belém, F.M.; Almeida, J.M.; Gonçalves, M.A.: On cold start for associative tag recommendation (2016) 0.00
    3.6034497E-4 = product of:
      0.0050448296 = sum of:
        0.0050448296 = weight(_text_:information in 2494) [ClassicSimilarity], result of:
          0.0050448296 = score(doc=2494,freq=2.0), product of:
            0.052020688 = queryWeight, product of:
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.029633347 = queryNorm
            0.09697737 = fieldWeight in 2494, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.0390625 = fieldNorm(doc=2494)
      0.071428575 = coord(1/14)
    
    Source
    Journal of the Association for Information Science and Technology. 67(2016) no.1, S.83-105
  12. Salles, T.; Rocha, L.; Gonçalves, M.A.; Almeida, J.M.; Mourão, F.; Meira Jr., W.; Viegas, F.: A quantitative analysis of the temporal effects on automatic text classification (2016) 0.00
    3.6034497E-4 = product of:
      0.0050448296 = sum of:
        0.0050448296 = weight(_text_:information in 3014) [ClassicSimilarity], result of:
          0.0050448296 = score(doc=3014,freq=2.0), product of:
            0.052020688 = queryWeight, product of:
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.029633347 = queryNorm
            0.09697737 = fieldWeight in 3014, product of:
              1.4142135 = tf(freq=2.0), with freq of:
                2.0 = termFreq=2.0
              1.7554779 = idf(docFreq=20772, maxDocs=44218)
              0.0390625 = fieldNorm(doc=3014)
      0.071428575 = coord(1/14)
    
    Source
    Journal of the Association for Information Science and Technology. 67(2016) no.7, S.1639-1667