Document (#40451)

Author
Zanibbi, R.
Yuan, B.
Title
Keyword and image-based retrieval for mathematical expressions
Source
https://www.cs.rit.edu/~rlaz/files/ZanibbiYuanDRR2011.pdf
Year
2011
Abstract
Two new methods for retrieving mathematical expressions using conventional keyword search and expression images are presented. An expression-level TF-IDF (term frequency-inverse document frequency) approach is used for keyword search, where queries and indexed expressions are represented by keywords taken from LaTeX strings. TF-IDF is computed at the level of individual expressions rather than documents to increase the precision of matching. The second retrieval technique is a form of Content-Based Image Retrieval (CBIR). Expressions are segmented into connected components, and then components in the query expression and each expression in the collection are matched using contour and density features, aspect ratios, and relative positions. In an experiment using ten randomly sampled queries from a corpus of over 22,000 expressions, precision-at-k (k = 20) for the keyword-based approach was higher (keyword: µ = 84.0, σ = 19.0; image-based: µ = 32.0, σ = 30.7), but for a few of the queries better results were obtained using a combination of the two techniques.
Field
Mathematics
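
The abstract describes TF-IDF computed per expression and evaluation by precision-at-k. The following is a minimal Python sketch of that idea, not the authors' implementation: the tokenizer `latex_keywords`, the smoothed idf, the cosine ranking, and the sample expressions are all assumptions made for illustration, since the abstract does not specify them.

```python
import math
import re
from collections import Counter


def latex_keywords(expr: str) -> list[str]:
    """Hypothetical tokenizer: LaTeX commands, alphanumeric runs, single symbols."""
    return re.findall(r"\\[A-Za-z]+|[A-Za-z0-9]+|[^\sA-Za-z0-9{}\\]", expr)


def build_index(expressions):
    """Treat every expression as its own 'document' when counting frequencies."""
    docs = [Counter(latex_keywords(e)) for e in expressions]
    df = Counter(t for d in docs for t in d)  # number of expressions containing t
    n = len(docs)
    idf = {t: math.log(n / df[t]) + 1.0 for t in df}  # assumed idf variant
    return docs, idf


def tfidf_vector(counts, idf):
    """Weight raw term counts by idf; terms unseen in the collection are dropped."""
    return {t: c * idf[t] for t, c in counts.items() if t in idf}


def cosine(u, v):
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


def search(query, expressions, k=20):
    """Return the indices of the k expressions most similar to the query."""
    docs, idf = build_index(expressions)
    q = tfidf_vector(Counter(latex_keywords(query)), idf)
    scored = [(cosine(q, tfidf_vector(d, idf)), i) for i, d in enumerate(docs)]
    scored.sort(key=lambda s: (-s[0], s[1]))
    return [i for _, i in scored[:k]]


def precision_at_k(ranked, relevant, k=20):
    """Fraction of the top-k ranked items that are in the relevant set."""
    return sum(1 for i in ranked[:k] if i in relevant) / k


# Made-up sample collection, for illustration only.
expressions = [r"\frac{a}{b}", r"x^2 + y^2", r"\frac{x}{y} + 1", r"\sqrt{x}"]
ranking = search(r"\frac{a}{b}", expressions, k=2)
p_at_2 = precision_at_k(ranking, relevant={0, 2}, k=2)
```

Because each expression is indexed separately, a rare token such as a specific LaTeX command raises the score of only the expressions that contain it, rather than of every expression in the same document.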

Similar documents (author)

  1. Yuan, W.: End-user searching behavior in information retrieval : a longitudinal study (1997) 5.18
    5.1847467 = sum of:
      5.1847467 = weight(author_txt:yuan in 395) [ClassicSimilarity], result of:
        5.1847467 = fieldWeight in 395, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          8.295595 = idf(docFreq=28, maxDocs=42740)
          0.625 = fieldNorm(doc=395)
    
  2. Yuan, W.; Meadow, C.T.: A study of the use of variables in information retrieval user studies (1999) 4.15
    4.1477976 = sum of:
      4.1477976 = weight(author_txt:yuan in 3944) [ClassicSimilarity], result of:
        4.1477976 = fieldWeight in 3944, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          8.295595 = idf(docFreq=28, maxDocs=42740)
          0.5 = fieldNorm(doc=3944)
    
  3. Jin, Z.; Yuan, C.: On the ambiguity of information retrieval for visualization (1998) 4.15
    4.1477976 = sum of:
      4.1477976 = weight(author_txt:yuan in 4217) [ClassicSimilarity], result of:
        4.1477976 = fieldWeight in 4217, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          8.295595 = idf(docFreq=28, maxDocs=42740)
          0.5 = fieldNorm(doc=4217)
    
  4. Yuan, X.; Belkin, N.J.: Investigating information retrieval support techniques for different information-seeking strategies (2010) 4.15
    4.1477976 = sum of:
      4.1477976 = weight(author_txt:yuan in 700) [ClassicSimilarity], result of:
        4.1477976 = fieldWeight in 700, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          8.295595 = idf(docFreq=28, maxDocs=42740)
          0.5 = fieldNorm(doc=700)
    
  5. Yuan, X.; Belkin, N.J.: Evaluating an integrated system supporting multiple information-seeking strategies (2010) 4.15
    4.1477976 = sum of:
      4.1477976 = weight(author_txt:yuan in 993) [ClassicSimilarity], result of:
        4.1477976 = fieldWeight in 993, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          8.295595 = idf(docFreq=28, maxDocs=42740)
          0.5 = fieldNorm(doc=993)
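
The author-match scores above are each a single fieldWeight, the product of tf, idf, and fieldNorm. The sketch below reproduces that arithmetic in Python. The formulas tf = sqrt(freq) and idf = 1 + ln(maxDocs / (docFreq + 1)) are assumptions that happen to agree with the numbers listed here; newer Lucene versions may compute idf slightly differently. The fieldNorm is taken from the listing as given, and the function names are illustrative.

```python
import math


def classic_tf(freq: float) -> float:
    """Term-frequency factor: square root of the raw frequency."""
    return math.sqrt(freq)


def classic_idf(doc_freq: int, max_docs: int) -> float:
    """Inverse document frequency, assumed form matching the listed values."""
    return 1.0 + math.log(max_docs / (doc_freq + 1))


def field_weight(freq: float, doc_freq: int, max_docs: int, field_norm: float) -> float:
    """fieldWeight = tf * idf * fieldNorm, as in the explanation trees."""
    return classic_tf(freq) * classic_idf(doc_freq, max_docs) * field_norm


# Entry 1: author_txt:yuan in doc 395 (fieldNorm 0.625), listed as 5.1847467.
first = field_weight(1.0, doc_freq=28, max_docs=42740, field_norm=0.625)
# Entries 2 to 5 share fieldNorm 0.5 and are listed as 4.1477976.
others = field_weight(1.0, doc_freq=28, max_docs=42740, field_norm=0.5)
```

The only factor that differs between entry 1 and entries 2 to 5 is fieldNorm, which in Lucene's classic scoring generally shrinks as the field gets longer; a record with a single author therefore outranks records with two.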
    

Similar documents (content)

  1. Greiner-Petter, A.; Schubotz, M.; Cohl, H.S.; Gipp, B.: Semantic preserving bijective mappings for expressions involving special functions between computer algebra systems and document preparation systems (2019) 0.23
    0.22542033 = sum of:
      0.22542033 = product of:
        0.93925136 = sum of:
          0.019367436 = weight(abstract_txt:approach in 1500) [ClassicSimilarity], result of:
            0.019367436 = score(doc=1500,freq=2.0), product of:
              0.058085795 = queryWeight, product of:
                1.0252473 = boost
                3.772308 = idf(docFreq=2671, maxDocs=42740)
                0.015018761 = queryNorm
              0.33342808 = fieldWeight in 1500, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                3.772308 = idf(docFreq=2671, maxDocs=42740)
                0.0625 = fieldNorm(doc=1500)
          0.1811909 = weight(abstract_txt:latex in 1500) [ClassicSimilarity], result of:
            0.1811909 = score(doc=1500,freq=3.0), product of:
              0.17881572 = queryWeight, product of:
                1.2719837 = boost
                9.360306 = idf(docFreq=9, maxDocs=42740)
                0.015018761 = queryNorm
              1.0132828 = fieldWeight in 1500, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                9.360306 = idf(docFreq=9, maxDocs=42740)
                0.0625 = fieldNorm(doc=1500)
          0.14536634 = weight(abstract_txt:mathematical in 1500) [ClassicSimilarity], result of:
            0.14536634 = score(doc=1500,freq=5.0), product of:
              0.16406569 = queryWeight, product of:
                1.7230686 = boost
                6.3398805 = idf(docFreq=204, maxDocs=42740)
                0.015018761 = queryNorm
              0.88602525 = fieldWeight in 1500, product of:
                2.236068 = tf(freq=5.0), with freq of:
                  5.0 = termFreq=5.0
                6.3398805 = idf(docFreq=204, maxDocs=42740)
                0.0625 = fieldNorm(doc=1500)
          0.021493819 = weight(abstract_txt:using in 1500) [ClassicSimilarity], result of:
            0.021493819 = score(doc=1500,freq=1.0), product of:
              0.09883655 = queryWeight, product of:
                1.8913307 = boost
                3.4794931 = idf(docFreq=3580, maxDocs=42740)
                0.015018761 = queryNorm
              0.21746832 = fieldWeight in 1500, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.4794931 = idf(docFreq=3580, maxDocs=42740)
                0.0625 = fieldNorm(doc=1500)
          0.14713244 = weight(abstract_txt:expression in 1500) [ClassicSimilarity], result of:
            0.14713244 = score(doc=1500,freq=1.0), product of:
              0.35632598 = queryWeight, product of:
                3.591141 = boost
                6.6066446 = idf(docFreq=156, maxDocs=42740)
                0.015018761 = queryNorm
              0.4129153 = fieldWeight in 1500, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.6066446 = idf(docFreq=156, maxDocs=42740)
                0.0625 = fieldNorm(doc=1500)
          0.42470044 = weight(abstract_txt:expressions in 1500) [ClassicSimilarity], result of:
            0.42470044 = score(doc=1500,freq=3.0), product of:
              0.57335067 = queryWeight, product of:
                5.5791044 = boost
                6.842609 = idf(docFreq=123, maxDocs=42740)
                0.015018761 = queryNorm
              0.74073416 = fieldWeight in 1500, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                6.842609 = idf(docFreq=123, maxDocs=42740)
                0.0625 = fieldNorm(doc=1500)
        0.24 = coord(6/25)
    
  2. Yoon, J.W.; Chung, E.K.: Understanding image needs in daily life by analyzing questions in a social Q&A site (2011) 0.19
    0.18966678 = sum of:
      0.18966678 = product of:
        0.67738134 = sum of:
          0.013694845 = weight(abstract_txt:approach in 1923) [ClassicSimilarity], result of:
            0.013694845 = score(doc=1923,freq=1.0), product of:
              0.058085795 = queryWeight, product of:
                1.0252473 = boost
                3.772308 = idf(docFreq=2671, maxDocs=42740)
                0.015018761 = queryNorm
              0.23576926 = fieldWeight in 1923, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.772308 = idf(docFreq=2671, maxDocs=42740)
                0.0625 = fieldNorm(doc=1923)
          0.015917024 = weight(abstract_txt:retrieval in 1923) [ClassicSimilarity], result of:
            0.015917024 = score(doc=1923,freq=1.0), product of:
              0.073502734 = queryWeight, product of:
                1.4125085 = boost
                3.4648013 = idf(docFreq=3633, maxDocs=42740)
                0.015018761 = queryNorm
              0.21655008 = fieldWeight in 1923, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.4648013 = idf(docFreq=3633, maxDocs=42740)
                0.0625 = fieldNorm(doc=1923)
          0.044614155 = weight(abstract_txt:components in 1923) [ClassicSimilarity], result of:
            0.044614155 = score(doc=1923,freq=1.0), product of:
              0.12764789 = queryWeight, product of:
                1.5198493 = boost
                5.5921526 = idf(docFreq=432, maxDocs=42740)
                0.015018761 = queryNorm
              0.34950954 = fieldWeight in 1923, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.5921526 = idf(docFreq=432, maxDocs=42740)
                0.0625 = fieldNorm(doc=1923)
          0.050440352 = weight(abstract_txt:queries in 1923) [ClassicSimilarity], result of:
            0.050440352 = score(doc=1923,freq=1.0), product of:
              0.15857974 = queryWeight, product of:
                2.0747375 = boost
                5.0892105 = idf(docFreq=715, maxDocs=42740)
                0.015018761 = queryNorm
              0.31807566 = fieldWeight in 1923, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.0892105 = idf(docFreq=715, maxDocs=42740)
                0.0625 = fieldNorm(doc=1923)
          0.16755848 = weight(abstract_txt:image in 1923) [ClassicSimilarity], result of:
            0.16755848 = score(doc=1923,freq=8.0), product of:
              0.17652649 = queryWeight, product of:
                2.1889925 = boost
                5.369471 = idf(docFreq=540, maxDocs=42740)
                0.015018761 = queryNorm
              0.94919735 = fieldWeight in 1923, product of:
                2.828427 = tf(freq=8.0), with freq of:
                  8.0 = termFreq=8.0
                5.369471 = idf(docFreq=540, maxDocs=42740)
                0.0625 = fieldNorm(doc=1923)
          0.13995558 = weight(abstract_txt:keyword in 1923) [ClassicSimilarity], result of:
            0.13995558 = score(doc=1923,freq=1.0), product of:
              0.37125474 = queryWeight, product of:
                4.098262 = boost
                6.0316787 = idf(docFreq=278, maxDocs=42740)
                0.015018761 = queryNorm
              0.37697992 = fieldWeight in 1923, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.0316787 = idf(docFreq=278, maxDocs=42740)
                0.0625 = fieldNorm(doc=1923)
          0.2452009 = weight(abstract_txt:expressions in 1923) [ClassicSimilarity], result of:
            0.2452009 = score(doc=1923,freq=1.0), product of:
              0.57335067 = queryWeight, product of:
                5.5791044 = boost
                6.842609 = idf(docFreq=123, maxDocs=42740)
                0.015018761 = queryNorm
              0.42766306 = fieldWeight in 1923, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.842609 = idf(docFreq=123, maxDocs=42740)
                0.0625 = fieldNorm(doc=1923)
        0.28 = coord(7/25)
    
  3. D'Ambrosio, D.M.: Conceptualizing metadata via repertory grids : exploring a method for the development of domain-specific systems for knowledge organization (2007) 0.13
    0.12948312 = sum of:
      0.12948312 = product of:
        0.80926955 = sum of:
          0.02251007 = weight(abstract_txt:retrieval in 2663) [ClassicSimilarity], result of:
            0.02251007 = score(doc=2663,freq=2.0), product of:
              0.073502734 = queryWeight, product of:
                1.4125085 = boost
                3.4648013 = idf(docFreq=3633, maxDocs=42740)
                0.015018761 = queryNorm
              0.30624807 = fieldWeight in 2663, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                3.4648013 = idf(docFreq=3633, maxDocs=42740)
                0.0625 = fieldNorm(doc=2663)
          0.030396849 = weight(abstract_txt:using in 2663) [ClassicSimilarity], result of:
            0.030396849 = score(doc=2663,freq=2.0), product of:
              0.09883655 = queryWeight, product of:
                1.8913307 = boost
                3.4794931 = idf(docFreq=3580, maxDocs=42740)
                0.015018761 = queryNorm
              0.30754665 = fieldWeight in 2663, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                3.4794931 = idf(docFreq=3580, maxDocs=42740)
                0.0625 = fieldNorm(doc=2663)
          0.2080767 = weight(abstract_txt:expression in 2663) [ClassicSimilarity], result of:
            0.2080767 = score(doc=2663,freq=2.0), product of:
              0.35632598 = queryWeight, product of:
                3.591141 = boost
                6.6066446 = idf(docFreq=156, maxDocs=42740)
                0.015018761 = queryNorm
              0.5839504 = fieldWeight in 2663, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                6.6066446 = idf(docFreq=156, maxDocs=42740)
                0.0625 = fieldNorm(doc=2663)
          0.5482859 = weight(abstract_txt:expressions in 2663) [ClassicSimilarity], result of:
            0.5482859 = score(doc=2663,freq=5.0), product of:
              0.57335067 = queryWeight, product of:
                5.5791044 = boost
                6.842609 = idf(docFreq=123, maxDocs=42740)
                0.015018761 = queryNorm
              0.9562837 = fieldWeight in 2663, product of:
                2.236068 = tf(freq=5.0), with freq of:
                  5.0 = termFreq=5.0
                6.842609 = idf(docFreq=123, maxDocs=42740)
                0.0625 = fieldNorm(doc=2663)
        0.16 = coord(4/25)
    
  4. Eerola, J.; Vakkari, P.: How a general and a specific thesaurus cover expressions in patients' questions and physicians' answers (2008) 0.13
    0.12788545 = sum of:
      0.12788545 = product of:
        0.7992841 = sum of:
          0.067374475 = weight(abstract_txt:matched in 3733) [ClassicSimilarity], result of:
            0.067374475 = score(doc=3733,freq=1.0), product of:
              0.11492437 = queryWeight, product of:
                1.019729 = boost
                7.5040073 = idf(docFreq=63, maxDocs=42740)
                0.015018761 = queryNorm
              0.58625054 = fieldWeight in 3733, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.5040073 = idf(docFreq=63, maxDocs=42740)
                0.078125 = fieldNorm(doc=3733)
          0.017118555 = weight(abstract_txt:approach in 3733) [ClassicSimilarity], result of:
            0.017118555 = score(doc=3733,freq=1.0), product of:
              0.058085795 = queryWeight, product of:
                1.0252473 = boost
                3.772308 = idf(docFreq=2671, maxDocs=42740)
                0.015018761 = queryNorm
              0.29471156 = fieldWeight in 3733, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.772308 = idf(docFreq=2671, maxDocs=42740)
                0.078125 = fieldNorm(doc=3733)
          0.18391556 = weight(abstract_txt:expression in 3733) [ClassicSimilarity], result of:
            0.18391556 = score(doc=3733,freq=1.0), product of:
              0.35632598 = queryWeight, product of:
                3.591141 = boost
                6.6066446 = idf(docFreq=156, maxDocs=42740)
                0.015018761 = queryNorm
              0.5161441 = fieldWeight in 3733, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.6066446 = idf(docFreq=156, maxDocs=42740)
                0.078125 = fieldNorm(doc=3733)
          0.5308755 = weight(abstract_txt:expressions in 3733) [ClassicSimilarity], result of:
            0.5308755 = score(doc=3733,freq=3.0), product of:
              0.57335067 = queryWeight, product of:
                5.5791044 = boost
                6.842609 = idf(docFreq=123, maxDocs=42740)
                0.015018761 = queryNorm
              0.9259177 = fieldWeight in 3733, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                6.842609 = idf(docFreq=123, maxDocs=42740)
                0.078125 = fieldNorm(doc=3733)
        0.16 = coord(4/25)
    
  5. Corridoni, J.M.; Bimbo, A. del; Vicario, E.: Image retrieval by color semantics with incomplete knowledge (1998) 0.13
    0.1251645 = sum of:
      0.1251645 = product of:
        0.5215187 = sum of:
          0.04749336 = weight(abstract_txt:level in 1595) [ClassicSimilarity], result of:
            0.04749336 = score(doc=1595,freq=4.0), product of:
              0.08383663 = queryWeight, product of:
                1.231716 = boost
                4.5319915 = idf(docFreq=1249, maxDocs=42740)
                0.015018761 = queryNorm
              0.56649894 = fieldWeight in 1595, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                4.5319915 = idf(docFreq=1249, maxDocs=42740)
                0.0625 = fieldNorm(doc=1595)
          0.027569091 = weight(abstract_txt:retrieval in 1595) [ClassicSimilarity], result of:
            0.027569091 = score(doc=1595,freq=3.0), product of:
              0.073502734 = queryWeight, product of:
                1.4125085 = boost
                3.4648013 = idf(docFreq=3633, maxDocs=42740)
                0.015018761 = queryNorm
              0.37507573 = fieldWeight in 1595, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                3.4648013 = idf(docFreq=3633, maxDocs=42740)
                0.0625 = fieldNorm(doc=1595)
          0.042829305 = weight(abstract_txt:precision in 1595) [ClassicSimilarity], result of:
            0.042829305 = score(doc=1595,freq=1.0), product of:
              0.12422029 = queryWeight, product of:
                1.4993049 = boost
                5.5165615 = idf(docFreq=466, maxDocs=42740)
                0.015018761 = queryNorm
              0.3447851 = fieldWeight in 1595, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.5165615 = idf(docFreq=466, maxDocs=42740)
                0.0625 = fieldNorm(doc=1595)
          0.050440352 = weight(abstract_txt:queries in 1595) [ClassicSimilarity], result of:
            0.050440352 = score(doc=1595,freq=1.0), product of:
              0.15857974 = queryWeight, product of:
                2.0747375 = boost
                5.0892105 = idf(docFreq=715, maxDocs=42740)
                0.015018761 = queryNorm
              0.31807566 = fieldWeight in 1595, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.0892105 = idf(docFreq=715, maxDocs=42740)
                0.0625 = fieldNorm(doc=1595)
          0.1451099 = weight(abstract_txt:image in 1595) [ClassicSimilarity], result of:
            0.1451099 = score(doc=1595,freq=6.0), product of:
              0.17652649 = queryWeight, product of:
                2.1889925 = boost
                5.369471 = idf(docFreq=540, maxDocs=42740)
                0.015018761 = queryNorm
              0.82202905 = fieldWeight in 1595, product of:
                2.4494898 = tf(freq=6.0), with freq of:
                  6.0 = termFreq=6.0
                5.369471 = idf(docFreq=540, maxDocs=42740)
                0.0625 = fieldNorm(doc=1595)
          0.2080767 = weight(abstract_txt:expression in 1595) [ClassicSimilarity], result of:
            0.2080767 = score(doc=1595,freq=2.0), product of:
              0.35632598 = queryWeight, product of:
                3.591141 = boost
                6.6066446 = idf(docFreq=156, maxDocs=42740)
                0.015018761 = queryNorm
              0.5839504 = fieldWeight in 1595, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                6.6066446 = idf(docFreq=156, maxDocs=42740)
                0.0625 = fieldNorm(doc=1595)
        0.24 = coord(6/25)
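
Each content-match score above is a sum of per-term scores (queryWeight times fieldWeight) multiplied by a coordination factor, the share of the 25 query terms that the document matched. The following Python sketch recomputes entry 3 (doc 2663) from the boosts, frequencies, and norms in its tree. The tf and idf formulas are assumed forms that agree with the listed values, and the function names are illustrative rather than part of any library.

```python
import math

MAX_DOCS = 42740          # maxDocs, as listed in every idf line
QUERY_NORM = 0.015018761  # queryNorm, as listed


def idf(doc_freq: int, max_docs: int = MAX_DOCS) -> float:
    """Assumed idf form: 1 + ln(maxDocs / (docFreq + 1))."""
    return 1.0 + math.log(max_docs / (doc_freq + 1))


def term_score(boost, freq, doc_freq, field_norm, query_norm=QUERY_NORM):
    """score = queryWeight * fieldWeight for one matched term."""
    i = idf(doc_freq)
    query_weight = boost * i * query_norm
    field_weight = math.sqrt(freq) * i * field_norm
    return query_weight * field_weight


def document_score(matches, query_terms):
    """Sum the term scores, then scale by coord = matched terms / query terms."""
    total = sum(term_score(*m) for m in matches)
    return total * (len(matches) / query_terms)


# (boost, termFreq, docFreq, fieldNorm) copied from the tree for doc 2663.
doc_2663 = [
    (1.4125085, 2.0, 3633, 0.0625),  # retrieval
    (1.8913307, 2.0, 3580, 0.0625),  # using
    (3.591141, 2.0, 156, 0.0625),    # expression
    (5.5791044, 5.0, 123, 0.0625),   # expressions
]

score = document_score(doc_2663, query_terms=25)
print(round(score, 4))  # close to the listed 0.12948312, up to rounding
```

The coordination factor explains why entry 3 ranks below entry 2 despite a larger raw sum (0.809 against 0.677): it matches only 4 of the 25 query terms, while entry 2 matches 7.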