Document (#33008)

Author
O'Kane, K.C.
Lockner, M.J.
Title
Indexing genomic sequence libraries
Source
Information processing and management. 41(2005) no.2, p.265-274
Year
2005
Abstract
This paper describes an extensible, open-source (GPL) data repository and retrieval system that supports fast, efficient, keyword-based retrieval of genomic sequences from multiple libraries, with retrieved sequences post-processed by FASTA, Smith-Waterman, and other analysis software. This application is implemented for Linux and is written in Mumps, C, and C++, with supporting components that include the Berkeley Data Base, the Perl Compatible Regular Expression Library, GLADE, and tools such as FASTA, Smith-Waterman, and modules from EMBOSS. The package described here can quickly index data sets of up to 256 terabytes using a B-tree-based multi-dimensional data model. An example is presented that indexes the text of the full NCBI Genbank library.

Similar documents (content)

  1. Shachak, A.: Diffusion pattern of the use of genomic databases and analysis of biological sequences from 1970-2003 : bibliographic record analysis of 12 journals (2006) 0.16
    0.16419442 = sum of:
      0.16419442 = product of:
        0.8209721 = sum of:
          0.05674444 = weight(abstract_txt:sequence in 904) [ClassicSimilarity], result of:
            0.05674444 = score(doc=904,freq=1.0), product of:
              0.1329343 = queryWeight, product of:
                1.0710033 = boost
                6.8297725 = idf(docFreq=127, maxDocs=43556)
                0.018173559 = queryNorm
              0.42686078 = fieldWeight in 904, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.8297725 = idf(docFreq=127, maxDocs=43556)
                0.0625 = fieldNorm(doc=904)
          0.010209612 = weight(abstract_txt:that in 904) [ClassicSimilarity], result of:
            0.010209612 = score(doc=904,freq=2.0), product of:
              0.0484981 = queryWeight, product of:
                1.1204573 = boost
                2.3817132 = idf(docFreq=10938, maxDocs=43556)
                0.018173559 = queryNorm
              0.2105157 = fieldWeight in 904, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                2.3817132 = idf(docFreq=10938, maxDocs=43556)
                0.0625 = fieldNorm(doc=904)
          0.03801059 = weight(abstract_txt:data in 904) [ClassicSimilarity], result of:
            0.03801059 = score(doc=904,freq=2.0), product of:
              0.12822358 = queryWeight, product of:
                2.1037118 = boost
                3.3538349 = idf(docFreq=4137, maxDocs=43556)
                0.018173559 = queryNorm
              0.29643992 = fieldWeight in 904, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                3.3538349 = idf(docFreq=4137, maxDocs=43556)
                0.0625 = fieldNorm(doc=904)
          0.20691687 = weight(abstract_txt:sequences in 904) [ClassicSimilarity], result of:
            0.20691687 = score(doc=904,freq=2.0), product of:
              0.31493345 = queryWeight, product of:
                2.3312922 = boost
                7.4333076 = idf(docFreq=69, maxDocs=43556)
                0.018173559 = queryNorm
              0.65701777 = fieldWeight in 904, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                7.4333076 = idf(docFreq=69, maxDocs=43556)
                0.0625 = fieldNorm(doc=904)
          0.5090906 = weight(abstract_txt:genomic in 904) [ClassicSimilarity], result of:
            0.5090906 = score(doc=904,freq=3.0), product of:
              0.50140405 = queryWeight, product of:
                2.9415839 = boost
                9.379218 = idf(docFreq=9, maxDocs=43556)
                0.018173559 = queryNorm
              1.0153301 = fieldWeight in 904, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                9.379218 = idf(docFreq=9, maxDocs=43556)
                0.0625 = fieldNorm(doc=904)
        0.2 = coord(5/25)
    
  2. Trotman, A.: Searching structured documents (2004) 0.11
    0.110381775 = sum of:
      0.110381775 = product of:
        0.55190885 = sum of:
          0.07117518 = weight(abstract_txt:tree in 3536) [ClassicSimilarity], result of:
            0.07117518 = score(doc=3536,freq=1.0), product of:
              0.13323978 = queryWeight, product of:
                1.0722332 = boost
                6.8376155 = idf(docFreq=126, maxDocs=43556)
                0.018173559 = queryNorm
              0.5341887 = fieldWeight in 3536, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.8376155 = idf(docFreq=126, maxDocs=43556)
                0.078125 = fieldNorm(doc=3536)
          0.03233259 = weight(abstract_txt:retrieval in 3536) [ClassicSimilarity], result of:
            0.03233259 = score(doc=3536,freq=3.0), product of:
              0.06878252 = queryWeight, product of:
                1.0894978 = boost
                3.4738557 = idf(docFreq=3669, maxDocs=43556)
                0.018173559 = queryNorm
              0.4700699 = fieldWeight in 3536, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                3.4738557 = idf(docFreq=3669, maxDocs=43556)
                0.078125 = fieldNorm(doc=3536)
          0.009024107 = weight(abstract_txt:that in 3536) [ClassicSimilarity], result of:
            0.009024107 = score(doc=3536,freq=1.0), product of:
              0.0484981 = queryWeight, product of:
                1.1204573 = boost
                2.3817132 = idf(docFreq=10938, maxDocs=43556)
                0.018173559 = queryNorm
              0.18607134 = fieldWeight in 3536, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                2.3817132 = idf(docFreq=10938, maxDocs=43556)
                0.078125 = fieldNorm(doc=3536)
          0.033596933 = weight(abstract_txt:data in 3536) [ClassicSimilarity], result of:
            0.033596933 = score(doc=3536,freq=1.0), product of:
              0.12822358 = queryWeight, product of:
                2.1037118 = boost
                3.3538349 = idf(docFreq=4137, maxDocs=43556)
                0.018173559 = queryNorm
              0.26201835 = fieldWeight in 3536, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.3538349 = idf(docFreq=4137, maxDocs=43556)
                0.078125 = fieldNorm(doc=3536)
          0.40578005 = weight(abstract_txt:smith in 3536) [ClassicSimilarity], result of:
            0.40578005 = score(doc=3536,freq=2.0), product of:
              0.42521507 = queryWeight, product of:
                2.7088916 = boost
                8.63728 = idf(docFreq=20, maxDocs=43556)
                0.018173559 = queryNorm
              0.95429367 = fieldWeight in 3536, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                8.63728 = idf(docFreq=20, maxDocs=43556)
                0.078125 = fieldNorm(doc=3536)
        0.2 = coord(5/25)
    
  3. Rapp, B.A.; Wheeler, D.L.: Bioinformatics resources from the National Center for Biotechnology Information : an integrated foundation for discovery (2005) 0.11
    0.10748391 = sum of:
      0.10748391 = product of:
        0.44784963 = sum of:
          0.05093025 = weight(abstract_txt:expression in 263) [ClassicSimilarity], result of:
            0.05093025 = score(doc=263,freq=1.0), product of:
              0.12369118 = queryWeight, product of:
                1.0330983 = boost
                6.5880527 = idf(docFreq=162, maxDocs=43556)
                0.018173559 = queryNorm
              0.4117533 = fieldWeight in 263, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.5880527 = idf(docFreq=162, maxDocs=43556)
                0.0625 = fieldNorm(doc=263)
          0.0510731 = weight(abstract_txt:repository in 263) [ClassicSimilarity], result of:
            0.0510731 = score(doc=263,freq=1.0), product of:
              0.123922355 = queryWeight, product of:
                1.0340633 = boost
                6.5942063 = idf(docFreq=161, maxDocs=43556)
                0.018173559 = queryNorm
              0.4121379 = fieldWeight in 263, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.5942063 = idf(docFreq=161, maxDocs=43556)
                0.0625 = fieldNorm(doc=263)
          0.11348888 = weight(abstract_txt:sequence in 263) [ClassicSimilarity], result of:
            0.11348888 = score(doc=263,freq=4.0), product of:
              0.1329343 = queryWeight, product of:
                1.0710033 = boost
                6.8297725 = idf(docFreq=127, maxDocs=43556)
                0.018173559 = queryNorm
              0.85372156 = fieldWeight in 263, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                6.8297725 = idf(docFreq=127, maxDocs=43556)
                0.0625 = fieldNorm(doc=263)
          0.014933785 = weight(abstract_txt:retrieval in 263) [ClassicSimilarity], result of:
            0.014933785 = score(doc=263,freq=1.0), product of:
              0.06878252 = queryWeight, product of:
                1.0894978 = boost
                3.4738557 = idf(docFreq=3669, maxDocs=43556)
                0.018173559 = queryNorm
              0.21711598 = fieldWeight in 263, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.4738557 = idf(docFreq=3669, maxDocs=43556)
                0.0625 = fieldNorm(doc=263)
          0.0711113 = weight(abstract_txt:data in 263) [ClassicSimilarity], result of:
            0.0711113 = score(doc=263,freq=7.0), product of:
              0.12822358 = queryWeight, product of:
                2.1037118 = boost
                3.3538349 = idf(docFreq=4137, maxDocs=43556)
                0.018173559 = queryNorm
              0.5545883 = fieldWeight in 263, product of:
                2.6457512 = tf(freq=7.0), with freq of:
                  7.0 = termFreq=7.0
                3.3538349 = idf(docFreq=4137, maxDocs=43556)
                0.0625 = fieldNorm(doc=263)
          0.14631233 = weight(abstract_txt:sequences in 263) [ClassicSimilarity], result of:
            0.14631233 = score(doc=263,freq=1.0), product of:
              0.31493345 = queryWeight, product of:
                2.3312922 = boost
                7.4333076 = idf(docFreq=69, maxDocs=43556)
                0.018173559 = queryNorm
              0.46458173 = fieldWeight in 263, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.4333076 = idf(docFreq=69, maxDocs=43556)
                0.0625 = fieldNorm(doc=263)
        0.24 = coord(6/25)
    
  4. Michon, J.: Biomedicine and the Semantic Web : a knowledge model for visual phenotype (2006) 0.10
    0.09941678 = sum of:
      0.09941678 = product of:
        0.41423658 = sum of:
          0.011547576 = weight(abstract_txt:library in 1369) [ClassicSimilarity], result of:
            0.011547576 = score(doc=1369,freq=1.0), product of:
              0.057946257 = queryWeight, product of:
                3.1884925 = idf(docFreq=4881, maxDocs=43556)
                0.018173559 = queryNorm
              0.19928078 = fieldWeight in 1369, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.1884925 = idf(docFreq=4881, maxDocs=43556)
                0.0625 = fieldNorm(doc=1369)
          0.05674444 = weight(abstract_txt:sequence in 1369) [ClassicSimilarity], result of:
            0.05674444 = score(doc=1369,freq=1.0), product of:
              0.1329343 = queryWeight, product of:
                1.0710033 = boost
                6.8297725 = idf(docFreq=127, maxDocs=43556)
                0.018173559 = queryNorm
              0.42686078 = fieldWeight in 1369, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                6.8297725 = idf(docFreq=127, maxDocs=43556)
                0.0625 = fieldNorm(doc=1369)
          0.014933785 = weight(abstract_txt:retrieval in 1369) [ClassicSimilarity], result of:
            0.014933785 = score(doc=1369,freq=1.0), product of:
              0.06878252 = queryWeight, product of:
                1.0894978 = boost
                3.4738557 = idf(docFreq=3669, maxDocs=43556)
                0.018173559 = queryNorm
              0.21711598 = fieldWeight in 1369, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.4738557 = idf(docFreq=3669, maxDocs=43556)
                0.0625 = fieldNorm(doc=1369)
          0.010209612 = weight(abstract_txt:that in 1369) [ClassicSimilarity], result of:
            0.010209612 = score(doc=1369,freq=2.0), product of:
              0.0484981 = queryWeight, product of:
                1.1204573 = boost
                2.3817132 = idf(docFreq=10938, maxDocs=43556)
                0.018173559 = queryNorm
              0.2105157 = fieldWeight in 1369, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                2.3817132 = idf(docFreq=10938, maxDocs=43556)
                0.0625 = fieldNorm(doc=1369)
          0.026877545 = weight(abstract_txt:data in 1369) [ClassicSimilarity], result of:
            0.026877545 = score(doc=1369,freq=1.0), product of:
              0.12822358 = queryWeight, product of:
                2.1037118 = boost
                3.3538349 = idf(docFreq=4137, maxDocs=43556)
                0.018173559 = queryNorm
              0.20961468 = fieldWeight in 1369, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.3538349 = idf(docFreq=4137, maxDocs=43556)
                0.0625 = fieldNorm(doc=1369)
          0.29392362 = weight(abstract_txt:genomic in 1369) [ClassicSimilarity], result of:
            0.29392362 = score(doc=1369,freq=1.0), product of:
              0.50140405 = queryWeight, product of:
                2.9415839 = boost
                9.379218 = idf(docFreq=9, maxDocs=43556)
                0.018173559 = queryNorm
              0.58620113 = fieldWeight in 1369, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                9.379218 = idf(docFreq=9, maxDocs=43556)
                0.0625 = fieldNorm(doc=1369)
        0.24 = coord(6/25)
    
  5. Tsai, R.T.-H.; Chiu, B.; Wu, C.-E.: Visual webpage block importance prediction using conditional random fields (2011) 0.09
    0.09471325 = sum of:
      0.09471325 = product of:
        0.39463854 = sum of:
          0.020214958 = weight(abstract_txt:based in 1922) [ClassicSimilarity], result of:
            0.020214958 = score(doc=1922,freq=3.0), product of:
              0.058358796 = queryWeight, product of:
                1.0035534 = boost
                3.1998224 = idf(docFreq=4826, maxDocs=43556)
                0.018173559 = queryNorm
              0.34639093 = fieldWeight in 1922, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                3.1998224 = idf(docFreq=4826, maxDocs=43556)
                0.0625 = fieldNorm(doc=1922)
          0.11348888 = weight(abstract_txt:sequence in 1922) [ClassicSimilarity], result of:
            0.11348888 = score(doc=1922,freq=4.0), product of:
              0.1329343 = queryWeight, product of:
                1.0710033 = boost
                6.8297725 = idf(docFreq=127, maxDocs=43556)
                0.018173559 = queryNorm
              0.85372156 = fieldWeight in 1922, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                6.8297725 = idf(docFreq=127, maxDocs=43556)
                0.0625 = fieldNorm(doc=1922)
          0.080525525 = weight(abstract_txt:tree in 1922) [ClassicSimilarity], result of:
            0.080525525 = score(doc=1922,freq=2.0), product of:
              0.13323978 = queryWeight, product of:
                1.0722332 = boost
                6.8376155 = idf(docFreq=126, maxDocs=43556)
                0.018173559 = queryNorm
              0.6043655 = fieldWeight in 1922, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                6.8376155 = idf(docFreq=126, maxDocs=43556)
                0.0625 = fieldNorm(doc=1922)
          0.0072192852 = weight(abstract_txt:that in 1922) [ClassicSimilarity], result of:
            0.0072192852 = score(doc=1922,freq=1.0), product of:
              0.0484981 = queryWeight, product of:
                1.1204573 = boost
                2.3817132 = idf(docFreq=10938, maxDocs=43556)
                0.018173559 = queryNorm
              0.14885707 = fieldWeight in 1922, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                2.3817132 = idf(docFreq=10938, maxDocs=43556)
                0.0625 = fieldNorm(doc=1922)
          0.026877545 = weight(abstract_txt:data in 1922) [ClassicSimilarity], result of:
            0.026877545 = score(doc=1922,freq=1.0), product of:
              0.12822358 = queryWeight, product of:
                2.1037118 = boost
                3.3538349 = idf(docFreq=4137, maxDocs=43556)
                0.018173559 = queryNorm
              0.20961468 = fieldWeight in 1922, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                3.3538349 = idf(docFreq=4137, maxDocs=43556)
                0.0625 = fieldNorm(doc=1922)
          0.14631233 = weight(abstract_txt:sequences in 1922) [ClassicSimilarity], result of:
            0.14631233 = score(doc=1922,freq=1.0), product of:
              0.31493345 = queryWeight, product of:
                2.3312922 = boost
                7.4333076 = idf(docFreq=69, maxDocs=43556)
                0.018173559 = queryNorm
              0.46458173 = fieldWeight in 1922, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                7.4333076 = idf(docFreq=69, maxDocs=43556)
                0.0625 = fieldNorm(doc=1922)
        0.24 = coord(6/25)
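
The score breakdowns above follow the classic TF-IDF pattern that the listing labels ClassicSimilarity: each matching term contributes queryWeight (boost x idf x queryNorm) times fieldWeight (tf x idf x fieldNorm), and the sum is scaled by the coord factor. The following is a minimal sketch that recomputes the first entry (doc=904) from the numbers printed in the listing. It assumes tf = sqrt(freq) and idf = 1 + ln(maxDocs / (docFreq + 1)), which reproduce the printed values; the function and variable names are illustrative only and are not part of any library.

```python
import math


def idf(doc_freq, max_docs):
    # Assumed idf formula; it reproduces the idf values printed in the listing.
    return 1.0 + math.log(max_docs / (doc_freq + 1))


def term_weight(freq, doc_freq, boost, query_norm, field_norm, max_docs):
    # queryWeight = boost * idf * queryNorm
    # fieldWeight = tf * idf * fieldNorm, with tf assumed to be sqrt(freq)
    term_idf = idf(doc_freq, max_docs)
    query_weight = boost * term_idf * query_norm
    field_weight = math.sqrt(freq) * term_idf * field_norm
    return query_weight * field_weight


def classic_score(terms, matched, total, query_norm, field_norm, max_docs):
    # Sum the per-term weights, then scale by coord = matched / total.
    weight_sum = sum(
        term_weight(freq, doc_freq, boost, query_norm, field_norm, max_docs)
        for freq, doc_freq, boost in terms
    )
    return weight_sum * (matched / total)


# (freq, docFreq, boost) for each matching term of doc=904, copied from the listing.
TERMS_904 = [
    (1.0, 127, 1.0710033),    # sequence
    (2.0, 10938, 1.1204573),  # that
    (2.0, 4137, 2.1037118),   # data
    (2.0, 69, 2.3312922),     # sequences
    (3.0, 9, 2.9415839),      # genomic
]

score_904 = classic_score(
    TERMS_904,
    matched=5,
    total=25,
    query_norm=0.018173559,
    field_norm=0.0625,
    max_docs=43556,
)
print(score_904)
```

Under these assumptions the result should agree with the 0.16419442 shown for the first entry, up to floating-point rounding. The same function can be applied to the other entries by substituting their term lists, fieldNorm, and coord values.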