Document (#36630)

Author
Ku, C.-H.
Leroy, G.
Title
¬A crime reports analysis system to identify related crimes
Source
Journal of the American Society for Information Science and Technology. 62(2011) no.8, p.1533-1547
Year
2011
Abstract
The popularity of online and anonymous options to report crimes, such as tips websites and text messaging, has led to an increasing amount of textual information available to law enforcement personnel. However, locating, filtering, extracting, and combining information to solve crimes is a time-consuming task. In response, we are developing entity and document similarity algorithms to automatically identify overlapping and complementary information. These are essential components for systems that combine and contrast crime information. The entity similarity algorithm integrates a domain-specific hierarchical lexicon with Jaccard coefficients. The document similarity algorithm combines the entity similarity scores using a Dice coefficient. We describe the evaluation of both components. To evaluate the entity similarity algorithm, we compared the new algorithm and four generic algorithms with a gold standard. The strongest correlation with the gold standard, r = 0.710, was found with our entity similarity algorithm. To evaluate the document similarity algorithm, we first developed a test bed containing witness reports for 17 crimes shown in video clips. We evaluated five versions of the algorithm that differ in how much importance is assigned to different entity types. Cosine similarity is then used as a baseline comparison to evaluate the performance of the document similarity algorithms for accuracy in recognizing reports describing the same crime and distinguishing them from reports on different crimes. The best version achieved 92% accuracy.
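The abstract names Jaccard and Dice coefficients but does not give formulas or say how the hierarchical lexicon is integrated. As a minimal sketch, the following shows only the standard set-based definitions of the two coefficients. The token sets `report_a` and `report_b` are made-up illustrations, not data from the paper, and this is not the authors' algorithm.

```python
def jaccard(a: set, b: set) -> float:
    """Standard Jaccard coefficient: |A & B| / |A | B| (0.0 if both sets are empty)."""
    union = a | b
    if not union:
        return 0.0
    return len(a & b) / len(union)


def dice(a: set, b: set) -> float:
    """Standard Dice coefficient: 2 * |A & B| / (|A| + |B|) (0.0 if both sets are empty)."""
    total = len(a) + len(b)
    if total == 0:
        return 0.0
    return 2 * len(a & b) / total


# Hypothetical entity tokens extracted from two witness reports (illustration only).
report_a = {"red", "car", "knife", "parking", "lot"}
report_b = {"red", "truck", "knife", "parking", "garage"}

jaccard_ab = jaccard(report_a, report_b)  # 3 shared tokens out of 7 distinct tokens
dice_ab = dice(report_a, report_b)        # 2 * 3 shared tokens over 5 + 5 tokens
```

Both coefficients range from 0 (no overlap) to 1 (identical sets). On the same input, Dice is never lower than Jaccard.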

Similar documents (author)

  1. Leroy, G.; Chen, H.: Genescene: an ontology-enhanced integration of linguistic and co-occurrence based relations in biomedical texts (2005) 4.88
    4.8754888 = sum of:
      4.8754888 = weight(author_txt:leroy in 5259) [ClassicSimilarity], result of:
        4.8754888 = fieldWeight in 5259, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          9.7509775 = idf(docFreq=6, maxDocs=44218)
          0.5 = fieldNorm(doc=5259)
    
  2. Leroy, S.Y.; Thomas, S.L.: Impact of Web access on cataloging (2004) 4.88
    4.8754888 = sum of:
      4.8754888 = weight(author_txt:leroy in 5656) [ClassicSimilarity], result of:
        4.8754888 = fieldWeight in 5656, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          9.7509775 = idf(docFreq=6, maxDocs=44218)
          0.5 = fieldNorm(doc=5656)
    
  3. Kauchak, D.; Leroy, G.; Hogue, A.: Measuring text difficulty using parse-tree frequency (2017) 3.66
    3.6566167 = sum of:
      3.6566167 = weight(author_txt:leroy in 3786) [ClassicSimilarity], result of:
        3.6566167 = fieldWeight in 3786, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          9.7509775 = idf(docFreq=6, maxDocs=44218)
          0.375 = fieldNorm(doc=3786)
    
  4. Leroy, G.; Miller, T.; Rosemblat, G.; Browne, A.: ¬A balanced approach to health information evaluation : a vocabulary-based naïve Bayes classifier and readability formulas (2008) 3.05
    3.0471804 = sum of:
      3.0471804 = weight(author_txt:leroy in 1998) [ClassicSimilarity], result of:
        3.0471804 = fieldWeight in 1998, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          9.7509775 = idf(docFreq=6, maxDocs=44218)
          0.3125 = fieldNorm(doc=1998)
    
  5. Thirion, B.; Leroy, J.P.; Baudic, F.; Douyère, M.; Piot, J.; Darmoni, S.J.: SDI selecting, describing, and indexing : did you mean automatically? (2001) 2.44
    2.4377444 = sum of:
      2.4377444 = weight(author_txt:leroy in 6198) [ClassicSimilarity], result of:
        2.4377444 = fieldWeight in 6198, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          9.7509775 = idf(docFreq=6, maxDocs=44218)
          0.25 = fieldNorm(doc=6198)
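
The author scores above can be reproduced from the values printed in each explain tree. The sketch below assumes tf = sqrt(freq) and idf = 1 + ln(maxDocs / (docFreq + 1)); these formulas are inferred from the numbers in the listing and may differ in other Lucene versions. The function names are illustrative, and the fieldNorm values are taken as given rather than derived.

```python
import math


def classic_tf(freq: float) -> float:
    # Assumed tf: square root of the raw term frequency (tf(freq=1.0) = 1.0 in the listing).
    return math.sqrt(freq)


def classic_idf(doc_freq: int, max_docs: int) -> float:
    # Assumed idf, chosen because it reproduces idf(docFreq=6, maxDocs=44218) = 9.7509775.
    return 1.0 + math.log(max_docs / (doc_freq + 1))


def field_weight(freq: float, doc_freq: int, max_docs: int, field_norm: float) -> float:
    # fieldWeight is shown in the listing as the product of tf, idf, and fieldNorm.
    return classic_tf(freq) * classic_idf(doc_freq, max_docs) * field_norm


# Values copied from the listing for author_txt:leroy.
idf_leroy = classic_idf(6, 44218)
weight_5259 = field_weight(1.0, 6, 44218, 0.5)   # result 1, listed as 4.8754888
weight_6198 = field_weight(1.0, 6, 44218, 0.25)  # result 5, listed as 2.4377444
```

Since tf and idf are identical for all five results, the ranking in this list is determined entirely by fieldNorm, which is smaller for records with longer author fields.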
    

Similar documents (content)

  1. Ellis, D.; Furner-Hines, J.; Willett, P.: Measuring the degree of similarity between objects in text retrieval systems (1993) 0.19
    0.19180396 = sum of:
      0.19180396 = product of:
        0.9590198 = sum of:
          0.16548449 = weight(abstract_txt:coefficients in 6716) [ClassicSimilarity], result of:
            0.16548449 = score(doc=6716,freq=5.0), product of:
              0.11324857 = queryWeight, product of:
                1.0445398 = boost
                8.364683 = idf(docFreq=27, maxDocs=44218)
                0.012961589 = queryNorm
              1.4612501 = fieldWeight in 6716, product of:
                2.236068 = tf(freq=5.0), with freq of:
                  5.0 = termFreq=5.0
                8.364683 = idf(docFreq=27, maxDocs=44218)
                0.078125 = fieldNorm(doc=6716)
          0.007176939 = weight(abstract_txt:information in 6716) [ClassicSimilarity], result of:
            0.007176939 = score(doc=6716,freq=1.0), product of:
              0.03794583 = queryWeight, product of:
                1.2092627 = boost
                2.4209464 = idf(docFreq=10677, maxDocs=44218)
                0.012961589 = queryNorm
              0.18913643 = fieldWeight in 6716, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                2.4209464 = idf(docFreq=10677, maxDocs=44218)
                0.078125 = fieldNorm(doc=6716)
          0.007900686 = weight(abstract_txt:with in 6716) [ClassicSimilarity], result of:
            0.007900686 = score(doc=6716,freq=1.0), product of:
              0.04045583 = queryWeight, product of:
                1.2486168 = boost
                2.4997334 = idf(docFreq=9868, maxDocs=44218)
                0.012961589 = queryNorm
              0.19529167 = fieldWeight in 6716, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                2.4997334 = idf(docFreq=9868, maxDocs=44218)
                0.078125 = fieldNorm(doc=6716)
          0.069296 = weight(abstract_txt:document in 6716) [ClassicSimilarity], result of:
            0.069296 = score(doc=6716,freq=3.0), product of:
              0.11929885 = queryWeight, product of:
                2.144158 = boost
                4.2926083 = idf(docFreq=1642, maxDocs=44218)
                0.012961589 = queryNorm
              0.5808606 = fieldWeight in 6716, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                4.2926083 = idf(docFreq=1642, maxDocs=44218)
                0.078125 = fieldNorm(doc=6716)
          0.7091617 = weight(abstract_txt:similarity in 6716) [ClassicSimilarity], result of:
            0.7091617 = score(doc=6716,freq=10.0), product of:
              0.49328235 = queryWeight, product of:
                6.539999 = boost
                5.8191514 = idf(docFreq=356, maxDocs=44218)
                0.012961589 = queryNorm
              1.4376385 = fieldWeight in 6716, product of:
                3.1622777 = tf(freq=10.0), with freq of:
                  10.0 = termFreq=10.0
                5.8191514 = idf(docFreq=356, maxDocs=44218)
                0.078125 = fieldNorm(doc=6716)
        0.2 = coord(5/25)
    
  2. Hook, P.A.: Using course-subject Co-occurrence (CSCO) to reveal the structure of an academic discipline : a framework to evaluate different inputs of a domain map (2017) 0.18
    0.18329738 = sum of:
      0.18329738 = product of:
        0.6546335 = sum of:
          0.021345658 = weight(abstract_txt:standard in 3324) [ClassicSimilarity], result of:
            0.021345658 = score(doc=3324,freq=1.0), product of:
              0.07227825 = queryWeight, product of:
                1.1801234 = boost
                4.725219 = idf(docFreq=1065, maxDocs=44218)
                0.012961589 = queryNorm
              0.29532617 = fieldWeight in 3324, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.725219 = idf(docFreq=1065, maxDocs=44218)
                0.0625 = fieldNorm(doc=3324)
          0.006320549 = weight(abstract_txt:with in 3324) [ClassicSimilarity], result of:
            0.006320549 = score(doc=3324,freq=1.0), product of:
              0.04045583 = queryWeight, product of:
                1.2486168 = boost
                2.4997334 = idf(docFreq=9868, maxDocs=44218)
                0.012961589 = queryNorm
              0.15623334 = fieldWeight in 3324, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                2.4997334 = idf(docFreq=9868, maxDocs=44218)
                0.0625 = fieldNorm(doc=3324)
          0.13937743 = weight(abstract_txt:gold in 3324) [ClassicSimilarity], result of:
            0.13937743 = score(doc=3324,freq=2.0), product of:
              0.20041007 = queryWeight, product of:
                1.9650943 = boost
                7.8682456 = idf(docFreq=45, maxDocs=44218)
                0.012961589 = queryNorm
              0.6954612 = fieldWeight in 3324, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                7.8682456 = idf(docFreq=45, maxDocs=44218)
                0.0625 = fieldNorm(doc=3324)
          0.045919675 = weight(abstract_txt:evaluate in 3324) [ClassicSimilarity], result of:
            0.045919675 = score(doc=3324,freq=1.0), product of:
              0.1378788 = queryWeight, product of:
                1.9962642 = boost
                5.3287 = idf(docFreq=582, maxDocs=44218)
                0.012961589 = queryNorm
              0.33304375 = fieldWeight in 3324, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.3287 = idf(docFreq=582, maxDocs=44218)
                0.0625 = fieldNorm(doc=3324)
          0.056437783 = weight(abstract_txt:algorithms in 3324) [ClassicSimilarity], result of:
            0.056437783 = score(doc=3324,freq=1.0), product of:
              0.15820187 = queryWeight, product of:
                2.1383317 = boost
                5.707926 = idf(docFreq=398, maxDocs=44218)
                0.012961589 = queryNorm
              0.35674536 = fieldWeight in 3324, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.707926 = idf(docFreq=398, maxDocs=44218)
                0.0625 = fieldNorm(doc=3324)
          0.13151501 = weight(abstract_txt:algorithm in 3324) [ClassicSimilarity], result of:
            0.13151501 = score(doc=3324,freq=1.0), product of:
              0.36881408 = queryWeight, product of:
                4.987253 = boost
                5.705423 = idf(docFreq=399, maxDocs=44218)
                0.012961589 = queryNorm
              0.35658893 = fieldWeight in 3324, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.705423 = idf(docFreq=399, maxDocs=44218)
                0.0625 = fieldNorm(doc=3324)
          0.2537174 = weight(abstract_txt:similarity in 3324) [ClassicSimilarity], result of:
            0.2537174 = score(doc=3324,freq=2.0), product of:
              0.49328235 = queryWeight, product of:
                6.539999 = boost
                5.8191514 = idf(docFreq=356, maxDocs=44218)
                0.012961589 = queryNorm
              0.51434517 = fieldWeight in 3324, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                5.8191514 = idf(docFreq=356, maxDocs=44218)
                0.0625 = fieldNorm(doc=3324)
        0.28 = coord(7/25)
    
  3. Wu, T.; Pottenger, W.M.: ¬A semi-supervised active learning algorithm for information extraction from textual data (2005) 0.16
    0.16296831 = sum of:
      0.16296831 = product of:
        0.67903465 = sum of:
          0.011483102 = weight(abstract_txt:information in 3237) [ClassicSimilarity], result of:
            0.011483102 = score(doc=3237,freq=4.0), product of:
              0.03794583 = queryWeight, product of:
                1.2092627 = boost
                2.4209464 = idf(docFreq=10677, maxDocs=44218)
                0.012961589 = queryNorm
              0.3026183 = fieldWeight in 3237, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                2.4209464 = idf(docFreq=10677, maxDocs=44218)
                0.0625 = fieldNorm(doc=3237)
          0.024372842 = weight(abstract_txt:identify in 3237) [ClassicSimilarity], result of:
            0.024372842 = score(doc=3237,freq=1.0), product of:
              0.07895967 = queryWeight, product of:
                1.2334635 = boost
                4.9387927 = idf(docFreq=860, maxDocs=44218)
                0.012961589 = queryNorm
              0.30867454 = fieldWeight in 3237, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.9387927 = idf(docFreq=860, maxDocs=44218)
                0.0625 = fieldNorm(doc=3237)
          0.008938607 = weight(abstract_txt:with in 3237) [ClassicSimilarity], result of:
            0.008938607 = score(doc=3237,freq=2.0), product of:
              0.04045583 = queryWeight, product of:
                1.2486168 = boost
                2.4997334 = idf(docFreq=9868, maxDocs=44218)
                0.012961589 = queryNorm
              0.22094731 = fieldWeight in 3237, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                2.4997334 = idf(docFreq=9868, maxDocs=44218)
                0.0625 = fieldNorm(doc=3237)
          0.055171326 = weight(abstract_txt:reports in 3237) [ClassicSimilarity], result of:
            0.055171326 = score(doc=3237,freq=2.0), product of:
              0.13612677 = queryWeight, product of:
                2.290395 = boost
                4.5853753 = idf(docFreq=1225, maxDocs=44218)
                0.012961589 = queryNorm
              0.40529373 = fieldWeight in 3237, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.5853753 = idf(docFreq=1225, maxDocs=44218)
                0.0625 = fieldNorm(doc=3237)
          0.23111273 = weight(abstract_txt:crime in 3237) [ClassicSimilarity], result of:
            0.23111273 = score(doc=3237,freq=1.0), product of:
              0.4049309 = queryWeight, product of:
                3.4210522 = boost
                9.131938 = idf(docFreq=12, maxDocs=44218)
                0.012961589 = queryNorm
              0.5707461 = fieldWeight in 3237, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                9.131938 = idf(docFreq=12, maxDocs=44218)
                0.0625 = fieldNorm(doc=3237)
          0.34795603 = weight(abstract_txt:algorithm in 3237) [ClassicSimilarity], result of:
            0.34795603 = score(doc=3237,freq=7.0), product of:
              0.36881408 = queryWeight, product of:
                4.987253 = boost
                5.705423 = idf(docFreq=399, maxDocs=44218)
                0.012961589 = queryNorm
              0.9434456 = fieldWeight in 3237, product of:
                2.6457512 = tf(freq=7.0), with freq of:
                  7.0 = termFreq=7.0
                5.705423 = idf(docFreq=399, maxDocs=44218)
                0.0625 = fieldNorm(doc=3237)
        0.24 = coord(6/25)
    
  4. Soulier, L.; Jabeur, L.B.; Tamine, L.; Bahsoun, W.: On ranking relevant entities in heterogeneous networks using a language-based model (2013) 0.16
    0.1619282 = sum of:
      0.1619282 = product of:
        0.67470086 = sum of:
          0.009944658 = weight(abstract_txt:information in 664) [ClassicSimilarity], result of:
            0.009944658 = score(doc=664,freq=3.0), product of:
              0.03794583 = queryWeight, product of:
                1.2092627 = boost
                2.4209464 = idf(docFreq=10677, maxDocs=44218)
                0.012961589 = queryNorm
              0.26207513 = fieldWeight in 664, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                2.4209464 = idf(docFreq=10677, maxDocs=44218)
                0.0625 = fieldNorm(doc=664)
          0.006320549 = weight(abstract_txt:with in 664) [ClassicSimilarity], result of:
            0.006320549 = score(doc=664,freq=1.0), product of:
              0.04045583 = queryWeight, product of:
                1.2486168 = boost
                2.4997334 = idf(docFreq=9868, maxDocs=44218)
                0.012961589 = queryNorm
              0.15623334 = fieldWeight in 664, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                2.4997334 = idf(docFreq=9868, maxDocs=44218)
                0.0625 = fieldNorm(doc=664)
          0.039012022 = weight(abstract_txt:reports in 664) [ClassicSimilarity], result of:
            0.039012022 = score(doc=664,freq=1.0), product of:
              0.13612677 = queryWeight, product of:
                2.290395 = boost
                4.5853753 = idf(docFreq=1225, maxDocs=44218)
                0.012961589 = queryNorm
              0.28658596 = fieldWeight in 664, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.5853753 = idf(docFreq=1225, maxDocs=44218)
                0.0625 = fieldNorm(doc=664)
          0.21222766 = weight(abstract_txt:entity in 664) [ClassicSimilarity], result of:
            0.21222766 = score(doc=664,freq=2.0), product of:
              0.38256007 = queryWeight, product of:
                4.702557 = boost
                6.2763524 = idf(docFreq=225, maxDocs=44218)
                0.012961589 = queryNorm
              0.5547564 = fieldWeight in 664, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                6.2763524 = idf(docFreq=225, maxDocs=44218)
                0.0625 = fieldNorm(doc=664)
          0.22779068 = weight(abstract_txt:algorithm in 664) [ClassicSimilarity], result of:
            0.22779068 = score(doc=664,freq=3.0), product of:
              0.36881408 = queryWeight, product of:
                4.987253 = boost
                5.705423 = idf(docFreq=399, maxDocs=44218)
                0.012961589 = queryNorm
              0.6176301 = fieldWeight in 664, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                5.705423 = idf(docFreq=399, maxDocs=44218)
                0.0625 = fieldNorm(doc=664)
          0.17940529 = weight(abstract_txt:similarity in 664) [ClassicSimilarity], result of:
            0.17940529 = score(doc=664,freq=1.0), product of:
              0.49328235 = queryWeight, product of:
                6.539999 = boost
                5.8191514 = idf(docFreq=356, maxDocs=44218)
                0.012961589 = queryNorm
              0.36369696 = fieldWeight in 664, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.8191514 = idf(docFreq=356, maxDocs=44218)
                0.0625 = fieldNorm(doc=664)
        0.24 = coord(6/25)
    
  5. Chinenyanga, T.T.; Kushmerick, N.: ¬An expressive and efficient language for XML information retrieval (2002) 0.16
    0.16014604 = sum of:
      0.16014604 = product of:
        0.6672752 = sum of:
          0.005741551 = weight(abstract_txt:information in 462) [ClassicSimilarity], result of:
            0.005741551 = score(doc=462,freq=1.0), product of:
              0.03794583 = queryWeight, product of:
                1.2092627 = boost
                2.4209464 = idf(docFreq=10677, maxDocs=44218)
                0.012961589 = queryNorm
              0.15130915 = fieldWeight in 462, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                2.4209464 = idf(docFreq=10677, maxDocs=44218)
                0.0625 = fieldNorm(doc=462)
          0.012641098 = weight(abstract_txt:with in 462) [ClassicSimilarity], result of:
            0.012641098 = score(doc=462,freq=4.0), product of:
              0.04045583 = queryWeight, product of:
                1.2486168 = boost
                2.4997334 = idf(docFreq=9868, maxDocs=44218)
                0.012961589 = queryNorm
              0.31246668 = fieldWeight in 462, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                2.4997334 = idf(docFreq=9868, maxDocs=44218)
                0.0625 = fieldNorm(doc=462)
          0.045919675 = weight(abstract_txt:evaluate in 462) [ClassicSimilarity], result of:
            0.045919675 = score(doc=462,freq=1.0), product of:
              0.1378788 = queryWeight, product of:
                1.9962642 = boost
                5.3287 = idf(docFreq=582, maxDocs=44218)
                0.012961589 = queryNorm
              0.33304375 = fieldWeight in 462, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.3287 = idf(docFreq=582, maxDocs=44218)
                0.0625 = fieldNorm(doc=462)
          0.032006454 = weight(abstract_txt:document in 462) [ClassicSimilarity], result of:
            0.032006454 = score(doc=462,freq=1.0), product of:
              0.11929885 = queryWeight, product of:
                2.144158 = boost
                4.2926083 = idf(docFreq=1642, maxDocs=44218)
                0.012961589 = queryNorm
              0.26828802 = fieldWeight in 462, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.2926083 = idf(docFreq=1642, maxDocs=44218)
                0.0625 = fieldNorm(doc=462)
          0.13151501 = weight(abstract_txt:algorithm in 462) [ClassicSimilarity], result of:
            0.13151501 = score(doc=462,freq=1.0), product of:
              0.36881408 = queryWeight, product of:
                4.987253 = boost
                5.705423 = idf(docFreq=399, maxDocs=44218)
                0.012961589 = queryNorm
              0.35658893 = fieldWeight in 462, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.705423 = idf(docFreq=399, maxDocs=44218)
                0.0625 = fieldNorm(doc=462)
          0.43945143 = weight(abstract_txt:similarity in 462) [ClassicSimilarity], result of:
            0.43945143 = score(doc=462,freq=6.0), product of:
              0.49328235 = queryWeight, product of:
                6.539999 = boost
                5.8191514 = idf(docFreq=356, maxDocs=44218)
                0.012961589 = queryNorm
              0.890872 = fieldWeight in 462, product of:
                2.4494898 = tf(freq=6.0), with freq of:
                  6.0 = termFreq=6.0
                5.8191514 = idf(docFreq=356, maxDocs=44218)
                0.0625 = fieldNorm(doc=462)
        0.24 = coord(6/25)
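
The content scores above combine a query-side and a document-side weight per term, then scale the sum by the coord factor. As a hedged sketch, the code below recomputes result 1 (doc 6716) from the boost, freq, docFreq, queryNorm, and fieldNorm values printed in its explain tree. The tf and idf formulas are assumptions inferred from the listed numbers, and all function and variable names are illustrative.

```python
import math

MAX_DOCS = 44218          # maxDocs as printed in the listing
QUERY_NORM = 0.012961589  # queryNorm as printed in the listing


def classic_idf(doc_freq: int, max_docs: int = MAX_DOCS) -> float:
    # Assumed idf; it reproduces e.g. idf(docFreq=27, maxDocs=44218) = 8.364683.
    return 1.0 + math.log(max_docs / (doc_freq + 1))


def term_score(boost: float, freq: float, doc_freq: int, field_norm: float) -> float:
    idf = classic_idf(doc_freq)
    query_weight = boost * idf * QUERY_NORM            # queryWeight = boost * idf * queryNorm
    field_weight = math.sqrt(freq) * idf * field_norm  # fieldWeight = tf * idf * fieldNorm
    return query_weight * field_weight


# (term, boost, freq, docFreq) copied from the explain tree of doc 6716.
terms_6716 = [
    ("coefficients", 1.0445398, 5.0, 27),
    ("information", 1.2092627, 1.0, 10677),
    ("with", 1.2486168, 1.0, 9868),
    ("document", 2.144158, 3.0, 1642),
    ("similarity", 6.539999, 10.0, 356),
]
FIELD_NORM_6716 = 0.078125

term_sum = sum(term_score(b, f, df, FIELD_NORM_6716) for _, b, f, df in terms_6716)
coord = 5 / 25            # 5 of the 25 query terms matched this document
score_6716 = term_sum * coord  # listed as 0.19180396
```

The coord factor explains why doc 6716 ranks first despite matching only five terms: its term sum (listed as 0.9590198) is large enough to offset the lower coord of 0.2, compared with 0.24 or 0.28 for the other results.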