Document (#36631)

Author
Ku, C.-H.
Leroy, G.
Title
¬A crime reports analysis system to identify related crimes
Source
Journal of the American Society for Information Science and Technology. 62(2011) no.8, pp.1533-1547
Year
2011
Abstract
The popularity of online and anonymous options to report crimes, such as tips websites and text messaging, has led to an increasing amount of textual information available to law enforcement personnel. However, locating, filtering, extracting, and combining information to solve crimes is a time-consuming task. In response, we are developing entity and document similarity algorithms to automatically identify overlapping and complementary information. These are essential components for systems that combine and contrast crime information. The entity similarity algorithm integrates a domain-specific hierarchical lexicon with Jaccard coefficients. The document similarity algorithm combines the entity similarity scores using a Dice coefficient. We describe the evaluation of both components. To evaluate the entity similarity algorithm, we compared the new algorithm and four generic algorithms with a gold standard. The strongest correlation with the gold standard, r = 0.710, was found with our entity similarity algorithm. To evaluate the document similarity algorithm, we first developed a test bed containing witness reports for 17 crimes shown in video clips. We evaluated five versions of the algorithm that differ in how much importance is assigned to different entity types. Cosine similarity was then used as a baseline to evaluate how accurately the document similarity algorithms recognize reports describing the same crime and distinguish them from reports on different crimes. The best version achieved 92% accuracy.
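The abstract names the building blocks without giving their formulas. The Python sketch below shows one way they could fit together, under stated assumptions. First, each word of an entity description is expanded with its ancestors in a hierarchical lexicon before the Jaccard coefficient is taken. Next, each entity in one report is matched to its most similar entity of the same type in the other report. Finally, the type-weighted best-match scores are combined in the Dice form 2|A ∩ B| / (|A| + |B|). The toy `LEXICON`, the entity types, the weights, and every function name are hypothetical, and this is not the authors' implementation. A term-frequency cosine similarity is included as a stand-in for the baseline the abstract mentions.

```python
import math
from collections import Counter
from typing import Dict, List, Optional, Set, Tuple

# Hypothetical toy lexicon: term -> parent term (None marks a top-level concept).
# It stands in for the authors' domain-specific lexicon, which is not reproduced here.
LEXICON: Dict[str, Optional[str]] = {
    "maroon": "red", "red": "color", "blue": "color", "color": None,
    "sedan": "car", "car": "vehicle", "truck": "vehicle", "vehicle": None,
}

def expand(term: str) -> Set[str]:
    """Return the term together with all of its ancestors in LEXICON."""
    expanded: Set[str] = set()
    current: Optional[str] = term
    while current is not None and current not in expanded:
        expanded.add(current)
        current = LEXICON.get(current)  # terms missing from the lexicon have no parent
    return expanded

def jaccard(a: Set[str], b: Set[str]) -> float:
    """Jaccard coefficient |A ∩ B| / |A ∪ B|, defined as 0.0 for two empty sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def entity_similarity(entity_a: str, entity_b: str) -> float:
    """Jaccard over the lexicon-expanded word sets of two entity descriptions."""
    terms_a = set().union(*(expand(w) for w in entity_a.lower().split()))
    terms_b = set().union(*(expand(w) for w in entity_b.lower().split()))
    return jaccard(terms_a, terms_b)

# An entity is (entity_type, description), e.g. ("vehicle", "red sedan").
Entity = Tuple[str, str]

def best_match(entity: Entity, others: List[Entity]) -> float:
    """Highest similarity to any entity of the same type in the other report."""
    etype, text = entity
    return max((entity_similarity(text, o_text) for o_type, o_text in others if o_type == etype),
               default=0.0)

def document_similarity(doc_a: List[Entity], doc_b: List[Entity],
                        type_weights: Dict[str, float]) -> float:
    """Weighted Dice-style combination: 2|A ∩ B| / (|A| + |B|) with soft matches."""
    def weight(e: Entity) -> float:
        return type_weights.get(e[0], 1.0)  # types without a weight count as 1.0
    overlap = (sum(weight(e) * best_match(e, doc_b) for e in doc_a)
               + sum(weight(e) * best_match(e, doc_a) for e in doc_b))
    total = sum(weight(e) for e in doc_a) + sum(weight(e) for e in doc_b)
    return overlap / total if total else 0.0

def cosine(text_a: str, text_b: str) -> float:
    """Baseline: cosine similarity of raw term-frequency vectors."""
    va, vb = Counter(text_a.lower().split()), Counter(text_b.lower().split())
    dot = sum(count * vb[term] for term, count in va.items())
    norm = math.sqrt(sum(c * c for c in va.values())) * math.sqrt(sum(c * c for c in vb.values()))
    return dot / norm if norm else 0.0

report_1 = [("person", "tall man"), ("vehicle", "maroon sedan"), ("weapon", "knife")]
report_2 = [("person", "tall man"), ("vehicle", "red car")]
weights = {"person": 2.0, "vehicle": 1.0, "weapon": 1.0}  # hypothetical type weights

print(round(entity_similarity("maroon sedan", "red car"), 3))               # 0.667
print(round(document_similarity(report_1, report_2, weights), 3))           # 0.762
print(round(cosine("tall man maroon sedan knife", "tall man red car"), 3))  # 0.447
```

For this pair the lexicon-aware score (0.762) sits well above the bag-of-words cosine baseline (0.447), because "maroon sedan" and "red car" share no words. With 0/1 entity matches and equal weights, `document_similarity` reduces to the ordinary Dice coefficient over shared entities.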

Similar documents (author)

  1. Leroy, G.; Chen, H.: Genescene: an ontology-enhanced integration of linguistic and co-occurrence based relations in biomedical texts (2005) 4.86
    4.85849 = sum of:
      4.85849 = weight(author_txt:leroy in 260) [ClassicSimilarity], result of:
        4.85849 = fieldWeight in 260, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          9.71698 = idf(docFreq=6, maxDocs=42740)
          0.5 = fieldNorm(doc=260)
    
  2. Leroy, S.Y.; Thomas, S.L.: Impact of Web access on cataloging (2004) 4.86
    4.85849 = sum of:
      4.85849 = weight(author_txt:leroy in 657) [ClassicSimilarity], result of:
        4.85849 = fieldWeight in 657, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          9.71698 = idf(docFreq=6, maxDocs=42740)
          0.5 = fieldNorm(doc=657)
    
  3. Kauchak, D.; Leroy, G.; Hogue, A.: Measuring text difficulty using parse-tree frequency (2017) 3.64
    3.6438675 = sum of:
      3.6438675 = weight(author_txt:leroy in 5787) [ClassicSimilarity], result of:
        3.6438675 = fieldWeight in 5787, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          9.71698 = idf(docFreq=6, maxDocs=42740)
          0.375 = fieldNorm(doc=5787)
    
  4. Leroy, G.; Miller, T.; Rosemblat, G.; Browne, A.: ¬A balanced approach to health information evaluation : a vocabulary-based naïve Bayes classifier and readability formulas (2008) 3.04
    3.0365562 = sum of:
      3.0365562 = weight(author_txt:leroy in 3999) [ClassicSimilarity], result of:
        3.0365562 = fieldWeight in 3999, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          9.71698 = idf(docFreq=6, maxDocs=42740)
          0.3125 = fieldNorm(doc=3999)
    
  5. Thirion, B.; Leroy, J.P.; Baudic, F.; Douyère, M.; Piot, J.; Darmoni, S.J.: SDI selecting, describing, and indexing : did you mean automatically? (2001) 2.43
    2.429245 = sum of:
      2.429245 = weight(author_txt:leroy in 199) [ClassicSimilarity], result of:
        2.429245 = fieldWeight in 199, product of:
          1.0 = tf(freq=1.0), with freq of:
            1.0 = termFreq=1.0
          9.71698 = idf(docFreq=6, maxDocs=42740)
          0.25 = fieldNorm(doc=199)
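Each score in the author list is a Lucene ClassicSimilarity explanation for the single term `leroy`. It uses tf = √termFreq and idf = 1 + ln(maxDocs / (docFreq + 1)), and computes fieldWeight = tf × idf × fieldNorm. The Python sketch below recomputes the first entry from the printed inputs. The idf formula is inferred from these numbers rather than taken from Lucene's source (the variant differs between Lucene versions), and fieldNorm is copied from the output as given.

```python
import math

def classic_idf(doc_freq: int, max_docs: int) -> float:
    """idf as 1 + ln(maxDocs / (docFreq + 1)); inferred from the printed values."""
    return 1.0 + math.log(max_docs / (doc_freq + 1))

def field_weight(freq: float, doc_freq: int, max_docs: int, field_norm: float) -> float:
    """fieldWeight = tf * idf * fieldNorm, with tf = sqrt(termFreq)."""
    return math.sqrt(freq) * classic_idf(doc_freq, max_docs) * field_norm

# Entry 1 (doc 260): termFreq=1, docFreq=6, maxDocs=42740, fieldNorm=0.5
print(round(classic_idf(6, 42740), 5))             # 9.71698
print(round(field_weight(1.0, 6, 42740, 0.5), 5))  # 4.85849
```

Because tf and idf are the same for all five entries, the lower scores of entries 3 to 5 come only from their smaller fieldNorm values (0.375, 0.3125, 0.25).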
    

Similar documents (content)

  1. Ellis, D.; Furner-Hines, J.; Willett, P.: Measuring the degree of similarity between objects in text retrieval systems (1993) 0.19
    0.19031687 = sum of:
      0.19031687 = product of:
        0.95158434 = sum of:
          0.1629498 = weight(abstract_txt:coefficients in 6716) [ClassicSimilarity], result of:
            0.1629498 = score(doc=6716,freq=5.0), product of:
              0.11196907 = queryWeight, product of:
                1.0380522 = boost
                8.330686 = idf(docFreq=27, maxDocs=42740)
                0.012947863 = queryNorm
              1.4553108 = fieldWeight in 6716, product of:
                2.236068 = tf(freq=5.0), with freq of:
                  5.0 = termFreq=5.0
                8.330686 = idf(docFreq=27, maxDocs=42740)
                0.078125 = fieldNorm(doc=6716)
          0.0072353766 = weight(abstract_txt:information in 6716) [ClassicSimilarity], result of:
            0.0072353766 = score(doc=6716,freq=1.0), product of:
              0.03811064 = queryWeight, product of:
                1.2112207 = boost
                2.430104 = idf(docFreq=10226, maxDocs=42740)
                0.012947863 = queryNorm
              0.18985188 = fieldWeight in 6716, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                2.430104 = idf(docFreq=10226, maxDocs=42740)
                0.078125 = fieldNorm(doc=6716)
          0.008045596 = weight(abstract_txt:with in 6716) [ClassicSimilarity], result of:
            0.008045596 = score(doc=6716,freq=1.0), product of:
              0.040905118 = queryWeight, product of:
                1.2548419 = boost
                2.5176222 = idf(docFreq=9369, maxDocs=42740)
                0.012947863 = queryNorm
              0.19668923 = fieldWeight in 6716, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                2.5176222 = idf(docFreq=9369, maxDocs=42740)
                0.078125 = fieldNorm(doc=6716)
          0.06850333 = weight(abstract_txt:document in 6716) [ClassicSimilarity], result of:
            0.06850333 = score(doc=6716,freq=3.0), product of:
              0.11826044 = queryWeight, product of:
                2.133634 = boost
                4.280766 = idf(docFreq=1606, maxDocs=42740)
                0.012947863 = queryNorm
              0.57925814 = fieldWeight in 6716, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                4.280766 = idf(docFreq=1606, maxDocs=42740)
                0.078125 = fieldNorm(doc=6716)
          0.70485026 = weight(abstract_txt:similarity in 6716) [ClassicSimilarity], result of:
            0.70485026 = score(doc=6716,freq=10.0), product of:
              0.49075446 = queryWeight, product of:
                6.519639 = boost
                5.8135657 = idf(docFreq=346, maxDocs=42740)
                0.012947863 = queryNorm
              1.4362586 = fieldWeight in 6716, product of:
                3.1622777 = tf(freq=10.0), with freq of:
                  10.0 = termFreq=10.0
                5.8135657 = idf(docFreq=346, maxDocs=42740)
                0.078125 = fieldNorm(doc=6716)
        0.2 = coord(5/25)
    
  2. Hook, P.A.: Using course-subject Co-occurrence (CSCO) to reveal the structure of an academic discipline : a framework to evaluate different inputs of a domain map (2017) 0.18
    0.18438901 = sum of:
      0.18438901 = product of:
        0.6585322 = sum of:
          0.021229604 = weight(abstract_txt:standard in 5325) [ClassicSimilarity], result of:
            0.021229604 = score(doc=5325,freq=1.0), product of:
              0.07193884 = queryWeight, product of:
                1.1767031 = boost
                4.7217007 = idf(docFreq=1033, maxDocs=42740)
                0.012947863 = queryNorm
              0.2951063 = fieldWeight in 5325, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.7217007 = idf(docFreq=1033, maxDocs=42740)
                0.0625 = fieldNorm(doc=5325)
          0.006436477 = weight(abstract_txt:with in 5325) [ClassicSimilarity], result of:
            0.006436477 = score(doc=5325,freq=1.0), product of:
              0.040905118 = queryWeight, product of:
                1.2548419 = boost
                2.5176222 = idf(docFreq=9369, maxDocs=42740)
                0.012947863 = queryNorm
              0.15735139 = fieldWeight in 5325, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                2.5176222 = idf(docFreq=9369, maxDocs=42740)
                0.0625 = fieldNorm(doc=5325)
          0.14196971 = weight(abstract_txt:gold in 5325) [ClassicSimilarity], result of:
            0.14196971 = score(doc=5325,freq=2.0), product of:
              0.20266993 = queryWeight, product of:
                1.975058 = boost
                7.925221 = idf(docFreq=41, maxDocs=42740)
                0.012947863 = queryNorm
              0.70049715 = fieldWeight in 5325, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                7.925221 = idf(docFreq=41, maxDocs=42740)
                0.0625 = fieldNorm(doc=5325)
          0.046879306 = weight(abstract_txt:evaluate in 5325) [ClassicSimilarity], result of:
            0.046879306 = score(doc=5325,freq=1.0), product of:
              0.13964328 = queryWeight, product of:
                2.0078943 = boost
                5.371321 = idf(docFreq=539, maxDocs=42740)
                0.012947863 = queryNorm
              0.33570758 = fieldWeight in 5325, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.371321 = idf(docFreq=539, maxDocs=42740)
                0.0625 = fieldNorm(doc=5325)
          0.057735935 = weight(abstract_txt:algorithms in 5325) [ClassicSimilarity], result of:
            0.057735935 = score(doc=5325,freq=1.0), product of:
              0.16044644 = queryWeight, product of:
                2.1522655 = boost
                5.757529 = idf(docFreq=366, maxDocs=42740)
                0.012947863 = queryNorm
              0.35984555 = fieldWeight in 5325, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.757529 = idf(docFreq=366, maxDocs=42740)
                0.0625 = fieldNorm(doc=5325)
          0.13210627 = weight(abstract_txt:algorithm in 5325) [ClassicSimilarity], result of:
            0.13210627 = score(doc=5325,freq=1.0), product of:
              0.36952215 = queryWeight, product of:
                4.989298 = boost
                5.7200913 = idf(docFreq=380, maxDocs=42740)
                0.012947863 = queryNorm
              0.3575057 = fieldWeight in 5325, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.7200913 = idf(docFreq=380, maxDocs=42740)
                0.0625 = fieldNorm(doc=5325)
          0.25217488 = weight(abstract_txt:similarity in 5325) [ClassicSimilarity], result of:
            0.25217488 = score(doc=5325,freq=2.0), product of:
              0.49075446 = queryWeight, product of:
                6.519639 = boost
                5.8135657 = idf(docFreq=346, maxDocs=42740)
                0.012947863 = queryNorm
              0.51385146 = fieldWeight in 5325, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                5.8135657 = idf(docFreq=346, maxDocs=42740)
                0.0625 = fieldNorm(doc=5325)
        0.28 = coord(7/25)
    
  3. Soulier, L.; Jabeur, L.B.; Tamine, L.; Bahsoun, W.: On ranking relevant entities in heterogeneous networks using a language-based model (2013) 0.16
    0.16313131 = sum of:
      0.16313131 = product of:
        0.6797138 = sum of:
          0.010025632 = weight(abstract_txt:information in 2665) [ClassicSimilarity], result of:
            0.010025632 = score(doc=2665,freq=3.0), product of:
              0.03811064 = queryWeight, product of:
                1.2112207 = boost
                2.430104 = idf(docFreq=10226, maxDocs=42740)
                0.012947863 = queryNorm
              0.26306647 = fieldWeight in 2665, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                2.430104 = idf(docFreq=10226, maxDocs=42740)
                0.0625 = fieldNorm(doc=2665)
          0.006436477 = weight(abstract_txt:with in 2665) [ClassicSimilarity], result of:
            0.006436477 = score(doc=2665,freq=1.0), product of:
              0.040905118 = queryWeight, product of:
                1.2548419 = boost
                2.5176222 = idf(docFreq=9369, maxDocs=42740)
                0.012947863 = queryNorm
              0.15735139 = fieldWeight in 2665, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                2.5176222 = idf(docFreq=9369, maxDocs=42740)
                0.0625 = fieldNorm(doc=2665)
          0.038589098 = weight(abstract_txt:reports in 2665) [ClassicSimilarity], result of:
            0.038589098 = score(doc=2665,freq=1.0), product of:
              0.13499631 = queryWeight, product of:
                2.279613 = boost
                4.5736475 = idf(docFreq=1198, maxDocs=42740)
                0.012947863 = queryNorm
              0.28585297 = fieldWeight in 2665, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.5736475 = idf(docFreq=1198, maxDocs=42740)
                0.0625 = fieldNorm(doc=2665)
          0.21753323 = weight(abstract_txt:entity in 2665) [ClassicSimilarity], result of:
            0.21753323 = score(doc=2665,freq=2.0), product of:
              0.38849285 = queryWeight, product of:
                4.7362795 = boost
                6.3350143 = idf(docFreq=205, maxDocs=42740)
                0.012947863 = queryNorm
              0.5599414 = fieldWeight in 2665, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                6.3350143 = idf(docFreq=205, maxDocs=42740)
                0.0625 = fieldNorm(doc=2665)
          0.2288148 = weight(abstract_txt:algorithm in 2665) [ClassicSimilarity], result of:
            0.2288148 = score(doc=2665,freq=3.0), product of:
              0.36952215 = queryWeight, product of:
                4.989298 = boost
                5.7200913 = idf(docFreq=380, maxDocs=42740)
                0.012947863 = queryNorm
              0.61921805 = fieldWeight in 2665, product of:
                1.7320508 = tf(freq=3.0), with freq of:
                  3.0 = termFreq=3.0
                5.7200913 = idf(docFreq=380, maxDocs=42740)
                0.0625 = fieldNorm(doc=2665)
          0.17831458 = weight(abstract_txt:similarity in 2665) [ClassicSimilarity], result of:
            0.17831458 = score(doc=2665,freq=1.0), product of:
              0.49075446 = queryWeight, product of:
                6.519639 = boost
                5.8135657 = idf(docFreq=346, maxDocs=42740)
                0.012947863 = queryNorm
              0.36334786 = fieldWeight in 2665, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.8135657 = idf(docFreq=346, maxDocs=42740)
                0.0625 = fieldNorm(doc=2665)
        0.24 = coord(6/25)
    
  4. Wu, T.; Pottenger, W.M.: ¬A semi-supervised active learning algorithm for information extraction from textual data (2005) 0.16
    0.16259983 = sum of:
      0.16259983 = product of:
        0.6774993 = sum of:
          0.011576602 = weight(abstract_txt:information in 4238) [ClassicSimilarity], result of:
            0.011576602 = score(doc=4238,freq=4.0), product of:
              0.03811064 = queryWeight, product of:
                1.2112207 = boost
                2.430104 = idf(docFreq=10226, maxDocs=42740)
                0.012947863 = queryNorm
              0.303763 = fieldWeight in 4238, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                2.430104 = idf(docFreq=10226, maxDocs=42740)
                0.0625 = fieldNorm(doc=4238)
          0.024919491 = weight(abstract_txt:identify in 4238) [ClassicSimilarity], result of:
            0.024919491 = score(doc=4238,freq=1.0), product of:
              0.08005005 = queryWeight, product of:
                1.2412692 = boost
                4.980782 = idf(docFreq=797, maxDocs=42740)
                0.012947863 = queryNorm
              0.31129888 = fieldWeight in 4238, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.980782 = idf(docFreq=797, maxDocs=42740)
                0.0625 = fieldNorm(doc=4238)
          0.009102553 = weight(abstract_txt:with in 4238) [ClassicSimilarity], result of:
            0.009102553 = score(doc=4238,freq=2.0), product of:
              0.040905118 = queryWeight, product of:
                1.2548419 = boost
                2.5176222 = idf(docFreq=9369, maxDocs=42740)
                0.012947863 = queryNorm
              0.22252847 = fieldWeight in 4238, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                2.5176222 = idf(docFreq=9369, maxDocs=42740)
                0.0625 = fieldNorm(doc=4238)
          0.054573223 = weight(abstract_txt:reports in 4238) [ClassicSimilarity], result of:
            0.054573223 = score(doc=4238,freq=2.0), product of:
              0.13499631 = queryWeight, product of:
                2.279613 = boost
                4.5736475 = idf(docFreq=1198, maxDocs=42740)
                0.012947863 = queryNorm
              0.40425715 = fieldWeight in 4238, product of:
                1.4142135 = tf(freq=2.0), with freq of:
                  2.0 = termFreq=2.0
                4.5736475 = idf(docFreq=1198, maxDocs=42740)
                0.0625 = fieldNorm(doc=4238)
          0.22780707 = weight(abstract_txt:crime in 4238) [ClassicSimilarity], result of:
            0.22780707 = score(doc=4238,freq=1.0), product of:
              0.40063053 = queryWeight, product of:
                3.4009702 = boost
                9.097941 = idf(docFreq=12, maxDocs=42740)
                0.012947863 = queryNorm
              0.56862134 = fieldWeight in 4238, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                9.097941 = idf(docFreq=12, maxDocs=42740)
                0.0625 = fieldNorm(doc=4238)
          0.34952036 = weight(abstract_txt:algorithm in 4238) [ClassicSimilarity], result of:
            0.34952036 = score(doc=4238,freq=7.0), product of:
              0.36952215 = queryWeight, product of:
                4.989298 = boost
                5.7200913 = idf(docFreq=380, maxDocs=42740)
                0.012947863 = queryNorm
              0.9458712 = fieldWeight in 4238, product of:
                2.6457512 = tf(freq=7.0), with freq of:
                  7.0 = termFreq=7.0
                5.7200913 = idf(docFreq=380, maxDocs=42740)
                0.0625 = fieldNorm(doc=4238)
        0.24 = coord(6/25)
    
  5. Chinenyanga, T.T.; Kushmerick, N.: ¬An expressive and efficient language for XML information retrieval (2002) 0.16
    0.15985607 = sum of:
      0.15985607 = product of:
        0.66606694 = sum of:
          0.005788301 = weight(abstract_txt:information in 1463) [ClassicSimilarity], result of:
            0.005788301 = score(doc=1463,freq=1.0), product of:
              0.03811064 = queryWeight, product of:
                1.2112207 = boost
                2.430104 = idf(docFreq=10226, maxDocs=42740)
                0.012947863 = queryNorm
              0.1518815 = fieldWeight in 1463, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                2.430104 = idf(docFreq=10226, maxDocs=42740)
                0.0625 = fieldNorm(doc=1463)
          0.012872954 = weight(abstract_txt:with in 1463) [ClassicSimilarity], result of:
            0.012872954 = score(doc=1463,freq=4.0), product of:
              0.040905118 = queryWeight, product of:
                1.2548419 = boost
                2.5176222 = idf(docFreq=9369, maxDocs=42740)
                0.012947863 = queryNorm
              0.31470278 = fieldWeight in 1463, product of:
                2.0 = tf(freq=4.0), with freq of:
                  4.0 = termFreq=4.0
                2.5176222 = idf(docFreq=9369, maxDocs=42740)
                0.0625 = fieldNorm(doc=1463)
          0.046879306 = weight(abstract_txt:evaluate in 1463) [ClassicSimilarity], result of:
            0.046879306 = score(doc=1463,freq=1.0), product of:
              0.13964328 = queryWeight, product of:
                2.0078943 = boost
                5.371321 = idf(docFreq=539, maxDocs=42740)
                0.012947863 = queryNorm
              0.33570758 = fieldWeight in 1463, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.371321 = idf(docFreq=539, maxDocs=42740)
                0.0625 = fieldNorm(doc=1463)
          0.031640332 = weight(abstract_txt:document in 1463) [ClassicSimilarity], result of:
            0.031640332 = score(doc=1463,freq=1.0), product of:
              0.11826044 = queryWeight, product of:
                2.133634 = boost
                4.280766 = idf(docFreq=1606, maxDocs=42740)
                0.012947863 = queryNorm
              0.26754788 = fieldWeight in 1463, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                4.280766 = idf(docFreq=1606, maxDocs=42740)
                0.0625 = fieldNorm(doc=1463)
          0.13210627 = weight(abstract_txt:algorithm in 1463) [ClassicSimilarity], result of:
            0.13210627 = score(doc=1463,freq=1.0), product of:
              0.36952215 = queryWeight, product of:
                4.989298 = boost
                5.7200913 = idf(docFreq=380, maxDocs=42740)
                0.012947863 = queryNorm
              0.3575057 = fieldWeight in 1463, product of:
                1.0 = tf(freq=1.0), with freq of:
                  1.0 = termFreq=1.0
                5.7200913 = idf(docFreq=380, maxDocs=42740)
                0.0625 = fieldNorm(doc=1463)
          0.43677977 = weight(abstract_txt:similarity in 1463) [ClassicSimilarity], result of:
            0.43677977 = score(doc=1463,freq=6.0), product of:
              0.49075446 = queryWeight, product of:
                6.519639 = boost
                5.8135657 = idf(docFreq=346, maxDocs=42740)
                0.012947863 = queryNorm
              0.8900169 = fieldWeight in 1463, product of:
                2.4494898 = tf(freq=6.0), with freq of:
                  6.0 = termFreq=6.0
                5.8135657 = idf(docFreq=346, maxDocs=42740)
                0.0625 = fieldNorm(doc=1463)
        0.24 = coord(6/25)
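In the content-based list, each matching query term contributes queryWeight × fieldWeight, where queryWeight = boost × idf × queryNorm and fieldWeight = √termFreq × idf × fieldNorm. The term scores are summed and then multiplied by coord, the fraction of the 25 query terms that occur in the document (5/25 for entry 1). The Python sketch below recomputes entry 1 from the printed boosts, document frequencies, queryNorm, and fieldNorm. It computes idf as 1 + ln(maxDocs / (docFreq + 1)), a formula inferred from the printed values rather than from Lucene's source.

```python
import math
from typing import List, Tuple

def classic_idf(doc_freq: int, max_docs: int) -> float:
    # Inferred from the printed values: 1 + ln(maxDocs / (docFreq + 1)).
    return 1.0 + math.log(max_docs / (doc_freq + 1))

def term_score(freq: float, doc_freq: int, boost: float, *,
               max_docs: int, query_norm: float, field_norm: float) -> float:
    idf = classic_idf(doc_freq, max_docs)
    query_weight = boost * idf * query_norm            # queryWeight
    field_weight = math.sqrt(freq) * idf * field_norm  # fieldWeight
    return query_weight * field_weight

def document_score(term_scores: List[float], total_query_terms: int) -> float:
    coord = len(term_scores) / total_query_terms  # fraction of query terms matched
    return sum(term_scores) * coord

# (termFreq, docFreq, boost) for the five matching terms of entry 1 (doc 6716):
# coefficients, information, with, document, similarity.
ENTRY_1: List[Tuple[float, int, float]] = [
    (5.0, 27, 1.0380522),
    (1.0, 10226, 1.2112207),
    (1.0, 9369, 1.2548419),
    (3.0, 1606, 2.133634),
    (10.0, 346, 6.519639),
]

scores = [term_score(f, df, b, max_docs=42740, query_norm=0.012947863, field_norm=0.078125)
          for f, df, b in ENTRY_1]
total = document_score(scores, total_query_terms=25)
print(round(total, 6))  # 0.190317 (printed: 0.19031687)
```

The same arithmetic applies to the other entries. For entry 2, for example, 0.6585322 × 7/25 = 0.184389, matching the printed 0.18438901.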