1. Begin by collecting crime data from the STL Metropolitan Police Website

2. Look at the Data Values

3. Adjust Data Structures to Match that Needed for Analysis

Rows: 261
Columns: 15
$ Complaint       <chr> "20-000005", "20-000030", "20-000083", "20-000204", "2~
$ CodedMonth      <chr> "2020-01", "2020-01", "2020-01", "2020-01", "2020-01",~
$ DateOccur       <chr> "1/1/2020 0:18", "1/1/2020 2:40", "1/1/2020 10:57", "1~
$ Crime           <int> 10000, 10000, 10000, 10000, 10000, 10000, 10000, 10000~
$ District        <int> 3, 6, 5, 3, 1, 4, 2, 4, 5, 6, 5, 5, 5, 2, 4, 6, 6, 4, ~
$ Description     <chr> "HOMICIDE", "HOMICIDE", "HOMICIDE", "HOMICIDE", "HOMIC~
$ ILEADSAddress   <chr> "3004", "5470", "1219", "4114", "4401", "1517", "4000"~
$ Neighborhood    <int> 22, 72, 53, 16, 16, 36, 27, 60, 38, 74, 48, 54, 50, 28~
$ LocationName    <chr> "", "", "", "SOUTH GANGWAY", "", "", "", "", "BERNARD ~
$ LocationComment <chr> "", "", "", "", "OUTSIDE", "IN STREET", "REAR ALLEY", ~
$ CADAddress      <chr> "", "5406", "1219", "4114", "", "", "4011", "2507", "4~
$ CADStreet       <chr> "", "GENEVIEVE", "EUCLID", "MINNESOTA", "", "", "SHAW"~
$ XCoord          <dbl> 899171.6, 892799.8, 888944.8, 895415.3, 891839.7, 9056~
$ YCoord          <dbl> 1007325.0, 1043342.0, 1028564.0, 1000658.0, 1000169.0,~

4. Prepare Data for Manipulating Date/time Fields

Classes 'data.table' and 'data.frame':  261 obs. of  15 variables:
 $ Complaint      : chr  "20-000005" "20-000030" "20-000083" "20-000204" ...
 $ CodedMonth     : Date, format: "2020-01-28" "2020-01-28" ...
 $ DateOccur      : POSIXct, format: "2020-01-01 00:18:00" "2020-01-01 02:40:00" ...
 $ Crime          : int  10000 10000 10000 10000 10000 10000 10000 10000 10000 10000 ...
 $ District       : int  3 6 5 3 1 4 2 4 5 6 ...
 $ Description    : chr  "HOMICIDE" "HOMICIDE" "HOMICIDE" "HOMICIDE" ...
 $ ILEADSAddress  : chr  "3004" "5470" "1219" "4114" ...
 $ Neighborhood   : int  22 72 53 16 16 36 27 60 38 74 ...
 $ LocationName   : chr  "" "" "" "SOUTH GANGWAY" ...
 $ LocationComment: chr  "" "" "" "" ...
 $ CADAddress     : chr  "" "5406" "1219" "4114" ...
 $ CADStreet      : chr  "" "GENEVIEVE" "EUCLID" "MINNESOTA" ...
 $ XCoord         : num  899172 892800 888945 895415 891840 ...
 $ YCoord         : num  1007325 1043342 1028564 1000658 1000169 ...
 - attr(*, ".internal.selfref")=<externalptr> 

5. Review Reporting Delays

     Reporting.diff  YCoord   XCoord   CADStreet CADAddress LocationComment
  1:       164 days       0      0.0   BELLERIVE        112            <NA>
  2:       150 days       0      0.0       WELLS       5203  BOARDING HOUSE
  3:       127 days       0      0.0 VANDEVENTER       2822            <NA>
  4:       125 days       0      0.0        <NA>       <NA>                
  5:        51 days       0      0.0      DELMAR       5453            <NA>
257:        -3 days 1027233 905683.5      HEBERT       1922            <NA>
258:        -3 days 1025961 906344.3        <NA>       <NA>            <NA>
259:        -3 days 1043190 886571.2   STRATFORD       6335       RESIDENCE
260:        -3 days 1024302 907688.3     MADISON       1306            <NA>
261:        -3 days 1026529 907935.3        10TH       2712            <NA>
       LocationName Neighborhood  ILEADSStreet ILEADSAddress Description
  1:           <NA>            1     BELLERIVE           112    HOMICIDE
  2:                          51          <NA>          5203    HOMICIDE
  3: ZX GAS STATION           56   VANDEVENTER          2821    HOMICIDE
  4: BP GAS STATION           64         GRAND           209    HOMICIDE
  5:           <NA>           49        DELMAR          5453    HOMICIDE
257:           <NA>           63     HEBERT ST          1922    HOMICIDE
258:           <NA>           63  ST LOUIS AVE          1420    HOMICIDE
259:           <NA>           70 STRATFORD AVE          6339    HOMICIDE
260:           <NA>           63    MADISON ST          1306    HOMICIDE
261:           <NA>           64     N 10TH ST          2712    HOMICIDE
     District Crime           DateOccur CodedMonth Complaint
  1:        1 10000 2020-02-15 22:30:00 2020-07-28 20-007630
  2:        5 10000 2020-07-01 00:01:00 2020-11-28 20-039980
  3:        5 10000 2020-05-24 01:14:00 2020-09-28 20-021821
  4:        6 10000 2020-07-26 02:40:00 2020-11-28 20-032905
  5:        5 10000 2020-05-08 14:00:00 2020-06-28 20-019553
257:        4 10000 2020-03-31 05:00:00 2020-03-28 20-014426
258:        4 10000 2020-07-31 22:30:00 2020-07-28 20-033932
259:        6 10000 2020-08-31 08:25:00 2020-08-28 20-039270
260:        4 10000 2020-08-31 18:26:00 2020-08-28 20-039382
261:        4 10000 2020-05-31 02:43:00 2020-05-28 20-023001

6. Bring in the Neighborhood Details

OGR data source with driver: ESRI Shapefile 
Source: "C:\Users\jim_PC_dell\Desktop\Crime-master\St Louis Shape files\nbrhds_wards\BND_Nhd88_cw.shp", layer: "BND_Nhd88_cw"
with 88 features
It has 6 fields
Integer64 fields read as strings:  NHD_NUM 

7. Look at the data frame after adding in Neighborhood data

Rows: 88
Columns: 6
$ NHD_NUM    <chr> "43", "29", "28", "40", "41", "42", "39", "44", "36", "37",~
$ NHD_NAME   <chr> "Franz Park", "Tiffany", "Botanical Heights", "Kings Oak", ~
$ ANGLE      <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,~
$ NHD_NUMTXT <chr> "43 Franz Park", "29 Tiffany", "28 Botanical Heights", "40 ~
$ SHAPE_area <dbl> 11012014, 5887342, 11586012, 4706723, 9245751, 9771242, 179~
$ SHAPE_len  <dbl> 14740.430, 10467.847, 14700.023, 9239.956, 12357.106, 12518~

8. Group by Month and Count Number of Homicides per Month

# A tibble: 12 x 3
# Groups:   CodedMonth [12]
   CodedMonth Crime     n
   <date>     <int> <int>
 1 2020-07-28 10000    47
 2 2020-06-28 10000    31
 3 2020-08-28 10000    30
 4 2020-11-28 10000    24
 5 2020-09-28 10000    21
 6 2020-05-28 10000    20
 7 2020-12-28 10000    20
 8 2020-04-28 10000    18
 9 2020-10-28 10000    15
10 2020-01-28 10000    14
11 2020-03-28 10000    11
12 2020-02-28 10000    10

9. Plot Homicides per Month Using ggplot2 Library

10. Look at Neighborhoods by Name and Count Numbers

# A tibble: 61 x 6
   NHD_NAME         Crime     n cumulative total cumul.percent
   <chr>            <int> <int>      <int> <int>         <dbl>
 1 Baden            10000    15         15   261          5.75
 2 Hamilton Heights 10000    14         29   261         11.1 
 3 Jeff Vanderlou   10000    14         43   261         16.5 
 4 Walnut Park West 10000    11         54   261         20.7 
 5 Carondelet       10000    10         64   261         24.5 
 6 Dutchtown        10000    10         74   261         28.4 
 7 Walnut Park East 10000    10         84   261         32.2 
 8 Greater Ville    10000     9         93   261         35.6 
 9 Wells Goodfellow 10000     8        101   261         38.7 
10 Mount Pleasant   10000     7        108   261         41.4 
# ... with 51 more rows

11. Homicides by Neighborhood

12. Time of Day Homicides

# A tibble: 261 x 4
# Groups:   CodedMonth [12]
   CodedMonth DateOccur           hr.day day.cat  
   <chr>      <dttm>               <int> <fct>    
 1 2020-01-28 2020-01-01 00:18:00      0 night    
 2 2020-01-28 2020-01-01 02:40:00      2 night    
 3 2020-01-28 2020-01-01 10:57:00     10 morning  
 4 2020-01-28 2020-01-02 02:10:00      2 night    
 5 2020-01-28 2020-01-02 12:25:00     12 afternoon
 6 2020-01-28 2020-01-03 21:21:00     21 evening  
 7 2020-01-28 2020-01-09 13:00:00     13 afternoon
 8 2020-01-28 2020-01-14 03:00:00      3 night    
 9 2020-01-28 2020-01-14 12:23:00     12 afternoon
10 2020-01-28 2020-01-18 18:14:00     18 afternoon
# ... with 251 more rows

13. Let’s Look at the Geospatial Aspects of the Homicide Analysis

 Reporting.diff        YCoord            XCoord        CADStreet        
 Length:261        Min.   :      0   Min.   :     0   Length:261        
 Class :difftime   1st Qu.:1003364   1st Qu.:886653   Class :character  
 Mode  :numeric    Median :1026723   Median :893117   Mode  :character  
                   Mean   : 918686   Mean   :801786                     
                   3rd Qu.:1033512   3rd Qu.:897716                     
                   Max.   :1060370   Max.   :909769                     
  CADAddress        LocationComment    LocationName       Neighborhood      
 Length:261         Length:261         Length:261         Length:261        
 Class :character   Class :character   Class :character   Class :character  
 Mode  :character   Mode  :character   Mode  :character   Mode  :character  
 ILEADSStreet       ILEADSAddress      Description           District    
 Length:261         Length:261         Length:261         Min.   :0.000  
 Class :character   Class :character   Class :character   1st Qu.:3.000  
 Mode  :character   Mode  :character   Mode  :character   Median :5.000  
                                                          Mean   :4.157  
                                                          3rd Qu.:6.000  
                                                          Max.   :6.000  
     Crime         DateOccur                     CodedMonth        
 Min.   :10000   Min.   :2020-01-01 00:18:00   Min.   :2020-01-28  
 1st Qu.:10000   1st Qu.:2020-05-21 17:21:00   1st Qu.:2020-05-28  
 Median :10000   Median :2020-07-14 20:47:00   Median :2020-07-28  
 Mean   :10000   Mean   :2020-07-15 09:48:42   Mean   :2020-07-29  
 3rd Qu.:10000   3rd Qu.:2020-09-20 15:07:00   3rd Qu.:2020-09-28  
 Max.   :10000   Max.   :2020-12-28 01:03:00   Max.   :2020-12-28  
  Complaint           NHD_NAME        
 Length:261         Length:261        
 Class :character   Class :character  
 Mode  :character   Mode  :character  

14. Understanding the Geospatial Structure of the Data

15. Must Account For Inconsistent Coordinate Data

    Reporting.diff YCoord XCoord   CADStreet CADAddress LocationComment
 1:       164 days      0      0   BELLERIVE        112            <NA>
 2:       150 days      0      0       WELLS       5203  BOARDING HOUSE
 3:       127 days      0      0 VANDEVENTER       2822            <NA>
 4:       125 days      0      0        <NA>       <NA>                
 5:        51 days      0      0      DELMAR       5453            <NA>
 6:        50 days      0      0        <NA>       <NA>            <NA>
 7:        33 days      0      0     MAFFITT       5376            <NA>
 8:        21 days      0      0       ALICE       4561            <NA>
 9:        21 days      0      0        <NA>          1            <NA>
10:        20 days      0      0        <NA>       4949            <NA>
11:        19 days      0      0        <NA>       <NA>            <NA>
12:        15 days      0      0     CABANNE       5811            <NA>
13:        15 days      0      0        <NA>       <NA>            <NA>
14:        14 days      0      0       VISTA       3635            <NA>
15:        12 days      0      0        <NA>       <NA>            <NA>
16:        11 days      0      0        <NA>       <NA>            <NA>
17:         8 days      0      0        <NA>       <NA>            <NA>
18:         6 days      0      0       ALICE       2145            <NA>
19:         5 days      0      0        <NA>       <NA>            <NA>
20:         4 days      0      0    BROADWAY       8105            <NA>
21:         4 days      0      0      ALDINE       4578            <NA>
22:         4 days      0      0   RIO TINTO       7837            <NA>
23:         3 days      0      0        <NA>       <NA>            <NA>
24:         1 days      0      0    NEWBERRY       4544            <NA>
25:         1 days      0      0    BROADWAY       8200            <NA>
26:         0 days      0      0      SELBER       5831            <NA>
27:        -2 days      0      0    BROADWAY       8551            <NA>
    Reporting.diff YCoord XCoord   CADStreet CADAddress LocationComment
      LocationName Neighborhood             ILEADSStreet ILEADSAddress
 1:           <NA>            1                BELLERIVE           112
 2:                          51                     <NA>          5203
 3: ZX GAS STATION           56              VANDEVENTER          2821
 4: BP GAS STATION           64                    GRAND           209
 5:           <NA>           49                   DELMAR          5453
 6:           <NA>           60                   DODIER          1929
 7:           <NA>           50                  MAFFITT          5372
 8:           <NA>           68                    ALICE          4561
 9:           <NA>            0        UNKNOWN 20-050582             0
10:           <NA>            0                  UNKNOWN             0
11:           <NA>           71                  LILLIAN          5210
12:           <NA>           48                  CABANNE          5811
13:    PARKING LOT           56             NORTH MARKET          3905
14:           <NA>            0 UNKNOWN CITY OF ST LOUIS             0
15:           <NA>           78               BLACKSTONE          1474
16:           <NA>           67                      LEE          3844
17:           <NA>           35               WASHINGTON           405
18:           <NA>           68                    ALICE          2144
19:           <NA>           64                    MOUND           120
20:           <NA>            2                 BROADWAY          8105
21:           <NA>           56                   ALDINE          4576
22:           <NA>            1                RIO SILVA          7859
23:           <NA>           65                     <NA>          <NA>
24:           <NA>           54                 NEWBERRY          4544
25:       Circle K            2                 BROADWAY          8200
26:           <NA>           50               GOODFELLOW          3401
27:           <NA>           74                 BROADWAY          8608
      LocationName Neighborhood             ILEADSStreet ILEADSAddress
    Description District Crime           DateOccur CodedMonth Complaint
 1:    HOMICIDE        1 10000 2020-02-15 22:30:00 2020-07-28 20-007630
 2:    HOMICIDE        5 10000 2020-07-01 00:01:00 2020-11-28 20-039980
 3:    HOMICIDE        5 10000 2020-05-24 01:14:00 2020-09-28 20-021821
 4:    HOMICIDE        6 10000 2020-07-26 02:40:00 2020-11-28 20-032905
 5:    HOMICIDE        5 10000 2020-05-08 14:00:00 2020-06-28 20-019553
 6:    HOMICIDE        4 10000 2020-05-09 16:20:00 2020-06-28 20-019670
 7:    HOMICIDE        5 10000 2020-07-26 20:20:00 2020-08-28 20-033059
 8:    HOMICIDE        6 10000 2020-12-07 20:08:00 2020-12-28 20-055207
 9:    HOMICIDE        0 10000 2020-11-07 09:31:00 2020-11-28 20-050582
10:    HOMICIDE        0 10000 2020-04-08 21:00:00 2020-04-28 20-015525
11:    HOMICIDE        6 10000 2020-12-09 13:26:00 2020-12-28 20-055459
12:    HOMICIDE        5 10000 2020-12-13 03:30:00 2020-12-28 20-056007
13:    HOMICIDE        5 10000 2020-08-13 01:03:00 2020-08-28 20-035970
14:    HOMICIDE        0 10000 2020-07-14 13:12:00 2020-07-28 20-030853
15:    HOMICIDE        5 10000 2020-03-16 20:45:00 2020-03-28 20-012642
16:    HOMICIDE        6 10000 2020-12-17 23:00:00 2020-12-28 20-056813
17:    HOMICIDE        4 10000 2020-12-20 21:26:00 2020-12-28 20-057230
18:    HOMICIDE        6 10000 2020-12-22 09:54:00 2020-12-28 20-057462
19:    HOMICIDE        4 10000 2020-12-23 23:30:00 2020-12-28 20-057726
20:    HOMICIDE        1 10000 2020-12-24 05:10:00 2020-12-28 20-057741
21:    HOMICIDE        5 10000 2020-12-24 12:25:00 2020-12-28 20-057800
22:    HOMICIDE        1 10000 2020-12-24 20:15:00 2020-12-28 20-057833
23:    HOMICIDE        4 10000 2020-12-25 01:39:00 2020-12-28 20-057853
24:    HOMICIDE        5 10000 2020-12-27 14:32:00 2020-12-28 20-058118
25:    HOMICIDE        1 10000 2020-12-27 18:07:00 2020-12-28 20-058138
26:    HOMICIDE        5 10000 2020-12-28 01:03:00 2020-12-28 20-058163
27:    HOMICIDE        6 10000 2020-03-30 12:55:00 2020-03-28 20-014351
    Description District Crime           DateOccur CodedMonth Complaint
 1:              Carondelet
 2:                 Academy
 3:           Greater Ville
 4:   Near North Riverfront
 5:         Visitation Park
 6:         St. Louis Place
 7:        Wells Goodfellow
 8:                O'Fallon
 9:                    <NA>
10:                    <NA>
11:              Mark Twain
12:                West End
13:           Greater Ville
14:                    <NA>
15:        Hamilton Heights
16: Fairground Neighborhood
17:                Downtown
18:                O'Fallon
19:   Near North Riverfront
20:                   Patch
21:           Greater Ville
22:              Carondelet
23:               Hyde Park
24:             Lewis Place
25:                   Patch
26:        Wells Goodfellow
27:                   Baden

16. Complete Records

     Reporting.diff  YCoord   XCoord CADStreet CADAddress LocationComment
  1:        47 days 1001653 884160.5      <NA>       <NA>            <NA>
  2:        37 days 1007223 883071.9    PARKER       5274            <NA>
  3:        27 days 1007325 899171.6                                     
  4:        27 days 1043342 892799.8 GENEVIEVE       5406                
  5:        27 days 1028564 888944.8    EUCLID       1219                
230:        -3 days 1027233 905683.5    HEBERT       1922            <NA>
231:        -3 days 1025961 906344.3      <NA>       <NA>            <NA>
232:        -3 days 1043190 886571.2 STRATFORD       6335       RESIDENCE
233:        -3 days 1024302 907688.3   MADISON       1306            <NA>
234:        -3 days 1026529 907935.3      10TH       2712            <NA>
     LocationName Neighborhood    ILEADSStreet ILEADSAddress Description
  1:         <NA>            5    CHRISTY BLVD          4934    HOMICIDE
  2:         <NA>           14      PARKER AVE          5274    HOMICIDE
  3:                        22 S JEFFERSON AVE          3004    HOMICIDE
  4:                        72   GENEVIEVE AVE          5470    HOMICIDE
  5:                        53    N EUCLID AVE          1219    HOMICIDE
230:         <NA>           63       HEBERT ST          1922    HOMICIDE
231:         <NA>           63    ST LOUIS AVE          1420    HOMICIDE
232:         <NA>           70   STRATFORD AVE          6339    HOMICIDE
233:         <NA>           63      MADISON ST          1306    HOMICIDE
234:         <NA>           64       N 10TH ST          2712    HOMICIDE
     District Crime           DateOccur CodedMonth Complaint
  1:        1 10000 2020-06-11 23:51:00 2020-07-28 20-025203
  2:        2 10000 2020-09-21 05:14:00 2020-10-28 20-042812
  3:        3 10000 2020-01-01 00:18:00 2020-01-28 20-000005
  4:        6 10000 2020-01-01 02:40:00 2020-01-28 20-000030
  5:        5 10000 2020-01-01 10:57:00 2020-01-28 20-000083
230:        4 10000 2020-03-31 05:00:00 2020-03-28 20-014426
231:        4 10000 2020-07-31 22:30:00 2020-07-28 20-033932
232:        6 10000 2020-08-31 08:25:00 2020-08-28 20-039270
233:        4 10000 2020-08-31 18:26:00 2020-08-28 20-039382
234:        4 10000 2020-05-31 02:43:00 2020-05-28 20-023001
  1:                  Bevo Mill
  2:              North Hampton
  3:                Benton Park
  4:           Walnut Park East
  5:              Fountain Park
230:        Old North St. Louis
231:        Old North St. Louis
232: Mark Twain I-70 Industrial
233:        Old North St. Louis
234:      Near North Riverfront

17. Now we need to convert the NAD83 Coordinates to WGS84 Structure

18. Get Incomplete Data Missing Coordinates

19. Combine Map Sets to View the Entire Picture of Homicide Location in St Louis

20. Final Map of Homicides with Neighborhood Overlays

25. Now We Look at These Homicides Plots with Density Contours


21. Another View Using Same Data Set Gives Us Heat Map

22. Here is a Very Interesting View Called a Cluster Map

23. This Illustrates the “Hayden Rectangle” Plotted Out

24. This is the Chief’s Box Overlaid with Homicides

25. View Crime based on Police Districts

26. This Overlays Homicides Within the Police Districts

27. Finally We Look at Police Districts with Crime Clustering

28. Food for Thought

### 1.  Begin by collecting crime data from the STL Metropolitan Police Website 

```{r, include=TRUE}
# Collect St. Louis City crime (UCR) statistics.
# Read the St. Louis police report file, which uses state plane coordinates, with data.table.
library(data.table)
crime <- fread("data/Group2018_2020.csv", stringsAsFactors = FALSE)
```

- The STL Metropolitan Police produces a monthly crime update.  

- The update is stored in CSV format and can be downloaded.  

- Located at .  

- The file provides all crime details collected from the preceding month.  

- Contains locations, neighborhoods, precincts, map coordinates and times of crimes in the St Louis Metropolitan Area.

### 2.  Look at the Data Values 

```{r, include=TRUE}
```


- Again, some fields are irrelevant to our analysis.   

- We will remove these elements using a tidyverse library called *dplyr*.  

- We will also have to restructure certain date/time variables.  

- The flag fields are not needed.  

- The Count field does not appear to be significant to the analysis.

### 3.  Adjust Data Structures to Match that Needed for Analysis 

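```{r, include=TRUE}
# Drop the flag and count fields, keep homicides (UCR code 10000),
# and keep one row per complaint number.
crimeA <- crime %>%
  dplyr::select(-FlagCrime, -FlagUnfounded, -FlagAdministrative, -Count, -FlagCleanup) %>%
  filter(Crime == 10000) %>%
  distinct(Complaint, .keep_all = TRUE)
```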



- I wanted to select a specific crime. In this case we will look at Homicides.  

- Some data fields are not relevant to the analysis, so I dropped the flag and count fields.  

- Homicides are UCR coded as *10000*.  

- Although the STLMPD website states rows are unique, they are *NOT*.  

- During this phase I also wanted to determine data types.   

- The mix is a combination of character strings and integers.  

- I will have to recast some elements so they are easier to manipulate later.  

- "CodedMonth" and "DateOccur" are not date/time elements, so they need to be changed.
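
The filtering and de-duplication steps above can be sketched on a toy table. The rows below are made up for illustration (one complaint number is repeated on purpose, and the non-homicide code is a placeholder); they are not records from the police file.

```r
library(dplyr)

# Hypothetical miniature of the police file; values are illustrative only.
toy <- data.frame(
  Complaint   = c("20-000005", "20-000030", "20-000030", "20-000101"),
  Crime       = c(10000L, 10000L, 10000L, 31111L),
  Description = c("HOMICIDE", "HOMICIDE", "HOMICIDE", "OTHER"),
  stringsAsFactors = FALSE
)

homicides <- toy %>%
  filter(Crime == 10000) %>%               # keep only UCR code 10000 (homicide)
  distinct(Complaint, .keep_all = TRUE)    # keep one row per complaint number

nrow(homicides)
```

Using `.keep_all = TRUE` keeps every column of the first row found for each complaint number, which is why the duplicate check can run on `Complaint` alone.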

### 4.  Prepare Data for Manipulating Date/time Fields 

```{r, include=FALSE}
crimeA$CodedMonth <- str_c(crimeA$CodedMonth, "28", sep = "-") # use stringr to append a day to the year-month string
crimeA$CodedMonth <- as_date(crimeA$CodedMonth) # use lubridate to convert to an actual year-month-day Date
crimeA$DateOccur <- mdy_hm(crimeA$DateOccur) # use lubridate to change the string to a date-time
```

```{r, include=TRUE}
# Result of changing the string values:
# - "CodedMonth" is now a Date and "DateOccur" is now a POSIXct date-time.
# - Check the structure of the data.
str(crimeA)
```


- Need to use some R libraries to convert data types.  

- Used *stringr* and *lubridate* libraries to change data types.  

- Changed "CodedMonth" to a string value closer to one resembling a year/month/day field.  

- Used 28 days as the day value so I do not have to constantly worry about the changing days/month values.  

- Since the data is collected as of the last day of the month, it will not affect the monthly crime perspective.  

- Next I converted the concatenated string into a year/month/day date and "DateOccur" into a POSIX date-time variable.  
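
As a small check of the conversions described above, the sketch below runs them on a single record and computes the reporting delay that is used in the next step. The two input strings are written in the file's formats and correspond to complaint 20-007630 in the printed output.

```r
library(stringr)
library(lubridate)

coded_month <- "2020-07"            # year-month string as stored in the file
date_occur  <- "2/15/2020 22:30"    # month/day/year hour:minute string

coded_date <- as_date(str_c(coded_month, "28", sep = "-"))  # Date with day 28 appended
occurred   <- mdy_hm(date_occur)                            # POSIXct, UTC by default

delay <- coded_date - as_date(occurred)   # difftime measured in days
as.numeric(delay)
```

The result, 164 days, matches the first row of the reporting-delay table.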

Check Final Data Structure

Make Date Structures Compatible and Calculate Reporting Delays

# - An interesting side note is the difference between the reporting date and the actual incident date.
# - Some records were reported well over 30 days after the incident.
crimeB <- crimeA %>% mutate(Reporting.diff = CodedMonth - as_date(DateOccur)) %>%
  dplyr::select(Reporting.diff:Complaint) %>%
  arrange(desc(Reporting.diff)) # assumed final step (lost in the source): longest delay first, as in the printed output
crimeB$Neighborhood <- as_factor(crimeB$Neighborhood) # change to factor for later join

### 5.  Review Reporting Delays 

```{r, include=TRUE}
```

### **6.  Bring in the Neighborhood Details**

```{r, include=TRUE}
### Now join neighborhoods with names
#add neighborhood shapes to a data frame
# From https://www.census.gov/geo/maps-data/data/cbf/cbf_state.html
hoods.sf <- readOGR("St Louis Shape files/nbrhds_wards/BND_Nhd88_cw.shp")
hoods.sf <- spTransform(hoods.sf, CRS("+proj=longlat +datum=WGS84"))
#mapviewOptions(fgb = FALSE)
hoods <- mapview(hoods.sf, map.types = c("OpenStreetMap"),
                layer.name = c("Neighborhoods"),
                alpha.regions = 0.1,
                alpha = 2,
                legend = FALSE,
                zcol = c("NHD_NAME"))
```


- Collected US Census data to bring in geospatial polygons that represent St Louis Neighborhoods.  

- Transformed the shapefile data into the *WGS84* coordinate system.  

- Check to make sure data is a geospatial object.  

- Use census geospatial data to generate a map.  

Convert Neighborhood Details

# - Change SF file into a data frame.
# collect neighborhood details from shape file
hoods.df <- as(hoods.sf, "data.frame")
class(hoods.df) # check class

### 7.  Look at the data frame after adding in Neighborhood data

```{r, include=TRUE}
```


- We have 88 neighborhoods; their names and numbers are character types in R.  

- Only the attribute fields are carried into this data frame, not the polygon shapes.  
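
Because the neighborhood number is read from the shapefile as text while the crime records carry it as an integer, the key types need to agree before the two tables are joined. A minimal sketch, assuming the three neighborhood names shown in the shapefile listing; the complaint-to-neighborhood pairings are invented for illustration.

```r
library(dplyr)

# Illustrative rows only; the pairing of complaints and neighborhoods is made up.
homicides <- data.frame(Complaint    = c("20-000005", "20-000030", "20-050582"),
                        Neighborhood = c(28L, 43L, 0L),
                        stringsAsFactors = FALSE)
hood_names <- data.frame(NHD_NUM  = c("43", "28"),
                         NHD_NAME = c("Franz Park", "Botanical Heights"),
                         stringsAsFactors = FALSE)

joined <- homicides %>%
  mutate(Neighborhood = as.character(Neighborhood)) %>%    # match the key type
  left_join(hood_names, by = c("Neighborhood" = "NHD_NUM"))

joined$NHD_NAME   # a code with no match, such as 0, comes back as NA
```

A left join keeps every homicide record, so unmatched neighborhood codes surface as `NA` names instead of silently dropping rows.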

Clean Up Data - Trim Neighborhoods and Prepare for Joins

# - Bring in the neighborhood name with their respective number codes.
# - Create a new data frame.
crimeC <- hoods.df  %>% dplyr::select(NHD_NUM, NHD_NAME)
# crimeC$NHD_NUM <- as.integer(crimeC$NHD_NUM) # convert to integer
# join homicide table with hoods table to get neighborhood names
crimeD <- left_join(crimeB, crimeC, by = c("Neighborhood" = "NHD_NUM")) 

### See the Final Data Frame 

### 8.  Group by Month and Count Number of Homicides per Month 

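```{r, include=TRUE}
crimeA %>% 
  group_by(CodedMonth) %>%
  count(Crime) %>%
  arrange(desc(n)) # assumed final step (lost in the source): busiest month first, as in the printed output
```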


- Group data by coded month.  

- Count the number of *homicides per month*.  

- Data presented in a bar graph with totals displayed above the bar.  

- I added a smoothing line to get a better view of the crime movement.  

- Note that October 2018 was the peak.  

- It was when Channel 5 reported the severe increase in carjackings. It looks like homicides rose too.  

- It was also the timeframe when they reported establishing a task force.  
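
The group-and-count step can be sketched on a toy table in which each row stands for one homicide record. The month values are made up for illustration.

```r
library(dplyr)

# Made-up month labels; each row stands for one homicide record.
toy <- data.frame(
  CodedMonth = as.Date(c("2020-01-28", "2020-02-28", "2020-02-28",
                         "2020-03-28", "2020-03-28", "2020-03-28"))
)

monthly <- toy %>%
  count(CodedMonth, name = "n") %>%   # one row per month with its record count
  arrange(desc(n))                    # busiest month first

monthly
```

Sorting after counting puts the peak month in the first row, which makes it easy to read off before plotting.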

```{r, include=FALSE}
### Plot the count by month
crime.month <- crimeA  %>% 
  group_by(CodedMonth) %>%
  count(Crime)
xx <- ggplot(crime.month, aes(x = CodedMonth, y = n)) +
  geom_text(aes(label = n, y = n), size = 5, position = position_stack(vjust = 1.2)) +
  geom_col(color = "cornflowerblue") +
  geom_point() +
  stat_smooth() +  # add a smoothing regression line for the time series
  scale_x_date(date_breaks = "4 weeks", date_labels = "%m") +
  theme(axis.text.x = element_text(angle = 90)) +  # turn the axis text vertical
  labs(title = "Homicides Per Month", x = "Month", y = "Homicide Count")
```
### **9.  Plot Homicides per Month Using _ggplot2_  Library**

```{r, include=TRUE}
### Homicides by Month
```

### 10.  Look at Neighborhoods by Name and Count Numbers {data-background=#fae5e3}

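```{r, include=TRUE}
### Neighborhood By Name   
### Group by Neighborhood and count
crimeD  %>%
  # Assumed reconstruction: the start of this call was lost in the source.
  mutate(NHD_NAME = fct_explicit_na(as_factor(NHD_NAME), na_level = "to_impute")) %>%
  group_by(NHD_NAME) %>%
  count(Crime, sort = TRUE) %>%
  ungroup() %>%  # running totals must span all neighborhoods, as in the printed output
  arrange(desc(n)) %>%
  mutate(cumulative = cumsum(n), total = sum(n), cumul.percent = cumsum(n / total * 100))
```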


- Had to adjust the factor variable (NHD_NAME) to account for missing values (NA).  

- Count by crime and put in descending order.  

- This is a display of the highest crime neighborhoods.  

- 70% of the homicides are committed in the top 21 neighborhoods (roughly a quarter of the city's 88 neighborhoods).
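
The cumulative columns can be sketched with a few illustrative counts (not the real neighborhood totals). The point of the sketch is that the running total has to be computed on ungrouped data; on a table still grouped by neighborhood, `cumsum()` would restart in every group.

```r
library(dplyr)

# Illustrative counts, not the real neighborhood totals.
counts <- data.frame(NHD_NAME = c("A", "B", "C"),
                     n        = c(3L, 5L, 2L),
                     stringsAsFactors = FALSE)

ranked <- counts %>%
  ungroup() %>%                       # make sure no grouping is left over
  arrange(desc(n)) %>%                # largest count first
  mutate(cumulative    = cumsum(n),
         total         = sum(n),
         cumul.percent = cumulative / total * 100)

ranked
```

Reading down `cumul.percent` shows how many of the top-ranked neighborhoods are needed to reach a given share of all homicides.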

### 11.  Neighborhoods Count by Month 

# - Group by Neighborhood Name.  

# - Chart puts data in a descending order and presents greater than 5.  
### Plot the count by neighborhood

hood.number <- crimeD %>%
  # Assumed reconstruction: the start of this call was lost in the source.
  mutate(NHD_NAME = fct_explicit_na(as_factor(NHD_NAME), na_level = "to_impute")) %>%
  group_by(NHD_NAME) %>%
  count(Crime) %>%
  filter(n > 5)

xy <- ggplot(hood.number, aes(x = reorder(NHD_NAME, +n), y = n)) +
  geom_col(color = "cornflowerblue") +
  coord_flip() +
  theme(axis.text.x = element_text(angle = 90)) + # turn the axis text vertical
  labs(title = "Homicides by Neighborhood", x= "Neighborhood", y = "Homicide Count")

### **11.  Homicides by Neighborhood** 

```{r, include=TRUE}
```


- Group by Neighborhood Name.  

- Chart puts data in descending order and shows neighborhoods with more than 5 homicides.  

```{r, echo=FALSE, include=FALSE}
### 12. Time of Day Homicides

## Create an hour-of-day field using lubridate
crimeA <- crimeA %>% as_tibble() %>%
  mutate(hr.day = hour(DateOccur))

## Add a field to crimeA that sorts each hour into a block of the day.
## Boundaries follow the original logic: hours 12 and 18 both count as afternoon.
crimeA <- crimeA %>%
  mutate(day.cat = case_when(
    is.na(hr.day) ~ NA_character_,
    hr.day < 6    ~ "night",
    hr.day < 12   ~ "morning",
    hr.day <= 18  ~ "afternoon",
    TRUE          ~ "evening"
  ))
## Arrange as factors
day.lvls <- c("morning", "afternoon", "evening", "night")
crimeA$day.cat <- factor(crimeA$day.cat, levels = day.lvls)
```

### **12. Time of Day Homicides**

-  Look at the time of the day that the homicides occurred

```{r, echo=FALSE, include=TRUE}

homicide_tod <- crimeA %>%
  dplyr::select(CodedMonth, DateOccur, hr.day, day.cat) %>%  # columns 2:3 and 16:17
  mutate(CodedMonth = as.character(CodedMonth)) %>%
  group_by(CodedMonth) # assumed final step (lost in the source); the printed output is grouped by CodedMonth


ggplot(homicide_tod) +
  geom_bar(mapping = aes(x = CodedMonth, fill = day.cat), position = "dodge") +
  scale_fill_discrete(name = "Time of Day", labels = c("Morning 6-12 ", "Afternoon 12-18", "Evening 18-24", "Night 24-6")) +
  theme(axis.text.x = element_text(angle = 90)) +
  labs(title = "Monthly Homicide by Time of Day", x = "Month", y = "Homicide Count")
```


-  Create an hour-of-day field from the date-time using lubridate.  

-  This adds a new field to the crimeA data frame that sorts each hour into one of four blocks of roughly 6 hours.  

-  Used logic functions to segment the day categories.  

-  Create a data frame that thins out and groups the variables that focus on time of day.
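
The same hour-to-category mapping can be sketched with base R's `cut()`, shown here as an alternative to nested logic functions and not as the method used in the analysis. The break points are chosen to reproduce the boundaries visible in the printed output, where hours 12 and 18 both fall in the afternoon.

```r
hours <- c(0, 5, 6, 11, 12, 18, 19, 23)

# Right-closed bins: (-1,5], (5,11], (11,18], (18,23]
day_cat <- cut(hours,
               breaks = c(-1, 5, 11, 18, 23),
               labels = c("night", "morning", "afternoon", "evening"))

# Reorder the levels to the plotting order used in this analysis
day_cat <- factor(day_cat, levels = c("morning", "afternoon", "evening", "night"))

data.frame(hours, day_cat)
```

Keeping the break points in one vector makes it easy to see that the afternoon block covers seven hours and the evening block five, so the blocks are only approximately six hours each.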

### 13.  Let's Look at the Geospatial Aspects of the Homicide Analysis 

```{r, include=TRUE}
### Summary of the Characteristics of the Crime Data {data-background=#fae5e3}
```



- We will use the data we restructured earlier in the analysis.  

- We will use the `crimeD` data frame.  

- Check the structure of the file we selected.  

### 14.  Understanding the Geospatial Structure of the Data 

- XCoord and YCoord coordinates are based on the State Plane North American Datum 1983 (NAD83) format.  

- This data will have to be converted to lat/long values.  

- Some of the XCoord and YCoord values are 0.  This will need to be accounted for later in the analysis.  

Let's Review the Basic Data Structure

### 15.  Must Account For Inconsistent Coordinate Data 

crimeD.zeros <- crimeD %>% filter(XCoord < 1)

```{r, include=TRUE}
# Missing Coordinates
crimeD.zeros # there are 27 homicide records that cannot be processed directly
```


- Collect those records whose X/Y values are zeros.  

- These records will need a different type of processing.  
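
One way to sketch the split is with a single logical mask, so that every record lands in exactly one of the two sets. The coordinates below are illustrative, and the variable names are hypothetical stand-ins for the analysis objects.

```r
# Illustrative coordinates; a zero pair marks a record with no mapped location.
toy <- data.frame(Complaint = c("a", "b", "c", "d"),
                  XCoord    = c(899171.6, 0, 892799.8, 0),
                  YCoord    = c(1007325, 0, 1043342, 0),
                  stringsAsFactors = FALSE)

has_xy   <- toy$XCoord > 0 & toy$YCoord > 0   # TRUE when both coordinates are present
complete <- toy[has_xy, ]                     # can be converted to lat/long directly
zeros    <- toy[!has_xy, ]                    # need address-based geocoding instead

nrow(complete) + nrow(zeros) == nrow(toy)
```

Deriving both subsets from the same mask guarantees they are complementary, whereas two separate threshold filters can leave a boundary value in neither set.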

Records That Can Be Directly Converted to Lat/Long
crimeD.complete <- crimeD %>% filter(XCoord > 1)

### 16.  Complete Records 

```{r, include=TRUE}
```


- These records are in much better shape.  

- They have both X and Y coordinates.  

### 17.  Now we need to convert the NAD83 Coordinates to WGS84 Structure 

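```{r, echo=TRUE}
nad83_coords <- data.frame(x=crimeD.complete$XCoord, y=crimeD.complete$YCoord) # My coordinates in NAD83
nad83_coords <- nad83_coords *.3048  ### Feet to meters
coordinates(nad83_coords) <- c('x', 'y')
# Assumption (this step is missing from the source): declare the source CRS as
# NAD83 / Missouri East in meters (EPSG:26996) so spTransform() knows what to convert from.
proj4string(nad83_coords) <- CRS("+init=epsg:26996")
coordinates_deg <- spTransform(nad83_coords,CRS("+init=epsg:4326"))
# add converted lat-long and convert to numeric values
lonlat <- coordinates(coordinates_deg)
crimeD.complete$lon <- as.numeric(lonlat[, 1])
crimeD.complete$lat <- as.numeric(lonlat[, 2])
```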


- The spTransform function transforms all the State Plane coordinate values into WGS84 lat/long coordinates.  

- WGS84 is the more modern coordinate structure used for GPS mapping.  
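
For readers working with the _sf_ package instead of _sp_, the same State Plane to WGS84 conversion might look like the sketch below. It is an alternative illustration, not the method used in this analysis. The source CRS (EPSG:26996, NAD83 / Missouri East in meters) is an assumption, and the three coordinate pairs are the first XCoord/YCoord values shown in the data preview.

```r
library(sf)

# First three XCoord/YCoord pairs from the data preview, in State Plane feet
pts_ft <- data.frame(
  x = c(899171.6, 892799.8, 888944.8),
  y = c(1007325.0, 1043342.0, 1028564.0)
)

# Convert feet to meters, using the same 0.3048 factor as the analysis
pts_m <- pts_ft * 0.3048

# Assumption: the source CRS is NAD83 / Missouri East in meters (EPSG:26996)
pts_sf <- st_as_sf(pts_m, coords = c("x", "y"), crs = 26996)

# Reproject to WGS84 longitude/latitude (EPSG:4326)
pts_wgs84 <- st_transform(pts_sf, crs = 4326)

# st_coordinates() returns a matrix with columns X (lon) and Y (lat)
lonlat <- st_coordinates(pts_wgs84)
print(lonlat)
```

If the assumed zone is right, the resulting points should fall inside the St Louis area, which is a quick sanity check on the conversion.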

Review Characteristics of Downloaded Crime Data

### 18.  Get Incomplete Data Missing Coordinates {data-background=#fae5e3}

- Used _censusxy_ library to pull latitude/longitude.  

- The geocode function from the library requires a street address and number, city, and zip code (if available).  

- It goes to the US Census Bureau to look up the address reported on the police record and returns a lat/long.  

- It creates an _sf_ file and allows plotting of locations on a map.  

- Can only convert 22 instances with _censusxy_ since some address locations are missing. 
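
To see why some records cannot be geocoded, the address fields can be checked before they are sent to the Census geocoder. The sketch below is illustrative only: it uses the first few `CADAddress` and `CADStreet` values from the data preview (not necessarily records with missing coordinates), and the `can_geocode` column is a hypothetical name.

```r
# Sample address fields copied from the first rows of the data preview;
# empty strings mark records with no CAD address
addr <- data.frame(
  CADAddress = c("", "5406", "1219", "4114"),
  CADStreet  = c("", "GENEVIEVE", "EUCLID", "MINNESOTA"),
  stringsAsFactors = FALSE
)

# Combine street number and street name, then trim stray whitespace
addr$address.comb <- trimws(paste(addr$CADAddress, addr$CADStreet, sep = " "))

# A record is a geocoding candidate only if a non-empty address remains
addr$can_geocode <- nzchar(addr$address.comb)

print(addr)
```

Records flagged `FALSE` have no address to look up, so they would need some other source of location.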

                **  cxy_geocode changed. class id function not output **

data <- mutate(crimeD.zeros, address.comb = paste(CADAddress, CADStreet, sep = " "), city = "St Louis", state = "MO")
crimeD_sf <- cxy_geocode(data, street = 'address.comb', city = 'city', state = 'state',  class = "sf")
STL_homicides.small <- mapview(crimeD_sf,
                 map.types = c("OpenStreetMap"),
                 legend = FALSE,
                 popup = popupTable(data,zcol = c("Complaint",

Locations Obtained From US Census With Addresses Only ...

Larger Grouping that Contained Coordinates 
#- These records contain the X/Y plotted locations.   
### create an sf file that will map coordinates

data.one <- mutate(crimeD.complete, address.comb = paste(CADAddress, CADStreet, sep = " "), city = "St Louis", state = "MO")
crimeD_one.sf <- st_as_sf(data.one, coords = c("lon", "lat"), crs = 4326, agr = "constant")
STL_homicides <- mapview(crimeD_one.sf, map.types = c("OpenStreetMap"),
                        legend = FALSE,
                        popup = popupTable(data.one, zcol = c("Complaint",

### 19.  Combine Map Sets to View the Entire Picture of Homicide Location in St Louis

```{r, include=TRUE}
total_homicides <- STL_homicides + STL_homicides.small
```


Bring Up Neighborhood Map


- Add neighborhoods.   

- From   

### **20.  Final Map of Homicides with Neighborhood Overlays**

```{r, include=TRUE}
# Combine all the maps.
total_homicides <- STL_homicides + STL_homicides.small + hoods
```

- These records are overlaid on the neighborhood polygons.  

- They have both X and Y coordinates.  

Now We Look at Some Plots Targeting the Intensity of the Crime Area

```{r, echo=FALSE}
# Start with a quick plot of the homicide locations.

# Reduce the data to homicides (Crime == 10000) inside the map's bounding box
violent_crimes <- crimeD.complete %>%
  filter(
    Crime == 10000,
    -90.3238 <= lon & lon <= -90.1794334,
    38.0 <= lat & lat <= 39.0
  )
# Use qmplot to make a scatterplot on a map
qmplot(lon, lat, data = violent_crimes,
       maptype = "toner-lite", color = I("red"), zoom = 12)
```

###  **25.  Now We Look at These Homicide Plots with Density Contours**

```{r, include=TRUE}
# Density contour plot
qmplot(lon, lat, data = violent_crimes, maptype = "toner-lite",
       geom = "density2d", color = I("red"), zoom = 12)
```

- Peaks illustrate the highest crime numbers for that area.  

- Contours indicate similar occurrence levels.  
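
To make the peaks and contours more concrete, here is a small sketch of the kind of 2-D kernel density estimate that sits behind a density contour plot. ggplot2 documents its 2-D density stat as using `MASS::kde2d()`, so the sketch calls that function directly. The points are synthetic and only stand in for incident locations; they are not drawn from the homicide data.

```r
library(MASS)  # provides kde2d(), a 2-D kernel density estimator

set.seed(42)
# Synthetic points only: a larger and a smaller cluster of locations
lon <- c(rnorm(60, mean = -90.25, sd = 0.01), rnorm(20, mean = -90.20, sd = 0.01))
lat <- c(rnorm(60, mean = 38.65, sd = 0.01), rnorm(20, mean = 38.60, sd = 0.01))

# Estimate the density on a 50 x 50 grid
dens <- kde2d(lon, lat, n = 50)

# Locate the grid cell with the highest density (the "peak")
peak <- which(dens$z == max(dens$z), arr.ind = TRUE)
peak_lon <- dens$x[peak[1, "row"]]
peak_lat <- dens$y[peak[1, "col"]]
cat("Peak near lon", peak_lon, "lat", peak_lat, "\n")

# Contour lines join grid locations that share the same density level
contour(dens, xlab = "lon", ylab = "lat")
```

The peak lands in the larger cluster, which is the same reading applied to the homicide contours: the innermost rings mark where incidents are most concentrated.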

### **21. Another View Using the Same Data Set Gives Us a Heat Map**  

```{r, include=TRUE}
# This provides a good look at the density of homicides in the city
qmplot(lon, lat, data = violent_crimes, geom = "blank", 
       zoom = 14, maptype = "toner-background", legend = FALSE) +
  stat_density_2d(aes(fill = ..level..), geom = "polygon", alpha = .35, colour = NA) +
  scale_fill_gradient2("Homicides\nHeatmap", low = "white", mid = "yellow", high = "red", midpoint = 20)
```


- Darker areas indicate higher levels of homicides.  

Another View of Crime Area Numbers

# - Use clusters to illustrate numbers in an area
zz <- leaflet(data=crimeD.complete) %>% 
  addTiles() %>%
  setView(-90.222, 38.608, zoom = 11) %>%
  addProviderTiles(providers$CartoDB.Positron) %>%
  addCircleMarkers(lng = ~lon, 
                   lat = ~lat, 
                   fillColor = blues9,
                   stroke = FALSE, fillOpacity = 0.8,
                   clusterOptions = markerClusterOptions(),
                   popup = ~DateOccur) %>%
    addPolygons(data= hoods.sf, label = ~NHD_NAME,
              color = "#444444",
              weight = 1,
              smoothFactor = 0.5,
              opacity = 1.0,
              fillOpacity = 0.005,
              highlightOptions = highlightOptions(color = "white",
                                                  weight = 2,
                                                  bringToFront = TRUE))

###  **22.  Here is a Very Interesting View Called a Cluster Map**

```{r, include=TRUE}


- It uses cluster counts to illustrate homicide numbers in selected city areas.  

- As you drill down, it recalculates the numbers over city areas.

####  Task force focus  
### Created database that defines the crime focus area
police_crime_focus <- fread("police_crime_focus.csv", stringsAsFactors=FALSE)
### Create a spatial file of the police crime focus
#  police_crime_focus
police_point.sf <- st_as_sf(police_crime_focus,
                            coords = c("lon", "lat"),
                            crs = 4326, agr = "constant")
### Police points
### Create a matrix of lon/lat
df <- data.frame(police_crime_focus$lon, police_crime_focus$lat)
# You need first to close your polygon 
# (first and last points must be identical)
df <- rbind(df, df[1,])
### Create a polygon of the area of the police box
police.polygon <- st_sf(st_sfc(st_polygon(list(as.matrix(df)))), crs = 4326)
# police.polygon
police.box <- mapview(police.polygon, map.types = c("OpenStreetMap"),
                layer.name = c("Police Box"),
                legend = FALSE,
                alpha.regions = 0.3,
                alpha = 6,
                label = NULL,
                color = "red",
                col.regions = "red")
## Show police box in red

### 23.  This Illustrates the "Hayden Rectangle" Plotted Out

```{r, include=TRUE}


- From intersection of Goodfellow and MLK.  

- North along Goodfellow to W. Florissant.  

- Then southeast along W. Florissant to Prairie.  

- Then southwest along Prairie/Vandeventer to MLK.  

- Back to MLK and Goodfellow.  

# Add in Police Box               
STLtotal_homicides <- STL_homicides + STL_homicides.small + police.box

### **24.  This is the Chief's Box Overlaid with Homicides** 

```{r, include=TRUE}



- This is how it plots out with homicides.  

- A better prediction here, but the box still misses the south side hotspot.  

- Also, note the area running west along Interstate 55 and northwest along Interstate 70.

- And the mayor said she would give him an *A*?  

```{r, message=FALSE}
# Add police district shapes to a data frame
police_district.sf <- readOGR("police-districts/GIS.STL.POLICE_DISTRICTS_2014.shp")
police_district.sf <- spTransform(police_district.sf, CRS("+proj=longlat +datum=WGS84"))
police_district  <- mapview(police_district.sf, map.types = c("OpenStreetMap"),
                 layer.name = c("DISTNO"),
                 alpha.regions = 0.1,
                 alpha = 7,
                 legend = FALSE,
                 zcol = c("DISTNO"))
```

### **25.  View Crime based on Police Districts**

```{r,  include=TRUE}


- Established in 2014.  

- These are the 6 police districts.  

- Now they are considering restructuring them again.  

- They want to increase the number.  

- Improvement or just more overhead?  

# Combine total crimes and police districts
district_homicides <- police_district + STL_homicides + STL_homicides.small

### **26.  This Overlays Homicides Within the Police Districts**

```{r, include=TRUE}

```{r, echo=FALSE}
# Provide a cluster view with the current police districts using leaflet
xxx <- leaflet(data=crimeD.complete) %>% 
  addTiles() %>%
  setView(-90.222, 38.608, zoom = 11) %>%
  addProviderTiles(providers$CartoDB.Positron) %>%
  addCircleMarkers(lng = ~lon, 
                   lat = ~lat, 
                   fillColor = blues9,
                   stroke = FALSE, fillOpacity = 0.8,
                   clusterOptions = markerClusterOptions(),
                   popup = ~DateOccur) %>%
  addPolygons(data=police_district.sf, label = ~DISTNO,
              color = "#444444",
              weight = 1,
              smoothFactor = 0.5,
              opacity = 1.0,
              fillOpacity = 0.005,
              highlightOptions = highlightOptions(color = "white",
                                                  weight = 3))
```


### **27.  Finally We Look at Police Districts with Crime Clustering**

```{r, include=TRUE}


- Review crimes by each of the 6 police districts.  

### **28.  Food for Thought**

- Need to collect more data for greater understanding of crime parameters.  

- This data set has close to 8,000 instances of "FIREARM" defined crime.  Where are the locations?  

- Need to plot heroin and cocaine locations to see overlaps.

- There is no gang data available since 2012.  St Louis does not have a Gang Division. Does it need one?  

- The UCR reporting structure is poorly constructed for the nation as a whole.  How could it be improved?


