By: Joshua Miller
One thing I've wanted to visualize from the
hbg-crime.org dataset is which times of day
see the most and least crime, and in which parts of the city. Using the
Gadfly plotting package with
Julia makes that easy.
First, pull down the current dataset:
$ wget http://hbg-crime.org/reports.csv
Then launch Julia, and import all the libraries we'll be using.
using DataFrames
using Datetime
using Gadfly
We'll read the reports into a DataFrame:
data = readtable("reports.csv")
Then we need to convert the time of the report into an hour of the
day, from 0 (midnight to 1:00 am) to 23 (11:00 pm to midnight):
formatter = "yyyy-MM-ddTHH:mm:ss"
function hourofday(d::String)
    Datetime.hour(Datetime.datetime(formatter, d))
end
@vectorize_1arg String hourofday
@transform(data, Hour => hourofday(End))
We're just creating a quick function that takes a String,
converts it to a DateTime, then extracts the hour; after that, we
just vectorize that function and apply it to the "End" column from the
DataFrame, storing the result in a new "Hour" column.
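For comparison, here's a minimal sketch of the same hour extraction using Julia's Dates standard library instead of the Datetime package. The sample timestamps and the REPORT_FORMAT name are made up for illustration, and I'm assuming the timestamps look like 2014-03-09T01:30:00. Note that Dates uses lowercase mm for the month and uppercase MM for the minute, which is the reverse of the format string above.

```julia
using Dates

# Assumed timestamp layout, e.g. "2014-03-09T01:30:00".
# In Dates, `mm` is the month and `MM` is the minute; `\T` is a literal "T".
const REPORT_FORMAT = dateformat"yyyy-mm-dd\THH:MM:SS"

# Parse one timestamp string and return its hour of day (0 to 23).
hourofday(s::AbstractString) = hour(DateTime(s, REPORT_FORMAT))

# Hypothetical sample values standing in for the "End" column.
sample = ["2014-03-09T00:15:00", "2014-03-09T13:45:00", "2014-03-09T23:59:59"]

# The dot broadcasts the function over every element, which plays the
# same role as vectorizing it.
hours = hourofday.(sample)
println(hours)
```

Broadcasting with the dot means there's no need to define a separate vectorized version of the function.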
The final step is to group those results by Neighborhood and Hour,
counting the number of reports in each group:
results = by(data, ["Neighborhood", "Hour"], nrow)
complete_cases!(results)
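To make the grouping step concrete, here's a rough sketch of the same idea in plain Julia with no packages: count the reports for each (neighborhood, hour) pair and skip any row with a missing value. The sample rows and the countbygroup name are hypothetical, and this is only an illustration of the technique, not what DataFrames does internally.

```julia
# Hypothetical sample rows: (neighborhood, hour of day).
# `missing` marks a report that was never assigned a neighborhood.
reports = [
    ("Downtown", 1),
    ("Downtown", 1),
    ("Uptown", 1),
    (missing, 14),
    ("Uptown", 22),
]

# Count reports per (neighborhood, hour) pair, skipping incomplete rows.
function countbygroup(rows)
    counts = Dict{Tuple{String,Int},Int}()
    for (neighborhood, hr) in rows
        # Drop rows where either field is missing.
        (ismissing(neighborhood) || ismissing(hr)) && continue
        key = (neighborhood, hr)
        counts[key] = get(counts, key, 0) + 1
    end
    return counts
end

counts = countbygroup(reports)
println(counts[("Downtown", 1)])
```

Each key in the resulting dictionary corresponds to one row of the grouped results, and its value is the report count for that neighborhood and hour.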
The complete_cases! function just strips out all of the non-classified
data, which tends to give Gadfly some problems. Speaking of which,
all that's left is to create the plot and draw it to an SVG file:
p = plot(results, y="x1", x="Hour", color="Neighborhood",
         Guide.XLabel("Hour of Day"), Guide.YLabel("Number of Reports"),
         Geom.bar(position=:dodge))
draw(SVG("results.svg", 6inch, 6inch), p)
The color= attribute tells Gadfly to use the "Neighborhood" column
to group and color the bars.
Crime spikes everywhere after dark and decreases during the day, but
unsurprisingly Downtown sees a disproportionate spike around 1:00-2:00
am when the bars let out.
Full source is available on GitHub.