=====================================
CRIME COUNTS PER NEIGHBORHOOD
Compiled by nuel
-------------------------------------
In this README:
0. Introduction
1. About this dataset
2. Content and structure
3. Delict code explanation
================
| INTRODUCTION |
----------------
This project was part of a larger effort to investigate the
cyber division of the Dutch police -- in particular their
latest developments in terms of predictive policing. This
dataset is an effort to take back some transparency and
insight into the data that's being used behind the scenes
to predict and "prevent" crime.
When technology that is barely understood produces
questionable results that are taken as infallible truths, and
when these become guiding principles to enact violence,
marginalized people will always end up paying the price. 1312
======================
| ABOUT THIS DATASET |
----------------------
In 2018, Dutch police in The Hague (whose jurisdiction extends
to several nearby cities) ran a public website where residents
could check local crime stats. The numbers were very local: they
were grouped per neighborhood, and in many cases a neighborhood
is no bigger than a few streets.
However, the website only let you view one month at a time,
and only one type of crime at a time.
That's why I decided to scrape the raw data. In the folder
"data", you'll find the complete dataset that the police
website uses. This includes crime counts, but also the
geographical boundaries for each neighborhood.
If you want to scrape a fresh copy yourself, the scripts I
wrote to compile the dataset are also included.
=========================
| CONTENT AND STRUCTURE |
-------------------------
- SQLite
In `data/delicts.db` you'll find the complete dataset
as a sqlite database. This is probably the easiest to
work with if you want to do some pattern analysis.
- JSON
The dataset is also available as JSON. There's a folder
for each year of data, which contains a file for each
district. These, in turn, group delict counts per
neighborhood, per month, per delict code. Basically,
the structure is like this:
    > Year
        > District
            > Delict code
                > Month
                    > Neighborhood
- Boundaries
The file `data/areas.txt` contains the name and
geographical boundaries for each district code (the
same codes used for the JSON filenames). The values are
stored in a custom format that uses commas and pipe
characters as separators.
- Scripts
If you want to compile the dataset yourself, the scripts
I wrote to scrape and organize the data are also included.
Run each one with `python script_name.py`, replacing
`script_name.py` with the name of the script.
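For a first look at the data, the sketch below lists the tables
and columns in the SQLite file and flattens one district's JSON
into rows. The table layout isn't described in this README, so the
first helper only inspects it rather than assuming any names. The
second helper assumes each district file is nested dicts in the
order of the outline above (delict code, then month, then
neighborhood, with a count at the bottom); that's a guess from the
outline, not something verified against the files. All function
names are made up for this example.

```python
import json
import sqlite3
from pathlib import Path


def list_tables(db_path):
    """Return {table_name: [column names]} for every table in a SQLite file."""
    # Open read-only through a file: URI, so a wrong path raises an error
    # instead of silently creating a new, empty database.
    uri = Path(db_path).resolve().as_uri() + "?mode=ro"
    conn = sqlite3.connect(uri, uri=True)
    try:
        names = [
            row[0]
            for row in conn.execute(
                "SELECT name FROM sqlite_master "
                "WHERE type = 'table' ORDER BY name"
            )
        ]
        # PRAGMA table_info rows are (cid, name, type, notnull, dflt_value, pk).
        return {
            name: [col[1] for col in conn.execute(f'PRAGMA table_info("{name}")')]
            for name in names
        }
    finally:
        conn.close()


def flatten_district(data):
    """Yield (delict_code, month, neighborhood, count) tuples.

    ASSUMPTION: `data` is the parsed content of one district file, nested as
    delict code -> month -> neighborhood -> count.
    """
    for code, months in data.items():
        for month, neighborhoods in months.items():
            for neighborhood, count in neighborhoods.items():
                yield code, month, neighborhood, count


def load_district(json_path):
    """Read one district JSON file and return its flattened rows as a list."""
    with open(json_path, encoding="utf-8") as fh:
        return list(flatten_district(json.load(fh)))
```

Calling `list_tables("data/delicts.db")` is a safe way to see what
the database contains before writing any queries. If the JSON
turns out to be nested differently, only `flatten_district` needs
to change.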
===========================
| DELICT CODE EXPLANATION |
---------------------------
The dataset uses delict codes to distinguish crime types.
Their names can be found by inspecting the web form and
are listed below:
[100] Theft while on scooter
[101] Theft from bike
[102] Street robbery
[104] Burglary (both attempted and successful)
[105] Theft while on or in motorized vehicle
[106] Motor vehicle theft
[107] Shoplifting
[108] Pickpocketing
[109] Violent threat
[110] Abuse
[111] Vandalism
[117] Robbery
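
If you're working in Python, the list above can be kept as a small
lookup table so that codes in the data turn into readable names.
The names `DELICT_NAMES` and `delict_name` are only illustrative.
Whether the dataset stores codes as numbers or as strings isn't
stated here, so the helper accepts both.

```python
# Delict codes and names, copied from the list above.
DELICT_NAMES = {
    100: "Theft while on scooter",
    101: "Theft from bike",
    102: "Street robbery",
    104: "Burglary (both attempted and successful)",
    105: "Theft while on or in motorized vehicle",
    106: "Motor vehicle theft",
    107: "Shoplifting",
    108: "Pickpocketing",
    109: "Violent threat",
    110: "Abuse",
    111: "Vandalism",
    117: "Robbery",
}


def delict_name(code):
    """Return the readable name for a delict code (int or numeric string)."""
    return DELICT_NAMES.get(int(code), f"Unknown delict code {code}")
```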