Criminal Class – Research and Technology

See the complete project site here.

Research and Technology

This project was for LIS 697-07, Geographic Information Systems, taught by Jeremiah Trinidad-Christensen, and fulfills the Research and Technology requirements. The goal was to find public data, interpret and refine it, and create a product that illustrates insights from the data using GIS tools and methods. I chose to examine criminal justice data on drug arrests in the United States, comparing arrest rates by racial group with drug use data to see whether there were any significant discrepancies.

My Role: I was the sole creator.

Description and Methods:


Finding Data

On the instructor's advice, I looked for arrest data from the Federal Bureau of Investigation's Uniform Crime Reporting (UCR) program, distributed as datasets by the National Archive of Criminal Justice Data (NACJD), which is hosted by the Inter-university Consortium for Political and Social Research at the University of Michigan.


2010 Arrests by Race:

2010 Arrests at the County-level:

Drug use data came from the Substance Abuse and Mental Health Services Administration (SAMHSA) 2008-2010 estimates on substance abuse.

2010 substate estimates:

Shapefiles for the state- and county-level arrest rate and drug use maps came from the US Census TIGER geodatabases:

The shapefile for the substate section-level drug use maps came from the Substance Abuse and Mental Health Services Administration (SAMHSA).


Interpreting the Data

Getting the data ready for the project took a great deal of work, mostly because of the UCR data. The census data was nearly ready to use from the beginning. I only had to set a few of the columns to text so that leading zeroes were not automatically deleted, which was particularly important for the county FIPS column. The drug use data was also relatively easy to use; I only had to clean it up and simplify it. Since SAMHSA provided their own shapefiles, joining the data and shapefiles was a simple task. Most of the work I did with that data was eliminating columns I did not need from the table, such as data on mental health issues, prior to joining the table to the shapefile.
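The leading-zero problem can be illustrated with a minimal Python sketch. This is not the spreadsheet workflow described above, only an illustration of the same idea: FIPS codes are identifiers, so they should be kept as zero-padded text. The column names and sample rows here are assumptions for illustration.

```python
import csv
import io

def pad_fips(state_code, county_code):
    """Build a five-character county FIPS string, restoring leading zeros."""
    return str(state_code).strip().zfill(2) + str(county_code).strip().zfill(3)

# Hypothetical sample rows in which a spreadsheet has already dropped the zeros.
sample = io.StringIO("STATEFP,COUNTYFP,NAME\n1,1,Autauga\n6,37,Los Angeles\n")
rows = list(csv.DictReader(sample))

# csv.DictReader returns every value as text, so nothing is coerced to a number.
fips = [pad_fips(row["STATEFP"], row["COUNTYFP"]) for row in rows]
print(fips)
```

Keeping the codes as text from the start avoids the problem entirely; padding is only needed when the zeros have already been lost.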

The UCR data presented a series of problems, starting with the download. I originally downloaded the county-level data for 2010, which was only available as text or in formats for SAS or SPSS. I have no experience with either of those statistical programs, so I downloaded the text version. The file was so large that I had trouble opening it on my computer, so I split it with a file-splitting program and opened it in pieces. After eliminating the arrest data I did not need (for offenses like arson, murder, and theft), I rejoined the files and started working with the data.

This was when the limitations of the data became more apparent. In the accompanying documentation, the study creators note that the UCR data is often estimated from data reported by local law enforcement agencies and thus is not necessarily a complete record of arrests throughout the year. Some agencies also do not participate in the reporting process, leading to empty values or values that are estimates based on reporting from one or two months. A further issue is that some reporting agencies cover more than one county, as in New York City, or the reporting agency is a state agency, as with Alaska's boroughs. In those cases the data is attributed to a single county, and users must weight it by population to estimate values for the other counties, which are recorded as not available. One final problem was that the UCR uses different geographic identifier numbers for states and counties than the census does, so those needed to be normalized.
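The population weighting mentioned in the documentation can be sketched as a proportional allocation. This is only one plausible reading of "weight the data by population", not a method confirmed by the UCR documentation or used in this project; the function name and sample figures are hypothetical.

```python
def allocate_by_population(total_arrests, populations):
    """Split one reported total across counties in proportion to population.

    populations maps a county identifier to its population count.
    """
    total_population = sum(populations.values())
    if total_population <= 0:
        raise ValueError("populations must sum to a positive number")
    return {
        county: total_arrests * population / total_population
        for county, population in populations.items()
    }

# Hypothetical example: one agency reports 100 arrests for two counties.
shares = allocate_by_population(100, {"county_a": 300, "county_b": 100})
print(shares)
```

The results are estimates, so they are best treated as fractional values rather than rounded to whole arrests.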

The UCR and SAMHSA data sets both had a lot of extra data, so my first step was to delete the extraneous columns in OpenOffice Calc. I used the codebooks from these studies to identify the columns I wanted. Then I used OpenRefine, a data cleaning program, to filter down to the data I needed. For the state UCR data this meant keeping only drug arrests and drug possession arrests. The UCR data was separated by agency, not county, so I used the subtotal function in Excel to subtotal the arrest data by county or state. Using Excel macros and a linkage file offered by NACJD, study number 2565, I converted the UCR county numbers to FIPS county numbers so I could match them up with Census data. The macro I used is a complex find-and-replace script that searches two columns of numbers (the corresponding UCR and FIPS numbers from the linkage file) and replaces the UCR number with the correct FIPS number.
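The code conversion and subtotal steps can be sketched together in Python. This is not the Excel macro described above; it shows the same logic as a dictionary lookup followed by a running total. All codes and counts below are made-up placeholders, not values from the linkage file.

```python
from collections import defaultdict

def subtotal_by_fips(agency_rows, ucr_to_fips):
    """Translate UCR county codes to FIPS codes and sum arrests per county.

    agency_rows: iterable of (ucr_county_code, arrest_count) pairs, one per agency.
    ucr_to_fips: dict built from a linkage (crosswalk) table.
    Returns (totals keyed by FIPS code, list of UCR codes with no match).
    """
    totals = defaultdict(int)
    unmatched = []
    for ucr_code, arrests in agency_rows:
        fips = ucr_to_fips.get(ucr_code)
        if fips is None:
            # Keep track of codes missing from the crosswalk instead of dropping them silently.
            unmatched.append(ucr_code)
            continue
        totals[fips] += arrests
    return dict(totals), unmatched

# Hypothetical crosswalk and agency records.
crosswalk = {"A1": "01001", "B2": "01003"}
records = [("A1", 5), ("A1", 7), ("B2", 3), ("ZZ", 1)]
totals, unmatched = subtotal_by_fips(records, crosswalk)
print(totals, unmatched)
```

Returning the unmatched codes makes it easier to spot agencies that would otherwise disappear from the map.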

To calculate the arrest rates per 1,000 people I used a formula I got from the NACJD site: (# of arrests / # of people) * 1,000. For the state-level data, I did this using census racial data and formulas in Excel to calculate the arrest rates for whites and blacks. For the county-level data I calculated the percentage of all arrests that were for drug possession: (# of drug possession arrests / # of arrests) * 100.
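The two formulas above translate directly into code. The sketch below is a minimal Python version with hypothetical function names and sample numbers; the actual calculations for this project were done in Excel.

```python
def rate_per_1000(arrests, population):
    """Arrests per 1,000 people: (arrests / population) * 1,000."""
    if population <= 0:
        raise ValueError("population must be positive")
    # Multiplying before dividing gives the same result with less rounding error.
    return arrests * 1000 / population

def percent_of_total(part, total):
    """Share of arrests in one category: (part / total) * 100."""
    if total <= 0:
        raise ValueError("total must be positive")
    return part * 100 / total

# Hypothetical figures for illustration only.
print(rate_per_1000(50, 10000))
print(percent_of_total(20, 80))
```

Guarding against a zero denominator matters here because some agencies report no data, which leaves empty or zero totals.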


With the data clean, I created csvt files for each data file and loaded them into QGIS. I loaded in the shapefiles (either the TIGER files from the Census or the substate shapefile from SAMHSA). I converted the shapefiles to the WGS84 CRS by saving them as new files, because I intended to use the maps in Leaflet, which requires that CRS. Later, I realized I also should have simplified the shapefiles, which I did using the simplify geometries tool. Then I joined the shapefiles with the tables by their matching ID numbers. Finally, I generated GeoJSON files for the state and substate data and a static map for the county-level data.
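To show what the table join does conceptually, here is a minimal Python sketch that attaches table values to GeoJSON features by a shared ID. QGIS handled this step in the project; the function name, the "GEOID" key, and the sample data are assumptions for illustration.

```python
import json

def join_attributes(feature_collection, table, key):
    """Return a new FeatureCollection with table attributes merged into each feature.

    table maps an ID value to a dict of extra attributes.
    Features with no matching ID are kept with their original properties.
    """
    joined = []
    for feature in feature_collection["features"]:
        properties = dict(feature.get("properties") or {})
        properties.update(table.get(properties.get(key), {}))
        joined.append({**feature, "properties": properties})
    return {"type": "FeatureCollection", "features": joined}

# Hypothetical one-feature layer and a one-row table keyed by the same ID.
layer = {
    "type": "FeatureCollection",
    "features": [
        {"type": "Feature", "properties": {"GEOID": "01"}, "geometry": None}
    ],
}
rates = {"01": {"arrest_rate": 5.0}}
print(json.dumps(join_attributes(layer, rates, "GEOID")))
```

The join only works when the IDs match exactly as text, which is why the leading zeroes in the FIPS columns had to be preserved.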

For the county-level map, I looked at what share of arrests were for drug possession versus other kinds of arrests. I colored the counties according to the percentage of arrests that were for drug possession.

For the state and substate data I used a Leaflet tutorial on making an interactive web map and modified it for my needs. I colored states by the disparity in arrest rates between whites and blacks, or by overall drug use (substate regions were also colored this way).
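Coloring regions by value usually comes down to sorting each value into a class and picking that class's color. The sketch below shows that logic in Python for illustration; the maps themselves were built in JavaScript with Leaflet, and the break points and hex colors here are hypothetical, not the ones used on the site.

```python
from bisect import bisect_right

def color_for(value, breaks, colors):
    """Pick a color for a value given ascending class breaks.

    colors must have exactly one more entry than breaks:
    values below breaks[0] get colors[0], values at or above breaks[-1] get colors[-1].
    """
    if len(colors) != len(breaks) + 1:
        raise ValueError("need exactly one more color than breaks")
    return colors[bisect_right(breaks, value)]

# Hypothetical breaks (difference in arrests per 1,000) and a light-to-dark red ramp.
BREAKS = [2, 4, 6]
REDS = ["#fee5d9", "#fcae91", "#fb6a4a", "#cb181d"]

for difference in (1, 2, 7):
    print(difference, color_for(difference, BREAKS, REDS))
```

A small, fixed set of classes keeps the legend readable and makes darker shades correspond directly to larger disparities.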

Drug Arrests by state 2010

This map shows the drug arrest rates in the US in 2010 by state. The states are colored according to the difference in arrest rates between blacks and whites per 1,000 people.

State colors get darker to show an increase in the chance that a black person will be arrested compared to a white person. See the map in full size here.



Making the maps made several things obvious. In every state blacks have a higher chance of being arrested for drug crimes than whites, sometimes by a wide margin. This is despite fairly widespread drug use across the US and across racial groups. Considering how much even an arrest, much less a period of incarceration, can negatively affect a person’s job, education and housing prospects, it is hard not to view this as, in the words of the author Michelle Alexander, “The New Jim Crow.”

I chose to use a simple layout for the web maps to focus attention on the states and their colorings. I wanted to leave Canada and Mexico in to provide perspective but kept them gray. The dotted lines for state boundaries were included in the template from Leaflet, but I liked the extra definition they gave to the states. I also liked the hover-over details. I chose the red (arrest rates) and blue (drug use) colors purposefully to give negative and positive feelings. This project is making a particular political argument, though I did not skew the data to do so, and I felt that using a more neutral color like purple would be pretending at a degree of impartiality that I neither feel nor want to express in this case.

The county-level map was surprising in some ways. Some of the counties with the most drug possession arrests were very small border counties, particularly Harris and Jim Wells Counties in Texas and Edgecombe County in North Carolina, where they were arresting more people than technically lived in the counties. I assume that is because of drug trafficking, but it would be interesting to look more into that issue. The metropolitan county with the largest percentage was Baltimore County in Maryland, where around 80 percent of arrests were possession arrests.

Objective: Research

I achieved the research requirement by collecting, refining and interpreting the data for this project and creating a visualization that demonstrates critical thinking about what the data says about criminal justice politics in the US. I also carried out research related to the topic, both on how to present it and on the historical background of the prison-industrial complex.

Objective: Technology

This requirement was fulfilled by the development of the maps for this project. I used several digital tools to refine the data and to create the maps and the site they live on. I learned some basic JavaScript, used some open digital tools, and built a site to present and house the maps to better communicate the information gathered.