NICAR 2017
Jacksonville, FL
City Terrace 9
Friday, March 3, 3:30 - 4:30pm
Alexandra Kanik (@act_rational)
Dive even deeper into data mapping in this continuation of QGIS I. This class will cover joining data tables to maps, aggregating point data for easier analysis and preparing geographic data for display online. It is strongly recommended that you take QGIS I immediately preceding this session, but it is not a strict requirement.
Here are the notes from the first QGIS session.
Here's a list of other QGIS classes happening at NICAR17.
More often than not, you're going to get more data than you want to use. For example, we've got county data for the entire United States, but we're just looking at Florida accidents.
Instead of making QGIS process that huge shapefile every time we make a change, let's filter out all the counties outside of Florida.
Open a new map. Go to Project > Project Properties and check Enable 'on the fly' CRS transformation.
Add US counties. Add vector layer tl_2015_us_county.shp to the map.
Filter. As with most things in QGIS, there is more than one way to filter a shapefile. I'll show you my two favorites.
Right click the tl_2015_us_county.shp layer and select Open Attribute Table.
In the top menu bar of the attribute table pop up you should see an icon that looks like an E on top of a yellow square. Click that.
You should get another popup with some drop down options and a large textarea. Click on Fields and Values. What drops down should look familiar to you: it's all the available fields on this shapefile. We're only interested in the first one though, STATEFP. Give that field a double click and you should see it appear in the textarea to the left. Then, type an equals (=) sign and the number 12. The whole expression is STATEFP = 12.
Click "Select".
PROTIP: If you're trying to filter by a field but you don't know all the values available to you in that field, QGIS will let you know. Select the field in the list and use the "all unique" button to load its values.
QGIS is a subtle beast. Clicking "Select" seemingly does nothing, but if we close out of this popup and the attribute table popup, and zoom into Florida on our U.S. counties shapefile, you'll see she's all yellow! That means we've selected all the counties in Florida.
Now we want to save this selection as its own shapefile, so right click the U.S. counties shapefile again and select "Save as...". Make sure only the selected features get saved.
I named mine tl_2015_fl_county because that reminds me that I got this original counties shapefile from the Census Bureau (TIGER/Line) and that it's now a Florida-specific counties shapefile. Naming is important.
Now we have just the state of Florida as its own shapefile.
Right click the tl_2015_us_county.shp layer and select "Filter".
You should see a very similar popup to the one we had before. And we're going to do just what we did last time: enter STATEFP = 12.
With this dialog box we get the option to test our filter, so go ahead and click "Test" once you've entered the query into the filter expression textarea.
You might have received an error message that looks like this:
Sometimes QGIS is a jerkface and it imports numbers as text. All we've got to do to fix this error is add some double quotes around the number 12. Now when we hit "Test" we should get another popup that tells us that "The where clause returned 67 row(s)". This information should make us very happy because there are, in fact, 67 counties in Florida.
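The quoting fix comes down to data types: the field holds text, so it has to be compared against a text value. Here's a minimal Python sketch of the same idea outside of QGIS. The rows below are made-up stand-ins for attribute table records (only the STATEFP field name comes from the shapefile), and filter_by_state is a hypothetical helper, not anything QGIS provides.

```python
# Hypothetical rows standing in for the attribute table.
# STATEFP is stored as text, which also preserves leading zeros like "01".
counties = [
    {"NAME": "Alachua", "STATEFP": "12"},
    {"NAME": "Baker", "STATEFP": "12"},
    {"NAME": "Autauga", "STATEFP": "01"},
]


def filter_by_state(rows, statefp):
    """Keep only the rows whose STATEFP equals the given value."""
    return [row for row in rows if row["STATEFP"] == statefp]


as_number = filter_by_state(counties, 12)    # number compared to text: no matches
as_text = filter_by_state(counties, "12")    # text compared to text: matches

print(len(as_number), len(as_text))
```

Comparing against the number 12 finds nothing, while comparing against the text "12" finds the Florida rows.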
Click "OK" and you should see that what was once the U.S. is now only Florida. However, if we removed our tl_2015_us_county.shp layer now and added it back, we'd see that it was once again the entire U.S. We still need to save this as its own shapefile.
Again, as we did before, right click the U.S. counties shapefile and select "Save as...".
I named mine tl_2015_fl_county because that reminds me that I got this original counties shapefile from the Census Bureau (TIGER/Line) and that it's now a Florida-specific counties shapefile. Naming is important.
Now we have just the state of Florida as its own shapefile.
Last session you saw how to join county accident data to your shapefile. But what if you don't have that county-by-county accident data? Never fear! We can create that data right in QGIS using Points in Polygon.
Let's check out two different takes on aggregating this accident data.
Click Vector > Analysis Tools > Points in Polygon
Once you have all that set, click OK.
A new shapefile should have been added to our map. If we open up its attribute table we should see our new column (PNTCNT) at the end. The value in this column corresponds to the number of accidents in each county.
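If you're curious what a points-in-polygon count is doing conceptually, here's a small Python sketch. It uses a ray-casting test, which is one common way to decide whether a point sits inside a polygon; I'm not claiming this is how QGIS implements the tool. The two square "counties" and the accident coordinates are invented for illustration.

```python
def point_in_polygon(x, y, polygon):
    """Ray-casting test: True if (x, y) falls inside the polygon.

    polygon is a list of (x, y) vertices; the last vertex connects
    back to the first.
    """
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Does this edge straddle the horizontal line through the point?
        if (y1 > y) != (y2 > y):
            # x coordinate where the edge crosses that horizontal line
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


# Hypothetical "counties": two squares side by side
counties = {
    "west": [(0, 0), (2, 0), (2, 2), (0, 2)],
    "east": [(2, 0), (4, 0), (4, 2), (2, 2)],
}
# Hypothetical accident locations; the last one falls outside both squares
accidents = [(0.5, 0.5), (1.5, 1.0), (3.0, 1.0), (5.0, 5.0)]

# One count per polygon, like the PNTCNT column
pntcnt = {
    name: sum(point_in_polygon(x, y, poly) for x, y in accidents)
    for name, poly in counties.items()
}
print(pntcnt)
```

Each polygon ends up with the number of points that landed inside it, which is the same kind of value PNTCNT holds for each county.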
The above analysis is good if we want to see our data in terms of accidents, but what if we want to visualize it in terms of deaths?
Let's go back into that Vector > Analysis Tools > Points in Polygon section.
Once you have all that set, click OK.
Open the newly added shapefile attribute table. You'll see we've now got two additional columns: PNTCNT and FATALS_sum.
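Once every accident is matched to a county, those two columns are just a count and a sum. Here's a rough Python sketch of that bookkeeping. It assumes the matching has already happened, and the FATALS values below are invented, not taken from the accidents data.

```python
from collections import defaultdict

# Hypothetical accidents already matched to a county; values are invented.
accidents = [
    {"county": "Duval", "FATALS": 1},
    {"county": "Duval", "FATALS": 2},
    {"county": "Leon", "FATALS": 1},
]

pntcnt = defaultdict(int)      # number of accidents per county
fatals_sum = defaultdict(int)  # total deaths per county

for accident in accidents:
    pntcnt[accident["county"]] += 1
    fatals_sum[accident["county"]] += accident["FATALS"]

for county in pntcnt:
    print(county, pntcnt[county], fatals_sum[county])
```

The count treats every accident the same, while the sum weights each accident by how many people died in it, which is why the two maps can look different.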
At this point, we could export our data as a CSV, bring it into an editor like Excel, and calculate things like percent of accidents that involve drunk drivers. But why go through all those steps? Let's do it all in QGIS!
Open a new QGIS map
Add the Florida counties shapefile you created earlier
Add the accidents data
Filter the accidents data to include only accidents that involve a drunk driver
Save filtered data as its own shapefile called dd-accidents.shp. Make sure to add it to the map when saving.
Clear the filter on your original accidents data so you now have two accidents shapefiles on your map: all accidents and drunk-driving accidents.
Follow the Crash-Level Aggregation steps above to create a new Florida counties shapefile that counts total accidents per county:
accidents-in-county.shp
Create another accident aggregation shapefile, this time using the newly created accidents-in-county.shp and the dd-accidents.shp:
Polygon layer: accidents-in-county.shp
Point layer: dd-accidents.shp
Output shapefile: dd-accidents-in-county.shp
Open the attribute table of the shapefile we just created, dd-accidents-in-county.shp. It should have two columns at the end: totalCNT and ddCNT.
Click the Open Field Calculator button
Create a new field that calculates the percent of accidents that involve drunk drivers by county:
("ddCNT" / "totalCNT")*100
Click OK.
When you open the attribute table on dd-accidents-in-county.shp, you should see another custom column: percDD.
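For comparison, here's the same percentage calculation as a short Python sketch. The field names match the ones used above, but the counts are invented and percent_dd is a hypothetical helper. The guard against counties with zero accidents is my own addition for the sketch; it isn't a statement about what the Field Calculator does in that case.

```python
def percent_dd(dd_cnt, total_cnt):
    """Percent of accidents involving a drunk driver.

    Returns None when a county has no accidents at all, so we never
    divide by zero.
    """
    if total_cnt == 0:
        return None
    return dd_cnt / total_cnt * 100


# Hypothetical rows standing in for the attribute table
rows = [
    {"county": "A", "totalCNT": 40, "ddCNT": 10},
    {"county": "B", "totalCNT": 0, "ddCNT": 0},
]

for row in rows:
    row["percDD"] = percent_dd(row["ddCNT"], row["totalCNT"])
    print(row["county"], row["percDD"])
```

It's worth spot-checking a few rows like this by hand after any calculated field, especially counties with very small totals, where one accident can swing the percentage a lot.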
And now let's talk about everyone's favorite mapping topic: projections. Some year maybe we'll get someone really smart to teach a session on projections alone. But this is not that year.
Today, we'll just go over the practical side of projecting. The stuff that'll help you stay sane as you work with data in QGIS.
On-the-fly projecting is QGIS's way of trying to make your life easier. It allows you to add shapefiles with different projections to the same map and have them line up. HOWEVER, when you're trying to do proximity analyses, like calculating distance or area, you're gonna be in a world of pain if you rely on on-the-fly projecting.
So let's see how you project shapefiles correctly in QGIS.
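In the Florida counties layer's "Save as..." dialog, name the new file tl_2015_fl_cnty_UTM16.shp and choose the CRS with EPSG code 26916, which is NAD83 / UTM zone 16N.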
You just projected that Florida shapefile into a new projection!
Repeat the above steps for our accidents data. Change the filename of course.
You'll see that when QGIS adds our new shapefiles to the map, it doesn't appear that anything has changed. But trust me, it has. You can't see it, but the way that these two shapefiles relate to each other is vastly different from what it was before we projected the two layers.
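One way to see why this matters: unprojected data is measured in degrees, and a degree isn't a fixed distance on the ground. Here's a small Python sketch that estimates how long one degree of longitude is at two latitudes. It treats the Earth as a perfect sphere, which is a simplification, and the latitudes 25 and 31 are only rough stand-ins for the southern and northern ends of Florida.

```python
import math

EARTH_RADIUS_M = 6371008.8  # mean Earth radius; a spherical approximation


def meters_per_degree_longitude(latitude_deg):
    """Approximate ground length of one degree of longitude at a latitude."""
    return math.radians(1) * EARTH_RADIUS_M * math.cos(math.radians(latitude_deg))


# Rough southern and northern latitudes for Florida (assumed for illustration)
for lat in (25, 31):
    print(lat, round(meters_per_degree_longitude(lat)), "meters per degree")
```

The same one-degree step covers noticeably less ground in the north than in the south, so distances and areas measured in degrees can't be compared fairly. A projected CRS like UTM works in meters, which is what distance and area calculations need.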
Here's a pretty good read on projections: what the differences are, common problems, how to work with them and more.
So, back to some fun stuff. Just before, we did a county-by-county analysis of accidents. But counties are kind of an arbitrary unit of measure when we're talking about density of accidents. As you can see, some counties are much larger than others. That is likely to influence location-based analysis, i.e. larger counties will obviously have more accidents. So let's look at a unit that is more equally balanced.
Install MMQGIS Plugin. First, let's install a plugin that will help us create the hex bins: MMQGIS.
Plugins > Manage and Install Plugins...
Add shapefiles to new map. Next, we need to add our projected shapefiles to a new map. Create a new map and add our projected Florida counties shapefile and our projected Florida accidents shapefile. This analysis will not work if we don't use our PROJECTED shapefiles.
Create hex bins. Select MMQGIS > Create > Create Grid Layer.
Enter 7734 for our Y Spacing. The X Spacing we can leave. It will follow suit.
Save the output as florida-grid.shp.
Perform points in polygon. This time, we're going to use the grids shapefile as our Input polygon vector layer.
Vector > Analysis Tools > Points in Polygon
Input polygon vector layer: florida-grid.shp
Output shapefile: florida-accidents-in-bins.shp
Filter out bins without accidents. Right click the florida-accidents-in-bins.shp layer, select "Filter" and enter "PNTCNT" > 0 as the filter expression.
Color-code bins so we can see what's up!
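These notes don't say where the 7734 spacing comes from, so take this next bit as a guess. The Python sketch below assumes the Y Spacing is the distance in meters between two opposite flat sides of each hexagon. That's an assumption about how MMQGIS defines the value, and it only makes sense because our projected layers are measured in meters. Under that assumption we can work out how big each bin is.

```python
import math

SQ_METERS_PER_SQ_MILE = 1609.344 ** 2  # one international mile is 1609.344 m


def hexagon_area(flat_to_flat):
    """Area of a regular hexagon from the distance between opposite sides."""
    return math.sqrt(3) / 2 * flat_to_flat ** 2


# ASSUMPTION: 7734 is the flat-to-flat height of each hex bin, in meters
area_m2 = hexagon_area(7734)
area_sq_km = area_m2 / 1e6
area_sq_miles = area_m2 / SQ_METERS_PER_SQ_MILE

print(round(area_sq_km, 1), "sq km")
print(round(area_sq_miles, 1), "sq miles")
```

If the assumption holds, each bin covers close to 20 square miles. Whatever spacing you pick, the point is that every bin has the same area, so the counts are comparable in a way that county counts are not.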
For most interactive maps, you're going to want to save your map data as a GeoJSON file. I usually display interactive mapping data in Leaflet.js.
Your file will get the geojson file extension if you set Format to GeoJSON.
Set the CRS to WGS 84. This has to do with the default projection of mapping programs like Leaflet.
Simplifying geometries is sometimes necessary when you're preparing your geographic data for display online. Overly detailed data can cause programs like Leaflet.js to load suuuuuuuper slowly.
There are ways of doing this in QGIS, but honestly... using MapShaper is much easier and more efficient. MapShaper lets you see how your adjustments are going to affect the quality of your shapefile in real time.
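To get a feel for what simplifying actually does to a shape, here's a short Python sketch of Douglas-Peucker, one classic line simplification algorithm. This is not MapShaper's code, and I'm not claiming it matches MapShaper's default settings. The sample line and the tolerance value are invented for illustration.

```python
import math


def perpendicular_distance(point, start, end):
    """Distance from point to the straight line through start and end."""
    (x, y), (x1, y1), (x2, y2) = point, start, end
    dx, dy = x2 - x1, y2 - y1
    if dx == 0 and dy == 0:
        # start and end are the same spot; fall back to plain distance
        return math.hypot(x - x1, y - y1)
    return abs(dy * x - dx * y + x2 * y1 - y2 * x1) / math.hypot(dx, dy)


def simplify(points, tolerance):
    """Douglas-Peucker: drop vertices that stay within tolerance of the line."""
    if len(points) < 3:
        return list(points)
    # Find the vertex farthest from the line joining the two endpoints
    distances = [
        perpendicular_distance(p, points[0], points[-1]) for p in points[1:-1]
    ]
    max_dist = max(distances)
    index = distances.index(max_dist) + 1
    if max_dist <= tolerance:
        # Everything in between is close enough: keep only the endpoints
        return [points[0], points[-1]]
    # Otherwise keep that vertex and simplify each half separately
    left = simplify(points[: index + 1], tolerance)
    right = simplify(points[index:], tolerance)
    return left[:-1] + right


# Hypothetical boundary line with one tiny wiggle and one real corner
line = [(0, 0), (1, 0.1), (2, 0), (3, 2), (4, 0)]
simplified = simplify(line, 0.5)
print(len(line), "points before,", len(simplified), "points after")
```

The tiny wiggle gets dropped while the real corner survives. Crank the tolerance up and the corner goes too, which is the trade-off between file size and shape quality that you get to watch happen in MapShaper.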