Allow text specification of states, counties #4
This is good. I like it. A rake task would be required to query each of the summary levels on a state by state basis and build out the dictionary with the responses. But this feature would also allow the gem to return human readable results:
Just a couple issues to think through:
I like how you structured the return object. You're right - gives the user more to work with.
{AIANNH: {short_name: 'American Indian Area', long_name: 'American Indian Area/Alaska Native Area/Hawaiian Home Land'}} and return the

Parent-Independent Querying

I definitely want to be able to query objects independent of nesting, and it would be fantastic if the gem allowed this. However, I think that until the API itself can handle parent-independent querying, we should enable it only for objects nested one level deep. Those levels would be:
To enable parent-independent queries for fields that are nested one level deep, e.g. ZCTAs (nested in STATE), we only have to know the 52 IDs for the states, running something like:

(1..52).each { |id| @client.find("P0010001", "ZCTA5", "STATE:#{id}") }

However, for multiple nesting levels, we would need to know all of the IDs of every level above it, and querying, say, all block groups would return tens of thousands of objects.

One more question here is how to look up, for example, a single state-independent ZCTA, as in:

@client.find('P0010001', zcta: '02139')
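A minimal sketch of the fan-out described above. `FakeClient` and `find_in_all_states` are hypothetical names for illustration only: the stub just records the queries the gem's client would send. Zero-padding the state ID to two digits is an assumption here, and the two sample IDs stand in for the full list of states.

```ruby
# Hypothetical stand-in for the gem's client: it only records the
# queries it would send instead of calling the Census API.
class FakeClient
  attr_reader :calls

  def initialize
    @calls = []
  end

  def find(field, level, within)
    @calls << [field, level, within]
    [] # a real client would return rows from the API here
  end
end

# Fan a one-level-deep query out across every parent state and
# merge the results into a single array.
def find_in_all_states(client, field, level, state_ids)
  state_ids.flat_map do |id|
    # Assumption: the API expects two-digit, zero-padded state IDs.
    client.find(field, level, format('STATE:%02d', id))
  end
end

client = FakeClient.new
find_in_all_states(client, 'P0010001', 'ZCTA5', [25, 44]) # sample IDs only
```

Passing the list of state IDs in, rather than hard-coding a range, keeps the sketch usable even if the IDs are not contiguous.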
FYI I have the hash syntax (not the text lookup) for the
Good stuff. I created another gem around the same time as this one, census_shapes, which imports the census summary level boundaries into a postgis database. There are a couple of files in there that might be of use here: Additionally:
- state (2 digits)
- county (3 digits)
- census tract (6 digits)
- block group (1 digit)
- block (2 digits)

Additionally, SD, CD, SLDU, SLDL and PLACE are under STATE. See page 16 of the Census SF1 PDF.

In regards to creating the geography dictionary, I would probably write a script to create YAML, like us_states, for every summary level. The only additional data I would add is the parental hierarchy. In fact, if I remember correctly, for the TIGER dataset every state has an SF1 file which serves as an index. That SF1 file contains a list of every geography in the state at every level. It's unfortunately not a CSV, and not easily parsed, but I have some code somewhere that will do it. That said, it might just be easier to write a rake task that queries the Census API to build the index with the results.
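To illustrate how those digit widths nest, here is a rough sketch that splits a concatenated ID into its parent levels. The `WIDTHS` table and `split_geoid` helper are hypothetical names, not part of the gem, and the sketch stops at block group.

```ruby
# Digit widths per level, taken from the hierarchy listed above.
WIDTHS = { state: 2, county: 3, tract: 6, block_group: 1 }.freeze

# Split a concatenated ID into its levels, stopping once the
# string runs out (so shorter IDs such as state + county work too).
def split_geoid(geoid)
  offset = 0
  WIDTHS.each_with_object({}) do |(level, width), parts|
    break parts if offset >= geoid.length

    parts[level] = geoid[offset, width]
    offset += width
  end
end

split_geoid('25025') # state + county only
```

Keeping the pieces as strings preserves leading zeros, which matter for IDs like county `025`.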
Got the following message via email from github / @beechnut, but it didn't show up even though the email link brought me here. Pasting it in and commenting for posterity.
I spent a month trying to grok that damn SF1 doc. Don't worry about it. When I write "geography" I mean an actual physical geographic entity. Summary levels are types of geography as determined by the US Census. So California is a "geography", and its sumlevel is 040, STATE. The diagram on page 16 shows the relationship between all sumlevels from a hierarchical point of view.
Correct. Basically, I was just suggesting a way to do quick geographical look up - especially if we get fuzzy search in there - so you can find the proper geography before querying the census api.
Agreed.
Perhaps, the post didn't come through because of the above code?
860 is the sumlevel ID for ZCTA5 - ZIP Code Tabulation Area. My understanding is that the way the Post Office assigns ZIP codes to new addresses is fairly organic, and therefore the boundaries for ZIP codes are not very well structured and are always in flux. ZCTA5s attempt to solve this by determining the majority ZIP code for any given block, and then grouping blocks with the same ZIP code into larger geographies, ZCTA5s.
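As a toy illustration of that majority-then-group idea (not the Census Bureau's actual procedure), the sketch below uses made-up block names and address ZIP codes. `majority_zip` is a hypothetical helper, and `Array#tally` needs Ruby 2.7 or later.

```ruby
# Made-up sample data: block ID => ZIP codes of the addresses in it.
blocks = {
  'block_a' => %w[02139 02139 02138],
  'block_b' => %w[02139],
  'block_c' => %w[02138 02138 02139]
}

# Pick the most common ZIP code among a block's addresses.
def majority_zip(zips)
  zips.tally.max_by { |_zip, count| count }.first
end

# Group blocks that share a majority ZIP code into one area.
zctas = blocks
        .group_by { |_id, zips| majority_zip(zips) }
        .transform_values { |pairs| pairs.map(&:first) }
```

Each key of `zctas` is a majority ZIP code and each value is the list of blocks assigned to it.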
Yeah, sorry, wish I had remembered sooner. Been a while since I worked on this stuff.
Just posted the actual comment -- I'd accidentally hit 'Comment' before I was finished. Thank you for the seemingly precognizant feedback! EDIT: Annnd now it looks like the actual comment didn't get posted. Ugh.
Anyway, what didn't come through was that, with a YAML file containing states and counties, it's not hard to search for nested geographies.

---
- name: Massachusetts
  id: 25
  counties:
    - name: Plymouth County
      id: 23
    - name: Suffolk County
      id: 25
    - name: Worcester County
      id: 27
...

And to get the right ids (pseudocode):

@client.find('P0010001', county: 'Suffolk County', state: 'MA')
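y = YAML.load(File.read('lib/yml/states_test.yml'))
# Match the state by abbreviation or full name.
state = y.find { |e| e['abbr'] == within.value || e['name'] == within.value } #=> object for Massachusetts
# 'counties' is an array of hashes, so search it by name rather than indexing it.
county = state[level.key.pluralize].find { |c| c['name'] == level.value } #=> object for Suffolk County
county['id'] #=> 25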
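For reference, a self-contained version of that lookup might look like the sketch below. The YAML is inlined instead of read from `lib/yml/states_test.yml`, the `abbr` field is assumed to exist in the real file (the pseudocode checks it), and `GEOGRAPHIES` and `county_ids` are hypothetical names.

```ruby
require 'yaml'

# Inline copy of the sample data; the 'abbr' key is an assumption
# about what the real YAML file contains.
GEOGRAPHIES = YAML.safe_load(<<~YML)
  - name: Massachusetts
    abbr: MA
    id: 25
    counties:
      - name: Plymouth County
        id: 23
      - name: Suffolk County
        id: 25
      - name: Worcester County
        id: 27
YML

# Resolve a state (by name or abbreviation) and a county name
# to their numeric IDs.
def county_ids(state:, county:)
  state_entry = GEOGRAPHIES.find { |e| e['abbr'] == state || e['name'] == state }
  raise ArgumentError, "unknown state: #{state}" unless state_entry

  county_entry = state_entry['counties'].find { |c| c['name'] == county }
  raise ArgumentError, "unknown county: #{county}" unless county_entry

  { state: state_entry['id'], county: county_entry['id'] }
end

county_ids(state: 'MA', county: 'Suffolk County') #=> {:state=>25, :county=>25}
```

Raising on an unknown name, rather than returning `nil`, is just one possible choice; it surfaces typos before a request is ever built.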
Yeah, I thought of this. My only concern was that not all sumlevels are hierarchical under states. But perhaps those sumlevels are just at the root of the document, like state? If that is the case, then we would want to add a type field to differentiate between the different root-document sumlevels. That way you could have all the geographies in one document, which I am fine with.
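One way such a type field could work, as a rough sketch: the entry layout, the `ENTRIES` constant and the `find_root` helper are all assumptions for illustration, not an agreed format.

```ruby
# Hypothetical single-document layout: every root-level entry
# carries a 'type' naming its sumlevel.
ENTRIES = [
  { 'type' => 'STATE', 'name' => 'Massachusetts', 'id' => '25' },
  { 'type' => 'ZCTA5', 'name' => '02139', 'id' => '02139' }
].freeze

# Look up a root-level geography by sumlevel type and name.
def find_root(type, name)
  ENTRIES.find { |e| e['type'] == type && e['name'] == name }
end

find_root('ZCTA5', '02139')
```

IDs are stored as strings here so that leading zeros survive.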
One year later, I'm returning to this, as I'm going to need this gem for a project at work in the near future. In that year I've gotten much better at Ruby, so I'm looking forward to contributing again.
Let users indicate states and counties by name instead of numerical code, using hash syntax.
Also should accept symbol as a wildcard field name, plural field names, and multiple 'level' values as an array:
This will mean the keys will be upcased to become API URL parameter names. The values will be looked up in a hash and converted to digits for the URL parameter values.
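The conversion described above might be sketched as follows. `STATE_IDS` and `to_param` are hypothetical names, the lookup table holds only one sample state, and the `NAME:digits` output format is borrowed from the `STATE:#{id}` strings used elsewhere in this thread.

```ruby
# Sample lookup table; a real one would cover every state.
STATE_IDS = { 'MA' => 25, 'Massachusetts' => 25 }.freeze

# Turn one key/value pair from the hash syntax into an API parameter:
# the key is upcased, the value is resolved to a numeric ID.
def to_param(key, value)
  name = key.to_s.upcase
  id = value.is_a?(Integer) ? value : STATE_IDS.fetch(value)
  "#{name}:#{format('%02d', id)}"
end

to_param(:state, 'MA') #=> "STATE:25"
```

Using `fetch` means an unrecognized name raises `KeyError` instead of silently producing a malformed parameter.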
When multiple geometry parameters need to be specified for 'in', I imagine the following: