6

You go to BLS.gov and click on any of the links in their menu that you fancy - I'm specifically interested in Consumer Expenditures, disposable income, potential demand, things of that nature, but almost anything else would do. Then you notice the level of detail most of their variables drill down to: at best, sadly, only the county level, and more likely only the state level.

Well, I want that data for each zip code. I know of paid products that can do it (which somehow seem to derive their information from BLS.gov for the current year, which confuses me quite a bit), but what about fairly recent (having nothing newer than 2007, for instance, won't work for me), high-quality, open data sets for economic factors like these?

I suspect any open governmental dataset like this would involve the TIGER files (I checked there and they seemed pretty limited) - if you can find one it would still be useful to me, but it just wouldn't match up very well to the actual zip codes I'm forced to use in my project.

A database keyed to actual USPS zip codes would be great, if one exists somewhere that's relatively recent.

Is there anything commercial (besides Esri/ArcGIS) that does this?

Open to anything you've got. Also, I don't want shapefiles - I need it as a CSV, plain text, or some derivative thereof that I can import into Excel.

fgregg
Taal
  • I can think of two likely reasons someone is able to post CPI or Consumer Expenditures numbers at the zip code level: 1. they are actually posting a number aggregated from a higher level, or 2. they are using predicted measures based on a model (so modeled numbers, possibly built on other modeled numbers). The latter is probably your best option for CPI broken down by area to some extent. To be honest, I am not sure why anyone would want CPI broken down further than the MSA level. There are other economic indicators broken down by ZIP5 or smaller, if you're interested. – Kotebiya Sep 08 '13 at 12:04
  • Ah I know, and I suspected there may be some behind-the-scenes calculations going on for many things - and who knows how accurate they are at times. I actually did find some other open-data resources for it at the zip code level (I believe it was the advanced fact finder on the census website), but the report kept crashing before it'd download. – Taal Sep 08 '13 at 12:13
  • I understand your thought about the CPI thing and agree - really, I knew what those acronyms meant, I just saw data and was like "gimme." In my blurred frenzy of just trying to get data I forgot the meaning of that acronym for some reason, but it would have been one of many that I wanted. Also, many people believe (unless it's CPI data) in a certain "homogeneity between neighboring zip codes" mentality - that they should be generally the same. I had pondered and researched this question myself over many different random variables - one was alcohol consumption (I used commercial – Taal Sep 08 '13 at 12:18
  • data). I found out that the zip code right next to mine drinks FIVE times less alcohol than mine does, even though our population and density are about the same. I did this across all sorts of statistics that can't be generalized, like political views, or probably religious beliefs, or race. I used the granular ones - and when you use those (well, even with broader ones this seemed to happen) you start to see almost unexplainable differences between zip codes, even in extremely rural areas... and even if you go to the county level, those same differences are still there. Data was from a paid DB source though. – Taal Sep 08 '13 at 12:26
  • I should have just added this into the question by now, but researchers can also use "PUMS" data to achieve something like this. It is, from what I understand, a survey conducted on about 2 million Americans in as diverse and random a fashion as possible. It asks them a tremendous number of questions, all of whose answers are recorded verbatim in the PUMS files you can download off the census website. Be careful opening them though, as they each contain about 1,000,000 rows and 150-200 columns of data. The amount of data in those is overwhelming actually, but you could do some intense analysis with it (a chunked-read sketch follows this comment thread). – Taal Sep 08 '13 at 12:43
  • And one last thing - with these PUMS files, each row has a PUMA (PUMS area) associated with it that is somewhat loosely correlated with a zip code, as you can see here: http://gothos.info/wp-content/uploads/2011/03/nyc_zcta_to_puma.png and on that blog. – Taal Sep 08 '13 at 12:47
  • The Public Use Microdata Areas (PUMAs) are drawn with a minimum population threshold of 100,000 as of 2000 (it will be 2010 for the 2012 release). That is almost certainly going to be a geographic extent much larger than any zip code. And it was 5,029,145 people that were surveyed in 2011, but they only publish an amount approximately equal to 1% of the population per year of coverage in the PUMS. You can, however, get housing costs (rent/mortgage/taxes/bills) to the extent you are looking for. – Kotebiya Sep 08 '13 at 13:07
  • @Kotebiya, you should post your original comment about modeling as an answer. – fgregg Sep 08 '13 at 13:32
  • @Kotebiya I have the two files and they are two million rows total. Your page says unweighted...perhaps mine are weighted...I'm not sure although I did see what looked like to be several weights at the end. I am confused as to how the files I'm looking at now contradict what you're saying (I see all the variables for all the rows) - unless they are just 99% based upon past years back til the 1920s...when the PUMS hadn't even been created yet. I could not find that 1% figure anywhere, could you point me to a link? Also what is your explanation for the map I posted in my comment? – Taal Sep 08 '13 at 14:04
  • Here, try this: http://factfinder2.census.gov/faces/nav/jsf/pages/searchresults.xhtml?refresh=t - this may work for you guys; it crashes for me. Try to set the geography to zip code. – Taal Sep 08 '13 at 14:16
  • The magical document: it explains that the sample was filtered down to one percent of the population (p. 3). The number of persons in the sample in 2011 was 3,145,802 (p. 1). I am going to take a guess and say that you are importing the ACS PUMS into Excel, which has a row limit of 2^20 (aka 1,048,576 rows). Also, I just checked AFF and it is probably having some technical difficulties at the moment. – Kotebiya Sep 08 '13 at 14:53
  • Nice find, yup I remembered it hitting the row limit...I guess I had assumed they just fit it to that many - lol. I think I misinterpreted you originally actually - in that I thought you meant for some reason 1% of the people surveyed would be included in the new PUMS. – Taal Sep 08 '13 at 14:56
  • Well, they're making a sample of the population... and hopefully their methodologies are diverse too (although seeking out too much diversity could cause the data to lose its ability to be natural - weird concept, I know). Would you agree that the 1% population sample could be extrapolated to get a reasonable average of what the rest of the US is like? As an aggregate, I could see it working fine; for a really specific variable like spending habits, I'd have less confidence. – Taal Sep 08 '13 at 14:59
  • That issue is precisely why they aggregate across 5 years. It entirely masks time-varying trends (such as an economic recession in the small-area context). It is good for area-varying trends though, and maybe time-varying trends in the decadal context. However, people use much smaller samples to generalize to the larger population all the time. – Kotebiya Sep 08 '13 at 15:06
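A practical note on the PUMS file sizes mentioned in the comments: if the person files overflow Excel's 1,048,576-row limit, a chunked read sidesteps the problem entirely. The sketch below is only illustrative - the file name and the column codes (ST, PUMA, PWGTP, PINCP) are assumptions that should be checked against the PUMS data dictionary for the vintage you download.

    # Minimal sketch: aggregate a large ACS PUMS person file in chunks with pandas
    # so it never has to fit in memory (or within Excel's 2^20-row limit).
    import pandas as pd

    PUMS_CSV = "ss11pusa.csv"                 # hypothetical: a person-level PUMS file
    COLS = ["ST", "PUMA", "PWGTP", "PINCP"]   # state, PUMA, person weight, personal income

    totals = {}
    for chunk in pd.read_csv(PUMS_CSV, usecols=COLS, chunksize=100_000):
        chunk = chunk.dropna(subset=["PINCP"])             # skip blank incomes
        weighted = (chunk["PINCP"] * chunk["PWGTP"]).groupby(
            [chunk["ST"], chunk["PUMA"]]
        ).sum()                                            # weighted income per (state, PUMA)
        for key, val in weighted.items():
            totals[key] = totals.get(key, 0.0) + val

    print(f"{len(totals)} PUMAs aggregated")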

2 Answers

5

I used some BLS data for a recent project. It took me a little while to dig through, but there are essentially three ways to get access to their data:

  1. Their 'downloadable' data sets, published as links from HTML pages. It sounds like you have already got to these.

  2. Quandl has some of the BLS data curated. They have made it nicely searchable, filterable, and available over a REST API (a small fetch sketch follows this list).

  3. All of the data the BLS publish is available in raw format at ftp://ftp.bls.gov/pub/ - the doc folder tells you what each of the subdirectories in the time.series directory is for, and includes an overview document. As far as I know this is the totality of what they have published, at the most granular level. The overview points us to the cx (Consumer Expenditure) files, and the cx documentation describes what is in them.
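For option 2, a minimal fetch might look like the sketch below. It assumes Quandl's v3 REST datasets endpoint; the dataset code is only a placeholder - browse Quandl's BLS listings for the series you actually need and supply your own API key.

    # Hedged sketch: pull one BLS series through Quandl's REST API.
    import requests

    DATASET = "BLSE/CEU0000000001"   # placeholder dataset code - look up the real one on Quandl
    URL = f"https://www.quandl.com/api/v3/datasets/{DATASET}.json"

    resp = requests.get(URL, params={"api_key": "YOUR_KEY"}, timeout=30)
    resp.raise_for_status()
    dataset = resp.json()["dataset"]

    print(dataset["column_names"])   # typically ["Date", "Value"]
    for row in dataset["data"][:5]:  # first few observations
        print(row)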

The work is in curating this data: there are joins needed through structured keys. It is not hard, but you do need to go through the process.
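What that curation can look like in practice: the sketch below pulls the cx flat files off the FTP site and flattens the keyed lookups into one table. The file names and the column layout (series_id, item_code, and so on) are assumptions based on the usual BLS time.series conventions - the doc folder and overview file on the FTP site are the authority.

    # Rough sketch: download the cx flat files and join them on their structured keys.
    import urllib.request
    import pandas as pd

    BASE = "ftp://ftp.bls.gov/pub/time.series/cx/"
    for name in ("cx.data.1.AllData", "cx.series", "cx.item"):   # assumed file names
        urllib.request.urlretrieve(BASE + name, name)

    # The flat files are tab-delimited with a header row.
    data = pd.read_csv("cx.data.1.AllData", sep="\t")
    series = pd.read_csv("cx.series", sep="\t")
    items = pd.read_csv("cx.item", sep="\t")

    # BLS pads names and ids with whitespace, so strip before joining.
    for df in (data, series, items):
        df.columns = df.columns.str.strip()
    data["series_id"] = data["series_id"].str.strip()
    series["series_id"] = series["series_id"].str.strip()

    # Flatten: observation -> series metadata -> human-readable item label.
    flat = (data
            .merge(series, on="series_id", how="left")
            .merge(items, on="item_code", how="left"))

    flat.to_csv("cx_flat.csv", index=False)   # one wide CSV, ready for Excel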

To the best of my knowledge, if the data is not here, the BLS don't have it. I would love to be proved wrong and shown where more granular data is.

CodeBeard
  • I'll have to look into it more - by curating the data, do you mean transforming it into zip codes? That seems difficult. Yeah, the BLS did keep emphasizing time series, which really made me angry. I was kind of able to get a piece of it out of the American FactFinder at the census website by making sure to select zip as the geography... usually the app would crash though. – Taal Sep 08 '13 at 13:53
  • By 'curated' I mean turned into something more useful - flattening out all the lookups, etc. – CodeBeard Sep 08 '13 at 14:40
2

As @Kotebiya says in the comment, if the data is not published in the areal units you want, the only way to get the data in your target units is through a geostatistical model.

This is called "areal interpolation"; it is a very difficult problem and an active area of research. This ArcGIS help page is a good place to start.
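If you do want to try it yourself, the simplest variant is plain area weighting: apportion a source value to each target polygon in proportion to overlapping area. The sketch below uses geopandas; the file and column names are placeholders, the variable being spread must be a count (not a rate), and real work usually calls for a smarter model than pure area weighting.

    # Minimal area-weighted interpolation sketch: counties -> ZCTAs.
    import geopandas as gpd

    counties = gpd.read_file("counties.shp")   # assumed to carry a "households" count
    zctas = gpd.read_file("zctas.shp")         # target ZIP-code tabulation areas

    # Project to an equal-area CRS so polygon areas mean something.
    counties = counties.to_crs(epsg=5070)
    zctas = zctas.to_crs(epsg=5070)

    counties["src_area"] = counties.geometry.area

    # Intersect the layers, weight each piece by its share of the source polygon,
    # then sum the allocated pieces within each target ZCTA.
    pieces = gpd.overlay(counties, zctas, how="intersection")
    pieces["weight"] = pieces.geometry.area / pieces["src_area"]
    pieces["households_alloc"] = pieces["households"] * pieces["weight"]

    by_zcta = pieces.groupby("ZCTA5CE10")["households_alloc"].sum()
    by_zcta.to_csv("households_by_zcta.csv")   # plain CSV output, per the question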

fgregg