35

I am involved a little in both Vermont House redistricting and Burlington City Council redistricting.

I have noticed at least two census blocks where the Census is clearly over-reporting population. Both are visible using District Builder in this map of Burlington, Vermont.

One census block is Block #16005, which is part of the University of Vermont "Green". There are absolutely no buildings on it, residential or otherwise, yet the Census reports 20 people.

The other is Block #4460, which is the Ethan Allen Homestead with one caretaker (and perhaps a spouse or partner) living on the property. Yet the Census reports it as 12 persons. I know, for certain, that it's false.

I presume they are transferring counts from neighboring blocks so that overall counts for cities, counties, and states, are still correct. I imagine they are trying to protect some private information by skewing these numbers. But is that the reason they are doing it? I really don't fully understand why they are doing this.

Has anyone else noticed something like this in their area? Does anyone know the skinny on it?

There are other really silly blocks that they have carved out, but that question can wait for later.

Rick Smith
  • 4
    I do Census work for local government. The silly shapes are just how the Census divides up areas: every street centerline, administrative boundary, and geographic boundary (such as a river) is combined and used as a block boundary, which leads to tiny areas or areas with no people. A household is very unlikely to straddle one of these boundaries, though I can think of an exception in my own city. – RomaH Dec 28 '21 at 21:33
  • Oh no, @RomaH , I haven't even gotten into the silly shapes yet. Up in the UVM Redstone Campus there are some extremely silly shapes. And there are all sorts of small tracts with zero population that are assigned a census block. – robert bristow-johnson Dec 29 '21 at 05:49
  • 4
    relevant: https://www.slowboring.com/p/census-privacy – eps Dec 29 '21 at 15:07
  • Thank you, @eps . It does appear to be quite relevant. – robert bristow-johnson Dec 29 '21 at 16:16
  • If you are actually involved in the official redistricting, you should reach out directly to the Census Bureau. They may have additional data products that you can use with more detailed data, subject to a confidentiality agreement and appropriate controls. – Him Sep 29 '23 at 14:06
  • 1
    Well @Him, I was actually involved in redistricting. But the maps have been drawn for more than a year. I know that city and state governments can get confidential information from the census that private schlubs cannot. – robert bristow-johnson Sep 29 '23 at 14:24

1 Answer

60

The Census Bureau practices disclosure avoidance to, as you suspect, protect the privacy of individual respondents. The Bureau wants to ensure that a consumer of the data it publishes cannot de-anonymize that data. It uses a variety of techniques to accomplish that, including both data swapping and noise injection, either of which could presumably account for the issues you are seeing.

The linked page on disclosure avoidance contains a number of papers and other resources that describe the different techniques in much more detail.
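
As a rough illustration of how counts can move between blocks while the total stays fixed, here is a minimal toy sketch. It is not the Census Bureau's actual procedure; the function `swap_counts`, the block names, and the populations are all hypothetical and chosen only to show the idea.

```python
import random


def swap_counts(block_counts, n_moves, seed=0):
    """Toy illustration only (not the Census Bureau's algorithm).

    Moves one person at a time from a randomly chosen non-empty block
    to a different randomly chosen block. Each move subtracts one from
    one block and adds one to another, so the overall total is unchanged.
    Assumes at least two blocks.
    """
    rng = random.Random(seed)
    noisy = dict(block_counts)
    block_ids = list(noisy)
    for _ in range(n_moves):
        donors = [b for b in block_ids if noisy[b] > 0]
        if not donors:
            break  # nobody left to move
        src = rng.choice(donors)
        dst = rng.choice([b for b in block_ids if b != src])
        noisy[src] -= 1
        noisy[dst] += 1
    return noisy


# Hypothetical blocks: two residential blocks and one empty green.
true_counts = {"block_a": 140, "block_b": 85, "empty_green": 0}
published = swap_counts(true_counts, n_moves=30)

print(published)
print(sum(true_counts.values()), sum(published.values()))
```

In a scheme like this, an empty block can end up with a nonzero published count even though the combined total for the surrounding area is exact.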

Justin Cave
  • 1
    This doesn't explain why they'd inject noise into a block with zero residents, unless the absence of buildings doesn't imply zero residents (i.e., there may be homeless encampments) or you're suggesting what might be considered a bug in the application of noise. – Tech Inquisitor Dec 28 '21 at 18:54
  • 7
    @TechInquisitor - If you're injecting noise or data swapping, you need the aggregate results to remain correct. So it is entirely possible that they removed 20 people from a neighboring block and they just happened to appear in the vacant block (I don't know that they have special rules for handling completely empty blocks). Of course, it could be that there were some homeless people there or that someone entered their street address incorrectly and the mapping software put them in the vacant block. – Justin Cave Dec 28 '21 at 19:01
  • 13
    There are publicly available papers that describe the methodology and more if you keep digging, @TechInquisitor. You should not be able to explain exactly why there are 20 residents in any particular block, because the noise injection should be random and more or less unexplained; otherwise it isn't very good obfuscation. But the aggregate data should still be correct and useful, with a known level of noise or error. – RomaH Dec 28 '21 at 19:14
  • 12
    @TechInquisitor Homeless encampments are absolutely a good reason to do it: otherwise census data could be used to track down illegally camping homeless people by looking only at the supposedly empty blocks with > 0 population! – Joe Dec 28 '21 at 19:38
  • Putting 20 people in an empty block wouldn't have any value in hiding data. Thus I would suspect that it's homeless people. – Loren Pechtel Dec 29 '21 at 02:12
  • 2
    @LorenPechtel As Joe suggested, a general practice of reporting both empty and almost-empty blocks as if they had a small number of people hides which of these blocks are really empty and which are only near-empty. If you report parks where you found no one living as zero and where there was someone living as 20, then the data leaks exactly which parks have homeless people living in them. – Ben Dec 29 '21 at 04:01
  • @LorenPechtel , I can absolutely guarantee that there are no homeless encampments in the UVM Green. It makes no sense to me that the census tries to tell us that there are 20 people living there. – robert bristow-johnson Dec 29 '21 at 05:53
  • 12
    @robertbristow-johnson: somehow marking which empty blocks are safe to report as empty would require extensive knowledge of the blocks, would probably be politically fraught, and would defeat the purpose entirely. Any block not marked as "safe" could be assumed to have people living in it; further: any block removed from the "safe" list would certainly have people living in it. In both cases, the Census's greater interest in anonymizing data precludes the existence of such a list. Thus: protecting privacy requires the Census to report the UVM Green having about 20 residents. – minnmass Dec 29 '21 at 09:26
  • @robertbristow-johnson, the Census is not trying to tell you that 20 people live on UVM Green. It is trying to tell you that there are zero or more people living on UVM Green, and that the actual number is within the margin of error allowed for that block. – Christopher Harwood Dec 29 '21 at 18:32
  • 1
    Sorry @ChristopherHarwood, I am totally unconvinced by that explanation. When we assign blocks to Districts or Wards using District Builder, we are assigning 20 people to a ward and our wards must be within 10% of each other lest we risk a judge slapping us down. I think that the "disclosure avoidance" needs to have more thought put into it. – robert bristow-johnson Dec 29 '21 at 19:26
  • 2
    @robertbristow-johnson A lot of thought has been put into it - this is a very active debate for statistical agencies across the world who publish data of this kind. Unfortunately there's no perfect solution - tradeoffs have to be made between data quality, computational difficulty (some methods require solving tough optimisation problems), and being able to guarantee a certain level of confidentiality protection. 1/2 – GB supports the mod strike Dec 30 '21 at 06:35
  • 3
    @robertbristow-johnson Another consideration is that even without disclosure control in play, data at this fine detail tends to be unreliable. If somebody fails to respond to the census, or a new apartment block gets built on what was previously empty land, 2020 counts can be wildly off a couple of years later. Local knowledge can help mitigate that, but it gets very labour-intensive if you're doing this in bulk. 2/2 – GB supports the mod strike Dec 30 '21 at 06:39
  • @GeoffreyBrent, "guarantee a certain level of confidentiality protection", but when an enumerated person becomes 72 years old, the Census Bureau (and the National Archives) discards that 'confidentiality protection'. Looks like the protection has an expiration date. – BobE Jan 12 '22 at 19:14
  • @BobE I don't think that's quite right. I'm not in the USA, but my understanding is that the US Census keeps personally identifiable information confidential for 72 years after it is collected, not until the subject is 72 years old. So the earliest possible release of PII about a person would be when they turn 72, but that would only allow release of data collected when they were zero years old. The most recent 72 years of their life would still be protected... 1/2 – GB supports the mod strike Jan 14 '22 at 01:26
  • @BobE A lot of countries have similar provisions allowing release of PII a long time after collection; the rationale is that after many decades have passed it's less likely to still be sensitive, and it's valuable to historians since it contains info that's passed out of living memory. Different places have different rules on how long the period is, and on whether consent is required. Statistical work very often requires finding a middle ground between maximising use of information and protecting privacy. – GB supports the mod strike Jan 14 '22 at 01:32
  • Where I lived in 1950, my parents' names, my siblings' names, all the information in the 1950 census has been considered "protected" for the past 72 years. However, as of April 2022 that PII is no longer protected. Is it because I'm old (>72) that my PII becomes available to the public? (Was I expected to die?) I'm simply pointing out the inconsistency in the US. As for statistical work, I have serious reservations as to the usefulness of my name and the names of my family members from 72 years ago. – BobE Jan 14 '22 at 03:48
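
The 10% balance constraint raised in the comments above can be checked with a short calculation. The sketch below assumes "within 10% of each other" means the spread between the largest and smallest ward, relative to the ideal (mean) ward size; that reading, the helper names, and the ward populations are all assumptions made for illustration, not real Burlington figures.

```python
def overall_deviation(ward_pops):
    """Spread between the largest and smallest ward, as a fraction of the
    ideal (mean) ward size. This definition is an assumption."""
    ideal = sum(ward_pops.values()) / len(ward_pops)
    return (max(ward_pops.values()) - min(ward_pops.values())) / ideal


def reassign_block(ward_pops, block_pop, from_ward, to_ward):
    """Return new ward totals after moving one block between wards."""
    updated = dict(ward_pops)
    updated[from_ward] -= block_pop
    updated[to_ward] += block_pop
    return updated


# Hypothetical ward totals, for illustration only.
wards = {"ward_1": 5600, "ward_2": 5400, "ward_3": 5500}

before = overall_deviation(wards)
# Move a block reported as 20 people from ward_1 to ward_2.
after = overall_deviation(reassign_block(wards, 20, "ward_1", "ward_2"))

print(f"{before:.2%} -> {after:.2%}")
```

With ward sizes in the thousands, a single block reported as 20 people shifts the deviation by a fraction of a percentage point, which matters most when a plan is already close to the limit.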