I have a list of coordinates showing where different people lived over an eight-year period. They are repeated cross-sections of the populations served by several county agencies that provide free workforce training to low-income residents. We suspect (and basically know) that, as the city has become more expensive and gentrified, lower-income people are moving farther from the city and from service providers.

I could track the overall movement by following the "center" of those served each year, but how do I get something like a variance or standard deviation around that point in two dimensions? Is there an agreed-upon measure of two-dimensional dispersion that I can use? Or would it have to be two values (dispersion of X and dispersion of Y)? And will it be sensitive to changes in sample size from year to year?

dcoy
  • 337
  • https://stats.stackexchange.com/questions/13272/2d-analog-of-standard-deviation might be helpful – Henry Apr 19 '23 at 00:05
  • If your worry is distance from service providers, one option which might be easily understood could be average distance (Euclidean or road or travel time or something else) to the nearest service provider rather than to the centre of population – Henry Apr 19 '23 at 00:08
  • Thanks, that existing post is exactly what I was looking for. For those wanting a summary, it recommends taking the Euclidean distance from each point to the group centroid, which gives a single SD instead of one for X and one for Y. I'm not sure how I missed it, as it uses similar language. I assumed we'd be doing what you described with the distance to the nearest provider, but we don't have a complete account of locations for all agencies in the early annual cross-sections. – dcoy Apr 19 '23 at 02:04

1 Answer

There is a book by Neft (1966), Statistical Analysis for Areal Distributions. In Chapter IV he suggests, and I agree, that the "harmonic mean" center of an areal distribution is ideal for describing its central tendency, for a variety of reasons.

If you have a set of data points (coordinates) indexed by $i=1,\ldots,P$, the harmonic mean center is the position $j$ at which the following expression is minimised:

$$ H_j=\frac{1}{\sum_{i=1}^P \frac{1}{r_{ij}}} $$

with $r_{ij}$ being the great circle distance between coordinate $j$ and point $i$.
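For reference, one common way to compute $r_{ij}$ is the haversine formula. This is a sketch under my own assumptions rather than something taken from Neft: a spherical earth of radius $R \approx 6371$ km, with latitudes $\phi$ and longitudes $\lambda$ in radians.

$$ r_{ij} = 2R \arcsin\sqrt{\sin^2\left(\frac{\phi_i-\phi_j}{2}\right) + \cos\phi_i \cos\phi_j \sin^2\left(\frac{\lambda_i-\lambda_j}{2}\right)} $$

Over a single metropolitan area, Euclidean distance in a projected coordinate system should give very similar values.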

This has some benefits over Euclidean distance-based measures of center:

  • it is less sensitive to outliers, because the Euclidean distance-based center depends on squared distances
  • it will always be located within the area of study, even if the area of interest is concave
  • if your points are spread over a large area, the great circle distance correctly accounts for the curvature of the earth

The value of $H_j$ at the minimum is the measure of dispersion you are interested in.

You should be able to calculate this yourself, either by (a) using a two-dimensional optimiser built into whatever software you are using, or (b) generating a grid of points, calculating $H_j$ at each point, and then finding the minimum.
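A minimal sketch of option (b) in plain Python, under several assumptions of mine: coordinates are (latitude, longitude) in degrees, the earth is a sphere, and the grid covers the bounding box of the data. Because $1/r_{ij}$ blows up when a grid point coincides with a data point, distances are clamped below at a hypothetical `min_km` (not part of the formula above). The function names and sample coordinates are made up for illustration.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean earth radius; spherical-earth assumption


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(min(1.0, a)))


def h_value(lat, lon, points, min_km=1.0):
    """H_j = 1 / sum_i(1 / r_ij), with r_ij clamped below at min_km (assumption)."""
    inv_sum = sum(
        1.0 / max(haversine_km(lat, lon, p_lat, p_lon), min_km)
        for p_lat, p_lon in points
    )
    return 1.0 / inv_sum


def harmonic_mean_center(points, steps=50, min_km=1.0):
    """Grid search over the bounding box of the points for the minimum of H_j."""
    lats = [p[0] for p in points]
    lons = [p[1] for p in points]
    lat_lo, lat_hi = min(lats), max(lats)
    lon_lo, lon_hi = min(lons), max(lons)
    best = None
    for a in range(steps + 1):
        lat = lat_lo + (lat_hi - lat_lo) * a / steps
        for b in range(steps + 1):
            lon = lon_lo + (lon_hi - lon_lo) * b / steps
            h = h_value(lat, lon, points, min_km)
            if best is None or h < best[0]:
                best = (h, lat, lon)
    h_min, lat, lon = best
    # P * H_j is the harmonic mean distance from the center to the points
    return lat, lon, h_min, len(points) * h_min


# Made-up coordinates: a small cluster plus two more distant points
points = [(37.77, -122.42), (37.78, -122.41), (37.76, -122.43),
          (37.80, -122.27), (38.58, -121.49)]
lat, lon, h_min, harmonic_mean_km = harmonic_mean_center(points)
print(lat, lon, h_min, harmonic_mean_km)
```

On the sample-size question: $H_j$ as written shrinks as $P$ grows, since every extra point adds a positive term to the sum. Multiplying by $P$ gives the harmonic mean distance, which is probably the easier quantity to compare across years with different numbers of clients. The result will also depend on the grid resolution and on the choice of `min_km`, so it is worth checking how much it moves when those change.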

Alex J
  • 2,151