I have a number of large GeoDataFrames and want to automate a nearest-neighbour search using a KD-tree for more efficient processing. For each point in one dataframe (gdA), I want to find its nearest neighbour in another dataframe (gdB) and attach a single attribute value from that neighbour.
This is effectively what the NNjoin package in QGIS does. The code below (written with help from the question "GeoPandas: Find nearest point in other dataframe") generates the same distances as NNjoin, but unfortunately does not return anything like the same attribute values.
OPTION 1
import numpy as np
import pandas as pd
from scipy.spatial import cKDTree

def ckdnearest(gdA, gdB, bcol):
    # Build (n, 2) coordinate arrays from the point geometries
    nA = np.array(list(zip(gdA.geometry.x, gdA.geometry.y)))
    nB = np.array(list(zip(gdB.geometry.x, gdB.geometry.y)))
    btree = cKDTree(nB)
    # For each point in gdA: distance to, and index of, its nearest point in gdB
    dist, idx = btree.query(nA, k=1)
    df = pd.DataFrame.from_dict({'distance': dist.astype(int),
                                 'bcol': gdB.loc[idx, bcol].values})
    return df

SW_closest = ckdnearest(gdA, gdB, 'attribute_value')
OPTION 2
import numpy as np
import pandas as pd
from sklearn.neighbors import KDTree

def ckdnearest(gdA, gdB, bcol):
    # Build (n, 2) coordinate arrays from the point geometries
    nA = np.array(list(zip(gdA.geometry.x, gdA.geometry.y)))
    nB = np.array(list(zip(gdB.geometry.x, gdB.geometry.y)))
    btree = KDTree(nB)
    # sklearn returns arrays of shape (n_queries, k), so take column 0 for k=1
    dist, idx = btree.query(nA, k=1)
    index = idx[:, 0]
    df = pd.DataFrame.from_dict({'distance': dist[:, 0].astype(int),
                                 'bcol': gdB.loc[index, bcol].values})
    return df

SW_closest2 = ckdnearest(gdA, gdB, 'attribute_value')
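One thing that may be worth checking, offered as an assumption rather than a confirmed diagnosis: both tree queries return row positions within nB, whereas gdB.loc[...] looks rows up by index label. If gdB does not carry a default 0..n-1 index (for example after filtering or sorting), the two disagree, which would leave the distances correct but the attributes wrong. The sketch below uses a hypothetical helper, nearest_attribute, and made-up toy data to show a positional lookup with .iloc. Plain coordinate arrays stand in for the GeoDataFrames.

```python
import numpy as np
import pandas as pd
from scipy.spatial import cKDTree


def nearest_attribute(coords_a, coords_b, values_b):
    """Hypothetical helper: nearest-neighbour distance and attribute per point.

    coords_a, coords_b: (n, 2) arrays of x/y coordinates.
    values_b: pandas Series aligned row-for-row with coords_b (any index).
    """
    tree = cKDTree(coords_b)
    # pos holds row positions within coords_b, not index labels
    dist, pos = tree.query(coords_a, k=1)
    # .iloc selects by position, so a non-default index on values_b is harmless
    return pd.DataFrame({"distance": dist, "bcol": values_b.iloc[pos].to_numpy()})


# Toy data: the index labels of B are deliberately out of step with row order
coords_a = np.array([[0.0, 0.0], [10.0, 10.0]])
coords_b = np.array([[9.0, 10.0], [0.0, 1.0], [5.0, 5.0]])
values_b = pd.Series(["far", "near_origin", "middle"], index=[2, 0, 1])

result = nearest_attribute(coords_a, coords_b, values_b)

# Label-based lookup with the same positions picks different rows
by_label = values_b.loc[[1, 0]].tolist()
```

With real data the inputs could presumably be built as np.column_stack([gdA.geometry.x, gdA.geometry.y]) and gdB[bcol]. Calling gdB.reset_index(drop=True) before the lookup would be another way to make labels and positions coincide.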
BallTree, I think you can change BallTree to KDTree, I haven't tried it. – sutan Aug 26 '20 at 03:02
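For what it's worth, scikit-learn's BallTree and KDTree expose the same query method, so swapping one for the other should only need the class name changed. A small check on made-up random points (the array names here are illustrative, not from the question):

```python
import numpy as np
from sklearn.neighbors import BallTree, KDTree

# Made-up 2D points standing in for the gdB and gdA coordinates
rng = np.random.default_rng(0)
points_b = rng.random((50, 2))
points_a = rng.random((5, 2))

# Same call on both tree types; the default metric is Euclidean for each
dist_ball, idx_ball = BallTree(points_b).query(points_a, k=1)
dist_kd, idx_kd = KDTree(points_b).query(points_a, k=1)

# Both return arrays of shape (n_queries, k); take column 0 for k=1
nearest_ball = idx_ball[:, 0]
nearest_kd = idx_kd[:, 0]
```

In either case the returned indices are row positions in points_b, so any attribute lookup that follows should be positional as well.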