
When performing k-nearest neighbor analysis on a large dataset, using a kd-tree algorithm can greatly speed up the search. I've tried researching this question but have not found an answer: can the use of a kd-tree introduce bias into the nearest neighbor search? Does the splitting that a kd-tree performs mean that certain groups are excluded from ever being returned as nearest neighbors?

Follow-up questions: if there is potential to introduce bias, how can you test whether it is actually occurring?

Are there other algorithms that are better than kd-tree for large datasets?

Many thanks

rw2

1 Answer


Standard search procedures using kd-tree structures compute the exact list of k nearest neighbors (NN). In other words, you get the same result as a (time-consuming) exhaustive search would give. So, in principle, there is no bias introduced by using a kd-tree to solve the NN problem.
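
Since exactness is easy to check empirically, a direct way to address your follow-up question is to run the same queries once with a kd-tree and once with brute-force search and compare the returned neighbors. Below is a minimal sketch assuming scikit-learn and NumPy, on synthetic random data; the dataset size, dimensionality, and k are arbitrary illustrative choices.

```python
# Sketch: compare kd-tree k-NN results against an exhaustive brute-force
# search on the same data. If the kd-tree were biased, the two result sets
# would differ; for an exact kd-tree they should match (up to distance ties).
import numpy as np
from sklearn.neighbors import NearestNeighbors

rng = np.random.default_rng(0)
X = rng.standard_normal((10_000, 5))   # 10k points in 5 dimensions (illustrative)
k = 10

kd = NearestNeighbors(n_neighbors=k, algorithm="kd_tree").fit(X)
brute = NearestNeighbors(n_neighbors=k, algorithm="brute").fit(X)

_, idx_kd = kd.kneighbors(X)
_, idx_brute = brute.kneighbors(X)

# Compare as sets per query point so that equidistant ties, which the two
# backends may order differently, do not trigger false alarms.
same = all(set(a) == set(b) for a, b in zip(idx_kd, idx_brute))
print("kd-tree matches brute force:", same)   # expected: True
```

The same comparison (on a subsample, if the full brute-force pass is too slow) is a reasonable sanity check on whatever implementation you are actually using.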

Nevertheless, approximate algorithms relying on kd-tree-like structures also exist (e.g. best-bin-first), but to my knowledge they are not the standard implementation.
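
As an illustration of where an approximation (and hence a possible discrepancy with exhaustive search) can creep in, SciPy's cKDTree exposes an eps parameter that relaxes the query guarantee: the k-th reported neighbor is only guaranteed to be within a factor (1 + eps) of the true k-th neighbor distance. The sketch below, again on synthetic data with arbitrary sizes, contrasts an exact query with a relaxed one.

```python
# Sketch: exact vs. approximate kd-tree queries with SciPy's cKDTree.
# eps=0 gives exact results; eps>0 allows the search to prune more
# aggressively, which is faster but may return slightly different neighbors.
import numpy as np
from scipy.spatial import cKDTree

rng = np.random.default_rng(1)
X = rng.standard_normal((10_000, 5))
tree = cKDTree(X)

d_exact, i_exact = tree.query(X[:100], k=10)              # exact search
d_approx, i_approx = tree.query(X[:100], k=10, eps=0.5)   # relaxed search

frac_identical = np.mean([set(a) == set(b) for a, b in zip(i_exact, i_approx)])
print("fraction of identical neighbour lists:", frac_identical)
```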

Under particular conditions (searching for the NN of many similar query points sequentially), the kd-tree structure can be extended with a caching structure to speed up the process while remaining exact (e.g. the cached kd-tree).

beuhbbb