I want to know which is the computational cost of the $\texttt{knnsearch}$ algorithm in Matlab. $\texttt{[~, distX] = knnsearch(X,X,'K',N,'Distance','chebychev');}$ where N is length of X.
1 Answers
Consider this post on SO: https://stackoverflow.com/questions/23175491/can-someone-tell-me-about-the-knn-search-algo-that-matlab-uses Where it states that: this algorithm is O(log(n)) rather than O(n^2) as THE OP in this post wrotes its own knn search algorithm which does not performed well. Unfortunately, the contents of the MEX file from matlab are not available for inspection to confirm that the power is O(log(n)). However this post, is a little bit old, but maybe there is now a newer information.
So it should be O(log(n)). But we dont know if this is in the best, worst or average case.
Update
We stick it together:
Update 2: It has to be O(n) * O(n²), but this doesnt change the O(n²*1, see later) as the length of the outer loop is fixed to length of 10 and is not a high n. So the end result is still correct.
1.) your for loop is indeed O(n) Imagine if the knn were at its worst O(n²) we had O(n) * O(n²) which would result in O(n * n²) where values of e.g. n = 10 for your outer for loop would result in O(10 * 1000000). The 10 is no longer of interest. Thus, your program is not so dependent of your for loop, but more on the knn algorithm itself, if you stick to higher values of n.
2.) In matlab it states [these parentheses show my own addition] that:
The default value [of the knn algorithm in matlab] is 'kdtree' when X has 10 or fewer columns, X is not sparse, and the distance metric is a 'kdtree' type. Otherwise, the default value is 'exhaustive'.
As your X is determined by N, you would use an exhaustive search.
Exhaustive search equals Brute Force: https://ijarcce.com/wp-content/uploads/2012/03/IJARCCE9A__a__Arshu__comparison.pdf
The computational cost for brute force is:
Training time complexity: O(1)
Training space complexity: O(1)
Prediction time complexity: O(k * n * d)
Prediction space complexity: O(1)
From: https://towardsdatascience.com/k-nearest-neighbors-computational-complexity-502d2c440d5
Where:
- n equals your X (as we see in the comments: X = randn(N(i),1)),
- k, as you state in your comments, equals n
- and d was 1 as you also stated in the comments. Thus, we got O(n * n * 1) your brute force knn equals O(n²) and with greater for loops the n in the loop won't matter anymore.
If I'm wrong about X, as you stated you have length of N, but this is only in your for loop... and you dont have n² (maybe with higher n to check), you could calculate the computational cost from kdtree instead if your X <=10. That would result in more n complexity.
Training time complexity: O(d * n * log(n))
Training space complexity: O(d * n)
Prediction time complexity: O(k * log(n))
Prediction space complexity: O(1)
Thus the only question! you should answer is, if you really have brute force or kd-type. The for loop has no essential role. If you can figure out your method in matlab or set it directly in your call, than you should know.
- 1,658
N = [10,25,50,100,250,500,1000,2500,5000,10000]'; t = zeros(length(N),1); for i = 1:length(N) X = randn(N(i),1); tic; [~,distX] = knnsearch(X,X,'K',N(i),'Distance','chebychev'); t(i) = toc; end plot(N,t,N,N.*(N));
– Massimiliano Romana Mar 27 '21 at 19:39