
I am studying Fisher LDA, the case where there are K=2 classes of data.

It is my understanding that Fisher LDA looks for the 1-dimensional space onto which the data should be projected in order to maximize the ratio of the between class variance to the within class variance.

If this direction is a vector $w$, then how do I compute the linear decision boundary?

Many online resources write this as $y(x)=w^{t}x+w_{0}$.

If $w$ is that direction, then isn't $w^{t}x$ the projection of a vector $x$ onto that line? And if so, what is $w_{0}$ and how can one solve for it?

I suppose my main question is this: is the decision boundary a hyperplane orthogonal to the direction vector $w$ that separates the data points before projection, or is it a point on the line through the vector of maximum separation $w$ that separates the projected points (and if so, what is $w_{0}$)?

TYBG
  • Since what discriminates in LDA is the extracted discriminant variate, which is single in the 2-class case, the boundary, strictly speaking, is a point on the discriminant line: the point of zero discriminant score. However, you are right to extend that point into the parent $p$-dimensional space of the $p$ analyzed variables; that gives a $(p-1)$-dimensional hyperplane perpendicular to the discriminant line (and passing through that zero point on it). "Perpendicular" because nothing discriminative is added beyond the single discriminant function. – ttnphns Jun 10 '18 at 16:23
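To see the comment's geometric point concretely, here is a minimal numpy sketch with a made-up direction and offset (not taken from the question): any two points with zero discriminant score differ by a vector orthogonal to $w$, so the zero-score set is a hyperplane perpendicular to the discriminant line.

```python
import numpy as np

# Made-up 2-D discriminant direction and offset, purely for illustration.
w = np.array([2.0, 1.0])
w0 = -3.0

# Two points on the boundary {x : w.x + w0 = 0}: pick x1 freely, solve for x2.
def boundary_point(x1):
    return np.array([x1, (-w0 - w[0] * x1) / w[1]])

p, q = boundary_point(0.0), boundary_point(5.0)
print(np.dot(w, p) + w0, np.dot(w, q) + w0)   # both 0: on the boundary
print(np.dot(w, q - p))                       # 0: (q - p) is orthogonal to w
```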

1 Answer


The dimensionality of the decision boundary matches the number of decision models you have. The reason $K-1$ models is common is that the $K^{\text{th}}$ model is redundant: its class is simply the samples that have not been positively assigned by the previous $K-1$ models. One exception is when you want to allow an invalid result for data that does not conform to the expected distribution of any group; in that case you need $K$ models, and samples that fall outside all of them receive an invalid label. For simplicity I will use the number of models $m$ below, which will typically be $K$ or $K-1$; $v$ is the number of variables in a sample vector and $o$ is the number of observations in the data.
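As a small illustration of that counting (the positive-score convention and the labels here are my own assumptions, not part of the answer): with $m = K-1$ models the left-over samples go to class $K$, while with $m = K$ models a sample that no model claims can be flagged as invalid.

```python
import numpy as np

def assign(scores, n_classes):
    """scores: length-m vector of discriminant scores for one sample."""
    m = len(scores)
    best = int(np.argmax(scores))
    if m == n_classes - 1:                      # K-1 models: class K is the remainder
        return best if scores[best] > 0 else n_classes - 1
    if m == n_classes:                          # K models: allow an 'invalid' outcome
        return best if scores[best] > 0 else -1 # -1 = fits no class
    raise ValueError("expected m = K or m = K-1 models")

print(assign(np.array([0.7, -0.2]), n_classes=3))         # 0: claimed by model 0
print(assign(np.array([-0.6, -0.2]), n_classes=3))        # 2: the left-over class
print(assign(np.array([-0.6, -0.2, -0.9]), n_classes=3))  # -1: invalid
```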

$w^{t}x$ is indeed the projection of the vector $x$ onto $w$; since it is an inner product, its dimensions depend on the outer dimensions of each matrix/vector. For this to work, the dimensions of $w^{t}$ must be $m \times v$ and the dimensions of $x$ must be $v \times o$.

If $w^{t}$ and $x$ are both vectors ($m=1, o=1$), the product is a scalar, a single number. If $m>1, o=1$ you get an $m \times 1$ vector whose elements are the scores of each model for that observation. If $m=1, o>1$ you get a $1 \times o$ vector whose elements are the scores of model 1 for each observation. If $m>1, o>1$ you get an $m \times o$ matrix of scores for each model and each observation.

If $o>1$, i.e. $x$ is a matrix, you replicate $w_0$ along the outer dimension of $x$, which is the number of observations $o$; i.e. you apply the offset to each observation.

If $m>1$, i.e. $w$ is a matrix, you calculate a $w_0$ for each model, i.e. you apply an individual offset to each of the $m$ models.
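A quick numpy sketch of the shapes and the replication described above (the sizes are arbitrary and chosen only for illustration):

```python
import numpy as np

m, v, o = 2, 4, 5                  # models, variables, observations (arbitrary sizes)
wT = np.random.randn(m, v)         # w^t is m x v
x  = np.random.randn(v, o)         # x is v x o
w0 = np.random.randn(m, 1)         # one offset per model, as an m x 1 column

scores = wT @ x                    # m x o: one score per model, per observation
y = scores + w0                    # w0 is broadcast (replicated) across the o observations
print(scores.shape, y.shape)       # (2, 5) (2, 5)
```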

$w_0$ is the offset you choose to tune false positives against false negatives, e.g. through a ROC curve.
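For the two-class case the question asks about, one common (though not the only) concrete choice is to place the threshold at the midpoint of the two projected class means. The sketch below is my own illustration rather than something prescribed above: it computes the Fisher direction $w = S_W^{-1}(\mu_1 - \mu_0)$, sets $w_0$ at that midpoint, and classifies by the sign of $y(x) = w^{t}x + w_0$. Tuning false positives against false negatives then amounts to sliding $w_0$, e.g. along a ROC curve.

```python
import numpy as np

def fisher_lda_2class(X0, X1):
    """X0, X1: (n_i, v) arrays of samples for the two classes; returns (w, w0)."""
    mu0, mu1 = X0.mean(axis=0), X1.mean(axis=0)
    # Pooled within-class scatter, lightly regularised in case it is singular.
    Sw = (np.cov(X0, rowvar=False) * (len(X0) - 1)
          + np.cov(X1, rowvar=False) * (len(X1) - 1))
    w = np.linalg.solve(Sw + 1e-8 * np.eye(Sw.shape[0]), mu1 - mu0)
    # Threshold at the midpoint of the projected class means (assumes equal priors/costs).
    w0 = -0.5 * (w @ mu0 + w @ mu1)
    return w, w0

rng = np.random.default_rng(0)
X0 = rng.normal(loc=[0.0, 0.0], size=(100, 2))   # toy class-0 samples
X1 = rng.normal(loc=[3.0, 2.0], size=(100, 2))   # toy class-1 samples
w, w0 = fisher_lda_2class(X0, X1)

scores = np.vstack([X0, X1]) @ w + w0            # y(x) = w.x + w0 for every sample
labels = (scores > 0).astype(int)                # classify by the sign of the score
print(labels[:5], labels[-5:])                   # mostly 0s, then mostly 1s
```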

This is a useful resource: http://www.cedar.buffalo.edu/~srihari/CSE574/Chap4/4.1%20DiscFns.pdf

ReneBt
  • Thanks for that link, I am finding it very useful. I am under the impression that for $K=2$ classes measured in $m=2$ dimensions (as the graph in the link shows), the hyperplane (a line, which lives in 2-space) is normal to $w$, the weight vector computed with Fisher's ratio? If so, it makes sense to me that setting $g(x) = 0$ would classify vectors for which $g(x) \geq 0$ as elements of one class and vectors for which $g(x) \leq 0$ as elements of the other. – TYBG Jun 09 '18 at 20:25
  • see slide 4 'It corresponds to a (D-1)- dimensional hyperplane in a D-dimensional input space ' – ReneBt Jun 11 '18 at 07:58
  • I've updated my answer and hopefully it walks through it a bit better. The graphs in the linked article show $K=2$ and $m = 1$, not $m = 2$ in the early slides and keeps things simple by always assuming $m = K-1$. I've tried to avoid this assumption and hopefully it doesn't make things more confusing. – ReneBt Jun 11 '18 at 08:29