
For the perceptron algorithm, what will happen if I update the weight vector for both correct and wrong predictions, instead of only for wrong predictions? What will the plot of the number of wrong predictions look like w.r.t. the number of passes? The perceptron algorithm I mean is the one proposed by Rosenblatt (1958), shown below:

[Figure: pseudocode of the perceptron algorithm]

My question is: what will happen if we remove the if condition and execute the update for all instances in each pass?

  • If you look at the cross-entropy loss function, the loss is equal to zero when you have the correct prediction. Hence, in this case, the error is not propagated backwards, and the weights are not updated. – Buomsoo Kim Sep 15 '17 at 23:27

2 Answers


In the learning algorithm of the perceptron, the weights are not updated after a correct response.

The learning rule says that the weight vector $\mathbf{w}=(w_1,\dots,w_n)$ is updated according to $w_i(t+1)=w_i(t)+(d_j-y_j(t))x_{ji}$ (see Wikipedia). So if the output $y_j$ (obtained by applying the $j$th input vector $\mathbf{x}_j$) is equal to the desired output $d_j$, that is, $d_j=y_j(t)$, the rule becomes $w_i(t+1)=w_i(t)+0$, hence there is no change to the weights.
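As a concrete illustration, here is the rule in Python (a minimal sketch; the function name and the NumPy details are mine, not part of the original algorithm):

```python
import numpy as np

def perceptron_step(w, x, d):
    """One update of the rule above: w(t+1) = w(t) + (d - y(t)) * x."""
    y = 1 if np.dot(w, x) >= 0 else -1   # thresholded output in {+1, -1}
    return w + (d - y) * x               # (d - y) is 0 for a correct response

w = np.zeros(2)
x, d = np.array([1.0, 2.0]), -1

w = perceptron_step(w, x, d)   # wrong response: y = +1, so w becomes (-2, -4)
w = perceptron_step(w, x, d)   # correct response: y = -1, so w stays (-2, -4)
```

The factor $(d_j - y_j)$ vanishes whenever the response is correct, which is exactly the "no update" behaviour described above.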

matteo
  • I'm not sure if we are talking about the same algorithm. I added the algorithm I talked about in the question. Could you please have a look at it? Thank you! – ZigZagZebra Sep 16 '17 at 15:44
  • Oh, I see. I have reported a more "modern" and general version of the algorithm. In the original one, the inputs $\mathbf{x}$ and the outputs $y$ are in $\{+1,-1\}$. I think that if you update on every instance (regardless of a correct or wrong output), then the perceptron will not find the solution. This is because, while the original algorithm minimizes a meaningful loss function, $L(w)=\frac{1}{m}\sum_{i=1}^{m}\max\left(0, -y_i\left\langle w, x_i\right\rangle\right)$, your modified version does not, and adds changes to the parameters even after a correct response. – matteo Sep 17 '17 at 00:09
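For concreteness, that loss can be computed in a few lines. This is a minimal sketch of my own (function and variable names are mine; it assumes labels in $\{+1,-1\}$ and the inputs as rows of X), illustrating that the loss is zero exactly when every response is correct:

```python
import numpy as np

def perceptron_loss(w, X, y):
    """Mean perceptron criterion: (1/m) * sum_i max(0, -y_i * <w, x_i>)."""
    margins = y * (X @ w)                  # positive margin <=> correct response
    return np.mean(np.maximum(0.0, -margins))
```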

First, the algorithm you cite is not from Rosenblatt (1958). Second, the cited algorithm is, at best, unclear, if not incorrect. What is s.t. supposed to mean? "Subject to", as in constrained optimisation? What happens if there is more than one $i$ for which the if condition is satisfied? Does the algorithm ever stop?

I'll try to answer your question under the assumptions:

  1. that we are actually talking about the following algorithm:

[Figure: Perceptron algorithm pseudocode]

and

  2. that you want to know what happens if the if condition (line 5) is removed, so that lines 6 and 7 are executed unconditionally:

[Figure: modified Perceptron algorithm pseudocode]

In that case, the algorithm obviously never converges. But since exactly the same thing happens in each pass through the while loop, $\vec{w}$ changes by the same amount in every pass. Thus, starting from $\vec{w}(0) = \vec{0}$, after pass $t$ we have $\vec{w}(t) = t \cdot \vec{w}(1)$.

Note:

  1. The direction of $\vec{w}$ never changes after the first pass, only its size. Consequently, the number of wrong predictions w.r.t. number of passes will be flat (i.e. constant) after the first pass.
  2. In each pass, you add to $\vec{w}$ all elements $\vec{x}_i$ with $y_i = +1$ and subtract all $\vec{x}_i$ with $y_i = -1$. In other words: $$ \vec{w}(t+1) = \vec{w}(t) + \sum_{i : y_i = +1} \vec{x}_i - \sum_{i : y_i = -1} \vec{x}_i $$ Now, the first sum is just the mean of all $\vec{x}_i$ with $y_i = +1$, multiplied by the number of such $\vec{x}_i$. The same, with $y_i = -1$, holds for the second sum. Consequently, $\vec{w}$ is the weighted difference between the means of the two classes (see the numerical sketch after this list).
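Both notes are easy to check numerically. Here is a minimal sketch of my own (made-up Gaussian data, learning rate taken as 1) of the modified algorithm; the printed error count is identical in every pass, since only the length of $\vec{w}$ grows, and after $t$ passes $\vec{w}$ is exactly $t$ times the weighted mean difference:

```python
import numpy as np

rng = np.random.default_rng(0)

# Two made-up Gaussian classes, labelled +1 and -1
X_pos = rng.normal(loc=[2.0, 2.0], size=(30, 2))
X_neg = rng.normal(loc=[-2.0, -1.0], size=(20, 2))
X = np.vstack([X_pos, X_neg])
y = np.array([+1] * 30 + [-1] * 20)

w = np.zeros(2)
for t in range(1, 6):
    for x_i, y_i in zip(X, y):
        w = w + y_i * x_i               # unconditional update: lines 6 and 7 always run
    errors = np.sum(np.where(X @ w >= 0, 1, -1) != y)
    print(f"pass {t}: errors = {errors}")

# After 5 passes, w equals 5 * (sum of class +1 minus sum of class -1):
print(np.allclose(w, 5 * (X_pos.sum(axis=0) - X_neg.sum(axis=0))))  # True
```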

The figure below shows two classes, the line (blue) connecting their means and the direction of $\vec{w}$ (dashed, orange):

[Figure: result of the modified Perceptron algorithm]

Igor F.