
Suppose our hypothesis space is $$\mathcal{H}=\{f:f(x)=f_\theta (x), \theta\in \Theta\},$$ where $\theta$ denotes the trainable parameters.

Suppose we have a dataset $\{(x_i,y_i)\}_{i=1}^N.$

In the notes from my professor, he defines the empirical risk as $\,\Phi(\theta)=\frac{1}{N} \sum_{i=1}^N L(f_\theta(x_i),y_i)$, and empirical risk minimization as the problem of minimizing it over $\theta$.
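For concreteness, here is a minimal sketch of $\Phi(\theta)$, assuming a squared loss $L(\hat y, y)=(\hat y - y)^2$ and a scalar linear model $f_\theta(x)=\theta x$ (both choices are illustrative, not from the notes):

```python
import numpy as np

def empirical_risk(theta, xs, ys):
    # Phi(theta) = (1/N) * sum_i L(f_theta(x_i), y_i), with squared loss
    preds = theta * xs                 # assumed model: f_theta(x) = theta * x
    return np.mean((preds - ys) ** 2)

xs = np.array([1.0, 2.0, 3.0])
ys = np.array([2.0, 4.0, 6.0])         # generated by y = 2x
print(empirical_risk(2.0, xs, ys))     # theta = 2 fits exactly, so prints 0.0
```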

Now we can evaluate $\nabla \Phi(\theta)$, which is a function of $x_i,y_i,\theta$.

Suppose we have an initialization $\theta_0$ and we perform gradient descent with some fixed learning rate $\eta$.

We repeatedly update the iterate via $\theta \leftarrow \theta - \eta \nabla\Phi(\theta)$, starting from $\theta_0$, until it converges (suppose it does).
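The update rule above can be sketched as follows, again assuming the squared loss and linear model $f_\theta(x)=\theta x$ (so `grad_phi` is the hand-derived gradient for that particular choice):

```python
import numpy as np

def grad_phi(theta, xs, ys):
    # d/dtheta of (1/N) sum (theta*x_i - y_i)^2 = (2/N) sum (theta*x_i - y_i) * x_i
    return 2.0 * np.mean((theta * xs - ys) * xs)

xs = np.array([1.0, 2.0, 3.0])
ys = np.array([2.0, 4.0, 6.0])   # generated by y = 2x
theta, eta = 0.0, 0.05           # initialization theta_0 and learning rate
for _ in range(200):             # fixed number of updates for simplicity
    theta = theta - eta * grad_phi(theta, xs, ys)
print(theta)                     # converges toward 2.0
```

Note that every one of the 200 updates touches the full dataset `xs, ys`, which is exactly the situation the question asks about.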


My question then arises.

For each update, we need to use the entire dataset $\{x_i,y_i\}_{i=1}^N$ and we need to update many times until $\theta_0$ converges.

So, do we just keep reusing the dataset for all these updates, and call the dataset used for each update an epoch?

What I need is confirmation that the dataset used for each update is called an epoch. Thanks.

Sam Wong

1 Answer


Yes, on each epoch you are using the same dataset. Gradient descent basically runs in a for-loop. In Julia-like pseudocode, it would be something like the following

for epoch in 1:n_epochs
   theta = update(theta, data)
end

There is also mini-batch gradient descent, where each epoch has an inner loop that iterates over batches of the dataset

for epoch in 1:n_epochs
   for batch in split_to_batches(data)
      theta = update(theta, batch)
   end
end

When the batch size is 1, we call it stochastic gradient descent.
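A runnable Python version of the batched loop above might look like this; the `update` and `split_to_batches` bodies are illustrative, assuming the same hypothetical squared-loss, linear-model setup as in the question (set `batch_size=1` to get stochastic gradient descent):

```python
import numpy as np

rng = np.random.default_rng(0)

def update(theta, xs, ys, eta=0.05):
    # one gradient step on the given (mini-)batch, squared loss, f_theta(x) = theta*x
    grad = 2.0 * np.mean((theta * xs - ys) * xs)
    return theta - eta * grad

def split_to_batches(xs, ys, batch_size):
    # shuffle each epoch (common practice), then yield consecutive batches
    idx = rng.permutation(len(xs))
    for start in range(0, len(xs), batch_size):
        sel = idx[start:start + batch_size]
        yield xs[sel], ys[sel]

xs = np.array([1.0, 2.0, 3.0, 4.0])
ys = 2.0 * xs                        # data generated by y = 2x
theta = 0.0
for epoch in range(100):
    for bx, by in split_to_batches(xs, ys, batch_size=1):  # batch_size=1 -> SGD
        theta = update(theta, bx, by)
print(theta)                         # approaches 2.0
```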

Tim