Revised Answer
I originally deleted this answer because I didn't feel it was helpful, but I will leave it here in case your discussion with whuber clarifies things (deleting the answer unfortunately wipes out all of its comments, so I have simply marked the portion that isn't helpful). Since I seem to be misconstruing the language used in this setting, I will leave it to better minds to provide a fuller answer.
Old Answer
Both are generally saying the same thing; however, they carry slightly different connotations, which may or may not matter.
- $E(u|x)=0$ says that the error term, conditional on $x$, is expected to average to zero. This is the general statement of the assumption, without reference to any particular observation. The assumption fails if $x$ and $u$ are correlated (a short derivation of why is sketched after this list).
- $E(u_i|x_i)=0$ says essentially the same thing, but the index $i$ refers to each individual observation. This can matter in specific regression contexts, for example once you add time $t$ or cluster $j$ to the linear equation. Your notation may then switch to something like $y_{ij}$, the response of individual $i$ in group $j$.
- Usually $U$ and $X$ mean the same thing here (the vectors $u$ and $x$); they are just not indexed ($x_i$ signifies an individual value of $x$, whereas $X$ refers to the variable as a whole).
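As referenced in the first bullet, here is a minimal sketch of why correlation between $x$ and $u$ rules out the assumption: by the law of iterated expectations, $E(u \mid x)=0$ forces both the mean of $u$ and its covariance with $x$ to be zero.

$$
\begin{aligned}
E(u) &= E\big[E(u \mid x)\big] = E(0) = 0, \\
E(xu) &= E\big[x\,E(u \mid x)\big] = E(x \cdot 0) = 0, \\
\operatorname{Cov}(x, u) &= E(xu) - E(x)\,E(u) = 0 - 0 = 0.
\end{aligned}
$$

So if $\operatorname{Cov}(x,u) \neq 0$, the conditional-mean-zero assumption cannot hold (the converse is not true: zero covariance alone does not imply $E(u \mid x)=0$).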
For your last question:
> On the other hand for the assumption which two error terms have an expected value of $0$, I wonder how this is calculated?
I am not sure what you mean, but I guess you are referring to error terms for both variables. You would not have those: $x$ is treated as fixed and known, whereas $y$ is random, hence the single error term we usually attach to a linear regression equation. This of course assumes that $x$ is measured with perfect precision, but that is a completely different topic.
Edit
Some of my answer may not have been very illustrative, so I will show how this works in practice with an example in R, fitting a regression and inspecting its residuals.
```r
#### Load Library ####
library(tidyverse)
library(broom)

#### Save Data as Tibble ####
cars <- mtcars %>%
  as_tibble()
cars

#### Plot Points ####
cars %>%
  ggplot(aes(x = wt,
             y = mpg)) +
  geom_point()

#### Fit Model ####
fit <- lm(mpg ~ wt, cars)
resid(fit)
```
The residuals here, our estimate of $u$, are as we said not actually zero but fluctuate around zero. This is why we speak of the errors in terms of expectation, both for each $u_i$ individually and for $u$ as a whole vector:
```
         1          2          3          4          5          6          7
-2.2826106 -0.9197704 -2.0859521  1.2973499 -0.2001440 -0.6932545 -3.9053627
         8          9         10         11         12         13         14
 4.1637381  2.3499593  0.2998560 -1.1001440  0.8668731 -0.0502472 -1.8830236
        15         16         17         18         19         20         21
 1.1733496  2.1032876  5.9810744  6.8727113  1.7461954  6.4219792 -2.6110037
        22         23         24         25         26         27         28
-2.9725862 -3.7268663 -3.4623553  2.4643670  0.3564263  0.1520430  1.2010593
        29         30         31         32
-4.5431513 -2.7809399 -3.2053627 -1.0274952
```
However, if you take their mean or sum, you get something very close to zero, provided the model is not horribly misspecified (note that the result below is in scientific notation, hence the e):
```r
> sum(resid(fit))
[1] 2.303713e-15
```
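Since the assumption concerns the conditional mean of $u$ given $x$, and not just the overall sum, a common informal check is to plot the residuals against the predictor and see whether they scatter around zero across the range of $x$. The snippet below is just a sketch of that diagnostic using the `fit` object from above; it is not a formal test of the assumption.

```r
#### Residuals vs. Predictor ####
# If E(u | x) = 0 is plausible, the residuals should scatter around zero
# at every level of wt, with no obvious trend or curvature.
tibble(wt = cars$wt, residual = resid(fit)) %>%
  ggplot(aes(x = wt, y = residual)) +
  geom_point() +
  geom_hline(yintercept = 0, linetype = "dashed") +
  labs(title = "Residuals vs. Weight",
       x = "Weight of Car",
       y = "Residual")
```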
You can also plot the distance of each observed point from the regression line (its fitted value) to visualize the errors and how they sum to roughly zero:
```r
#### Show Residuals ####
cars %>%
  lm(mpg ~ wt, data = .) %>%
  augment() %>%
  ggplot(aes(wt, mpg)) +
  geom_point() +
  geom_smooth(method = "lm", se = FALSE, color = "purple") +
  geom_segment(aes(xend = wt, yend = .fitted)) +
  labs(title = "Fitted Values and Distance from Line",
       x = "Weight of Car",
       y = "Miles per Gallon")
```
In the resulting plot, the black segments above the regression line are the positive residuals and the segments below it are the negative residuals:
