
In machine learning, the i.i.d. assumption means that the examples in a dataset are independent and drawn from the same probability distribution (i.e., identically distributed).

Here, the probability distribution is denoted by $p(x,y)$, where $x$ is a vector and $y$ is a scalar. I am confused about how to read $p(x,y)$. Are both $x$ and $y$ random variables? When people say i.i.d., are they referring to $x$, to $y$, or to both? Or do we have a single random variable here?

Sanyo Mn
  • $x$ is a realisation of a random vector $X$ (i.e. a multivariate random variable) with (say) $k$ components $x_1,...,x_k$, and $y$ is a realisation of a random variable $Y$ (i.e. a random vector of size $1$). So $(x,y)=(x_1,...,x_k,y)$ is a realisation of the random vector $(X,Y)$, which has $k+1$ components, the first $k$ of which form a realisation of $X$. This Wikipedia article will help you become acquainted with random vectors: https://en.wikipedia.org/wiki/Multivariate_random_variable. – Mickybo Yakari Jan 20 '20 at 10:28
  • @MickyboYakari Can we say that both $X$ and $Y$ are i.i.d.? – Sanyo Mn Jan 20 '20 at 11:08
  • No, because being i.i.d. is a property that applies to samples. One would have to say that $(\boldsymbol{X_1},Y_1),(\boldsymbol{X_2},Y_2),...,(\boldsymbol{X_n},Y_n)$ are i.i.d., where $n$ is the sample size and each index pertains to one observation in the sample. We basically sampled $n$ realisations of the random vector $(\boldsymbol{X},Y)$ independently. – Mickybo Yakari Jan 20 '20 at 11:19
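
To make the point in the comments concrete, here is a minimal Python sketch of drawing an i.i.d. sample of pairs $(x_i, y_i)$. The joint distribution used here (Gaussian features, $y$ equal to the sum of the features plus noise) and the names `sample_pair` and `sample_dataset` are illustrative assumptions, not anything taken from the question. The point is only that each pair is one realisation of $(X, Y)$, and that the pairs are drawn independently from the same $p(x,y)$.

```python
import random


def sample_pair(rng, k=3):
    """Draw one realisation (x, y) from a hypothetical joint distribution p(x, y)."""
    # x: realisation of the random vector X with k components (assumed Gaussian).
    x = [rng.gauss(0.0, 1.0) for _ in range(k)]
    # y: realisation of the scalar random variable Y; here it depends on x.
    y = sum(x) + rng.gauss(0.0, 0.1)
    return x, y


def sample_dataset(n, k=3, seed=0):
    """Draw n pairs, each from the same p(x, y), without reference to earlier draws."""
    rng = random.Random(seed)
    return [sample_pair(rng, k) for _ in range(n)]


dataset = sample_dataset(n=5)
for x, y in dataset:
    print(len(x), round(y, 3))
```

Note that within a single pair, $y$ is allowed to depend on $x$ (that dependence is what a model tries to learn). The i.i.d. assumption is about the pairs relative to one another, not about $x$ versus $y$.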
