In the Maximum Likelihood section of the Deep Learning Book (Section 5.5), the dataset of examples is denoted $\mathbb{X} = \{\boldsymbol{x}^{(1)}, \cdots, \boldsymbol{x}^{(m)}\}$. Note the bold italic letters. The data-generating distribution is then written with a differently typeset argument, $p_{data}(\textbf{x})$. Later on, the data-generating distribution appears again with yet another typeface for its argument: $p_{data}(\boldsymbol{x})$. Given the authors' notation conventions, $\boldsymbol{x}$ is the most understandable form (every example is a vector, or a random variable).
Question #1: Are $p_{data}(\textbf{x})$ and $p_{data}(\boldsymbol{x})$ the same?
Question #2: Why is the parametric family of probability distributions called $p_{model}$? In other words, which model does this function refer to? Perhaps the parametric model that The Elements of Statistical Learning, page 265, refers to?
Interestingly, in Bertsekas and Tsitsiklis, 2002, this function $p_{model}$ is treated as the joint PDF $p_X(x; \theta)$, which, viewed as a function of $\theta$ with the data held fixed, is effectively the likelihood function.
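For reference, the Deep Learning Book ties these pieces together in Section 5.5 under the i.i.d. assumption, roughly as follows (I am restating the construction from memory, so the display below is my paraphrase rather than a verbatim quote). The joint density of the dataset factorizes into a product over the $m$ examples, and maximizing it over $\boldsymbol{\theta}$ gives the maximum likelihood estimator:

$$p_{model}(\mathbb{X}; \boldsymbol{\theta}) = \prod_{i=1}^{m} p_{model}\big(\boldsymbol{x}^{(i)}; \boldsymbol{\theta}\big),$$

$$\boldsymbol{\theta}_{ML} = \underset{\boldsymbol{\theta}}{\arg\max}\; p_{model}(\mathbb{X}; \boldsymbol{\theta}) = \underset{\boldsymbol{\theta}}{\arg\max}\; \prod_{i=1}^{m} p_{model}\big(\boldsymbol{x}^{(i)}; \boldsymbol{\theta}\big).$$

Viewed this way, $p_{model}(\mathbb{X}; \boldsymbol{\theta})$ as a function of $\boldsymbol{\theta}$ plays exactly the role of Bertsekas and Tsitsiklis's joint PDF $p_X(x; \theta)$.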
The same situation then recurs with $p_{model}(\textbf{x}; \boldsymbol{\theta})$ and $p_{model}(\boldsymbol{x}; \boldsymbol{\theta})$. Note again the difference in typeface.
Question #3: Are $p_{model}(\textbf{x}; \boldsymbol{\theta})$ and $p_{model}(\boldsymbol{x}; \boldsymbol{\theta})$ the same?
