According to Wikipedia:
A generative model is a statistical model of the joint probability distribution $P(X, Y)$ on a given observable variable $X$ and target variable $Y$;
A discriminative model is a model of the conditional probability $P(Y \mid X = x)$ of the target $Y$, given an observation $x$.
Based on this definition, how are autoregressive LLMs such as the GPT series generative models, given that they only model $P(X_t \mid X_1, X_2, \ldots, X_{t-1})$, which looks like the second case?
Edit: I found out from the GPT-2 paper that they do formulate it as modeling the joint distribution $p(x) = \prod_{i=1}^{n} p(s_i \mid s_1, \ldots, s_{i-1})$ (eq. 1). So technically the GPT series are generative models, although I am not quite sure how they model $p(s_1)$ (which is essentially what makes it look like a discriminative model).
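
To make the factorization concrete, here is a minimal sketch of why a product of next-token conditionals defines a joint distribution. Everything in it is hypothetical and for illustration only: the two-token vocabulary, the hand-written `conditional` function standing in for a trained network, and the `BOS` start symbol. Conditioning the first step on a fixed start symbol is one possible way to obtain $p(s_1)$; it is an assumption here, not something taken from the paper.

```python
import itertools
import math

VOCAB = ["a", "b"]   # toy vocabulary (assumption)
BOS = "<bos>"        # hypothetical start symbol used as the context for p(s_1)


def conditional(prefix):
    """Toy stand-in for a network: returns P(next token | prefix) as a dict."""
    n_a = sum(1 for tok in prefix if tok == "a")
    logits = {
        "a": 0.5 * n_a + (1.0 if prefix[-1] == BOS else 0.0),
        "b": 0.3 * len(prefix),
    }
    z = sum(math.exp(v) for v in logits.values())  # softmax normalizer
    return {tok: math.exp(v) / z for tok, v in logits.items()}


def joint_prob(seq):
    """Chain rule: p(s_1..s_n) = prod_i p(s_i | s_1..s_{i-1})."""
    prefix = (BOS,)
    p = 1.0
    for tok in seq:
        p *= conditional(prefix)[tok]
        prefix += (tok,)
    return p


LENGTH = 3
all_seqs = list(itertools.product(VOCAB, repeat=LENGTH))

# Summing the product of conditionals over every sequence gives 1,
# so the conditionals together define a valid joint distribution.
total = sum(joint_prob(seq) for seq in all_seqs)

# The marginal p(s_1 = "a") recovered from the joint matches the
# first-step conditional given only the start symbol.
marginal_a = sum(joint_prob(seq) for seq in all_seqs if seq[0] == "a")
first_step_a = conditional((BOS,))["a"]

print(total, marginal_a, first_step_a)
```

In this sketch there is no separate target variable $Y$: each token is both an observation and something being predicted, so the per-step conditionals are just the chain-rule factors of a joint distribution over the whole sequence.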