Why are LLMs generative models?

Question

According to Wikipedia:

A generative model is a statistical model of the joint probability distribution $P ( X , Y )$ on given observable variable $X$ and target variable $Y$;

A discriminative model is a model of the conditional probability $P ( Y \mid X = x )$ of the target $Y$, given an observation $x$.

Based on this definition, how are autoregressive LLMs such as the GPT series generative models, given that they only model $P(X_t \mid X_1, X_2, \ldots, X_{t-1})$, which looks like the second case?

Edit: I found that the GPT-2 paper does formulate the model in terms of the joint distribution $p(x) = \prod_{i=1}^{n} p(s_i \mid s_1, \ldots, s_{i-1})$ (eq. 1). So technically the GPT series are generative models, although I am not quite sure how they model $p(s_1)$ (which is essentially what makes it look like a discriminative model).
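To make the factorization concrete, here is a minimal sketch of a toy autoregressive model. It assumes a start-of-sequence symbol so that the first factor $p(s_1)$ is just another conditional, $p(s_1 \mid \text{BOS})$. The names `BOS`, `VOCAB`, `COND`, `joint_prob`, `sample` and `total_mass` and all the probabilities are made up for illustration and are not taken from the GPT-2 paper. A bigram table stands in for the network, which in a real LLM would condition on the whole prefix.

```python
import itertools
import random

# Hypothetical toy setup: a two-token vocabulary plus a start symbol.
BOS = "<bos>"
VOCAB = ["a", "b"]

# COND[prev][tok] = p(tok | prev). The numbers are invented for illustration.
COND = {
    BOS: {"a": 0.7, "b": 0.3},  # this row plays the role of p(s_1)
    "a": {"a": 0.2, "b": 0.8},
    "b": {"a": 0.5, "b": 0.5},
}


def joint_prob(seq):
    """Chain rule: p(s_1..s_n) = prod_i p(s_i | s_{i-1}), with s_0 = BOS."""
    prob = 1.0
    prev = BOS
    for tok in seq:
        prob *= COND[prev][tok]
        prev = tok
    return prob


def sample(n, rng):
    """Ancestral sampling: draw s_1 given BOS, then each next token in turn."""
    seq = []
    prev = BOS
    for _ in range(n):
        toks = list(COND[prev])
        weights = [COND[prev][t] for t in toks]
        prev = rng.choices(toks, weights=weights, k=1)[0]
        seq.append(prev)
    return seq


def total_mass(n):
    """Sum the joint probability over every possible sequence of length n."""
    return sum(joint_prob(s) for s in itertools.product(VOCAB, repeat=n))


if __name__ == "__main__":
    print(joint_prob(["a", "b"]))        # 0.7 * 0.8
    print(total_mass(3))                 # sums to 1 up to rounding
    print(sample(5, random.Random(0)))   # a sequence drawn with no prompt
```

Under these assumptions the product of next-token conditionals is a proper joint distribution over whole sequences (it sums to 1), and a sequence can be drawn from it without supplying any observed input. That is the sense in which such a model fits the quoted definition of a generative model, even though each individual factor is a conditional.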

I've always thought the term referred to the lay sense of "generative", in that the model generates something like a paragraph rather than returning a number as a regular regression does, and not to generative vs. discriminative learning as technical nomenclature in statistics. As I understand it, "generative AI" is about generating something; it is not a comment about the math behind the generation. — Dave, Sep 01 '23 at 17:41
@Dave It seems that is indeed the case, but I just want to make sure. — Sam, Sep 01 '23 at 17:44
Agreed; I also think the "discriminative = boundary between classes" and "generative = distribution of class/data" intuition from the following answer is nice to keep in mind: https://stats.stackexchange.com/questions/12421/generative-vs-discriminative?rq=1. The reason is that (very abstractly/informally) an LLM captures some distribution of text such that generated text is a sample from that distribution, even if the underlying training objective relates to next-word prediction (i.e., something that "looks" discriminative). — chang_trenton, Sep 02 '23 at 01:24
@AryaMcCarthy I had not seen that post before (watch there be a comment of mine in it). I have seconded your VTC and am happy to see support for my intuition that "generative" AI is a marketing term (not an unreasonable one), separate from the notions of "discriminative" vs. "generative" machine learning as technical nomenclature. — Dave, Sep 02 '23 at 23:49
Upon reading the accepted answer from the suggested link, I'm even more confused, because as far as I know, GPT-series models require some input in order to generate their output. This sounds awfully like a discriminative model, which the accepted answer describes as follows: 'Discriminative models can only generate new samples when provided with a given value of $(X)$. They can't generate arbitrary new $(x,y)$ samples.' and 'The rule of thumb here: 'generative' models can be sampled from, without needing any input. Text generation is a separate matter, which can be done by both classes of model'. — Sam, Sep 03 '23 at 02:48
You don’t sound confused—you sound like you understand it. — Arya McCarthy, Sep 03 '23 at 17:51
@AryaMcCarthy What confuses me is this: based on my knowledge of how GPT models work and the statements I quoted, GPT models seem to be discriminative models, but as I noted in my edit, the authors of GPT-2 formulate it as a generative model. — Sam, Sep 04 '23 at 14:56
