
Suppose we have a list L of all root words in English numbered from 1 to n. The data is any English text (let's say a text from a book) where each word is replaced by its root. You are given the data word by word.

Suppose we have a random variable $X$, which represents the number (in the list L) of a root word. When doing probabilistic modeling with this random variable $X$, how should we treat the dependency between instances of $X$? For example, if the previous root word is "the", then we know that the next word is likely a noun. That means an instance of a word depends on the previous instance, which suggests it is not random. Also, we know that words like the articles "the" and "a" will occur more frequently than many other words. Moreover, if a word occurs (even if it is rare), then we know that it is more likely to occur again soon in the following sentences.

Does knowing all this information make $X$ not a random variable, and why?

  • "Dependent on the previous instance" does not imply "not random"! For much more about this situation, search for "Markov chain." For definitions and descriptions of random variables, see https://stats.stackexchange.com/questions/50/what-is-meant-by-a-random-variable. – whuber Jun 05 '23 at 20:24
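
To make the Markov chain idea concrete, here is a minimal sketch in Python. It assumes a made-up toy text (`words`) that has already been reduced to root words; the names `transitions`, `conditional`, and `sample_next` are hypothetical helpers invented for this illustration. It estimates the distribution of the next word given the previous one from bigram counts and then draws from it, so the next word depends on the previous word and is still random.

```python
import random
from collections import Counter, defaultdict

# Hypothetical toy text, already reduced to root words (an assumption for illustration).
words = "the cat see the dog and the dog see a cat".split()

# Marginal counts: how often each root word occurs overall.
marginal = Counter(words)

# Bigram counts: how often each word follows a given previous word.
transitions = defaultdict(Counter)
for prev, nxt in zip(words, words[1:]):
    transitions[prev][nxt] += 1

def conditional(prev):
    """Estimate P(next word | previous word) from the bigram counts."""
    counts = transitions[prev]
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()}

def sample_next(prev, rng):
    """Draw the next word at random from the estimated conditional distribution."""
    dist = conditional(prev)
    return rng.choices(list(dist), weights=list(dist.values()), k=1)[0]

rng = random.Random(0)
after_the = conditional("the")  # distribution over the words seen after "the"
draws = [sample_next("the", rng) for _ in range(5)]
print(after_the)
print(draws)
```

In this toy text, knowing that the previous word is "the" narrows the possible next words to "cat" and "dog" but does not determine which one appears. The conditional distribution differs from the marginal one, which is what dependence means, and each draw is still a random outcome.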
