I am reading the 1992 paper of Bikhchandani, Hirshleifer, and Welch on information cascades. They claim and prove that, in an environment of sequential decision making, an information cascade eventually forms (the probability that no cascade has started goes to zero as the number of individuals grows). The proof is in the appendix of their paper and involves some statistical arguments that are not clear to me. A simpler argument in the body of the paper considers the individuals in consecutive pairs: for $n$ even,
$$P(\text{no cascade after } n \text{ individuals}) = P(\text{no cascade after the first pair}) \cdot P(\text{no cascade after the next pair}) \cdots P(\text{no cascade after the last pair}) = (p-p^2)^{n/2}.$$
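For what it is worth, here is a quick Monte Carlo sketch I wrote to check the closed form numerically. It encodes my own reading of the model (equal priors on the two states, signal accuracy $p$, everyone observes all previous actions, and BHW's fair-coin tie-break when someone is indifferent), so the bookkeeping with a running adopt-minus-reject count is mine, not the paper's construction:

```python
import random

def no_cascade_after(n, p, rng):
    """One simulated run of (my reading of) the BHW sequential model.

    Assumptions baked in here, not taken from the paper's appendix:
    equal prior on the two states, signal accuracy p > 1/2, each person
    sees all previous actions, and an indifferent person flips a fair
    coin (BHW's tie-breaking rule). Returns True if no cascade has
    started after the first n individuals.
    """
    state_high = rng.random() < 0.5
    diff = 0  # number of 'adopt' actions minus 'reject' actions so far
    for _ in range(n):
        if abs(diff) >= 2:
            return False  # a cascade already started earlier in this run
        # private signal: H with prob p in the High state, 1-p in the Low state
        signal = 1 if rng.random() < (p if state_high else 1 - p) else -1
        if diff == 0 or signal == (1 if diff > 0 else -1):
            action = signal  # history is neutral or agrees with the signal
        else:
            action = 1 if rng.random() < 0.5 else -1  # indifferent: coin flip
        diff += action
    return abs(diff) < 2  # still no cascade after n people


rng = random.Random(0)
p, n, trials = 0.75, 6, 200_000
freq = sum(no_cascade_after(n, p, rng) for _ in range(trials)) / trials
print(f"simulated: {freq:.4f}   formula (p - p^2)^(n/2): {(p - p*p) ** (n // 2):.4f}")
```

With these assumptions the simulated frequency comes out very close to $(p-p^2)^{n/2}$, which at least reassures me that I am reading the formula itself correctly.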
This pairing argument is clearer to me, but it seems to rest on the prior observation that two identical announcements in a row are what start a cascade. Also, the probability seems to be calculated from a perspective outside the experiment, e.g. that of a moderator who knows the true state and therefore knows that a high signal H occurs with probability $p$ and a low signal L with probability $1-p$, rather than from the perspective of the participants, who have to take both possible states of the world into account. That, at least, is how I read the term above.
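To make that concrete, this is the computation I have in mind for the first pair, written from the moderator's viewpoint and conditioning on the state being High (so a signal is H with probability $p$), with BHW's fair-coin tie-break for the second person:
$$
P(\text{no cascade after the first pair} \mid \text{High})
= \underbrace{p\,(1-p)\,\tfrac{1}{2}}_{\text{adopt, then reject on a coin flip}}
+ \underbrace{(1-p)\,p\,\tfrac{1}{2}}_{\text{reject, then adopt on a coin flip}}
= p - p^2 .
$$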
Why is there no loss of generality in arriving at $(p-p^2)^{n/2}$ without considering both states of the world?
I have also been reading Banerjee's 1992 paper on herding. I would be very grateful for any sources that explain these papers and any notes on information cascades.
Update: I figured out how to interpret the formula above by reading Zhukov's slides and the tree diagram he reproduces from Hirshleifer: http://leonidzhukov.net/hse/2014/socialnetworks/lectures2/lecture6.pdf. My difficulty was with reading that tree diagram, and it became clear once I went through the slides alongside what Bikhchandani et al. wrote. Thanks also to the commenter below for the answer.