It's really easy. No formula is needed.
Protein matrices are based on observed substitution frequencies from protein alignments for a variety of genes. They are not based on a priori formulas (unlike nucleotide models). Later matrices incorporated phylogenetic criteria using maximum likelihood, and how that works is really complicated.
BLOSUM used an odds ratio (OR), I think without a phylogenetic tree. I suspect a raw OR. Explaining the difference between working with and without a phylogenetic tree is a full lecture in phylogenetic theory, so just assume it's a raw value calculated directly from the alignments.
So let's just run with a raw OR-based matrix ...
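To make "raw OR" concrete, here is a minimal Python sketch of a log-odds score computed straight from aligned columns, with no tree. It follows the recipe in the Henikoff paper as I read it: count residue pairs per column, compare observed to expected pair frequencies, take log2, scale by 2 (half-bit units) and round. The function name `log_odds_scores` and the toy columns are made up for illustration, and the sketch skips the sequence clustering step that the real BLOSUM procedure uses, so treat it as a rough picture and not the authors' implementation.

```python
import math
from collections import Counter
from itertools import combinations


def log_odds_scores(columns):
    """Score residue pairs from aligned columns (one string per column).

    Hypothetical helper: a simplified, unclustered version of the
    BLOSUM-style log-odds calculation.
    """
    # Count every unordered residue pair observed within each column.
    pair_counts = Counter()
    for col in columns:
        for a, b in combinations(col, 2):
            pair_counts[tuple(sorted((a, b)))] += 1

    # Observed pair frequencies q_ij.
    total = sum(pair_counts.values())
    q = {pair: n / total for pair, n in pair_counts.items()}

    # Background residue frequencies p_i implied by the pairs.
    p = Counter()
    for (a, b), freq in q.items():
        if a == b:
            p[a] += freq
        else:
            p[a] += freq / 2
            p[b] += freq / 2

    # Expected frequency is p_i^2 for identical pairs, 2*p_i*p_j otherwise.
    scores = {}
    for (a, b), freq in q.items():
        expected = p[a] * p[b] if a == b else 2 * p[a] * p[b]
        scores[(a, b)] = round(2 * math.log2(freq / expected))
    return scores


# Toy alignment: three columns across four sequences (invented data).
scores = log_odds_scores(["AAAA", "AAAS", "SSSS"])
print(scores)
```

Pairs seen more often than chance come out positive, pairs seen less often than chance come out negative, which is all a substitution matrix entry is.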
So ... * is a stop codon. It carries the lowest weight in the matrix because each protein only has one of them. The frequency of a stop codon aligned against any amino acid is going to be extremely low, because at that aligned position there are only stop codons (i.e. no amino acids). Thus it's -4 here (the lowest value in the matrix).
What about the 1?
However, when you compare stop codon frequency ... between proteins, well, its occurrence is always 1 to 1, right? Each protein has one stop codon, and when you compare it to another protein ... that's got one stop codon too. So one million proteins that are nicely aligned as homologues (that's how it's calculated) have one million stop codons ... thus it's 1 no matter how many proteins there are in the alignment. The alignment position is always the same because stop codons (at least in BLOSUM) are always homologous.
Once more: comparing a stop codon in one protein with the stop codon in another protein at the same homologous "amino acid" site, no matter which protein it is, is always going to give 1, because it's a universal feature of all proteins.
That's how it works, and I would simply cite
Henikoff, S.; Henikoff, J.G. (1992). "Amino Acid Substitution Matrices from Protein Blocks". PNAS. 89 (22): 10915–10919
I've checked the paper and the authors didn't use stop codons, nor did they use any phylogenetic criteria (that was easy to guess). I know they didn't include stop codons because:
- The Sigma (summation) in their matrix formulas has a 20 above it (i.e. 20 amino acids), so there's no allowance for stop codons.
- The matrices they present omit stop codons.
When NCBI are leveraging this matrix they need answers for all possibilities, so the idea that X (any amino acid) or * is omitted would not be cool for them. They could have approached the authors to fill in this information, which would probably mean redoing the alignments to include the stop codon (which the authors wouldn't have liked). Or NCBI could have added it after the event (it's a pretty safe addition), because the log-ratio of an invariant site is always 1. The authors/NCBI would just need to be sure the stop codon was in the same position for all protein alignments in BLOSUM.
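Adding the stop symbol "after the event" is mechanically trivial, which is part of why I think it's plausible. A rough Python sketch, assuming the matrix is held as a dict keyed by residue pairs: the function name `add_stop_symbol` and the two-residue toy matrix are hypothetical, and the -4 and 1 are just the values discussed above, not something taken from NCBI's code.

```python
def add_stop_symbol(matrix, stop="*", stop_vs_residue=-4, stop_vs_stop=1):
    """Return a copy of a pair-keyed score matrix extended with a stop symbol.

    Hypothetical helper: the default scores are the ones discussed in
    this answer (-4 against any residue, 1 for stop against stop).
    """
    # Collect the residues already present in the matrix.
    residues = sorted({residue for pair in matrix for residue in pair})

    extended = dict(matrix)
    for residue in residues:
        # Stop versus any residue gets the fixed penalty, symmetrically.
        extended[(residue, stop)] = stop_vs_residue
        extended[(stop, residue)] = stop_vs_residue
    # Stop versus stop gets its own fixed score.
    extended[(stop, stop)] = stop_vs_stop
    return extended


# Toy two-residue excerpt (A and R scores as in BLOSUM62).
toy = {("A", "A"): 4, ("A", "R"): -1, ("R", "A"): -1, ("R", "R"): 5}
extended = add_stop_symbol(toy)
print(extended[("*", "*")], extended[("A", "*")])
```

The original 20x20 scores are untouched; only one extra row and column are bolted on, so nobody needs to redo any alignments.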
You might find the wiki more understandable https://en.wikipedia.org/wiki/BLOSUM