Multiple sources (see for instance this or this) argue that genetic data will run into scalability problems because of the sheer file size of a human genome. The most straightforward encoding (see here) of a human genome requires about 700 MB.
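To make the size estimate concrete, here is a rough back-of-the-envelope sketch of a 2-bit-per-base encoding. The genome length of about 3.1 billion bases is an approximate figure I am assuming, and `packed_size_bytes` and `pack` are hypothetical helper names, not taken from any of the linked sources.

```python
# Rough size estimate for a 2-bit-per-base encoding of a genome.
# ASSUMPTION: ~3.1e9 bases per haploid human genome (approximate).
GENOME_BASES = 3_100_000_000
BITS_PER_BASE = 2  # four symbols (A, C, G, T) fit in 2 bits

# Hypothetical mapping from base to 2-bit code.
CODE = {"A": 0, "C": 1, "G": 2, "T": 3}


def packed_size_bytes(n_bases, bits_per_base=BITS_PER_BASE):
    """Number of bytes needed to store n_bases, rounded up."""
    return (n_bases * bits_per_base + 7) // 8


def pack(seq):
    """Pack a string over A/C/G/T into bytes, four bases per byte."""
    out = bytearray((len(seq) + 3) // 4)
    for i, base in enumerate(seq):
        out[i // 4] |= CODE[base] << (2 * (i % 4))
    return bytes(out)


if __name__ == "__main__":
    size = packed_size_bytes(GENOME_BASES)
    print(f"{size / 1e6:.0f} MB ({size / 2**20:.0f} MiB)")
```

Under that assumed length this gives a few hundred megabytes, the same order of magnitude as the figure quoted above. It ignores ambiguous bases such as N and any per-file metadata.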
I came across this paper, which claims to store a human genome in about 4 MB by relying on a reference genome and exploiting the fact that human genomes are almost identical to one another. This paper is much older than the other references discussing the scalability problems. Why is this technique not widely used?
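As I understand the general idea of reference-based storage, it could be sketched roughly as follows. This is only my own toy illustration, not the paper's actual method: it assumes both sequences have the same length and differ only by single-base substitutions (no insertions or deletions), and `diff_against_reference` and `reconstruct` are hypothetical names.

```python
# Toy sketch of reference-based storage: keep only the positions where
# an individual genome differs from a shared reference.
# ASSUMPTION: equal-length sequences, substitutions only (no indels).


def diff_against_reference(reference, genome):
    """Return a list of (position, base) pairs where genome differs."""
    if len(reference) != len(genome):
        raise ValueError("this sketch only handles equal-length sequences")
    return [
        (i, b)
        for i, (a, b) in enumerate(zip(reference, genome))
        if a != b
    ]


def reconstruct(reference, variants):
    """Rebuild the individual genome from the reference plus variants."""
    seq = list(reference)
    for pos, base in variants:
        seq[pos] = base
    return "".join(seq)


if __name__ == "__main__":
    reference = "ACGTACGTAC"
    genome = "ACGTTCGTAA"
    variants = diff_against_reference(reference, genome)
    print(variants)
    print(reconstruct(reference, variants) == genome)
```

The stored list grows with the number of differences rather than with the genome length, which is where the claimed saving would come from. The catch is that everyone decoding the data needs exactly the same reference.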