I've got some code that does a Dept of Defense secure erase of disks, and would like to know about a good sanity check to run after the fact.
Bottom line: the last write phase of a secure erase of an HDD is to write random data to every byte on the drive, per the spec. The official standard puts no constraints on that data beyond it being random, so I use a decent cryptographic algorithm to generate it ... but that is not the issue here.
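For context, the final pass itself looks roughly like this (a minimal Python sketch, assuming a raw block device path; the `random_overwrite` name and the 1 MiB block size are my own placeholders, not anything from the spec):

```python
import os

def random_overwrite(device_path: str, block_size: int = 1 << 20) -> None:
    """Final pass: overwrite every byte of the device with CSPRNG output."""
    with open(device_path, "r+b", buffering=0) as dev:
        total = dev.seek(0, os.SEEK_END)    # device size in bytes
        dev.seek(0)
        remaining = total
        while remaining:
            buf = memoryview(os.urandom(min(block_size, remaining)))
            while buf:
                n = dev.write(buf)          # handle short writes
                buf = buf[n:]
                remaining -= n
        os.fsync(dev.fileno())              # force the data out to the device
```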
What I would really like to determine is what makes a good sanity check on an HDD that has been erased. Since the spec does not allow any meaning to be encoded in what is written to the disk, is there a way to assess the probability that the data really is random?
Currently I read every byte on the surface and bucket the values into 256 counts, one per possible byte value. I track the longest run of identical consecutive bytes, and I also look at the standard deviation of the bucket counts to see whether any byte value occurs significantly more or less often than the others (see the sketch below).

With that in mind: for an HDD of X bytes, what is a reasonable run length of identical consecutive bytes beyond which I should consider the result "inconclusive", or suspect the data might have been tampered with? Likewise for the standard deviation of the counts, and is there a better statistical model altogether? (Yes, I know somebody could still hide data all over the place if they wanted to; I'm just looking for a way to detect that the process may not have completed, or that the very beginning or end of the disk was never randomized.)
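To make the question concrete, here is a rough sketch of the check I have in mind (Python for illustration; the chi-square cutoff of 330 and the run-length slack of 20 are placeholder thresholds I have not justified, which is essentially what I'm asking about):

```python
import math

BLOCK = 1 << 20  # read granularity; arbitrary

def scan_device(path: str):
    """Histogram every byte value and track the longest run of identical consecutive bytes."""
    counts = [0] * 256
    longest_run = run = 0
    prev = -1
    total = 0
    with open(path, "rb", buffering=0) as dev:
        while True:
            chunk = dev.read(BLOCK)
            if not chunk:
                break
            total += len(chunk)
            for b in chunk:                      # per-byte loop: slow, but illustrative
                counts[b] += 1
                run = run + 1 if b == prev else 1
                prev = b
                longest_run = max(longest_run, run)
    return counts, longest_run, total

def chi_square(counts, total):
    """Pearson chi-square statistic of the byte histogram against a uniform distribution."""
    expected = total / 256
    return sum((c - expected) ** 2 / expected for c in counts)

def looks_random(counts, longest_run, total) -> bool:
    chi2 = chi_square(counts, total)
    # ~330 is roughly the 99.9th percentile of chi-square with 255 degrees of freedom
    # (from standard tables); choosing that cutoff is my assumption, not part of the spec.
    # The expected longest run of identical bytes is about log_256(total); the +20 slack
    # is an arbitrary placeholder for "much longer than chance would produce".
    return chi2 < 330.0 and longest_run < math.log(total, 256) + 20
```

In other words, I'm asking what principled values should replace those two hardcoded thresholds for a drive of X bytes, or whether a different test entirely would be more appropriate.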