18

I would like to audit how an implementation of an encryption algorithm actually behaves, given the following facts about the problem:

  • the encryption mechanism is reversible (this isn’t a signature),
  • the algorithm is claimed to be AES, but it might be implemented correctly or not, or, worse, be something else entirely,
  • the key is not known (in the case I am interested in, I want to check that all files really are correctly encrypted, but I could experiment with specially crafted ones),
  • I don't have access to the source code.

At a minimum I would like to be able to detect that a file is not encrypted. Beyond that, I would like to be able to detect whether the RNG is drawing from a tiny set of values or is otherwise not really random.

As a first approach I thought of analysing the randomness quality of the encrypted file: average value plus standard deviation (with a tool like ent). But I immediately thought of artificial files with a perfect average value and standard deviation that are perfectly regular and not the result of any encryption. So my first approach is wrong.
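For reference, that flawed first test can be sketched in a few lines (a minimal illustration of the ent-style statistics; `byte_stats` is my own naming, not from any tool, and the ideal values are only rough guides):

```python
# Naive randomness check (the flawed first approach described above):
# a handcrafted regular file can pass both statistics without being encrypted.
from collections import Counter

def byte_stats(data):
    """Mean byte value and chi-square against a uniform byte distribution."""
    n = len(data)
    counts = Counter(data)
    mean = sum(data) / n                      # ~127.5 for uniform random bytes
    expected = n / 256
    chi2 = sum((counts.get(b, 0) - expected) ** 2 / expected
               for b in range(256))           # small for uniform, huge for flat files
    return mean, chi2
```

For example, a file of all zero bytes scores a mean of 0 and an enormous chi-square, while a file that simply cycles 0x00..0xFF scores a perfect mean and chi-square of 0 despite having no entropy at all, which is exactly why this approach fails.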

The environment in which I will perform this audit is a Unix one. (I cannot use tools or algorithms that I cannot read, compile and check myself.)


Practical case:

  1. I would like to check that my iPhone is correctly encrypted, and whether the AES key is derived from a constant, from my password, or from a unique hardware identifier.

  2. I would like to perform the same validation on the iPhone of any staff member who asks me to check his professional iPhone. This is a service to users, giving them confidence that what they believe is encrypted really is.

dan
  • 289
  • 1
  • 2
  • 7
  • Whether or not you know the plaintext is irrelevant if you don't have the key. – forest Aug 05 '19 at 12:02
  • 3
    If you're auditing a real thing, the data must be real. It must therefore have some characteristic stochastic properties that you could look for. Or not have them. Or are you just encrypting one time pads/TRNG output? – Paul Uszak Aug 05 '19 at 12:15
  • I fixed an error in my problem description: I don’t know the clear text in most cases (caches of a running OS), but I can generate it and then encrypt it (with plain text files). – dan Aug 05 '19 at 16:08
  • 1
    You can use the execution time of the encryption to compare with some implementations. Also, try to perform a cache attack if it is not properly implemented. – kelalaka Aug 05 '19 at 20:04
  • Well, what data structure would you generate? Unless it's pretty random, it will probably be distinguishable from encrypted by the fact that it can be compressed fairly easily. It all comes down to what your plain texts are, but probably doable. – Paul Uszak Aug 05 '19 at 20:26
  • 8
    Note that by definition, "encryption" is reversible. Otherwise it is referred to as "hashing" or "signature". – Nayuki Aug 06 '19 at 04:11
  • @Nayuki : , fixed. – dan Aug 06 '19 at 07:44
  • 5
    Do you have access to the binary that's doing this? That makes it feasible but time consuming to find out what's really happening – pjc50 Aug 06 '19 at 09:25
  • 1
    More details about the exact scenario could help. For example, are you able to freely request plain-text messages of your choice to be encrypted? And is the encryption-oracle software that you have the binary for, or is it a black-box dongle that you connect to your computer, or is it a remote service that you request over the internet, or what? And why can't you get access to a key if it's for your own encrypted content? Answering questions like this may help readers to identify approaches to validating the system. – Nat Aug 06 '19 at 15:35
  • 4
    If you could do this, it would be a huge security hole. Say you know the message is either "Yes, attack at dawn!" or "No, do not attack!". If your proposed test existed, it would pass with one plaintext and fail with the other. But you would know nothing an attacker doesn't know and you'd be able to decrypt the data! – David Schwartz Aug 06 '19 at 22:52
  • 2
    Generally speaking, incorrect implementations enable successful attacks; therefore, in general terms, the audit consists of performing known attacks and seeing whether they succeed. (Of course if the algorithm is unknown one must do that for all potential algorithms.) – Peter - Reinstate Monica Aug 07 '19 at 11:54

7 Answers

29

If you can't get access to the key for at least some sample uses, there's no way to be sure. For example, it's impossible to distinguish AES-128 from AES-256 if you don't have access to the key. That's true of any encryption method: without knowing the key, you can't distinguish the ciphertext from random data of the same length.

A professional auditor would normally be given some test keys, if they can't otherwise access keys through some administration interface.

You can make a statistical test, but all this will tell you is that the encryption is not completely botched or skipped.

If your vendor is not completely dishonest and they claim to have used AES, they probably did use AES. A far more common problem than not using AES is using AES wrong. Here too, you can't be sure that they got it right, but you can at least check for some common problems.

Check how the length of the ciphertext varies depending on the length of the plaintext. The ciphertext should include an initialization vector and an authentication tag, each of which normally adds 16 bytes. If the ciphertext is not 32 bytes larger than the plaintext, something is probably wrong, but there are cases where it can be ok (e.g. for disk encryption where the sector number is used to build unique IVs and no threat requires authentication).
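As a sketch, assuming you can drive the encryptor as a black-box `encrypt` function (a hypothetical name for whatever oracle you actually have), the overhead check looks like this:

```python
# Sketch: probe the ciphertext overhead of a hypothetical encrypt() oracle.
# For an IV + authentication tag (16 bytes each), the expected pattern is a
# single constant overhead of 32 bytes at every plaintext size.
def ciphertext_overhead(encrypt, sizes=(0, 1, 15, 16, 17, 1024)):
    """Return the set of (ciphertext length - plaintext length) values."""
    return {len(encrypt(b"\x00" * n)) - n for n in sizes}
```

A result like `{32}` is consistent with IV plus tag; a size-dependent overhead suggests block padding (e.g. CBC with PKCS#7), and an overhead of 0 at every size deserves close scrutiny.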

Pass the same inputs in different conditions and make sure that the resulting ciphertexts are completely different. If you can arrange to encrypt multiple messages with the same key, make sure that identical messages result in completely different ciphertexts. This validates that initialization vectors are generated, if not correctly, then at least non-stupidly.
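A minimal sketch of this check, again assuming a hypothetical `encrypt` oracle:

```python
# Sketch: encrypt the identical plaintext many times; with a fresh random IV
# per call, every ciphertext must be distinct.
def looks_nondeterministic(encrypt, plaintext=b"attack at dawn", trials=100):
    """True if repeated encryptions of one plaintext never collide."""
    blobs = {bytes(encrypt(plaintext)) for _ in range(trials)}
    # A repeat suggests a fixed IV, a deterministic mode such as ECB,
    # or a broken RNG behind the IV generation.
    return len(blobs) == trials
```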

If there's a way for you to submit modified ciphertexts, do that and check that they are rejected with a generic “invalid ciphertext” error, rather than a specific error due to invalid content. This validates that authenticated encryption is used. There are threat models where it's ok not to have authentication, but you need to tread very carefully.
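Sketched with a hypothetical `decrypt` oracle that raises an exception when it rejects a ciphertext:

```python
# Sketch: flip one bit at every position of a valid ciphertext; a scheme with
# authenticated encryption must reject every single forgery.
def rejects_tampering(decrypt, ciphertext):
    """True if decrypt() raises on every 1-bit modification of ciphertext."""
    for pos in range(len(ciphertext)):
        forged = bytearray(ciphertext)
        forged[pos] ^= 0x01
        try:
            decrypt(bytes(forged))
            return False        # a forgery was accepted: no authentication
        except Exception:
            pass                # rejected, as authenticated encryption requires
    return True
```

This only shows that *some* integrity check exists; inspecting whether the error message is generic or leaks the failure reason still has to be done by hand.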

Two things that you definitely cannot learn by looking at the output are whether the keys are generated securely and whether they are stored securely. (As opposed to, e.g., using a non-cryptographic random generator to generate keys, or writing a copy of the secret keys to an unprotected location.) You can only audit this by looking at the behavior of the system.

  • For the last check, I used a canary method: a huge key (a string of 128 U), and during and after the encryption process, on a system without much free memory, I ran a grep canary /dev/kmem. What is your critical analysis of this method? (Perhaps interesting enough to fork into another question?) – dan Aug 05 '19 at 23:53
  • This can be much improved if you know how to detect electronic code book. – Joshua Aug 06 '19 at 01:02
  • @dan: I wouldn't try that. You are really looking for leftover garbage when the process exits; memory is lazily cleared by the kernel. – Joshua Aug 06 '19 at 01:05
  • @Joshua ECB wouldn't have an IV, and would give the same result for encrypting the same message twice. You don't need extra detection methods for ECB. – Gilles 'SO- stop being evil' Aug 06 '19 at 08:31
  • 1
    @dan I don't understand what you mean by “a huge key (a string of 128 U)”. An AES key would be 32 bytes, and you said you didn't have access to the key. – Gilles 'SO- stop being evil' Aug 06 '19 at 08:32
  • This is a key part of the problem: I don’t have any evidence that the GUI which asks for a key passes it unchanged (I mean ==) to the AES algorithm. So I tried to overload it, to detect the kind of weakness I discovered accidentally in some password interfaces. Even though I know that a string-length limit check isn’t proof that the input to AES is the exact user input. – dan Aug 06 '19 at 09:01
  • 5
    @dan A GUI where you input a key? That's really weird. Humans shouldn't ever see a secret key. A password is a completely different thing from a key. – Gilles 'SO- stop being evil' Aug 06 '19 at 09:18
  • “That's true of any encryption method: without knowing the key, you can't distinguish the ciphertext from random data of the same length.” – really? Whose definition is that? I'd find it perfectly plausible for an algorithm to leave some sort of traces in the ciphertext, as long as those don't allow inferring anything about the plaintext. Of course that'll mean the algorithm is inefficient (data-entropy wise), but it could conceivably be worth it for e.g. performance reasons. – leftaroundabout Aug 07 '19 at 12:29
  • 1
    @leftaroundabout Indeed, that's indistinguishability from random, which is stronger than ciphertext indistinguishability. I was simplifying. Symmetric ciphers that are used in practice do exhibit indistinguishability from random noise apart possibly in the nonce/IV. That's because if the ciphertext excluding the IV and authentication tag is the same size as the plaintext, there's no noise that couldn't be a ciphertext. – Gilles 'SO- stop being evil' Aug 07 '19 at 13:26
10

Unless the file has a plaintext header which indicates that it has been encrypted, there is no way to distinguish ciphertext from uniform random data. You can heuristically guess that a file is encrypted if it has absolutely no structure and appears completely random, but you cannot definitively prove it.

Any cipher whose output could be distinguished from random would be considered broken.
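One cheap way to apply this heuristic is compressibility: data that is genuinely structureless does not compress. A sketch (this only yields a guess, never a proof, and already-compressed files such as JPEGs or ZIPs pass it too):

```python
import zlib

def looks_random(data, threshold=0.99):
    """Heuristic: incompressible data is consistent with, but not proof of,
    encryption. Compressed or truly random plaintext also passes."""
    return len(zlib.compress(data, 9)) / len(data) > threshold
```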

forest
  • 15,253
  • 2
  • 48
  • 103
  • 5
    I cannot trust a plain text header (as a magic number) as it can be falsified. – dan Aug 05 '19 at 15:55
  • 1
    Well, there's no difference between the headers of gpg --encrypt and gpg --sign and even gpg --store, all of them are OpenPGP messages with very similar headers and identical ASCII armor – you can't easily tell by looking which one is encrypted and which one is not. (Hint: The latter two aren't.) – user1686 Aug 06 '19 at 06:07
8

In addition to what the other answers have stated, "proper" encryption using AES-256 (block mode choice aside) can still allow backdoors, such as by maliciously choosing IVs/nonces. Phil Rogaway and others discuss this in more detail in their paper "Security of Symmetric Encryption against Mass Surveillance" (abstract available here).

rlee827
  • 234
  • 1
  • 9
4

This question is very easy to answer:

The implementation isn't correct and you absolutely should not use it. Any other attitude towards this black box is hopelessly attackable.

Your stance should be: "I must be able to see the source, audit the source, and build the source myself into a binary". Anything short of that is irresponsible on your part. Do not accept the insecure half-measures proposed in the other answers.

You do not need to demand access to the keys used to do the encryption, but you absolutely must be able to verify that the algorithm is correct, and this simply cannot be done only by observing its inputs and outputs. This is not even restricted to encryption -- any program can have a backdoor input that does something nasty and is essentially impossible to discover just by external experimentation -- but the consequences are much higher for cryptographic primitives than for other, noncritical pieces of software.

Daniel Wagner
  • 216
  • 1
  • 5
  • 2
    I'm not sure the OP wants to use the encryption program; I understand he wants to audit it, possibly on somebody else's behalf. Wild guesses would include educational or professional assignments where the actual user does not have access to the sources or pretends not to (in the case of education). – Peter - Reinstate Monica Aug 07 '19 at 11:51
  • 3
    @PeterA.Schneider I'm not sure that changes my answer significantly; the only responsible audit result should be "DON'T USE THIS", whether it's on your own behalf or somebody else's. – Daniel Wagner Aug 07 '19 at 13:54
3

If you're wondering about the iPhone's encryption specifically, then this work may have already been done for you. Many Apple/iPhone products have passed formal FIPS 140-2 certification, which does extensive tests on the sorts of things that you're concerned about. If you want to see details about which products have been certified for which algorithms/key sizes, go to NIST's CMVP website and search for vendor "Apple". Apple also has details on their website about the security certifications they've completed on various products.

FIPS 140-2 focuses on the cryptographic module by itself: things like the algorithm implementations or key management practices. This would be enough to show that the random number generator is sufficiently random, or that what they claim is AES really is true AES.

What this doesn't cover is how they use the crypto engine (i.e., did the filesystem really encrypt this particular file). Testing this yourself will be difficult on a mobile device since you can't (for instance) transplant the hard drive to a different system to read the raw data and check for plaintext. Apple's certification page lists a number of additional certifications that I'm not too familiar with. I'd recommend taking a look at those certification programs and seeing if any of them cover the sorts of tests you're wanting to do. After all, companies like Apple spend a lot of time and money going through these certification processes so that you don't have to do tests like these yourself.

bta
  • 133
  • 3
  • Thank you, but I need an answer I am directly responsible for (for my colleagues). I should be able to reproduce it like any scientific analysis. – dan Aug 08 '19 at 11:28
  • No, it isn’t specifically the iPhone’s encryption implementation I am interested in. VeraCrypt is another product for which I would like to get as close as I can to a proof of absence of bad practice. – dan Aug 08 '19 at 11:35