5

There are few reliable absolutes in this world. One I have relied on is the idea that checking whether a file exists before doing something with it just creates an unwanted race condition: between the check and the attempted use, the OS is free to do whatever it likes with the file. So I had no right to expect anything to stay the same, and there was no good reason to ever perform such a check.

However, after offering this advice recently, I was presented with a counterargument: that such a check can preserve limited file lock resources. Thus the race condition is defused, not by pretending we know what we don't, but by claiming to know what is more likely.

At first this just sounded to me like someone trying to rationalize some bad code, but I can't say I'm as certain as I once was.

If a use case that makes sense of such code really exists, I'd love to hear it. If it does, I'll need a compelling argument for why it is needed, because such code is going to be very difficult to test completely.

To boil it down, I've made a habit of telling people to stop checking if a file exists before opening it. Are there exceptions to this rule profound enough to consider it bad advice?

candied_orange
  • 108,538
  • Can you clarify the use case? Is it reading a file that should already exist? What happens if it doesn't exist? – JimmyJames Jan 10 '23 at 15:20
  • 2
    Limited file lock resource? The only limits are in place to mitigate against troublemakers; if desired, you can configure systems to allow ridiculous numbers of open files. If you intend on writing to the file, what is the point of "saving" a trivial resource? The counter argument isn't really there. Commenting rather than answering because the counter argument isn't substantiated. – whatsisname Jan 10 '23 at 15:40
  • @JimmyJames my argument was that it doesn't matter, since the OS can either delete or create the file between the check and the use, so whatever expectation you have can be violated. The use case, however, is argued to somehow reduce the work that would need to be done, since you now know which state is likely. I wish I could explain that better but it's exactly what I can't get my head around. – candied_orange Jan 10 '23 at 16:24
  • @whatsisname I honestly don't know why the lock is being valued so much higher than the file IO cost of the check. I'm curious to know if there ever is a good reason to see it that way. – candied_orange Jan 10 '23 at 16:26
  • Couldn't it be language dependent? I mean, if files and directories are mapped to a virtual filesystem (say on a VM), assuming that the file was there once and then writing might result in an IO error if, for some reason, the file or the directory disappears from the OS filesystem before the VM gets notified or updated. IO errors are commonly treated as unrecoverable errors and you might want to avoid that at all costs. I mention this because I recall some conversations about how Java handles files and how what seems OK at runtime (JVM) might not reflect the real state of the real filesystem. – Laiv Jan 10 '23 at 16:32
  • @Laiv I've never had a VM behave that way. I suppose I had assumed the file system was stable. But why would a file lock be more traumatic to the file system than checking if a file exists? And what does that have to do with the language used? – candied_orange Jan 10 '23 at 16:48
  • It might not matter but I'm getting hung up on the various scenarios. If you are writing in append mode, and you check for the file's existence, then write to it, it could be deleted before you get a chance to write to it but then you'll just create the file. What's the point of checking in that case? The main scenario where I can see checking is if you don't want to write if it already exists. But that doesn't seem to be relevant here. – JimmyJames Jan 10 '23 at 17:43
  • @JimmyJames It's the getting hung up that scares me with this. If this use case is real, can you imagine troubleshooting it? Yuck. As for not writing if it already exists: yes, it can be deleted after the check, but since no usage attempt will be made at that point, that doesn't create the race condition. It doesn't matter if the other thread deletes or creates the file. So long as a usage attempt is made after a check (for existence or for non-existence) you have a race condition. – candied_orange Jan 10 '23 at 18:18
  • Right, the "don't write if existing" scenario is one that I think is valid, although if you have more than one application or thread that might write the file, you could still have some conflicts. This reminds me of 'double-checked locking'. This is surely relevant. – JimmyJames Jan 10 '23 at 19:17
  • @JimmyJames the problem is "don't write if existing" implies "write if not existing" which does have the race condition. – candied_orange Jan 10 '23 at 19:44
  • 1
    After mulling it over, I think your first position on this is the correct one. The only argument I can think of is reducing the overhead of exception-handling but I think the impact of that is (usually) overblown. – JimmyJames Jan 10 '23 at 22:09
  • @JimmyJames proving an absolute is always difficult but a well considered answer that dispels the more obvious counter arguments would be appreciated. If some hedging is needed on the advice that would be welcome as well. Would like something I could use on newbie coders. This comes up in peer reviews and I really need to be simple, clear, and correct. – candied_orange Jan 10 '23 at 22:18

3 Answers

5

When opening a file (regardless of whether it is opened for reading or writing), one has to do proper exception / error handling for the specific "open" statement (as well as for any further I/O). It should be pretty obvious that no file-existence test can make this obsolete, for the reasons mentioned in the question. Hence, a prior test for file existence does not allow one to simplify or omit the exception handling around the "open" in any way. If the extra test only duplicates logic that is already inside the exception handler, it only overcomplicates things and should be left out.

This is usually the case when trying to read a file. An extra "file exists" check before opening does not make much sense - either the file can be opened for reading, or it cannot, and the latter can be handled in the exception handler.
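A minimal Python sketch of the read case, under the assumption that "cannot be read" should simply yield no content. The helper name `read_text_or_none` is hypothetical, not from any library. The open call itself acts as the existence test, and the handler covers the missing-file case:

```python
from pathlib import Path
from typing import Optional


def read_text_or_none(path: Path) -> Optional[str]:
    """Hypothetical helper: return the file's text, or None if it cannot be read."""
    try:
        # No prior exists() call: the attempt to open is the only check that counts.
        with open(path, "r", encoding="utf-8") as handle:
            return handle.read()
    except FileNotFoundError:
        # Covers both "never existed" and "deleted a moment ago".
        return None
    except PermissionError:
        # The file is there but unreadable; an existence test would not have caught this.
        return None
```

Adding an `exists()` call in front of this would not remove either `except` branch, which is why the check adds nothing here.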

For writing into a file, however, I can imagine some different scenarios:

  1. Appending to a log file (where it does not really matter if the file exists beforehand - if not, it can just be created). The correct approach in Java, for example, is described here; it includes a call to createNewFile, which does the test for existence implicitly. But beware: in other programming languages, "create if not exists, then append" might require an explicit test for existence.

  2. Writing or copying into a folder (and checking whether that folder exists before writing into it). Here, a prior test for the folder's existence can make sense in order to produce a precise error message (for example, in a UI, telling users they need to create the folder first). If the write or copy fails afterwards, perhaps because some background process renamed the folder (which will probably be a very rare case), the result may be a more technical failure message whose root cause is less clear, so the prior existence test can help reduce how often users run into this case.

  3. Obtaining a write-lock to a lock file / using this as a semaphore. Here, a file-existence test would be completely nonsensical, since getting the write-lock must be an atomic operation, otherwise it would not work.

  4. Writing into a file with fixed block sizes at specific positions, as in a file-based database (so the file must exist beforehand to make this a sensible operation). Since this usually involves getting exclusive write access to the file in question, it is not really different from the former case: a file-existence test does not make much sense. In fact, in this scenario the file in question will probably exist over the long term. The relevant information is not whether it exists but whether it is writable, and obtaining the write lock must be atomic.

  5. Overwriting an existing file (and preventing this from happening accidentally). In a UI application, asking the user beforehand whether they really want to overwrite an existing file can make a lot of sense. Even if the file in question magically appears after the file-existence test returned "false", the prior test can reduce the number of accidental collisions.
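For the lock-file scenario (item 3), here is a rough Python sketch of what "atomic" can look like. It assumes a local filesystem where `O_CREAT | O_EXCL` is honored as a single step; the helper names `try_acquire_lock` and `release_lock` are hypothetical:

```python
import os


def try_acquire_lock(lock_path: str) -> bool:
    """Hypothetical helper: create the lock file atomically; False if it already exists."""
    try:
        # O_CREAT | O_EXCL means "create only if absent" as one operation,
        # so there is no gap between a check and the creation.
        fd = os.open(lock_path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
    except FileExistsError:
        return False
    with os.fdopen(fd, "w") as handle:
        # Recording the owner's PID is just an illustrative convention.
        handle.write(str(os.getpid()))
    return True


def release_lock(lock_path: str) -> None:
    """Hypothetical helper: remove the lock file, tolerating that it is already gone."""
    try:
        os.remove(lock_path)
    except FileNotFoundError:
        pass
```

A separate `os.path.exists(lock_path)` call before `try_acquire_lock` would change nothing about the outcome; the result of the creation attempt is the only answer that matters.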

So I think this is not just "black-and-white" - there are scenarios where a prior existence test can make it easier to handle certain situations more gracefully, or lower the probability of collisions. One should always be aware, however, that this test is no replacement for the proper handling of failing I/O operations.

Doc Brown
  • 206,877
  • 2
    +1 A good point of reference for the scenarios here are the various Python open modes: https://docs.python.org/3/library/functions.html#open – JimmyJames Jan 11 '23 at 17:48
3

You allude to the answer to your problem in your first statement, which is mostly correct. In the case of software engineering, there are no reliable absolutes.

The bad advice is dogmatically telling someone to apply a method blindly. The advice is solid as long as it is given as guidance, not gospel. "It is usually best not to check before write" is solid advice. "Never check before write" is the kind of poor advice that makes sites like this so valuable.

If you want to, you can then go on to justify the reasons, and describe the exceptions. Keep in mind, like most dogmatic rules, you can rework every exception to fit the rule and make yet another convoluted mess of code to fit some arbitrary programming dogma that makes no sense in that particular use case.

mattnz
  • 21,362
2

The use case I can see for this is:

  • check if file exists
  • if it does, prompt user
  • perform time-consuming operation
  • write to file

If you know that the existence (or absence!) of a file will cause problems later, it makes sense to check for it upfront before doing anything expensive. However, you still also need to handle the case where the file appears between checking for it and actually opening it.
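The steps above could be sketched in Python roughly as follows. This is one possible arrangement, not a definitive one: `expensive_operation`, `run`, and the `confirm_overwrite` callback are hypothetical names, and the sketch assumes that a file appearing after the check should be treated as a collision rather than silently overwritten.

```python
import os
from typing import Callable


def expensive_operation() -> str:
    """Hypothetical placeholder for the time-consuming step."""
    return "result"


def run(path: str, confirm_overwrite: Callable[[str], bool]) -> bool:
    """Return True if the result was written, False if declined or a collision occurred."""
    # Advisory check: only here to prompt early and avoid wasted work.
    overwrite = False
    if os.path.exists(path):
        if not confirm_overwrite(path):
            return False
        overwrite = True

    data = expensive_operation()

    # Authoritative step: mode "x" fails if the file exists, including
    # the case where it appeared after the advisory check.
    mode = "w" if overwrite else "x"
    try:
        with open(path, mode, encoding="utf-8") as handle:
            handle.write(data)
    except FileExistsError:
        return False
    return True
```

The early check only decides whether to prompt and whether to start the work; the `open` call at the end is what actually settles the outcome.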

pjc50
  • 13,377
  • To the 'time consuming operation' I would add "or expensive": it might be nothing to the program or programmer, but time consuming for the user. On write, you still have to handle the edge case of the file appearing. – mattnz Jan 11 '23 at 03:18