44

Apparently even today there is no single "official" standard for C++ file extensions. There are just common conventions.

To me this stands out as an anomaly... file extensions are heavily ingrained, and I can't think of another example of such a popular file type lacking a standard for this. At the very least it seems peculiar, and it stands in especially sharp contrast to C.

I looked in my copy of Bjarne Stroustrup's The C++ Programming Language (2nd edition, 1991) and it has this to say (Ch.4, p114):

Header files are conventionally suffixed by .h and files containing function or data definitions by .c. ... Other conventions, such as .C, .cxx, .cpp, and .cc, are also found. The manual for your compiler will be quite specific about this issue.

I wonder why Dr. Stroustrup chose not to be specific about this issue himself?

According to the Historical Note in chapter 0, formal industry standardization of C++ had been going on for years prior. One would have thought that standardizing on this detail would have / could have occurred then.

Where did these "competing" file extensions come from? Did this ever come up for standardization, but the idea was declined or deferred? I can see that today it might take overcoming a lot of inertia to change something like this, but that wouldn't have been so originally.


Edit: A lot of comments are focusing on the merits of filename extensions. That in itself is certainly not what is being asked. This is a historical question looking for facts about past events. Given that today we have multiple alternate extensions for C++, the question wants to know: why is that so?

Also, it seems there are two closely related aspects. One is "why didn't C++ specify this?" The other is "given that C++ didn't, why specifically did the naming conventions fragment?"


Update: Since this is a "history" question I'm looking for answers that are factual, ideally citing sources. Speculation and guessing may be useful in the comments, but I don't think would make for solid answers.

StayOnTarget

5 Answers

89

Because it's not important to ... anything.

The compilers don't care. The editors don't care. Back in the day, some operating systems didn't even HAVE "file extensions". DOS mandated them, DEC systems mandated them. Unix didn't.

What's the standard extension for Fortran? For Pascal? For BASIC? Lots of conventions, many of them system-specific. But no standard.

You know what my file extension is for Lisp? It's .lisp. Well, a lot of legacy systems can't support a four-letter extension. Guess it's .l then, or .lsp, or something else.

File names are local to the operating environment, and not necessarily portable. This is another reason the standard doesn't say anything about them.

Addenda:

I think there just needs to be some clarity here.

First, this is a retro site, so things need to be taken in the perspective of how it was and viewed forward, rather than using the lens of today and viewing backward.

Second, we're talking "standard" here. Standards are, roughly, "MUST". MUST do this, MUST do that.

1986 was just past the peak of the wild west of computing, when things were really starting to settle down. There was a large diversity of systems, and something like a file extension STANDARD, a MUST, was not tenable. Today, while kernels vary, modern computing is operationally almost (almost) a mono-culture. Not so back in the day.

And, in the end, by and large, the language doesn't care. Extensions and conventions are a tooling issue. Modern things like Go and Java are more than just languages: they mandate a broader environment beyond mere syntax and semantics. They're offering an opinionated approach to development beyond just the language.

Back then, the languages had to fit a variety of machines. Nowadays, the environments are bringing their machine with them.

We've come a long way in 35 years.

Will Hartung
  • Comments are not for extended discussion; this conversation has been moved to chat. – Chenmunka Jul 01 '21 at 17:25
  • 9
    The compilers do care. For example, unless overridden by an additional command line option, the gcc binary decides which language’s frontend to invoke based on the input file’s extension. (This isn’t limited to C++ versus C either.) I don’t know if this behaviour was present that early in C++’s history, but I wouldn’t be too surprised. – user3840170 Jul 02 '21 at 16:20
  • 4
    @user3840170 the fact that you can change them means they don't care, it's just a convention, default, and a convenience. – Will Hartung Jul 02 '21 at 16:46
  • Convention and convenience is all file extensions ever are. – user3840170 Jul 03 '21 at 07:50
  • 1
    Do you know Make? – Polluks Jul 06 '21 at 19:27
  • 1
    The 7th edition cc(1) didn't have an option to override how it handles file extensions. As far as it was concerned: .c - compile; .s - pass to the assembler; anything else - pass to the linker unmodified. – Jonathan Cast Sep 26 '23 at 21:33
36

The first edition of Stroustrup's "The C++ Programming Language" (1986) consistently uses a ".h" extension for C++ header files and ".c" for C++ source files. C and C++ source files were distinguished by which compiler you used, cc for C and CC for C++.

The second edition (1991) uses the same convention, but mentions other extensions:

Header files are conventionally suffixed by .h and files containing function or data definitions by .c. They are therefore often referred to as ".h files" and ".c files", respectively. Other conventions such as .C, .cxx, .cpp, and .cc are also found. The manual for your compiler will be quite specific about this issue.

So as of 1986, there was a conventional extension for C++ source files: .c. Over the following years, it was found useful to use the extension to distinguish between C and C++ source files, or perhaps some compiler implementers simply decided to use a different extension -- but that decision apparently was made separately by different people and groups. By the time the inconsistency became an issue, it was too late to agree on a single extension for C++ source files.

Note that many implementations do use the file extension to distinguish between C and C++ source files. make has implicit rules to generate an object file from C and C++ source files, and looks at the extension to determine which compiler to invoke. Other build systems do similar things. The gcc command will also look at the file extension; gcc -c foo.c compiles C, and gcc -c foo.cpp compiles C++. (The separate g++ command is intended for C++, but the only real difference is that g++ links with the C++ library, which isn't relevant when you use the -c option that just compiles a source file to an unlinked object file -- except that g++ will compile *.c files as C++.)

Tools typically recognize multiple extensions for C++ source files. gcc recognizes .cc, .cp, .cxx, .cpp, .CPP, .c++, and .C. GNU make appears to recognize .cc, .C, and .cpp (the inconsistency is annoying).
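
To make the suffix-based dispatch concrete, here is a minimal sketch (the Greeter class and file names are purely illustrative) of a translation unit that is valid C++ but not C, together with the kind of driver invocations described above. The behaviour shown is as documented for reasonably modern GCC and may differ in detail on older versions.

    // foo.cpp -- valid C++ but not valid C, so the frontend choice matters
    #include <iostream>

    class Greeter {
    public:
        void hello() const { std::cout << "hello from C++\n"; }
    };

    int main() {
        Greeter g;
        g.hello();   // only compiles if the C++ frontend was chosen
    }

    // The driver normally picks the frontend from the suffix:
    //   gcc -c foo.cpp        -> compiled as C++ because of the .cpp suffix
    //   gcc -c foo.cc         -> likewise; .cc, .cxx, .C, etc. are treated as C++
    //   gcc -x c++ -c foo.c   -> -x overrides the suffix-based choice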

If Stroustrup had used, say, .cc and .hh in the first edition of his book, we'd probably all be using that today. (And if he had used .C and .H we'd have problems with case-insensitive systems like Windows.)

Keith Thompson
  • I wonder if there would be any difficulty with specifying that a C++ implementation may do anything it likes with a file that starts with /* LANGUAGE: if the next three characters are not C++, and that a C implementation may do anything it likes with such a file if LANGUAGE isn't followed by the letter C and a whitespace character, thus allowing implementations to distinguish between those languages (or a variety of others) without having to care about file names. – supercat Jul 02 '21 at 16:43
  • @supercat I'd say that would be neither difficult nor useful. One problem: existing C and C++ source files don't start that way, so the existing mechanisms (which seem to be perfectly adequate) would still have to be supported. And /* doesn't start a comment in all languages. The idea is similar to the "magic numbers" used by a lot of binary file formats and recognized by the file command, and I can see it being used if it had been agreed to, say, in the 1950s. – Keith Thompson Jul 02 '21 at 19:36
  • 2
    The g++ command, unlike the gcc command, will compile .c files as C++. I believe the same is true for clang++ vs clang. – Peter Green Jul 22 '21 at 04:58
  • 1
    @PeterGreen You're right. clang++ gives a warning: "warning: treating 'c' input as 'c++' when in C++ mode, this behavior is deprecated [-Wdeprecated]", which implies that the behavior could change in a future release. (It seems odd to call the compiler's behavior "deprecated".) – Keith Thompson Jul 22 '21 at 22:00
19

I wonder why Dr. Stroustrup chose not to be specific about this issue himself?

You'll have to ask him. But based on what I've read from his website, he seems not to be strongly opinionated on stylistic matters like how to name files, where to put braces, or whether multi-word identifiers are written like_this or likeThis or LikeThis. His main concerns are resource-safety and type-safety.

Where did these "competing" file extensions come from?

I'm just guessing here, but:

  • .C — because C++ is a “big” version of the C language. (Not usable on platforms with case-insensitive filesystems.)
  • .cxx — 45° rotation of the plus signs
  • .cpp — stands for “C plus plus”
  • .cc — “C with classes”, maybe?
  • .c++ — straightforward, but an OS may disallow + in filenames or give it some special meaning in the shell. And inconvenient to type on many keyboard layouts.

I can see that today it might take overcoming a lot of inertia to change something like this, but that wouldn't have been so originally.

Stroustrup started development on C with Classes in 1979. The first C++ standard was released in 1998. That's 19 years later. Even if you start counting with the first publication of The C++ Programming Language in 1985, that's 13 years of people using the language before it was standardized. Plenty of time for different ways of doing things to arise.

Also, note that when C++ was finally standardized, it was explicitly intended to be usable on multiple platforms with different file-naming rules. Even those that don't allow . in filenames at all. This is also why the C standard headers were renamed (e.g., <math.h> to <cmath>).
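
As a small illustration of what that renaming means in practice (a minimal sketch, not tied to any particular implementation): both spellings are still accepted by conforming compilers, but the extension-less <c...> headers put the names in namespace std.

    #include <cmath>      // extension-less C++ header; declares std::sqrt
    // #include <math.h>  // legacy C spelling; still available for compatibility

    int main() {
        double r = std::sqrt(2.0);   // with <cmath> the name lives in namespace std
        return r > 1.0 ? 0 : 1;
    }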

dan04
  • 24
    For the record, the C standard doesn't necessarily require math.h to be a file name—it's just a string that identifies a “header”, which need not be implemented by a file at all (e.g., in a freestanding implementation). – texdr.aft Jun 30 '21 at 17:18
  • 9
    The .C convention has bitten me quite badly. I remember an open source graphical debugger I once tried to compile for the Mac. It used the .C convention and it had C and C++ files in its source tree with the same name but differing only in the capitalisation of the extension i.e. both foo.c and foo.C existed containing C code and C++ code respectively. Whoever thought of that idea is destined for the seventh level of hell. – JeremyP Jun 30 '21 at 19:01
  • 14
    @JeremyP, you surely mean the idea of "case-insensitive filesystem"? – vonbrand Jul 01 '21 at 00:20
  • 1
    I remember the .C convention. Coming from a case insensitive OS, that threw me completely. The other weird one is directories called c and h and all files in it not having any extensions - it was one of the IBM ones, possibly AS400. There was also the c.filename and h.filename convention. – cup Jul 01 '21 at 06:00
  • 9
    @vonbrand No, case preservation is fine, but whoever invented the case-sensitive filesystem deserves to be shot. Tools should never have to pretend they don't know how to find the bin directory because it was actually saved as Bin on disk, etc. And I know why this is -- case insensitivity is more work (especially with numerous and ambiguous collation variants) and it's easier to treat filenames as an arbitrary byte sequence. But the tools should be responsible for hiding that, so that nobody has to suffer case sensitivity. – Miral Jul 01 '21 at 06:49
  • 6
    @vonbrand No I don't. NTFS, HFS+ and APFS get it right - case preserving, case insensitive. I understand why it wasn't done that way in UFS - early Unix machines could save cycles and memory by doing case sensitive comparisons but it's wrong. – JeremyP Jul 01 '21 at 07:43
  • 7
    Actually, fun retrocomputing fact: DOS was technically case sensitive too, just not case preserving. It had a canonicalisation function that converted all inputs to uppercase, then performed an exact (case sensitive) search on the filesystem itself. Which had the interesting property that if you used your handy sector editor to alter the directory entries you could make files with lowercase characters that would still exist happily but were completely inaccessible by conventional means. This was occasionally used for copy protection. I'm less sure about CP/M but I suspect it was similar. – Miral Jul 01 '21 at 09:05
  • 6
    @JeremyP "NTFS, HFS+ and APFS get it right - case preserving, case insensitive" Which one is 'right'? Because they do it subtly differently. eg APFS and HFS = different Unicode versions = different normalization tables.

    Sometimes this stuff matters: see CVE-2021-21300 where insensitivity led to an exploit.

    It's not simple (see turkish dotless-i vs dotted-i. Or SS.txt == ss.txt =?= ß.txt == ẞ.txt). And different paths => different filesystems => different behaviour.It might appear okay to most people but that's only because they're not exercising the failing edge cases.

    – Levi Jul 01 '21 at 09:30
  • @Levi there are always edge cases and many of them apply whether you go for case sensitive or case insensitive. The fact that some glyphs have multiple ways of being expressed is a problem that is orthogonal to case sensitivity. e.g. is é precomposed or decomposed. As for the ẞ glyph, that's a thorny problem. I think Germans would like ẞ and ss to be the same, but, again, this is nothing to do with case sensitivity. – JeremyP Jul 01 '21 at 09:56
  • @Levi: It's too bad file systems haven't evolved to support the concept of files having a "human readable name" separate from the name the machine uses to identify the file. Interestingly, the Commodore disk format did support such a thing, sort of--names were stored as 16 bytes on disk, but when listing directories they would appear, enclosed in quotes in, an 18-character-wide field. If a name contained an 0xA0 byte, the portion after that byte would appear outside the quotes, and would not be required to load the file. If there were a convention that file names... – supercat Jul 01 '21 at 16:35
  • ...would need to consist of a certain limited range of characters before a delimiter, but could contain arbitrary characters after that, and only the portion before the delimiter would be used for matching, that would have offered some pretty big advantages. File names in directory listings may be intended for humans, but in most other contexts file names are intended for use as identifiers by computers, and identifiers should use limited character sets even if human-readable names shouldn't be thus confined. – supercat Jul 01 '21 at 16:38
  • 5
    NTFS will do (and as far as I recall always has done) case-sensitive or case-insensitive lookups, at the caller's request. This ability is certainly available through the native API (I have used it) even though it might not be exposed via the Windows API. In fact, if you run Windows Subsystem for Linux, you'll note that the Linux utilities retain case-sensitive access to NTFS volumes. – dave Jul 01 '21 at 22:24
  • When talking about what APFS gets right, I'd first read these three posts. I'm assuming Apple has since fixed all relevant problems, but they provide important context: https://eclecticlight.co/2017/04/06/apfs-is-currently-unusable-with-most-non-english-languages/ https://eclecticlight.co/2017/04/07/apfs-and-macos-10-13-many-apps-and-tools-will-need-to-be-revised/ https://eclecticlight.co/2017/07/05/high-sierra-and-filenames-apple-is-relenting/ – ssokolow Jul 02 '21 at 06:26
  • 2
    That last sentence (about where <cmath> came from) is incorrect. There was a lot of debate on the standards committee about what the extension for the standard headers should be, and the unfortunate choice was to short circuit the debate by using no extension. (It's unfortunate because many tools depend on file extensions to determine how to handle files. My programmer's editor doesn't do syntax highlighting for files with no extension; you have to tell it the type of each file that has no extension.) – Pete Becker Jul 02 '21 at 17:32
  • 1
    Re: human/machine names for files. Certain 1960s-designed systems had a "filename" and a "title". In the case I am familiar with, the filename was an exactly-12-alphanumeric string specified by the programmer, (certain conventions controlled the last 5), and the title was, as far as I recall it, a free-form string. – dave Jul 02 '21 at 18:02
  • @another-dave: I find it sad that so many people are oblivious to the fact that labels which will be used primarily as identifiers by machines should be specified in a manner favoring that primary purpose. This comes up with things like hashtags versus trademarks, the former of which should be governed by social media company terms of service rather than trademark law. If one is writing text for humans to read, someone producing a product similar to another product should label it in a manner that would allow a reader to readily discern that it's similar to the other product... – supercat Jul 03 '21 at 19:57
  • ...but is different. Hash tags, however, are machine-readable. If a company like Twitter wants to specify that only owners of trademarks may use them as hash tags, that would reserve hash tags for exclusive use by trademark holders, but on the flip side if it were to specify that any trademark holders may only use their trademarks as hash tags if they grant permission for others to do likewise, that would make clear that trademarks holders could not claim exclusivity. The very nature of identifiers is that they need to match perfectly, and it should be easy to transcribe them so they do. – supercat Jul 03 '21 at 20:01
  • What rule of programming says that an application has to expose the files on which it works by filename? E.g., a word-processing document could display a list of document titles in its 'open' dialogue. It is not a fault of the file system that the application programmer took the lazy route. – dave Jul 03 '21 at 21:08
  • @another-dave, exposing files by title (or some other non-filename descriptor) only works as long as access is strictly limited to that one program. Once you get a second program in the mix, you need to settle on some shared convention, or the user will be reduced to opening everything to find the one file they're looking for. – Mark Nov 19 '23 at 20:56
9

One important historical reason is that many old computer systems didn't have the concept of filename extensions.

For example the Tandem computers I worked with in the late 1990s had for their entire file system just the following: computer/drive/directory/filename

That was it, no subdirectories, no filename extensions, and every name limited to 8 characters.

Other systems had different but similarly limited constructs.

As such, defining a programming language or operating system (often such things were constrained by, or even hardcoded into, the hardware) to require things like filename extensions would seriously limit its applicability on platforms other than the one it was originally created on. For operating systems this was often not a problem, as they were designed for one specific hardware specification; but programming languages have always been intended for implementation on a variety of platforms, so you wanted them to be as flexible as possible.

jwenting
  • 1
    Not just old. The very concept of extension is a Microsoft-ism. Unix-based systems typically do not give any special meaning to dots within filenames. They infer file types by examining the first bytes of the content, not from the filenames. – spectras Jul 02 '21 at 12:11
  • 16
    @spectras - "Extensions" is very much NOT a Microsoftism. Most DEC systems (at least from the PDP-6 and PDP-10 monitor onwards) have had extensions, and stored in a separate field from the actual filename. You'll recall that Gates and Allen did initial BASIC development on a PDP-10. See page 14 in the PDP-6 system manual from 1965, – dave Jul 02 '21 at 12:40
  • 2
    Files on CTSS (1961 onwards) had what CTSS documentation described as "two names", one of which functioned as a type-of-content identifier. So, essentially the same as what we now call an "extension", but the syntax was different. – dave Jul 02 '21 at 14:34
  • 8
    DOS had 3 character extensions because CP/M had 3 character extensions and DOS was originally targeted by Seattle Software at the CP/M machine market. But file extensions did not originate with CP/M either. – jwdonahue Jul 02 '21 at 20:06
  • 1
    @another-dave You are right, I totally missed the point I wanted to make. What I meant to say is that the existence or the absence of extensions is not a matter of old vs new design. Plenty of modern designs do not have extensions when one goes out of Microsoft world. – spectras Jul 03 '21 at 16:47
  • 8
    In that sense, NTFS doesn't actually have extensions either. ".FOO" is just part of the filename on NTFS, just as it is on most Unix file systems (and X.FOO.BAR.MUMBLE is perfectly reasonable). Only in the older FAT variants is the extension a separate field in file system metadata. Admittedly, the Windows API has some (IMO, ill-advised) funkiness about trailing dots in an attempt to paper over the FAT/NTFS differences. In short, the systems are not actually as different as it might appear - it's just the historical conventions that make them look that way. – dave Jul 03 '21 at 17:31
  • 1
    Just because not all systems have the concept of extensions doesn’t mean it’s not worthwhile to establish a convention for the ones that do. – user3840170 Jul 23 '21 at 17:29
  • @user3840170 convention yes, requirement no. And the convention was pretty much established to be either .CC, .cpp, or .cxx (with by now .cpp being I think pretty much universal). – jwenting Jul 27 '21 at 11:10
  • Requirement for what? The question is why didn’t Stroustrup pick a recommended extension for C++ source files, for systems where such concepts are meaningful, so that one can either follow the recommendation and have some amount of assurance that others will usually do likewise, or ignore it and be prepared to deal with any fallout. Multiple competing and mutually incompatible standards is the opposite of such a convention. – user3840170 Jul 27 '21 at 14:45
-9

Linux does not need extensions. The type of a file is stored in file metadata (on Linux systems).

And the extension is only used by the OS desktop to open a file in a specific tool. For example, if your file is .docx, it will be opened in Microsoft Office or LibreOffice.

Peter Mortensen
  • 20
    No, Linux doesn't store file type in some type of metadata. Linux file managers can know what type a file is without extension through the use of libmagic or XDG magic files. – ElementW Jul 01 '21 at 10:32
  • Both Linux and other operating systems, and applications on them, can guess file types based on the file content. Of course that guess doesn't have to be very accurate (e.g. a million different XML schemas). – jwenting Jul 01 '21 at 12:31
  • 9
    C++ is way older than Linux so how is that relevant to anything? – pipe Jul 01 '21 at 15:15
  • 2
    @pipe: But Linux is an implementation of Unix, which is way older than C++ :-) And for the OP, a "desktop" in *nix is merely an application. Any sort of metadata used by a particular application (and associated tools) is specific to that application environment only. Thus if Gnome stores some file metadata, it won't be recognized by KDE (unless the developers shared), and certainly not from the command line. – jamesqf Jul 01 '21 at 15:47
  • 8
    There are OSes that use metadata to determine what type a file is, but Linux is not one of them. Examples of metadata using OSes are MacOS and BeOS. – Glen Yates Jul 01 '21 at 18:14