34

In Pascal, nil (the pointer value to "nothing") is a reserved word. Why wasn't it simply a predefined identifier as true and false are, for example?

This is stated in the PASCAL User Manual and Report, p. 109, as well as in the ISO documents ISO 7185:1990 (Pascal) and ISO 10206:1990 (Extended Pascal), both in section 6.1.2.

Leo B.
JeanPierre
  • 2
    You could fantasize on implementations where NIL could be a function that would return different values over time. – tofro May 07 '18 at 10:08
  • 4
    Because this is Pascal. There is no such logic as "simply". Why do you have to write PROCEDURE and FUNCTION for every function? In C there is no keyword for this; it is obvious and defined by the context. – i486 May 08 '18 at 12:03
  • 9
    @i486 Actually, C's decision not to have a keyword for function (or type) declaration is problematic for compilers and humans. This is why almost no other languages follow its lead. Off the top of my head I can think of only Java and C++. – JeremyP May 08 '18 at 14:36
  • @JeremyP Pascal syntax is oriented to beginners: it helps a beginner understand (or think they understand) the program. C syntax is oriented to professionals. If you think that a programming language exists or is necessary only for beginners, then Pascal is the "winner". I don't think so. – i486 May 08 '18 at 18:20
  • 6
    Re pascal's PROCEDURE and FUNCTION keywords: Pascal's syntax is largely a result of the first Pascal compiler being recursive descent. For that, you have to be able to tell what you're parsing with only one token of lookahead. – Wayne Conrad May 08 '18 at 20:49
  • @i486 It's one of those little things that allowed Turbo Pascal to be the unbelievably fast compiler that it was (and even more so if you didn't have a hard drive). But the logic is quite simple - procedures and functions are different things. Why would you define them the same way? They're different things in C too; void is not a real type. What's the benefit of doing it the C way? Some languages do treat procedures as functions that return e.g. unit, but C is not one of those languages. – Luaan May 09 '18 at 08:54
  • @Luaan A procedure is a function without a return value. Or is there some other difference? Are they both sequences of commands? Yes. Maybe you can answer why Unix and Linux are written in C, and not in Pascal? The compilation would be very fast... The development would be slow, but this is not important... (irony) – i486 May 09 '18 at 09:01
  • 2
    @i486 The motorbike is a bicycle, just without an engine. Totally the same, right? Unix is written in C because the guys who wrote the original Unix also wrote C. LISP would have been the obvious go-to language at the time. Pascal was used for the Apple II's OS, and they certainly didn't complain "the development was slower than with C" - quite the opposite. The vast majority of the code you write in Pascal was much simpler than in C, and the rest you probably wanted written in assembly anyway. Suggesting that having to type "procedure" makes Pascal "slower to code" is ridiculous. – Luaan May 09 '18 at 09:17
  • @Luaan Which Apple II OS is written in Pascal? Maybe UCSD, which is not an OS for the Apple II (only) and is the most dramatic thing, useful only for education. All other OSes are written in assembly, or don't you agree? The development is slower, of course, because of all the PROCEDURE/FUNCTION keywords and := assignments. All the extra characters take time to enter and time to read on screen. Linux was written decades after Unix, and not by the guys who wrote C. But it is not in Pascal. Is this a mistake? – i486 May 09 '18 at 09:25
  • 1
    @Luaan The distinction between PROCEDURE and FUNCTION has nothing to do with nil, and is not a rate-determining step in a Pascal compiler. A bicycle is a motorbike without an engine, not the other way around. Lisp was never an obvious choice for writing operating systems at any time. – user207421 May 09 '18 at 09:56
  • 3
    @Luaan Lisp wasn't the obvious choice at the time, it wasn't even an option. Common Lisp didn't exist yet (although of course there were other lisp implementations), and it would be another couple of decades for Lisp Compilers to come out that both ran fast and produced fast code.

    @i486: Text I/O might've sucked at the time, but text I/O never sucked so hard that it actually was a limiting factor during development. If writing fewer characters led to faster development, we'd all be writing in codegolf languages.

    – Cubic May 09 '18 at 10:21
  • @Cubic I'm talking about Lisp machines, not common lisp, though I'm not sure what common lisp has to do with anything. It's not like the Unix guys waited for "standard C" to write their OS - they just used what they had (and improved it as necessary). – Luaan May 09 '18 at 13:31
  • @i486 Most OSes at the time were written in assembly, yes. C was unique in how easy it was to port to other architectures - it was more like a high-level assembly than a full blown programming language. Today's assembly languages do more advanced stuff than C did back in the day. And as much as I like Linus Torvalds, by that point C was already a popular language in wide use for everything and their kitchen sink. I doubt he did any deliberations on what language to use. And since you referred to his design decisions, take a note of what he thinks about "saving characters" in the source code :) – Luaan May 09 '18 at 13:39
  • 1
    @Luaan I reiterate. Lisp machines weren't a thing when Unix development happened. Lisp was used by researchers, but no sane person would've considered writing an OS based on Lisp at the time (hence, no one did). – Cubic May 09 '18 at 16:16
  • @i486 Pascal needed the FUNCTION and PROCEDURE prologs because it supported nested constructs of them. If C supported nesting of procedures (which it doesn't), it would most probably have the same keywords. – tofro May 10 '18 at 05:53
  • @i486 I said most modern languages have an equivalent to PROCEDURE and FUNCTION not just Pascal, although most don't differentiate between the two. For example, in Swift functions are defined using the func keyword. It makes the compiler easier to design and the code easier to read (for everybody, not just beginners). – JeremyP May 10 '18 at 13:08
  • 1
    @i486 Also Unix was not the first operating system to be written in a high level language. It was preceded by MCP for the Burroughs large machine architecture, which was written in ALGOL which is the direct ancestor to Pascal and has PROCEDURE and FUNCTION. – JeremyP May 10 '18 at 13:13
  • 1
    @JeremyP "This is why almost no other languages follow its lead. Off the top of my head I can think of only Java and C++." You can also add C#. And once we have C/C++/C# and Java, what other languages remain? These four are the major ones and account for 90%+ of real programming (of real and fundamental software, not experiments and exercises). – i486 May 10 '18 at 13:16
  • 2
    @i486 What languages remain? Python, Swift, Go, Kotlin, Pascal, Algol, Ada, Shell script, Basic, Haskell, Rust. You need to get out more and try some well designed languages. There's more to life than C and its immediate descendants. – JeremyP May 14 '18 at 15:32
  • @JeremyP Please compare the worldwide usage share of C/C++ with that of Algol + Ada. Do you see the difference? Maybe 99.9% to 0.1%. Exotic, archaic and never widely used languages. – i486 May 14 '18 at 17:18
  • @i486 Well let's be honest. Most of the World's programming will be in VBA macros for Excel. Also don't forget Javascript which is on course to take over the world (sadly). – JeremyP May 15 '18 at 09:15
  • @JeremyP: algol uses a keyword, but the same one for both valued and nonvalued: PROCEDURE in a60, PROC in a68 (which abbreviated other things too). PL/I allows either of those. Pascal and Ada split to PROCEDURE and FUNCTION; Fortran already had SUBROUTINE and FUNCTION. Original Bourne shell had and POSIX/SUS still has only the parentheses foo() syntax; the Korn function foo syntax is a common extension but not universal. Although shell is scriptable, I'd call awk and perl more 'programmy' and those use function and sub respectively. – dave_thompson_085 Jun 26 '18 at 05:07

5 Answers

65

The definition of PASCAL is, above all else, intended to be simple. PASCAL was designed as a pedagogical language (with aspirations to be useful for commercial purposes, but that was a secondary concern). For this purpose, the definition had to be small and orthogonal so that it could be explained simply and concisely. For ease of implementation, the number of special cases had to be kept to a minimum.

The boolean type is handled by the system as simply an example of an enumerated type. It is effectively equivalent to having a definition

type boolean = (false, true)

automatically included in the program. Specifically: it can be implemented by entering false and true into the symbol table with an associated type of boolean. They can then never be used for any other type, as PASCAL identifiers are associated with a single type only in any scope.
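To make the comparison concrete, here is a minimal sketch, assuming a compiler such as Free Pascal (the type `answer` and the procedure `Shadow` are hypothetical names, not taken from the Report). It shows a user-defined enumeration behaving like boolean, and that true, being only a predefined identifier, can be given a new meaning in an inner scope, which is not possible for a reserved word such as nil.

```pascal
program BoolAsEnum;

type
  { hypothetical enumeration mirroring boolean = (false, true) }
  answer = (no, yes);

procedure Shadow;
const
  { legal in standard Pascal: true is only a predefined identifier,
    so an inner scope may redeclare it; the same attempt with nil
    would be a syntax error because nil is a reserved word }
  true = 42;
begin
  writeln('inside Shadow, true = ', true)
end;

begin
  { both types are ordinal: values are numbered from 0 in declaration order }
  writeln(ord(no), ' ', ord(yes));
  writeln(ord(false), ' ', ord(true));
  if (no < yes) and (false < true) then
    writeln('both enumerations are ordered the same way');
  Shadow
end.
```

Outside `Shadow` the predefined meaning of true is untouched, since the redeclaration is local to that procedure.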

nil, however, could not be defined by an existing language mechanism. The language simply does not provide any means of creating a pointer value other than via new, which creates a value while nil refers to an absence of value. Therefore, a new language feature was required to implement it, so a new keyword was added for that feature.

Also, nil does not behave the same way an identifier does: it does not have a predetermined type. The type of a nil expression is determined by its context -- it may become any pointer type that is required to make the expression type-check. If it were implemented as an identifier rather than a keyword, that would have required a special case for polymorphic identifiers, with only a single instance in the whole language and no way of defining new ones: clearly not a useful way of approaching the problem.
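A minimal sketch of this context dependence (the type names `IntPtr` and `CharPtr` are made up for the example): the same token nil is accepted wherever any pointer type is expected, even though the two pointer types are not compatible with each other.

```pascal
program NilContext;

type
  IntPtr  = ^integer;   { hypothetical pointer types for illustration }
  CharPtr = ^char;

var
  ip: IntPtr;
  cp: CharPtr;

begin
  ip := nil;   { here nil is taken to be an IntPtr }
  cp := nil;   { here the same token is taken to be a CharPtr }

  { ip := cp;  would be rejected: IntPtr and CharPtr are distinct types,
    so no ordinary typed constant could be assigned to both }

  if (ip = nil) and (cp = nil) then
    writeln('both pointers are nil')
end.
```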

Jules
  • 1
    The fact that there would be no way to create nil if no predefined form existed wouldn't have to be a problem. Many predefined symbols for built-in procedures and functions would behave likewise. On the other hand, in Pascal dialects without a generic Pointer type that can be implicitly coerced to any other, there would be no way to create a constant which behaved like nil even if one could use nil to set its value. – supercat May 07 '18 at 17:14
  • 1
    Java has null for similar reasons to Pascal; any language that discourages direct pointer manipulation but allows for pointers needs a way to detect if a pointer is uninitialized. – Michael Shopsin May 09 '18 at 15:35
  • 4
    @MichaelShopsin - right. C got away without needing a keyword (at least until recent versions) due to the fact that it allows conversion of integers to pointers, but both Java and PASCAL prevent that. (Interestingly, I see a lot of people think of Java as being a stripped-down C++, but my experience was the opposite way around - I've always viewed it as an enhanced Object Pascal ... perhaps because the old OO extensions to Turbo Pascal were my first experience of working with an object oriented language, but I do think it really is a more similar system in many respects) – Jules May 09 '18 at 16:19
  • @MichaelShopsin Not really. I'm not that familiar with Pascal, but Java certainly doesn't need null. It just has it because C++ had NULL (which had it because C had NULL (which probably took it from somewhere else; I'm not a computer historian)). – Cubic May 09 '18 at 16:21
  • 2
    @Cubic - the semantics of Java can't work without null. What should references in objects be initialized to by default? Not having a way to do that is a showstopper in a language with Java's structure. It can't just automatically create an object, but the semantics require a default value for every field in an object, and requiring the developer to provide one for everything would be too much hard work. – Jules May 09 '18 at 16:24
  • @Jules Like you, I used Object Pascal before C++, and Java's philosophy seems closer to OOP than C++. The fact that Java is written in C++ doesn't mitigate the differences between the languages. – Michael Shopsin May 09 '18 at 18:19
  • @Jules Simple. There shouldn't be such a thing as "default" references in the first place. – Cubic May 09 '18 at 20:10
  • @Cubic - having every reference contain a valid value at all times is important in order to guarantee type safety. If a reference could be used without being initialized to some known-valid value, you could use that ability to access an object through a reference of a different type, thus breaking the security constraints of the language. Java must have a way of being sure that all references are valid, which means that there must be a default value that can be applied to all of them. The only alternative is not allowing a reference to be initialized without an object as its target (i.e... – Jules May 10 '18 at 07:15
  • 1
    ... a reference must always point to a valid object, so you can't have a reference of a type if you don't have an object of that type first), but that results in a substantially different style of language that would be difficult to achieve Java's goals with. It could be done with modern Java, but don't forget Java 1.0 didn't have generics, so implementing something like Optional couldn't be done there. And null is a much simpler mechanism to implement than generics... – Jules May 10 '18 at 07:18
  • "as PASCAL identifiers are associated with a single type only in any scope." Really? I've never heard about that; can you provide a source? – JeanPierre May 12 '18 at 07:16
  • Your point about the polymorphic type of nil sounds good. However, there are also (more or less) polymorphic functions and procedures that are predefined identifiers (read, write, abs). – JeanPierre May 12 '18 at 07:27
  • @JeanPierre - I'm pretty sure that read and write are also defined as keywords rather than identifiers. I'd have to check about abs; I don't remember anything special about it. – Jules May 12 '18 at 09:30
  • @JeanPierre - looking at the revised language report, it seems the original definition of read and write only allowed characters as arguments (other argument types are described as a non-standard extension in CDC6000 Pascal). However both abs and sqr were defined polymorphically in the report. – Jules May 12 '18 at 09:53
  • @Jules Indeed. I was referring to the "Juli 1973" version of the report, where the 13th section, Input and Output, defines read as accepting char, integer, or real and write as accepting char, integer, real, Boolean, or any array of characters. Thanks for pointing out this earlier version of the report! A 1970 version is also available at http://www.standardpascal.com/documents.html – JeanPierre May 12 '18 at 10:46
20

Unlike Boolean constants, the value of NIL cannot be assigned a particular type. That's why it has to be parsed in a special way, that is, it has to be a keyword.

Another reason for NIL to be a keyword is to disallow explicitly dereferencing NIL by writing NIL@. If NIL were a predefined constant of any pointer type, even a magic polymorphic pointer, it would have been syntactically allowed.
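As a rough sketch of the difference (`IntPtr` and `SafeValue` are hypothetical names): in the ISO 7185 grammar, as far as I can tell, the dereferencing arrow may only follow a variable, so a compiler can reject a dereference of the keyword nil while parsing, whereas dereferencing a variable that happens to hold nil can only be caught at run time, if at all.

```pascal
program NilDeref;

type
  IntPtr = ^integer;

{ hypothetical helper: dereference p only when it is safe to do so }
function SafeValue(p: IntPtr; fallback: integer): integer;
begin
  if p <> nil then
    SafeValue := p^
  else
    SafeValue := fallback
end;

var
  p: IntPtr;

begin
  new(p);
  p^ := 7;
  writeln(SafeValue(p, -1));    { p is valid, so the stored 7 is used }
  dispose(p);

  { writeln(nil^);           rejected when the program is compiled }
  { p := nil; writeln(p^);   compiles, but is an error at run time }

  writeln(SafeValue(nil, -1))   { nil passed as a parameter, fallback used }
end.
```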

Leo B.
  • About your first point, it could be assigned a "special type", but I think you mean it's simpler to consider it a keyword; see my comment on @Jules' answer. Your second point is very interesting; what makes you think it is a reason for making this design choice and not a consequence of the choice? – JeanPierre May 12 '18 at 07:40
  • 1
    @JeanPierre That's right; the last paragraph of @Jules' answer echoes what I meant ("...it has to be a keyword, otherwise things will get messy"). My second point is that if you want to disallow NIL@ in the compiler, then doing it at the syntax level by making NIL a keyword is much easier than doing it at the semantic level. It is hard to say now whether this consideration influenced the design choice, but as Pascal was a teaching tool, Wirth must have heard the question "What happens if you dereference NIL?" enough times to take care of it in the compiler. – Leo B. May 12 '18 at 16:38
5

Because neither 0 nor false is a pointer. Pointers are pointers, not numeric values that can be used directly in mathematical expressions. Assigning false to a pointer doesn't make any sense, so Pascal assigns nil to it.

Speaking of "sense", C++ is a well-known example: pointers there were traditionally compared with the literal 0 or the macro NULL, and that caused several problems. For example, NULL*3 makes no sense, but it's valid. Using 0 also gives the impression that a null pointer always contains the value 0. That's not correct. A null pointer is just a pointer that points to nothing; its binary representation isn't important, and there are architectures that use non-zero values to indicate a null pointer.

C++11 therefore introduced a new keyword for it, nullptr, which solves the problem and makes pointer comparisons far more sensible.

Pascal used nil right from the beginning, so it never had this problem. It also makes Pascal more type-safe. Other languages, like Java and C#, have a null keyword for the same reasons.

phuclv
  • 4
    This isn’t accurate: C++11 did create a keyword for it but they didn’t have to: they chose to. nullptr could be defined in user code. In fact, here it is: std::nullptr_t constexpr my_nullptr{}; (admittedly this is somewhat circular since nullptr_t is defined in terms of nullptr but that, too, isn’t necessary). – Konrad Rudolph May 07 '18 at 15:46
  • 3
    nullptr_t's addition to C++ helps do what the caller expects with regard to templating and overloading; NULL has the nasty habit of silently turning into an int. I guess this answer makes the argument that C++11 had to add a new keyword in order properly to separate the types, so therefore it's no surprise that there's a keyword in Pascal given that the types have always been unambiguously non-convertible? – Tommy May 07 '18 at 17:53
  • 1
    @Tommy: Types that are unambiguously non-convertible aren't a problem. The problem is that overloading source-code constructs for different purposes will wreak havoc if later extensions to the language make things convertible. For example, in early versions of Java, references and integers were not implicitly convertible, but later versions added implicit boxing (which was reasonable) and unboxing (which was less so). If there had been separate comparison operators for references and integers, adding implicit boxing but not implicit unboxing would have been fine, since... – supercat May 07 '18 at 18:55
  • 1
    ...code that tried to integer-compare an Integer (reference) to an int would be rejected at compile time unless the Integer were explicitly converted (indicating a programmer's recognition of a conversion that might fail). Absent implicit unboxing, however, the existence of implicit boxing would cause the comparison to be evaluated by boxing the "int" and then doing a nonsensical reference comparison. – supercat May 07 '18 at 18:59
  • @supercat I think my mistake there was in drifting too far from the original question of 'why a keyword, not merely a predefined identifier'; I think what I was trying to say is that given that reference types are unambiguously non-convertible in Pascal, not to integral types and not to other types of reference, there's no value that could be assigned to any reference type. Conversely a keyword can have a context-dependent meaning if it wants, so can always be "the sentinel for no value for whatever reference type I'm being assigned to". – Tommy May 07 '18 at 19:41
  • 1
    @Tommy: Indeed, there are only a few things one can do with nil. It can be used on a comparison operator with another pointer, it can be used on the right-hand side of an assignment operator, or it can be passed to a function. Except for the latter case, a simplistic compiler may be able to generate better code for "clear a pointer" or "check a pointer against null" than for "load a null value and then store it" or "load a null value and then compare it". – supercat May 07 '18 at 20:00
  • I don't see why making nil an id would imply giving it an integer or boolean value. It has a pointer value (which may be internally coded as 0 or whatever, but this wouldn't be visible to the programmer). – JeanPierre May 12 '18 at 07:46
  • @JeanPierre it's visible to the programmer if the language designer wants to. As I said, if there were no nil keyword but instead some numeric constant like in C++, nil*3 would be a valid expression. And comparing if nilpointer = 0 would confuse many people because pointers are not numerical types – phuclv May 13 '18 at 12:42
4

There are excellent answers in this thread.

The clearest answer is that nil cannot be an ordinary value because it has no type, whereas true and false are understood to be values of an enumerated type, i.e.,

type boolean = (false, true);

What type of pointer would nil be an instance of? Recall pointer definitions are:

type p = ^integer;

nil, if it is a value, is compatible with all pointer types.

Second, there is no other fixed value that can be assigned to pointers. In the example:

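program nilexample;

type p = ^integer;

var ip, xp: p;

begin
  new(ip);              { a pointer value generated at run time }
  xp := ip; ip := nil   { nil: the one pointer value known at compile time }
end.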

Thus we can see:

  1. nil is typeless.
  2. nil is the only pointer value that is fixed and not generated at runtime.

So nil does not behave like any other compile-time constant; it is therefore not one, but a keyword.

phuclv
Scott Franco
3

null pointer has a value reserved for indicating that the pointer does not refer to a valid object

Wikipedia has a good explanation of it. Thus null, or nil, is a pointer value, chosen by the compiler, which by default is not valid within the current application environment, or is not reachable by user code.

Why does the word "nil" exist at all? For convenience: many functions that return pointers sometimes need to return an "invalid pointer" to the calling code, which checks for it to decide whether the function succeeded. Memory allocation routines are a very good example. If there is enough RAM to allocate, they return a valid pointer to the area; if there is not, they return null (nil), and the calling program knows that malloc (for example) failed to find the requested space.
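A small sketch of that convention in Pascal itself (`Node`, `NodePtr` and `Find` are hypothetical names): a search function returns nil when nothing is found, and the caller tests for nil before dereferencing the result.

```pascal
program FindNode;

type
  NodePtr = ^Node;
  Node = record
    value: integer;
    next: NodePtr
  end;

{ hypothetical helper: returns the first node holding wanted,
  or nil when the list contains no such node }
function Find(list: NodePtr; wanted: integer): NodePtr;
var
  found: boolean;
begin
  found := false;
  { written without relying on short-circuit evaluation of "and" }
  while (list <> nil) and not found do
    if list^.value = wanted then
      found := true
    else
      list := list^.next;
  Find := list
end;

var
  head, n: NodePtr;

begin
  { build the two-element list 1 -> 2 -> nil }
  new(head);
  head^.value := 1;
  new(head^.next);
  head^.next^.value := 2;
  head^.next^.next := nil;

  n := Find(head, 2);
  if n <> nil then
    writeln('found ', n^.value);

  if Find(head, 7) = nil then
    writeln('7 is not in the list')
end.
```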

Why wasn't it simply a predefined identifier as true and false are, for example?

Because a Boolean constant or variable can only be true or false, but a pointer has a much wider range of values (up to the size of the addressable CPU/MPU space).

@JeanPierre:

but I don't see how the range of pointer values affects the possibility to predefine an id with one specific value of this range

The exact value is selected at compile time by the compiler, according to its settings and the architecture of the target system. nil or null exists to tell the compiler to apply this value in all pointer operations. Code like if(ptr==0) may not be portable and may not work properly, because on some systems 0 may be a valid addressable pointer value.

Anonymous
  • 5
    This does not appear to answer the question, but rather explain what a null pointer is. – O. R. Mapper May 08 '18 at 10:21
  • I agree with your description of the purpose of nil, but I don't see how the range of pointer values affects the possibility to predefine an id with one specific value of this range. – JeanPierre May 12 '18 at 07:50