19

It seems that there has been a gradual shift in thinking about the use of pointers in programming languages, to the point where it became generally accepted that pointers are risky (if not outright "evil" or similar hyperbole).

What were the historical developments behind this shift in thinking? Were there specific, seminal events, research, or other developments?

For instance, a superficial look back at the transition from C to C++ to Java seems to show a trend of supplementing and then entirely replacing pointers with references. However, the real chain of events was probably much more subtle and complex than this, and not nearly so sequential. The features that made it into those mainstream languages may have originated elsewhere, perhaps long before.

Note: I am not asking about the actual merits of pointers vs. references vs. something else. My focus is on the rationales for this apparent shift.

  • 1
    It was due to the decline of the Liberal Arts education. People could no longer comprehend Indirect Reference, one of the most fundamental ideas in computing technology, included in all CPUs. –  Sep 26 '17 at 17:43
  • 1
    @nocomprende: that's quite a claim you made there, perhaps you should provide some supporting evidence. – whatsisname Sep 26 '17 at 18:18
  • 10
    Pointers are risky. Why do you think there has been a shift in thinking? There have been improvements in language features and hardware which allow writing software without pointers, although not without some performance penalty. – Stop harming Monica Sep 26 '17 at 18:59
  • @Goyo as I specifically stated in the question, I am not disputing that pointers are risky. I was asking for specific developments which led that to be the generally accepted viewpoint. – StayOnTarget Sep 26 '17 at 19:05
  • 4
    @DaveInCaz As far as I know that specific development was the invention of pointers. – Stop harming Monica Sep 26 '17 at 19:17
  • @whatsisname if people understood Indirect Reference (because they had had a Liberal Arts education) then they would not have thought that pointers were 'risky', 'dangerous', 'hard to reason about', etc. Someone created C, and Unix was written (or rewritten) in it, and then it was all downhill from there! Oh, for the great minds of old time. –  Sep 26 '17 at 20:43
  • 5
    @nocomprende: what you just wrote is not facts nor evidence, just opinion. There were far fewer programmers in 1970, you don't have any evidence the population today is any better or worse at the "indirect reference". – whatsisname Sep 26 '17 at 21:06
  • 1
    @whatsisname: There were far fewer programmers in 1970. These programmers did invent computing as we know it. Nowadays, any kid with an iPad can run Swift Playgrounds or Scratch and pretend to be a programmer. Guess who is better at "indirect reference"? – mouviciel Sep 27 '17 at 08:06
  • Fascinating that a while back we had supercomputers straining every transistor to solve complex problems, and now we walk around with the equivalent in our pockets doing little except violating the patent on using XOR to display a moving cursor. The dumbing down and devaluation of computing capability, I tell ya! Why, when we have so much power, do we take so little advantage of it? We could have folded all the proteins in existence by now, if we preferred curing cancer to texting each other. –  Sep 27 '17 at 18:00
  • 3
    Pointers have always been considered risky, from day one. It was simply a compromise to move them from assembly to higher level languages. – Frank Hileman Sep 29 '17 at 19:40

3 Answers

21

The rationale was the development of alternatives to pointers.

Under the hood, any pointer/reference/etc. is implemented as an integer containing a memory address. When C came out, this functionality was exposed directly as pointers. This meant that anything the underlying hardware could do to address memory could be done with pointers.
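
To make that concrete, here is a minimal C++ sketch (purely illustrative) showing that a pointer round-trips through an ordinary address-sized integer:

    #include <cstdint>
    #include <iostream>

    int main() {
        int x = 42;
        int* p = &x;  // a pointer is, concretely, an address-sized integer
        auto addr = reinterpret_cast<std::uintptr_t>(p);
        std::cout << "x lives at 0x" << std::hex << addr << '\n';

        // Round-tripping the integer back to a pointer recovers the same object,
        // which is exactly the raw, hardware-level flexibility C exposes.
        int* q = reinterpret_cast<int*>(addr);
        std::cout << std::dec << *q << '\n';  // prints 42
    }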

This was always "dangerous," but danger is relative. When you're writing a 1000-line program, or when you have IBM-grade software quality procedures in place, this danger could easily be managed. However, not all software was being developed that way, so a desire for simpler, safer constructs arose.

If you think about it, an int& and an int* const really have the same level of safety, but one has much nicer syntax than the other. An int& could also be more efficient, because it could refer to an int stored in a register (an anachronism: this was true in the past, but modern compilers are so good at optimizing that you can have a pointer to an int in a register too, as long as you never use any of the features that require an actual address, like ++).
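
A small sketch of that comparison in C++ (names are illustrative):

    #include <iostream>

    void add_via_pointer(int* const p) { *p += 1; }  // non-reseatable pointer, must be dereferenced explicitly
    void add_via_reference(int& r)     { r += 1; }   // same effect, cleaner syntax, cannot be null by construction

    int main() {
        int n = 0;
        add_via_pointer(&n);
        add_via_reference(n);
        std::cout << n << '\n';  // prints 2
    }

Both functions typically compile to identical machine code; the difference is in what the source syntax lets you express.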

As we move to Java, we move into languages which provide some safety guarantees. C and C++ provided none. Java guarantees that only legal operations are executed. To do this, Java did away with pointers entirely. What its designers found is that the vast majority of pointer/reference operations in real code were things that references were more than sufficient for. Only in a handful of cases (such as fast iteration through an array) were pointers truly needed. In those cases, Java takes a runtime hit to avoid using them.
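
Sketched in C++ (since Java has no pointer syntax at all), the two styles of array traversal that paragraph alludes to look roughly like this:

    #include <cstddef>
    #include <iostream>

    int sum_with_pointers(const int* arr, std::size_t n) {
        int total = 0;
        for (const int* p = arr; p != arr + n; ++p)   // pointer arithmetic, no bounds checks
            total += *p;
        return total;
    }

    int sum_with_indices(const int* arr, std::size_t n) {
        int total = 0;
        for (std::size_t i = 0; i < n; ++i)           // the only shape Java-style references allow;
            total += arr[i];                          // Java also checks every index at runtime
        return total;
    }

    int main() {
        int data[] = {1, 2, 3, 4};
        std::cout << sum_with_pointers(data, 4) << ' ' << sum_with_indices(data, 4) << '\n';  // "10 10"
    }

Java permits only the second shape and checks every index at runtime; modern JITs can often hoist or eliminate those checks, so the cost is smaller today than it was in the 1990s.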

The move has not been monotonic. C# reintroduced pointers, though in a very limited form. They are marked as "unsafe," meaning they cannot be used by untrusted code. They also have explicit rules about what they can and cannot point to (for example, it's simply invalid to increment a pointer past the end of an array). However, they found there were a handful of cases where the high performance of pointers was needed, so they put them back in.

Also of interest would be the functional languages, which have no such concept at all, but that's a very different discussion.

agc
  • 103
Cort Ammon
  • 10,962
  • 3
    I'm not sure it's correct to say that Java has no pointers. I don't want to get into a long debate about what is and isn't a pointer but the JLS says that "the value of a reference is a pointer". There's just no direct access or modification of pointers allowed. This isn't just for security either, keeping people out of the business of keeping track of where an object is at any moment is helpful for GC. – JimmyJames Sep 26 '17 at 21:13
  • 6
    @JimmyJames True. For purposes of this answer, the dividing line between pointer and not-pointer was whether it supported pointer arithmetic operations, which are not typically supported by references. – Cort Ammon Sep 26 '17 at 21:54
  • 8
    @JimmyJames I concur with Cort's assertion that a pointer is something that you can do arithmetic operations on, while a reference is not. The actual mechanism that implements a reference in languages like Java is an implementation detail. – Robert Harvey Sep 26 '17 at 23:05
  • Replies to comments: technically, the danger comes from (1) dangling pointers and/or references, (2) the compiler being given permission to emit code that dereferences (uses) pointers / references that are invalid, where invalid might be either null (which can be checked at runtime), or dangling (which requires additional design to make it checkable). – rwong Sep 26 '17 at 23:19
  • 3
    In general, C and C++ had voluntarily accepted membership into this dangerous club by allowing lots of "undefined behaviors" into the specification. – rwong Sep 26 '17 at 23:20
  • 2
    By the way, there are CPUs which distinguish between pointers and numbers, e.g. the original 48-bit CISC CPU in the IBM AS/400. And in fact, there is an abstraction layer underneath the OS, which means that not only does the CPU distinguish between numbers and pointers and forbid arithmetic on pointers, but the OS itself doesn't even know about pointers at all and neither do the languages. Interestingly, this makes the original AS/400 a system where rewriting code from a high-level scripting language in C makes it orders of magnitude slower. – Jörg W Mittag Sep 27 '17 at 07:22
  • 1
    On the original AS/400, C programs were run in an emulator. Modern incarnations of the AS/400 are based on POWER, but they still need a special Tagged Address Mode in which integer registers have an extra tag bit that indicates whether the value is a pointer or a number. C programs, however, are run natively on the POWER CPU in normal addressing mode. – Jörg W Mittag Sep 27 '17 at 07:24
  • @JörgWMittag That is cool! I thought someone might bring up the difference in widths between integers and pointers on some systems, but what you describe is a totally different and really neat difference between them! Never expected that! – Cort Ammon Sep 27 '17 at 15:01
  • 1
    @CortAmmon: I don't know much about it, unfortunately, but you might be interested in the Burroughs B5000, then. The Burroughs B5000's CPU was the inspiration for the UCSD Pascal P-Code, the Smalltalk-80 VM and (via those two) the JVM. The B5000 was a machine specifically designed to run multiple languages at the same time, at a time when even running multiple programs written in the same language at the same time was a revolutionary step. (Specifically, those languages were ALGOL, LISP, and FORTRAN.) Its "assembly language" is actually a subset of ALGOL. – Jörg W Mittag Sep 27 '17 at 17:01
  • 1
    Lisp originally addressed the CPU directly: 'car' comes from "contents of the address part of register" and 'cdr' from "contents of the decrement part of register" (ways of implementing a list). I heard that there were machines where Lisp was actually the "assembly language" level. Ow, my head hurts. –  Sep 27 '17 at 18:07
  • The "alternatives" are as old as pointers in high level languages, or older. – Frank Hileman Sep 29 '17 at 19:41
12

Some kind of indirection is necessary for complex programs (e.g. recursive or variable-sized data structures). However, it is not necessary to implement this indirection via pointers.

The majority of high-level programming languages (i.e. not Assembly) are fairly memory-safe and disallow unrestricted pointer access. The C family is the odd one here.

C evolved out of B, which was a very thin abstraction over raw assembly. B had a single type: the word. A word could be used as an integer or as a pointer; the two are equivalent when the whole memory is viewed as a single contiguous array. C kept this rather flexible approach and continued to support inherently unsafe pointer arithmetic; the whole type system of C is more of an afterthought. This flexibility in memory access made C very suitable for its primary purpose: prototyping the Unix operating system. Of course, Unix and C turned out to be quite popular, so C is also used in applications where this low-level approach to memory is not really needed.
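
A tiny example (in C++ syntax, but the idiom is pure C) of the "memory is one contiguous array" view that C inherited from B:

    #include <cstdio>

    int main() {
        int arr[4] = {10, 20, 30, 40};

        // Indexing is defined in terms of pointer arithmetic: arr[2] is *(arr + 2).
        // Nothing in the language stops you from computing arr + 7, which is the
        // "inherently unsafe" part.
        std::printf("%d %d\n", arr[2], *(arr + 2));  // prints "30 30"
    }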

If we look at the programming languages that came before C (e.g. Fortran, Algol dialects incl. Pascal, Cobol, Lisp, …) some of those do support C-like pointers. Notably, the null pointer concept was invented for Algol W in 1965. But none of those languages tried to be a C-like, efficient low-abstraction systems language: Fortran was meant for scientific computing, Algol developed some quite advanced concepts, Lisp was more of a research project than an industry-grade language, and Cobol was focussed on business applications.

Garbage collection has existed since the late 1950s, i.e. well before C (early 1970s). GC requires memory safety to work properly. Languages before and after C used GC as a normal feature. Of course it makes a language much more complicated and possibly slower, which was especially noticeable in the era of mainframes. GC languages tended to be research-oriented (e.g. Lisp, Simula, ML) and/or to require powerful workstations (e.g. Smalltalk).

With smaller, more powerful computers, computing in general and GC languages in particular became more popular. For non-real-time applications (and sometimes even for real-time ones), GC is now the preferred approach. But GC algorithms have also been the subject of intense research. As an alternative, better memory safety without GC has also been developed further, especially in the last three decades: notable innovations are RAII and smart pointers in C++ and Rust's lifetime system/borrow checker.
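
As a minimal sketch of the C++ side of that development (RAII plus a smart pointer; the Widget type is just an illustration):

    #include <iostream>
    #include <memory>

    struct Widget {
        ~Widget() { std::cout << "freed\n"; }
    };

    int main() {
        auto w = std::make_unique<Widget>();  // ownership is tied to the scope of w
        // No explicit delete anywhere: the destructor runs automatically when w
        // goes out of scope, even if an exception is thrown in between.
    }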

Java did not innovate by being a memory-safe programming language: it basically took the semantics of the GCed, memory-safe Smalltalk language and combined them with the syntax and static typing of C++. It was then marketed as a better, simpler C/C++. But it is only superficially a C++ descendant. Java's lack of pointers owes much more to the Smalltalk object model than to a rejection of the C++ data model.

So “modern” languages like Java, Ruby, and C# should not be interpreted as overcoming the problems of raw pointers like in C, but should be seen as drawing from many traditions – including C, but also from safer languages like Smalltalk, Simula, or Lisp.

amon
  • 134,135
5

In my experience, pointers have ALWAYS been a challenging concept for many people. In 1970, the university I was attending had a Burroughs B5500, and we used Extended Algol for our programming projects. The hardware architecture was based on descriptors and control codes in the upper part of data words. These were explicitly designed to let arrays be accessed through pointers without any way of walking off the end.

We had spirited classroom discussions about name vs value reference and how the B5500 arrays worked. Some of us got the explanation immediately. Others didn't.

Later, it was somewhat of a shock that the hardware didn't protect me from runaway pointers--especially in assembly language. On my first job after graduation, I helped fix problems in an operating system. Often the only documentation we had was the printed crash dump. I developed a knack for finding the source of runaway pointers in memory dumps, so everyone gave the "impossible" dumps to me to figure out. More of the problems we had were caused by pointer errors than by any other type of error.

Many of the people I've worked with started out writing FORTRAN, then moved to C, wrote C that was a lot like FORTRAN, and avoided pointers. Because they never internalized pointers and references, Java poses problems for them. Often, it's challenging for FORTRAN-minded programmers to understand how object assignment really works.
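
In rough C++ terms (illustrative only), the distinction that trips people up is the difference between copying an object and copying a pointer to it; Java's object variables behave like the pointer case:

    #include <iostream>

    struct Point { int x; };

    int main() {
        Point a{1};
        Point b = a;   // value copy: b is an independent object (the FORTRAN-style intuition)
        b.x = 99;      // a.x is still 1

        Point* p = &a;
        Point* q = p;  // copies the pointer, not the object
        q->x = 99;     // a.x is now 99, visible through both p and q,
                       // which is how assignment of Java object variables behaves
        std::cout << a.x << ' ' << b.x << '\n';  // prints "99 1"
    }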

Modern languages have made it far easier to do things that need pointers "under the hood" while keeping us safe from typos and other errors.