18

Quoting from a "Programming for Performance" exercise:

early versions of the MIPS processor had an "exposed pipeline" (that is, the assembly language programmer needed to know the latencies of operations and had to insert NO-OPS or other operations between dependent instructions to guarantee correctness). Later versions of the MIPS processor abandoned this idea.

The above is what I was able to find by googling "CPU with exposed pipeline" (mentioning "CPU" in the search string is important). The MIPS architecture was introduced in 1981 (unfortunately, the Wikipedia page doesn't mention the exposed pipeline, except in the expansion of the acronym, without explanation).

It is my understanding that VLIW architectures, which also have an exposed pipeline, came later. Is that true? Was MIPS really the first one?
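
To make the "exposed pipeline" idea concrete, here is a minimal sketch in Python of the kind of hazard described in the quote above. It uses an invented three-operation toy ISA, not real MIPS encodings or timing: a load whose result only lands one instruction later, with no hardware interlock, so a dependent instruction placed directly after the load silently reads the stale register unless a NO-OP (or some independent instruction) is inserted.

```python
# Toy model of an exposed load-delay slot with no hardware interlock.
# Illustrative sketch only -- not a cycle-accurate MIPS simulator.

def run(program, memory):
    """Execute a tiny 3-op ISA: ('lw', rd, addr), ('add', rd, rs, rt), ('nop',).
    Loads take effect one instruction late, and nothing stalls dependent code."""
    regs = {f"r{i}": 0 for i in range(4)}
    pending = None                      # (reg, value) that lands after the next instruction
    for instr in program:
        landing = pending               # value loaded by the *previous* instruction
        pending = None
        op = instr[0]
        if op == "lw":                  # result only becomes visible one instruction later
            _, rd, addr = instr
            pending = (rd, memory[addr])
        elif op == "add":               # reads whatever is in the registers *right now*
            _, rd, rs, rt = instr
            regs[rd] = regs[rs] + regs[rt]
        # 'nop' does nothing; it just burns the delay slot
        if landing:                     # delayed load write-back
            regs[landing[0]] = landing[1]
    return regs

memory = {0x100: 5}

# Dependent add right after the load sees the stale r1 (still 0) -> r2 == 0.
print(run([("lw", "r1", 0x100), ("add", "r2", "r1", "r0")], memory))

# With a NOP in the delay slot the loaded value has landed -> r2 == 5.
print(run([("lw", "r1", 0x100), ("nop",), ("add", "r2", "r1", "r0")], memory))
```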

Toby Speight
Leo B.
  • This doesn't answer your main question; but yes, VLIW architectures have an exposed pipeline. Statically pipelined CPUs take this to extremes (but aren't necessarily VLIW). – Stephen Kitt Apr 05 '17 at 21:07
  • And the currently "newest" CPU with an exposed pipeline which is not VLIW is probably the Mill architecture. RISC started more or less with MIPS, and DSP wasn't much earlier, so I doubt there are earlier exposed-pipeline architectures than the MIPS-1. – dirkt Apr 06 '17 at 11:20
  • @dirkt That's right. A few years ago I went to a few of Ivan Godard's presentations. – Leo B. Apr 06 '17 at 16:14
  • I have been unable to find a definitive statement, but reading about the architecture of the CDC 7600 (1968) has left me with the impression that its pipeline was exposed, although more for the reason that its designers simply never considered an alternative than for any specific architectural benefit. – Jules Aug 03 '18 at 22:38
  • @Jules Could you tell what in particular in the CDC 7600 architecture contributed to that impression? – Leo B. Aug 03 '18 at 22:42
  • @LeoB. The main thing is that the architecture reference manual goes into a great amount of detail about the pipeline length of various instructions, or the fact that after an instruction is issued to the multiply/divide unit there must be a delay of 2 cycles before another can be issued to the same unit, but I haven't seen any description of a mechanism that delays instructions that would cause a pipeline hazard. – Jules Aug 03 '18 at 23:00
  • .... And I just found such a mechanism, described on page 2-16: a set of register reservation flags. So it looks like it wasn't exposed. – Jules Aug 03 '18 at 23:16
  • @dirkt With lots of regret I notice that your link to the Mill architecture on Wikipedia seems to be dead. – gnasher729 Dec 12 '22 at 19:21
  • @gnasher729 Probably some single person decided "not relevant" on a personal whim... The Mill CPU homepage is still there, but IIRC they don't have a nice summary of the architecture. – dirkt Dec 13 '22 at 05:34
  • "the assembly language programmer" - if I recall correctly this was for compilers. – Thorbjørn Ravn Andersen Dec 21 '22 at 22:15
  • @ThorbjørnRavnAndersen I don't get what you're trying to imply. – Leo B. Dec 23 '22 at 04:36
  • Not implying anything. Just stating that the architecture did not target human programmers but compilers. – Thorbjørn Ravn Andersen Dec 23 '22 at 15:42
  • @ThorbjørnRavnAndersen If you're still talking about the MIPS architecture, what are you basing your statement on? The restrictions imposed by the architecture were trivial enough; nothing precluded human programmers from learning the rules and following them. – Leo B. Dec 24 '22 at 05:36

4 Answers

16

The MIPS architecture was introduced in 1981

Are you sure? To my knowledge, the first MIPS implementation was the R2000, in 1985. Of course, the project did start earlier (in 1981), but so did others.

It is my understanding that VLIW architectures which also have an exposed pipeline, came later. Is that true?

As far as I can tell, yes.

Was MIPS really the first one?

Not really. Among (modern) microprocessors, Berkeley RISC I (the foundation of the later SPARC) was 2-3 years ahead of MIPS, as its first working chips came in 1982. Berkeley RISC not only coined the name RISC (and then faded somewhat into the background once it became the standard term), but also featured a branch delay slot, exposing the pipeline around branches: the compiler (or programmer) would place the instruction that logically belongs just before a branch after that branch instead.
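
As an illustration of that reordering, here is a small, hypothetical sketch in Python of what a delay-slot filler might do. The instruction format and the (overly permissive) safety assumption are invented for the example; this is not code from any real compiler.

```python
# Toy delay-slot filler: move the last instruction before a branch into the
# slot after it, or insert a NOP when nothing usable is available.

def fill_delay_slot(block):
    """block: list of (op, dest, sources) tuples ending in an unconditional
    ('b', label, []) branch."""
    *body, branch = block
    # The branch here reads no registers, so any preceding instruction is safe
    # to hoist; a real scheduler would also check the branch's dependences.
    if body:
        slot = body.pop()          # hoist the last useful instruction past the branch
    else:
        slot = ("nop", None, [])   # nothing to move: burn the slot with a NOP
    return body + [branch, slot]

before = [("add", "r1", ["r2", "r3"]),
          ("sub", "r4", ["r1", "r5"]),
          ("b", "loop", [])]
print(fill_delay_slot(before))
# [('add', 'r1', ['r2', 'r3']), ('b', 'loop', []), ('sub', 'r4', ['r1', 'r5'])]
```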

But then there are minis, especially the IBM 801 (*1), which was defined in 1976. It had its first working implementation in 1978, first commercial usage in 1980 and first single-chip implementation (as ROMP) in 1981 (*2). Looking at their 1976 overview paper shows that they already incorporated almost every aspect of what got 'invented' half a decade later as RISC - including a separate set of branch instructions, called branch and execute (*3), where the next instruction in sequence after a branch will be executed anyway - today called a branch delay slot.

To my understanding, that makes the 801 implementation of 1980 the first.


*1 - It's debatable whether that architecture really is a mini, as it is not only very /370ish, but has also been used as a microcode engine for /370 implementations.

*2 - Fodder for what-if freaks: what if IBM had used its own 32-bit ROMP in 1981 instead of Intel's 8088 for their PC (while also making the chip available to other manufacturers)? :))

*3 - By having two sets of branch instructions they even avoided the need to insert a NOP when there was no usable instruction - as with two successive branches. In reality it was more like a bit in the branch opcode telling whether the next instruction is executed or a virtual NOP is inserted.

Toby Speight
Raffzahn
  • I recall the Berkeley RISC folks being keen on the idea that compiler technology had advanced to the point where it was okay to do things in the CPU architecture that might tend to trip up mere humans. Since most programming was being done in high-level languages, oddities like delay slots weren't going to discourage acceptance of an architecture. – fadden Aug 02 '18 at 17:43
  • Are you sure? At the time of my asking the question, the introduction year in the information inset on the wiki page was 1981, to wit :) – Leo B. Aug 02 '18 at 18:22
  • The PDF file Tommy refers to mentions "early 1981" as the starting date of the project vs "1980" for Berkeley RISC. This makes it hard to figure out who was the first to come up with the idea of the delay slot. – Leo B. Aug 02 '18 at 18:43
  • Yes, as 1981 was the year the project started - and it wasn't until 1984 that the first implementation could run a program. Similarly, the Berkeley project started in 1980, and it took until 1982 for the first working microprocessor system. And usually we take the time a system was working as the date, not when an idea was formulated, don't we? – Raffzahn Aug 02 '18 at 18:45
  • The idea itself may be even older than that. Maybe even Aristotle thought about it? :)) Ideas are often around for decades or even centuries before they can be implemented, and even longer until they may take off. So having an 'idea' is hardly a milestone to compare with. – Raffzahn Aug 02 '18 at 18:46
  • Given that both processors were research projects I'd suggest that whoever published first with real empirical data should get the prize. I'm assuming that being first as per the question is being used as a proxy for being influential and that both sets of academics were keeping up with the field beyond their institutions. Even if you agree that's a good test though, I think you're correct to conclude it's Berkeley in the absence of any evidence re: specific publications, given the earlier start date. – Tommy Aug 02 '18 at 20:49
  • @Tommy I think that Raffzahn's main point is that the Berkeley RISC project was first to produce a working chip. – Leo B. Aug 02 '18 at 20:54
  • @LeoB. I think his main point is that IBM did it in 1980 with the 801. On the digression of Berkeley versus MIPS (effectively: Berkeley versus Stanford; you can bet these guys knew each other socially), I fully understand his argument. That's why I say so in my comment, and why I've given this answer my vote. – Tommy Aug 02 '18 at 21:06
  • @Tommy There is no sense in using different definitions for different CPUs. For a /360 we use the date it got announced as a product, and similarly for a Z80 or a Pentium. Why should a SPARC, MIPS or IBM 801 be judged differently? Also, IBM would beat both with its 1976 date :) – Raffzahn Aug 02 '18 at 21:07
  • @Tommy Also, I developed this answer in two stages, first naming SPARC (Berkeley) as an early one, as that is what I had in memory, then looking up the 801, eventually the granddaddy of all RISC, to clear up their dates. So yes, my first main point was that Berkeley did beat Stanford - but it got superseded by IBM beating both. I still wouldn't be surprised to find some obscure mini/mainframe of the early '50s already including workings that may be described as an 'exposed pipeline'. – Raffzahn Aug 02 '18 at 21:11
  • @Raffzahn my argument, which I assume by implication that you disagree with and which I do not think has any bearing on the correctness of this, your correct answer, is that academic projects such as Berkeley's exist to further research. A commercial product is a happy spin-off, after the research has been successful. That's why different judgment might be applied. I mean, 'who was first' is sort of an artificial question, but when something works at IBM it goes into production. When something works at Berkeley it goes into print. Maybe it goes on sale one day too. – Tommy Aug 02 '18 at 21:12
  • @Raffzahn it's not only a good answer, it's the best answer here. It is correct. I apologise if you think I'm asking you to defend it. All I'm suggesting is an alternative prism for evaluating academic efforts, because they have a different objective. It's moot in any case, since your IBM answer usurps both other candidates. – Tommy Aug 02 '18 at 21:14
  • @Tommy No offense taken. And I do understand your argument here; in fact, personally I would even use IBM's 1978 setup of a discrete implementation, built purely for demonstration, as the date, not one of the later ones, as this is on par with what we use to determine some other firsts - the first day the Manchester Baby was operational, the first time the Mailüfterl operated, and so on. These are clear milestones. For our purposes I want to use a scale that can cover a wide range of machines, and to me that's whenever an implementation was operational. – Raffzahn Aug 02 '18 at 21:22
  • Somewhat related: https://www.cnet.com/news/risc-chip-inventors-hennessy-patterson-win-computing-turing-prize/ – fadden Aug 03 '18 at 16:31
  • @fadden Of course the CPU folks would be keen on that idea. Meanwhile, the compiler folks were probably keen on the idea that CPU technology had advanced to the point where compilers didn't need to do so much work to make programs run fast... – user253751 Jan 10 '21 at 13:59
13

"A Retrospective on MIPS: A Microprocessor Architecture", authored by those that designed MIPS from the beginning, states:

The absence of hardware interlocks (to delay an instruction if one of the operands wasn’t ready) was a tradeoff...

[...]

The team wanted to pick a name for the project that emphasized performance. About nine months earlier, the RISC project at UC Berkeley had started, so we needed a catchy acronym. “Million instructions per second” (MIPS) sounded right, given the project’s goals, but this metric was also known as the “meaningless indicator of processor speed.” So, we settled on “microprocessor without interlocked pipeline stages.”

So: (i) the absence of interlocks is the reason for the exposed pipeline; (ii) it was also the feature that gave the project its name; and (iii) in picking a name, they tried to avoid meaningless titles.

I'd therefore suggest that it's likely that the MIPS processor was the first processor with an exposed pipeline, on the grounds that it's the feature the project was named after — it was one of the things they considered to make their work unique.

Tommy
  • Another interesting aspect of exposing the microarchitecture to the software is the mechanism of recovery from an internal interrupt like a page fault. The kernel may have to undo register modifications inherent in the addressing mode in order to re-execute the instruction if the hardware hasn't saved the initial register values and restored them before trapping. This saves logic but complicates interrupt handling. Figuring out what to undo may be guided by flags in the interrupt status or require decoding of the faulted instruction. I haven't come up with a meaningful question about this. – Leo B. Apr 06 '17 at 19:07
6

I believe it was the IBM 360:

Kosik remembers a time when the IBM 360 mainframe was ubiquitous. These water-cooled behemoths would occasionally literally spring a leak, and “they were not waterproof,” says Kosik. Burst hoses and leaking valves would lead to an immediate shutdown and a visit from a technician. In case of a failure in the building cooling system, the IBM 360 even had an optional bolt-on water storage tank.

davidbak
1

Interestingly enough, ARM, while not being the first, has a kind of exposed pipeline. Specifically, ARMv3 32-bit instructions that use the contents of R15 (aka the PC) always see it incremented by 8 from the address of the instruction. In the earliest ARM architecture revisions (like the one used in the ARM7TDMI or earlier) that probably was a genuine exposure of the pipeline (in a way), but it later certainly became just a feature emulated for compatibility.
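
As a back-of-the-envelope check (a sketch of the usual explanation, not an ARM reference): with a classic 3-stage fetch/decode/execute pipeline and 4-byte instructions, the fetch-stage PC is two instructions ahead of the one currently executing, which is where the +8 comes from.

```python
# Where ARM's PC+8 comes from, modelled trivially. Illustrative sketch only.

WORD = 4  # bytes per 32-bit ARM instruction

def pc_seen_at_execute(instr_addr, stages=3):
    # In an N-stage pipeline, fetch has moved (N - 1) instructions past
    # the instruction that is currently executing and reading the PC.
    return instr_addr + (stages - 1) * WORD

print(hex(pc_seen_at_execute(0x8000)))  # 0x8008, i.e. instruction address + 8
```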

lvd
  • @LeoB.: On the PDP-11, R7 holds the address of the next instruction. On the ARM, it holds an address 4 beyond that, which unfortunately precludes the use of PDP-11-style "immediate" addressing, since one would need an addressing mode that fetches the byte at [R15,#-4] while loading R15 with the value it already holds (writing R15 prevents the execution of the instruction at [old R15,#-4]). One could perhaps use [R15,#4]! addressing mode (post-increment, like the PDP-11) in 32-bit ARM mode, but that would require inserting an unused word between the load and the immediate value. – supercat Aug 06 '18 at 16:06
  • @supercat I see. My point is that these details expose not the pipeline but the microprogram. The famous PDP-11 backwards-self-replicating instruction 012727 would not have worked with a different microprogram. – Leo B. Aug 06 '18 at 19:23
  • ARM7TDMI (taken as an example) has a 3-stage pipeline, and the PC+8 paradigm fits nicely with that pipeline: while the instruction that addresses through the PC passes from the fetch to the execute stage, the PC increments twice (as new instructions are fetched) and so holds the PC+8 value. By contrast, the PDP-11 has neither prefetching nor pipelining, and examining the PC after an instruction has been fetched and has started execution shows exactly a +2 increment. – lvd Aug 07 '18 at 14:01