
I'm hoping to understand a routine written for the Manchester Mark I [NOTE: a comment from Raffzahn suggests this may be the Ferranti Mark I (wikipedia link) and not the Manchester Mark I] which computes the reciprocal square root (1/√x). The Mark I had a small number of pre-written subroutines, 10 of which were included in the manual when it was re-written in 1951 (Alan Turing's original manual was hard for some engineers to follow). A copy of the Reciproot routine is here.

The routine entry contains both code and comments explaining the code. There are two tapes, Reciproot 1 and Reciproot 2. There is also an attached table which is for 'unstandardizing the results' (I am not sure what this means). Having code, comments and the necessary tables to follow the routine is very useful but as you can probably see the code is difficult to read, making it hard to check my intuition about what the comments describe. Here's what I mean (this is tape 1):

│/│VSTA│/│D@/J│E│
└─┤/C/@│E│VK/½├─┘
  │GK/A│@│VK/A│
  │/@PO│A│J@/J│
  │GKPG│:│VK/½│
  │GKPG│S│VK/A│
  │/C/C│I│IS/J│
  │E:IN│U│VK/½│
  │/CTA│½│VK/A│
  │/C/C│D│VK/½│
  │J:/N│R│VK/A│
  │VK/A│J│N@UJ│
  │VKQO│N│N@UC│
  │D:QG│F│VK/F│
  │N@UC│C│VK/A│
  │/C/N│K│BAT:│
  │VK/A│T│US/C│
  │VK/N│Z│GK/N│
  │VK/N│L│GKTA│
  │VK/A│W│GKQO│
  │VKTK│H│OA/C│
  │MKTA│Y│E:/O│
  │MK/K│P│IS/C│
  │A@/J│Q│VK/N│
  │E@/½│O│:C/A│
  │VK/A│B│:C/C│
  │S@/J│G│E:/O│
  │VK/½│"│/C/N│
  │VK/A│M│/C/N│
  │U@/J│X│IC/A│
  │VK/½│V│QAPO│
  │VK/A│£│KE/P│

I'm worried that if I try to do this by myself I'll spend forever on it and not make much progress in understanding things.

What I am looking for in an answer

I'm not a mathematician. I can follow, with some practice, code like this but it is hard work. Ideally I would like a straightforward explanation of what the code is doing, how it is computing the reciprocal square root, in a concrete fashion.

Some help in deciphering

Martin Campbell-Kelly wrote an excellent history of programming on the Mark I, Programming the Mark I: Early Programming Activity at the University of Manchester (public archive link). Among other things, the machine had a very idiosyncratic input code, based on the teletype machines available for input.

The Mark I instruction set is reproduced below from Campbell-Kelly's article (p. 137).

The Mark I instruction set

What this is for

I'm writing a history of the Quake III fast inverse square root (see 0x5f37642f.com) and this is the earliest extant program on a digital computer which computes a reciprocal square root. I hope to publish a paper where this plays a small role. An accepted answer will be cited in the paper (and a really good answer will get noted in the acknowledgments, hah).

I have tagged this as "floating-point" though I don't think this was a floating-point instruction. Retrocomputing doesn't have a tag for 'numerical' computing.

Adam Hyland
    Wow. nice question. Love it. Do you have the 'source' by chance as plain (and verified) ASCII? That might simplify decoding a lot. Also, Program as well as table seem to be not about the Manchester Mark 1 but the Ferranti Mark 1, as both included the extended instruction set - for example the A=A-D*S instruction, encoded as . Likewise the linked document mentions the use of B5 and B6, which are only present with the Ferranti Computer. – Raffzahn Feb 23 '23 at 23:37
  • I do not have the ASCII source. This could be the Ferranti. That might explain the discrepancy between the year Campbell-Kelly noted in his paper (1949) and the date on this document (1951). Thank you for that. – Adam Hyland Feb 24 '23 at 00:07
  • Someone might need to scan it and check up the scan? The confusion is common, as some papers (usually from Manchester University people, which may be where Campbell-Kelly picked it up) calls the Ferranti 'Production' Mark 1 to distinguish it from the 'prototype' Mark 1 (and both in turn from the SSEM aka Baby). – Raffzahn Feb 24 '23 at 00:21
  • This has had a sort of circuitous path. The document was among the private papers of D.G. Prinz, but passed into the hands of someone else (David Link) doing an art project on a love letter written as a software program at manchester. He just so happened to have a scan. I don't know that a more canonical source exists. – Adam Hyland Feb 24 '23 at 00:41
    @AdamHyland Is this related to the reciprocal square root implementation by Cicely Popplewell mentioned in the Wikipedia article about her? Turing mentioned reciprocal square root computation on the Mark I in a publication, apparently table based. A. M. Turing, "Some Calculations of the Riemann Zeta-Function," Proceedings of the London Mathematical Society, Vol. 3, No. 1, Jan. 1953, pp. 99-117: "By storing tables of log n and n ⁻¹ᐟ² within the machine ... The reciprocal square roots were given with error not exceeding 10⁻⁵" – njuffa Feb 24 '23 at 06:54
    @AdamHyland Turing's 1953 publication I cited above relates to work performed in June 1950. According to Campbell-Kelly's 1980 publication in Annals Popplewell arrived at Manchester in October 1949 and the reciprocal square root routine found in the private papers of D. G. Prinz is dated November 1949. Which would indicate use of the original (prototype) Mark I, correct? I wonder whether the reciprocal square root computation used in the June 1950 computations is identical to the one dated November 1949. – njuffa Feb 24 '23 at 07:20
    Re. canonical sources: there is a large archive of Ferranti materials in the basement of the Museum of Science and Industry in Manchester. You need an appointment to visit. It might be worth a visit, should you happen to be in the area. – Ian Thompson Feb 24 '23 at 08:34
  • @njuffa This is probably Popplewell, though there is no indication on the document whose hand it was. I know she re-wrote (with Prinz) the manual Turing had written for the machine to make it more useful for programmers attempting to do work on it, and this work included re-writing some of the library of routines. – Adam Hyland Feb 24 '23 at 19:42
  • @ian-thompson The manchester archives have a lot of material, but by an accident of history they don't have these routine libraries there. There might be some copies in the Strachey archives but I went to Manchester first before I was able to secure the copy I have linked above. – Adam Hyland Feb 24 '23 at 20:22

1 Answer


Summary

The reciproot code calculates the reciprocal of the square root using the Taylor series expansion around x = 1, that is,

1/sqrt(1+ε) = ∑_{k=0}^∞ (-1)ᵏ (2k)! / (2ᵏ k!)² εᵏ

or

  1           1     1⋅3      1⋅3⋅5
------  = 1 - - ε + --- ε² - ----- ε³ + ...
√(1+ε)        2     2⋅4      2⋅4⋅6
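A quick Python sketch of the truncated series (in floats, not the machine's fixed point):

```python
import math

def recip_sqrt_series(eps, terms=7):
    """Evaluate 1/sqrt(1+eps) by the Taylor series around 1.

    The coefficient of eps^k is (-1)^k * (2k)! / (2^k * k!)^2.
    """
    total = 0.0
    for k in range(terms):
        coeff = math.factorial(2 * k) / (2**k * math.factorial(k)) ** 2
        total += (-1) ** k * coeff * eps**k
    return total

# Converges quickly only when eps is small:
print(recip_sqrt_series(0.05), 1 / math.sqrt(1.05))
```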

The Manchester/Ferranti Mark I only had integer arithmetic, so everything is fixed point, with the position of the binary point tracked manually.

As the above formula only works close to 1, the number is first standardized (which we today call normalized), that is, shifted until it is as close to 1 as possible.
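Standardization amounts to repeated left shifts while counting them; a minimal sketch, assuming a 40-bit long line with the MSB ending up at bit 39:

```python
def standardize(x, width=40):
    """Shift a positive integer left until its MSB sits at bit width-1.

    Returns (shifted_value, shift_count); the shift count is what the
    caller needs to place the binary point afterwards.
    """
    assert x > 0
    shift = 0
    while x < 1 << (width - 1):
        x <<= 1
        shift += 1
    return x, shift

print(standardize(3))   # 3 shifted up 38 places
```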

In a second step, to bring it closer to one, an approximation φ of the reciprocal square root is found from a table of 16 entries, which is then factored out.

The remaining value is of the form 1+ε for a sufficiently small ε, so in a third step, the reciprocal root of that value is calculated with 7 terms of the above series, using Horner's method.
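Horner's method evaluates such a polynomial with one multiply and one add per coefficient; a minimal sketch:

```python
def horner(coeffs, x):
    """Evaluate a polynomial by Horner's method.

    coeffs are ordered from the highest power down to the constant term.
    """
    acc = 0.0
    for c in coeffs:
        acc = acc * x + c
    return acc

# First three series terms 1 - x/2 + 3x^2/8, evaluated at x = 0.1:
print(horner([0.375, -0.5, 1.0], 0.1))
```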

Then φ is factored in again, giving the reciprocal root. With the reciprocal root it is easy to calculate the non-reciprocal root. Finally, two index registers are set up to keep track of the binary point.
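Putting the steps together, here is a float sketch of the overall structure (the table layout, index derivation, and constants are my guesses for illustration; the real routine works in 40-bit fixed point):

```python
import math

def recip_sqrt(x):
    """Sketch of the reciproot structure: standardize, table lookup,
    series via a small polynomial, recombine."""
    e = math.floor(math.log2(x))     # standardize: x = m * 2**e
    m = x / 2.0**e                   # m in [1, 2)
    idx = int((m - 1.0) * 16)        # 16-entry table index (hypothetical layout)
    phi = 1.0 / math.sqrt(1.0 + (idx + 0.5) / 16.0)   # coarse approximation
    eps = m * phi * phi - 1.0        # small remainder after factoring out phi
    # 7 terms of the series for 1/sqrt(1 + eps):
    s = sum((-1)**k * math.comb(2*k, k) / 4**k * eps**k for k in range(7))
    return phi * s * 2.0**(-e / 2)   # factor phi and the shift back in

print(recip_sqrt(10.0), 1 / math.sqrt(10.0))
```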

After calling the routine, one can choose where to put the binary point. This is called unstandardizing, as the value is no longer standardized as described above. There is a table to assist with this, based on assumptions about the range of the input and where the binary point should go, by choosing a value called S or S′ and then using some additional code.

Notation

The Manchester/Ferranti notation is a bit unusual from a modern perspective.

The contents of a memory address are written as [...].

The subscript + means "treat as an unsigned number", while the subscript ± means "treat as a signed number" (which is actually two's complement). The effect of the latter is a sign extension on load.

Alan Turing liked to use little-endian notation everywhere (see the introduction in [1]). I kept this for binary numbers (so the most significant bits are to the right); all other numbers are in normal notation.

Numbers are represented by a teletype code, so [/C] comes from / for binary 00000 and C for binary 01110.
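Assuming the standard Mark I teletype ordering of the 32 characters (values 0 to 31, with bits written least significant first as in the listing), the codes can be reproduced like this:

```python
# Character order for values 0..31 (assumed standard Mark I teletype code).
CHARS = '/E@A:SIU½DRJNFCKTZLWHYPQOBG"MXV£'
VALUE = {c: i for i, c in enumerate(CHARS)}

def bits(ch):
    """5-bit string for one teletype character, least significant bit first."""
    v = VALUE[ch]
    return ''.join(str((v >> i) & 1) for i in range(5))

print(bits('/'), bits('C'))   # 00000 01110, i.e. [/C]
```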

New values (after the computation) are marked with a prime sign ′. I have not figured out the superscript l or 1 in the link; possibly this was the result of a typist misreading the ′.

Programme Sheets

The "programme sheets" (or tapes?) are laid out in the same way as the binary digits on the tube.

20 bits form a horizontal (short) "line" (what we call a word today).

On the tube there are two columns of (short) lines with 20 bits each, in 32 rows. A long line (a double word) is formed from two short lines in consecutive rows. This is the data type for numbers.

On the sheet, the most significant "address characters" (for the columns) are on the left and right sides; the least significant "address character" (for the rows), which is common to both columns, is in the middle.

So this program contains 4 columns with address /, E, @ and A.

Annotated Listing

A quick disassembly with some ad-hoc notation for the B-lines (index registers) follows.

//   VSTA  [VS]' = L, A' = 0        ; deposit link
E/   /C/@  std [/C]
@/   GK/A  [GK]' = M, M' = 0        ; M = position of MSB of [/C]
A/   /@PO  B6' = [/@]               ; 78
:/   GKPG  B6' = B6 - [GK]
S/   GKPG  B6' = B6 - [GK]          ; B6 = 78 - 2 * [GK]
I/   /C/C  D' = [/C]+
U/   E:IN  A' = A + D * [E:][B6]+   ; standard shift table??
½/   /CTA  [/C]' = L, A' = 0        ; [/C] now has MSB at bit 39
D/   /C/C  D' = [/C]+
R/   J:/N  A' = A + D * [J:]+       ; [J:] is probably 2^20 * 1/16
J/   VK/A  [VK]' = M, M' = 0        ; [VK] = 0..15 + [D:]
N/   VKQO  B7' = [VK]
F/   D:QG  B7' = B7 - [D:]
C/   N@UC  D' = [N@][B7]+           ; table with 16 overlapping entries of 20 bits
K/   /C/N  A' = A + D * [/C]+
T/   VK/A  [VK]' = M, M' = 0        
Z/   VK/N  A' = A + D * [VK]+
L/   VK/N  A' = A + D * [VK]+
W/   VK/A  [VK]' = M, M' = 0        
H/   VKTK  A' = 2[VK]±
Y/   MKTA  [MK]' = L, A' = 0        ; phi
P/   MK/K  D' = [MK]±               ; start Horner for [MK]
Q/   A@/J  M' = M + [A@]            ; 00000 00000 00000 00000  00000 00000 11100 11100
O/   E@/½  A' = A - D * [E@]+       ; 00000 00000 00000 00000  00000 00001 01101 01100
B/   VK/A  [VK]' = M, M' = 0
G/   S@/J  M' = M + [S@]            ; 00000 00000 00000 00000  00000 00000 00111 11100
"/   VK/½  A' = A - D * [VK]+
M/   VK/A  [VK]' = M, M' = 0
X/   U@/J  M' = M + [U@]            ; 00000 00000 00000 00000  00000 00000 00011 00010 = 0.2734375
V/   VK/½  A' = A - D * [VK]+
£/   VK/A  [VK]' = M, M' = 0

/E   D@/J  M' = M + [D@]            ; 00000 00000 00000 00000  00000 00000 00000 01010 = 0.3125
EE   VK/½  A' = A - D * [VK]+
@E   VK/A  [VK]' = M, M' = 0
AE   J@/J  M' = M + [J@]            ; 00000 00000 00000 00000  00000 00000 00000 00110 = 0.375
:E   VK/½  A' = A - D * [VK]+
SE   VK/A  [VK]' = M, M' = 0
IE   IS/J  M' = M + [IS]            ; out of the blue, use [IS]? Maybe 0.5?
UE   VK/½  A' = A - D * [VK]+
½E   VK/A  [VK]' = M, M' = 0
DE   VK/½  A' = A - D * [VK]+
RE   VK/A  [VK]' = M, M' = 0
JE   N@UJ  M' = M + [N@][B7]
NE   N@UC  D' = [N@][B7]+           ; table
FE   VK/F  A' = A + D * [VK]±
CE   VK/A  [VK]' = M, M' = 0
KE   BAT:  A' = 0                   ; also: address = BA
TE   US/C  D' = [US]+
ZE   GK/N  A' = A + D * [GK]+
LE   GKTA  [GK]' = L, A' = 0        ; sets ["K] = α
WE   GKQO  B7' = [GK]
HE   OA/C  D' = [OA]+               ; 01011 11111 00110 01100  11110 01000 11010 01101
YE   E:/O  b-jump [E:]±             ; to QE (skip 1)
PE   IS/C  D' = [IS]+               ; 0.5?
QE   VK/N  A' = A + D * [VK]+
OE   :C/A  [:C]' = M, M' = 0        ; recip root
BE   :C/C  D' = [:C]+
GE   E:/O  b-jump [E:]±             ; to ME (skip 1)
"E   /C/N  A' = A + D * [/C]+
ME   /C/N  A' = A + D * [/C]+
XE   IC/A  [IC]' = M, M' = 0        ; root
VE   QAPO  B6' = [QA]               ; 40
£E   KE/P  jump [KE]+               ; to GA

/@   C@//  01110 01000 00000 00000  ; 78
E@   ////  00000 00000 00000 00000  ; a6 (Taylor coefficient)
@@   /TPI  00000 00001 01101 01100
A@   ////  00000 00000 00000 00000  ; a5 (Taylor coefficient)
:@   //UU  00000 00000 11100 11100
S@   ////  00000 00000 00000 00000  ; a4 (Taylor coefficient)
I@   //MU  00000 00000 00111 11100
U@   ////  00000 00000 00000 00000  ; a3 (Taylor coefficient)
½@   //O½  00000 00000 00011 00010
D@   ////  00000 00000 00000 00000  ; a2 (Taylor coefficient)
R@   ///R  00000 00000 00000 01010
J@   ////  00000 00000 00000 00000  ; a1 (Taylor coefficient)
N@   ///N  00000 00000 00000 00110
F@   P/DP  01101 00000 10010 01101  ; 1, table to extract φ, 16 entries
C@   IJHY  01100 11010 00101 10101  ; 2
K@   ANEY  11000 00110 10000 10101  ; 3
T@   DMKH  10010 00111 11110 00101  ; 4
Z@   /P£W  00000 01101 11111 11001  ; 5
L@   DHTW  10010 00101 00001 11001  ; 6
W@   EW@W  10000 11001 01000 11001  ; 7
H@   BCYL  10011 01110 10101 01001  ; 8
Y@   L:DL  01001 00100 10010 01001  ; 9
P@   QZXZ  11101 10001 10111 10001  ; 10
Q@   AHLZ  11000 00101 01001 10001  ; 11
O@   BD½Z  10011 10010 00010 10001  ; 12
B@   XTVT  10111 00001 01111 00001  ; 13
G@   A½YT  11000 00010 10101 00001  ; 14
"@   ECNT  10000 01110 00110 00001  ; 15
M@   TE:T  00001 10000 00100 00001  ; 16
X@   ////  00000 00000 00000 00000
V@   ////  00000 00000 00000 00000
£@   ////  00000 00000 00000 00000

/A   VSTA  [VS]' = L, A' = 0        ; deposit link
EA   KAQO  B7' = [KA]               ; 12
@A   /C/J  M' = M + [/C]
AA   /C/K  D' = [/C]±
:A   /C/F  A' = A + D * [/C]±
SA   :C/A  [:C]' = M, M' = 0
IA   :CTK  A' = 2[:C]±
UA   :CTA  [:C]' = L, A' = 0        ; [:C] := 2 * [/C]^2
½A   E:QG  B7' = B7 - [E:]          ; loop: test B7
DA   E:/O  b-jump [E:]±
RA   NS/P  jump [NS]+               ; return if not b-cond
JA   :C/C  D' = [:C]+
NA   :C/N  A' = A + D * [:C]+
FA   :C/A  [:C]' = M, M' = 0        ; [:C] = [:C]^2
CA   TA/P  jump [TA]+               ; to UA
KA   N///  00110 00000 00000 00000
TA   UA//  11100 11000 00000 00000
ZA   ////  00000 00000 00000 00000
LA   ////  00000 00000 00000 00000
WA   ////  00000 00000 00000 00000
HA   ////  00000 00000 00000 00000
YA   ////  00000 00000 00000 00000
PA   NE//  00110 10000 00000 00000  ; 44
QA   ½E//  00010 10000 00000 00000  ; 40
OA   G£NI  01011 11111 00110 01100
BA   K@JP  11110 01000 11010 01101
GA   "KPG  B6' = B6 - ["K]          ; start final part, B6 is 40
"A   "KPG  B6' = B6 - ["K]          ; B6 := 40 - 2["K]
MA   VKPB  [VK]' = B6
XA   PAYO  B5' = [PA]
VA   VKYG  B5' = B5 - [VK]          ; B5 := 44 - B6 = 4 + 2["K]
£A   NS/P  jump [NS]+               ; return

Walkthrough

The code relies on constants in other locations; probably all of this was well known at the time, but as it is not in the listing, I had to guess most of it.

Given the input value in [/C], which must be between 2^20 and 2^40: after completion, [/C] has been shifted, [:C] contains the reciprocal of the square root (shifted), and [IC] contains the square root itself.

The B-line registers B5 and B6 are also used and contain values related to the shift of the result.

In the first line //, the lower part of the accumulator is saved in [VS]. This is part of the "Schema A" for calling a subroutine.

E/ standardizes [/C], and the following code until ½/ is used to shift [/C]. This seems to use a standard table of powers of two at //, since [E:] is probably zero.

From D/ to T/, the shifted [/C] is multiplied by the value in [J:] (probably 2^20 * 1/16), effectively dividing [/C] by 16. This is used as an index into an overlapping table of 16 entries of 20 bits each. While 40 bits are loaded, the lower 20 bits seem to be ignored. The value from the table is multiplied into [/C], squared (Z/ to W/), doubled (H/), and then serves as the ε in the series.
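The divide-by-a-constant trick (multiply by a scaled reciprocal, keep only the high part) can be sketched like this; the 2^20 scale follows the guess about [J:] above, and since 16 is a power of two the result happens to equal a plain shift:

```python
def div16_via_mul(x):
    """Divide by 16 by multiplying with the constant 2**20 // 16
    (i.e. 1/16 in 2**20 fixed point) and dropping the low-order part."""
    recip = 2**20 // 16              # 65536
    return (x * recip) >> 20         # equals x // 16

print(div16_via_mul(1000))           # 62
```

The same mechanism works for non-power-of-two divisors with a suitably rounded reciprocal, which is why it was worth the multiply even on a machine with a shift instruction.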

P/ to RE apply Horner's method to evaluate the polynomial, with the coefficients in E@ to N@. One of the coefficients is [IS], another seemingly well-known constant which must be 0.5.

JE to CE adjusts the result with the factored-out value, using again the table.

KE to LE set up ["K] with what is called α in the source text (the shift of the result).

WE to XE make the even/odd adjustments that are also mentioned in the source text, while storing the final computed reciprocal root in [:C], and the root itself in [IC].

VE to £E loads B6 with 40, and then continues at GA.

GA to "A set up B6 as described in the source text, while MA to VA do the same for B5.

Finally, £A returns to the caller (apparently the return address is stored by convention in [NS]).

Interestingly, /A to TA seem to contain a completely unrelated routine that calculates a large power of [/C] and returns the result in [:C]. One can see the same calling conventions as in the other routine.

Additional sources

There is also a description of the routine in [1], page 80, under "standard routines", with the name RECROOT (and not RECIPROOT):

The routines for mathematical functions were mainly chosen for simplicity of programme and generality of application [rather] than for speed.

[...]

RECROOT.

This was a relatively slow method for the reciprocal square root. It depends on the case p = 2 of the recurrence relation

u_{n+1} = u_n (1 + 1/p * (1 − a * u_n ^ p))

for a^(-1/p). It was necessary for the argument to be within a relatively restricted range, and for u_0 to satisfy 0.5 < a * (u_0)^2 < 2. It was often found more convenient to use the logarithm and exponential, although much slower.
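For p = 2 this is the familiar Newton–Raphson iteration u′ = u(3 − a·u²)/2 for 1/√a; a quick check in Python, with the starting value chosen to satisfy 0.5 < a·u₀² < 2:

```python
import math

def recroot(a, u0, iters=6):
    """Iterate u <- u * (1 + 0.5 * (1 - a * u * u)), which converges
    quadratically to 1/sqrt(a) for a suitable starting value u0."""
    u = u0
    for _ in range(iters):
        u = u * (1.0 + 0.5 * (1.0 - a * u * u))
    return u

print(recroot(2.0, 0.6), 1 / math.sqrt(2.0))
```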

Conclusions

  • While the routine uses a lookup table and then does some calculations, the similarity to the Quake fast inverse square root ends there: mathematically, the approaches are completely different (Taylor series expansion vs. Newton iteration).
  • They knew that this method was slow, but used it for ease of programming (as a sort of "proof of concept"?)
  • The trick of dividing by a constant by multiplying with a different constant and then throwing away the lower part was already in use back then.
  • They already had subroutine calling conventions in place.

Literature

[1] Alan Turing’s Manual for the Ferranti Mk. I, manuscript. Digitized version.

[2] Programmers' Handbook (2nd Edition) for the Manchester Electronic Computer, revised by R.A. Brooker, October 1952. First chapter

dirkt
    Addressing of the mark 1 is a bit unusual from today's point of view, as it's 'bottom up'. While it's true that /C is binary 00000.01110, in addressing it means Line 0 (/) of Column 14 (C) which comes to address 448 - or the first line in 'unsystematic storage' - aka single variables. – Raffzahn Feb 24 '23 at 13:27
  • What you call Rows are named Lines in all documentation. – Raffzahn Feb 24 '23 at 19:40
    It is curious that the Manchester team chose this method of approximation, given that they would have been aware of the iteration y=½y(3-xy²) if they were familiar with D. R. Hartree, "Notes on iterative processes," Mathematical Proceedings of the Cambridge Philosophical Society, Vol. 45 , No. 2, April 1949, pp. 230 - 236. – njuffa Feb 25 '23 at 02:27
  • @Raffzahn I had more the impression that they use "line" to mean "word" (because a word looks like a line on the tube), so I used "row" to emphasize it's about the arrangement. So one row consists of two (short) lines. A long line consists of two short lines in successive rows. – dirkt Feb 25 '23 at 06:56
    @njuffa: I also read that they even used log/exp to calculate the root instead of reciproot, even though it was slower, so they didn't seem to care much about doing proper numerical analysis at this stage. – dirkt Feb 25 '23 at 07:04
  • @dirkt yes, line would be in modern terminology a word. Thus it's quite confusing to introduce another term - it did put me off track. – Raffzahn Feb 25 '23 at 14:15
  • @dirkt where did you read that? Would be very interested to see it. – Adam Hyland Feb 25 '23 at 20:33
    @AdamHyland I already edited the answer to include it. – dirkt Feb 26 '23 at 08:47
  • @njuffa. Thank you for that citation, the earliest I had seen the claim that NR on 1/sqrt was faster due to fewer divisions was J. Eve, Starting Approximations for the Iterative Calculation of Square Roots, The Computer Journal, Volume 6, Issue 3, November 1963, Pages 274–276. – Adam Hyland Feb 26 '23 at 18:11
    @AdamHyland Rewrote answer to make it more readable, with sections, annotated listing, and walkthrough. – dirkt Feb 26 '23 at 18:29
  • Thank you for your comments on the similarity to the FISR. William Kahan mentioned in an oral history seeing how they did square roots at manchester circa 1954/5 inspired "Kahan's magic square root" (which became the FISR). Now that I think about it, if the practice was to use log/exp that makes even more sense for inspiration since that's the mechanism in the magic square root. – Adam Hyland Feb 26 '23 at 18:51
    @AdamHyland If you want my two cents: The innovation behind the "magic square root" is really to use the bit representation of the floating point number to get a starting value for the Newton iteration. From a mathematician's point of view, there are lots of ways to calculate the root, all were well known, and Taylor series vs. exp+log vs. Newton iteration vs. the other already mentioned are all very different methods, and I don't think there is any sort of connection between them. So I really wouldn't see this (nor the exp/log) as a precursor to the "magic square root". – dirkt Feb 26 '23 at 19:01
  • @dirkt I agree with you. That's the central innovation and it didn't/couldn't come from what's on the manchester computers. What I mean is more that the experimenting with approximate logarithms in service of doing a square root might have been where the intuition that the bit field of a floating-point number could get you an approximate log. It couldn't have come from the literature prior to 1961 (http://dx.doi.org/10.1109/TEC.1962.5219391). Could have come from general experience but he mentioned this specifically as insp for his IBM 7090 routine, so I'm shaking it down. – Adam Hyland Feb 26 '23 at 19:21
    @AdamHyland Anybody who has used a slide rule or log-range chart knows that exponentiation becomes multiplication on a log scale, so taking the square root means "halving". And of course the exponent of a float is a log-scale representation. This knowledge is so widespread you don't have to be involved in historic root algorithms to know about it, and neither do you need literature for it. – dirkt Feb 27 '23 at 09:06
  • @dirkt to me this is one of the more fun things about looking into the history of the FISR. This stuff is in some sense common knowledge, and yet innovations to it or implementations of it diffuse from example, as though it were not common knowledge. And again, I’m not asserting that Kahan had to have seen this to make his version—he noted the connection. – Adam Hyland Feb 27 '23 at 15:46
    @Adam Hyland You may also be interested in John C. Gower, "A Note on an Iterative Method for Root Extraction." The Computer Journal, Vol. 1, No. 3, 1958, pp. 142-143. " Summary: A double iterative method for evaluating y/x¹ᐟⁿ is derived ... The method seems to have originated with the EDSAC group at Cambridge (Wilkes, Wheeler and Gill, 1951), and was adopted on several of the earlier computers; at Manchester and Rothamstead for instance ". The iterations are division-free, equally well suited for computation of square root and reciprocal square root, very simple starting value. – njuffa Mar 05 '23 at 07:05
  • Thank you. I will investigate that. – Adam Hyland Mar 05 '23 at 16:42