
TCP/IP has some binary header fields which are affected by byte order, so it defines a 'network byte order' to settle the issue, and specifically defines it as big-endian.
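
To make the convention concrete, here is a minimal sketch (assuming a POSIX C environment; the port number 80 is just an arbitrary example) showing that htons() puts a 16-bit field into big-endian order regardless of the host's own byte order:

#include <stdio.h>
#include <stdint.h>
#include <arpa/inet.h>   /* htons(): host byte order to network byte order (POSIX) */

int main(void)
{
    uint16_t port = 80;                        /* an example 16-bit header field */
    uint16_t wire = htons(port);               /* 0x0050 in memory after conversion */
    unsigned char *b = (unsigned char *)&wire;

    /* prints "00 50" on any host, big- or little-endian: most significant byte first */
    printf("%02x %02x\n", b[0], b[1]);
    return 0;
}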

When was this decided? The earliest reference I've found so far is RFC 1700, but that seems to be dated 1994, and I would expect this matter to have been decided much earlier than that.

DrSheldon
rwallace
  • I'd expect answers to this question to relate the circumstances and "why" around this; YY-MM-DD hh:mm:ss.ddd isn't a very interesting answer. – wizzwizz4 Apr 11 '17 at 08:01
  • RFC 791 on the internet protocol speaks of byte order. September 1981. https://tools.ietf.org/html/rfc791#page-39 The earlier RFC 760 does not seem to mention it explicitly though the fact that the high byte of the address (the network part) is sent first was a bit of a hint. – George Phillips Apr 11 '17 at 08:05

3 Answers


As far as I can see, RFC 1700 doesn’t define “network byte order” as a phrase; it specifies the order of transmission of bytes (or octets) on the network, as done previously in the RFCs it obsoletes (going at least as far back as RFC 990 in November 1986). As I understand it, “network byte order” is just the order of bytes on the network, which depends on the network and should be documented in the network’s specification — it’s big-endian for IP, as documented in the IP specification, RFC 791, published in September 1981. (The initial TCP paper doesn’t concern itself with such issues.)

Endianness as a byte-order term goes back to Danny Cohen’s On Holy Wars and a Plea for Peace, published as Internet Experiment Note 137 in April 1980 and again in IEEE Computer Magazine in October 1981. That paper summarises the situation, pointing out that many existing communications protocols are little-endian (RS-232, Telex…) but that the ARPANet IMP was mostly big-endian, and that communications on the ARPANet were big-endian.

Stephen Kitt
  • You are correct in your definition of network byte order. It's not called that in the initial IP RFC, but that's what it is. Therefore, for IP, it was decided some time between the initial descriptions of TCP in 1973/74, and the publication of RFC 791. Interestingly, the IMP specification uses 16-bit words, and mentions DEC machines, which, if I remember correctly, were big-endian. I believe a lot of TCP/IP work was done on DEC machines as well. – Dranon Apr 11 '17 at 14:42
  • @Stephen Your answer leads to the question, why was the ARPANet IMP big-endian? (IOW I don't think we've reached the bottom of the rabbit hole..) – snips-n-snails Apr 12 '17 at 05:48
  • @traal there’s a fair chance that’s just what the engineers at BBN preferred, or were used to. They developed on PDP-1s, which were 18-bit word machines (so byte endianness doesn’t apply, only bit endianness), but for DDP-516s which were 16-bit machines; I imagine the technical documentation for the latter would help ;-). Interestingly, the latest Annals of the History of Computing has an article on an early encryption device for IMPs, and includes a potted history of ARPANet. – Stephen Kitt Apr 12 '17 at 07:23
  • @StephenKitt I couldn't find the original 1969 version, but I did find a 1976 revision of BBN Report No 1822, which defines the IMP. From my brief skimming of it, there aren't really any endianness issues, though, because of the larger word size. – Dranon Apr 12 '17 at 14:04
  • Note 137 is an interesting read. Only after some reading did I read the date "April 1", and after that the issue became unclear to me again. – Alex Martian Dec 15 '23 at 06:36

For TCP/IP, the matter was "decided" the first time an IP packet was sent. The order of bits and bytes put onto a wire (or a radio link) by the sender had to match the order they were processed by the receiver. If the receiver processed them in a different order, chaos ensued.
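
To illustrate the sort of "chaos" meant here (a minimal sketch, not anything from the period; the byte values are just an example): the same four bytes received from the wire decode to very different 32-bit values depending on which byte order the receiver assumes.

#include <stdio.h>
#include <stdint.h>

int main(void)
{
    /* four bytes as they arrive from the wire, e.g. the address 10.0.0.1 */
    unsigned char b[4] = { 10, 0, 0, 1 };

    /* a receiver assuming big-endian (network byte order) */
    uint32_t be = ((uint32_t)b[0] << 24) | ((uint32_t)b[1] << 16)
                | ((uint32_t)b[2] << 8)  |  (uint32_t)b[3];

    /* a receiver assuming little-endian */
    uint32_t le = ((uint32_t)b[3] << 24) | ((uint32_t)b[2] << 16)
                | ((uint32_t)b[1] << 8)  |  (uint32_t)b[0];

    printf("big-endian reader:    0x%08x\n", be);   /* 0x0a000001 */
    printf("little-endian reader: 0x%08x\n", le);   /* 0x0100000a */
    return 0;
}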

Since all communication involved at least two parties (a sender and a receiver), the appropriate order was established by convention. If you followed the convention things worked; if you didn't then things didn't work.

Eventually someone saw fit to actually specify the conventional order in an RFC, but before that you could figure out what the correct order was by trying to communicate with another already-working system. I would not be surprised if repeated failures due to incorrect assumptions were what led to this being included in RFC 791, when it had been left unstated in RFC 760 and IEN 123.

Keep in mind, it was not typical to write the RFC/IEN first, then build a system afterward. Rather, you built a prototype first, refined it through multiple iterations, and then when you finally had something that worked well enough to share with others, you would document it in an RFC or IEN (although sometimes you'd do so even if it wasn't working "well enough" simply to meet a commitment).

Ken Gober
  • You seem to assume that there was no networking before the ARPANET. This is not the case. It's likely that the developers of ARPANET followed the same conventions of whatever underlying physical network(s) they were using. – JeremyP Apr 12 '17 at 09:26
  • I was not assuming that; if you can point out where I implied it I will edit in a clarification. I doubt the underlying physical networks would have imposed any restrictions on byte order. They undoubtedly would have had conventions regarding bit order but that doesn't matter as long as you get the same bytes out the other end. For example, RS-232 UARTS impose a bit transmission order (LSB first) but no byte order -- bytes come out the other end in the same order you sent them. Likewise, it is easy to bridge between Ethernet (LSB first) and Token-Ring (MSB first) without any byte swapping. – Ken Gober Apr 12 '17 at 13:33
  • Talking about the RFC that defines the Internet Protocol kind of implies that. Of course the underlying network does not have to influence the byte order of IP (because IP is just a block to it) but the engineers who made the first version of ARPANET might have been influenced by what they were working on top of. – JeremyP Apr 12 '17 at 13:53
  • Ah, since the question asked specifically about TCP/IP network byte order, I thought it implied that my answer applies to IP only. I'll edit in a clarification. – Ken Gober Apr 12 '17 at 15:25
  • +1 for noting that RFCs have usually codified/refined something that has at least a proven prototype implementation. At least, the good RFCs have. – Wayne Conrad Apr 12 '17 at 22:45
  • @KenGober I see what you mean: the question can be read in two ways: 1. when was NBO decided for the IP family (to which your answer is almost certainly correct); 2. when was the NBO used by TCP/IP first decided, ever. – JeremyP Apr 13 '17 at 08:37

Network packets are formed once but read multiple times during transmission. Therefore it is worth optimizing the reading of multi-byte values. Reading a big-endian unsigned number consisting of N bytes (where N is no more than the native word size in bytes) is

/* big-endian: shift the accumulated value up and OR in each new byte */
unsigned long val = 0;
int cnt = N;
while (cnt--) val = (val << 8) | getbyte();

whereas reading a little-endian number requires more complex code, something like

/* little-endian: each byte must be shifted into its own position before ORing */
unsigned long val = 0;
for (int cnt = 0; cnt < N; ++cnt) val |= (unsigned long)getbyte() << cnt*8;

Therefore reading a value transmitted as a big-endian byte sequence is more efficient.
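
For completeness, here is a self-contained sketch of both loops; getbyte() is a stand-in that simply walks a local buffer (it is not part of any real API), and the sample value 0x12345678 is arbitrary:

#include <stdio.h>

/* the 4-byte value 0x12345678 as it would appear on the wire in each order */
static const unsigned char be_bytes[] = { 0x12, 0x34, 0x56, 0x78 };
static const unsigned char le_bytes[] = { 0x78, 0x56, 0x34, 0x12 };

static const unsigned char *src;              /* stand-in input stream */
static unsigned getbyte(void) { return *src++; }

int main(void)
{
    int N = 4;
    unsigned long val;
    int cnt;

    /* big-endian reader */
    src = be_bytes;
    val = 0;
    cnt = N;
    while (cnt--) val = (val << 8) | getbyte();
    printf("big-endian read:    0x%lx\n", val);   /* 0x12345678 */

    /* little-endian reader */
    src = le_bytes;
    val = 0;
    for (cnt = 0; cnt < N; ++cnt) val |= (unsigned long)getbyte() << cnt*8;
    printf("little-endian read: 0x%lx\n", val);   /* 0x12345678 */

    return 0;
}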

Leo B.
  • I don't see how this answers the question. – Chenmunka Apr 11 '17 at 08:46
  • @Chenmunka This relates to the "why" around the question; see wizzwizz4's comment. Given the optimization criteria, there was no need to "decide" anything by fiat, the format follows naturally from the requirements. – Leo B. Apr 11 '17 at 08:50
  • That assumes you’re reading from the network one byte at a time; it’s more efficient to read multiple bytes at a time if you can. I’m not convinced by the “follows naturally” argument; there are little-endian communications protocols (some older than IP). – Stephen Kitt Apr 11 '17 at 11:25
  • I'm not convinced this has anything to do with it. I think it's more likely that whoever designed the protocol that popularised network byte order was using a big-endian machine. – JeremyP Apr 11 '17 at 13:08
  • Note that this is not how it is implemented e.g. in POSIX: you always read the complete packet, and use htonl etc. to convert from network order to host order or vice versa (see the sketch at the end of this comment thread). And I seriously doubt there was ever any hardware where you really read packets one byte at a time. – dirkt Apr 11 '17 at 15:06
  • I did get this "wrong" once upon a time: I designed some hardware that transmitted 16-bit values LSB first (which seemed natural to me as we were using HDLC), so it was transmitted as 0123456789ABCDEF; big-endian means the transmission is 89ABCDEF01234567, which looks odd. The biggest issue was that the client was a 68000-family CPU, which doesn't have the equivalent of bswap, so the code ended up looking awful. – PeterI Apr 11 '17 at 15:48
  • @Stephen In a non-packet-switched protocol, where data is streamed once and received once, indeed, the endianness doesn't matter that much. – Leo B. Apr 11 '17 at 16:41
  • @JeremyP At the time most machines were word-based. – Leo B. Apr 11 '17 at 16:41
  • @dirkt This document (page 76) leads me to believe that ARPANET packets were read one bit at a time. Also note the bit numbering in a format description on page 54 of the PDF. – Leo B. Apr 11 '17 at 16:46
  • @Leo Aren't machines still word-based? https://en.wikipedia.org/wiki/Word_(computer_architecture) – snips-n-snails Apr 12 '17 at 05:43
  • @traal I should have said "word-oriented". See "Word and byte addressing" section of the article. – Leo B. Apr 12 '17 at 06:39
  • @LeoB. At what time? We haven't established when the big endianness of network byte order happened. FWIW the most popular minicomputer of the 1970's and 80's was the PDP-11 with byte addressing and it was little endian. – JeremyP Apr 12 '17 at 09:12
  • @traal Leo B is really talking about how memory was addressed. In the PDP-11, for instance, memory was (8 bit) byte addressed. By contrast, the DEC 10 from the same company was word addressed, each word being 36 bits in size. It needed special instructions to access individual bytes within words. – JeremyP Apr 12 '17 at 09:19
  • The IMP was a computer of its own (with attached teletype); what you are looking at is a serial connection between IMP and HOST with handshaking similar to CTS and RTS. And of course stuff is transmitted bit-by-bit over a serial line, just as ethernet frames are transmitted bit-by-bit. That doesn't mean they are processed byte-by-byte as you indicate in your example program. And packets are always read completely by the IMP, and then placed in the corresponding queue. – dirkt Apr 12 '17 at 09:31
  • The problem is that as soon as you have different packet types with different places that need conversion of byte order, you'd have to split the read routines into variants for each packet type, and then you'd again have to process them differently later on. Usually only the latter is done, and as soon as the packet type has been decided on, the places where byte order conversion is needed are known. The bit numbering is this way because the host was a PDP-1, and DEC bits are numbered this way. – dirkt Apr 12 '17 at 09:33
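
The sketch referred to in the POSIX comment above: read the whole packet into a buffer first, then convert just the fields you need with ntohs()/ntohl(). This is only an illustration; the sample bytes are a made-up IPv4 header, not a real capture.

#include <stdio.h>
#include <string.h>
#include <stdint.h>
#include <arpa/inet.h>   /* ntohs()/ntohl(): network byte order -> host byte order */

int main(void)
{
    /* a made-up capture of the 20-byte IPv4 header of a packet */
    unsigned char pkt[20] = {
        0x45, 0x00, 0x00, 0x54,   /* version/IHL, TOS, total length = 0x0054 */
        0x00, 0x00, 0x40, 0x00,   /* identification, flags/fragment offset */
        0x40, 0x01, 0x00, 0x00,   /* TTL, protocol, header checksum */
        0x0a, 0x00, 0x00, 0x01,   /* source address 10.0.0.1 */
        0x0a, 0x00, 0x00, 0x02    /* destination address 10.0.0.2 */
    };

    uint16_t total_len;
    uint32_t src_addr;

    /* the packet has already been read in full; now convert the fields needed */
    memcpy(&total_len, pkt + 2,  sizeof total_len);
    memcpy(&src_addr,  pkt + 12, sizeof src_addr);

    printf("total length: %u\n", ntohs(total_len));     /* 84 */
    printf("source:       0x%08x\n", ntohl(src_addr));  /* 0x0a000001 */
    return 0;
}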