26

I noticed back in the DOS gaming era that DOS games ran slowly in hi-res modes. I was surprised to discover that this could be true on a modern machine.

I booted a 2017 laptop with (I think) an i5-7200U into FreeDOS and loaded Quake; predictably there was no sound, but otherwise it was okay. In 1280x1024, however, it ran slowly, which seemed impossible.

I was aware that the processor has to switch between real and protected mode quite a bit, but I doubt that's the problem, because I created a virtual machine running on the same CPU and got Quake to run well in that resolution.

So, what is the bottleneck? The graphics chip? The CSM?

Thorbjørn Ravn Andersen
Leon Simpson
  • 3
    Have you rebuilt it or is it the original exe? A lot of the original code for graphics and floating point arithmetic is in assembler. It will take a while to unravel what the code is doing. You will need to check which routine is being used for "this is how I put a dot in a specified colour at a specified position on the screen". – cup Dec 11 '21 at 08:53
  • 6
    here are my guesses: 1. 1280x1024 on a 2D ray caster is quite a lot; we usually used 640x480 or 320x200 back in the day, and as old games did not multithread, the number of cores does not matter. 2. Old MS-DOS style timing might be compromised on new machines (for example the CRT lib error), and syncing might be compromised too, which might diminish performance. 3. Old code was usually assembly-optimized for a specific CPU architecture, but nowadays CPUs are very different ... 4. Emulated VGA/SVGA might not be as fast as the real stuff... especially with low level IO techniques... – Spektre Dec 11 '21 at 14:51
  • 8
    Not an answer, but you may try GLQuake instead. Based on the same sources, but modified for OpenGL, which might result in noticeable speed increase - and being almost resolution independent. – Raffzahn Dec 11 '21 at 15:45
  • 2
    There is no switching between real and protected mode. Quake runs entirely in protected mode. It does not make calls to the BIOS during rendering or for any purpose other than mode switching, and I believe it uses the protected mode VBE for that anyway. – R.. GitHub STOP HELPING ICE Dec 12 '21 at 13:16
  • 2
    @Raffzahn: There are no OpenGL drivers on DOS. – R.. GitHub STOP HELPING ICE Dec 12 '21 at 13:17
  • 2
    @R..GitHubSTOPHELPINGICE Don't tell this to any Voodoo owner, their OpenGL may stop working :)) – Raffzahn Dec 12 '21 at 14:25
  • @R.. GitHub STOP HELPING ICE I was of the impression it had to constantly use v86 mode to utilise VBE at all, and that was why DOS games always ran slowly in high-res. If that's not it, then why did they run slowly back in the late 90s as well?! – Leon Simpson Dec 12 '21 at 14:56
  • 1
    The Quake source code is freely available, why not just download a copy and run it natively on your current o/s? – Bib Dec 12 '21 at 16:27
  • 2
    @LeonSimpson: The only thing VBE is used for is mode setting and obtaining the pointer to the framebuffer, and (possibly?) page flipping once per frame. It has absolutely no involvement in drawing which is purely memory write operations. The reason for slowness back then was the low cpu and memory bus clocks, not any sort of VBE overhead. – R.. GitHub STOP HELPING ICE Dec 12 '21 at 16:34
  • 1
    @Raffzahn: To my knowledge there never was a version of the 3dfx voodoo Quake for DOS. It was only on Windows, and I don't recall whether it used OpenGL or its own API. What was available for DOS was a custom version of Quake for the Rendition Verite. – R.. GitHub STOP HELPING ICE Dec 12 '21 at 16:37
  • @R..GitHubSTOPHELPINGICE Granted, there was no official 3dfx version, and even for Windows it was only MiniGL. But there was/is Mesa, which started out as an OpenGL implementation under DOS. Back then (~96/97) it supported Voodoo cards as a layer on top of GLIDE. OpenGL Quake was kind of a prime example for this. (later) Q2DOS is still based on Mesa and Sage. – Raffzahn Dec 13 '21 at 13:41
  • @R..GitHubSTOPHELPINGICE there was GLUT for MS-DOS ... so there definitely was OpenGL in MS-DOS (I used it within Turbo C++ 3.11), however without any HW support, only SW render ... The only HW Glide I know of for MS-DOS was 3Dfx Voodoo1, but to use that instead of OpenGL you have to use the hack of exchanging .ovl and/or .dll ... this was usually used to run 3Dfx stuff on non-3Dfx HW, but it can be done in reverse too, as the 3Dfx driver is identical to the OpenGL 1.0 API +/- some default settings differences – Spektre Dec 15 '21 at 09:32

3 Answers

29

The original Quake used software rendering directly to video memory, at that time in 320x200, and you are using a resolution with around twenty times as many pixels. In other words, each frame is about twenty times the size that Quake was designed for.
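The "around twenty times" figure can be checked with a few lines of arithmetic. This is only a rough sketch: it assumes 8-bit paletted colour (one byte per pixel), and the 30 fps target is an illustrative assumption, not a number from the question.

```python
# Pixel counts for the two modes discussed above.
BASE_MODE = (320, 200)     # Quake's default mode
HIGH_MODE = (1280, 1024)   # mode used in the question


def pixels(mode):
    """Number of pixels in a (width, height) mode."""
    width, height = mode
    return width * height


def frame_bytes(mode, bytes_per_pixel=1):
    """Size of one frame; assumes one byte per pixel (8-bit colour)."""
    return pixels(mode) * bytes_per_pixel


def write_bandwidth(mode, fps, bytes_per_pixel=1):
    """Bytes per second that must reach the framebuffer at a given frame rate."""
    return frame_bytes(mode, bytes_per_pixel) * fps


ratio = pixels(HIGH_MODE) / pixels(BASE_MODE)
print(f"{pixels(BASE_MODE)} vs {pixels(HIGH_MODE)} pixels, ratio {ratio:.2f}")
# 30 fps is a hypothetical target chosen only for illustration.
print(f"{write_bandwidth(HIGH_MODE, 30) / 1e6:.1f} MB/s at 30 fps")
```

The raw byte rate is small by modern standards, which suggests the cost lies in how the writes reach the card rather than in how many bytes there are.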

I would expect that what you are seeing is the CPU simply being unable to push individual pixel values to the video card any faster.

A virtual machine does not push pixels directly to the graphics card, but uses a highly optimized rendering path in the host, which most likely uses 2D graphics acceleration to get the pixels shown.
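The difference between the two approaches can be illustrated with a toy model. This is not how any particular hypervisor or Quake itself is implemented; `FakeVideoMemory`, `draw_direct` and `draw_buffered` are hypothetical names, and the "video memory" is just a byte array that counts how often it is written to.

```python
class FakeVideoMemory:
    """Stand-in for a framebuffer that counts write operations."""

    def __init__(self, size):
        self.data = bytearray(size)
        self.write_calls = 0

    def write(self, offset, payload):
        # Each call models one separate trip to the video card.
        self.data[offset:offset + len(payload)] = payload
        self.write_calls += 1


def draw_direct(vram, frame):
    """Push every pixel to video memory individually."""
    for offset, value in enumerate(frame):
        vram.write(offset, bytes([value]))


def draw_buffered(vram, frame):
    """Render into ordinary RAM first, then send the frame in one go."""
    back_buffer = bytearray(frame)       # work happens in system memory
    vram.write(0, bytes(back_buffer))    # a single bulk transfer


# A small test pattern keeps the example fast; the size is arbitrary.
WIDTH, HEIGHT = 64, 48
frame = bytes((x ^ y) & 0xFF for y in range(HEIGHT) for x in range(WIDTH))

direct = FakeVideoMemory(WIDTH * HEIGHT)
buffered = FakeVideoMemory(WIDTH * HEIGHT)
draw_direct(direct, frame)
draw_buffered(buffered, frame)

print(direct.write_calls, buffered.write_calls)
```

Both paths leave the same image in the fake framebuffer; only the number of separate writes differs, and on real hardware that count is where the per-write overhead would accumulate.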

Thorbjørn Ravn Andersen
  • Valid points, especially about I/O and emulation (virtual machines) scaling differently than real hardware. It's just that 320x200 was the default resolution for Quake, fine on a 75 MHz Pentium with a non-accelerated VGA. Already back then most people played in much higher modes on average machines. In fact the not much later GLQuake defaults to 640x480. – Raffzahn Dec 11 '21 at 15:43
  • 8
    GLQuake uses OpenGL for rendering, which requires a lot less bandwidth between the CPU and the GPU, especially for higher resolutions. – Thorbjørn Ravn Andersen Dec 11 '21 at 16:26
  • Sure, but it also shows how fast tech moved at that time, as it could be assumed that any OpenGL capable card would of course support 640x480 as well - older VGA did not. So the difference is more about no longer needing to support even older setups. – Raffzahn Dec 11 '21 at 16:47
  • But why not? Is that a limitation of the graphics card or the cpu? Or the fact that the VESA BIOS Extensions are emulated by the CSM? – Leon Simpson Dec 11 '21 at 18:02
  • @LeonSimpson It's a limitation of the way the game was written. – Hearth Dec 11 '21 at 19:06
  • 10
    @LeonSimpson yes, and more, as today's PC hardware in no way works like it did back then, even though it seems to from the software side. For example, a graphics card is no longer a simple byte-parallel memory interface (like ISA) but a complex, layered, multi-lane serial interface like PCIe. Much of the speed gain in modern hardware is reached through larger transmission sizes and asynchronous communication. Something that doesn't go well with reading a byte (or word) from a VGA and writing it back modified. On a classic memory based interface, these are very basic interactions, close to maximum I/O ... – Raffzahn Dec 11 '21 at 19:34
  • 9
    @LeonSimpson ... speed. On modern PCIe the same VGA access needs to be packaged into a request packet, then wrapped in data link layer framing, then in some PHY framing, and only then is the transaction started. The same happens on the way back to deliver the requested byte. After being modified, it gets packaged again and sent to the card. So while hidden from DOS applications, it still represents the absolute worst-case pattern possible. Any virtual PC/emulator will handle that buffer in memory and only update once in a while, benefiting from maximum packet size with a minimum number of transactions. – Raffzahn Dec 11 '21 at 19:36
  • @Hearth For that to be true, surely it would run slowly in a VM also? – Leon Simpson Dec 11 '21 at 22:24
  • I'll add that it runs slowly in DOSBox too, though not as badly as when booting FreeDOS. This is only true in 1280x1024. I know DOSBox is an emulator, but I'm still surprised it's slow. – Leon Simpson Dec 11 '21 at 22:26
  • 3
    @LeonSimpson Code written for old computers very frequently makes assumptions about the hardware it will be running on, which were completely valid assumptions at the time, but modern computers, while mostly compatible, have very different hardware. Code written to make optimal use of specific hardware may not work very well on modern hardware. DOSbox emulates that specific hardware, to an extent, and translates all the functionality the game uses into something modern processors can handle more efficiently. – Hearth Dec 11 '21 at 23:06
  • 4
    It's even worse than you think, with many modern video cards no longer supporting legacy VGA modes. You should consider yourself lucky that it runs at all natively on 2017 hardware. See here for more information: https://www.vogons.org/viewtopic.php?f=7&t=85099 – Zhro Dec 12 '21 at 04:46
  • @Raffzahn I would expect quake to write to vga memory only and not read. It is simply not necessary. – Thorbjørn Ravn Andersen Dec 12 '21 at 09:05
  • @ThorbjørnRavnAndersen And that changes the issue in what way? – Raffzahn Dec 12 '21 at 12:11
  • @Raffzahn it doesn’t change the issue but you are typically very accurate so just mentioned that this might be slightly different than how you explained it – Thorbjørn Ravn Andersen Dec 12 '21 at 12:19
  • @ThorbjørnRavnAndersen Well, it's a comment, thus a simplification (and I still needed two). Going into the real reasons why transactions are not combined, and so on, would have filled a page. Also, while Quake may work by only writing (this would need to be checked), not all games do. – Raffzahn Dec 12 '21 at 12:47
  • 1
    I think that this is a combination of the general description in the answer, and @Zhro’s comment above — Quake in 1280×1024 would really need LFB support in VBE, and that’s not implemented in 2017-and-later IGPs. Without that, it will have to page display areas all the time, which will kill performance. – Stephen Kitt Dec 12 '21 at 14:25
  • Heck it isn't even running natively if you are running an x64 OS. It needs to drop down to x86 land to work... – Aron Dec 12 '21 at 16:24
  • 1
    @Aron If it's running in a DOS VM, surely it is running natively, at least as far as the CPU is concerned? – Leon Simpson Dec 12 '21 at 20:56
  • @Raffzahn Of course, both memory and GPU access is much faster than on ISA - and the latency with ISA was horrible too (not to mention stealing your CPU cycles). But it's pretty likely the optimizations that made Quake run well on ISA make it perform far worse on a modern GPU - my software 3D renderer pushes 30 FPS on 1920x1200, which is pretty close to the bandwidth limit, but as you said, instead of writing directly to VRAM ASAP, it renders everything into a RAM backbuffer and sends all the image data to the GPU in one go. Sending individual pixels ordered in a way appropriate for VGA... – Luaan Dec 13 '21 at 07:21
  • 1
    @LeonSimpson Not quite; hypervisors are still emulated even today, they just get more support from the hardware. Depending on the hypervisor and hardware you're using, there's many ways that VMs may be implemented. And Intel's x64 modes threw away 16-bit support entirely (which is the only reason why you can't run 16-bit applications on 64-bit Windows anymore without virtualization or emulation). Since 2010, some Intel CPUs allow optionally running VMs in real mode; but that still includes higher-level memory protection (extended page tables). Of course, video is fully "simulated". – Luaan Dec 13 '21 at 07:28
  • The original Quake would've been optimized for what I call the "linked-list model of memory", too: where the main focus of optimization is reducing the number of instructions executed, not things like cache locality. – user253751 Jul 07 '22 at 15:23
  • @user253751 The bottleneck for quake was math speed, not cpu. It was unplayable on a 486 – Thorbjørn Ravn Andersen Jul 07 '22 at 16:35
  • @ThorbjørnRavnAndersen what is the difference between math speed and cpu? – user253751 Jul 07 '22 at 16:45
  • @user253751 the speed of the math coprocessor which at that time had migrated inside the cpu. The pentium itself (executing x86 instructions) wasn't much faster than the 486, but the coprocessor was. – Thorbjørn Ravn Andersen Jul 07 '22 at 16:53
  • @ThorbjørnRavnAndersen By increasing the frame size 20 times you ask the rendering engine to do 20 times as much work. While math is certainly more than 20 times as fast as it used to be... is memory latency? Not sure. Memory latency is certainly a lot fewer times faster now than in 1996, than math is. Since software in 1996 was optimized to reduce the number of instructions, and memory accesses were irrelevant, it's plausible that software from 1996 did not age well. – user253751 Jul 07 '22 at 17:02
  • @user253751 yes. You read the bit of the question where it was mentioned it ran as expected in a virtual machine? This is carmack code… – Thorbjørn Ravn Andersen Jul 07 '22 at 18:43
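Stephen Kitt's point in the comments above about having to "page display areas" without a linear framebuffer can be made concrete with some arithmetic. This is a sketch under assumptions: it takes a 64 KiB banked window (real hardware reports its own window size and granularity), one byte per pixel, and a pitch equal to the width. `bank_and_offset` and `banks_per_frame` are hypothetical helper names.

```python
BANK_SIZE = 64 * 1024  # assumed window size in bytes


def bank_and_offset(x, y, pitch, bytes_per_pixel=1):
    """Map a pixel to (bank number, offset inside the banked window)."""
    linear = y * pitch + x * bytes_per_pixel
    return divmod(linear, BANK_SIZE)


def banks_per_frame(width, height, bytes_per_pixel=1):
    """Banks touched by one sequential pass over the frame (a lower bound)."""
    frame_size = width * height * bytes_per_pixel
    return -(-frame_size // BANK_SIZE)  # ceiling division


print(banks_per_frame(320, 200))         # fits inside a single window
print(banks_per_frame(1280, 1024))       # needs many window switches
print(bank_and_offset(256, 51, 1280))    # first pixel of the second bank
```

A renderer that does not draw strictly top to bottom would switch banks more often than this lower bound, whereas a linear framebuffer needs no switching at all.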
11

Your problem is almost surely that FreeDOS (or rather DOS in general) does not treat mediating access to hardware as part of its role as the operating system, and therefore has not set up access to the video memory properly. In particular, it likely hasn't set the MTRRs (memory type range registers), or whatever their most recent equivalent is, to enable write combining for video memory, so each write of each pixel to video memory goes through an expensive synchronization process.

If you run the exact same Quake binary under DOSEMU on Linux (if DOSEMU is even still maintained enough to work) or even an actual emulator like DOSBox, you'll likely find that it runs perfectly well because the host OS has set up video memory access correctly. There are likely tools to do this on FreeDOS too; I recall there being stuff like that back in the 90s or early 00s, but not what the names were or whether they were ever updated to work with later CPU models.
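To give an idea of what "setting the MTRRs" involves, here is a minimal sketch that only computes the two register values for one variable-range MTRR marking a framebuffer as write-combining. It does not touch hardware: actually loading the values requires privileged `WRMSR` instructions and a careful update procedure, which is what a DOS utility would have to do. The framebuffer address, its size and the 36-bit physical address width are hypothetical values for illustration; a real tool would obtain them from VBE and CPUID. `encode_variable_mtrr` is a hypothetical helper name.

```python
# Memory type encodings used by the MTRRs (per the Intel SDM).
MTRR_TYPES = {"UC": 0x00, "WC": 0x01, "WT": 0x04, "WP": 0x05, "WB": 0x06}
MTRR_VALID = 1 << 11  # "valid" bit in the PHYSMASK register


def encode_variable_mtrr(base, size, mem_type="WC", phys_addr_bits=36):
    """Return (PHYSBASE, PHYSMASK) values for one variable-range MTRR.

    The range must be a power of two in size, at least 4 KiB, and the
    base must be aligned to that size.
    """
    if size < 4096 or size & (size - 1):
        raise ValueError("size must be a power of two of at least 4 KiB")
    if base % size:
        raise ValueError("base must be aligned to size")
    addr_mask = (1 << phys_addr_bits) - 1
    phys_base = (base & addr_mask & ~0xFFF) | MTRR_TYPES[mem_type]
    phys_mask = (~(size - 1) & addr_mask & ~0xFFF) | MTRR_VALID
    return phys_base, phys_mask


# Hypothetical 16 MiB linear framebuffer at 0xE0000000.
lfb_base, lfb_size = 0xE0000000, 16 * 1024 * 1024
phys_base, phys_mask = encode_variable_mtrr(lfb_base, lfb_size)
# The first variable-range pair lives in MSRs 0x200 (base) and 0x201 (mask).
print(f"PHYSBASE = {phys_base:#x}, PHYSMASK = {phys_mask:#x}")
```

With the range marked as WC, the CPU is allowed to gather consecutive stores into larger bursts instead of issuing each pixel write as its own strongly ordered transaction.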

  • 12
    People may take exception to "is not a real operating system" and having that in your answer does not benefit anyone. FreeDOS not setting up VMEM properly is useful information. No judgement here. Many would say FreeDOS is an OS, just not a very sophisticated one. – Spud Dec 12 '21 at 19:05
  • 4
    Some BIOSes used to have an option to make video memory WC (uncacheable, write-combining) instead of UC (fully uncacheable, strongly-ordered). IIRC, my early-2000s board with an AGP slot called it "UCSW" instead of WC, as in "UnCacheable Software Write combining". But yeah, if the modern laptop firmware didn't default to that for MTRRs, you'd be out of luck for store bandwidth to the framebuffer. – Peter Cordes Dec 12 '21 at 22:06
  • 4
    I found https://www.vogons.org/viewtopic.php?t=59676 which shows video write bandwidth numbers with / without MTRRLFBE. If the hardware/firmware supports a linear framebuffer at all, running some DOS program to set up MTRRs before starting Quake might help. Apparently there are programs called MTRRLFBE and/or fastvid; I guess google for download links? (For those not aware, MTRRs are actual CPU registers, Memory-Type Range Registers, which can just be set once before running some other program, and change how any stores work to that address range.) – Peter Cordes Dec 12 '21 at 22:12
  • 7
    Not setting up MTRRs has nothing to do with FreeDOS not being a "real operating system", whatever that is supposed to mean. I'm fairly sure genuine MS-DOS wouldn't know how to set them up either (and that is a real operating system, isn't it?). IIRC MTRRs were introduced by the Pentium II, so most DOSes predate them. It is the job of platform firmware ("BIOS" back then) to set them up, and modern firmware probably defaults to the most compatible option (uncacheable) because performance during early boot rarely matters nowadays. – TooTea Dec 13 '21 at 08:19
  • 2
    By a "real operating system" here I mean one that mediates hardware access. DOS (FreeDOS or otherwise) is not doing that. Since folks seem upset at the wording I'll try to reword it. – R.. GitHub STOP HELPING ICE Dec 13 '21 at 14:29
  • 2
    I edited out the unnecessary comment which detracted from a good answer. – deep64blue Dec 13 '21 at 16:51
  • @AlanDev I don't like "fully fledged" either. It is a fully fledged operating system that can run stand alone on a PC. How about "sophisticated"? – JeremyP Dec 14 '21 at 08:57
  • 1
    @AlanDev By that definition, no version of Windows before NT was a "fully-fledged operating system" either. Nor is any embedded OS such as VxWorks. Don't go looking for a different word, because any qualifier you put there is going to be wrong. Operating systems may or may not provide this, but that doesn't change whether they're operating systems, whether they're complete operating systems, nor even whether they're sophisticated or advanced or modern or fully-featured operating systems. It's merely a design decision. – Graham Dec 14 '21 at 09:19
  • @Graham I assume that was directed at JeremyP's comment not mine?? – deep64blue Dec 14 '21 at 14:28
  • @AlanDev No, it was for you, about both the original adjective and the new one you tried. JeremyP suggested an alternative, but I think the problem with any adjective is that they're objectively (provably) wrong if they imply quality, complexity, or completeness. The true statement is that it's "not an operating system that's aware of and mediating access to hardware"; any qualifier about the quality, complexity or completeness of the operating system is not true. (Or it may be true, but for reasons of general code quality and not because of this.) – Graham Dec 14 '21 at 15:40
  • @Graham I didn't add an adjective so I have no idea what you are talking about, sorry. – deep64blue Dec 14 '21 at 21:11
  • 1
    I didn't come here to have a debate over what makes an operating system, but in my book, mediating hardware access is part of that. This doesn't mean I think the DOS approach was/is wrong, just that DOS is more of a program loader for near-bare-metal programs than an OS. – R.. GitHub STOP HELPING ICE Dec 14 '21 at 23:10
  • I reworded it a bit further since the wording was still disliked by some folks. – R.. GitHub STOP HELPING ICE Dec 16 '21 at 13:54
  • @R..GitHubSTOPHELPINGICE The question is if the definition of Operating System you use today is the same as applied then to Disk Operating System (DOS). Today it would probably be just be considered a single program loader but at the time it allowed your computer to execute the programs you needed to run. – Thorbjørn Ravn Andersen May 20 '22 at 17:15
  • This answer is almost certainly wrong - DOSBox does not allow direct hardware access, and what is slow is direct single writes to PCI/PCIe, which have enormous overhead compared to the old times (MTRR might play a role as well, but video access under DOSBox is completely different to under FreeDOS). – Remember Monica Jul 18 '22 at 18:36
  • @RememberMonica: The OP is not using DOSBOX or any emulator. They specifically said they booted FreeDOS on bare metal. – R.. GitHub STOP HELPING ICE Jul 19 '22 at 02:43
0

Quake having only a software renderer (in the DOS release-day version, at least) doesn't mean that you can stop caring about everything else.

An Athlon XP CPU is, give or take, 5 times slower than yours for our purposes (i.e. ignoring multiple threads and SIMD extensions), and with the lamest "video accelerators" of the day you could drop to single-digit framerates even in QVGA resolution, despite the CPU otherwise being more than enough even for SVGA.

I don't really know the intricacies behind CSM emulation (is it just the thing existing at all, with its thunking everywhere? or perhaps nobody really ever bothered to optimize its VBE driver for speed? ...could the obvious lack of a legacy vbios/oprom in such a new PC entail some extra burden too?), but that seems to be the only explanation.

mirh
  • 1
    This. CPUs today just aren't as much faster than they were back then as one is inclined to believe (in a single-threaded workload). Full HD alone is more than thirty times the pixel count of mode-x. – Haukinger Dec 14 '21 at 14:45