I recently bought an Apple computer with the new M1 CPU to supplement the beastiarium known as Varnish Cache’s continuous integration cluster. I am a bit impressed that it goes head-to-head with the s390x virtual machine borrowed from IBM, while never drawing more than 25 watts, but other than that: Meh …
This is one disadvantage of being a systems programmer: You see up close how each successive generation in an architecture has been inflicted with yet another “extension,” “accelerator,” “cache,” “look-aside buffer,” or some other kind of “marchitecture,” to the point where the once-nice and orthogonal architecture is almost obscured by the “improvements” that followed. It seems almost like a law of nature:
Any successful computer architecture, under immense pressure to “improve” while “remaining 100% compatible,” will become a complicated mess.
Show me somebody who calls the IBM S/360 a RISC design, and I will show you somebody who works with the s390 instruction set today.
Or how about: The first rule of ARM is, “We don’t talk about Thumb-2.”
A very special instance of this law happened when AMD created the x86-64 instruction set to keep the x86 architecture alive after Intel, the nominal owner of that architecture, had all but abandoned it and gone full Second Systems Syndrome with the ill-fated Itanium architecture.
Fundamentally, we now have both kinds of CPUs—ARM and x64—and they both suffer from the same architectural problems. Take, for example, the translation from linear virtual to linear physical addresses. Not only have page table trees grown to a full handful of levels, but there are also multiple page sizes, each of which comes with its own handful of footnotes limiting usability and generality.
Why do we even have linear physical and virtual addresses in the first place, when pretty much everything today is object-oriented?
Linear virtual addresses were made to be backward-compatible with tiny computers with linear physical addresses but without virtual memory. There are still linear virtual addresses that are backward-compatible with computers that were backward-compatible with computers that were …
Apart from the smallest microcontrollers, nobody sane uses linear address spaces anymore, neither physical nor virtual. The very first thing any real-time nucleus or operating system kernel does is implement an abstract object store on top of the linear space. If the machine has virtual memory support, it then tries to map from virtual to physical as best it can, given five levels of page tables and all that drags in with it.
Translating from linear virtual addresses to linear physical addresses is slow and complicated, because 64-bit can address a lot of memory.
Having a single linear map would be prohibitively expensive in terms of memory for the map itself, so translations use a truncated tree structure, but that adds a whole slew of new possible exceptions: What if the page entry for the page directory entry for the page entry for the exception handler for missing page entries is itself empty?
This is the land where “double fault exceptions” and “F00F workarounds” originate. And with five levels of page tables, in the ultimate worst case, it takes five memory accesses before the CPU even knows where the data is.
It doesn’t have to be that way.
One of my volunteer activities for datamuseum.dk is writing a software emulation of a unique and obscure computer: the Rational R1000/s400. (It’s OK; you can look it up. I’ll wait, because until one stood on our doorstep, I had never heard of it either.)
The R1000 has many interesting aspects, not the least of which is that it was created by some very brilliant and experienced people who were not afraid to boldly go. And they sure went: The instruction set is Ada primitives, it operates on bit fields of any alignment, and the data bus is 128 bits wide: 64-bit for the data and 64-bit for data’s type. They also made it a four-CPU system, with all CPUs operating in the same 64-bit global address space. It also needed a good 1,000 amperes at five volts delivered to the backplane through a dozen welding cables.
The global 64-bit address space is not linear; it is an object cache addressed with an (object + offset) tuple, and if that page of the object is not cached, a microcode trap will bring it in from disk.
In marketing materials, the object cache was sold as “memory boards,” but in hardware, they contained a four-way associative cache, which brilliantly hid the tag-RAM lookup during the row address strobe (RAS) part of the DRAM memory cycle, so that it is precisely as fast as a normal linear DRAM memory would have been.
State-of-the-art CPUs today can still address only approximately 57 bits of address space, using five levels of page tables, each level successively and slowly sorting out another 10 bits of the address.
The R1000 addresses 64 bits of address space instantly in every single memory access. And before you tell me this is impossible: The computer is in the next room, built with 74xx-TTL (transistor-transistor logic) chips in the late 1980s. It worked back then, and it still works today.
The R1000 was a solid commercial success, or I guess I should say military success, because most customers were in the defense industry, and the price was an eye-watering, $1 million a pop. It also came with the first fully semantic IDE, but that is a story for another day.
Given Ada’s reputation for strong typing, handling the type information in hardware rather than in the compiler may sound strange, but there is a catch: Ada’s variant data structures can make it a pretty daunting task to figure out how big an object is and where to find things in it. Handling data + type as a unit in hardware makes that fast.
Why do we even have linear physical and virtual addresses in the first place, when pretty much everything today is object-oriented?
Not that type-checking in hardware is a bad idea. Quite the contrary: The recent announcement by ARMa that it has prototyped silicon with its Morello implementation of Cambridge University’s CHERI architecture gives me great hope for better software quality in the future.
Cut to the bone, CHERI makes pointers a different data type than integers in hardware and prevents conversion between the two types. Under CHERI, new valid pointers can be created only by derivation from existing valid pointers, either by restricting the permissible range or by restricting the permissions. If you try to create or modify a pointer by any other means, it will no longer be a pointer, because the hardware will have cleared the special bit in memory that marked it as a valid pointer.
According to Microsoft Research,b CHERI would have deterministically detected and prevented a full 43% of the security problems reported to the company in 2019. To put that number in perspective: The National Highway Traffic Safety Administration reports that 47% of the people killed in traffic accidents were not wearing seat belts.
Like mandatory seat belts, some people argue there would be no need for CHERI if everyone “just used type-safe languages,” or they will claim that the extra bits for “capabilities” CHERI pointers carry make their programs look fat.
I’m not having any of it.
The linear address space as a concept is unsafe at any speed, and it badly needs mandatory CHERI seat belts. But even better would be to get rid of linear address spaces entirely and go back to the future, as successfully implemented in the Rational R1000 computer 30-plus years ago.
Join the Discussion (0)
Become a Member or Sign In to Post a Comment