My question is, “Is it time to rewrite the operating system in Rust?” Now I will give you a little bit of a spoiler alert. This is Betteridge’s Law of Headlines, that any article that asks a question, you can reasonably say “no” to. But you’re not going to get a “no” out of me on this one. You’re going to get a bunch of qualified “Yeses,” but just to give you a spoiler, it is a question mark, not necessarily an exclamation point for reasons I’ll get into.
What Even is the Operating System?
But first, and actually I think it’s pretty great that I don’t even think I’m the first, maybe not even the second presentation, to have a slide on “What even is the operating system?” I mean, we’re in the OS track. Operating systems have been around a long time. And apparently, we don’t even know what they are. And it’s actually harder to define than you might think. If you think this is easy, by the way, just say, what is software? Give me a definition of what software is, and I can wrap you around the axle on your definition of software. I will pose you this follow-up question: is Euclid’s GCD algorithm software? Was Euclid writing software when he wrote the GCD algorithm? And if your answer is, “Well, no, of course not, because this predates hardware,” it’s going to be very hard to come up with a self-consistent answer to what software is.
I actually made the mistake. I know Mike Olson, who’s one of the founders of Cloudera. And I was up at Cloudera in very early days. There were like six people in the office. I don’t even know why he had me by. But I was having a weird day. And I was really grappling with this “What is software?” question. And they’re like, “Well, software’s a thing that runs on hardware.” So I’m like, “Well, what’s hardware?” You kind of take that apart. And everything starts breaking down. Like you are T minus 10 minutes away from an existential crisis. And you realize that you know nothing, that the ancient Sumerians were running software, that software is merely knowledge that we run. It doesn’t have to be Turing complete.
And at one point, Mike can see his team. Jeff Hammerbacher I think is beginning to stare at his hands and wonder if anything means anything. And Mike takes me aside. He’s like, “What’s going on? Are you on drugs? I mean, are you high right now?” And I’m like, “No, Mike. I’m just like, what is software?” He’s like, “All right, can you just talk about something else or whatever and get out of here?” Anyway, so you have the same problem with the operating system. What is the operating system? And like any definition, well we would say, “Well, the operating system is the thing that abstracts hardware.” But, well, how about virtual hardware? I mean, surely abstracting virtual hardware, surely the guest operating system that you run on your AWS instance is surely an operating system. That’s not abstracting hardware, not physical hardware. It’s abstracting virtual hardware.
So it’s really easy to migrate around or wander around and not have a really concise definition. But I think for our purposes, let’s say that an operating system is the program, because it is a program, that abstracts the hardware to allow for execution of other programs. So I think the fact that the operating system is itself just a program, it’s just kind of a super program, is kind of an important observation. My friend Mike Demmer, who’s here, went to work for a company called Tera, that you haven’t heard of. Tera actually bought Cray and then took Cray’s name, which is always an interesting sign. When the acquiring company takes the acquired company’s name, you know it did not have real good brand recognition. But Tera was a super interesting company, doing lots of interesting things.
And Mike was doing kernel development for them, and had done that for maybe a year or two. And he and I caught up. And he’s like, “You know what? I realized that an operating system is really just a program. It’s a complicated program. It’s an asynchronous program. It’s an interrupt-driven program.” But at the time I was kind of offended by that because I’m like, “It’s not just a program. I’m a kernel programmer. Don’t rob me of my dignity.” But then I was like, “Actually, it is just a program. It’s a complicated program.” But it’s a program that runs other programs. I do think that if your program doesn’t run other programs, it’s a lot of great things that’s valuable to society and humanity. But we probably shouldn’t call it an operating system. And I think that the operating system also should be abstracting hardware in some way, shape, or form. But there are shades of gray there. And I think the operating system you have to say defines the liveness of the programs that it’s running. If the operating system dies, everything that it’s running dies; it can’t live without the operating system. So it’s essential in that regard.
It’s important not to conflate the actual operating system with the operating system kernel. The operating system kernel is the thing that runs in the privileged context of the microprocessor. That’s going to be ring zero if you’re x86, although ring zero becomes complicated because if, again, you’re in virtual hardware, you’re in a virtual ring zero. With nested virt, you could be in a ring zero that’s several rings away from the actual hardware. But whether it’s virtual hardware or physical hardware, you are seeing the entire machine, or what you believe to be the machine. Virtualization has made this so complicated. Everything has to have asterisks now. It was a lot easier when we weren’t virtualizing hardware.
But the operating system kernel is only one part of the broader operating system. The operating system also includes libraries. It includes commands. It includes daemons. It includes facilities. And, yes, Justin, in his earlier talk was describing how he wanted to strip all the stuff away. He blamed it all on Sun somewhat strangely, and then said that we need to strip it all away. And that is true to a point. But even in Justin’s stripped-down world, there are still a bunch of components that aren’t running at that highest level of privilege on the microprocessor or virtual microprocessor. But we would still consider them the operating system. The operating system is bigger than just a kernel. And I think that’s really important.
Operating System Implementation History
Now, if you’ve seen me talk before, you know there’s no way we’re getting out of here without a history lesson, and especially when we’re talking about operating systems. We’ve got to turn the way-back dial, and we’ve got to talk about history. So operating systems used to be called executives, which is a very strange post-World War 2 hierarchical, I’m not sure whether it was like a managerial structure. I mean, there are things that could have been worse. But we had executives for a while and they changed into operating systems. And originally these things were all written in assembly, of course. They were all written in machine code. And if there’s one really interesting architecture to look at, it’s the Burroughs B5000, 1961. It was a super interesting machine. And that was the first operating system, which is called the MCP, which is the master control program. Yes, if you’ve seen “Tron,” it’s a master control program in “Tron.” If you haven’t seen “Tron,” do not see “Tron.” It is a terrible, terrible movie. Go see “War Games” instead. So if you’re trying to catch up on the movies you missed because they happened 20 years before you were born, go watch “War Games,” not “Tron,” please.
But MCP was the master control program, master control executive, just all about dominating programs, I guess. But the thing that’s really interesting is this thing was written in its own Algol derivative called ESPOL. I just feel like I get Esperanto in my head immediately when you say ESPOL. And ESPOL, as it turns out, is not to be confused with a university in Ecuador, whose abbreviation is ESPOL. So probably a much more famous ESPOL at this point. But ESPOL was this Algol derivative that they wrote the MCP in. And this was a machine that was way ahead of its time in many different dimensions. And I can tell you as a technologist who has had the misfortune of being ahead of my time several times over, being ahead of your time is not commercially fruitful. Basically, the biggest upside of being ahead of your time is you apparently need no introduction to the OS track, but other than that, it’s all …
So Burroughs was a wildly inventive company, and it was an incredible machine. But you haven’t heard of it because they went out of business. So the MCP was really interesting. And then in 1964, Project MAC at MIT was looking to build the follow-on for CTSS, which is the Compatible Time Sharing System, and they wanted to do everything right. Now this is something we call Second System Syndrome, and every time I think second system syndrome was a bit invented, that maybe it’s a relic, maybe it’s a kind of Mythical Man-Month holdover from the ’60s, I get hit by another second system. So I think it is a truth.
A second system syndrome is, “Hey, as long as we’re going to rewrite this, we should also rewrite that and revisit this, and we should finally do this the right way, and let’s do that the right way, and let’s finally change the database to something.” And the next thing you know, you have this massive system that ships later and later and later and later, and finally never ships at all. And that’s the second system syndrome.
Project MAC had maybe the ur-case of second system syndrome with Multics. And then this is also just a misfortune of history in terms of when they came on. They knew they wanted to do it in a higher level language, but I guess they just didn’t like the higher level languages that were out at the time. Now on the one hand, according to Multics, the only things to pick from were really FORTRAN and COBOL. That’s actually not the case. So now, someone you probably never heard of, named Jean Sammet, was an early computer scientist. And she wrote an incredible book called “Programming Languages,” which is a taxonomy of programming languages, circa 1963 or whenever that book was written. And let me tell you. That book is more than two pages long. Okay? Like there were a lot of programming languages; maybe too many programming languages, like a Mesozoic era of programming languages, too much CO2 in the programming language atmosphere or something, 80-foot tall ferns or whatever. But there were a lot of programming languages to pick from. But whatever, they didn’t like them all. So, fine. So we’re going to do it in this programming language that hasn’t yet been implemented, but has this really amazing specification called PL/I.
So they did this in PL/I. That is a Roman Numeral 1, by the way. If you want to put PL/I in your slides, that isn’t an Arabic one. That is a Roman one. A very important point. And so they adopted PL/I and then they wrote the full specification for Multics with not only not a lot of code, not a compiler. They didn’t have a compiler. In fact, I don’t know, why do you do this? They were like, “Let’s outsource the compiler.” It’s like, okay, that makes sense. “Let’s outsource the first PL/I compiler ever to be written. We’re going to outsource this to this group in Los Angeles that no one had ever heard of,” who, it shows you how some things never change, are like, “Sure, we can implement a PL/I compiler.” And then a year later, they fly out and are like, “Where’s the PL/I compiler?” “We don’t know what we’re doing. We cannot implement a PL/I compiler.” All right, so we have no PL/I compiler. But we have an amazing operating system that has been very detailed, very specified, but we can’t actually write any of it because we don’t have the compiler. So this is ultimate waterfall, like nightmare waterfall.
PL/I in Multics
And so they didn’t have a PL/I compiler until 1966. And that is a major problem. Now, the thing that’s really funny about Multics: Multics is one of these things with a super divisive history, in that depending on who you talk to, Multics was either this amazing success that no one gives adequate credit to, or this dismal failure. And I’m sure the truth is in the middle. And Corbató wrote an interesting piece, “Multics: The First Seven Years,” in 1971, after Multics has already failed-ish. A bunch of companies have pulled out of Multics, and they viewed PL/I as being this great aspect of Multics.
And there’s no real mention of the fact, like, “Well, you didn’t have a compiler for three years.” I’m not sure that it’s an unabashed success because to me, and again, it’s hard because this is all before I was born, it’s hard to know for certain. But to me, looking back on it, the fact that the compiler was not available for so long had to have exacerbated the decline of the coalition. They actually called it the triumvirate that ran Multics, and in particular, Bell Labs pulled out of the Multics project. So Bell Labs is like, “All right, we’re done. This is crazy. This is not going anywhere. The compiler hasn’t shown up. It doesn’t work.” Because PL/I wasn’t there, they developed a different dialect of PL/I called EPL. And this is another one of these things that you read. The EPL people say, “Boy, did we save the day. We came up with this dialect of PL/I called EPL.” Then Corbató is like, “People came up with the dialect of PL/I, but we never used it, and it was completely insignificant.” So it’s like, “All right, I’ve got no idea what happened. It’s like watching Rashomon for operating systems.”
Anyway, Bell Labs pulls out of Multics in 1969. And there was a researcher that was formerly on the Multics effort, a guy called Ken Thompson, a guy you’ve heard of presumably, who wanted to implement a new operating system, and now didn’t have Multics, so implemented this thing on the PDP-7. It was later ported to the PDP-11/20. And it was named UNIX as a word play, if a somewhat peculiar one, on the complexity of Multics. And every time I re-read the UNIX history, I get something different out of it. It’s unquestionable that a lot came together at UNIX, in terms of the late ’60s, especially the early ’70s. They were in the right place, at the right time.
One of the things that I see, especially reading it side-by-side with the Multics history, is it was really top-down software versus bottom-up software. And Multics was very top-down command-and-control, write specification, “maybe we’ll get it compiled or maybe we won’t” kind of software. UNIX was like, “Ah, let’s get something working and then, I don’t know, get something else working, and then we’ll write a man page for whatever we already got working.” And it was very organic, and I think that the world was ready for that kind of software model. And you begin to realize, looking at it through the larger lens of what was going on socially and everything else, as my mother is fond of reminding me, that because I wasn’t alive in 1968, I’ve got no idea what a society on the brink actually looks like. Every time I complain about the present, she’s like, “Oh, this is nothing compared to 1968.” “All right, all right, Mom, fine. Mom, you’re kind of confirming the stereotype of baby boomers, but let’s move on.”
Unix and High-Level Languages
But in particular, UNIX was implemented entirely in assembly. And I think it’s a bit of a myth that UNIX was implemented in C. It wasn’t. It was implemented in assembly on the PDP-7. It was ported to assembly on the PDP-11. And, yes, there was B. There was this interpreted language B that was around; it was a BCPL derivative. There’s a very strange connection to Multics in that B and EPL were both implemented in this thing called TMG. TMG, I know that sounds like a teen web series that my kids would watch. I feel like my 14-year-old was watching the latest episode of TMG. But, no, TMG was the transmogrifier that was a transpiler, and B was actually interpreted. And actually, TMG is another kind of, sorry, footnote on a footnote: it was implemented by Robert Morris, Bob Morris, the father of Robert Morris, Jr., who wrote the Morris worm, even though I think Paul Graham, his eye twitches every time someone says the Morris worm because he and Robert Morris founded a company together. And he doesn’t like anyone to talk about the Morris worm, but it’s fine. I thought it was interesting, whatever.
So B, actually, was only used in these like super auxiliary capacities in UNIX. And it was using the assembler. It was used in the only version of DC. DC, a command that you only may know from mistyping CD. You ever done this, where you want to CD to your home directory or whatever? You’re just like CD enter and DC enter and you’re like, “What just happened?” And you’re at like a captive prompt, and you’re like, “Control C” and it eats the Control C. You’re like, “What is going on?” You type Control D, quit. You’re like, “I don’t know how to quit or whatever it says.” And I shouldn’t say that. It actually shouldn’t say anything. DC reflects its B heritage, suffice it to say.
But they realized that B was not going to do it. It didn’t perform. They needed a much higher performing language than B. In particular, they, Dennis Ritchie and Ken Thompson, saw the power of implementing the operating system in a higher level language. But they knew that B in particular had no byte addressability. So the lack of byte addressability was going to be a big problem. So they came up with a new language that they called C. And so C followed on B. It made sense. But again, contrary to the myth, they are not twins. They’re not born at the same time. UNIX begat C in a very important, this biblical way. They’re siblings; they’re not twins. And UNIX is very much the older sibling.
And so C is rightfully called portable assembly. People called C portable assembly as a slur, but I view it as praise. I mean, it’s assembly and it’s portable? That sounds great. There’s a lot that is great about C. And C also was way ahead of its time. If you look at those other programming languages from that era, and then you look at C, it’s like stepping into modernity suddenly. First of all, the caps lock key disappears. It feels like every language prior to C was screamed. It was like, was the caps lock key broken?
And I like to collect old programming manuals. And I bought a weird lot from someone in the UK. And this box shows up. It’s like a bunch of crap. And then there’s this manual for this language called language H. And language H is a COBOL derivative that you’ve not heard of for a reason. This is an NCR Elliott-invented language that had Sterling as a keyword. Every language H program begins with CHAPTER ONE. And you’re like, “Oh, okay. This is cool, it’s a book. I get it. Whatever.” It’s super new. Computing is super new, and a book is the metaphor. And then you’re like, CHAPTER ONE. It will be like Sterling, and tapes and all these kinds of calculations. And then every program, there’s no chapter two, by the way. It’s only chapter one. Every program ends with the keyword “OBEY.” You’re like, “What? How about epilogue or end? This doesn’t make any sense.” It’s like now all of a sudden these are orders or whatever. It doesn’t make any sense.
And so that’s what we did. We went from obey in chapter one to modernity in a very short period of time. And the C revolution was real, and it was organic. I mean, I think one thing that’s interesting, again, if you look at this Multics as this top-down approach, this command-and-control approach, and Unix as this more bottom-up approach, one of the differences that you see is that C grew things as it needed them. The first interpretation of C didn’t have structures. At least structures were out in 1973 and this came in early. But they didn’t actually implement the UNIX kernel in C until they had structures. They implemented structures because they needed them for the UNIX kernel. They implemented bit fields because they needed it for the UNIX kernel.
So they implemented things as they needed them, which had strengths and weaknesses. One of the weaknesses from my perspective: there is no logical XOR in C. This is one of the things that pisses me off about C. I know this is an arcane thing to get kind of spun up about, but there is no logical XOR. You’ve got Bitwise XOR, and you’ve got Bitwise AND, Bitwise OR. And you’ve got Logical AND, Logical OR, so where’s Logical XOR? And there are actually times when you want Logical XOR in C, where you want to assert one of these conditions as true or the other condition as true, but not both of them, and at least one of them. That’s Logical XOR.
And you’ll know you’re in my code because that code has that assertion. “Yes, done with Bitwise XOR. I know how to do it, okay?” With an actual line of code, and then a block comment cursing the ghost of Dennis Ritchie for not having Logical XOR. I worked together with Roger Faulkner, who worked with Dennis Ritchie, and I was complaining about this. He’s like, “You know, there’s no Logical XOR because Logical XOR can’t short circuit.” I’m like, “Roger, I know that Logical XOR can’t short circuit. But is there some sort of biblical commandment that logical operations must be able to short circuit?” He’s like, “No. Really I’m just telling you that’s the reason.” I’m like, “That’s not the reason.” He was like, “I’m going to ask Dennis.” I’m like, “Go ask Dennis. Do it.” And, of course, Dennis immediately replies, “No, we don’t have Logical XOR because it can’t short circuit, and you should just use Bitwise.” I’m thinking, “Argh.” Anyway. So I was wrong.
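To make the short-circuit point concrete, here is a minimal sketch in Rust (an illustration, not code from the talk; `logical_xor` and `c_style_xor` are made-up names). It shows that XOR always needs both operands, and why the bitwise workaround in C only behaves if each operand is first normalized to a truth value.

```rust
// Hypothetical helper: "exactly one of these conditions holds".
// Unlike `&&` or `||`, there is nothing to short-circuit: the result
// always depends on both operands.
fn logical_xor(a: bool, b: bool) -> bool {
    // On `bool`, `^` is XOR; `a != b` has the same truth table.
    a ^ b
}

// Integers stand in for C-style truth values (nonzero means true).
// Each operand is normalized to a bool before the XOR, which mirrors
// the usual C idiom of writing `!a != !b`.
fn c_style_xor(a: i32, b: i32) -> bool {
    (a != 0) ^ (b != 0)
}

fn main() {
    assert!(logical_xor(true, false));
    assert!(logical_xor(false, true));
    assert!(!logical_xor(true, true));
    assert!(!logical_xor(false, false));

    // Raw bitwise XOR on unnormalized integers gives the wrong answer:
    // 2 and 1 are both "true", yet 2 ^ 1 is 3, which is nonzero.
    assert_ne!(2 ^ 1, 0);
    // After normalization, "both true" correctly fails the XOR.
    assert!(!c_style_xor(2, 1));
    assert!(c_style_xor(0, 5));
}
```

Rust happens to define `^` directly on `bool`, so the assertion needs no normalization dance there.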
But so there was this organic growth, and the organic growth was important. In particular, C grew very important facilities, like macro processing. Macro processing and Rust have actually caused me to reflect back on my own use of macros, because it’s one of the things I absolutely love about Rust. The macro facilities in C are really, really important. Yes, I know you’ve been told never to use the pre-processor, and, yes, I get it. But that’s the same thing we tell our kids about that they shouldn’t drink and they shouldn’t do drugs, and they shouldn’t have sex, and all this other like fun stuff, that actually they should do when they get older, and they can do so safely. You’re like, “I want grandkids.” I mean, at some point, figure it out, but not now.
And the preprocessor is the same way, and the preprocessor is really essential; it’s essential for DTrace, it’s essential to ZFS. It was essential for a lot of the things we’ve done, but the pre-processor leaves a lot to be desired. There were standardization efforts, but they came late. They were contentious, and so on.
Operating Systems in the 1980s and 1990s
So through the ’80s, C just dominated. And it was an excellent fit for operating systems, essentially every OS, with some exceptions. I know the Mac was in Pascal. Fine. But there’s going to be one of those in every class. And then there were some research operating systems. I don’t imply this is the totality of operating systems, but the vast majority of operating systems were either in C or they were still in assembly because they were DOS. But many of them were in C.
In the 1990s, a great darkness came across the land. It was really a terrible time to be alive, I’m convinced. And now I can speak from personal experience. Because there was this object-oriented industrial complex that insisted that everything be object-oriented. And the C-based systems were thought to be relics. Any C-based system was the walking dead. And this was all going to be replaced by these beautiful C++ systems, or Java-based systems. So you can keep working on this legacy garbage, if you want. But all of us are going to go over here, all of the flower people are going to go over here. And the architects are going to architect this great beauty. And all of those systems failed.
And they failed, I think, for a bunch of reasons. But one is they were all framed as second systems. And a lot of them suffered from Second System Syndrome. So, Apple’s Copland. I mean, kids grow up today, and they think of Apple as the trillion dollar company. I still remember when Gil Amelio was running Apple. And every other Silicon Valley company was eyeing them for their real estate because they couldn’t ship an operating system. They had this second system called Copland. It was going nowhere. Sun had Spring. That was written all in C++. I was an undergraduate at the time. We got this hot release of Spring. We thought we were getting something that was illegal. I mean, it was like, “Okay, I’m going to give this to you on a CD, but it’s amazing.”
And you run it. It’s like, “Is this a hard drive stress test?” because all this thing is doing is rattling the spindle. And it’s just like, “Well, how much DRAM do you have?” And at the time, this is a souped-up machine that had 32 megabytes of DRAM. And they’re like, “Well, oh, no. That’s not nearly enough DRAM. You need to have 128 megabytes of DRAM,” which is like saying today, “Oh, you’ll need 512 gigs of DRAM. Yes, you’re going to need like 2 terabytes of DRAM.” And you’re like, “Who has that?” Well, all of the Spring developers had that. That’s a bad idea. Do I need to tell you that’s a very bad idea? You don’t want to give your OS developers so much memory because they’ll use it all. And then it can’t be used on a system that’s got less memory.
So there was Sun Spring. Have any of you heard about Taligent? Oh, yes. So Taligent was going to rule us all. So Taligent was IBM and Apple. It was the best and the brightest. They were going to go up to the ivory tower and hand down to us the operating system that we would run. Thankfully, I didn’t know this at the time. I might have really gone nonlinear. Taligent, as it turns out, is a portmanteau of “talented” and “intelligent.” It’s like, “Whoa, boy.” It’s like, “How about Arrohole? How about that? Call your company that.” And it turns out, they nailed the worst of IBM meeting the worst of Apple, in an effort that was a total disaster. And again, they weren’t going to ship this operating system. And it wasn’t necessarily their fault. It was, again, because they were taking this from-scratch approach; it’s just that it attracts all sorts of abstraction garbage, like some sort of terrible abstraction fly strip.
And then the Java-based operating systems. I mean, I was at Sun, so I’m obligated to remind us all that JavaOS actually went nowhere. This is where we’re going to do Java everywhere, Java-based microprocessor. We’ll do a Java OS. Java had no unsigned type. It’s hard to talk to hardware when you have no unsigned type, as it turns out, so that didn’t work very well.
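A small sketch of why the missing unsigned type hurts when talking to hardware, written in Rust for illustration (the function names and the register value `0xF0` are assumptions, not anything from JavaOS). Widening a byte read from a device behaves very differently depending on whether the byte is treated as signed.

```rust
// Hypothetical: widen a raw byte read from a device register to 32 bits.
// With an unsigned byte the value zero-extends: 0xF0 becomes 0x0000_00F0.
fn widen_unsigned(raw: u8) -> u32 {
    raw as u32
}

// What happens if the only byte type available is signed.
// The same bits are reinterpreted as -16 and sign-extend to 0xFFFF_FFF0.
fn widen_signed(raw: u8) -> u32 {
    let signed = raw as i8;
    (signed as i32) as u32
}

// The workaround a signed-only language forces on every such read:
// mask off the sign-extended bits by hand.
fn widen_signed_masked(raw: u8) -> u32 {
    ((raw as i8) as i32 & 0xFF) as u32
}

fn main() {
    let status: u8 = 0xF0; // made-up register contents with the top bit set

    assert_eq!(widen_unsigned(status), 0x0000_00F0);
    assert_eq!(widen_signed(status), 0xFFFF_FFF0);
    assert_eq!(widen_signed_masked(status), 0x0000_00F0);

    // Values with the top bit clear hide the problem entirely,
    // which is what makes a forgotten mask easy to miss.
    assert_eq!(widen_signed(0x7F), widen_unsigned(0x7F));
}
```

Every forgotten mask is a latent bug that only shows up when the high bit happens to be set.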
Operating Systems in the 2000s
But then Linux arrived, and the nice thing about Linux, especially Linux on x86, is that UNIX had a huge resurgence. But with that, C-based systems became really deeply entrenched, and there were a bunch of C++ efforts. They all kind of withered, and, there are exceptions to this. Haiku is an exception to this. But when Haiku is listed as your example of a serious attempt … I mean, I love Haiku first of all, but Haiku was designed as the hobbyist operating system, to emulate Be. It’s great. I love it. But it’s the counterexample. I’m sure there are a lot of others, but basically C++ kernels withered. Everything was in C.
A lot of mistakes were made. No, I was experimenting. It was age-appropriate. No, it makes total sense. But there would be a lot of problems making a Ruby OS or a Python OS, and it would be something you would do because you were in college, which is fine. But there were no real serious efforts.
System Software in the 2010s
So we hit, I guess what we’re going to end up calling the teens, which makes no sense to me, but we hit the 2010s. And system programmers were looking for something different. They, we, want the performance of C, but they, we, had used these other languages. And it’s kind of nice to use a map or a hash, or it’s kind of nice to be able to use split or regular expressions, or what-have-you. I mean, things that are powerful constructs, but I want them in the system that I’m building.
I think there is a class of software for which I don’t mind requiring deeper thought upfront. And then there is another class of software for like, “Hey, let’s get something working,” and that’s fine. So this is not like the entire world should be GC’d, or none of the world should be GC’d. I actually do think there are right tools for the job. But for system software, for the operating system, for that infrastructure, I think GC is a real problem. And at least for me, I’m at the point where it’s like I don’t want to do performance-critical software in GC.
And then, so I was actually in a bit of a quandary because the Node thing wasn’t working out. I was kind of getting out of a bad relationship on that one. And then it’s like, what’s next? And C++, I had done way too much C++. I talked about how in college, I had done way too much C++. So Steve had done Ruby OS. I had written a hundred thousand lines in C++. They were both age-appropriate mistakes, I think we would say in retrospect. I didn’t want to go back to C++. It was like, “No, no. But C++ has totally changed now.” That’s great for C++, and I want C++ to have a happy and productive life. But C++ dragged all my shit into the street and lit it on fire. So I am not going back to C++ ever, and that’s not a value judgment. That’s just like, that’s self-preservation, right? I’m not going back to C++.
So forget C++. It’s not going to be C++, even though, again, I honor the changes they made to it. And looking at this, I was getting increasingly excited about Rust. Rust was really interesting. And I think, how many people are aware of Rust? A lot of hands. How many people have pined after Rust, but haven’t written any? How many of you have actually written Rust? That’s a good number, actually. But a lot of you were in the category that I was in not that long ago, which is I was seriously Rust-curious. I was reading all the blog entries, and like looking at it like, “Wow, that looks interesting, and this looks interesting. And well, what do you think about white space?” because I got to find a reason to hate it, which is wrong, right? You don’t want to do that, get over your white space. And so I’ve learned a tough lesson.
But I was really interested in it. I was intrigued by Rust. And in particular, the values of Rust really represented the values that I have for my own software. So the values of Rust are stated very upfront. You go to any Rust website, it’s going to say that it’s a system software language that runs fast and is designed to be safe. And that was really interesting because those are the things that I want to do, and I want to build permanent software. I want to build software that outlives me. I’ve already written software that I know absolutely will outlive me, and not because I plan to walk into traffic tonight. There is software that I’ve written that’s simply going to be uneconomical to rewrite because it works.
And the thing that is beautiful and amazing about software, is like Euclid’s GCD algorithm 2000 years ago, when software works — we’ve always focused on how software is always broken and the world is terrible, which is fine, that’s true — but when software works, it works forever for the problem that it’s solving. It is mathematical machinery. We are really lucky because most of us are software developers. And we are lucky to be alive now. We live in a golden age of the stuff. Our descendants will look back at this time, as this just amazing time, when all the software was being written.
And, now that said, some of the software is not going to last forever. Some of the software, we are going to have to rewrite with software that can survive in perpetuity. And what I see from Rust is that aspiration for in perpetuity. And I know Rust has eight different origin stories, all from Graydon. Graydon came up with the name, and has eight different reasons why he called it Rust. But the one that I like is I like that industrial sense of building things that survive, such that the chemical reaction with oxygen is your biggest problem. That’s pretty neat.
So Rust is really interesting. The thing that is super interesting about Rust, and to the best of my knowledge, unique, is the notion of ownership. So Rust is able to statically figure out who owns what: when you, you being the thread of control, own the memory that you’re dealing with and when you don’t. And when you’re done, when it realizes, “Okay, you’re done with this, I will take care of freeing this for you,” there’s no “free” in Rust, per se. Memory is allocated for you, and then it is freed when you are done with it. The compiler makes this huge bargain with you. It’s like the compiler says to you, “I will do this for you. I, the Rust compiler, will do this. I will give you this incredible super power, where you don’t need to think about this. But you’re going to need to help me out. And you’re going to need to help me out when I don’t know where this memory is going or who owns it. So I’m going to allow you to lend this memory out. I’ll call it borrowing it, right? But I need to know when it comes back. And sometimes I get confused. I, the compiler, get confused. I use lexical lifetimes a lot to figure out what owns what when. And sometimes, I will give you an incredibly cryptic error message because I am confused about who owns what when. And you are going to grind on this to get this working, even though it feels painful and it feels like I’m giving you mixed messages and telling you to fetch rocks, because I’m going to do this incredible thing for you, where I’m going to give you the power of a garbage collected language with the performance of manual memory management.”
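The bargain described above can be sketched in a few lines. This is a minimal illustration, not code from the talk; the function names (`total`, `double_all`, `consume`, `ownership_demo`) are made up for the example.

```rust
/// Borrows the buffer immutably: the caller keeps ownership.
fn total(samples: &[u64]) -> u64 {
    samples.iter().sum()
}

/// Borrows the buffer mutably: the caller gets it back, modified.
fn double_all(samples: &mut [u64]) {
    for s in samples.iter_mut() {
        *s *= 2;
    }
}

/// Takes ownership: the buffer is freed when `samples` goes out of scope
/// at the end of this function, with no explicit free.
fn consume(samples: Vec<u64>) -> usize {
    samples.len()
}

/// Hypothetical walk-through of lending, mutable lending, and moving.
fn ownership_demo() -> (u64, u64, usize) {
    let mut samples = vec![1, 2, 3]; // `samples` owns the heap allocation
    let before = total(&samples); // lent out, then returned
    double_all(&mut samples); // lent out mutably, then returned
    let after = total(&samples);
    let n = consume(samples); // ownership moves; memory is freed inside
    // total(&samples); // would not compile: use of moved value
    (before, after, n)
}
```

Uncommenting the last call to `total` is the quickest way to see the compiler's side of the bargain: it rejects the program because `samples` was moved into `consume`.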
And that’s amazing. And it’s really important for different people for different reasons. So there is a demographic, probably a big demographic, a lot of folks for whom the great thing about that is memory safety: I can’t overwrite anyone else’s memory, and that is really important. I’m not going to have bzero()s that have got the wrong size on them, memcpy()s that have the wrong size on them.
One of the things I love about Rust is that there are lots of different kinds of data corruption that Rust prevents. One thing is when you initialize a structure, you have to initialize all fields. And on the one hand, you are like, “Hey, Rust, get out of my grill. Why do I have to initialize all of these?” And then Rust should be like, “What’s your problem? Why don’t you want to initialize this? You’re the one with the problem. You initialize this. Give me a value. Give it to me as zero, put a zero there.” Fortunately, the Rust compiler doesn’t get that belligerent with you. And actually, what’s really funny is the Rust compiler has such a fearsome reputation, for somewhat legitimate reasons, just because when the borrow checker’s angry with you, it can feel like, “I don’t know what to do.”
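To make the structure-initialization point concrete, here is a small sketch. The `Config` type and its fields are hypothetical, chosen only for illustration.

```rust
/// Hypothetical configuration structure.
#[derive(Debug, Default, PartialEq)]
struct Config {
    retries: u32,
    timeout_ms: u64,
    verbose: bool,
}

/// Every field is named; leaving one out is a compile error (missing field).
fn explicit() -> Config {
    Config {
        retries: 3,
        timeout_ms: 500,
        verbose: false,
    }
}

/// "Give it to me as zero": the remaining fields are filled from `Default`,
/// but only because the code asks for that explicitly.
fn with_defaults() -> Config {
    Config {
        retries: 3,
        ..Default::default()
    }
}
```

There is no way to end up with a partially initialized `Config`: either every field is written out, or the defaulting is stated in the source.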
So as a result, the Rust compiler totally overcompensates. It is the friendliest compiler I’ve ever encountered, because Rust is like, “Oh, I don’t think you meant this. I think you might have meant this thing. And here’s exactly what it …” And it’s like, “Look. I’ve got all these ASCII colors where I’m showing you exactly where it is.” And you’re like, “Why are you being so helpful? This is great.” It’s like, “Just remember this later. Just remember my helpfulness at your darkest hour because your darkest hour is probably coming.” But I actually do. I love the Rust compiler. And the reason this is important is, for me, memory safety isn’t as big of an issue because – I don’t want it to sound arrogant – this is like, I can write correct C. And now I’ve gotten it the hard way. It’s not something that I’d recommend for everybody, but I can write correct C. And I’ve written a lot of correct C. I can write C that frees memory properly, that doesn’t overwrite, and that basically doesn’t suffer from memory corruption.
The arithmetic overflows are a lot harder, actually. Rust actually helps you. That one I’m not going to make the claim because those are super hard. But on memory safety, I can do that. But Rust is still valuable for me. I can do that because I’m controlling heaven and Earth in my software. But that makes it very hard to compose software, because even if you and I both know how to write memory safe C, it’s very hard for us to have an interface boundary where we can agree about who does what. So you know how to free memory properly, and I know how to free memory properly. Oh, but I didn’t realize that in that error condition, I was supposed to free the handle that you had given me, not you. It’s a miscommunication.
C is very, very hard to compose properly. And one of the things that we do a lot actually to compose C is we use what we call intrusive data structures, where you embed nodes of a data structure in your larger structure, which is great for C. It’s terrible for Rust. We’ll talk about that in a bit. But Rust allows you to actually compose, truly compose, because the borrow checker has ensured that the contracts for who owns what are implicitly abided by, and it’s amazing. It really is amazing because you can build these really sophisticated primitives.
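On the arithmetic overflow point, one way Rust helps is by making the overflow behavior an explicit choice at the call site. The sketch below uses the standard integer methods `checked_add`, `saturating_add`, and `wrapping_add`; the function names and the header/payload framing are assumptions for the example.

```rust
/// Hypothetical helper: total size of a header plus a payload.
/// Returns `None` instead of silently wrapping on overflow.
fn total_len(header: u32, payload: u32) -> Option<u32> {
    header.checked_add(payload)
}

/// Same sum, but pinned to `u32::MAX` when it would overflow.
fn clamped_len(header: u32, payload: u32) -> u32 {
    header.saturating_add(payload)
}

/// Same sum with explicit two's-complement wraparound.
fn wrapped_len(header: u32, payload: u32) -> u32 {
    header.wrapping_add(payload)
}
```

A plain `header + payload` panics on overflow in debug builds and wraps by default in release builds, so the explicit methods are the way to say which behavior is intended.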
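The "who frees the handle on the error path" miscommunication can be illustrated with a short sketch. Everything here (`Handle`, `register`, `release_counts`) is hypothetical; the counter exists only so the example can observe when the handle is released.

```rust
use std::cell::Cell;
use std::rc::Rc;

/// Hypothetical resource handle; the shared counter records releases.
struct Handle {
    released: Rc<Cell<u32>>,
}

impl Drop for Handle {
    fn drop(&mut self) {
        self.released.set(self.released.get() + 1);
    }
}

/// Taking `Handle` by value means this function owns it on every path,
/// including the error path, so the caller cannot also free it.
fn register(handle: Handle, fail: bool) -> Result<Handle, String> {
    if fail {
        // `handle` is dropped here, exactly once.
        return Err("registration failed".to_string());
    }
    Ok(handle) // ownership goes back to the caller
}

/// Returns the release count right after the call, and after the result
/// itself has been dropped.
fn release_counts(fail: bool) -> (u32, u32) {
    let released = Rc::new(Cell::new(0));
    let handle = Handle {
        released: Rc::clone(&released),
    };
    let result = register(handle, fail);
    let after_call = released.get();
    drop(result);
    (after_call, released.get())
}
```

The signature carries the contract: whichever side holds the `Handle` when it goes out of scope releases it, and no path releases it twice.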
Rust Performance (My Experience)?
And again, everyone’s mileage may vary. I had this blog entry comparing the performance of Rust and the performance of C for my application. And I had these eight paragraphs trying to explain how this is not trying to draw broader conclusions about the performance of C versus the performance of Rust. But of course, it’s the internet. The first comments are like, “I can’t believe this jerk said that Rust is always faster than C.” It’s like, “That’s not what I’m saying.” For the software that I wrote, I had a body of software that was in C. It was in C and Node, and I rewrote it in Rust. And when I rewrote it in Rust, well, I should say when I was doing that, I was almost inhibited by the fact that I was effectively porting my C because I tried to do some of the things the same way I had done them in C.
And in particular, in C, we have multiple ownership all the time. A doubly linked list is actually two linked lists. And a given object is actually owned by two things; your next thing and your previous thing. They both think they own you. Rust does not want to do that. I mean, you can do it. There was a great blog entry, a series of blog entries, on so you really want to write a doubly linked list in Rust. You really don’t want to do it. And what you want to do with Rust, and I think the whole Rust community has been very upfront about this bargain that you’re buying into with Rust, is rethink your problem and implement it in a way that makes your ownership clear. So if you convey your ownership clearly, Rust will give you very high performance code as a result. It’s like, “Okay, I can buy that.”
So I had done so many things. (We’re going to pick up the pace. Oh, that’s for questions — I’m not taking questions! False alarm. False sense of time.) But in my experience, so I went to go and implement this thing in Rust. I made all these compromises because I just got to the point where I got so beaten down. I mean, beaten that’s too violent of an expression. But I was bargaining. I was in the bargaining phase. I will do anything to get this working. I needed to get this working because I was trying to make this high performing and do what I’d done in C. I need to stop doing that. I just need to get this damn thing working.
So I got it working. And I’m like, “I don’t think it’s going to have anywhere near the performance of C,” because the way I did it was so dumb. I mean, I just did it the knuckle-headed way. In particular, I implemented a doubly-linked list effectively by having hash tables of UUIDs around my console. “You want to get your next one? You’re going to look up this UUID effectively this time so you can get to the next thing.” And I’m like, “This is terrible. This is going to perform abysmally.” And it outperformed my C. It’s like, outperformed my C? What the hell? And so I’m like, “All right, I got to take that apart. What actually happened?”
What actually happened, and it gets right to the composability of Rust, is that this particular code spends all of its time searching in a data structure, in a sorted data structure, which, in my C implementation was an AVL tree for a lot of good reasons. We use AVL trees all over the place because they’re super easy to use. You can embed an AVL node in your broader C structure. You can have a given C structure that’s on 15 different AVL trees at the same time by embedding different AVL nodes. We’ve got a library that gets it right, and so on and so forth. So there’s a lot of good reasons to do it.
But an AVL tree is good, but it’s not the best data structure to use. And in particular, Rust uses B trees. And B trees, yes, from databases: the sorted map in Rust’s standard library is a B tree. As it turns out, in this era, and the memory hierarchy being what it is, cache line sizes being what they are, a B tree is actually a better implementation. And the implementation for B trees in Rust is total rocket science. And there are some great blog entries describing the iteration on that, but it’s amazing stuff. So that accounts for a lot of the performance delta here, and this is runtime on the Y axis. And the rectangle is various data sets effectively for this particular application.
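A rough sketch of that "knuckle-headed" approach: a doubly linked list whose links are identifiers looked up in a hash table instead of pointers. This is not the code from the talk. It uses plain `u64` identifiers as a stand-in for UUIDs, and the `IdList` type and its methods are invented for illustration.

```rust
use std::collections::HashMap;

/// Stand-in for the UUIDs mentioned in the talk.
type Id = u64;

struct Node {
    value: String,
    prev: Option<Id>,
    next: Option<Id>,
}

/// The map is the single owner of every node; links are just keys.
#[derive(Default)]
struct IdList {
    nodes: HashMap<Id, Node>,
    head: Option<Id>,
    tail: Option<Id>,
    next_id: Id,
}

impl IdList {
    fn push_back(&mut self, value: &str) -> Id {
        let id = self.next_id;
        self.next_id += 1;
        let node = Node {
            value: value.to_string(),
            prev: self.tail,
            next: None,
        };
        match self.tail {
            Some(tail) => {
                if let Some(t) = self.nodes.get_mut(&tail) {
                    t.next = Some(id);
                }
            }
            None => self.head = Some(id),
        }
        self.nodes.insert(id, node);
        self.tail = Some(id);
        id
    }

    /// Unlinks a node by id, patching its neighbors through the map.
    fn remove(&mut self, id: Id) -> Option<String> {
        let node = self.nodes.remove(&id)?;
        match node.prev {
            Some(p) => {
                if let Some(n) = self.nodes.get_mut(&p) {
                    n.next = node.next;
                }
            }
            None => self.head = node.next,
        }
        match node.next {
            Some(nx) => {
                if let Some(n) = self.nodes.get_mut(&nx) {
                    n.prev = node.prev;
                }
            }
            None => self.tail = node.prev,
        }
        Some(node.value)
    }

    /// Walks the list front to back, one hash lookup per hop.
    fn values(&self) -> Vec<&str> {
        let mut out = Vec::new();
        let mut cur = self.head;
        while let Some(id) = cur {
            let node = &self.nodes[&id];
            out.push(node.value.as_str());
            cur = node.next;
        }
        out
    }
}
```

Every hop costs a hash lookup, which is why it looks like it should perform badly. The ownership story is trivial, though, and the borrow checker has nothing to object to.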
A lot of the win, on the one hand, was B trees. And it used to be like, “Wow, that’s B trees, not AVL trees.” It’s like, “Yeah, but the reason I could use a B tree and not an AVL tree is because of that composability of Rust.” A B tree in C is gnarly because a B tree is just … I’ve been talking about intrusive. It’s very intrusive. It’s very kind of up in everything’s underwear. And someone, of course on the internet, is like, “Well you should have used this B tree implementation.” You look at the B tree implementation, it’s like, “This is rocky.” And if there’s a memory corruption issue in here, I don’t want to have to go find it. So I would still use an AVL tree in C, even though I know I’m giving up some small amount of performance. But in Rust, I get to use a B tree, which is really amazing.
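For a sense of what "getting to use a B tree" looks like in practice, here is a small sketch using `std::collections::BTreeMap`. The keys, values, and function names are made up; the talk does not show its actual data structure.

```rust
use std::collections::BTreeMap;

/// Hypothetical sorted index, inserted out of order on purpose.
fn sample_index() -> BTreeMap<u64, &'static str> {
    let mut index = BTreeMap::new();
    index.insert(30, "thirty");
    index.insert(10, "ten");
    index.insert(20, "twenty");
    index
}

/// Finds the entry with the greatest key less than or equal to `at`,
/// the kind of sorted search an AVL tree would serve in C.
fn floor_entry(index: &BTreeMap<u64, &'static str>, at: u64) -> Option<(u64, &'static str)> {
    index.range(..=at).next_back().map(|(k, v)| (*k, *v))
}

/// Iteration is always in key order, regardless of insertion order.
fn keys_in_order(index: &BTreeMap<u64, &'static str>) -> Vec<u64> {
    index.keys().copied().collect()
}
```

None of the B tree's internals leak into the calling code, which is the composability point: the gnarly part lives in the library, behind an interface the compiler checks.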
Rust: Beyond Ownership
Now beyond ownership there are tons of features in Rust that I love. Algebraic types allow for amazing error handling. So algebraic types say this is a thing that can be this or that or this other thing. But it’s only one of these things. It’s basically a union, a C union, but with really first-class support. And the great thing about Rust is, if you return a result that is one of these things, you need to figure out which one of these things it is. Is this an error? Is it an actual thing that I can unwrap? And it’s beautiful; it really results in terrific error handling. I can go into more details. Not now, but you can find that online. There’s a terrific chapter in the book on error handling.
Hygienic macros. I so didn’t have hygienic macros in C that I kid you not, I did not know how to spell hygienic until I wrote a blog entry on this two months ago. I literally did not know how to spell hygienic as a C programmer. It’s like that extra “I”. What’s that doing there? But hygienic macros, macros that actually get at the abstract syntax tree, are a first class primitive in Rust, and it’s amazing. You can do crazy powerful stuff in macros, crazy powerful safe stuff in macros. The foreign function interface, which is so difficult with GC languages. With GC, a point I should have made earlier when we talked about GC languages, is part of the reason they are an ill fit for an operating system is because it’s very hard to have C code that lives beside GC code and can actually share objects across them.
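As a small illustration of algebraic types driving error handling, the sketch below defines an error enum and returns it through `Result`. The `PortError` type and the port-parsing scenario are hypothetical, picked only to keep the example short.

```rust
/// Hypothetical error type: a value is exactly one of these variants.
#[derive(Debug, PartialEq)]
enum PortError {
    Empty,
    NotANumber(String),
    OutOfRange(u32),
}

/// Returns either a valid port or a specific reason it was rejected.
fn parse_port(input: &str) -> Result<u16, PortError> {
    let trimmed = input.trim();
    if trimmed.is_empty() {
        return Err(PortError::Empty);
    }
    // `?` returns early with the mapped error if parsing fails.
    let n: u32 = trimmed
        .parse()
        .map_err(|_| PortError::NotANumber(trimmed.to_string()))?;
    if n > u32::from(u16::MAX) {
        return Err(PortError::OutOfRange(n));
    }
    Ok(n as u16)
}

/// The caller has to say which case it is handling; leaving a variant
/// out of the `match` is a compile error.
fn describe(input: &str) -> String {
    match parse_port(input) {
        Ok(port) => format!("port {port}"),
        Err(PortError::Empty) => "no port given".to_string(),
        Err(PortError::NotANumber(s)) => format!("not a number: {s}"),
        Err(PortError::OutOfRange(n)) => format!("{n} does not fit in 16 bits"),
    }
}
```

Unlike a C union with a hand-maintained tag, there is no way to read the `u16` out of a result that is actually an error.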
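Going back to hygienic macros, the following sketch shows what "hygienic" buys you with `macro_rules!`. The macro name and the variable names are invented for the example.

```rust
/// Hypothetical macro: squares its argument and adds a fixed offset.
macro_rules! square_plus_offset {
    ($e:expr) => {{
        let offset = 1; // local to the macro expansion
        let v = $e; // the argument expression is evaluated once
        v * v + offset
    }};
}

fn hygiene_demo() -> i32 {
    let offset = 10;
    // The `offset` inside the argument is the caller's (10), even though
    // the macro body declares its own `offset` before using the argument.
    square_plus_offset!(offset + 1)
}
```

With a textual macro in the C preprocessor style, the macro's own `offset` would capture the name in the argument and the result would be 5. Here the two `offset` bindings stay separate, so the result is 11 * 11 + 1.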
The foreign function interface in Rust is very well thought out, very full. And supports – I kind of invented this term, I’m sure there’s a better one out there – full duplex integration, in that you can take C and drop it into Rust or Rust and drop it into C. And that is really interesting. And there’s a lot of I’d say community interest in that. And one of the also neat things about Rust is that there is an unsafe keyword that you can use for unsafe operation. It’s not arbitrary operation. You can’t just like have the unsafe keyword, and then just go to town.
But so there are certain things though that get relaxed when you have this unsafe keyword, which allows you to overrule the safety guarantees just ever so slightly where you need to, though with obvious peril. But in particular, that allows you to build composable data structures that can be used for other things. And there are a ton of other great features. With Ashley and Steve in the room, it’s a terrific community throughout the room. I didn’t mean to imply that it’s only a terrific community with you here in the room. But the Rust community is actually really amazing. I got to the point where open source communities just kind of gave me a throbbing headache, because it can be really difficult to accommodate such diverse opinions and everything else. And I don’t know necessarily that it’s any less difficult in Rust. But I think that the values of Rust tend to drive conversation. It feels like there is a keel in the water. I mean, it’s still the internet. So, of course, there’s still bad behavior and everything else. But it feels so much more constructive because the values for Rust are so clear. So there’s a lot of value to it.
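Returning to the foreign function interface and the `unsafe` keyword, here is a minimal sketch of a function with a C calling convention and a safe Rust wrapper around it. The names `byte_sum` and `byte_sum_safe` are hypothetical. Actually exporting the symbol to C would also need a `no_mangle` attribute, which is left out here because its spelling depends on the Rust edition.

```rust
use std::slice;

/// C-callable shape: `uint32_t byte_sum(const uint8_t *ptr, size_t len)`.
///
/// # Safety
/// `ptr` must be null or point to `len` readable bytes.
pub unsafe extern "C" fn byte_sum(ptr: *const u8, len: usize) -> u32 {
    if ptr.is_null() {
        return 0;
    }
    // SAFETY: the caller promises `ptr` points to `len` readable bytes.
    let bytes = unsafe { slice::from_raw_parts(ptr, len) };
    bytes
        .iter()
        .map(|&b| u32::from(b))
        .fold(0u32, u32::wrapping_add)
}

/// Safe wrapper: a slice already guarantees a valid pointer and length,
/// so the unsafe contract is discharged in exactly one place.
pub fn byte_sum_safe(bytes: &[u8]) -> u32 {
    // SAFETY: `bytes` is a valid slice, so pointer and length agree.
    unsafe { byte_sum(bytes.as_ptr(), bytes.len()) }
}
```

The `unsafe` block does not turn off the compiler. It only permits a few extra operations, such as dereferencing a raw pointer, and it marks the spot where a human has to uphold the contract instead.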
Operating Systems in Rust?
So, finally, the main event. I would say first of all, if the history of OS development teaches us anything, it’s that the run-time characteristics of an operating system trump its development time. Nobody cares how long it took you to develop it, and that it was easy to develop or hard to develop. No one cares. They care if it works, and they care how fast it is. That’s all they care about. And when I say “they,” I mean “you.” That’s what you care about, you who do not develop OS kernels, and that’s what you should care about. That’s what we want. We want it to be fast, and we want to be correct. And if it was easy for them to implement, great. If it was hard for them to implement, well, okay, is it fast and correct? Yes, it’s fast and correct. Well, okay, that’s someone else’s problem. And that’s fine.
But we have to understand that those are the lodestars. So we can’t actually throw those things out, if we’re going to re-implement an OS in Rust. And we have to remember that structured languages replaced assembly because they performed as well. If they hadn’t performed as well, they would not have replaced it. And every operating system has assembly somewhere in it. And I do think that it’s important to realize that Rust does represent a new opportunity that we have not had in 30 years, more, which is a new programming language that actually is a candidate for this stuff.
So the first attempt that I’m aware of is Alex Light’s Reenix. This was a teaching operating system called Weenix that was re-implemented in Rust as an undergrad thesis. The undergrad thesis itself is a great read, where Alex goes into a bunch of the challenges that he ran into as he implemented this. The biggest challenge was that Rust then did not allow for any allocation failure. So all allocations effectively either work, or death. Why live? Let’s just all blow ourselves up, which is fine. That’s actually not necessarily a bad attitude, especially in user land, but in the operating system, we can’t do that. In the operating system, we need to be able to know if this allocation is not going to work, and if we are out of memory, I need to go do some work to free up memory. It’s like, “Well, how can an application possibly free up memory?” It’s like, “Well, this application is the operating system.” So we’ve got ways of freeing up memory. I will throw out dead processes until we’ve got memory freed up, which we don’t want to do. I mean, this is a definite point of divergence between Linux and other kernels. We actually don’t want to throw out dead processes. But there are lots of things that we can go do to free up memory in the system because we control all the memory in the system. So it’s very hard to run a modern operating system out of memory. We run SmartOS, and it’s very hard to run SmartOS out of memory.
But we have to be able to still potentially fail allocations in some isolated cases. So one of the things we did in the kernel a long time ago is we introduced sleeping allocations. Most allocations are sleeping allocations that can’t fail. Some allocations are no-sleep allocations, which means that they can fail, and the caller has to accommodate that failure. We need the ability to implement something like that. This is an area that was made much better with the addition of global allocators; it was like 1.27, 1.26, something like that.
But dealing with memory allocation failure is still very much an active issue, an unsettled area for Rust. I would check out the RFC number that you can check out there. These RFCs, if you do real Rust development, you’re going to end up in the RFCs pretty quickly as you go to understand because things are new enough that, even if you’re using things that are there, you’re going to inevitably come back to the design discussions. And those design discussions can be really interesting and actually uplifting.
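To show what an allocation that is allowed to fail can look like from the caller's side, here is a sketch using `Vec::try_reserve_exact` from the standard library. That API was stabilized after this talk was given, so it is not what the speaker had available; the function name `try_alloc_buffer` is hypothetical.

```rust
use std::collections::TryReserveError;

/// "No-sleep" style allocation: report failure to the caller instead of
/// aborting the whole program.
fn try_alloc_buffer(len: usize) -> Result<Vec<u8>, TryReserveError> {
    let mut buf = Vec::new();
    // Returns an error if the capacity cannot be reserved.
    buf.try_reserve_exact(len)?;
    // Capacity is already reserved, so this does not reallocate.
    buf.resize(len, 0);
    Ok(buf)
}
```

A caller can then decide what to do on failure, for example by releasing caches and retrying, which is the kind of recovery an operating system needs and a plain `Vec::with_capacity` does not offer.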
Since 2015, there has been this explosion of little operating systems in Rust. If I forgot yours, I’m really sorry. I tried to get them all. I tried to get the ones that still compile. So you got Redox, which is kind of the big one, and Tifflin, and Tock, and intermezzOS, and RustOS, and QuiltOS, Rux. And then Philipp Oppermann’s Blog OS. Philipp Oppermann, please give this thing a name. Call it whatever you want. You get to name something. It’s important enough to name it. I don’t want to have to call it Philipp Oppermann’s Blog OS. So just give it a different name. It’s a great thing you’re doing, right? Please, give it a different name. BlogOS I guess is what it is.
So some of these are teaching systems. IntermezzOS, BlogOS. Some are uni-kernels, like QuiltOS, that only are going to allow for Rust programs, so they’re kind of limited in that regard. Some are targeted at IoT, like Tock. The attribute that all of these systems have in common is they are all de novo. They are all written from scratch, which is great on the one hand, because Rust allows us to explore these new ideas and build these operating systems. And all these efforts should be encouraged, and especially the teaching ones. All terrific. It’s great.
Two buts on that. One, is it means that you’re forsaking Linux binary compatibility. And that just means that a lot of the world’s software is not going to work, which is fine. I mean, I honor that. I’ve experimented with that approach for many years. For me, it doesn’t work, because I like having the world’s software work. And I say that as someone who works on a UNIX operating system that’s very, very close to Linux. I’m serious, but like, no, no. So we’ve done all sorts of things to allow for Linux binary compatibility. Justin talked about some of those in his talk. That Linux binary compatibility has a really long tail. It’s really brutal. It’s very hard to do. You end up in a lot of really ugly gray areas, and again, I’m referring to Adin’s talk for some of the gray areas that they ran into, and we’ve run into a bunch.
Operating Systems in Rust: the Challenges
So you’ve got all those problems, which you don’t want to have to resolve. You don’t want to have to do vfork and SIGCHLD semantics. Trust me, vfork semantics is terrible. SIGCHLD semantics is terrible. vfork plus SIGCHLD semantics is unspeakable, a semantic Superfund site that no one should have to deal with. And that also means that they are all fighting Second System Syndrome. They are fighting. Like, “Well, as long as the train is leaving, let’s back up the truck. And let’s do a new file system. Let’s do a new way of thinking about devices.” And that can be really productive and interesting, but that can also be paralyzing. So that’s a challenge.
It’s also a challenge to say what the advantage is of rewriting things that are actually working code in Rust. And I don’t know, because the safety argument just doesn’t carry as much weight for kernel developers, not because the safety argument isn’t really, really important. It’s just because it’s safe, because when it’s not safe, it blows up, and everyone gets really upset. We figure out why. We fix it. And we develop a lot of great tooling to not have these problems. So it has to have more of a grab than just the safety. The performance is great. There are things about it that are great. But this is working code that we want to replace. Why do we want to replace this working code, especially with respect to the kernel, which has multiply owned data structures everywhere? Everything is on many lists. I mean, there are places where you’ve got these nice hierarchies of objects, but most things don’t actually have that. And, yes, some of those are semantic sewers and they should be re-implemented. There are some endemic problems, and you’ll get thread dispatch and so on.
And then, we’re just going to work around this. It’s unsafe. Okay, but every time you use an unsafe in your Rust operating system, it’s like you’re losing a little bit of the advantage of Rust, but whatever. So I think an OS kernel, despite the historic appeal and the superficial fit, I don’t know if that’s worth it. I don’t know if that’s the right first approach. I mean, again, these other operating systems should be encouraged. I think it’s great. But I think for the OSs that we’re going to use for the rest of our lives, I’m not sure that that’s the right first approach. Maybe we’ll get there eventually. I don’t want to rewrite ZFS in Rust. That’s what it boils down to. I wake up in a cold sweat when I think about rewriting ZFS in Rust. Well, go look at the ZIO pipeline, and tell me you want to rewrite that thing in Rust. It’s just like, “Oh, no, thank you.” I mean, the Rust compiler would slay me. It would win, and I would lose.
But let’s look at hybrid approaches because I think a hybrid approach is really the right approach here. One is to have Rust in-kernel components. So I love the fact that Rust can interoperate with C. So one hybrid approach is that you retain your existing C-/assembly-based kernel, the way we have had for many years. And then you allow for Rust-based things to be developed, Rust-based drivers, Rust-based file systems, Rust-based in-kernel software. One of the points that Justin made in his talk was that we want to have more in-kernel software. Well, this is the safe way of doing it.
Now maybe the safety argument is a big win because now it’s like, “Hey, more people can safely develop the kernel code.” Yes, fine. Existing kernel developers are not going to rewrite the kernel, but having a safe Rust sandbox inside the kernel, that might be interesting. It would allow for an incremental approach, which I think, let’s take that UNIX B/C approach. And it allows Rust to be used for new development, where you’re not trying to port existing C structure and being burdened by the C implementation. There’s a prototype of this in FreeBSD, others are possible. I think we definitely want to do this in SmartOS at some point.
Hybrid approach number two, Rust OS components. An operating system is not just a kernel. There’s a lot of other stuff out there. There are utilities, daemons, service manager, device manager, fault management, the debuggers, etc. Yes, systemd is a part of your operating system, whether you acknowledge it or not. You’ve got these things that are part of the operating system. These things are great candidates. If you were writing systemd now, no question that it’s in Rust. Speaking for us, if we were writing SMF now, no question in my mind that that’s in Rust.
And in part because, by the way, let me tell you something that most kernel developers won’t tell you. It’s harder to write user level software. It’s harder to write user level software because what can go wrong? You know a problem that I never had in the kernel? A configuration file that someone deleted. That’s just not the way we roll in the kernel. We don’t have that. The configuration file is the kernel. If you deleted me, I’m not here. It’s not my problem. You can’t boot anything. I got lots of other problems. I got to boot this thing, and then I got Intel introducing vulnerabilities, and I got lots of other problems. I have plenty of problems. But I don’t have distributed systems problems. I don’t have these kind of operational failures. There are lots of failures that I don’t have. And handling those kinds of failures is something that Rust is really good at. So that might be a really good candidate for Rust. And especially as the OS expands beyond just an operating system, and consists of an entire cloud control plane.
And then the third approach, one that I’m excited about at the moment, is you know, there is another operating system out there, an operating system that hasn’t advanced technologically as much or at all, that’s basically still running glorified DOS. But it’s running all of humanity on top of it. And that is this absolute sewer of unobservable system critical, mission critical, software called firmware. And firmware has lots of problems. As I’m fond of saying, it is humanity versus firmware. And everyone needs to pick a side. And I think that if you look at some of the challenges of firmware, it’s like, “Oh, man, that’s a good fit for Rust.” Yes, there are some of the embedded challenges that we have, but those seem much more circumventable. I don’t think you’re going to have the multiple ownership disasters that you have in a kernel. You don’t have the same expectations around binary compatibility. You don’t have to go re-implement ZFS. I mean, it’s a smaller kind of a thing.
And so when I look at OpenBMC, I view OpenBMC as on its knees with tears streaming down its face begging for someone to please rewrite me in Rust, because if we’re going to have a BMC that’s going to hang out a socket over the internet, God forbid, or even over the network, I want that thing to be in Rust. I don’t want that to be in C++. And there’s a lot of stuff like that. So I think that is where I want to see a world. Right now, firmware is the proprietary software that we’re all forced to run. I want to see a world where we get to open firmware, and I want to see a lot of that running in Rust because it’s a much, much, much better foundation for humanity. We’ll do humanity a huge favor.
Looking Forward: Systems Software in Rust
So looking forward. I mean, I’m very bullish on Rust. I’m very excited about Rust. It’s something we really haven’t seen in a long time. It’s a true alternative. We can use it deep in the stack. I think we can use it in the firmware. I think we can use it in operating systems. I just think that practically it’s going to be really, really hard. But I think we can definitely use it in the way we extend the kernel. And the beauty of Rust is that it allows these hybrid approaches. The beauty of Rust is its ability to interoperate, to cooperate, with native systems or C-based systems, and that allows for vistas to open up. And for us in system software, it is a very, very exciting time.