40. Building a database for internet-scale applications (Circa 1998)

Jon Christensen and Rich Staats learn about Chris Hickman’s first venture-backed startup (circa 1998) and its goal to build a database for Internet-scale applications. His story highlights what software is all about – history repeating itself because technology/software is meant to solve problems via new tools, techniques, and bigger challenges at bigger scales.

Some of the highlights of the show include:

  • Why Chris left Microsoft and how much it cost him; yet, he has no regrets
  • Chris’s concept addressed how to build a scalable database layer; how to partition, shard, and cluster; and how to make it highly available with a completely scale-out architecture
  • Chris couldn’t use the code he had created for it while at Microsoft; but from that, he learned what he wouldn’t do again
  • Chris let the file system be the database at Microsoft, and the project was named Internet File Store (IFS); it used backend code and was similar to S3
  • Chris named his startup Viathan; had to do copyright, trademark, and domain name searches
  • Data for the Microsoft project could be stored in files/XML documents; Viathan took a different approach and used relational databases instead of a file system
  • Companies experienced problems at the beginning of the Internet; rest of ecosystem wasn’t developed and there weren’t enough people needing Internet solutions yet
  • Viathan went through several iterations that led to patents being issued and later cited as prior art
  • Viathan’s technology couldn’t just be plugged in and turned on, applications had to be modified – a tough sell
  • Chris did groundbreaking work for what would become DynamoDB

Links and Resources

AWS

DynamoDB

AWS re:Invent 2018 – Keynote with Werner Vogels

re:Invent

DeepRacer

JSON

Moby Dick

MongoDB Acid Compliance

Prior Art

Kelsus

Secret Stache Media

Rich: In episode 40 of Mobycast, we learn about Chris’s first venture-backed startup circa 1998 and their goal to build a database for internet-scale applications. Welcome to Mobycast, a weekly conversation about containerization, Docker, and modern software deployment. Let’s jump right in.

Jon: Welcome, Chris and Rich. Episode number 40 here.

Chris: Hey Rich, hey Jon.

Rich: What’s up guys.

Jon: Great to be doing another Mobycast. So last week, we left ourselves on a bit of a cliffhanger. Chris was about to take the dangerous step of becoming an entrepreneur, and then right before he told us what that was going to be like, we pulled the plug on the Mobycast and said let’s wait a week. So here we are. Definitely go listen to last week’s episode, but we’re talking about DynamoDB and AWS and Werner Vogels’s keynote, and how that sent Chris on a bit of a trip down memory lane, and just how history kind of repeats itself. I think the point of the story is really that this is what software is all about. It’s these cycles and history repeating itself, and essentially solving the core problems that technology or software is meant to solve, but with new tools, new techniques, and new, bigger challenges at bigger scales.

Last week, we talked about Werner Vogels’s keynote and why Amazon decided to shift away from using a relational database for a lot of the querying and data that they need for their Amazon.com website, and Chris told the same story, but from six years prior to Amazon having this problem, via Microsoft and MSN.

It kind of ended on a sad note where Microsoft killed Chris’s project, his pet project where he was building an internet database, kind of from a lack of vision, I would call it, because clearly Chris was solving a really important problem that eventually did get solved. But before it gets solved, Chris is going to go out on his own and found a company around it. So Chris, take it away from there. I think it’s March 11, 1999.

Chris: Yeah, it’s a little bit before then. I left Microsoft at the end of ’98 to do this. When I left Microsoft, Marco and I were working with patent lawyers at Microsoft to draft patents related to this work that we did on how you build an extensible storage layer for internet-style data. So there were several sit-down discussions with, I believe, some of the legal team there at Microsoft, the patent lawyers, to put that stuff together. But I couldn’t wait. I wanted to go solve this problem now that the project had been basically terminated.

So I left Microsoft at the end of ’98. Those patents still needed the i’s dotted and the t’s crossed. I think there was like one subsequent meeting that Marco stayed at Microsoft for […] after that. And so the patents were officially filed, I think, in March of ’99, even though I had left at the end of ’98.

Jon: You know what’s sort of funny is that that same year I was working on a thesis for my computer science degree where I was trying to see if I could get some kind of neural network into a robot car. I didn’t get the neural network in, I did get some computing into the robot car but it’s sort of funny because this year also at re:Invent during one of the keynotes DeepRacer was announced which guess what, it’s a neural network in a robot car. We were both working on things that eventually AWS would announce in 2019, we’re talking about in 2018 re:Invent.

Chris: Yes, the cycle continues.

Jon: Yes.

Chris: So I left and it was kind of weird because looking back like I had—I haven’t even been there quite three years yet. So I was less than half vested in all my options. This was a peak time for Microsoft stock as well. In the less than three years that I’ve been there, the stock has split three times. It was crazy.

Jon: That’s ridiculous.

Chris: Every quarter when we had the earnings call, it just blew it out the door and it was a Wall Street darling and the stock, it literally went from, I think my original strike price was something like $10, it ended up being like $10 a share and by the time I left, it was $100 a share.

Jon: Fantastic.

Chris: Yeah, but I walked away from over half that. It was without hesitation. I was either ballsy or stupid. But I have zero regrets. So I left Microsoft at the end of ’98 and teamed up with a former Microsoft colleague, Steven Anderson. We’d actually been officemates for a while there, and he had left Microsoft about six months prior to me. He went to go work for a software development company as their CTO and help them grow their business.

We got to talking and I told him about what was on my mind, wanting to go deal with this problem and build solutions for it. He was excited about that as well, so we decided, “Hey, let’s go do a company. Let’s go make this happen.” So 1999, right, this is the height of the first internet bubble. It is just so frothy. Listeners out there may remember things like pets.com, and may remember things like Starbucks investing like $400 million in furniture.com, or maybe it’s living.com. It was an online furniture website where obviously there were issues like logistics: how do you ship the stuff and not end up eating all your margin with stuff like that.

It was a really interesting time, and the net is that literally within six weeks of me leaving Microsoft, Steven and myself, we were in the private back room of one of the high-end steakhouses in downtown Seattle, and it’s this wood-paneled room with this massive wood table. We’re all sitting around having lunch. There’s four or five venture capitalists along with Steven, myself and another individual. One of the VCs was like, “Yes, we would like to invest in you,” and so there’s a term sheet, it’s face down on the table, and he slides it over to Steven.

I’m just thinking, this is so unreal. This is like straight out of a movie. Steven turns it over and it’s like, “Here’s our seed round term sheet for $700,000.” So I’m looking at this and I’m like, oh my gosh, we’re going to get $700,000. We have no assets, it’s just basically Steven and myself, it’s an idea, it’s a pitch. We haven’t generated a document yet, we don’t even have an executive summary typed out. It is literally just a pitch and here it is, we’re going to get our shot. We have capital, immediately, right out of the gate, to go start executing on this.

It was pretty exciting. We were off to the races and we were just on a mission at that point. Just hit the ground running. The whole concept was: how do you build this scalable database layer, how do you do things like partitioning and sharding and clustering, and how do you make it highly available? How do you make it so that it’s a completely scale-out architecture?

Jon: At that point you’d kind of already done a lot of work on it for Microsoft. Obviously, you couldn’t use that code anymore, but you’d probably already learned some of the things that you wouldn’t want to do again. Like, “Oh yeah, if I were to write that again, I’d do it a little differently.”

Chris: Absolutely. The work at Microsoft, some of the core concepts, this is building on some of those core concepts, but the actual implementation was different. It was rotated at least 90 degrees, if you will. The work we were doing at Microsoft used a different persistence mechanism. We were primarily thinking about file-based storage, so instead of a database, the file system was the database. It was dealing with things like XML, which was the document format back then, until JSON came along. We actually had XML. There were differences and there were some different goals, but absolutely the work at the startup company was…

Jon: What was the name of the startup company?

Chris: So the name of the startup company was Viathan. Naming things is one of the hardest things in the world to do. When we started a company it was, now we have to come up with a name for it. You have to do things like copyright searches, trademark searches, you have to do things like domain name searches.

Jon: Well the luxury of doing a domain name search in 1999.

Chris: Actually, yeah. There were infinitely more choices then; however, it was still difficult. It felt like all the good ones were taken. It was very hard to come up with something. It kind of felt like the couple in the hospital that has a baby, and it’s like two days after the birth and they still don’t know what to name it, and they’re told, “Well, you need to fill out this paperwork for the birth certificate, so pick a name.” And so Steven and I, we were in an elevator riding up to our lawyer’s offices to drop off the paperwork for incorporation. I kind of gave him these ideas: “Well, what about Viathan?” Because the code name for what we were building, we were calling it Leviathan, because we’re like the monster, very big, lurking underneath the water.

Jon: Literally from Moby Dick. Here we are at Mobycast.

Chris: Everything is related, people. And so I said, “Let’s just chop off the ‘Le.’ Viathan, we’ll make up a word.” And Steve was like, “I love it. It sounds great.” We ran it by our lawyer and he’s like, “I love it. It sounds big, it sounds forceful, it sounds strong.” So that’s how we named the company. So Viathan was the name and off we went. We were building: how do you store this internet data that’s not really relational, how do you deal with things like partitioning and sharding, how do you cluster it? What are all the components that you need for that, and how do you go build that out?

We spent the next two years doing exactly that. Lots of lessons learned over the process. We got a lot of things right and we got a few things really wrong. At the end of the day, that’s what kind of hurt us. I think the net there was that this was still early days, and so even though companies like Microsoft at that kind of scale had this problem, the rest of the ecosystem wasn’t there and the internet hadn’t grown to the size where there were enough people with that pain who needed a solution today. They knew they were going to need it a few years down the road, but not necessarily today.

Jon: I have a couple of clarifying questions though. You had said at one point you were talking about file system, you have files on the file system storage and XML. I just wasn’t clear when you said that whether that was Viathan that was using XML and files on the file system or whether that was Microsoft.

Chris: So what we were doing at Microsoft, Microsoft was like, let the file system be the database. The project name at Microsoft, we actually called it IFS, which was short for Internet File Store. Basically, it was backend code that they were using. In a way, you can think of what we were building as really similar to S3. It’s just super similar to S3, if not almost again verbatim. So that was the work at Microsoft.

Jon: You’re building a data lake, Chris.

Chris: Yeah, I mean, essentially, because all this data could be stored in files and those files were actually XML documents. That was the whole gist behind that. With Viathan, we took a different tack and we said, there are some good things about relational databases: they are highly performant, and they are built to support ACID. So let’s leverage the best of them and design around the negatives. With Viathan, we actually used a relational database as the persistent store instead of a file system.

But we were still sticking with this concept that you don’t have these multi-table relationships; they really are these bits and blobs of data that are independent. So build in support for the concepts of sharding, partitioning, and clustering, but use a relational database as the actual persistence store.
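
To make that concrete, here is a minimal Python sketch of the general idea: independent documents, routed by key to one of several relational databases, with each write wrapped in a transaction. It is an illustration only, not Viathan's actual design, which the episode does not describe in detail. The class name `ShardedDocumentStore`, the `docs` table, the hash-based routing, and the use of in-memory SQLite are all assumptions made for the example.

```python
import hashlib
import json
import sqlite3


class ShardedDocumentStore:
    """Hypothetical key/document store spread across several relational databases."""

    def __init__(self, shard_count=4):
        # Each shard is an independent in-memory SQLite database (an assumption
        # for illustration; any relational database could play this role).
        self.shards = [sqlite3.connect(":memory:") for _ in range(shard_count)]
        for conn in self.shards:
            conn.execute(
                "CREATE TABLE docs (key TEXT PRIMARY KEY, body TEXT NOT NULL)"
            )

    def shard_index(self, key):
        # Stable hash so the same key always routes to the same shard.
        digest = hashlib.sha256(key.encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "big") % len(self.shards)

    def put(self, key, document):
        conn = self.shards[self.shard_index(key)]
        # "with conn" makes the write transactional:
        # commit on success, rollback if an exception is raised.
        with conn:
            conn.execute(
                "INSERT OR REPLACE INTO docs (key, body) VALUES (?, ?)",
                (key, json.dumps(document)),
            )

    def get(self, key):
        conn = self.shards[self.shard_index(key)]
        row = conn.execute(
            "SELECT body FROM docs WHERE key = ?", (key,)
        ).fetchone()
        return json.loads(row[0]) if row else None


if __name__ == "__main__":
    store = ShardedDocumentStore(shard_count=4)
    store.put("user:42", {"name": "Ada"})
    print(store.get("user:42"))
```

Because no document depends on rows in another table or another shard, each lookup touches exactly one database. A real system would also need replication for availability and a way to move keys when the shard count changes, both of which this sketch leaves out.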

Jon: I just kind of wanted to dig around in that idea a little bit. I’m just imagining that one of the things about doing that is, a database sort of writes everything to a set of files that it keeps open. When you use a file system directly, you maybe have to go open and close files, and that takes a little time, it’s a little slow. Whereas if you have the file handles already, they’re just sitting there ready for you to write to or read from the file, and everything gets faster. So was that part of what you were thinking, like, we can make this faster by using something like a database that’s just as good at, you know, pulling information in and out of files that it already has access to?

Chris: Yeah, it was multi…

Jon: Yeah, that and you get the ACID transaction stuff for free.

Chris: Yeah, so things like locking and transactions, and then also just going to market with something like this that was built on top of these industrial-strength systems, that is a net win as opposed to saying we’re building something completely from scratch based upon the file system.

Jon: That makes sense. It’s interesting that you’d kind of built a document database on top of relational databases so you get the ACID compliance, and that this year, 2018, Mongo announces ACID compliance.

Chris: Yes and DynamoDB now supports transactions.

Jon: Craziness.

Chris: Yup, indeed. It’s interesting how far we got with Viathan. We did get rather far with it. We went through several iterations, and that work led to a few patents being issued as well.

Jon: Before you talk about those patents, another thing I’ve never asked you about is, did you have any kind of significant customers doing any important real work ever? It’s okay if you say no, I mean you could just have been too far ahead of the market.

Chris: Or I could just lie, right?

Jon: Yeah.

Chris: We had IBM, we had NASA, we had Bank of America, no.

Jon: Shoot, you had me going. I was like, that’s freaking cool, that’s great.

Chris: So again, this kind of gets into some of the trials and tribulations and some of the big lessons learned that we had. I kind of alluded to the fact that the ecosystem wasn’t ready for it. At that point in time, you didn’t really have much. There wasn’t much in the way of dynamic programming languages. I mean almost everything was compiled, it was basically C and Java were your options. You were writing those as native applications, if it was web based code, you were doing ISAPI extensions which is basically a plug in model for Microsoft’s internet server.

Your INC code tends to go into that and your C code was using SDKs. This whole concept of API-driven development and RESTful APIs really wasn’t a thing at that time. You were just writing code the way that you always had, where you’d just link in an SDK and call that API. It was doing its own marshaling and unmarshaling work and whatnot. So in order for someone to adopt this technology that we were building, you couldn’t just plug it in and turn it on and have it work. Whatever applications they had that were storing and retrieving data, those applications had to be modified. They had to write to our SDK so they could actually leverage the data layer that we had built. That was a big source of friction.

Jon: I can’t imagine there being an entity Java bean for Viathan database.

Chris: Yeah, J2EE didn’t even exist. The enterprise edition of Java didn’t even exist at that point. This is 2000 or 2001.

Jon: I don’t know, J2EE may have been there by then. You were too far up in the northwest, too close to Microsoft to hear about it.

Chris: Probably. We did have this issue with one like, we’re asking you to replace your database with us. That’s a huge thing and then two, you have to go and rewrite your application with our SDKs and APIs. There’s definitely some work there to be done. We spent a lot of time, I mean I personally spent a lot of my time on planes going and visiting folks down in the valley, in Silicon Valley. Working on just that of getting like basically beta customers. It was a tough sell. We went off to a bunch of super interesting companies that were doing, like they were at the top.

We did go down and talked to IBM and some of their engineers. I was talking with EMC quite a bit. We went down and talked with LoudCloud and for people who think this is going in the way back machine, LoudCloud was what Marc Andreessen did after he left Netscape. So it was all about data center, basically giving you data center management software. We were trying to get into LoudCloud and said like, “This needs to be part of your offering to folks that are trying to run internet scale businesses in the software that they need to do it.”

Then the other thing that we ran into is like we didn’t get a lot of time to do this. So from kind of like that point in time of being in the back room of the Metropolitan Grill, steak lunch, term sheet for $700,000 to like, “Oh no, this thing just died.” that was about two and a half years. We built a lot in the first 18 months or 24 months. but it still wasn’t, I mean we weren’t at version 1.0 yet, this was very mission critical, very difficult software to build, lots of components to it, lots of things to deal with and sometimes the last 20%, like this is definitely the case.

The last 20% is 80% of the work, or it almost felt like it. We ran into timing issues and politics and expectations between folks that we brought in to help round out the management team who weren’t comfortable in the space. Investors were managing their expectations, and at the end of the day, we ran out of time. And so we weren’t able to really bring to market all the technology and the IP that we created; we didn’t get the running room.

Jon: You did end up getting a little bit more cash at some point, right?

Chris: Sure. We could sit here and probably spend hours going through that whole thing, because it was a wild ride. We did go on to raise more money. At the end we raised $24 million total and we grew from zero to 60 people.

Jon: Yeah, I just wanted to give you the opportunity to say that without having to bring it up yourself.

Chris: Yeah, we hit the ground running. We went from zero miles an hour to 150 miles an hour in a matter, it felt like weeks and we stayed at that rate for the whole duration and so much drama and intrigue. So much great engineering was done and technology was created, a couple of patents came out of this.

Jon: Yup, it’s like with the patents.

Chris: Sure. Actually, I just went and looked at this the other day because I was just kind of interested, especially after Werner described the trials and tribulations that he had which led to the creation of DynamoDB. So I went back to the primary patent that resulted from the work that we did at Viathan, and the title of the patent is Internet Database System. I thought that was kind of interesting because when Werner was going through his slides, he described DynamoDB as the… I’m trying to find the notes here for how we actually…

Jon: Extensible storage system? No.

Chris: No, it was. I will come back to this. It was one of those things where it’s like, it literally is the exact same terminology, it’s just…

Jon: […] for internet scale application?

Chris: Yes, DynamoDB is like, this was the database for internet-scale applications, and it’s like the title of our patent; it was Internet Database System. The one thing that stuck out to me is that there are 199 other patents that now cite this as prior art. Which is pretty good for a patent, to have that many other patents reference it. That’s actually a pretty high number. A lot of patents will have a handful, or 20, 30, 40. So to have 199, that’s quite a few. That was really kind of cool and interesting to see, that so much follow-on work was citing this as prior art and building on top of it.

Even more interesting was to see that of those other patents that cite this as prior art, 61 belong to Amazon. So, what is that, that’s just about a third of all of the other patents that cite this belong to Amazon. If you look at the titles of those Amazon patents, I mean, they are the components of DynamoDB. They have to do with partitioning and sharding and request routing. It’s just super interesting.

Jon: I just realized though that I’m confused by that, because I guess I don’t know the patent system well enough. When you reference something as prior art, it feels like if you’re building on top of an existing patent, shouldn’t you be paying the patent holder a license fee, or at least getting in touch with the patent holder and saying, “Can I use this?”

Chris: If you are actually using it verbatim, then yeah, you need to license that. The whole idea behind the patent system is basically, you’re sharing your intellectual property so that others may learn from it and extend it, and in return, you get the assurance that basically no one else can steal your idea verbatim. Derivative work is something different. So all these other patents that cite this as prior art, that’s derivative work. It has to be different enough so that it’s not duplicating, it’s not stealing, it’s not copying what was done, but it needs to be an extension of this.

So just like, you know, there are patents that were done at Microsoft for that Internet File Store project, and there were some similarities to it. The work that was done at Viathan, it’s derivative work; it’s not copying that, but it’s taking some of those principles and taking them to the next level. It’s a new invention. That’s what’s going on here. This is all part of the patent process too when you file a patent. You need to go and do a patent search to see, okay, what are similar patents. This helps the patent examiners as well. They’re going to do it when they examine it, to go look into the patent database to see what other patents are similar to it in the same categories and whatnot, and make sure that there’s not overlap.

You will have this as well, if there is an overlap and the patent examiner has issues or questions with it, then there will be back and forth. You may have to either explain your way out of it or actually change your claims. Because it’s the claims of the patent that dictate what it is that you get to keep as yours and what can be enforced.

Jon: With this little side journey on patents, I think the important thing to take away is kind of what we’re going to get into over the next couple of episodes. Basically, Chris, you literally did a lot of the groundbreaking work for what would become DynamoDB, and that’s what we’re going to talk about over the next couple of episodes. We’re going to really look under the hood of it and see what it’s all about, and I think we’re going to base most of that material on a talk that you attended at re:Invent.

Chris: Yeah. It’s super exciting because at re:Invent they did have a deep dive on the implementation of DynamoDB. So something that Amazon AWS has never really shared before: what is the architecture, and how does DynamoDB actually work under the covers? So I was super interested to see what they did there and how they architected it, and again, it was one of those things like, wow, this is total déjà vu. This is exactly what we did at Viathan, or a lot of the exact same problems and some of the exact same solutions. So it’s very interesting. So I look forward to diving more into that.

Jon: Great. Thank you for putting this together Rich and thanks for just fantastic stories over the last couple of episodes Chris and we’ll see you next week.

Chris: Thanks guys.

Rich: See you guys.

Chris: Bye.

Rich: Well dear listener, you made it to the end. We appreciate your time and invite you to continue the conversation with us online. This episode, along with the show notes and other valuable resources is available at mobycast.fm/40. If you have any questions or additional insights, we encourage you to leave us a comment there. Thank you and we’ll see you again next week.
