Intro to Rush.js with co-author Pete Gonzalez
Monorepos are the new muse of library maintainers, but what happens when your project grows past 100 packages in the same repo? What about thousands? Rush.js was created for those cases, and Pete—who started the project while working at Microsoft—is here to tell us about it.
Pete Gonzalez | @octogonz
During the day, Pete works at HBO in Seattle on their streaming media apps. Prior to that, he was at Microsoft for 9 years, and before that, he worked at various consulting companies. A long time ago he was a cofounder of Ratloop, a small company that makes video games.
01:24 - Rush.js: What is it and what is it for?
04:47 - Problems with Managing Large Codebases
- Rush Stack: provides reusable tech for running large scale monorepos for the web
07:22 - How does Rush provide a solution for build orchestration?
13:34 - Rush Stack Opinion: How to Lint, Bundle, etc.
16:53 - Using Rush Stack: Getting Started
24:27 - Getting Technical About Versions
- Phantom Dependencies
- Peer Dependencies
32:47 - Thoughts on Monorepos
36:30 - Getting Started (Cont’d) + Efficient TypeScript Compilation
43:28 - Does Rush have a size limit? Is it for bigger or smaller projects? Both?
44:34 - Using pieces of Rush in non-Rush projects?
CHARLES: Hello and welcome to The Frontside Podcast, a place where we talk about user interfaces and everything that you need to know to build them right. My name is Charles Lowell, a developer at The Frontside.
It's been a really, really long time since we've gone to talk to everybody. We've been heads down working on projects, working on some products, working on a lot of different things that haven't given us the time to do much podcasting. However, one of the things that we've been doing lately is converting all of our projects over to using monorepos. And so, we've been very interested in the tooling that comes with managing more complex code bases that might contain more than a single application or more than a single package. And so, we have been evaluating these different tools lately. And one of them that came to the fore was Rush.js. So today, on the topic of this tool in particular, but I think also just in terms of managing larger and more complex code bases, we have Pete Gonzalez on the podcast to talk with us. Hey, Pete.
CHARLES: And today, also, we're going to be talking with Shane Wilson, who is the author of repkgs and a trusted voice here at Frontside, certainly when it comes to build tools. Hey, Shane.
CHARLES: So, yeah. Why don't we just dive right into it and talk about what was the particular set of circumstances, the particular set of problems that gave rise to Rush. What is it and what is it for?
PETE: That's a good way to think about it. It's very much characterized by its beginnings. So, it started, I want to say, around 2015 when I was working on Microsoft SharePoint and we basically -- I won't go into too much detail, but we had a whole bunch of Legacy C# server code and the business has decided they wanted to move it into the browser as client code. So, lots of teams with this huge code base just started rewriting C# code as TypeScript. And the model, at the time at least, was you have npm packages that are your units of components and then each npm package goes in its own git repo.
So we just had a bunch of developers, who had previously actually been working in a monorepo, go and start creating repo after repo and then publishing packages and sharing them with each other. And we ran into a lot of problems with that. Today people kind of understand the value of monorepos for TypeScript and Node.js projects. But at the time, there was a lot of assumption that, well, all the npm packages that you use off the Internet, they're all kind of developed that way. But we ran into problems where somebody would make a change and then somebody downstream would get broken or things wouldn't upgrade. And we eventually realized that we needed to bring the projects together, at least for the team that I was working on. And at the time, there wasn't any Yarn or Lerna, these projects that people are familiar with today. We really didn't have any options. In fact, npm itself was like a new thing that was displacing Bower. So, a lot of stuff that we take for granted today in the Node.js community, at that time, at least within the group that I was working in, nobody was very familiar with.
So, we basically started with just a small tool that would let you cross-link project folders together and then it would work out which project needs to be built first. So, you build your libraries and you build the application. And so, it was a basic build orchestrator.
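As an illustration of the "work out which project needs to be built first" step, here is a minimal sketch of dependency ordering using a depth-first topological sort. It is not Rush's actual implementation; the `ProjectGraph` type, the `buildOrder` function, and the project names are all hypothetical.

```typescript
// Hypothetical project graph: each key is a project, each value lists the
// local projects it depends on. All names are illustrative only.
type ProjectGraph = Record<string, string[]>;

// Returns the projects in an order where every project comes after all of
// its dependencies (libraries first, applications last).
function buildOrder(graph: ProjectGraph): string[] {
  const order: string[] = [];
  const state = new Map<string, "visiting" | "done">();

  const visit = (name: string, path: string[]): void => {
    if (state.get(name) === "done") return;
    if (state.get(name) === "visiting") {
      throw new Error(`Dependency cycle: ${[...path, name].join(" -> ")}`);
    }
    state.set(name, "visiting");
    for (const dep of graph[name] ?? []) {
      visit(dep, [...path, name]);
    }
    state.set(name, "done");
    order.push(name); // pushed only after all of its dependencies
  };

  for (const name of Object.keys(graph)) visit(name, []);
  return order;
}

const order = buildOrder({
  "web-app": ["ui-lib", "utils"],
  "ui-lib": ["utils"],
  utils: [],
});
console.log(order); // [ 'utils', 'ui-lib', 'web-app' ]
```

Projects that do not depend on each other end up adjacent in this list, and those are the ones a real orchestrator could build in parallel.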
And then out of that, basically, people started coming together into one repo and then we encountered various other problems, such as publishing workflows and stuff like that. And so, Rush sort of came out of those beginnings. But one thing that makes it a little different from the other tools that have come along is that it was done at a larger company. A lot of people, if you're a very prolific coder, you might make a thousand files or something and split them out into packages and then have something build them. But at the end of the day, it's still you or maybe you and 10 other people that work on your code. So, you can be fairly flexible. And the code tends to be fairly well-behaved because everybody understands it. Whereas what happened on the project I was working on, it was more like you have like 50 people and then 100 people and then 150 people all working in this big common space. And you encounter a set of problems that really impact your ability to scale up. And that's the direction that Rush has sort of taken. As these other solutions came along, we were kind of like, "Well, maybe we should stop using it and switch to something else." But we decided to make it more of a tool that focuses on those problems of large code bases and large teams.
CHARLES: Yeah. Maybe you could talk about, in specific, some of those problems.
PETE: Some of them are technical problems. Like you just have a lot of code and the builds take a long time, so you need parallel builds and incremental builds. And lately, we've been talking about multi-phase builds just to break some of those down. Parallel just means that you start up multiple Node.js processes that build stuff simultaneously. But there's also been work recently on sharding. So, you can distribute the builds across multiple VMs rather than just having it run on one VM. Like for incremental builds, with Node.js, there's a problem that every time you run a command in a given folder to compile a project, you have to spin up Node.js and then eval a bunch of toolchain scripts. So, for example, Rush's incremental build logic does all of the analysis from the build orchestrator, so that the Node.js process that is already running for Rush can determine whether a folder needs to be built or not. And even if that saves you maybe one second of time versus if you had gone and run the toolchain on that particular folder, if you have like 200 projects, that could be 200 seconds of time that you save.
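The incremental-build idea can be sketched roughly as follows: the orchestrator compares a fingerprint of each project's input files with the fingerprint saved by the previous build, and only spawns the toolchain where they differ. This is a simplified illustration, not Rush's real logic. The `FileHashes` type and the `fingerprint` and `projectsToBuild` functions are hypothetical, and the sketch ignores the fact that projects downstream of a changed project also need rebuilding.

```typescript
// Maps a file path to a content hash (for example, a git blob hash).
type FileHashes = Record<string, string>;

// Combine per-file hashes into one stable fingerprint for a project.
function fingerprint(files: FileHashes): string {
  return Object.keys(files)
    .sort()
    .map((path) => `${path}:${files[path]}`)
    .join("\n");
}

// Decide, inside the orchestrator process, which projects need a build.
// `previous` holds the fingerprints saved by the last successful build.
function projectsToBuild(
  current: Record<string, FileHashes>,
  previous: Record<string, string>,
): string[] {
  return Object.keys(current).filter(
    (project) => fingerprint(current[project]) !== previous[project],
  );
}

const lastBuild = {
  utils: fingerprint({ "src/index.ts": "aaa" }),
  "ui-lib": fingerprint({ "src/button.ts": "bbb" }),
};
const now = {
  utils: { "src/index.ts": "aaa" },
  "ui-lib": { "src/button.ts": "ccc" }, // this file changed since last build
};
console.log(projectsToBuild(now, lastBuild)); // [ 'ui-lib' ]
```

Because the comparison runs in the process that is already alive, an unchanged project costs a string comparison rather than a fresh Node.js startup.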
But there's also human aspects of it as well. For example, you have people of different skill levels coming in and working on the code. So, you need people who maybe are just coming to that monorepo for the first time to fix a bug and then will go on somewhere else, you need it to be a good experience for them to learn how the tooling works. And it shouldn't be like a bunch of tribal knowledge that they need to learn how to actually turnaround a pull request. There can be political issues. So, there's like the bad team that doesn't have as good quality bar and keeps checking things in that make flaky tests or problems that break the build for other people. So, the other teams start instituting stricter policies, like stricter eslint rules, stricter code reviews, sign offs and stuff like that. But then it creates a separate problem where now people are complaining that the ecosystem is too bureaucratic.
So, this led eventually into Rush Stack, which is a more recent thing that we've been doing. The idea is that Rush itself is a build orchestrator that doesn't really make a lot of assumptions about what your toolchain is. And then the idea behind Rush Stack was to start bringing in other pieces of the toolkit and saying, "Well, if you want to work more the way that we work, you can start using these other pieces of Rush Stack like API Extractor and a bunch of webpack plugins and things like that." And it would include a bunch of best practices and stuff.
CHARLES: So, if I could ask a question, and this is on a much smaller scale, but I'm just kind of keeping in mind as I'm listening to you talking some of the monorepos that I work in personally. And one of the most annoying things that I think you actually touched on with the build orchestration was if I have, say, 15 packages, 15 npm packages in this repo, and there's a web of dependencies between these packages, and I'm working on one package and that's where my focus is. But I realize that I need to make a change in another package. Now, the minute I start straddling development on two packages, I have to remember, "Oh yeah, I need to go run a build. Whenever I change a source file on this other package, I need to run a build in that package." Then I can come back. And oftentimes, I will forget to do that. And I'm like, "Why is this not working?" And it's like, "Ah, I made the change in the source file. I need to compile the TypeScript and bundle it into an npm package distribution so that I can consume it from the library that I'm actually working on." And so then, what I find is now I'm making custom little hacks to do like file watching at the top of the thing, at the top of the repo that looks at all the packages and incrementally compiles them whenever there's a change detected. So, how does Rush provide a solution to that? How would this build orchestration work? Is that a use case that it could solve?
PETE: The watch scenario is definitely an important scenario. Rush itself today doesn't have a great solution for that. Basically, you run watch in the project that you're actually serving. For example, you have a frontend web application and then you have various libraries that it depends on. So today what we do is you just run watch on the frontend web application. And then when a library is changed, you can use Rush to say, compile everything up to this project that I'm in right now and it will detect which things have changed so it can skip the ones that don't have any changes. This is an area that we're very interested in. It just hasn't been a top priority, I guess.
One other interesting aspect of it is so when we first developed Rush, it was just something I was doing on the side. I wrote the original prototype of it myself over Christmas vacation, I think. And then it became something that people were using. People got excited about it, more people started having feature requests. Eventually when I was at Microsoft, there was a whole team of people who started just managing the tooling for the monorepo. And when you have a lot of people, that's actually one of the selling points for the monorepo, is the more developers that you get working together in one place, suddenly the more budget you have to have a team of people go and build really good tooling for it. So, it's not just like each repo has one guy who spends five hours a week putting out fires. It turns into like when I left, we had something like four or five people who were full time just working on tools. And there was a ticketing system where if you got a weird build error or some tests were unstable, you could actually open a ticket. And we had an on-call person who would investigate it. And then we collected statistics about how many tickets we got in different areas and did root cause analysis and stuff. I'm now at HBO, and HBO kind of has the same thing. They have dedicated tooling teams that work in the big monorepo and it's really kind of good.
We think of it almost as a vector where every single piece of it, like your linker, is made by a different group from the compiler. And you can choose between Rollup or webpack for your linkers. Every step of the way, you have to make a decision. You have to read the blogs and find out what is popular and then make your bet and say, "Oh, we like Jest because that seems to be popular today and that's what we're going to use." Well, we made bets like that five years ago on Gulp and things that aren't very popular now. But when you start doing large scale development, once you make that bet, it's hard to change it because you have all this stuff that starts getting built up around it.
So, part of the idea of Rush Stack was let's have a bunch of people who are professional developers and who have experience doing tooling, make some bets and then decide on a set of stuff that should work well together. And then we're going to collaborate together with our community from open source. I now work at a different company, but I still am very involved with it and bring a different set of perspectives. But the spirit of it is we're all working together to try and over time build something up that is similar to what you'll get in other languages.
CHARLES: Is the idea then that the Rush Stack has opinions about how to lint, how to bundle, how to do all these things?
PETE: And part of the opinion is, we don't have time to develop a bunch of different options for things. We just want something that works. We don't really want everybody to choose their own flavor of ice cream because we don't have time to do that. But there's another aspect of it, which is that when you lock in on something, you can make optimizations that aren't possible when there's abstraction. And I can give some concrete examples of that. Like when we started with Gulp, the idea of Gulp was that you model tasks and then the tasks have dependencies between each other. So, TSLint, which was the linter we were using at the time, was a task and then TypeScript was a task. And the task was like this Gulp wrapper around the actual library of TypeScript and TSLint, and then you like wire them together. So first, we run TypeScript and then we run TSLint. And the idea was you make these boxes and arrows like a big data flow diagram of how each task gets its turn on the files and the way things move between the boxes were streams of files or partially processed file buffers. And we ran into problems with that because, for example, TSLint needs to do a TypeScript analysis. So, it invokes the compiler engine to study the code and figure out what the types of things are. But the TypeScript task does that, too. So you really want the same compiler engine to perform the analysis once and then to share that intermediary state between TSLint and TypeScript. But it's hard to do that just with TSLint and TypeScript, even if you're just writing a simple program that invokes one after the other. You have to kind of go into some internal libraries a little bit if you want to wire that up. But then when it's behind a task abstraction, there isn't like a natural way to do that.
Another thing that will come up is like some people, I don't know if we're still doing this today, but they would say, "If the code doesn't compile, don't show me lint errors because maybe the code's not even valid. Let's first make sure that the compiling works correctly. And then if the compiling was successful, then show the lint errors. But I want to run the compiler and the linter simultaneously to save time." So, I need to save the error messages from the linter in a buffer until the compiler says it's successful, then I'll show those messages. So again, this is something that breaks down this idea of abstractions.
And if we wanted to replace TSLint with ESLint, let's say, you end up with the same problem again that I mentioned, where you want ESLint's parser to share the same analysis as TypeScript. But the way that you hook those pieces together is going to be completely different because the entire architecture of ESLint is completely different. So, all that's to say that when you make something pluggable, you make a lot of architectural sacrifices or you create the need for this rich interface that can expose the real stuff that you need to interact with something in a tight, optimized way, and that's expensive.
So, with Rush Stack, we've taken the philosophy that we do want some things to be pluggable, but they should be conscious decisions of this is a part where there really is value in allowing different pieces to be swapped in here. And we've actually tested them and we've actually validated that the abstraction we set up makes sense and is workable. It's not like we just say, "I did one thing and it's pluggable, so you can drop something else in there if you want." But it's a dice roll. It's an exercise for the reader, whether it will actually work or how much work is involved with that.
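The buffering behaviour described here can be sketched as a small reporter that sits between the two tools. This is only an illustration under assumed names (`DeferredLintReporter`, `lintMessage`, `compileFinished`), not code from Rush Stack: lint messages that arrive while the compiler is still running are held back, then either flushed or discarded once the compile result is known.

```typescript
// Holds lint output until the compiler has reported success or failure.
class DeferredLintReporter {
  private buffered: string[] = [];
  private compileOk: boolean | undefined = undefined;
  private readonly print: (line: string) => void;

  constructor(print: (line: string) => void) {
    this.print = print;
  }

  // Called by the linter whenever it finds a problem.
  lintMessage(message: string): void {
    if (this.compileOk === true) {
      this.print(message); // compile already succeeded: show immediately
    } else if (this.compileOk === undefined) {
      this.buffered.push(message); // compile still running: hold it back
    }
    // compile failed: lint output is dropped
  }

  // Called once when the compiler finishes.
  compileFinished(ok: boolean): void {
    this.compileOk = ok;
    if (ok) {
      for (const message of this.buffered) this.print(message);
    }
    this.buffered = [];
  }
}

const reporter = new DeferredLintReporter((line) => console.log(line));
reporter.lintMessage("src/a.ts: unused variable 'x'"); // nothing printed yet
reporter.compileFinished(true); // now the buffered lint message is printed
```

The point of the sketch is that the reporter has to know about both tools at once, which is exactly the kind of coupling a generic task abstraction makes awkward.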
CHARLES: So, if I wanted to use Rush Stack, and I understand that it's in development, I'd like to talk both about what's coming in the form of Rush Stack and what you would be able to use maybe today. But let's continue this talking about Rush Stack for a minute here. So, if I say I had a monorepo that had maybe, let's say, it's got 10 npm packages, it's got a couple of like backend servers and a couple of UI like frontend React apps, what would that look like in terms of the artifacts that Rush generated or how would I interface with that system? What would be my experience in terms of spinning everything up, working on an npm package, working on one of the backends, working on one of the frontends, so on and so forth?
PETE: Yeah. So, you would start by converting the monorepo to use Rush as its build orchestrator. And there's like a rush init command that will dump out config files for you. We didn't put a lot of thought into the onboarding experience for a long time because when you have a monorepo, you only set it up once and then you never think again about the setup experience. And then when we made Rush open source and started trying to build a community around it and get people to use it and maybe contribute back to it, it took us a long time to realize that other people had a hard time setting up Rush at all because we never went through the experience of setting up Rush. We'd been working in the same five monorepos for years. But when we went and did some investigation into that, what we found is that we like the idea of having these big config files that have JSON schemas that make sure you have IntelliSense and error checking when you're editing them and they have big blocks of comments around everything. So you run rush init and it just dumps these big empty config files with comments. And then you go uncomment the parts that you want and add your stuff. We try to put as much of Rush's configuration as possible in config files versus like command line options or environment variables or other places where there are fewer rails or less guidance about what you're supposed to do to set it up. So anyway, that would be the first thing you do is generate the config files and then go register all of your projects so that Rush builds them.
PETE: Let me finish my list of package managers for a minute, then I can dive into that. So, I said there was the original npm. Then there's Yarn classic and there's pnpm, which actually changes the installation model by using symlinks. So, it solves some problems that we'll talk about in a minute by creating a whole bunch of symlinks in your node_modules folder. And it is a correct installation that eliminates those problems. And it's mostly backwards compatible because it still preserves the original Node.js module resolver model. And then there's Yarn Plug'n'Play, which is essentially a completely different installation strategy from the original Yarn, and its approach is that it changes the way Node.js resolves packages through this plugin. And then it's cleaner, it doesn't have to create symlinks at all. It doesn't even have to create node_modules folders. It's more like how RubyGems, for example, works. But it has more compatibility issues, because there's a whole lot of code that assumes it's going to go walking through node_modules folders looking for things. It can be stuff like webpack. It can be just people's random packages that go and resolve things. So we encountered some compatibility issues. We feel like Yarn Plug'n'Play is like the future, it's like a better design, but it could take a while before a lot of people will be able to move to that.
Let's say you've got those and then there's Rush, which predated most of them. Rush was designed around the original npm package manager. So, it introduced its own symlinking strategy that is somewhat similar to what pnpm does today. Even now, today, when you use Rush with pnpm, it's actually Rush's symlinks retrofitted onto pnpm's model. And we have a PR that's been open for like a month now that actually finally lets pnpm completely manage the installation when you're using it and gets Rush out of the business of package management, which we're really excited about because it's a complicated problem. And we have had like just tons of work that we've done over the years to support that and make it work correctly.
But let's rewind a little bit and talk about why all that exists, because it is another big aspect of working in a large-scale repo.
CHARLES: I understood reading from the documentation that Rush will or will not work with other package managers besides pnpm?
PETE: It supports npm and Yarn classic. The support is not great because almost none of the Rush developers or main users actually use this. Pretty much everybody is using pnpm. We've been wanting to improve the support for the other package managers, but our plan to do that is to basically let them just do the installation.
So right now, for example, when you use Yarn -- like Rush's original strategy was it looks at all the projects in your monorepo and then it creates this fake project that has dependencies on the superset of all of your monorepo dependencies. And then it runs the package manager once to install that folder and then it goes and creates symlinks for each project into that centralized folder. And there are some technical reasons why that is a good strategy. But when you do that with npm and Yarn, every time that they change things or release a new version that changes some semantics, some aspect of that can break and then you have to go debug it. I can say it works with the version of npm and Yarn at the time when we were last using that. But it's had problems over the years. And so, the long-term strategy would be to do the same thing we're doing with pnpm. Hopefully, we'll do it this summer. But say, if you want to use Yarn with Rush, then Yarn just installs the way Yarn installs and you get all the other Rush features but Yarn manages the installation and that means you lose the phantom and doppelganger solutions that Rush was doing. But the idea is maybe you didn't want that. The reason you're using Yarn is because you want Yarn's installation model.
CHARLES: I see. So, it really is, for example, one of the things that I was wondering is what's the overlap between, say, using Yarn workspaces with Rush. And kind of what I'm hearing is that it really is, you're choosing one or the other. Because that's kind of when you're working with Yarn workspaces, you've got some virtual package that aggregates all of the packages and stores at least some of the dependencies in one place. And Rush is doing the same thing.
PETE: Let's get a little more technical about versions.
PETE: So, npm has, I guess I can say, two big factors that got us into this space to begin with. One of them is you just have way more packages than usual. So, for example, if you use, I don't know, C#, which I've used before, it would be common that you have 20 libraries that you install that are like DLLs or something that you get from open source or you buy them from a commercial company and that's your dependencies. Whereas with the standard Node.js project, notoriously you can have typically more than a thousand npm packages getting installed just to do some basic stuff like build an application and not do anything particularly weird. They just have an approach of having lots and lots of small projects that all have dependencies on each other. So out of the gate, npm has a lot more complex version management issues than languages that have fewer libraries that tend to be bigger and more monolithic, like the .NET Framework, for example, where when you go to use C#, you get this massive library that comes from Microsoft that solves a whole bunch of problems, whereas with npm, each one will be solved by some individual package.
But then there's another aspect, which is the way that npm installs packages is -- I have to be careful talking about this, but it's flawed. When you have packages that depend on other packages, they form what's called in computer science, a directed acyclic graph. Basically, it's like a tree that branches apart, but then it can come back together again. But the installation strategy of the node_modules folder is that we're going to use folders on disk to represent the dependencies between packages. So, if A depends on B, C, and D, those become subfolders of it. And then when you have a diamond dependency, you run into this problem that the file system on a computer actually is a tree. It's not a directed acyclic graph. You'd need symlinks to make it into a directed acyclic graph. So, there's a whole write up on the Rush website that tries to give examples of the technical details.
CHARLES: It really isn't npm's fault. It's actually kind of the original sin was committed by Node in the way that it requires modules. Like, that's where it actually happens.
PETE: I think they were developed together.
CHARLES: Oh, okay.
PETE: Actually, I shouldn't speak about it because I wasn't there. I don't know the original history of how Node.js and npm came together and agreed on how they're going to interoperate with each other. But if you look at it at the time, it was probably completely sufficient for the problems they were solving. And like Bower, the thing that they were competing with, doesn't allow libraries to depend on libraries at all. So, it's kind of an innovation that you could have these big dependencies of things. But basically, it introduces certain problems like the phantom dependency is the most obvious problem, which is they do this thing called hoisting, where if several things all depend on a package, they move it up to the top. And then if you can't find the thing you're looking for in a subfolder, you just walk up the tree and find it somewhere higher in the tree. And that creates this phantom problem, which is that a package can require things that it didn't declare a dependency on. So, it doesn't appear in your package.json, but it is findable somewhere in the node_modules folder. So, Node.js doesn't even look at package.json files when it's resolving. It doesn't look at dependency declarations when it's searching for things. So, you can require things that you never declared and that can cause problems because a lot of times, the thing will be there because of some aspect. It's like, I depend on this other package and it depends on something. So generally, I can require that thing and have a somewhat good guarantee that it's going to be there. And I might never even notice the problem. But I don't have any say about what the version of it is because I never declared it. And it can get disturbed, because if the package manager chooses to install things differently, the thing that I was expecting might not be there, or it might be the wrong version.
And then doppelgangers are an even more interesting case, which is when you have a diamond dependency. I'll try to make it concrete. I can't think of the package names off the top of my head. It's like A depends on B, which depends on this D library. And then A also depends on C, which depends on D. That's our diamond. You can get a case where B and C need two different versions of D and npm's model is that it installs the package twice, so you can get multiple copies of it. And because of the tree, there are situations where you can actually end up installing five copies of the same version of the same library. So, it's not even just that we have side by side versions, it's like the same version has to be installed in multiple folders because without a symlink, there's no way to make the folders end up being --
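To make the phantom dependency problem concrete, here is a simplified sketch of the upward walk that Node.js performs when resolving a bare package name. It deliberately ignores core modules, file extensions, `NODE_PATH`, and package.json fields such as "main" and "exports". The `candidateFolders` and `resolveFolder` functions and the `/repo/app` layout are hypothetical, and installed folders are modeled as a plain set of paths rather than a real file system.

```typescript
// Lists the folders checked for `packageName`, starting next to the
// requiring file and walking up to the file system root.
function candidateFolders(fromDir: string, packageName: string): string[] {
  const parts = fromDir.split("/").filter((p) => p.length > 0);
  const result: string[] = [];
  for (let i = parts.length; i >= 0; i--) {
    // Never look for node_modules directly inside a node_modules folder.
    if (i > 0 && parts[i - 1] === "node_modules") continue;
    const base = parts.slice(0, i).join("/");
    const prefix = base.length > 0 ? `/${base}` : "";
    result.push(`${prefix}/node_modules/${packageName}`);
  }
  return result;
}

// Returns the first candidate that exists. Note that dependency
// declarations are never consulted during the search.
function resolveFolder(
  fromDir: string,
  packageName: string,
  installed: Set<string>,
): string | undefined {
  return candidateFolders(fromDir, packageName).find((f) => installed.has(f));
}

// The app declares only "b", but "d" was hoisted next to it on disk.
const installed = new Set([
  "/repo/app/node_modules/b",
  "/repo/app/node_modules/d",
]);
const declared = ["b"];
const found = resolveFolder("/repo/app/src", "d", installed);
const phantom = found !== undefined && !declared.includes("d");
console.log(found, phantom); // /repo/app/node_modules/d true
```

The require succeeds even though "d" is missing from the declared list, which is the situation described above: it works until the package manager lays the folders out differently.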
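One way to picture a doppelganger is to list the installed folders and look for the same name and version appearing more than once. The sketch below assumes a hypothetical layout in which the root `node_modules/d` slot is already taken by version 1.0.0, so B and C each carry their own nested copy of version 2.0.0. The `InstalledPackage` type and `findDoppelgangers` function are illustrative and are not part of npm or Rush.

```typescript
interface InstalledPackage {
  folder: string;
  name: string;
  version: string;
}

// Groups installed folders by name@version and keeps the groups with more
// than one folder, i.e. the same version installed in several places.
function findDoppelgangers(
  installed: InstalledPackage[],
): Map<string, string[]> {
  const byIdentity = new Map<string, string[]>();
  for (const pkg of installed) {
    const key = `${pkg.name}@${pkg.version}`;
    const folders = byIdentity.get(key) ?? [];
    folders.push(pkg.folder);
    byIdentity.set(key, folders);
  }
  const result = new Map<string, string[]>();
  byIdentity.forEach((folders, key) => {
    if (folders.length > 1) result.set(key, folders);
  });
  return result;
}

const tree: InstalledPackage[] = [
  { folder: "node_modules/b", name: "b", version: "1.0.0" },
  { folder: "node_modules/c", name: "c", version: "1.0.0" },
  { folder: "node_modules/d", name: "d", version: "1.0.0" },
  { folder: "node_modules/b/node_modules/d", name: "d", version: "2.0.0" },
  { folder: "node_modules/c/node_modules/d", name: "d", version: "2.0.0" },
];
console.log(findDoppelgangers(tree));
// reports d@2.0.0 in both node_modules/b/node_modules/d and
// node_modules/c/node_modules/d
```

With symlinks, both nested entries could point at a single folder for d@2.0.0, which is how a tree on disk can represent the underlying graph.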
CHARLES: Whereas in other languages, you would get an error at build time saying, "Hey, I cannot find a single version of this library that satisfies the dependencies of this project." And you have to deal with it manually at the build time.
PETE: Or they would say -- Ruby, for example, will just install that version in one place. And then the way you find it is when you require a library, the resolver just looks in your dependencies and says, "Oh, you need version 1.2.3. Okay, I'm going to go look in that folder for it." But with npm, it's like, "Well, the only places I can look for it are in certain places that don't know what the version is." It means that the package manager has to copy the same version into different folders and you can [crosstalk] your tree. But if you think about it, say I have a singleton, then now I have two singletons getting spun up because Node.js doesn't actually understand that these two folders have the same library in them. It sees it as two entire instances of it. Or the TypeScript compiler, we had a longstanding compiler error that we were dealing with when we first set up Rush monorepos, which is the TypeScript compiler would see a declaration of a class in two places and it would assume that they were not compatible declarations. It has the same name, but it's a completely different copy of it. And we eventually got them to relax the rules a little bit and say, "Well, if it's in another package that has the same name as this package, and the paths relative within there are the same, then you can reasonably assume that it's the same thing and then disable your check there." And it took like a year to get that finally fixed.
So, this is like one aspect of the problems that's like really obvious. But there's another aspect, which is just the management of versions is complicated. npm has this thing called peer dependencies. In a monorepo, I personally have probably spent months of my life just helping people sort through problems of I'm trying to upgrade this package and now my installation fails or I upgraded the package and now the application doesn't run because something got jostled around in the node_modules folder and now the versions aren't exactly right. And when you go into a monorepo, the problems really multiply. In a small repo, which is where the vast majority of npm users work, you just don't have enough different people trying to install different versions of things to really hit these problems. But as more and more people come in, like the Rush Stack repo has something like 10 versions of the TypeScript compiler getting installed for various reasons. And then there are things that have peer dependencies on the TypeScript compiler, so you get sort of complicated version problems and that's where Rush's features came up. It has special protections against phantom dependencies and doppelgangers. It has weird concepts like preferred versions and allowed alternative versions and special checks to make sure that the versions are in sync. And you can have Rush check things for you.
So that's just an area that we became really interested in because we had a lot of frustration supporting it. And at a big company, when you're shipping production software to the cloud where releases go out every week, you really need things to be deterministic and reliable. So, we put a lot of engineering and thought into that. Whereas a lot of other people were like, "What's wrong with it? Works fine for me. I don't understand. If it's a problem, go hack something in your package.json file." We weirdly over and over again had conversations with people who couldn't understand what we were trying to solve or why Rush was preventing them from requiring something they could see there.
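A check that versions are "in sync" can be sketched as a scan over every project's declared dependencies, flagging any package that is requested with more than one version range unless the extra range has been explicitly allowed. This is a hedged illustration of the idea only. It does not reproduce Rush's actual checks or config file formats, and the `Manifest` type, the `findMismatches` function, and the `allowedAlternatives` parameter are hypothetical names.

```typescript
// A trimmed-down view of a project's package.json.
interface Manifest {
  name: string;
  dependencies: Record<string, string>;
}

// Returns every dependency that is declared with more than one version
// range across the projects, ignoring ranges that were explicitly allowed.
function findMismatches(
  projects: Manifest[],
  allowedAlternatives: Record<string, string[]> = {},
): Record<string, string[]> {
  const seen = new Map<string, Set<string>>();
  for (const project of projects) {
    for (const dep of Object.keys(project.dependencies)) {
      const range = project.dependencies[dep];
      if ((allowedAlternatives[dep] ?? []).includes(range)) continue;
      const ranges = seen.get(dep) ?? new Set<string>();
      ranges.add(range);
      seen.set(dep, ranges);
    }
  }
  const mismatches: Record<string, string[]> = {};
  seen.forEach((ranges, dep) => {
    if (ranges.size > 1) mismatches[dep] = Array.from(ranges).sort();
  });
  return mismatches;
}

const projects: Manifest[] = [
  { name: "web-app", dependencies: { typescript: "~3.9.0", react: "^16.13.0" } },
  { name: "ui-lib", dependencies: { typescript: "~3.7.0", react: "^16.13.0" } },
];
console.log(findMismatches(projects));
// { typescript: [ '~3.7.0', '~3.9.0' ] }
console.log(findMismatches(projects, { typescript: ["~3.7.0"] }));
// {}
```

A check like this only compares declared ranges. It says nothing about what the package manager finally installs, which is why it complements rather than replaces the phantom and doppelganger protections.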
I'll mention one other case that this came up, which is when you have multiple monorepos. People are always wanting to create more monorepos and then they're like, "Well, it's a great separation." You have all of your libraries in one monorepo and then your app in the other monorepo and then you just automatically publish things and automatically upgrade in the downstream repo.
CHARLES: So, you advocate or you kind of just monorepo all the way. Just keep doubling down on one codebase. It's always better if you've got a huge block of code to add it to the existing repo?
PETE: Well, I would say technically it's easier to manage when it's in one place because you can do things like a pull request that updates 200 projects easily at once without having to worry about publishing and upgrading stuff. But there are political reasons why you would want things separated. For example, at Microsoft, we had an internal repo that was closed source and then an open source repo in GitHub that was open source. So, at the very beginning, out of the gate, our toolchain had to be in a different repo from the application development. So that's like a valid case. Where there can be business groups that really are separate from each other.
To me, one of the real rubrics that you can use is about the contract. So, take React, for example. There's a group of developers that make React and they ship it, they document it. They have like release notes when they release a major new version, they make sure that they test the contracts for it. So, they're shipping a contract for people to consume. And there's a group of people around making sure that that contract is stable and well supported. That kind of boundary totally makes sense to have it within its own repo. The thing that really didn't work for us was the sharing code where we're not really shipping it. It's more like we're sharing it. So, there's some URL parser that somebody makes and then another team wants to use it and some other teams have some libraries. So, various teams start putting code into different libraries in some shared repo that they're going to consume. But nobody really owns it. And you get this problem where somebody goes and makes a change to the thing they're using, but they're not going to go and test the other downstream repos. So then when the downstream repos upgrade, they might get broken by that change. And then it's like, whose responsibility is it? That guy should have tested everybody else? Or these people just need to pay this cost? And we had teams, when we were working in this model, where it was like every week, somebody would have to go spend like their afternoon getting the upgrade to compile again because the upstream repo that they were consuming packages from just kept constantly having churn and changes.
Because at a company, it's not like npm, where somebody just kind of makes a library and once in a while, they update it. It's like there are people actively changing stuff every day because they're trying to get a feature shipped. And these aren't just really isolated libraries. They're all parts of the anatomy of a big application. So, there's a lot more churn. And we tried, "Oh, let's have it automatically upgrade. Let's make it so that when somebody goes and makes a change, we run tests in the other repos and tell them that it's broken." Or a really common one is let's make people review the contracts. Everybody needs to follow semver and they need to design their APIs to be forward thinking and stuff. But all those things that you look at, you're like, "Well, React did that. Let's impose that on our developers working in this shared repo." The problem is that there's no budget for that. Nobody's manager is going to let them go spend a day having a meeting and deliberating about a design change to some URL parser, because that's just not the business that they're in. Nobody really thinks that that is a shipping thing.
So, if you don't solve that problem, what you end up with is people just stop sharing code. It's easier just to fork the library and have our own version of it and stop trying to share it with the other team. The monorepo really does solve that problem because it lets everybody work together on a shared library. And if you make a change to it, you can't merge your change unless all the unit tests pass for all the things that are affected by it. And when stuff fails, you can easily just go fix that stuff up in the same PR that introduces the breaking change.
PETE: Rush kind of checks out at the point where we need to build a project. At that point, it just runs an npm script in that folder. And then it's up to your toolchain to figure out how to build the project. And you can have different tool chains in each project folder. Rush actually discourages you from having any global stuff. So, for example, our ESLint rules, we don't have an eslintrc file at the root of the repo. Each individual project has its own lint setup and you can even be running different versions of lint in different project folders, which sounds not good. But when you have like hundreds of projects, sometimes not all the teams can upgrade ESLint at the same time, so you need to let people be able to have differences in their tool chains locally. Prettier, we did decide with prettier to run that globally because it's more of a cosmetic thing. It doesn't have a lot of breaking changes, and it's something that kind of needs to run on the diff of what you've changed when you have like a git commit hook. But other than the one exception of prettier, we pretty much have the toolchain be self-contained for each individual folder.
CHARLES: I thought one of the selling points also was incremental building and shaving build times off of that. So how can you do that by having each kind of package contained inside the repo being responsible for managing its own build?
PETE: Well, the toolchain is shared, right? So, the first thing that you would do in Rush is make a toolchain project, a project that actually is the code that compiles things. And we usually put shared configuration in there, like your tsconfig for web applications or your tsconfig for Node applications. With ESLint, I went through a whole thing when we switched from TSLint to ESLint, where ESLint actually doesn't have the ability to load plugins from a shared configuration folder. It tries to load the plugins from the actual project folder. So, we have this workaround that we made and we have an open PR, we're trying to get ESLint to have better support for monorepos. But basically, you want each project to just have a dependency on what we actually call the rig. So, the idea is the toolchain is like this shared code that is reusable scripts for invoking the TypeScript compiler or webpack or whatever. But then the rig is like a flavor. So, you would have a rig for a web library or a rig for a web application or for a tooling application and things like that. So, these stereotypical kinds of projects usually get a specific toolchain package that has all that stuff rolled up in one bundle. So, you just kind of will have as little copying and pasting of boilerplate between projects as possible.
CHARLES: Let's say I had some rig then for compiling TypeScript npm packages. How would I, for example, I don't know, use the same cache to the TypeScript assets or keep like a TypeScript compiler hot, so that, I don't know, like the syntax tree parse or something, like this is one of things I noticed in the build is like TypeScript takes a long time to compile.
PETE: That's true.
CHARLES: And if there was some way that when a tiny source file changes, you could run a command and it would just compile that one file and put it in the dist directory? Is that just something I need to do myself, and I can benefit from having this single rig that's in a single place so that any project that's going to be using that can benefit from it, which I think is good? Or does Rush provide some sort of solution out of the box, or maybe some building blocks so I can build that type of solution?
PETE: Rush doesn't. And for a long time, Rush Stack didn't either, because we're using Gulp and we really -- Gulp, admittedly, is kind of an older, less popular solution now. And the whole approach that we took with Gulp, we regretted and we kind of wanted to rethink for many years, but it worked and it was like a component of our setup that worked reliably. So, we just were using the same Gulp toolchain for years and years. I'm now at HBO, but recently Microsoft has been developing a new modernized replacement for that that's called Heft, and they're in the process of moving that code in -- they had done an internal prototype of it. And now they just opened the first PR this week to start releasing that as open source. But that would be like a toolchain that is more modern, that actually gives you an out-of-the-box way to invoke the TypeScript compiler. You can also do it in a simple way, like the TSDoc project is a small monorepo with like four or five projects that just uses barebones scripts that just invoke TypeScript and then invoke lint, and one folder that invokes webpack. If you were looking for an example of the most minimal way to use Rush to build something without like a shared toolchain, that's like an extreme example.
But to come back to your -- you're also asking an architectural question, right? How do you do efficient TypeScript compilation while still treating each individual project as being a self-contained process, which is also important when you start sharding builds across machines because -- the TypeScript compiler recently introduced a feature where it also kind of acts as a build orchestrator. You register all the projects in your monorepo in a centralized tsconfig file, and then TypeScript will go and figure out the order to build things and watch everything in your repo. And it's really cool technology. But I'm not sure that it would actually work for our scenario, because when you have hundreds of projects and some of those projects use different versions of the TypeScript compiler, and you have other tasks like pre-processors or post-processors that run before and after the TypeScript compiler step, that's been a constant thought in the back of my mind: how would you use this feature in the TypeScript compiler? So, the approach that we've been taking is just making the builds in the individual folders as efficient as possible. So as I said, Rush won't even spin up the toolchain if it can see that the source files haven't changed since the last time it was run. It has like an efficient hash for each of the source files [inaudible] changed it or not.
But within that, if we do decide we're going to run the compiler, the latest TypeScript compiler has on-disk caching. So, you start to spin up the Node process, but the actual state from before is not entirely lost. And we've talked about -- for the longest time, I've been wanting to do this cool optimization where we would have a service. So, when we use like VScode, for example, there's this language service that runs in the background. It's basically a long running TypeScript server, and then you hand source files to it and then it gives you back like an analysis of them.
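The "skip the toolchain if nothing changed" check Pete describes can be sketched roughly as follows. This is a minimal illustration, not Rush's implementation: `SourceFiles`, `fingerprint`, and `needsRebuild` are hypothetical names, the files are passed in as an in-memory map rather than read from disk, and a simple non-cryptographic string hash stands in for whatever hashing a real tool would use.

```typescript
// Hypothetical input: relative file path -> file contents.
type SourceFiles = Record<string, string>;

// Simple djb2-style string hash, kept in the unsigned 32-bit range.
// A stand-in for illustration; a real tool would use a stronger hash.
function hashString(text: string): number {
  let hash = 5381;
  for (let i = 0; i < text.length; i++) {
    hash = ((hash * 33) ^ text.charCodeAt(i)) >>> 0;
  }
  return hash;
}

// Combines every path and its contents into one fingerprint for the project.
// Paths are sorted so the result does not depend on enumeration order.
function fingerprint(files: SourceFiles): string {
  const combined = Object.keys(files)
    .sort()
    .map((path) => path + "\0" + files[path] + "\n")
    .join("");
  return hashString(combined).toString(16);
}

// A project needs rebuilding when no fingerprint was stored by the previous
// build, or when the stored fingerprint no longer matches the sources.
function needsRebuild(files: SourceFiles, previous: string | undefined): boolean {
  return previous === undefined || previous !== fingerprint(files);
}
```

In use, the orchestrator would store the fingerprint after a successful build and call `needsRebuild` before spinning up the project's build script the next time.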
SHANE: Just one thing I wanted to ask, is it maybe a bit further back before the TypeScript toolings happened. But do you think Rush has a certain limit on how big of a project it should be used on because it was designed for big projects and it's really focused on that. Would you suggest like a project of three people, five packages, or do you think there's a limit to when it really starts to come into it all?
PETE: You're saying a minimum threshold where it wouldn't be worth using Rush for something smaller than a certain scale?
SHANE: Right, yeah. Does that exist or is it good for everybody?
PETE: I mean, I use it for small projects, but maybe it's just because I'm into Rush and like using it. It's just I like using familiar technology. We have made it somewhat easy to set up. So, it's not like anywhere near as intimidating as it was in the past just to get off the ground with it. But its focus, I would say, is for larger scenarios, I guess, like where build times matter. And there's some benefits you get from breaking the rules in a small project. I'd say if, for example, if you only have five projects, it might not even make sense to make them into separate projects. Maybe it would be easier to initially develop them as a monolith.
SHANE: Right. Okay, that makes sense. Are you able to pull out any of the main pieces of Rush to use in like a non-Rush project? Like the way you look for phantom dependencies or the incremental builds? Can you pull those out as a library and maybe use part of that in a non-Rush project or is it mostly use it altogether as a framework?
PETE: The incremental builds rely on this package-deps-hash library that we maintain, so that part would be usable in isolation, like the command line parser for Rush is -- a lot of Rush Stack is actually libraries that Rush itself uses. So those pieces are reusable. Another cool feature of Rush is when you're doing a bunch of parallel builds, it doesn't mix together. It shows you real-time output of the logs as your builds are running. But it does not mix together the output from different sub-processes. So, for example, it will pick whatever the first thing is it started and show you the real-time output from that project altogether as one contiguous thing. And then when that project completes, it will take any other projects that build in the background and dump their complete output and then pick the next thing that's still running and give you real time output from that. So, it gives you like a real time feed of what's happening as your projects are building. But without intermixing logging lines from different projects and having to put prefixes on the lines. So that technology, again, is in a package that you could use for some other project that's not built into Rush.
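The output-collation behavior Pete describes here can be sketched in a few lines of TypeScript. This is an illustration only, under stated assumptions: the `CollatedOutput` class and its `start`/`log`/`finish` methods are hypothetical names and do not reflect the API of the Rush Stack package he refers to.

```typescript
// Collates output from parallel tasks: one "active" task streams live,
// every other task is buffered until the active task finishes.
class CollatedOutput {
  private active: string | undefined;
  private readonly buffers = new Map<string, string[]>();
  private readonly finished = new Set<string>();

  constructor(private readonly write: (text: string) => void) {}

  // Registers a task; the first task started becomes the active one.
  start(task: string): void {
    this.buffers.set(task, []);
    if (this.active === undefined) {
      this.active = task;
    }
  }

  // Active task output is written immediately; everything else is buffered.
  log(task: string, text: string): void {
    if (task === this.active) {
      this.write(text);
    } else {
      this.buffers.get(task)?.push(text);
    }
  }

  finish(task: string): void {
    if (task !== this.active) {
      // Finished in the background: keep its buffer until it can be dumped whole.
      this.finished.add(task);
      return;
    }
    this.buffers.delete(task);
    this.active = undefined;

    // Dump the complete output of tasks that finished in the background.
    this.finished.forEach((done) => {
      (this.buffers.get(done) ?? []).forEach((text) => this.write(text));
      this.buffers.delete(done);
    });
    this.finished.clear();

    // Promote the oldest still-running task and replay what it buffered so far.
    const next = this.buffers.keys().next();
    if (!next.done) {
      this.active = next.value;
      const pending = this.buffers.get(next.value) ?? [];
      pending.forEach((text) => this.write(text));
      pending.length = 0;
    }
  }
}
```

Each project's lines come out as one contiguous run, so no per-line prefixes are needed, at the cost of delaying output from tasks that are not currently active.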
CHARLES: What about the inverse case where we've, let's say, hypothetically adopted Rush? Can you incrementally adopt the individual capabilities of Rush as you're using it? So you alluded to a publishing capability and a deployment capability. I'm assuming if I want to use Rush to build my packages, I can still use my old mechanism to publish them.
PETE: As long as it understands -- the Rush projects really are modeled as standalone npm projects. And subject to some issues with versioning, you could actually just go into a random folder and run npm install and npm run build, or whatever, in that folder and it should more or less work. The main reason we did that was actually to facilitate moving projects between monorepos. So, when you have multiple monorepos, you do reorganization or try to consolidate things. So, we really wanted projects not to be intertwined at all with anything else in their monorepo and to really act as self-contained units that could be moved around easily. So that, in some ways, makes it easier for other tools to interoperate.
There are some limitations to that. Like a lot of tools, for example, try to walk up to the root of the monorepo and find a package.json file there and expect to find a node_modules folder there and stuff. That stuff doesn't work.
CHARLES: And it sounds like you're saying that that's kind of a bad idea.
PETE: We think so. For example, if you put a node_modules folder in the root of your monorepo, that introduces a phantom dependency problem because Node.js can resolve things from there. So, all of a sudden, you once again can import things that weren't declared in your local project because they've been accidentally hoisted to the root of the repo.
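The reason a root-level node_modules folder leaks into every project is the way Node.js walks up parent directories when resolving a bare import. The sketch below lists the folders that walk would try, in order. The function name `nodeModulesLookupPaths` and the `/repo/...` paths are made up for illustration, POSIX-style absolute paths are assumed, and global folders are ignored.

```typescript
// Lists the node_modules folders Node.js would try, in order, when a file
// inside `fromDir` imports a bare package name such as "lodash".
function nodeModulesLookupPaths(fromDir: string): string[] {
  const parts = fromDir.split("/").filter((part) => part.length > 0);
  const paths: string[] = [];
  for (let i = parts.length; i >= 0; i--) {
    // Node does not append node_modules to a folder already named node_modules.
    if (i > 0 && parts[i - 1] === "node_modules") {
      continue;
    }
    const prefix = parts.slice(0, i).join("/");
    paths.push("/" + prefix + (prefix.length > 0 ? "/" : "") + "node_modules");
  }
  return paths;
}

// For a project at /repo/apps/web, the walk passes through the repo root,
// so anything installed in /repo/node_modules is importable from the project
// whether or not the project's own package.json declares it.
const lookupPaths = nodeModulesLookupPaths("/repo/apps/web/src");
```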
CHARLES: Yeah, I find that happens to me all the time. I forget to include a dependency. And so, when I'm running my tests or I'm writing applications that reside inside the monorepo, it's fine and everything works. And then the minute someone installs the package from npm into a context that is not the monorepo at all, everything breaks because that package isn't there, because I just forgot to declare it.
PETE: And the original team that built Rush was shipping an SDK to paid enterprise customers. So those breaks were really, they were like paid escalations or incidents when that happened. API Extractor is something we could talk about another day. But it's also all about carefully controlling contracts for packages and making sure that you never accidentally break something.
SHANE: That phantom dependency is actually one of the biggest issues I found working with Yarn workspaces. When you're working with a team that might not know the ins and outs of monorepos, accidentally installing something globally, forgetting to add it to your package dependencies, having everything work, all your tests working, you publish and everything breaks, it's probably the thing that I've seen the most as far as what can break a package. I've been working on a project of my own just to solve that. It's like, run through dependencies and find out: are you asking for something that you aren't including in your package dependencies? Because it's so common.
PETE: I believe there are lint rules. I think there's some lint rules that might check for that. There's some options. The most insidious form of it actually isn't that it's broken and it can't find the module. It's that it finds an unconstrained version of it. So, it will sort of seem to work. And we would get these cases where for 90% of the customers or people who are using the consumers, let's say, who are using the package, it does work because their tree ends up in the same format as ours. But then there's like 5% of people who actually have side by side versions or certain more complicated things happening in their case. And for them, it doesn't work. And then they're opening issues, saying, "When I install your package, I get this really weird error," because it's basically getting the wrong version of one of its dependencies somewhere deep in the tree. And we're like, "Well, doesn't repro for me. It doesn't seem to repro for anybody else. Must be something with your computer. Did you try turning it off and back on again?" And it takes a long time to finally fix it.
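A check along the lines Shane describes might look like the following TypeScript sketch. It is a rough illustration under stated assumptions, not his tool or any existing lint rule: `PackageManifest`, `packageNameOf`, and `findUndeclaredImports` are hypothetical names, the import specifiers are assumed to have been collected from the source files already, and Node.js built-in modules such as `fs` are not special-cased.

```typescript
// Hypothetical subset of package.json relevant to the check.
interface PackageManifest {
  dependencies?: Record<string, string>;
  peerDependencies?: Record<string, string>;
}

// Reduces an import specifier to a package name:
// "lodash/get" -> "lodash", "@scope/pkg/sub" -> "@scope/pkg".
// Relative and absolute paths are not packages, so they yield undefined.
function packageNameOf(specifier: string): string | undefined {
  const first = specifier.charAt(0);
  if (first === "." || first === "/") {
    return undefined;
  }
  const segments = specifier.split("/");
  return first === "@" ? segments.slice(0, 2).join("/") : segments[0];
}

// Returns the packages that are imported but not declared. devDependencies are
// deliberately ignored: they are not installed for consumers of a published package.
function findUndeclaredImports(
  manifest: PackageManifest,
  importSpecifiers: string[]
): string[] {
  const declared = { ...manifest.dependencies, ...manifest.peerDependencies };
  const missing: string[] = [];
  importSpecifiers.forEach((specifier) => {
    const name = packageNameOf(specifier);
    if (
      name !== undefined &&
      !Object.prototype.hasOwnProperty.call(declared, name) &&
      missing.indexOf(name) === -1
    ) {
      missing.push(name);
    }
  });
  return missing;
}
```

Anything this returns works inside the monorepo only because some other project happened to install it, which is exactly the situation that breaks once the package is published.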
I'm a very analytical person who fixates on obscure technical things that might be a problem. For me, I've always been on this bandwagon, but it's one of the hardest. npm versioning is one of the hardest things that we've dealt with in terms of messaging, because people assume that the default model works fine, because it works okay for them. And they assume that the problem is fundamentally simple, because if everybody just follows semver, then everything will be fine. And to explain why like phantom dependencies and hoisting are problematic, it's very hard to explain it because they're like, "Yeah, yeah. I hear you saying all this crazy technical stuff and drawing diagrams and stuff, but you're not understanding. You just need to follow semver. I've never had any problems with it."
CHARLES: Yeah. It's definitely not that simple.
All right. Well, thank you so much, Pete and Shane, for coming by today. We will post, of course, the links to Rush.js where you can read up on all the stuff that we've been talking about, on monorepos, on doppelgangers, and phantom dependencies and just npm versioning in general on the website. We can continue the conversation on Twitter. Thanks so much for coming by. I've definitely had my eyes opened in the last year, I would say, to working with monorepos. It does come with its challenges. But overall, I think it's a net positive in terms of the problems they solve. So, go check out Rush. Go work with monorepos and be happy. Bye, everybody.
Thank you for listening. If you or someone you know has something to say about building user interfaces that simply must be heard, please get in touch with us. We can be found on Twitter at @thefrontside or over just plain old email at firstname.lastname@example.org. Thanks, and see you next time.
Please join us in these conversations! If you or someone you know would be a perfect guest, please get in touch with us at email@example.com. Our goal is to get people thinking on the platform level which includes tooling, internationalization, state management, routing, upgrade, and the data layer.
This show was produced by Mandy Moore, aka @therubyrep of DevReps, LLC.