> Bazel is designed to be optimal and has perfect knowledge of the state of the world, yet it’s too slow for quick operations.
This is one of the biggest challenges where Bazel falls short of non-Bazel tooling for us in the Web development/Node.js world. The Node ecosystem has characteristics that push Bazel's scaling limits (node_modules is 100k+ files and 1GB+), and Bazel's insistence on getting builds correct/reproducible is in a way its own enemy. Bazel needs to set up hundreds of thousands of file watchers to be correct, but the Node.js ecosystem's "let's just assume that node_modules hasn't changed" is good enough most of the time. For us, many non-Bazel inner devloop steps inflate from <5s to >60s after Bazel overhead, even after significant infra tuning.
It's not unique to Bazel. Nix also struggles with Node.js. I'm not too fond of either solution, but to me, the problem lies in the Node ecosystem, and it seems unlikely a "language-agnostic" tool will ever be able to crack that nut.
Node also struggles with Node.js, and that's including its shortcuts that kill reproducibility. A Node.js sync from an empty cache is by far the longest part of any build process I maintain. It's still a long step with a full cache and without doing anything.
I’ve worked on two 100+ weekly committer monoliths and two similar-sized MFE architectures. I think the article hits good points, though I’d add some acutely painful ones it misses[1]. I’m someone whose gut is now squarely in the pro-monolith camp, but I think that in the comment thread of such an anti-MFE article it’s worth steel-manning the pro-MFE arguments rather than characterizing them simplistically as cargo-culting or resume-boosting.
First, MFEs solve organizational issues by cheaply offering release independence. An example is teams that do not overlap in working hours: triaging and resolving binary release blockers is hard to do correctly and onerous on oncall WLB. Another example is when new products want to move quickly without triggering global binary rollbacks or forcing too fast a release cadence for mature products with lower tolerance for outages or older test pyramids.
Second, MFEs are a pragmatic choice because they can proceed independently from the mono/microrepo decision and any modularization investment, both of which are more costly by several multiples in the cases I've seen. Most infra teams are not ivory towers and MFEs are high bang-for-buck.
Finally, MFEs are a tool to solve fundamental scaling issues with continuous development. At a certain level of commits, race conditions cause bugs or build breakages or test failures, and flaky tests cause an inability to (cheaply) certify a last known good commit at cut time. You can greatly push out both of these scaling limits with good feature flag/health-mediated releases and mature CI, but having an additional tool in the kit allows you to pick which to invest in based on ROI.
Advocating for modularity is nice, but I've never met an MFE advocate who didn't also want a more tree-shakeable modular codebase. We should not jump to the conclusion that MFEs are bad or a "last resort" just because there exists another solution that better solves a partially overlapping set of problems, especially if that other solution doesn't solve many of the problems that MFEs do, or requires significantly more work to solve them.
[1] Runtime JS isolation (e.g. of globals set by third party libraries) is hard, and existing methods are either leaky abstractions like module federation or require significant infra work like iframing with DOM shims. CSS encapsulation is very hard on complex systems, and workarounds like shadow DOM have a11y and library/tooling interop issues. Runtime state sharing (so not every MFE makes its own fetch/subscriptions for common data) is hard and prone to binary skew bugs. Runtime dynamic linking of shared common code is hard to reason around, and static linking of common code can cause the same lazy-loaded module transition to go from taking 10kB to 1MB+ over the wire.
Setting aside the philosophical free market questions, additional government subsidy childcare seems fine. There’s a positive externality of readily available childcare: one adult to many children is more efficient than home care, and parents being able to work generates economic production, taxes, etc.
But couldn’t we come up with a subsidy with less market distortion? A specific subsidy to pay for childcare for childcare workers will just cause the market-clearing wage for childcare workers to drop to the point that only people with high childcare costs will work in the field. This labor pool is much smaller, and people will tend to cycle in and out as their children grow to school age. At the end of the day society will end up paying a lot more, due to lower liquidity, than with a plain flat subsidy.
It’s like loan forgiveness for federal workers. Sounds lovely, but it just ends up breaking the market, for example by further subsidizing already wasteful higher education spending.
> Universal childcare. Alternatively, payment for children
These are the same, just different execution. A slot for your child or children if you want to go to work, a payment if you want to stay at home to provide childcare. The care is universal, the delta being who logistically is providing the care.
Edit: Agree with y'all it is both a marketing and execution story.
One is creating defined resources and using bureaucracy to divvy them out and force people into buckets they might not be happy with. e.g. here's your government cheese.
The other empowers individuals to make their own decisions with cash and lets the free market figure out how best to mine those dollars from the individuals receiving them.
I suspect which option a person thinks is right largely depends on how they view humanity. Either you want to trust people to make their own decisions as adults, suffering the consequences of their decisions, or you want to extend childhood deep into "adulthood" making sure they can't make the wrong choice, because the nanny state makes the choice for them.
I suppose in the case of managed welfare, only those who manage to become independently wealthy get the privilege of making their own decisions.
I oppose UBI for unrelated reasons: giving everyone money (if it is actually a life-changing amount of money) is unaffordable. It's hugely wasteful to give money to wealthy people.
Instead, I would prefer a system where people receive payments relative to income, like a reverse tax system. This way, people in higher income brackets pay in (in the form of taxes) and people in lower income brackets get a pay out.
This is far more affordable than UBI, meaning we can actually do it, and also puts money where it is most needed.
That kind of taxation would just reduce upward mobility in the workplace though, wouldn’t it?
I wouldn’t accept a job role with more responsibility if it meant losing my low-income subsidies and being taxed instead.
Society needs workers to accept more responsibilities and progress in their careers otherwise there will be less creation of jobs for those at the entry level.
I’m not saying UBI is the right solution, but it would reduce the “cliffs” where people lose money for progressing in their careers.
Those two systems are the same - or rather - can be the same depending on how you tweak the parameters.
Wealthy people will pay much more in taxes than what they receive in UBI and you can make this system have the exact same distribution of money as the negative income tax system.
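To make the equivalence concrete, here's a minimal sketch with purely hypothetical parameters: a flat UBI funded by a flat tax produces the same net transfer at every income level as a negative income tax whose rate and break-even point are chosen to match.

```typescript
// Hypothetical parameters chosen only to illustrate the equivalence.
const ubi = 12_000;        // flat payment to everyone
const taxRate = 0.3;       // flat tax on all earned income

// System A: UBI plus a flat tax.
const netTransferUBI = (income: number) => ubi - taxRate * income;

// System B: negative income tax with the matching break-even point.
const breakEven = ubi / taxRate; // income at which the net transfer is zero (40,000 here)
const netTransferNIT = (income: number) => taxRate * (breakEven - income);

for (const income of [0, 20_000, 40_000, 100_000]) {
  console.log(income, netTransferUBI(income), netTransferNIT(income));
  // Both print the same number: +12,000 at zero income, 0 at 40,000, -18,000 at 100,000.
}
```

In other words, the distributional fight is really about the parameters (rate and break-even point), not about which of the two mechanisms is used.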
Sure. But the branding matters. The former seems to provoke a vitriolic reaction in some people that, if widespread, could tank it in a way that the identical effect under the second's branding doesn't provoke.
Don't worry, from experience I can assure you people will have the same reaction to the latter as well. The typical argument is that "the poors" will have tons of children just to cash out, and we can't have that, can we.
I believe (anecdotally) that the rate of unwanted pregnancies is not impacted significantly by economic measures, only by ease of access to free birth control and widespread sex education, but it's an unpopular opinion. Welfare from the state is always a hard sell unless it benefits a vocal, influential segment.
Universal childcare seems to me significantly less distorting than subsidized childcare only for childcare workers. It's a larger subsidy, yes, but it corrects for an existing distortion (parents pay full price for childcare to raise children who then end up paying taxes to the state), and it does so in a way that doesn't break one specific labor market.
We do not do parents who aren’t already childcare workers a favor by skewing the market for their labor. For some parents, working in childcare is the right choice, but for many it won’t be, and when you artificially inflate short term wages only if they go into childcare, everyone loses.
But the entire childcare market is run on such tight margins that they all require a model of having a waiting list for children, meaning the market is massively underserved by design.
It would seem to me that this is only likely to result in more childcare workers, making it more accessible to all, no?
Not an economist, but I imagine that the job market for child care workers would be less distorted, even if the general service market for child care is more distorted.
In the US, it's a $2,000 tax credit per kid, so you effectively only get paid if your household income is high enough to owe federal income tax, and low enough that the 5% phase-out doesn't eat up the whole benefit. Also, it comes off of your taxable income, so the net benefit to your pocket is (1.0 - marginal_tax_rate) * benefit.
While technically a market distortion, it is incredibly well-aligned with the human considerations. The people most likely to need childcare are in many ways those well situated to be involved in providing childcare.
The early child-rearing years are hard, and trying to work during this time is hard. For a lot of people, just adding some extra childcare to their lives may be less difficult than many of the alternatives.
Childcare is always going to be pushing up or near pushing up against Baumol's cost disease constraints.
The only way to make it at all "productive" is cheap dense housing, something that America currently sucks terribly at.
Additionally, the fact that it is a gazillion private providers rather than a simple public service like schools makes it messier.
My prediction is that eventually the housing stuff will be figured out, and also this stuff will be increasingly rolled into the public school system, and both of those will relieve the pressure.
Just to be clear, the rise of childcare and restaurants is very clear evidence of the shifting boundary between the private and public spheres. But that shift cannot continue if it just hits more and more commute time constraints. It is like trying to make a faster integrated circuit without shrinking the scale.
Just as we bring the compute to the data, so we bring production closer to the home with mixed things, etc.
And in general, people don't get what density can offer. For a silly example, I've bought the clothes I will wear at an event while traveling to that event. Our suburban culture is not keeping up with the material possibilities.
When reading expressions like "market distortion" in the context of a topic like child care, I am a bit astonished.
Children are our future; they will work when we retire. Investment pays off. By comparison, I find the question of how the subsidy is spread across the labour market to be of secondary importance.
Please don't take this personally. I genuinely wonder whether this way of looking at these topics is a cultural or political thing.
> There’s a positive externality of readily available childcare: one adult to many children is more efficient than home care,
An externality [1] happens when there is some additional uncaptured aspect of a transaction, a "missing market". It seems implausible that there is a missing market in child care-- there are plenty of choices out there, different service levels and pricing. And when you put your child into child care and are now freed of child care responsibility for working hours, you can directly participate in the labor market, right?
Instead of externality maybe you could support intervention instead by arguing that there are facets of the child-care market that you don't like, resulting in the observed market outcome, such as capital overheads, barriers to entry, transportation barriers, or regulation.
At this point in the political universe these are the only measures that get through. We are living in a post policy era of governance. Hopefully the wonks can find some way to get the upper hand again, but I’m not optimistic.
No one in power is trying to design and put in place comprehensive plans to solve any particular problem. It’s all isolated one-offs, mostly involving dumping money instead of regulation, intended to improve a politician’s or party’s political standing.
The ACA and Dodd-Frank, whatever you think of them, were the last counter examples. That was more than a dozen years ago.
The increasingly accessible and increasingly rapid outrage/feedback loop brought to us by 24/7 cable news, social media, etc. has made it harder to push forward on meaningful complicated legislation.
Something like the Voting Rights Act, Medicare, Social Security, or interstate highways would, I suspect, be strangled in the crib today by vested interests spreading misinformation about them.
On the plus side, that market clearing will result in those with the most child care experience working in child care, which sounds nice.
That said it’s somewhat amusing to me to pay for someone’s kids to be watched so others can pay said someone to watch their kids instead.
Single income families are considerably more efficient in real terms, but child care only shows up as GDP and taxable economic activity if a stranger does it.
It reminds me of that joke about the two economists in the woods.
I'm a bit biased since I don't pay for childcare, but I don't have an issue with childcare being expensive as long as the profit is going directly to the workers.
I have a hard time understanding why we don't want talented and ambitious people to raise our children.
If you educated yourself about the problem you'd understand that childcare workers are usually making very little money while the businesses are taking the profit. Clean up your drink
This is America. It doesn't matter whether a policy is objectively good or bad. It has to have some farcical fig leaf justification beyond making the world better. Making childcare free for all people would pay for itself in increased productivity, improve the financial security of poor families, and would lead to massive improvements in the wellbeing of poor children. But the effects in the real world don't matter, because in La-La Land it's communist and that means it's Stalinist and that means it's Satanist and that means they're going to give your gas stove a sex change and put 5G in your hydroxychloroquine.
American politics are stupid. This shouldn't be a surprise to you.
Childcare is a market, the same way bread is. Like bread, family might provide it out of their own time and money, but otherwise, you have to pay someone.
Doing what is best for the children and parents is a market intervention, and ignoring how markets work when coming up with market interventions has a pretty bad track record.
Markets don't exist in a vacuum, they can involve the public sector: primary education is mainly provided through public funds and institutions, but it remains a market nonetheless. Throwing out what we know about market dynamics isn't a good way to do what's best for children and parents.
Of course policing is a market, what on earth makes you think it isn't?
State funded, yes, but police departments are in a labor competition with other police forces, private security, and anything else a cop could be doing instead. If a department wants more police, or better ones, they have to pay for that.
The idea that the market is more or less important than the upbringing of a child is nonsensical, I have no idea what you mean by that, it's like asking if acceleration is more important than luminance.
I’ve noticed there are two kinds of policy advocates.
Agenda advocates: no matter the problem, they push their agenda. Anti-tax people are in this category. Down economy? “Lower taxes!” they shout. Up economy? “Lower taxes to keep the momentum!”
Issue advocates: these people are creative and will propose the best solution given the day’s issue. The solutions they propose change over the years because conditions change and new solutions appear.
The comment you rape replying to is from the former. The world needs more of the latter.
> The comment you rape replying to is from the former. The world needs more of the latter.
Well that's a hell of a typo :P
I think the worst example of this I've seen on this site is when someone was advocating removing safety requirements from manufacturing plants because the rate hike in insurance would take care of making sure these companies were giving safe work conditions to their employees.
You know the people who advocate for that do so because they never expect anyone they care about to be working a manufacturing job.
Well, if we can't have that, then the line definitely should be drawn at the other extreme, where we just don't give a shit about parents spending time with children.
The best for children and parents would probably be that none of the parents have to work at all until the child is some age when they should go to school. But if we implemented that system, there would be a labor shortage, which would reduce the number of teachers, plumbers, mechanics, accountants, software engineers, etc, so the rest of society would suffer. There is no magic wand we can wave to solve all our problems without any tradeoffs. Money distributes resources in a capitalist system, and there are many problems and distortions. If you built a command economy with no money you would still have to make choices about these tradeoffs.
I sympathize with the underlying thought, but I think that it is worth using this as a model of when we should use the tool of socialism and when we should use the tool of capitalism... what should be the balance?
In this case the fundamental problem seems to be that not enough people want to work in childcare for the money that those businesses are currently offering. Part of that problem is that, for some potential workers, the barrier is that paying for childcare for their own children would cost enough to make working in the industry not worth it.
So the two solutions proposed here are: make childcare free for those who provide childcare, and make childcare free for everyone. Both of these solutions solve the problem for potential workers whose main problem is the affordability of their own childcare.
Notably the "free for everyone" does not actually improve solving the one problem we are talking about: it does not unblock anyone else to enable more workers in this one industry. But conversely it would unblock those people from working in other industries, ones that already seem to pay more. So it would probably wind up with less people working in childcare than the more focused "childcare free for childcare workers".
The counter-point to this of course is that we are creating a huge market distortion, essentially "forcing" people into childcare work (not really, but...).
The pure capitalistic solution would be to leave it like it is: let the market decide how much childcare is worth, and that will sort itself out. The problem with this approach is that it winds up producing a less-than-optimal solution: people who would be more productive for society wind up at home doing individual child care, and there are vastly different outcomes for children of well-off families than those of the poor (so societal imbalance based on the birth lottery).
To me the only right solution is some sort of wage stipend from the government (from tax monies) that goes directly to childcare workers. This would absolutely be the government putting its thumb on the scale to increase the supply of childcare workers, but from the government's perspective it is probably a good investment, both to get more workers available in all categories and to improve outcomes for the children of low-wage families (good both from a floats-all-boats perspective and a social justice one).
The problem is, of course, that it is absolutely a socialist means of improving things, and those who have made capitalism a religion are going to go nuts about that.
"When all you have is a hammer, everything looks like a nail".
I've seen some pretty heinous opinions on this site, things such as: safety regulations shouldn't exist, just let the market decide; the companies' insurance premiums will rise and so companies will naturally want to be safer!
As if that's more important than _preventing_ the loss of limbs.
I'm glad to see Unity finally choosing to play to its strengths here from an economic standpoint by offering rev share. It's an insurance business model, which a game engine is uniquely able to offer due to its horizontal reach.
There's big financial risk inherent in game development, against which studios would naturally "pay" some sum to hedge, in the form of trading away some upside in the success case for a lower cost in the failure case. In fact, risk aversion, which at least in aggregate often models much real-world behavior, dictates that studios would be willing to pay generously, entering into a deal that yields negative EV in exchange for flattening the risk curve. On the other hand, underwriting the risk on Unity's part is basically risk-free because of its horizontal reach across studios. Because of the asymmetrical risk, there's considerable economic surplus to be captured in a way that leaves both parties better off.
Of course, all of this only works out if Unity sufficiently benefits from the big successes. To that end, while generous, the choice to let developers pay the lower of rev share vs. per-install fee seems perplexing to me. When customers can pick after they already know whether their game is successful, Unity fails to set up an effective insurance business and ultimately will lose out on the surplus. The winners will no longer automatically subsidize the losers, Unity may have to raise the costs of both deals to cover its operations, and this all just becomes a more convoluted price increase.
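A toy illustration of that last point, with entirely made-up studios, rates, and fees: when every studio pays rev share, the rare hit funds the pool; once each studio can pick the cheaper option after it already knows the outcome, the hit opts out and the pool's surplus shrinks.

```typescript
// Hypothetical numbers purely to illustrate the adverse-selection point.
type Studio = { revenue: number; installs: number };

const studios: Studio[] = [
  { revenue: 10_000_000, installs: 1_000_000 }, // one hit
  { revenue: 50_000, installs: 200_000 },       // several modest titles
  { revenue: 50_000, installs: 200_000 },
  { revenue: 50_000, installs: 200_000 },
];

const revShareRate = 0.025; // hypothetical 2.5% of revenue
const perInstallFee = 0.15; // hypothetical $0.15 per install

const revShare = (s: Studio) => s.revenue * revShareRate;
const installFee = (s: Studio) => s.installs * perInstallFee;

// Pure rev share: the hit subsidizes the rest of the pool.
const pureRevShare = studios.reduce((sum, s) => sum + revShare(s), 0);

// "Pay the lower of the two, chosen after the outcome is known":
// the hit switches to the per-install fee and the pool's take shrinks.
const pickLower = studios.reduce(
  (sum, s) => sum + Math.min(revShare(s), installFee(s)),
  0
);

console.log({ pureRevShare, pickLower });
// pureRevShare = 253,750 vs pickLower = 153,750 in this made-up example.
```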
Berkson's paradox[1] is a useful lens to analyze inverse correlation. The pool of people doing interview prep is made up of people who are smart or determined (to accumulate certifications) or both, but dumb and lazy people don't enter the pool, so an inverse correlation appears within it. Similarly, if an interview bar evaluates candidates through a combination of soft and hard skills and rejects those who lack both, even if soft and hard skills are independently distributed (or positively correlated with insufficient strength), soft and hard skills will appear to be inversely correlated.
Certs could be neutrally or even positively correlated with interview performance, but by pre-filtering the population, the opposite phenomenon arises.
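A minimal simulation of the pre-filtering effect (the trait names and thresholds are arbitrary): two traits drawn independently look uncorrelated in the full population but negatively correlated once you only keep people who are strong on at least one axis.

```typescript
// Two independent traits appear negatively correlated after selection.
const n = 100_000;
const people = Array.from({ length: n }, () => ({
  hard: Math.random(), // e.g. raw interview skill
  soft: Math.random(), // e.g. certification effort -- independent by construction
}));

// Selection: keep anyone who is strong on either axis (the "pool").
const pool = people.filter((p) => p.hard > 0.7 || p.soft > 0.7);

// Pearson correlation coefficient.
const corr = (xs: number[], ys: number[]): number => {
  const mean = (a: number[]) => a.reduce((s, v) => s + v, 0) / a.length;
  const mx = mean(xs), my = mean(ys);
  let cov = 0, vx = 0, vy = 0;
  for (let i = 0; i < xs.length; i++) {
    cov += (xs[i] - mx) * (ys[i] - my);
    vx += (xs[i] - mx) ** 2;
    vy += (ys[i] - my) ** 2;
  }
  return cov / Math.sqrt(vx * vy);
};

console.log("full population:", corr(people.map((p) => p.hard), people.map((p) => p.soft)));
console.log("selected pool:  ", corr(pool.map((p) => p.hard), pool.map((p) => p.soft)));
// Roughly 0 for the full population, clearly negative for the selected pool.
```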
Can you summarize Moon's sandboxing approach? I understand for phantom deps you want to delegate to pnp/pnpm. But how do you handle sources not declared explicitly as inputs to a build graph node?
If I have package A only list its own sources as inputs, how do you prevent the Node runtime from happily doing require('../B')? If you don't, how do you prevent differences in the state of B from poisoning the remote cache?
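For what it's worth, here's a sketch of the failure mode I'm asking about, with hypothetical paths and a hypothetical cacheKey helper: if the cache key hashes only the declared inputs, an undeclared require('../B') never invalidates it.

```typescript
// Sketch of the failure mode (file paths and this helper are hypothetical).
import { createHash } from "node:crypto";
import { readFileSync } from "node:fs";

// Cache key for a build-graph node: hash only the *declared* inputs.
function cacheKey(declaredInputs: string[]): string {
  const h = createHash("sha256");
  for (const file of [...declaredInputs].sort()) {
    h.update(file);
    h.update(readFileSync(file)); // reads only what was declared
  }
  return h.digest("hex");
}

// Package A declares only its own sources...
const key = cacheKey(["packages/A/src/index.js", "packages/A/package.json"]);

// ...but at runtime its code does require('../B/utils').
// Edits to packages/B never change `key`, so a stale artifact for A is
// happily served from the remote cache -- unless the runtime is sandboxed
// (or import-patched) so that the undeclared read fails loudly instead.
console.log(key);
```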
General purpose build systems need to make API expressiveness, observability, performance, and correctness tradeoffs in exchange for their platform breadth.
By narrowing breadth, most often by targeting only one ecosystem[1] or making correctness tradeoffs[2], new build systems can be way more expressive and performant. For now.
They’re also fun to write because DAGs and parallelism are cool problems to think about.
[1] e.g. only targeting one language allows you to use that language instead of a generalized DSL to describe your build graph (a toy sketch of this follows below)
[2] e.g. abandoning sandboxing altogether and using manual invalidation or just trusting the user
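As a toy illustration of [1] (all task names here are made up): when the tool targets only the JS ecosystem, the build graph can just be data and functions in TypeScript rather than a separate DSL.

```typescript
// Hypothetical single-ecosystem task graph described directly in TypeScript.
type Task = {
  name: string;
  deps: string[]; // edges of the DAG
  run: () => Promise<void>;
};

const tasks: Task[] = [
  { name: "build:lib", deps: [],            run: async () => { /* e.g. tsc -p lib */ } },
  { name: "build:app", deps: ["build:lib"], run: async () => { /* e.g. bundle app */ } },
  { name: "test:app",  deps: ["build:app"], run: async () => { /* e.g. run tests  */ } },
];

// Topological execution with naive parallelism where the DAG allows it,
// memoized so each task runs at most once.
async function runAll(all: Task[]): Promise<void> {
  const started = new Map<string, Promise<void>>();
  const runOne = (t: Task): Promise<void> => {
    if (!started.has(t.name)) {
      started.set(
        t.name,
        Promise.all(t.deps.map((d) => runOne(all.find((x) => x.name === d)!))).then(t.run)
      );
    }
    return started.get(t.name)!;
  };
  await Promise.all(all.map(runOne));
}

runAll(tasks);
```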
Well done and congrats to the Turborepo team on the launch as well as the Vercel merger, which I think is a great thing for the JS ecosystem!
We now have healthy competition between several JS domain-specific build tools: bit.dev, Nx, Turborepo, and Rush. This is in addition to plugins for general purpose monorepo tooling like rules_nodejs (Bazel). I'm looking forward to seeing the new ideas that come out of the community as this field matures.
However, I feel a bit sad at the state of general purpose build tools (Bazel/Pants/Buck, make, GitHub Actions, Maven, Nix) or cloud-based IDEs (Cloud9, Codespaces). These tools come off as too complex to operate and build on top of such that we, the JS community, seem to be choosing to build JS-specific tooling from scratch instead. There are a huge number of mostly non-JS-specific problems that monorepo tooling eventually needs to solve: distributed build artifact and test result caching, distributed action execution, sandboxing, resource management and queuing, observability, and integration with other CI tools to name a few. I wish somehow we could reorganize around a smaller set of primitives instead of what appears to be reinventing the wheel.
Regardless, I think all of this effort and attention has lent credence to the monorepo thesis, and I'm very excited to see what's next.
> However, I feel a bit sad at the state of general purpose build tools (Bazel/Pants/Buck, make, GitHub Actions, Maven, Nix) or cloud-based IDEs (Cloud9, Codespaces). These tools come off as too complex to operate and build on top of...
I definitely agree, although I've found Please (please.build) to potentially be a solution in this area. It is a lot simpler, smaller, and more modern than Buck and Bazel, but shares enough similar properties to feel familiar (i.e. the buildfile syntax). I think it is supposed to be easier to extend with other languages, but I haven't tried that myself.
> There are a huge number of mostly non-JS-specific problems that monorepo tooling eventually needs to solve: distributed build artifact and test result caching, distributed action execution, sandboxing, resource management and queuing, observability, and integration with other CI tools to name a few.
Turborepo author/founder here....
I agree. I built Turborepo because existing tools weren’t meeting our needs.
To solve these problems and still be flexible, many existing build tools end up with lots of configuration bloat. We’re trying to avoid that. We want to reimagine the developer experience of monorepo tooling and make it accessible for everyone.
Hey. Off topic but would you consider handing Formik and TSDX to a community member? Huge appreciation for turborepo and your work on these other libraries -- they have become key parts of the ecosystem and turborepo could follow.
However, this is all the more reason why it could be super helpful to address the governance issue on those projects. Thanks and sorry for disturbing!
Yes, it's possible to use Bazel w/ JS tooling. At Uber we run a 1000+ package monorepo w/ Bazel and Yarn. Someone else mentioned rules_nodejs if you want to go with the popular option that is more or less in line with the Bazel philosophy[0]. We use a project called jazelle[1] that makes some different trade-offs, which lets us do some interesting things like not busting the whole world's cache when the lockfile changes, and source code generation (including automatic syncing of BUILD files to package.json).
> Is Bazel designed in a way that make it impossible to do JS monorepos well?
Not impossible, but you really need to go all in with it and follow its conventions and practices. See this for the main docs: https://github.com/bazelbuild/rules_nodejs
One thing in particular that doesn't work well in the bazel world is doing your own stuff outside its BUILD.bazel files. If you're used to just npm install and jam some code in your package.json scripts... that doesn't usually work in the bazel world. If you have a lot of logic or tools in your build you'll likely need to go all in and make bazel starlark rules or macros that recreate that logic. Nothing is impossible, but expect to spend time getting up to speed and getting things working the bazel way.
> Is it possible to integrate Turborepo with general-purpose monorepo build tools? Bazel, in particular?
It's definitely possible, but I think the practical limitations would make it too complex to reason around and maintain. You'd end up creating two separate and overlapping systems to declare dependency graphs and input sources and manage caching and execution.
I haven't yet seen a case where the two are actually interleaved. Currently at Databricks, we use Bazel to provide the correctness guarantees and interop needed for CI, and we use JS-specific tooling (non-Bazel) locally to meet our performance needs, where the usage profile is different and where we're willing to make correctness tradeoffs.
> (Is Bazel designed in a way that make it impossible to do JS monorepos well?)
There are limitations in Bazel that don't play nicely with modern JS conventions. For example, Bazel's standard sandbox is based on symlink farms, and Node.js and the ecosystem by default follow symlinks[1] to their real locations, effectively breaking sandboxing. A FUSE or custom filesystem (Google's version of Bazel takes advantage of one[2]) would be better but is not as portable. As another example, Bazel's action cache tries to watch or otherwise verify the integrity of every input file to an action, and when node_modules is 100k+ files, this gets expensive and is prone to non-determinism. Bazel does this for correctness, which is noble but results in practical performance problems. You need to do extra work to "trick" Bazel into not reading these 100k+ files each time.
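A small sketch of the symlink point, using hypothetical paths: by default Node resolves a symlinked package to its real location, which is exactly the behavior that lets code "escape" a symlink-farm sandbox. `--preserve-symlinks` is the existing Node flag that changes this, though as noted above, the ecosystem largely assumes the default.

```typescript
// Hypothetical directories; the point is only the realpath behavior.
import { mkdirSync, writeFileSync, rmSync, symlinkSync, realpathSync } from "node:fs";

// A shared store outside the "sandbox", and a symlink-farm node_modules inside it.
mkdirSync("/tmp/store/left-pad", { recursive: true });
writeFileSync("/tmp/store/left-pad/index.js", "module.exports = () => '';");
mkdirSync("/tmp/sandbox/node_modules", { recursive: true });
rmSync("/tmp/sandbox/node_modules/left-pad", { force: true });
symlinkSync("/tmp/store/left-pad", "/tmp/sandbox/node_modules/left-pad");

// Node's default module resolution canonicalizes to the real path,
// i.e. it steps outside the sandbox directory entirely:
console.log(realpathSync("/tmp/sandbox/node_modules/left-pad"));
// -> /tmp/store/left-pad

// Running with `node --preserve-symlinks` keeps resolution at the symlink
// path instead, which is one of the knobs sandbox-based setups reach for.
```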
The problems feel solvable to me, but not easily without adding yet more configuration options to Bazel. The influx of new JS-specific tooling is a reset to this, building the minimum viable set of functionality that the JS ecosystem specifically needs, without the burdens of being a general purpose build system.
I'm sad JS needs a build step. The best build system is none at all IMHO. I'd love to see native support everywhere for typescript or other things we typically depend on a build for today.
>I'm sad JS needs a build step. The best build system is none at all IMHO.
Good news: JS doesn't need a build step. Modern webdev wants a build step mostly because it wants Javascript to feel more like a "serious" language. There are technical benefits to compiling to Javascript, most of which can be served by other means, but the unnecessary complexity of the Javascript ecosystem is mostly about gatekeeping and aesthetics and i will die on that hill.
>I'd love to see native support everywhere for typescript or other things we typically depend on a build for today.
Typescript is part of the problem. You can literally just accept Javascript for what it is - dynamically typed - and write it like any other scripting language.
Typescript is a solution to the jank-fucking-tastic type munging JS does (see: [1] About 1:20 in) and the problems that ensue.... if all you are doing is just making menus appear/disappear on click, by all means keep to JS.
The build steps/tooling are useful when you want to build actual applications rather than decorate a marketing page, and also when you need to support legacy browsers, being able to work with modern sensibilities and get code that'll work in IE11 is a blessing.
> the unnecessary complexity of the Javascript ecosystem is mostly about gatekeeping and aesthetics and i will die on that hill.
JS has much more of a "flavour of the week" problem than more mature ecosystems like PHP, I put that down to a relatively poor stdlib by comparison, rather than aesthetics or gatekeeping.
I'm with you in that a lot of TS use seems dogmatic or ritualistic today. I have a strong feeling that in the near future we're going to see the bow string snap back, and simple, zero-build, basic dynamic use of pure JS will come back in vogue.
TS is about readability and maintainability of code at scale. Having types helps immensely in understanding a new codebase and working in a large codebase with many other people.
The problem is Javascript has been entirely commoditized by enterprise, so solutions which should only be relevant to "code at scale" have become mandatory at any scale. You practically can't distribute a Javascript library without writing it in Typescript and submitting it to NPM, with the build step expected.
Yet writing even complex javascript without Typescript and having it work is still entirely possible, just as it is with other weakly typed languages. There should still be room for that, but the concept of simply writing javascript has become so alien it needs to be reintroduced as "vanilla javascript."
Summarizing the 3 major JS package management approaches:
* Classic node_modules: Dependencies of dependencies that can't be satisfied by a shared hoisted version are nested as true copies (OSes may apply copy-on-write semantics on top of this, but from the FS perspective, these are real files). Uses standard Node.js node_modules resolution [1] (sketched after this list).
* pnpm: ~1 real copy of each dependency version, and packages use symlinks in node_modules to point to their deps. Also uses standard resolution. Requires some compatibility work for packages that wrongly refer to transitive dependencies or to peer dependencies.
* pnp[2]: 1 real copy of each dependency version, but it's a zip file with Node.js and related ecosystem packages patched to read from zips and traverse dependency edges using a sort of import map. In addition to the compatibility work required from pnpm, this further requires compatibility work around the zip "filesystem" indirection.
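A minimal sketch of the standard resolution walk mentioned in the first bullet (real resolution also handles package.json "main"/"exports", file extensions, self-references, and more): Node looks for node_modules/<name> in the importing file's directory and then in each parent directory up to the filesystem root.

```typescript
// Toy version of the classic node_modules resolution walk.
import { existsSync } from "node:fs";
import { dirname, join, parse } from "node:path";

function resolveClassic(specifier: string, fromDir: string): string | null {
  let dir = fromDir;
  while (true) {
    const candidate = join(dir, "node_modules", specifier);
    if (existsSync(candidate)) return candidate;
    if (dir === parse(dir).root) return null; // hit the filesystem root
    dir = dirname(dir); // otherwise walk up one directory and try again
  }
}

// e.g. resolveClassic("lodash", "/repo/packages/app/src") checks
//   /repo/packages/app/src/node_modules/lodash,
//   /repo/packages/app/node_modules/lodash,
//   /repo/packages/node_modules/lodash, ... up to /node_modules/lodash.
```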
In our real-world codebase, where we've done a modest but not exhaustive amount of package deduplication, pnpm confers around a 30% disk utilization savings, and pnp around an 80% savings.
Interestingly, the innovations on top of classic node_modules are so compelling that the package managers that originally implemented pnpm and pnp (pnpm and Yarn, respectively) have implemented each other's linking strategies as optional configs [3][4]. If macOS had better FUSE ergonomics, I'd be counting down the days for another linking strategy based on that too.
Node is doing the right thing: if two dependencies in maven have conflicting dependencies, maven just picks an arbitrary one as _the_ version, which results in running with an untested version of your dependency (the dependency is actually depending on a version the developers of that dependency didn’t specify). Because node allows the same dependency to be included multiple times, npm and friends can make sure that every dependency has the right version of its dependencies.
Node does a different thing. It can coalesce two different versions into one if the two things are within a certain semver range, but there's nothing that enforces whether things within a semver range are actually compatible. The most prominent example is Typescript, which famously does not follow semver. Another notable example of how NPM itself does things wrong is that it considers anything in the `^0.x` range as compatible, whereas semver distinctly says the 0.x range is "anything goes".
Incompatible libs, you say? Try this one on: once upon a time a handful of years ago a package-lock.json I worked on drifted so far from package.json that you could not remove package-lock.json and rebuild purely from package.json. The versions specified in the package.json were incompatible with each other, but the package-lock.json had somehow locked itself to a certain permutation of versions that it somehow just worked.
I always shudder to think that different versions of packages live in node_modules and one library produces an object that somehow makes it to the other version of the library and... I'd rather not think of all these implications or I would go crazy.
I agree about the 0.x thing. The rest is basically a result of people refusing to use the versioning system the way it’s designed to be used, which is a problem with a package, not with the specified behavior of npm here: violating the rules of semver is UB.
I would definitely put part of the blame on the design of the system. It allows anyone to write stuff like `"lodash": "*"`, which is a perfectly valid range as far as semver goes. And then there's things like yarn resolutions, where a consumer can completely disregard what a library specifies as its dependencies and override that version with whatever version they want. And there's aliases (`"lodash": "npm:anotherpackage@whatever"`) and github shorthands and all sorts of other wonky obscure features. And we haven't even touched on supply chain vulns...
>> maven just picks an arbitrary one as _the_ version
No that’s never been the case. If you have conflicting versions of a dependency in your dependency graph, maven chooses the “nearest neighbour” version - it selects the version specified least far away from your project in the transitive dependencies graph.
Pinning a particular choice is easy too - you just declare the dependency and specify the version you want instead of relying on transitive deps.
This is what I mean by an arbitrary version: it’s not determined by the dependency but by some characteristic of the dependency tree. And, this is only necessary because the JVM can’t load two versions of the same dependency (ignoring tricks like the maven-shade-plugin)
The JVM can load the same class any number of times through different class loaders -> only the (class, classloader) tuple has to be unique.
I guess the reason they didn’t go in the duplicative direction is that Java has safe class-loading semantics at runtime, and that they valued storage/memory capacity (which was frankly a sane choice; 10x bigger Java projects compile faster than a JS project that pretty much just copies shit from one place to another).
It’s kind of incredible that Yarn PnP outperforms pnpm. If that’s generally true across most projects, then I’m really glad that Turborepo decided to use it for project subslicing.
The practical disk usage difference between pnp and pnpm appears to be almost entirely accounted for by the fact that pnp zips almost all packages. Both store ~1 version of each package-version on disk; it's just that one's zipped and one's not. The mapping entries for package edges in both cases (.pnp.cjs for pnp and the symlink farms for pnpm) are very small in comparison.
Disk utilization is only one metric. The trade-off for Yarn PNP is that it incurs runtime startup cost. For us (~1000+ package monorepo), that can be a few seconds, which can be a problem if you're doing things like CLI tools.
Also, realistically, with a large enough repo, you will need to unplug things that mess w/ file watching or node-gyp/c++ packages, so there is some amount of duplication and fiddling required.
Problems long solved before, but problems that don't matter to the JavaScript crowd... I think they actually love that things take so long. It makes them think they're doing important work: "We're compiling and initting."
Kudos to interviewing.io for sharing this analysis. I agree with the many issues in methodology and analysis that others have raised here, and I agree there's a risk that a face-value reading of the blog post is highly misleading. But this is true for all data, and pooh-poohing the analysis without crediting the sharing just leads to less sharing. To be clear, I'm supportive of the criticism, but let's also give credit where it's due.
Technical interview performance is a high stakes field for which almost all data is cloistered in individual hiring companies or secretive gatekeepers. In my mind, all efforts, even imperfect ones, to share data are a great step here. We should encourage them to continue to share, including pursuing options to anonymize the raw data for open analysis. The field deserves more transparency and open discussion to push us all to be better.