I generally agree with you. In my experience, the infrastructure engineers are very often the best software engineers. Like there have been half a dozen times in the last year that an infra team at my company has essentially done the architecture design work and high level implementation for a complex system that the dev teams technically owned, but they kept stalling out on.
I think there are at least two reasons for this: the first is that infrastructure engineers (e.g., SREs) just think about reliability and architecture a lot more than SWEs and the second is that the infra eng position generally selects for people who have a penchant for learning new things (virtually every SRE was an SWE who raised their hand to dive into a complex and rapidly evolving infrastructure space).
Also, the best SWEs have been the ones who accepted the reality of infrastructure and learned about it so they could leverage its capabilities in their own systems. And because our org allows infra to leak to SWEs, the corollary is that SWEs are empowered to a high degree to leverage infrastructure features to improve their systems.
The problem is that our field has a deep-seated difficulty differentiating good engineering from over-engineering. The latter is taken as the former. In my experience, SREs/devops will just as often over-engineer and leak their abstractions to everyone else out of paranoia and the current fad in the "cloud native computing" field. Suddenly people need to know about a dozen concepts just to debug the path of an HTTP request. This wasn't needed before, and it's mostly a consequence of holes we've dug for ourselves (like microservices) that we now use to justify the over-engineering - all within the part of the stack that was previously seen as the absolute simplest to manage: stateless app servers.
I mean, maybe? I won’t doubt that some people are prone to overkill or hype-driven engineering, but most of the simplest solutions still end up looking like Kubernetes (e.g., a cloud provider or a platform-as-a-service). The most severe, prevalent danger is people deciding these things are too complex and then trying to roll their own alternative on bare VMs by cobbling together disparate technologies, which only the very best sysadmins can do passably (and of course every other sysadmin thinks they're in that elite group). Even then, the thing they build still leaks implementation details, except now there is no market of engineers experienced with this bespoke alternative to Kubernetes, nor a wealth of documentation available online.
> The most severe, prevalent danger is people thinking these things are too complex and then trying to roll their own alternative on bare VMs
No it's not. This simply wasn't the case before k8s. Running a stateless app behind a (cloud) LB talking to a (cloud) DB has never been the hard part, and it's even easier today when deploying something like Go essentially means starting a single binary, or using Docker for other languages. People seem to have forgotten how far so few components get you. But for too many of the people involved, the incentives align towards increasing complexity.
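To make that concrete, here's a minimal sketch (in Go, since that's the example above) of the kind of stateless app server being described: one binary behind a load balancer, configured entirely via environment variables. The variable names (PORT, DATABASE_URL) and the endpoints are illustrative assumptions, not anything specific from this thread.

```go
// Minimal sketch of a stateless app server: a single binary that sits
// behind a (cloud) load balancer and would talk to a (cloud) DB.
package main

import (
	"log"
	"net/http"
	"os"
)

func main() {
	port := os.Getenv("PORT")
	if port == "" {
		port = "8080"
	}

	// The DB connection would be opened here from something like
	// DATABASE_URL; omitted to keep the sketch dependency-free.
	_ = os.Getenv("DATABASE_URL")

	mux := http.NewServeMux()

	// Health endpoint for the load balancer's checks.
	mux.HandleFunc("/healthz", func(w http.ResponseWriter, r *http.Request) {
		w.WriteHeader(http.StatusOK)
		w.Write([]byte("ok"))
	})

	// Application endpoint; in a real app this would query the DB.
	mux.HandleFunc("/", func(w http.ResponseWriter, r *http.Request) {
		w.Write([]byte("hello from a single stateless binary\n"))
	})

	log.Printf("listening on :%s", port)
	log.Fatal(http.ListenAndServe(":"+port, mux))
}
```

Deploying something like this is copying one binary to a VM (or baking it into an image) and pointing the load balancer's health checks at it.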
I mean, yes, if you have a small org with one CRUD app on a handful of VMs, but if you want nice things like "lower environments that mirror production", "autoscaling", etc., then yeah, you pretty quickly run into the kind of complexity that justifies Kubernetes, a cloud provider, a PaaS, etc. Essentially, the problems start when you have to manage more than a handful of hosts, and you pretty quickly find yourself reinventing Kubernetes - but cobbled together from a bunch of different tools developed by different organizations with different philosophies and different configuration schemes, and your team (including everyone you hope to hire) has to understand them all.
> if you have a small org with one CRUD app on a handful of VMs
I think most would be surprised how big a chunk of modern apps fit within that space without active intervention like microservices. No need to stop at a handful of VMs either, although I imagine most companies could easily be covered by a few chunky VMs today.
And yes, if you're a PaaS, multi-tenancy something something, then sure, that sounds more like the target audience for a generic platform factory.
I don't know. If you have more than a few hosts, it seems like you already need some configuration management so you can reproducibly set up your boxes the same way every time - I certainly wouldn't want to work anywhere with more than a couple of production systems that have humans SSH-ing into them to make changes.
And if that's the territory you're in, you probably need to set up log aggregation, monitoring, certificate management, secrets management, process management, disk backups, network firewall rules, load balancing, DNS, a reverse proxy, and probably a dozen other things, all of which are either readily available in popular Kubernetes distributions or else added by applying a manifest or installing a Helm chart.
I don't doubt that there are a lot of systems running monoliths on a handful of machines, but I doubt many of them have development teams decoupled from their ops team such that the former can deploy frequently (read: daily or semiweekly), and if they do, I'm guessing it's because their ops team built something of comparable complexity to k8s.
No one changed VMs manually over SSH beyond perhaps deep debugging. Yes, creating a VM image might be needed or it might not, depending on the approach [1]. Most of the things you listed only become an issue after you've already dug a hole with microservices that you then need to fill. VMs are still a managed solution. I'm not sure where people have gotten the idea that k8s is somehow easier and doesn't require significant continuous investment in training and expertise from both devs and ops. It's also a big assumption that your use-case fits within all of these off-the-shelf components rather than having to adapt them slightly, or account for various caveats, which then instantly requires additional k8s expertise. Not to mention the knowledge required to actually debug a deployment with a lot of moving parts and generic abstraction layers.
One downside I also see compared to the "old-school" approach, albeit maybe an indirect one, is that it's a very leaky abstraction that makes the environment-setup phase stick around seemingly in perpetuity for everyone, rather than being encapsulated away by a subset of people with that expertise. No normal backend/frontend dev needed to know what particular Linux distro the VMs were running or similar infra details; they could just focus on code, because the env was set up months ago and was none of their concern (and I know there's some devops idea that devs should be aware of this, but in practice it usually just results in placeholder stuff until actual full-system load testing can be done anyway). So a dev team working on a particular module of a monolith should be just as decoupled as with microservices. Finally, for stateless app servers, maintenance was needed much more rarely than people seem to believe today.
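As a hypothetical sketch of that "decoupled modules inside a monolith" point: each team owns a package behind an interface, and only the wiring code sees the concrete implementations. The names here (Inventory, Billing, checkout) are made up for illustration, not taken from anything above, and in a real codebase each module would live in its own package.

```go
// Hypothetical sketch: module boundaries inside a monolith via interfaces.
// Each team owns its implementation; other code depends only on the interface.
package main

import "fmt"

// Inventory is the boundary the inventory team owns.
type Inventory interface {
	InStock(sku string) bool
}

// Billing is the boundary the billing team owns.
type Billing interface {
	Charge(customer string, cents int) error
}

// Stand-ins for the teams' real implementations.
type inMemoryInventory struct{ stock map[string]int }

func (i inMemoryInventory) InStock(sku string) bool { return i.stock[sku] > 0 }

type simpleBilling struct{}

func (simpleBilling) Charge(customer string, cents int) error {
	fmt.Printf("charged %s %d cents\n", customer, cents)
	return nil
}

// checkout depends only on the interfaces, so either module can change
// internally (or be split out later) without touching this code.
func checkout(inv Inventory, bill Billing, customer, sku string) error {
	if !inv.InStock(sku) {
		return fmt.Errorf("%s is out of stock", sku)
	}
	return bill.Charge(customer, 2500)
}

func main() {
	inv := inMemoryInventory{stock: map[string]int{"widget": 3}}
	if err := checkout(inv, simpleBilling{}, "acme", "widget"); err != nil {
		fmt.Println("checkout failed:", err)
	}
}
```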
I realize that a lot of this is still subjective and involves trade-offs, but I really think the growing myth that things were maintenance-ridden and fragile before is far overblown.