Data-Oriented Architecture (2020)

bob1029 · on Nov 14, 2021

Focusing on the data and getting everything into 1 happy schema can eliminate problems you didn't even know you were allowed to disregard.

Success with these efforts is unlikely in larger teams. You need to put the most wizened chef in your data modeling kitchen and have them serve the business stakeholders until they are leaving 5/5 reviews on yelp. Design of schemas by committee is how you wind up with N of them and up to N^2 bullshit protocol/mapping/service layers for communicating between them.

You can iterate this problem space with Microsoft Excel. You do not need to write any code to figure out the most difficult part of most software products.

_se · on Nov 14, 2021

"You do not need to write any code to figure out the most difficult part of most software products"

Completely true and something that I wish more people understood and practiced! Most engineers (even seniors) want to immediately start cutting code. I think the saying is "Weeks of coding can save you hours of planning".

monkeybutton · on Nov 14, 2021

>You can iterate this problem space with Microsoft Excel.

I believe this! If you can format your data in excel in such a way that people can easily answer their questions with pivot tables and vlookups, you're most of the way there to a good schema. And it's a much lower cost to produce some test workbooks in excel for people than building and populating full databases using new/prototype services.

twodave · on Nov 14, 2021

I’ve seen multiple dev shops migrate from monolith to “microservices” without changing their data store, and the sad state that results is basically what this article is presenting as a positive. It’s anything but. It is incredibly difficult to reason about an application ecosystem using a shared data store. You’re basically just pushing all the complexity mentioned about microservices in the article into the database itself, except with fewer barriers. Any schema change now has to be verified against multiple (often an unknown number of) owners. Versioning is a much less fun problem in an RDBMS than in an HTTP application. Blurry data boundaries eventually cause enough confusion and poor decisions that you may as well be back to running a monolith since you end up having to scale all your pieces together anyway.

ratww · on Nov 14, 2021

I've also seen it happen more than often than I wish, but the issue in this case is more about skipping a common data access layer rather than usage of a shared data store. This would have been a problem in any system: even in a monolithic application, such lack of proper layer (and proper checks and verifications around data access) will always cause these issues.

zmmmmm · on Nov 14, 2021

I'm not sure if it's the answer but you may be slightly misunderstanding it - you are describing a system without the crucial "Data Access Layer" which sits in front of the database. This layer is what mitigates part of the problem you are describing because it puts a layer of abstraction b/w raw data access and the services using the data. For example it could be a REST API, or using a message queuing backbone or both.

hyperpallium2 · on Nov 14, 2021

Maintaining access for old programs despite evolved schema by was Codd's motivation for relational algebra.

reificator · on Nov 14, 2021

My advice to anyone naming things and trying to evangelize them is to try to find any way the name could be made fun of, any alternate meanings that could be applied, and so on. And do that for the acronym as well.

The article here calls it Data-Oriented Architecture once, and then proceeds straight to DOA. Trying to convince people to switch something as fundamental as their software architecture is difficult enough without calling your idea "Dead On Arrival" from the first paragraph.

jspash · on Nov 14, 2021

Agreed. I work with an internal system unfortunately called the Digital Asset Manager. So all day long we talk about adding things to the DAM database and editing the DAM record and so on. It never gets old - to my juvenile sensibilities.

However, the term does occasionally slip into external meetings and interactions with people. It's not a good reflection of the company for sure.

throwuxiytayq · on Nov 14, 2021

Yes, it's a name that's both funny and memorable. I don't see much of a problem.

marktangotango · on Nov 14, 2021

This is so true, I was once burned by an unfortunate mis-characterization of an acronym for a service I had created. It benefits us all to keep in mind that sales/marketing types prioritize different things than technical merit or clear naming!

gnuvince · on Nov 14, 2021

Well this is fun, now we have data-oriented design, data-oriented programming, and data-oriented architecture, and they mean three different things.

ManuelKiessling · on Nov 14, 2021

Well, to be fair, maybe it’s totally ok if three different names mean three different things…?

sas224dbm · on Nov 14, 2021

The article says that this is an "unusual architectural choice". I strongly disagree. A reference is made to an RTI publication many years ago. Realtime Innovations have been key players in the development of DDS publish-subscribe middleware for donkey's years. I remember NDDS back in 1998. We've successfully used DDS in the military space to enable different suppliers to connect to a DDS backbone-based system via the 'data model'. Especially in DIL environments (Disadvantaged, Intermittent, Lossy) where QoS settings can be applied to different pieces of the data model over a plethora of different bearers (think Satcom, IP radios, etc). It's a very-much typical architectural pattern selected.

KineticLensman · on Nov 14, 2021

Also widely used in federated military simulations [0]

[0] https://en.wikipedia.org/wiki/Run-time_infrastructure_(simul...

ChicagoDave · on Nov 14, 2021

Hard pass. I'll stick with Domain Driven Design, though for simple systems, old school monoliths are fine. Complex enterprise systems end up being a cluster-F when driven into a monolithic state or when the primary architectural foundation is some database.

Of course any traditional relational DBA would love this article. But relational databases are an architectural anti-pattern.

Don't start with the data.

Start with the domains.

ratww · on Nov 14, 2021

> But relational databases are an architectural anti-pattern.

This is such an inflammatory and ignorant statement that I don't really have any other answer other than to call it pure bullshit. Relational Databases are probably one of the few popular areas of our profession that are not only backed by formalisms and research that tell us to do it correctly, it is also proven to work in practice: it is the de facto backbone of most financial, commercial, government systems and, heck, even military, but also even of modern FAANG-style big tech.

Sure, they're boring. And it takes time to learn. And normalisation might has its costs but we can know them all upfront, and we know where and how to break them. The (currently trendy) horizontal scalability is indeed non-trivial with them, but vertical and horizontal-via-sharding are easier than most ad-hoc solutions. But the boredom pays off in guarantees, and to learn you have 50 years of mature material to rely on.

(This is not to say that non-relational databases are useless. Both have its uses, but they're both only really useful if you properly understand them)

Complex enterprise systems are complex for reasons beyond the technology. Often due to processes and bureaucracies. Those complexities often end up turning into "essential complexity" in systems, and Domain Driven Design is not able to reduce such essential complexity, and nothing can.

Nor is Domain Driven Design incompatible with relational databases, especially the tactical part, which is often the major selling point of DDD. It is entirely possible to model domains within databases, and is entirely common.

Also, DDD is not a guarantee of success. If anything, it has not proven itself nowhere near as much relational databases have. As much as the original idea of DDD is good (again, especially the tactical part), it I see it often become rationalisation for the same old architecture astronautics that cause excessive incidental complexity. For people wanting architectural complexity, I can completely understand the distaste for something simple like "keeping it in the DB". It completely negates all the fancy technical patterns prized by developers. This is not to discount the original idea and techniques of DDD: merely saying that the excuse "relational databases can be bad if not well done" applies as often, if not more, to the practice of DDD.

HelloNurse · on Nov 14, 2021

Relational databases work so well that they are also highly effective when used badly.

So it's easy to succumb to temptation and take the easy path (I don't want to add three tables to keep it normalized... let's allow this column to be null, the applications can check... another field for the second phone number, we'll see about the third one... make your queries non-isolated or we could have a deadlock...) with apparently good results, and then when technical debt comes home the architecture amateurs can blame the tools instead of themselves.

ChicagoDave · on Nov 15, 2021

It's not just about poorly designed relational models. It's also about impedance mismatch when converting relational models to business models. You end up automating that using object-relation mappers and this leads to a hardened and immovable design. Eventually, the business will ask for a significant change and the tech staff will put forth their costly assessment for that change. Then the business has to decide to either rewrite the system or to pay for the tech debt or (worst case) hack the new features into the existing architecture. Many times, it all gets punted and the business lives with the complexity.

I've seen this way too many times to back off of my assessment. Relational models do not, in the long term, enable an agile business model. It is an anti-pattern even if it takes years to reveal itself as such.

ratww · on Nov 16, 2021

I've seen several complex businesses with incredibly complex processes and systems that were entirely based on relational databases, and changes were always fast in the database side. The database itself is agile and it is easy to modify and migrate.

The problem always lies on excessive abstractions and patterns on the backend side. Which really isn't solved anywhere by any strategy other than keeping things damn simple.

Your descriptions of "impedance mismatch when converting relational models to business models" and "automating mappers that leads to a hardened and immovable design" are a precise example of that: OOP patterns muddying the development process and causing problems in unrelated parts of the system (in this case the database).

Of course, keeping things simple and programming in a way that fits the relational model is incredibly difficult for developers that have been trained to believe that complexity is the answer to all problems future and past. So the resort for those people is to keep pestering management to throw the baby (relational databases) with the bathwater (relational mappers and other related OOP patterns).

ChicagoDave · on Nov 19, 2021

I used to think this way as well. Then I rebuilt the performance management system at Accenture using domain centric micro services and a document database. Not only were we able to build faster (schemaless is so much easier on change management), but also offers the ability to add features at a frighteningly fast pace. No RDBMS-centric project I have ever worked on comes close to the efficiencies of what we did on that project and on projects since using the same paradigms.

And when people needed the transactional data in a relational model, we streamed events and that was read only in some other tool and the core system didn't need to know about any of that.

Been down this road. Saw a fork in 2015 and haven't looked back.

HelloNurse · on Nov 28, 2021

> domain centric micro services and a document database

It sounds like you put ORM hell behind you by getting rid more of objects than of databases.

> Not only were we able to build faster (schemaless is so much easier on change management), but also offers the ability to add features at a frighteningly fast pace

What kind of change management? Frightening clients with API changes? Eluding code reviews and formal release processes?

ChicagoDave · on Nov 15, 2021

After 30 years of architecture based on RDBMS, it's my opinion that _starting_ with a relational model ends up hiding and obfuscating business functionality at the expense of relational paradigms.

I didn't say relational databases are bad. I said it was an anti-pattern. In my mind, relational databases should be mostly used in data warehouses, reporting, and analytical systems.

Transactional systems are better suited to document databases and CQRS.

ahdh8f4hf4h8 · on Nov 14, 2021

Honestly, object oriented design is the anti-pattern here. Almost none of the benefits promised in the 1990's and 2000's have come to pass, and for most applications type hierarchies are of little help.

For most applications, having a really good relational data model, and controlling data updates through a service/microservice will give most of the benefits of "object oriented" design without any of the drawbacks. The service will act as a "live object", data integrity will be preserved, data science modeling will be easy, and the application can be designed using whatever tomorrow's preferred platform of the day is, or even some low code stuff for simple applications.

dimgl · on Nov 14, 2021

> But relational databases are an architectural anti-pattern.

Say what bro? Surely this is sarcasm. Relational databases make up the foundation of most of the services you use today. Even poorly designed relational database schemas will work in production.

RcouF1uZ4gsC · on Nov 14, 2021

> But relational databases are an architectural anti-pattern.

I disagree. I think relational databases are such a mature technology that you should definitely start with them if you possibly can. They are flexible enough that they can accommodate a lot of different applications. In addition, relational databases are performant and scalable enough that unless you reach MAGAF scale, they will likely work for you.

cerved · on Nov 14, 2021

You can have domains in a single database using schemas

arcbyte · on Nov 14, 2021

There really is no data. Only the domain.

rozularen · on Nov 14, 2021

what other type of database do you use then instead of relational databases?

ChicagoDave · on Nov 15, 2021

Depending on the requirements of the domain, you can use relational, graph, document, or key/value.

Let the domain decide.

jason2323 · on Nov 15, 2021

I think this kind of architecture would be really useful when implemented alongside a dataflow based architecture where all requests can be "logged" to the database and read subsequently through stream processors

RedShift1 · on Nov 14, 2021

This just adds another layer to the system? I'm not really following what the advantages are.

zmmmmm · on Nov 14, 2021

It isn't really brought out as much in the article as it should be ...

A key thing like about this architecture is it gets some of the benefits of microservices (codebases can be split, services deployed and managed separately, low coupling as a default) but without the downsides of fragmenting the data store, which causes a lot of the issues in a pure microservice architecture.

Once you fragment the data store you now have to fully compensate for the loss of transactional integrity at the data layer. It also forces a coherent data model across all the services - no more "service X" has a slightly different model of a data type to "service Y" - there literally is only one definition of the data type.

So I see it as a reasonable midpoint between monolith and microservice that gets most of the benefits of microservices without the most severe drawbacks.

ordiel · on Nov 14, 2021

In my opinion (to summarize) this solve one of the main problem introduced by (wrongly implementing your microservices/ splitting your monolith) which is data intrgrity there is a reason why many cloud services are "eventually consitent" (no wonder the head quartes for such are located in such leftiest cities since its obvious they like the word-playing. By adding any modifier to "consistent" they are telling you it is NOT consitent, even if that is for a "brief" period of time).With this aproach you centralize your data and implement multiple actors working atop the same set of data; as they say in the article this comes to its own costs, mainly due to the limitations of existing ORM's (see https://blogs.tedneward.com/post/the-vietnam-of-computer-sci... you can jum onto "vietnam and OR mapping" to save you a bunch of historic bull crap) where then you have multiple applications working on the same set of data you go onto a more horrible place than data integrity issues, which is data corruption issues... In brief, choose your own poison, you like debugging the DB/data issues, debugging your apps, or having an un scalable system?

HelloNurse · on Nov 14, 2021

What about limiting ORM masochism and making applications interact nicely through the kind of "data access layer" the article discusses?

ordiel · on Nov 14, 2021

Then develompen velocity goes down the drain :/ (which grows exponentially more unmanageable the more developers/applications you add to the mix)

In my opinion what you suggest would be a good solution... But try to pass that by mos. Manager for which (in most companies) we are already late to compleate the newly received feature/requirement

HelloNurse · on Nov 15, 2021

But "development velocity" is velocity at doing things well, not at all the same kind of velocity as the velocity of a truck flying out of a sharp turn because the driver likes being fast and dislikes braking.

A sound architecture (admittedly unlikely) would help the kind of incompetent organization you allude to to maintain order and fail later.

hyperpallium2 · on Nov 15, 2021

Article does not contain the word "relational".

DOA seems to include many applications accessing the one database - which is how it's been done for decades, even before RDB, I guess it's in reaction to SOA.

mpweiher · on Nov 14, 2021

Sounds a lot like a variant of Model-View-Controller, or more precisely, both this and MVC are probably instances of a slightly more general pattern.

What is here referred to as the central data store is the model.

The services are the views. Just like views, they do not hold onto any actual application data, that must be stored in the model.

The services must not communicate with each other, just like views (and controllers/ViewControllers) should not communicate with each other. Instead they communicate via the store, just like in MVC, where the model is updated and then the model notifies other views of changes.

(The last part is the one that a lot of the alleged implementations of MVC get completely wrong, leading to the problems that they then blame on MVC. Which then leads to them creating new patterns to "fix" the "problems" of MVC that would be solved by just implementing MVC correctly).

So yeah: good stuff. Nice to see how good architectural patterns are widely applicable.

perspicace · on Nov 14, 2021

“where the model is updated and then the model notifies other views of changes”

I like your overall analogy but you lost me (as you predicted lol) with that last part. Views notifying other views (probably via listeners and event handlers) sounds more like the MVP pattern to me. But yeah overall I like your take on understanding the article.

AtNightWeCode · on Nov 15, 2021

The concept of one store per service was mainly introduced with micro services to try to fix problems with SOA. Old SOA systems typically have a single data access layer.

anuvrat1 · on Nov 14, 2021

I see this as FaaS with new skin, ad-hoc stateless functions with minimal to zero interaction with each other.

michaelcampbell · on Nov 14, 2021

Thank goodness for reader mode.