
It is one thing when a junior does this because they haven't learned better.

It's quite another when experienced seniors ban the use of SQL features because it's not "modern" or there is an architectural principle to ban "business logic" in SQL.

In our team we use SQL quite heavily: Process millions of input events, sum them together, produce some output events, repeat -- perfect cases for pushing compute to where the data is, instead of writing a loop in a backend that fetches events and updates projections.

Almost every time we interact with other programmers or architects it's an uphill battle to explain this -- "why can't you just put your millions of events into a service bus and write some backend to react to them to update your aggregate?" Yes, we CAN do that, but why do that when it's 15 lines of SQL and 5 seconds of compute -- instead of a new microservice or whatever and some minutes of compute.
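
To make the "push compute to where the data is" point concrete, here's a toy sketch using in-memory SQLite as a stand-in for a real database (the table and column names are invented): the grouping and summing happen inside the database, and only the aggregate rows cross the wire.

```python
import sqlite3

# Stand-in "event store": 10,000 events across three accounts.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (account_id INTEGER, amount INTEGER)")
conn.executemany(
    "INSERT INTO events VALUES (?, ?)",
    [(i % 3, i) for i in range(10_000)],
)

# One statement does the aggregation where the data lives; three
# result rows come back instead of ten thousand events.
totals = conn.execute(
    "SELECT account_id, SUM(amount) FROM events"
    " GROUP BY account_id ORDER BY account_id"
).fetchall()
print(totals)
```

The alternative -- streaming all ten thousand rows into a backend loop that updates a projection -- recomputes in application code exactly what the single `GROUP BY` already does.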

People bend over backwards and basically re-implement what the database does for you in their service mesh.

And with events and business logic in SQL we can do simulations, debugging, inspect state at every point with very low effort and without relying on getting logging right in our services (because you know -- doing JOIN in SQL is not modern, but pushing the data to your service logs and joining those to do some debugging is just fine...)

I think a lot of blame is with the database vendors. They only targeted some domains and not others, so writing SQL is something of an acquired taste. I wish there was a modern language that compiled to SQL (like PRQL, but with data mutation).



This debate has been happening forever and really the fundamentals haven't changed. It doesn't matter if you pull data from the DB to an old middle tier or to a nice modern microservice architecture, you're almost always losing the performance game at that point.

The database already has most of your data cached in memory, it already built statistics on the best methods to use to join the data, and the data is always local to the DB.

Reading lots of data from a DB to do the same operation in a microservice means you incur the cost of data retrieval, memory for a copy of the dataset (and enough to do a join), the network transfer of the data, and then you're giving up all the indexing and statistics that a database provides.

There is almost never a reason for this unless you need to do some kind of data analysis that isn't supported by the DB. Maybe most of all, this copy-the-data-somewhere-else-to-do-a-thing approach is never going to scale well.

Don't mind me, I'm just a retired former DB nerd.


> the fundamentals haven't changed

I believe they have, because we've gotten so much better at ORMs. ORMs get a bad rap because people use them terribly (and it's largely the ORM's fault because they encourage their own terrible use). But they are a true "change in fundamentals" that can finally end this debate.

Business logic doesn't belong in the database layer, because the database layer doesn't support the level of abstraction, composability, testing, and type safety that modern programming languages afford. However, that does not mean that business logic doesn't belong in SQL. It just means you have to treat SQL as an output of your actual business layer.

A good use of an ORM looks like a metaprogramming environment for conveniently building syntax trees that get converted into intelligent SQL. You know it's working well if the SQL looks somewhat like you'd write yourself and you can build one SQL statement with multiple layers that are abstracted from each other (in C#, think about passing IQueryables around). The structures are parsed into SQL and executed very explicitly only at the end of the chain, do a lot of work, and never produce SELECT N+1s. A good ORM user is thinking in SQL but writing in C# (or whatever your business layer is in).

A bad use of an ORM is trying to pretend like SQL doesn't exist, or is too scary for regular programmers to think about. It has SELECT N+1s everywhere. A bad ORM user is thinking in C# and hoping the database will roughly do the correct thing.
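
The N+1 contrast is easy to demonstrate without any ORM at all. Here's a sketch with SQLite standing in for the database and a trace callback counting the statements the database actually sees (the schema is invented for illustration):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE authors (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE books (id INTEGER PRIMARY KEY, author_id INTEGER, title TEXT);
    INSERT INTO authors VALUES (1, 'A'), (2, 'B'), (3, 'C');
    INSERT INTO books VALUES (1, 1, 't1'), (2, 2, 't2'), (3, 3, 't3');
""")

queries = []                          # every statement the DB executes
conn.set_trace_callback(queries.append)

# Bad ORM style: one query for the list, then one more per row.
for (author_id, name) in conn.execute("SELECT id, name FROM authors"):
    conn.execute("SELECT title FROM books WHERE author_id = ?",
                 (author_id,)).fetchall()
n_plus_one = len(queries)             # 1 + N = 4 statements

queries.clear()
# Good style: think in SQL, let the join happen where the data is.
conn.execute("""
    SELECT authors.name, books.title
    FROM authors JOIN books ON books.author_id = authors.id
""").fetchall()
single = len(queries)                 # 1 statement
print(n_plus_one, single)
```

With three authors the difference is 4 statements vs 1; with three million it's the difference between a working system and a pager alert.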


> I believe they have, because we've gotten so much better at ORMs.

This has almost nothing to do with what the parent was describing. Poor ORM use (or poor ORMs) introduce a different set of ways to mess up performance.

[EDIT] OK, this isn't entirely fair, or at least I didn't explain it well enough (no, it's not getting downvoted, I just decided I'm not happy with it). The problem in question is, at its heart, developers not realizing what they should be letting the database do, perhaps not even realizing what it could do, or deciding they shouldn't let the database do it for some probably-misguided purity reasons. ORMs generating more-efficient queries or exposing more features is great, and it does help with the problem of straightforward, natural use of ORMs sometimes resulting in poorly-optimized queries. But it doesn't fix the problem of developers not knowing that a block of logic in [programming language of their application] should have been left to the database instead, whether that's achieved by hand-writing some SQL, writing a stored procedure, or directing the ORM to do it. It's a little related, in that hopefully better ORMs will result in ORM-dependent developers learning more about what their database could be doing for them, or being more willing to poke around and experiment with the ORM since it's more pleasant to use, but I'd expect that effect to be pretty marginal.


A problem that many ORMs have is thinking/working in row expressions rather than sets. If everything is expressed in the plural -- sometimes with zero or one element, other times many -- then the N+1s mostly go away. The same goes for the many interfaces/APIs that have single and multiple forms: only make the multiple form and have callers call it with [single].
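
A minimal sketch of the "only make the plural form" idea, with SQLite and invented names -- a caller with a single id just passes a one-element list, and there is no one-at-a-time code path to misuse:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany("INSERT INTO users VALUES (?, ?)",
                 [(1, "a"), (2, "b"), (3, "c")])

def fetch_users(conn, user_ids):
    """Set-oriented interface: zero, one, or many ids, always one query."""
    if not user_ids:
        return []
    placeholders = ",".join("?" * len(user_ids))
    return conn.execute(
        f"SELECT id, name FROM users WHERE id IN ({placeholders}) ORDER BY id",
        list(user_ids),
    ).fetchall()

print(fetch_users(conn, [1, 3]))  # batch caller
print(fetch_users(conn, [2]))     # "single" is just a one-element set
```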

Related pet peeve: calling tables by plural names. The table (or any relation) should be named for the set of X rather than thinking of it as Xs.


> A problem that may ORMs have is thinking/working with row-expressions rather than sets.

Absolutely. There's a reason that the famous Out Of The Tar Pit[1] paper identifies relational algebra as the solution to many programming woes. Thinking in sets instead of individual items is extremely powerful, when using an ORM and also in general. If an ORM makes this hard, use a different one (hopefully there is a better option).

> Related pet peeve: calling tables by plural names

Agreed again! Based on my limited observations, this seems like a big cultural difference between "database people" and "software people". The people who spend most of their time working directly in databases (and trying to basically write fully fledged business applications entirely in the database layer) seem to think of tables as big containers. If you labeled a box full of people (or a binder full of women?), you'd probably label it "People". Whereas "software people" tend to think of a database table as a definition of something, more like a class or type. Clearly the correct label for that definition is "Person".

Actually the latter is also wrong for a different reason: that is not how nouns work. But that's a story too long to fit in this comment.

[1] Out of the tar pit (warning: direct link to 66-page PDF): https://curtclifton.net/papers/MoseleyMarks06a.pdf


It's the other way around - "software people" tend to use plurals for table names, because it then maps nicely to plurals in property names in their object layer, where it's already the established convention.

The "database people" OTOH tend to use singular, which goes at least as far back as C.J. Date. I'm not sure why, but perhaps it's because fully qualified field names read more naturally.


Actually, all the old relational texts talk about select * from employees (plural) because they didn’t have aliases for table names in queries.

I agree that when you use aliases for multiple tables with the old-style (Oracle, non-standard) SQL joins it looks prettier, but you can also do that with plural table names.

Select employee.name, manager.name from employee, manager where employee.manager_id = manager.id

But it doesn’t matter that much when you use the traditional join syntax:

Select employees.name, managers.name from employees left outer join managers on employees.manager_id = managers.id

Still I prefer singular table names, just disagree that it was data people who preferred singular and why.

Of course, with aliases you can return the exact names you want:

Select e.name as employee, m.name as manager from employees e, managers m where e.manager_id = m.id


It is either a bunch of apples, in which case you would call it apples, or it is the box that holds apples, in which case you would call it apple, as in apple_box.

Got to admit I am pretty firmly in the "it is the box that holds apples" camp, so none of my array, table, or dictionary names are plural. Sometimes I comfort myself by saying "well, apple[5] just sounds better," but that is really up for debate.


But a set is plural. We say "the set of things", not "the set of thing".


I guess it depends on the usage that's familiar to you. I would say "set of Real numbers" and call the set Real (like a class), not Reals like a collection of instances.


I agree with your pet peeve. ActiveRecord (Rails ORM) encourages by convention that your tables are named in the plural, and it sometimes drives me crazy. When writing queries outside the ORM, I end up aliasing tables/relations to the singular so the query is more sane to reason about, e.g.:

> SELECT person.id, person.name, person.age ... FROM people person JOIN ...

Which reads so much nicer to me than:

> SELECT people.id, people.name, people.age ... FROM people JOIN ...

(obviously a somewhat contrived example because of the people/person inflection that makes it awkward already)


> A good ORM user is thinking in SQL but writing in C# (or whatever your business layer is in).

That doesn't sound like metaprogramming; it sounds like insanity brought about by bureaucratic limitations on language choice.


It should be "thinking in relational algebra but writing in C#". Arguably, LINQ with C# represents relational algebra better than SQL does.


But not more efficiently.

Working with small datasets in memory can be fast, but larger datasets consume more memory and CPU.

The purpose of a database is to efficiently store and retrieve data from disk — and limiting it to only the data you need.

Most database interactions are also over a network, which is always slower than (and in addition to) disk retrieval. This should not be done except when you cannot fit the data on disk (or work with it in memory) on the same system.

Services should not be separated except for the same reasons (exceeding the computation or memory of a single system), because separation brings the same costs (network and disk latency).

We used to perform billions of complex computations in seconds reading from slow disks and slow systems with much smaller memory footprints on single systems with single cpus.

This article is a parable for the consequences of not understanding that.

The Google and Amazon and Netflix etc white papers are about organizations that build solutions for applications that cannot fit on disk or in memory or be handled by the computation of a single system — and do not apply to 99% of application architectures that use them, including those developed by Google, Amazon, etc.


Consider a function that takes an IQueryable<T> (for any T) and connects it with your change tracking logic to return an IQueryable<Tracked<T>>, or connects it with your comments system to return an IQueryable<Commented<T>>. I can take a query of (nearly) arbitrary complexity represented by an IQueryable<ComplexModel> and turn it into an IQueryable<Tracked<Commented<ComplexModel>>> in one line, while still keeping the result as a single SQL statement. If you hate that type, note that it's almost never explicitly written out like that (thank you, 'var' keyword).

On the other hand, your databases have a "Comments" field in every table; a "LastModifiedBy" in every table. Your database tables grossly violate the single responsibility principle: every single cross-cutting concern is represented in every single one of your "primary" tables. (Level up: every concern is a cross-cutting concern.) Your databases have association tables between X and Y for every primary data type X and every cross-cutting concern Y, leading to a combinatorial explosion of redundant tables. Your SQL queries/views/procedures/triggers are repetitive and full of boilerplate. If your client told you they needed you to change how change tracking is done in your system, you'd have to touch nearly every single SQL module in your entire system.

For me, I need to change the implementation of IChangeTrackingSystem, and that's literally all. All of my other queries don't have to know about the change, because they're simply composed together with whichever IChangeTrackingSystem is in place. Show me how to do that in T-SQL and I'll reconsider my position.

Now that I've tasted this fruit, the old way of doing things sounds like insanity to me. You're being needlessly zealous and close-minded here.

Edit: And another thing! (Shakes fist)

Much of the beauty of modern programming languages is their adaptability to whichever domain you're working in. We no longer need domain-specific languages for every different task, because we can embed those languages inside our parent language; then we don't have to reinvent static typing, write a new IDE, and learn decades of programming language design before we can start on our actual business logic. So, yes:

When writing data access logic, you should be thinking in SQL (or generic relational logic) but writing in C#.

When writing a game renderer, you should be thinking in linear algebra, but writing in C#.

When writing a payroll processing system, you should be thinking in payroll, but writing in C#.

When writing a chemical engineering toolbox, you should be thinking in molecules and reactions and units, but writing in C#.

This way, anyone who knows C# is already halfway (yes, only half) toward being able to maintain your system. If you insist on using a DSL for every single one of these tasks, 80% of your time will be spent on context switching and trying to get them to talk to each other correctly and correcting issues in the DSL itself.

Of course I do often write raw SQL as views and scripts, and of course I write TypeScript and HTML and CSS/SASS when working on a front-end (although I usually "think in HTML and write in TypeScript", not surprisingly). But that's mostly for development and maintenance; not for core business logic or library development.


Let us assume you accept that writing SQL is better than writing the equivalent machine code.

The “meta-programming” point is stating that there is a “language” for representing their problems that is better than SQL. If we call that better framework (language) Blub[1], that is probably a good metaphor, rather than causing people to get triggered by the generic ORM tag.

[1] https://www.benkuhn.net/blub/


To be fair, there is one important thing the ORM brings, though: sanitizing inputs.


Don't sanitize your inputs; parameterize your queries instead.
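
A quick illustration with Python's sqlite3 driver (the table contents are invented). Concatenation lets the input rewrite the query; parameterization sends the value separately from the statement, so it can never change the query's shape:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT)")
conn.execute("INSERT INTO users VALUES ('alice'), ('bob')")

hostile = "' OR '1'='1"

# String concatenation: the input is parsed as SQL, the predicate
# collapses to TRUE, and every row leaks out.
leaked = conn.execute(
    "SELECT name FROM users WHERE name = '" + hostile + "'"
).fetchall()

# Parameterized: the value is bound, not parsed; no rows match.
safe = conn.execute(
    "SELECT name FROM users WHERE name = ?", (hostile,)
).fetchall()
print(len(leaked), len(safe))  # 2 0
```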


I see those as two different solutions to two different, but slightly overlapping problems.


Going to keep those problems a secret?


Imo, the problems are the same but the conditions are different. If the query is used many times, parameterize it. If the input is used many times, sanitize it. If both are used many times, parameterize.


Parameterising works for individual fields in a statement. However, for complex queries (the reason for the meta-programming comment) you can’t always parameterise the additional subqueries/tables/fields. You can use stored procedures, but that just shifts the necessary code from one language to SQL, and SQL doesn’t have a robust library you can just use.


> A good use of an ORM looks like a metaprogramming environment for conveniently building syntax trees that get converted into intelligent SQL. You know it's working well if the SQL looks somewhat like you'd write yourself and you can build one SQL statement with multiple layers that are abstracted from each other (in C#, think about passing IQueryables around). The structures are parsed into SQL and executed very explicitly only at the end of the chain, do a lot of work, and never produce SELECT N+1s. A good ORM user is thinking in SQL but writing in C# (or whatever your business layer is in).

Sounds very similar to how Django (python) wants you to pass around QuerySets. It's very easy to set up an initial query with joins/etc, then pass the QuerySet into multiple functions to filter it in multiple different ways (each filter creates a new instance, so you're forking the original set and don't have to specify the joins multiple times), but the query itself is never actually run until you try to read from it.


Also databases (yup, MySQL I'm looking at you) got better at handling subqueries quickly, which makes the job of ORMs a lot easier.


It's still (5.7) really bad sometimes. A simple `IN (subquery)` can sometimes run much better if the subquery is fetched and another round-trip query is made using literal values for the `IN (...)`. I'm sure there are plenty of other 'deoptimizing' query patterns that shouldn't be.
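
For what it's worth, the manual rewrite looks like this. SQLite here only shows that the two forms return the same rows; it says nothing about MySQL 5.7's planner, and the schema is invented:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER);
    CREATE TABLE vips (customer_id INTEGER);
    INSERT INTO orders VALUES (1, 10), (2, 20), (3, 10), (4, 30);
    INSERT INTO vips VALUES (10), (30);
""")

# The single-statement form that old MySQL sometimes planned badly:
one_shot = conn.execute(
    "SELECT id FROM orders"
    " WHERE customer_id IN (SELECT customer_id FROM vips)"
).fetchall()

# The workaround: fetch the subquery first, then inline literal values
# in a second round trip (same result, different plan).
ids = [row[0] for row in conn.execute("SELECT customer_id FROM vips")]
placeholders = ",".join("?" * len(ids))
two_step = conn.execute(
    f"SELECT id FROM orders WHERE customer_id IN ({placeholders})", ids
).fetchall()
print(one_shot == two_step)  # True
```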


To use SQL (or rather, push compute to the database) well, you need to think in sets, indexes, etc.; the primitives you work with are rather different from the ones you usually use in C#.

You need to switch mode of thought anyway.

If a really good language for that happens to be expressed in the language of the C# AST -- instead of some new syntax -- that would be fine with me. I do not see a big difference.

But since one needs to switch modes of thought anyway, I would slightly prefer a new high-level language that compiles to SQL and is usable across all backend languages. But whatever fixes the problem of pushing computation to the database without all the warts of SQL, I am all for.

Until that gets a bit further along than it is today, I prioritize writing SQL over abstractions that are a bit too leaky.


A separate high-level language just for queries doesn't make much practical sense, since queries are normally surrounded by plenty of other code.

OTOH something like C# LINQ, which, on one hand, plays nicely with the rest of the language, and on the other, can be directly mapped to SQL (without ORM and other impedance-mismatch-inducing layering) is great. But, necessarily, language-specific to ensure tight integration.


As a data platform manager in a data-rich company I agree: programmers tend to prefer their favourite hammers to sql, even if a modern relational database can do the thing much better.

But! As somebody with a relatively good understanding of a history of sql, relational dbs and related concepts I have to add: SQL is often to blame.

All the amazing engineering that goes into database engines, the clean and coherent ideas of relational algebra, the optimisability of a declarative approach to computation -- all of that gets a bad rep because of the nightmare of sql-the-language.

The standard, the syntax, every little bit that could go wrong is just wrong from the point of view of a language designer. Composability, modularity, predictability, even core null-related defaults.

But it's everywhere. We just have to accept it.


It’s not just the language, it’s schema evolution, data distribution, and exposed APIs.

I don’t want to give other teams direct access to a DB and have them not only take a dependency on the schema, but have the ability to run arbitrary queries that may exhaust resources in ways that impact normal operations. If I expose an API, I control the access patterns and can evolve the schema separately to suit the workload.

If other teams need a replica to perform their arbitrary queries, I’d much rather have them using a richer data model that they can normalize into whatever form suits their needs than have to conflate that into a source of truth data store.

If you have a single business unit and can get away with commingling concerns within a small team, great, throw it all in a single DB. If it makes sense to split, however, do it quick and early to avoid a decoupling hell that is more expensive than having split in the first place.


> If I expose an API, I control the access patterns and can evolve the schema separately to suit the workload.

And then they will depend on your API data schema...

Yes, schema evolution is hard, but databases have many tools to help here that you will either have to recreate on your APIs or live without and have a harder time. Either way, all the trouble comes from data evolution, and any schema-only change is trivial to deal with.

Data distribution is something that varies from one DB to another; they usually have very good performance that is hard to replicate in your application layer, but are very hard to set up and keep running. The point about control of resource usage is a good one, though.


> And then they will depend on your API data schema...

Which we have methods of evolving. I can put my API in gRPC and know exactly which changes will or won't break compatibility. Try doing that with a database.


> And then they will depend on your API data schema...

+100 to this


But that API schema is unrelated to the underlying DB schema. I’ve been able to run services with different backends (eventually consistent & low latency vs transactional) exposing the same API. That would not have been possible by just giving consumers access to a DB.

The "the DB can do everything" crowd can’t seem to understand this, for some reason.


Any nontrivial change to the base model will mean a lot of complexity in the API layer and degraded performance. Maybe that's worth it for you, maybe if you're exposing this data to hundreds of external users who don't need high performance. But I feel that for most usecases, barebones DB access is the better option.


> That would not have been possible by just giving consumers access to a DB.

That depends on the DB's foreign table support, or equivalent, if it has it. It's certainly not categorically impossible with a DB.


Moving the goalposts, I can see.

Of course, merging different backends can not be done on the backend. Either you add a layer or you do it on the client.


That seems to be an odd way of looking at it, IMO. Most of the time, the entire point of building an API is to present consistent functionality to any consumer, not only ones under your control. Also, a well-behaved API is versioned so as to allow evolution of the API without breaking existing clients who can upgrade to new versions as they are able.


That’s not a new problem though? The classic way to solve this in database is to expose the data API as stored procedures/views and restrict query access on the actual tables to just the DBAs - I think even MySQL which was late to the party here has had this ability for some time now.


> I don’t want to give other teams direct access to a DB [...]

You don't have to. Package up necessary queries into views and/or stored procedures and grant permissions only on those. Views can also shield from schema changes.
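
A sketch of the idea with SQLite (which has views but no GRANT; in Postgres or SQL Server you'd additionally grant access only on the view, not the table). The schema is invented:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE person (id INTEGER PRIMARY KEY, full_name TEXT, ssn TEXT);
    INSERT INTO person VALUES (1, 'Ada', '123-45-6789');
    -- The view is the published contract: it hides sensitive columns
    -- and can keep its shape even when the table underneath is refactored.
    CREATE VIEW person_public AS
        SELECT id, full_name AS name FROM person;
""")
rows = conn.execute("SELECT * FROM person_public").fetchall()
print(rows)  # [(1, 'Ada')]
```

If `person` later splits into two tables, the view's definition changes but consumers of `person_public` don't notice, which is exactly the decoupling an API layer is usually built to provide.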


That also requires all data to be in the same DB. Often that’s not practical or possible.


Postgres FDWs do wonders to centralize this type of access.


In SQL Server you can define external tables that access data on another server, or even parse a file.


As someone who absolutely loves the power of SQL, I abhor the footguns involved. Especially with null-based ternary logic that is incomprehensible to most people.
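
A few of those footguns fit in one runnable snippet (SQLite here, but the three-valued logic is standard SQL):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (x INTEGER)")
conn.executemany("INSERT INTO t VALUES (?)", [(1,), (None,), (3,)])

# NULL = NULL is not true; it evaluates to NULL (UNKNOWN).
row_eq = conn.execute("SELECT NULL = NULL").fetchone()[0]

# The classic footgun: NOT IN against a list containing NULL matches
# nothing at all, because every comparison evaluates to UNKNOWN.
empty = conn.execute("SELECT x FROM t WHERE x NOT IN (3, NULL)").fetchall()

# COUNT(x) silently skips NULLs; COUNT(*) does not.
counts = conn.execute("SELECT COUNT(x), COUNT(*) FROM t").fetchone()
print(row_eq, empty, counts)  # None [] (2, 3)
```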


You have to learn how your database handles NULLs, different databases do it slightly differently. Once you do that, you're fine.

One big thing is that NULL should never "mean" anything in a business sense. It is the absence of a value, hence it cannot mean anything.

By the same token you need to understand how your database handles concurrency. Different databases do it differently.


> One big thing is that NULL should never "mean" anything in a business sense. It is the absence of a value, hence it cannot mean anything

I always had a problem with that notion. I mean, it has a memory representation, it has a set of operators you can apply to it, defined behavior in UNIQUE and FOREIGN KEY constraints etc. All this is well documented (though can behave slightly differently between databases, as you mentioned).

So, it has a set of valid values (just one: NULL) and a set of valid operations, so it's a type!

And now you have a type that looks somewhat similar to a null in "normal" programming languages, and SQL generally lacks the mechanisms for inventing your own types, so why wouldn't you use it in your business logic where it makes sense?

The design of NULL seems like a historical accident anyway. The best I can discern, there was a need for something to behave as "excluded" in the context of FOREIGN KEYs and outer joins, and so that semantic was just passed along to other areas where it made less sense.

I think a better type system and a better separation between comparison logic and the core type would have obviated most of the NULL's weirdness and made it far less foot-gunny in the process...


The problem is that it isn't just me. It's everybody. Everybody that uses an SQL system has to learn a version of logic that looks like something they've learned before but actually has no relation to it. People can learn it, but it is a chore and takes a lot of real world experience (i.e. costly mistakes) to get it drilled into their head just like it did with me. As someone who does a lot of mentoring of data engineers, it's infuriating that they all have to go through this at some point.

Unfortunately the problem is made worse by the proliferation of languages making the equally heinous mistake of treating null as "false-y". Bad form, Peter, bad form!


Many businesses are absent of value :)


Agreed. The power of SQL is awesome. But I really wish people started making better languages that compile to SQL, like what happened with JavaScript.


Same. EdgeDB is doing something good, but I wish I didn’t have to deploy a new service on top of my own DB. It’s Postgres-only as well.


Is there something fundamental stopping us from making a new frontend to something like Postgres? I think all the solutions that compile to SQL kinda work, but it would be nice to have a new native interface that sucks less.


Nope, nothing fundamental; but industry-wide inertia is so massive by now that it just doesn't make sense to switch. NoSQLs tried hard and went nowhere.

What I find funny is that most DBs translate SQL into an internal representation that is remarkably similar to a proper relational algebra and optimise on that. I'd really just prefer the algebraic language as described in the original ages-old paper.

There were also a few dbs trying to push sql-but-better languages... haven't heard about them for a while.


I have experimented with a different query model from sql for time series data. A query took the form of a Rhai script. (Rhai is a scripting language that has great interop with rust, so it was similar to how lua would be used to script parts of a game.)

Each query script would act on a few objects in global scope: `db` (handle to database), `start`, and `end` (time range of grafana dashboard that query was for).

I found being able to write imperative (rather than declarative) code to build a query to be extremely powerful, especially for storing variables and looping over things.

e.g. query script - just to get a feel for it:

    let dalmp = db.ts("pjm-da-lmp/western-hub") // ie retrieve the timeseries named 'pjm..'
        .with_time_range(start, end);
    
    let rtlmp = db.ts("pjm-5min-lmp-rt-lmp/western-hub")
        .with_time_range(start, end)
        .resample("1h", "mean");
    
    let da_err = rtlmp.diff(dalmp);
    
    #{
        dalmp: dalmp,
        rtlmp: rtlmp,
        da_err: da_err,
    }
A query script would be expected to return a dictionary-like object. The keys would be used as labels and the values would each be a time series object.

This is not the perfect solution for every problem, but I thought it might be interesting to see an example of a very different approach to querying compared to SQL.


I'm a fan of relational databases, but I think we should concede three points:

1. Databases should accept queries in a structured machine-readable format rather than a plaintext language.

2. SQL in particular is a poorly designed language: it isn't very composable, it has lots of annoying edge cases (like NULLs), and it has a number of annoying limitations (in particular, its historically limited support for structured data within fields).

3. Given how most RDBMSs are designed, you often need to handle denormalization and caching manually. This requires doing a lot of excess data management in a middleware layer--for instance, querying a cache before accessing the DB, or storing data in multiple places for denormalization. Some of this can be done in SQL (e.g. denormalization through TRIGGERs), but since SQL is not a very good language (see (2)) that can be tough.
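
As a small example of the TRIGGER-based denormalization mentioned in (3), the database itself can keep a counter current (SQLite syntax; the schema is invented):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE items (id INTEGER PRIMARY KEY, list_id INTEGER);
    -- Denormalized counter maintained by the database itself.
    CREATE TABLE list_stats (list_id INTEGER PRIMARY KEY, item_count INTEGER);
    CREATE TRIGGER items_insert AFTER INSERT ON items BEGIN
        INSERT OR IGNORE INTO list_stats VALUES (NEW.list_id, 0);
        UPDATE list_stats SET item_count = item_count + 1
            WHERE list_id = NEW.list_id;
    END;
""")
conn.executemany("INSERT INTO items (list_id) VALUES (?)", [(7,), (7,), (9,)])
stats = conn.execute("SELECT * FROM list_stats ORDER BY list_id").fetchall()
print(stats)  # [(7, 2), (9, 1)]
```

The application never touches `list_stats`; the caveat from (2) is that expressing anything much more involved than this in trigger bodies gets painful fast.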


SQL being a plaintext, human-friendly language is a good thing. SQL is a common skill transferrable between languages and environments, and it is also easily usable by non-developers. If we replaced SQL with some abstract language, what language would you use when talking to the database directly (via psql, sqlplus, or whatever)? Would you need to learn the PythonQuery, JavaQuery, and C#Query languages separately? Would the language used in ETL tools be different still? What language would you write database views, triggers, functions in?


Have you looked at Quel? We have multiple languages to run our code on generic CPUs (C, Python, Java, Ada, Go, Rust, Lisp, etc.) Why must we have only one database language that doesn't do a very good job at Codd's relational calculus?

So many people have drank the kool-aid that SQL is the answer. Maybe it's time to change this.

The success of ORM's could be thought as the market voting against SQL.

https://www.holistics.io/blog/quel-vs-sql/


The blogspam post you linked has zero examples of QUEL. Looking at Wikipedia [0], it seems much uglier and less readable than SQL.

I am not a database theory person. I'm a developer who does not care about Codd's relational calculus. As for ORMs, they are great at solving simpler problems and CRUD data access, and they make the developer’s life easier by giving them nice objects to work with as opposed to raw database rows. However, any advanced analytics/reporting/summary queries tend to look awful with an ORM.

[0]: https://en.wikipedia.org/wiki/QUEL_query_languages


I was hoping you would address the points about composability rather than turning it into a beauty contest between the two languages.


SQL actually isn't all that transferable, because in practice you almost always use an ORM or a query builder instead of writing queries directly into your codebase. So when switching languages you still need to learn the new language's ORM; your knowledge of the underlying SQL will only go so far.

Of course queries should be human-readable, but there's no need for it to be a complete language with its own grammar and templating via prepared statements. The queries could be encoded in JSON or some similar (probably custom) human-readable language that can easily be generated programmatically. MongoDB does this IIRC; it's probably the only thing I like about Mongo, but it's a good idea.
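
A toy sketch of that idea: the query arrives as structured data and a trivial compiler emits parameterized SQL. The mini-format and all names are invented, and identifier validation is skipped entirely, so this is illustration, not a design:

```python
import json
import sqlite3

def compile_query(q):
    """Compile a tiny JSON query description into parameterized SQL.
    Toy only: equality filters only, no identifier validation."""
    cols = ", ".join(q["select"])
    clauses, params = [], []
    for field, value in q.get("where", {}).items():
        clauses.append(f"{field} = ?")
        params.append(value)
    sql = f"SELECT {cols} FROM {q['from']}"
    if clauses:
        sql += " WHERE " + " AND ".join(clauses)
    return sql, params

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER, name TEXT, active INTEGER)")
conn.executemany("INSERT INTO users VALUES (?, ?, ?)",
                 [(1, "a", 1), (2, "b", 0), (3, "c", 1)])

query = json.loads(
    '{"select": ["id", "name"], "from": "users", "where": {"active": 1}}'
)
sql, params = compile_query(query)
rows = conn.execute(sql, params).fetchall()
print(sql)
print(rows)  # [(1, 'a'), (3, 'c')]
```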


in my experience most engineering shops that care about database perf are not using ORMs except for the most generic CRUD features. you gotta handroll your queries with an eye on EXPLAIN once you pass a billion rows in your tables, in my experience anyway


My experience as well. ORMs are really nice in the beginning, they save a lot of boilerplate code. But they become the enemy once scale and/or performance become an issue.


The thing about #2 is: poorly designed in comparison to what?

Chances are high the application layer is written in JavaScript, PHP, Ruby, or Python. We don't even talk about nasty edge cases in those languages because they are uncountable.


Those languages all have first-class functions and OOP-style encapsulation.

In SQL stored procedures (at least in Postgres), you can't have a variable that holds multiple records. You can't even define a variable inside of a block.


DECLARE foo record[];


Temp tables?


None of those languages is as unpleasant as SQL. Not even PHP.


For me, I would much rather use SQL, especially when using dbt to manage the queries like any other language. I thought the same way as you, and SQL certainly has its warts, but I have grown to appreciate its elegance. I think over the long term it is easier to maintain because it is so concise and the most common second language among programmers.


Once you start thinking in sets instead of collections of independent statements as per PHP et al, SQL starts looking a lot better.

If you try to fit SQL into the general purpose programming language box, you're gonna have a bad time. A really bad time.

SQL is a DSL for set theory. Nothing more. Nothing less.


The things that are bad about SQL have nothing to do with set theory. The non-orthogonal syntax. The terribly limited and opaque type system. Ternary logic that fails silently. The utter lack of composability or testability.


Also, views are the essence of composability.

As for testing in the Postgres sphere, there's pgTAP and pg_prove. https://pgtap.org/

Extension development includes a built-in unit testing harness as well.

For use with any database, a tool like Sqitch allows for DB-agnostic verification at each migration point. https://sqitch.org/docs/manual/sqitchtutorial/#trust-but-ver...

Proprietary databases have their own solutions as well. https://learn.microsoft.com/en-us/sql/ssdt/walkthrough-creat...

I think "utter lack" is grossly misrepresenting the state of the art. If you mean "widespread ignorance of existing techniques related to composability or testability," then we are in agreement.


CTEs are quite composable. I might be spoiled by Postgres, but types are quite robust there.

Replace "NULL" with "unknown" in your head, and the ternary makes more sense. When you're building your schema, does an unknown value make sense in that context? Many times not, and the column should either not be nullable or should be referenced in a different table by a foreign key.

3 = NULL

"Is 3 equal to this unknown value?" Maybe yes. Maybe no. It's unknown. Therefore the answer to "3 = NULL" is NULL. The answer is also unknown. Not true. Not false. Unknown.

IS NULL or IS NOT NULL, but never = NULL or <> NULL.

It may be unusual to someone coming from a general purpose programming language's notion of null as a (known) missing value, but that doesn't make it wrong. It means you need to reorient your mind toward set theory, where NULL means "unknown" if you're going to work with SQL and relational databases in general.

Folks often speak of the impedance mismatch between relational models and in-memory object models. NULL is one of those mismatches.
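The three-valued logic described above can be observed directly in any SQL engine; here is a quick sketch via Python's sqlite3, whose NULL semantics match:

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# "3 = NULL" evaluates to unknown (NULL, i.e. Python None), not false:
print(conn.execute("SELECT 3 = NULL").fetchone()[0])        # None
print(conn.execute("SELECT NULL = NULL").fetchone()[0])     # None

# IS NULL / IS NOT NULL are the two-valued tests:
print(conn.execute("SELECT NULL IS NULL").fetchone()[0])    # 1

# A WHERE clause keeps only rows where the predicate is TRUE,
# so comparisons that evaluate to unknown silently drop rows:
conn.execute("CREATE TABLE t (x INTEGER)")
conn.executemany("INSERT INTO t VALUES (?)", [(3,), (None,)])
print(conn.execute("SELECT count(*) FROM t WHERE x = NULL").fetchone()[0])   # 0
print(conn.execute("SELECT count(*) FROM t WHERE x IS NULL").fetchone()[0])  # 1
```

The `WHERE x = NULL` query returning zero rows, rather than raising an error, is the "fails silently" behavior the grandparent complained about.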


> CTEs are quite composable. I might be spoiled by Postgres, but types are quite robust there.

> Replace "NULL" with "unknown" in your head, and the ternary makes more sense. When you're building your schema, does an unknown value make sense in that context? Many times not, and the column should either not be nullable or should be referenced in a different table by a foreign key.

Cool, now how do I do these two incredibly basic things at the same time and make something not nullable or referencing another table in a CTE?


CTEs are quite composable -- in SQL land.

I wish SQL itself were more composable.


> in particular, its historically limited support for structured data within fields

This is not particular to SQL though, and is the rationale behind the first normal form. Codd argued that any complex data structure could be represented in the form of relations, so adding non-relational structures would just complicate things for no additional power.


The issue is that, to put things in 1NF, you need to fully normalize everything, which has a big performance penalty since every query now has to JOIN a large number of tables together.

Of course, an RDBMS could be designed to do that without a performance penalty, by storing data in a denormalized form and automatically translating queries for the normalized data accordingly.

But SQL doesn't have the features you'd need to control and manage that sort of transparent denormalization. So you'd end up having to extend SQL to support it properly so that the performance penalty in question could be mitigated in all cases.

edit: Rather than "you need to fully normalize everything," I should have said "you need to split all your data across multiple tables to eliminate the need for structured data within records." The performance penalty happens when you need to do this everywhere for sufficiently complex datasets.


I doubt querying JSON or XML or some other embedded structured format would be faster than querying normalized data. It might be true in some special cases, but certainly not in the general case of ad-hoc queries across nested data structures.

But I totally agree SQL could be improved to make normalization feel like less of a burden. It really highlights a problem when it feels like it's more convenient to just dump a JSON array into a field rather than extract it to a separate table.


> The issue is that, to put things in 1NF, you need to fully normalize everything

Not unless you redefine “fully normalize” you don’t.


The boundary between "simple" and "complex" data structures is largely arbitrary, though. It's not unreasonable to consider an integer as a complex data structure, consisting of bits, and in some contexts we do that - but in most, it's obviously more convenient to treat it as a single value. In the same vein, it should be possible to treat a tuple of several numbers (say, point coordinates) as a single value as well, in contexts where this makes sense.


In the context of the relational model, simple and composite refer to how values are treated by the relational operators. You can't select individual bits of an integer (without special-purpose operators), which means an integer is a single value.

Presumably an integer is physically stored as a set of bits, but this is not exposed to the logical layer (for good reasons - e.g. whether the machine uses big endian or little endian should not affect the logical layer). If you actually wanted to operate on individual bits using relational operators, you would use a column for each individual bit.

Having XML or JSON fields is also totally fine according to the relational model as long as they are "black box" values for the logical layer. But Codd observed that if you wanted to treat individual values as composite, you end up with a much more complex query language. And indeed this has happened with XPath and JSON queries and whatnot embedded in SQL. Presumably it should then be possible to have XML inside a JSON struct, and a table inside the XML. If this is even possible, it would be hideously complex. But normalized relations already allow this without any fuss.


This is an arbitrary distinction in the first place. You can absolutely have a relational database that only allows 0 and 1 as valid values and requires you to build integers up yourself using relational techniques. The relational model itself doesn't care what "values" are.

In practice, simple value tuples make things much more convenient, and the edge cases are minimal. You don't have to allow full-fledged composition like nested tables etc.


> The relational model itself doesn't care what "values" are.

Maybe I misunderstand what you are arguing, but the relational model is defined in terms of values, sets, and domains (data types), but of course the domains chosen for a particular database schema depends on the business requirements.


Codd argued this a long time ago. Unfortunately Codd died twenty years ago, so we can't ask him his current thoughts on the matter. On the other hand, Chris Date dropped this rigid view of the relational model way back in the 1990s.

https://www.youtube.com/watch?v=swR33jIhW8Q


The question isn't who said what when, the question is what reasoning holds today. Codd's argument was simply that if we allow tables embedded inside fields and we want to query across multiple "layers" of embedded tables, we get a much more complex query language and implementation for no additional benefit, since the same relationship can be represented with foreign keys. I haven't seen any reasonable counterargument against this.

If I understand Date correctly, he is just saying that individual values can be arbitrary complex as long as they are treated as "atomic" by the relational operators. I don't disagree, but reality is that very soon after you start storing stuff like XML or JSON in database fields, someone wants to query sub-structures, e.g. filter on individual properties in the JSON. And then you have a mess.


Also, caching can be done with materialized views


IIRC in Postgres you need to refresh an entire materialized view all at once, effectively recreating the entire table; you can't just have it update incrementally whenever the underlying data changes.

I think SQL Server can do this, but...then you have to use SQL Server.


> I think SQL Server can do this, but...then you have to use SQL Server.

Yes, SQL Server can update indexed views incrementally, but there are severe limitations:

https://learn.microsoft.com/en-us/sql/relational-databases/v...

If memory serves, indexed views have been in SQL Server for 20-odd years, and haven't seen meaningful improvements in all that time. We still can't do a LEFT JOIN, join the same table more than once, or use MAX, etc.

The same story with T-SQL, which is firmly stuck in the '80s (not that other databases are better).

There are some extremely powerful features in SQL Server that can be used effectively with some pain, but they could be so much better if Microsoft invested in fully fleshing-out their potential instead of chasing the latest buzzword.

Sorry for the rant.


Yep, which is why materialized views don't tend to work great for a lot of tasks where they initially seem to be the most natural implementation, particularly almost any view that's a feed or aggregate of data in the system over all time. It's so easy to just play with a simple SQL select query until you get it working, then throw it into a materialized view. It'll probably even work for a long time! But as soon as the data in the system grows and that refresh starts getting slower, you're stuck with a (potentially tricky or at least frustrating) migration to another implementation (maybe something like event sourcing).


I think it's being worked on for Postgres, but it's probably another major release or two away. A quick Google came up with this https://pgconf.ru/en/2021/288667 but I'm sure I've come across discussion in the Postgres mailing lists / wiki in the past.

Would definitely be a nice feature to have. Without it, I find the main use case I have for materialised views is batch processing, where you want to prepare a complex result set and then stream-process it in a cronjob or similar.


There’s also info on this page about it (incremental materialised views).

https://wiki.postgresql.org/wiki/Incremental_View_Maintenanc...

In the meantime my preferred technique is to have a column where I stamp the last generation time for each row and then I rebuild anything that’s changed since then (assuming all your source data has some sort of last updated stamp).
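One way to sketch that stamp-based technique, using invented table names and SQLite (via Python) so it can actually run; a real setup would compare timestamps rather than the integer "clock" used here:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE orders (id INTEGER PRIMARY KEY, customer TEXT, amount INTEGER,
                     updated_at INTEGER);          -- last-updated stamp on source rows
CREATE TABLE totals (customer TEXT PRIMARY KEY, total INTEGER,
                     generated_at INTEGER);        -- per-row generation stamp
""")

def refresh(now):
    # Find customers whose source rows changed after their summary was generated,
    # then rebuild only those summary rows.
    conn.execute("""
        CREATE TEMP TABLE stale AS
        SELECT DISTINCT o.customer FROM orders o
        LEFT JOIN totals t ON t.customer = o.customer
        WHERE o.updated_at > COALESCE(t.generated_at, -1)""")
    conn.execute("DELETE FROM totals WHERE customer IN (SELECT customer FROM stale)")
    conn.execute("""
        INSERT INTO totals
        SELECT customer, SUM(amount), ? FROM orders
        WHERE customer IN (SELECT customer FROM stale)
        GROUP BY customer""", (now,))
    conn.execute("DROP TABLE stale")

conn.executemany("INSERT INTO orders VALUES (?,?,?,?)",
                 [(1, "a", 10, 100), (2, "a", 5, 100), (3, "b", 7, 100)])
refresh(now=101)
conn.execute("INSERT INTO orders VALUES (4, 'b', 3, 150)")  # only 'b' changes
refresh(now=151)                                            # 'a' is left untouched
print(conn.execute("SELECT customer, total FROM totals ORDER BY customer").fetchall())
# [('a', 15), ('b', 10)]
```

The second refresh recomputes only customer 'b', which is the whole point: the cost of a refresh scales with what changed, not with the size of the view.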


Also computed columns in a covering index.


> It's quite another when experienced seniors ban the use of SQL features because it's not "modern" or there is an architectural principle to ban "business logic" in SQL.

While I agree with the idea of pushing heavy compute to where the data resides, I wholeheartedly disagree with the statement quoted above. Concentrating business logic in SQL makes it effectively untestable. It's not easy to "compartmentalize" SQL code such that each individual piece is testable on its own. Often this lets major issues go unnoticed until the SQL query executes in production on some unexpected input data and people are woken up at 3AM. Add to that the fact that SQL exceptions are a PITA to debug on a normal day, and it probably isn't any easier at 3AM.


Are you talking about stored procedures? SQL itself is super easy to test: insert data into tables, run query, compare output. It's also easy to test queries on production data since every db has a repl. Queries are also mostly composed of read-only, referentially transparent parts so it's super easy to take snippets and test/run them in isolation. For complex updates you join the table you want to update to a read-only query with all the logic to calculate the new values.

SQL is probably the easiest language there is to write well tested, composable, easy to debug code.
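The insert-run-compare loop described above can be sketched as a plain test function against an in-memory SQLite database; the table and query are invented for illustration:

```python
import sqlite3

# The query under test: total per customer, ignoring cancelled orders.
QUERY = """
    SELECT customer, SUM(amount) AS total
    FROM orders
    WHERE status <> 'cancelled'
    GROUP BY customer
    ORDER BY customer
"""

def test_totals_exclude_cancelled():
    conn = sqlite3.connect(":memory:")  # fresh database per test
    conn.execute("CREATE TABLE orders (customer TEXT, amount INTEGER, status TEXT)")
    conn.executemany("INSERT INTO orders VALUES (?,?,?)", [
        ("a", 10, "paid"),
        ("a", 99, "cancelled"),   # must not be counted
        ("b", 7, "paid"),
    ])
    assert conn.execute(QUERY).fetchall() == [("a", 10), ("b", 7)]

test_totals_exclude_cancelled()
print("ok")
```

The same shape works with pytest and a real database fixture; the point is that a query plus known input rows is a deterministic function whose output you can assert on.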


> Queries are also mostly composed of read-only, referentially transparent parts so it's super easy to take snippets and test/run them in isolation

True enough, but now you need to ensure that the snippets copied out into tests stay in sync with the inline versions embedded inside your six-screen-long SQL query.


In MSSQL you have "inline table-valued functions" which you can declare and reuse; they are inlined, and optimization happens across them. We use sqlcode [1] to get around weaknesses in deploying stored functions.

Big drawback though is that functions can only take scalar arguments, not table arguments.

Another method is to just have a bunch of CTE clauses, and append something different to the end of the CTE depending on which parts of it to use (i.e. some string assembling required).

[1] https://github.com/vippsas/sqlcode


Easy to debug and write well-tested - mostly yes. Composable is tough though: make sure all your table aliases are unique, that there are no ambiguous column names, that all your ANDs are in place, and don't forget that 1=1 to make it easier to comment out parts. You already need a somewhat sophisticated query builder just to compose multiple JOINs cleanly. Recently I needed to get a query-builder-ish thing going with an arbitrary number of conditions which should be done as JOINs, and the most I could muster to keep it at least a little manageable was a "pkey IN({literal_subquery})". It does compose, as in "yes you can do it", but I wouldn't say it composes very conveniently.


Excuse me, but can you show me where in the SQL language documentation I can find info about its unit test harness framework, teedeedee best practices, mocking, shimming, and dependency injection? And that's just the bare minimum, but I start with those as the first thing to learn in any new language.


At my first real programming job, I found an error in the SQL functions that the company had written. After pointing it out, the CEO bet me that I couldn't fix it. Apparently their best programmers had tried and failed.

I did end up fixing it, but it took me a couple weeks.

Beyond being untestable, it's also hard to version control and roll back if there's a problem. We had some procedures around SQL updates that were kind of a pain because of that.


Regarding SQL being "untestable"...

I have put SQL DDL+DML+stored procedures in version control, created/run stored-procedure (TDD) unit/integration tests on mock data against other stored procedures, had pass-fail testing/deployment in my CICD tool right alongside native app code, and done rollback, all using Liquibase change sets (+git+Jenkins).

Using Liquibase .sql scripts for version control isn't hard. Testing is always more-work but it's doable.

I don't completely disagree with you on rollback being hard, at least full pure rollback-from-anything. Having built tooling to do it once with Liquibase, I found that guaranteeing rollback in all circumstances took more effort than it was worth. A lot of DDL and code artifacts and statements like TRUNCATE are not transaction-safe and not easy to systematically roll back. Liquibase does let you specify a rollback SQL command for every SQL command you execute, so you could make it work if you had the time, but writing+testing a rollback command for every command wasn't worth it and is indeed materially more effort than just rolling back to earlier .war/.jar/.py/docker/etc files. (The latter are easier in part because they are stateless, of course.)

In any case, something like Liquibase can get you a long ways if you have the testing mindset. (Basically it lets you execute a series of SQL changesets and you can have preconditions and postconditions for each changeset that cause an abort or rollback.)


If you mean 5+-page SQLs are untestable, I don't disagree. Like any code you need to break it up once it gets past a certain size.


> Like any code you need to break it up once it gets past a certain size.

And SQL doesn't give you the tools to do that. There's no easy or natural way to split a statement into smaller parts.


SQL doesn’t give you tools to break up code?? Err, I don’t agree there. I can think of four options just off the top of my head.

Simplest is to use WITH clauses (common table expressions/CTEs). They can help readability a lot and add a degree of composability within a query.

Second, you can split one query into several, each of which creates a temporary (or intermediate physical) table.

Third, you can define intermediate meaningful bits of logic as views to encapsulate/hide the logic from parents. For performance you materialize them.

Fourth, you can create stored procedures which return a result set and like any procedural or functional language chain or nest them.

These techniques are available in most databases. More mature databases support forms of recursion for CTEs or stored procedures or dynamic SQL.

As with most programming, proper decomposition and naming helps a fair bit.
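The first option, WITH clauses, can be sketched concretely; the tables are invented for illustration, and each named step can also be run on its own while debugging:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount INTEGER)")
conn.executemany("INSERT INTO sales VALUES (?,?)",
                 [("north", 10), ("north", 20), ("south", 5)])

# One query split into named, readable steps with WITH clauses.
rows = conn.execute("""
    WITH region_totals AS (
        SELECT region, SUM(amount) AS total
        FROM sales
        GROUP BY region
    ),
    grand AS (
        SELECT SUM(total) AS grand_total FROM region_totals
    )
    SELECT r.region,
           r.total,
           ROUND(100.0 * r.total / g.grand_total, 1) AS pct
    FROM region_totals r CROSS JOIN grand g
    ORDER BY r.region
""").fetchall()
print(rows)  # [('north', 30, 85.7), ('south', 5, 14.3)]
```

Reading the final SELECT requires knowing only what `region_totals` and `grand` mean, not how they are computed, which is the same encapsulation win as breaking a long function into well-named helpers.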


> Beyond being untestable, it's also hard to version control and roll back if there's a problem.

Recently, I've learned that SQLServer supports synonyms. So you version functions / procedures (like MySP_1, MySP_2, etc...) and establish a synonym MySP -> MySP_1. Then you test MySP_2 and when ready, change the synonym to point to MySP_2. Of course, all code uses just the synonym.


We use sqlcode as a more general solution to this:

https://github.com/vippsas/sqlcode


I'm not sure where this idea comes from. There are unit testing frameworks for T-SQL and pl/sql. Stored procedure code can be version controlled like any other code.


We care A LOT about testing everything, always.

We use a SQL container [edit: new, freshly cloned DB for a test function in less than a second] and find it to be no problem in practice to write integration tests for our pieces of Go code that call the SQL queries with some business logic in them.

(No, we don't have a sprawling mess of stored procedures calling each other. We just try not to move data over to backends unless we really need to.)

If you really need complex flow, making connection scoped temp tables for temporary results in SQL and having the composability/orchestration through backend functions calling each other and passing the SQL connection between them is doable.

Yes, you cannot unit test every small line of SQL, but since SQL is much higher level, that isn't really needed. Test the functional behaviour / inputs/outputs and you are fine.

It really isn't different from writing for a GPU, in a sense.


> Concentrating business logic in SQL makes it effectively untestable. Its not easy to "compartmentalize" SQL code such that each individual piece is testable on its own.

What are you talking about? SQL is just as easy to compartmentalize and test as anything else.

Each query statement belongs in a function -- there, compartmentalized. Now set up a table state, run the function, and compare with new table state. The test either passes or fails.

Also no idea why you'd think SQL is a PITA to debug. It's a relatively compact and straightforward language once you learn it, and queries are self-contained. It's generally much easier to debug a query than it is to debug something happening somewhere across 10,000 LOC across 400 functions.


It's totally possible to validate data before pushing it to a table, and to have normal tests around the SQL massaging it. Saying this as somebody looking at thousands of transformations done by my teams.

Admittedly, it took a while for data engineers in the industry to accept these practices though.


PS: but stored procedures and UDFs are evil, yes


A lot of people are not comfortable learning languages beyond the ALGOL-like paradigm. SQL's building blocks are incredibly odd if you're used to the idea that work gets done via variables, conditionals, and loops.

Personally I think it's a wonderful (if imperfect), ultra-powerful, and easy language to learn. But if it doesn't click for you and you're under the gun at the job, I bet it's very easy to develop a bad attitude towards SQL.


"SQL's building blocks are incredibly odd if you're used to the idea that work gets done via variables, conditionals, and loops."

They're even weirder when you get into stored procedures, where SQL statements are your very un-ALGOL-like elementary statement, but then in between them you have a procedural language operating on the results, except when the optimizer figures it can "see through" your procedures to get back to something it can optimize through. And the "declarative" nature of them makes understanding cost models a challenge sometimes. You need a deep understanding of how the database works to get the cost model out of your stored procedure code.

Very powerful. I don't do a lot of deep database stuff but I have a couple of times turned something that required an arbitrary number of thousands of back-and-forths with the DB taking seconds from the application code into a one-shot "send this to the DB, get answer back about a millisecond later" using them, and you can end up with performance so good that your fellow developers literally won't believe it's a "conventional stodgy old relational database" blowing their socks off. But it's a weird programming model.


I'd also add that "doesn't click" is sometimes confounded by a respect gradient. With SQL (also CSS) I've seen some people pick up an attitude that those aren't Serious Languages worthy of their time or respect, so they avoid learning the fundamental concepts, have problems, and then say the language is too hard or old-fashioned to use. I've seen people write many thousands of lines of Java because nobody pushed back on that mountain of fragile code by telling them "maybe you should take a day and really get up to speed with how SQL works".


> A lot of people are not comfortable learning languages beyond the ALGOL-like paradigm. SQL's building blocks are incredibly odd if you're used to the idea that work gets done via variables, conditionals, and loops.

Yes, although somewhat amusingly I've found that non-programmers who have never learnt an imperative programming paradigm tend to find SQL a lot more intuitive than ALGOL-style programming languages.


I'm curious about resources to learning SQL well.

In our company I've done a series of teaching sessions (we're on about the 10th hour now). It's working OK, but I wish I knew of good material / blog posts to point at. I feel like there's no good community to learn from like there is for many backend languages.

E.g. after learning about "cross apply" (T-SQL; "lateral join" in postgres) everything got 10x easier to express in SQL. But: How is a developer who's just getting started in SQL going to know that?

And where is the community that can teach a backend developer getting started with SQL to ignore some of that advice from the DBA and Data Analytics communities? E.g. the advice to "not pin indexes" -- which I believe is 100% wrong advice for a typical backend application where reproducibility across environments is key, and where any query not supported directly by an index is probably a bug anyway.
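A concrete taste of what cross apply / lateral joins buy you is the "top N per group" pattern. SQLite (used here so the sketch is runnable) has no LATERAL, so the Postgres form is shown in a comment and an equivalent window-function query is executed instead; the table is invented for illustration:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (customer TEXT, id INTEGER, amount INTEGER)")
conn.executemany("INSERT INTO orders VALUES (?,?,?)", [
    ("a", 1, 10), ("a", 2, 30), ("a", 3, 20),
    ("b", 4, 5),  ("b", 5, 8),
])

# In Postgres, a lateral join runs the subquery once per outer row:
#   SELECT c.customer, o.id, o.amount
#   FROM (SELECT DISTINCT customer FROM orders) c,
#   LATERAL (SELECT id, amount FROM orders
#            WHERE customer = c.customer
#            ORDER BY amount DESC LIMIT 2) o;
#
# Equivalent "top 2 orders per customer" via window functions:
rows = conn.execute("""
    SELECT customer, id, amount FROM (
        SELECT customer, id, amount,
               ROW_NUMBER() OVER (PARTITION BY customer
                                  ORDER BY amount DESC) AS rn
        FROM orders
    )
    WHERE rn <= 2
    ORDER BY customer, amount DESC
""").fetchall()
print(rows)
# [('a', 2, 30), ('a', 3, 20), ('b', 5, 8), ('b', 4, 5)]
```

The window-function form requires SQLite 3.25+ (bundled with any recent Python); the lateral form is often easier to read once you see it as "a subquery that can reference the row to its left".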


"E.g. after learning about "cross apply" (T-SQL; "lateral join" in postgres) everything got 10x easier to express in SQL."

I feel like even in the past few years the database community has still been learning about what you need the databases to be able to do in order to make good code. I personally don't like the "declarative" memeset and think it set the community back literally decades, but with recent Postgreses (by which I mean, the whole last five years or so... lots of people still running older things) all the functions and functionality is technically there to arbitrarily convert between rows, arrays, columnsets, etc., and more and more you can use them arbitrarily as well, so you can JOIN against a columnset you bodged together from two other queries that you pulled into an array and then put that array into a columnset, without it having to ever be turned into a full "table". Cross apply is another example of that, where IIRC a row can be turned into multiple rows.

The problem is that while all the functionality I've wanted on this front does now seem to exist, it's all incredibly haphazard. Cross joining is an SQL keyword, but arrays look more like a data structure, and I can't remember what all was going on but I recall having more hassle turning arrays into columnsets for some reason. If I were going to be doing this full time I think I'd build myself a matrix cheat sheet of how to convert between all these things, and I bet there's still holes in the matrix even today (is there an opposite of a cross join? dunno, but I wouldn't be surprised the answer is "no").

I feel like I'm doing a lot less "work around things missing in SQL (that I have access to)" than I did 15-20 years ago, but rather than a cohesive and well-organized toolset for dealing with all these things, I've got a haphazard set of Bob's Custom Tool for This and A Semi-Standard, Modestly Extensible Tool for that, neither of which were ever designed with the other in mind, and yeah, in the end I can do everything I want quite nicely but it's up to me to notice that what this tool calls a 1/8 inch quartzic turns out to be the same as a Number Seven smithnoczoid and so in fact they do work together perfectly despite the fact the documentation for neither of them suggests that such a thing is possible, etc.


Mode's SQL school is the best I've seen: https://mode.com/sql-tutorial/


And when it does click for you all the other things that do not involve for/while loops (dicts, map/reduce/group by) become very obvious.


This depends on use case. SQL is the king for batching process - queries are declarative, decades of effort put into optimization.

For real-time / streaming use cases, however, there is not yet a mature solution in SQL. Flink SQL / Materialize is getting there, but the state-of-the-art approach is still the Flink / Kafka Streams approach - put your state in memory / on local disk, and mutate it as you consume messages.

This actually echoes the "Operate on data where it resides" principle in the article.


We do mini-batch processing in SQL. Some hundred milliseconds latency, some hundred events consumed from the inbound event table per iteration. Paginate through using a (Shard, EventSequenceNumber) key; writers to the table synchronize/lock so that this is safe.

Kafka-in-SQL if you wish. Or, homegrown Flink.

(There are many different uses for the events inside our SQL processing pipelines, and we have to store the ingested events in SQL anyway.)

I am sure real Kafka+Flink has some advantages, but...what we do works really well, is simple, and feels right for our scale.

There is enough batching in SQL to get real speed/CPU benefits on inserts/updates (vs e.g. hitting SQL once per consumed event, which would be way worse). And with Azure SQL the infra is extremely simple vs getting a Kafka cluster in our context.


Do you find flink SQL immature? For me it looks a lot like syntactical sugar on top of the datastream api.

Same thing, less code?


I've met so many developers who seem to be willing to do anything, except use sql. I remember one case where we were dealing with clearly relational data. One of the senior developers was adamant that we use a non-relational database. When pushed as to why, he said it was because it would have better performance. I pointed out that this for a process where data would be sent out and we wouldn't expect to receive it back for 3 days or so. A few milliseconds performance boost was hardly beneficial over 3 days. But he was insistent and was senior, so we did it. Shockingly, it turned out to be the wrong decision. Later we learned that the senior developer didn't like sql.


> I think a lot of blame is with the database vendors. They only targeted some domains and not others, so writing SQL is something of an acquired taste. I wish there was a modern language that compiled to SQL (like PRQL, but with data mutation).

There's EdgeQL https://www.edgedb.com/blog/we-can-do-better-than-sql#lack-o... which I like the look of - but as it adds one more layer over a database I haven't used it yet.

Edit: I hadn't seen PRQL before, reading the site now https://prql-lang.org/


I have run into similar attitudes. In my case, it's an unfamiliarity with SQL, fear of the database, and a desire to utilize strongly typed ORMs for everything. Implementing these sorts of queries often requires using raw SQL, which is seen as taboo by such people. They think it's unsafe. Meanwhile we're pulling more records than we need, searching them, then firing off more queries in a for loop.


How easy is it to verify that the configuration of your database matches a checked-in configuration or source file these days? My beef with a lot of installed procedures in a SQL database comes down to deployment and rollback difficulty.


For Postgres views, materialized views, functions and triggers I've made a tool that can do that, DBSamizdat:

https://sr.ht/~nullenenenen/DBSamizdat


Depends on the DBMS, but pt-config-diff works well for MySQL.

edit: This will only help for configuration, not procedures.

https://docs.percona.com/percona-toolkit/pt-config-diff.html


I feel this has gotten worse with the latest round of data science graduates wanting everything in python.

Probably just perspective. I ducked out of the push for everything to be in Hadoop. And while I can appreciate the foot gun that is indexing everything so that ad hoc queries work, I also have to deal with folks thinking elastic search somehow avoids that trap.

I think I've seen the same claims and push for graphql. :(


This, a thousand times this. A hundred lines of Java/Python/C# can save you at least 10 lines of a stored proc :) Also why don't they teach SQL in most schools???


> Yes we CAN do that but why do that it's 15 lines of SQL and 5 seconds compute -- instead of a new microservice or whatever and some minutes of compute.

This works until your database falls over in production. Recently someone started appending to a json field type over and over in our production database. And then on some queries, postgres crashed due to lack of memory. The fix was to remove the field and code that constantly appended to it, and do something else.

No, the database should not be the answer to all your data problems. Yes a microservice may be the best answer. But for structured data and queries that can run with normal amounts of memory the standard SQL DB is fine.


> No, the database should not be the answer to all your data problems. Yes a microservice may be the best answer.

Or to put it another way: No, microservices should not be the answer to all your data problems. Yes, using the DB may be the best answer.


In one case it might just be slow. In the other your db will fall over. Unfortunately, you may find the answer when your production environment fails.


If you have a single DB instance in production, something has already gone wrong.

Read replicas.


> Recently someone started appending to a json field type over and over in our production database. And then on some queries, postgres crashed due to lack of memory. The fix was to remove the field and code that constantly appended to it, and do something else.

So your choices are doing something manifestly non-optimal in the database, in violation of best practices, or not using the database for non-trivial business logic?

Sounds like a false dichotomy to me. Maybe just find a better solution?

I mean, if someone writes an O(n^3) algorithm in Python, is the solution to use a better strategy or to swear off Python for anything non-trivial?


> I wish there was a modern language that compiled to SQL (like PRQL, but with data mutation).

That would be nice, but even better would be a language that is relational itself.

I'm working on one (https://tablam.org) and you can even do stuff like:

  for p in cross(products, qty) ?limit 10 do print(p.products.price * p.qty) end

The thing is, being purely functional/relational is too mind-bending, and it's very nice to have procedural constructs to work with.

BTW, my plan is that the query (?) section will compile to optimal executions defined per storage engine (memory, sqlite, mysql, redis, etc.), so `products ? price = 10.0` is executed on the server, not on the client.


Interesting! Have you looked at Datalog? In many ways it's the ideal purely relational language.


Yes. I think it's not friendly for end users, and building this in Rust makes it easy to plug in iterators/generators, so I went that way.

(Also, I never wrapped my head around how to use it internally!)


IME SQL and relational databases have the framework problem, where they expect to be used in a particular way and won't let you use the internals. It's like when a language (Ada?) famously built a high-level version of concurrency into the language, where you were meant to use its "rendezvous" (I've probably misremembered the name), and instead what happened is that programmers implemented mutexes on top of the rendezvous and then reimplemented high-level concurrency on top of those.

What we need is databases that expose more of their internals, that are designed to be embedded in applications and used as libraries. All serious databases use MVCC these days, but none of them expose it to the user. All serious databases separate updating the data from updating the index, but few of them expose that to the user. All serious databases know the difference between an indexed join and a table scan, but good luck figuring it out by just looking at an SQL query. Etc.


Oh this reminds me a lot of a familiar kafka vs SQL battle.

Yes, it is possible to do it in Kafka, but everything is obscured in the sense that you can't peek at what you are doing, you spend your precious CPU on serializing/deserializing, and things like backfill are a mess.


I find the opposite; SQL databases work a lot like Kafka underneath, but hide it from you, burning all your CPU to generate this illusion of a globally consistent state of the world that you don't actually want or need and doing everything they can to avoid ever showing you what they're actually doing.


Hold up. I need some clarification here. SQL databases are opaque with regard to resource usage but Kafka is completely transparent in this regard? That is your personal experience and assertion?


Yes. I once saw an outage caused by an SQL database server collapsing because of a query that had been issued 23 days earlier. I saw another one gradually get slower and slower over a period of weeks because a data scientist had left a command prompt open and forgotten about it. Kafka brokers are a lot more consistent and predictable IME.


> "why can't just just put your millions of events into a service bus and write some backend to react to them to update your aggregate". Yes we CAN do that but why do that it's 15 lines of SQL and 5 seconds compute

Do people actually do this? I feel like this is classic Occam's Razor. This is one of the biggest things I do in SQL and I couldn't imagine the time and effort to do it with a separate service.
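The "15 lines of SQL" case from the parent (consume events, update an aggregate, repeat) really can be one statement. A minimal sketch with hypothetical `events`/`balances` tables, using SQLite's upsert (SQLite >= 3.24):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE events (account_id INTEGER, amount REAL);
    CREATE TABLE balances (account_id INTEGER PRIMARY KEY, balance REAL);
""")
# 10,000 toy events spread over 100 accounts.
conn.executemany("INSERT INTO events VALUES (?, ?)",
                 [(i % 100, 1.0) for i in range(10_000)])

# The whole "read events, update aggregates" loop as one statement,
# safe to run repeatedly as new batches arrive.
conn.execute("""
    INSERT INTO balances (account_id, balance)
    SELECT account_id, SUM(amount) FROM events GROUP BY account_id
    ON CONFLICT(account_id) DO UPDATE SET balance = balance + excluded.balance
""")

grand_total = conn.execute("SELECT SUM(balance) FROM balances").fetchone()[0]
print(grand_total)  # 10000.0
```

The alternative, streaming every event through a bus into a service that issues one UPDATE per event, is doing the GROUP BY by hand, slowly.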


I interact with SQL on a daily basis but never write a single query by hand. It's all abstracted by the Django ORM. I do have to be mindful of what I do of course but mostly I just plow away with the abstractions offered. Once in a while I have to take a look at the SQL or the query plan but those are few and far between.


Sure, but there's a world of difference between someone who knows SQL and uses a ORM for productivity and someone who uses an ORM because it's the only tool in their box.


Exactly. Effective use of an ORM requires knowledge of the ORM's API in addition to SQL, not in lieu of it.

Using an ORM without a firm understanding of SQL is a recipe for disaster (or at least glacial performance).


I've seen both extremes of this.

On one end, you have applications that execute thousands of SQL queries for each page load, for populating a table with some data or something like that, which has bunches of rules for what should be displayed, all implemented as nested method calls (e.g. the service pattern) in your application. It's a performance nightmare when you get more data, or more users.

On the other end, you have an application where your back end acts just as a view for a DB that has all of the logic in it. There will rarely be proper tests for it. There will rarely be any sort of logging or observability solutions for this in place, the discoverability will be pretty bad, debugging will often be really bad, versioning of changes will also be pretty awkward, but at least the performance will typically be okay.

Just look at the JetBrains Survey from 2021: https://www.jetbrains.com/lp/devecosystem-2021/databases/#Da...

  Do you debug stored procedures?
  47% Never
  44% Rarely
  9% Frequently
  
  Do you have tests in your database?
  14% Yes
  70% No
  15% I don't know
  
  Do you keep your database scripts in a version control system?
  54% Yes
  37% No
  9% I don't know
  
  Do you write comments for the database objects?
  49% No
  27% Yes, for many types of objects
  24% Yes, only for tables
If something that most would consider to be a "good practice" isn't done, then clearly that's a bit of a canary about the state of the technology and the ecosystem around it. Consider that databases are very important for most types of systems, and yet about half of people don't debug their stored procedures, most people don't have tests for them, only about half version their scripts and about half don't bother with comments.

I'm pretty much convinced that it's possible to write bad software regardless of the approach that's used. In my mind, the happy path for succeeding in even sub-optimal circumstances is a bit like this:

  - make liberal use of DB views for querying data, if you have a table in your app, it should have a matching DB view, which will also make debugging easier
  - make use of in-database processing only when it makes a lot of sense and anything else would be a horrible choice (e.g. ETL or batch processes/pipelines), have log tables and such regardless
  - for most other concerns (e.g. typical CRUD), write app logic: most back end languages will have better support for observability and tracing, logging and debugging, as well as scaling for any expensive operations
  - still, be wary of the N+1 problem, that might mean that you don't have enough views, or that you're not using JOINs for your queries properly
  - also, look into using something like Redis for caching, S3 (or something compatible, like MinIO) for binary blobs and something like RabbitMQ for task queues, just because you can shove everything into the DB doesn't mean that you should, sometimes these specialized solutions will have better support and more standardized libraries, than whatever you can concoct
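The first point above, one DB view per app table, is cheap to set up. A sketch with hypothetical names, where the filtering rules live in the view and debugging means SELECTing from the same view the app sees:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT, deleted INTEGER);
    INSERT INTO users VALUES (1, 'alice', 0), (2, 'bob', 1);

    -- The app queries the view, never the raw table, so the
    -- "what should be displayed" rules exist in exactly one place.
    CREATE VIEW active_users AS
        SELECT id, name FROM users WHERE deleted = 0;
""")

rows = conn.execute("SELECT name FROM active_users").fetchall()
print(rows)  # [('alice',)]
```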


Yep. Folks treat relational databases like dumb bit buckets. They assume they are limited and therefore treat them as limited despite the reality of modern SQL. It should not be a surprise to find that most devs skip modern development practices with it as well.


The main reason to do this is so they can be turned into streams, with the source and, potentially, multiple destinations decoupled. If the source and destination will always be the same, sure, do it in the DB.


I couldn't have put it better. Use the SQL, Luke.



