The NoSQL data model - please read as document oriented - is fundamentally flawed because it forces you into denormalization without good reasons like performance optimization but with all the disadvantages.
The only good reason not to use a relational database is that they suck at modeling complex entities and their relationships. Shattering nice entities into small relations to get them normalized, adding join tables to model many to many relationships and all that fun. Everything that is not roughly tree-like is just painful.
More than once I finally decided to just stuff the really complex pieces into a blob - XML or what ever worked best - and just deal with it at the application layer. Not pretty, not fun, but still less painful than modeling and dealing with it in the relational model.
> The NoSQL data model - please read as document oriented
These few words are pointing to a fundamental problem I'm already getting annoyed about, as little as I know about "NoSQL" databases so far: The term NoSQL-database is flawed, just as Fowler says in his introduction talk. Theres's document-oriented databases, and array-ish data storages (column family-oriented), there's graph based databases - and they have very, very different strengths.
The article falls flat on that problem as well. Some data is not relational and you are correct that there is a strong overlap between document based storage systems and relational storage systems. However, I'm dealing with time series data, or with lock-graphs. I could shoe-horn those problems into a relational database, but after tinkering around with better storage engines for my problem... I'd call myself stupid for doing so.
So, as the article says: Use the right tool for the job. And stop calling the tools "SQL" and "NoSQL", those terms are useless.
I completely agree, the right tool for the task is absolutely important. I would never say NoSQL stuff is bad per se [1], but I criticize its use where it is obviously a bad choice and only used because it is the new shiny thing.
[1] Actually I said that in my first comment, but the wording is just bad and probably way to sensationalistic. What I really wanted to say is that it is a really bad fit for modeling you usual business with user, orders, products and the like.
In HTML5 we are stuck with NoSQL (IndexedDB), Mozilla refused to implement WebSQL.
Safari, Chrome and Opera implement WebSQL using the public domain SQLite library. This means WebSQL is available on almost all smartphones/tablets and on the majority of PCs.
Btw. Mozilla implemented their "NoSQL" IndexedDB on top of SQLite in the first place. Various features like bookmarks are also stored in a SQLite database. As well as in FirefoxOS.
Serious question -- why do you need SQL in the web browser? It seems like for the amount of data a person is likely to store locally, it's not worth it. But I'm open to being proven otherwise.
The project is still in beta, so the database will most likely grow with more tables, easily linking into the existing episodes table with an actors table, for instance.
Short: WebSQL enables more powerful web applications. It allows some online features to be used offline.
It's not about SQL it's about a proper relational database with tables, indexes, ACID, etc.
You may remember Google Gears (http://en.wikipedia.org/wiki/Google_Gears ), even back in 2007 one could use the Google Mail (GMail) web interface during a flight, train ride, etc. to read mails, write mails, edit drafts, etc. As soon as the connection had reestablished the data synced between client and server. Later Google rewrote it to HTML5.
There was a trend to move everything to the "cloud". But with the arise of peer-to-peer web applications (WebRTC, etc) and thanks to the bad NSA news we need good client side cache and data storage APIs.
I haven't seen a SQL library as shim on top of IndexedDB. It would be pervert anyway as IndexedDB is implemented on top of SQLite in Firefox.
I would like to ask Mozilla to finally implement WebSQL. With a relational database, more powerful web applications with optional offline support would be possible.
I would immediately support this. This conversation needs to be started again. I'm gonna ping some people on twitter.
I've just searched for a WebSQL on IndexedDB adapter yesterday and it's certainly possible, but it's a nightmare from the perspective of any sane person. This polyfill I found runs SQLite compiled with CLANG to javascript, which means running 2MB of javascript to support something that they already support 'under the hood'. I refuse to work with that.
Great. We would need a blog post that lands on the HN front page and some other HN readers that are equal minded. Or a contact to Mozilla devs.
There is also a similar one called SQL.js - SQLite compiled to JavaScript through Emscripten (probably a big library too): https://github.com/kripken/sql.js
I wouldn't say the model is fundamentally flawed. Looking at the roots of NoSQL (which I believe was first popularized by Google's BigTable), the original intent of a NoSQL database is to allow redundant storage of a lot of data on a DFS.
They're meant for situations where you have data that doesn't fit well into a relational database model, and you need to run BIG queries (think PageRank) over the entire data in a fault-tolerant way. For instance thinking about PageRank, how many discrete data elements are in a single web page? This is "higher-order" data that in a relational database might have hundreds or thousands or tens of thousands of columns per row.
Of course there are also in-memory transactional databases called NoSQL databases which are meant to be replacements for the traditional relational data store, but those are a very different beast from the original NoSQL data model.
That was bad and sensationalistic wording from my side. I should have said more clearly that I meant this in the context of modeling common business domains, not for specialized tasks that don't fit a relation model efficiently.
I think there is another very good reason to not use a relational database. You want to build a prototype really, really fast and your data model and organization is gonna be changing constantly. In my experience I have been able to prototype applications much faster using NoSQL solutions. I think a good approach is to "sketch" your app with NoSQL and once things start to stabilize move to an RDBMS.
This is what tooling is for: Define your entities, and generate a wrapper around a datastructure is for, just like you would do with Doctrine and PHP.
Change your entity? Regenerate the wrapper and generate a migration.
Problem still is: Nobody is writing tooling for WebSQL. Everybody just seems to automatically grab NoSQL, throwing away 30 years of built up software developing knowledge, including the golden rules of Database Normalisation: Don't duplicate your data, and compartmentalize.
If you want to use a purely document oriented model and not have redundancies then you have to partition your object graph into a set of documents that do not overlap and have no relationships between them. I can not think of any real example that fits this model.
Start anywhere, for example, pick your users and create a document for each. Now you take the orders of each user and also put them into the document belonging to the respective user. Fine, that works. Now we come to the products they ordered, and now we have a problem. Unless no product has ever been ordered by two different users we have to duplicate product information and put it into several different documents. And there is the same problem within documents if the user repeatedly ordered the same product.
All interesting problem domains just don't fit the idea of piece of JSON without references, internally or externally. There is of course support for reference in document oriented databases, but then lets just call your documents rows in a table and your references foreign keys and we are almost back at the relational model besides that documents still have a richer model, a bit like a NF² relational model or database with deep support for XML or JSON columns.
> Now you take the orders of each user and also put them
> into the document belonging to the respective user.
So the database doesn't force anything. The data modeler makes a questionable decision.
> All interesting problem domains just don't fit the idea
> of piece of JSON without references, internally or
> externally.
These databases won't prevent data modeling using references, they just often don't provide as much support as a relational database does for doing joins or enforcing key constraints.
But that's my point - either you use documents in the way they are advertised and get redundancy or you normalize into many documents with relationships between them and then your documents just degenerate into entities.
I am not saying there are no usecases where a document oriented model fits well, but I argue that it is not a good fit for modeling your average business.
I'm guessing that "the way [documents are] advertised" refers to some of the "hello world"-style examples that are found in various blogs. I'll agree that in their enthusiasm for showing how easy it is to use put data into and retrieve data from a document store (no O/R mapping!) they assume that the reader is well-versed in data modeling. This has bitten numerous inexperienced developers. I've never seen an article recommend against data normalization though.
I'm with you here. I've found that even phones have enough juice to handle most manipulation, and the website appears more responsive because the server isn't bogged down.
One thing that is not easy to do with a relational model, is offline synchronization for high-availability on mobile devices. The document model was built for this, I find it hilarious that so few of the NoSQL databases are taking advantage of the latent capability.
It is not hard at all. people just hadn't built as much of the tools for it. Which also confuses me because we now have package managers that run on package managers.
My own little WebSQL Object/Relation mapping library uses promises to get away from callback hell, and it's a pure joy to work with.
It handles creating tables and inserting fixtures before the first connect promise is returned, so that you don't have to worry about the setup process and your data being available.
Creating and inserting entities is as easy as
> var P = new Presentation();
> P.set('name', 'test');
> P.Persist().then(function(result) { console.log('Done!') });
I have think about use it, but still it fail if the data is relational. I have a invoice app, and none of the NOSQL I have evaluated guarantee the order of the updates or have atomic updates (need both). Is not fun if a invoice update partially or the related tables get out of sync.
In the other hand, I'm stuck in how sync with a relational model :(
So, it sync well and broke my relations and transactions or it sync hard but keep my data correct.
Typically, the best you can hope for out of your data is that it is a directed graph. Relational databases exist in a sweet spot of efficiency, widespread availability, and fairly decent ability to represent directed graphs.
But you need to know the shape of your graph first. The document-store model tends towards giving up the graph entirely and just going with a tree. Us programmers love tree structures. They make things nice and tidy and can be traversed in finite time.
But I've never worked on a project that could easily be modeled as a tree. Unfortunately, this was often realized after I had been working on the project for a year. You know how phenomenally fucked you are having a tree-like data structure when what you really need is a directed graph? You're basically all the fuckeds.
Directed graphs generalize trees, so trying to shoehorn a tree where a directed graph is needed is basically bringing a Fischer Price tool set with you to Habitat for Humanity.
Thus, from experience, I never start with a tree now. I always start with a directed graph. If I learn that the data is a tree after a year and not actually a directed graph, oh well, look at all this flexibility we didn't need.
Now can somebody convince the W3 and Mozilla that there's nothing wrong with implementing SQLite next to IndexedDB so that I can move forward without having to write an IndexedDB adapter for my clearly relational TV Show -> Season -> Episode data please?
Foreign Relationships on foreign keys are not difficult
Many to many Relationships are not difficult
SQL is not difficult
Joining is not difficult
Grouping is not difficult
Now try to do that in IndexedDB / NoSQL, and suddenly you're in a world of hurt. It can work, but can it perform? Maybe. With time and patches. So wham, let's throw the option of implementing the probably most well tested piece of software (in the universe probably, see http://sqlite.org/testing.html ) out of the window.
> Now can somebody convince the W3 and Mozilla that there's nothing wrong with implementing SQLite next to IndexedDB
There's nothing wrong with implementing SQLite next to IndexedDB, and its not like W3 is going to send the Internet Standards cops to bust the browser vendors that have.
OTOH, there is something wrong with standardizing "whatever the version of SQLite happens to be used in the most recent version of Chrome happens to do" in a W3C spec. Speccing out an API and a specific supported subset/dialect of SQL for WebSQL that could support multiple independent compatible implementations would be appropriate (and it could even be based closely on what a particular version of SQLite does) -- but no one involved was interested enough in doing that to actually, well, do it, and that's why WebSQL ended up in limbo.
I really don't understand this argument. They were able to throw up IndexedDB support from scratch in a jiffy, but because something that actually works on nearly all platforms (including mobile!) happens to be Chrome's (??) that is a reason to hold back?
Sqlite has been around for almost 15 years now. It's time to adopt it as a standard. Even if they're not satisfied with it, they can easily put a 'draft' stamp on it and provide people with a working, well-tested way to use it today. Firefox already has the support for it since your internal settings and favorites are also stored in guess what.... SQLite databases!
What's even more confusing is that Mozilla is actually listed on the Sqlite.org webpage as a sponsor, but they refuse to land it in Firefox. All because of hipster politics and creative arguments like the one above.
Sure it can use work. Migrations suck. But it will never evolve any further as long as they refuse to adopt it.
> They were able to throw up IndexedDB support in a jiffy, but because something that actually works on nearly all platforms (including mobile!) happens to be Chrome's (??) that is a reason to hold back?
No, because the proposed spec did not actually specify the behavior in a way that was independently implementatble.
The spec did not specify the supported query language, either by simple reference to a specific version of the SQL standard (which would be problematic, because a complete and correct implementation of any, at least recent, version of the SQL standard is rare, and probably inappropriate for the use case), or by more complex reference-with-identified-exceptions, or by just listing out the supported features and expected behavior.
It was, therefore, not possible even in principle to have mutually-compatible, independent implementations (and all of the existing implementations just did "link to some version of SQlite and do whatever it does".)
> Sqlite has been around for almost 15 years now. It's time to adopt it as a standard.
Its a good, widely used, tool -- and one that is rapidly changing. But the whole point of web standards is to specify behavior in a way which permits mutually compatible, independent implementations. And WebSQLDB didn't do that, and didn't really seem to be progressing toward doing that.
> What's even more confusing is that Mozilla is actually listed on the Sqlite.org webpage as a sponsor, but they refuse to land it in Firefox.
SQlite is -- or has been, at least, not sure if it still is -- used in Firefox. What they haven't included in Firefox is a WebSQL database implementation, because they believe -- and rightly so -- that WebSQL database wasn't appropriate as a web standard, and didn't show any sign of heading toward something that would be appropriate as such a standard.
> It was, therefore, not possible even in principle to have mutually-compatible, independent implementations (and all of the existing implementations just did "link to some version of SQlite and do whatever it does".)
SQLite library is in public domain, has more lines of code test cases than lines of code C library code and its SQL API is stable. Plus the SQL language is well documented: https://www.sqlite.org/lang.html
W3C could also simply just fork SQLite at any time and modify the SQL dialect.
Mozilla and Oracle both are official gold sponsors of SQLite, both companies use the library in their own products though made a lot of effort to create IndexedDB and especially tried to deprecate WebSQL - nice job!
Microsoft has several embedded SQL libraries (JetBlue, JetRed, SQL Server Express) and it would be a piece of cake for them to modify the SQL parser a bit to the WebSQL SQL dialect.
The point is that someone has to actually sit down and define the exact feature set that a WebSQL implementation is supposed to implement, such that a development team could sit down with a copy of the standard and no other code and implement their own WebSQL implementation from scratch.
The fact that SQLite is available is entirely irrelevant to any of this. The W3C can't just say "do it like SQLite does it" - that's not a standard. The WebSQL spec that exists says:
> User agents must implement the SQL dialect supported by Sqlite 3.6.19.
Unless someone actually sits down and goes through the standardisation process to define WebSQL properly, this is not the way to go, or the next thing will be "implement this tag like Firefox does".
May I refer once again to http://sqlite.org/testing.html and the fact that it's running in the real world on millions of devices? This is a non-argument in this discussion! What if IndexedDB has a bug which manifests differntly on differnt platforms? You would refer to the specs, which can be tested.
The SQL standard defines far more than you'd want in a web browser, and I'm not sure anyone actually wants all of it. So someone has to sit down and define which bits to cut out, anyway... and then browser developers have to develop a database engine, which I'm fairly sure none want to do. We barely have any open-source software that implements SQL correctly as it is.
SQlite is -- or has been, at least, not sure if it still is -- used in Firefox. What they haven't included in Firefox is a WebSQL database implementation, because they believe -- and rightly so -- that WebSQL database wasn't appropriate as a web standard, and didn't show any sign of heading toward something that would be appropriate as such a standard.
Istill haven't seen any argument on why it isn't possible to freeze it at some point and agree on the fact that that's already an interoperable standard that works in the field right now.
This is IMO still an the result of a NoSQL biased group of people that decided to go another way for the sake of being cool. SQLite is specced and tested inside out and back and forth (LITERALLY!) You can write a spec on that better than you can write a spec on how IE 6 behaves.
IndexedDB is fine for storing JSON objects, etc. but a relational database with SQL query syntax, indexes, etc. more powerful and means less code to write. With IndexedDB one has to reinvent the wheel to just get basic query features.
WebSQL is not deprecation, the W3C Working Group Note actually says:
'This specification is no longer in active maintenance
and the Web Applications Working Group does not intend to
maintain it further'.
WebSQL is only available in Webkit based Browsers (Safari, Chrome) which means most mobile browsers.
As SQLite is in public domain, no company would "loose their face" if they choose to use it. They could fork off SQLite and change the SQL query syntax (parser) to whatever the W3C finds suitable. https://www.sqlite.org
Mozilla Firefox and FirefoxOS both already ship SQLite for years and can be accessed by its internal JavaScript API. And several Microsoft products already use it anyway (e.g. Forza Xbox games). Microsoft has of course also various other SQL database libraries like MS Access JetRed (http://en.wikipedia.org/wiki/Extensible_Storage_Engine ), MS Outlook JetBlue (http://en.wikipedia.org/wiki/Extensible_Storage_Engine ) and MS SQL Server Express (http://en.wikipedia.org/wiki/SQL_Server_Express ) the SQL backend originally forked off for WinFS for Longhorn (Vista beta). It would be trivial for Microsoft to choose one of its many SQL engines and add it to IE 12. The same goes for Mozilla (just expose the API of SQLite).
One can speculate that a less powerful HTML5 API translates in the long run to more SQL server licenses for Oracle and Microsoft. If the web app devs cannot do the processing & storage on the client side, one has to do it on the server side.
Anyway, I hope that we get an SQL API for HTML 5.x that also Mozilla and/or Microsoft implements. As of now WebSQL works fine in Webkit based browser which includes Safari, Chrome, Opera and includes also 95% of all smart phones.
This is only if you ignore why people moved to NoSQL in the first place.
PgSQL is great for a lot of things, but I would argue that if you're using it you're betting on your overall product/service having some other killer advantage than data processing. The competent wing of the NoSQL crowd are using it in strange ways that enable new classes of product and service that cannot be achieved with PgSQL.
I get the feeling you're comparing NoSQL to what PgSQL was 4-5 years ago. Modern PgSQL has a pretty rich set of tools for handling schemaless data, and combines that with its traditionally strong support for relational data. PgSQL has a direction that cares a lot more about data than doctrine.
Though your last line makes me curious. What is an example of one of these new classes of product and service that can't be achieved with PgSQL?
A lot of the Hadoop users are in the right ballpark. Not necessarily that stack, but that approach to things, and the problems they are attacking, especially graph based data.
My broader point is that if your project fits into pgSQL then you need another unique selling point and the data functions are just an implementation detail of some other aspect of your offering, whereas for many of the people not using that kind of thing their analytics and data systems are their selling point. (That's a backhanded compliment to pgSQL, in that it's easy enough to get right on small systems that any competent developer should be able to manage it, thus reducing the market value though). There is, of course, the blurred line of crazy MySQL deployments, many of which are barely relational.
I'm trying to follow this reasoning. I guess the idea is that if a system has a smaller market share, then it's less likely to be used by your competitors, and can therefore offer a competitive advantage. Is that right?
That doesn't make sense to me because it only really applies when there is a high likelihood of competitors using one product but not the other. Although postgres is doing great, in most markets it's still far more likely that your competitors are using oracle or sql server. So any advantage postgres has -- and I believe there are many -- offers a potential competitive advantage.
For instance, you could argue that data systems are a critical selling point of Heroku, and they use postgres.
I think your point ultimately boils down to: "postgres is not quite at the forefront of certain analytic use cases", which I agree with. It is at the forefront of many other use cases though.
pgSQL does what it does easily enough that it reduces the barriers to entry to such a level for traditional RDBMS workloads (which there are plenty of) that such workloads are simply not economically worth pursuing (especially for startups) except as small components in larger systems where the value add is elsewhere. The "other" world of big data/time series/graphs/nosql is hard to get right, and so is worth more, as if you crack it someone else copying you is decidedly non-trivial, meaning that it alone can form the core of a successful business.
This is a bit like what the web people are trying to do in mobile, where if HTML5 was magically the best cross platform mobile deployment option when we wake up tomorrow the value of mobile developers will collapse, and the web people will then cease to be remotely excited about mobile.
Oh, so it's a barrier-to-entry argument? That makes more sense. The barrier to doing something with postgres is pretty low, so competitors can more easily copy your ideas unless there are more barriers somewhere else.
An interesting point. More broadly: if what you're working on is not hard (and awkward), then others can copy your idea easily. Technology like postgres makes a new class of problems easy, and thus you need to find new problems to solve if you want a sustainable business.
There's a sizable contingent (but still a tiny minority) of developers who will outright dismiss anything SQL and typically view databases as these things you kinda pick on a whim based on whatever is easiest to shoehorn your MVP into. Fancy things like a powerful query language, robust data types, transactions, ACID-compliance etc? YAGNI.
Although I don't share this view, I cant say its always wrong. If you're building simple MVPs to find product-market fit for ideas why bother? If there's a 90% chance what you're building will never evolve to need those features, why invest in them?
But if your "product" ends up being a 10% survivor you better have a plan to move to something more powerful before you compound too much technical debt from your NoSQL database. Once you start scaling your business and have to face competition, NoSQL becomes productivity tarpit. Queries that take 20 minutes to write and optimize in the SQL realm can turn into day-long exercises in NoSQL.
The reality is most developers aren't cranking out disposable software used to test the market for new ideas. We're working on things that already have a place, we just need to make them better. For us, the best option right now is SQL + maybe some specialized databases for certain purposes (Columnar for analytics, graph for relationship analysis, full-text for text, etc)
When I read stuff like this I try reeeeal hard to not immediately dismiss it as @hipsterhacker stereotypes, but weird groupthink like this shocks me.
Keeping data that is by it's nature relational in a relational database is, to my mind, obvious. That it isn't for startups that are building their entire business on data foundations (because it's OLD!) is genuinely mind-boggling to me. I guess I'm the one that's old now.
Probably the same people that call themselves "tech-savvy" and make such fundamental errors in matching their tools with their needs because "fashion".
They're about as tech-savvy as the people that think voting machines are progress.
That is all perfect until your project has matured into 2, 3, or 5 years old and you need to do patches and migrations. Or how about you have to transfer your project to another developer? I bet he/she'll be happy not having a spec of what's in where, when!
Those can just as easily be screwed up in relational databases. For example getting the schema wrong, or only updating it by accretion (adding new tables).
A developer being below average (as half of all developers are), using the wrong tool for the job, not understanding things etc will make a mess. A claim that developers can screw things up is uninteresting (and true).
A claim that there is no scenario in which a "NoSQL" database is the better solution is a lot more interesting, and requires more than "some developers could screw it up" to substantiate.
I love CouchDB because it works directly with web browsers without middleware (and supports awesome replication), but I wouldn't mind structured records like in SQL databases. I end up enforcing schemas using update validation functions anyway. As long as it inputs and outputs in JSON over HTTP, I'd be happy.
Why refer to the database that didn't fit this use case as "the particular NoSQL database that we selected", while specifically mentioning the one that DID fit by name? I know Postgres is great, no convincing necessary :)
Probably avoiding a backlash from partisans of the that db complaining about the bad press and from partisans of other NoSQL dbs complaining that the points in the article don't apply to them.
I think the author is conflating the document-oriented variant of NoSQL with the whole family of databases.
I cannot criticize him for leaving Couch|Mongo if their data is rich in relations, but I would like to read his thoughts on graph databases for instance.
Maybe their data is too large for Neo4J, but for querying n:m relationships, this model can offer advantages.
TL;DR; of my comment: NoSQL is a good optimization sometimes. Don't prematurely optimize. If you don't know what you need, you need an RDBMS.
NoSQL / Document databases got so cool so fast, and were usable with little to no actual know-how that people just lost their minds and used them by default. That was always the wrong decision.
RDBMS are the swiss army knives. They do "all the things". But power, responsibility, etc.
I use various types of NoSQL models for various special purposes, they are great, and should be used, for those purposes. You almost always need an RDBMS. If it's not your gold record (because you need quick writes and can lock a whole document at a time), then you need to replicate to RDMBS for better ad hoc reporting. OTH, if RDBMS is your gold, you may want to replicate to NoSQL for fast lookups.
I'm currently working through a situation where the latter is required. I have hundreds of thousands of records and the need to solve a classic backpack problem (with hundreds of thousands of potential items to go in the pack, each of which can have n instances)... so I've replicated a few billion pre-solved solutions to Azure Table Storage. It solves a particular problem, but its not my WHOLE application's data store.
The only good reason not to use a relational database is that they suck at modeling complex entities and their relationships. Shattering nice entities into small relations to get them normalized, adding join tables to model many to many relationships and all that fun. Everything that is not roughly tree-like is just painful.
More than once I finally decided to just stuff the really complex pieces into a blob - XML or what ever worked best - and just deal with it at the application layer. Not pretty, not fun, but still less painful than modeling and dealing with it in the relational model.