This is not true. Any column-store database (BigQuery, Redshift, Snowflake) implements distributed compute behind the scenes. When an analyst or business intelligence person has a query return in 3 seconds instead of 15 seconds, it's actually huge. Not just in the aggregate amount of time saved, but in creating a quick feedback loop for testing hypotheses. This is especially true considering that most analyst-type people look at data as aggregates across some dimension (e.g. sales per month, unique visitors per region, etc.).
These types of questions are orders of magnitude faster with a distributed backend.
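To make that concrete, here is a rough sketch of why "aggregate across a dimension" queries split so well across workers. Everything in it is made up for illustration: the sample rows, the function names, and the use of threads as a stand-in for worker nodes (Python threads won't give a real speedup here; the point is only the shape of the computation). Each shard is summed independently, then the partial totals are merged.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor

# Hypothetical sample rows: (month, sale amount).
ROWS = [
    ("2023-01", 120), ("2023-02", 80), ("2023-01", 50),
    ("2023-03", 200), ("2023-02", 40), ("2023-03", 10),
]

def partial_sums(shard):
    """Aggregate one shard independently, as a single worker node would."""
    totals = Counter()
    for month, amount in shard:
        totals[month] += amount
    return totals

def sales_per_month(rows, workers=3):
    # Split rows into roughly equal shards, one per worker.
    shards = [rows[i::workers] for i in range(workers)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(partial_sums, shards))
    # Merge step: addition is associative, so partial totals simply add up.
    merged = Counter()
    for partial in partials:
        merged.update(partial)
    return dict(merged)

RESULT = sales_per_month(ROWS)
```

Because the merge only has to combine one small row per group per worker, the expensive scan is the part that gets divided up, which is roughly what a distributed backend does at a much larger scale.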
I was just playing with some data from our manufacturing system, about 30 GB. I pulled the data to my laptop (a very expensive Apple one) and while it fits on my disk just fine, it took about 15 minutes to download.
I imported it into ClickHouse, which took a while because I had to figure out compression, LowCardinality() and so on. I ran a query and it took ClickHouse about 15 seconds. DuckDB, pointed at the Parquet files on my SSD, took 19 seconds to do the same. Our big data tool took 2 seconds, while working with the data directly in cloud storage.
Now of course this is entirely unfair - the big data thingie has over twenty times more CPUs than my laptop, and cloud storage is also quite fast when accessed from many machines at once. If I had run ClickHouse or DuckDB on a 100-CPU machine with a terabyte of RAM, it might still have turned out faster.
But this experiment (I was thinking of using some of the new fancy tech to serve interactive applications with less latency) made me realize that big data is still a thing. This was a sample - one building from one site, which we have quite a few of.
I'd love to understand the shape of this data and some of the types of queries you're performing. It would be very helpful as we build our product here at MotherDuck.
I have no doubt that there are situations where the cloud will be faster, especially when provisioned for max usage [which many companies do not do]. However, even in a lot of these situations the local machine can supplement the cloud resources [think of the decisions a query planner can make].
Feel free to reach out at ryan at motherduck if you want to chat more.