
Size isn't the real problem, it's time.

Are you going to take the time / money to set up a warehouse, get all the data into it with an ETL product, set up dbt or some other transformation layer, set up a BI tool, build the reports and dashboards, etc.?

Regardless of the size of your data, you still need to get it in one place and model it in a way that makes it actually usable.



Exactly. It isn't just the time to set up all the data in a way that makes the right query possible. It's also having queries fast enough that you can run a vast number of them to find what you are looking for (or even things you were not looking for).

https://didgets.substack.com/p/data-science-and-serendipity


Is it usually queries on live data, or on data that's been moved?


It’s moving the data around that is slow… and expensive. Getting the data into the data warehouse, then getting it to the processors, then moving it around to filter and transform it.

Getting your data to the cloud is expensive, but then you can’t do anything with it because distributing it for multi-stage processing is too expensive, and you’re already paying so much to keep all that useless data.


Can't we just give it to that one IT guy down in the basement?


Hey, I used to be that guy (and still am).



