
I think database schemas deserve to be protected with one’s life as the holy ground of the system. If the schema is fucked, everything else will be fucked too.


Schemas require domain knowledge. When that knowledge is unclear or nobody owns it, both data integrity and system functionality suffer. Things that screw this up in the financial world include: working in different countries, acquiring new branches, new hires, and leavers. And people who think they can insist the database schema be protected somehow. A manager told me to add that last reason; it wasn't my idea and makes little sense.


With a database you can lock down the schema. In reality though, many data systems are composed mainly of people emailing in Excel spreadsheets. Good luck enforcing any sort of schema there.

My day job is writing a desktop/file-based ETL system. I have just added a schema version feature to cover these sorts of issues. It was one of the most requested features, because most people aren't able to control the schemas of the data they receive.


Yeah, Excel-based data ingest is a pretty brutal problem to solve.


We can automatically handle some schema drift if columns are renamed or reordered, or columns added or deleted. But if they are both renamed AND re-ordered, you are out of luck!
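To make that limit concrete, here is a rough sketch in Python of how such matching might work. It is not the parent's actual implementation; `map_columns` and `SchemaDriftError` are made-up names, and it assumes column names are unique. It matches by name first (which covers reordering, additions, and deletions) and falls back to position only for renames, so it gives up when a column is both renamed and moved.

```python
class SchemaDriftError(Exception):
    """Raised when the received columns cannot be matched safely."""


def map_columns(expected, received):
    """Map each expected column name to its index in `received` (or None).

    Hypothetical helper for illustration; assumes unique column names.
    """
    # Pass 1: match by exact name. This survives reordering,
    # added columns, and deleted columns.
    index_by_name = {name: i for i, name in enumerate(received)}
    mapping = {name: index_by_name.get(name) for name in expected}

    missing = [name for name in expected if mapping[name] is None]
    used = {i for i in mapping.values() if i is not None}
    extra = [i for i in range(len(received)) if i not in used]

    # Pass 2: if names went missing AND unknown names appeared, treat it
    # as a rename, but only trust it when the position is unchanged.
    if missing and extra:
        if len(expected) != len(received):
            raise SchemaDriftError("columns renamed and count changed")
        for name in missing:
            position = expected.index(name)
            if position not in extra:
                raise SchemaDriftError(
                    f"{name!r} appears to be renamed and moved"
                )
            mapping[name] = position
    return mapping
```

With an expected schema of `["id", "name", "amount"]`, a reordered file or one where `name` became `customer` in the same position maps fine, while `["customer", "amount", "id"]` raises `SchemaDriftError`.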


If you detect a level of drift that you can't handle, this is the perfect opportunity to delegate that bit of work to an LLM, if it's a problem that you deal with regularly enough to feel the cost of it to your business.

The latest generation of LLMs is very good at this kind of situation, where somebody has renamed something (but kept some semblance of the meaning) and also moved it, so a basic or even a fuzzy comparison might not be able to make a good match.

A model like GPT-4o-mini will eat a problem like this for breakfast, and it's now incredibly cheap to use it for this kind of thing as well.
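If you go this route, the part worth getting right is checking what comes back. Below is a minimal sketch in Python, not tied to any vendor's API: `ask_model` is a placeholder for whatever function sends a prompt to your model and returns its text reply, and `build_prompt` and `llm_column_mapping` are made-up names. The sketch only accepts a reply that covers exactly the expected columns and points at columns that really exist in the file.

```python
import json


def build_prompt(expected, received):
    """Build a plain-text request asking for a JSON column mapping."""
    return (
        "Match each expected column to the received column with the "
        "same meaning.\n"
        f"Expected: {json.dumps(expected)}\n"
        f"Received: {json.dumps(received)}\n"
        "Reply with only a JSON object mapping expected names to "
        "received names; use null when nothing matches."
    )


def llm_column_mapping(expected, received, ask_model):
    """Ask a model for a mapping, then validate it before trusting it.

    `ask_model` is a hypothetical callable: prompt string in, reply
    string out. Wire it to whichever client you actually use.
    """
    reply = ask_model(build_prompt(expected, received))
    mapping = json.loads(reply)  # raises ValueError on non-JSON replies

    if not isinstance(mapping, dict) or set(mapping) != set(expected):
        raise ValueError("reply does not cover exactly the expected columns")

    targets = [t for t in mapping.values() if t is not None]
    if any(t not in received for t in targets):
        raise ValueError("reply names a column that is not in the file")
    if len(targets) != len(set(targets)):
        raise ValueError("reply maps two expected columns to one source")
    return mapping
```

Anything that fails validation can go to a human instead of into the pipeline, which keeps a confident but wrong guess from silently corrupting the load.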



