For me the issue is that DuckLake's feature of flushing inlined data to Parquet is still in alpha. One of the main problems with Parquet is that writing small batches leaves you with a lot of small Parquet files that are inefficient to work with in DuckDB. To solve this, DuckLake inlines these small writes into the catalog DBMS you choose (e.g. Postgres), but for a while it couldn't write them back out to Parquet. Last I checked that feature didn't exist yet; now it seems to be in alpha, which is nice to see, but I'd like more mature support before I consider switching some personal data projects over. https://ducklake.select/docs/stable/duckdb/advanced_features...
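For reference, this is roughly what the inlining-plus-flush flow looks like going by that docs page. I haven't battle-tested it; the option name (DATA_INLINING_ROW_LIMIT), the flush call (ducklake_flush_inlined_data), and the paths/table names are just what I recall from the docs or made up for illustration, so treat it as a sketch:

    import duckdb

    con = duckdb.connect()
    con.execute("INSTALL ducklake")
    con.execute("LOAD ducklake")

    # Attach a DuckLake catalog stored in a local DuckDB file. Small inserts at or
    # below the row limit get inlined into the catalog database instead of each
    # becoming a tiny Parquet file under DATA_PATH.
    con.execute("""
        ATTACH 'ducklake:my_lake.ducklake' AS my_lake
            (DATA_PATH 'data_files/', DATA_INLINING_ROW_LIMIT 10)
    """)

    # Many small writes land as inlined rows in the catalog, not as Parquet files.
    con.execute("CREATE TABLE my_lake.events (id INTEGER, payload VARCHAR)")
    con.execute("INSERT INTO my_lake.events VALUES (1, 'a'), (2, 'b')")

    # The alpha feature in question: flush the accumulated inlined rows out to Parquet.
    con.execute("CALL ducklake_flush_inlined_data('my_lake')")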
Data inlining is also currently limited to the DuckDB catalog (i.e. it doesn't work with Postgres catalogs)[0]. It's improving very quickly though, and I'm sure this will be expanded soon.
The DuckLake format has an unresolved, built-in chicken-and-egg conflict: it requires a SQL database to represent its catalog. But that is exactly what some people are running away from when they choose the Parquet format in the first place. Parquet = easy, SQL = hard; adding SQL to Parquet makes the resulting format hard. I would expect the catalog to be in Parquet format as well; then it becomes something self-bootstrapping and usable.
DuckLake is more comparable to Iceberg and Delta than to raw Parquet files. Iceberg requires a catalog layer too, a file-system-based one at its simplest. For DuckLake any RDBMS will do, including file-based ones like DuckDB and SQLite. The difference is that DuckLake uses that database, with all its ACID goodness, for all metadata operations, so there is no need to implement transactional semantics over a REST or object storage API.
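Concretely, picking the catalog backend is just the connection string you hand to ATTACH. A rough sketch (paths and DSNs are placeholders, and the exact strings are what I remember from the DuckLake docs):

    import duckdb

    con = duckdb.connect()
    con.execute("INSTALL ducklake")
    con.execute("LOAD ducklake")

    # Simplest case: the catalog is a local DuckDB file, Parquet data goes to DATA_PATH.
    con.execute("""
        ATTACH 'ducklake:metadata.ducklake' AS lake_local (DATA_PATH 'data_files/')
    """)

    # Same lakehouse layout, but the catalog lives in a SQLite file...
    con.execute("INSTALL sqlite")
    con.execute("LOAD sqlite")
    con.execute("""
        ATTACH 'ducklake:sqlite:metadata.sqlite' AS lake_sqlite (DATA_PATH 'data_files/')
    """)

    # ...or in Postgres (placeholder DSN). Either way, metadata operations ride on the
    # database's own ACID transactions; no REST catalog service is involved.
    con.execute("INSTALL postgres")
    con.execute("LOAD postgres")
    con.execute("""
        ATTACH 'ducklake:postgres:dbname=ducklake host=localhost' AS lake_pg
            (DATA_PATH 's3://my-bucket/lake/')
    """)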
It is not a chicken-and-egg problem; it is just a requirement to have an RDBMS available for systems like DuckLake and Hive to store their catalogs in. Metadata is relatively small and needs ACID reads/writes => a great RDBMS use case.