
Reiser argued that if you optimised a filesystem for very tiny files, then many cases where apps invent their own ad-hoc file-systems-in-a-file could be eliminated and apps would become easier to read/write and more composable.

For example, instead of an OpenOffice document being a zip of XMLs, you'd just use a directory of XMLs, and then replace the XMLs with directories of tiny files for the attributes and node contents. Instead of a daemon having a config file, you'd just have a directory of tiny files. He claimed that apps weren't written that way already because filesystems were wasteful when files got too tiny.
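To make the "directory of tiny files" idea concrete, here is a minimal sketch of what a daemon's config loader might look like under that model. The function names and the example keys are made up for illustration; nothing here is taken from ReiserFS or any real daemon.

```python
import os

def write_config_dir(path, config):
    """Store a nested dict as a directory tree: one tiny file per value."""
    os.makedirs(path, exist_ok=True)
    for key, value in config.items():
        target = os.path.join(path, key)
        if isinstance(value, dict):
            # A nested section becomes a subdirectory.
            write_config_dir(target, value)
        else:
            with open(target, "w", encoding="utf-8") as f:
                f.write(str(value) + "\n")

def read_config_dir(path):
    """Read the tree back: file name is the key, file content is the value."""
    config = {}
    for entry in sorted(os.scandir(path), key=lambda e: e.name):
        if entry.is_dir():
            config[entry.name] = read_config_dir(entry.path)
        else:
            with open(entry.path, encoding="utf-8") as f:
                config[entry.name] = f.read().strip()
    return config
```

No parser, no escaping rules, and every setting is individually addressable with ordinary tools like `cat` and `ls`. The cost is one inode and at least one block per setting on most filesystems, which is the waste Reiser was complaining about.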

Git is an example of a program that uses this technique, to some extent at least (modulo packfiles).

In reality, although that may have contributed, there are other reasons why people bundle data up into individual files. To disaggregate things (which is a good place to start if you want a filesystem-db merge) you also have to address all those other reasons, which ReiserFS never did and, as a project that "only" wanted to reinvent the FS, could not have.

Apple hit some of those issues when they tried making iLife documents be NeXT bundles:

1. Filesystem explorers treat files and directories differently for UI purposes. Apple solved it nicely by teaching the Finder to show bundle directories as if they were files unless you right click and select "Show contents". Or rather partly solved ... until you send data to friends using Windows, or Google Drive, or anything other than the Finder.

2. Network protocols like HTTP and MIME only understand files, not directories. In particular there is no standardised serialisation format for a directory beyond zip. Not solved. iLife migrated from bundles to a custom file format partly due to this problem, I think.

3. Operating systems provide much richer APIs for files than directories. You can monitor a file for changes, but if you want to monitor a directory tree, you have to iterate and do it yourself. You can lock a file against changes, but not a directory tree. You can check if a file has been modified by looking at its mtime, but there's no recursive mtime for directory trees. You can update files transactionally by writing to a temporary file and renaming, but you can't atomically replace a directory tree. Etc.
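A rough sketch of what point 3 looks like in practice, using only the Python standard library. `tree_mtime` and `atomic_write` are names invented for this example: the first is the "iterate and do it yourself" workaround for a missing recursive mtime, the second is the usual temp-file-plus-rename trick, which works for one file but has no equivalent for swapping a whole populated tree in one step.

```python
import os
import tempfile

def tree_mtime(root):
    """Newest mtime of root or anything beneath it, found by walking the tree."""
    newest = os.stat(root).st_mtime
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            # lstat so a dangling symlink doesn't raise.
            entry_mtime = os.lstat(os.path.join(dirpath, name)).st_mtime
            newest = max(newest, entry_mtime)
    return newest

def atomic_write(path, data):
    """Replace a single file's contents via temp file + rename."""
    directory = os.path.dirname(os.path.abspath(path))
    # The temp file must live on the same filesystem for the rename to be atomic.
    fd, tmp = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(data)
            f.flush()
            os.fsync(f.fileno())
        os.replace(tmp, path)  # readers see either the old file or the new one
    except BaseException:
        os.unlink(tmp)
        raise
```

Note that `tree_mtime` costs a stat per entry and can miss a change that lands mid-walk, which is exactly the kind of gap a first-class directory API would have to close.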

So the ReiserFS concept wasn't fully fleshed out, and acceptance into the kernel wouldn't have changed that. Our foundational APIs and protocols just aren't geared up for it. I've sometimes thought it'd be a neat retirement project one day to build an OS where files and directories are more closely merged as a concept, so files can have sub-files that you can browse into using 'cd' and so on, and those API/protocol gaps are closed. It wouldn't give you a full relational database but it'd be much more feasible to port apps to such an OS than to rewrite everything to use classical database APIs and semantics.



>>> 2. Network protocols like HTTP and MIME only understand files

Love when someone says something that makes my brain work!

For the most part you're spot on. HTTP has multipart messages that in theory could be extended to be a composite of anything. So we could have those bundles! Oddly, we can already send a multipart message to the server (forms)!!

I think that MIME is an interesting slice the OTHER way. You could store versions of the same document in a directory (HTML, JSON and XML, or a video or image in two formats) and serve them up based on the MIME type requested.

Now if we could make one of those a multipart message...
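For what it's worth, the directory-of-variants idea can be sketched in a few lines, assuming "MIME request" means the HTTP `Accept` header. `EXTENSION_TYPES`, `parse_accept` and `choose_variant` are all hypothetical names for this example, and the Accept parsing is deliberately simplistic (no wildcards like `text/*`, only `*/*`).

```python
import os

# Assumed extension-to-type mapping; a real server would use its own table.
EXTENSION_TYPES = {
    ".html": "text/html",
    ".json": "application/json",
    ".xml": "application/xml",
}

def parse_accept(header):
    """Return the media types in an Accept header, highest q-value first."""
    ranked = []
    for item in header.split(","):
        parts = [p.strip() for p in item.split(";")]
        if not parts[0]:
            continue
        q = 1.0
        for param in parts[1:]:
            if param.startswith("q="):
                try:
                    q = float(param[2:])
                except ValueError:
                    q = 0.0
        ranked.append((q, parts[0]))
    # sort() is stable, so equal q-values keep their header order.
    ranked.sort(key=lambda pair: pair[0], reverse=True)
    return [media_type for q, media_type in ranked if q > 0]

def choose_variant(directory, accept):
    """Pick the file in `directory` whose type best matches the Accept header."""
    available = {}
    for name in sorted(os.listdir(directory)):
        media_type = EXTENSION_TYPES.get(os.path.splitext(name)[1])
        if media_type:
            available.setdefault(media_type, os.path.join(directory, name))
    for wanted in parse_accept(accept):
        if wanted == "*/*" and available:
            return next(iter(available.values()))
        if wanted in available:
            return available[wanted]
    return None  # a real server would answer 406 Not Acceptable here
```

So the "document" is the directory and each representation is a file inside it. That part maps onto HTTP fine; it's sending the whole directory at once that has no agreed shape.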


The problem is the case where you want to upload or attach >1 document that's actually a directory. You need a way to signal that the first 3 files are a part of document A, and the next 5 are part of document B, and although you could invent a file name convention to express this, nothing understands it. Email clients would show 8 attachments, web server APIs would show 8 files, browsers would need to be patched to let you select bundles in the file picker and then recursively upload them, how progress tracking works would need to change, etc.
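Here's a small sketch of that mismatch using Python's standard `email` package. The `bundle/name` file name convention and the helper names are invented for this example, which is the point: only code that knows the convention can recover the grouping.

```python
from collections import defaultdict
from email.message import EmailMessage

def build_message(bundles):
    """Attach every file of every bundle, encoding the bundle in the file name."""
    msg = EmailMessage()
    msg["Subject"] = "two bundle documents"
    msg.set_content("See attachments.")
    for bundle, files in bundles.items():
        for name, data in files.items():
            msg.add_attachment(
                data,
                maintype="application",
                subtype="octet-stream",
                filename=f"{bundle}/{name}",  # made-up convention
            )
    return msg

def flat_view(msg):
    """What a client that knows nothing of the convention sees: a flat list."""
    return [part.get_filename() for part in msg.iter_attachments()]

def bundle_view(msg):
    """What a convention-aware client could reconstruct."""
    grouped = defaultdict(list)
    for filename in flat_view(msg):
        bundle, _, name = filename.partition("/")
        grouped[bundle].append(name)
    return dict(grouped)
```

MIME does allow nesting one multipart inside another, which would be a cleaner way to express the grouping, but the interop problem is the same: existing clients would still flatten it or show something odd.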

And then how do you _download_ them? Browsers don't understand MIME at download time.

None of it is hard to solve. But, nobody ever did, and the value of doing things this new way is usually going to be lower than the value of smooth interop with everyone's different browser/OS/email/server combos.



