My big pet peeve is AWS adding buttons in the UI to make "folders".
It is also a fiction! There are no folders in S3.
> When you create a folder in Amazon S3, S3 creates a 0-byte object with a key that's set to the folder name that you provided. For example, if you create a folder named photos in your bucket, the Amazon S3 console creates a 0-byte object with the key photos/. The console creates this object to support the idea of folders.
https://docs.aws.amazon.com/AmazonS3/latest/userguide/using-...
Hmm, well, there are no folders, but if you interact with the object, the URL does become nested. So in a sense it behaves exactly like a folder for all intents and purposes when you deal with it that way. It depends which API you use, I guess.
I use S3 just as a web bucket of files (I know it's not the best way to do that, but it's what I could easily obtain through our company's processes). In this case it makes a lot of sense, though I try to avoid making folders. Other people using the same hosting do use them, though.
Except stuff like the s3 CLI has all these weird names for normal filesystem items, and you have to bang your head against it to figure out what it all means.
(also don't get me started on the whole s3api thing)
In a filesystem, there’s an inode for /some. It contains an entry for /some/dir, which is also an inode, and then in the very deepest level, there is an inode for /some/dir/file.jpg. You can rename /some to /something_else if you want. Think of it kind of like a table:
+-------+--------+----------+-------+
| inode | parent | name     | data  |
+-------+--------+----------+-------+
| 1     | (null) | some     | (dir) |
| 2     | 1      | dir      | (dir) |
| 3     | 2      | file.jpg | jpeg  |
+-------+--------+----------+-------+
In S3 (and other object stores), the table is like this:
+-------------------+------+
| key               | data |
+-------------------+------+
| some/dir/file.jpg | jpeg |
+-------------------+------+
The kind of queries you can do is completely different. There are no inodes in S3. There is just a mapping from keys to objects. There’s an index on these keys, so you can do queries—but the / character is NOT SPECIAL and does not actually have any significance to the S3 storage system and API. The / character only has significance in the UI.
You can, if you want, use a completely different character to separate “components” in S3, rather than using /, because / is not special. If you want something like “some:dir:file.jpg” or “some.dir.file.jpg” you can do that. Again, because / is not special.
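To make that concrete, here's a minimal boto3 sketch (the bucket name and keys are invented, and it assumes credentials are already configured) showing that the listing API will group keys on whatever delimiter you hand it; "/" is only the conventional choice:

    import boto3  # assumes credentials and the (hypothetical) bucket already exist

    s3 = boto3.client("s3")

    # Store an object whose key uses ":" instead of "/" as the separator.
    s3.put_object(Bucket="example-bucket", Key="some:dir:file.jpg", Body=b"...")

    # Group keys on ":" exactly the way the console groups them on "/".
    resp = s3.list_objects_v2(Bucket="example-bucket", Prefix="some:", Delimiter=":")
    print(resp.get("CommonPrefixes"))  # [{'Prefix': 'some:dir:'}]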
Except, S3 does let you query by prefix and so the keys have more structure than the second diagram implies: they’re not just random keys, the API implies that common prefixes indicate related objects.
That’s kind of stretching the idea of “more structure” to the breaking point, I think. The key is just a string. There is no entry for directories.
> the API implies that common prefixes indicate related objects.
That’s something users do. The API doesn’t imply anything is related.
And prefixes can be anything, not just directories. If you have /some/dir/file.jpg, then you can query using /some/dir/ as a prefix (like a directory!) or you can query using /so as a prefix, or /some/dir/fil as a prefix. It’s just a string. It only looks like a directory when you, the user, decide to interpret the / in the file key as a directory separator. You could just as easily use any other character.
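A quick boto3 sketch of that point (bucket and key are hypothetical): every listing below matches the same object, because a prefix is only a string match against the start of the key, and "/" gets no special treatment:

    import boto3

    s3 = boto3.client("s3")
    s3.put_object(Bucket="example-bucket", Key="/some/dir/file.jpg", Body=b"...")

    # All of these listings match the object above.
    s3.list_objects_v2(Bucket="example-bucket", Prefix="/some/dir/")     # looks like a directory
    s3.list_objects_v2(Bucket="example-bucket", Prefix="/so")            # ...but this works too
    s3.list_objects_v2(Bucket="example-bucket", Prefix="/some/dir/fil")  # and so does this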
One operation where this difference is significant is renaming a "folder". In UNIX (and even UNIX-y distributed filesystems like HDFS) a rename operation at "folder" level is O(1) as it only involves metadata changes. In S3, renaming a "folder" is O(number of files).
> In S3, renaming a "folder" is O(number of files).
More like O(max(number of files, total file size)). You can’t rename objects in S3. To simulate a rename, you have to copy an object and then delete the old one.
Unlike renames in typical file systems, that isn’t atomic (there will be a time period in which both the old and the new object exist), and it becomes slower the larger the file.
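A rough sketch of what a "folder rename" actually costs (bucket and prefixes invented): list every key under the old prefix, copy each object to its new key, then delete the original. None of it is atomic, and each copy scales with the size of the object:

    import boto3

    s3 = boto3.client("s3")
    bucket, old_prefix, new_prefix = "example-bucket", "old-dir/", "new-dir/"

    paginator = s3.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=bucket, Prefix=old_prefix):
        for obj in page.get("Contents", []):
            old_key = obj["Key"]
            new_key = new_prefix + old_key[len(old_prefix):]
            # Copy, then delete: there is a window where both keys exist, and a
            # failure leaves the "rename" half done. (Objects over 5 GB would
            # also need a multipart copy instead of a single CopyObject call.)
            s3.copy_object(Bucket=bucket, Key=new_key,
                           CopySource={"Bucket": bucket, "Key": old_key})
            s3.delete_object(Bucket=bucket, Key=old_key)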
From reading the above, if you have a folder 'dir' and a file 'dir/file', after renaming 'dir' to 'folder', you would just have 'folder' and 'dir/file'.
If you have something which is dir/file, then NORMALLY “dir” does not exist at all. Only dir/file exists. There is nothing to rename.
If you happen to have something which is named “dir”, then it’s just another file (a.k.a. object). In that scenario, you have two files (objects) named “dir” and “dir/file”. Weird, but nothing stopping you from doing that. You can also have another object named “dir///../file” or something, although that can be inconvenient, for various reasons.
> That’s something users do. The API doesn’t imply anything is related.
Querying ids by prefix doesn’t make any sense for a normal ID type. Just making this operation available and part of your public API indicates that prefixes are semantically relevant to your API’s ID type.
I can look up names with the prefix “B” and get Bart, Bella, Brooke, Blake, etc. That doesn’t imply that there’s some kind of semantics associated with prefixes. It’s just a feature of your system that you may find useful. The fact that these names have a common prefix, “B”, is not a particularly interesting thing to me. Just like if I had a list of files, 1.jpg, 10.jpg, 100.jpg, it’s probably not significant that they’re being returned sequentially (because I probably want 2.jpg after 1.jpg).
"filesystem" is not a name reserved for Unix-style file systems. There are many types of file system which is not built on according to your description. When I was a kid, I used systems which didn't support directories, but it was still file systems.
It's an incorrect take that a system to manage files must follow a set of patterns like the ones you mentioned to be called "file system".
You're free to argue whatever you want, but claiming that a file system should have folders as the parent commenter did, or support specific operations, seems a bit meaningless.
I could create a system not supporting folders because it relies on tags or something else. Or I could create a system which is write-only and doesn't support rename or delete.
These systems would be file systems according to how the term has been used for 40 (?) years at least. Just don't see any point in restricting the term to exclude random variants.
> Fair enough, basing folders on object names split by / is pretty inefficient. I wonder why they didn't go with a solution like git's trees.
What, exactly, is inefficient about it?
Think for a moment about the data structures you would use to represent a directory structure in a filesystem, and the data structures you would use to represent a key/value store.
With a filesystem, if you split a string /some/dir/file.jpg into three parts, “some”, “dir”, “file.jpg”, then you are actually making a decision about the tree structure. And here’s a question—is that a balanced tree you got there? Maybe it’s completely unbalanced! That’s actually inefficient.
Let’s suppose, instead, you treat the key as a plain string and stick it in a tree. You have a lot of freedom now, in how you balance the tree, since you are not forced to stick nodes in the tree at every / character.
It’s just a different efficiency tradeoff. Certain operations are now much less efficient (like “rename a directory”, which, on S3, is actually “copy a zillion objects”). Some operations are more efficient, like “store a file” or “retrieve a file”.
I think it is fair to say that S3 (as named files) is not a filesystem, and that it is inefficient to use it directly as one for common filesystem use cases, in the same way you could say that of a tarball[0].
This does not make S3 bad storage, just a bad filesystem; not everything needs to be a filesystem.
Arguably it is good that S3 is not a filesystem, as the filesystem model can be a leaky abstraction: e.g. in git you cannot have two tags named "v2" and "v2/feature-1", because you cannot have both a file and a folder with the same name.
For something more closely related to URLs than to filenames, forcing a filesystem abstraction is a limitation, as "/some/url", "/some/url/", and "/some/url/some-default-name-decided-by-the-webserver" can all be different.[1]
[0] where a different tradeoff is that searching a file by name is slower but reading many small files can be faster.
[1] maybe they should be the same, but enforcing it is a bad idea
I think what you’re describing is simply not a hierarchical file system. It’s a different thing that supports different operations and, indeed, is better or worse at different operations.
> […] what the special 0-byte object refers to. It represents an empty folder.
Alas, no. It represents a tag, e.g. «folder/», that points to a zero byte object.
You can then upload two files, e.g. «folder/file1.txt» and «folder/file2.txt», delete «folder/» (it being just a tag), and still have «folder/file1.txt» and «folder/file2.txt» intact in the S3 bucket.
Deleting «folder/» in a traditional file system, on the other hand, will also delete «file1.txt» and «file2.txt» in it.
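A small boto3 sketch of that behaviour (bucket and keys invented): the «folder/» marker is just another object, and deleting it leaves the other keys untouched:

    import boto3

    s3 = boto3.client("s3")
    bucket = "example-bucket"

    s3.put_object(Bucket=bucket, Key="folder/", Body=b"")             # the 0-byte "folder" marker
    s3.put_object(Bucket=bucket, Key="folder/file1.txt", Body=b"hi")
    s3.put_object(Bucket=bucket, Key="folder/file2.txt", Body=b"hi")

    s3.delete_object(Bucket=bucket, Key="folder/")                    # deletes only the marker

    resp = s3.list_objects_v2(Bucket=bucket, Prefix="folder/")
    print([o["Key"] for o in resp["Contents"]])  # ['folder/file1.txt', 'folder/file2.txt']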
But if the S3 semantics are not helping you, e.g. with multiple clients doing copy/move/delete operations in the hierarchy, you could still end up with files that are not in "directories".
So essentially an S3 file manager must be able to handle the situation where there are files without a "directory", and that, I assume, is also the most common case for S3. You might as well not have the "directories" in the first place.
I have personally never seen the 0-byte files people keep talking about here. In every S3 bucket I’ve ever looked at, the “directories” don’t exist at all. If you have a dir/file1.txt and dir/file2.txt, there is NO such object as dir. Not even a placeholder.
Deleting folder/ in a traditional file system will _fail_ if the folder is not empty. Userspace needs to recurse over the directory structure to unlink everything in it before unlinking the actual folder.
"folders" do not exist in S3 -- why do you keep insisting that they do?
They appear to exist because the key is split on the slash character for navigation in the web front-end. This gives the familiar appearance of a filesystem, but the implementation is at a much higher level.
Let’s start with the fact that you’re talking to an HTTP API… Even if S3 had web3.0 inodes, the querying semantics would not make sense. It’s a higher-level API, because you don’t deal with blocks of magnetic storage and binary buffers. Of course S3 is not a filesystem; that is part of its definition, and its reason to be…
I think if you focus too narrowly on the details of the wire protocol, you’ll lose sight of the big picture and the semantics.
S3 is not a filesystem because the semantics are different from the kind of semantics we expect from filesystems. You can’t take the high-level API provided by a filesystem, use S3 as the backing storage, and expect to get good performance out of it unless you use a ton of translation.
Stuff like NFS or CIFS are filesystems. They behave like filesystems, in practice. You can rename files. You can modify files. You can create directories.
Right, NFS/CIFS support writing blocks, but S3 basically does HTTP GET and PUT verbs. I would say that these concepts are the defining difference. To call S3 a filesystem is not wrong in the abstract, but it’s no different from calling Wordpress a filesystem, or DNS, or anything that stores something for you. Of course it will be inefficient to implement a block write on top of any of these; that’s because you have to literally do it yourself. As in: download the file, edit it, upload it again.
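For instance, here's roughly what "editing part of a file" looks like against S3 (bucket and key are hypothetical): there is no partial write, so the whole object comes down and the whole object goes back up:

    import boto3

    s3 = boto3.client("s3")
    bucket, key = "example-bucket", "some/dir/file.txt"

    # "Edit a block" the only way S3 allows: fetch everything, change it locally,
    # then put everything back.
    body = bytearray(s3.get_object(Bucket=bucket, Key=key)["Body"].read())
    body[0:5] = b"HELLO"
    s3.put_object(Bucket=bucket, Key=key, Body=bytes(body))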
I think the blocks are one part of it, and the other part is that S3 doesn’t support renaming or moving objects, and doesn’t have directories (just prefixes). Whenever I’ve seen something with filesystem-like semantics on top of S3, it’s done by using S3 as a storage layer, and building some other kind of view of the storage on top using a separate index.
For example, maybe you have a database mapping file paths to S3 objects. This gives you a separate metadata layer, with S3 as the storage layer for large blocks of data.
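A toy illustration of that split (all names are made up): the index owns the paths and the hierarchy, S3 only ever sees opaque keys, and a "rename" touches the index alone:

    # Toy metadata layer: logical paths -> opaque S3 keys. The S3 objects never move.
    index = {
        "/photos/2024/cat.jpg": "blobs/9f2c41d7",
        "/photos/2024/dog.jpg": "blobs/51ab03ee",
    }

    def rename(index: dict, old_path: str, new_path: str) -> None:
        """O(1) rename: only the mapping changes, not the stored objects."""
        index[new_path] = index.pop(old_path)

    rename(index, "/photos/2024/cat.jpg", "/photos/archive/cat.jpg")
    print(index)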
Another challenge is directory flattening. On a file system "a/b" and "a//b" are usually considered the same path. But on S3 the slash isn't a directory separator, so the paths are distinct. You need to be extra careful when building paths not to include double slashes.
Many tools end up handling this by showing a folder named "a" containing a folder named "" (empty string). This confuses users quite a bit. It's more than the inodes, it's how the tooling handles the abstraction.
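A short sketch of that failure mode (bucket and keys invented): "a/b" and "a//b" are simply two different keys, and listing with a "/" delimiter surfaces the extra prefix "a//", which many tools then render as a folder with an empty name:

    import boto3

    s3 = boto3.client("s3")
    bucket = "example-bucket"

    s3.put_object(Bucket=bucket, Key="a/b", Body=b"")
    s3.put_object(Bucket=bucket, Key="a//b", Body=b"")   # a distinct key, not the same path

    resp = s3.list_objects_v2(Bucket=bucket, Prefix="a/", Delimiter="/")
    print([o["Key"] for o in resp.get("Contents", [])])           # ['a/b']
    print([p["Prefix"] for p in resp.get("CommonPrefixes", [])])  # ['a//'] -> a "folder" named ""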
Coincidentally I ran into an issue just like this a week ago. A customer facing application failed because there was an object named “/foo/bar” (emphasis on the leading slash).
This created a prefix named “/” which confused the hell out of the application.
Not only can you not rename a single file, you also cannot rename a "folder" (because that would imply a bulk rename of a large number of children of that "folder").
This is the fundamental difference between a first class folder and just a convention on prefixes of full path names.
If you don't allow renames, it doesn't really make sense to have each "folder" store the list of the children.
You can instead have a giant ordered map (some kind of b-tree) that allows you for efficient lookup and scanning neighbouring nodes.
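A minimal sketch of that idea in plain Python, with a sorted list standing in for the B-tree: a "folder listing" is just a range scan that starts at the prefix and stops as soon as keys no longer match:

    import bisect

    # Stand-in for the big ordered key index: plain string keys kept in sorted order.
    keys = sorted([
        "some/dir/file.jpg",
        "some/dir/other.jpg",
        "some/else.txt",
        "zzz/unrelated.bin",
    ])

    def scan_prefix(keys, prefix):
        """Yield every key starting with `prefix`: one binary search, then a linear walk."""
        i = bisect.bisect_left(keys, prefix)
        while i < len(keys) and keys[i].startswith(prefix):
            yield keys[i]
            i += 1

    print(list(scan_prefix(keys, "some/dir/")))  # ['some/dir/file.jpg', 'some/dir/other.jpg']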
The UMich LDAP server, upon which many were based, stored entries’ hierarchical (distinguished) names with each entry, which I always found a bit weird. AD, eDirectory, and the OpenLDAP HDB backend don’t have this problem.
You can create a simulated directory, and write a bunch of files in it, but you can't atomically rename it--behind the scenes each file needs to be copied from old name to new.
I’m fine with it, I actually appreciate the logic and simplicity behind it, but the amount of times I’ve tried to explain why “folders” on S3 keep disappearing while people stare at me like I’m an idiot is really frustrating.
(When you remove the last file in a “folder” on S3, the “folder” disappears, because that pattern no longer appears in the bucket k/v dictionary so there’s no reason to show it as it never existed in the first place).
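A sketch of exactly that (bucket and key invented, no "photos/" marker object ever created, and assuming nothing else lives under that prefix): the "folder" is only ever inferred from the keys that currently exist:

    import boto3

    s3 = boto3.client("s3")
    bucket = "example-bucket"

    s3.put_object(Bucket=bucket, Key="photos/cat.jpg", Body=b"...")
    resp = s3.list_objects_v2(Bucket=bucket, Delimiter="/")
    print([p["Prefix"] for p in resp.get("CommonPrefixes", [])])  # ['photos/']

    s3.delete_object(Bucket=bucket, Key="photos/cat.jpg")
    resp = s3.list_objects_v2(Bucket=bucket, Delimiter="/")
    print(resp.get("CommonPrefixes"))  # None: the "folder" vanished with its last object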
The web console even collapses them like folders on slashes, further obfuscating how it actually works. I remember having to explain to coworkers why it was so slow to load a large bucket.
Yeah, the UI and CLI show you “folders”. It’s a client-side thing that doesn’t exist in the actual service. Behind the scenes, the clients are making specific types of queries on the object keys.
You can’t examine when a folder was created (it doesn’t exist in the first place), you can’t rename a folder (it doesn’t exist), you can’t delete a folder (again, it doesn’t exist).
Yes, which is why it's not ideal to reuse the folder metaphor here. Users have an idea how directories work on well-known filesystems and get confused when these fake folders don't behave the same way.
It sounds to me like you’re arguing about what the definition of “folders” is.
“Any hierarchical path structure is a folder” is maybe your definition of “folder”, from what I can tell. I would say that S3 lets you treat paths as hierarchical, but that S3 does not have folders—obviously I have a different definition of “folder” than you do.
We’ve discovered that we have different definitions of “folder”, and therefore, we are not going to agree about whether it is true that “S3 does not have folders” unless we have an argument about what the correct definition of “folder” is. I’m not really interested in that discussion—it’s enough to understand what somebody means when they say “S3 does not have folders” even if you think their definitions are wrong.
Directories actually exist on the filesystem, which is why you have to create them before use and they can exist and be empty. They don't exist in S3 and neither of those properties do, either. Similarly, common filesystem operations on directories (like efficiently renaming them, and thus the files under them) are not possible in S3.
Of course it can still be useful to group objects in the S3 UI, but it would probably be better to use some kind of prefix-centric UI rather than reusing the folder metaphor when it doesn't match the paradigm people are used to.
Speaking of user interfaces with optical illusions about directory separators:
On the Mac, the Finder lets you have files with slashes in their names, even though it's a Unix file system underneath. Don't believe me? Go try to use the Finder to make a directory whose name is "Reports from 2024/03/10". See?
But as everyone knows, slash is the ONLY character you're not allowed to have in a file or directory name under Unix. It's enforced in the kernel at the system call interface. There is absolutely no way to make a file with a slash in it. Yet there it is!
The original MacOS operating system used the ":" character to delimit directory names, instead of "/", so you could have files and directories with slashes in their names, just not with colons in their names.
When Apple transitioned from MacOS to Unix, they did not want to freak out their users by renaming all their files.
So now try to use the Finder (or any app that uses the standard file dialog) to make a folder or file with a ":" in its name on a modern Mac. You still can't!
So now go into the shell and list out the parent directory containing the directory you made with a slash in its name. It's actually called "Reports from 2024:03:10"!
The Mac Finder and system file dialog user interfaces actually swap "/" and ":" when they show paths on the screen!
Try making a file in the shell with colons in it, then look at it in the Finder to see the slashes.
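If you want to try it yourself (the filename here is just an example), this creates a file whose POSIX name contains colons; on a Mac, the Finder displays those colons as slashes:

    from pathlib import Path

    # Colons are legal at the POSIX layer on macOS; the Finder swaps them for "/" on screen.
    Path("Reports from 2024:03:10.txt").write_text("quarterly numbers go here\n")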
However, back in the days of the old MacOS that permitted slashes in file names, there was a handy network gateway box called the "Gatorbox" that was a Localtalk-to-Ethernet AFP/NFS bridge, which took a subtly different approach.
It took advantage of the fact (or rather it triggered the bug) that the Unix NFS implementation boldly made an end-run around the kernel's safe system call interface that disallowed slashes in file names. So any NFS client could actually trick Unix into putting slashes into file names via the NFS protocol!
It appeared to work just fine, but then down the line the Unix "restore" command would totally shit itself! Of course "dump" worked just fine, never raising an error that it was writing corrupted dumps that you would not be able to read back in your time of need, so you'd only learn that you'd been screwed by the bug and lost all your files months or years later!
So not only does NFS stand for "No File Security", it also stands for "Nasty Forbidden Slashes"!
>The NFS protocol wasn't just stateless, but also securityless!
>Stewart, remember the open secret that almost everybody at Sun knew about, in which you could tftp a host's /etc/exports (because tftp was set up by default in a way that left it wide open to anyone from anywhere reading files in /etc) to learn the name of all the servers a host allowed to mount its file system, and then in a root shell simply go "hostname foo ; mount remote:/dir /mnt ; hostname `hostname`" to temporarily change the CLIENT's hostname to the name of a host that the SERVER allowed to mount the directory, then mount it (claiming to be an allowed client), then switch it back?
>That's right, the server didn't bother checking the client's IP address against the host name it claimed to be in the NFS mountd request. That's right: the protocol itself let the client tell the server what its host name was, and the server implementation didn't check that against the client's ip address. Nice professional protocol design and implementation, huh?
>Yes, that actually worked, because the NFS protocol laughably trusted the CLIENT to identify its host name for security purposes. That level of "trust" was built into the original NFS protocol and implementation from day one, by the geniuses at Sun who originally designed it. The network is the computer is insecure, indeed.
UFS allows any character in a filename except for the slash (/) and the ASCII NUL character. (Some versions of Unix allow ASCII characters with the high-bit, bit 8, set. Others don't.)
This feature is great — especially in versions of Unix based on Berkeley's Fast File System, which allows filenames longer than 14 characters. It means that you are free to construct informative, easy-to-understand filenames like these:
1992 Sales Report
Personnel File: Verne, Jules
rt005mfkbgkw0.cp
Unfortunately, the rest of Unix isn't as tolerant. Of the filenames shown above, only rt005mfkbgkw0.cp will work with the majority of Unix utilities (which generally can't tolerate spaces in filenames).
However, don't fret: Unix will let you construct filenames that have control characters or graphics symbols in them. (Some versions will even let you build files that have no name at all.) This can be a great security feature — especially if you have control keys on your keyboard that other people don't have on theirs. That's right: you can literally create files with names that other people can't access. It sort of makes up for the lack of serious security access controls in the rest of Unix.
Recall that Unix does place one hard-and-fast restriction on filenames: they may never, ever contain the magic slash character (/), since the Unix kernel uses the slash to denote subdirectories. To enforce this requirement, the Unix kernel simply will never let you create a filename that has a slash in it. (However, you can have a filename with the 0200 bit set, which does list on some versions of Unix as a slash character.)
Never? Well, hardly ever.
Date: Mon, 8 Jan 90 18:41:57 PST
From: sun!wrs!yuba!steve@decwrl.dec.com (Steve Sekiguchi)
Subject: Info-Mac Digest V8 #3 5
I've got a rather difficult problem here. We've got a Gator Box running the NFS/AFP conversion. We use this to hook up Macs and Suns, with the Sun as an AppleShare file server. All of this works great!
Now here is the problem: Macs are allowed to create files on the Sun/Unix fileserver with a "/" in the filename. This is great until you try to restore one of these files from your "dump" tapes; "restore" core dumps when it runs into a file with a "/" in the filename. As far as I can tell the "dump" tape is fine.
Does anyone have a suggestion for getting the files off the backup tape?
Thanks in Advance,
Steven Sekiguchi Wind River Systems
sun!wrs!steve, steve@wrs.com Emeryville CA, 94608
Apparently Sun's circa 1990 NFS server (which runs inside the kernel) assumed that an NFS client would never, ever send a filename that had a slash inside it and thus didn't bother to check for the illegal character. We're surprised that the files got written to the dump tape at all. (Then again, perhaps they didn't. There's really no way to tell for sure, is there now?)