I meant TRANSPARENT filesystem-level dedupe. They are doing it at the application level. Filesystem-level dedupe makes it impossible to store the same file's contents more than once, and it doesn't consume hardlinks for the references. It is really awesome.
ZFS is great! However, it's too complicated for most Linux server use cases (especially with just one block device attached); it's not the default root filesystem; and it's not supported on at least one major enterprise Linux distro family.
Filesystem-level dedupe is expensive because it requires another hash calculation that cannot be shared with application-level hashing, is a relatively rare OS/filesystem feature, doesn't play nice with backups (deduplicated files get re-duplicated when copied off the filesystem), and doesn't scale across boxes.
A simpler solution is application-level dedupe that doesn't require filesystem-specific features. Simple scales and wins, and it plays nice with backups.
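A minimal sketch of that approach, assuming a POSIX filesystem where `os.link` is available (the function name and flat store layout here are hypothetical, for illustration):

```python
import hashlib
import os

def dedupe_file(src: str, store_dir: str) -> str:
    """Hash src; if its content is already in store_dir, replace src with a
    hardlink to the stored copy, otherwise link src into the store."""
    h = hashlib.sha256()
    with open(src, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    dest = os.path.join(store_dir, h.hexdigest())
    if os.path.exists(dest):
        os.unlink(src)      # duplicate content: drop the extra copy...
        os.link(dest, src)  # ...and point src at the stored inode
    else:
        os.link(src, dest)  # first occurrence: register it in the store
    return dest
```

After running this over two identical files, both names share one inode, so the content exists on disk exactly once; no filesystem feature beyond ordinary hardlinks is needed.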
Hash = sha256 of the file, and abs filename = {{aa}}/{{bb}}/{{cc}}/{{d}}, where aa, bb, and cc are the first three two-character slices of the hex digest and d is the rest.
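Assuming {{aa}}, {{bb}}, and {{cc}} are the first three two-character slices of the hex digest and {{d}} is the remainder (the usual sharding trick to keep any single directory from accumulating millions of entries), the path could be built like:

```python
import hashlib

def content_path(data: bytes) -> str:
    # Shard the sha256 hex digest into aa/bb/cc/d segments.
    h = hashlib.sha256(data).hexdigest()
    return "/".join((h[:2], h[2:4], h[4:6], h[6:]))
```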
That costs even more unreusable time and effort. It's simpler to dedupe at the application level than to shift the burden onto N other things. I guess you don't understand or appreciate simplicity.
> [W]e shipped an optimization. Detect duplicate files by their content hash, use hardlinks instead of downloading each copy.