Hacker News

From the article:

> [W]e shipped an optimization. Detect duplicate files by their content hash, use hardlinks instead of downloading each copy.
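The optimization described in the quote might look something like this, a minimal sketch (function names are mine, not from the article):

```python
import hashlib
import os

def file_hash(path: str) -> str:
    """Return the SHA-256 hex digest of a file's contents."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 16), b""):
            h.update(chunk)
    return h.hexdigest()

def dedupe_with_hardlinks(paths):
    """Replace duplicate files with hardlinks to the first copy seen."""
    seen = {}  # content hash -> canonical path
    for path in paths:
        digest = file_hash(path)
        if digest in seen:
            os.remove(path)
            os.link(seen[digest], path)  # hardlink to the canonical copy
        else:
            seen[digest] = path
```

After this runs, duplicates share one inode, so the content is stored once per filesystem rather than once per path.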




I meant TRANSPARENT filesystem-level dedupe; they are doing it at the application level. Filesystem-level dedupe makes it impossible to store the same file more than once, and it doesn't consume hardlinks for the references. It is really awesome.

Filesystem/file level dedupe is for suckers. =D

If the greatest filesystem in the world were a living being, it would be our God. That filesystem, of course, is ZFS.

Handles this correctly:

https://www.truenas.com/docs/references/zfsdeduplication/


I was talking about block level dedupe.

I thought you might be.

I just wanted to mention ZFS.

Have I mentioned how great ZFS is yet?


ZFS is great! However, it's too complicated for most Linux server use cases (especially with just one block device attached); it's not the default (root filesystem); and it's not supported for at least one major enterprise Linux distro family.


Filesystem dedupe is expensive: it requires another hash calculation that cannot be shared with application-level hashing, it's a relatively rare OS/filesystem feature, it doesn't play nice with backups (deduplicated files are re-duplicated when copied off the filesystem), and it doesn't scale across boxes.

A simpler solution is application-level dedupe that doesn't require fs-specific features. Simple scales and wins. And plays nice with backups.

Hash = SHA-256 of the file, and abs filename = {aa}/{bb}/{cc}/{d}, where:

aa = the 2 most significant hex digits of the hash

bb = the next 2 hex digits

cc = the 2 hex digits after that

d = the remaining hex digits
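The layout above can be sketched in a few lines (a hypothetical helper; the function name is mine):

```python
import hashlib

def content_path(data: bytes) -> str:
    """Map file contents to a sharded storage path: aa/bb/cc/<rest>."""
    digest = hashlib.sha256(data).hexdigest()  # 64 hex digits
    aa, bb, cc, d = digest[:2], digest[2:4], digest[4:6], digest[6:]
    return f"{aa}/{bb}/{cc}/{d}"
```

Identical contents always map to the same path, so a write can simply check whether the path already exists; the three two-digit levels keep any one directory from holding millions of entries.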


All good backup software should be able to do deduplicated incremental backups at the block level. I'm used to Veeam and Commvault.

That costs even more unreusable time and effort. It's simpler to dedupe at the application level than to shift the burden onto N other things. I guess you don't understand or appreciate simplicity.

This article shows it really isn't that simple and is easy to mess up. Who cares if your storage and backup software both dedupe?

For ZFS, at least, `zfs send` is the backup solution. And it performs incremental backups with the `-i` argument.

zfs send is really awesome when combined with dedupe and incremental sends.
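For reference, an incremental snapshot-based backup with `zfs send` might look like this (pool, dataset, and snapshot names are placeholders):

```shell
# Take snapshots at two points in time
zfs snapshot tank/data@monday
zfs snapshot tank/data@tuesday

# Full send of the first snapshot to a backup pool
zfs send tank/data@monday | zfs receive backup/data

# Incremental send: only the blocks that changed between the two snapshots
zfs send -i tank/data@monday tank/data@tuesday | zfs receive backup/data
```

The stream can also be piped over ssh to a remote host instead of a local pool, which is the usual off-box backup setup.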


