Hacker News

File system dedupe is expensive because it requires another hash calculation that cannot be shared with application-level hashing, is a relatively rare OS/filesystem feature, doesn't play nicely with backups (the backup re-expands the deduplicated files), and doesn't scale across boxes.

A simpler solution is application-level dedupe that doesn't require fs-specific features. Simple scales and wins. And plays nice with backups.

Hash = sha256 of file, and abs filename = {{aa}}/{{bb}}/{{cc}}/{{d}} where

aa = the 2 most significant hex digits of the hash

bb = the next 2 hex digits

cc = the 2 hex digits after that

d = the remaining hex digits
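The scheme above can be sketched in a few lines; this is a minimal illustration, and the function name and base directory are made up for the example:

```python
import hashlib
from pathlib import Path

def content_path(data: bytes, base: Path = Path("store")) -> Path:
    """Map file contents to a content-addressed path: aa/bb/cc/d."""
    h = hashlib.sha256(data).hexdigest()  # 64 hex digits
    return base / h[0:2] / h[2:4] / h[4:6] / h[6:]

# Identical contents always hash to the same path, so duplicate
# files collapse to a single stored object.
print(content_path(b"hello world"))
# → store/b9/4d/27/b9934d3e08a52e52d7da7dabfac484efe37a5380ee9088f7ace2efcde9
```

The two-hex-digit directory levels keep any single directory from accumulating more than 256 entries per level, which matters on filesystems that degrade with huge directories.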




All good backup software should be able to do deduped incremental backups at the block level. I'm used to Veeam and Commvault.

That costs even more unrecoverable time and effort. It's simpler to dedupe once at the application level than to shift the burden onto N other tools. I guess you don't understand or appreciate simplicity.

This article shows it really isn't that simple and is easy to mess up. Who cares if your storage and backup software both dedupe?

For ZFS, at least, `zfs send` is the backup solution. And it performs incremental backups with the `-i` argument.
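A minimal sketch of that workflow (the pool, dataset, snapshot, and host names are invented for illustration):

```shell
# Take snapshots, then send only the delta between them.
zfs snapshot tank/data@monday
# ... changes accumulate ...
zfs snapshot tank/data@tuesday

# Full backup of the first snapshot:
zfs send tank/data@monday | ssh backuphost zfs receive backup/data

# Incremental: only the blocks that changed between the two snapshots.
zfs send -i tank/data@monday tank/data@tuesday | ssh backuphost zfs receive backup/data
```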

`zfs send` is really awesome when combined with dedupe and incremental snapshots.



