
This isn't quite true: data is streamed from the client through a pipeline made up of all of the replicas as it's written. It's true that you'll lose data if the client crashes in the middle of a block, _unless_ you call sync(), which ensures the data written so far has been fully replicated to all of the nodes.
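To make the pipelining behavior concrete, here's a toy sketch (plain Python, not the real HDFS API; the class and method names are made up for illustration). The client streams small packets through a chain of replicas as it writes, rather than buffering a whole block, and sync() checks that every replica has acknowledged everything sent so far:

```python
# Toy model of a replication pipeline: client -> r1 -> r2 -> r3.
# Packets flow downstream as they are written; sync() verifies that
# all replicas hold all packets sent so far. Illustrative only --
# the real HDFS write path is asynchronous and ack-based.

class Replica:
    def __init__(self, downstream=None):
        self.stored = []            # packets held by this replica
        self.downstream = downstream

    def receive(self, packet):
        self.stored.append(packet)
        if self.downstream:         # forward along the pipeline
            self.downstream.receive(packet)

class ClientStream:
    def __init__(self, pipeline_head, replicas):
        self.head = pipeline_head
        self.replicas = replicas
        self.sent = 0

    def write(self, packet):
        # Each packet streams out immediately; the client does not
        # wait until a full block has been generated.
        self.head.receive(packet)
        self.sent += 1

    def sync(self):
        # True once every replica has every packet sent so far.
        return all(len(r.stored) == self.sent for r in self.replicas)

# Build a three-replica pipeline and stream three packets through it.
r3 = Replica()
r2 = Replica(r3)
r1 = Replica(r2)
stream = ClientStream(r1, [r1, r2, r3])
for chunk in (b"a", b"b", b"c"):
    stream.write(chunk)
```

In this toy model a crash mid-block simply means some packets were never written; sync() is the point at which the client learns that everything sent so far is on all replicas.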


Hadoop only writes a block from the client to a DataNode once a whole block is available. This minimizes the number of open connections on the DataNodes (it can take a long time for the client to generate 64 MB of data, while distributing the block across the replicas happens relatively quickly).

For more information about this, see: http://hadoop.apache.org/common/docs/current/hdfs_design.htm... and http://hadoop.apache.org/common/docs/current/hdfs_design.htm...



