
The downside is that you have to recreate full data constraints within your application - the database offers none of it.

Of course constraints are tedious when you're in "experimentation" mode (to quote another post I see here) and are doing rapid, early development. But once you're in production with data that's critical/important (i.e., not someone's list of their favorite songs; more like their bank statements and medical histories), constraints are the bee's knees.

Once you have data constraints in place, now migrations are hard - whether or not you're on a SQL database. You need to either update all old documents to match new schemas, or open up your constraints to "expect both" (where by "both" I really mean, "any number of 18 different formats...oh make that 19") - and that is the potentially slippery slope here into a coding crapfest.

Disclaimer: I'm the author of a very popular SQL tool for Python (SQLAlchemy) as well as a new database migrations tool (Alembic).



While I can see the advantages of document databases in some use cases, I can't help but feel that a lot of the adoption of MongoDB and its ilk is more related to the rough edges of interfaces/ORMs to relational databases than to fundamental flaws in the relational databases themselves. This, coupled with the fact that the rising generation of developers is less familiar with SQL than the last, is leading to some curious choices for datastores. It should be an interesting next couple of years.

Also great work on Alembic, I started using it a few weeks ago and am very pleased.


It has to be noted that a lot of the >referential< constraints are just necessary because an RDBMS wants you to split your "object" over several tables. What I found pleasant with the non-relational databases is that these constraints CAN just fall away because you e.g. nest "comments" inside of an array in a "blogpost" object. When deleting the blogpost all comments will be deleted too and cascading deletes are just not necessary.
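To make the nesting point concrete, here is a minimal sketch using plain Python dicts as a stand-in for a document store and for split tables. The names (`posts`, `posts_table`, `comments_table`) are made up for illustration and no real database API is involved.

```python
# Document layout: comments are nested inside the blog post itself.
posts = {
    "post-1": {
        "title": "Hello",
        "comments": [
            {"author": "ann", "text": "First!"},
            {"author": "bob", "text": "Nice post."},
        ],
    }
}

# Deleting the post takes its comments with it; no cascade rule is needed.
del posts["post-1"]

# Split layout: comments live in their own collection and point at a post.
posts_table = {"post-1": {"title": "Hello"}}
comments_table = [
    {"post_id": "post-1", "author": "ann", "text": "First!"},
    {"post_id": "post-1", "author": "bob", "text": "Nice post."},
]

# Deleting only the post leaves comments that reference nothing.
del posts_table["post-1"]
orphans = [c for c in comments_table if c["post_id"] not in posts_table]
print(len(orphans), "orphaned comments")
```

In the split layout something has to clean up or forbid those orphans, which is the job a referential constraint with a cascading delete does in an RDBMS.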

Some of the other constraints you'll have to implement in your software. The advantage: you don't put application logic outside of your application. The disadvantage: every bit of code touching that value has to know the limitations. I wonder if this could be solved by using a message queue and just having a dedicated step for updating/deleting data.


> I wonder if this could be solved by using a message queue and just having a dedicated step for updating/deleting data

See, but now you're building some big thing. Let's just include that in MongoDB or whatever, a "constraints engine", so that you don't have to build it from scratch each time and can have some mature, well-tested thing instead of something ad hoc and probably buggy. Now you need to carefully build migrations again!


One of the messier parts of having to implement referential constraints in the application is that you have to handle the concurrency issues yourself. An RDBMS handles the locking of the referred rows for you.


Referential integrity is to databases what pointers are to code. You definitely need it.

What's worse than migrations? Being unable to turn an application off, since it uses the old schema.

With ChronicDB we support indefinite backward compatibility. Unlike per-record versioning tricks, application code does not need to be aware of migration code.


one of the tricks you can use is migrate on read. instead of having a global per-db versioning you have schema versioning on a per-record basis.

each version migration is a function that takes a record in an old state and migrates it to a new one.

once you read a record from the db that has an outdated version number you run all the pending migrations and then save.

with this scheme your application code only has to know how to deal with the latest schema version. all the rest is handled by migrations.
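a minimal sketch of the migrate-on-read scheme described above, assuming records are plain dicts carrying a `version` field and the store is an in-memory dict. the schema changes and the names (`v1_to_v2`, `v2_to_v3`, `MIGRATIONS`, `load`) are hypothetical, picked only to illustrate the idea.

```python
# Hypothetical per-record migrations: each function upgrades a record by one version.
def v1_to_v2(rec):
    # Illustrative change: split the single "name" field into first/last name.
    first, _, last = rec.pop("name").partition(" ")
    rec.update(first_name=first, last_name=last)
    return rec


def v2_to_v3(rec):
    # Illustrative change: every record gets a "tags" list.
    rec.setdefault("tags", [])
    return rec


# Maps "current version" to the function that produces the next version.
MIGRATIONS = {1: v1_to_v2, 2: v2_to_v3}
LATEST = 3


def load(store, key):
    """Read a record, run any pending migrations, and save it back if it changed."""
    rec = dict(store[key])  # copy so a failed migration leaves the store untouched
    version = rec.get("version", 1)
    outdated = version < LATEST
    while version < LATEST:
        rec = MIGRATIONS[version](rec)
        version += 1
    rec["version"] = version
    if outdated:
        store[key] = rec  # persist the upgraded record
    return rec


store = {"u1": {"version": 1, "name": "Ada Lovelace"}}
user = load(store, "u1")
print(user)
```

code calling `load` only ever sees records at `LATEST`. records nobody reads stay in their old shape until they are touched.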


The upside is that you can easily create elegant conventions for this. The payoff is less friction between you and your features.


Using conventions for data constraints is quite dangerous. Whereas bugs in software can be corrected, bugs in corrupt data often cannot, as the data is created in a temporally-dependent way (i.e. you captured it wrong) as well as that it may be massive (i.e. your corruption is widespread across TBs of data). Basically, you might only get one chance with data.

A convention-based approach, or even a well thought out data-enforcement approach, will have bugs and failures, and you just have to hope these failures aren't severe enough that you lose your "one" chance.

Relational constraints OTOH when used in their usual way make it virtually impossible to have situations like this.
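As a rough illustration, here is a sketch using Python's standard `sqlite3` module with an in-memory database. The `account` and `statement` tables are invented for the example; the point is only that the database itself refuses the bad rows, regardless of what the application code does.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite leaves foreign key enforcement off unless it is enabled per connection.
conn.execute("PRAGMA foreign_keys = ON")

# Hypothetical schema: balances may not go negative, statements must belong to an account.
conn.execute(
    "CREATE TABLE account ("
    " id INTEGER PRIMARY KEY,"
    " balance INTEGER NOT NULL CHECK (balance >= 0))"
)
conn.execute(
    "CREATE TABLE statement ("
    " id INTEGER PRIMARY KEY,"
    " account_id INTEGER NOT NULL REFERENCES account(id))"
)
conn.execute("INSERT INTO account (id, balance) VALUES (1, 100)")

rejected = []
bad_statements = (
    "INSERT INTO account (id, balance) VALUES (2, -5)",        # violates CHECK
    "INSERT INTO statement (id, account_id) VALUES (1, 999)",  # no such account
)
for sql in bad_statements:
    try:
        conn.execute(sql)
    except sqlite3.IntegrityError as exc:
        rejected.append(str(exc))

print(len(rejected), "statements rejected by the database")
```

Neither bad row is stored, so a bug in the application layer cannot quietly write them.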


I couldn't disagree more, but we obviously have different experiences.



