Does someone here use DuckDB in production? Is it as stable as SQLite?
The hosting company I have to work with has a very old version of SQLite installed on the server and they don't want to update it. So I was looking at whether I could replace it with DuckDB since it seems to be easy to install with pip.
pysqlite3-binary is a Linux only package in PyPI that includes a recent version of SQLite
apsw is a cross-platform package, it brings in a modern version as well, and additionally exposes everything SQLite can do (useful for me, you can write VFS drivers in python for it). The version of apsw in pypi is hopelessly out of date, the homepage has more details on how to install the latest version.
I very, very much like the idea of DuckDB, but have had many headaches with what I would consider "low-medium" amounts of log data (400 million rows, 180GB+, after stripping out any un-needed text) which works without issue using SQLite, btw, other than the expected slowness of row-oriented storage on the column-aggregate queries that I needed.
I have tried many different ways of importing the data with no luck. Usually it turned into a memory issue. This was irrespective of how I tried to import, whether it was in batches through a script, or using the built-in import functionality, or breaking the data into chunks prior to import, or anything else I tried. I occasionally keep trying to find a way, because I like the idea and features of DuckDB so much.
Of note, MonetDB slurped it right in without issue, but I would much rather use something not so "heavy" like DuckDB.
Also, regarding SQLite, I have DBs with billions of rows that work flawlessly, though not as fast as postgres.
You could try Clickhouse. It's a bit heavier than DuckDB and the default mode is server-client. But you can also use the client (a single binary) without a server to directly query data from csv or parquet files.
edit: added a better link, the stand-alone mode is called clickhouse-local
DuckDB is designed for query processing, not updating data - I guess you would not want to use it for the transaction processing workloads just because it's easier to install. If you are doing mostly complex queries, then DuckDB is great, but I think there is going to be little usage directly in production, I'd expect it's more used for data processing
And how great is SQLite for transactional workloads? Every transaction locks up the entire db. There is no page or even table level locking. DuckDB can't be worse than that.
In practice, I've found that most SQLite write operations take low-milliseconds to complete - so even under a very heavy write load (assuming WAL mode) you're unlikely to see any contention between writes.
For my own stuff I maintain an in-memory Python queue of write operations and apply them in order against a single write connection - while running numerous other read-only connections to serve SELECTs.
Buddy, google DuckDB, it's like OLAP SQLite. You'll thank me later.