> So the currently delayed feature development is now gonna be further delayed, yet almost every week we see new features and changes, just the other day the single issues view was changed, as just one example.
What's the question here, you don't believe growth is currently exponential, or do you think it shouldn't be hard to scale, when 10x YoY is not enough?
As a business user, our costs have gone up while service has gone down dramatically. Meanwhile our marginal cost to GitHub has hardly changed. Where our costs to them have increased, they mostly charge us per CPU minute, so they're obviously not making any kind of loss on our account.
I’m sure they’re experiencing scaling issues across the platform, but it’s unacceptable for that to have a negative impact on us when we're sending them $250/dev/yr for (what is in all honesty) hosting a bunch of static text files.
I understand that, and maybe GitHub became a bad deal because of that.
But if anything, their post and your reply are precisely an endorsement of usage based billing.
The bit that's growing 13x YoY (and which they expect will easily blow past that) is unmetered - commits. The bit that is metered (for some, not all folks) - action minutes, grew only 2x YoY.
GitHub was not built to limit the number of commits, checkouts, forks, issues, PRs, etc - nor do we want them to - but that's what's growing ridiculously as people unleash hordes of busy beaver agents on GitHub, because they're either free or unlimited.
Where there are limits - or usage based billing - people add guardrails and find optimizations.
Because for all the talk, agents don't bring a 10x value increase; otherwise, they'd justify a 10x cost increase.
Besides, other forges are having issues too. Even running your own. We have Anubis everywhere protecting them for a reason.
That sounds bad. Paying users don't want huge and ever-growing numbers of freeloaders reducing the return for each dollar they spend...
That would only lead to further and further degradation of service until the paying customers were absolutely desperate to find a deal that didn't require them to lug around such a heavy ball and chain.
It all made sense at the beginning when Github was free for OSS and OSS was thriving, but now these billions of commits are mostly incredibly low value. I'd bet the average commit now doesn't create 1/10th of the value the average commit did in, say, 2018
Our 20-dev company is unfortunately exactly the wrong size to justify self-hosting. We're not large enough that it can be someone's dedicated role, and we're not small enough that we can be experimental around our vendors for something so critical to our output.
We're actively looking into alternatives outside of GitHub though.
I'm curious how Azure DevOps reliability has been for comparison. My current job is managing stories in DevOps with SCC in GitHub Enterprise. While I like GitHub slightly more, I've been curious about the decision.
We use Azure DevOps at work for a few things. It's been pretty rock solid, since none of the agents recommend it and it's a different architecture.
It's also legacy at this point, since Microsoft is pouring all its resources into GitHub, but most people/companies could probably use Azure DevOps just fine.
Concur on the rock-solid comment. We use Azure DevOps with git repos and lots of pipelines using self-hosted or Microsoft-hosted agents.
There was an issue with Microsoft hosted agents a few months ago, but that didn't last long, and is the only issue in my memory.
These numbers should have been in the blog post, not the graphs that are present.
> What's the question here, you don't believe growth is currently exponential, or do you think it shouldn't be hard to scale
I think you're putting words in my mouth here; I didn't say either of those things. I'm saying that this blog post is a meaningless platitude when the GitHub stability issues predate this, and that all this post says is "we hear you're having issues".
I just think their charts, taken at face value, show substantially the same thing (for PRs, commits, new repos).
Either those charts are a bald-faced lie (the tweet could be as well), or the growth really is what they show; there's no way for that chart to mean anything else.
The only way to fake exponential growth like that would be to use an inverse log scale (which would be a bald-faced lie).
It doesn't even really matter what the y-axis baseline is, unless we really think growth was huge in 2020, then cratered to zero by 2023, and is now back to the previous normal.
As for the rest of the post, I do think it's panic mode platitudes. But I honestly don't know what I'd write instead that's better.
You can already see people complaining loudly in the cases where, instead of "we'll do better", they decided to limit usage.
> I just think their charts, taken at face value, show substantially the same thing (for PRs, commits, new repos).
The problem is that these charts show the massive exponential growth in 2026. But this didn't start in 2026, this has been going on since early last year. My team had more build failures in 2025 due to Actions outages or "degraded performance" than _any other reason_, and that includes PRs that failed linting or tests that developers were working on.
> As for the rest of the post, I do think it's panic mode platitudes. But I honestly don't know what I'd write instead that's better.
IMO, this needed to be written 6 months ago (around the time the memo about them prioritising the migration to Azure was released), and then this post should have been "We're still struggling, this isn't good enough. Here's the amount of growth, here's what we've done to try and fix it, and here's what we're planning over the next 3-6 months", instead of "Our priorities are clear: availability first, then capacity, then new features" and "We are committed to improving availability, increasing resilience, scaling for the future of software development, and communicating more transparently along the way." This isn't transparency (yet).
I've used it to translate SQLite (with a few extensions) and, as far as I know, it's been used (to varying degrees of success) to translate the MARISA trie library (C++), libghostty (Zig), zlib, Perl, and QuickJS.
More on-topic, I use a mix of an unevaluated expression stack and a stack-to-locals approach to translate Wasm.
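For anyone curious what that looks like in practice, here's a hand-written illustration of the stack-to-locals idea (not wasm2go's actual output; names and exact shape are made up): each Wasm operand-stack slot becomes a Go local, so the stack disappears at translation time.

```go
package main

import "fmt"

// add3 shows the stack-to-locals idea on a tiny Wasm function:
//
//	(func $add3 (param i32 i32 i32) (result i32)
//	  local.get 0
//	  local.get 1
//	  i32.add
//	  local.get 2
//	  i32.add)
//
// Each operand-stack slot becomes a Go local (s0, s1), so the Go compiler
// just sees straight-line code over int32 variables.
func add3(l0, l1, l2 int32) int32 {
	var s0, s1 int32
	s0 = l0      // local.get 0
	s1 = l1      // local.get 1
	s0 = s0 + s1 // i32.add
	s1 = l2      // local.get 2
	s0 = s0 + s1 // i32.add
	return s0
}

func main() {
	fmt.Println(add3(1, 2, 3)) // prints 6
}
```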
Interesting. I started working on this same idea a couple of years ago as a way to bypass CGo. Eventually I moved on to something else. Glad someone else is working on this. How does the generated Go performance compare to the original WASM performance?
That's going to depend on what you mean for "original Wasm performance".
What were you using to run Wasm instead of this?
I can compare with wazero, which I was previously using, and say performance stayed mostly in the same ballpark. Things that crossed the Go-to-Wasm boundary very often became much faster, things that stayed mostly in Wasm became slightly slower, as the wazero compiler is pretty good.
wasm2go also does not support SIMD, so if your Wasm module uses/benefits from SIMD, you'll notice.
Go generates large Wasm modules, because it bundles its goroutine scheduler, garbage collector and standard library into the module.
Translating that back to Go will give you a pretty big Go file.
Go is "known" for being fast to compile, but that huge Go file will take (at least?) as long to compile as compiling the Go toolchain does.
wasm2go is best used on moderately sized modules (like SQLite). Last I heard, the person who tried to translate Perl got an 80MB Go file that was taking them 20 minutes to compile.
Yes. Group #6 is optimal because everyone in group #4 appears selfish and uncaring for others. Group #5 has the moral high ground, but actually voting with group #5 is risky. Therefore the best option is to virtue signal you are in group #5 and get the moral benefits, but vote with group #4 and guarantee survival. Group #6 gives you the benefits of both.
This is similar to Newcomb's two-box paradox, where the optimal strategy is deception. The winning play is to preemptively convince everyone you're only going to take the second box, but then actually take both.
Unless you have a single "reader", don't mind the delay, and don't worry about redoing a bunch of notifications after a crash (and so can delay claims significantly), concurrency will kill this.
I wrote a simple queue implementation after reading the Turbopuffer blog post on queues on S3. In my implementation, I wrote complete SQLite files to S3 on every enqueue/dequeue/ack, using the previous ETag for compare-and-set.
The experiment and back-of-the-envelope calculations show that it can only support ~5 jobs/sec. The only major lever for increasing throughput is increasing the size of group commits.
I don't think shipping CDC instead of whole SQLite files would change the calculations, since the number of writes is what mattered in this experiment.
So yes, with a minimum of 3 writes per job, it can only support very low throughput.
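For context, a minimal sketch of the compare-and-set part, assuming an S3-compatible endpoint that supports conditional writes (If-Match on PUT); the URL and ETag are hypothetical and request signing is left out, so against real S3 you'd send the same header through the AWS SDK or a signed request:

```go
package main

import (
	"bytes"
	"fmt"
	"net/http"
)

// putIfMatch overwrites the queue object only if its ETag still matches the
// one read earlier. A 412 Precondition Failed means another writer committed
// first, so the caller should re-read the object and retry.
func putIfMatch(url, etag string, body []byte) (newETag string, conflict bool, err error) {
	req, err := http.NewRequest(http.MethodPut, url, bytes.NewReader(body))
	if err != nil {
		return "", false, err
	}
	req.Header.Set("If-Match", etag) // compare-and-set against the previous ETag
	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		return "", false, err
	}
	defer resp.Body.Close()
	if resp.StatusCode == http.StatusPreconditionFailed {
		return "", true, nil // lost the race
	}
	if resp.StatusCode >= 300 {
		return "", false, fmt.Errorf("unexpected status %s", resp.Status)
	}
	return resp.Header.Get("ETag"), false, nil
}

func main() {
	// Hypothetical local S3-compatible endpoint and previously observed ETag.
	newETag, conflict, err := putIfMatch(
		"http://localhost:9000/queue/state.db",
		`"d41d8cd98f00b204e9800998ecf8427e"`,
		[]byte("new sqlite snapshot bytes"))
	fmt.Println(newETag, conflict, err)
}
```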
Yeah the C API seems like a perfect fit for this use-case:
> [SQLITE_FCNTL_DATA_VERSION] is the only mechanism to detect changes that happen either internally or externally and that are associated with a particular attached database.
Another user itt says the stat(2) approach takes less than 1 μs per call on their hardware.
I wonder how these approaches compare across compatibility & performance metrics.
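For comparison's sake, a rough sketch of the stat(2)-style check being discussed (the path and interval are made up, and it inherits the WAL-truncation caveat raised elsewhere in the thread):

```go
package main

import (
	"fmt"
	"os"
	"time"
)

// pollWAL remembers the -wal file's size and mtime and treats any change as
// "something may have been committed". The first poll fires once to establish
// a baseline. Caveats from this thread apply: the WAL can be truncated, and it
// also grows for transactions that later roll back.
func pollWAL(path string, every time.Duration, changed func()) error {
	var lastSize int64
	var lastMod time.Time
	for {
		fi, err := os.Stat(path)
		if err != nil {
			return err
		}
		if fi.Size() != lastSize || !fi.ModTime().Equal(lastMod) {
			lastSize, lastMod = fi.Size(), fi.ModTime()
			changed()
		}
		time.Sleep(every)
	}
}

func main() {
	// "app.db-wal" is a hypothetical database's WAL file.
	if err := pollWAL("app.db-wal", 50*time.Millisecond, func() {
		fmt.Println("WAL changed; re-check whatever you care about")
	}); err != nil {
		fmt.Println("stat failed:", err)
	}
}
```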
I just tested this out. PRAGMA data_version uses a shared counter that any connection can use while the C API appears to use a per-connection counter that does not see other connections' commits.
Reporting back. This appears to be a bug in my original test, the code of which sadly I did not commit anywhere. I went back to regenerate these tests and proved the opposite: the C API is better than PRAGMA and works across connections. I am going to make that update, as I've now proved across dozens of versions of SQLite that this is not in fact the case.
Reporting back again. It seems I was actually right the first time - the C API's SQLITE_FCNTL_DATA_VERSION doesn't work cross connection. It is cached on each read - but if there aren't any reads (i.e. just polling SQLITE_FCNTL_DATA_VERSION) then it doesn't work.
PRAGMA data_version is pretty fast (1500ns with prepared statement) and doesn't have that issue.
Checking the wal-index is sub-nanosecond when mmapped but has slightly different behavior on Windows.
Here's the link to the thread on this, my scripts are all there.
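In case it helps anyone else, a minimal Go sketch of the PRAGMA data_version polling approach (the driver choice is just an assumption, and this uses a plain query rather than the prepared statement, which would shave off the parse overhead); the important detail is pinning one connection, since the counter only moves when a different connection commits:

```go
package main

import (
	"context"
	"database/sql"
	"fmt"
	"log"
	"time"

	_ "github.com/mattn/go-sqlite3" // driver choice is an assumption; any SQLite driver works
)

func main() {
	db, err := sql.Open("sqlite3", "app.db") // hypothetical database file
	if err != nil {
		log.Fatal(err)
	}
	defer db.Close()

	ctx := context.Background()
	// PRAGMA data_version only changes when *another* connection commits,
	// so pin a single connection out of database/sql's pool and poll on it.
	conn, err := db.Conn(ctx)
	if err != nil {
		log.Fatal(err)
	}
	defer conn.Close()

	var last int64
	if err := conn.QueryRowContext(ctx, "PRAGMA data_version").Scan(&last); err != nil {
		log.Fatal(err)
	}
	for {
		time.Sleep(50 * time.Millisecond)
		var v int64
		if err := conn.QueryRowContext(ctx, "PRAGMA data_version").Scan(&v); err != nil {
			log.Fatal(err)
		}
		if v != last {
			last = v
			fmt.Println("another connection committed; notify listeners")
		}
	}
}
```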
Yep, definitely still in use. Do yall above have an opinion if the pragma is better than the syscall? What are the trade offs there? Another comment thread mentioned this as well and pointed to io uring. I was thinking that dism spam is worse than syscall spam.
I may be wrong, but I think you wrote somewhere that you're looking at the WAL size increasing to know if something was committed. Well, the WAL can be truncated, what then? Or even, however unlikely, it could be truncated, then a transaction comes and appends just enough to it to make it the same size.
If SQLite has an API it guarantees can notify you of changes, that seems better, in the sense that you're passing responsibility along to the experts. It should also work with rollback mode, another advantage. And I don't think it wakes you up if a large transaction rolls back (a transaction can hit the WAL and never commit).
That said, I'm not sure what's lighter on average. For a WAL mode database, I will say that something that has knowledge of the WAL index could potentially be cheaper? That file is mmapped. The syscalls involved are file locks, if any.
Interesting, thank you for the response and explanation. Honker workers/listeners are holding an open connection anyway. I do trust SQLite guarantees more than cross-platform sys behavior. I will explore the C API angle.