Maybe there’s a git trick I don’t know, but I’ve found making small branches off each other painful. I run into trouble when I update an earlier branch and all the dependent branches get out of sync with it. When those earlier branches get rebased into master, it becomes a pain to update my in-progress branches as well.
If I understood you correctly, you want to propagate changes in a branch to other branches that depend on it? Then --update-refs is for you[1]. That way, you only need to update the "latest" branch.
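For anyone who hasn't seen it: with Git 2.38 or newer you run something like `git rebase --update-refs main` from the top branch of the stack. Below is a toy Python model of what that buys you, not how git actually implements it. The names `Commit`, `new_commit`, `rebase_stack` and `demo` are made up for illustration, and it assumes `old_base` is an ancestor of the tip. The only point is that a rebase creates new commits, and without the flag only the branch you rebased follows them.

```python
from dataclasses import dataclass
from itertools import count
from typing import Dict, Optional


@dataclass(frozen=True)
class Commit:
    id: str
    parent: Optional[str]
    message: str


_next_id = count(1)  # hypothetical id generator; real git uses content hashes


def new_commit(store: Dict[str, Commit], parent: Optional[str], message: str) -> str:
    """Create a commit record and return its id."""
    commit = Commit(f"c{next(_next_id)}", parent, message)
    store[commit.id] = commit
    return commit.id


def rebase_stack(store, refs, tip_branch, old_base, new_base, update_refs):
    """Replay old_base..tip_branch on top of new_base (toy model)."""
    # Collect the commits to replay, oldest first.
    chain = []
    cur = refs[tip_branch]
    while cur != old_base:
        chain.append(cur)
        cur = store[cur].parent
    chain.reverse()

    # Replaying creates brand-new commits; the old ones are left untouched.
    rewritten = {}
    parent = new_base
    for old_id in chain:
        parent = new_commit(store, parent, store[old_id].message)
        rewritten[old_id] = parent

    refs[tip_branch] = parent
    if update_refs:
        # Move every other branch that pointed into the old chain.
        for name, target in list(refs.items()):
            if target in rewritten:
                refs[name] = rewritten[target]
    return rewritten


def demo(update_refs: bool):
    store, refs = {}, {}
    base = new_commit(store, None, "base")
    refs["main"] = base
    refs["part-1"] = new_commit(store, base, "part 1")
    refs["part-2"] = new_commit(store, refs["part-1"], "part 2")
    refs["main"] = new_commit(store, base, "upstream change")  # main moves on
    rebase_stack(store, refs, "part-2", base, refs["main"], update_refs)
    return store, refs
```

With `demo(True)` both `part-1` and `part-2` end up on top of the new `main`. With `demo(False)` only `part-2` moves and `part-1` is left pointing at the stale commit, which is exactly the out-of-sync pain described above.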
Stacking branches for any extended period of time is definitely a poor mixing of the concepts of branches and commits. If you have a set of changes you need to keep in order, but you also need to maintain multiple silos where you can cleanly allow the code to diverge, that divergence constitutes the failure of your efforts to keep the changes in order.
Until you can make it effortless, maintaining a substantial commit structure and constantly rebasing to add changes to the proper commit quickly turns into more effort than just waiting until the end and manually editing a monster diff into multiple sensible commits. But we take the challenge and tell ourselves we can do better if we're proactive.
I take from GP that they try to make their branches small, and keep the cycle of development->review->merging small, so that the problem stacked PRs seeks to solve doesn't materialize in the first place.
Stacked PRs, in my experience, have primarily been a request to merge in a particular order. If you're the only merger, as in GP's case, there's no need to request this of yourself.
Whenever I send a big diff, I spend some time annotating it with comments first to help the reviewer. I put a good summary of the changes in the description, then I annotate the diff of the PR, explaining approaches, the design of specific changes, tricky parts of the code, boilerplate, and so on. Trying to guess the context is where the review bottleneck is, so I present it alongside the code.
As someone who used phabricator and mercurial, using GitHub and git again feels like going back to the stone ages. Hopefully this and jujutsu can recreate stacked-diff flow of phabricator.
It’s not just nice for monorepos. It makes both reviewing and working on long-running feature projects so much nicer. It encourages smaller PRs or diffs so that reviews are quick and easy to do in between builds (whereas long pull requests take a big chunk of time).
I'm so glad git won the dvcs war. There was a solid decade where mercurial kept promoting itself as "faster than git*†‡" and every time I tried it wound up being dog slow (always) or broken (some of the time). Git is fugly but it's fast, reliable, and fugly, and I can work with that.
> I'm so glad git won the dvcs war. There was a solid decade where mercurial kept promoting itself as "faster than git".
It wasn't the Mercurial team saying it was faster than Git; that was Facebook after contributing a bunch of patches after testing Mercurial on their very large mono-repo in 2014 [1]:
For our repository, enabling Watchman integration has made Mercurial’s status command more than 5x faster than Git’s status command. Other commands that look for changed files–like diff, update, and commit—also became faster.
In fact they liked Mercurial so much they essentially cloned it to create their own dvcs, Sapling [2]. (An aside: Facebook did all of this because it was taking too long getting new engineers up to speed with Git. Shocker.)
Today, most of the core of Mercurial has been rewritten in Rust; when Facebook did their testing, Mercurial was nearly 100% Python. That's where the "Mercurial is slow" thing came from; launching a large Python 2.x app took a while back in the day.
I was messing with an old Mercurial repo recently… it was like a breath of fresh air. If I can push to GitHub using Mercurial… sign me up.
You can push to GitHub using Sapling. I wish Sapling open source was given more love, as the experience for non-Facebookers is subpar. No bash completion out of the box, no distro packages, no good help pages, random issues interacting with a Git repo...
Sounds like what my teachers used to say: “a personal problem”. Literally nobody outside FB knows what they’re missing and until they fix that, literally nobody cares.
No, the "hg is fast" marketing claim that retreated to "hg is Big-O fast and you are dumb for caring about constant terms and factors even if they clearly dominate your use case" predates 2014 and the Facebook patches. These talking points were old in 2010. Mercurial was always dog slow and always gaslighting about it.
I'm glad BigCo made tools to serve their needs, but their needs aren't my needs or most people's needs.
> Mercurial has been rewritten in Rust
I'm glad they saw the light eventually! Ditto for the rest of the Rust Tooling Renaissance.
What is kind of funny here is that you're right locally. At the same time, the larger tech companies (Meta and Google, specifically) ended up building off of hg and not git because (at the time, especially) git cannot scale up to their use cases. So while the git CLI was super fast, and the hg CLI was slow, "performance" means more than just CLI speed.
I was never a fan of hg either, but now I can use jj, and get some of those benefits without actually using it directly.
>At the same time, the larger tech companies (Meta and Google, specifically) ended up building off of hg and not git because (at the time, especially) git cannot scale up to their use cases.
Fun story: I don't really know what Microsoft's server-side infra looked like when they migrated the OS repo to git (which, contrary to the name, contains more than just stuff related to the Windows OS), but after a few years they started to hit some object scaling limitations where the easiest solution was to just freeze the "os" repo and roll everyone over to "os2".
They wrote something that allowed them to virtualize Git -- can't remember the name of that. But it basically hydrated files on-demand when accessed in the filesystem.
The problem was I think something to do with like the number of git objects that it was scaling to causing crazy server load or something. I don't remember the technical details, but definitely something involving the scale of git objects.
Probably a lot of Googlers don't know. It's ancient history, was called google3 even in 2006 when I first joined.
google1 = code written by Larry, Sergey and employee number 1 (Craig). A hacky pile of Python scripts, dumped fairly quickly.
google2 = the first properly engineered C++ codebase. Protobufs etc were in google2. But the build system was some jungle of custom Makefiles, or something like that. I never saw it directly.
google3 = the same code as google2 but with a new custom build system that used Python scripts to generate Makefiles. I suppose it required a new repository so they could port everything over in parallel with code being worked on in google2. P4 was apparently not that great at branches and google3 didn't use them. Later the same syntax for the build files was kept but turned into a new language called Starlark, and the Makefile generator went away in favor of Blaze, which directly interpreted them.
Yes, the server is based on Perforce, called Piper, but the CLI is based on mercurial. So locally you're doing hg and then when you create a CL, it translates it into what p4 needs.
Right, and I'm glad there are projects serving The Cathedral, but I live in The Bazaar so I'm glad The Bazaar won.
The efforts to sell priest robes to fruit vendors were a little silly, but I'm glad they didn't catch on because if they had caught on they no longer would have been silly.
I might be the outlier, but am I the only one who doesn't care much about the speed of git?
I've been using git since 2011 as my main vcs for personal and professional work as a freelance contractor. Whenever I "wait" for git, it is limited either by bandwidth (git clone) or by the number of commit hooks that I implemented for linting, verification etc. The percentage of time actually spent in git's internal execution must be a tiny fraction of my day-to-day usage. What IS affecting me (and the teams I work in) is usability and UX. I.e. if people screw stuff up (no matter if in git or mercurial), we spend far more time fixing that - I don't think the implementation speed would matter here.
The only case I can imagine is when doing a full checkout of a big repo, but even there, there is --depth which is quite practical.
Isn't it kind of like how you don't care much about the oxygen content of the air around you, but you'd miss it if it was gone? I've done development with Mercurial, simple processes were irritatingly slow, particularly if you stray from the better-supported opinionated path.
I spent a long time educating teams of developers about git's usability quirks. I don't do that as much anymore - partly because the quirks have been worked out, partly because the developers have better guardrails and resources to learn from.
This whole time (the past 15 years) git has been getting faster without most of us noticing, because big companies have been investing in speeding it up. The reason you don't notice or care is that they work on a very different scale. Thousands of users, thousands of PRs per day, millions of CI/CD jobs all hitting the repo.
Now the cycle is repeating again because these numbers are shooting through the roof because of agentic coding.
Define "large"; I've never run into serious performance issues during the ~15 years I've used Git, which either means the projects I've worked in aren't actually large large, or Git is fast enough for most use cases.
not OP, and indeed git is fast-enough in many cases, but git not cutting it at Google and Facebook scale, combined with the versatility of mercurial (monkeypatching and extensions system) was the reason why they both invested heavily in mercurial instead of git.
Among the tricks being used was remotefilelog, which is a way to "hydrate" content locally on demand, which was mimicked in git many years later with Microsoft's VFS for Git (originally GVFS). The same goes for binary/large files, which git eventually got as Git LFS.
It's funny to think that a big reason for git to be "fast" today is by playing catch-up with mercurial, which carries this "forever stigma" of being slow.
Mercurial's model is so different from Git's that the things you list do not make sense there.
Rebase does not make sense in Mercurial because it has the concept of fixed branches. A commit is permanently linked to the branch on which it was made. So you are supposed to use merges.
I know. It's an opinion about how to develop that a lot of people hold - a declining proportion, mind you, like Mercurial's declining market share - and it's one that they're able to represent in Git's model, with Git's features. They're even able to do it without exposing me to it. But the same isn't true in reverse. Strictly superior?
Believe me, I tried to have an open mind about it. Then one day I was getting ready to go on a work trip with a half-finished feature on my work laptop, and realised there was simply no in-model way for backing that wip up to the repo. If I lost my laptop, I lost the progress. mercurial-scm fails at SCM.
>in-model way for backing that wip up to the repo.
That is because you have this notion of a "clean history" (which, IIUC, prevented you from making this permanent wip commit), which in reality does not have a lot of use. For most projects, "useful history" or "real history" is better than a "clean" history.
> one that they're able to represent in Git's model, with Git's features. They're even able to do it without exposing me to it. But the same isn't true in reverse. Strictly superior?
not sure what you mean to say, but for thoroughness' sake, no: git and mercurial concepts are not interchangeable, with git having mostly an inferior model.
To give examples: git has no concept of branching (in the way every VCS but Git uses the term). A branch in git is merely a tag on the tip of a series meant to signify that all ancestors belong to the same lineage. This comes with the implication that this lineage information is totally lost when two branches merge (you can't tell which side of the merge corresponded to which lineage). The ugly and generalised workaround is to abuse commit message (e.g. "merge feat-ABC into main") to store an essential piece of the repository history that the VCS cannot take.
Another example is phasing: mercurial records at commit level whether it was exchanged with others or not. That draws a clean line between the history that's always safe to rewrite and the history that is subject to conflicting merges if the person you shared those commits with also happened to rewrite them on their end.
> Then one day I was getting ready to go on a work trip with a half-finished feature on my work laptop, and realised there was simply no in-model way for backing that wip up to the repo. If I lost my laptop, I lost the progress. mercurial-scm fails at SCM.
Sorry to be blunt, but that's a skill issue: hg is no different than every other VCS in that regard. If you want your WIP changes to leave your laptop, you've got to push them somewhere, just like you would in git.
I'd like to correct some inaccuracies in your response:
- rebasing in Mercurial simply means chopping a subtree off of the history and re-attaching it to a different parent commit. In that sense, rebasing is a very useful and common history-rewriting operation. In fact, it's even simpler and more powerful/versatile than in git, because mercurial couldn't care less if the sub-tree you are rebasing belongs to a branch or not: it's just a DAG. It gets transplanted from A to B. A may or may not be your checked-out commit, or be the tip of a branch, doesn't matter.
- that mercurial requires a configuration toggle before rebasing can be used (i.e. that the user needs to enable the extension explicitly) is a way to encourage interested users to learn their tool, and grow its capabilities together with their knowledge. It's opinionated, it may be too much hand-holding for some, but there is an elegant simplicity in keeping the help pages and autocomplete commands just as complex as the user can take it.
Sure, but since commits have a branch attribute attached to them, "rebasing" does not appear to be "first class". It is something that has to be bolted on with an extension.
> because mercurial couldn't care less if the sub-tree you are rebasing belongs to a branch or not
IIUC Git also does not care much about the rebase target being a "branch".
I agree that Mercurial provides more value out of the box than git because it preserves branch info in commits.
I can live with Git because Git is "enough" if used carefully and after coming to terms with the non-intuitive UI.
It doesn't seem to support Mercurial though (not to imply that you were implying that it did). All I can find is this proxy/mirror thing that integrates it by presenting the Mercurial repo as a Git server:
https://peterlavalle.github.io/post/forgejo-actions/
Whatever your opinion on one tool or another might be - it does seem weird that the "market" has been captured by what you are saying is a lesser product.
So far you've only gotten responses to "how can a worse product win?", and they are valid, but honestly the problem here is that Mercurial is not a better product in at least one very important way: branches.
You can visit any resource about git and branches will have a prominent role. Git is very good at branches. Mercurial fans will counter by explaining one of the several different branching options it has available and how it is better than the one git has. They may very well be right. It also doesn't matter, because the fact that there's a discussion about what branching method to use really just means Mercurial doesn't solve branches. For close to 20 years the Mercurial website contained a guide that explained only how to have "branches" by having multiple copies of the repository on your system. It looks like the website has now been updated: it doesn't have any explanation about branches at all that I can find. Instead it links to several different external resources that don't focus on branches either. One of them mentions "topic", introduced in 2015. Maybe that's the answer to Git's branching model. I don't care enough to look into it. By 2015 Git had long since won.
Mercurial is a cool toolbox of stuff. Some of them are almost certainly better than git. It's not a better product.
To me, Mercurial's branching is closer to the development process and preserves more information, because it records the original branch on which a commit was made.
Git does not have such a concept. That is a trade-off, and that trade-off works great for projects managed like the Linux kernel. But for smaller projects where there is a limited number of people working, the information preserved by mercurial could be very valuable.
It also had some really interesting ideas like changeset evolution, which enabled history rewriting after a branch has been published. Don't know its current status or how well it turned out.
Just FTR - git /can/ store that information, but it requires human input.
If you rebase the feature branch onto the main branch and THEN follow it up with a merge commit that records the branch name, you store the branches (that have been made a part of main) and can see where they are in your log.
Mercurial's notes can become cumbersome if there are a large number in the repository, but, obviously, humans can sort that out if it gets out of hand
It's interesting that branches, which is a marquee feature of git, became less important at the same time as git ate all the other vcs. Outside of OS projects, almost all development is trunk based with continuous releases.
Maybe branching was an important reason to adopt git but now we'd probably be ok with a vcs that doesn't even support them.
Not sure if it's true. I mean, I do agree with the core of it, but how do you even do PRs and resolve conflicts, if there are no branches and a developer cannot efficiently update his code against the last (remote) version of master branch?
Trunk based development has every developer in the company committing straight to main - no PRs, supposedly no merge conflicts (but the reality is that main moves fast, and if someone is working in the same files as someone else, there will be merge conflicts).
A middle ground is small PRs where people are constantly rebasing to the tip of main to keep conflicts to a minimum
Trunk based development is still a hotly debated topic. I personally prefer branches at this point in time; trunk based development has caused me more trouble than its claimed benefits were worth in the past, BUT that could be a me limitation rather than a limitation of the style.
This is so strange, because, at a low level, a branch isn't even a "thing" in git. There is no branch object type in git, it's literally just a pointer to a commit, functionally no different from a tag except for the commands that interact with it.
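To make that concrete, here's a rough Python sketch that fakes the on-disk layout and resolves it. It assumes the classic "loose refs" layout, where each branch or lightweight tag is a small file holding a commit id; real repositories may also use packed-refs or other backends, and annotated tags point at tag objects instead. The helper names `make_fake_git_dir` and `resolve_ref` and the placeholder commit id are mine, not git's.

```python
import tempfile
from pathlib import Path


def make_fake_git_dir(root: Path) -> Path:
    """Build a tiny stand-in for .git containing only ref files."""
    git_dir = root / ".git"
    (git_dir / "refs" / "heads").mkdir(parents=True)
    (git_dir / "refs" / "tags").mkdir(parents=True)
    commit_id = "a" * 40  # placeholder, not a real commit
    # A branch and a lightweight tag are the same kind of file.
    (git_dir / "refs" / "heads" / "main").write_text(commit_id + "\n")
    (git_dir / "refs" / "tags" / "v1.0").write_text(commit_id + "\n")
    # HEAD is a symbolic ref: it names the branch that is checked out.
    (git_dir / "HEAD").write_text("ref: refs/heads/main\n")
    return git_dir


def resolve_ref(git_dir: Path, ref: str) -> str:
    """Follow 'ref: ...' indirections until a commit id is reached."""
    content = (git_dir / ref).read_text().strip()
    while content.startswith("ref: "):
        content = (git_dir / content[len("ref: "):]).read_text().strip()
    return content


with tempfile.TemporaryDirectory() as tmp:
    git_dir = make_fake_git_dir(Path(tmp))
    head = resolve_ref(git_dir, "HEAD")
    branch = resolve_ref(git_dir, "refs/heads/main")
    tag = resolve_ref(git_dir, "refs/tags/v1.0")
    # All three resolve to the same commit id in this fake repo.
    print(head == branch == tag)
```

The difference between the branch and the tag is purely behavioural: committing rewrites the file that HEAD points to, while nothing moves the tag file unless you ask.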
Meanwhile mercurial has bookmarks. TBF I'm not sure when it got those but they've been around forever at this point. The purpose is served.
I think there are (or perhaps were) some product issues regarding the specifics of various workflows. But at least some of that is simply the inertia of entrenched workflows and where there are actual downsides the (IMO substantial) advantages need to be properly weighed against them.
Personally I think it just comes down to the status quo. Git is popular because it's popular, not because it's noticeably superior.
> I think there are (or perhaps were) some product issues regarding the specifics of various workflows.
I love jumping into discussions about git branching, because that's a very objective and practical area where git made the playing field worse. Fewer and fewer people feel it, because people old enough to have used branch-powered VCSes have long forgotten about them, and those who didn't forget are under-represented in comparison to the newcomers who have never experienced anything else since git became a monopoly.
Yes, every commit is prefixed with the branch name. Because, unlike mercurial, git is incapable of storing this in its commit metadata. That's ridiculous, that's obscene, but that's the easiest way to do it with git.
Worse products win all the time. Inertia is almost impossible to overcome. VHS vs Betamax is a classic. iPod wasn’t the best mp3 player but being a better mp3 player wasn’t enough to claw market share.
Google and Meta don’t use Git and GitHub. Sapling and Phabricator are much, much better (when supported by a massive internal team).
I mean, in the fickle world that is TECH, I am struggling to believe that that's what's happened.
I personally went from .latest.latest.latest.use.this (naming versions as latest) to tortoise SVN (which I struggled with) to Git (where I was also one of those "walk around with a few memorised commands" people that don't actually know how to use it) to reading the fine manual (well, 2.5 chapters of it) to being an evangelist.
I've tried Mercurial, and, frankly, it was just as black magic as Git was to me.
That's network effects.
But my counter is - I've not found Mercurial to be any better, not at all.
I have made multiple attempts to use it, but it's just not doing what I want.
And that's why I'm asking, is it any better, or not.
Network effects are significantly strengthened by necessary user buy-in. VC is hard, and every tool demands that its users spend a non-trivial amount of time learning it. I would guess the time to move from black magic to understanding most of git is ~100h for most people.
The thing is, to understand which one is actually better, you would have to give the same amount of investment in the second tool, which is not something most people are willing to do if the first tool is "good enough". That's how Python became the default programming language; people don't miss features they do not understand.
A little over a decade ago, with only svn experience, I tried both mercurial and git. There was something about how mercurial handled branches that I found extremely confusing (don't remember what), while git clicked immediately - even without reading the manual.
Mercurial has a more consistent CLI, a really good default GUI (TortoiseHg), and the ability to remember what branch a commit was made on. It's a much easier tool to teach to new developers.
Hmm, that feels a bit subjective - I'm not going to say X is easier than Y when I've just finished saying that I found both tools to have a lot of black magic happening.
But what I will point out, for better or worse, people are now looking at LLMs as Git masters, which is effectively making the LLM the UI which is going to have the effect of removing any assumed advantage of whichever is the "superior" UX
I do wish to make absolutely clear that I personally am not yet ready to completely delegate VCS work to LLMs - as I have pointed out, I have what I like to think of as an advanced understanding of the tools, which affords me the luxury of not having an LLM shoot me in the foot; that is solely reserved as my own doing :)
"better" in that sentence is very specific. Worse is also worse, and if you're one of the people for whom the "better" side of a solution doesn't apply, you're left with a mess that people celebrate.
Not always, but in this case the superior product (i.e. VHS) won. At initial release, Beta could only record an hour of content, while VHS could record 2 hours. Huge difference in functionality. The quality difference was there, but pretty modest.
I suppose one lesson could be that there are different dimensions of superiority, different products may be superior in different ways.
Of course, products also can win market dominance for reasons external to the product's quality itself (marketing, monopoly lock-in, other network effects, consumer preferences on something other than product quality itself, etc).
> The issue is solely that OG Mercurial was written in Python.
Are we back to "programming language X is slow" assertions? I thought those had died long ago.
Better algorithms win over 'better' programming languages every single time. Git is really simple and efficient. You could reimplement it in Python and I doubt it would see any significant slowness. Heck, git was originally implemented as a handful of low level binaries stitched together with shell scripts.
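For a sense of how small the core idea is, here's a sketch of how a blob's object id is derived in a SHA-1 repository: hash a short header plus the file contents. `git_blob_id` is my own helper name, not part of git or any library, and this says nothing about whether a full Python reimplementation would be fast (note that `hashlib` itself does the hashing in native code).

```python
import hashlib


def git_blob_id(data: bytes) -> str:
    """Object id git assigns to a blob with these contents (SHA-1 object format)."""
    # The header is the object type, a space, the decimal length, then a NUL byte.
    header = b"blob %d\x00" % len(data)
    return hashlib.sha1(header + data).hexdigest()


# The empty blob has a well-known id that shows up in many repositories.
print(git_blob_id(b""))  # e69de29bb2d1d6434b8b29ae775ad8c2e48c5391
```

Identical contents always produce the same id, which is what makes the object store content-addressed and lets unchanged files be shared across commits.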
Every time I've rewritten something from Python into Java, Scala, or Rust it has gotten around ~30x faster. Plus, now I can multithread too for even more speedups.
Python is absurdly slow - every method call is a string dict lookup (slots are way underused), everything is all dicts all the time, the bytecode doesn't specialize at all to observed types, it is a uniquely horrible slow language.
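A small illustration of the "everything is a dict" point, as a sketch of the semantics rather than a benchmark. The class names `Plain` and `Slotted` are made up. Also worth hedging: CPython 3.11 and later do cache and specialize some of these lookups, so this shows what the language promises, not the exact cost on a current interpreter.

```python
class Plain:
    def __init__(self, x):
        self.x = x

    def double(self):
        return self.x * 2


class Slotted:
    __slots__ = ("x",)  # fixed attribute layout, no per-instance dict

    def __init__(self, x):
        self.x = x

    def double(self):
        return self.x * 2


p = Plain(21)
s = Slotted(21)

# Instance attributes live in a per-object dict keyed by attribute name.
print(p.__dict__)                 # {'x': 21}
# Methods live in the class namespace, also keyed by name.
print("double" in Plain.__dict__)  # True
# A slotted instance has no __dict__ at all.
print(hasattr(s, "__dict__"))      # False

# Because lookup happens by name at call time, swapping the class entry
# changes the behaviour of instances that already exist.
Plain.double = lambda self: self.x * 3
print(p.double())                  # 63
```

That last part is the flip side of the slowness: the same late binding is what makes monkey-patching and extension systems possible.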
I love it, but python is almost uniquely a slow language.
Algorithms matter, but if you have good algorithms, or you're already linear time and just have a ton of data, I've seen 500x speedups from rewriting a single-threaded Python program as a multithreaded Rust program, where the algorithms were not improved at all.
It's the difference between a program running overnight vs. in 30 seconds. And if there are problems, the iteration speed from that is huge.
To be fair, Python as implemented today is horribly slow. You could leave the language the same but apply all the tricks and heroic efforts they used to make JavaScript fast. The language would be the same, but the implementations would be faster.
Of course, in practice the available implementations are very much part of the language and its ecosystems; especially for a language like Python which is so defined by its dominant implementation of CPython.
Fair! I guess I didn't mean language as such, but as used.
But a lot of the monkey-patching kind of things and dynamism of python also means a lot of those sorts of things have to be re-checked often for correctness, so it does take a ton of optimizations off the table. (Of course, those are rare corner cases, so compilers like pypy have been able to optimize for the "happy case" and have a slow fall-back path - but pypy had a ton of incompatibility issues and now seems to be dying).
You don't even need to go all V8, you could just build something like LuaJIT and get most of the way there. LuaJIT is like 10k LOCs and V8 is 3M LOC.
The real reason is that it is a deliberate choice by the CPython project to prefer extensibility and maintainability to performance. The result is that python is a much more hackable language, with much better C interop than V8 or JVM.
Python has a JIT compiling version in GraalPy. If you have pure Python it works well. The problem is, a lot of Python code is just callouts to C++ ML libs these days and the Python/C interop boundary just assumes you're using CPython and requires other runtimes to emulate it.
I've rewritten a python tool in go, 1:1. And that turned something that was so slow that it was basically a toy, into something so fast that it became not just usable, but an essential asset.
Later on I also changed some of the algorithms to faster ones, but their impact was much lower than the language change.
I don’t know if people think this way anymore, but Python gained traction to some degree as a prototyping language. Verify the logic and structures, then implement the costly or performance-sensitive bits in a more expensive-to-produce but more performant language.
Which is only to say: that rewrite away from python story can also work to show python doing its job. Risk reduction, scaffolding, MVP validation.
> git was originally implemented as a handful of low level binaries stitched together with shell scripts.
A bunch of low level binaries stitched together with shell scripts is a lot faster than python, so not really sure what the point of this comparison is.
Python is an extremely versatile language, but if what you're doing is computing hashes and diffs, and generally doing entirely CPU-bound work, then it's objectively the wrong tool, unless you can delegate that to a fast, native kernel, in which case you're not actually using Python anymore.
Well, you can and people do use Python to stitch together low level C code. In that sense, you could go the early git approach, but use Python instead of shell as the glue.
Their point was that by offloading the bottlenecks to C, you've essentially conceded that Python isn't fast enough for them, which was the original point made above
Python is by far the slowest programming language, an order of magnitude slower than other languages
One of the reasons mercurial lost the dvcs battle is its performance - even the mercurial folks admitted that was at least in part because of python.
> Are we back to "programming language X is slow" assertions? I thought those had died long ago.
Yes we are? The slow paths of mercurial have been rewritten in C (and more recently in Rust) and improved the perf story substantially, without taking away from the wild modularity and extensibility hg always had.
> You could reimplement it in Python and I doubt it would see any significant slowness
I doubt it wouldn't be significantly slower. I can't disprove it's possible to do this but it's totally possible for you to prove your claim, so I'd argue that the ball is in your court.
You must belong to the club of folks who use hashmaps to store 100 objects. It's amazing how much we've brainwashed folks to focus on algorithms and lose sight of how to actually properly optimize code. Being aware of how your code interacts with cache is incredibly important. There are many cases of using slower algorithms to do work faster purely because it's more hardware friendly.
The reason that some more modern tools, like jj, really blow git out of the water in terms of performance is that they make good choices, such as doing a lot of transformations entirely in memory rather than via the filesystem. It's also because they're written in a language that can execute efficiently. Luckily, it's clear that modern tools like jj are heavily inspired by mercurial, so we're not doomed to the UX and performance git binds us with.
> You must belong to the club of folks who use hashmaps to store 100 objects.
Apparently I belong to the same club -- when I'm writing AWK scripts. (Arrays are hashmaps in a trenchcoat there.) Using hashmaps is not necessarily the indictment you apparently think it is, if the access pattern fits the problem and other constraints are not in play.
> It's amazing how much we've brainwashed folks to focus on algorithms and lose sight of how to actually properly optimize code. Being aware of how your code interacts with cache is incredibly important.
By the time you start worrying about cache locality you have left general algorithmic concerns far behind. Yes, it's important to recognize the problem, but for most programs, most of the time, that kind of problem simply doesn't appear.
It also doesn't pay to be dogmatic about rules, which is probably the core of your complaint, although unstated. You need to know them, and then you need to know when to break them.
Most code most people work on isn't about algorithms at all. The most straightforward algorithm will do. Maybe put some clever data structure somewhere in the core. But for the vast majority of code, there isn't any clear algorithmic improvement, and even if there was, it wouldn't make a difference for the typically small workloads that most pieces of code are processing.
I'll take it back a little bit, because there _is_ in fact a lot of algorithmically inefficient code out there, which slows down everything a lot. But after getting the most obvious algorithmic problems out of the way -- even a log-n algorithm isn't much of an improvement to a linear scan, if n < 1000. It's much more important to get that 100+x speedup by implementing the algorithm in a straightforward and cache friendly way.
My core complaint is that folks repeat best practices without understanding them. It's simple to provide API semantics that appear like a map without resorting to using hashmap. I fear python style development has warped people's perception for the sake of simplifying the lives of developers. And all users end up suffering as a result.
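For what it's worth, here is a minimal sketch of map-like API semantics over a flat array, assuming a small number of entries. `SmallMap` is a hypothetical name, and since this is Python it says nothing about cache behaviour; it only shows that callers need not care what backs the interface.

```python
class SmallMap:
    """Map-like container backed by two parallel lists (hypothetical example).

    Lookups are a linear scan, which is reasonable only for small n.
    """

    def __init__(self):
        self._keys = []
        self._values = []

    def _index(self, key):
        # Linear scan; returns -1 when the key is absent.
        for i, k in enumerate(self._keys):
            if k == key:
                return i
        return -1

    def __setitem__(self, key, value):
        i = self._index(key)
        if i >= 0:
            self._values[i] = value  # overwrite an existing entry
        else:
            self._keys.append(key)
            self._values.append(value)

    def __getitem__(self, key):
        i = self._index(key)
        if i < 0:
            raise KeyError(key)
        return self._values[i]

    def __contains__(self, key):
        return self._index(key) >= 0

    def __len__(self):
        return len(self._keys)
```

Callers write `m["a"] = 1` and `m["a"]` exactly as they would with a dict, so the backing store can be swapped later if profiling says it matters.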
You barely have to try to have Python be noticeably slow. It's the only language I have ever used where I was even aware that a programming language could be slow.
They died because everyone knows that Python is in fact very very slow. And that’s just totally fine for a vast number of glue operations.
It’s amusing you call Git fast. It’s notoriously problematic for large repos such that virtually every BigTech company has made a custom rewrite at some point or another!
Now that is interesting too, because git is very fast for all I have ever done. It may not scale to Google monorepo size; it would be the wrong tool for that. But if you are talking Linux kernel source scale, it absolutely is fast enough even for that.
For everything I've ever done, git was practically instant (except network IO of course). It's one of the fastest and most reliable tools I know. If it isn't fast for you, chances are you are on a slow Windows filesystem additionally impeded by a virus scanner.
The fact that Git has an extremely strong preference for storing full and complete history on every machine is a major annoyance! “Except for network IO” is not a valid excuse imho. Cloning the Linux kernel should take only a few seconds. It does not. This is slow and bad.
The mere fact that Git is unable to handle large binary files makes it an unusable tool for literally every project I have ever worked on in my entire career.
Takes 21 seconds on my work laptop, indeed a corporate Windows laptop with antivirus installed. Majority of that time is simply network I/O. The cloned repository is 276 MB large.
Actually checking the kernel out takes 90 seconds. This amounts to creating 99195 individual files, totaling 2 GB of data. Expect this to be ~10 times faster on a Linux file system.
Git LFS is a gross hack that results in pain and suffering. Effectively all games use Perforce because Git and GitLFS suck too much. It’s a necessary evil.
I continue to use gerrit explicitly because I cannot stand github reviews. Yes, in theory, make changes small. But if I'm doing larger work (like updating a vendored dep, that I still review), reviewing files is... not great... in github.
Can these tools e.g. do per-commit review? I mean, it's not the UI that's the problem (though it's not ideal), it's the whole idea of commenting on the entire PR at once, partly ignoring the fact that the code in it changes with more commits pushed.
Phabricator and even Gerrit are significantly nicer.
Unless you have a “every commit must build” rule, why would you review commits independently? The entire PR is the change set - what’s problematic about reviewing it as such?
There's a certain set of changes which are just easier to review as stacked independent commits.
Like, you can do a change that introduced a new API and one that updates all usages.
It's just easier to review those independently.
Or, you may have workflows where you have different versions of schemas and you always keep the old ones. Then you can do two commits (copy X to X+1; update X+1) where the change is obvious, rather than seeing a single diff which is just a huge new file.
I'm sure there's more cases. It's not super common but it is convenient.
> Unless you have a “every commit must build” rule, why would you review commits independently?
Security. Imagine commit #1 introduces a security vulnerability (backdoor) along with the features. Then #2 introduces a non-obvious, harmless bug and closes the vulnerability introduced in #1 [0]. At some point, the bug will surface and rolling back commit #2 will be an easy fix, re-introducing the vulnerability.
Alternatively, one of the earlier commits might, for example, contain credential dumping code. Once that commit is mainlined, CI might either automatically run on it or will be able to be run on it since it's no longer marked as unsafe PR.
[0] Think something like #1 introduces array access and #2 adds a bounds-check in a function a layer above - a reviewer with the whole context will see the bounds check and (possibly) consider it fine, but to someone rolling back a commit the necessity will not be obvious.
Squash merge is an artifact of PRs encouraging you to add commits instead of amending them, due to GitHub not being able to show you proper interdiffs, and making comments disappear when you change a diff at that line. In that context, when you add fixup commits, sure, squashing makes sense, but the stacked diffs approach encourages you to create commits that look like you want them to look like directly, instead of requiring you to roll them up at the end.
Boy do I hate Github/Lab/Bucket style code reviews with a burning passion. Who the hell loses code review history? A record of the very thing that made my code better? The "why" of it all, that I am guaranteed to forget tomorrow morning.
Nobody would be using `--force` or `--force-with-lease` as a normal part of development workflow, of their own volition, if they had read that part of the git-push manpage and been horrified (as one should be).
The magit key sequence for this abominable operation is `P "f-u"`. And every single time I am forced to do it, I read "f-u" as it ought to be read.
Rebase-push is the way to do it (patch sets in Gerrit).
Rebase-force-push is absolutely not.
You see, any development workflow inevitably has to integrate changes from at least one other branch (typically latest develop or master), without destroying change history, nor review history. Gerrit makes this trivial.
It's a bit difficult to convey exactly why I'm so rah-rah Gerrit, because it is a matter of day-to-day experience of
- Well, a single commit of a few lines to maybe a hundred lines *is* the correct unit of code review, rebase, revert etc. Manually "Sizing PRs" to that review context size is utter BS. I have better things to do in life than to book-keep PR sizes. Make a single well-contained, revertible commit. Then keep making those. And now you have a commit history that is clean, that you can merge, bisect, and bulk-revert at will. Octopus merges are a good thing. `git-log` is *designed* to let us view changes in any sequence we wish, *including* the so-called "linear" history. `git log --oneline`.
- Trivial for committer to send up reviews-preserving rebase-push responses to commit reviews (NO force-push, ever --- that's an "admin" action to *evict* / permanently wipe out disaster scenarios such as when someone accidentally commits and pushes out a plaintext secret or a giant blob of the executable of the source code etc.).
- Fast-for-the-reviewer, per-commit, diff-based, inline-commenting code reviews.
- The years-apart experience of being able to dig into any part of one's (immutable) software change history to offer a teaching moment to someone new to the team.
Same here. Don't understand why Github hasn't supported this until now. I'm tired of reviewing PRs with thousands of lines of changes, which are getting worse nowadays with vibe coding.
What does Facebook use internally these days? I'm amazed that the state of review tools is still at or behind what we had a decade ago for the most part.
Previous iterations have been a bit dated in terms of UI, but modern versions are pretty good. What interactions are bizarre? Leaving comments, approving a change and running presubmit tests are all pretty straightforward.
tangled.org supports native stacking with jujutsu, unlike github's implementation, you don't need to create a new branch per change: https://blog.tangled.org/stacking/
You should definitely try out https://github.com/hokwangchoi/pilegit. It's platform-agnostic and I use it for my workflow with Phabricator, Github, Gitlab and Gitea. No learning curves for cross-platform operations!
My understanding was that that was more a function of how arc submitted stuff to Phabricator, rather than solely Phabricator itself. arc at submission time submitted a bunch of different commits as a single Phabricator DREV or whatever the terminology is/was (basically a DREV is the {domain}/D123 webpage you'd do a review on). But other tools that submitted commits to Phabricator instances (and maybe even arc itself with the right flag?) submitted each commit as its own separate DREV, so each commit got its own separate /D{N} page and its own review, but all linked together in a stack. And then still landed as separate commits in the actual repo. This is how code submission works with Mozilla's use of Phabricator.
You can get what you want from `git log --first-parent` without having to toss out information.
See how the Linux kernel handles git history to see a good example of non-linear history and where it helps. They use merge commits, ie commits with more than one ancestor, all the time.
Right but recursion is only a smaller part of why the optimization is important. It means tail-called functions still build on the stack and long function chains—as is common with fp—can overflow
They may, but the amount of information they're getting is low.
As a hiring manager I'll look at progression of titles *within a company*. This shows a track record of upward mobility. But if they go from "senior" in one company to "principal" in another, I find it meaningless.
> As a hiring manager I'll look at progression of titles within a company. This shows a track record of upward mobility.
That's quite shallow for those whose title is 'Member of Technical Staff', which has no such progression; that is why titles are meaningless for experienced candidates.
Someone can give themselves that title, all because they know the founders; thus it can be exploited.
So instead, I get the candidate to exactly explain to me what did they actually build / do and how much money did they make / save the organization and it must be in the millions to qualify or did they build side-projects that contributed to this or not.
In this era, "titles" aren't enough and you need verifiable proof of work with monetary returns in the millions and I favour those who just build things that make money without asking permission from a manager.
> In this era, "titles" aren't enough and you need verifiable proof of work with monetary returns in the millions and I favour those who just build things that make money without asking permission from a manager.
While I like this approach personally, it's worth noting that lots of companies will fire people like you say you want, particularly if their manager is threatened by them (or they're difficult to work with etc).
I am also surprised, but not because I believe Meta to care about the ethics of the whole thing. After all their privacy scandals, I’d assume they’d have policies in place to prevent something that can so easily be leaked. But here we are
The subject of the function coloring article was callback APIs in Node, so an argument you need to pass to your IO functions is very much in the spirit of colored functions and has the same limitations.
The coloring is not the concrete argument (Io implementation) that is passed, but whether the function has an Io parameter in the first place. Whether the implementation of a function performs IO is in principle an implementation detail that can change in the future. A function that doesn't take an Io argument but wants to call another function that requires an Io argument can't. So you end up adding Io parameters just in case, and in turn require all callers to do the same. This is very much like function coloring.
In a language with objects or closures (which Zig doesn't have first-class support for), one flexibility benefit of the Io object approach is that you can move it to object/closure creation and keep the function/method signature free from it. Still, you have to pass it somewhere.
> Whether the implementation of a function performs IO is in principle an implementation detail that can change in the future.
I think that's where your perspective differs from Zig developers.
Performing IO, in my opinion, is categorically not an implementation detail. In the same way that heap allocation is not an implementation detail in idiomatic Zig.
I don't want to find out my math library is caching results on disk, or allocating megabytes to memoize. I want to know what functions I can use in a freestanding environment, or somewhere resource constrained.
> Performing IO, in my opinion, is categorically not an implementation detail. In the same way that heap allocation is not an implementation detail in idiomatic Zig.
It seems you two are coming at this from opposing perspectives. From the perspective of a library author, Zig makes IO an implementation detail, which is great for portability. It lets library authors freely use IO abstractions if it makes sense for their problem.
This lets you, as an application developer, decide the concrete details of how such libraries behave. Don't want your math library to cache to disk? Give it an allocating writer[0] instead of a file writer. Want to use a library with async functionality on an embedded system without multithreading? Pass it a single-threaded io[1] runtime instance, or implement the io interface yourself as is best for your target.
Of course someone has to decide implementation details. The choices made in designing Zig tend to focus on giving library authors useful abstractions that give application authors meaningful control over important decisions for their application.
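To make the "application decides" point concrete, here is a rough Python analogy (not Zig, and not Zig's actual writer or io API). The library function `write_table` and the sample rows are hypothetical; the only point is that the library formats data while the caller picks where the bytes end up.

```python
import io
import os
import tempfile


def write_table(rows, out):
    """Library code: formats rows to whatever writer the caller supplies."""
    for name, value in rows:
        out.write(f"{name}={value}\n")


rows = [("pi", 3.14159), ("e", 2.71828)]

# Application choice 1: keep everything in memory.
buf = io.StringIO()
write_table(rows, buf)
in_memory = buf.getvalue()

# Application choice 2: write to a real file (a temporary one here).
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "table.txt")
    with open(path, "w", encoding="utf-8") as f:
        write_table(rows, f)
    with open(path, encoding="utf-8") as f:
        on_disk = f.read()
```

The library never names a file or touches the disk itself, so the same function works in both setups.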
The problem with function coloring is that it makes libraries difficult to implement in a way that's compatible with both sync and async code.
In Python, I needed to write both sync and async API clients for some HTTP thing where the logical operations were composed of several sequential HTTP requests, and doing so meant that I needed to implement the core business logic as a Generator that yields requests and accepts responses before ultimately returning the final result, and then wrote sync and async drivers that each ran the generator in a loop, pulling requests off, transacting them with their HTTP implementation, and feeding the responses back to the generator.
This sans-IO approach, where the library separates business logic from IO and then either provides or asks the caller to implement their own simple event loop for performing IO in their chosen method and feeding it to the business logic state machine, has started to appear as a solution to function coloring in Rust, but it's somewhat of an obtuse way to support multiple IO concurrency strategies.
On the other hand, I do find it an extremely useful pattern for testability, because it results in very fuzz-friendly business logic implementation, isolated side-effect code, and a very simple core IO loop without much room in it for bugs, so despite being somewhat of a pain to write I still find it desirable at times even when I only need to support one of the two function colors.
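For anyone who hasn't seen the pattern, a minimal sketch of it in Python follows. The request tuples, URL paths, and fake transports are all hypothetical stand-ins; a real client would plug in its own blocking and async HTTP libraries.

```python
import asyncio


def fetch_profile(user_id):
    """Business logic as a generator: yields requests, receives responses.

    The request shape and paths are made up for illustration.
    """
    user = yield ("GET", f"/users/{user_id}")
    team = yield ("GET", f"/teams/{user['team_id']}")
    return {"name": user["name"], "team": team["name"]}


def run_sync(gen, send_request):
    """Sync driver: send_request is a plain blocking callable."""
    try:
        request = next(gen)
        while True:
            request = gen.send(send_request(request))
    except StopIteration as stop:
        return stop.value  # the generator's return value


async def run_async(gen, send_request):
    """Async driver: send_request is a coroutine function."""
    try:
        request = next(gen)
        while True:
            request = gen.send(await send_request(request))
    except StopIteration as stop:
        return stop.value


# Fake transports standing in for real HTTP clients.
FAKE_SERVER = {
    "/users/7": {"name": "Ada", "team_id": 3},
    "/teams/3": {"name": "Compilers"},
}


def fake_sync_transport(request):
    _method, path = request
    return FAKE_SERVER[path]


async def fake_async_transport(request):
    await asyncio.sleep(0)  # yield to the event loop, as real I/O would
    _method, path = request
    return FAKE_SERVER[path]


sync_result = run_sync(fetch_profile(7), fake_sync_transport)
async_result = asyncio.run(run_async(fetch_profile(7), fake_async_transport))
```

The business logic in `fetch_profile` is written once and has no I/O in it, which is also what makes it easy to test or fuzz with canned responses.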
My opinion is that if your library or function is doing IO, it should be async - there is no reason to support "sync I/O".
Also, this "sans IO" trend is interesting, but the code boils down to a less ergonomic, more verbose, and less efficient version of async (in Rust). It's async/await with more steps, and I would argue those steps are not great.
From a performance perspective, asynchronous IO makes a lot of sense when you're dealing concurrently with a large number of tasks which each spend most of their time waiting for IO operations to complete. In this case, running those tasks in a single-threaded event loop is far more efficient than launching off thousands of individual threads.
However, if your application falls into literally any other category, then suddenly you are actually paying a performance penalty, since you need the overhead of running an event loop any time you just want to perform some IO.
Also, from a correctness perspective, non-concurrent code is simply a lot less complex and a lot harder to get wrong than concurrent code. So applications which don't need async also end up paying a maintainability, and in some cases memory safety / thread safety, penalty as well.
The beautiful thing about the “async” abstraction is that it doesn’t actually tie you to an event loop at all. Nothing about it implies that somebody is calling `epoll_wait` or similar anywhere in the stack.
It’s just a compiler feature that turns functions into state machines. It’s totally valid to have an async runtime that moves a task to a thread and blocks whenever it does I/O.
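A small Python sketch of that idea, under the assumption that the "runtime" is nothing more than a loop calling `send()`: the coroutine below is driven with ordinary blocking reads and no event loop. `ReadLine`, `count_words`, and `run_blocking` are hypothetical names invented for the example.

```python
import io


class ReadLine:
    """Awaitable that suspends the coroutine and hands a request to the driver."""

    def __init__(self, source):
        self.source = source

    def __await__(self):
        # Yielding suspends the coroutine; the value the driver sends back
        # becomes the result of the await expression.
        result = yield self
        return result


async def count_words(source):
    total = 0
    while True:
        line = await ReadLine(source)
        if line == "":  # readline() returns "" at end of input
            return total
        total += len(line.split())


def run_blocking(coro):
    """Drive a coroutine with plain blocking calls: no event loop involved."""
    reply = None
    try:
        while True:
            op = coro.send(reply)         # run until the next suspension point
            reply = op.source.readline()  # perform the I/O synchronously
    except StopIteration as stop:
        return stop.value


words = run_blocking(count_words(io.StringIO("a b c\nd e\n")))
```

The same `count_words` coroutine could be handed to a driver that multiplexes many sources; nothing in its body commits to either strategy.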
I do agree that async without memory safety and thread safety is a nightmare (just like all state machines are under those circumstances). Thankfully, we have languages now that all but completely solve those issues.
You surely must be referring to Rust, the only multithreaded language with async-await in which data races aren't possible.
Rust is lovely and all, but is a bad example for the performance side of the argument, since in practice libraries usually have to decide on an async runtime, so in practice library users have to launch that runtime (usually Tokio) to execute the library's Futures.
Sure, but that’s a library limitation (no widespread common runtime interface that libraries such as Tokio implement), not a fundamental limitation of async.
Thread safety is also a lot easier to achieve in languages like C#, and then of course you have single-threaded environments like JS and Python.
Exactly, there is nothing wrong with function coloring. It's a design choice.
Colored functions are easier to reason about, because potential asynchronicity is loudly marked.
Colorless functions are more flexible because changing a function to be async doesn't virally break its interface and the interface of all its callers.
Zig has colored functions, and that's just fine. The problem is the (unintentional) gaslighting where we are told that Zig is colorless when the functions clearly have colors.
As mentioned, the problem with coloring is not that you see the color, the problem is that you can't abstract over the colors.
Effectful languages basically add user-definable "colors", but they let you write e.g. a `map` function that itself turns color based on its parameter (e.g. becoming async if an async function is passed).
I think talking about colouring often misses the point. Sync & async code are fundamentally different; languages without coloured functions make everything async. Everything in go (for instance) is running in an async runtime, and it's all preemptable.
> I don't want to find out my math library is caching results on disk, or allocating megabytes to memoize. I want to know what functions I can use in a freestanding environment, or somewhere resource constrained.
In that vein, I would often like to know whether the function I call is creating a task/thread/greenlet/whatever that will continue executing, concurrently, after it returns. Making that be part of the signature is approximately called “structured concurrency”, and Zig’s design seems to conflate that with taking an io parameter. This seems a bit disappointing to me.
> A function that doesn't take an Io argument but wants to call another function that requires an Io argument can't.
Why? Can’t you just create an instance of an Io of whatever flavor you prefer and use that? Or keep one around for use repeatedly?
The whole “hide a global event loop behind language syntax” is an example of a leaky abstraction which is also restrictive. The approach here is explicit and doesn’t bind functions to hidden global state.
Is that a problem in practice though? Zig already has this same situation with its memory allocators; you can't allocate memory unless you take a parameter. Now you'll just have to take a memory allocator AND an additional io object. Doesn't sound very ergonomic to me, but if all Zig code conforms to this scheme, in practice there will be only one way to do it. So one of the colors will never be needed, or used.
I think this article does an alright job selling ro over RxGo, but doesn’t really explain why using a reactive library is better than plain go. The channel/goroutine example is fine, as they say, but they hand-wave how this will fall apart in a more complex project. Conversely, their reactive example is mapping and filtering a 4-item array, and they hand-wave how the simplicity will remain no matter the size of the codebase.
I’ve worked in a few complex projects that adopted reactive styles and I don’t think they made things simpler. It was just as easy to couple components with reactive programming as it was without.
I joined the sapling/subversion company this year, but haven’t had the chance to use jj. But given its resemblance I must say sapling has been great. Much more intuitive than git, and I find commit stacks much easier to follow than branches. I do wonder how it will work without the level of support of Meta, since you won’t have the same commit stack review UI (basically a series of pull requests being reviewed at the same time). So something like what this author is working on is needed.
What he calls import highlighting is something I only find reliably setup in Xcode, but is probably one of the most useful things to highlight for me. To be able to tell at a glance whether I’m looking at a library function or an internal was very useful. Probably doesn’t matter for languages that enforce calling with the module (eg, module::foo()), but in C/Objective-C and Swift it’s annoying when it’s missing.