> The rank-2 type (that is, the type s is scoped within the parentheses and can't escape) of runST ensures that the mutable references created inside the computation cannot escape, because they are tagged with the type s. Internally, all sorts of imperative nonsense may occur. Externally, the function is pure. The world outside the boundary gets none of the mutation, only the result.
C does not have parametric polymorphism or rank-2 quantification, so no, this cannot be done in C.
This was not the original claim in the thread I responded to, which was: "even with a dynamically typed language, you can keep escaped strings as an Escaped class, with escape(str)->Escaped and dangerouslyAssumeEscaped(str)->Escaped functions (or static methods)." So this was about abstract data types, not polymorphism.
Regardless, you can also get some limited parametric polymorphism in C with macros. This is very crude, but parametric polymorphism in Rust is based on monomorphization, so under the hood it is not so different. You can also have higher-order polymorphism in C, but then you need to fall back on subtype polymorphism.
You can keep the trap enabled in production, and then it is safer. If you need to catch the problem at run time, there are checked-integer options in C that you can use.
You should know that the operands are promoted to int first, which is also what makes your example work. This is what happens when you perform the computation with a non-promoting 8-bit unsigned type: https://godbolt.org/z/fxxva4nWq
For signed overflow we have sanitizers, and for conversions C compilers have warnings. Bounds checking can also be done with sanitizers (though it is a bit more tricky). So no, I do not think the undefined behavior is really a big problem. In fact, it helps us find the problem, because every signed overflow can be treated as a programming error.
Errors due to unsigned wraparound are a much bigger issue, because they lead to subtle problems where neither automatic warnings nor sanitizers help, exactly because the behavior is well defined and no automatic tool can tell whether it is intended or wrong.
> Errors due to unsigned wraparound are a much bigger issue
This is a type design mistake. Unsigned integers should not wrap by default. It makes absolute sense, given all the constraints and the New Jersey "implementation simplicity dominates" school of design, that K&R C only provides a wrapping unsigned type, but that's an excuse for K&R C, a 1970s programming language.
The excuse gets shakier and shakier the further you move past that. C3 even named these types differently, so newer languages are certainly under no obligation to provide wrapping unsigned integers as if that's just magically what you mean. In most cases it's not what you mean. The excuse given in the article is far too thin.
Rust's Wrapping<u32> is the same thing as a wrapping 32-bit unsigned integer in C or C++ today, but most people don't use it, because they do not actually want a wrapping 32-bit unsigned integer. This is a "spelling matters" ergonomics question again, like the choice to name the brutally fast but unstable general comparison sort [T]::sort_unstable, whereas both C and C++ name theirs just "sort", leaving the newcomer who didn't know about sort stability to find out for themselves, and you get to keep both halves when you break things...
> But what I do not believe is that there is a real need for a non-wrapping non-negative integer type.
So the most obvious counterexample is so obvious you might not even have remembered it's a type: the unsigned 8-bit integer, or byte.
But frankly, without the wrapping mistake they make a pretty good general-purpose index and a useful counter; there's a reason we called these the "natural numbers".
I am not convinced. A byte is for low-level access to memory; you shouldn't really do any computation with it, except maybe low-level bit-fiddling or crypto, but then the non-wrapping non-negative integer is not correct either.
Natural numbers are nice, but then we invented zero and negative numbers, and got a group structure for addition, which is really useful. Even for a counter or an index, you may want to do addition and subtraction, and then you definitely do not want a non-wrapping non-negative integer for intermediate results.
And the Rust design, with an unsigned type where subtraction does not return a signed type but may instead fail at run time or silently produce the wrong result, seems the worst possible design imaginable to me.
Was this last part added or did I just miss it? Huh.
> And the Rust design, with an unsigned type where subtraction does not return a signed type but may instead fail at run time or silently produce the wrong result, seems the worst possible design imaginable to me.
You can ask for whichever behaviour you meant, and asking for what you meant is crucial here: if we express ourselves precisely, we get the desired results.
For example, u8::borrowing_sub lets us do the arithmetic style you may have learned in primary school, in which we track whether we "borrowed" one during the subtraction; this might be useful in some places and is certainly easy to understand.
u8::checked_sub tells us either the answer or that it would overflow, which might allow us to take a different course of action and not need the subtraction.
u8::saturating_sub performs saturating arithmetic: if it would overflow, we get the largest value in the appropriate direction instead. This often makes sense in e.g. signal processing.
u8::unchecked_sub promises we know the subtraction doesn't overflow and so no checks are needed; this is a performance optimisation if you really need it.
u8::wrapping_sub_signed performs the wrapping arithmetic you say is sometimes a good idea, with specifically a signed i8 parameter rather than an unsigned one if we want that.
The truth here is that you might want a lot of different operations, and the C choice is not only to provide a single one, which made a lot more sense 50+ years ago than it does today, but to provide a singularly bad default.
It was added, but immediately after the rest. I was just quickly refreshing my memory on what Rust was doing. I think it is a terrible design.
If you want special functions which protect you from errors in specific scenarios, it is easy enough to write them in C. But I do think the C defaults are actually OK, and having all the wrapper functions and boilerplate has its downsides too. (What I would admit is bad in C are the default warning settings of C compilers.)
> The truth here is that you might want a lot of different operations and the C choice is not only to provide a single choice, which made a lot more sense 50+ years ago than it does today, but to provide a singularly bad default.
C has a single '+' operator, just like Rust has. And what that operator does depends on the types to the left and to the right. You can cast between integer types to achieve different behaviours depending on what you want.
About u8::unchecked_sub() etc.: those are just regular functions. Not really a language thing. Yes, none of that is standardized in C AFAIK, but I'll happily use e.g. __builtin_add_overflow() or whatever in practice.
We can argue all day long what are the right defaults, checked or unchecked operations. If you want to be safe, you want the compiler to emit checks. It's probably possible to get some of those in GCC. If you want to emit streamlined machine code, you'll definitely not want to add checks after every machine instruction.
> About u8::unchecked_sub() etc, those are just regular functions. Not really a language thing. Yes, nothing of that is standardized in C
Well "regular functions" in the sense that these are methods of the primitive type u8†, and of course neither C nor C++ can do that at all. So, yeah, it's a language thing.
In C++, what you'd do instead is invent custom types and add the methods you want to them. I would give C++ credit here if the stdlib provided, say, a bunch-of-bits base type with all the bit-twiddling methods defined, and maybe specialisations for the 32-bit and 8-bit unsigned integers or something, but AFAICT it doesn't do anything like that.
"I could go out of my way to do this" is true for everything in any of the general purpose languages by their nature.
† In Rust if we define a function associated to a type T with a "self" first parameter then you can call that function as just a method on any value of type T and the appropriate parameter is inferred. So e.g. u8::checked_sub(u8::MAX, 10) is Some(245) but u8::MAX.checked_sub(10) is also Some(245) because it de-sugars to the same call.
__builtin_add_overflow() is type-generic as well, what is the big difference to this Rust stuff? I'd say Rust is more ergonomic here (standardized calls, I don't have to resort to GCC builtins). But it's not fundamentally different in capabilities.
I really don't care about functions vs. methods; what is the difference? It's just syntax. Actually, sticking to regular function calls is mostly more readable to me, compared to using methods with short unqualified names, mixing function calls and method calls, and nesting/chaining them.
Sure, other than the ergonomics it's the same. But, other than the ergonomics Fortran and C# are the same so what are we even talking about ?
Better ergonomics means it's more likely the programmer writes what they actually meant. And if you do that, modern compilers have got better at making what you meant fast, even as they remain the same, or perhaps slightly worse, at turning vague gestures that never clearly expressed what you meant into what you had hoped for.
The wrapping APIs do come up a lot in cryptography, but in bit twiddling I think they're as often a hindrance because we actually want to be pulled up short if we're trying to squeeze things where they won't fit.
I have definitely written C code that tries to use 257 values for a byte, with zero playing both its role as "just zero" in some places and also serving as 256 because "it's never zero" in other places. Of course this is a nasty bug if one of those "it's never zero" zeroes gets into the "it's just zero, of course" code paths, or vice versa.
The "Wrapping will fix my arithmetic ordering" thing is in this article too and I think that's also a terrible idea, maybe even worse than the wrapping unsigned integer types themselves because it leads to a muddled idea of what's really going on.
Just this week I've had a C compiler silently delete an entire function call because of UB (an infinite loop without side effects). Took me a day to figure out. So that's a problem for me.
I don't think I've ever had a hard-to-debug issue in Go because of signed/unsigned wraparound. In particular, never a memory issue.
If anything, and here I guess I agree with the article, I wish Go had implicit conversions to wider types, to make the problematic narrowing ones stand out.
I guess the reason it doesn't is that they're different named types, which would be a problem when you create a named type for the purpose of forcing explicit type conversions. But maybe the default ones could implicitly implement a numeric tower, where exact conversions can be implicit.
That depends. But some sanitizers are cheap enough that you can almost always run them.
Regarding infinite loops, C++ and C differ with C++ being more aggressive. But also compilers differ with clang being more aggressive. https://godbolt.org/z/Moe6zYKqo
In general, I do not recommend using clang if you worry about UB. gcc is a bit more reasonable and also has better warnings.
It was undefined what happens at unsigned overflows and underflows. Therefore a compiler could choose to implement "unsigned" either as non-negative numbers or as integer residues.
The fact that "sizeof" is unsigned and the implicit conversions between "unsigned" numbers are consistent only with non-negative numbers. Therefore the undefined behavior should have been defined correspondingly.
Instead of this, at some version of the standard (I am too lazy to search for it now, but it might have been C99), they changed the behavior from undefined to defined, as integer residues.
I do not know the reason for this choice, it may have been just laziness, because it is easier to implement in compilers and it leads to maximum performance in the absence of bugs. In any case this decision has broken the standard, because the arithmetic operations have become incompatible with the implicit conversions between "unsigned" types and with the semantics of "sizeof", which must be non-negative.
For non-negative numbers, the correct conversions are from smaller sizes to bigger sizes, while for integer residues the correct conversions are only in the opposite direction, from bigger sizes to smaller sizes (e.g. a number that is 257 modulo 65536 is also 1 modulo 256, so truncating it yields a correct value, while a number that is 1 modulo 256 when modulo 65536 it could be 257, 511, 769 etc. so you cannot extend it without additional information).
Judging from the implicit conversions, it is clear that the intention of the designers of C during the seventies was that "unsigned" numbers must be non-negative integers, not integer residues. The modern C standard is guilty of the current inconsistencies, which greatly increase the chances of bugs.
My copy of K&R already has unsigned modulo arithmetic: "unsigned numbers are always positive or zero, and obey the laws of arithmetic modulo 2^n, where n is the number of bits in the type." So if it changed, it was before that, but I don't think so.
I get your argument about the conversion order, but I do not buy it in terms of language design. You also do not want to move into a quotient ring implicitly, so I do not agree that this conversion direction would be more "correct" for implicit conversion either. From a practical point of view, the C design is defensible.
I think the motivation originally was merely to expose the common capabilities of the hardware, nothing more. What we miss from this perspective are polynomials over F_2, but nobody pushed for this too hard so far.
I find it the opposite. Unsigned integers are intuitive, while signed integers are unintuitive and cause a lot of tricky bugs. Especially in languages, where signed overflow is undefined behavior.
It's pretty rare to have values that can be negative but are always integers. At least in the work I do. The most common case I encounter are approximations of something related to log probability. Such as various scores in dynamic programming and graph algorithms.
Most of the time, when you deal with integers, you need special handling to avoid negative values. Once you get used to thinking about unsigned integers, you quickly develop robust ways of avoiding situations where the values would be negative.
It is interesting that you find unsigned integers more intuitive. My experience (with students, but also analyses of CVEs, which give plenty of evidence) is that the opposite is true: signed integers in C are a model of the integers, which have a nice mathematical structure that people learn in elementary school. Yes, this breaks down on overflow, but for that you have to reach very high numbers, and there is very good tooling to debug it. In contrast, unsigned integers in C are modulo arithmetic, which people learn at university, if at all, and get wrong all the time, and the errors are mostly subtle and very difficult to find automatically.
You are right that often you need to constrain an integer to be non-negative or positive, but usually not during arithmetic, rather at certain points in the logic of a program. And then, in my experience, it is better expressed as an assertion.
I work in bioinformatics. The numbers are typically large enough that you either have to think about numeric limits all the time if you use 32-bit integers (or bit-packed arrays), or you end up wasting (tens of) gigabytes with 64-bit integers.
I've also done a lot of succinct data structures, data compression, and things like that. When you manipulate the binary representation directly, it's easier to connect representation to unsigned semantics than to signed semantics.
Unsigned integers are usually integers modulo 2^n, which gives them a convenient algebraic structure. Whether you find that intuitive or not probably depends on your education. From my perspective, abstract algebra and discrete mathematics are things you learn in the first year of your CS degree.
Signed ints are also integers modulo 2^n, as concerns +, -, and *. Both unsigned and signed ints have the exact same modular arithmetic structure, for +, -, and *. It is only for other operations (ordering comparisons, or / and %) where they differ, and on these operations, neither signed nor unsigned ints have any convenient algebraic structure commonly encountered elsewhere.
Modulo arithmetic is taught to American students as clock arithmetic in elementary school as well. Signed integers are better described as truncated 2-adics, which are definitely an advanced topic. It's basically a Greenland/Iceland situation. The complicated, difficult one is given a friendly name to trap naive programmers.
Signed ints are no more and no less truncated 2-adics than unsigned ints. Signed ints and unsigned ints are isomorphic as rings; they are both the ring of integers modulo some 2^n. It is only on other operations (ordering comparisons, or the / and % of C) where they differ, and in neither case do these operations have anything to do with 2-adics.
It's a much more useful correspondence than just an isomorphism. Take your base-2 2-adic integer. Truncate it to whatever bitwidth you want. That's the two's complement representation of that number. Unsigned pops out as a special case, obviously.
I've mostly seen this used by cryptographers to prove results over arbitrary bitwidths, since you can prove it over the p-adics and deal with truncation/division separately.
The fixed-size signed types of C etc are no more the actual integers, and no less modular arithmetic, than the fixed-size unsigned types. They both implement the exact same modular arithmetic for +, -, and *. It's only for other operations (ordering comparisons, or / and % which in turn are defined in C in terms of ordering structure, as in rounding towards zero) where they differ. And in either case, that ordering structure is not one commonly encountered anywhere outside of the context of fixed size computer arithmetic.
Ask an ordinary person what 3 * (1/3) or 3 * (-1/3) should come to, and they aren't going to say any of the results that you get in C for either signed or unsigned int types.
In Germany I learned modular arithmetic at something like 13 years old, in high school. I also find unsigned much more intuitive, and in my code I only use signed types when I absolutely have to, which is not often. Only in languages that offer automatic bignums, like Python and Haskell, do I just use ints.
Why does an unsigned type for sizes or indices fare worse than a signed type? When do I want the -247th element in an array? When do I have a block that is -10 bytes in size?
Because doing subtraction on sizes/indices is common, and signed handles the case where you subtract below 0, while unsigned yields unintuitive results. That is, unsigned fails silently. For example, looping to the 2nd-to-last item in an array, or getting the index before a given index.
The source of confusion is that "unsigned" is a terrible name. Unsigned does not mean non-negative. It's 100% valid to assign a negative value to an unsigned; it just fails silently.
If you want non-negative integers, then you should make a wrapper class that enforces non-negativity at compile and runtime.
> The source of confusion is that unsigned is a terrible name. Unsigned does not mean non-negative. Its 100% complete valid to assign a negative value to an unsigned, it just fails silently.
C’s implicit casts are tripping you up. Unsigned ints can’t be negative; C will happily let you assign a negative signed int to an unsigned int variable, but the moment it is assigned it ceases to be negative. In serious programming languages this implicit assignment is forbidden; you have to cast explicitly.
> For example, looping to the 2nd to last item in an array or getting the index before the given index.
I don’t understand what you mean here, can you clarify?
> If you want non-negative integers, then you should make a wrapper class that enforces non-negativity at compile and runtime.
Unsigned integers are the compile time side of the coin, but yes you may want to take care to enforce it at runtime as well, though this typically implies a performance penalty that most don’t want to pay.
In C your compiler can help you with conversions, and if it does not, please use a better one. In this regard C is a very pragmatic language, and hence for actual work it is a more "serious" programming language than languages based on some idealistic theory that pedantic typing will fix all your problems, while actually keeping you from doing your job.
Warnings are just noise, so there's no point in printing them: they will be ignored (maybe not when there is a single warning, but certainly once warnings are allowed to accumulate beyond some manageable threshold). If a warning is worth printing, it should be treated as an error, and if you treat it as an error, you are now "strict" by definition.
Any reasonably good C code I ever worked with aimed to be warning-free. But yes, if you want, you can also make warnings errors. The flexibility is important, though.
Regardless of whether you're "aiming for the code to be warning free" or telling the compiler to turn the warning into an error, you will make the implicit cast explicit and move on with your day. You've already said you should use your tools to flag these errors and that aiming to be warning free is a good thing, so I don't understand where we disagree, especially when making implicit casts explicit costs a single-digit number of keystrokes.
I was disagreeing with the statement that a "serious" language needs to have this hard-coded. I think the C model, where you can have the strictness if you want it but can also opt out, is better.
The reason is not that you want a negative index or size, but that you want the computation of the index to be correct, and you want errors to be obvious. Both turn out to be easier with signed types.
There are (rare) times when you want negative array indices. C lets you index in both directions from a pointer to the middle of an array. That's why array indexing is signed in C. Some libc ctypes lookup tables do this. For sizing there is no strong case for negatives other than to shoehorn them into signed operations.
C23 updated the definition of the [] operator to disallow negative subscripts with array type. I think you have to explicitly convert the array to a pointer type now.
That’s interesting but seems pretty dangerous. How do you know you aren’t going to decrement off the front of the array? Keeping the pointer to the first element in the array and using offsets seems safer for humans and I don’t think the computer would care.
Kind of a smart-alec response, but how do you know you aren't going to increment off the end of the array when operating normally? I guess it is twice the danger.
You never want any element of an array, except elements within the range [0, array_length). Anything outside of that is undefined behavior.
I think people tend to overthink this. A function which takes an index argument, should simply return a result when the index is within the valid range, and error if it's outside of it (regardless of whether it's outside by being too low or too high). It doesn't particularly matter that the integer is signed.
If you aren't storing 2^64 elements in your array (which you probably aren't - most systems don't even support addressing that much memory) then the only thing unsigned gets you is a bunch of footguns (like those described in the OP article).
Let's say you have two indices into an array: a and b. You want to know how much earlier a is than b so you compute b-a, but as b was in fact the earlier one you get a negative number.
You can deal with this by casting before doing the subtraction, or you can deal with it by storing the indices as signed integers at all times. The latter is more ergonomic at the cost of wasted capacity.
Renaming .c to .cpp may work with ancient C89 code, but not with anything remotely modern. And while the code is then technically C++, it is not better. I still prefer C for new projects over any other language, because I value short compilation times and reduced complexity. For me, this translates into higher productivity and more fun. With modern tooling, most C issues are also detected early.
Slight tweaking might not always be sufficient; re-engineering my numerical code would certainly take a bit of effort. But anyhow, I do not think C++ is better. Recently I removed one (!) file with templates (which someone else added) from one of my projects because it doubled compilation times (in a project with 750 other files or so). I do not need slower build times, more complexity, and more footguns.
This is a misrepresentation based on a misunderstanding of how standardization works. The C standards committee has long recognized the need for better safety and carefully made it possible for C to be implemented safely. But the process is that vendors implement something and then come together during standardization to make it compatible; standardization is not a government that prescribes top-down what everybody has to do. Vendors did not bother to provide safer C implementations, and safety features (such as bounds checking) did not get much attention from users in the past. So I am happy to see that there is now more interest in safety, because as soon as there are solutions we can start putting them into the standard.
(We can do some stuff before this, but this is always a bit of a fight with the vendors, because they do not like it at all if we tell them what to do, especially clang folks)
Stop mixing up C and C++; tons of people on Unix still hate C++ (Motif a bit less) for being un-Unixy and mega-complex, even more so today. Die-hard Unix and C people created Plan 9 and now Go (after Inferno and Limbo), which is maybe the other successor to C, where programming is far simpler than the whole C and POSIX clusterfuck (even Plan 9 and 9front themselves can be called a "Unix 2.0").
C++ is something else. Heck, it's often far more bound to a Windows domain (and for a while Be/Haiku) than Unix itself by a huge stretch.
It is probably worth noting that C++, like C/Unix, originated at AT&T Bell Labs and was originally referred to as "C with classes." Classes were implemented using a preprocessor.
Unix's creators called Unix "dead and rotten": the eulogy was done by Perl, and Plan 9/9front and Inferno obliterated it. Ditto with C+POSIX against Plan 9's C (and 9front), and with Inferno's Limbo, the grandparents of Golang, which Pike and company see as the toolset C++ should have been.
Golang is like Windows NT. C++ is like Windows ME: it might have had its use cases in RT performance and multimedia, because of having far fewer layers than NT (and being much better on a single core), but it crumbled fast and it was really easy to shoot yourself in the foot. Windows 2000 and XP killed it for good.
Some day Golang will be performant enough (even with CSP) on multiple cores that all the 'performance' advantages C++ supposedly brings won't be needed at all.
Even C# can be as good as C++ today in tons of cases (AOT and emulators like Ryujinx are not a bluff), and even SBCL for Common Lisp too, if you fine-tune the compilation options.
To clarify, I do agree with you that C and C++ have been two distinct languages for a very long time. And C++ doesn’t have much in common with POSIX.
What I disagree with is the idea that C++ was developed completely independently of C (and Unix) - it originated at Bell Labs and was initially just an extension of C with classes. If you looked at the document I linked to, you would see that Bjarne Stroustrup thanks Dennis Ritchie in it for being a source of good ideas and useful problems. I don’t think I need to explain who Dennis Ritchie was for C and Unix.