In a way this is an argument for languages where it's normal to have result-type...

nayuki · on Jan 10, 2025

In other words, this is a case of https://en.wikipedia.org/wiki/In-band_signaling versus out-of-band signaling https://en.wikipedia.org/wiki/Signaling_(telecommunications)... .

EricRiese · on Jan 10, 2025

Clears throat the precise wiki page you're looking for is the Semipredicate Problem

https://en.wikipedia.org/wiki/Semipredicate_problem

kazinator · on Jan 11, 2025

-1 is in-band signaling; but it steals a value.

kevinventullo · on Jan 10, 2025

We all know that types are not Pythonic.

(I’m only mostly joking)

folkrav · on Jan 10, 2025

Ackchyually, Python has pretty strong typing, as far as dynamic languages go.

Insanity · on Jan 10, 2025

If you use the actual type system and something like mypy, Python is a joy work with. (For my definition of joy, which includes static typing).

IshKebab · on Jan 10, 2025

Actually writing Python is reasonably nice in that case.

But dealing with Python infrastructure is so awful as to make the whole experience just bad.

uv fixes a lot of that, but I think it will be some time before it's used everywhere, and I have zero hope that the Python devs will ever do the right thing and officially replace Pip with uv.

arccy · on Jan 10, 2025

you also have to trawl through the flags to make sure it actually checks types, e.g. check_untyped_defs

bhickey · on Jan 10, 2025

Sure. Everything is a PyObject.

nas · on Jan 11, 2025

Every PyObject structure has a ob_type pointer.

ackfoobar · on Jan 11, 2025

Every Object in Java has a getClass method. If we changed all the static type information in Java to the `Object` type, that'd be pretty close to Python's case.

GP's post is probably the "unityped" critique of dynamically typed languages by Robert Harper: https://existentialtype.wordpress.com/2011/03/19/dynamic-lan...

bhickey · on Jan 11, 2025

Thank you for sharing. I hadn't read that critique but I am in wholehearted agreement. Dynamic typing imposes significant cognitive overhead in exchange the privilege of letting you writing incorrect programs.

kazinator · on Jan 11, 2025

1. But, Python has exceptions! The underlying C language doesn't, but Python's run-time has them and can use them in the C code.

2. It may be an argument for ensuring that absolutely everything that is an object can hash: the object hasher must not have error states. Nobody wants the overhead of a simple hash code being wrapped in a result type.

kstrauser · on Jan 10, 2025

Not really, I don’t think. Python code doesn’t ever really use hash() for anything where specific outputs are expected. Even if hash(a)==hash(b), it’s not implied that a==b or a is b.

tmvphil · on Jan 10, 2025

But -2 seems like just a bad choice here. -1 and -2 are very "common" integers. Seems they could have just picked a quasi-random large integer to reduce the likelihood of collision. I expect hash functions to have collisions, but I expect it to be "unlikely" in a way that scales with the size of the hash.

kstrauser · on Jan 10, 2025

I don't think that matters here, though. This specific hash function is literally used just for putting values in dict buckets. The docs say:

> Hash values are integers. They are used to quickly compare dictionary keys during a dictionary lookup.

They're not to be used as a crypto digest. The downside here, then is that if -1 and -2 are used as dict keys, they'll end up bucketed together. Apparently that hasn't been enough of an issue over the years to bother changing it. One advantage is that those values are common enough that it might break code that incorrectly expects hash() to have predictable outputs. If it used 0xffffffff as the value, lots of busted tests may never stumble across it.

itishappy · on Jan 10, 2025

> The downside here, then is that if -1 and -2 are used as dict keys, they'll end up bucketed together.

Note that they'll only end up bucketed together in the underlying hashmap. The dictionary will still disambiguate the keys.

    >>> a = -1
    >>> b = -2
    >>> {a: a, b: b}
    {-1: -1, -2: -2}

kstrauser · on Jan 10, 2025

Yep, absolutely. End programmers will never see that unless they go digging into the CPython code. It's otherwise invisible.

daxfohl · on Jan 11, 2025

Not only bucketed together, which means they could still be disambiguated in C by the full hash, but they'll actually share the full hash, which means they'd have to jump back into Python code to check `__eq__` to disambiguate IIUC. That seems like a huge perf issue, is it not?

kstrauser · on Jan 11, 2025

Without looking it up, I think that's right, but... that's just kind of inevitable with hash tables. Unless you have a small number of keys and get very lucky, it's exceedingly likely that you'll have multiple keys in the same bucket, and you'll have to use whatever language's `__eq__` equivalent mechanism.

A mitigating factor is that dict keys have to be hashable, which implies immutability, which generally implies a small number of native, fast types like str, numbers, or simple data structures like tuples. You could have some frozendict monstrosity as keys, but 1) that's not something you see often, and 2) don't do that. It you must, define a fast __eq__. For example, an object mapping to a database row might look like:

  def __eq__(self, other):
      if self.pk != other.pk:
          return False # Fail quickly
      # If that's true, only then do a deep comparison
      return self.tuple_of_values == other.tuple_of_values

But again, that's just not really something that's done.

daxfohl · on Jan 11, 2025

Hash *keys* are 8 bytes (assume 64-bit machine). But the Set *buckets*[1] are a masked subset of that, starting at 3 bits and growing with the size of the set. Meaning if you store 16, 8, 32, 24, 0 all in the same set, they'll be in the same bucket (0b000)*. But, they'll still have different hash *keys*, which are stored in the hash table along with the object id. Then if you look up whether 40 is in that set, the C code will look in 0b000 bucket, see the other five values there, but note that each has a different hash *key*, since those are also stored in the table. So it'll know that 40 is not in the set without having to jump up into `__eq__`**.

[1] https://github.com/python/cpython/blob/main/Objects/setobjec...

* Note that the actual implementation will place bucket conflicts in the next available bucket, not the typical "overflow" linked list from CS 102, and does lookups in the same fashion. This approach makes better use of CPU caches.

** For `int` specifically, it's a C class and has no `__eq__` method. But the idea is the same if you have a class with member `i:int` and `__hash__(self): self.i`.

nneonneo · on Jan 10, 2025

I bet -2 is way less common than -1 in a typical codebase, particularly a C one.

Ekaros · on Jan 11, 2025

Are they? At least as keys for most dictionaries. Zero and positive integers sure, but negative ones would be somewhat rare. I could maybe see something like error handling, but then do you need performance there?

TRiG_Ireland · on Jan 10, 2025

Yes, but having result types would avoid the need for this special casing.

Ferret7446 · on Jan 12, 2025

Result types would also make it less efficient.