Hacker Newsnew | past | comments | ask | show | jobs | submitlogin

In a way this is an argument for languages where it's normal to have result-types [0] or exceptions, instead of -1 etc. It just feels wrong to have any potential collision or confusion between normal results and error results.

[0] https://en.wikipedia.org/wiki/Result_type




Clears throat the precise wiki page you're looking for is the Semipredicate Problem

https://en.wikipedia.org/wiki/Semipredicate_problem


-1 is in-band signaling; but it steals a value.


We all know that types are not Pythonic.

(I’m only mostly joking)


Ackchyually, Python has pretty strong typing, as far as dynamic languages go.


If you use the actual type system and something like mypy, Python is a joy work with. (For my definition of joy, which includes static typing).


Actually writing Python is reasonably nice in that case.

But dealing with Python infrastructure is so awful as to make the whole experience just bad.

uv fixes a lot of that, but I think it will be some time before it's used everywhere, and I have zero hope that the Python devs will ever do the right thing and officially replace Pip with uv.


you also have to trawl through the flags to make sure it actually checks types, e.g. check_untyped_defs


Sure. Everything is a PyObject.


Every PyObject structure has a ob_type pointer.


Every Object in Java has a getClass method. If we changed all the static type information in Java to the `Object` type, that'd be pretty close to Python's case.

GP's post is probably the "unityped" critique of dynamically typed languages by Robert Harper: https://existentialtype.wordpress.com/2011/03/19/dynamic-lan...


Thank you for sharing. I hadn't read that critique but I am in wholehearted agreement. Dynamic typing imposes significant cognitive overhead in exchange the privilege of letting you writing incorrect programs.


1. But, Python has exceptions! The underlying C language doesn't, but Python's run-time has them and can use them in the C code.

2. It may be an argument for ensuring that absolutely everything that is an object can hash: the object hasher must not have error states. Nobody wants the overhead of a simple hash code being wrapped in a result type.


Not really, I don’t think. Python code doesn’t ever really use hash() for anything where specific outputs are expected. Even if hash(a)==hash(b), it’s not implied that a==b or a is b.


But -2 seems like just a bad choice here. -1 and -2 are very "common" integers. Seems they could have just picked a quasi-random large integer to reduce the likelihood of collision. I expect hash functions to have collisions, but I expect it to be "unlikely" in a way that scales with the size of the hash.


I don't think that matters here, though. This specific hash function is literally used just for putting values in dict buckets. The docs say:

> Hash values are integers. They are used to quickly compare dictionary keys during a dictionary lookup.

They're not to be used as a crypto digest. The downside here, then is that if -1 and -2 are used as dict keys, they'll end up bucketed together. Apparently that hasn't been enough of an issue over the years to bother changing it. One advantage is that those values are common enough that it might break code that incorrectly expects hash() to have predictable outputs. If it used 0xffffffff as the value, lots of busted tests may never stumble across it.


> The downside here, then is that if -1 and -2 are used as dict keys, they'll end up bucketed together.

Note that they'll only end up bucketed together in the underlying hashmap. The dictionary will still disambiguate the keys.

    >>> a = -1
    >>> b = -2
    >>> {a: a, b: b}
    {-1: -1, -2: -2}


Yep, absolutely. End programmers will never see that unless they go digging into the CPython code. It's otherwise invisible.


Not only bucketed together, which means they could still be disambiguated in C by the full hash, but they'll actually share the full hash, which means they'd have to jump back into Python code to check `__eq__` to disambiguate IIUC. That seems like a huge perf issue, is it not?


Without looking it up, I think that's right, but... that's just kind of inevitable with hash tables. Unless you have a small number of keys and get very lucky, it's exceedingly likely that you'll have multiple keys in the same bucket, and you'll have to use whatever language's `__eq__` equivalent mechanism.

A mitigating factor is that dict keys have to be hashable, which implies immutability, which generally implies a small number of native, fast types like str, numbers, or simple data structures like tuples. You could have some frozendict monstrosity as keys, but 1) that's not something you see often, and 2) don't do that. It you must, define a fast __eq__. For example, an object mapping to a database row might look like:

  def __eq__(self, other):
      if self.pk != other.pk:
          return False # Fail quickly
      # If that's true, only then do a deep comparison
      return self.tuple_of_values == other.tuple_of_values
But again, that's just not really something that's done.


Hash *keys* are 8 bytes (assume 64-bit machine). But the Set *buckets*[1] are a masked subset of that, starting at 3 bits and growing with the size of the set. Meaning if you store 16, 8, 32, 24, 0 all in the same set, they'll be in the same bucket (0b000)*. But, they'll still have different hash *keys*, which are stored in the hash table along with the object id. Then if you look up whether 40 is in that set, the C code will look in 0b000 bucket, see the other five values there, but note that each has a different hash *key*, since those are also stored in the table. So it'll know that 40 is not in the set without having to jump up into `__eq__`**.

[1] https://github.com/python/cpython/blob/main/Objects/setobjec...

* Note that the actual implementation will place bucket conflicts in the next available bucket, not the typical "overflow" linked list from CS 102, and does lookups in the same fashion. This approach makes better use of CPU caches.

** For `int` specifically, it's a C class and has no `__eq__` method. But the idea is the same if you have a class with member `i:int` and `__hash__(self): self.i`.


I bet -2 is way less common than -1 in a typical codebase, particularly a C one.


Are they? At least as keys for most dictionaries. Zero and positive integers sure, but negative ones would be somewhat rare. I could maybe see something like error handling, but then do you need performance there?


Yes, but having result types would avoid the need for this special casing.


Result types would also make it less efficient.




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: