Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Broken hash function #15

Open
kaplun opened this issue Sep 8, 2014 · 11 comments
Open

Broken hash function #15

kaplun opened this issue Sep 8, 2014 · 11 comments
Milestone

Comments

@kaplun
Copy link
Member

kaplun commented Sep 8, 2014

In [7]: from intbitset import intbitset

In [8]: hash(intbitset([1,2,3]))
-6501568158668200174

In [9]: hash(intbitset([1,2,3]))
-6501568158668200174

In [10]: hash(intbitset([1,2,5]))
-7106116959979656662

In [11]: hash(intbitset([1,2,5,1000]))
-7106116959979656662
@kaplun kaplun added this to the v2.2 milestone Sep 8, 2014
@kaplun
Copy link
Member Author

kaplun commented Sep 9, 2014

A possible implementation is based on taking the hash of the (uncompressed) output of fastdump(), however, this might be different for intbitsets having being constructed with trailing_bits=True.

@jirikuncar
Copy link
Member

this might be different for intbitsets having being constructed with trailing_bits=True.

The question is should or should NOT be the such hashes the same?

>>> from intbitset import intbitset
>>> a = intbitset([50,60,100], trailing_bits=False)
>>> b = intbitset([50,60,100], trailing_bits=True)
>>> hash(a) == hash(b)
?

@kaplun
Copy link
Member Author

kaplun commented Sep 9, 2014

I believe they should be really different. Having trailing_bits=True means having an infinite set with infinite bits set to True.

The current issue I am having with the implementation I found so far is that the hash of an infinite intbitset from which I remove and then add a bit in the middle is different from the hash of the original intbitset which is not good, since they are semantically identical (albeit they might have some differences in the way they have allocated memory)

@jirikuncar
Copy link
Member

"Can you compare to infinities?"

@kaplun
Copy link
Member Author

kaplun commented Sep 9, 2014

Sure :) 1, 2, 4, 5, 6, 7, 8, 9... != 1, 3, 4, 5, 6, 7, 8, 9... :-) They have a start after all :)

@jirikuncar
Copy link
Member

Simple answer: raise exception if trailing_bits == True otherwise you can use hash of list(intbitset(...)).

I would rather see intbitset and frozenintbitset as alternative to set and frozenset (https://docs.python.org/2/library/stdtypes.html#set-types-set-frozenset).

The set type is mutable — the contents can be changed using methods like add() and remove(). Since it is mutable, it has no hash value and cannot be used as either a dictionary key or as an element of another set.

Broken hash function

It's not broken ... it should not have been implemented in first place.

@kaplun
Copy link
Member Author

kaplun commented Sep 9, 2014

Broken hash function

It's not broken ... it should not have been implemented in first place.

Read it as: buggy! There's no problem in having a hash function for intbitset. Simply the current implementation has a bug.

An intbiset is indeed mutable, but, is more similar to a string (from a data-perspective), since it is simply a container of bits, not of pointers. So as soon as you alter a bit, you obtain a different intbitset, and therefore hash. So it is currently (minus the hash-bug) possible to retrieve in a predictable way the value of foo[intbitset([1, 2, 3])] where foo is a mapping.

@jirikuncar
Copy link
Member

So it is currently (minus the hash-bug) possible to retrieve in a predictable way the value of foo[intbitset([1, 2, 3])] where foo is a mapping.

Is there a use-case for it? Do you really want to make hash on large byte arrays (5MB+)?

@kaplun
Copy link
Member Author

kaplun commented Sep 9, 2014

The use cases are the same of frozenset. It's a generic datatype. Who knows the future use-cases. Let me just fix this hashing functions 😜

@kaplun
Copy link
Member Author

kaplun commented Sep 9, 2014

Anyway, I like your proposal for the frozenintbitset. That's probably more clean. and can be the subject of another ticket...

@kaplun kaplun modified the milestones: v2.2, v2.3 Sep 4, 2015
@kaplun kaplun mentioned this issue May 4, 2016
@pombredanne pombredanne removed this from the v2.3 milestone Dec 20, 2019
@pombredanne
Copy link
Collaborator

@jirikuncar wrote

Broken hash function

It's not broken ... it should not have been implemented in first place.

I agree. Either we are hashable and immutable or we are mutable and not hashable and IMHO anything else is an oddity.

@pombredanne pombredanne added this to the v2.5 milestone Dec 20, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants