Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Discussion: Hashing namespace with Hasher classes #4578

Closed
akzhan opened this issue Jun 15, 2017 · 18 comments · Fixed by #4946
Closed

Discussion: Hashing namespace with Hasher classes #4578

akzhan opened this issue Jun 15, 2017 · 18 comments · Fixed by #4946

Comments

@akzhan
Copy link
Contributor

akzhan commented Jun 15, 2017

Currently Crystal ignores existence of HashFlood: hash function for base types is trivial "identity", and for strings it is very simple and easy to forge, and no any seeding performed.

Hash functions should be at least not-trivially forged when random seed used.

Design with separated hasher allows switch between different hash algorithms: default "safer" for public facing applications, and "faster" for non-tainted data.

As proposed by @RX14

class Object
  def hash
    Hashing::StdHasher.build do |hasher|
      hash(hasher)
    end
  end

  def hash(hasher)
    hasher << object_id
  end
end

class MyClass
  property foo : Foo
  property bar : Bar

  def hash(hasher)
    hasher << @foo << @bar
  end
end

Extracted from #4557 discussion because it may be discussed and implemented before.

/cc @RX14, @funny-falcon

@akzhan
Copy link
Contributor Author

akzhan commented Jun 15, 2017

Looks like we need

module Hashing
  alias Type = Int32

  module Build
    def build : Type
      builder = new
      yield builder
      builder.calculate
    end

    def seed
      555
    end
  end

  class StdHasher
    extend Build

    property val
    
    def initialize
      @val = 0
    end

    def <<(value) : self
      self.val += value
      self
    end

    def calculate : Type
      val + self.class.seed
    end
  end

end

hash = Hashing::StdHasher.build do |hasher|
  hasher << 12 << 24
end

p hash

Any suggestions?

@funny-falcon
Copy link
Contributor

Looks a lot like I thought about.

There is also need for random seed. I understood how to generate it with SecureRandom, but I'm in doubdt how to share it between StdHasher and Int32Hasher (special case for Hash(Int32, V)). Can you give me an advice?

@akzhan
Copy link
Contributor Author

akzhan commented Jun 16, 2017

I suppose that You may implement seed in another module like a Build. Extend classes with it and do self.class.seed.

By the way - hashers are reference classes in example above now.

@funny-falcon
Copy link
Contributor

Here is draft of Hashing::StdHasher https://gist.github.com/funny-falcon/52eb622d689a2eae3e2d7b4e4a77d5ac

I strongly miss #1517 for convenience. Had to do wrapper struct holding pointer to implementation to achieve stack allocation + pass-by-pointer + convenient method calling. Compiler is not smart enough to allocate class-object on a stack.

@akzhan
Copy link
Contributor Author

akzhan commented Jun 17, 2017

why not

class String
  def hash(hasher)
    hasher << to_slice
  end
end

also looks like Object.hash() with build already does *.hash with build, so need no duplicated code.

@funny-falcon
Copy link
Contributor

why not

Ah, you are right.

also looks like Object.hash() with build already does *.hash with build, so need no duplicated code.

Sorry, I doesn't understand what you mean :-( This were example that compiles with unmodified master.
Of course I will modify original Object#hash() in pull request, if you mean that.
But still Hash may initiate builder by itself (for example, if per-hash-table seed will be accepted).

@akzhan
Copy link
Contributor Author

akzhan commented Jun 17, 2017

Yes, I mean that.

Please feel free to clone crystal repo, make crystal and use bin/crystal.

It allows to do anything.

btw, I can do a lot with standard Crystal: https://play.crystal-lang.org/#/r/27bo

@funny-falcon
Copy link
Contributor

Yes, I use bin/crystal.

@ysbaddaden
Copy link
Contributor

ysbaddaden commented Jun 18, 2017

Opening issues with code and no further explanations is only asking volonteers to search and seek for what the hell you are trying to solve. Please at least explain the issue, or nobody from the core team will care to take a look.

@funny-falcon
Copy link
Contributor

Explanation:
Currently Crystal ignores existence of HashFlood: hash function for base types is trivial "identity", and for strings it is very simple and easy to forge, and no any seeding performed.

Hash functions should be at least not-trivially forged when random seed used.

Design with def hash(hasher); hasher << @a << @b; end allows switch between different hash algorithms: default "safer" for public facing applications, and "faster" for non-tainted data.

I'm working on an issue, and will do pull request next week, if no one complains against.

@ysbaddaden
Copy link
Contributor

Thanks for taking the time to explain. I agree hash codes as currently implemented are simple but basic and problematic (leading to attack vectors).

allows switch between different hash algorithms, one safer, one faster.

No. There should be a single hash solution, good for all use cases (safety first, then the fastest), implemented just like today: a generic Object#hash implementation, which can be overriden to better fits each struct/class data.

@konovod
Copy link
Contributor

konovod commented Jun 19, 2017

@ysbaddaden, in this case overriding hash would be difficult, as user will have to think about safety and seeding (or just use boilerplate h = Hasher.new; h<<field1...; h.finalize). Design with hash(hasher) looks more convenient even if there will be only one way to hash (and overriding hash directly will still be possible).

@funny-falcon
Copy link
Contributor

After all, hash(hasher) is very reusable : one may use even SHA3 without changing a line, just supplying other hasher.

Though, there is a one pity point: Hash#hash currently key/insertion/iteration order agnostic. It will be quite difficult to preserve this property with hash(hasher). Either one have to sort by keys before hashing, or declare that hashes with different iteration order are different.

@ysbaddaden
Copy link
Contributor

#hash is something nobody should have to care about, even less deal with: the stdlib must do all the job, and do it so well that nobody else than stdlib maintainers have to worry about it. Overiding is for stdlib structs like Float64 or Time may provide specific solutions, if need be.

@RX14
Copy link
Contributor

RX14 commented Jun 19, 2017

@funny-falcon while that might be useful, I think it's a bad idea to overly abstract this and implement some kind of class hierarchy so that you can use multiple hashing algorithms. You should just implement a single Hasher class which implements one good generic hash. If people want to use multiple hashing algorithms they can do so through duck typing.

@RX14
Copy link
Contributor

RX14 commented Jun 19, 2017

@ysbaddaden everyone who defines custom equality also has to define a custom hash, and I think that's pretty common. We even have a macro def_equals_and_hash to make it simple for people to override the equality.

@funny-falcon
Copy link
Contributor

funny-falcon commented Jun 19, 2017

@RX14,

I think it's a bad idea to overly abstract this and implement some kind of class hierarchy so that you can use multiple hashing algorithms.

No one talks about class hierarchy.

If people want to use multiple hashing algorithms they can do so through duck typing.

With def hash(h); h << @a << @b ; end it is quite easy to pass non-default hasher with duck-typing.
Non-default hasher just have to implement following contract:

class NonDefaultHasher
   def <<(v : Nil)
   end
   def <<(v : Int::Primitive)
   end
   def <<(v : Bytes)
   end
   def <<(v)
     v.hash(self)
   end
   # initialize and calculate are omitted, cause they are not part of contract.
end

Without such contract use of duck-typing it is quite hard.

@RX14
Copy link
Contributor

RX14 commented Jun 19, 2017

@funny-falcon yes you're exactly correct. I thought someone had proposed something complex but I was wrong. My bad.

funny-falcon added a commit to funny-falcon/crystal that referenced this issue Jun 25, 2017
To protect against Hash DoS, change the way hash value is computed.
Class|Struct should define method `def hashme(hasher)` and call
`hasher << @ivar` inside.

As an option, for speed, and for backward compatibility, `def hash`
still could be implemented. It will be used for Hash of matched type.
`Thread#hash` and `Enum#hash` is implemented as unseeded cause they are
 used before `StdHasher @@seed` is initialized.

But it is better to implement `def hashme(hasher)`.

StdHasher is default hasher that uses `hashme` and it is used as default
seeded hasher. It also implements `fasthash` used for fast hash of
primitive types, and `unseeded` for `Enums`.

Fixes crystal-lang#4578
Prerequisite for crystal-lang#4557
funny-falcon added a commit to funny-falcon/crystal that referenced this issue Jun 27, 2017
To protect against Hash DoS, change the way hash value is computed.
Class|Struct should define method `def hashme(hasher)` and call
`hasher << @ivar` inside.

As an option, for speed, and for backward compatibility, `def hash`
still could be implemented. It will be used for Hash of matched type.
`Thread#hash` and `Enum#hash` is implemented as unseeded cause they are
 used before `StdHasher @@seed` is initialized.

But it is better to implement `def hashme(hasher)`.

StdHasher is default hasher that uses `hashme` and it is used as default
seeded hasher. It also implements `fasthash` used for fast hash of
primitive types, and `unseeded` for `Enums`.

Fixes crystal-lang#4578
Prerequisite for crystal-lang#4557
funny-falcon added a commit to funny-falcon/crystal that referenced this issue Jul 5, 2017
To protect against Hash DoS, change the way hash value is computed.
Class|Struct should define method `def hash(hasher)` and call
`hasher << @ivar` inside.

As an option, for speed, and for backward compatibility, `def hash`
still could be implemented. It will be used for Hash of matched type.
`Thread#hash` and `Signal#hash` is implemented as unseeded cause they are
 used before `StdHasher @@seed` is initialized.

But it is better to implement `def hash(hasher)`.

StdHasher is default hasher that uses `hash(hasher)` and it is used as default
seeded hasher. It also implements `unseeded` for `Enums`.

Also, number normalization for hashing introduced, ie rule 'equality
forces hash equality' is forced (`a == b` => `a.hash == b.hash`).
Normalization idea is borrowed from Python implementation.

Fixes crystal-lang#4578
Prerequisite for crystal-lang#4557
Replaces crystal-lang#4581
funny-falcon added a commit to funny-falcon/crystal that referenced this issue Jul 5, 2017
To protect against Hash DoS, change the way hash value is computed.
Class|Struct should define method `def hash(hasher)` and call
`hasher << @ivar` inside.

As an option, for speed, and for backward compatibility, `def hash`
still could be implemented. It will be used for Hash of matched type.
`Thread#hash` and `Signal#hash` is implemented as unseeded cause they are
 used before `StdHasher @@seed` is initialized.

But it is better to implement `def hash(hasher)`.

StdHasher is default hasher that uses `hash(hasher)` and it is used as default
seeded hasher. It also implements `unseeded` for `Enums`.

Also, number normalization for hashing introduced, ie rule 'equality
forces hash equality' is forced (`a == b` => `a.hash == b.hash`).
Normalization idea is borrowed from Python implementation.
(idea by Akzhan Abdulin @akzhan)

Fixes crystal-lang#4578
Prerequisite for crystal-lang#4557
Replaces crystal-lang#4581
funny-falcon added a commit to funny-falcon/crystal that referenced this issue Jul 17, 2017
To protect against Hash DoS, change the way hash value is computed.
Class|Struct should define method `def hash(hasher)` and call
`hasher << @ivar` inside.

As an option, for speed, and for backward compatibility, `def hash`
still could be implemented. It will be used for Hash of matched type.
`Thread#hash` and `Signal#hash` is implemented as unseeded cause they are
 used before `StdHasher @@seed` is initialized.

Hash::Hasher is default hasher that uses `hash(hasher)` and it is used as default
seeded hasher.

Also, number normalization for hashing introduced, ie rule 'equality
forces hash equality' is forced (`a == b` => `a.hash == b.hash`).
Normalization idea is borrowed from Python implementation.
It fixes several issues with BigInt and BigFloat on 32bit platform,
but not all issues.

Fixes crystal-lang#4578
Fixes crystal-lang#3932
Prerequisite for crystal-lang#4557
Replaces crystal-lang#4581
Correlates with crystal-lang#4653
funny-falcon added a commit to funny-falcon/crystal that referenced this issue Aug 24, 2017
Prepare hash infrastructor to future change of hashing algrorithm
to protect against Hash DoS.
Class|Struct should define method `def hash(hasher)` and call
`hasher << @ivar` inside.

As an option, for speed, and for backward compatibility, `def hash`
still could be implemented. It will be used for Hash of matched type.
`Thread#hash` and `Signal#hash` is implemented as unseeded cause they are
 used before `StdHasher @@seed` is initialized.

Hash::Hasher is default hasher that uses `hash(hasher)` and it is used as
default seeded hasher.

Also, number normalization for hashing introduced, ie rule 'equality
forces hash equality' is forced (`a == b` => `a.hash == b.hash`).
Normalization idea is borrowed from Python implementation.
It fixes several issues with BigInt and BigFloat on 32bit platform,
but not all issues.

Fixes crystal-lang#4578
Fixes crystal-lang#3932
Prerequisite for crystal-lang#4557
Replaces crystal-lang#4581
Correlates with crystal-lang#4653
funny-falcon added a commit to funny-falcon/crystal that referenced this issue Aug 24, 2017
Prepare hash infrastructor to future change of hashing algrorithm
to protect against Hash DoS.
Class|Struct should define method `def hash(hasher)` and call
`hasher << @ivar` inside.

As an option, for speed, and for backward compatibility, `def hash`
still could be implemented. It will be used for Hash of matched type.
`Thread#hash` and `Signal#hash` is implemented as unseeded cause they are
 used before `StdHasher @@seed` is initialized.

Hash::Hasher is default hasher that uses `hash(hasher)` and it is used as
default seeded hasher.

Also, number normalization for hashing introduced, ie rule 'equality
forces hash equality' is forced (`a == b` => `a.hash == b.hash`).
Normalization idea is borrowed from Python implementation.
It fixes several issues with BigInt and BigFloat on 32bit platform,
but not all issues.

Fixes crystal-lang#4578
Fixes crystal-lang#3932
Prerequisite for crystal-lang#4557
Replaces crystal-lang#4581
Correlates with crystal-lang#4653
funny-falcon added a commit to funny-falcon/crystal that referenced this issue Aug 24, 2017
Prepare hash infrastructor to future change of hashing algrorithm
to protect against Hash DoS.
Class|Struct should define method `def hash(hasher)` and call
`hasher << @ivar` inside.

As an option, for speed, and for backward compatibility, `def hash`
still could be implemented. It will be used for Hash of matched type.
`Thread#hash` and `Signal#hash` is implemented as unseeded cause they are
 used before `StdHasher @@seed` is initialized.

Hash::Hasher is default hasher that uses `hash(hasher)` and it is used as
default seeded hasher.

Also, number normalization for hashing introduced, ie rule 'equality
forces hash equality' is forced (`a == b` => `a.hash == b.hash`).
Normalization idea is borrowed from Python implementation.
It fixes several issues with BigInt and BigFloat on 32bit platform,
but not all issues.

Fixes crystal-lang#4578
Fixes crystal-lang#3932
Prerequisite for crystal-lang#4557
Replaces crystal-lang#4581
Correlates with crystal-lang#4653
funny-falcon added a commit to funny-falcon/crystal that referenced this issue Sep 3, 2017
Prepare hash infrastructor to future change of hashing algrorithm
to protect against Hash DoS.
Class|Struct should define method `def hash(hasher)` and call
`hasher << @ivar` inside.

As an option, for speed, and for backward compatibility, `def hash`
still could be implemented. It will be used for Hash of matched type.
`Thread#hash` and `Signal#hash` is implemented as unseeded cause they are
 used before `StdHasher @@seed` is initialized.

Hash::Hasher is default hasher that uses `hash(hasher)` and it is used as
default seeded hasher.

Also, number normalization for hashing introduced, ie rule 'equality
forces hash equality' is forced (`a == b` => `a.hash == b.hash`).
Normalization idea is borrowed from Python implementation.
It fixes several issues with BigInt and BigFloat on 32bit platform,
but not all issues.

Fixes crystal-lang#4578
Fixes crystal-lang#3932
Prerequisite for crystal-lang#4557
Replaces crystal-lang#4581
Correlates with crystal-lang#4653
funny-falcon added a commit to funny-falcon/crystal that referenced this issue Sep 3, 2017
Prepare hash infrastructor to future change of hashing algrorithm
to protect against Hash DoS.
Class|Struct should define method `def hash(hasher)` and call
`hasher << @ivar` inside.

As an option, for speed, and for backward compatibility, `def hash`
still could be implemented. It will be used for Hash of matched type.
`Thread#hash` and `Signal#hash` is implemented as unseeded cause they are
 used before `StdHasher @@seed` is initialized.

Hash::Hasher is default hasher that uses `hash(hasher)` and it is used as
default seeded hasher.

Also, number normalization for hashing introduced, ie rule 'equality
forces hash equality' is forced (`a == b` => `a.hash == b.hash`).
Normalization idea is borrowed from Python implementation.
It fixes several issues with BigInt and BigFloat on 32bit platform,
but not all issues.

Fixes crystal-lang#4578
Fixes crystal-lang#3932
Prerequisite for crystal-lang#4557
Replaces crystal-lang#4581
Correlates with crystal-lang#4653
funny-falcon added a commit to funny-falcon/crystal that referenced this issue Sep 3, 2017
Prepare hash infrastructor to future change of hashing algrorithm
to protect against Hash DoS.
Class|Struct should define method `def hash(hasher)` and call
`hasher << @ivar` inside.

As an option, for speed, and for backward compatibility, `def hash`
still could be implemented. It will be used for Hash of matched type.
`Thread#hash` and `Signal#hash` is implemented as unseeded cause they are
 used before `StdHasher @@seed` is initialized.

Hash::Hasher is default hasher that uses `hash(hasher)` and it is used as
default seeded hasher.

Also, number normalization for hashing introduced, ie rule 'equality
forces hash equality' is forced (`a == b` => `a.hash == b.hash`).
Normalization idea is borrowed from Python implementation.
It fixes several issues with BigInt and BigFloat on 32bit platform,
but not all issues.

Fixes crystal-lang#4578
Fixes crystal-lang#3932
Prerequisite for crystal-lang#4557
Replaces crystal-lang#4581
Correlates with crystal-lang#4653
funny-falcon added a commit to funny-falcon/crystal that referenced this issue Sep 3, 2017
Prepare hash infrastructor to future change of hashing algrorithm
to protect against Hash DoS.
Class|Struct should define method `def hash(hasher)` and call
`hasher << @ivar` inside.

As an option, for speed, and for backward compatibility, `def hash`
still could be implemented. It will be used for Hash of matched type.
`Thread#hash` and `Signal#hash` is implemented as unseeded cause they are
 used before `StdHasher @@seed` is initialized.

Hash::Hasher is default hasher that uses `hash(hasher)` and it is used as
default seeded hasher.

Also, number normalization for hashing introduced, ie rule 'equality
forces hash equality' is forced (`a == b` => `a.hash == b.hash`).
Normalization idea is borrowed from Python implementation.
It fixes several issues with BigInt and BigFloat on 32bit platform,
but not all issues.

Fixes crystal-lang#4578
Fixes crystal-lang#3932
Prerequisite for crystal-lang#4557
Replaces crystal-lang#4581
Correlates with crystal-lang#4653
funny-falcon added a commit to funny-falcon/crystal that referenced this issue Sep 10, 2017
Prepare hash infrastructor to future change of hashing algrorithm
to protect against Hash DoS.
Class|Struct should define method `def hash(hasher)` and call
`hasher << @ivar` inside.

As an option, for speed, and for backward compatibility, `def hash`
still could be implemented. It will be used for Hash of matched type.
`Thread#hash` and `Signal#hash` is implemented as unseeded cause they are
 used before `StdHasher @@seed` is initialized.

Hash::Hasher is default hasher that uses `hash(hasher)` and it is used as
default seeded hasher.

Also, number normalization for hashing introduced, ie rule 'equality
forces hash equality' is forced (`a == b` => `a.hash == b.hash`).
Normalization idea is borrowed from Python implementation.
It fixes several issues with BigInt and BigFloat on 32bit platform,
but not all issues.

Fixes crystal-lang#4578
Fixes crystal-lang#3932
Prerequisite for crystal-lang#4557
Replaces crystal-lang#4581
Correlates with crystal-lang#4653
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants