Use functor to generate the hash modules, add MD5 digest. #23

madroach · 2017-01-18T20:24:00Z

Hi,

here are some big changes. I trimmed the stub code down to the minimal algorithmic code and implemented as much as possible in OCaml. To reduce redundancy, I put the code in a functor that takes the stubs as a parameter and generates the hash modules (Sha1, MD5, ...). Then I added the MD5 digest using the C functions already present in the OCaml runtime.
I also fixed a bug in commit fe6dba9.
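A minimal sketch of the functor approach, assuming a hypothetical stub signature: `STUBS`, `HASH`, `Make`, `unsafe_update_substring` and `Md5_stubs` are illustrative names, not the library's actual API. The functor writes the OCaml-side logic once (bounds checking, whole-string hashing, hex encoding), and each algorithm only supplies its minimal stubs. To stay self-contained, `Md5_stubs` is a stand-in that buffers the input and calls the stdlib `Digest.string` at the end instead of binding to the incremental C functions in the runtime.

```ocaml
(* Hypothetical signature of the per-algorithm C stubs: only the algorithmic core. *)
module type STUBS = sig
  type ctx
  val init : unit -> ctx
  (* Assumed unchecked: the caller guarantees pos/len are in bounds. *)
  val unsafe_update_substring : ctx -> string -> int -> int -> unit
  val finalize : ctx -> string  (* raw digest bytes *)
end

(* Uniform interface produced for every hash module (Sha1, MD5, ...). *)
module type HASH = sig
  type ctx
  val init : unit -> ctx
  val update_substring : ctx -> string -> int -> int -> unit
  val update_string : ctx -> string -> unit
  val finalize : ctx -> string
  val string : string -> string
  val to_hex : string -> string
end

module Make (S : STUBS) : HASH = struct
  type ctx = S.ctx

  let init = S.init

  (* Checked once in OCaml, so the stubs need no bounds logic.
     Rejects a negative pos or len and ranges past the end of [s]. *)
  let update_substring ctx s pos len =
    if pos < 0 || len < 0 || pos > String.length s - len then
      invalid_arg "update_substring";
    S.unsafe_update_substring ctx s pos len

  let update_string ctx s = update_substring ctx s 0 (String.length s)

  let finalize = S.finalize

  (* One-shot digest of a whole string. *)
  let string s =
    let ctx = init () in
    update_string ctx s;
    finalize ctx

  (* Lowercase hex encoding of a raw digest. *)
  let to_hex d =
    String.concat ""
      (List.init (String.length d) (fun i -> Printf.sprintf "%02x" (Char.code d.[i])))
end

(* Stand-in stubs: buffer everything, then hash with the stdlib Digest (MD5). *)
module Md5_stubs : STUBS = struct
  type ctx = Buffer.t
  let init () = Buffer.create 64
  let unsafe_update_substring ctx s pos len = Buffer.add_substring ctx s pos len
  let finalize ctx = Digest.string (Buffer.contents ctx)
end

module Md5 = Make (Md5_stubs)
```

Because the check lives in the functor, every generated module gets the same validation, including rejecting a negative `len`, which is the case the later "fix bounds check" commit mentions for `update_substring`.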

I could take over maintainership of the project, but would like to move it to a BSD or ISC license.

Christopher

madroach · 2017-10-08T16:41:43Z

I would obviously agree to license my commits under ISC or BSD.

talex5 · 2018-01-02T14:33:14Z

@djs55 this is a big PR, but I think we should cherry-pick the memory-corruption fix fe6dba9 at least. I've made a separate PR #36 for this.

djs55 · 2018-01-02T21:04:53Z

Thanks for this -- I completely approve of the goal of implementing as much in OCaml as possible!

Sorry for the conflicts -- would you be able to rebase this on top of current master?

madroach · 2018-01-06T08:43:41Z

This is not finished yet. I may still have to merge some changes to the implementation files that were made after I converted them to a single functor.

madroach · 2018-01-06T14:16:39Z

Ok. I believe this is ready to be merged. The changes made to the repo since I opened the pull request were already included in my functor.
The file_fast function is still unfinished. I would leave it as is and remove the C stub code. If this isn't fast enough, we could switch from channels to Unix.read.

* add back the file_fast function implemented in C
* also add a file_unbuffered variant, which passes the opened file descriptor to C to avoid channel buffering and string copying
* add benchmark; results for test/sample.txt:

                      Rate            file file_unbuffered       file_fast
           file  28159+- 290/s              --            -85%            -89%
file_unbuffered 190145+-4235/s            575%              --            -26%
      file_fast 257396+-2014/s            814%             35%              --

madroach · 2018-01-08T22:46:46Z

I added back the file_fast function and benchmarked it against an unbuffered implementation that keeps more of the code in the functor. The results for the very small sample.txt are below. Note that the benchmark runs thousands of times on the very same file, so the file is cached by the OS.

                    Rate                  file file_unbuffered       file_fast
           file  28795+- 542/s              --            -85%            -89%
file_unbuffered 192040+-5202/s            567%              --            -27%
      file_fast 261630+-7906/s            809%             36%              --
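To read the table: each cell compares the row's rate with the column's rate. Assuming the percentages are (row rate / column rate - 1) × 100, rounded to whole percent, the sketch below reproduces every cell above; `relative_pct` is a hypothetical helper, not part of the benchmark tooling.

```ocaml
(* Relative difference, in whole percent, between a row's rate and a column's
   rate: (row / col - 1) * 100, rounded half away from zero. *)
let relative_pct ~row ~col =
  int_of_float (Float.round ((row /. col -. 1.0) *. 100.0))

(* Rates (runs per second) from the sample.txt table. *)
let rates =
  [ ("file", 28795.0); ("file_unbuffered", 192040.0); ("file_fast", 261630.0) ]

(* Print every off-diagonal cell of the comparison matrix. *)
let print_matrix () =
  List.iter
    (fun (rname, r) ->
      List.iter
        (fun (cname, c) ->
          if rname <> cname then
            Printf.printf "%s vs %s: %+d%%\n" rname cname (relative_pct ~row:r ~col:c))
        rates)
    rates
```

For example, file_fast vs file gives 261630 / 28795 ≈ 9.09, i.e. +809%, while file vs file_fast gives ≈ 0.11, i.e. -89%; the two directions are not symmetric.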

Addendum: the same benchmark on a 610 KB file:

                 Rate    file_unbuffered       file_fast            file
file_unbuffered 299+-3/s              --           [-0%]             -4%
      file_fast 299+-4/s            [0%]              --             -4%
           file 311+-3/s              4%              4%              --

This tells me that the real performance gain of file_fast and file_unbuffered comes from the per-call overhead of opening the file (Unix.openfile and open_in), not from the per-byte hashing.
Since any implementation is ridiculously fast on small files and there is no difference for big files, I would suggest dropping the file_fast and file_unbuffered stubs. Then we can also drop the dependency on Unix.
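A minimal sketch of what a pure-OCaml `file` function without the Unix dependency could look like, reading through a stdlib `in_channel` in fixed-size chunks. The `hasher` record and `hash_file` are hypothetical names, not the library's API; in the real code the loop would call the functor-generated update function. The `md5` stand-in accumulates the whole file in a `Buffer` and calls `Digest.string`, so unlike true incremental stubs it is not constant-memory.

```ocaml
(* Hypothetical minimal hashing interface needed by the loop. *)
type 'ctx hasher = {
  init : unit -> 'ctx;
  update_subbytes : 'ctx -> bytes -> int -> int -> unit;
  finalize : 'ctx -> string;
}

(* Hash a file through a stdlib in_channel in 64 KiB chunks; no Unix needed. *)
let hash_file (h : 'ctx hasher) path =
  let ic = open_in_bin path in
  Fun.protect ~finally:(fun () -> close_in ic) (fun () ->
    let ctx = h.init () in
    let buf = Bytes.create 65536 in
    (* [input] returns 0 only at end of file. *)
    let rec loop () =
      let n = input ic buf 0 (Bytes.length buf) in
      if n > 0 then begin
        h.update_subbytes ctx buf 0 n;
        loop ()
      end
    in
    loop ();
    h.finalize ctx)

(* Stand-in MD5 hasher: accumulates the whole file, then calls Digest.string. *)
let md5 : Buffer.t hasher = {
  init = (fun () -> Buffer.create 4096);
  update_subbytes = Buffer.add_subbytes;
  finalize = (fun b -> Digest.string (Buffer.contents b));
}
```

The channel already buffers reads, so the only per-file cost left is `open_in_bin` itself, which matches the observation that the stubs only pay off on tiny files.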

XVilka · 2019-11-25T13:26:33Z

Was anything decided about this?


talex5 mentioned this pull request Jan 2, 2018

Don't access caml values without acquired runtime #36

Merged

Christopher Zimmermann added 5 commits January 4, 2018 20:01

Fix building on BSD

37b358f

who have their own swap32 macros.

Fix incompatible types in sha1.{ml,mli}

5f5e1cb

fix bigarray types

13c3f34

Implement context copying by Obj.dup instead C stubs

a296d63

Use functor to generate uniform hash modules from stubs

ba594d6

madroach force-pushed the master branch from 18985bc to 49f5095 January 6, 2018 08:41

madroach force-pushed the master branch 7 times, most recently from 922126d to aaae5d4 January 6, 2018 13:03

Add md5 digest

c15de0c

madroach force-pushed the master branch from aaae5d4 to c15de0c January 6, 2018 13:53

Christopher Zimmermann added 4 commits January 8, 2018 18:54

don't duplicate signatures

7cba3ec

Add update_bigstring with pos and len arguments

edcc9f8

reindent

6ad6300

Christopher Zimmermann added 4 commits January 9, 2018 00:07

use sufficiently large int in update_fd counter

a589c48

fix signedness in update_fd buffer

a587e69

add update_bigstring to signature

dc0f629

add test dependency on benchmark

ee3edad

Christopher Zimmermann added 4 commits November 29, 2019 19:13

fix bounds check

1bd0169

bounds check on update_bigstring was too strict. update_substring did not check for negative length.

fix building on Ocaml > 4.07

d218a42

test on OCaml 4.09, too

9129c6c

jbuilder -> dune

6040cd2

madroach force-pushed the master branch from 312ad3a to a17667c April 27, 2020 08:38

Christopher Zimmermann added 2 commits April 27, 2020 10:54

Merge remote-tracking branch 'upstream/master'

1668b73

Use Bytes_val instead of String_val to fix warning

cefe782

(do I break backwards-compatibility to a relevant Ocaml release?)

madroach force-pushed the master branch from a17667c to cefe782 April 27, 2020 08:55
