Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use functor to generate the hash modules, add MD5 digest. #23

Open
wants to merge 20 commits into
base: master
Choose a base branch
from

Conversation

madroach
Copy link
Contributor

Hi,

here are some big changes. I trimmed down the stub code to the minimum algorithmic code and imlemented as much in OCaml as possible. To reduce redundancy I put the code in a functor which takes the stubs as parameter and generates the hash modules (Sha1, MD5...). Then I added the MD5 digest using the C functions already present in the OCaml runtime.
I also fixed a bug in commit fe6dba9.

I could take maintainership of the project, but would like to move it to BSD or ISC license.

Christopher

@madroach
Copy link
Contributor Author

madroach commented Oct 8, 2017

I obviosly would agree licensing my commits to ISC or BSD.

@talex5
Copy link
Contributor

talex5 commented Jan 2, 2018

@djs55 this is a big PR, but I think we should cherry-pick the memory-corruption fix fe6dba9 at least. I've made a separate PR #36 for this.

@djs55
Copy link
Owner

djs55 commented Jan 2, 2018

Thanks for this -- I completely approve of the goal of implementing as much in OCaml as possible!

Sorry for the conflicts -- would you be able to rebase this on top of current master?

@madroach
Copy link
Contributor Author

madroach commented Jan 6, 2018

This is not finished yet. I still may have to merge some changes to the imlementation files that happened since I converted them to a single functor

@madroach madroach force-pushed the master branch 7 times, most recently from 922126d to aaae5d4 Compare January 6, 2018 13:03
@madroach
Copy link
Contributor Author

madroach commented Jan 6, 2018

Ok. I believe this is ready to me merged. The changes to the repo since I opened the pull requested were already included in my functor.
Still unfinished is the file_fast function. I would leave it as is and remove the C stub code. If this isn't fast enough we could switch from channels to Unix.read.

Christopher Zimmermann added 4 commits January 8, 2018 18:54
* add back the file_fast function implemented in C
* also add a file_unbuffered variant which passes the opened file descriptor to C to avoid channel buffering and string copying.
* add benchmark:

benchmark for test/sample.txt:

                    Rate                  file file_unbuffered       file_fast
           file  28159+- 290/s              --            -85%            -89%
file_unbuffered 190145+-4235/s            575%              --            -26%
      file_fast 257396+-2014/s            814%             35%              --
@madroach
Copy link
Contributor Author

madroach commented Jan 8, 2018

I added back the file_fast function and benchmarked it against an unbuffered implementation with more code in the functor. The results for the very small sample.txt is below. Please note that the benchmark is run on thousands of times on the very same file so the file is buffered by the OS.

                    Rate                  file file_unbuffered       file_fast
           file  28795+- 542/s              --            -85%            -89%
file_unbuffered 192040+-5202/s            567%              --            -27%
      file_fast 261630+-7906/s            809%             36%              --

addendum - benchmark 610kb file, too:

                 Rate    file_unbuffered       file_fast            file
file_unbuffered 299+-3/s              --           [-0%]             -4%
      file_fast 299+-4/s            [0%]              --             -4%
           file 311+-3/s              4%              4%              --

this tells me the real performance gain of file_fast and file_unbuffered comes from the overhead of Unix.openfile and open_in.
Since any implementation is ridiculously fast on small files and there is no difference for big files I would suggest dropping the file_fast and file_unbuffered stubs. Then we can also drop the dependency on Unix.

@XVilka
Copy link

XVilka commented Nov 25, 2019

Was anything decided about this?

Christopher Zimmermann added 4 commits November 29, 2019 19:13
bounds check on update_bigstring was too strict.
update_substring did not check for negative length.
Christopher Zimmermann added 2 commits April 27, 2020 10:54
(do I break backwards-compatibility to a relevant Ocaml release?)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants