Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Hashtable-related issues #239

Closed
Daniel-Cortez opened this issue Jan 6, 2018 · 4 comments
Closed

Hashtable-related issues #239

Daniel-Cortez opened this issue Jan 6, 2018 · 4 comments

Comments

@Daniel-Cortez
Copy link
Contributor

Hashtable size

In releases 3.10.4 through 3.10.6 the hashmap used 2^23 slots (which is actually 2^24 since the hashmap multiplies the specified number of slots and rounds it to the nearest power of 2 internally) which resulted into ~128 Mb RAM usage.
Although it was kinda fixed in 6d65605, the fix is not exactly correct.
https://github.com/Southclaws/pawn/blob/6d65605bb02d3bc4eb7c5d0344bd4ea934fa79de/source/compiler/sc1.c#L940
As I already mentioned, the number of slots is a power of 2, so the correct number here should be 16384, not 10000.

License compliance

The current hashmap implementation is released under the terms of the MIT license which requires the following:
https://github.com/Southclaws/pawn/blob/6d65605bb02d3bc4eb7c5d0344bd4ea934fa79de/source/compiler/hashmap/LICENSE#L12-L13
This means if the hashtable code is included into the compiler, so must be the license text ("permission") and copyright notice. But there's no definition of the term "software" in that text, so it can mean both the source code and the binary release, which means we have to
a) add a --licenses compiler option or something similar to list the licenses and copyright notices for all third-party code used in the compiler (or at leased for the code under licenses requiring to do so),
or
b) add a LICENSES.txt file and ship it with the binaries. Also the end user would be required to drop the *.txt file into the pawno folder along with the binaries and keep with them in order to "legally" use the compiler (they use the compiler, which means they use the hashtable code and so must comply to the terms of the MIT license as well).

Of course, I'm not a lawyer and I can be wrong - although I really doubt about the latter, the MIT license is simple enough to read and understand, unlike some complex licenses like MPL or GNU GPL.
I'm just trying to make sure we comply all licenses for the third-party OSS code used in the compiler, that's all.

@Daniel-Cortez
Copy link
Contributor Author

Daniel-Cortez commented Jan 6, 2018

Ok, so in order to sort out the abovementioned problems I made a patch, in which I switched to a new hashtable implementation.
While reimplementing all hashtable operations in the compiler, I removed the symbol2 struct which linked together symbols with the same name in the hashtable. Instead of using it I added an htnext variable to the symbol struct to keep everything in one place, in a more cache-friendly way.

Since the new hashtable implementation is dual-licensed (MIT/public domain), there's no requirement to add its license to the compiler binaries. We can still add the LICENSES file later (for ethical reasons and whatsoever), but after merging this PR we won't be strictly required to do so.

Also, as a bonus, we can use VS10 again when the patch is merged.

Tested the patch on Open-GTO, the *.amx files generated before and after the patch are byte-identical.

@Zeex
Copy link
Contributor

Zeex commented Jan 7, 2018

The size passed to hashmap_init is not required to be a power of 2; it's the final calculated size of the table that must be a power of 2 like it says here:

https://github.com/Southclaws/pawn/blob/master/source/compiler/hashmap/hashmap.c#L63-L69

It's always multiplied by 0.75 and converted to a power of 2 automatically.

Edit:

I was a little wrong, it's not multiplied by 0.75 but rather by 4/3, so that the old/new factor is 0.75.

@Daniel-Cortez
Copy link
Contributor Author

Daniel-Cortez commented Jan 7, 2018

The size passed to hashmap_init is not required to be a power of 2

Well, I never said it's required to be a power of 2; passing 10000 to hashmap_init results in 16384 slots anyway. Passing (16384 / 4 * 3) instead seems to be more right, but it's probably just a matter of taste.

It's always multiplied by 0.75 and converted to a power of 2 automatically.

Yes, I am aware of it, just forgot about it for a second when I was writing the 1'st post. I took that into account in the abovementioned PR though; the only difference is that the new hashtable implementation multiplies the passed size by 3 / 2 instead of 4 / 3.

https://github.com/Daniel-Cortez/pawn-3.10/blob/8b3583e7b5d4974a43501002a7d85d7c3c189093/source/compiler/sc1.c#L940

@Zeex
Copy link
Contributor

Zeex commented Jan 7, 2018

OK then, no worries. The new implementation looks better (I didn't like that symbol2 struct too).

Thanks for the patch! 👍

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants