Jeffhostetler/memihash perf #964
Remove duplicate memihash() call in hash_dir_entry(). The existing code called memihash() to compute the hash for find_dir_entry() and, if the entry was not found, called memihash() again for hashmap_add(). Signed-off-by: Jeff Hostetler <[email protected]>
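A minimal sketch of the compute-once pattern, assuming a hypothetical chained table (`struct entry`, `toy_memihash()`, `find_with_hash()` and `find_or_add()` are illustrative names, not git's API). The hash is computed a single time and reused for both the lookup and the insert, so a miss no longer pays for a second hash:

```c
#include <stdlib.h>
#include <strings.h>

#define NBUCKETS 256

/* Hypothetical toy table entry; not git's struct dir_entry. */
struct entry {
	struct entry *next;
	unsigned int hash;
	const char *name;
	size_t namelen;
};

static struct entry *buckets[NBUCKETS];
static unsigned int hash_calls; /* counts how many times we hash */

/* Case-insensitive FNV-1 style hash, counted for illustration. */
static unsigned int toy_memihash(const char *buf, size_t len)
{
	unsigned int hash = 0x811c9dc5;

	hash_calls++;
	while (len--) {
		unsigned int c = (unsigned char)*buf++;
		if (c >= 'a' && c <= 'z')
			c -= 'a' - 'A';
		hash = (hash * 0x01000193) ^ c;
	}
	return hash;
}

/* Lookup that takes an already-computed hash instead of recomputing it. */
static struct entry *find_with_hash(unsigned int hash, const char *name, size_t len)
{
	struct entry *e;

	for (e = buckets[hash % NBUCKETS]; e; e = e->next)
		if (e->hash == hash && e->namelen == len &&
		    !strncasecmp(e->name, name, len))
			return e;
	return NULL;
}

/* Hash once; reuse the value for both the lookup and the insert. */
static struct entry *find_or_add(const char *name, size_t len)
{
	unsigned int hash = toy_memihash(name, len);
	struct entry *e = find_with_hash(hash, name, len);

	if (!e) {
		e = calloc(1, sizeof(*e));
		if (!e)
			return NULL;
		e->hash = hash;
		e->name = name;
		e->namelen = len;
		e->next = buckets[hash % NBUCKETS];
		buckets[hash % NBUCKETS] = e;
	}
	return e;
}
```

Looking up "Dir" and then "dir" returns the same entry and performs exactly two hash computations, one per call, instead of three.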
Add a variant of memihash() that allows the hash computation to be continued. There are times when we compute the hash on a full path and then the hash on just the path to the parent directory. This can be expensive on large repositories. With this variant, we can hash the parent directory first and then continue the computation to include the "/filename". Signed-off-by: Jeff Hostetler <[email protected]>
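Continuation works because an FNV-1 style hash folds in one byte at a time, so the state after hashing the parent directory is a valid seed for the remaining bytes. A minimal sketch, assuming 32-bit FNV-1 constants and ASCII-only case folding; the `memihash2()` signature follows the diff in this PR, but the body is illustrative:

```c
#include <stddef.h>

/* Assumed 32-bit FNV-1 constants. */
#define FNV32_BASE  ((unsigned int) 0x811c9dc5)
#define FNV32_PRIME ((unsigned int) 0x01000193)

/* Fold more bytes into an existing case-insensitive hash. */
unsigned int memihash2(unsigned int hash_seed, const void *buf, size_t len)
{
	unsigned int hash = hash_seed;
	const unsigned char *ucbuf = buf;

	while (len--) {
		unsigned int c = *ucbuf++;
		if (c >= 'a' && c <= 'z')
			c -= 'a' - 'A'; /* fold ASCII lowercase to uppercase */
		hash = (hash * FNV32_PRIME) ^ c;
	}
	return hash;
}

/* A fresh hash is just a continuation from the FNV base value. */
unsigned int memihash(const void *buf, size_t len)
{
	return memihash2(FNV32_BASE, buf, len);
}
```

With this, the parent-directory hash for "src/lib" can be computed once, used as the directory key, and then extended with "/file.c" to obtain the same value as hashing "src/lib/file.c" from scratch.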
Precompute the istate.name_hash and istate.dir_hash values for each cache-entry during the preload-index phase. Move the expensive memihash() calculations from lazy_init_name_hash() to the multi-threaded preload-index phase. Signed-off-by: Jeff Hostetler <[email protected]>
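A rough sketch of moving the hashing into worker threads, assuming POSIX threads (compile with `-pthread`) and a hypothetical `struct entry` standing in for a cache entry; git's actual preload-index code and cache_entry fields differ. Each thread takes a contiguous slice of the array and fills in both the parent-directory hash and the full-path hash, seeding the second from the first:

```c
#include <pthread.h>
#include <stddef.h>
#include <string.h>

#define FNV32_BASE  ((unsigned int) 0x811c9dc5)
#define FNV32_PRIME ((unsigned int) 0x01000193)
#define MAX_THREADS 16

/* Hypothetical stand-in for a cache entry with precomputed hashes. */
struct entry {
	const char *name;
	unsigned int dir_hash;  /* hash of the parent directory ("" if none) */
	unsigned int name_hash; /* hash of the full path */
};

/* Case-insensitive FNV-1 continuation. */
static unsigned int hash_cont(unsigned int hash, const char *buf, size_t len)
{
	while (len--) {
		unsigned int c = (unsigned char)*buf++;
		if (c >= 'a' && c <= 'z')
			c -= 'a' - 'A';
		hash = (hash * FNV32_PRIME) ^ c;
	}
	return hash;
}

struct slice {
	struct entry *entries;
	size_t begin, end;
};

/* Worker: fill in both hashes for entries[begin, end). */
static void *precompute_worker(void *arg)
{
	struct slice *s = arg;

	for (size_t i = s->begin; i < s->end; i++) {
		struct entry *e = &s->entries[i];
		const char *slash = strrchr(e->name, '/');
		size_t dirlen = slash ? (size_t)(slash - e->name) : 0;

		e->dir_hash = hash_cont(FNV32_BASE, e->name, dirlen);
		/* Continue from the directory hash with "/filename". */
		e->name_hash = hash_cont(e->dir_hash, e->name + dirlen,
					 strlen(e->name + dirlen));
	}
	return NULL;
}

/* Split the array into nthreads contiguous slices and hash them in parallel. */
static int precompute_hashes(struct entry *entries, size_t nr, int nthreads)
{
	pthread_t threads[MAX_THREADS];
	struct slice slices[MAX_THREADS];
	size_t chunk;
	int t, created;

	if (nthreads < 1 || nthreads > MAX_THREADS)
		return -1;
	chunk = (nr + nthreads - 1) / nthreads;
	for (t = 0; t < nthreads; t++) {
		size_t begin = (size_t)t * chunk, end = begin + chunk;

		slices[t].entries = entries;
		slices[t].begin = begin < nr ? begin : nr;
		slices[t].end = end < nr ? end : nr;
		if (pthread_create(&threads[t], NULL, precompute_worker, &slices[t]))
			break;
	}
	created = t;
	/* Wait for every thread that was started, even if a later create failed. */
	for (t = 0; t < created; t++)
		pthread_join(threads[t], NULL);
	return created == nthreads ? 0 : -1;
}
```

After this phase, a single-threaded pass in the style of lazy_init_name_hash() could insert each entry using the stored hashes without calling the hash function again.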
Specify an initial size for the istate.dir_hash HashMap matching the size of istate.name_hash. Previously, hashmap_init() was given 0, causing a 64-bucket hashmap to be created. When working with very large repositories, this caused numerous rehash() calls to realloc and rebalance the hashmap. This is especially true when the worktree is deep, with many directories containing only a few files. Signed-off-by: Jeff Hostetler <[email protected]>
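To see why the initial size matters, here is a toy chained hashmap that counts its rehashes. Only the 64-bucket default comes from the commit message; the 80% load factor and 4x growth step are assumptions chosen for illustration, and `struct toymap` and its functions are hypothetical:

```c
#include <stdlib.h>

/* Constants for this toy model only. */
#define INITIAL_SIZE 64 /* default bucket count, per the commit message */
#define LOAD_FACTOR  80 /* grow when more than 80% full (assumption) */
#define RESIZE_SHIFT 2  /* grow by 4x (assumption) */

struct node {
	struct node *next;
	unsigned int key;
};

struct toymap {
	struct node **table;
	unsigned int size, count, grow_at, rehashes;
};

static unsigned int grow_threshold(unsigned int size)
{
	return (unsigned int)((unsigned long long)size * LOAD_FACTOR / 100);
}

/* Choose a bucket count that can hold initial_size entries without growing. */
static int toymap_init(struct toymap *m, unsigned int initial_size)
{
	unsigned long long want = (unsigned long long)initial_size * 100 / LOAD_FACTOR;
	unsigned int size = INITIAL_SIZE;

	while (want > size)
		size <<= RESIZE_SHIFT;
	m->table = calloc(size, sizeof(*m->table));
	if (!m->table)
		return -1;
	m->size = size;
	m->count = 0;
	m->rehashes = 0;
	m->grow_at = grow_threshold(size);
	return 0;
}

/* Move every node into a table RESIZE_SHIFT bits larger. */
static int toymap_rehash(struct toymap *m)
{
	unsigned int newsize = m->size << RESIZE_SHIFT, i;
	struct node **newtable = calloc(newsize, sizeof(*newtable));

	if (!newtable)
		return -1;
	for (i = 0; i < m->size; i++) {
		struct node *n = m->table[i];
		while (n) {
			struct node *next = n->next;
			n->next = newtable[n->key % newsize];
			newtable[n->key % newsize] = n;
			n = next;
		}
	}
	free(m->table);
	m->table = newtable;
	m->size = newsize;
	m->grow_at = grow_threshold(newsize);
	m->rehashes++;
	return 0;
}

static int toymap_add(struct toymap *m, unsigned int key)
{
	struct node *n = malloc(sizeof(*n));

	if (!n)
		return -1;
	n->key = key;
	n->next = m->table[key % m->size];
	m->table[key % m->size] = n;
	if (++m->count > m->grow_at)
		return toymap_rehash(m);
	return 0;
}

static void toymap_free(struct toymap *m)
{
	unsigned int i;

	for (i = 0; i < m->size; i++) {
		struct node *n = m->table[i];
		while (n) {
			struct node *next = n->next;
			free(n);
			n = next;
		}
	}
	free(m->table);
}
```

With these toy constants, adding 100000 keys to a map initialized with size 0 triggers six rehashes (64 up to 262144 buckets), each one walking and relinking every node inserted so far, while initializing it with 100000 allocates 262144 buckets up front and triggers none.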
Teach hash_dir_entry() to remember the previously found dir_entry during lazy_init_name_hash() iteration. This is a performance optimization. Since items in the index array are sorted by full pathname, adjacent items are likely to be in the same directory. This can save memihash() computations and HashMap lookups. Signed-off-by: Jeff Hostetler <[email protected]>
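A simplified sketch of the previous-directory shortcut, assuming a sorted path list and a hypothetical linear-scan `find_or_add_dir()` standing in for the HashMap lookup. Unlike git, it only tracks each file's immediate parent directory rather than the whole chain of parents:

```c
#include <stddef.h>
#include <strings.h>

#define MAX_DIRS 64

/* Hypothetical simplified dir entry: name prefix, length, file count. */
struct dir_entry {
	const char *name;
	size_t len;
	unsigned int nr;
};

static struct dir_entry dirs[MAX_DIRS];
static size_t ndirs;
static unsigned int lookups; /* counts calls to the expensive lookup */

/* Linear-scan stand-in for the HashMap lookup/insert. */
static struct dir_entry *find_or_add_dir(const char *name, size_t len)
{
	lookups++;
	for (size_t i = 0; i < ndirs; i++)
		if (dirs[i].len == len && !strncasecmp(dirs[i].name, name, len))
			return &dirs[i];
	if (ndirs == MAX_DIRS)
		return NULL;
	dirs[ndirs] = (struct dir_entry){ name, len, 0 };
	return &dirs[ndirs++];
}

/* Walk sorted paths, reusing the previous dir entry when the prefix matches. */
static void add_paths(const char **paths, size_t nr)
{
	struct dir_entry *prev = NULL;

	for (size_t i = 0; i < nr; i++) {
		const char *slash = strrchr(paths[i], '/');
		size_t len = slash ? (size_t)(slash - paths[i]) : 0;
		struct dir_entry *dir;

		if (!len)
			continue; /* top-level file: no parent directory */
		if (prev && prev->len == len &&
		    !strncasecmp(prev->name, paths[i], len))
			dir = prev; /* same directory as the previous path */
		else
			dir = find_or_add_dir(paths[i], len);
		if (!dir)
			return;
		dir->nr++;
		prev = dir;
	}
}
```

Note that strrchr() is declared in <string.h>, which the block relies on alongside <strings.h>; for the input "a/x", "a/y", "a/z", "b/w", "top", only two lookups are performed, one per distinct directory, because the sorted order keeps files from the same directory adjacent.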
@@ -23,17 +23,23 @@ static int dir_entry_cmp(const struct dir_entry *e1,
name ? name : e2->name, e1->namelen);
}

static struct dir_entry *find_dir_entry(struct index_state *istate,
const char *name, unsigned int namelen)
static struct dir_entry *find_dir_entry__hash(struct index_state *istate,
* Incorporate another chunk of data into a memihash
* computation.
*/
unsigned int memihash2(unsigned int hash_seed, const void *buf, size_t len)
int nr;

if (istate->name_hash_initialized)
return;
hashmap_init(&istate->name_hash, (hashmap_cmp_fn) cache_entry_cmp,
istate->cache_nr);
hashmap_init(&istate->dir_hash, (hashmap_cmp_fn) dir_entry_cmp, 0);
hashmap_init(&istate->dir_hash, (hashmap_cmp_fn) dir_entry_cmp,
istate->cache_nr);
static struct dir_entry *hash_dir_entry(struct index_state *istate,
struct cache_entry *ce, int namelen)
struct cache_entry *ce, int namelen, struct dir_entry **p_previous_dir)
Thank you so much!
Performance of the cache of case-insensitive file names [has been improved](git-for-windows/git#964). Signed-off-by: Johannes Schindelin <[email protected]>
…er/memihash_perf Jeffhostetler/memihash perf
unsigned char *ucbuf = (unsigned char *) buf;
while (len--) {
unsigned int c = *ucbuf++;
if (c >= 'a' && c <= 'z')
Jeffhostetler/memihash perf
…ash_perf I should really implement special handling for a new drop! prefix to commit messages... This commit reverts that Pull Request, in preparation for merging in a new iteration of that work (which looks substantially different from the previous iteration...). Signed-off-by: Johannes Schindelin <[email protected]>
A series of performance enhancements in the memihash and name-cache area.
On Windows, calls to memihash() and the maintenance of the istate.name_hash and istate.dir_hash HashMaps take significant time on very large repositories. This series of changes reduces the overall time taken by various operations by reducing the number of calls to memihash(), moving some of them into multi-threaded code, etc.