This repository has been archived by the owner on Jun 23, 2022. It is now read-only.

fix(bulk_load): fix remove_local_bulk_load_dir() #823

Merged
merged 6 commits into from
Apr 25, 2021

Conversation

Contributor

zhangyifan27 commented Apr 22, 2021

Recently our users ran into problems with bulk_load: their jobs always failed, and we found remove_directory errors in the replica server log:

W2021-04-22 11:15:40.633 (1619061340633182759 122434) replica.replica20.0400de080018c494: filesystem.cpp:329:remove_directory(): remove /home/work/ssd3/pegasus/c4tst-bulkload/replica/reps/46.18.pegasus/bulk_load failed, err = Directory not empty
E2021-04-22 11:15:40.633 (1619061340633219926 122434) replica.replica20.0400de080018c494: replica_bulk_loader.cpp:599:remove_local_bulk_load_dir(): [[email protected]:34801] remove bulk_load dir(/home/work/ssd3/pegasus/c4tst-bulkload/replica/reps/46.18.pegasus/bulk_load) failed

The reason is that a directory cannot be removed while files are still being written into it (see the added tests in HDFSClientTest). This patch renames the unused bulk load directory to a garbage directory instead of removing it directly, and then removes the garbage directory. If that removal fails, the disk cleaner retries it later.

Contributor

foreverneverer commented Apr 23, 2021

before starting to download bulkload

I don't see a remove before starting. Isn't it in this PR?

@zhangyifan27
Contributor Author

before starting to download bulkload

I don't see a remove before starting. Isn't it in this PR?

Because do_bulkload() and start_bulkload() may be executed several times during the bulk load process, it is not safe to remove the bulk_load_dir at the beginning of either function. I edited the description.
