Skip to content
This repository has been archived by the owner on Jun 23, 2022. It is now read-only.

feat(backup): add hdfs as an alternative backup provider #647

Merged
merged 17 commits into from
Nov 19, 2020

Conversation

zhangyifan27
Copy link
Contributor

@zhangyifan27 zhangyifan27 commented Oct 21, 2020

Use a JNI based C API libhdfs to implement table backup/restore.

New thread pool added: THREAD_POOL_BLOCK_SERVICE
(THREAD_POOL_FDS_SERVICE and THREAD_POOL_LOCAL_SERVICE will be replaced by it later )

New configurations added:

[replication]
+ hdfs_read_batch_size_bytes = 67108864 # 64MB
+ hdfs_write_batch_size_bytes = 67108864 # 64MB

When using hdfs as a backup/restore provider, you also need to add a block service configuration like this:

[block_service.xxx]
type = hdfs_service
args = <hdfs_name_node> <hdfs_full_path>

@zhangyifan27 zhangyifan27 added the thirdparty-modified If this PR modified some thirdparties that need to be entirely rebuilt. label Nov 5, 2020
src/block_service/hdfs/CMakeLists.txt Outdated Show resolved Hide resolved
src/block_service/hdfs/CMakeLists.txt Outdated Show resolved Hide resolved
src/replica/replica_restore.cpp Show resolved Hide resolved
src/block_service/hdfs/hdfs_service.h Outdated Show resolved Hide resolved
src/block_service/hdfs/hdfs_service.h Show resolved Hide resolved
src/block_service/test/hdfs_service_test.cpp Show resolved Hide resolved
src/block_service/hdfs/hdfs_service.cpp Show resolved Hide resolved
@zhangyifan27 zhangyifan27 added the type/config-change PR that made modification on configs, which should be noted in release note. label Nov 13, 2020
remove_path_response rem_resp;

// fisrt clean up all old file in test directory.
std::cout << "clean up all old files" << std::endl;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what about print these logs into log file?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could know what stage is the test running if output these message to console, it's useful when upload/download many large files.

hycdong
hycdong previously approved these changes Nov 16, 2020
src/block_service/hdfs/CMakeLists.txt Outdated Show resolved Hide resolved
Comment on lines +23 to +25
[threadpool.THREAD_POOL_FDS_SERVICE]
worker_count = 8

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These block services will create in total 24 threads on server bootstrap. We should merge them into one shared threadpool. Call it THREAD_POOL_BLOCK_SERVICE then.

We can leave this job to later PR.

src/block_service/hdfs/hdfs_service.cpp Show resolved Hide resolved
src/block_service/test/hdfs_service_test.cpp Show resolved Hide resolved
src/block_service/hdfs/hdfs_service.h Outdated Show resolved Hide resolved
src/block_service/hdfs/hdfs_service.h Show resolved Hide resolved
src/block_service/hdfs/hdfs_service.cpp Outdated Show resolved Hide resolved
src/block_service/hdfs/hdfs_service.cpp Outdated Show resolved Hide resolved
src/block_service/hdfs/hdfs_service.cpp Show resolved Hide resolved
src/block_service/hdfs/hdfs_service.cpp Outdated Show resolved Hide resolved
src/block_service/hdfs/hdfs_service.cpp Outdated Show resolved Hide resolved
#"GLOB" for non - recursive search
set(MY_SRC_SEARCH_MODE "GLOB")

set(MY_PROJ_LIBS "")
Copy link
Contributor

@neverchanje neverchanje Nov 17, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
set(MY_PROJ_LIBS "")
set(MY_PROJ_LIBS "hdfs ${JAVA_JVM_LIBRARY}")


dsn_add_static_library()

target_link_libraries(${MY_PROJ_NAME} PUBLIC hdfs)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
target_link_libraries(${MY_PROJ_NAME} PUBLIC hdfs)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

target_link_libraries(${MY_PROJ_NAME} PUBLIC hdfs ${JAVA_JVM_LIBRARY}) is required, because libraries(hdfs,jvm) are for both a target and its dependents.

@acelyc111
Copy link
Member

Use a JNI based C API libhdfs to implement table backup/restore.

New configurations added:

  • hdfs_read_batch_size_bytes
  • hdfs_write_batch_size_bytes

New thread pool added: THREAD_POOL_HDFS_SERVICE

Better to describe the new added configs like:

+ [security]
+ enable_acl = false
+ super_users = 
+ meta_acl_rpc_allow_list = 

src/block_service/hdfs/hdfs_service.cpp Outdated Show resolved Hide resolved
src/block_service/hdfs/hdfs_service.cpp Show resolved Hide resolved
src/block_service/hdfs/hdfs_service.cpp Outdated Show resolved Hide resolved
src/block_service/hdfs/hdfs_service.cpp Outdated Show resolved Hide resolved
src/block_service/hdfs/hdfs_service.cpp Outdated Show resolved Hide resolved
src/block_service/hdfs/hdfs_service.cpp Outdated Show resolved Hide resolved
derror_f("Failed to write hdfs file {}, error: {}.",
file_name(),
utils::safe_strerror(errno));
hdfsCloseFile(_service->get_fs(), writeFile);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we remove the file when error occurs?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If writing/uploading file failed, we'll retry to overwrite the same file, so I think removing the file is not necessary.

src/block_service/hdfs/hdfs_service.cpp Outdated Show resolved Hide resolved
src/block_service/hdfs/hdfs_service.cpp Outdated Show resolved Hide resolved
src/block_service/hdfs/hdfs_service.cpp Outdated Show resolved Hide resolved
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
thirdparty-modified If this PR modified some thirdparties that need to be entirely rebuilt. type/config-change PR that made modification on configs, which should be noted in release note.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants