Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add feature disks (ls, C, list-disks, disk, help) #36060

Merged
merged 9 commits into from
Jun 7, 2022

Conversation

Varinara
Copy link
Contributor

@Varinara Varinara commented Apr 8, 2022

Changelog category (leave one):

  • Improvement

Changelog entry (a user-readable short description of the changes that goes to CHANGELOG.md):

Add feature disks(
ls - list files on disk,
C - set config file,
list-disks - list disks names,
disk - set disk for work by name,
help - produce help message
copy - copy data on disk containing at from_path to to_path
link - Create hardlink on disk from from_path to to_path
list - List files on disk
move - Move file or directory on disk from from_path to to_path
read - read File on disk from_path to to_path or to stdout
remove - Remove file or directory on disk with all children.
write - Write File on diskfrom_path or stdin to to_path)

Information about CI checks: https://clickhouse.tech/docs/en/development/continuous-integration/

@kssenii kssenii self-assigned this Apr 8, 2022
@robot-clickhouse robot-clickhouse added the pr-improvement Pull request with some product improvements label Apr 8, 2022
@nikitamikhaylov nikitamikhaylov added the can be tested Allows running workflows for external contributors label Apr 8, 2022
@@ -844,6 +844,9 @@ class Context: public std::enable_shared_from_this<Context>

void shutdown();

std::vector<std::string> listDisks();
void listFiles(const String & name, const String & path, std::vector<String> & file_names);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can get disk instance in your calling method, I think this method should be part of the disks tool, not context.

std::vector<std::string> disksNames;
std::lock_guard lock(shared->storage_policies_mutex);

for (auto & [disk_name, disk] : getDiskSelector(lock)->getDisksMap())
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

getDiskSelector is a part of public interface of the context. You can do it inside disks tool code.

@Varinara
Copy link
Contributor Author

Varinara commented Apr 8, 2022

The tests will be written later, now you can look at it

@Varinara Varinara marked this pull request as draft April 8, 2022 11:04
@alexey-milovidov alexey-milovidov changed the title Add feature disks(ls, C, list-disks, disk, help) Add feature disks (ls, C, list-disks, disk, help) Apr 8, 2022
@@ -0,0 +1,29 @@
#ifndef CLICKHOUSE_PATH_H
#define CLICKHOUSE_PATH_H
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

'#pragma once' is enough.

std::cout << *it << "\n";
}
}
else
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Bad codestyle. Try to use clang-format with 'IndentWidth: 4'.

run();
}
else
if (curCommand == "ls")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Codestyle.

{
if (curCommand == "start")
{
if (int(argumentsToDo.size()) == 1)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do not use c-style type casts, read https://clickhouse.com/docs/ru/development/style/

And cast from size_t (64 bit) to int (32 bit) is a very dangerous. int(0x100000001ULL) == 1


disk_test->listFiles(full_path, file_names);

for (auto iter = file_names.begin(); iter != file_names.end(); iter++)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

C++11 style for (const auto &file_name : file_names) is more readable.

void DisksFunc::initialize(Poco::Util::Application & self)
{
Poco::Util::Application::initialize(self);
if (curCommand == "start")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You use string iterals for command ("start", "ls", etc.) in multiple places. Move it into constants.


void DisksFunc::init(std::vector<String> &common_arguments)
{
namespace po = boost::program_options;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This alias used in several methods, move it to top of the file.

if (curCommand == "start")
helpOptionDescription.emplace(createOptionsDescription("Help Message", getTerminalWidth()));

boost::program_options::positional_options_description positional_options_description;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you use namespace alias, use it everywhere.

Path diskPath = Path();
Arguments argumentsToDo;

std::map<String, bool> fArgOpt;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Codestyle, use snake_case for variables.

diskName = "default";
}

void Path::parsPath(std::string toPars)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It should be parsePath?

struct Path
{
public:
Path();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Default disk name could be overriden via glob option, so you can pass it here

Path();

void parsPath(std::string toPars);
void setDiskName(std::string name);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you pass default disk name(or overriden via args) this method is useless
And probably you can pass path to parse in constructor too


namespace ErrorCodes
{
extern const int NOT_IMPLEMENTED;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Couldn't find usages of it

extern const int NOT_IMPLEMENTED;
}

class DisksFunc : public Poco::Util::Application
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

May be DIsksTool or DisksApp can more obviously describe purpose of this class

}
}

void DisksFunc::reloadConfiguration()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This method is called only once on initializtion, so it's better loadCondfiguration, but I think you can just inline it in caller


void DisksFunc::printHelpMessage()
{
if (curCommand == "start")
Copy link
Contributor

@GrigoryPervakov GrigoryPervakov Apr 15, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe we should build special command description object, that contains all info about command:

  • name
  • method to call
  • program options
  • base help message
    So you can generate help message in generic way here based only on that information.
    It would simplify adding new commands. You can introduce new command by adding new item in commands map instead of searching a lot of methods where you should put new if statement

void DisksFunc::listDisks()
{
while(argumentsToDo[0] != "list-disks")
argumentsToDo.erase(argumentsToDo.begin());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This code duplicates same for ls and would be for all other commands, it can be easily generalized in code before actual command call

path = "";
disk_name = def_disk_name;

if (to_parse.find(':') == std::string::npos)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if(auto pr = to_parse.find(':'); pr == std::string::npos)
You can avoid second find call


Path::Path(std::string def_disk_name, std::string to_parse)
{
path = "";
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this line is redundant

Comment on lines 104 to 106
std::cout << cur_command_description.command_name<<'\n';
std::cout << cur_command_description.description<<'\n';
std::cout << cur_command_description.usage<<'\n';
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
std::cout << cur_command_description.command_name<<'\n';
std::cout << cur_command_description.description<<'\n';
std::cout << cur_command_description.usage<<'\n';
std::cout << cur_command_description.command_name << '\n';
std::cout << cur_command_description.description << '\n';
std::cout << cur_command_description.usage << '\n';

comand_descriptions.emplace("ls", DB::CommandDescription{
.command_name = "ls",
.command_option_description = createOptionsDescription("Help Message for ls", getTerminalWidth()),
.description = "List FILEs (the default disk by default)",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
.description = "List FILEs (the default disk by default)",
.description = "List files (the default disk is used by default)",

ContextMutablePtr global_context;
SharedContextHolder shared_context;

std::vector<String> in;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

in -- bad name. Better something like command_arguments.

Comment on lines 200 to 206
std::optional<String> DisksApp::FindPosArg(Arguments & arguments)
{
for (const auto &pos_arg : arguments)
if (f_arg_pos[pos_arg])
return {pos_arg};
return {};
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The position of the command (ls, etc) is known and always the same. So it is better to just check if the command is supported by checking it in a set

static const std::unordered_set<std::string_view> supported_commands = {"ls", ...};

global_context->makeGlobalContext();
global_context->setApplicationType(Context::ApplicationType::DISKS);

(this->*positional_arg_map[in[0]])();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's remove this std::map<String, void (DisksApp::*)() > positional_arg_map;.
And use a simple if instead. Here optimization is not important while readablity of code is.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Or put it inside the CommandDescription

cluster.add_instance(
"disks_test",
main_configs=["config.xml"],
with_zookeeper=True,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

line 16 -- probably redundant

<source_trivial_cluster>
<shard>
<replica>
<host>disks_test</host>
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

remote_servers config also redundant, it is not necessary for instance name to be defined here.

]

instance = started_cluster.instances["disks_test"]
container = instance.get_docker_handle()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you could also use exec_in_container function from helpers/cluster as it will look simpler

def exec_in_container(self, cmd, detach=False, nothrow=False, **kwargs):
return self.cluster.exec_in_container(
self.docker_id, cmd, detach, nothrow, **kwargs
)

but current way is also ok.

@@ -55,6 +55,8 @@ option (ENABLE_CLICKHOUSE_KEEPER "ClickHouse alternative to ZooKeeper" ${ENABLE_

option (ENABLE_CLICKHOUSE_KEEPER_CONVERTER "Util allows to convert ZooKeeper logs and snapshots into clickhouse-keeper snapshot" ${ENABLE_CLICKHOUSE_ALL})

option (ENABLE_CLICKHOUSE_DISKS "Work with disks" ${ENABLE_CLICKHOUSE_ALL})
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
option (ENABLE_CLICKHOUSE_DISKS "Work with disks" ${ENABLE_CLICKHOUSE_ALL})
option (ENABLE_CLICKHOUSE_DISKS "A tool to manage disks" ${ENABLE_CLICKHOUSE_ALL})

something like this, or more informative.


private:
std::string path;
std::string disk_name;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This struct is quite simple, so may be do just like this without any getters:

struct Path
{
     std:string disk_name;
     std::string path;

     Path(const std::string & disk_name_, const String & path_) : disk_name(disk_name), path(path_) {}
};

Getters are usually better for something not as simple.


global_context->makeGlobalContext();
global_context->setApplicationType(Context::ApplicationType::DISKS);
global_context->setPath(PATH);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

path should not be hardcoded here to PATH, need to get it from config.


disk->listFiles(full_path, file_names);

for (const auto &file_name : file_names)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
for (const auto &file_name : file_names)
for (const auto & file_name : file_names)

{
String out;
auto in = disk->readFile(full_path);
readStringUntilEOF(out, *in);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It might not fill into memory. Try to use WriteBufferFromFileDescriptor and copyData.

Comment on lines 254 to 256
String full_path_input = disk->getPath();
full_path_input.pop_back();
full_path_input += config().getString("input");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
String full_path_input = disk->getPath();
full_path_input.pop_back();
full_path_input += config().getString("input");
String full_path_input = std::filesystem::path(disk->getPath()) / config().getString("input");

std::optional<ProgramOptionsDescription> command_option_description;
String description;
String usage;
void (DisksApp::*arg_func)(void);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's get rid of void (DisksApp::*arg_func)(void);.
You can just make a class for each command, with inheritance from ICommand and using command_descriptions = std::unordered_map<std:string, CommandPtr>, where CommandPtr=std::shared_ptr<ICommand>. Each will override virtual methods execute(), validateArguements(...), getHelpMessage(), ...

@ianton-ru
Copy link
Contributor

LGTM

@Varinara Varinara marked this pull request as ready for review May 19, 2022 11:06
@Varinara
Copy link
Contributor Author

All comments have been addressed. @kssenii please have a look.

global_context->setApplicationType(Context::ApplicationType::DISKS);
global_context->setPath(path);

command_descriptions[command_arguments[0]]->execute(command_arguments, global_context, config());
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

let's save command_arguments[0] into a separate variable command_name in class when validating command arguments . And same separate variable for command_arguments, so that they do not contain command name, only command arguments.

exit(0);
}

if (command_arguments.size() == 0 || !supported_commands.count(command_arguments[0]))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
if (command_arguments.size() == 0 || !supported_commands.count(command_arguments[0]))
if (command_arguments.size() == 0 || !supported_commands.contains(command_arguments[0]))

Comment on lines 43 to 46
if (path_from[0] == '/')
path_from.erase(0, 1);
if (path_to[0] == '/')
path_to.erase(0, 1);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why is it needed?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought the input format of command should looks like command /PATH
disk->getPath() gives the path in the format /PATH/
that's why I erased the first symbol '/'

Comment on lines 51 to 56
String full_path_from = (disk_name_from == "default"
? ((fs::path(disk_from->getPath()) / "data/default") / path_from)
: (fs::path(disk_from->getPath()) / path_from));
String full_path_to = (disk_name_to == "default"
? ((fs::path(disk_to->getPath()) / "data/default") / path_to)
: (fs::path(disk_to->getPath()) / path_to));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why need to add "data/default" ? fs::path(disk->getPath()) / path should suffice for any case

@Varinara
Copy link
Contributor Author

Thanks for the comments @kssenii . Fixed

Comment on lines 40 to 42
String path = (command_arguments.size() < 1
? ""
: command_arguments[0]);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why do we allow empty path for write command?


DiskPtr disk = global_context->getDisk(disk_name);

String full_path = fs::path(disk->getPath()) / path;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's protect against forbidden paths. Need to check that (fs::absolute(disk->getPath()) / path).lexically_normal() is indeed inside disk->getPath()

Need to add this path validation in all places it is used (move checking into a separate method validatePath).

Comment on lines 40 to 42
String path = (command_arguments.size() < 1
? ""
: command_arguments[0]);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also here why allow empty path for read command?

void execute(
const std::vector<String> &,
const DB::ContextMutablePtr &,
const Poco::Util::AbstractConfiguration &) override{}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

throw exception in this case?

Comment on lines 41 to 46
String path_from = (command_arguments.size() < 1
? ""
: command_arguments[0]);
String path_to = (command_arguments.size() < 2
? ""
: command_arguments[1]);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would also be strange to allow empty paths here.

@Varinara Varinara force-pushed the master branch 2 times, most recently from 7e1ced7 to c743ac8 Compare May 30, 2022 14:53
@Varinara
Copy link
Contributor Author

Thanks, @kssenii . think Fixed

Poco::Util::AbstractConfiguration &,
po::variables_map &) const override{}

void execute_impl(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
void execute_impl(
void executeImpl(

should be like this by our style

@Varinara
Copy link
Contributor Author

Varinara commented Jun 1, 2022

@kssenii Fixes have been made, failed tests seems to be unrelevant

Copy link
Member

@kssenii kssenii left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok, these are last comments, let's fix this and merge

Comment on lines 40 to 39
for (const auto & [disk_name, disk] : global_context->getDisksMap())
disks_names.push_back(disk->getName());

for (const auto & disk_name : disks_names)
std::cout << disk_name << '\n';
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
for (const auto & [disk_name, disk] : global_context->getDisksMap())
disks_names.push_back(disk->getName());
for (const auto & disk_name : disks_names)
std::cout << disk_name << '\n';
for (const auto & [disk_name, _] : global_context->getDisksMap())
std::cout << disk_name << '\n';

Comment on lines 53 to 63
String path_input = config.getString("input", "default");

std::unique_ptr<ReadBufferFromFileBase> in = std::make_unique<ReadBufferFromFileDescriptor>(STDIN_FILENO);
if (path_input != "default")
{
String full_path_input = fullPathWithValidate(disk, path_input);
in = disk->readFile(full_path_input);
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
String path_input = config.getString("input", "default");
std::unique_ptr<ReadBufferFromFileBase> in = std::make_unique<ReadBufferFromFileDescriptor>(STDIN_FILENO);
if (path_input != "default")
{
String full_path_input = fullPathWithValidate(disk, path_input);
in = disk->readFile(full_path_input);
}
String path_input = config.getString("input", "");
std::unique_ptr<ReadBufferFromFileBase> in;
if (path_input.empty())
{
in = std::make_unique<ReadBufferFromFileDescriptor>(STDIN_FILENO);
}
else
{
String full_path_input = fullPathWithValidate(disk, path_input);
in = disk->readFile(full_path_input);
}

{
String full_path = (fs::absolute(disk->getPath()) / path).lexically_normal();
String disk_path = fs::path(disk->getPath());
if (full_path.find(disk_path) == String::npos)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

starts_with?

String full_path = (fs::absolute(disk->getPath()) / path).lexically_normal();
String disk_path = fs::path(disk->getPath());
if (full_path.find(disk_path) == String::npos)
throw DB::Exception("Error path", DB::ErrorCodes::BAD_ARGUMENTS);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
throw DB::Exception("Error path", DB::ErrorCodes::BAD_ARGUMENTS);
throw DB::Exception(DB::ErrorCodes::BAD_ARGUMENTS, "Path {} must be inside disk path {}", path, disk->getPath());

Comment on lines 53 to 55
String path_output = config.getString("output", "default");

if (path_output != "default")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
String path_output = config.getString("output", "default");
if (path_output != "default")
String path_output = config.getString("output", "");
if (!path_output.empty())

@Varinara
Copy link
Contributor Author

Varinara commented Jun 3, 2022

Fixes completed, test failures seem unrelated to the changes.@kssenii

Copy link
Member

@kssenii kssenii left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

will wait for some tests which are not finished yet (marked as failed now, but they will restart automatically as they failed because of ci reasons), at least waiting for all integration tests

@kssenii
Copy link
Member

kssenii commented Jun 4, 2022

It crashes in debug build:

./clickhouse disks --config-file ~/ch-server/config.xml list-disks
libc++abi: terminating
[1]    1597275 IOT instruction (core dumped)  ./clickhouse disks --config-file ~/ch-server/config.xml list-disks

./clickhouse disks --config-file ~/ch-server/config.xml --disk s3disk  list .
libc++abi: terminating
[1]    1597889 IOT instruction (core dumped)  ./clickhouse disks --config-file ~/ch-server/config.xml --disk s3disk list .

UPDATE: the reason was in invalid connection to s3 which resulted in logical_error and termination in debug build.

may be allow config path at any position? this does not work:

./clickhouse disks list-disks --config-file ~/ch-server/config.xml
DB::Exception: Configuration file /etc/clickhouse-server/config.xml doesn't exist
[1]    1597229 IOT instruction (core dumped)  ./clickhouse disks list-disks --config-file ~/ch-server/config.xml

@Varinara
Copy link
Contributor Author

Varinara commented Jun 6, 2022

Now can set config path at any position. @kssenii

@kssenii
Copy link
Member

kssenii commented Jun 6, 2022

@Mergifyio update

@mergify
Copy link
Contributor

mergify bot commented Jun 6, 2022

update

✅ Branch has been successfully updated

@kssenii kssenii merged commit 4272ca8 into ClickHouse:master Jun 7, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
can be tested Allows running workflows for external contributors pr-improvement Pull request with some product improvements
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants