how do you run oximeter? #1295

Closed
davepacheco opened this issue Jun 28, 2022 · 6 comments · Fixed by #1302

@davepacheco
Collaborator

I was running through the simulated how-to instructions and got to running oximeter:

$ cargo run --bin=oximeter -- oximeter/collector/config.toml
    Blocking waiting for file lock on build directory
   Compiling oximeter-collector v0.1.0 (/home/dap/omicron-fixes/oximeter/collector)
    Finished dev [unoptimized + debuginfo] target(s) in 4m 33s
     Running `target/debug/oximeter oximeter/collector/config.toml`
error: Found argument 'oximeter/collector/config.toml' which wasn't expected, or isn't valid in this context

USAGE:
    oximeter <SUBCOMMAND>

For more information try --help

The help output says:

$ cargo run --bin=oximeter -- --help
    Finished dev [unoptimized + debuginfo] target(s) in 1.25s
     Running `target/debug/oximeter --help`
oximeter 
See README.adoc for more information

USAGE:
    oximeter <SUBCOMMAND>

OPTIONS:
    -h, --help    Print help information

SUBCOMMANDS:
    help       Print this message or the help of the given subcommand(s)
    openapi    Print the external OpenAPI Spec document and exit
    run        Start an Oximeter server

I guess this changed to use the run subcommand.

$ cargo run --bin=oximeter -- run oximeter/collector/config.toml
    Finished dev [unoptimized + debuginfo] target(s) in 0.67s
     Running `target/debug/oximeter run oximeter/collector/config.toml`
error: The following required arguments were not provided:
    --id <ID>
    --address <ADDRESS>

USAGE:
    oximeter run --id <ID> --address <ADDRESS> <CONFIG_FILE>

For more information try --help

It's also got new required arguments. What are those?

$ cargo run --bin=oximeter -- run --help
    Finished dev [unoptimized + debuginfo] target(s) in 0.46s
     Running `target/debug/oximeter run --help`
oximeter-run 
Start an Oximeter server

USAGE:
    oximeter run --id <ID> --address <ADDRESS> <CONFIG_FILE>

ARGS:
    <CONFIG_FILE>    Path to TOML file with configuration for the server

OPTIONS:
    -a, --address <ADDRESS>    
    -h, --help                 Print help information
    -i, --id <ID>              

I'm not sure what the address or ID is supposed to be for. I guessed that the ID is a unique ID for this oximeter instance (similar to the sled agent's), and I gathered from the source history that the address is a Nexus address (though I'm not sure whether it's for the internal or the external API). So I tried:

$ cargo run --bin=oximeter -- run --id=$(uuidgen) --address 127.0.0.1:12221  oximeter/collector/config.toml
    Finished dev [unoptimized + debuginfo] target(s) in 0.41s
     Running `target/debug/oximeter run --id=34bb5a05-ef6e-4d80-b96f-9ad422d2694a --address '127.0.0.1:12221' oximeter/collector/config.toml`
error: Invalid value "127.0.0.1:12221" for '--address <ADDRESS>': invalid IPv6 socket address syntax

For more information try --help

I guess this has to be a v6 address. But my Nexus is listening on these addresses:

Jun 28 20:20:52.107 INFO listening, local_addr: 127.0.0.1:12220, component: dropshot_external, name: e6bff1ff-24fb-49dc-a54e-c6a350cd4d6c
Jun 28 20:20:52.107 INFO listening, local_addr: 127.0.0.1:12221, component: dropshot_internal, name: e6bff1ff-24fb-49dc-a54e-c6a350cd4d6c

There seems to be an inconsistency here: we allow Nexus to listen only on v4, and the simulated Sled Agent seems okay with that, but Oximeter requires the address to be v6.
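The error text above matches what Rust's standard library reports when a string is parsed as `std::net::SocketAddrV6`. That suggests `--address` is declared with that type, although this is inferred from the message and not confirmed from the source. The std-only sketch below shows the difference, and how parsing as `SocketAddr` instead accepts either address family. The function names are hypothetical.

```rust
use std::net::{SocketAddr, SocketAddrV6};

/// Parse the way a `SocketAddrV6`-typed argument would. Assumption: this
/// mirrors how `--address` is declared (inferred only from the error text).
fn parse_v6_only(s: &str) -> Result<SocketAddrV6, String> {
    s.parse::<SocketAddrV6>().map_err(|e| e.to_string())
}

/// Parse an address that may be either IPv4 or IPv6.
fn parse_any(s: &str) -> Result<SocketAddr, String> {
    s.parse::<SocketAddr>().map_err(|e| e.to_string())
}

fn main() {
    for input in ["127.0.0.1:12221", "[::1]:12221"] {
        // The v4 literal is rejected by the v6-only parser but accepted by `SocketAddr`.
        println!(
            "{}: v6-only = {:?}, either = {:?}",
            input,
            parse_v6_only(input),
            parse_any(input)
        );
    }
}
```

Changing the argument type from `SocketAddrV6` to `SocketAddr` would be one way to accept the v4 address that Nexus listens on in the simulated setup. This thread doesn't show whether other code assumes v6.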

Just to see what would happen, I passed '[::1]:12221'. This shouldn't succeed, because Nexus is only listening on v4, but it failed differently than I expected:

$ cargo run --bin=oximeter -- run --id=$(uuidgen) --address [::1]:12221  oximeter/collector/config.toml
    Finished dev [unoptimized + debuginfo] target(s) in 0.44s
     Running `target/debug/oximeter run --id=235441d2-e38a-4cd0-8354-fd005c1403fe --address '[::1]:12221' oximeter/collector/config.toml`
Jun 28 20:34:30.295 DEBG registered DTrace probes
Jun 28 20:34:30.296 INFO starting oximeter server
Jun 28 20:34:30.296 DEBG creating ClickHouse client
Jun 28 20:34:30.299 WARN failed to initialize ClickHouse database, will retry in 208.932877ms, error: ResolveError(Resolve(ResolveError { kind: Proto(ProtoError { kind: Io(Os { code: 148, kind: HostUnreachable, message: "No route to host" }) }) }))
Jun 28 20:34:30.509 DEBG creating ClickHouse client
Jun 28 20:34:30.511 WARN failed to initialize ClickHouse database, will retry in 729.595732ms, error: ResolveError(Resolve(ResolveError { kind: Proto(ProtoError { kind: Io(Os { code: 148, kind: HostUnreachable, message: "No route to host" }) }) }))

It's failing to connect to ClickHouse, and the error is "No route to host". I wondered whether the address on the command line was supposed to be the ClickHouse address, so I tried [::1]:8123 and got the same result (after confirming that the ClickHouse process is listening on [::1]:8123).

I probably did a bunch of things wrong here, but I think there are a few real issues:

  • docs need to use the run subcommand and provide the id and address
  • help output should say what the id and address args are
  • should --address accept a v4 address?
  • I don't know what's going on with the ClickHouse error

I'm happy to fix the docs and help output, but I wasn't sure what to do about the v4 vs. v6 address, so I haven't gotten any of it working yet.

@davepacheco
Collaborator Author

I meant to add: "host unreachable" is a surprising error there, because both the v4 and v6 localhost addresses are reachable. I'm not sure where else it could be trying to connect. It would be nice for the log messages to include the address it's trying to reach.
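Here is a hypothetical sketch of that suggestion. The helper formats the retry warning so that it names the DNS server that was queried alongside the resolver error. `retry_warning` and the `dns_server` key are invented for illustration. A plain `format!` string stands in for whatever structured logger Oximeter actually uses.

```rust
use std::net::SocketAddr;
use std::time::Duration;

/// Hypothetical helper: format the ClickHouse retry warning so it names the
/// DNS server that was queried, not just the low-level resolver error.
fn retry_warning(dns_server: SocketAddr, retry_in: Duration, error: &str) -> String {
    format!(
        "failed to initialize ClickHouse database, will retry in {:?}, dns_server: {}, error: {}",
        retry_in, dns_server, error
    )
}

fn main() {
    // Address and error text taken from the failing run above.
    let server: SocketAddr = "[::1]:12221".parse().expect("valid socket address");
    let msg = retry_warning(server, Duration::from_millis(208), "No route to host");
    println!("{}", msg);
}
```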

@smklein
Collaborator

smklein commented Jun 28, 2022

I think I likely broke this in #1237. I can take a shot at updating the docs.

@bnaecker
Collaborator

bnaecker commented Jun 28, 2022

Sorry about that, @davepacheco. The address was indeed intended to be the address of ClickHouse, but it now appears to be the address of the DNS server. On oximeter/src/collector/bin/oximeter.rs:429, that address is passed to a Resolver. If I remember right, the resolver just uses the provided IP to find the DNS server to query, and then asks that server to resolve an SRV record for ClickHouse. Do you know whether your DNS server is running and has such a record? Since the process is listening on localhost, it looks like you're running things manually. I'm not sure what the address is supposed to mean in that case, since the DNS server may not even be running.
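The toy model below makes that two-step lookup concrete using only the standard library. The `--address` value picks a DNS server, and that server is then asked for ClickHouse's SRV record. `FakeDnsServer`, `locate_clickhouse`, the DNS address `[::1]:5353`, and the SRV name `_clickhouse._tcp` are all hypothetical, and no real DNS traffic is involved. The model also shows why passing ClickHouse's own address (`[::1]:8123`) as `--address` cannot work: nothing at that address answers DNS queries.

```rust
use std::collections::HashMap;
use std::net::SocketAddr;

/// Hypothetical in-memory stand-in for a DNS server holding SRV records.
struct FakeDnsServer {
    srv_records: HashMap<String, SocketAddr>,
}

/// Two-step lookup: `dns_address` (the `--address` value) selects which DNS
/// server to ask, and that server is then asked where ClickHouse lives.
/// `network` stands in for the DNS servers reachable on the network.
fn locate_clickhouse(
    network: &HashMap<SocketAddr, FakeDnsServer>,
    dns_address: SocketAddr,
    srv_name: &str,
) -> Result<SocketAddr, String> {
    let server = network
        .get(&dns_address)
        .ok_or_else(|| format!("no DNS server at {}", dns_address))?;
    server
        .srv_records
        .get(srv_name)
        .copied()
        .ok_or_else(|| format!("no SRV record {} on {}", srv_name, dns_address))
}

/// Build a toy network with one DNS server. The DNS address and SRV name are
/// hypothetical; the ClickHouse address is the one from this issue.
fn example_network() -> HashMap<SocketAddr, FakeDnsServer> {
    let mut records: HashMap<String, SocketAddr> = HashMap::new();
    records.insert("_clickhouse._tcp".to_string(), "[::1]:8123".parse().unwrap());
    let mut network: HashMap<SocketAddr, FakeDnsServer> = HashMap::new();
    network.insert("[::1]:5353".parse().unwrap(), FakeDnsServer { srv_records: records });
    network
}

fn main() {
    let network = example_network();
    // Pointing at the DNS server finds ClickHouse.
    println!("{:?}", locate_clickhouse(&network, "[::1]:5353".parse().unwrap(), "_clickhouse._tcp"));
    // Pointing at ClickHouse itself fails: nothing there answers DNS queries.
    println!("{:?}", locate_clickhouse(&network, "[::1]:8123".parse().unwrap(), "_clickhouse._tcp"));
}
```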

@davepacheco
Collaborator Author

Ah, okay. I'm following the simulated instructions and have not started a DNS server. Even though in production we're definitely going to want to use DNS names, I think it'll be quite useful to still be able to point things at individual IPs of specific services.

@bnaecker
Collaborator

Yeah, agreed. I think there needs to be an additional flag that tells the program to use a literal address, or at least something that doesn't conflate that address with the location of the DNS server. A pair of mutually exclusive arguments would be nice.
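Below is a minimal sketch of that idea. It assumes two hypothetical flags, called `--dns-server` and `--clickhouse-address` here purely for illustration; neither exists in the source. The choice is modeled as an enum, and supplying both flags or neither is rejected, so the meaning of each address is unambiguous. A CLI argument library's conflict and required-group features could enforce the same rule at parse time. This version does it by hand to stay std-only.

```rust
use std::net::SocketAddr;

/// Hypothetical: the two ways Oximeter could be told where ClickHouse is.
#[derive(Debug, PartialEq)]
enum ClickhouseSource {
    /// Ask the DNS server at this address for ClickHouse's SRV record.
    Dns(SocketAddr),
    /// Connect directly to ClickHouse at this address, skipping DNS.
    Literal(SocketAddr),
}

/// Accept exactly one of two hypothetical flags (say `--dns-server` and
/// `--clickhouse-address`); reject both-or-neither with a usage error.
fn choose_source(
    dns_server: Option<SocketAddr>,
    clickhouse: Option<SocketAddr>,
) -> Result<ClickhouseSource, &'static str> {
    match (dns_server, clickhouse) {
        (Some(dns), None) => Ok(ClickhouseSource::Dns(dns)),
        (None, Some(addr)) => Ok(ClickhouseSource::Literal(addr)),
        (Some(_), Some(_)) => Err("pass only one of the DNS server or the ClickHouse address"),
        (None, None) => Err("pass either the DNS server or the ClickHouse address"),
    }
}

fn main() {
    // The simulated setup from this issue: no DNS server, ClickHouse on [::1]:8123.
    let clickhouse: SocketAddr = "[::1]:8123".parse().unwrap();
    println!("{:?}", choose_source(None, Some(clickhouse)));
    println!("{:?}", choose_source(Some(clickhouse), Some(clickhouse)));
}
```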

smklein self-assigned this on Jun 29, 2022
@smklein
Collaborator

smklein commented Jun 29, 2022

The Oximeter configuration file already takes both the Nexus and ClickHouse addresses as optional parameters. If they're not supplied, Oximeter uses DNS to find them; if they are supplied, it uses the hard-coded addresses.

These settings were removed from the example config file, but should not have been. I'll add them back and update the docs in a follow-up PR.
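The fallback described above can be sketched as follows: each optional address is used as-is if present, and otherwise the service is looked up via DNS. `AddressOverrides`, its field names, `Target`, and `target_for` are hypothetical and need not match Oximeter's real config keys or code. The example addresses are the Nexus internal API and ClickHouse addresses from this issue.

```rust
use std::net::SocketAddr;

/// Hypothetical shape of the optional address settings; the field names are
/// illustrative and may not match Oximeter's actual config keys.
#[derive(Debug, Default)]
struct AddressOverrides {
    nexus_address: Option<SocketAddr>,
    clickhouse_address: Option<SocketAddr>,
}

/// Where to connect for one service.
#[derive(Debug, PartialEq)]
enum Target {
    /// Use the hard-coded address from the config file.
    Literal(SocketAddr),
    /// No address configured: look the named service up via DNS.
    Dns(&'static str),
}

/// Prefer an explicitly configured address; otherwise fall back to DNS.
fn target_for(configured: Option<SocketAddr>, service: &'static str) -> Target {
    match configured {
        Some(addr) => Target::Literal(addr),
        None => Target::Dns(service),
    }
}

fn main() {
    // Simulated setup: Nexus internal API and ClickHouse addresses from this issue.
    let simulated = AddressOverrides {
        nexus_address: Some("127.0.0.1:12221".parse().unwrap()),
        clickhouse_address: Some("[::1]:8123".parse().unwrap()),
    };
    println!("{:?}", target_for(simulated.nexus_address, "nexus"));
    println!("{:?}", target_for(simulated.clickhouse_address, "clickhouse"));

    // A config with no overrides falls back to DNS for both services.
    let defaults = AddressOverrides::default();
    println!("{:?}", target_for(defaults.clickhouse_address, "clickhouse"));
}
```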
