Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

DAOS-xxx client: Dump metrics to container #15525

Draft
wants to merge 3 commits into
base: master
Choose a base branch
from
Draft

Conversation

mjmac
Copy link
Contributor

@mjmac mjmac commented Nov 20, 2024

If configured, dump metrics to a container instead of
a directory.

Copy link

github-actions bot commented Nov 20, 2024

Errors are Ticket number suffix is not a number. See https://daosio.atlassian.net/wiki/spaces/DC/pages/11133911069/Commit+Comments,Unable to load ticket data
https://daosio.atlassian.net/browse/DAOS-xxx

@daosbuild1
Copy link
Collaborator

Test stage NLT on EL 8.8 completed with status UNSTABLE. https://build.hpdd.intel.com/job/daos-stack/job/daos//view/change-requests/job/PR-15525/1/testReport/

@mjmac mjmac force-pushed the mjmac/dfs_stats branch 2 times, most recently from 22e2ae2 to 78f09d2 Compare November 22, 2024 23:01
@daosbuild1
Copy link
Collaborator

Test stage Unit Test on EL 8.8 completed with status UNSTABLE. https://build.hpdd.intel.com/job/daos-stack/job/daos//view/change-requests/job/PR-15525/3/testReport/

@daosbuild1
Copy link
Collaborator

Test stage NLT on EL 8.8 completed with status UNSTABLE. https://build.hpdd.intel.com/job/daos-stack/job/daos//view/change-requests/job/PR-15525/3/testReport/

@mjmac mjmac changed the title mjmac/dfs stats DAOS-xxx client: Add DFS-level metrics Nov 25, 2024
@mjmac mjmac changed the title DAOS-xxx client: Add DFS-level metrics DAOS-xxx container: Dump metrics to container Nov 27, 2024
@mjmac mjmac changed the title DAOS-xxx container: Dump metrics to container DAOS-xxx client: Dump metrics to container Nov 27, 2024
@daosbuild1
Copy link
Collaborator

Test stage Unit Test on EL 8.8 completed with status UNSTABLE. https://build.hpdd.intel.com/job/daos-stack/job/daos//view/change-requests/job/PR-15525/4/testReport/

@daosbuild1
Copy link
Collaborator

Test stage NLT on EL 8.8 completed with status UNSTABLE. https://build.hpdd.intel.com/job/daos-stack/job/daos//view/change-requests/job/PR-15525/4/testReport/

@daosbuild1
Copy link
Collaborator

Test stage NLT on EL 8.8 completed with status UNSTABLE. https://build.hpdd.intel.com/job/daos-stack/job/daos//view/change-requests/job/PR-15525/5/testReport/

mjmac added 3 commits December 4, 2024 20:19
Add a boolean container property that can be used to
toggle client-side metrics collection and reporting
behavior on a per-container basis. By default, containers
will not participate in metrics collection, and must
be opted-in via the property. The initial use case for
this property is to enable DFS-level metrics in
POSIX containers.

Features: container
Required-githooks: true
Signed-off-by: Michael MacDonald <[email protected]>
If metrics are enabled for a POSIX container, create
a new container/$UUID/dfs metrics root in the client
telemetry to provide DFS-oriented metrics (POSIX ops,
file I/Os, etc).

Features: container
Required-githooks: true
Signed-off-by: Michael MacDonald <[email protected]>
Required-githooks: true

Signed-off-by: Michael MacDonald <[email protected]>
@daosbuild1
Copy link
Collaborator

Test stage NLT on EL 8.8 completed with status UNSTABLE. https://build.hpdd.intel.com/job/daos-stack/job/daos//view/change-requests/job/PR-15525/6/testReport/

/*
* dc_task_schedule() calls tse_task_complete() even on error (which also calls the
* completion cb that frees params in this case, so we can just ignore the rc here.
*/
dc_task_schedule(task, true);

dfs_update_file_metrics(dfs, *params->read_size, 0);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i think those metrics update calls should be in the completion callback when the read completes

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed.

@@ -412,6 +412,12 @@ open_stat(dfs_t *dfs, dfs_obj_t *parent, const char *name, mode_t mode, int flag

out:
if (rc == 0) {
if (flags & O_CREAT) {
DFS_OP_STAT_INCR(dfs, DOS_CREATE);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i think it would be better to not have a CREATE metric, but re-use the mkdir (because that is what open with O_CREAT for dir is) and File_Create or just CREATE for creating a file.

if (flags & O_CREAT) {
DFS_OP_STAT_INCR(dfs, DOS_CREATE);
} else {
DFS_OP_STAT_INCR(dfs, DOS_OPEN);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i would also split this into opendir and open

@@ -1662,6 +1679,7 @@ dfs_punch(dfs_t *dfs, dfs_obj_t *obj, daos_off_t offset, daos_size_t len)
return daos_der2errno(rc);
}

DFS_OP_STAT_INCR(dfs, DOS_PUNCH);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

punch is not a posix term, so maybe better one would be truncate?

@@ -1708,6 +1726,7 @@ dfs_sync(dfs_t *dfs)
if (dfs->amode != O_RDWR)
return EPERM;

DFS_OP_STAT_INCR(dfs, DOS_SYNC);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

since this is a no-op, I would not increment the metric for it as it can cause confusion.

### Per-Container Metrics Collection

If enabled, per-container metrics (currently only POSIX containers are supported)
will be collected on the client. Set the `DAOS_PROP_CO_METRICS_ENABLED` container
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i don't think this is a good idea. we can discuss offline

@mjmac
Copy link
Contributor Author

mjmac commented Dec 5, 2024

@mchaarawi: This is the combined patch for playing around with the feature. I'll incorporate your feedback into the standalone PR for DFS metrics: #15544

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging this pull request may close these issues.

3 participants