Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

add volume stats metrics - #677

Merged
merged 1 commit into from
Jan 11, 2021

Conversation

AndyXiangLi
Copy link
Contributor

@AndyXiangLi AndyXiangLi commented Jan 4, 2021

Is this a bug fix or adding new feature?
Fixes #223
Fixes #499
What is this PR about? / Why do we need it?
Implement Volume stats metrics

What testing is done?
Sanity test passed
Added unit test for the change
Validate volume status can be logged in the ebs-plugin in node pod

@k8s-ci-robot
Copy link
Contributor

Hi @AndyXiangLi. Thanks for your PR.

I'm waiting for a kubernetes-sigs member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@k8s-ci-robot k8s-ci-robot added needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Jan 4, 2021
},
},
}, nil

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you post output of this passing csi sanity tests? I assume it did but just to be sure.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, sanity test is passed

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you clarify in the PR description? Ideally with the output.

@gnufied
Copy link
Contributor

gnufied commented Jan 4, 2021

/ok-to-test

@k8s-ci-robot k8s-ci-robot added ok-to-test Indicates a non-member PR verified by an org member that is safe to test. and removed needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels Jan 4, 2021
@coveralls
Copy link

coveralls commented Jan 4, 2021

Pull Request Test Coverage Report for Build 1450

  • 47 of 73 (64.38%) changed or added relevant lines in 1 file are covered.
  • No unchanged relevant lines lost coverage.
  • Overall coverage decreased (-0.5%) to 81.177%

Changes Missing Coverage Covered Lines Changed/Added Lines %
pkg/driver/node.go 47 73 64.38%
Totals Coverage Status
Change from base Build 1447: -0.5%
Covered Lines: 1669
Relevant Lines: 2056

💛 - Coveralls


var _ Statter = realStatter{}

type realStatter struct {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we should name this simply statter.


func (fakeStatter) StatFS(path string) (available, capacity, used, inodesFree, inodes, inodesUsed int64, err error) {
// Assume the file exists and give some dummy values back
return 1, 1, 1, 1, 1, 1, nil
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why don't you return back whatever is passed to this function? It'd be more useful for testing.

"golang.org/x/sys/unix"
)

type Statter interface {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you add some documentation? It'd be really helpful going forward.

strOut := strings.TrimSpace(string(output))
gotSizeBytes, err := strconv.ParseInt(strOut, 10, 64)
if err != nil {
return -1, fmt.Errorf("failed to parse size %s into int a size", strOut)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: failed to parse size %s as int

return false, err
}

return (st.Mode & unix.S_IFMT) == unix.S_IFBLK, nil
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm going to assume this only works on unix?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, right now it is only working for unix

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's fine. I see you've added the checks for windows, which is good enough for now. I'm OK with only working on unix, but we shouldn't break windows.

@ayberk
Copy link
Contributor

ayberk commented Jan 5, 2021

Supersedes #589.

@wongma7
Copy link
Contributor

wongma7 commented Jan 6, 2021

fyi, test case failed

=== RUN TestNodeGetVolumeStats
node_test.go:1194: Expected no error, got rpc error: code = NotFound desc = path /test does not exist
panic.go:636: missing call(s) to *mocks.MockMounter.ExistsPath(is equal to /test) /home/prow/go/src/github.com/kubernetes-sigs/aws-ebs-csi-driver/pkg/driver/node_test.go:1180
panic.go:636: aborting test due to missing call(s)
--- FAIL: TestNodeGetVolumeStats (0.00s)

@AndyXiangLi AndyXiangLi closed this Jan 6, 2021
@k8s-ci-robot k8s-ci-robot added size/XS Denotes a PR that changes 0-9 lines, ignoring generated files. and removed size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Jan 6, 2021
@AndyXiangLi AndyXiangLi reopened this Jan 6, 2021
@k8s-ci-robot k8s-ci-robot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. and removed size/XS Denotes a PR that changes 0-9 lines, ignoring generated files. labels Jan 6, 2021
@wongma7
Copy link
Contributor

wongma7 commented Jan 6, 2021

/approve

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Jan 6, 2021
@AndyXiangLi AndyXiangLi requested review from gnufied and ayberk January 6, 2021 22:39
Copy link
Contributor

@ayberk ayberk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A couple comments, good work!

Also, looks like you'll need to improve the coverage 😅

return false, err
}

return (st.Mode & unix.S_IFMT) == unix.S_IFBLK, nil
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's fine. I see you've added the checks for windows, which is good enough for now. I'm OK with only working on unix, but we shouldn't break windows.

}

// Available is blocks available * fragment size
available = int64(statfs.Bavail) * int64(statfs.Bsize)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are all these fields guaranteed to be present if unix.Statfs succeeds?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, they are all system level variables to my understanding, and we should be able to get all the following value in one call: Type Bsize Blocks Bfree Bavail Files Ffree Fsid NamelenFrsize Flags Spare

// Capacity is total block count * fragment size
capacity = int64(statfs.Blocks) * int64(statfs.Bsize)
// Usage is block being used * fragment size (aka block size).
used = (int64(statfs.Blocks) - int64(statfs.Bfree)) * int64(statfs.Bsize)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should probably do something like if used < 0 { used = 0 } and maybe log a warning for sanity.

inodes = int64(statfs.Files)
inodesFree = int64(statfs.Ffree)
inodesUsed = inodes - inodesFree
return
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I got confused for a moment here. God I hate the named return :)

Can you also add a comment to clarify the unit of these values? I'd assume they're bytes.

isBlock, err := d.statter.IsBlockDevice(req.VolumePath)

if err != nil {
return nil, status.Errorf(codes.NotFound, "failed to determine whether %s is block device: %v", req.VolumePath, err)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should this be codes.Internal?

},
},
}, nil

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you clarify in the PR description? Ideally with the output.

@@ -73,6 +76,15 @@ func TestSanity(t *testing.T) {
sanity.Test(t, config)
}

func createDir(targetPath string) (string, error) {
if err := os.MkdirAll(targetPath, 0755); err != nil {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

755 looks too permissive to me. 644 should be good for now.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually we need write and exec permit here, minimum 300 will work. But 755 is the default permit sanity_test used to mkdir, so I kept it..
https://github.com/kubernetes-csi/csi-test/blob/054980205e4fc45ab9e940e39bdfd71066d96d87/pkg/sanity/sanity.go#L365

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wasn't sure why we would need exec on a folder. TIL :) https://unix.stackexchange.com/a/150456

@@ -73,6 +76,15 @@ func TestSanity(t *testing.T) {
sanity.Test(t, config)
}

func createDir(targetPath string) (string, error) {
if err := os.MkdirAll(targetPath, 0755); err != nil {
if !os.IsExist(err) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you can use IsNotExist here.

@@ -309,13 +321,48 @@ func (f *fakeMounter) GetDeviceName(mountPath string) (string, int, error) {
}

func (f *fakeMounter) MakeFile(pathname string) error {
file, err := os.OpenFile(pathname, os.O_CREATE, os.FileMode(0644))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I know we talked about this offline, but I want to note it publicly.

I am not quite sure why we have a different mounter for sanity tests. We should use the code from the driver itself, otherwise we're not really testing the "real" code.

/cc @wongma7 @gnufied

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's fine, I'm not too familiar with sanity but seems mocking the mount is common pattern. https://github.com/kubernetes-sigs/gcp-compute-persistent-disk-csi-driver/blob/2014365ea7b3405e024be1d9552aa30d44da4a8b/pkg/mount-manager/fake-safe-mounter.go#L28

I think the sanity test is still testing everything in the driver code surrounding wheerever it actually tries to call "Mount/Statfs".

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should be fine as long as we don't modify the mount, but if we do that we risk silently failing sanity tests.

@AndyXiangLi can you add a note to the actual mount code? That it's mirrored here and any modification should be applied here too. Not ideal, but I'd feel safer.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ayberk Sure will do that, and as we talked off line, I think it would be better if we can extract those three functions out of mounter in the future.

@k8s-ci-robot k8s-ci-robot requested a review from wongma7 January 6, 2021 23:05
@AndyXiangLi AndyXiangLi force-pushed the volume-stat-metrics branch 2 times, most recently from b88f7c1 to 13bf1a3 Compare January 7, 2021 23:34
@Henni Henni mentioned this pull request Jan 8, 2021
@AndyXiangLi AndyXiangLi force-pushed the volume-stat-metrics branch 2 times, most recently from 81b5c9d to e7e5d3e Compare January 9, 2021 01:12
mounter: mockMounter,
inFlight: internal.NewInFlight(),
}
err := os.MkdirAll(VolumePath, 0644)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks like there are some indentation problems in this file.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed

@ayberk
Copy link
Contributor

ayberk commented Jan 11, 2021

/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Jan 11, 2021
@wongma7
Copy link
Contributor

wongma7 commented Jan 11, 2021

I'll manually merge, coverage decrease is because of getBlockSizeBytes which is not easy to exercise and anyway coveralls is not supposed to be required! ( longstanding TODO to fix this).

/approve

@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: AndyXiangLi, wongma7

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. lgtm "Looks good to me", indicates that a PR is ready to be merged. ok-to-test Indicates a non-member PR verified by an org member that is safe to test. size/L Denotes a PR that changes 100-499 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Report volume usage Report Metrics for the driver
6 participants