Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

SVIDStore disk plugin #2647

Closed
wants to merge 12 commits into from
Closed

Conversation

amartinezfayo
Copy link
Member

Implementation of the SVIDStore disk plugin.

Fixes #2077.

Signed-off-by: Agustín Martínez Fayó <[email protected]>
Signed-off-by: Agustín Martínez Fayó <[email protected]>
Signed-off-by: Agustín Martínez Fayó <[email protected]>
@loveyana
Copy link
Contributor

loveyana commented Dec 3, 2021

hi @amartinezfayo , I am glad that a local disk-based svid component has been almost completed.

I compiled this change into a plug-in separately and tested it in the local and docker environment.
I found that setxattr only supports setxattr in our debian 9 version For the attribute creation of the specified namespace, https://man7.org/linux/man-pages/man5/attr.5.html ,but the constant spire-svid is used as the attribute name in your plug-in, do you mind modifying this problem, regard it as a configuration or modify it as a user namespace attribute, I think it is a good idea, below Is a screenshot of the execution in our environment.

image
image

this is the result of local execution, and the execution in docker and kubernetes is also the same.

@loveyana
Copy link
Contributor

loveyana commented Dec 3, 2021

There is also an immature idea.I mentioned such a problem before, #2573, spiffe-csi uses csi-driver to build grpc communication from other services to spire-agent, and some paths to mount it.
I think we can insert certain specified fields into the "xattr" of the file in the disk plug-in, maybe also in the entry definition, so that we can make the pod only The purpose of being able to mount to its own svid file and socket, do you have any suggestions for this.

Looking forward for your response, thank you~

Signed-off-by: Agustín Martínez Fayó <[email protected]>
Signed-off-by: Agustín Martínez Fayó <[email protected]>
@amartinezfayo
Copy link
Member Author

Thank you @loveyana for looking at this PR!

the constant spire-svid is used as the attribute name in your plug-in, do you mind modifying this problem, regard it as a configuration or modify it as a user namespace attribute

Thanks for pointing that the namespace was missing. It now uses the user namespace when setting the extended attribute.

I think we can insert certain specified fields into the "xattr" of the file in the disk plug-in, maybe also in the entry definition, so that we can make the pod only The purpose of being able to mount to its own svid file and socket, do you have any suggestions for this.

If I'm understanding correctly, you are suggesting to encode in selectors information that will end up being set as extended attributes in the (SVID) files that the plugin writes to disk. If that's the case, I would be hesitant to provide that kind of functionality that essentially would allow to set arbitrary data in the extended attributes that would be consumed for a certain purpose.
Generally speaking, I think that the plugin should serve the specific purpose of storing SVIDs on disk, and handling that kind of functionality feels like out of the scope of the plugin and can lead to compatibility problems and make it more error prone.

attrName = "user.spire-svid"
)

type certFile struct {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

small naming nit: certFile is being used for cert chains, private keys, and bundles. Maybe svidFile?

log.With("bundle_file_path", diskStore.bundle.filePath).Debug("Writing bundle file")
if err := diskStore.bundle.write(p.trustDomain); err != nil {
return nil, status.Errorf(codes.Internal, "failed to write bundle file: %v", err)
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

would it make sense to best-effort delete the previously-saved components of the stored SVID if a later write action fails? This would help keep disk usage tidied up

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if we are updating a previously stored-SVID, is it possible for us to reach an inconsistent state where the past SVID is partially on disk, and the rest has been overwritten by the updated SVID?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you @amoore877 for the feedback. I'm implementing a solution that should address these concerns, providing a reasonable level of atomicity in the write action. I'll be updating the PR with that.

@loveyana
Copy link
Contributor

loveyana commented Dec 5, 2021

@amartinezfayo I am glad to receive your reply, and thank you that you have solved this problem, I have passed the test in the cluster.

Regarding the second half of the reply, I actually still have some thoughts I want to share with you. In our current environment, if we use the svidstore plug-in for svid storage, we definitely hope that each service can only get its own svid.

I think that in the gcp or aws svidstore scenario, we have solved this problem in disguise through the cloud platform's permission management, but in the disk svidstore scenario, in the current design, I completely believe that there is no way to solve this problem.

So I have a suggestion, whether we can specify a way like -selector "disk:xattr:serviceaccount:example-service" when the entry is created, by inserting user.<plugin-prefix>-serviceaccount to example service svids files, It is left to the platforms that distribute these svids, including kubernetes or any virtualization platform to determine whether to issue certificates. I think this is also the original intention of spiffe in designing the selector mechanism.


// SetLogger sets the logger used by the plugin
func (p *DiskPlugin) SetLogger(log hclog.Logger) {
p.mtx.Lock()
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lcoks are not required because it is done in plugin level


p.trustDomain = req.CoreConfiguration.TrustDomain
p.config = config
if config.Directory == "" {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

NIT: maybe you can move this if above lock just to make it easier to read?

func newDiskStore(metaData []string, dir string) (*diskStore, error) {
metadataMap, err := svidstore.ParseMetadata(metaData)
if err != nil {
return nil, status.Errorf(codes.InvalidArgument, "error parsing metadata: %v", err)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In this function caller your are just getting error message, and creating a new status.
So this status will not be used.

What do you think about, and the same for all another errors form this function, what do you think about in caller just. resturn error instead of getting error message and creating a new status?

Another option is that you can just create regular errors here, but yyou may do the same in all another functions, and it will be a little incosistent.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The caller should just return the error as it is, without adding a status. All code paths should return a proper status. I'll fix this.

return p.config, nil
}

func (c *certFile) write(attrValue string) error {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

functions must be moved together with struct definition

// This is a best-effort attempt to avoid collisions with other systems.
// Some platforms do not support this mechanism.
func validateXattr(filePath, attrValue string) error {
if _, statErr := os.Stat(filePath); os.IsNotExist(statErr) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what happens if it returns another error? is it not important?

@@ -0,0 +1,15 @@
// +build !linux
// +build !darwin
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what do you think about 1 file for windows and another for !windows?
we are using that approach in another places, and may works

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The problem is that there are non-Windows platforms where this is not supported, so we can't just have !windows for the supported implementation. The supported ones are: linux, darwin, freebsd and netbsd. I see that I missed freebsd and netbsd., which I should add.


func setxattr(filePath, attr string, data []byte) error {
// On unsupported systems, we just do nothing
return nil
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you forgot to close {

// On unsupported systems, we just do nothing
return nil

func getxattr(filePath, attr string, dest []byte) (err error) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe:

func getxattr(filePath, attr string, dest []byte) error {

return nil
}

func getxattr(filePath, attr string, dest []byte) (err error) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same as before you are not using err var

log = log.WithField(telemetry.SVIDStore, storeName)
log = log.WithFields(logrus.Fields{
telemetry.SVIDStore: storeName,
telemetry.Entry: entry.EntryId,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why are you replacing log here? it is defined above and contains EntryID and SPIFFEID

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for catching this. I was somehow confused about missing the entry ID and SPIFFE ID.

@azdagron
Copy link
Member

azdagron commented Dec 6, 2021

I wonder if there is an alternative here than the use of extended attributes. Support for xattrs is very spotty based on the filesystem and/or platform. Considering the wide scope of SPIRE deployments, I think we'd be better off if we can figure something else out rather than have a divergent experience.

Signed-off-by: Agustín Martínez Fayó <[email protected]>
@azdagron azdagron added this to the 1.2.0 milestone Dec 7, 2021
- Set proper ownership and permissions
- Do not use extended attributes

Signed-off-by: Agustín Martínez Fayó <[email protected]>
Signed-off-by: Agustín Martínez Fayó <[email protected]>
Signed-off-by: Agustín Martínez Fayó <[email protected]>
| `disk:certchainfile` | `disk:certchainfile:tls.crt` | x | The file path relative to the base directory where the SVID certificate chain will be stored. |
| `disk:keyfile` | `disk:keyfile:key.crt` | x | The file path relative to the base directory where the SVID certificate key will be stored. |
| `disk:bundlefile` | `disk:bundlefile:ca.crt` | x | The file path relative to the base directory where the CA certificates belonging to the Trust Domain of the SVID will be stored. |
| `disk:certchainfile` | `disk:certchainfile:tls.crt` | x | The file path relative to the base directory where the SVID certificate chain will be stored. Must be in the same directory as `keyfile` and `bundlefile`. |
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure about setting path in each selector, we are forced to set all selectors with paths and make sure all paths are the same.
What do you think about creating a selector to set a folder in a single place?
that will allow us to have "default" file names, so entries will be easier, and we can have other selectors to set file names.
example using default names:

{  
    "spiffeID": "spiffe://example.org/w1",
    "parentID": "spiffe://example.org/agent",
    "selectors", [
         "disk:directory:/tmp"
    ]
}

example changing a single file name

{  
    "spiffeID": "spiffe://example.org/w1",
    "parentID": "spiffe://example.org/agent",
    "selectors", [
         "disk:certchainfile:tls.crt"
    ]
}

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought about this and I didn't like the idea of adding a new selector.
Now that I think it more, this also has the benefit of being able to have default file names and end up with a less number of required selectors. I'll make the change to implement it in the suggested way.

| `disk:certchainfile` | `disk:certchainfile:tls.crt` | x | The file path relative to the base directory where the SVID certificate chain will be stored. Must be in the same directory as `keyfile` and `bundlefile`. |
| `disk:keyfile` | `disk:keyfile:key.crt` | x | The file path relative to the base directory where the SVID certificate key will be stored. Must be in the same directory as `certchainfile` and `bundlefile`. |
| `disk:bundlefile` | `disk:bundlefile:ca.crt` | x | The file path relative to the base directory where the CA certificates belonging to the Trust Domain of the SVID will be stored. Must be in the same directory as `certchainfile` and `keyfile`. |
| `disk:group` | `disk:group:my-workload` | x (if `gid` is not specified) | The group name that is set to the files written to disk. If set, `gid` cannot be specified. |
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

disk:group and disk:gid is confusing, may we use groupname or gname instead? and have a consistent name for gid?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds good. I like groupname and groupid.

if err := createBaseDirectoryIfNeeded(c.filePath); err != nil {
// If needed, create the parent directory that will contain the
// autogenerated directory and the symbolic link.
parentDir := filepath.Dir(originalDir)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this means that agent user must have writing rights to parent folder,
if we have a selector disk:certchain:/usr/certs/cert.pem user must have write rights to /usr and then we try to change parent permissions, no matter if we created the folder or it exists.
I think we must never change folder permission, and only change it when we create that folder.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for catching this! Changing permissions on the parent directory (out of the plugin directory space) was not the intention. I'll fix that.

// directory is not completed. If there is an error before
// finishing writing all the files, the directory must be
// cleaned up.
d.cleanupDir = autoGeneratedDir
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why an attribute to struct level is required here? is it possible to have another value? and we remove even if function fail before?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, it's better to handle this with a var in the function. I'll make that change.


keyFile := filepath.Join(autoGeneratedDir, filepath.Base(d.key.filePath))
log.With("key_file_path", keyFile).Debug("Writing key file")
if err := os.WriteFile(keyFile, d.key.pemBytes, permFile); err != nil {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you add a comment about why keys uses 640 permissions? I understand it is done this way because workload must have access to key.

// Defer a function that cleanups the files that are no
// longer used.
defer func() {
if removeAllErr := os.RemoveAll(d.cleanupDir); removeAllErr != nil {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: validate d.cleanupDir != ""

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

calling os.RemoveAll with an empty string causes no error.

return status.Errorf(codes.Internal, "failed to rename symbolic link: %v", err)
}

if oldPath != "" {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you can just assign oldPath, since you are always using oldPath value, and maybe add a comment about what happens here

gid := metadataMap["gid"]
groupName := metadataMap["group"]

if gid != "" {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it can be simplified with a switch:

swtich {
   case gid != "" && groupName != "":
       return nil, status.Error(codes.InvalidArgument, "either gid or group must be specified, not both")
   case gid != "":
        ....
    case groupName != "" :
        .....
}

Signed-off-by: Agustín Martínez Fayó <[email protected]>
@amartinezfayo
Copy link
Member Author

This PR evidences that the configuration and use of this plugin as it has been implemented here can be cumbersome and most likely will result in operational problems. More details can be found here: #2077 (comment)
Closing it as a result of that.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Serverless support: disk plugin
5 participants