Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix elastic-agent cloud image run issues #1630

Conversation

fearful-symmetry
Copy link
Contributor

What does this PR do?

We're seeing issues with various docker builds of elastic-agent failing with an "input not supported" error. It appears that when elastic-agent starts up in container mode, it'll set the paths incorrectly under V2, so we aren't reading the specfiles from the component directory.

This makes a few changes to the container setup to get the paths working under the "flat" V2 structure. I'm not 100% sure what else is happening under "container mode" so I want someone with a little more experience here to make sure I'm not breaking anything. I tested this natively and inside a container, and it seems to work.

In the future, we might want to modify the specfile load.go so we return an specific error if we can't find any specfiles

Checklist

  • My code follows the style guidelines of this project
  • I have added tests that prove my fix is effective or that my feature works
  • I have added an entry in CHANGELOG.next.asciidoc or CHANGELOG-developer.next.asciidoc.

@fearful-symmetry fearful-symmetry added bug Something isn't working v8.6.0 labels Oct 27, 2022
@fearful-symmetry fearful-symmetry self-assigned this Oct 27, 2022
@fearful-symmetry fearful-symmetry requested a review from a team as a code owner October 27, 2022 22:34
@fearful-symmetry fearful-symmetry requested review from michel-laterman and removed request for a team October 27, 2022 22:34
@elasticmachine
Copy link
Contributor

elasticmachine commented Oct 27, 2022

💚 Build Succeeded

the below badges are clickable and redirect to their specific view in the CI or DOCS
Pipeline View Test View Changes Artifacts preview preview

Expand to view the summary

Build stats

  • Start Time: 2022-10-28T20:41:01.289+0000

  • Duration: 17 min 30 sec

Test stats 🧪

Test Results
Failed 0
Passed 4461
Skipped 13
Total 4474

💚 Flaky test report

Tests succeeded.

🤖 GitHub comments

Expand to view the GitHub comments

To re-run your PR in the CI, just comment with:

  • /test : Re-trigger the build.

  • /package : Generate the packages.

  • run integration tests : Run the Elastic Agent Integration tests.

  • run end-to-end tests : Generate the packages and run the E2E Tests.

  • run elasticsearch-ci/docs : Re-trigger the docs validation. (use unformatted text in the comment!)

@elasticmachine
Copy link
Contributor

elasticmachine commented Oct 27, 2022

🌐 Coverage report

Name Metrics % (covered/total) Diff
Packages 98.305% (58/59)
Files 70.256% (137/195)
Classes 70.161% (261/372)
Methods 54.318% (780/1436)
Lines 39.634% (8287/20909)
Conditionals 100.0% (0/0) 💚

@cmacknz
Copy link
Member

cmacknz commented Oct 28, 2022

The biggest thing we need to make sure we don't break is using custom a STATE_PATH, which is often used to mount that directory on a persistent or host volume when the agent runs on Kubernetes so that the directory contents persist across restarts and upgrades. It should hopefully be easy to test this by just changing the state path environment variable.

@@ -789,10 +789,9 @@ func setPaths(statePath, configPath, logsPath string, writePaths bool) error {
}
originalInstall := paths.Install()
originalTop := paths.Top()
paths.SetTop(topPath)
paths.SetTop(statePath) // change this?
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is correct from my reading of the comments here:

// Top returns the top directory for Elastic Agent, all the versioned
// home directories live under this top-level/data/elastic-agent-${hash}
func Top() string {
return topPath
}
// SetTop overrides the Top path.
//
// Used by the container subcommand to adjust the overall top path allowing state can be maintained between container
// restarts.
func SetTop(path string) {
topPath = path
}

I also suspect you want to change the topPath variable here so that the permissions are set correctly:

topPath := filepath.Join(statePath, "data")
configPath = envWithDefault(configPath, "CONFIG_PATH")
if configPath == "" {
configPath = statePath
}
// ensure that the directory and sub-directory data exists
if err := os.MkdirAll(topPath, 0755); err != nil {

paths.SetConfig(configPath)
// when custom top path is provided the home directory is not versioned
paths.SetVersionHome(false)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm curious to know what this does (I expect we need Michal or Blake to answer this). It's a bit hard to tell from the code.

As far as I can tell in the container I built, the home path is versioned by default. I get /usr/share/elastic-agent/data/elastic-agent-fc3eba. I don't know that this would change if the STATE_PATH were to change either so removing it seems right but it would be good to confirm.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yah, this just adds the elastic-agent-fc3eba directory, which is still there in docker.

@fearful-symmetry
Copy link
Contributor Author

@cmacknz , yah, I suspect there's deeper problems with how the state_path is being set, as elastic-agent in its current state (and in this PR) uses the state path for the entire run directory. I assume this code made more sense before the "flattened" V2 directory structure, but right now I'm not sure we have the mechanism to separate out the state info from the rest of the pre-installed elastic-agent assets.

@fearful-symmetry
Copy link
Contributor Author

Alright, added the changelog and fixed up a few things. @cmacknz right now, it seems like elastic-agent just kind of mixes statefiles in the flat directory along with everything else. If the intent is to have STATE_PATH be a separate directory that users can mount just for tracking state, we may want to open an issue to track that.

@@ -40,9 +40,9 @@ import (
const (
requestRetrySleepEnv = "KIBANA_REQUEST_RETRY_SLEEP"
maxRequestRetriesEnv = "KIBANA_REQUEST_RETRY_COUNT"
defaultRequestRetrySleep = "1s" // sleep 1 sec between retries for HTTP requests
defaultMaxRequestRetries = "30" // maximum number of retries for HTTP requests
defaultStateDirectory = "/usr/share/elastic-agent/state" // directory that will hold the state data
Copy link
Contributor

@michalpristas michalpristas Oct 29, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is here probably to share state across runs. have mounted volume.
this PR fixes the problem but i don't believe it's proper fix and will introduce another set of issues.

i believe proper fix would be to fix not just data/downloads but also data/components
will prepare PR on monday and test it as well

@cmacknz
Copy link
Member

cmacknz commented Oct 31, 2022

@michalpristas should this be closed in favour of #1653?

@michalpristas
Copy link
Contributor

@cmacknz yes please

@michalpristas
Copy link
Contributor

closing in favor of #1653

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working skip-changelog v8.6.0
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants