Panic crash in reporter.go crashes in managed mode #1340

Closed
amolnater-qasource opened this issue Sep 28, 2022 · 7 comments · Fixed by #1341
Assignees
Labels
bug Something isn't working impact:high Short-term priority; add to current release, or definitely next. QA:Validated Validated by the QA Team Team:Elastic-Agent-Control-Plane Label for the Agent Control Plane team

Comments

@amolnater-qasource

Description:
Hosted Fleet Server not available on Healthy deployment of 8.4.3 BC-1 when deployed from:

Screenshots: [two images, labeled "15" and "14", not preserved]

@amolnater-qasource amolnater-qasource added bug Something isn't working impact:high Short-term priority; add to current release, or definitely next. Team:Elastic-Agent-Control-Plane Label for the Agent Control Plane team labels Sep 28, 2022
@amolnater-qasource
Author

@manishgupta-qasource Please review.

@manishgupta-qasource

Secondary review for this ticket is done.

@AndersonQ
Member

I'm looking at it. Just to confirm: the 8.4.3 BC-1 is the 8.4.3-SNAPSHOT, right?

@joshdover
Contributor

> I'm looking at it, just to confirm, the 8.4.3 BC-1 is the 8.4.3-SNAPSHOT, right?

No, it's not necessarily the exact same build. To deploy BC1, go to staging and deploy the 8.4.3 release (not SNAPSHOT).

@joshdover
Contributor

joshdover commented Sep 28, 2022

The root cause is a panic caused by an elastic-agent bug introduced in 7f10625. @AndersonQ is preparing a fix, which should be up very soon.

panic: assignment to entry in nil map

goroutine 1 [running]:
github.com/elastic/elastic-agent/internal/pkg/core/status.(*controller).RegisterLocalComponent(0x40001f58c0, {0xaaaadd2a6670, 0xf})
        github.com/elastic/elastic-agent/internal/pkg/core/status/reporter.go:151 +0x29c
github.com/elastic/elastic-agent/internal/pkg/agent/application/gateway/fleet.newFleetGatewayWithScheduler({0xaaaadda65b28?, 0x400039b240}, 0x40004c2460, 0xaaaade856620, {0xaaaadda4c488?, 0x40002e4fc0}, {0xaaaadda53888?, 0x40003086e0}, {0xaaaadda4c608?, 0x4000a379b0}, ...)
        github.com/elastic/elastic-agent/internal/pkg/agent/application/gateway/fleet/fleet_gateway.go:160 +0x138
github.com/elastic/elastic-agent/internal/pkg/agent/application/gateway/fleet.New({0xaaaadda65b28, 0x400039b240}, 0x40001a7ec0?, {0xaaaadda4c488, 0x40002e4fc0}, {0xaaaadda53888, 0x40003086e0}, {0xaaaadda4c608, 0x4000a379b0}, {0xaaaadda53748, ...}, ...)
        github.com/elastic/elastic-agent/internal/pkg/agent/application/gateway/fleet/fleet_gateway.go:110 +0x178
github.com/elastic/elastic-agent/internal/pkg/agent/application.newManaged({0xaaaadda65b60?, 0x40001b2000}, 0x40004c2460, {0xffff5c5881d8?, 0x40002198c0}, 0x4000535bf0, 0x1?, {0xffff5c57de98?, 0x40002e5020}, {0xaaaadda6c8c8, ...}, ...)
        github.com/elastic/elastic-agent/internal/pkg/agent/application/managed_mode.go:279 +0x1750
github.com/elastic/elastic-agent/internal/pkg/agent/application.createApplication(0x40004c2460, {0x4000256570, 0x30}, 0xaaaadbf49bb8?, {0xffff5c57de98, 0x40002e5020}, {0xaaaadda6c8c8, 0x40001f58c0}, {0xaaaadda4c828, 0x40000b0b80}, ...)
        github.com/elastic/elastic-agent/internal/pkg/agent/application/application.go:103 +0x228
github.com/elastic/elastic-agent/internal/pkg/agent/application.New(0x40000b0b80?, {0xffff5c57de98, 0x40002e5020}, {0xaaaadda6c8c8, 0x40001f58c0}, {0xaaaadda4c828, 0x40000b0b80}, 0x0?, 0x0?)
        github.com/elastic/elastic-agent/internal/pkg/agent/application/application.go:65 +0xb0
github.com/elastic/elastic-agent/internal/pkg/agent/cmd.run(0x4000569e40?)
        github.com/elastic/elastic-agent/internal/pkg/agent/cmd/run.go:212 +0xbc4
github.com/elastic/elastic-agent/internal/pkg/agent/cmd.runContainerCmd(_, {{{0x0, 0x0}, 0x1, {0x0, 0x0}, 0x0, 0x0, {0xaaaadd2990e7, 0x7}, ...}, ...})
        github.com/elastic/elastic-agent/internal/pkg/agent/cmd/container.go:335 +0x510
github.com/elastic/elastic-agent/internal/pkg/agent/cmd.containerCmd(0x400041e900)
        github.com/elastic/elastic-agent/internal/pkg/agent/cmd/container.go:252 +0x584
github.com/elastic/elastic-agent/internal/pkg/agent/cmd.logContainerCmd(0x400041e900)
        github.com/elastic/elastic-agent/internal/pkg/agent/cmd/container.go:170 +0x73c
github.com/elastic/elastic-agent/internal/pkg/agent/cmd.newContainerCommand.func1(0x40000d1900?, {0xaaaadd295d3f?, 0x0?, 0x0?})
        github.com/elastic/elastic-agent/internal/pkg/agent/cmd/container.go:137 +0x28
github.com/spf13/cobra.(*Command).execute(0x40000d1900, {0xaaaade907b80, 0x0, 0x0})
        github.com/spf13/[email protected]/command.go:860 +0x4ac
github.com/spf13/cobra.(*Command).ExecuteC(0x4000251400)
        github.com/spf13/[email protected]/command.go:974 +0x354
github.com/spf13/cobra.(*Command).Execute(...)
        github.com/spf13/[email protected]/command.go:902
main.main()
        github.com/elastic/elastic-agent/main.go:33 +0x198
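
For readers unfamiliar with this error: Go raises `assignment to entry in nil map` when code writes to a map field that was never initialized with `make`. Below is a minimal sketch of that failure mode and the usual fix. The `controller` type, its `localComponents` field, and the helper functions here are hypothetical stand-ins for illustration; they are not the actual code in `reporter.go`, and the real fix in #1341 may differ.

```go
package main

import "fmt"

// controller is a hypothetical stand-in, NOT the real elastic-agent type.
type controller struct {
	localComponents map[string]string
}

// newBrokenController leaves the map field at its zero value (nil).
func newBrokenController() *controller {
	return &controller{}
}

// newFixedController initializes the map before any write can happen.
func newFixedController() *controller {
	return &controller{localComponents: make(map[string]string)}
}

// register writes to the map; this panics when the map is nil.
func (c *controller) register(id string) {
	c.localComponents[id] = "healthy"
}

// tryRegister calls register and returns the panic message, or "" if
// no panic occurred. It exists only so the sketch can run to completion.
func tryRegister(c *controller, id string) (msg string) {
	defer func() {
		if r := recover(); r != nil {
			msg = fmt.Sprint(r)
		}
	}()
	c.register(id)
	return ""
}

func main() {
	fmt.Printf("broken: %q\n", tryRegister(newBrokenController(), "fleet-gateway"))
	fmt.Printf("fixed:  %q\n", tryRegister(newFixedController(), "fleet-gateway"))
}
```

Reading from a nil map is safe in Go; only writes panic. That is why a missing initialization can go unnoticed until the first code path that registers an entry.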

Some learnings from this:

  • The panic crash from elastic-agent is not showing up in the logs we have ingested in Cloud. We need to have this resolved. @mieciu any advice here would be helpful. I can provide some deployment IDs to look at.
  • We shouldn't be able to produce DRAs with bugs of this severity. We need to prioritize stabilizing the e2e-testing suite and re-enabling that suite as a DRA blocker.
  • We also probably shouldn't be able to merge code with this type of issue. This should have been caught by some level of testing in the elastic-agent repository before it was merged.

@joshdover joshdover transferred this issue from elastic/fleet-server Sep 28, 2022
@joshdover joshdover changed the title Hosted Fleet Server not available on Healthy deployment of 8.4.3 BC-1 Panic crash in reporter.go crashes in managed mode Sep 28, 2022
@amolnater-qasource amolnater-qasource added the QA:Ready For Testing Code is merged and ready for QA to validate label Sep 28, 2022
@amolnater-qasource
Author

Hi Team,
Thank you for looking into this.

We are now able to set up the 8.4.3 BC2 Kibana cloud environment successfully.

Screenshot:
image (1)

Hence, marking this as QA:Validated.

Thanks

@amolnater-qasource amolnater-qasource added QA:Validated Validated by the QA Team and removed QA:Ready For Testing Code is merged and ready for QA to validate labels Sep 29, 2022
@ghost

ghost commented Nov 22, 2022

Bug Conversion

Thanks
