Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix confusing error message gcp #3756

Open
wants to merge 7 commits into
base: main
Choose a base branch
from

Conversation

Excoriate
Copy link

Description

Fixes #3730
Additionally, I added a slight enhancement in the relevant docs to explain this validation further. However, I'd love to be directed to the correct place if I haven't added it 😅 .

Problem Statement

The current Terragrunt GCS remote state configuration lacks consistent validation for project and location attributes when creating new buckets. This can lead to unclear error messages and potential configuration issues during remote state initialization.

Proposed Solution

Enhance the GCS remote state configuration validation to:

  • Require project and location only when bucket creation is not explicitly skipped
  • Maintain existing behavior for existing buckets
  • Provide clearer error messages for missing configuration
  • Preserve the skip_bucket_creation configuration option

Feel free to run the tests executing:

go test ./remote -tags=gcp -run TestValidateGCSConfig -v;

TODOs

Read the Gruntwork contribution guidelines.

  • Update the docs.
  • Run the relevant tests successfully, including pre-commit checks.
  • Include release notes. If this PR is backward incompatible, include a migration guide.

Release Notes (draft)

Added GCS (Remote State) validation for location and project's attribute when the skip_bucket_creation option is set to false.

@yhakbar
Copy link
Collaborator

yhakbar commented Jan 13, 2025

Hey @Excoriate ,

Thanks for submitting this pull request! I'm excited to get it merged in, as I think improving error messages is a solid way to improve the UX of Terragrunt.

I'm seeing an error in one of our integration tests. If you could resolve that, I'd be happy to merge this in:

=== NAME  TestGcpWorksWithExistingBucket
    integration_gcp_test.go:72: [terragrunt apply -auto-approve --terragrunt-non-interactive --terragrunt-config /tmp/terragrunt-test1226453696/terragrunt.hcl --terragrunt-working-dir fixtures/gcs-byo-bucket/ --terragrunt-log-format=key-value]
    integration_gcp_test.go:72: Failed to run Terragrunt command 'terragrunt apply -auto-approve --terragrunt-non-interactive --terragrunt-config /tmp/terragrunt-test1226453696/terragrunt.hcl --terragrunt-working-dir fixtures/gcs-byo-bucket/' due to error: remote.MissingRequiredGCSRemoteStateConfig Missing required GCS remote state configuration project
        /home/circleci/project/remote/remote_state_gcs.go:288 (0x1789bdf)
        	ValidateGCSConfig: return errors.New(MissingRequiredGCSRemoteStateConfig("project"))
        /home/circleci/project/remote/remote_state_gcs.go:185 (0x1789298)
        	GCSInitializer.Initialize: if err := ValidateGCSConfig(gcsConfigExtended); err != nil {
        /home/circleci/project/remote/remote_state.go:97 (0x178790e)
        	(*RemoteState).Initialize: return initializer.Initialize(ctx, state, terragruntOptions)
        /home/circleci/project/cli/commands/terraform/action.go:512 (0x1bc1f12)
        	prepareInitCommand: if err := terragruntConfig.RemoteState.Initialize(ctx, terragruntOptions); err != nil {
        /home/circleci/project/cli/commands/terraform/action.go:306 (0x1bc07e5)
        	runTerragruntWithConfig: if err := prepareInitCommand(ctx, terragruntOptions, terragruntConfig); err != nil {
        /home/circleci/project/cli/commands/terraform/action.go:655 (0x1bc2925)
        	runTerraformInit: if err := runTerragruntWithConfig(ctx, originalTerragruntOptions, initOptions, terragruntConfig, new(Target)); err != nil {
        /home/circleci/project/cli/commands/terraform/action.go:591 (0x1bc24b9)
        	prepareNonInitCommand: if err := runTerraformInit(ctx, originalTerragruntOptions, terragruntOptions, terragruntConfig); err != nil {
        /home/circleci/project/cli/commands/terraform/action.go:310 (0x1bc0825)
        	runTerragruntWithConfig: if err := prepareNonInitCommand(ctx, originalTerragruntOptions, terragruntOptions, terragruntConfig); err != nil {
        /home/circleci/project/cli/commands/terraform/action.go:249 (0x1bbffee)
        	runTerraform.func4: return runTerragruntWithConfig(ctx, terragruntOptions, updatedTerragruntOptions, terragruntConfig, target)
        /home/circleci/project/options/options.go:953 (0x9861c3)
        	(*TerragruntOptions).RunWithErrorHandling: return operation()
        /home/circleci/project/cli/commands/terraform/action.go:248 (0x1bbfe79)
        	runTerraform: if err := terragruntOptions.RunWithErrorHandling(ctx, func() error {
        /home/circleci/project/cli/commands/terraform/action.go:83 (0x1bbeff2)
        	Run: return runTerraform(ctx, opts, new(Target))
        /home/circleci/project/cli/commands/terraform/command.go:47 (0x200f92a)
        	NewApp.NewCommand.Action.func4: return Run(ctx.Context, opts.OptionsFromContext(ctx))
        /home/circleci/project/cli/app.go:236 (0x2011647)
        	runAction.func2: return action(cliCtx)
        /home/circleci/go/pkg/mod/golang.org/x/[email protected]/errgroup/errgroup.go:78 (0x16b1330)
        	(*Group).Go.func1: if err := f(); err != nil {
        /usr/local/go/src/runtime/asm_amd64.s:1700 (0x47f621)
        	goexit: BYTE	$0x90	// NOP
        
        
        Stdout: (see log output above)
        
        Stderr: (see log output above)
    panic.go:629: Deleting test GCS bucket terragrunt-test-bucket-6jjqn0
--- FAIL: TestGcpWorksWithExistingBucket (2.27s)

@Excoriate
Copy link
Author

Hey @Excoriate ,

Thanks for submitting this pull request! I'm excited to get it merged in, as I think improving error messages is a solid way to improve the UX of Terragrunt.

I'm seeing an error in one of our integration tests. If you could resolve that, I'd be happy to merge this in:

=== NAME  TestGcpWorksWithExistingBucket
    integration_gcp_test.go:72: [terragrunt apply -auto-approve --terragrunt-non-interactive --terragrunt-config /tmp/terragrunt-test1226453696/terragrunt.hcl --terragrunt-working-dir fixtures/gcs-byo-bucket/ --terragrunt-log-format=key-value]
    integration_gcp_test.go:72: Failed to run Terragrunt command 'terragrunt apply -auto-approve --terragrunt-non-interactive --terragrunt-config /tmp/terragrunt-test1226453696/terragrunt.hcl --terragrunt-working-dir fixtures/gcs-byo-bucket/' due to error: remote.MissingRequiredGCSRemoteStateConfig Missing required GCS remote state configuration project
        /home/circleci/project/remote/remote_state_gcs.go:288 (0x1789bdf)
        	ValidateGCSConfig: return errors.New(MissingRequiredGCSRemoteStateConfig("project"))
        /home/circleci/project/remote/remote_state_gcs.go:185 (0x1789298)
        	GCSInitializer.Initialize: if err := ValidateGCSConfig(gcsConfigExtended); err != nil {
        /home/circleci/project/remote/remote_state.go:97 (0x178790e)
        	(*RemoteState).Initialize: return initializer.Initialize(ctx, state, terragruntOptions)
        /home/circleci/project/cli/commands/terraform/action.go:512 (0x1bc1f12)
        	prepareInitCommand: if err := terragruntConfig.RemoteState.Initialize(ctx, terragruntOptions); err != nil {
        /home/circleci/project/cli/commands/terraform/action.go:306 (0x1bc07e5)
        	runTerragruntWithConfig: if err := prepareInitCommand(ctx, terragruntOptions, terragruntConfig); err != nil {
        /home/circleci/project/cli/commands/terraform/action.go:655 (0x1bc2925)
        	runTerraformInit: if err := runTerragruntWithConfig(ctx, originalTerragruntOptions, initOptions, terragruntConfig, new(Target)); err != nil {
        /home/circleci/project/cli/commands/terraform/action.go:591 (0x1bc24b9)
        	prepareNonInitCommand: if err := runTerraformInit(ctx, originalTerragruntOptions, terragruntOptions, terragruntConfig); err != nil {
        /home/circleci/project/cli/commands/terraform/action.go:310 (0x1bc0825)
        	runTerragruntWithConfig: if err := prepareNonInitCommand(ctx, originalTerragruntOptions, terragruntOptions, terragruntConfig); err != nil {
        /home/circleci/project/cli/commands/terraform/action.go:249 (0x1bbffee)
        	runTerraform.func4: return runTerragruntWithConfig(ctx, terragruntOptions, updatedTerragruntOptions, terragruntConfig, target)
        /home/circleci/project/options/options.go:953 (0x9861c3)
        	(*TerragruntOptions).RunWithErrorHandling: return operation()
        /home/circleci/project/cli/commands/terraform/action.go:248 (0x1bbfe79)
        	runTerraform: if err := terragruntOptions.RunWithErrorHandling(ctx, func() error {
        /home/circleci/project/cli/commands/terraform/action.go:83 (0x1bbeff2)
        	Run: return runTerraform(ctx, opts, new(Target))
        /home/circleci/project/cli/commands/terraform/command.go:47 (0x200f92a)
        	NewApp.NewCommand.Action.func4: return Run(ctx.Context, opts.OptionsFromContext(ctx))
        /home/circleci/project/cli/app.go:236 (0x2011647)
        	runAction.func2: return action(cliCtx)
        /home/circleci/go/pkg/mod/golang.org/x/[email protected]/errgroup/errgroup.go:78 (0x16b1330)
        	(*Group).Go.func1: if err := f(); err != nil {
        /usr/local/go/src/runtime/asm_amd64.s:1700 (0x47f621)
        	goexit: BYTE	$0x90	// NOP
        
        
        Stdout: (see log output above)
        
        Stderr: (see log output above)
    panic.go:629: Deleting test GCS bucket terragrunt-test-bucket-6jjqn0
--- FAIL: TestGcpWorksWithExistingBucket (2.27s)

Thanks so much for taking the time @yhakbar — 
I sadly overlooked the integration tests. When I ran it, I noticed the error. So, I've made slight adjustments in the logic in the ValidateGCSConfig function.
Bucket Requirement:

  • A bucket is always mandatory
  • Bucket name must be specified in the configuration
    Project and Location Validation: (the core of my changes, and the actual root cause of this issue).

Scenario 1: When skip_bucket_creation is false (default):

  • Both project and location are required for new bucket creation.
  • If the bucket doesn't exist, these fields become mandatory

Scenario 2: When skip_bucket_creation is true:

  • Project and location checks are bypassed
  • Allows using existing buckets without full configuration

What I've added, and I considered relevant is the Bucket Existence Check: If project or location is missing, the function checks if the bucket already exists (in Google Cloud), this is due to existing buckets can have partial configuration (meaning, in the terragrunt.hcl file), but non-existent buckets require complete configuration.

After these changes, I've tested them by running:

go test ./remote -tags=gcp -run TestValidateGCSConfig -v
=== RUN   TestValidateGCSConfig
=== RUN   TestValidateGCSConfig/Valid_config_with_project_and_location
=== RUN   TestValidateGCSConfig/Valid_config_with_skip_bucket_creation
=== RUN   TestValidateGCSConfig/Missing_bucket
=== RUN   TestValidateGCSConfig/Existing_bucket_without_project_and_location_when_bucket_exists
=== RUN   TestValidateGCSConfig/Missing_project_when_bucket_exists
=== RUN   TestValidateGCSConfig/Missing_project_when_bucket_does_not_exist
=== RUN   TestValidateGCSConfig/Missing_location_when_bucket_does_not_exist
=== RUN   TestValidateGCSConfig/Existing_bucket_without_project_and_location_when_skip_bucket_creation_is_true
=== RUN   TestValidateGCSConfig/Missing_location_when_bucket_exists
--- PASS: TestValidateGCSConfig (0.00s)
    --- PASS: TestValidateGCSConfig/Valid_config_with_project_and_location (0.00s)
    --- PASS: TestValidateGCSConfig/Valid_config_with_skip_bucket_creation (0.00s)
    --- PASS: TestValidateGCSConfig/Missing_bucket (0.00s)
    --- PASS: TestValidateGCSConfig/Existing_bucket_without_project_and_location_when_bucket_exists (0.00s)
    --- PASS: TestValidateGCSConfig/Missing_project_when_bucket_exists (0.00s)
    --- PASS: TestValidateGCSConfig/Missing_project_when_bucket_does_not_exist (0.00s)
    --- PASS: TestValidateGCSConfig/Missing_location_when_bucket_does_not_exist (0.00s)
    --- PASS: TestValidateGCSConfig/Existing_bucket_without_project_and_location_when_skip_bucket_creation_is_true (0.00s)
    --- PASS: TestValidateGCSConfig/Missing_location_when_bucket_exists (0.00s)
PASS

Also —thanks again for the great catch 😅 — I run the integration tests that are linked to GCS (let me know if I'm missing something else).

export GOOGLE_CLOUD_PROJECT=mysuperprojectingcp && go test -tags=gcp ./test -run TestGcpWorksWithExistingBucket -v

And also passes!

@yhakbar
Copy link
Collaborator

yhakbar commented Jan 15, 2025

@Excoriate

We're very close! Please take a look at this lint error and address when convenient:

INFO [runner] linters took 3m50.829585485s with stages: goanalysis_metalinter: 3m50.725731215s 
remote/remote_state_gcs.go:300:24: Error return value of `gcsClient.Close` is not checked (errcheck)
                defer gcsClient.Close()
                                     ^
remote/remote_state_gcs.go:298:55: non-wrapping format verb for fmt.Errorf. Use `%w` to format errors (errorlint)
                        return fmt.Errorf("error creating GCS client: %v", err)
                                                                           ^
remote/remote_state_gcs.go:257:1: ST1020: comment on exported function ParseExtendedGCSConfig should be of the form "ParseExtendedGCSConfig ..." (stylecheck)
// Parse the given map into a GCS config
^

@yhakbar
Copy link
Collaborator

yhakbar commented Jan 15, 2025

Also, @Excoriate , the logic that you're using here isn't optimal.

A lot of what Terragrunt does is handle concurrent behavior, and overriding global anonymous functions in tests for mocking can result in problems for concurrent testing.

If possible, take one of these two approaches in order of preference:

  1. Avoid doing any mocking, and instead handle all the business logic for deciding what to do with a GCP API response outside of the API call. e.g. instead of mocking DoesGCSBucketExist, instead create a function that receives the actual response from GCP (e.g. ValidateGCSBucketExistsResponse or something better named), and pass in a mock value for that so that nothing has to be overridden.
  2. Use dependency injection to pass in something that can be mocked without overriding any global variables.

If this is too complicated to worry about now, feel free to ignore this. We can address it later, assuming tests pass and the lint errors are resolved.

@Excoriate
Copy link
Author

Excoriate commented Jan 17, 2025

remote/remote_state_gcs.go:298:55: non-wrapping format verb for fmt.Errorf. Use %w to format errors (errorlint)
return fmt.Errorf("error creating GCS client: %v", err)

Oh! I see 😄 , thanks a lot!
I'm a bit confused. There are two configurations for Golangci in the Terragrunt repo, the .golangci.yaml and the .strict.golangci.yml.

When I run

golangci-lint run --config=.strict.golangci.yml ./remote/remote_state_gcs.go ./remote/remote_state_gcs_test.go

I obtained different linter errors (few of them aren't caused by my changes) than the ones highlighted by your comment:

golangci-lint run --config=.strict.golangci.yml ./remote/remote_state_gcs.go ./remote/remote_state_gcs_test.go
remote/remote_state_gcs.go:93:68: undefined: RemoteState (typecheck)
func (initializer GCSInitializer) NeedsInitialization(remoteState *RemoteState, existingBackend *TerraformBackend, terragruntOptions *options.TerragruntOptions) (bool, error) {
                                                                   ^
remote/remote_state_gcs.go:131:75: undefined: TerraformBackend (typecheck)
func GCSConfigValuesEqual(config map[string]interface{}, existingBackend *TerraformBackend, terragruntOptions *options.TerragruntOptions) bool {
                                                                          ^
remote/remote_state_gcs.go:179:80: undefined: RemoteState (typecheck)
func (initializer GCSInitializer) Initialize(ctx context.Context, remoteState *RemoteState, terragruntOptions *options.TerragruntOptions) error {
                                                                               ^
remote/remote_state_gcs.go:164:6: undefined: terraformStateConfigEqual (typecheck)
        if !terraformStateConfigEqual(existingBackend.Config, comparisonConfig) {
            ^
remote/remote_state_gcs.go:192:25: undefined: initializedRemoteStateCache (typecheck)
        if initialized, hit := initializedRemoteStateCache.Get(ctx, cacheKey); initialized && hit {
                               ^
remote/remote_state_gcs.go:198:9: undefined: stateAccessLock (typecheck)
        return stateAccessLock.StateBucketUpdate(gcsConfig.Bucket, func() error {
               ^
remote/remote_state_gcs.go:200:26: undefined: initializedRemoteStateCache (typecheck)
                if initialized, hit := initializedRemoteStateCache.Get(ctx, cacheKey); initialized && hit {
                                       ^
remote/remote_state_gcs.go:225:3: undefined: initializedRemoteStateCache (typecheck)
                initializedRemoteStateCache.Put(ctx, cacheKey, true)
                ^
remote/remote_state_gcs.go:344:11: undefined: BucketCreationNotAllowed (typecheck)
                        return BucketCreationNotAllowed(config.remoteStateConfigGCS.Bucket)
                               ^
remote/remote_state_gcs.go:488:20: undefined: MaxRetriesWaitingForS3BucketExceeded (typecheck)
        return errors.New(MaxRetriesWaitingForS3BucketExceeded(config.Bucket))
                          ^

My questions are:

  1. What GolangCI linter should I use, to recreate the issues and fix them locally? — I can't recreate (or even see) what's the golangci-lint command that runs in CI, since it runs in CircleCI where I don't have access or visibility for that matter.

NOTE: I was able to recreate the issues reported running make run-strict-lint which means. I assume it's the one that runs in CI. Correct me if I'm wrong @yhakbar

  1. What GolangCI lint configuration should I use, to ensure that locally I'm able to fix any linter issues prior to submitting a pull-request? — Running the default configuration spots several linter issues across the codebase. :/

The changes in this commit add validation for the `project` and `location` fields in the GCS remote state configuration. If the `skip_bucket_creation` option is not set, the `project` and `location` fields are required. This ensures that the necessary information is provided for creating the GCS bucket if it doesn't already exist.
The changes in this commit focus on adding important considerations for using the GCS remote state backend in Terragrunt. The key changes are:

1. Clarify the requirements for creating a new GCS bucket, including the need for `project` and `location` configuration options.
2. Explain that Terragrunt will automatically create the bucket if it doesn't exist, and that versioning is enabled by default unless `skip_bucket_versioning` is set.
3. Highlight that Terragrunt validates the presence of `bucket`, `project`, and `location` when creating a new bucket, but these are not strictly required if the bucket already exists.
…project and location

This change adds support for using an existing GCS bucket for remote state storage, even if the `project` and `location` are not specified in the configuration.

The key changes are:

1. Modify the `ValidateGCSConfig` function to check if the bucket exists when `project` or `location` are not specified, and only require them if the bucket does not exist.
2. Add two new test cases to cover the scenarios where the bucket exists or does not exist without `project` and `location`.
3. Mock the `CreateGCSClient` and `DoesGCSBucketExist` functions in the tests to simulate the bucket existence scenarios.

This change allows users to use an existing GCS bucket for remote state storage without having to specify the `project` and `location` in the configuration, making it more flexible and easier to use.
…tion

This change enhances the validation logic for GCS remote state configuration:

- Simplifies the test cases by using a map-based approach instead of a slice.
- Introduces a mock function to simulate the existence of the GCS bucket, making the tests more robust.
- Clarifies the validation logic by separating the cases where the bucket creation is skipped and when it is not.
- Ensures that the bucket is always a required configuration parameter, regardless of the `skip_bucket_creation` setting.
- Requires both the project and location to be specified when the bucket creation is not skipped, as these are the minimum required information to create a new bucket.
- Adds a comment to indicate that the configuration is valid at the end of the validation process.
- Wrap the error returned by `CreateGCSClient` to provide more
  context.
- Wrap the error returned by `CreateGCSBucket` to provide more
  context.
- Add a deferred function to close the GCS client, and log any
  errors that occur during the close operation.
- Rename the `parseGCSConfig` function to `ParseExtendedGCSConfig`
  to better reflect its purpose.
…r issues

The changes in this commit simplify the validation of the GCS remote state configuration. The main improvements are:

- If both `project` and `location` are provided, the configuration is considered valid and no further checks are needed.
- If the bucket doesn't exist, the validation now requires both `project` and `location` to be set, instead of checking them separately.
- The code for creating and closing the GCS client has been moved outside the conditional blocks, reducing duplication.

These changes make the validation logic more straightforward and easier to understand.
@Excoriate Excoriate force-pushed the fix-confusing-error-message-gcp branch from d9decd4 to 86f18d0 Compare January 17, 2025 11:22
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Confusing error message when missing 'location' and 'project'
2 participants