Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Removal of caching for services #394

Merged
merged 3 commits into from
Sep 14, 2023
Merged

Conversation

erikfuller
Copy link
Contributor

What type of PR is this?
cleanup

Which issue does this PR fix:
n/a

What does this PR do / Why do we need it:
This is the second refactor to ultimately eliminate the LatticeDataStore. This PR eliminates caching for services. Prev PR: #391

This PR eliminates use of the data store for service operations and instead calls the VPC Lattice API directly. This increases call volume to the ListServices API, but centralizes this logic in a way that should be more reliably cacheable in future.

Additional changes include minor refactoring for readability, typos, or consistency in naming. Also, I had some follow up actions from the previous PR addressed here. These include:

  1. Returning an error when resource not found by name
  2. Using auto-paginated SDK methods (List*PagesWithContext)
  3. Removed unused cleanup logic from e2e tests

Testing done on this change:
Automated tests (unit/integration).

make test

go test ./pkg/... ./controllers/... -coverprofile coverage.out
?   	github.com/aws/aws-application-networking-k8s/pkg/apis/applicationnetworking/v1alpha1	[no test files]
ok  	github.com/aws/aws-application-networking-k8s/pkg/aws	0.503s	coverage: 24.2% of statements
ok  	github.com/aws/aws-application-networking-k8s/pkg/aws/services	0.808s	coverage: 5.8% of statements
ok  	github.com/aws/aws-application-networking-k8s/pkg/config	0.898s	coverage: 32.9% of statements
?   	github.com/aws/aws-application-networking-k8s/pkg/k8s	[no test files]
?   	github.com/aws/aws-application-networking-k8s/pkg/model/lattice	[no test files]
?   	github.com/aws/aws-application-networking-k8s/pkg/runtime	[no test files]
?   	github.com/aws/aws-application-networking-k8s/pkg/utils	[no test files]
?   	github.com/aws/aws-application-networking-k8s/pkg/utils/gwlog	[no test files]
?   	github.com/aws/aws-application-networking-k8s/pkg/utils/retry	[no test files]
?   	github.com/aws/aws-application-networking-k8s/pkg/utils/ttime	[no test files]
ok  	github.com/aws/aws-application-networking-k8s/pkg/deploy	0.872s	coverage: 20.6% of statements
ok  	github.com/aws/aws-application-networking-k8s/pkg/deploy/externaldns	1.367s	coverage: 77.1% of statements
ok  	github.com/aws/aws-application-networking-k8s/pkg/deploy/lattice	2.268s	coverage: 85.4% of statements
ok  	github.com/aws/aws-application-networking-k8s/pkg/gateway	2.794s	coverage: 74.9% of statements
ok  	github.com/aws/aws-application-networking-k8s/pkg/latticestore	1.053s	coverage: 64.1% of statements
ok  	github.com/aws/aws-application-networking-k8s/pkg/model/core	2.963s	coverage: 61.7% of statements
ok  	github.com/aws/aws-application-networking-k8s/pkg/model/core/graph	2.903s	coverage: 17.2% of statements
ok  	github.com/aws/aws-application-networking-k8s/controllers	0.772s	coverage: 0.0% of statements
ok  	github.com/aws/aws-application-networking-k8s/controllers/eventhandlers	1.111s	coverage: 54.8% of statements

e2e tests

Ran 31 of 31 Specs in 2050.026 seconds
SUCCESS! -- 31 Passed | 0 Failed | 0 Pending | 0 Skipped
--- PASS: TestIntegration (2050.59s)

Will this PR introduce any new dependencies?:
No

Will this break upgrades or downgrades. Has updating a running cluster been tested?:
n/a

Does this PR introduce any user-facing change?:
No

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

Comment on lines -257 to -259
if snInfo == nil {
return fmt.Errorf("Service network %s for account %s not found", gw.Name, config.AccountID)
}
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With the change on FindX logic to raise a "not found" error, these additional nil checks on the return value are no longer required.

Comment on lines +30 to +33
type NotFoundError struct {
ResourceType string
Name string
}
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Getting something that looked and behaved sensibly for error handling and identifying errors of this type took me a while. For folks more familiar with Go please feel free to suggest an alternate approach if you think there's a better option.

Copy link
Contributor

@mikhail-aws mikhail-aws Sep 13, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

std lib style https://pkg.go.dev/errors

// Error and constructor
var ErrNotFound = errors.New("resource not found")

func NewNotFound(name, restype string) {
  return fmt.Errorf("name: %s, type: %s, err: %w", name, restype, ErrNotFound)
}

// In lattice interface
func FindResource(res Req) (Resp, error) {
  return nil, NewNotFound(res.Name, res.Type)
}

// Handle error
resp, err := lattice.FindResource(ctx, req)
if err != nil {
  if errors.Is(err, ErrNotFound) {
    // handle error
  } else {
    ...
  }
}

@coveralls
Copy link

coveralls commented Sep 13, 2023

Pull Request Test Coverage Report for Build 6175321314

  • 168 of 235 (71.49%) changed or added relevant lines in 15 files are covered.
  • 89 unchanged lines in 5 files lost coverage.
  • Overall coverage decreased (-0.4%) to 38.664%

Changes Missing Coverage Covered Lines Changed/Added Lines %
controllers/gateway_controller.go 0 1 0.0%
pkg/deploy/stack_deployer.go 0 1 0.0%
pkg/deploy/lattice/listener_synthesizer.go 8 10 80.0%
pkg/deploy/lattice/rule_synthesizer.go 2 4 50.0%
pkg/deploy/lattice/rule_manager.go 12 15 80.0%
pkg/deploy/lattice/listener_manager.go 26 31 83.87%
pkg/deploy/lattice/service_manager.go 3 8 37.5%
pkg/deploy/lattice/service_network_manager.go 13 18 72.22%
pkg/aws/services/vpclattice_mocks.go 0 11 0.0%
pkg/aws/services/vpclattice.go 77 89 86.52%
Files with Coverage Reduction New Missed Lines %
controllers/route_controller.go 1 0.0%
pkg/deploy/lattice/listener_synthesizer.go 1 83.53%
pkg/deploy/lattice/rule_synthesizer.go 1 75.7%
pkg/aws/services/vpclattice.go 8 78.7%
pkg/aws/services/vpclattice_mocks.go 78 4.16%
Totals Coverage Status
Change from base Build 6163918423: -0.4%
Covered Lines: 3948
Relevant Lines: 10211

💛 - Coveralls

Comment on lines +84 to +89
err := d.ListServiceNetworksPagesWithContext(ctx, input, func(page *vpclattice.ListServiceNetworksOutput, lastPage bool) bool {
for _, sn := range page.Items {
result = append(result, sn)
}
input.NextToken = resp.NextToken
resp, err = d.ListServiceNetworksWithContext(ctx, input)
return true
})
Copy link
Contributor Author

@erikfuller erikfuller Sep 13, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've updated all the methods here to use automatic pagination as per @mikhail-aws suggestion in previous PR. The way these methods work is you provide a function for them to call for each page found and you return true if you want the call to keep fetching pages.

This makes mocking a little more complicated but it's not too hard to follow.

Copy link
Contributor

@mikhail-aws mikhail-aws Sep 13, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

compact version:

 7 func (d *defaultLattice) ListServiceNetworksAsList(ctx context.Context, input *vpclattice.ListServiceNetworksInput) ([]*vpclattice.ServiceNetworkSummary, error) {
  8         result := []*vpclattice.ServiceNetworkSummary{}
  9         err := d.ListServiceNetworksPagesWithContext(ctx, input, func(page *vpclattice.ListServiceNetworksOutput, lastPage bool) bool {
 10                 result = append(result, page.Items...)
                      return true
 11         })
 12         return result, err
 13 }

Generic get all pages function. Since every page is a struct, getting Items from page can be done through reflection, or explicitly written as getPageItems, that return items from page.

 func ListAll[REQ any, ITEM any, PAGE any](
         ctx context.Context,    req REQ,
         listPages func(context.Context, REQ, func(PAGE, bool) bool, ...request.Option) error,
         getPageItems func(PAGE) []ITEM)([]ITEM, error) {
                 out := []ITEM{}
                 err := listPages(ctx, req, func(page PAGE, _ bool) bool {
                         out = append(out, getPageItems(page)...)
                         return true
                 })
                 return out, err
         }

 func (l *Lattice) ListServiceNetworksAsList(ctx context.Context, in *vpclattice.ListServiceNetworksInput) ([]*vpclattice.ServiceNetworkSummary, error) {
         return ListAll(
                 ctx, in, l.c.ListServiceNetworksPagesWithContext,
                 func(page *vpclattice.ListServiceNetworksOutput) []*vpclattice.ServiceNetworkSummary {
                         return page.Items
                 })
 }

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I dont see any value in unit-tests with mocks for these Lattice wrapper functions. As you mentioned, it's painful and kind of self-fulfilling prophecy.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am curious what the benefit of ListServiceNetworksPagesWithContext(ctx, input, fn) compared to the old NextToken way? (Still learning this part)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Honestly I'm not convinced it's that big a win. The logic is a little harder to wrap your head around, but it means we can't get into an infinite loop or accidently omit pages by mishandling NextToken.

Comment on lines -209 to -210
// if we still want this method, we should tag the things we create with the test suite
func (env *Framework) CleanTestEnvironment(ctx context.Context) {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IMO the "best" way to clean up the test environment is via the new e2e-clean target. Tracking these resources as side-effects of various creation calls seems unreliable, similar to the reasons we're eliminating the LatticeDataStore cache.

Given we're not using the cleanup methods today, I think removing them is better for now than continuing to maintain this unused code.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Interesting idea, do you want to remove cleanup after each test and do only one at the very end of all tests?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removing resources as each test finishes is what it does today. We don't call anything at the end of all tests. These Clean* methods are currently unused.

Copy link
Contributor

@mikhail-aws mikhail-aws left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Didnt have a chance review everything. Left minor comments.

Comment on lines +30 to +33
type NotFoundError struct {
ResourceType string
Name string
}
Copy link
Contributor

@mikhail-aws mikhail-aws Sep 13, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

std lib style https://pkg.go.dev/errors

// Error and constructor
var ErrNotFound = errors.New("resource not found")

func NewNotFound(name, restype string) {
  return fmt.Errorf("name: %s, type: %s, err: %w", name, restype, ErrNotFound)
}

// In lattice interface
func FindResource(res Req) (Resp, error) {
  return nil, NewNotFound(res.Name, res.Type)
}

// Handle error
resp, err := lattice.FindResource(ctx, req)
if err != nil {
  if errors.Is(err, ErrNotFound) {
    // handle error
  } else {
    ...
  }
}

Comment on lines +84 to +89
err := d.ListServiceNetworksPagesWithContext(ctx, input, func(page *vpclattice.ListServiceNetworksOutput, lastPage bool) bool {
for _, sn := range page.Items {
result = append(result, sn)
}
input.NextToken = resp.NextToken
resp, err = d.ListServiceNetworksWithContext(ctx, input)
return true
})
Copy link
Contributor

@mikhail-aws mikhail-aws Sep 13, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

compact version:

 7 func (d *defaultLattice) ListServiceNetworksAsList(ctx context.Context, input *vpclattice.ListServiceNetworksInput) ([]*vpclattice.ServiceNetworkSummary, error) {
  8         result := []*vpclattice.ServiceNetworkSummary{}
  9         err := d.ListServiceNetworksPagesWithContext(ctx, input, func(page *vpclattice.ListServiceNetworksOutput, lastPage bool) bool {
 10                 result = append(result, page.Items...)
                      return true
 11         })
 12         return result, err
 13 }

Generic get all pages function. Since every page is a struct, getting Items from page can be done through reflection, or explicitly written as getPageItems, that return items from page.

 func ListAll[REQ any, ITEM any, PAGE any](
         ctx context.Context,    req REQ,
         listPages func(context.Context, REQ, func(PAGE, bool) bool, ...request.Option) error,
         getPageItems func(PAGE) []ITEM)([]ITEM, error) {
                 out := []ITEM{}
                 err := listPages(ctx, req, func(page PAGE, _ bool) bool {
                         out = append(out, getPageItems(page)...)
                         return true
                 })
                 return out, err
         }

 func (l *Lattice) ListServiceNetworksAsList(ctx context.Context, in *vpclattice.ListServiceNetworksInput) ([]*vpclattice.ServiceNetworkSummary, error) {
         return ListAll(
                 ctx, in, l.c.ListServiceNetworksPagesWithContext,
                 func(page *vpclattice.ListServiceNetworksOutput) []*vpclattice.ServiceNetworkSummary {
                         return page.Items
                 })
 }

Comment on lines +84 to +89
err := d.ListServiceNetworksPagesWithContext(ctx, input, func(page *vpclattice.ListServiceNetworksOutput, lastPage bool) bool {
for _, sn := range page.Items {
result = append(result, sn)
}
input.NextToken = resp.NextToken
resp, err = d.ListServiceNetworksWithContext(ctx, input)
return true
})
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I dont see any value in unit-tests with mocks for these Lattice wrapper functions. As you mentioned, it's painful and kind of self-fulfilling prophecy.

Comment on lines +84 to +89
err := d.ListServiceNetworksPagesWithContext(ctx, input, func(page *vpclattice.ListServiceNetworksOutput, lastPage bool) bool {
for _, sn := range page.Items {
result = append(result, sn)
}
input.NextToken = resp.NextToken
resp, err = d.ListServiceNetworksWithContext(ctx, input)
return true
})
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am curious what the benefit of ListServiceNetworksPagesWithContext(ctx, input, fn) compared to the old NextToken way? (Still learning this part)

pkg/deploy/lattice/listener_manager.go Show resolved Hide resolved
Copy link
Contributor

@zijun726911 zijun726911 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall, LGTM, both current IsNotFoundError() and the error handling way the @mikhail-aws mentions are ok for me


func (e *NotFoundError) Error() string {
return fmt.Sprintf("%s %s not found", e.ResourceType, e.Name)
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need to change to:

return fmt.Sprintf("Lattice %s %s not found", e.ResourceType, e.Name)

So that log can show "Lattice service xxxxxx not found" and "Lattice service network xxxxxx not found"

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll update this for next PR


var svcMatch *vpclattice.ServiceSummary
err := d.ListServicesPagesWithContext(ctx, &input, func(page *vpclattice.ListServicesOutput, lastPage bool) bool {
for _, svc := range page.Items {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does FindService() also need to check:

acctIdMatches, err1 := accountIdMatches(optionalAccountId, *r.Arn)

if !acctIdMatches {
    continue
}

in case in the future we support RAM share lattice Service in the controller and one current account Service and one foreign Service have the same name. (check accountId at least not harmful?)

@erikfuller erikfuller merged commit 868b8c0 into aws:main Sep 14, 2023
@erikfuller erikfuller deleted the svc-cache-refactor branch September 14, 2023 23:34
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants