Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support Creating Access Log Policies #430

Merged
merged 6 commits into from
Oct 13, 2023
Merged

Conversation

xWink
Copy link
Member

@xWink xWink commented Oct 11, 2023

What type of PR is this?

Feature

Which issue does this PR fix:
#200 partially

What does this PR do / Why do we need it:

  • Adds controller support for creating Access Log Policies
    • When an Access Log Policy is created with a Gateway targetRef, a VPC Lattice Access Log Subscription for the Gateway's corresponding Service Network is created
    • When an Access Log Policy is created with an HTTPRoute or GRPCRoute targetRef, a VPC Lattice Access Log Subscription for the Route's corresponding Service is created
    • The Access Log Policy's status is updated according to the result of the creation attempt
      • If creation succeeds, the Access Log Policy status is set to Accepted
      • If the Access Log Subscription destination does not exist, the Policy status is set to Invalid
      • If the targetRef's Group is not gateway.networking.k8s.io, the Policy status is set to Invalid
      • If the targetRef's Kind is not Gateway, HTTPRoute, or GRPCRoute, the Policy status is set to Invalid
      • If the targetRef already has an Access Log Policy for the same destination type, the Policy status is set to Conflicted
      • If the VPC Lattice resource (Service or Service Network) corresponding to the targetRef does not exist, the controller will enter a reconciliation loop until the resource exists or the Policy is updated/deleted
      • If the targetRef itself does not exist, the status of the Access Log Policy is set to TargetNotFound
  • Added a suite of unit and e2e tests for the new controller, model builder, synthesizer, and manager logic
    • The new e2e test suite consists of 8 tests and adds about 10% to the total e2e test run time

Note: There was discussion about adding annotations to Access Log Policies in this thread. I intend to add this logic in the upcoming Delete logic, as this PR is large enough as is and the Delete logic will be able to make use of the annotations.

If an issue # is not available please add repro steps and logs from aws-gateway-controller showing the issue:

Testing done on this change:

Ran 41 of 41 Specs in 3060.228 seconds
SUCCESS! -- 41 Passed | 0 Failed | 0 Pending | 0 Skipped
--- PASS: TestIntegration (3061.28s)
PASS
ok      github.com/aws/aws-application-networking-k8s/test/suites/integration   3061.965s

Automation added to e2e:

Will this PR introduce any new dependencies?:

E2E tests now rely on:

  • S3 API (need to create/delete 1 Bucket)
  • CloudWatch Logs API (need to create/delete 1 Log Group)
  • Firehose API (need to create/delete 1 Delivery Stream)
  • IAM API (need to create/delete 1 Role and attach an inline policy to it)

Will this break upgrades or downgrades. Has updating a running cluster been tested?:
No

Does this PR introduce any user-facing change?:

Yes, but this isn't ready for release yet since you can't update or delete these policies. Release note will be added as part of the final PR of this feature.


By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

}

func (t *accessLogSubscriptionModelBuildTask) run(ctx context.Context) error {
sourceType := model.ServiceSourceType
Copy link
Contributor

@zijun726911 zijun726911 Oct 11, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One general comment, since we are doing a brand new e2e code path for accessLogPolicy, can we don't follow the old side-effect style code? I think old style is a little bit hard to trace the data flow (tho we already got used to it)

Can we try to use stateless functions as much as we can?

for example:
accessLogSubscriptionModelBuildTask could don't have member variables stack, accessLogPolicy , accessLogSubscription. Instead, use function input params and return vaule to represent them:
func (t *accessLogSubscriptionModelBuildTask) run(ctx context.Context, accessLogPolicy *anv1alpha1.AccessLogPolicy ) *model.AccessLogSubscription, error
(And I think either stack or accessLogSubscription just need one, don't need both)

Another example is (s *accessLogSubscriptionSynthesizer) Synthesize(), can we removed member variable stack of accessLogSubscriptionSynthesizer and put it in the input parameter?

func (s *accessLogSubscriptionSynthesizer) Synthesize(ctx context.Context, accessLogSubscription *model.AccessLogSubscription) error

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

While I agree a functional code path is ideal, I think we are at a point now where deviation from the existing style would require a much more substantial refactor of the codebase. I would rather keep things consistent and have predictable side-effects, than have multiple different controllers with wildly different styles of code

Perhaps we should create a refactoring ticket to migrate over if the existing code style is causing churn?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, make sense to do code style refactoring in a seperate PR, I can create a issue for it

}, nil
}

func getServiceNetworkWithName(ctx context.Context, vpcLatticeSess services.Lattice, name string) (*vpclattice.ServiceNetworkSummary, error) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does FindServiceNetwork() function do the same thing with this function? can you reuse it?

func (d *defaultLattice) FindServiceNetwork(ctx context.Context, name string, optionalAccountId string) (*ServiceNetworkInfo, error) {

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for pointing this out, wasn't aware it was an option! Will fix next commit

}

func getServiceWithName(ctx context.Context, vpcLatticeSess services.Lattice, name string) (*vpclattice.ServiceSummary, error) {
serviceSummaries, err := vpcLatticeSess.ListServicesAsList(ctx, &vpclattice.ListServicesInput{})
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We have a FindService() function, same with that one?

func (d *defaultLattice) FindService(ctx context.Context, nameProvider LatticeServiceNameProvider) (*vpclattice.ServiceSummary, error) {

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same as above!

vpcLatticeSess := m.cloud.Lattice()

var resourceIdentifier string
if accessLogSubscription.Spec.SourceType == lattice.ServiceNetworkSourceType {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: Can change to a switch case here?

switch accessLogSubscription.Spec.SourceType {
         case lattice.ServiceNetworkSourceType:
               .......
	case lattice.ServiceSourceType:
	
                .......
	default:
		return nil, services.NewInvalidError("Invalid source type")
	}

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, done next commit!
Note that I'll be using fmt.Errorf rather than NewInvalidError. This would be a case where the controller is improperly handling something, rather than the user input being invalid.

@@ -133,16 +140,24 @@ func (r *accessLogPolicyReconciler) reconcileUpsert(ctx context.Context, alp *an
alp.Spec.TargetRef.Kind, alp.Spec.TargetRef.Name)
err := r.updateAccessLogPolicyStatus(ctx, alp, gwv1alpha2.PolicyReasonTargetNotFound)
if err != nil {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems we have this behavior: If the targetRef does not exist, the controller will enter a reconciliation loop until the targetRef exists or the Policy is updated/deleted

One thing I am not sure is we return retryErr==nil with policy status Accepted==false with reason TargetNotFound, does the controller can keep the reconciliation loop?

Copy link
Member Author

@xWink xWink Oct 11, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My bad, I worded this poorly. Corrected it to

  • If the VPC Lattice resource (Service or Service Network) corresponding to the targetRef does not exist, the controller will enter a reconciliation loop until the resource exists or the Policy is updated/deleted
  • If the targetRef itself does not exist, the status of the Access Log Policy is set to TargetNotFound

Copy link
Contributor

@zijun726911 zijun726911 Oct 11, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great, thanks for clarifying!

For the behavior: If the targetRef itself does not exist, the status of the Access Log Policy is set to TargetNotFound

If the AccessLogPolicy k8s resource is created first, then later, corresponding targetRefObj (Gateway or HTTPRoute or GRPCRoute) is created, do we have a sort of gwEventHandler.MapToAccessLogPolicy() (enqueueImpactedAccessLogPolicy()) to make the controller do the (r *accessLogPolicyReconciler) reconcile() again? Assume you will do it in a separate task [EKS][Access Logs] Handle Gateway and Route ?

(I actually think that behavior: If the targetRef k8s obj does not exist, the controller will enter a reconciliation loop until the targetRef exists or the Policy is updated/deleted is good and can keep thing simple...)

Copy link
Member Author

@xWink xWink Oct 11, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For the case of reverse order creation, we will need an event handler for the other resources to deal with it. The use of TargetRefNotFound is standard practice for Policy CRDs according to Gateway API docs https://gateway-api.sigs.k8s.io/geps/gep-713/#on-policy-objects

The recon loop will handle the case of when the targetRef was created first but the controller didn't finish creating VPC Lattice resources

@zijun726911
Copy link
Contributor

zijun726911 commented Oct 11, 2023

Do you have a validation logic to reject (fast fail) AccessLogPolicy that it's targetRef.Kind != Gateway or HTTPRoute or GRPCRoute (also need to check targetRef.Group)?

Similar to that one:

if targetRef.Group != expectedGroup || targetRef.Kind != expectedKind {

(also, any chance to re-use policyToTargetRefObj() function? don't need to do it if its too complex or not fit your use case)

@xWink
Copy link
Member Author

xWink commented Oct 11, 2023

Do you have a validation logic to reject (fast fail) AccessLogPolicy that it's targetRef.Kind != Gateway or HTTPRoute or GRPCRoute (also need to check targetRef.Group)?

Similar to that one:

if targetRef.Group != expectedGroup || targetRef.Kind != expectedKind {

(also, any chance to re-use policyToTargetRefObj() function? don't need to do it if its too complex or not fit your use case)

Good callout, I'll add logic to mark the status of the resource as Invalid if the Kind or Group is unsupported next commit.

And I haven't seen a need for the policyToTargetRefObj function as of yet

@@ -26,14 +26,18 @@ import (
"k8s.io/apimachinery/pkg/types"
"k8s.io/client-go/tools/record"
ctrl "sigs.k8s.io/controller-runtime"
builder2 "sigs.k8s.io/controller-runtime/pkg/builder"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should be like pkg_builder considering recent naming scheme guidance

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will fix next commit!

gateway := &gwv1beta1.Gateway{}
err = r.client.Get(ctx, targetRefNamespacedName, gateway)
gw := &gwv1beta1.Gateway{}
err = r.client.Get(ctx, targetRefNamespacedName, gw)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We assume here that error is "not found" only, is that true?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point, there are other errors that could come up here. Fixed next commit

Comment on lines 217 to 219
r.log.Debugw("successfully deployed model",
"stack", stack.StackID().Name+":"+stack.StackID().Namespace,
)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

misuse of "w" logger, should be "f"

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed next commit

controllers/accesslogpolicy_controller.go Show resolved Hide resolved

func IsConflictError(err error) bool {
conflictErr := &ConflictError{}
return errors.As(err, &conflictErr)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

errors.Is

Copy link
Member Author

@xWink xWink Oct 12, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From my understanding, this should be errors.As, since we care about the type of the error, rather than the specific contents of the error. If I simply change this function to

func IsNotFoundError(err error) bool {
	nfErr := NotFoundError{}
	return errors.Is(err, &nfErr)
}

then tests start failing because the contents of the error don't match.

Copy link
Contributor

@mikhail-aws mikhail-aws Oct 13, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are 2 approaches with std errors package.

  1. Create a sentinel error, that can be asserted by "Is" and can be used when you dont need to access details for error handling. For example oserrors: https://cs.opensource.google/go/go/+/refs/tags/go1.21.3:src/internal/oserror/errors.go
var NotFoundError = errors.New("resource not found")

func NewNotFoundError(name string) error {
  return fmt.Errorf("%s: %w", name, NotFoundError)
}

// error check now as simple as 
err := NewNotFoundError("lattice")
if errors.Is(err, NotFoundError) { 
  println(err) // will print name and error
}
  1. You need error details when handling error. Then you create struct and implement error interface, but then usage changes too.
type NotFoundError sturct {
  name string
}

func (e *NotFoundError) Error() string {
	return fmt.Sprintf("%s: resource not found", e.name)
}


err := &NotFoundError{"lattice"}

// now you handle error with passing reference
notFoundErr := &NotFoundError
if errors.As(err, notFoundErr) {
  println("can use name now: " + notFoundErr.name)
}

I see that you use second approach but intention is first approach.

log gwlog.Logger
k8sClient client.Client
stack core.Stack
accessLogSubscriptionManager lattice.AccessLogSubscriptionManager
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As long as there is single manager name can be simplified to manager, I dont see there will be multiple managers here.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fair enough, fixed next commit

Comment on lines 62 to 74
/*
* For Service Network, the name is just the Gateway's name.
* For Service, the name is Route's name, followed by hyphen (-), then the Route's namespace.
*/
sourceName := string(t.accessLogPolicy.Spec.TargetRef.Name)
if sourceType == model.ServiceSourceType {
namespace := t.accessLogPolicy.Spec.TargetRef.Namespace
if namespace != nil {
sourceName = fmt.Sprintf("%s-%s", sourceName, string(*namespace))
} else {
sourceName = fmt.Sprintf("%s-default", sourceName)
}
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Few questions:

  1. When namespace is nil? why it's should be possible
  2. Naming rules for Gateway and non-Gateway seems common problem, I dont think we should handle it here. There should be single place for this logic in the code base. I have a sense we have something like this already in other places.

Copy link
Member Author

@xWink xWink Oct 12, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. Namespace can be nil because it's not required as part of the targetRef spec. If not provided, it should default to the Policy's namespace, rather than "default". Fixed this next commit.
  2. I found utils.LatticeServiceName, and will move this logic over to the utils package as well, since it will come in handy for other Policy CRDs

"github.com/aws/aws-sdk-go/aws"
"github.com/golang/mock/gomock"
"github.com/stretchr/testify/assert"
v1 "k8s.io/apimachinery/pkg/apis/meta/v1"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

apimachineryv1

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed next commit, thank you!

"k8s.io/apimachinery/pkg/runtime"
clientgoscheme "k8s.io/client-go/kubernetes/scheme"
testclient "sigs.k8s.io/controller-runtime/pkg/client/fake"
"sigs.k8s.io/gateway-api/apis/v1alpha2"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

gwv1alpha2

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed next commit, thank you!

Status: nil,
}

stack.AddResource(als)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should not be here, please move stack mutation above

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed next commit

@xWink
Copy link
Member Author

xWink commented Oct 12, 2023

Will also update the message field in the Policy's Status to include useful information during the error cases next commit

…lidation for AccessLogPolicy targetRef, changed default namespace to Policy namespace in targetRer, general code improvements
@xWink
Copy link
Member Author

xWink commented Oct 12, 2023

Reran e2e tests with new changes

Ran 43 of 43 Specs in 2967.092 seconds
SUCCESS! -- 43 Passed | 0 Failed | 0 Pending | 0 Skipped
--- PASS: TestIntegration (2968.06s)
PASS
ok      github.com/aws/aws-application-networking-k8s/test/suites/integration   2968.977s

pkg/model/lattice/accesslogsubscription.go Show resolved Hide resolved
name string
}

func NewLatticeServiceNameProvider(name string) LatticeServiceNameProvider {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Change name to NewDefaultLatticeServiceNameProvider? since LatticeServiceNameProvider is a function name.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed next commit

@coveralls
Copy link

coveralls commented Oct 12, 2023

Pull Request Test Coverage Report for Build 6512529140

  • 125 of 258 (48.45%) changed or added relevant lines in 7 files are covered.
  • No unchanged relevant lines lost coverage.
  • Overall coverage increased (+0.2%) to 45.361%

Changes Missing Coverage Covered Lines Changed/Added Lines %
pkg/gateway/model_build_access_log_subscription.go 36 40 90.0%
pkg/deploy/lattice/access_log_subscription_manager.go 44 50 88.0%
pkg/deploy/lattice/access_log_subscription_synthesizer.go 25 35 71.43%
pkg/deploy/stack_deployer.go 1 19 5.26%
pkg/aws/services/vpclattice.go 0 28 0.0%
controllers/accesslogpolicy_controller.go 0 67 0.0%
Totals Coverage Status
Change from base Build 6500187347: 0.2%
Covered Lines: 3950
Relevant Lines: 8708

💛 - Coveralls

Copy link
Contributor

@zijun726911 zijun726911 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall LGTM, looking forward to the rest of Access Log code.

Do you need to send e2etest at last in a separate PR?

Make sure to reply @mikhail-aws 's comments or make code change

httpK8sService *corev1.Service
grpcK8sService *corev1.Service
httpRoute *gwv1beta1.HTTPRoute
grpcRoute *gwv1alpha2.GRPCRoute
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The new e2e test suite consists of 8 tests and adds about 10% to the total e2e test run time

If you just create one of httpRoute/grpcRoute and create 2 AccessLogSubscriptions on it can it save some time? Or any other way to save e2etest time? (e.g., just test CloudWatchLogGroup type accessLog for gateway)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We need 1 of each source type and 1 of each destination type to confirm all the use-cases work, otherwise there is risk that our TargetRef identification or DestinationArn validation is broken without us knowing. I'll be adding Delete and Update e2e tests here in upcoming PRs, I think we will be able to squish some use-cases together into single tests to save time once that happens.

@xWink
Copy link
Member Author

xWink commented Oct 12, 2023

Overall LGTM, looking forward to the rest of Access Log code.

Do you need to send e2etest at last in a separate PR?

Make sure to reply @mikhail-aws 's comments or make code change

Thank you, I have one more commit on the way with the DefaultTags logic from #428

e2e tests are in this PR

@mikhail-aws comments are all addressed in the above commits

@xWink xWink merged commit c3d0ab9 into aws:main Oct 13, 2023
5 checks passed
@xWink xWink deleted the create-access-log-policy branch October 13, 2023 21:10
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants