[WIP] Flesh out v2 LoadBalancer #203

PtrTeixeira · 2017-01-18T21:58:02Z

WIP effort at filling out a handler for the ApplicationLoadBalancer to
match the interface from the old (v1/classic) load balancer within the
BaragonService. Should allow us to handle BaragonAgent connections /
lgoad balancer connections using the new API, once it gets finished.

WIP effort at filling out a handler for the ApplicationLoadBalancer to match the interface from the old (v1/classic) load balancer within the BaragonService. Should allow us to handle BaragonAgent connections / lgoad balancer connections using the new API, once it gets finished.

Move away from using `String.format(...)` in logging calls, in favor of slf4j's native interpolation with `{}`. This has no real impact and was just a quick win in the process of working on other parts of the application.

Removing a target (=== Instance) should (probably) work now, both individually and as part of the sync cycle. Adding new instances is going to take a little more work because ApplicationLoadBalancers have a little bit more infrastructural overhead than Classic load balancers. This commit also finishes making the change-over from using a string to represent the load balancer (TrafficSource). There was a datastore that should needed to move over to the new interface, which meant that there was a resource that also needed to move over.

Currently have sync ~ working in the ApplicationLoadBalancer. At the moment, syncing will 1. Find a target group with the same name as a baragon group, if possible 2. Guarantee that the targets (== instances) that baragon agents think that they're on are members of the target group 3. Guarantee that the load balancers that they baragon agents think they're talking to can reach all the az's that the baragon agents think that they're on 4. De-register targets that don't appear to have a known baragon agent associated with them, or which appear to have an uncommunicative baragon agent associated with them. It is limited in that, among other things, it requires that the target group be named the same as the baragon group. At the moment, the application load balancer will not support explicit requests to add or remove targets and requires a certain amount of infrastructure to already be in place.

PtrTeixeira · 2017-01-20T17:32:03Z

cc @ssalinas Mind taking a look at this?

ssalinas

Bunch of smaller comments for efficiency/etc. Nice first draft of this!

ssalinas · 2017-01-20T17:44:11Z

BaragonService/src/main/java/com/hubspot/baragon/service/elb/ApplicationLoadBalancer.java

+ * For this class I am working under the assumption that the ID's of Targets can
+ * be interchanged with the ID's of instances. That is, if you ask this class
+ * for the health of instance 12345, then you will actually expect the health
+ * of target 12345.


This should be accurate. The docs allude to the ability to register something besides an ec2 instance as a target (maybe future functionality?) but we won't have to worry about that. For our purposes it's accurate that instance id == target id 👍

ssalinas · 2017-01-20T17:52:00Z

BaragonService/src/main/java/com/hubspot/baragon/service/elb/ClassicLoadBalancer.java

            } else {
-              LOG.debug(String.format("Agent %s is already registered", agent));
+              LOG.debug("Agent {} is already registered", agent);


+1 for updating all of these

ssalinas · 2017-01-20T17:55:42Z

BaragonService/src/main/java/com/hubspot/baragon/service/resources/LoadBalancerResource.java

@@ -68,7 +70,8 @@ public LoadBalancerResource(BaragonLoadBalancerDatastore loadBalancerDatastore,
  @POST
  @Path("/{clusterName}/sources")
  public BaragonGroup addSource(@PathParam("clusterName") String clusterName, @QueryParam("source") String source) {
-    return loadBalancerDatastore.addSourceToGroup(clusterName, source);
+    TrafficSource trafficSource = new TrafficSource(source, TrafficSourceType.CLASSIC);
+    return loadBalancerDatastore.addSourceToGroup(clusterName, trafficSource);


we'll need an additional endpoint that can take a TrafficSource json body. Maybe add that as one with @Path("/{clusterName}/traffic-sources") and we can mark the old one as @Deprecated

ssalinas · 2017-01-20T18:02:23Z

BaragonService/src/main/java/com/hubspot/baragon/service/elb/ApplicationLoadBalancer.java

+        if (baragonGroup.getSources().isEmpty()) {
+          LOG.debug("No traffic sources present for group {}", baragonGroup.getName());
+        } else {
+          Collection<LoadBalancer> elbsForBaragonGroup = getLoadBalancersByBaragonGroup(baragonGroup);


we've had problems in the past with hitting aws' api rate limits. Rather than making an individual call for each (even though it is filtered down), it might be wise to move this outside of the loop. We can describe all ALBs in one call, then grab what we need from the object. It's likely faster to loop through the object in memory than to make a new call for new data.

I think that I got most of this, but it is still making calls in a pretty tight loop in isLastHeathlyInstance during de-registration. I can try and patch that up too.

Yeah, that's a tough one, especially since we really want to be sure that info is up to date for that check. If we can lessen the api calls in other areas at least i think that is fine. We are doing something similar for classic load balancers with isLastHealthyInstance as well

ssalinas · 2017-01-20T18:05:15Z

BaragonService/src/main/java/com/hubspot/baragon/service/elb/ApplicationLoadBalancer.java

+
+          LOG.debug("Registering TargetGroup for baragon group {}", baragonGroup.getName());
+          Optional<TargetGroup> maybeTargetGroup = guaranteeTargetGroupFor(baragonGroup);
+          if (maybeTargetGroup.isPresent()) {


if this isn't present let's log, maybe at WARN level, so that we have an indicator of what's going on

ssalinas · 2017-01-20T18:18:21Z

BaragonService/src/main/java/com/hubspot/baragon/service/elb/ApplicationLoadBalancer.java

+          }
+        } else {
+        /* Is this really an appropriate thing to do here? */
+          throw new IllegalArgumentException(


I think this is good here since we catch it right below. We want to continue trying to sync the remaining agents (which we do), but this is a rather blocking error for that one agent, so logging at error level + the notify is good to do here.

ssalinas · 2017-01-20T18:22:21Z

BaragonService/src/main/java/com/hubspot/baragon/service/elb/ApplicationLoadBalancer.java

+    }
+
+    guaranteeHasAllTargets(targetGroup, baragonAgents);
+    guaranteeAzEnabled(baragonAgents, loadBalancers);


The az enable will need to go before the target addition most likely. In the case of the classic load balancers, if the az/subnet were not enabled, any registration in that az/subnet would fail. I would suspect it is likely the case here as well, although I can't find any docs that say so

Swapped. I think it mattered more when the instances were registered against the load balancers directly, but I couldn't find anything to say that TargetGroups cared about subnets or az's.

ssalinas · 2017-01-20T18:27:07Z

BaragonService/src/main/java/com/hubspot/baragon/service/elb/ApplicationLoadBalancer.java

+    }
+
+    try {
+      SetSubnetsRequest subnetsRequest = new SetSubnetsRequest()


does SetSubnets override all current ones? If so we'd want to make sure this list also includes ones currently active so we aren't removing any that might still be needed (i.e. subnetsOnLoadBalancer + subnetIds here)

Good catch, thank you!

ssalinas · 2017-01-20T18:30:43Z

BaragonService/src/main/java/com/hubspot/baragon/service/elb/ApplicationLoadBalancer.java

+    }
+  }
+
+  private void guaranteeAzEnabled(Collection<BaragonAgentMetadata> agents,


Read over the docs here again. In the old classic the az/subnet operations were a bit more separated. I think here we can get away with gathering the list of subnets from the agents and feeding that to guaranteeHasAllSubnets, rather than doing the az check and than having to check subnets after

My concern is that if there were agents that were assigned to the same az, but different subnets then we might accidentally wind up clobbering access to one of the subnets. I think load balancers are still on a one-subnet-per-az policy, and I didn't see anything that said what would happen if two subnets in the same az were added at the same time.

ssalinas · 2017-01-20T18:33:15Z

BaragonService/src/main/java/com/hubspot/baragon/service/elb/ApplicationLoadBalancer.java

+
+    Collection<TargetDescription> removableTargets = new HashSet<>();
+    for (TargetDescription targetDescription : targetsOnGroup) {
+      if (! knownAgentIds.contains(targetDescription.getId()) && canDeregisterAgent(baragonGroup, targetDescription.getId())) {


you should be able to copy some of the logic from here for this. If you find yourself rewriting bits of those methods, feel free to move them to the ElasticLoadBalancer class so they can be shared

I think the data you are looking at is slightly different than you are expecting here. Your agentIds method below is returning Baragon's agent id (which in this case is the url on which the agent can be hit by BaragonService). What you probably want in the knwonAgentIds variable is the instance IDs which should be equal to the target id. Instance ID should also be available in the BaraongAgentMetadata. Maybe rename that list as well to better show what it contains once you update it.

ssalinas · 2017-01-20T18:34:31Z

Added a number of comments around efficiency of the api calls we are making/etc @PtrTeixeira , nice first draft of this!

Patch up (most) of the issues raised during review. Still to do: 1. Migrate duplicated logic in `knownAgents` &c into `ElasticLoadBalancer` 2. Add an additional endpoint that offers a little bit more control over creating traffic sources to `LoadBalancerResource`

I had duplicated some of the functionality in `ElasticLoadBalancer` w/r/t checking whether an instance / target could be de-registered in order to keep the scope of the changes that I was making smaller. This DRYs it a little bit; makes `ElasticLoadBalancer` take an ID rather than a model object from the elasticloadbalancingv1, so it can be used from both the `ApplicationLoadBalancer` and the `ClassicLoadBalancer`.

ssalinas · 2017-01-23T14:24:00Z

BaragonService/src/main/java/com/hubspot/baragon/service/elb/ApplicationLoadBalancer.java

+    String nextPage = result.getNextMarker();
+    loadBalancers.addAll(result.getLoadBalancers());
+
+    while (! nextPage.isEmpty()) {


Just checking, will the aws API ever return a null for this, or is it just an empty string when there are no more pages? If it's possible there might be a null, I'd opt for !Strings.isNullOrEmpty() here

ssalinas · 2017-01-23T14:27:45Z

BaragonService/src/main/java/com/hubspot/baragon/service/elb/ApplicationLoadBalancer.java

+              deregisterIfOld(baragonGroup, targetGroup, baragonAgents, targets);
+            }
+          } else {
+            LOG.debug("Not TargetGroup for Baragon Group {}", baragonGroup);


ssalinas · 2017-01-23T14:31:42Z

BaragonService/src/main/java/com/hubspot/baragon/service/elb/ApplicationLoadBalancer.java

+  public void syncAll(Collection<BaragonGroup> baragonGroups) {
+    Collection<LoadBalancer> allLoadBalancers = getAllLoadBalancers(baragonGroups);
+    try {
+      for (BaragonGroup baragonGroup : baragonGroups) {


Since a few of the methods in this for loop can possibly throw exceptions, we should probably move the try/catch inside the loop. This way we are still syncing the remaining groups, even if one should fail

ssalinas · 2017-01-23T14:37:30Z

BaragonService/src/main/java/com/hubspot/baragon/service/elb/ApplicationLoadBalancer.java

+    return targetDescriptions;
+  }
+
+  private Collection<TargetDescription> listOldTargets(BaragonGroup baragonGroup,


for the sake of clarify I'd rename this method to something like listRemovableTargets. Old is just a bit generic in terms of what we are actually finding here.

ssalinas · 2017-01-23T14:41:05Z

BaragonService/src/main/java/com/hubspot/baragon/service/elb/ApplicationLoadBalancer.java

+
+  private Collection<TargetDescription> listOldTargets(BaragonGroup baragonGroup,
+                                                       Collection<TargetDescription> targetsOnGroup,
+                                                       Collection<BaragonAgentMetadata> knownAgents) {


Be careful what you call 'known' for agents. There is a concept of BaragonKnownAgentMetadata which is the agent metadata along with a 'list seen' timestamp (stored in a separate zk node). However, what you are passing here is actually the regular agent metadata for the active agent (as stored in the leader-latch zk node)

ssalinas · 2017-01-23T14:42:20Z

BaragonService/src/main/java/com/hubspot/baragon/service/elb/ApplicationLoadBalancer.java

+
+    Collection<TargetDescription> removableTargets = new HashSet<>();
+    for (TargetDescription targetDescription : targetsOnGroup) {
+      if (! knownAgentIds.contains(targetDescription.getId()) && canDeregisterAgent(baragonGroup, targetDescription.getId())) {


I think the data you are looking at is slightly different than you are expecting here. Your agentIds method below is returning Baragon's agent id (which in this case is the url on which the agent can be hit by BaragonService). What you probably want in the knwonAgentIds variable is the instance IDs which should be equal to the target id. Instance ID should also be available in the BaraongAgentMetadata. Maybe rename that list as well to better show what it contains once you update it.

ssalinas · 2017-01-23T14:44:56Z

Once the comments above are addressed we should just need to implement the removeInstance and registerInstance methods, then we can merge this into the alb_support branch and give it a whirl 👍

Involved a handful of changes, including a nullity check, swapping the order of a loop and try block, and clarifying some conceptual differences related to agent metadata.

ssalinas · 2017-01-23T16:55:58Z

BaragonService/src/main/java/com/hubspot/baragon/service/elb/ApplicationLoadBalancer.java

          }

          LOG.debug("ELB sync complete for group {}", baragonGroup);
+        } catch (AmazonClientException acexn) {
+          LOG.error("Could not retrieve elb information due to ELB client error {}", acexn);
+          exceptionNotifier.notify(acexn, ImmutableMap.of("groups", baragonGroups == null ? "" : baragonGroups.toString()));


nit: We're already accessing baragonGroups above by entering the for loop, probably don't need an additional null check

This was just a copy-paste relic; when I swapped the order of the try block and the loop, the error message still checked for the nullity of the _collection_ and had the _collection_ in the error message. The nullity check was removed as being unnecessary, and the error message now refers to the particular group that failed, rather than the collection of groups.

Implement methods to register and remove instances (ie, targets) in the context of an application load balancer. This works a little bit differently, semantically, than the equivalent methods for the classic load balancers. In particular, it treats the `TargetGroup` as the `TrafficSource`, and registers and deregisters from the `TargetGroup` rather than from load balancers individually. This means that whereas before it was possible to register and deregister a single instances from individual load balancers, now it is only possible to register and deregister from load balancers on the scope of the target group. For example, suppose that LB1 and LB2 were talking to Instance. It would have been possible to remove Instance from LB2 and leave it talking to LB1. With ApplicationLoadBalancers, LB1 and LB2 talk to a target group, so they can't be individually talked to at the instance level. It winds up working out the same way, if you consider ClassicLoadBalancers and TargetGroups to be the equivalent TrafficSources, rather than ClassicLoadBalancers and ApplicationLoadBalancers. You just have the target belong to multiple TargetGroups.

ssalinas · 2017-01-25T15:44:44Z

Going to merge this into the other ALB pr, will be simpler to rebase off of the basepom upgrade that way

PtrTeixeira added 4 commits January 18, 2017 16:54

Migrate to SLF4j paramerterized logging

70c82ea

Move away from using `String.format(...)` in logging calls, in favor of slf4j's native interpolation with `{}`. This has no real impact and was just a quick win in the process of working on other parts of the application.

ssalinas requested changes Jan 20, 2017

View reviewed changes

PtrTeixeira added 2 commits January 20, 2017 14:52

ssalinas requested changes Jan 23, 2017

View reviewed changes

Fix change requests from code review

8e2606f

Involved a handful of changes, including a nullity check, swapping the order of a loop and try block, and clarifying some conceptual differences related to agent metadata.

ssalinas reviewed Jan 23, 2017

View reviewed changes

PtrTeixeira added 2 commits January 23, 2017 12:06

ssalinas merged commit 07bfceb into alb_support Jan 25, 2017

ssalinas deleted the aws_application_load_balancer branch January 25, 2017 15:44

ssalinas modified the milestone: 0.5.0 May 23, 2017

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[WIP] Flesh out v2 LoadBalancer #203

[WIP] Flesh out v2 LoadBalancer #203

PtrTeixeira commented Jan 18, 2017

PtrTeixeira commented Jan 20, 2017

ssalinas left a comment

ssalinas Jan 20, 2017

ssalinas Jan 20, 2017

ssalinas Jan 20, 2017

ssalinas Jan 20, 2017

PtrTeixeira Jan 20, 2017

ssalinas Jan 20, 2017

ssalinas Jan 20, 2017

ssalinas Jan 20, 2017

ssalinas Jan 20, 2017

PtrTeixeira Jan 20, 2017

ssalinas Jan 20, 2017

PtrTeixeira Jan 20, 2017

ssalinas Jan 20, 2017

PtrTeixeira Jan 20, 2017

ssalinas Jan 20, 2017

ssalinas Jan 23, 2017

ssalinas commented Jan 20, 2017

ssalinas Jan 23, 2017

ssalinas Jan 23, 2017

ssalinas Jan 23, 2017

ssalinas Jan 23, 2017

ssalinas Jan 23, 2017

ssalinas Jan 23, 2017

ssalinas commented Jan 23, 2017

ssalinas Jan 23, 2017

ssalinas commented Jan 25, 2017

[WIP] Flesh out v2 LoadBalancer #203

[WIP] Flesh out v2 LoadBalancer #203

Conversation

PtrTeixeira commented Jan 18, 2017

PtrTeixeira commented Jan 20, 2017

ssalinas left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

ssalinas commented Jan 20, 2017

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

ssalinas commented Jan 23, 2017

Choose a reason for hiding this comment

ssalinas commented Jan 25, 2017