[synapse] enable generator config update from ZK #241

Merged: 9 commits from feng-pan-synapse into airbnb:master, Sep 6, 2017
Conversation

@lap1817 (Contributor) commented Aug 22, 2017

to: @jolynch, @igor47
cc: @liangg, @jtai, @wei-zhang

Dynamic generator config update in Synapse

In the current implementation, the "watcher" class has a "config_for_generator" field which is initialized when Synapse starts by loading the static configuration from the Synapse config file.
The major changes are:

  • Allow the Zookeeper watcher to update its "config_for_generator" field when it receives an update from ZK.
  • Trigger config regeneration when either the backend list or the generator config changes.
  • Store the frontend/backend configuration in the HAProxy state file, not only the backend list.
  • Trigger an HAProxy restart when the frontend/backend configuration changes.

These changes fall into the following classes.

BaseWatcher Class

The 'set_backends' function takes a second argument, 'new_config_for_generator', which defaults to an empty hash. The function compares the new config with the existing one, updates the 'config_for_generator' field, and triggers a Synapse re-configuration if necessary. Watchers other than ZookeeperWatcher do not need to change their calls to 'set_backends', since the new argument has a default value.
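
A minimal sketch of that behaviour (not the exact Synapse code; helper names such as update_backends and reconfigure! are illustrative):

  def set_backends(new_backends, new_config_for_generator = {})
    backends_changed = update_backends(new_backends)   # existing backend-list handling
    config_changed = false
    unless new_config_for_generator.nil? || new_config_for_generator.empty?
      config_changed = (@config_for_generator != new_config_for_generator)
      @config_for_generator = new_config_for_generator if config_changed
    end
    # either kind of change causes Synapse to regenerate and reload its config
    reconfigure! if backends_changed || config_changed
  end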

ZookeeperWatcher Class

In the "discovery" function, besides reading the children of the znode, also read the znode's data and parse it into a generator config.
Pass the new config to the "set_backends" function.
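
A rough sketch of that flow, assuming the generator config is stored as JSON in the data blob of the service znode (method and variable names below are illustrative):

  def discover
    path = @discovery['path']
    new_backends = []
    @zk.children(path, :watch => true).each do |id|
      node, _ = @zk.get("#{path}/#{id}")
      new_backends << deserialize_service_instance(node)
    end
    # the znode's own data blob carries the per-service generator config
    data, _ = @zk.get(path, :watch => true)
    new_config_for_generator = (data.nil? || data.empty?) ? {} : JSON.parse(data)
    set_backends(new_backends, new_config_for_generator)
  end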

Haproxy Class

The HAProxy state file should also keep the HAProxy config data.
In 'update_config', compare the new config with the one from the state file. If the HAProxy config has changed, trigger an HAProxy restart.
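
Illustratively, the restart decision could look like the sketch below (none of these method or variable names are the actual Synapse identifiers):

  def update_config(watchers)
    watchers.each do |watcher|
      old_config = @state_cache.config_for_generator(watcher.name)   # from the state file
      new_config = watcher.config_for_generator['haproxy']
      # any difference in the generator config forces a full restart,
      # since frontend/listen sections may have changed
      @restart_required ||= (old_config != new_config)
    end
    write_config(generate_config(watchers))
    restart if @restart_required
  end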

In order to keep the HAProxy state file backward compatible, we propose to add a new key-value pair under each service, alongside the list of backends. The key is specially named "watcher_config_for_generator". We also create a wrapper class around the current "seen" object (which deserializes the HAProxy state file) so that it knows how to separate the added config element from the rest of the backend server list.
{
  "service1": {
    "watcher_config_for_generator": { /* watcher.config_for_generator["haproxy"] */ },
    "i-xxxxxxx_10.0.0.0:80": {
      "name": "i-xxxxxx",
      "host": "10.0.0.0.83",
      "port": 80,
      "id": 75,
      "weight": null,
      "haproxy_server_options": null,
      "labels": null,
      "timestamp": 1502949735
    }
  }
}
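
A hypothetical sketch of such a wrapper around the deserialized "seen" hash (class and method names are illustrative; only the key name comes from this change):

  class HaproxyState
    KEY_WATCHER_CONFIG_FOR_GENERATOR = "watcher_config_for_generator".freeze

    def initialize(seen)
      @seen = seen   # { "service" => { special config key plus backend entries } }
    end

    # every entry except the special key is a backend server
    def backends(service)
      (@seen[service] || {}).reject { |k, _| k == KEY_WATCHER_CONFIG_FOR_GENERATOR }
    end

    # the generator config stored alongside the backends, or nil for old-format state files
    def config_for_generator(service)
      (@seen[service] || {})[KEY_WATCHER_CONFIG_FOR_GENERATOR]
    end
  end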

For the purpose of comparing two hashes (of the config_for_generator), I explicitly added a dependency on the HashDiff gem, which was previously only an implicit dependency through other gems that Synapse uses.
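
For example, with the pre-1.0 HashDiff API (where the module is named HashDiff), the change detection reduces to checking whether the diff is empty:

  require 'hashdiff'

  old_config = { 'frontend' => ['bind 0.0.0.0:3213'] }
  new_config = { 'frontend' => ['bind 0.0.0.0:3214'] }

  # HashDiff.diff returns an empty array when the two hashes are equivalent
  config_changed = !HashDiff.diff(old_config, new_config).empty?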

Test

  • updated existing rspec
  • added new rspec test cases to cover the new feature

@lap1817 (Contributor Author) commented Aug 23, 2017

@jolynch @igor47 ,

Did you get a chance to take a look at the PR? We have a tight timeline and are blocked on this change to make progress.
Please let me know what you think. Happy to chat online.

@jolynch (Collaborator) commented Aug 24, 2017

@lap1817 thanks for the patch, I'll try to take a look asap.

I am curious though, what is the driving motivation? Are you dynamically adding in different ACLs or something based on some external script?

@lap1817 (Contributor Author) commented Aug 24, 2017

@jolynch, thank you!

I sent you an email earlier this week with a document covering the whole story behind this change, including the motivation. Is [email protected] the correct email address?

Anyway, here is the doc: https://1drv.ms/b/s!AkCYvnLbJodli0E2OY_GuHITRHuC

And since you probably haven't received the email: I am Feng Pan, a developer on the Airbnb infra team. This patch addresses some of the pain points our developers have hit when creating and maintaining services, in terms of service discovery.

@jolynch (Collaborator) commented Aug 24, 2017

Ah, very cool. I actually left Yelp about a week ago for another opportunity, but given that most of my SmartStack work has been on my own time I'm happy to keep working on it heh. I am reading the document now and have a few comments, mostly around the tradeoffs of creating something like synapse-tools (a periodic per machine job that polls sources of services e.g. chef, mesos, etc ... and then filters by upstream service) vs a centralized system that publishes configuration to zookeeper itself.

I'm currently traveling but will try to review the patch tonight. This seems useful regardless of why, especially if we could also discover the list of services from ZK as well (eventually probably not in this iteration).

@lap1817 (Contributor Author) commented Aug 24, 2017

Thank you! I cannot see the comments you left in the doc. Do you mind pasting them here? Or we can discuss in the email thread ([email protected]).

@jolynch (Collaborator) left a review

I'd like to see some documentation updates for this change, but the code seems pretty reasonable to me, clever usage of the existing ZK node to provide the configuration data for the watcher itself.

I am curious though, how is this more dynamic than the job that configures synapse writing out this config and restarting synapse as needed? I suppose you only have to have one machine do the re-configuring per zk cluster with this design (whichever one writes the configuration data to the zk node), but I'm not sure how big of a benefit that is.

For example at Yelp we've really benefited from having fine control over each machine, including which services are advertised (nerve) and discovered (synapse) at a per machine level, while this architecture would essentially bring it down to you can control services per ZK cluster differently.

# generates a new config based on the state of the watchers
def generate_config(watchers)
  new_config = generate_base_config
  shared_frontend_lines = generate_shared_frontend

  watchers.each do |watcher|
    watcher_config = watcher.config_for_generator[name]
    next if watcher_config.nil? || watcher_config.empty? || watcher_config['disabled']
jolynch (Collaborator):

Interesting, so we're changing the previous behavior that empty watcher configs would be present without a listening port (e.g. a bare backend). Is this intentional?

I think if it's just HAProxy this is potentially reasonable (typically you have backend options for bare backends but ...)

lap1817 (Contributor Author):

It looks like the parse_watcher_config function (on the next line) tries to get fields from watcher_config['listen'] and will throw an exception if there is no 'listen' field. So I thought the watcher_config should at least be non-empty.

What is a valid use case when the watcher_config is completely empty?


# if we've never needed the backends, now is the time to load them
@seen = read_state_file if @seen.nil?
KEY_WATCHER_CONFIG_FOR_GENERATOR = "watcher_config_for_generator"
jolynch (Collaborator):

Could we achieve the same end perhaps slightly cleaner by versioning the state file and just supporting version 1 or the new version 2? E.g. in the new state files we write out there is a synapse_state_version key? If the key's there and it's == version 2 we can parse out well defined fields, otherwise we treat it as a list of backends.

What do you think?
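
A minimal sketch of that idea, assuming a top-level synapse_state_version key (all names here are illustrative, not committed code):

  def parse_state_file(raw)
    data = JSON.parse(raw)
    if data['synapse_state_version'] == 2
      data['services']   # well-defined schema: backends plus generator config per service
    else
      data               # legacy format: a plain map of service => backends
    end
  end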

lap1817 (Contributor Author):

I'd prefer to do a versioned file once we start introducing breaking schemas into the cache. This particular change is intended to be non-breaking: the field is added under each watcher and is optional for each watcher to have. So it seems to me that having a file-level version does not help much here, since we still need logic such as checking whether the watcher has the config_for_generator field. Does that make sense to you? Or am I missing some of your points?

jolynch (Collaborator):

I think the main part that's a bit hacky to me is that there is a special key (watcher_config_for_generator) that we effectively treat as a version check. This is fine, I just feel like having a well defined schema for what's in these hashes in each version would make reasoning about the state file easier (as opposed to this change which makes it so that we implicitly support two versions of the state file). This is not a strong objection, just raising the point (feel free add a TODO or just not do it heh, if I care enough I can always go back and change it).

I am all for backwards compatibility, and you could achieve something similar with a synapse_state_file_version key, which if found would indicate this newer formatting.

@lap1817 (Contributor Author) commented Aug 31, 2017

@jolynch, thanks for the review! I've updated the README file accordingly. To your question 'why is this more dynamic than synapse-tools', it is not. The main reason we chose this implementation is that it fits Airbnb's current stack better, adds less extra load on ZK, and has a lower migration cost (potentially a long period given all the legacy services we have).

  • use node.first as the data blob
  • make the haproxy generator refresh the config before generating the config
  • make the config_for_generator hash read-only in the base watcher via deep clone
@lap1817 (Contributor Author) commented Sep 2, 2017

@jolynch, I'd like to merge this PR. I can make follow-up PRs if you have further questions on the two comments. Can you grant me write permission so I can do it?

And in order for this change to be effective, I made a very minor change in the nerve gem too. Please help take a look: airbnb/nerve#95

@jolynch (Collaborator) left a review

lg2m, I can merge it for you if you don't have merge rights, otherwise go for it :-)

Gemfile.lock (Outdated)
@@ -1,9 +1,10 @@
 PATH
   remote: .
   specs:
-    synapse (0.14.7)
+    synapse (0.15.1)
jolynch (Collaborator):

Let's just go to 0.15.0



@lap1817 (Contributor Author) commented Sep 4, 2017

@jolynch, I changed the version to 0.15.0 and added a TODO comment in the HaproxyState class.
I don't have the "merge" button. Please help me merge and publish.

Thanks for taking care of this over the long weekend. Have a good one!

@lap1817 (Contributor Author) commented Sep 5, 2017

Hi @jolynch, can you help me merge? I don't have permission.

@jolynch merged commit 117e26d into airbnb:master on Sep 6, 2017
@jolynch (Collaborator) commented Sep 6, 2017

@lap1817 ok thanks! merged :-)

I think that folks at airbnb can still admin this project fwiw, it's still under the airbnb namespace so I imagine someone should be able to give you merge access in the future (or someone internal if I don't get back soon enough).

@lap1817 (Contributor Author) commented Sep 6, 2017

@jolynch thank you, I will ask around to check (Igor, who was the last person I know of with that permission, left Airbnb a while back).

Did you publish the gem btw?

@jolynch (Collaborator) commented Sep 6, 2017

Yup, 0.15.0 on rubygems

@lap1817 deleted the feng-pan-synapse branch on September 6, 2017 at 17:08