This repository has been archived by the owner on Nov 13, 2018. It is now read-only.

TO: Should enumerate all origins separately in parent.config #825

Closed
mtorluemke opened this issue Dec 1, 2015 · 5 comments

Comments

@mtorluemke
Contributor

The "dest_domain=." line catches too much. ATS maintains parent health separately for each line in parent.config, so a single slow origin that falls through to the catch-all line can potentially affect every other delivery service on that same line. Traffic Ops should enumerate each origin on its own line in parent.config.
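To make the premise concrete, here is a toy Python model of it. Each parent.config line keeps its own failure counts per parent, and every origin matched by that line shares them. This is not ATS code. The class and function names, the hostnames, and the "first matching line wins" rule are simplifying assumptions. The threshold of 10 comes from the `config->FailThreshold = 10` value in the debug log quoted in this thread. Whether ATS really shares failure state this way is what the replies go on to question, so treat this as a model of the claim, not of ATS.

```python
from collections import defaultdict

# Assumption: mirrors the "config->FailThreshold = 10" value in the debug log in this thread.
FAIL_THRESHOLD = 10


class ParentLine:
    """Toy model of one parent.config line with its own per-parent fail counts."""

    def __init__(self, dest_domain, parents):
        self.dest_domain = dest_domain
        self.parents = parents
        self.fail_count = defaultdict(int)  # parent -> failures seen on THIS line only

    def record_failure(self, parent):
        self.fail_count[parent] += 1

    def is_down(self, parent):
        return self.fail_count[parent] >= FAIL_THRESHOLD


def match_line(lines, host):
    """Simplification: the first line whose dest_domain matches wins; '.' matches any host."""
    for line in lines:
        if line.dest_domain == "." or host == line.dest_domain or host.endswith("." + line.dest_domain):
            return line
    raise LookupError(f"no parent.config line matches {host}")


def simulate(lines, slow_origin, healthy_origin, mid="mid01.example.net:80"):
    """Fail the mid FAIL_THRESHOLD times for slow_origin, then check it for healthy_origin."""
    for _ in range(FAIL_THRESHOLD):
        match_line(lines, slow_origin).record_failure(mid)
    return match_line(lines, healthy_origin).is_down(mid)


mids = ["mid01.example.net:80", "mid02.example.net:80"]
catch_all = [ParentLine(".", mids)]
enumerated = [
    ParentLine("origin-a.example.com", mids),
    ParentLine("origin-b.example.com", mids),
    ParentLine(".", mids),
]

print(simulate(catch_all, "origin-a.example.com", "origin-b.example.com"))   # True: shared line
print(simulate(enumerated, "origin-a.example.com", "origin-b.example.com"))  # False: separate lines
```

In the catch-all case, failures caused by origin-a push mid01 over the threshold for origin-b as well. With one line per origin, origin-b's line never sees those failures.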

@mtorluemke mtorluemke added this to the 1.4.0 milestone Dec 1, 2015
@dewrich
Contributor

dewrich commented Dec 1, 2015

Do you have an example of what that should look like? Comma separated? And where does the list that gets assigned come from? A dropdown of Edges and Mids?

@smalenfant
Contributor

@mtorluemke Are you sure about this? I'm seeing events here where my Mids reach the threshold, but they are marked down only for the specific origin (which is caught by the "." line). Other DSes are fine. The line in diags.log is misleading.

@mtorluemke
Contributor Author

@smalenfant Only as sure as I've been convinced; I don't think we have much empirical data. Is there anything further you can share?

@mtorluemke
Contributor Author

@dewrich All origins that have special requirements are already enumerated; the code that does this is at lines 1025-1034 of lib/UI/ConfigFiles.pm in master. This change would expand that enumeration to every origin and replace (remove?) lines 1049-1055 in the same file.
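As a rough illustration of the proposed output, here is a Python sketch that emits one `dest_domain` line per origin plus an optional catch-all. It stands in for the Perl in lib/UI/ConfigFiles.pm and is not that code. The function name, the origin and mid hostnames, and the `round_robin=consistent_hash go_direct=false` tags are assumptions, not necessarily what Traffic Ops emits. The `default_line` flag reflects the open question of whether the "." line is replaced or removed.

```python
def build_parent_config(origins, mids, default_line=True):
    """Hypothetical generator: one parent.config line per origin, plus an optional catch-all."""
    parent_list = ";".join(mids)  # ATS accepts a semicolon-separated parent list
    tags = "round_robin=consistent_hash go_direct=false"  # assumed tags, for illustration only
    lines = []
    for origin in sorted(set(origins)):  # dedupe and keep output stable between runs
        lines.append(f'dest_domain={origin} parent="{parent_list}" {tags}')
    if default_line:
        # Whether to keep this catch-all is the "replace (remove?)" question above.
        lines.append(f'dest_domain=. parent="{parent_list}" {tags}')
    return "\n".join(lines)


print(build_parent_config(
    ["origin-b.example.com", "origin-a.example.com"],
    ["mid01.example.net:80", "mid02.example.net:80"],
))
```

Sorting and deduplicating the origins keeps the generated file stable, so a diff between two config generations only shows real changes.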

Also, today all mids in the cache group serve traffic for all delivery services, unlike the edge tier, where you can assign specific edges to a given delivery service.
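The difference between the tiers can be sketched as a small lookup. Under the rule described above, a mid carries every delivery service, while an edge carries only the ones it is assigned to. The data class, function name, tier strings, and server/DS names are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DeliveryService:
    name: str
    assigned_edges: frozenset = frozenset()  # only meaningful at the edge tier


def services_served(server, tier, delivery_services):
    """Return the names of the delivery services a server carries traffic for."""
    if tier == "MID":
        # Every mid in the cache group serves every delivery service.
        return sorted(ds.name for ds in delivery_services)
    # Edge tier: only the delivery services this edge is explicitly assigned to.
    return sorted(ds.name for ds in delivery_services if server in ds.assigned_edges)


dss = [
    DeliveryService("ds-a", frozenset({"edge01"})),
    DeliveryService("ds-b", frozenset({"edge02"})),
]
print(services_served("mid01", "MID", dss))    # ['ds-a', 'ds-b']
print(services_served("edge01", "EDGE", dss))  # ['ds-a']
```

This is why a mid marked down has a wider potential impact than an edge: every delivery service in the cache group depends on it.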

@smalenfant
Contributor

@mtorluemke I'm not against this change at all (I'd even welcome it if it prevents a Mid from possibly being marked down because of a single origin).

I'll try to find some data in Splunk related to a past event we had with a single origin failure. It didn't seem to affect other services in the catch-all, but I'll need to double-check. I remember pinging @jrushf1239k about this.

All I can say is that I see the following in the ATS debug logs for an origin that was in the catch-all; note that the origin is mentioned by name:

[Sep 14 16:59:22.107] Server {0x2b91c3e06700} DEBUG: (parent_select) Matched with 0x110d8f8 parent node from line 2
[Sep 14 16:59:22.107] Server {0x2b91c3e06700} DEBUG: (parent_select) config->FailThreshold = 10
[Sep 14 16:59:22.107] Server {0x2b91c3e06700} DEBUG: (parent_select) Selecting a down parent due to little failCount(faileAt: 1442249951 failCount: 1)
[Sep 14 16:59:22.107] Server {0x2b91c3e06700} DEBUG: (parent_select) Chosen parent = cdn1cdmid01.coxlab.net.80
[Sep 14 16:59:22.107] Server {0x2b91c3e06700} DEBUG: (parent_select) Result for origin.coxlab.net was parent cdn1cdmid01.coxlab.net:80
[Sep 14 17:00:23.173] Server {0x2b91c3e06700} DEBUG: (parent_select) Parent fail count increased to 2 for cdn1cdmid01.coxlab.net:80
[Sep 14 17:00:23.173] Server {0x2b91c3e06700} DEBUG: (parent_select) result->start_parent=1, num_parents=2
[Sep 14 17:00:23.173] Server {0x2b91c3e06700} DEBUG: (parent_select) config->FailThreshold = 10
[Sep 14 17:00:23.173] Server {0x2b91c3e06700} DEBUG: (parent_select) Selecting a down parent due to little failCount(faileAt: 0 failCount: 0)
[Sep 14 17:00:23.173] Server {0x2b91c3e06700} DEBUG: (parent_select) Chosen parent = cdn1cdmid02.coxlab.net.80
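To gather the kind of empirical data discussed here (for example from logs exported out of Splunk), a small parser can group `parent_select` messages by parent. This is a hypothetical helper. Its regular expressions are written against the message text in the excerpt above and may not match other ATS versions. Note that the fail-count message names only the parent, not the origin, so tying a fail count to an origin depends on the preceding "Result for" line.

```python
import re
from collections import defaultdict

# Patterns based on the message wording in the debug excerpt above (assumed stable).
RESULT_RE = re.compile(r"\(parent_select\) Result for (\S+) was parent (\S+)")
FAIL_RE = re.compile(r"\(parent_select\) Parent fail count increased to (\d+) for (\S+)")


def summarize(log_lines):
    """Map each parent to the origins routed to it and its highest logged fail count."""
    origins_by_parent = defaultdict(set)
    max_fail = defaultdict(int)
    for line in log_lines:
        m = RESULT_RE.search(line)
        if m:
            origin, parent = m.groups()
            origins_by_parent[parent].add(origin)
            continue
        m = FAIL_RE.search(line)
        if m:
            count, parent = int(m.group(1)), m.group(2)
            max_fail[parent] = max(max_fail[parent], count)
    return dict(origins_by_parent), dict(max_fail)


sample = [
    "[Sep 14 16:59:22.107] Server {0x2b91c3e06700} DEBUG: (parent_select) Result for origin.coxlab.net was parent cdn1cdmid01.coxlab.net:80",
    "[Sep 14 17:00:23.173] Server {0x2b91c3e06700} DEBUG: (parent_select) Parent fail count increased to 2 for cdn1cdmid01.coxlab.net:80",
]
print(summarize(sample))
```

If one parent's fail count rises while the "Result for" lines show several different origins routed to it, that is the situation the catch-all concern is about.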


4 participants