Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

CLOUDFLARE: adopt ZoneCache #3373

Merged
merged 1 commit into from
Jan 15, 2025

Conversation

das7pad
Copy link
Contributor

@das7pad das7pad commented Jan 15, 2025

This PR is adopting the new ZoneCache for the CLOUDFLARE driver.

The integration tests are passing (CF_REDIRECT tests skipped). I've done some extra manual testing on the zone creation plus provisioning a zone in a single pass, see below.

(#3330 plus the zone cache changes as linked in #3337)

Manual testing

Preview, can create zone and preview new records

var REG_NONE = NewRegistrar('none')
var DSP = NewDnsProvider('CLOUDFLAREAPI')

D('testing-2025-01-15-0.dev', REG_NONE, DnsProvider(DSP),
    A('@', '127.0.0.1')
)
$ dnscontrol preview
******************** Domain: testing-2025-01-15-0.dev
1 correction (CLOUDFLAREAPI)
#1: Ensuring zone "testing-2025-01-15-0.dev" exists in "CLOUDFLAREAPI"
Added zone for testing-2025-01-15-0.dev to Cloudflare account: xxx
SUCCESS!
CONCURRENTLY gathering 1 zone(s)
SERIALLY gathering 0 zone(s)
Waiting for concurrent gathering(s) to complete...DONE
******************** Domain: testing-2025-01-15-0.dev
1 correction (CLOUDFLAREAPI)
#1: + CREATE testing-2025-01-15-0.dev A 127.0.0.1 proxy=false ttl=1
Done. 2 corrections.

Push, can add record to existing zone and create and add records in one go

var REG_NONE = NewRegistrar('none')
var DSP = NewDnsProvider('CLOUDFLAREAPI')

D('testing-2025-01-15-0.dev', REG_NONE, DnsProvider(DSP),
    A('@', '127.0.0.1')
)

D('testing-2025-01-15-1.dev', REG_NONE, DnsProvider(DSP),
    A('@', '127.0.0.1')
)
$ dnscontrol push
******************** Domain: testing-2025-01-15-1.dev
1 correction (CLOUDFLAREAPI)
#1: Ensuring zone "testing-2025-01-15-1.dev" exists in "CLOUDFLAREAPI"
Added zone for testing-2025-01-15-1.dev to Cloudflare account: xxx
SUCCESS!
CONCURRENTLY gathering 2 zone(s)
SERIALLY gathering 0 zone(s)
Waiting for concurrent gathering(s) to complete...DONE
******************** Domain: testing-2025-01-15-0.dev
1 correction (CLOUDFLAREAPI)
#1: + CREATE testing-2025-01-15-0.dev A 127.0.0.1 proxy=false ttl=1
SUCCESS!
******************** Domain: testing-2025-01-15-1.dev
1 correction (CLOUDFLAREAPI)
#1: + CREATE testing-2025-01-15-1.dev A 127.0.0.1 proxy=false ttl=1
SUCCESS!
Done. 3 corrections.

// no corrections on retry
$ dnscontrol push
CONCURRENTLY gathering 2 zone(s)
SERIALLY gathering 0 zone(s)
Waiting for concurrent gathering(s) to complete...DONE
******************** Domain: testing-2025-01-15-0.dev
******************** Domain: testing-2025-01-15-1.dev
Done. 0 corrections.

@das7pad
Copy link
Contributor Author

das7pad commented Jan 15, 2025

The integration tests are passing

@tlimoncelli How long does it take to run the integration tests for CLOUDFLARE on your end? It takes about 30min on my end, which I find quite slow. The HETZNER tests run in about a third of that, which is still slow. Do you usually run a subset of the tests during development?

@tlimoncelli
Copy link
Contributor

The integration tests are passing

@tlimoncelli How long does it take to run the integration tests for CLOUDFLARE on your end? It takes about 30min on my end, which I find quite slow. The HETZNER tests run in about a third of that, which is still slow. Do you usually run a subset of the tests during development?

It takes about 14 minutes. Cloudflare is one of the slower providers.

You can see the timings here:
https://github.com/StackExchange/dnscontrol/actions/runs/12797268789
(click on "Matrix: integration-tests")

screencapture_2025-01-15_17 20 21 screencapture 2025-01-15 at 5 20 28 PM

I would love it if they all took less than 5 minutes.

Tom

@tlimoncelli
Copy link
Contributor

Do you usually run a subset of the tests during development?

Yes, I use the -start and -end flags to just run specific tests. However I don't merge to main until they all run cleanly, and that can take a long while.

I'm open to suggestions. Here are some ideas I've had:

  1. Have a "fast test" version that skips all but the most crucial tests.
  2. Work with each maintainer to speed up the code (like I did with TransIP recently)
  3. Ask the maintainer to request a rate limit exception from the provider (some have done that!)
  4. Use 2 domains and run half the tests on each.
  5. Use a bigger VM for the tests. GHA's free tier runs these tests much, much, slower than on my desktop

I could use a volunteer to manage the relationships with the maintainers. First job would be to find them all (some emails now bounce), second job would be to get more providers into the automated testing, third would be to get the tests to run a lot faster.

Tom

@tlimoncelli tlimoncelli merged commit 0d5b3c2 into StackExchange:main Jan 15, 2025
20 checks passed
@das7pad
Copy link
Contributor Author

das7pad commented Jan 15, 2025

Thanks for the insights. I can throw in another idea for improving the wall-time of the tests:

Run independent tests concurrently:

  • give each TestGroup a "namespace"/prefix: e.g. "complex TXT" gets prefix "0-", record "foo0" turns into "0-foo0"; "testByLabel" gets "1-", record "foo" turns into "1-foo"
  • bundle the corrections of each TestCase step together and apply them
  • provide each TestCase with their filtered records after applying
-> time ->

Before: TestGroup1
        `-> TestCase1 "foo" is "1.1.1.1" -> TestCase2 "foo" is "1.1.1.2"
                                                                        TestGroup2
                                                                        `-> TestCase1' "foo" is "2.2.2.2" -> TestCase2' "foo" is "2.2.2.3"

---

After: TestGroup1 -> TestCase1  "0-foo" is "1.1.1.1" -> TestCase2  "0-foo" is "1.1.1.2"
       TestGroup2 -> TestCase1' "1-foo" is "2.2.2.2" -> TestCase2' "1-foo" is "2.2.2.3"
                                                                                       TestGroup3 -- opting out of concurrency
                                                                                       `-> TestCase1'' -> TestCase2''

Some TestGroups might need to opt-out of this as they do something special, e.g. need to control the exact record name. Some drivers might hit limits on the number of records in a zone.

As many drivers can batch together multipler create/update operations in a single API calls, this should cut the wall-time from reducing the waterfall of requests. The number of requests will also drop, easing the pressure on rate-limits.

In the above example:

  • Before: list, create 1, list, update 1, list; delete 1; list, create 1 , list, update 1, list, delete 1
  • After: list, create 2, list, update 2, list, delete 2

@tlimoncelli
Copy link
Contributor

@das7pad I agree about grouping updates. I think all (most?) providers do that if its an option.

Sadly the way the testcases work, we can run them concurrently unless we have a separate domain for each concurrent group. Not an impossible request, but it will require reworking the github action.

@das7pad
Copy link
Contributor Author

das7pad commented Jan 17, 2025

Me two days ago: It takes about 30min on my end [to run the CLOUDFLARE integration tests], which I find quite slow.

Perhaps I should have compared the timings with main 🤦 #3394 brings them back to 11min.

tlimoncelli added a commit that referenced this pull request Jan 21, 2025
commit 901a3ac
Author: Tom Limoncelli <[email protected]>
Date:   Tue Jan 21 14:43:33 2025 -0500

    CHORE: Update dependencies (#3397)

commit 70e9659
Author: Tom Limoncelli <[email protected]>
Date:   Tue Jan 21 14:29:53 2025 -0500

    MSDNS: Provider is failing due to lint fix gone wrong (#3396)

commit 5e15bbe
Author: Jakob Ackermann <[email protected]>
Date:   Sat Jan 18 13:54:37 2025 +0000

    BUG: fetch zones once in ZoneCache (#3394)

commit a631c5b
Author: Kai Schwarz <[email protected]>
Date:   Fri Jan 17 20:15:10 2025 +0100

    CNR: Initial Performance improvement; golint review (#3391)

commit e1c9785
Author: Tom Limoncelli <[email protected]>
Date:   Fri Jan 17 07:11:10 2025 -0500

    CHORE: Update dependencies (#3385)

commit 9e88b6a
Author: Tom Limoncelli <[email protected]>
Date:   Thu Jan 16 21:47:10 2025 -0500

    CICD: Make pager tests more visible (#3387)

commit 67db0e2
Author: Tom Limoncelli <[email protected]>
Date:   Thu Jan 16 14:41:18 2025 -0500

    GCLOUD: remove (irrelevant) slow test (#3384)

commit c348e35
Author: Tom Limoncelli <[email protected]>
Date:   Thu Jan 16 14:32:32 2025 -0500

    GCLOUD: CICD: Skip the pager1201 integration test (#3383)

commit 5cfb907
Author: Tom Limoncelli <[email protected]>
Date:   Thu Jan 16 14:17:47 2025 -0500

    TRANSIP: Pause when rate-limited (#3378)

commit f666af8
Author: Tom Limoncelli <[email protected]>
Date:   Thu Jan 16 13:56:46 2025 -0500

    GCLOUD: Re-try on 502 errors (#3376)

commit 1a1a4bf
Author: Tom Limoncelli <[email protected]>
Date:   Thu Jan 16 12:54:48 2025 -0500

    INWX: Enable SRV to have "." target (#3380)

commit 3556439
Author: Tom Limoncelli <[email protected]>
Date:   Thu Jan 16 10:58:11 2025 -0500

    CLOUDFLAREAPI: No longer treat TTL=300 as special (#3368)

    Co-authored-by: Sukka <[email protected]>

commit 89c65b6
Author: Tom Limoncelli <[email protected]>
Date:   Thu Jan 16 10:03:00 2025 -0500

    INWX: Permit "." target for SRV records (#3377)

commit fc2c506
Author: Tom Limoncelli <[email protected]>
Date:   Wed Jan 15 18:28:15 2025 -0500

    CICD: Warn user if -provider does not match profiles.json:TYPE (#3375)

commit 0d5b3c2
Author: Jakob Ackermann <[email protected]>
Date:   Wed Jan 15 22:43:24 2025 +0000

    CLOUDFLARE: adopt ZoneCache (#3373)

commit 2ef2362
Author: Jakob Ackermann <[email protected]>
Date:   Wed Jan 15 20:23:02 2025 +0000

    HETZNER: adopt ZoneCache (#3372)

commit ab00797
Author: Tom Hughes <[email protected]>
Date:   Wed Jan 15 02:07:19 2025 +0000

    FEATURE: Extend PTR magic handling to support RFC4183 names (#3364)

commit 5c9b170
Author: Jakob Ackermann <[email protected]>
Date:   Wed Jan 15 02:05:17 2025 +0000

    FEAT: Add ZoneCache primitive (#3365)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants