poller-lambdas #83

Merged
merged 22 commits into from
Dec 17, 2024

Contributor

@twrichards twrichards commented Oct 28, 2024

This PR introduces the concept of, and an abstraction for, 'polling' any data source we like and queuing things up for ingestion (via the existing ingestion-lambda).

This abstraction takes care of the infrastructure (see below), build, deployment and boilerplate (obtaining secrets, writing to the ingestion queue etc.) so that each 'poller-lambda' implementation (in the new poller-lambdas/src/pollers directory) is focussed purely on the business logic of requesting from the supplier and transforming items with a proprietary shape into one or more of our shared IngestorPayloads.
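
To make that division of labour concrete, here is a minimal sketch of what a single poller implementation might look like. Everything named here is an assumption for illustration rather than the code in this PR: the fields of `IngestorPayload`, the `PollerInput`/`PollerOutput` shapes, the `SupplierItem` type and `makeExamplePoller` are all hypothetical, and the supplier request is injected as a function so the sketch runs without a network.

```typescript
// Hypothetical shapes: the real IngestorPayload and poller types live in the
// repo and may well differ.
interface IngestorPayload {
  externalId: string;
  body: { headline: string; bodyText: string };
}

interface PollerInput {
  secret: string; // provided by the abstraction (typically an API key)
  input: string | undefined; // string output of the previous invocation, if any
}

interface PollerOutput {
  payloadsForIngestion: IngestorPayload[];
  valueForNextPoll: string;
}

// Stand-in for a supplier's proprietary item shape.
interface SupplierItem {
  id: number;
  title: string;
  text: string;
}

// Stand-in for the HTTP request to the supplier.
type FetchItems = (apiKey: string, sinceId: number) => Promise<SupplierItem[]>;

// The poller only requests from the supplier and transforms the results;
// secrets, queues and scheduling are assumed to be handled elsewhere.
const makeExamplePoller =
  (fetchItems: FetchItems) =>
  async ({ secret, input }: PollerInput): Promise<PollerOutput> => {
    const sinceId = input === undefined ? 0 : Number(input);
    const items = await fetchItems(secret, sinceId);
    const payloadsForIngestion = items.map((item) => ({
      externalId: String(item.id),
      body: { headline: item.title, bodyText: item.text },
    }));
    // Carry the highest id seen forward as the pointer for the next poll.
    const newestId = items.reduce((max, item) => Math.max(max, item.id), sinceId);
    return { payloadsForIngestion, valueForNextPoll: String(newestId) };
  };
```

In this sketch the pointer is a plain numeric id, but any string (including stringified JSON) would work the same way.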

It supports both 'long-polling' and the more traditional fixed-frequency polling (with a frequency in seconds, not constrained by the one-minute minimum frequency of CloudWatch/EventBridge rules). In both cases, it can pass a string output from one invocation to be used as the input for the next (e.g. for passing a pointer id/timestamp etc., but theoretically anything via stringified JSON, albeit transferred in plain text).
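
As an illustration of how the two modes could translate into the message that triggers the next invocation, the sketch below computes a delay and a message body. The names (`PollingMode`, `buildNextInvocationMessage`, `idealFrequencyInSeconds`) and the choice to subtract elapsed time from the target frequency are assumptions, not necessarily what this PR does. The 900-second cap reflects the SQS limit on per-message `DelaySeconds`.

```typescript
// SQS caps per-message DelaySeconds at 900 (15 minutes).
const SQS_MAX_DELAY_SECONDS = 900;

// Hypothetical: a poller is either long-polling or runs at a fixed frequency.
type PollingMode =
  | { kind: "long-polling" }
  | { kind: "fixed-frequency"; idealFrequencyInSeconds: number };

interface NextInvocationMessage {
  delaySeconds: number;
  messageBody: string; // becomes the input of the next invocation
}

function buildNextInvocationMessage(
  mode: PollingMode,
  invocationStartMs: number,
  nowMs: number,
  valueForNextPoll: string,
): NextInvocationMessage {
  if (mode.kind === "long-polling") {
    // The supplier request itself blocks until there is data, so re-queue immediately.
    return { delaySeconds: 0, messageBody: valueForNextPoll };
  }
  // Assumption: aim for a steady cadence by discounting the time already spent.
  const elapsedSeconds = (nowMs - invocationStartMs) / 1000;
  const remaining = Math.ceil(mode.idealFrequencyInSeconds - elapsedSeconds);
  const delaySeconds = Math.min(SQS_MAX_DELAY_SECONDS, Math.max(0, remaining));
  return { delaySeconds, messageBody: valueForNextPoll };
}
```

A richer pointer can be passed by stringifying it first, e.g. `JSON.stringify({ lastId: 42 })`, bearing in mind that it travels in plain text.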

It has a useful interactive local run mode, launched by npm run dev in the poller-lambdas directory, to facilitate easy iteration when onboarding new suppliers (see README).

Infrastructure

For every poller (defined in POLLERS_CONFIG in shared/pollers.ts), the following CDK resources are provisioned and wired together...

  • Secrets Manager secret - named according to STAGE and the name of the poller, and populated with an arbitrary value ready to be updated with the real value from the supplier (typically holds an API key)
  • Lambda - uses the same zip for all the implementations (since they're tiny, and it's easier that way) but references the correct handler export
  • SQS Queue - this is used to queue further invocations of the lambda, with whatever delay is needed, and to pass on the output of one invocation as input for the next invocation (the lambda is wired up to process things from the queue)

Each lambda is given permission to read the secret and to write to the ingestion queue and its own queue.
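
The per-poller wiring can be pictured as a pure function from a config entry to a description of the resources. The sketch below deliberately avoids the CDK API and only models the relationships described above. The config contents, the naming schemes, the `example-app` prefix and `describePollerResources` are all hypothetical and are not taken from shared/pollers.ts.

```typescript
// Hypothetical config: the real POLLERS_CONFIG may look quite different.
const POLLERS_CONFIG = {
  EXAMPLE_LONG_POLLING: {},
  EXAMPLE_FIXED_FREQUENCY: { idealFrequencyInSeconds: 30 },
} as const;

type PollerId = keyof typeof POLLERS_CONFIG;

interface PollerResources {
  secretName: string;
  lambdaHandler: string; // same zip for every poller, different handler export
  queueName: string;
  lambdaPermissions: string[];
}

function describePollerResources(stage: string, pollerId: PollerId): PollerResources {
  // Naming schemes below are invented for illustration.
  const secretName = `/${stage}/example-app/${pollerId}`;
  const queueName = `example-app-${stage}-${pollerId}-queue`;
  return {
    secretName,
    lambdaHandler: `index.${pollerId}`,
    queueName,
    lambdaPermissions: [
      `read secret ${secretName}`,
      "send to ingestion queue",
      `send to ${queueName}`,
    ],
  };
}

// One set of resources per entry in the config.
const allResources = (Object.keys(POLLERS_CONFIG) as PollerId[]).map((id) =>
  describePollerResources("CODE", id),
);
```

Adding a poller in this model is a one-line change to the config, which is the property the abstraction is aiming for.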

Each lambda has alarms for errors & throttling (both zero-tolerance) plus custom alarms for when the invocation rate is too high or low, which might indicate something has gone haywire.
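
One way to derive bounds for the invocation-rate alarms is from the poller's target frequency. This is only a sketch of the idea: the function name, the evaluation period and the tolerance factor are assumptions, and the thresholds actually used in this PR may be chosen differently.

```typescript
interface InvocationRateBounds {
  expectedPerPeriod: number;
  alarmIfBelow: number;
  alarmIfAbove: number;
}

// tolerance = 0.5 means "alarm outside 50% to 150% of the expected count"
// (a hypothetical policy, not one stated in the PR).
function invocationRateBounds(
  idealFrequencyInSeconds: number,
  periodInSeconds: number,
  tolerance: number,
): InvocationRateBounds {
  const expectedPerPeriod = periodInSeconds / idealFrequencyInSeconds;
  return {
    expectedPerPeriod,
    alarmIfBelow: Math.floor(expectedPerPeriod * (1 - tolerance)),
    alarmIfAbove: Math.ceil(expectedPerPeriod * (1 + tolerance)),
  };
}

// A poller targeting one invocation every 30 seconds, evaluated over 5 minutes.
const bounds = invocationRateBounds(30, 300, 0.5);
```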

Thanks to GuCDK, plus a bit of tinkering with the contentDirectory (in cdk/bin/cdk.ts) for the auto-generated riff-raff.yaml, the deployment is also abstracted for all pollers.

Any other TODOs are things to be explored in future, not in this PR.

@twrichards twrichards force-pushed the infra/poller-lambdas branch 14 times, most recently from 8fea04d to 5d23daf Compare November 4, 2024 13:14
@twrichards twrichards force-pushed the infra/poller-lambdas branch 3 times, most recently from ff6b7b2 to a10d330 Compare November 4, 2024 16:48
@twrichards twrichards force-pushed the infra/poller-lambdas branch 3 times, most recently from a678b5c to aa48010 Compare November 5, 2024 17:02
@twrichards twrichards changed the base branch from main to riff-raff-dependency November 5, 2024 17:31
@twrichards twrichards force-pushed the riff-raff-dependency branch from fc4d5df to 9f8ca92 Compare November 5, 2024 19:21
@twrichards twrichards force-pushed the infra/poller-lambdas branch from 32d6100 to 5142877 Compare November 5, 2024 19:22
@twrichards twrichards force-pushed the riff-raff-dependency branch from 9f8ca92 to f5e149e Compare November 5, 2024 19:26
@twrichards twrichards force-pushed the infra/poller-lambdas branch from 5142877 to 914790e Compare November 5, 2024 19:26
Base automatically changed from riff-raff-dependency to main November 7, 2024 10:23
@twrichards twrichards force-pushed the infra/poller-lambdas branch from 914790e to b206dd2 Compare November 7, 2024 11:46
stage: 'PROD',
domainName: 'newswires.gutools.co.uk',
enableMonitoring: false,
sourceQueue: prodWiresFeeds.sourceQueue,
Contributor

is there a particular reason for creating the source queue as part of a different stack?

Contributor Author

I think it's because FIP have a reference to it, so we wanted to decouple, such that we could theoretically tear down the newswires stack without impacting FIP writing to the queue. @andrew-nowak could probably confirm...

@@ -1,5 +1,5 @@
 {
-  "app": "tsx bin/cdk.ts",
+  "app": "ts-node-dev bin/cdk.ts",
Contributor

we discussed off-thread that the reason for using ts-node-dev here is because tsx restarts if the user presses Enter, which doesn't work with the current CLI setup

Contributor Author

yes, swapped out the top-level dependency on tsx so needed to change it in cdk.json too (worth noting GuCDK initialised this command to one using npx, which seemed a shame, so I can see why it was changed to tsx at one point)

Poller lambdas have their own SQS queues to facilitate long-polling and to allow invoking on a frequency faster than the once per minute allowed by CloudWatch/EventBridge rules.

## Adding a new poller lambda
Contributor

💜🙌

Contributor

@bryophyta bryophyta left a comment

I've tested locally and read through the code and this looks good to me. Great work! Should make it very easy to get the AP and Reuters pollers set up 👍

@twrichards twrichards force-pushed the infra/poller-lambdas branch 2 times, most recently from 792dbea to b9bc81b Compare December 13, 2024 16:48
…adds complexity and potential race conditions
Member

@andrew-nowak andrew-nowak left a comment

great work! all looks well thought-out and documented. I've left just a couple of non-blocking thoughts and questions

…nded and will be changing the code away from what's in VCS
@twrichards twrichards merged commit 981d7ca into main Dec 17, 2024
3 checks passed
@twrichards twrichards deleted the infra/poller-lambdas branch December 17, 2024 10:28
@bryophyta bryophyta mentioned this pull request Dec 18, 2024
7 tasks
@bryophyta bryophyta mentioned this pull request Jan 13, 2025
5 tasks
3 participants