Save Jobs History on Flink #6
Conversation
So, EFS is NFS. And NFS is one of those 'you have a problem, you think you will use NFS, and now you have two problems' situations. It plays poorly with a lot of data formats that use any kinda file locking (see https://www.sqlite.org/howtocorrupt.html#_filesystems_with_broken_or_missing_lock_implementations), and the file corruption only shows up in the worst possible times. So I think the primary, and perhaps the only, time to use NFS (and hence EFS) is when providing home directories.
Given we already have the EBS provisioner setup and use it for prometheus, can we not use EBS here too? It does mean that only one pod can write to an EBS volume at a time, but relying on NFS for multiple-replica high availability eventually only leads to tears, pain, blood, stale file handle crashes and death.
Left some inline comments about the kubernetes provider.
Thanks for giving me the deep deets on why EFS/NFS is bad. I was going to use EBS but then I realized something when playing with multiple job managers that made me switch back to EFS:
Does any of that assuage your fears and persuade you one way or the other @yuvipanda?
doh, so poor: hashicorp/terraform-provider-kubernetes#1775 (comment). Maybe I'll just write a helm config since that works.
YESSS, I always prefer this over raw manifests :)
Thanks for engaging with me on the EFS issues :) My goal here is not to say 'no EFS ever', but just to make sure we are only using it after we have completely determined that EBS is not an option. So if I understand this correctly, the reasons for EFS over EBS are:
I think answers to these questions will help me a lot :)
No, the reaper process doesn't need to access the EFS mount. It's only checking
These clowns removed the
Got confirmation from one of the devs that only the latest two operator versions are supported, and one was just released. He's not sure if this documentation applies to the operators as well, but it pretty much aligns: https://flink.apache.org/downloads/#update-policy-for-old-releases (specific to the operator: https://cwiki.apache.org/confluence/display/FLINK/Release+Schedule+and+Planning)
apiVersion: v1
kind: PersistentVolume
metadata:
  name: flink-historyserver-efs-pv
I like that this has 'historyserver' in the name, so it gets used specifically just for this and not much more :)
PersistentVolume is also not namespaced, and should get the same treatment as StorageClass with .Release.Name. Sorry for not catching that earlier.
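As a rough sketch (the template path and exact name here are illustrative, not necessarily what the chart uses), that would mean prefixing the metadata like so:

# templates/pv.yaml (hypothetical path)
apiVersion: v1
kind: PersistentVolume
metadata:
  name: {{ .Release.Name }}-flink-historyserver-efs-pv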
@ranchodeluxe I think this still needs to be fixed?
Thanks for working with me on this, @ranchodeluxe. I think using EFS is alright here! I've left some other minor comments, but overall lgtm
Sorry @yuvipanda I thought I muted this by turning it back into a draft so it wouldn't ping you. I'll do that now (it still needs a bit of work) and I'll incorporate your feedback before requesting another review. Here are some answers to some previous questions:
The
Force-pushed from f8b3ea5 to b47cc34
Requested changes have been made and I've requested a new review.
Mount EFS to Job Managers so they can archive jobs for historical status lookups
Addresses #122
Related PR: pangeo-forge/pangeo-forge-cloud-federation#6
Co-authored-by: ranchodeluxe <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Thank you very much for working on this, @ranchodeluxe
apiVersion: v1
kind: PersistentVolume
metadata:
  name: flink-historyserver-efs-pv
PersistentVolume is also not namespaced, and should get the same treatment as StorageClass with .Release.Name. Sorry for not catching that earlier.
terraform/aws/operator.tf
Outdated
"prometheus.io/port" : "9999" | ||
} | ||
}) | ||
value = local_file.flink_operator_config.content |
Can this simply be templatefile("flink_operator_config.tpl", { mount_path = var.historyserver_mount_path }) instead? That way, we can get rid of having to gitignore .yaml files and save an additional resource here. It also keeps things simpler with one fewer level of redirection.
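A minimal sketch of that idea, assuming the .tpl filename and variable names from this suggestion:

locals {
  # render the operator config straight from a template file,
  # so no local_file resource or gitignored .yaml is needed
  flink_operator_config = templatefile("${path.module}/flink_operator_config.tpl", {
    mount_path = var.historyserver_mount_path
  })
}

The helm_release value could then reference local.flink_operator_config instead of local_file.flink_operator_config.content.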
Actually, why is this a template at all? Can't it just be yamlencode still, with the value for jobmanager.archive.fs.dir set to var.historyserver_mount_path? I think that's much cleaner, and we'll never run into YAML indentation issues due to how templating works.
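For example, something roughly like this (only the key discussed here is shown; the full config would carry more entries):

locals {
  # build the Flink config as a plain HCL map and let yamlencode handle formatting
  flink_operator_config = yamlencode({
    "jobmanager.archive.fs.dir" = var.historyserver_mount_path
  })
}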
yamlencode was already creating something that couldn't be parsed by the operator:
[WARN ] Error while trying to split key and value in configuration file /opt/flink/conf/flink-conf.yaml:55: Line is not a key-value pair (missing space after ':'?)
Let me look into this one after I clean other things up.
Haha, this inline version of a Map is the only one that doesn't give the parse warning 😆
kubernetes.jobmanager.annotations: {"prometheus.io/scrape": true, "prometheus.io/port": 9999}
I'm gonna keep the template file b/c IMHO yamlencode is hard to grok.
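For illustration, the relevant template lines might look something like this (a hypothetical excerpt; the actual flink_operator_config.tpl may differ), with ${mount_path} filled in by Terraform's templatefile interpolation:

jobmanager.archive.fs.dir: ${mount_path}
kubernetes.jobmanager.annotations: {"prometheus.io/scrape": true, "prometheus.io/port": 9999}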
Couldn't get that solution to render correctly via templatefile or yamlencode, so ticked and moving onto better problems for now. Will have a look at it when I get around to getting Prometheus to work better.
terraform/aws/helm_historyserver.tf
Outdated
locals {
  # removing lines that start with '#' b/c TF >> helm doesn't like them
  filtered_log4j_config = join("\n", [
I'm actually really confused about what's happening here. Are we copy-pasting the default values from a configmap generated by the operator onto our helm setup, but with # removed? Why are we copy-pasting them?
I had removed these changes but they were committed to some other branch, it seems. It's reusing the default flink-operator-config now.
@yuvipanda gentle nudge with some 🧁 for dessert 😄
Thanks for the extra ping, @ranchodeluxe :)
apiVersion: v1
kind: PersistentVolume
metadata:
  name: flink-historyserver-efs-pv
@ranchodeluxe I think this still needs to be fixed?
@@ -0,0 +1,4 @@
efsFileSystemId: "" |
I think undoing the 'required' is much cleaner than enforcing them via minLength: 1. If there is a future change coming in that makes these required, we can modify the schema at that point, no?
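For context, the minLength approach being discussed would look roughly like this in a values.schema.json (a sketch; only efsFileSystemId is shown and the surrounding schema is assumed):

{
  "$schema": "https://json-schema.org/draft-07/schema#",
  "type": "object",
  "properties": {
    "efsFileSystemId": {
      "type": "string",
      "minLength": 1
    }
  }
}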
Co-authored-by: Yuvi Panda <[email protected]>
alrighty then @yuvipanda, back at this with recent changes so @thodson-usgs can use EFS
LGTM
Looks good to me
Mount EFS to Job Managers so they can archive jobs for historical status lookups
Addresses: pangeo-forge/pangeo-forge-runner#122
Related PR: pangeo-forge/pangeo-forge-runner#131