Flink Historyserver for Async Job Status #122

Closed
ranchodeluxe opened this issue Nov 6, 2023 · 2 comments

@ranchodeluxe
Collaborator

Problem

We shouldn't have to run kubectl logs -f pod/<job-manager-pod> | grep 'Job BeamApp-flink-.*RUNNING to.*' to know whether a job succeeded or failed. We want something we can poll asynchronously instead.
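A minimal sketch of what "polling against something async" could look like once a HistoryServer is running. It assumes the HistoryServer is reachable at http://localhost:8082 (Flink's default HistoryServer port) and that archived jobs are listed at its /jobs/overview endpoint with "name" and "state" fields. Matching on a "BeamApp-flink-" name prefix is an assumption borrowed from the grep above. Note that the HistoryServer only lists jobs after they finish and are archived, so a running job simply won't appear yet.

```python
import json
import time
import urllib.request
from typing import Callable, Optional

# Assumption: HistoryServer address; 8082 is Flink's default HistoryServer port.
HISTORY_SERVER = "http://localhost:8082"

# Flink job states after which a job will not change state again.
TERMINAL_STATES = {"FINISHED", "FAILED", "CANCELED"}


def fetch_overview(base_url: str = HISTORY_SERVER) -> dict:
    """GET /jobs/overview from the HistoryServer and decode the JSON body."""
    with urllib.request.urlopen(f"{base_url}/jobs/overview", timeout=10) as resp:
        return json.load(resp)


def job_state(overview: dict, job_name_prefix: str) -> Optional[str]:
    """Return the state of the first archived job whose name starts with the prefix."""
    for job in overview.get("jobs", []):
        if job.get("name", "").startswith(job_name_prefix):
            return job.get("state")
    return None


def wait_for_job(
    job_name_prefix: str,
    fetch: Callable[[], dict] = fetch_overview,
    interval_s: float = 30.0,
    max_polls: int = 120,
) -> str:
    """Poll until the job shows up in a terminal state, or give up after max_polls."""
    for _ in range(max_polls):
        state = job_state(fetch(), job_name_prefix)
        if state in TERMINAL_STATES:
            return state
        time.sleep(interval_s)
    raise TimeoutError(f"no terminal state seen for {job_name_prefix!r}")
```

Passing the fetch function in as a parameter keeps the polling logic testable without a live cluster; wait_for_job("BeamApp-flink-") would use the real HTTP call.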

Goal

Figure out how to set up and configure the HistoryServer
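As a starting point, the HistoryServer needs the JobManager to write job archives somewhere the HistoryServer can read them. The sketch below renders the relevant flink-conf.yaml keys in Flink's flat "key: value" form. The shared directory file:///opt/job/history is an assumption taken from the error later in this thread; in a multi-pod setup it would need to be storage that both the JobManager and the HistoryServer can see.

```python
# Assumption: one archive directory shared by JobManager and HistoryServer.
ARCHIVE_DIR = "file:///opt/job/history"

history_server_conf = {
    # Where the JobManager uploads an archive of each job when it finishes.
    "jobmanager.archive.fs.dir": ARCHIVE_DIR,
    # Comma-separated list of directories the HistoryServer scans for archives.
    "historyserver.archive.fs.dir": ARCHIVE_DIR,
    # How often (in milliseconds) the HistoryServer rescans those directories.
    "historyserver.archive.fs.refresh-interval": "10000",
    # Port for the HistoryServer web UI and REST API.
    "historyserver.web.port": "8082",
}


def to_flink_conf(conf: dict) -> str:
    """Render flat key/value pairs as flink-conf.yaml lines, sorted for stable output."""
    return "\n".join(f"{key}: {value}" for key, value in sorted(conf.items())) + "\n"


print(to_flink_conf(history_server_conf))
```

With these keys in place, the HistoryServer itself is started from the Flink distribution with bin/historyserver.sh start (or start-foreground inside a container).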

@ranchodeluxe ranchodeluxe self-assigned this Nov 6, 2023
@cisaacstern
Member

Good call! This is related to #20, where we discuss adding a generic pangeo-forge-runner get-logs command that could dispatch to runner-specific (Flink, Dataflow, etc.) log-fetching logic. It's also linked somewhere in that thread, but to surface it here, this is a prototype of a Dataflow-specific implementation:

https://github.com/pangeo-forge/pangeo-forge-orchestrator/pull/150/files#diff-6c7aa5c43028a369df9006f054e85f0550bf33bbf1a13b6fdccfc8a61317b67b

(@ranchodeluxe for your context, the pangeo-forge-orchestrator repo that contains that file is our now-defunct attempt at running a Pangeo Forge web service, which I'm now in the process of winding down.)
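One way the dispatch idea from #20 could be structured is a small registry keyed by runner name. Everything here (LOG_FETCHERS, register, get_logs, and the placeholder fetchers) is hypothetical and not part of pangeo-forge-runner; the bodies only stand in for real Flink or Dataflow logic.

```python
from typing import Callable, Dict

# Hypothetical registry mapping a runner name to its log-fetching function.
LOG_FETCHERS: Dict[str, Callable[[str], str]] = {}


def register(runner: str):
    """Decorator that records a log fetcher under the given runner name."""
    def decorator(fn: Callable[[str], str]) -> Callable[[str], str]:
        LOG_FETCHERS[runner] = fn
        return fn
    return decorator


@register("flink")
def flink_logs(job_id: str) -> str:
    # Placeholder: a real version might query the Flink HistoryServer.
    return f"flink logs for {job_id}"


@register("dataflow")
def dataflow_logs(job_id: str) -> str:
    # Placeholder: a real version might follow the Dataflow prototype linked above.
    return f"dataflow logs for {job_id}"


def get_logs(runner: str, job_id: str) -> str:
    """Dispatch to the fetcher registered for this runner."""
    try:
        fetcher = LOG_FETCHERS[runner]
    except KeyError:
        raise ValueError(f"unsupported runner: {runner!r}") from None
    return fetcher(job_id)
```

A registry keeps each runner's logic in its own function, so adding a new runner means registering one more fetcher rather than growing a chain of if/else branches.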

@ranchodeluxe
Collaborator Author

ranchodeluxe commented Nov 14, 2023

So close to having this working 🥳

But still this:

Caused by: java.io.FileNotFoundException: /opt/job/history/04698b6be739a6db52e0b804ec915d93 (Permission denied)
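A "Permission denied" on the archive path usually means the directory is missing or not writable by the user the JobManager process runs as. A minimal diagnostic sketch, assuming it is run inside the JobManager container as that same user; check_archive_dir is a hypothetical helper, not part of Flink.

```python
import os
from typing import List


def check_archive_dir(path: str) -> List[str]:
    """Return problems that would stop the current user from writing archives to path."""
    problems = []
    if not os.path.isdir(path):
        problems.append(f"{path} does not exist or is not a directory")
        return problems
    # Creating files in a directory needs both write and execute (search) permission.
    if not os.access(path, os.W_OK | os.X_OK):
        uid = os.geteuid() if hasattr(os, "geteuid") else "unknown"
        problems.append(f"{path} is not writable by effective uid {uid}")
    return problems
```

Running check_archive_dir("/opt/job/history") in the container returns an empty list when the directory is usable, and otherwise says whether the directory is missing or only lacks permissions for the current user.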

yuvipanda pushed a commit that referenced this issue Nov 28, 2023
Mount EFS to Job Managers so they can archive jobs for historical status lookups

Addresses #122

Related PR: pangeo-forge/pangeo-forge-cloud-federation#6

Co-authored-by: ranchodeluxe <[email protected]>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>