Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implementation to reprocess last 12 months of data #436

Merged
merged 7 commits into from
Feb 15, 2024

Conversation

cristinaleonr
Copy link
Contributor

@cristinaleonr cristinaleonr commented Feb 15, 2024

This PR adds an implementation to only reprocess the last 12 months of data by default. There is an option to reprocess the full archive history for a datatype by adding the FullHistory option in the config.

The changes have been running in sandbox the past couple of days with the FullHistory option enabled for ndt7 (now removed).

Screen.recording.2024-02-15.3.06.09.PM.webm

This change is Reviewable

@coveralls
Copy link
Collaborator

coveralls commented Feb 15, 2024

Pull Request Test Coverage Report for Build 3718

Details

  • -1 of 11 (90.91%) changed or added relevant lines in 1 file are covered.
  • No unchanged relevant lines lost coverage.
  • Overall coverage increased (+0.1%) to 78.417%

Changes Missing Coverage Covered Lines Changed/Added Lines %
job-service/job-service.go 10 11 90.91%
Totals Coverage Status
Change from base Build 3709: 0.1%
Covered Lines: 1268
Relevant Lines: 1617

💛 - Coveralls

Copy link
Contributor

@stephen-soltesz stephen-soltesz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm: with optional comment.

Reviewable status: :shipit: complete! 1 of 1 approvals obtained (waiting on @cristinaleonr)


config/config.go line 34 at r1 (raw file):

	Datasets    Datasets `yaml:"target_datasets"`
	DailyOnly   bool     `yaml:"daily_only"`
	FullHistory bool     `yaml:"full_history"` // FullHistory = true implies DailyOnly = false.

High level thought if you like.

The default behavior has been "full history".
We added daily_only to limit that to daily.
You're adding an option to effectively limit to "yearly only" as the new default.

What if the new option were named "YearlyOnly" so the default behavior is unchanged and the boolean cases for daily and yearly would read similarly. The daily and yearly options would still be mutually exclusive.

Copy link
Contributor Author

@cristinaleonr cristinaleonr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 1 of 1 approvals obtained (waiting on @stephen-soltesz)


config/config.go line 34 at r1 (raw file):

Previously, stephen-soltesz (Stephen Soltesz) wrote…

High level thought if you like.

The default behavior has been "full history".
We added daily_only to limit that to daily.
You're adding an option to effectively limit to "yearly only" as the new default.

What if the new option were named "YearlyOnly" so the default behavior is unchanged and the boolean cases for daily and yearly would read similarly. The daily and yearly options would still be mutually exclusive.

Given the higher cost of reprocessing the entire archive once the data is moved to cold storage, I thought the default behavior should be "yearly only" (not "full history" anymore). Reprocessing an entire datatype's history should be a deliberate choice added to the datatype's configuration.

What do you think?

Copy link
Contributor

@stephen-soltesz stephen-soltesz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 1 of 1 approvals obtained (waiting on @cristinaleonr)


config/config.go line 34 at r1 (raw file):

Previously, cristinaleonr (Cristina Leon) wrote…

Given the higher cost of reprocessing the entire archive once the data is moved to cold storage, I thought the default behavior should be "yearly only" (not "full history" anymore). Reprocessing an entire datatype's history should be a deliberate choice added to the datatype's configuration.

What do you think?

Yes, it may be a safer option in the event of a configuration failure. YearlyOnly would depend on explicit configuration updates for our desired behavior. I think I'm responding to the slight oddness of three time intervals (full, yearly, daily) that are opt in, default, or opt in. My bias is towards making the default the largest or smallest interval. This may be more of an aesthetic consideration than a functional one (since all variations would do the same thing effectively). My comment is not blocking but I wanted to share what I was seeing.

@cristinaleonr cristinaleonr merged commit a6a2865 into main Feb 15, 2024
5 checks passed
@cristinaleonr cristinaleonr deleted the sandbox-cristinaleon-12-months branch February 15, 2024 20:58
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants