[Issue #3899] Task to Add new record into OpportunityVersion table #4061

babebe · 2025-02-28T21:33:00Z

Summary

Fixes #3899

Time to review: 15 mins

Changes proposed

StoreOpportunityVersionTask has been implemented to call save_opportunity_version whenever there’s an update in the OpportunityChangeAudit table for any opportunity since the last task run. It is added as a 4th step to the load_transform job.
The save_opportunity_version utility has been updated to invoke the diff_nested_dicts function. If there’s a difference between the latest saved opportunity version and the current opportunity, a new record is created in the OpportunityVersion table.
Added JobLogFactory
Added tests.

chouinar · 2025-03-03T14:38:16Z

api/src/task/opportunities/store_opportunity_version_task.py

+                select(Opportunity).where(Opportunity.opportunity_id == opp.opportunity_id)
+            ).scalar_one()
+            # Get Json
+            opportunity_v1 = SCHEMA.dump(opportunity)


You made a utility for loading these records into that table, we should be using that as much as possible - I know it's used below, but re-converting something into a schema multiple times is a bit wasteful.

We'd likely need to add logic to that method of "only add if there is a diff", but we should keep that contained in one place.

Moved the diff-detection logic into the save_opportunity_version function. The task now only checks whether there is a change in the OpportunityChangeAudit table.

chouinar · 2025-03-03T14:42:13Z

api/src/task/opportunities/store_opportunity_version_task.py

+            select(JobLog)
+            .where(JobLog.job_type == self.cls_name())
+            .where(
+                or_(JobLog.job_status == JobStatus.COMPLETED, JobLog.job_status == JobStatus.FAILED)


If the job failed, we probably don't want to count it.

chouinar · 2025-03-03T14:42:56Z

api/src/task/opportunities/store_opportunity_version_task.py

+        latest_time = (
+            latest_job.created_at
+            if latest_job
+            else get_now_us_eastern_datetime() - timedelta(hours=24)


DB values are stored in UTC, use utcnow whenever you do a comparison with now against something in the DB.

I think if there hasn't been a prior job, we should actually just grab everything (ie. pick a datetime of like 1970-01-01) since we want to backfill these records anyways.
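The two points above (compare against UTC, and backfill everything when there is no prior run) can be sketched roughly as follows. This is a minimal standalone illustration, not the task's actual code: `JobRecord`, `pick_cutoff`, and the plain-string statuses are hypothetical stand-ins for `JobLog` and `JobStatus`. It also ignores failed runs, as suggested in the earlier comment.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Sequence


# Hypothetical stand-in for a JobLog row; the real model lives in the repo.
@dataclass
class JobRecord:
    job_type: str
    job_status: str  # e.g. "completed" or "failed"
    created_at: datetime  # assumed to be timezone-aware UTC, as stored in the DB


# Cutoff used when no prior successful run exists, so every record is backfilled.
EPOCH_UTC = datetime(1970, 1, 1, tzinfo=timezone.utc)


def pick_cutoff(jobs: Sequence[JobRecord], job_type: str) -> datetime:
    """Return created_at of the latest completed run of job_type, or the epoch."""
    completed = [
        j for j in jobs if j.job_type == job_type and j.job_status == "completed"
    ]
    if not completed:
        return EPOCH_UTC
    return max(j.created_at for j in completed)
```

Keeping every value timezone-aware avoids comparing a local "now" against UTC timestamps from the database.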

chouinar · 2025-03-03T14:46:25Z

api/src/task/opportunities/store_opportunity_version_task.py

+        updated_opportunities = self.db_session.scalars(
+            select(OpportunityChangeAudit).where(OpportunityChangeAudit.updated_at > latest_time)
+        ).all()
+
+        for opp in updated_opportunities:
+            # Get Opportunity object
+            opportunity = self.db_session.execute(
+                select(Opportunity).where(Opportunity.opportunity_id == opp.opportunity_id)
+            ).scalar_one()
+            # Get Json


No need to write a query when the relationship already is that:

Suggested change

-        updated_opportunities = self.db_session.scalars(
-            select(OpportunityChangeAudit).where(OpportunityChangeAudit.updated_at > latest_time)
-        ).all()
-
-        for opp in updated_opportunities:
-            # Get Opportunity object
-            opportunity = self.db_session.execute(
-                select(Opportunity).where(Opportunity.opportunity_id == opp.opportunity_id)
-            ).scalar_one()
-            # Get Json
+        opportunity_change_audits = self.db_session.scalars(
+            select(OpportunityChangeAudit).where(OpportunityChangeAudit.updated_at > latest_time)
+        ).all()
+
+        for opp_change_audit in opportunity_change_audits:
+            opportunity = opp_change_audit.opportunity

That's right! Updated!

chouinar · 2025-03-03T14:49:59Z

api/src/task/opportunities/store_opportunity_version_task.py

+            latest_versioned_opp = self.db_session.execute(
+                select(OpportunityVersion).where(
+                    OpportunityVersion.opportunity_id == opp.opportunity_id
+                )
+            ).scalar_one_or_none()


What happens if there are multiple versions?

Ordered by created_at desc
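One thing worth checking here: ordering by `created_at` still returns every matching row, and SQLAlchemy's `scalar_one_or_none()` raises `MultipleResultsFound` when more than one row comes back. A common way to get only the newest row is to order descending and limit to one. The sketch below shows the idea with the standard-library `sqlite3` module rather than SQLAlchemy; the table layout and the `latest_version` helper are simplified assumptions, not the project's schema or code.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE opportunity_version (opportunity_id INTEGER, created_at TEXT, data TEXT)"
)
conn.executemany(
    "INSERT INTO opportunity_version VALUES (?, ?, ?)",
    [
        (1, "2025-03-01T00:00:00", "v1"),
        (1, "2025-03-05T00:00:00", "v2"),
        (2, "2025-03-02T00:00:00", "other"),
    ],
)


def latest_version(conn, opportunity_id):
    """Return the newest version's data for an opportunity, or None if none exist."""
    row = conn.execute(
        "SELECT data FROM opportunity_version "
        "WHERE opportunity_id = ? "
        "ORDER BY created_at DESC LIMIT 1",
        (opportunity_id,),
    ).fetchone()
    return row[0] if row else None
```

In SQLAlchemy terms this would likely correspond to adding `.limit(1)` to the select before calling `scalar_one_or_none()`.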

chouinar · 2025-03-06T21:16:10Z

api/src/services/opportunities_v1/opportunity_version.py

+    latest_opp_version = db_session.execute(
+        select(OpportunityVersion)
+        .where(OpportunityVersion.opportunity_id == opportunity.opportunity_id)
+        .order_by(OpportunityVersion.created_at.desc())
+    ).scalar_one_or_none()


One additional recommendation:

Suggested change

-    latest_opp_version = db_session.execute(
-        select(OpportunityVersion)
-        .where(OpportunityVersion.opportunity_id == opportunity.opportunity_id)
-        .order_by(OpportunityVersion.created_at.desc())
-    ).scalar_one_or_none()
+    latest_opp_version = db_session.execute(
+        select(OpportunityVersion)
+        .where(OpportunityVersion.opportunity_id == opportunity.opportunity_id)
+        .order_by(OpportunityVersion.created_at.desc())
+        .options(selectinload("*"))
+    ).scalar_one_or_none()

This won't have any obvious effect on how the code works, but it makes SQLAlchemy select all parts of an opportunity in one set of queries rather than lazy-loading every relationship, each of which fires its own select query off to the database. When we have a job that'll iterate over tens of thousands of records, it'll make it much faster (I tested adding it with set-current-opportunities and I want to say it was 70x faster?).

Yes indeed! Updated, thanks.
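To illustrate why eager loading helps, here is a small sketch that counts queries using the standard-library `sqlite3` module (not SQLAlchemy). The lazy version issues one query per parent row, while the select-in style fetches all children with a single `IN (...)` query. The table names and helper functions are made up for the example, and this toy does not attempt to reproduce the 70x figure mentioned above.

```python
import sqlite3
from collections import defaultdict

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE opportunity (id INTEGER PRIMARY KEY);
    CREATE TABLE summary (id INTEGER PRIMARY KEY, opportunity_id INTEGER);
    INSERT INTO opportunity (id) VALUES (1), (2), (3);
    INSERT INTO summary (opportunity_id) VALUES (1), (1), (2), (3);
    """
)

# Every executed statement is recorded here so the two strategies can be compared.
queries = []


def run(sql, params=()):
    """Execute a query, record it, and return all rows."""
    queries.append(sql)
    return conn.execute(sql, params).fetchall()


def load_lazy():
    """One query for the parents, then one more query per parent (N+1)."""
    result = {}
    for (opp_id,) in run("SELECT id FROM opportunity ORDER BY id"):
        rows = run(
            "SELECT id FROM summary WHERE opportunity_id = ? ORDER BY id", (opp_id,)
        )
        result[opp_id] = [r[0] for r in rows]
    return result


def load_selectin():
    """One query for the parents, then a single IN query for all children."""
    opp_ids = [r[0] for r in run("SELECT id FROM opportunity ORDER BY id")]
    placeholders = ",".join("?" for _ in opp_ids)
    children = defaultdict(list)
    for summary_id, opp_id in run(
        "SELECT id, opportunity_id FROM summary "
        f"WHERE opportunity_id IN ({placeholders}) ORDER BY id",
        opp_ids,
    ):
        children[opp_id].append(summary_id)
    return {opp_id: children[opp_id] for opp_id in opp_ids}
```

With three parents, the lazy path runs four queries and the select-in path runs two; the gap grows linearly with the number of parents and with the number of relationships per parent.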

chouinar · 2025-03-06T21:19:46Z

api/src/services/opportunities_v1/opportunity_version.py

+    opportunity_existing = latest_opp_version.opportunity_data if latest_opp_version else {}
+
+    diffs = diff_nested_dicts(opportunity_new, opportunity_existing)
+    if diffs:
+        # Add new OpportunityVersion instance to the database session
+        opportunity_version = OpportunityVersion(
+            opportunity_id=opportunity.opportunity_id,
+            opportunity_data=opportunity_new,
+        )
+
+        db_session.add(opportunity_version)


Can put a small optimization/short circuit in the case where there is no existing since there will always be a diff.

Something like:

Suggested change

-    opportunity_existing = latest_opp_version.opportunity_data if latest_opp_version else {}
-
-    diffs = diff_nested_dicts(opportunity_new, opportunity_existing)
-    if diffs:
-        # Add new OpportunityVersion instance to the database session
-        opportunity_version = OpportunityVersion(
-            opportunity_id=opportunity.opportunity_id,
-            opportunity_data=opportunity_new,
-        )
-
-        db_session.add(opportunity_version)
+    diffs = {}
+    if latest_opp_version:
+        diffs = diff_nested_dicts(opportunity_new, latest_opp_version.opportunity_data)
+
+    if diffs or latest_opp_version is None:
+        # Add new OpportunityVersion instance to the database session
+        opportunity_version = OpportunityVersion(
+            opportunity_id=opportunity.opportunity_id,
+            opportunity_data=opportunity_new,
+        )
+
+        db_session.add(opportunity_version)
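A minimal standalone sketch of the short circuit suggested above. `simple_diff` is a hypothetical, much-simplified stand-in for `diff_nested_dicts` (it only compares top-level keys), and `should_store_version` is a made-up helper name. The point is only that the diff is skipped entirely when there is no prior version.

```python
from typing import Optional


def simple_diff(new: dict, old: dict) -> list:
    """Hypothetical stand-in for diff_nested_dicts: top-level keys whose values differ."""
    keys = sorted(set(new) | set(old))
    return [
        {"field": k, "before": old.get(k), "after": new.get(k)}
        for k in keys
        if new.get(k) != old.get(k)
    ]


def should_store_version(new_data: dict, latest_data: Optional[dict]) -> bool:
    """Store when there is no prior version, or when the new data differs from it."""
    if latest_data is None:
        # No existing version: always store, no need to compute a diff.
        return True
    return bool(simple_diff(new_data, latest_data))
```

Checking `latest_data is None` rather than truthiness keeps an existing version with empty data from being treated as missing.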

chouinar · 2025-03-06T21:21:00Z

api/src/task/opportunities/store_opportunity_version_task.py

+@task_blueprint.cli.command(
+    "store-opportunity-version",
+    help="Store a new opportunity version if an opportunity has been updated",
+)
+@flask_db.with_db_session()
+def store_opportunity_version(db_session: db.Session) -> None:
+    StoreOpportunityVersionTask(db_session).run()


We don't need a way to run this as a task directly, we want to add it as a 4th step of the existing transform jobs: https://github.com/HHS/simpler-grants-gov/blob/main/api/src/data_migration/command/load_transform.py

chouinar · 2025-03-06T21:21:30Z

api/src/task/opportunities/store_opportunity_version_task.py

+    def __init__(self, db_session: db.Session) -> None:
+        super().__init__(db_session)


If we're not changing the init function, can exclude it and just use the one from task

Suggested change

-    def __init__(self, db_session: db.Session) -> None:
-        super().__init__(db_session)

chouinar · 2025-03-06T21:23:42Z

api/src/task/opportunities/store_opportunity_version_task.py

+        updated_opportunities_change_audit = self.db_session.scalars(
+            select(OpportunityChangeAudit).where(OpportunityChangeAudit.updated_at > latest_time)
+        ).all()
+
+        for oca in updated_opportunities_change_audit:
+            # Get Opportunity object
+            opportunity = self.db_session.execute(
+                select(Opportunity).where(Opportunity.opportunity_id == oca.opportunity_id)
+            ).scalar_one()


We don't need to query for the opportunity separately - the change audit object has a relationship to it. Also like I commented elsewhere, performance will be a lot better adding the selectinload bit.

Suggested change

-        updated_opportunities_change_audit = self.db_session.scalars(
-            select(OpportunityChangeAudit).where(OpportunityChangeAudit.updated_at > latest_time)
-        ).all()
-
-        for oca in updated_opportunities_change_audit:
-            # Get Opportunity object
-            opportunity = self.db_session.execute(
-                select(Opportunity).where(Opportunity.opportunity_id == oca.opportunity_id)
-            ).scalar_one()
+        updated_opportunities_change_audit = self.db_session.scalars(
+            select(OpportunityChangeAudit)
+            .where(OpportunityChangeAudit.updated_at > latest_time)
+            .options(selectinload("*"))
+        ).all()
+
+        for oca in updated_opportunities_change_audit:
+            opportunity = oca.opportunity

chouinar · 2025-03-06T21:24:26Z

api/src/task/opportunities/store_opportunity_version_task.py

+    StoreOpportunityVersionTask(db_session).run()
+
+
+SCHEMA = OpportunityV1Schema()


Don't think this is used here anymore

Suggested change

-SCHEMA = OpportunityV1Schema()

chouinar · 2025-03-06T21:25:48Z

api/tests/src/db/models/factories.py

+        model = task_models.JobLog
+
+    job_id = Generators.UuidObj
+    job_type = factory.LazyAttribute(


This is the sort of field that if you want to use a factory for it, you should pass the value in yourself since a random value is going to be weird with tests (since you usually care a lot about the exact value).

Removed. We could optionally add a validation?

chouinar · 2025-03-06T21:26:22Z

api/tests/src/db/models/factories.py

+    job_type = factory.LazyAttribute(
+        lambda _: random.choice(["StoreOpportunityVersionTask", "SetCurrentOpportunitiesTask"])
+    )
+    job_status = factory.Iterator(JobStatus)


I'd probably say to just make this always the success value since most tests would only care about that (or want to override it to started/failed).
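The two factory comments above boil down to: make the caller pass the values a test cares about, and default the rest to the common case. A rough sketch of that shape, using a plain function instead of factory_boy; `make_job_log`, `JobLogRow`, and the `Status` enum are hypothetical stand-ins, not the project's `JobLogFactory` or `JobStatus`.

```python
import uuid
from dataclasses import dataclass
from enum import Enum


class Status(Enum):
    """Hypothetical stand-in for the project's JobStatus enum."""

    STARTED = "started"
    COMPLETED = "completed"
    FAILED = "failed"


@dataclass
class JobLogRow:
    job_id: uuid.UUID
    job_type: str
    job_status: Status


def make_job_log(job_type: str, job_status: Status = Status.COMPLETED) -> JobLogRow:
    """Build a row: job_type must be passed in, status defaults to the success value."""
    return JobLogRow(job_id=uuid.uuid4(), job_type=job_type, job_status=job_status)
```

A test that cares about failures can still pass `job_status=Status.FAILED` explicitly, while every other test gets a deterministic status.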

chouinar · 2025-03-06T21:27:08Z

api/src/task/opportunities/store_opportunity_version_task.py

+        latest_job = self.db_session.scalars(
+            select(JobLog)
+            .where(JobLog.job_type == self.cls_name())
+            .where(or_(JobLog.job_status == JobStatus.COMPLETED))


Suggested change

-            .where(or_(JobLog.job_status == JobStatus.COMPLETED))
+            .where(JobLog.job_status == JobStatus.COMPLETED)

chouinar · 2025-03-07T17:37:57Z

api/src/data_migration/command/load_transform.py

+@click.option(
+    "--store-version/--no-store-version", default=True, help="run StoreOpportunityVersionTask"
+)


Let us default this to not being enabled, I think we should test the behavior out manually first to be safe.
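For illustration, an on/off flag pair that defaults to off can be sketched with the standard library's `argparse` (Python 3.9+) in place of click. The parser below is a toy with a made-up program name, not the real `load_transform` command.

```python
import argparse

parser = argparse.ArgumentParser(prog="load-transform-sketch")
# BooleanOptionalAction generates both --store-version and --no-store-version.
parser.add_argument(
    "--store-version",
    action=argparse.BooleanOptionalAction,
    default=False,  # off unless explicitly requested
    help="run StoreOpportunityVersionTask",
)
```

With the default off, the new step only runs when someone passes `--store-version`, which fits the goal of testing the behavior manually first.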

chouinar · 2025-03-07T17:42:08Z

api/tests/src/task/opportunities/test_store_opportunity_version_task.py

+    @pytest.fixture(autouse=True)
+    def clear_db(self, db_session):
+        opportunities = db_session.query(Opportunity).all()
+        for opp in opportunities:
+            db_session.delete(opp)
+
+        db_session.execute(delete(OpportunityVersion))
+        db_session.execute(delete(OpportunityChangeAudit))


Minor nitpick, technically those two deletes do nothing because they had to have been deleted for the opportunity deletes to work (the cascade="all, delete-orphan" on the relationships tells SQLAlchemy to cleanup orphaned records if the opportunity itself is deleted).

Suggested change

-    @pytest.fixture(autouse=True)
-    def clear_db(self, db_session):
-        opportunities = db_session.query(Opportunity).all()
-        for opp in opportunities:
-            db_session.delete(opp)
-
-        db_session.execute(delete(OpportunityVersion))
-        db_session.execute(delete(OpportunityChangeAudit))
+    @pytest.fixture(autouse=True)
+    def clear_db(self, db_session):
+        opportunities = db_session.query(Opportunity).all()
+        for opp in opportunities:
+            db_session.delete(opp)

That's right! Deleted.

chouinar · 2025-03-07T17:43:55Z

api/src/task/opportunities/store_opportunity_version_task.py

+        for opp_change_audit in opportunity_change_audits:
+            opportunity = opp_change_audit.opportunity
+
+            # Store to OpportunityVersion table
+            save_opportunity_version(self.db_session, opportunity)


Something I hadn't thought of until now, but has been briefly mentioned before - if an opportunity is a draft, we don't want to store anything for it yet in the version table.

I think it would make more sense to put that check in the save_opportunity_version function so when we use it elsewhere, we don't need to implement that twice, but I'm not 100% on that behavior.

Should be just a simple if statement, but apologies for not mentioning it sooner.

NP, added check.

chouinar · 2025-03-07T17:45:03Z

api/src/util/dict_util.py

@@ -74,5 +74,7 @@ def diff_nested_dicts(dict1: dict, dict2: dict) -> list:

 def _convert_iterables_to_set(data: Any) -> Any:
    if isinstance(data, (list, tuple)):
+        if data and isinstance(data[0], (dict, list)):  # test with applicant typen


Should this comment still be here?

Nope. Removed

chouinar

Two minor nitpicks, LGTM otherwise

chouinar · 2025-03-10T16:23:03Z

api/src/data_migration/command/load_transform.py

+@click.option(
+    "--store-version/--no-store-version", default=False, help="run StoreOpportunityVersionTask"
+)


Minor nit - put this option between the set-current and insert-chunk-size options. Since these options are big "turn on/off parts of the job" and the other options are configurational.

chouinar · 2025-03-10T16:24:15Z

api/src/services/opportunities_v1/opportunity_version.py


-    schema_data = SCHEMA.dump(opportunity)
+    if not opportunity.is_draft:


Nit - flip the logic here so the whole function doesn't need an indentation level

Suggested change

-    if not opportunity.is_draft:
+    if opportunity.is_draft:
+        return
+
+    ...

babebe added 3 commits February 28, 2025 16:30

task to store new opp version

a745647

rename

8308475

test wip

e4373bd

babebe added the draft Not yet ready for review label Feb 28, 2025

babebe requested review from chouinar, freddieyebra and mdragon as code owners February 28, 2025 21:33

babebe added 4 commits February 28, 2025 17:00

update test

05e92e3

add extra loggic for dicts and lists

0c96aa5

update test

e9f05d5

cleanup

cff9f69

chouinar reviewed Mar 3, 2025

babebe added 9 commits March 6, 2025 09:17

update tests

27f8cea

update naming

6dace5d

add joblog factory

023ad42

update dict util

5362c5f

update test

e0b88a5

cleanup

1b907d5

mv diff loggic to save_opportunity

4053ccd

mv diff loggic to save_opportunity

17feed2

Merge branch 'main' of https://github.com/HHS/simpler-grants-gov into…

6777a73

… 3899/task-add-opp-version-record

babebe requested a review from chouinar March 6, 2025 18:35

babebe removed the draft Not yet ready for review label Mar 6, 2025

babebe linked an issue Mar 6, 2025 that may be closed by this pull request

Create a task that adds a version to the opportunity version table after the transformations run #3899

Open

2 tasks

chouinar reviewed Mar 6, 2025

babebe added 3 commits March 6, 2025 16:51

mv diff loggic to save_opportunity

1628ca4

cleanup

6076e1d

Merge branch 'main' of https://github.com/HHS/simpler-grants-gov into…

1d43926

… 3899/task-add-opp-version-record

babebe requested a review from chouinar March 7, 2025 15:43

chouinar reviewed Mar 7, 2025

babebe added 4 commits March 7, 2025 13:11

don't save if opp draft and cleanup

b04eb88

Merge branch 'main' of https://github.com/HHS/simpler-grants-gov into…

f9266ed

… 3899/task-add-opp-version-record

add test

5fe7c77

update test

9bf6add

babebe requested a review from chouinar March 7, 2025 19:37

Merge branch 'main' of https://github.com/HHS/simpler-grants-gov into…

3c47e40

… 3899/task-add-opp-version-record

chouinar reviewed Mar 10, 2025
