Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Lack of definitive Grid serialisation #226

Open
dafyddstephenson opened this issue Jan 24, 2025 · 12 comments
Open

Lack of definitive Grid serialisation #226

dafyddstephenson opened this issue Jan 24, 2025 · 12 comments

Comments

@dafyddstephenson
Copy link
Collaborator

In searching for a solution to CWorthy-ocean/C-Star#201 and CWorthy-ocean/C-Star#223 I've noticed that, in the situations described there, roms-tools does not offer a definitive serialization of the final grid object (the existence of which would solve both issues).

For example,

from datetime import datetime
from roms_tools import Grid, RiverForcing

start_time = datetime(1998, 1, 1)
end_time = datetime(2000, 12, 31)

grid = Grid(nx=100, ny=100, size_x=800, size_y=800, center_lon=-18, center_lat=65, rot=20)
grid.to_yaml("grid_initial.yaml")

river_forcing = RiverForcing(grid=grid,start_time=start_time,end_time=end_time)
river_forcing.to_yaml("river_forcing.yaml")

river_forcing.grid.to_yaml("grid_final.yaml")

We now have three yaml files, but in all three cases, Grid.from_yaml produces the same grid, lacking the river flux. I guess(?) the behaviour I would expect is that:

  • Grid.from_yaml("river_forcing.yaml") would use the fields of both entries to recreate the correct Grid in the same way RiverForcing.from_yaml does.
  • river_forcing.grid.to_yaml("grid_final.yaml") would produce a yaml file with sufficient information to independently recreate the river-informed grid

The issue goes a bit deeper:

from roms_tools import Grid, RiverForcing
G1 = Grid.from_yaml("grid_initial.yaml")
G2 = Grid.from_yaml("grid_final.yaml")
G3 = Grid.from_yaml("river_forcing.yaml")
G4 = RiverForcing.from_yaml("river_forcing.yaml").grid

# comparing with `roms-tools`:
print(G1 == G2) #returns True
print(G2 == G3) # returns True
print(G3 == G4) # returns True

# comparing with `xarray`:
print(G1.ds.equals(G2.ds))  # returns True
print(G2.ds.equals(G3.ds)) # returns True
print(G3.ds.equals(G4.ds)) # returns False
@NoraLoose
Copy link
Collaborator

The problem is that the Grid class does not know about RiverForcing. In fact, it cannot know about it, otherwise we have a circular dependency problem because RiverForcing depends on the Grid.

To me, this seems like an issue that is best solved on the UCLA-ROMS side. But I know it would be a hard sell.

@dafyddstephenson
Copy link
Collaborator Author

A quick and pretty unpleasant solution to

Grid.from_yaml("river_forcing.yaml") would use the fields of both entries to recreate the correct Grid in the same way RiverForcing.from_yaml does.

could be to take the right hand side of

G4 = RiverForcing.from_yaml("river_forcing.yaml").grid

and return it when Grid.from_yaml is called on a yaml file with river forcing in it?

While I totally agree that the ROMS side of this is... unintuitive and we should absolutely push for changes there, I still feel that the inconsistencies I ran into in the above snippets suggest that the circular dependency issue you're describing has been shifted rather than resolved, and this hints at a potential deeper issue in the current implementation

@NoraLoose
Copy link
Collaborator

A quick and pretty unpleasant solution to

Grid.from_yaml("river_forcing.yaml") would use the fields of both entries to recreate the correct Grid in the same way RiverForcing.from_yaml does.

could be to take the right hand side of

G4 = RiverForcing.from_yaml("river_forcing.yaml").grid

and return it when Grid.from_yaml is called on a yaml file with river forcing in it?

I'm not sure I understand...

If Grid.from_yaml has to call

RiverForcing.from_yaml("river_forcing.yaml").grid

then RiverForcing needs to be imported within the Grid class, so we have a circular dependency issue, correct?

@dafyddstephenson
Copy link
Collaborator Author

dafyddstephenson commented Jan 27, 2025

Yes, right - though I wouldn't suggest implementing things this "quick" way anyway as even if it worked it still wouldn't be addressing the underlying issue (as evidenced by the fact that it doesn't work).

I took a closer look this morning and remembered (perhaps incorrectly?) that the only information embedded into Grid.ds by RiverForcing is the locations, which (despite the weirdness of the actual implementation in ROMS), does strike me as a reasonable thing to have in the Grid.

Could we not initialise the Grid with any appropriate river locations? Is there any situation by which these locations change after the fact based on, e.g. the start_date and end_date or other parameters supplied to RiverForcing? I suppose source, but this again could be passed to Grid first and then would be implicit in the grid argument to RiverForcing

@NoraLoose
Copy link
Collaborator

I think it would be really counter-intuitive to having to provide a river dataset if one simply wants to create a grid.

I wonder how much work it would be to make a fix to ROMS, so that ROMS will look for the river_flux variable not in the grid file, but in an independent NetCDF file (possibly the river forcing file, or a third one).

@dafyddstephenson
Copy link
Collaborator Author

Afraid I disagree here, providing the locations of rivers to a grid doesn't feel any more counterintuitive to me than providing the locations of coastlines. The user isn't required to provide any temporally varying data like the actual river flux, just the locations, and most users (who don't have something against Dai & Trenberth) won't have to provide anything at all (perhaps just include_river_locations=True).

I'd even go as far as to say we should initialise the Grid with the river locations whether or not the user asks for it - I assume (though I'm happy to check) ROMS won't care that there are unused variables in the grid file. Advanced users could replace the source dataset, but most would go directly to RiverForcing(grid=my_grid, start_date=my_start_date, end_date=my_end_date) without even thinking about how the Grid was constructed.

We can mention it to CESR on Wednesday, but it will be a huge lift from them and I don't really see a problem with the above suggestion.

@NoraLoose
Copy link
Collaborator

Afraid I disagree here, providing the locations of rivers to a grid doesn't feel any more counterintuitive to me than providing the locations of coastlines.

Where does the user have to provide the locations of coastlines?

I'd even go as far as to say we should initialise the Grid with the river locations whether or not the user asks for it - I assume (though I'm happy to check) ROMS won't care that there are unused variables in the grid file.

What if we do that, and then the user creates their RiverForcing with their own river dataset (like Ulla does, for example)? Bang, we have inconsistencies between the river locations in the grid file (based on Dai and Trenberth) and in the river forcing file (based on user-specified rivers). Very error-prone... I think it would be the least error-prone to have all this information in a single NetCDF file and create the information at the same time.

We can mention it to CESR on Wednesday, but it will be a huge lift from them and I don't really see a problem with the above suggestion.

I don't see how it would be a big lift. The code could first look for river_flux in the river forcing file, and if it doesn't find it there in the grid file. (Or vice versa.) This solution would work for both the MATLAB and python ROMS-Tools. I'm happy to put in a PR.

@dafyddstephenson
Copy link
Collaborator Author

dafyddstephenson commented Jan 27, 2025

Where does the user have to provide the locations of coastlines?

via topography_source. Perhaps "coastlines" was bad phrasing, but there is an analogous existing situation in Grid.

What if we do that, and then the user creates their RiverForcing with their own river dataset (like Ulla does, for example)?

This is again an example of an advanced user, who would provide their river locations to the Grid at initialisation via river_source, alongside their topography_source. I don't see how your example of unintentionally using Dai & Trenberth and running ROMS with the wrong rivers is different from saying they could unintentionally use ETOPO5 and run ROMS with the wrong topography.

I'm happy to put in a PR.

If you'd be happy to work on that PR to ROMS, that would be amazing - the huge lift isn't so much the amount of work, but convincing somebody to do it.

In either case, I believe we should enshrine something that safeguards against what has happened here: the Grid object serialised by to_yaml in all 4 cases is not a representation of the Grid object that is actually required by the user.

@NoraLoose
Copy link
Collaborator

I don't see how your example of unintentionally using Dai & Trenberth and running ROMS with the wrong rivers is different from saying they could unintentionally use ETOPO5 and run ROMS with the wrong topography.

The difference is that ETOPO5 is still a valid topography. (If you think we should force the user to specify what topography they want, let's talk about that in a separate issue.) In contrast, if the Amazon suddenly flows out of Africa's west coast, that would result in a very wrong circulation.

I am currently pretty slammed with working on multiple [C]Worthy projects, but I will work on the PR to ROMS as soon as I have time.

In either case, I believe we should enshrine something that safeguards against what has happened here: the Grid object serialised by to_yaml in all 4 cases is not a representation of the Grid object that is actually required by the user.

Making the changes on the ROMS side is the only sensible way to fix this problem IMO.

@dafyddstephenson
Copy link
Collaborator Author

Opened CESR-lab/ucla-roms#52

@NoraLoose
Copy link
Collaborator

NoraLoose commented Feb 6, 2025

@dafyddstephenson This issue can probably be closed because:

Sounds good?

@dafyddstephenson
Copy link
Collaborator Author

Sounds good - I think I'd still rather wait to see if any niggles emerge during the upstream ROMS changes and subsequent testing of roms-tools, but that should turn around pretty quickly. I appreciate that's related to rivers rather than grid, but I'm hoping to avoid a situation where we didn't account for something and end up back at square 1.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants