Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

align column headers with SS3/r4ss and/or within the package? #164

Open
iantaylor-NOAA opened this issue Nov 22, 2024 · 7 comments
Open
Assignees
Labels
enhancement priority_low Priority scale is low, i.e., does not need to immediately be worked on question

Comments

@iantaylor-NOAA
Copy link
Contributor

@chantelwetzel-noaa, I was working on a scripted workflow yesterday to modify SS3 input files with output from {nwfscSurvey} and found it to be really easy, which is awesome.

However, there were a few differences in the column headers created by get_expanded_comps() that required modification to work (see my in-process code at https://github.com/dgbolser/Resample-survey-data/blob/add-to-assessments/resample_survey_bio_data.R#L54-L56). In particular, the nwfscSurvey uses "partition" instead of "part", "input_n" instead of "Nsamp", and capital letters for the observed data (e.g. "F12" instead of "f12"). It only took two lines of code to resolve that difference, but I'm curious if you think it's worth considering making a change to avoid the difference. Given the slower pace of SS3 releases and the disruption already caused by the previous round of r4ss changes (see PIFSCstockassessments/ss3diags#109), I would rather not make the changes on that side even if the names used here are better. We can also just live with the difference.

The nwfscSurvey::SurveyAgeAtLen.fn() is further away from SS3/r4ss as shown in the table below.

Tagging @kellijohnson-NOAA in case there's any value in thinking about this within her ongoing changes to {PacFIN.Utilities}.

It's also OK to not deal with this right now (or ever) and continue to work around the differences.

SS3/r4ss SurveyAgeAtLen.fn()
year year
month month
fleet Fleet
sex sex
part partition
ageerr ageErr
Lbin_lo LbinLo
Lbin_hi LbinHi
Nsamp nSamps
f1 F1
f2 F2
... ...
m1 F1.1
m2 F2.1
... ...
@iantaylor-NOAA iantaylor-NOAA added enhancement question priority_low Priority scale is low, i.e., does not need to immediately be worked on labels Nov 22, 2024
@chantelwetzel-noaa
Copy link
Contributor

I think switching to capital letters for the sex labeling of the composition bin is fine if we all agree the ideal naming structure should use capital letters. I will note that @kellijohnson-NOAA and I spoke about whether the input sample size column should be named Nsamp or input_n and we both agreed that the latter was the preferred term since it is more clear about what that column represents (input sample size). Similarly we can discuss whether the partition column should be part or partition. Currently I prefer the latter since it is very clear, but I could easily be convinced to move to the short column name.

Once we do decide on the best labeling for column names, we should work to ensure that each package the provides processed composition data are in agreement ({PacFIN.Utilities}, {nwfscSurvey}, {nwfscDiscard}).

@iantaylor-NOAA
Copy link
Contributor Author

Given that the choice of input_n was deliberate, I would support keeping it in place along with partition (which is clearer than part) and just requiring something like dplyr::rename(part = "partition", Nsamp = "input_n").

I like the plan to ensure data in agreement within and among the three pfmc-assessments packages and am happy to help with that effort.

@iantaylor-NOAA
Copy link
Contributor Author

I should have added that SS3 doesn't care about column names in input files so it would also be possible to go the other direction and modify the objects created by r4ss::SS_read() via dplyr::rename(partition = "part", input_n = "Nsamp") to achieve alignment and allow the dataframes to be combined.

@kellijohnson-NOAA
Copy link
Contributor

Just let me know what you want them to be and I will accommodate. I hope to have this done by EOD but I am going to take a break for a bit.

@iantaylor-NOAA
Copy link
Contributor Author

@kellijohnson-NOAA, thanks for being so accommodating and getting the changes done so quickly.

@chantelwetzel-noaa, what do you think of the "compromise" column below that I've just added to the previous below as a potential standard for the pfmc-assessments packages?

The one capital "L" in Lbin_lo stands out to me but I think lbin_lo could be mistaken for Ibin_lo (maybe Lbin_Lo would have been best in hindsight), but it's not worth adding additional differences from SS3.

Regarding the data bins, even if it's female only data, I think having "m" columns for the second group makes more sense than repeating the "f" throughout (or "m" in both blocks for male-only data). For single-sex models with only a single block of data bins, r4ss::SS_readdat_3.30() uses l or a (for length or age bins) instead of f and then m. But since we very rarely use the pfmc-assessment packages for single-sex models I'm not sure it's worth worrying about or aligning the notation in that case.

SS3/r4ss SurveyAgeAtLen.fn() compromise
year year year
month month month
fleet Fleet fleet
sex sex sex
part partition partition
ageerr ageErr ageerr
Lbin_lo LbinLo Lbin_lo
Lbin_hi LbinHi Lbin_hi
Nsamp nSamps input_n
f1 F1 f1
f2 F2 f2
... ... ...
m1 F1.1 m1
m2 F2.1 m2
... ... ...

@chantelwetzel-noaa
Copy link
Contributor

Could compositions for single-sexed models use "u" instead of "l" or "a"? I think this would be more consistent but I don't know if that would create downstream impacts. I will note that {plot_comps} looks for either a f, m, or u before the composition bin number but this function could be easily revised. Otherwise I am okay with your proposed column naming. I agree that capitalizing just Lbin_hi and Lbin_lo does look strange but I understand your concern about it being easily misread if lower case l was used.

@iantaylor-NOAA
Copy link
Contributor Author

@chantelwetzel-noaa, good idea. Using "u" for single-sex models makes sense to me. Here's the resulting plan, please edit if you see something wrong.

lengths ages single-sex (ages)
year year year
month month month
fleet fleet fleet
sex sex sex
partition partition partition
  ageerr ageerr
  Lbin_lo Lbin_lo
  Lbin_hi Lbin_hi
input_n input_n input_n
f1 f1 u1
f2 f2 u2
... ... ...
m1 m1  
m2 m2  
... ...

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement priority_low Priority scale is low, i.e., does not need to immediately be worked on question
Projects
None yet
Development

No branches or pull requests

3 participants