-
Notifications
You must be signed in to change notification settings - Fork 12
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Feedback wanted on rOpenSci standards for spatial statistical software #48
Comments
@mpadge Regarding spatial cross validation, you could add the following paper: https://www.sciencedirect.com/science/article/abs/pii/S0304380019303230 |
@mpadge great, a lot of work went into that! Reviewing it is also a lot of work; here's a first shot at it:
Again: great work, and happy to discuss! |
First and foremost, thanks for declaring that,
That is indeed a really good point, and the whole nomenclature of coordinate systems has been revised to be as consistent with these WKT2 definitions as possible. That has the primary effect of generally replacing the former "rectilinear" with "Cartesian". These standards nevertheless reflect a general purview of spatial software beyond "geodetic" systems (WKT2 terminology), such that curvilinear systems may include, for example, celestial coordinate systems, and even potentially extending to systems of unknown dimensionality. In contrast, WKT2 defines no "umbrella" term for such systems, and merely declares that elliptical coordinate systems are geodetic, while spherical systems are commonly associated with geodetic systems. In lieu of external reference, these standards employ the term "curvilinear", but please feel free to suggest alternatives.
The statement is that,
It can still be 3-D though (x,y,z), and is notably always so in celestial coordinate systems which are still spherical, so I presume that statement is okay?
Terminology updated as noted above, so identity of rectilinear with Cartesian is now explicit.
Point amended accordingly; thanks!
Happy to accept that, and so have replaced 2.0b (see below) with a new version stating this expectation.
Good point - reference to time explicitly included now, thanks!
Also a good point, and so SP2.0 has been revised to now state that all spatial software should use class systems, and SP2.0b has been effectively entirely removed, replaced with the new version given above.
Thanks - those examples have been removed now regardless, because the revision of SP2.0 described above states that software should error rather than warn.
Good point which made me realise that simply removing that point makes that paragraph clearer and more focussed. Thanks! (And as you well know, my
Thanks, amended accordingly.
Yeah, unclear terminology introduced by me in response to @Nowosad's comment that spatial packages should demonstrate loading data in "spatial data formats rather than R objects". Revised to now say packages should:
Feel free to suggest improvements!
Amended accordingly, thanks.
Sorry for lack of clarity - that was indeed what i was trying to convey, with updated version now stating:
Any further suggestions for improvement?
Yes, that is true. That was an extension of the equivalent time-series standard which is itself really only enabled because of the mighty
Thanks, that has been removed, also because revised general standards now include sufficient expectation of that kind of ability to accept class-based columns in general tabular inputs anyway, so should be covered there.
Thanks for the reference - that standard definitely "takes a model-based perspective," which that paper clarifies very well. I nevertheless think that some form of that standard should be retained, importantly noting that it only states that software "should enable sampling procedures to be based on ... densities. The paragraph immediately following that standard declares that,
An important link is again with time-series standards, for which covariance estimation in particular has seen great recent advances through routines for what is effectively density-based sampling of covariance ellipsoids, leading to the equivalent covariance standard being elevated into the General class. The sole motivation both of that one (G3.1) and SP3.2 is to (strongly) encourage developers to at least think about these issues. Alternative sampling procedures do not have to be implemented by default, and in the case of covariance, compliance with G3.1 merely requires that the function generating covariance estimates be minimally exposed as a parameter, so calculation methods can be substituted as desired by users. The standards attempt to avoid any recommendation of any particular methodological approach, and only take forms like these where we have perceived that the "default" behaviour (
Fair enough, that was quite proscriptive as formulated, and has been revised to,
This still presents one of the more proscriptive of all standards, but was, as with all other standards, primarily informed by empirical examinations of good and bad practices in actual software. It merely aims to discourage bad practices as much as possible, and to encourage developers to at least think of alternatives. Do you think it should just be removed?
Additional explanatory note added immediately afterwards with link to the definition of "broadcasting".
Having read the reference you included above, I see what you mean. From a design-based perspective, sampling test and training data from a common region may indeed be perfectly okay. I nevertheless think there is a role for a standard which encourages developers to think explicitly about the (spatial) relationships between test and training data sets, and to avoid potential pitfalls arising through exclusively generating the two as random samples from the same region. How about something like the following?
I have updated to that form for now in the hope that that allays some of your concerns, but would be open to any further feedback.
Updated accordingly; thanks.
Good examples, and so standard weakened to now state that software should,
That's a good point, but also if it is to be included an entirely general one that would belong in General Standards, yet even there, it would be hard to formulate it in a way that wasn't partly intended as a discouragement. Being so, I'll leave that for now, but feed free to PR anytime if you do see a need for such a statement. Thank you very much @edzer for this very insightful and helpful feedback! To help even further, could you now provide at least some brief responses to some of the direct questions i raised above. Thanks! And anybody else who happens to see this: Please also feel free to have a look over the standards and offer feedback. |
I would replace "defined on the basis of" by "represented merely by" (and drop "alone"), and add: "but use at least WKT2". In the 3.6 area I think you try to write standards about how to do research, not how to write software. I don't think that will be helpful when writing or reviewing software. Software can't tell what kind of sampling design underlies a dataset, if any. |
Thanks again, SP2.4 revised as you suggested. I appreciate your comments on SP3.6, and agree that this is more like standardising how to do research than how to write software. I suspect in response that we should remove that standard, but will first seek some broader feedback on that general issue of research versus software design, which raises an important general point. Thanks! |
rOpenSci is expanding its scope of peer-reviewed software to include specifically statistical software. One of the categories we will consider in-scope is spatial software, for which we have developed a suite of standards which we would like general feedback on. The standards can be viewed in the main project document. Please note the following:
This issue is accordingly intended to seek answers to the following primary question:
Please comment here; in rOpenSci's discussion forum, directly via issues in the project repository (like this one from @jsta), or - the best way of all!! - by helping directly with the standards via pull request to that repo (like this one from @Nowosad). Please also help by suggesting any other improvements you might see to any aspects of these standards, beyond the specific question of algorithms. Hoping for some constructive input from the
r-spatial
community!The text was updated successfully, but these errors were encountered: