-
Notifications
You must be signed in to change notification settings - Fork 8
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Boyce index code #100
base: main
Are you sure you want to change the base?
Boyce index code #100
Conversation
…ated gitignore, and updated __init__.py
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for submitting this PR, @PC-FSU. Including this metric is a nice addition to the package. I have a few comments and requests.
- Thank you very much for the detailed docstring and jupyter examples.
- Thank you very very much for including tests.
- Please apply proper code formatting. To do this, review the contributing guidelines and set up a dev environment with
pre-commit
installed. Then runpre-commit run --all
to apply formatting. - To simplify the code, and to better align with other
sklearn
metrics, I'd recommend breaking theboyce_index
function into smaller components. One to compute the intervals, one for the p/e ratio, one for plotting, and one for returning the correlation coefficient. As a user, I'd expect a single value returned from the boyce_index function. - I haven't had much time to evaluate the scientific merit of the code yet, so I'll likely provide another review after addressing the key software comments I've made. But what strikes me at the moment is that:
- The relationships between
nclass
,window
andres
are not super clear to me, and the results appear very sensitive to these parameters. - It is very easy to return
nan
values, despite the manynan
checks that are applied. This makes me think that something is not sufficiently robust.
- The relationships between
Thanks for your submission, and I'll look forward to your updates.
elapid/evaluate.py
Outdated
# implement Boyce index as describe in https://www.whoi.edu/cms/files/hirzel_etal_2006_53457.pdf (Eq.4) | ||
|
||
|
||
def boycei(interval, obs, fit): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i might prefer renaming these functions boyce_index
and continuous_boyce_index
to differentiate (boycei/boyce_index are easily confused)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done as per suggestion.
elapid/evaluate.py
Outdated
Args: | ||
interval (tuple or list): Two elements representing the lower and upper bounds of the interval. | ||
obs (numpy.ndarray): Observed suitability values (i.e., predictions at presence points). | ||
fit (numpy.ndarray): Suitability values (e.g., from a raster), i.e., predictions at presence + background points. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
to better align with sklearn
api design, prefer renaming variables and order them as boyce_index(yobs, ypred, interval)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done as per suggestion.
elapid/evaluate.py
Outdated
return fi | ||
|
||
|
||
def boyce_index(fit, obs, nclass=0, window="default", res=100, PEplot=False): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
please rename and reorder fit, obs as yobs, ypred
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done as per suggestion.
elapid/evaluate.py
Outdated
|
||
|
||
# Remove NaNs from fit | ||
fit = fit[~np.isnan(fit)] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
remove duplicate code
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done as per suggestion.
elapid/evaluate.py
Outdated
print(vec_mov) | ||
print(intervals) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
remove debug print statements
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
removed.
elapid/evaluate.py
Outdated
vec_mov = np.linspace(mini, maxi, num=nclass + 1) | ||
intervals = np.column_stack((vec_mov[:-1], vec_mov[1:])) | ||
else: | ||
raise ValueError("Invalid nclass value.") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
a comment or two in this section would be useful to elucidate the different methods for computing the intervals
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I refactored the code. Intervals are now calculated in a new function, and refactored the code, turned out res was not required at all. Updated the name of the argument to more logical names.
elapid/evaluate.py
Outdated
corr, _ = spearmanr(f_valid, intervals_mid) | ||
|
||
|
||
if PEplot: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would prefer to keep plotting as a separate function, which makes for cleaner code.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
elapid/evaluate.py
Outdated
|
||
|
||
# Remove NaNs | ||
valid = ~np.isnan(f) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
there seems to be a lot of nan
checking. if the nans are removed from the initial arrays, what would lead to 'invalid' values? are we sure this isn't covering up some other issue in the calculations?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Remove extra nan checking.
elapid/evaluate.py
Outdated
|
||
results = { | ||
'F.ratio': f, | ||
'Spearman.cor': round(corr, 3) if not np.isnan(corr) else np.nan, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
another suspicious nan check here. also, it seems unnecessary to round the correlation coefficient.
elapid/evaluate.py
Outdated
predicted = np.array([0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0]) | ||
|
||
# Observed presence suitability scores (e.g., predictions at presence points) | ||
observed = np.array([0.3, 0.7, 0.8, 0.9]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
based on this example, it's not clear to me why users would want to pass in arrays of different lengths (where predicted and observed are not matching: one is presence+background, the other is just presence).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So I misinterpreted the equation. For P/E ratio, you need the predicted frequency
for each habitat suitability region and need only presence-only data. Expected frequency
is calculated using the random distribution only. i.e at background points and doesn't required prediction at presence data. I changed the code and docs to reflect the same.
…e made at presence and background points, not presence and combined presence + background (Thanks for pointing that out, I misinterpreted the paper). Removed redundant NaN checks. Updated test cases, and example notebook.
Any updates? |
PR Description: Add Continuous Boyce Index Calculation and Test Cases
Summary:
In this pull request, I have added functionality to calculate the continuous Boyce index as described in Hirzel et al. (2006) . This method provides a reliable way to evaluate habitat suitability models, specifically for presence-only data. Along with the implementation, I have also added test cases to ensure the correctness and robustness of the new function.
Key Updates:
Boyce Index Calculation:
Test Cases:
Notebook Update:
WorkingWithGeospatialData.ipynb
to include a detailed example demonstrating how to use the continuous Boyce index function.Testing:
The test cases ensure that the continuous Boyce index function works as expected. The test cases cover:
This PR enhances the project by providing a robust and well-tested method to evaluate habitat suitability models using presence-only data, with clear examples in the updated notebook.