Bias st.datetimes()
towards bug-revealing values such as DST transitions and other oddities
#69
Comments
As an aside, here's a great write-up about errors when dealing with time. |
Forgot to say at the time: Thanks for bringing this up. As per discussion elsewhere, it'll be a little while before I get to this, but I'm definitely going to look into things along these lines. |
You're welcome. I'll try to contribute some code when I get the chance, too. I was just looking at the leap-second issue this morning, and it looks fairly easy to implement. A github version of the time zone database and code can be found here: https://github.com/eggert/tz |
@pganssle (dateutil maintainer) and I met at Pylondinium and designed a general way to generate nasty datetimes, which works with any source of timezones ( Given two naive datetimes as bounds - defaulting to the representable limit if not passed explicitly - and a strategy for The current algorithm:
Proposed, "nasty datetimes", algorithm:
Note that this should work with any source of timezones, any bounds the user might choose, and have small overhead in any case where there are no nasty datetimes to find (and 'reasonable' overhead when there are). If you want to work on this issue, you're entirely welcome - but please contact me first so we can avoid overlapping work or comms and you can ask questions as you go. |
From some minor thinking about it, these are the classes of "nasty" time zones that I think we should try to find for each time zone:
When writing tests for this kind of thing, I tend to hand-select zones that match these edge cases preferably in both the Northern and Southern hemisphere, though I've never found any where this has bitten me. |
Based on a proof-of-concept Paul wrote at PyCon: 27df358. Now we "just" need to implement the search functionality I described last year! More broadly, I'm thinking that there are actually two distinct things we might want to generate here: Datetimes that are "weird" in isolation; i.e. exhibit an unusual property listed on the Transitions between datetimes. I'm not sure how to take advantage of this when generating single datetimes, but it would be nice if there was a way to do so... I think this is properly a follow-up issue once we get the first stage working. |
More notes: anyone thinking about leap seconds should go look at @aarchiba's work in nanograv/PINT#549, which includes some lovely new strategies as well as thoughtful consideration of "leap smear". Unfortunately we can't represent leap seconds with Python's |
The documented contract of this function is that it is the inverse of `datetime.isoformat()`. See GH HypothesisWorks/hypothesis#1401, HypothesisWorks/hypothesis#69 and to a lesser extent HypothesisWorks/hypothesis#2273 for background on why I have set max_examples so high.
what about using zoneinfo? |
It's got better handling of the |
After spending some more time looking into leap seconds, I now think that handling them at all is so rare in Python that biasing towards them is unlikely to be a net improvement in bug-finding power outside of literally astronomical timekeeping code. If anyone working on that is interested in picking it up, here's some code to get the UTC datetimes of each leap second; you could then sample one and add a random diff from (0, +/- <+1s, +/- 12h). I'd sketched plans to integrate that into
diff --git a/hypothesis-python/setup.py b/hypothesis-python/setup.py
index 14c24b9ef..8c96d9dd8 100644
--- a/hypothesis-python/setup.py
+++ b/hypothesis-python/setup.py
@@ -82,7 +82,13 @@ setuptools.setup(
author_email="[email protected]",
packages=setuptools.find_packages(SOURCE),
package_dir={"": SOURCE},
- package_data={"hypothesis": ["py.typed", "vendor/tlds-alpha-by-domain.txt"]},
+ package_data={"hypothesis": ["py.typed", "vendor/leap-seconds.list", "vendor/tlds-alpha-by-domain.txt"]},
url="https://hypothesis.works",
project_urls={
"Source": "https://github.com/HypothesisWorks/hypothesis/tree/master/hypothesis-python",
diff --git a/hypothesis-python/src/hypothesis/strategies/_internal/datetime.py b/hypothesis-python/src/hypothesis/strategies/_internal/datetime.py
index 581b3ac3c..5710fabdc 100644
--- a/hypothesis-python/src/hypothesis/strategies/_internal/datetime.py
+++ b/hypothesis-python/src/hypothesis/strategies/_internal/datetime.py
@@ -34,6 +34,18 @@ def is_pytz_timezone(tz):
return module == "pytz" or module.startswith("pytz.")
+@lru_cache(maxsize=1)
+def get_leap_seconds() -> tuple[dt.datetime, ...]:
+ """Return a list of UTC datetimes corresponding to each leap second."""
+ traversable = resources.files("hypothesis.vendor") / "leap-seconds.list"
+ epoch = dt.datetime(1900, 1, 1, tzinfo=dt.timezone.utc)
+ return tuple(
+ epoch + dt.timedelta(seconds=int(line.split()[0]))
+ for line in traversable.read_text(encoding="utf-8").splitlines()
+ if not line.startswith("#")
+ )
+
+
def replace_tzinfo(value, timezone):
if is_pytz_timezone(timezone):
# Pytz timezones are a little complicated, and using the .replace method
diff --git a/tooling/src/hypothesistooling/__main__.py b/tooling/src/hypothesistooling/__main__.py
index 6eb938510..d07914c85 100644
--- a/tooling/src/hypothesistooling/__main__.py
+++ b/tooling/src/hypothesistooling/__main__.py
@@ -363,6 +363,10 @@ def update_vendored_files():
if fname.read_bytes().splitlines()[1:] != new.splitlines()[1:]:
fname.write_bytes(new)
+ url = "https://hpiers.obspm.fr/iers/bul/bulc/ntp/leap-seconds.list"
+ (vendor / url.split("/")[-1]).write_bytes(requests.get(url).content)
+
# Always require the most recent version of tzdata - we don't need to worry about
# pre-releases because tzdata is a 'latest data' package (unlike pyodide-build).
# Our crosshair extra is research-grade, so we require latest versions there too. |
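The sampling idea in the comment above (pick a leap second, then add a small offset) can be sketched with only the standard library. This is a rough illustration, not the Hypothesis implementation: `SAMPLE`, `parse_leap_seconds`, and `near_leap_second` are hypothetical names, the two data lines are a short excerpt in the `leap-seconds.list` format rather than the full file, and reading "(0, +/- <+1s, +/- 12h)" as "exactly zero, within one second, or within twelve hours" is an assumption.

```python
import datetime as dt
import random

# Excerpt in the leap-seconds.list format: NTP seconds since 1900-01-01,
# then the TAI-UTC offset that takes effect at that instant.
# Assumption: only two entries are shown here; the real file has many more.
SAMPLE = """\
# Excerpt only, for illustration.
2272060800 10 # 1 Jan 1972
3692217600 37 # 1 Jan 2017
"""

NTP_EPOCH = dt.datetime(1900, 1, 1, tzinfo=dt.timezone.utc)


def parse_leap_seconds(text):
    """Return a tuple of aware UTC datetimes, one per data line."""
    return tuple(
        NTP_EPOCH + dt.timedelta(seconds=int(line.split()[0]))
        for line in text.splitlines()
        # Skip comments and (unlike the diff above) blank lines too.
        if line.strip() and not line.startswith("#")
    )


def near_leap_second(rng, leap_seconds):
    """Pick one entry and shift it by zero, under a second, or up to 12 hours."""
    base = rng.choice(leap_seconds)
    kind = rng.choice(("exact", "subsecond", "half_day"))
    if kind == "exact":
        diff = dt.timedelta(0)
    elif kind == "subsecond":
        diff = dt.timedelta(seconds=rng.uniform(-1, 1))
    else:
        diff = dt.timedelta(hours=rng.uniform(-12, 12))
    return base + diff
```

Inside Hypothesis itself the random choices would presumably be drawn from strategies rather than from `random.Random`, so that shrinking and replay keep working.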
pytz.tzinfo.localize will raise a NonExistentTimeError or AmbiguousTimeError exception (when called with is_dst=None) if it can't resolve the given local time because of a change to or from daylight saving time. This is the source of numerous bugs in software dealing with datetimes in Python. A strategy that selects for these error-causing times would help improve the quality of Hypothesis-Datetime.
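To make the two failure cases concrete, here is a minimal sketch of how a naive wall-clock time could be classified without pytz, relying on the PEP 495 `fold` attribute. The function name `classify_wall_time` is hypothetical, and the approach assumes a `tzinfo` implementation that honours `fold` (such as the standard library's `zoneinfo`); pytz timezones do not work this way.

```python
import datetime as dt


def classify_wall_time(naive, tz):
    """Return 'imaginary', 'ambiguous', or 'normal' for a naive time in tz.

    Assumes tz honours the PEP 495 fold attribute.
    """
    first = naive.replace(tzinfo=tz, fold=0)
    second = naive.replace(tzinfo=tz, fold=1)
    # Outside a gap or an overlap, both folds map to the same UTC offset.
    if first.utcoffset() == second.utcoffset():
        return "normal"
    # A time inside a gap does not survive a round trip through UTC,
    # because no UTC instant corresponds to that wall-clock reading.
    roundtrip = first.astimezone(dt.timezone.utc).astimezone(tz)
    if roundtrip.replace(tzinfo=None) != naive:
        return "imaginary"
    # Otherwise the wall-clock reading occurs twice (an overlap).
    return "ambiguous"
```

With `zoneinfo.ZoneInfo("Europe/London")`, for example, 01:30 on 2021-03-28 falls in the spring-forward gap and 01:30 on 2021-10-31 occurs twice, so the sketch should report them as imaginary and ambiguous respectively. A strategy biased towards such values would need to locate them efficiently rather than test candidates one at a time.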