Change timestamps from custom strings to datetime in result model and to ISO 8601 format in output.xml
#4258
Comments
What exactly we need to change to take |
If someone wonders why we have even created our own timestamp handling functions and didn't use |
Thought about the API bit more and think this would work:
|
We probably should change the formatted timestamps from the current The benefits of using the ISO 8601 format are that the date part becomes easier for humans to interpret and dates in output.xml become easier for external tools to parse. Very importantly, we could use datetime.isoformat and datetime.fromisoformat, which are very fast compared to the datetime.strftime and datetime.strptime calls we needed with custom times:
>>> d = datetime.now()
>>> d.isoformat(' ') == d.strftime('%Y-%m-%d %H:%M:%S.%f')
True
>>> timeit(lambda: d.isoformat(' '))
0.6668484999972861
>>> timeit(lambda: d.strftime('%Y-%m-%d %H:%M:%S.%f'))
1.6757996669912245
>>>
>>> s = d.isoformat(' ')
>>> datetime.fromisoformat(s) == datetime.strptime(s, '%Y-%m-%d %H:%M:%S.%f')
True
>>> timeit(lambda: datetime.fromisoformat(s))
0.1304736560123274
>>> timeit(lambda: datetime.strptime(s, '%Y-%m-%d %H:%M:%S.%f'))
5.866141628997866
A drawback of using A bigger drawback of changing the date format in output.xml is that it's a backwards incompatible change. I still consider the benefits bigger, but it's a bit questionable to make changes like that in RF 5.1. If this is considered a big issue, we can decide to postpone the change to RF 6.0 (or just change RF 5.1 to 6.0). |
One smallish thing to decide is the timestamp precision. So far we have used milliseconds but We should use milliseconds at least in log and report to avoid timestamps getting overly long and, more importantly, to avoid adding three digits for each timestamp. Those added digits would probably increase log.html size considerably. In output.xml we could still use microseconds if we wanted to. That would have the benefit that when we read the timestamp from there we get the same data as we originally had. If we only store timestamps in millisecond accuracy, then the data after the round-trip is different from what we originally got from
>>> d = datetime.now()
>>> s = d.isoformat()
>>> d == datetime.fromisoformat(s)
True
>>> d.isoformat(' ')[:-3]
'2022-03-08 17:36:38.972'
>>> s = d.isoformat(' ')[:-3]
>>> d == datetime.fromisoformat(s)
False
A drawback of using microseconds in output.xml is the increased file size. One way to reduce the size would be that instead of storing start and end times in output.xml, we'd save start and elapsed time. The elapsed time in microsecond accuracy would look like If we decide to use milliseconds in output.xml, then we probably should also use milliseconds with the initial start/end times we get during execution. That way model objects would have the same start/end time regardless of whether they are created during execution or based on output.xml. That would mean that instead of just using
from datetime import datetime, timedelta
def now():
    d = datetime.now()
    # Round microseconds to the nearest millisecond.
    ms = round(d.microsecond, -3)
    if ms == 1_000_000:
        # Rounding overflowed to a full second, so carry it over.
        return d.replace(microsecond=0) + timedelta(seconds=1)
    return d.replace(microsecond=ms)
The above obviously isn't as fast as using
>>> timeit(lambda: datetime.now())
0.2810397749999538
>>> timeit(lambda: now())
1.3667088740039617
>>> timeit(lambda: get_timestamp())
1.0332882049988257
Calculating the elapsed time would still be a lot faster than earlier, so even with this approach we'd get performance benefits overall. Anyway, either using microseconds in output.xml or just accepting that model objects have different times depending on whether we get them during execution or from output.xml is probably a better idea. I think it is best to use microseconds in output.xml. |
Yet another thing to decide is how to handle timezones. Using datetime terminology, we should decide whether timestamps should be aware or naive. I believe using naive timestamps is better, because adding timezone information would increase file sizes and all timestamps having something like |
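To make the size argument concrete, here is a minimal sketch comparing the ISO 8601 text of a naive and an aware timestamp. The timestamp value and the UTC+02:00 offset are arbitrary example values, not anything taken from Robot Framework itself.

```python
from datetime import datetime, timedelta, timezone

# A naive timestamp carries no timezone information.
naive = datetime(2022, 3, 8, 17, 36, 38, 972000)

# An aware timestamp; the UTC+02:00 offset is an arbitrary example.
aware = naive.replace(tzinfo=timezone(timedelta(hours=2)))

naive_text = naive.isoformat(' ')
aware_text = aware.isoformat(' ')

print(naive_text)  # 2022-03-08 17:36:38.972000
print(aware_text)  # 2022-03-08 17:36:38.972000+02:00
print(len(aware_text) - len(naive_text))  # 6
```

With this offset every aware timestamp is six characters longer, which adds up when a file contains a very large number of timestamps.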
Due to the changes needed to output.xml, and output.xml being such an important interface between external systems, it's better to do this in RF 6.0. A benefit of waiting for RF 6.0 is that it will no longer support Python 3.6, which doesn't have datetime.fromisoformat. |
This will be done in RF 7.0. For forward compatibility, it would be good to add properties containing start, end and elapsed times as |
I commented on timestamp precision above and explained how rounding microseconds used by
>>> dt = datetime.now()
>>> dt.isoformat(' ')
'2023-08-26 01:19:26.574297'
>>> dt.isoformat(' ', 'milliseconds')
'2023-08-26 01:19:26.574'
The performance with and without |
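One detail worth keeping in mind with the `timespec` argument is that `datetime.isoformat` truncates the excluded digits instead of rounding them. The following sketch uses an example timestamp chosen so that rounding and truncating would give different results:

```python
from datetime import datetime

# Microseconds chosen so that rounding and truncating differ.
dt = datetime(2023, 8, 26, 1, 19, 26, 574997)

truncated = dt.isoformat(' ', 'milliseconds')
print(truncated)  # 2023-08-26 01:19:26.574

# Round-tripping the millisecond string loses the sub-millisecond part.
restored = datetime.fromisoformat(truncated)
print(restored == dt)        # False
print(restored.microsecond)  # 574000
```

Rounding would have produced `.575`, so code that needs rounded milliseconds has to round the `datetime` itself before formatting.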
In addition to changing start and end times of the result objects to |
This is the first step of using `datetime` for timestamps (#4258). Result objects now get their timestamps using `datetime.now()` and they are stored to `start_time` and `end_time` attributes. Old `starttime` and `endtime` are properties that return the same string representation as earlier. Timestamps are still stored to output.xml in the old format. Moving to ISO 8601 format for performance and standard compatibility is the next step.
Actually end time isn't saved at all, instead we save elapsed. Also XML attribute names are changed to shorter `start` and `elapsed` from `starttime` and `endtime`. This is the second part of #4258. Also output.xml generated time is now in ISO format.
Timestamps in output.xml and also in the debug file are now in ISO 8601 format. Part of #4258.
Avoid old `starttime`, `endtime` and `elapsedtime` and use new `start_time`, `end_time` and `elapsed_time` instead. Related to #4258.
Most importantly, fix tests using message timestamps in format `yyyymmdd hh:mm:ss`. They worked with Python 3.11 and 3.12, because with them `datetime.fromisoformat` accepted them, but with older versions there was a ValueError.
Some of these aren't needed anymore due to timestamps being created using `datetime` (#4258). Others were used so few times that preserving them didn't make sense. Also introduce new `parse_timestamp` util that parses timestamps to a `datetime`. It isn't as strict as `datetime.fromisoformat`.
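As an illustration of what a less strict parser could look like, here is a minimal sketch that accepts both ISO 8601 text and the legacy `yyyymmdd hh:mm:ss[.mil]` form. The `parse_lenient` helper and its regular expression are hypothetical and written only for this example; they are not the actual `parse_timestamp` implementation.

```python
from datetime import datetime
import re

# Hypothetical helper, not Robot Framework's actual `parse_timestamp`.
_LEGACY = re.compile(r'(\d{4})(\d{2})(\d{2}) (\d{2}:\d{2}:\d{2}(?:\.\d{3})?)')


def parse_lenient(text):
    """Parse ISO 8601 or legacy `yyyymmdd hh:mm:ss[.mil]` text to a datetime."""
    match = _LEGACY.fullmatch(text)
    if match:
        year, month, day, time = match.groups()
        # Rewrite the legacy date part to the ISO `yyyy-mm-dd` form.
        text = f'{year}-{month}-{day} {time}'
    return datetime.fromisoformat(text)


legacy = parse_lenient('20230826 01:19:26.574')
iso = parse_lenient('2023-08-26 01:19:26.574')
print(legacy == iso)  # True
print(legacy)         # 2023-08-26 01:19:26.574000
```

Normalizing the legacy form before calling `datetime.fromisoformat` avoids depending on which input formats a particular Python version happens to accept.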
Use new `elapsed_time` instead. Hopefully the final part of #4258.
This ought to be done now. Based on my quick tests, the performance benefits didn't really materialize, but there's a small boost in execution, and processing output.xml with Rebot can be up to 10% faster. There are, however, various other benefits with these changes:
|
As discussed above, this issue caused some backwards incompatible changes and deprecations. I'll try to summarize them here:
In practice normal users shouldn't be affected. The biggest problem is that external tools processing output.xml files need to be updated, but timestamps being in a standard format ought to be nice for them in the long run. Previous Robot versions cannot process new output.xml files, but output.xml files created by previous Robot versions are handled by the new version. |
Setting any two of `start_time`, `end_time` and `elapsed_time` is enough, because the third can be calculated. Elapsed time is written to output.xml so better to calculate it immediately. End time isn't generally needed during execution. This is to some extent related to #4258.
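The relationship between the three values can be sketched as follows. The `complete_times` function is a hypothetical stand-alone helper written for this example, not part of the Robot Framework result model.

```python
from datetime import datetime, timedelta


def complete_times(start_time=None, end_time=None, elapsed_time=None):
    """Return (start, end, elapsed), computing the one that is missing."""
    given = [v is not None for v in (start_time, end_time, elapsed_time)]
    if sum(given) < 2:
        raise ValueError('At least two of the three values are required.')
    if start_time is None:
        start_time = end_time - elapsed_time
    elif end_time is None:
        end_time = start_time + elapsed_time
    elif elapsed_time is None:
        elapsed_time = end_time - start_time
    return start_time, end_time, elapsed_time


start = datetime(2023, 8, 26, 1, 19, 26, 574297)
elapsed = timedelta(seconds=1, microseconds=250)
_, end, _ = complete_times(start_time=start, elapsed_time=elapsed)
print(end)  # 2023-08-26 01:19:27.574547
```

Because `datetime` and `timedelta` support addition and subtraction directly, each missing value is a one-line calculation.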
It seems that the string representation of floats that are smaller than `0.0001` uses scientific notation, which we don't want. For example, `str(0.00004) == '4e-05'`. Formatting using the `f` modifier solves the issue. It uses six digits by default which is fine, but it unfortunately doesn't omit trailing zeros. They could be rstripped, but it could strip all digits and leave us with something like `3.`. Stripping also `.` would work, but then the result could be just `3` which wouldn't technically be `xs:float` anymore. Easier to just always use six digits. Related to #4258.
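The formatting behavior described above can be checked with a few lines. The values are just examples:

```python
small = 0.00004

as_str = str(small)
as_fixed = f'{small:f}'
stripped = f'{3.0:f}'.rstrip('0')

print(as_str)    # 4e-05
print(as_fixed)  # 0.000040
print(stripped)  # 3.
```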
We currently have our own get_timestamp function for getting the current timestamp and get_elapsed_time for calculating the difference between two timestamps as milliseconds.
We should replace these custom functions with functionality provided by Python's standard datetime module. In practice we could get the current time by using datetime.now() and could calculate the elapsed time by simply subtracting datetime objects:
One benefit of using the standard datetime module would be that we could remove a lot of our custom code. A bigger benefit is performance, measured here using timeit:
As can be seen from the examples, the difference in performance is pretty big when getting the timestamp and rather huge when calculating the elapsed time. These tests each run one million iterations, so the time for one iteration isn't huge and this doesn't matter much with smallish test/task runs. With bigger runs containing e.g. loops, the times start to add up, though, because we call get_timestamp twice and get_elapsed_time once for each keyword, loop iteration, IF/ELSE branch, and so on.
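For reference, getting the current time and calculating an elapsed time with the standard `datetime` module might look like the sketch below. The fixed start and end values are example data used to keep the result deterministic; real code would take both from `datetime.now()`.

```python
from datetime import datetime, timedelta

# Getting the current time is a single call.
current = datetime.now()

# Fixed example values so that the result below is deterministic.
start = datetime(2022, 3, 8, 17, 36, 38, 972000)
end = datetime(2022, 3, 8, 17, 36, 40, 5000)

# Subtracting two `datetime` objects gives a `timedelta`.
elapsed = end - start
elapsed_ms = elapsed // timedelta(milliseconds=1)

print(elapsed)     # 0:00:01.033000
print(elapsed_ms)  # 1033
```

Integer division by `timedelta(milliseconds=1)` gives whole milliseconds without going through floating point arithmetic.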