Change timestamps from custom strings to `datetime` in result model and to ISO 8601 format in output.xml #4258

pekkaklarck · 2022-03-08T10:59:11Z

We currently have our own get_timestamp function for getting the current timestamp and get_elapsed_time for calculating the difference between two timestamps as milliseconds.

>>> t1 = get_timestamp()
>>> t2 = get_timestamp()
>>> print(t1)
20220308 12:37:43.875
>>> print(t2)
20220308 12:37:49.022
>>> get_elapsed_time(t1, t2)
5147

We should replace these custom functions with functionality provided by Python's standard datetime module. In practice we could get the current time by using datetime.now() and could calculate the elapsed time by simply subtracting datetime objects:

>>> d1 = datetime.now()
>>> d2 = datetime.now()
>>> print(d1)
2022-03-08 12:43:56.109349
>>> print(d2)
2022-03-08 12:44:03.567573
>>> delta = d2 - d1
>>> round(delta.total_seconds() * 1000)
7458

One benefit of using the standard datetime module would be that we could remove lot of our custom code. A bigger benefit is performance measured here using timeit:

>>> timeit(lambda: get_timestamp())
1.0487582740024664
>>> timeit(lambda: datetime.now())
0.2848877740034368
>>> timeit(lambda: get_elapsed_time(t1, t2))
5.105318886999157
>>> timeit(lambda: round((d2-d1).total_seconds() * 1000))
0.24636966099205893

As can seen from the examples, the difference in performance is pretty big when getting the timestamp and rather huge when calculating the elapsed time. These tests each run one million iterations, so the time for one iteration isn't huge and this doesn't matter much with smallish test/task runs. With bigger runs containing e.g. loops times start to add, though, because we call get_timestamp twice and get_elapsed_time once for each keyword, loop iteration, IF/ELSE branch, and so on.

The text was updated successfully, but these errors were encountered:

pekkaklarck · 2022-03-08T11:03:26Z

What exactly we need to change to take datetime objects into use is still a bit unclear. One thing to consider is should we preserve existing starttime and endtime attributes in result model objects for backwards compatibility reasons. In practice we could store actual datetime objects to other properties (possibly under some new parent object like times that could be used like kw.times.start) and the old attributes would be turned to properties.

pekkaklarck · 2022-03-08T11:07:59Z

If someone wonders why we have even created our own timestamp handling functions and didn't use datetime to begin with, the reason is that earliest Robot Framework releases supported Python 2.2 and datetime was added in Python 2.3. After we had added our own utils, switching didn't anymore make sense. I also got a feeling datetime didn't have as good performance back then as it has now and at least it didn't have all the current features.

pekkaklarck · 2022-03-08T14:35:52Z

Thought about the API bit more and think this would work:

Model objects get new start_time and end_time attributes that contain the actual datetime objects (and are None by default).
Model objects also get new elapsed_time attribute that is property returning self.end_time - self.start_time. If either start or end time is not defined, should behave same way as nowadays.
Old starttime, endtime and elapsedtime are preserved as deprecated propertys that get the actual value from the aforementioned new attributes. They'd return same values as these attributes return currently.
For consistency, arguments passed to model objects when they are created are renamed from starttime and endtime to start_time and end_time. This is backwards incompatible, but creating these model objects ought to be pretty rare. They are more often only inspected by custom tools.

pekkaklarck · 2022-03-08T15:10:22Z

We probably should change the formatted timestamps from the current 20220308 16:36:12.123 format to more ISO 8601 compatible format like 2022-03-08 16:36:12.123. We should use it in logs, reports and in output.xml. In the latter we could consider using the official T as the separator between date and time parts, but with logs and reports it's better to use the space character.

The benefits of using the ISO 8601 format is to make the date part easier to interpret for humans and to make dates in output.xml easier to parse for external tools. Very importantly, we could use datetime.isoformat and datetime.fromisoformat that are very fast compared to using datetime.strftime and datetime.strptime we needed to use with custom times:

>>> d = datetime.now()
>>> d.isoformat(' ') == d.strftime('%Y-%m-%d %H:%M:%S.%f')
True
>>> timeit(lambda: d.isoformat(' '))
0.6668484999972861
>>> timeit(lambda: d.strftime('%Y-%m-%d %H:%M:%S.%f'))
1.6757996669912245
>>> 
>>> s = d.isoformat(' ')
>>> datetime.fromisoformat(s) == datetime.strptime(s, '%Y-%m-%d %H:%M:%S.%f')
True
>>> timeit(lambda: datetime.fromisoformat(s))
0.1304736560123274
>>> timeit(lambda: datetime.strptime(s, '%Y-%m-%d %H:%M:%S.%f'))
5.866141628997866

A drawback of using 2022-03-08 instead of 20220308, and the reason we've used the latter so far, is that it's two characters wider. This probably requires us to adjust styles in log and report, but that's a small task and in general the benefits above are bigger than problems. The deprecated starttime and endtime properties, discussed in the above comment, should nevertheless return the timestamps in the old format.

A bigger drawback of changing the date format in output.xml is that it's a backwards incompatible change. I still consider benefits bigger, but it's a bit questionable to do such changes like that in RF 5.1. If this is considered a big issue, we can decide to post-pone the change to RF 6.0 (or just change RF 5.1 to 6.0).

pekkaklarck · 2022-03-08T16:28:50Z

One smallish thing to decide is the timestamp precision. So far we have used milliseconds but datetime.now() uses microsecond precision. Due to Robot's own overhead, I got a feeling microsecond accuracy isn't that precise and would in general prefer using milliseconds.

We should use milliseconds at least in log and report to avoid timestamps getting overly long and, more importantly, to avoid adding three digits for each timestamp. Those added digits would probably increase log.html size considerably.

In output.xml we could still use microseconds if we wanted to. That would have a benefit that when we read the timestamp from there we get the same data as we originally had. If we only store timestamps in millisecond accuracy, then the data after the round-trip is different than what we originally for from datetime.now():

>>> d = datetime.now()
>>> s = d.isoformat()
>>> d == datetime.fromisoformat(s)
True
>>> d.isoformat(' ')[:-3]
'2022-03-08 17:36:38.972'
>>> s = d.isoformat(' ')[:-3]
>>> d == datetime.fromisoformat(s)
False

A drawback of using microseconds in output.xml is the increased file size. One way to reduce the size would be that instead of storing start and end times in output.xml, we'd save start and elapsed time. The elapsed time in microsecond accuracy would look like 0:01:30.123456 and would certainly be shorter than end timestamp in millisecond accuracy. This obviously would be a backwards incompatible change, but if we are anyway changing timestamp format in output.xml this could be fine as well. Often the elapsed time is anyway more interesting than actual start/end times and in those cases having the elapsed time directly available would be convenient. Anyway, if we decide to do this, the change should get its own issue.

If we decide to use milliseconds in output.xml, then we probably should use milliseconds also with the initial start/end times we get during execution. That way model objects would have same start/end time regardless are they created during execution or based on output.xml. That would mean that instead of just using datetime.now(), we needed to use something like this:

def now():
    d = datetime.now()
    ms = round(d.microsecond, -3)
    if ms == 1_000_000:
        return d.replace(microsecond=0) + timedelta(seconds=1)
    return d.replace(microsecond=ms)

The above obviously isn't as fast as using datetime.now() alone would be. In fact, it's even a bit slower than using the current get_timestamp:

>>> timeit(lambda: datetime.now())
0.2810397749999538
>>> timeit(lambda: now())
1.3667088740039617
>>> timeit(lambda: get_timestamp())
1.0332882049988257

Calculating the elapsed time would still be a lot faster than earlier, so even with this approach we'd get performance benefits overall. Anyway, either using microseconds in output.xml or just accepting that model objects have different times depending on do we get them during execution or from output.xml is probably a better idea. I think it is best to use microseconds in output.xml.

pekkaklarck · 2022-03-08T16:41:20Z

Yet another thing to decide is how to handle timezones. Using datetime terminology, we should decide should timestamps be aware or naive. I believe using naive timestamps is better, because adding timezone information would increase file sizes and all timestamps having something like +02:00 at the end would likely be a distraction rather than bring benefits. Possibly we could store the local timezone information somewhere separately, at least in output.xml, to have it available for those who need it.

pekkaklarck · 2022-04-01T12:47:19Z

Due to changes needed to output.xml, and output.xml being such an important interface between external systems, it's better to do this in RF 6.0.

A benefit of waiting for RF 6.0 is that it won't anymore support Python 3.6 that doesn't have datetime.fromisoformat.

pekkaklarck · 2023-05-12T12:45:45Z

This will be done in RF 7.0. For forward compatibility, it would be good to add propertys containing start, end and elapsed times as datetime and timedelta objects already in RF 6.1. I'll submit a separate issue about that.

pekkaklarck · 2023-08-25T22:26:16Z

I commented timestamp precision above and explained how rounding microseconds used by datetime.now() to milliseconds we currently use slows getting the timestamp down. I just noticed that datetime.isoformat has an optional timespec argument that makes it easy to round to milliseconds:

>>> dt = datetime.now()
>>> dt.isoformat(' ')
'2023-08-26 01:19:26.574297'
>>> dt.isoformat(' ', 'milliseconds')
'2023-08-26 01:19:26.574'

The performance with and withouttimespec is the same so it avoids performance issues. There would, however, be still the problem that times would have higher precision during execution (microseconds) than afterwards (milliseconds). My current thinking is that we should use microseconds in output.xml and decide what precision to use in the log file later.

pekkaklarck · 2023-09-07T11:57:01Z

In addition to changing start and end times of the result objects to datetime, we should change the timestamp of Message objects too.

This is the first step of using `datatime` for timestamps (#4258). Result objects now get their timestamps using `datetime.now()` and they are stored to `start_time` and `end_time` attributes. Old `starttime` and `endtime` are propertyes that return same string representation as earlier. Timestamps are still stored to output.xml in the old format. Moving to ISO 8601 format for performance and standard compatibility is the next step.

Actually end time isn't saved at all, instead we save elapsed. Also XML attribute names are changed to shorter `start` and `elapsed` from `starttime` and `endtime`. This is the second part of #4258. Also output.xml generated time is now in ISO format.

Timestamps in output.xml and also in the debug file are now in ISO 8601 format. Part of #4258.

Avoid old `starttime`, `endtime` and `elapsedtime` and use new `start_time`, `end_time´ and `elapsed_time` instead. Related to #4258.

Most importantly, fix tests using message timestamps in format `yyyymmdd hh:mm:ss`. They worked with Python 3.11 and 3.12, because with them `datetime.fromisoformat` accepted them, but with older there was a ValueError.

Some of these aren't needed anymore due to timestamps being created using `datetime` (#4258). Others were used so few times that preserving them didn't make sense. Also introduce new `parse_timestamp` util that parses timestamps to a `datetime`. It isn't as strict as `datetime.fromisoformat`.

Use new `elapsed_time` instead. Hopefully the final part of #4258.

pekkaklarck · 2023-09-09T14:21:38Z

This ought to be now done. Based on my quick tests performance benefits didn't really materialize, but there's a small boost in execution and processing output.xml with Rebot can be up to 10% faster. There are, however, various other benefits with these changes:

datetime is much easier to work with than our custom string timestamp.
Also timedelta has benefits compared to elapsed time being an integer representing milliseconds.
Timestamps in output.xml, syslog, and debug file now use the standard ISO 8601 format. That's especially nice with output.xml because the standard format is easier for external tools to process. Also in our schema we can now use xs:dateTime instead of xs:string that accepts anything.
We can easily use the ISO 8601 format with future outputs such as JSON results (Support JSON serialization with result model #4847).
Attribute names start_time, end_time and elapsed_time are consistent with Python coding guidelines.
We now have elapsed time directly available in output.xml. External tools processing them doesn't need to calculate it based on the start time and the end time. The end time isn't there anymore so it needs to be calculated if needed, but I believe it's not often used than the elapsed time.
We store timestamps in microsecond accuracy in our result and in output.xml.
output.xml got a bit smaller because instead of starttime="20230907 20:00:01.000" endtime="20230907 20:00:11.123" we nowadays use start="2023-09-07T20:00:01.000001" elapsed="10.122999".

pekkaklarck · 2023-09-09T14:59:56Z

As discussed above, this issue caused some backwards incompatible changes and deprecations. I try to summarize them here:

output.xml now has start and elapsed attributes instead of starttime and endtime it used to have. Tools processing output.xml files themselves and interested in times need to be updated. Tools using Robot's result model don't need to care.
Related to the above, start now contains the timestamp in the ISO 8601 format like start="2023-09-07T20:00:01.000001" when we earlier had a custom format like starttime="20230907 20:00:01.000". The new elapsed attribute is a float with up to six digits and it represents seconds.
Still related to the above, timestamp attribute with <msg> elements has been changed to time. Its format has also been changed from our custom timestamp to ISO 8601.
Also the generated time in output.xml uses ISO 8601.
Result model objects now accept start_time and end_time arguments when they used to be starttime and endtime. This shouldn't matter much, because results objects are seldom created by external tools. Inspecting them works the same way as earlier.
starttime, endtime and elapsedtime attributes of the result objects are considered deprecated. There is no actual deprecation warning yet, but such a warning will be added later and these attributes will eventually be removed.
Utility functions related to getting and processing timestamps have been deprecated. Using them causes a deprecation warning and these utils will be removed in RF 8.0.

In practice normal users shouldn't be affected. The biggest problem is that external tools processing output.xml files need to updated, but timestamps being in a standard format ought to be nice for them in the long run. Previous Robot versions cannot process new output.xml files, but output.xml files created by previous Robot versions are handled by the new version.

Setting any two of `start_time`, `end_time` and `elapsed_time` is enough, because the third can be calculated. Elapsed time is written to output.xml so better to calculate it immediately. End time is't generally needed during execution. This is to some extend related to #4258.

It seems that string representation of floats that are smaller than `0.0001` uses scientific notation which we don't want. For example, `str(0.00004) == '4e-05'`. Formatting using the `f` modifier solves the issue. It uses six digits by default which is fine, but it unfortunately doesn't omit trailing zeros. They could be rstrippped, but it could strip all digits and leave us with something like `3.`. Stripping also `.` would work, but then the result could be just `3` which wouldn't technically amyore be `xs:float`. Easier to just always use six digits. Related to #4258.

pekkaklarck · 2024-01-31T14:59:27Z

Due to the changes being pretty badly backwards incompatible, we added --legacy-output option for generating outputs in same format as RF 6 and earlier in c3c5170. It has a bug with Rebot (#5012), but it will be fixed in RF 7.0.1.

pekkaklarck added enhancement priority: high labels Mar 8, 2022

pekkaklarck added this to the v5.1 milestone Mar 8, 2022

pekkaklarck added backwards incompatible deprecation labels Mar 8, 2022

pekkaklarck mentioned this issue Mar 10, 2022

Honor SOURCE_DATE_EPOCH environment variable when generating library documentation #4262

Closed

pekkaklarck modified the milestones: v5.1, v6.0 Apr 1, 2022

pekkaklarck mentioned this issue Apr 1, 2022

Drop Python 3.6 and 3.7 support #4294

Closed

pekkaklarck changed the title ~~Performance enhacements to getting timestamps and calculating elaped times~~ Performance enhancements for getting timestamps and calculating elapsed times Aug 26, 2022

pekkaklarck mentioned this issue May 12, 2023

Add forward compatible start_time, end_time and elapsed_time propertys to result objects #4765

Closed

pekkaklarck mentioned this issue Aug 25, 2023

Support JSON serialization with result model #4847

Closed

pekkaklarck added the effort: large label Sep 5, 2023

pekkaklarck self-assigned this Sep 5, 2023

pekkaklarck added a commit that referenced this issue Sep 8, 2023

Change Message.timestamp to datatime.

b23bc24

Timestamps in output.xml and also in the debug file are now in ISO 8601 format. Part of #4258.

pekkaklarck added a commit that referenced this issue Sep 8, 2023

Test cleanup.

d537bac

Avoid old `starttime`, `endtime` and `elapsedtime` and use new `start_time`, `end_time´ and `elapsed_time` instead. Related to #4258.

pekkaklarck mentioned this issue Sep 8, 2023

Loudly deprecate old Python 2/3 compatibility layer and other deprecated utils #4501

Closed

pekkaklarck changed the title ~~Performance enhancements for getting timestamps and calculating elapsed times~~ Change timestamps from custom strings to datetime in Python model and to ISO 8601 format in output.xml Sep 9, 2023

pekkaklarck mentioned this issue Sep 9, 2023

Deprecate elapsed_time_to_string accepting time as milliseconds #4862

Closed

pekkaklarck added a commit that referenced this issue Sep 9, 2023

Avoid old elapsedtime attribute in statistics.

f4058f4

Use new `elapsed_time` instead. Hopefully the final part of #4258.

pekkaklarck closed this as completed Sep 9, 2023

pekkaklarck changed the title ~~Change timestamps from custom strings to datetime in Python model and to ISO 8601 format in output.xml~~ Change timestamps from custom strings to datetime in result model and to ISO 8601 format in output.xml Sep 13, 2023

pekkaklarck mentioned this issue Sep 15, 2023

Use standard XSD data types in output.xml #3829

Closed

pekkaklarck added the alpha 1 label Nov 7, 2023

pekkaklarck mentioned this issue Dec 15, 2023

XML: Support ignoring element order with Elements Should Be Equal #4877

Closed

pekkaklarck mentioned this issue Jan 18, 2024

Robot framework 7.x output does not support robot jenkins plugin ? #5018

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Change timestamps from custom strings to `datetime` in result model and to ISO 8601 format in output.xml #4258

Change timestamps from custom strings to `datetime` in result model and to ISO 8601 format in output.xml #4258

pekkaklarck commented Mar 8, 2022

pekkaklarck commented Mar 8, 2022 •

edited

Loading

pekkaklarck commented Mar 8, 2022 •

edited

Loading

pekkaklarck commented Mar 8, 2022

pekkaklarck commented Mar 8, 2022 •

edited

Loading

pekkaklarck commented Mar 8, 2022 •

edited

Loading

pekkaklarck commented Mar 8, 2022

pekkaklarck commented Apr 1, 2022

pekkaklarck commented May 12, 2023

pekkaklarck commented Aug 25, 2023

pekkaklarck commented Sep 7, 2023

pekkaklarck commented Sep 9, 2023 •

edited

Loading

pekkaklarck commented Sep 9, 2023 •

edited

Loading

pekkaklarck commented Jan 31, 2024

Change timestamps from custom strings to datetime in result model and to ISO 8601 format in output.xml #4258

Change timestamps from custom strings to datetime in result model and to ISO 8601 format in output.xml #4258

Comments

pekkaklarck commented Mar 8, 2022

pekkaklarck commented Mar 8, 2022 • edited Loading

pekkaklarck commented Mar 8, 2022 • edited Loading

pekkaklarck commented Mar 8, 2022

pekkaklarck commented Mar 8, 2022 • edited Loading

pekkaklarck commented Mar 8, 2022 • edited Loading

pekkaklarck commented Mar 8, 2022

pekkaklarck commented Apr 1, 2022

pekkaklarck commented May 12, 2023

pekkaklarck commented Aug 25, 2023

pekkaklarck commented Sep 7, 2023

pekkaklarck commented Sep 9, 2023 • edited Loading

pekkaklarck commented Sep 9, 2023 • edited Loading

pekkaklarck commented Jan 31, 2024

Change timestamps from custom strings to `datetime` in result model and to ISO 8601 format in output.xml #4258

Change timestamps from custom strings to `datetime` in result model and to ISO 8601 format in output.xml #4258

pekkaklarck commented Mar 8, 2022 •

edited

Loading

pekkaklarck commented Mar 8, 2022 •

edited

Loading

pekkaklarck commented Mar 8, 2022 •

edited

Loading

pekkaklarck commented Mar 8, 2022 •

edited

Loading

pekkaklarck commented Sep 9, 2023 •

edited

Loading

pekkaklarck commented Sep 9, 2023 •

edited

Loading