Add new format code to strptime for when microsecond is optional #100929

Open
mdavis-xyz opened this issue Jan 10, 2023 · 1 comment
Labels
extension-modules C modules in the Modules dir type-feature A feature request or enhancement

Comments

@mdavis-xyz
Contributor

mdavis-xyz commented Jan 10, 2023

Feature or enhancement

I propose adding one or more new % format codes to strptime to allow parsing timestamps that contain either whole seconds or fractions of a second.

The current %f format code raises a ValueError if the input string has no microsecond component.

Pitch

As an example, I would like to parse both of the following strings with the same code.

import datetime as dt

fmt = "%Y-%m-%d %H:%M:%S.%f"
ts = [
    "2023-01-01 09:10:13.513",
    "2023-01-01 09:10:13"
]
for s in ts:
    dt.datetime.strptime(s, fmt)

This currently fails with:

Traceback (most recent call last):
  File "main.py", line 9, in <module>
    dt.datetime.strptime(s, fmt)
  File "/home/ec2-user/.pyenv/versions/3.8.11/lib/python3.8/_strptime.py", line 568, in _strptime_datetime
    tt, fraction, gmtoff_fraction = _strptime(data_string, format)
  File "/home/ec2-user/.pyenv/versions/3.8.11/lib/python3.8/_strptime.py", line 349, in _strptime
    raise ValueError("time data %r does not match format %r" %
ValueError: time data '2023-01-01 09:10:13' does not match format '%Y-%m-%d %H:%M:%S.%f'

In my code I could write a try/except block to catch the error and retry with a different format string, but I think it would be nice if strptime could handle that for me. That said, I understand that the implementation of this function depends on the system, so this might be a tricky change.
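
For reference, the try/except fallback described above might look like the following minimal sketch. The helper name `parse_timestamp` is hypothetical and not part of the standard library; the two format strings are the ones from the example.

```python
import datetime as dt


def parse_timestamp(s):
    """Parse a timestamp with or without a fractional-second component.

    Hypothetical helper, shown only to illustrate the fallback approach.
    """
    try:
        return dt.datetime.strptime(s, "%Y-%m-%d %H:%M:%S.%f")
    except ValueError:
        # No fractional part present: retry with the whole-second format.
        return dt.datetime.strptime(s, "%Y-%m-%d %H:%M:%S")


for s in ["2023-01-01 09:10:13.513", "2023-01-01 09:10:13"]:
    print(parse_timestamp(s))
```

If neither format matches, the ValueError from the second attempt propagates to the caller.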

There are a few options.

Option A

Add a new format code which is similar to %f, but can handle an empty microsecond component.
I note that %F is not yet taken, so we could use that.
Note that %f today doesn't expect a leading '.', so the new code would need to consume the separator itself.

So in the above example you'd use %Y-%m-%d %H:%M:%S%F.
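
Until something like this exists, the behaviour of Option A can be approximated in user code. The sketch below is only an illustration: `strptime_optional_fraction` is a hypothetical name, and it assumes that the fraction is the last component of the string and that '.' appears nowhere else in the input.

```python
import datetime as dt


def strptime_optional_fraction(date_string, fmt):
    """Parse date_string with fmt, allowing an optional trailing '.ffffff'.

    Hypothetical helper. Assumes fmt contains no %f and that the only '.'
    in date_string (if any) introduces the fractional seconds at the end.
    """
    base, dot, fraction = date_string.partition(".")
    parsed = dt.datetime.strptime(base, fmt)
    if not dot:
        return parsed  # whole seconds only
    if not (fraction.isdecimal() and 1 <= len(fraction) <= 6):
        raise ValueError(f"invalid fractional seconds in {date_string!r}")
    # Like %f, right-pad with zeros so "513" means 513000 microseconds.
    return parsed.replace(microsecond=int(fraction.ljust(6, "0")))


fmt = "%Y-%m-%d %H:%M:%S"
print(strptime_optional_fraction("2023-01-01 09:10:13.513", fmt))
print(strptime_optional_fraction("2023-01-01 09:10:13", fmt))
```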

Option B

Add a new format code for parsing seconds and optionally microseconds in one go. I note that %s is not yet taken, so we could use that.

So in the above example you'd use %Y-%m-%d %H:%M:%s.

Option C

Add [] to make a component optional.

So in the above example you'd use %Y-%m-%d %H:%M:%S[.%f].

My guess is that this would be quite a substantial change.

Option D

Modify the arguments to strptime so that it can take a list of format strings. It tries them in order until one succeeds.

So in the above example you'd use

dt.datetime.strptime(s, ["%Y-%m-%d %H:%M:%S.%f", "%Y-%m-%d %H:%M:%S"])

Since a list of strings and a single string are both iterables of strings, this might be a bit fiddly to implement. You could add a new formats argument, and require that either format or formats is passed in, although that kind of clutters the function signature.
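
As a rough illustration of Option D, a user-level wrapper could look like the sketch below. The name `strptime_any` is hypothetical; it handles the string-versus-list ambiguity with an explicit `isinstance` check rather than a separate formats argument.

```python
import datetime as dt


def strptime_any(date_string, formats):
    """Try each format in order and return the first successful parse.

    Hypothetical wrapper; `formats` may be a single format string or an
    iterable of format strings.
    """
    # A bare string is itself an iterable of strings, so treat it as a
    # single format instead of iterating over its characters.
    if isinstance(formats, str):
        formats = [formats]
    else:
        formats = list(formats)
    for fmt in formats:
        try:
            return dt.datetime.strptime(date_string, fmt)
        except ValueError:
            continue
    raise ValueError(
        f"time data {date_string!r} does not match any of {formats!r}"
    )


fmts = ["%Y-%m-%d %H:%M:%S.%f", "%Y-%m-%d %H:%M:%S"]
print(strptime_any("2023-01-01 09:10:13.513", fmts))
print(strptime_any("2023-01-01 09:10:13", fmts))
```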

Previous discussion

https://bugs.python.org/issue1982 @abalkin
https://bugs.python.org/issue1158

@BlackAce21

BlackAce21 commented Jul 3, 2023

from datetime import datetime
from itertools import combinations

opening_char = '['
closing_char = ']'

def parse_datetime(date_string, date_formats):
    for format_string in date_formats:
        try:
            parsed_date = datetime.strptime(date_string, format_string)
            return parsed_date
        except ValueError:
            continue
    
    print(f"Unable to parse date with any given format for string: {date_string}")
    return None

def _extract_optional_components(format_string):
    # _get_bracketed_strings already strips each group's outer brackets.
    # Nested brackets are kept on purpose: _generate_date_formats relies on
    # them to locate inner groups inside an outer group it has expanded.
    if opening_char in format_string:
        return _get_bracketed_strings(format_string)
    return []
                
def _get_bracketed_strings(input_string):
    sub_strings = []
    for i, char in enumerate(input_string):
        if char == opening_char:
            openpos = i
            closepos = openpos
            counter = 1
            while counter > 0:
                closepos += 1
                c = input_string[closepos]  # was the global format_string, which only exists when run as a script
                if c == opening_char:
                    counter += 1
                elif c == closing_char:
                    counter -= 1
            sub_strings.append(input_string[openpos + 1:closepos])
    return sub_strings

def _generate_date_formats(format_string):
    optional_components = _extract_optional_components(format_string)
    num_optionals = len(optional_components)
    
    all_combinations = []
    for r in range(num_optionals + 1):
        for combination in combinations(range(num_optionals), r):
            all_combinations.append(combination)

    output_formats = []
    for combination in all_combinations:
        new_format = format_string
        for i in range(num_optionals):
            if i in combination:
                new_format = new_format.replace(f'[{optional_components[i]}]', optional_components[i])
            else:
                new_format = new_format.replace(f'[{optional_components[i]}]', '')

        output_formats.append(new_format)

    return output_formats


if __name__ == "__main__":
    # Example usage
    format_string = "%Y-%m-%d[T%H:%M:%S[.%f]][Z]"
    optional_format_list = _generate_date_formats(format_string)

    date_string1 = "2023-06-16T03:09:23.155Z"
    date_string2 = "2023-06-16T02:53:18Z"
    date_string3 = "2023-06-16"

    datetime_obj1 = parse_datetime(date_string1, optional_format_list)
    datetime_obj2 = parse_datetime(date_string2, optional_format_list)
    datetime_obj3 = parse_datetime(date_string3, optional_format_list)

    # The literal "Z" in the format is matched as plain text, so the
    # results are naive datetimes with no UTC offset.
    print(datetime_obj1)  # 2023-06-16 03:09:23.155000
    print(datetime_obj2)  # 2023-06-16 02:53:18
    print(datetime_obj3)  # 2023-06-16 00:00:00

A workaround for this problem, in case anyone else ends up here and just wants something that works. It combines proposed options C and D from the OP. Apologies if dropping a code block isn't proper etiquette.
