Bug: Many serious TOML parsing/loading bugs caused by underlying toml library, consider switching? #439

pirate · 2024-09-24T01:54:44Z

Thanks so much for building python-benedict, it's really awesome!

So unfortunatley the underlying TOML library https://github.com/uiri/toml used by benedict has a bunch of long-time outstanding bugs that break parsing/loading and generally make it unsafe to load->dump->load the same string.

For example, you cannot dump any dict containing escape sequences without it throwing an exception:

>>> benedict({
	"reset": "\033[00;00m",
	"lightblue": "\033[01;30m",
}).to_toml()

...
File ~/test/.venv/lib/python3.11/site-packages/toml/encoder.py:113, in _dump_str(v)
    111     else:
    112         joiner = "u00"
--> 113     v = [v[0] + joiner + v[1]] + v[2:]
    114 return unicode('"' + v[0] + '"')

IndexError: list index out of range

But thats not it, there are many other fairly major string escaping, quoting, and parse/dump cycle inconsistency bugs that have bitten other projects using uiri/toml:

Almost all of these are 2yr+ old, indicating they're probably not going to all get fixed anytime soon without significant increase in velocity. No harm no foul, it's open source we can't demand they go faster and they don't owe us anything, but maybe benedict could consider a different library with fewer major outstanding parser consistency issues?

Is benedict open to switching to a library without these issues? Maybe one of these:

I'll chip in $20 towards the work to make the switch if it's an option ⬇️

Upvote & Fund

We're using Polar.sh so you can upvote and help fund this issue.
We receive the funding once the issue is completed & confirmed by you.
Thank you in advance for helping prioritize & fund our backlog.

The text was updated successfully, but these errors were encountered:

pirate · 2024-09-24T02:16:22Z

After looking more into the other options unfortunately they are worse in other ways, for example they can barely handle a fraction of the types that uiri/toml supports :/

I guess we are stuck writing a custom TomlEncoder class to work around the bugs for now.

import re
from pathlib import Path

import toml

def better_toml_dump_str(val):
    try:
        return toml.encoder._dump_str(val)
    except Exception:
        # if we hit any of toml's numerous encoding bugs,
        # fall back to using json representation of string
        return json.dumps(str(val), default=repr)

class CustomTOMLEncoder(toml.encoder.TomlEncoder):
    def __init__(self, **kwargs):
        super().__init__(**kwargs)
        
        # override the dumper funcs for any types you need to support here:
        self.dump_funcs[str] 			= better_toml_dump_str
		self.dump_funcs[Path] 			= better_toml_dump_str
        self.dump_funcs[re.RegexFlag] 	= better_toml_dump_str
        ...

# this works:
>>> test_dict = {
	"string_with_escape_sequences": "\033[00;00m",
	"some_Path": "/some/path/example.txt",
	"example_weird_type": re.IGNORECASE | re.UNICODE | re.MULTILINE,
	# etc. try more weird types here
}
>>> print(benedict(test_dict).to_toml(encoder=CustomTOMLEncoder()))

string_with_escape_sequences = "\u001b[00;00m"
example_weird_type = "re.IGNORECASE|re.UNICODE|re.MULTILINE"

You're welcome to close this issue 🤷, up to you.

fabiocaccamo · 2024-09-24T13:54:57Z

@pirate thank you for reporting this problem.

Actually, for decoding toml on Python >= 3.11, the tomlib standard lib is used:
https://github.com/fabiocaccamo/python-benedict/blob/main/benedict/serializers/toml.py#L10

I think it would be cool to improve TOML support, would you like to submit a PR with a list of failing test cases (that should succeed with the perfect library)?
So it would be much easier to test other TOML libraries or doing fixes in a different way.

fabiocaccamo · 2024-10-18T15:16:48Z

@pirate is it in your plans to submit a PR for this?

pirate · 2024-10-30T03:42:27Z

I think given there are only two real python toml library options and you're already using the best one, I think it's not worth trying to patch the toml bugs on top within benedict.

You can leave it up to end users to figure out a workaround if they really need one (e.g. my CustomTOMLEncoder above). I just wanted to document this issue so people could find it via Google if they have the same problem and wondered the cause.

Here is a good test suite for TOML parsing though if you want one: https://github.com/toml-lang/toml-test

fabiocaccamo · 2024-11-01T08:29:24Z

@pirate totally agree, it would make more sense to fix these issue at toml lib level.

pirate added the bug Something isn't working label Sep 24, 2024

pirate assigned fabiocaccamo Sep 24, 2024

fabiocaccamo added this to Open Source Oct 14, 2024

fabiocaccamo moved this to Todo in Open Source Oct 14, 2024

pirate closed this as completed Oct 30, 2024

github-project-automation bot moved this from Todo to Done in Open Source Oct 30, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Bug: Many serious TOML parsing/loading bugs caused by underlying toml library, consider switching? #439

Bug: Many serious TOML parsing/loading bugs caused by underlying toml library, consider switching? #439

pirate commented Sep 24, 2024 •

edited

Loading

pirate commented Sep 24, 2024 •

edited

Loading

fabiocaccamo commented Sep 24, 2024

fabiocaccamo commented Oct 18, 2024

pirate commented Oct 30, 2024 •

edited

Loading

fabiocaccamo commented Nov 1, 2024

Bug: Many serious TOML parsing/loading bugs caused by underlying toml library, consider switching? #439

Bug: Many serious TOML parsing/loading bugs caused by underlying toml library, consider switching? #439

Comments

pirate commented Sep 24, 2024 • edited Loading

Upvote & Fund

pirate commented Sep 24, 2024 • edited Loading

fabiocaccamo commented Sep 24, 2024

fabiocaccamo commented Oct 18, 2024

pirate commented Oct 30, 2024 • edited Loading

fabiocaccamo commented Nov 1, 2024

pirate commented Sep 24, 2024 •

edited

Loading

pirate commented Sep 24, 2024 •

edited

Loading

pirate commented Oct 30, 2024 •

edited

Loading