I tried to parse the GitHub page of my project (https://github.com/yannham/mechaml/) using Lambdasoup, but got an unexpected error from Markup.ml, which Lambdasoup uses underneath. When I type the following in a REPL (utop):
```ocaml
#require "lambdasoup";;
Soup.read_file "github.html" |> Soup.parse;;
```
where github.html is a dump of that GitHub page, I get an uncaught exception raised from inside Markup.ml.
I expected Lambdasoup and Markup.ml to fail quietly on invalid HTML5, or at least not to fail with an uncaught exception. Here is a snapshot of the HTML source of the offending version of the page.
Thanks. This is an internal error in Markup.ml that needs to be fixed.
It is caused by incorrect handling of an unmatched </form> tag in the (ill-formed) HTML input.
I want to note that Markup.ml should not exactly fail quietly; rather, it should report the bad tag through ~report and then recover in a specific way. HTML5 prescribes that recovery (see the rule for 'An end tag whose tag name is "form"'), so I hesitate to call the correct behavior a failure.
This should be fixed now in Markup.ml master. Sorry about the delay. I actually wrote most of this commit back in March, but then I had to make a slightly ugly tradeoff, because the specification assumes a DOM-building parser and is not fully compatible with streaming parsing. While thinking about how to resolve that, I eventually got swamped by other work. See the commit message for some detail on what I chose; it's ultimately just some comments on esoteric HTML error-recovery behavior.
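To make the HTML5 recovery rule for a stray </form> more concrete, here is a toy OCaml sketch of the "in body" handling of an end tag named "form". It is not Markup.ml's code: the `element` and `state` types and the `end_tag_form` and `generate_implied_end_tags` functions are hypothetical, the <template> branch of the rule is omitted, and "has the form element in scope" is approximated by "the form element is on the stack of open elements". Parse errors go to a `report` callback and parsing continues, in the spirit of Markup.ml's `~report`.

```ocaml
(* Toy model, not Markup.ml internals: elements carry an id so the same
   tag name can appear more than once on the stack. *)
type element = { id : int; name : string }

type state = {
  mutable stack : element list;          (* open elements, current node first *)
  mutable form_pointer : element option; (* the spec's "form element pointer" *)
}

(* Elements whose end tags HTML5 lets the parser generate implicitly. *)
let implied_end_tags =
  [ "dd"; "dt"; "li"; "optgroup"; "option"; "p"; "rb"; "rp"; "rt"; "rtc" ]

(* Pop elements while the current node has an implied end tag. *)
let rec generate_implied_end_tags st =
  match st.stack with
  | top :: rest when List.mem top.name implied_end_tags ->
      st.stack <- rest;
      generate_implied_end_tags st
  | _ -> ()

(* Handle "</form>" (non-template branch): report errors, never raise. *)
let end_tag_form ~report st =
  let node = st.form_pointer in
  st.form_pointer <- None;
  match node with
  | None -> report "unmatched </form>: form element pointer not set"
  | Some n when not (List.exists (fun e -> e.id = n.id) st.stack) ->
      (* Simplification: "not in scope" approximated by "not on the stack". *)
      report "unmatched </form>: form element not in scope"
  | Some n ->
      generate_implied_end_tags st;
      (match st.stack with
       | top :: _ when top.id = n.id -> ()
       | _ -> report "</form> while another element is the current node");
      (* The spec removes the form from wherever it sits in the stack,
         which is not necessarily the top. *)
      st.stack <- List.filter (fun e -> e.id <> n.id) st.stack

let () =
  let errors = ref [] in
  let report msg = errors := msg :: !errors in
  (* Simulate <body><form><div></form>: the div is still open. *)
  let body = { id = 0; name = "body" }
  and form = { id = 1; name = "form" }
  and div = { id = 2; name = "div" } in
  let st = { stack = [ div; form; body ]; form_pointer = Some form } in
  end_tag_form ~report st;
  (* The form has left the stack, but the div is still open. *)
  assert (List.map (fun e -> e.name) st.stack = [ "div"; "body" ]);
  (* A second </form> has nothing to close: reported, then ignored. *)
  end_tag_form ~report st;
  List.iter print_endline (List.rev !errors)
```

The usage case hints at why the rule fits a DOM builder better than a streaming parser: a DOM builder simply drops the form from its stack and keeps appending children to the div, whereas a stream of start and end signals cannot emit the form's end before the end of the div that was opened inside it without producing misnested output. This is presumably the kind of mismatch behind the tradeoff described in the fix; the sketch does not show how Markup.ml actually resolves it.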