backcompat.py throws bs4 warning #105

wumpus · 2018-06-09T17:07:34Z

from bs4 import BeautifulSoup

parser = BeautifulSoup('<data></data>').data

spews:

UserWarning: No parser was explicitly specified, so I'm using the best available HTML parser for this system ("lxml"). This usually isn't a problem, but if you run this code on another system, or in a different virtual environment, it may use a different parser and behave differently.

Perhaps backcompat.py should use the same default as parser.py, 'html5lib'?

Also, parser.py has an except FeatureNotFound block that calls BeautifulSoup(doc), which will generate this ugly warning too.
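
For reference, a minimal sketch of the explicit-parser form that avoids the warning. It uses 'html.parser' only because that builder ships with Python; which parser mf2py should pass here is an assumption, not something settled in this thread.

```python
import warnings
from bs4 import BeautifulSoup

# Record any warnings raised while building the soup with a named parser.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    # 'html.parser' is an illustrative choice (assumption), not mf2py's default.
    data = BeautifulSoup('<data></data>', 'html.parser').data

# Collect UserWarnings (the "No parser was explicitly specified" family).
parser_warnings = [w for w in caught if issubclass(w.category, UserWarning)]
print(data.name, len(parser_warnings))
```

Because the parser is named explicitly, BeautifulSoup has nothing to guess, so the list of recorded parser warnings should stay empty.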


kartikprabhu · 2018-06-11T16:06:00Z

backcompat.py does not do any parsing explicitly.

Those warnings are generated by BeautifulSoup if it does not find the parser specified, which is what you seemed to have used directly.

mf2py defaults to the user-specified parser or to html5lib. If neither works it just defers to BeautifulSoup.

wumpus · 2018-06-11T17:23:50Z

I don't understand your comment. I quoted the line in backcompat.py that creates a bs parser. "Deferring to BeautifulSoup" is what causes the warning. This warning is new; it's intended to get everyone to change their code to specify which parser to use.

kartikprabhu · 2018-06-11T17:26:07Z

@wumpus Ah! yes sorry I got a bit confused. Will fix in the next update.

Thanks!

wumpus · 2018-06-11T17:26:34Z

Thank you!

kartikprabhu · 2018-06-11T22:03:21Z

self-note:

possible resolution: put back the older code from mf2py/mf2py/backcompat.py (line 207 in 65c3699):

data = doc.new_tag('data')

BS documentation: https://www.crummy.com/software/BeautifulSoup/bs4/doc/#navigablestring-and-new-tag

and recheck if the same html parser is then used with this change.

kartikprabhu · 2018-06-14T15:43:46Z

After a bit more thinking, here are two possible ways to fix this, each with some drawbacks.
cc: @kevinmarks @sknebel any suggestions are appreciated

1. Use the new_tag method to create the new <data> element. This method only exists on the main BS doc object and will fail if the user passes a BS element to the parser instead. This is why the older code in mf2py/mf2py/backcompat.py (line 207 in 65c3699),

   data = doc.new_tag('data')

   was changed.
2. Specify html5lib directly when creating the <data> element. The disadvantage is that the default parser is then declared in multiple locations, which risk going out of sync. It would also ignore any other parser the user specifies.

Not sure what the way out is.
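
To illustrate the drawback of the new_tag approach, here is a rough sketch. The markup and variable names are made up for the example, and 'html.parser' is used only so that it runs without extra packages. In the bs4 versions I am aware of, attribute access on a Tag is treated as a child-tag lookup, so element.new_tag does not reach a method at all.

```python
from bs4 import BeautifulSoup

# Hypothetical input; 'html.parser' is chosen only because it ships with Python.
soup = BeautifulSoup('<div class="vcard"><span>x</span></div>', 'html.parser')
element = soup.div  # a Tag, as a caller might pass instead of the whole document

# Works: new_tag is defined on the BeautifulSoup document object.
data = soup.new_tag('data')

# Fails on a Tag: the attribute lookup is treated as a search for a child
# <new_tag> element, so there is no callable method to invoke.
try:
    element.new_tag('data')
    tag_call_failed = False
except (TypeError, AttributeError):
    tag_call_failed = True

print(data.name, tag_call_failed)
```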

wumpus · 2018-06-14T17:03:42Z

backcompat never paid any attention to the user-specified parser and it only parses that one string. I don't think adding a default of html5lib will cause any harm.

kartikprabhu · 2018-06-14T22:49:06Z

@wumpus the trouble with that is if someone does not have html5lib installed it will throw an error and stop parsing. At least right now it only throws a warning but uses whatever parser it can find.

wumpus · 2018-06-15T19:07:36Z

ok then put a try/except block around it similar to your other code. (I have no idea what parsers are installed by default, etc.)
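
A sketch of what that try/except might look like. The function name make_data_tag and the choice of 'html.parser' as the fallback are assumptions for illustration; parser.py's actual fallback calls BeautifulSoup(doc) with no parser, which is what triggers the warning.

```python
from bs4 import BeautifulSoup, FeatureNotFound

def make_data_tag(html_parser='html5lib'):
    """Build an empty <data> element, preferring the requested parser.

    make_data_tag is a hypothetical helper, not part of mf2py.
    """
    try:
        soup = BeautifulSoup('<data></data>', features=html_parser)
    except FeatureNotFound:
        # Requested parser is not installed: fall back to the parser bundled
        # with Python (assumed fallback) instead of letting bs4 guess and warn.
        soup = BeautifulSoup('<data></data>', features='html.parser')
    return soup.data

print(make_data_tag().name)
print(make_data_tag('no-such-parser').name)
```

This keeps parsing going when html5lib is missing while still naming a parser in every call, so neither branch should hit the "No parser was explicitly specified" path.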

kartikprabhu added bug backcompat nice-starter-issues labels Jun 11, 2018

kartikprabhu removed the nice-starter-issues label Jun 12, 2018

kartikprabhu pushed a commit to kartikprabhu/mf2py that referenced this issue Jun 15, 2018

make backcompat use the same html parser. fixes microformats#105

2dfcd59

kartikprabhu pushed a commit to kartikprabhu/mf2py that referenced this issue Jun 16, 2018

make backcompat use the same html parser. fixes microformats#105

83723fb

kartikprabhu mentioned this issue Jun 16, 2018

new version 1.1.1 #106

Merged

kevinmarks closed this as completed in #106 Jul 7, 2018
