backcompat.py throws bs4 warning #105

Closed
wumpus opened this issue Jun 9, 2018 · 9 comments · Fixed by #106

Comments

@wumpus

wumpus commented Jun 9, 2018

from bs4 import BeautifulSoup

parser = BeautifulSoup('<data></data>').data

spews:

UserWarning: No parser was explicitly specified, so I'm using the best available HTML parser for this system ("lxml"). This usually isn't a problem, but if you run this code on another system, or in a different virtual environment, it may use a different parser and behave differently.

Perhaps you want to use the same default as parser.py, 'html5lib'?

Also, parser.py has an except FeatureNotFound block that calls BeautifulSoup(doc), which will generate the same ugly warning.
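For illustration, a minimal sketch of how naming the parser explicitly avoids the "No parser was explicitly specified" warning. The helper name make_data_tag is hypothetical and is not taken from mf2py, and 'html.parser' is used only because it ships with Python; the thread itself suggests 'html5lib'.

```python
import warnings

from bs4 import BeautifulSoup


def make_data_tag(parser_name="html.parser"):
    """Build an empty <data> element with an explicitly named parser.

    Hypothetical helper for illustration; not part of mf2py.
    """
    return BeautifulSoup("<data></data>", parser_name).data


# Record every warning raised while the tag is built.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    data = make_data_tag()

# Keep only the "no parser specified" warnings, if any were raised.
parser_warnings = [
    w for w in caught
    if "No parser was explicitly specified" in str(w.message)
]
print(data.name, len(parser_warnings))
```

Because the parser is passed as the second argument, BeautifulSoup does not have to guess one, so it has no reason to emit the warning.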

@kartikprabhu
Member

backcompat.py does not do any parsing explicitly.

Those warnings are generated by BeautifulSoup if it does not find the parser specified, which is what you seem to have used directly.

mf2py defaults to the user-specified parser or to html5lib. If neither works, it just defers to BeautifulSoup.

@wumpus
Author

wumpus commented Jun 11, 2018

I don't understand your comment. I quoted the line in backcompat.py that creates a bs parser. "Deferring to BeautifulSoup" is what causes the warning. The warning is new; it's intended to get everyone to change their code to specify which parser to use.

@kartikprabhu
Member

@wumpus Ah! Yes, sorry, I got a bit confused. Will fix in the next update.

Thanks!

@wumpus
Author

wumpus commented Jun 11, 2018

Thank you!

@kartikprabhu
Member

self-note:

possible resolution: put back the older code from

data = doc.new_tag('data')

BS documentation: https://www.crummy.com/software/BeautifulSoup/bs4/doc/#navigablestring-and-new-tag

and recheck whether the same HTML parser is then used with this change.

@kartikprabhu
Member

After a bit more thinking, here are possible ways to fix this, each with some drawbacks.
cc: @kevinmarks @sknebel any suggestions are appreciated

  1. Use the new_tag method to create the new <data> element. This method only exists on the main BS doc object and will fail if the user passes a BS element to the parser instead. This is why the older code from

    data = doc.new_tag('data')
    was changed.

  2. Specify html5lib directly while creating the <data> element. The disadvantage is that the default parser is then declared in multiple locations, which risks the copies going out of sync. Also, this will ignore any other parser the user specifies.

Not sure what the way out is.
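One way to picture the trade-off between the two options is a sketch that uses new_tag when it is handed the top-level document and falls back to parsing a literal string otherwise. Everything named here (make_data_tag, DEFAULT_PARSER, the sample markup) is a hypothetical illustration and not mf2py's actual code; 'html.parser' stands in for the project default only because it is always available.

```python
from bs4 import BeautifulSoup

# Assumption: a single module-level constant, so the default parser is
# declared in one place only (the concern raised in option 2).
DEFAULT_PARSER = "html.parser"


def make_data_tag(node, parser_name=DEFAULT_PARSER):
    """Return a fresh, empty <data> tag for a document or an element."""
    if isinstance(node, BeautifulSoup):
        # Option 1: the top-level document object provides new_tag().
        return node.new_tag("data")
    # Option 2: a plain element was passed, so parse a literal string
    # with an explicitly named parser instead.
    return BeautifulSoup("<data></data>", parser_name).data


doc = BeautifulSoup("<div class='h-card'><span>x</span></div>", DEFAULT_PARSER)
from_doc = make_data_tag(doc)          # built with new_tag
from_element = make_data_tag(doc.div)  # built by parsing a string
print(from_doc.name, from_element.name)
```

This does not remove the second drawback: a parser chosen by the user is still ignored on the fallback path unless the caller passes it in as parser_name.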

@wumpus
Author

wumpus commented Jun 14, 2018

backcompat never paid any attention to the user-specified parser and it only parses that one string. I don't think adding a default of html5lib will cause any harm.

@kartikprabhu
Member

@wumpus the trouble with that is that if someone does not have html5lib installed, it will throw an error and stop parsing. At least right now it only throws a warning and uses whatever parser it can find.

@wumpus
Author

wumpus commented Jun 15, 2018

OK, then put a try/except block around it, similar to your other code. (I have no idea what parsers are installed by default, etc.)
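A rough sketch of that try/except idea, assuming html5lib as the preferred parser and Python's built-in 'html.parser' as the fallback. The function name and the choice of fallback are assumptions for illustration; this is not necessarily what the eventual fix looks like.

```python
from bs4 import BeautifulSoup, FeatureNotFound


def make_data_tag(preferred="html5lib", fallback="html.parser"):
    """Build an empty <data> tag, tolerating a missing preferred parser.

    Hypothetical helper; the fallback parser is an assumption.
    """
    try:
        return BeautifulSoup("<data></data>", preferred).data
    except FeatureNotFound:
        # The preferred parser is not installed. Name the fallback
        # explicitly so BeautifulSoup does not have to guess a parser.
        return BeautifulSoup("<data></data>", fallback).data


tag = make_data_tag()
# A parser name that cannot exist forces the fallback branch.
fallback_tag = make_data_tag(preferred="no-such-parser")
print(tag.name, fallback_tag.name)
```

Naming the fallback explicitly, rather than calling BeautifulSoup with no parser argument, is what keeps the warning from reappearing on the error path.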

kartikprabhu pushed a commit to kartikprabhu/mf2py that referenced this issue Jun 15, 2018
kartikprabhu pushed a commit to kartikprabhu/mf2py that referenced this issue Jun 16, 2018