-
Notifications
You must be signed in to change notification settings - Fork 1.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
parsing function help with multiline #448
Comments
The docstring parsing code is in https://github.com/google/python-fire/blob/master/fire/docstrings.py Multiline rst args should be handled by python-fire/fire/docstrings.py Lines 468 to 469 in d44d33d
You may have identified a bug. |
I want to work on this. First time contributor here. Was looking into the What do you guys think? |
The first step would be to add one or more failing test cases, e.g. inspired from the comment above #448 (comment) The test cases would go in https://github.com/google/python-fire/blob/master/fire/docstrings_test.py Notice at line 253 there's "# TODO(dbieber): Add parameters for variations in whitespace handling." One possibility that comes to mind is that the first line might be shorter than subsequent lines because it includes the arg name as a prefix, and so we'd get weird formatting if we were to simply keep all the newlines as is (but strip that prefix). But other times, keeping newlines is key, e.g. if there's ascii art in the docstring or other spacing-specific content. |
I recreated the issue on my end and then resolved it with the code I mentioned in the comment above . I don't think this is the real fix rather a quick fix.
I was thinking along the same lines as you were and I was using \n to join lines always. But when I ran the docstring tests, 3 of them failed. The tests were Here is one of the failed tests:
I think that using \n to join always is a good use. In fact, I would argue that the test is wrong. The test docstring is
But the test expects this output:
Notice how the second parameter does not split into the next line? If we use \n to join, in my opinion, it would bring in the new line like how it is in the docstring. I tried looking into it further and found that when we do not use \n, the lines are stripped and stored as one long string. If we do use line breaks, then the code would split it into 2 or more different strings/lines which while joining back should use \n. I was running late working on it last night and wanted to update you guys with something so I made the quick fix. I think the best way to go about it would be to always use \n to join lines like you said and edit the 3 test cases to pass only when the new line is present. Also, I could add the test case for this issue as you mentioned. I am not 100% sure since I do not know all the intricacies of the docstring and also this is my first time contributing ever. What do you think of everything mentioned above? also can you assign this issue to me? |
Btw, the other 2 tests that failed also fail the same way. |
I agree that it would be OK to adjust the behavior of those 3 tests. I do have one concern about simply always using \n, and that's that it might lead to poor formatting in some simple cases. Maybe this is okay. I wonder if there's an approach that gets the best of both approaches though. Wdyt? |
I will work on updating the tests. As far as your concerns, this is how I think it would work; Please correct me if I am wrong.
Right now, the code extracts the docstring and splits by \n. so if the arg name is long or the description starts on the same line, it would be extracted as a really long string. And when we are printing it out, it would print out that long string which would preserve the original docstring. So if we are joining the lines always by \n this would not change since the line is just one huge string.
in this case, the docstring is extracted and split by \n so every line is a string inside a list. the code right now would join them using ' ' which does not preserve the original docstring. joining by \n would create those line breaks. I may be missing something. But I think that \n always is the way to go here and we would not have any issues.
if you plan to expand your docstring capabilities to handle more nuances, I think one way to go about it is what I did for the quick fix. I did it initially to pass the docstring test but I think we can use it. :P I created a new parameter for |
See the part of the logic where Edit: ignore this comment. line_info.remaining is still logically a whole line. I thought it was only the part after the arg, but I was wrong. The part after the arg is named "second" in the code e.g. at line 397 as linked in the next comment, and (from a cursory glance) seems to only apply to the Google docstring format. |
It looks like for the RST format, whole lines are used are used so my concern would not arise. For the Google format, the concern I have arises at python-fire/fire/docstrings.py Line 397 in d44d33d
For numpy, if I'm thinking about this correctly, whole lines are used so my concern would not arise. |
Can you give me an example of where your concern is happening? I'm slightly confused about the part where you said it would arise. I tried to use the google docstring test string to test it out and tried some of my variations, making the string long and spanning multiple lines, but it seems to work. Please review the changes proposed in the PR below. if you can give me a test case where your concern will be caused, it will help me understand it better and fix it. |
👍 Here's an example: def make_function_maker_handler_event(function_maker_handler_event_factory):
"""This is a Google-style docstring with long args.
Args:
function_maker_handler_event_factory: The function-maker-handler event factory
responsible for making the function-maker-handler event. Use this whenever
you need a function-maker-handler event made for you programmatically.
""" I haven't double-checked this, but I expect if we use the \n approach this will show up in the help as: The function-maker-handler event factory
responsible for making the function-maker-handler event. Use this whenever
you need a function-maker-handler event made for you programmatically. with the first line awkwardly short. |
True! but my question then would be how would we want to handle it? I was thinking we could set a certain number of letters that would determine if the arg name is long and then if len(arg) > our number, we can join the first and second line of the description but wouldn't it be too long if we joined it with the next line? For example:
Merry Christmas! |
Merry Christmas indeed!
I haven't been able to think of a perfect solution. More thoughts below:
This seems reasonable to me. Let's think this through a little more. Does the content of the second line matter for deciding whether to join the first line to the second? I think it might: if the second line is a blank line, maybe we don't join them. (My reasoning here is that if the second line is blank, then the blank line is likely deliberate formatting from the docstring author.) Also what if the second line is reallly long? Then maybe we don't join then as well. (My reasoning here is that if the second line is really long, (a) the formatting might be intentional, e.g. (b) the docstring author clearly isn't putting text on the next line (line 3) simply because a line (line 2) gets too long, so that's probably not what happened w/ line 1.) Maybe the heuristic is:
Continuing to brainstorm:
What do you think of this approach? Too complex? Seem reasonable? |
Yup, I also was thinking along the same lines, we need a max line length.
This logic seems reasonable to me. For the example you had provided earlier, it would merge the lines right? that would make the merged line too long; is that okay if our output exceeds the max line length?
Since this is just with Google format, is there a max line limit Google says we should use? I am not sure if this is official but here it says 80. I like this approach but I am not sure if this is a perfect solution. I will start working on this tomorrow! |
Yes, Google style uses 80 character as the max line length. Fire's goal is to work well with any Python software though (within reason), so I'd be inclined to do something a little more general than forcing 80 for the Google-style docstrings. |
Sorry, I got a little busy with some other stuff. I was reviewing the heuristic again and I noticed
For this example, the logic we agreed on will give out the output like this:
The first word of line2 can easily fit in line1 so it would think that line1 is intentionally short. If the arg length is more than 50% of "max line length" then the first word of line2 would ALWAYS fit comfortably on line1. I think we can just do:
Does that make sense? Also we could do something like if line1 length + arg length is hitting our "max line length" then we can check if line2 isn't intentionally short or long and then merge line 1 and line 2. I'm trying to think more about it. Do we need to check if line2 isn't intentionally short? My reasoning is that even if line2 is short, merging line1 and line2 would probably not matter in most use cases... I think we should only check if line2 isn't blank and isn't intentionally long. Wdyt? |
This check would need to take place before removing the arg. So
No worries! We move at a glacial pace in developing Fire these days :) 🧊. |
Can you direct me where I would need to implement the checks? I was thinking of doing it after extracting the arg but now I am thinking we should do it right after the lines are created in the
I'm surprised you are replying during the holidays 🤩 Don't forget to enjoy them! |
First, I think the heuristic might warrant a slight addition: If the arg length (measured by the position of the ":") is more than 50% of the "apparent max line length" My thinking for this addition is: if the first line is really long already, it doesn't make sense to merge it onto the next line. Here's my overall view of the heuristic:
Haven't had a chance to properly figure this out yet, but thought I'd just list out my thoughts anyway in case that's helpful. Right now iirc the algorithm works by consuming one line at a time.
To determine these, we'll need to maintain some state.
As we consume lines, we populate these four fields. Then once the full section is consumed we can calculate: apparent_max_line_length = roundup(max_line_length, 10) # made up function; not a real thing yet
line1_intentionally_short = (line1_length + line2_first_word_length) < apparent_max_line_length
line2_intentionally_short = (line2_length + line3_first_word_length) < apparent_max_line_length
line1_intentionally_long = line1_length > 1.05 * apparent_max_line_length
line2_intentionally_long = line2_length > 1.05 * apparent_max_line_length So, where do the changes actually go? We define all the state we're going to track up front here: python-fire/fire/docstrings.py Lines 166 to 179 in d44d33d
Most of the changes would likely take place in _consume_google_args_line. It's not obvious to me where to put the post-section processing (i.e. computing apparent_max_line_length, line1_intentionally_short, etc.); that's going to take a bit more thought. If you have ideas, would love to hear them! |
Ah, might have spoke too soon about where to define the state. Looks like the state for Google-style args is actually defined here, not with the rest of the parser state: python-fire/fire/docstrings.py Lines 295 to 298 in d44d33d
|
Cool, I think I understand the heuristic. I was looking at I'm deciding on how to identify the lines. Right now, I am thinking I can
python-fire/fire/docstrings.py Lines 390 to 394 in d44d33d
I feel like this might not be the best way to identify lines. Thoughts?
I was thinking we could do this at the end of the |
Please review PR #476. I decided to define the states up front where you had initially mentioned. This is because
I was wrong about this. The post-processing is done after the lines are consumed. I did refine the heuristic a little bit handling null values where I think they could occur (mainly if line2 or line3 are null). The code passes all the test cases. |
I was running pylint and pytest and all my lines of code are too long for pylint 😅. I also encountered an issue while running pytest with asyncio. I can fix that issue as well. It shouldn't be that hard. |
Took a skim and left two comments. Will come back to it later. |
Thank you! I am working on it; I have a question for you tho.
if we take the above example, the line length is 81 so rounding up to 90 would make the arg_length (39) not meet our criteria of arg_length being longer than 50% of line_length. Is that what we want? I'm second-guessing rounding up now. maybe we should round closer to 80 or 120. I do not know if this is just a case I made up since most likely authors will follow the docstring guidelines Another thing I noticed:
When I add an enter or new line between the 2 parameters, I get the below as the output:
It merges all the lines for the first argument. I am not sure why this is happening. Also, idk if this is against the google docstring rules? |
Good point. Maybe we adjust the threshold to 40%. I don't think it's critical; the resulting text is going to be formatted a bit wonkily either way. The important thing is the threshold is less than 70-80% because that's where it becomes really clear that merging is better than not.
Sounds like a bug that will require a closer look at your code to investigate. (Not doing that just this second but can take a look later if still needed then.)
iirc I don't think new lines are recommended in the google style guide, but in my opinion we should aim to handle this reasonably anyway. If it was working prior to this change, we should aim to ensure it stays working. If it already wasn't working, we can fix it as a separate change, it doesn't have to be part of this change. |
Cool will make the threshold 40%.
Before this change, all the lines were just joined with
The other error I encountered with pytest was that @asyncio.coroutine is depreciated. We can use the I was thinking we can just do an I do not have much idea about asyncio and how it is being utilized for Fire (besides that it is used for testing 😂), but I came across this as one of the main differences between them. |
Merged #440 (🔲 going forward we will want to change the naming though / possibly reintroduce w/ the guard as you suggest) |
Awesome! I am adding some test cases and I noticed that when the Another test case led to adding a check for if line3 is an arg or not (when the description is 2 lines or less.) to further refine the code. |
Please review #476 |
Can I have flexible line breaks in the help comments?
this is the help doc
printed help I got
printed help I expacted
The text was updated successfully, but these errors were encountered: