-
Notifications
You must be signed in to change notification settings - Fork 42
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
replace tokenize with ast to parse the source code #144
Conversation
@@ -219,56 +226,154 @@ def source_line_to_tokens(obj: object) -> Dict[int, List[Dict[str, Union[str, in | |||
return line_to_tokens | |||
|
|||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The three functions above are public but are no longer used in the code at all. Should they just stay there?
Hi @arnaud-ma, Thank you so much for your PR! We really like your After studying this PR, we realized that some of the ideas in this PR could be applied with fewer changes. This is implemented in: #148. Best, |
Hi, thanks for the update! The idea behind this PR was to completely replace the use of |
Oh, very interesting! I'd be interested in knowing whether AST resolves: #45. Either you/(Kyle and I) can compare the performance of code using only |
Given this code where import timeit
from tap import Tap
from tap.tap import NewTap
from tap.utils import _old_get_class_variables, get_class_variables
class SimpleArgumentParser(Tap):
name: str # Your name
language: str = "Python" # Programming language
package: str = "Tap" # Package name
stars: int # Number of stars
max_stars: int = 5 # Maximum stars
class NewSimpleArgumentParser(NewTap):
name: str # Your name
language: str = "Python" # Programming language
package: str = "Tap" # Package name
stars: int # Number of stars
max_stars: int = 5 # Maximum stars
if __name__ == "__main__":
number = 1000
number_cls = 100
t_new_cls = timeit.timeit(
"NewSimpleArgumentParser()",
globals=globals(),
number=number_cls,
)
t_old_cls = timeit.timeit(
"SimpleArgumentParser()",
globals=globals(),
number=number_cls,
)
t_new = timeit.timeit(
"get_class_variables(SimpleArgumentParser)",
globals=globals(),
number=number,
)
t_old = timeit.timeit(
"_old_get_class_variables(SimpleArgumentParser)",
globals=globals(),
number=number,
)
increase = (t_old - t_new) / t_old
increase_cls = (t_old_cls - t_new_cls) / t_old_cls
print(f"time new get_class_variables: {t_new / number * 1000:.4f} ms per call")
print(f"time old get_class_variables: {t_old / number_cls * 1000:.4f} ms per call")
print(f"new is {increase * 100:.2f}% faster")
print(f"time new class init: {t_new_cls / number * 1000:.4f} ms per call")
print(f"time old class init: {t_old_cls / number_cls * 1000:.4f} ms per call")
print(f"new is {increase_cls * 100:.2f}% faster") I get:
It's better, but it won't solve #45 since it is |
I made a specific PR (#149) related to performance, you can ignore this one |
Perfect! Thank you for the update! Kyle and I will discuss it next time we meet. I approved running the tests. |
Instead of parsing by hand token by token, ast returns a python object that just needs to be read. We can still make mistakes when reading this object, but no more inevitable edge cases from parsing the source code.
Closes #97
Closes #130