feat(data_classes): return empty dict or list instead of None #4606

Merged

Conversation

@ericbn ericbn commented Jun 22, 2024

Issue number: #2605

Summary

Changes

Returning an empty dict or list instead of None simplifies the code both internally and for users.

Also wrap all headers in CaseInsensitiveDict from requests.

These changes make utility functions such as get_header_value, get_query_string_value, and get_multi_value_query_string_values unnecessary, so they are removed.

User experience

Before:

encoding = event.get_header_value(name="accept-encoding", default_value="")
params = event.query_string_parameters or {}
id = params["id"]

After:

encoding = event.headers.get("accept-encoding", "")
id = event.query_string_parameters["id"]
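
The difference can be sketched with plain dictionaries standing in for events. This is only an illustration under assumptions: `old_style_params` and `new_style_params` are hypothetical helpers, not Powertools APIs, and the `queryStringParameters` key is assumed for the raw event.

```python
from typing import Any, Dict, Optional


def old_style_params(event: Dict[str, Any]) -> Optional[Dict[str, str]]:
    # Hypothetical stand-in for the previous behavior: None when the value is absent or null.
    return event.get("queryStringParameters")


def new_style_params(event: Dict[str, Any]) -> Dict[str, str]:
    # Hypothetical stand-in for the new behavior: always a dict, possibly empty.
    return event.get("queryStringParameters") or {}


event_without_params: Dict[str, Any] = {"queryStringParameters": None}

# Old style: callers had to guard against None before reading a key.
params = old_style_params(event_without_params) or {}
old_id = params.get("id", "missing")

# New style: the guard lives in one place, so callers can use the dict directly.
new_id = new_style_params(event_without_params).get("id", "missing")
```

With the guard moved into the accessor, subscripting (`["id"]`) and `.get("id")` both behave like they do on any ordinary dict.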

Checklist

If your change doesn't seem to apply, please leave them unchecked.

Is this a breaking change?

RFC issue number:

Checklist:

  • Migration process documented
  • Implement warnings (if it can live side by side)

Acknowledgment

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

Disclaimer: We value your time and bandwidth. As such, any pull requests created on non-triaged issues might not be successful.

@ericbn ericbn requested a review from a team as a code owner June 22, 2024 18:37
@boring-cyborg boring-cyborg bot added documentation Improvements or additions to documentation event_handlers labels Jun 22, 2024
@pull-request-size pull-request-size bot added the size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. label Jun 22, 2024
@ericbn ericbn force-pushed the return-empty-instead-of-none branch from 722d27f to c728d2f Compare June 22, 2024 18:54
@leandrodamascena
Contributor

Hello @ericbn! I am positively surprised by this PR. This is something we really need to fix in V3 and it would take us days to do it, but I'm glad you're shortening the path!

I started taking a quick look at this PR and it looks like we're headed in the right direction! I saw small things that perhaps we can improve or understand better. I will start reviewing this PR on Monday and hope to get it merged next week!

Thank you very much for this amazing work! 🚀

@leandrodamascena
Contributor

Updating this PR: I started the review last week, but as there are a lot of details I plan to finish it this week.

Contributor

@leandrodamascena leandrodamascena left a comment

Hello @ericbn! I have started reviewing this PR and have initial feedback that may block further review. Can you check my comments and address them?

I would like to inform you that reviewing this PR will be an excellent source of new learning for me, I am very excited about it. Thanks!

CHANGELOG.md (outdated, resolved)
aws_lambda_powertools/event_handler/appsync.py (outdated, resolved)
aws_lambda_powertools/utilities/data_classes/alb_event.py (outdated, resolved)
ericbn added 2 commits July 4, 2024 10:03
This is hopefully a simpler implementation than the one in the requests
package, but it still had to be minimally complex to be complete.
@ericbn
Contributor Author

ericbn commented Jul 4, 2024

@leandrodamascena, PR updated.

@leandrodamascena
Contributor

@leandrodamascena, PR updated.

Perfect @ericbn! I'll continue the review today!

@leandrodamascena
Contributor

Just a quick note: I'm still reviewing this PR.

Contributor

@leandrodamascena leandrodamascena left a comment

I did another review. It will take a few more rounds before we approve it because this is huge and there are a lot of changes.

One thing I need to test a little more is where we return CaseInsensitiveDict instead of self.get("headers"), for example. This returns the dictionary without lower-casing the keys, correct? Or do you always lower-case them?

Thank you very much for your patience, Eric. This PR brings a lot of improvements to Powertools v3; it is, without a doubt, one of the great milestones of this major release.
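
To make the question about key casing concrete, here is a minimal sketch of the "always lower-case" variant. It is not the class from this PR: `LowercasingDict` is a hypothetical name, and the thread does not state which behavior the PR's CaseInsensitiveDict has. For comparison, the CaseInsensitiveDict in requests remembers the casing of the last key that was set, whereas this sketch discards it.

```python
class LowercasingDict(dict):
    """Hypothetical sketch: lower-cases keys on write and on lookup (string keys only)."""

    def __init__(self, data=None):
        super().__init__()
        for key, value in (data or {}).items():
            self[key] = value  # goes through __setitem__, so keys are normalized

    def __setitem__(self, key, value):
        super().__setitem__(key.lower(), value)

    def __getitem__(self, key):
        return super().__getitem__(key.lower())

    def __contains__(self, key):
        return super().__contains__(key.lower())

    def get(self, key, default=None):
        # dict.get does not call __getitem__, so it needs its own override.
        return super().get(key.lower(), default)


headers = LowercasingDict({"Content-Type": "application/json"})
content_type = headers["CONTENT-TYPE"]   # lookup ignores case
stored_keys = list(headers)              # original casing is not preserved
```

A complete implementation would also need to cover `update`, `pop`, `setdefault`, and equality, which is likely why even a simple version ends up "minimally complex".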

@ericbn
Contributor Author

ericbn commented Jul 10, 2024

@leandrodamascena, I've updated the PR with the changes you proposed. Super productive conversation during the review BTW!

@leandrodamascena
Contributor

@leandrodamascena, I've updated the PR with the changes you proposed. Super productive conversation during the review BTW!

Thanks for addressing them so quickly! And I say the same about the conversation. In open source projects, approving things sometimes takes time, but working collaboratively makes things easier. I will continue the review. 🚀

Contributor

@leandrodamascena leandrodamascena left a comment

Hello @ericbn! I left some other comments but we're almost there!
There's only one thing I'm still testing in the CaseInsensitiveDict class, but I don't expect it to turn up anything new.

I hope to merge this PR this week! Thanks a lot!

@ericbn
Contributor Author

ericbn commented Jul 23, 2024

Hi @leandrodamascena.

Some key points worth considering regarding your comments. In no particular order:

  • Code is supposed to communicate intent to other programmers reading it later. I like to think that doing my_dict["key"] communicates "I expect 'key' to exist in my_dict" and doing my_dict.get("key") communicates "'key' might not exist in my_dict sometimes and it's fine"...
  • We should get an unhandled exception when something unexpected happens, at least in the Python mindset (other languages, such as JavaScript, started from the idea of failing silently instead).
  • Failing fast is better than hiding errors.
  • In Python, there's a preference for the "Easier to Ask Forgiveness Than Permission" style over the "Look Before You Leap" style.
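
As a small illustration of the two styles named in the last bullet, using a plain dict in place of event headers (the header names are only examples):

```python
headers = {"accept-encoding": "gzip"}

# Look Before You Leap: check for the key first, then read it.
if "x-api-key" in headers:
    lbyl_result = headers["x-api-key"]
else:
    lbyl_result = "no key"

# Easier to Ask Forgiveness Than Permission: read the key and
# handle the failure only where a fallback is actually wanted.
try:
    eafp_result = headers["x-api-key"]
except KeyError:
    eafp_result = "no key"
```

Both branches end with the same value here; the difference is that the second style lets the KeyError propagate by default whenever no handler is written.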

This is a good example of the first points above:

assert parsed_event.headers["Authorization"] == "value"

It's saying "I expect the Authorization header to exist and to have a value equal to 'value'". If the header didn't exist, the test would fail with KeyError. If the header did exist and had a different value, the assertion would fail with AssertionError. If it were a get instead, we would always get an AssertionError, but lose the ability to distinguish between "header did not exist" and "header exists and has a different value". Also, we would be delaying the error to the assert statement, arguably hiding the original cause of the error. And using get instead would, I think, communicate a confusing intent.
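
The two failure modes described above can be reproduced with plain dicts. `classify_failure` is a hypothetical helper written for this illustration only:

```python
def classify_failure(headers, use_get):
    """Return the name of the exception the assertion raises, or "passed"."""
    try:
        if use_get:
            assert headers.get("Authorization") == "value"
        else:
            assert headers["Authorization"] == "value"
    except KeyError:
        return "KeyError"
    except AssertionError:
        return "AssertionError"
    return "passed"


missing = {}                          # header does not exist
wrong = {"Authorization": "other"}    # header exists with a different value

# Subscripting keeps the two situations apart.
subscript_results = (
    classify_failure(missing, use_get=False),
    classify_failure(wrong, use_get=False),
)

# get() collapses both into the same AssertionError.
get_results = (
    classify_failure(missing, use_get=True),
    classify_failure(wrong, use_get=True),
)
```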

> How about using .get()? So we prevent the customer from having problems with KeyError.

I didn't use get in the cases where I thought getting a KeyError would be a better outcome than hiding the fact that the header didn't exist and replacing it with a default value. I also didn't use get where the key is expected and the subsequent code would fail without it. This is another good example of some of the points above:

api_key = app.current_event.headers["X-Api-Key"]
todos: Response = requests.get(endpoint, headers={"X-Api-Key": api_key})
todos.raise_for_status()

If we used get here instead, we would make a call to the endpoint with an empty X-Api-Key. This would fail at the moment of trying to do the request, which hides the original cause of the error: the X-Api-Key header didn't exist in app.current_event.headers.
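
A sketch of where each variant fails, without any network call. `call_endpoint` is a hypothetical stand-in for the remote service and is assumed to reject an empty key:

```python
def call_endpoint(request_headers):
    # Hypothetical stand-in for the remote service: rejects an empty API key.
    if not request_headers["X-Api-Key"]:
        raise PermissionError("endpoint rejected empty X-Api-Key")
    return "ok"


def handler_with_get(event_headers):
    # The missing header is replaced by "", so the failure surfaces later, at the endpoint.
    api_key = event_headers.get("X-Api-Key", "")
    return call_endpoint({"X-Api-Key": api_key})


def handler_with_subscript(event_headers):
    # The missing header fails right here, with a KeyError naming the key.
    api_key = event_headers["X-Api-Key"]
    return call_endpoint({"X-Api-Key": api_key})


incoming = {}  # the caller did not send X-Api-Key
```

Calling `handler_with_get(incoming)` raises the endpoint's error, while `handler_with_subscript(incoming)` raises `KeyError` before any request is attempted.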

-return self.request_headers.get("x-api-key", "")
+return self.request_headers.get("x-api-key") or ""

Here it's good to think about the difference between my_dict.get("key", "") and my_dict.get("key") or "". The former returns "" only when "key" is not found; the latter results in "" when "key" either was not found or had a falsy value (None, "", etc.). If the key exists with the value "", evaluating "" or "" is redundant. And header values cannot be None by definition, so the None or "" case will never happen. Given all this, the latter is overkill and arguably communicates a confusing intent.

I think we're using d.get(key) or {} in the data classes to avoid having to think about the nature of each specific piece of code and to keep the methods homogeneous. Not doing so would risk a bad copy-and-paste: a d.get(key, {}) pasted where a d.get(key) or {} was needed, because the new code has a different nature from the code it was copied from.

The specific case here has a clear nature: since header values cannot be None, we can generalize all header get code to header.get(key, "") when an empty default value is better than failing because the key was not found.
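
The difference between the two forms discussed above can be checked directly on plain dicts. The `None` case is included only for contrast, since the comment notes it cannot occur for real header values:

```python
cases = {
    "missing": {},
    "empty": {"x-api-key": ""},
    "present": {"x-api-key": "abc"},
    "none": {"x-api-key": None},  # not possible for headers; shown for contrast
}

# Default argument: applies only when the key is absent.
with_default = {name: d.get("x-api-key", "") for name, d in cases.items()}

# `or` fallback: applies whenever the looked-up value is falsy.
with_or = {name: d.get("x-api-key") or "" for name, d in cases.items()}
```

The two forms differ only in the `"none"` case: `with_default["none"]` is `None`, while `with_or["none"]` is `""`. That is the situation the data classes guard against with `d.get(key) or {}`.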

Does it make sense and can we revisit your comments with these ideas in mind?

Contributor Author

@ericbn ericbn left a comment

A couple of places where we could do "key" not in headers instead, now that this is possible with the new API.
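
One reading of this, sketched under the assumption that headers are now always a mapping and never None: a membership test no longer needs a guard. The variable names are illustrative only.

```python
always_a_dict = {}   # stand-in for headers under the new behavior
maybe_none = None    # stand-in for headers that could previously be None

# With a mapping, the membership test works even when there are no headers.
missing_when_dict = "authorization" not in always_a_dict

# With None, the same test raises instead of answering the question.
try:
    result = "authorization" not in maybe_none
    none_error = None
except TypeError as exc:
    none_error = type(exc).__name__
```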

@leandrodamascena
Contributor

Does it make sense and can we revisit your comments with these ideas in mind?

You make strong points here, @ericbn! Thank you for sharing so much knowledge and experience; I agree with the intention to fail fast. My argument was that we could prevent the customer's code from failing in the Lambda environment, since it might be part of an event-driven architecture, for example, and failing because of a missing key is the kind of thing that could perhaps be avoided. On the other hand, if the code does not fail, it may create side effects in the business logic.

Anyway, if customers want to fail silently or avoid the exception, they still have the option of using get.

I'm going to resolve all comments.

Contributor

@leandrodamascena leandrodamascena left a comment

Hi @ericbn! Thank you for your patience after the long back and forth until we got here. This PR is approved and we plan to create an alpha release of V3 this week!
Once again, thank you very much for your work here!

@leandrodamascena leandrodamascena merged commit 1fa7773 into aws-powertools:v3 Jul 29, 2024
9 checks passed
@ericbn ericbn deleted the return-empty-instead-of-none branch July 29, 2024 21:28