Showing 10 changed files with 3,389 additions and 0 deletions.
1,580 changes: 1,580 additions & 0 deletions
notebooks/L6/advanced-data-processing-with-pandas.ipynb
Large diffs are not rendered by default.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,75 @@ | ||
{ | ||
"cells": [ | ||
{ | ||
"cell_type": "markdown", | ||
"metadata": {}, | ||
"source": [ | ||
"# Debugging your Python code\n", | ||
"\n", | ||
"Debugging your code can be time-consuming, and it probably already takes up most of the time you devote to working on it. There is no way to avoid spending time fixing bugs, especially when you're learning to program and may not yet know how to solve a given programming problem. That said, there are ways to debug more effectively, which can save you time and frustration. Below, we review some tips for debugging.\n", | ||
"\n", | ||
"## Source\n", | ||
"\n", | ||
"This lesson is based in part on the [Software Carpentry group's lesson on debugging](http://swcarpentry.github.io/python-novice-inflammation/09-debugging/).\n", | ||
"\n", | ||
"## Test your code with known outputs\n", | ||
"\n", | ||
"One of the biggest challenges in debugging your code, once you have solved the syntax issues, is knowing whether the code actually works as it should. To assess this, we need to know the \"answer\" the code should produce. In many cases this means calculating a known value *in some form* using simplified data or test cases.\n", | ||
"\n", | ||
"### Testing with a simplified data file\n", | ||
"\n", | ||
"Let's consider an example of calculating the maximum difference in daily temperature in Helsinki using observations for the past 50 years. First, we don't know the answer in advance, so we cannot simply work on the code until it gives the expected temperature difference. Second, we can expect more than 18 000 observations in our data file for the past 50 years, so the size of the dataset makes it hard to confirm that we get the right answer. One thing that can help here is to test your program using a small subset of the data. For instance, we could take the top 5 lines of data from the file, which might look like the following:\n", | ||
"\n", | ||
"```\n", | ||
"STATION ELEVATION LATITUDE LONGITUDE DATE PRCP TAVG TMAX TMIN \n", | ||
"----------------- ---------- ---------- ---------- -------- -------- -------- -------- -------- \n", | ||
"GHCND:FIE00142080 51 60.3269 24.9603 19520101 0.31 37 39 34 \n", | ||
"GHCND:FIE00142080 51 60.3269 24.9603 19520102 -9999 35 37 34 \n", | ||
"GHCND:FIE00142080 51 60.3269 24.9603 19520103 0.14 33 36 -9999 \n", | ||
"GHCND:FIE00142080 51 60.3269 24.9603 19520104 0.05 29 30 25 \n", | ||
"GHCND:FIE00142080 51 60.3269 24.9603 19520105 0.06 27 30 25 \n", | ||
"```\n", | ||
"\n", | ||
"From this, we know two things:\n", | ||
"\n", | ||
"1. We should expect the code to handle no-data values equal to `-9999`, and to exclude days with no-data values when calculating the maximum temperature difference (`TMAX` - `TMIN`).\n", | ||
"2. The maximum temperature difference should be 5° if we test our code with this data file.\n", | ||
"\n", | ||
"In this case, we now know that if we can get our code to return the correct answer with the small test file, we can be more confident that it will also work for the full dataset. In other cases, we may actually know the expected answer, in which case debugging should be a bit easier.\n", | ||
"\n", | ||
"## Make your code crash quickly and regularly\n", | ||
"\n", | ||
"This may sound silly, but it is a good thing when your code crashes the same way every time you run it. If your code behaves differently when you run it several times without making changes, it will be much more difficult to isolate the problem. What we ideally want from our code is behavior that is **consistent**.\n", | ||
"\n", | ||
"In addition, if you expect to debug your program efficiently, you can't afford to wait 30 minutes for it to crash every time you run it. If your code crashes when processing a massive data file, you can consider testing with some smaller part of the file. Does the code still crash? If not, why not? Are there some parts of the code that seem to run just fine every time? If you can reduce the time needed for a crash, and isolate where in the code the problem lies (perhaps in memory if you're dealing with really large datasets), you will save yourself time debugging.\n", | ||
"\n", | ||
"## Make small changes and track them\n", | ||
"\n", | ||
"We're teaching you to use [GitHub.com](https://github.com/) to store your work, and to commit your changes regularly. This is for two reasons. First, by keeping track of the changes, you will have a better chance of isolating a problem if you find that suddenly your code doesn't work. You can simply go back to a version of the code that worked and look at what has changed in the version that doesn't work. **This is probably the greatest thing about version control**. Second, if you make small changes to the code it is easier to see exactly what changed and where. When it comes to debugging, this is one of the keys to solving problems quickly.\n", | ||
"\n", | ||
"It is worth noting that we often don't track and commit every single small change to our code, but rather commit when things are working as we expected. This means that when you debug, you might not keep track of every little change you make. This is fine, but it is important when you are debugging that you make small changes in one part of the code, then re-test. You should change one thing at a time, test the code, and make more changes if needed. Changing several things at once might be appealing, but it will make it harder to see exactly what is causing the problem because you can't isolate the issue to a single line of the program." | ||
] | ||
} | ||
], | ||
"metadata": { | ||
"kernelspec": { | ||
"display_name": "Python 3", | ||
"language": "python", | ||
"name": "python3" | ||
}, | ||
"language_info": { | ||
"codemirror_mode": { | ||
"name": "ipython", | ||
"version": 3 | ||
}, | ||
"file_extension": ".py", | ||
"mimetype": "text/x-python", | ||
"name": "python", | ||
"nbconvert_exporter": "python", | ||
"pygments_lexer": "ipython3", | ||
"version": "3.7.3" | ||
} | ||
}, | ||
"nbformat": 4, | ||
"nbformat_minor": 4 | ||
} |
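
The idea of testing with a simplified data file, described in the debugging notebook above, could be sketched roughly as follows. This is only a minimal illustration, not the lesson's own code: the function name `max_temp_difference`, the constant `NO_DATA`, and the `sample` list are hypothetical names, and the `TMAX` and `TMIN` values are copied by hand from the five sample rows rather than read from a file.

```python
# Hypothetical names for illustration; not part of the original lesson.
NO_DATA = -9999

# (DATE, TMAX, TMIN) values copied from the five-row sample file
sample = [
    ("19520101", 39, 34),
    ("19520102", 37, 34),
    ("19520103", 36, -9999),  # TMIN is a no-data value
    ("19520104", 30, 25),
    ("19520105", 30, 25),
]


def max_temp_difference(rows, no_data=NO_DATA):
    """Return the largest TMAX - TMIN, skipping rows with no-data values."""
    differences = [
        tmax - tmin
        for _date, tmax, tmin in rows
        if tmax != no_data and tmin != no_data
    ]
    if not differences:
        raise ValueError("No valid observations to compare")
    return max(differences)


result = max_temp_difference(sample)

# The known answer for the small test file is 5, so check against it
assert result == 5, f"Expected 5, got {result}"
print("Maximum temperature difference:", result)
```

If the no-data check were left out, the third row would give a difference of more than 10 000 and the assertion would fail right away, which is exactly the kind of quick, consistent failure that makes debugging easier.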
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,247 @@ | ||
{ | ||
"cells": [ | ||
{ | ||
"cell_type": "markdown", | ||
"metadata": {}, | ||
"source": [ | ||
"# Dealing with errors\n", | ||
"\n", | ||
"## Interpreting error messages\n", | ||
"\n", | ||
"So far in the course we have encountered a number of different types of error messages in Python, but have not really discussed how to understand what the computer is trying to tell you when you get an error message.\n", | ||
"We'll do that below.\n", | ||
"For most Python errors you will see an exception raised when the error is encountered, providing some insight into what went wrong and where to look to fix it." | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"metadata": {}, | ||
"source": [ | ||
"### Reading error messages\n", | ||
"\n", | ||
"Let's imagine you've written the code below to convert wind speeds from km/hr to m/s and you're dying to figure out how windy it is in [Halifax, Nova Scotia, Canada](https://www.theweathernetwork.com/ca/weather/nova-scotia/halifax), where they report wind speeds in km/hr.\n", | ||
"\n", | ||
"Unfortunately, when you run your script you observe the following:" | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": null, | ||
"metadata": {}, | ||
"outputs": [], | ||
"source": [ | ||
"wind_speed_km = 50\n", | ||
"wind_speed_ms = wind_speed_km * 1000 / 3600\n", | ||
"\n", | ||
"print('A wind speed of', wind_speed_km, 'km/hr is', wind_speed_ms, 'm/s.)" | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"metadata": {}, | ||
"source": [ | ||
"Let's break this example down and see what the error message says.\n", | ||
"\n", | ||
"![Syntax error](img/error-message-annotated.png)\n", | ||
"*A SyntaxError, annotated*\n", | ||
"\n", | ||
"As you can see, there is quite a bit of useful information here. We have the name of the script, its location, and which line was a problem. It's always good to double check that you actually are editing the correct script when looking for errors! We also have the type of error, a `SyntaxError` in this case, as well as where it occurred on the line, and a bit more information about its meaning. The location on the line won't always be correct, but Python makes its best guess for where you should look to solve the problem. Clearly, this is handy information.\n", | ||
"\n", | ||
"Let's consider another example, where you have fixed the `SyntaxError` above and have now made a function for calculating wind speeds in m/s.\n", | ||
"\n", | ||
"When you run this script you encounter a new and bigger error message:" | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": null, | ||
"metadata": {}, | ||
"outputs": [], | ||
"source": [ | ||
"def convert_wind_speed(speed):\n", | ||
"    return speed * 1000 / 3600\n", | ||
"\n", | ||
"wind_speed_km = '30'\n", | ||
"wind_speed_ms = convert_wind_speed(wind_speed_km)\n", | ||
"\n", | ||
"print('A wind speed of', wind_speed_km, 'km/hr is', wind_speed_ms, 'm/s.')" | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"metadata": {}, | ||
"source": [ | ||
"In this case we see a `TypeError` as part of a *traceback*, where the problem arises somewhere other than the line from which the code was run. Here, the `TypeError` occurs because we try to divide a character string by a number, something Python cannot do, and the error message indicates that the data types are not compatible. The error does not occur, however, until the function is called. Thus, the traceback shows both the line where the function is called and the line inside the function definition where the problem actually lies.\n", | ||
"\n", | ||
"The traceback above may look a bit scarier, but if you take your time and read through what is written there, you will again find that the information is helpful in finding the problem in your code. After all, the purpose of the error message is to help the user find a problem :)." | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"metadata": {}, | ||
"source": [ | ||
"### Common errors and exceptions\n", | ||
"\n", | ||
"Now that we have some idea of how to read an error message, let's have a look at a few different types of common Python exceptions that are displayed for different program errors.\n", | ||
"\n", | ||
"#### IndexErrors\n", | ||
"\n", | ||
"An `IndexError` occurs when you attempt to reference a value with an index outside the range of values.\n", | ||
"We can easily produce an `IndexError` by trying to access the value at index 5 in the following list of cities: `cities = ['Paris', 'Berlin', 'London']`." | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": null, | ||
"metadata": {}, | ||
"outputs": [], | ||
"source": [ | ||
"cities = ['Paris', 'Berlin', 'London']" | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": null, | ||
"metadata": {}, | ||
"outputs": [], | ||
"source": [ | ||
"cities[5]" | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"metadata": {}, | ||
"source": [ | ||
"Here we get the rather clear error message that the index used for the list `cities` is out of the range of index values for that list." | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"metadata": {}, | ||
"source": [ | ||
"#### NameErrors\n", | ||
"\n", | ||
"A `NameError` occurs when you reference a variable that has not been defined. We can produce a `NameError` by trying `station_id = stations[1]`." | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": null, | ||
"metadata": {}, | ||
"outputs": [], | ||
"source": [ | ||
"station_id = stations[1]" | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"metadata": {}, | ||
"source": [ | ||
"In this instance we receive a `NameError` because the list `stations` has not been defined, and we're thus not able to access a value in that list." | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"metadata": {}, | ||
"source": [ | ||
"#### IndentationErrors\n", | ||
"\n", | ||
"An `IndentationError` is raised whenever a code block is expected to be indented and is either not, or is indented inconsistently. Let's consider two examples below." | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": null, | ||
"metadata": {}, | ||
"outputs": [], | ||
"source": [ | ||
"for city in cities:\n", | ||
"    city = city + ' is a city in Europe'\n", | ||
"  print(city)" | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": null, | ||
"metadata": {}, | ||
"outputs": [], | ||
"source": [ | ||
"for city in cities:\n", | ||
"city = city + ' is a city in Europe'\n", | ||
"print(city)" | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"metadata": {}, | ||
"source": [ | ||
"In both of the examples above, an `IndentationError` is raised. In the first case, the indentation level is inconsistent. In case two, indentation is expected for the code below a `for` statement." | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"metadata": {}, | ||
"source": [ | ||
"#### TypeErrors\n", | ||
"\n", | ||
"A `TypeError` is raised whenever two incompatible data types are used together. For example, if we try to divide a character string by a number or add a number to a character string, a `TypeError` will be raised." | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": null, | ||
"metadata": {}, | ||
"outputs": [], | ||
"source": [ | ||
"cities[0] / 5" | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"metadata": {}, | ||
"source": [ | ||
"In this case, the `TypeError` is because it is not possible to divide a character string by a number." | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"metadata": {}, | ||
"source": [ | ||
"#### Other kinds of errors\n", | ||
"\n", | ||
"There are certainly [other kinds of errors and exceptions in Python](https://docs.python.org/3/tutorial/errors.html), but this list comprises those you're most likely to encounter.\n", | ||
"As you can see, knowing the name of each error can be helpful in trying to figure out what has gone wrong, and knowing what these common error types mean will save you time trying to fix your programs." | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"metadata": {}, | ||
"source": [ | ||
"### More information\n", | ||
"\n", | ||
"You can find a bit more information about reading error messages on the [Software Carpentry](http://swcarpentry.github.io/python-novice-inflammation/07-errors/) and [Python Software Foundation](https://docs.python.org/3/tutorial/errors.html) webpages." | ||
] | ||
} | ||
], | ||
"metadata": { | ||
"kernelspec": { | ||
"display_name": "Python 3", | ||
"language": "python", | ||
"name": "python3" | ||
}, | ||
"language_info": { | ||
"codemirror_mode": { | ||
"name": "ipython", | ||
"version": 3 | ||
}, | ||
"file_extension": ".py", | ||
"mimetype": "text/x-python", | ||
"name": "python", | ||
"nbconvert_exporter": "python", | ||
"pygments_lexer": "ipython3", | ||
"version": "3.7.3" | ||
} | ||
}, | ||
"nbformat": 4, | ||
"nbformat_minor": 4 | ||
} |
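
To connect the traceback discussion in the notebook above to something you can run, here is a small sketch that catches the `TypeError` from the wind speed example and prints its name and message. The `try`/`except` block, the `error_name` variable, and the conversion with `float()` are our own additions for illustration, and converting the string to a number is only one possible fix, assuming the string really does contain a number.

```python
def convert_wind_speed(speed):
    """Convert a wind speed from km/hr to m/s."""
    return speed * 1000 / 3600


wind_speed_km = '30'  # a character string, as in the failing example

try:
    wind_speed_ms = convert_wind_speed(wind_speed_km)
except TypeError as error:
    # The exception name and message are the same as those shown
    # on the last line of the traceback
    error_name = type(error).__name__
    print(error_name, '-', error)
    # One possible fix (an assumption): convert the string to a number
    wind_speed_ms = convert_wind_speed(float(wind_speed_km))

print('A wind speed of', wind_speed_km, 'km/hr is', round(wind_speed_ms, 2), 'm/s.')
```

Catching the exception here is only a way to inspect it. When debugging, it is usually better to let the code crash and read the full traceback, then fix the data type where the value is defined.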