UP my solution #122

Open
wants to merge 2 commits into
base: main
91 changes: 65 additions & 26 deletions numpy_questions.py
@@ -1,19 +1,51 @@
"""Assignment - using numpy and making a PR.

The goals of this assignment are:
* Use numpy in practice with two easy exercises.
* Use automated tools to validate the code (`pytest` and `flake8`)
* Submit a Pull-Request on github to practice `git`.

The two functions below are skeleton functions. The docstrings explain what
are the inputs, the outputs and the expected error. Fill the function to
complete the assignment. The code should be able to pass the test that we
wrote. To run the tests, use `pytest test_numpy_question.py` at the root of
the repo. It should say that 2 tests ran with success.

We also ask to respect the pep8 convention: https://pep8.org.
This will be enforced with `flake8`. You can check that there is no flake8
errors by calling `flake8` at the root of the repo.
"""Assignment - making a sklearn estimator and cv splitter.

The goal of this assignment is to implement by yourself:

- a scikit-learn estimator for the KNearestNeighbors for classification
tasks and check that it is working properly.
- a scikit-learn CV splitter where the splits are based on a Pandas
DateTimeIndex.

Detailed instructions for question 1:
The nearest neighbor classifier predicts for a point X_i the target y_k of
the training sample X_k which is the closest to X_i. We measure proximity with
the Euclidean distance. The model will be evaluated with the accuracy (fraction
of samples correctly classified). You need to implement the `fit`,
`predict` and `score` methods for this class. The code you write should pass
the test we implemented. You can run the tests by calling at the root of the
repo `pytest test_sklearn_questions.py`. Note that to be fully valid, a
scikit-learn estimator needs to check that the inputs given to `fit` and
`predict` are correct using the `check_*` functions imported in the file.
You can find more information on how they should be used in the following doc:
https://scikit-learn.org/stable/developers/develop.html#rolling-your-own-estimator.
Make sure to use them to pass `test_nearest_neighbor_check_estimator`.


Detailed instructions for question 2:
The data to split should contain the index or one column in
datetime format. The aim is to split the data between train and test
sets such that, for each pair of successive months, we learn on the first and
predict on the following. For example, if you have data distributed from
November 2020 to March 2021, you have 4 splits. The first split
will allow to learn on November data and predict on December data, the
second split to learn on December and predict on January, etc.

We also ask you to respect the pep8 convention: https://pep8.org. This will be
enforced with `flake8`. You can check that there are no flake8 errors by
calling `flake8` at the root of the repo.

Finally, you need to write docstrings for the methods you code and for the
class. The docstring will be checked using `pydocstyle` that you can also
call at the root of the repo.

Hints
-----
- You can use the function:

from sklearn.metrics.pairwise import pairwise_distances

to compute distances between 2 sets of samples.
"""
import numpy as np
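
For reference, the two tasks described in the new docstring can be sketched in plain NumPy. This is only an illustration under stated assumptions: the function names `nearest_neighbor_predict` and `month_pair_splits` are hypothetical, and the sketch deliberately leaves out the scikit-learn estimator API, the `check_*` validators, and `pairwise_distances` that the assignment text asks for.

```python
import numpy as np


def nearest_neighbor_predict(X_train, y_train, X_test):
    """Return, for each test row, the label of the closest training row."""
    X_train = np.asarray(X_train, dtype=float)
    X_test = np.asarray(X_test, dtype=float)
    y_train = np.asarray(y_train)
    # Squared Euclidean distances, shape (n_test, n_train), via broadcasting.
    diff = X_test[:, np.newaxis, :] - X_train[np.newaxis, :, :]
    sq_dist = np.sum(diff ** 2, axis=2)
    # The index of the smallest distance selects the predicted label.
    return y_train[np.argmin(sq_dist, axis=1)]


def month_pair_splits(dates):
    """Yield (train_idx, test_idx) for each pair of successive months."""
    # Truncate every date to its month so that samples can be grouped.
    days = np.array(dates, dtype="datetime64[D]")
    months = days.astype("datetime64[M]")
    unique_months = np.unique(months)  # sorted, one entry per month present
    for first, second in zip(unique_months[:-1], unique_months[1:]):
        train_idx = np.flatnonzero(months == first)
        test_idx = np.flatnonzero(months == second)
        yield train_idx, test_idx
```

With dates ranging from November 2020 to March 2021, `month_pair_splits` yields 4 splits, which matches the example given in the docstring.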

@@ -29,20 +61,21 @@ def max_index(X):
Returns
-------
(i, j) : tuple(int)
The row and columnd index of the maximum.
The row and column index of the maximum.

Raises
------
ValueError
If the input is not a numpy array or
if the shape is not 2D.
"""
i = 0
j = 0

# TODO

return i, j
    if not isinstance(X, np.ndarray):
        raise ValueError("Input must be a numpy array.")
    if X.ndim != 2:
        raise ValueError("Input must be a 2D numpy array.")
    # Find the index of the maximum element
    max_pos = np.unravel_index(np.argmax(X), X.shape)
    return max_pos
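
A small worked example may help when reviewing the `np.argmax` / `np.unravel_index` combination used here. The array `X` below is made up for illustration; the sketch only shows how the flat index returned by `np.argmax` maps back to a (row, column) pair.

```python
import numpy as np

# Hypothetical 2 x 3 input; the maximum (9) sits at row 1, column 1.
X = np.array([[1, 7, 3],
              [4, 9, 2]])

# np.argmax without an axis works on the flattened array [1, 7, 3, 4, 9, 2].
flat_index = np.argmax(X)

# np.unravel_index converts the flat index back to 2D coordinates.
i, j = np.unravel_index(flat_index, X.shape)
```

Here `flat_index` is 4 and `(i, j)` is `(1, 1)`. If the maximum occurs more than once, `np.argmax` reports the first occurrence in row-major order.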


def wallis_product(n_terms):
@@ -57,11 +90,17 @@ def wallis_product(n_terms):
Number of steps in the Wallis product. Note that `n_terms=0` will
consider the product to be `1`.


Returns
-------
pi : float
The approximation of order `n_terms` of pi using the Wallis product.
"""
# XXX : The n_terms is an int that corresponds to the number of
# terms in the product. For example 10000.
return 0.
    if n_terms == 0:
        # The empty product equals 1, so the approximation of pi is 2 * 1.
        return 2.0

    product = 1.0
    for n in range(1, n_terms + 1):
        # n-th Wallis factor: (2n / (2n - 1)) * (2n / (2n + 1)).
        term = (4 * n**2) / (4 * n**2 - 1)
        product *= term
    return 2 * product
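
As a possible alternative to the explicit loop, the same product can be computed with NumPy array operations. This is a sketch, not the submitted solution: the name `wallis_product_vectorized` is hypothetical. It relies on `np.prod` of an empty array being 1, so `n_terms=0` needs no special case.

```python
import math

import numpy as np


def wallis_product_vectorized(n_terms):
    """Approximate pi with the first `n_terms` factors of the Wallis product."""
    # n = 1, 2, ..., n_terms (an empty array when n_terms is 0).
    n = np.arange(1, n_terms + 1, dtype=float)
    # Each factor is 4n^2 / (4n^2 - 1); their product converges to pi / 2.
    terms = 4 * n**2 / (4 * n**2 - 1)
    return 2.0 * float(np.prod(terms))


# Convergence is slow: the error shrinks roughly like 1 / n_terms.
error = abs(wallis_product_vectorized(100_000) - math.pi)
```

The loop version and this one should agree up to floating-point rounding; which one reads better for the assignment is a matter of taste.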