Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feature: Added Cerberus as a new option for item validation #201

Closed
wants to merge 85 commits into from

Conversation

vipulgupta2048
Copy link
Contributor

@vipulgupta2048 vipulgupta2048 commented Aug 20, 2019

🐣
This brings an end to a great and fulfilling period of contributing to Spidermon and the Scrapy Project as part of Google Summer of Code 2019.

Google Summer of Code 2019 with The Scrapy Project

Project - Integrate Cerberus, solves #182,
Project Description- Google Archive

Co-Org Admin - Cathal
Mentors - @rennerocha , @ejulio
Personal GSoC Blog - Mixster x GSoC
PSF Blog - https://blogs.python-gsoc.org/en/blogs/vipulgupta2048s-blog/

Description

  • Spidermon is a recommended tool for monitoring spiders created using Scrapy. The user at the time can choose between two libraries for item validation rules: jsonschema and schematics. We want to provide a third option that being Cerberus.
  • Cerberus provides powerful yet simple and lightweight data validation functionality out of the box and is designed to be easily extensible, allowing for custom validation. It has no dependencies and is thoroughly tested on several Python versions.
  • The goal of this project was to integrate, test and enable Cerberus as a new option for item validation available for the user.

Deliverables & Work Done

  1. All Code of highest quality standards having detailed documentation, black​ styling and well tested (Pull request – feature: Added Cerberus as a new option for item validation #201)
    This Pull Request Includes:
  1. Unit + integration tests for each component in place.
  2. Documentation for Cerberus Validation method. (Spidermon x Cerberus Documentation vipulgupta2048/spidermon#6)
  3. A ​detailed, well-documented tutorial​ will be developed during the course of the summers implementing almost every feature of Spidermon to help developers as a reference and blogs will be written.
  4. One blog ​each week​ regarding Spidermon and my experience, learning through the project on ​Mixster x GSoC.
  5. For the community to track progress, a ​tracker was maintained with my latest developments containing week-to-week updates​, and MoM of mentor meetings​. This helps to maintain accountability​, transparency and keeping track.​

For system testing, one could go ahead and use the pre-configured Quotes spider https://github.com/vipulgupta2048/testing_quotes and installing Spidermon from the master branch of my fork.

Looking forward to improving the source code even further, through all your valued opinions, reviews, and comments. Would love to clarify and help understand the work done.

This project has been completed with long nights of reading and writing the code, learning new concepts on the fly and asking hundreds of pop-questions on Slack, that were answered duly by my mentors @ejulio @rennerocha as without their constant help, motivation, and guidance completing this uphill task wouldn't be ever possible.

Signed-off-by: Vipul Gupta (@vipulgupta2048) <[email protected]>
* Cerberus Validator : Validate Only

Signed-off-by: Vipul Gupta (@vipulgupta2048) <[email protected]>

* Implement Translator

Signed-off-by: Vipul Gupta (@vipulgupta2048) <[email protected]>

* Changes implemented as suggested

Signed-off-by: Vipul Gupta (@vipulgupta2048) <[email protected]>

* Change location for validator tests

Signed-off-by: Vipul Gupta (@vipulgupta2048) <[email protected]>

* Changes Implemented as suggested, also code formatted

Signed-off-by: Vipul Gupta (@vipulgupta2048) <[email protected]>

* Changes implemented as suggested

Signed-off-by: Vipul Gupta (@vipulgupta2048) <[email protected]>

* Ported to PyTest

Signed-off-by: Vipul Gupta (@vipulgupta2048) <[email protected]>

* Changes implemented as suggested

Signed-off-by: Vipul Gupta (@vipulgupta2048) <[email protected]>

* Remove extra lines
Signed-off-by: Vipul Gupta (@vipulgupta2048) <[email protected]>
a
Signed-off-by: Vipul Gupta (@vipulgupta2048) <[email protected]>
Signed-off-by: Vipul Gupta (@vipulgupta2048) <[email protected]>
Signed-off-by: Vipul Gupta (@vipulgupta2048) <[email protected]>
Signed-off-by: Vipul Gupta (@vipulgupta2048) <[email protected]>
Signed-off-by: Vipul Gupta (@vipulgupta2048) <[email protected]>
Signed-off-by: Vipul Gupta (@vipulgupta2048) <[email protected]>
Signed-off-by: Vipul Gupta (@vipulgupta2048) <[email protected]>
Signed-off-by: Vipul Gupta (@vipulgupta2048) <[email protected]>
.travis.yml Outdated Show resolved Hide resolved
docs/source/getting-started.rst Outdated Show resolved Hide resolved
docs/source/item-validation.rst Show resolved Hide resolved
docs/source/item-validation.rst Show resolved Hide resolved
spidermon/contrib/validation/utils.py Outdated Show resolved Hide resolved
spidermon/contrib/validation/utils.py Show resolved Hide resolved
spidermon/contrib/validation/utils.py Show resolved Hide resolved
)
else:
schema = load_object(source)
if isinstance(schema, six.string_types):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The documentation does not mention either that the schema may be specified as a string.

I wonder if we should remove support for this, and instead allow paths to JSON files regardless of their file extension, just as we don’t require a file extension or Content-Type header when using a URL.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From my perspective, I feel like adding raw schemas directly to the settings.py file as strings for validation is something users wouldn't feel like doing. Also, only my code (Cerberus Validation) has tests and docs for adding raw schemas because I thought that it would be a good feature to work with.

Bit of context behind this change:
The code is from the original file is /jsonschema/tools.py, and it was moved, tweaked, and was made available for both Cerberus and JSONSchema to use. Hence, these lines are from the original file only.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@rennerocha What are your thoughts on this?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@rennerocha How should we proceed?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@rennerocha pinging again. 😅

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@rennerocha Pinging again 🙈

setup.py Outdated Show resolved Hide resolved
docs/source/item-validation.rst Show resolved Hide resolved
docs/source/item-validation.rst Show resolved Hide resolved
spidermon/contrib/validation/utils.py Show resolved Hide resolved
Copy link
Contributor Author

@vipulgupta2048 vipulgupta2048 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @rennerocha, we need your comments in some reviews 😅, have you checked them out yet.

@rennerocha rennerocha added this to the 1.12.0 milestone Dec 3, 2019
@rennerocha rennerocha removed this from the 1.12.0 milestone Dec 30, 2019
@rennerocha
Copy link
Collaborator

We will not add a new item validation library as built-in in Spidermon now, so we will close this PR. If needed, it can be created as a separated package.

@rennerocha rennerocha closed this Dec 15, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants