Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Return scan results in dataframe fixes #95 #99

Merged
merged 21 commits into from
Dec 22, 2021

Conversation

anilkulkarni87
Copy link
Contributor

This is still work in progress. The required functions have been added to the scan.py.

Changes made:

  1. measurements_to_data_frame
  2. testresults_to_data_frame
  3. scanerror_to_data_frame
  4. convert_scan_result_to_spark_data_frames --> Leverages the above functions to return a tuple of Dataframes.

The user has the option to use any of these methods depending on the usecase.

TODO:

  1. Tests for the new methods.
  2. Add as_frame flag to scan function.
  3. Tests for Schema check.

@anilkulkarni87 anilkulkarni87 marked this pull request as draft December 13, 2021 09:18
@JCZuurmond
Copy link
Contributor

Looks good to me! If you add the todo's you mentioned, we have the functionality mentioned in the issue.

pro tip if you add "fixes #95" to the PR description, the issue gets closed automagically when this PR is merged, see here

src/sodaspark/scan.py Outdated Show resolved Hide resolved
@anilkulkarni87 anilkulkarni87 changed the title Return scan resuts in dataframe Return scan resuts in dataframe fixed #95 Dec 14, 2021
@anilkulkarni87 anilkulkarni87 changed the title Return scan resuts in dataframe fixed #95 Return scan results in dataframe fixes #95 Dec 14, 2021
tests/test_scan.py Outdated Show resolved Hide resolved
tests/test_scan.py Outdated Show resolved Hide resolved
src/sodaspark/scan.py Outdated Show resolved Hide resolved
JCZuurmond and others added 4 commits December 16, 2021 09:02
* Use version range for Soda spark dependency
* Rewrite spark version range
* Add port and host attributes
Add example test for converting test results to a data frame
@anilkulkarni87
Copy link
Contributor Author

I have redefined the tests and added below methods:

  • test_measurements_to_data_frame
  • test_test_results_to_data_frame
  • test_scanerror_to_data_frame

TODO:
Add as_frame flag to execute function in scan.py
Tests for Schema check - This may not be required i think.

Copy link
Contributor

@JCZuurmond JCZuurmond left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Awesome! Great work, this is the approach I would expect. I have some remarks about the to_dict and the `select

src/sodaspark/scan.py Outdated Show resolved Hide resolved
src/sodaspark/scan.py Outdated Show resolved Hide resolved
tests/test_scan.py Outdated Show resolved Hide resolved
@anilkulkarni87
Copy link
Contributor Author

@JCZuurmond
Add as_frame flag to execute function in scan.py
Added a test test_scan_execute_return_as_data_frame

Please review

@anilkulkarni87 anilkulkarni87 marked this pull request as ready for review December 20, 2021 22:20
Copy link
Contributor

@JCZuurmond JCZuurmond left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Very nicely done! I have no major remarks, added a couple minor comments to keep the style of the code more consistent. Could mark the PR as non-draft after you finished resolving the comments?

I'll have one more look at it tomorrow when I have more time. And, I think, we should be able to merge it than.

tests/test_scan.py Outdated Show resolved Hide resolved
src/sodaspark/scan.py Outdated Show resolved Hide resolved
src/sodaspark/scan.py Outdated Show resolved Hide resolved
return out


def testresults_to_data_frame(testresults: list[TestResult]) -> DataFrame:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you make testresults two words? test_results, in function name and parameter

src/sodaspark/scan.py Outdated Show resolved Hide resolved
tests/test_scan.py Outdated Show resolved Hide resolved
@JCZuurmond
Copy link
Contributor

One more important thing! Could add your change to the change log, you deserve the credits for this! You can add your change to the top bullet list with a reference to this PR. And, you can add a contributors section (above the latest release and below the top bullet list) with a reference to your Github profile. See here for an example.

@anilkulkarni87
Copy link
Contributor Author

@JCZuurmond Updated and rebased the branch. Also had to update the schema for Test. All tests are now passing.
Please review and merge.

@JCZuurmond JCZuurmond merged commit a25986f into sodadata:main Dec 22, 2021
JCZuurmond added a commit that referenced this pull request Dec 22, 2021
* Return scan resuts in dataframe
* Update TestResults schema
* Add test for converting test results to a data frame
* Added Tests for Dataframes
* Add optional flag as_frame to execute
* Fix schemas for measurements and scan_error
* Update schema for Test

Co-authored-by: Cor <[email protected]>
JCZuurmond added a commit that referenced this pull request Dec 22, 2021
- Provides the ability to get the scan results in Dataframes. ([#99](#99))
  - measurements
  - test_results
  - scan_errors
- Use version range for `soda-spark-sql` dependency
- Add `host` and `port` attributes to `_SparkDialect`

Contributors:
- [Anil Kulkarni](https://github.com/anilkulkarni87) ([#99](#99))
@JCZuurmond JCZuurmond mentioned this pull request Dec 22, 2021
JCZuurmond added a commit that referenced this pull request Dec 22, 2021
- Provides the ability to get the scan results in Dataframes. ([#99](#99))
  - measurements
  - test_results
  - scan_errors
- Use version range for `soda-spark-sql` dependency
- Add `host` and `port` attributes to `_SparkDialect`

Contributors:
- [Anil Kulkarni](https://github.com/anilkulkarni87) ([#99](#99))
JCZuurmond added a commit that referenced this pull request Dec 27, 2021
- Provides the ability to get the scan results in Dataframes. ([#99](#99))
  - measurements
  - test_results
  - scan_errors
- Use version range for `soda-spark-sql` dependency
- Add `host` and `port` attributes to `_SparkDialect`

Contributors:
- [Anil Kulkarni](https://github.com/anilkulkarni87) ([#99](#99))
@anilkulkarni87 anilkulkarni87 deleted the scan-results-in-dataframe branch April 1, 2022 07:41
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants