Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Expose get_json_object_options to Python #11180

Conversation

SrikarVanavasam
Copy link
Contributor

This PR exposes get_json_object_options to the Python API. Addresses #10196

@SrikarVanavasam SrikarVanavasam requested a review from a team as a code owner June 30, 2022 21:39
@github-actions github-actions bot added the Python Affects Python cuDF API. label Jun 30, 2022
@SrikarVanavasam SrikarVanavasam added 3 - Ready for Review Ready for review by team improvement Improvement / enhancement to an existing function non-breaking Non-breaking change labels Jun 30, 2022
Copy link
Contributor

@shwina shwina left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks really good! Nice work, @SrikarVanavasam!

@shwina
Copy link
Contributor

shwina commented Jul 6, 2022

OK to test

@ajschmidt8
Copy link
Member

ok to test

bdice
bdice previously requested changes Jul 6, 2022
Copy link
Contributor

@bdice bdice left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @SrikarVanavasam, thanks for the PR. I have a few suggestions for improvement.

python/cudf/cudf/_lib/cpp/strings/json.pxd Outdated Show resolved Hide resolved
python/cudf/cudf/_lib/strings/json.pyx Outdated Show resolved Hide resolved
python/cudf/cudf/core/column/string.py Outdated Show resolved Hide resolved
python/cudf/cudf/core/column/string.py Outdated Show resolved Hide resolved
python/cudf/cudf/tests/test_string.py Outdated Show resolved Hide resolved
python/cudf/cudf/tests/test_string.py Outdated Show resolved Hide resolved
python/cudf/cudf/tests/test_string.py Show resolved Hide resolved
@ajschmidt8
Copy link
Member

rerun tests

@ajschmidt8 ajschmidt8 added non-breaking Non-breaking change and removed non-breaking Non-breaking change labels Jul 6, 2022
@ajschmidt8 ajschmidt8 changed the title Expose get_json_object_options to Python Expose get_json_object_options to Python Jul 6, 2022
@ajschmidt8 ajschmidt8 changed the title Expose get_json_object_options to Python Expose get_json_object_options to Python Jul 6, 2022
@SrikarVanavasam SrikarVanavasam requested a review from bdice July 6, 2022 20:29
@codecov
Copy link

codecov bot commented Jul 6, 2022

Codecov Report

❗ No coverage uploaded for pull request base (branch-22.08@f9ebaf1). Click here to learn what that means.
The diff coverage is n/a.

@@               Coverage Diff               @@
##             branch-22.08   #11180   +/-   ##
===============================================
  Coverage                ?   86.30%           
===============================================
  Files                   ?      144           
  Lines                   ?    22699           
  Branches                ?        0           
===============================================
  Hits                    ?    19590           
  Misses                  ?     3109           
  Partials                ?        0           

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update f9ebaf1...3b9f14f. Read the comment docs.

@SrikarVanavasam SrikarVanavasam requested a review from shwina July 7, 2022 16:31
@vyasr
Copy link
Contributor

vyasr commented Jul 11, 2022

@gpucibot merge

@vyasr vyasr dismissed bdice’s stale review July 11, 2022 21:45

Multiple approving reviews

@vyasr
Copy link
Contributor

vyasr commented Jul 11, 2022

@gpucibot merge

@rapids-bot rapids-bot bot merged commit 002cb1c into rapidsai:branch-22.08 Jul 11, 2022
Comment on lines +3132 to +3155
def test_string_get_json_object_allow_single_quotes():
gs = cudf.Series(
[
"""
{
"store":{
"book":[
{
'author':"Nigel Rees",
"title":'Sayings of the Century',
"price":8.95
},
{
"category":"fiction",
"author":"Evelyn Waugh",
'title':"Sword of Honour",
"price":12.99
}
]
}
}
"""
]
)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps prefer

@pytest.fixture
def gs():
     return cudf.Series(...)


def test-string_get_json_object_allow_single_quotes(gs)
    ...

To avoid repeating the definition of the series object in each test.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good suggestion. @vyasr would this be in line with the recommendations for writing test? Or would this be better written in a pytest-case?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think @wence-'s suggestion is already a big improvement here. I would indeed prefer to use cases, but before making the argument for, I want to be clear on the fact that the difference between cases and fixtures is almost entirely one of semantics and providing nicer syntactic sugar. @wence- perhaps the discussion will help you give me some suggestions on how to improve with regards to your comment on my other PR, or maybe it'll provide you ammunition for a counterpoint that makes me change my mind!

Here's how I would write something like this with cases:

def json_object_case_single_quotes():
    return cudf.Series(...)

@parametrize_with_cases(gs, cases=".", prefix="json_object_case_")
def test_string_get_json_object(gs):
    ....

@parametrize_with_cases("gs", cases=".", prefix="json_object_case_")
def test_string_get_json_object(gs):
    ....

@parametrize_with_cases("gs", cases=".", prefix="json_object_case_")
def test_string_get_json_object_strip_quotes(gs):

Some notes:

  • I've included the cases in the same file here for brevity, but for a test file test_foo.py cases will be automatically detected from test_foo_cases.py. This has the benefit of enforcing some organization that you would otherwise have to do manually. The alternatives are cluttering the test file (or worse, conftest.py).
  • Cases are designed for parametrization, whereas fixtures are not. By that, what I mean is that a test runs on a single fixture, not many. (pytest-cases actually allows you to relax this restriction in some ways, but not as naturally as with cases. Also, it leads to cognitive dissonance since that's a way in which pytest-cases behaves differently from pytest rather than just adding functionality, so it could confuse users)
  • With cases you can select in various ways: prefix is an obvious choice, but you can also use globs or tag specific cases, for instance. That makes it much easier to create a suite of cases for a particular function, then write a set of tests for that function, and then create a many-to-many mapping between cases and the tests that those cases should be used for.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So let me see if I understand. In this instance, I think there is no difference in the behaviour. So I will try and explain what I think and then provide an example that shows differences?

A pytest.fixture is requested by a test by name.

@pytest.fixture
def foo():
     pass

def test_foo(foo):
     # fixture foo is invoked

So any parametrisation of fixtures happens by params:

@pytest.fixture(params=[1,2,3])
def foo(request):
     return request.param

def test_foo(foo):
    # invoked three times with foo=1, foo=2, foo=3

In the example we're discussing, the object we're testing is not parameterised by anything, so we're using a fixture for OAOO, and to avoid eager initialisation. This is better than just defining a module-level variable in a number of ways (including that the object is recreated for each test in turn, so stateful modifications between tests are not a problem).

In contrast, I think for the pytest-cases approach if I want a test argument to take multiple values I would write separate cases:

def my_case_the_first():
     ...

def my_case_the_second():
    ...

@parameterize_with_cases("foo", cases=".", prefix="my_case_")
def test_foo(foo):
    ... # test is called twice, first with foo=my_case_the_first(), then with foo=my_case_the_second()

So I can create a bunch of test cases and then select how I use them in my test functions.

So now my fixtures can have descriptive names, but I can bind them locally in the test to something short: I agree this makes fixtures more self-documenting.

I guess the other big thing that pytest-cases brings is that it augments the algebra of parameterized fixtures to union of sets rather than only cartesian product (though we're not using that here).

So. In sum, I can see the advantages for cases in the general case (hah!) when building lots of fixtures that one will use for many test functions (and need to manipulate in various ways).

In this particular instance, I'm not sold on it, since it seems to introduce more boilerplate than really necessary. Perhaps just using @fixture from pytest-cases rather than pytest would be reasonable in this case.

Copy link
Contributor

@vyasr vyasr Jul 14, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In this particular instance, I'm not sold on it, since it seems to introduce more boilerplate than really necessary.

Just to clarify, the boilerplate that you're referring to is the need to decorate each function with parametrize_with_cases, right? And there's not much benefit here because 1) we only have a single case, and 2) the exact same case is used in a large number of tests? If so, I think that I agree.

Using some of your points, let's try to come up with some reasonable guidelines for when using cases vs fixtures.

Use fixtures when:

  • Many tests must be run on exactly the same input
  • Many tests must be run on exactly the same set of inputs, and all of those inputs can be constructed as simple parametrizations.

Use cases when:

  • A test must be run on many different kinds of inputs
  • Given a set of cases, different tests need to run on different subsets with a nonempty intersection
  • Many tests must be run on exactly the same set of inputs, and all of those inputs can be constructed as simple complex parametrizations.

We would also need to clearly define "simple parametrization". To me, it's something where a single scalar parameter varies, and where that parameter is directly passed to some other function (like a constructor). Something like this:

@pytest.fixture(params=['a', 'b'])
def foo(request):
    if request.param == 'a':
        # Some complex initialization
    elif request.param == 'b':
        # Some other complex initialization

would constitute a fixture that should instead be implemented as separate cases.

Do these align with your expectations @wence- ?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I think that's what I expect. Although I note that the last two points of the "use X when:" lists are the same, was the cases one meant to say "complex parameterizations" ?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Correct, that is what I meant. Fixed.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
3 - Ready for Review Ready for review by team improvement Improvement / enhancement to an existing function non-breaking Non-breaking change Python Affects Python cuDF API.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants