
Metadata API: comprehensive de/serialization testing #1391

Closed
jku opened this issue May 14, 2021 · 6 comments · Fixed by #1416
Labels: backlog (Issues to address with priority for current development goals), testing

Comments

jku (Member) commented May 14, 2021

It's really hard to avoid de/serialization issues like #1389 and #1386.

Can we improve the testing so that a large number of different JSON representations are deserialized (and then serialized), so that things like missing fields and e.g. empty containers are covered? I'm not even talking about testing broken data (that would be nice, but it's not the first priority), just valid data in all possible forms.

This should be much more doable now that we have the smaller objects: testing the "complete API" of small classes is a much more reasonable goal than testing the whole metadata.

I don't have practical advice right now on how this could be implemented without a lot of inline dictionaries...

joshuagl (Member) commented
Is this a place where fuzzing could add some confidence that would be harder to achieve with designed test cases? Cc @sechkova

jku (Member, Author) commented May 20, 2021

Pasting a related comment from chat:

The way we currently write the tests makes it very difficult to see what is tested: the test data is built incrementally in the middle of the test, so you need to read the whole test very carefully to see whether a specific case is handled.

Parameterized testing could help now that we have the smaller classes. There are third-party test dependencies for that, but even plain unittest (on Python >= 3.4) includes subTest(), which could be used:

testdata = {
    "with attr": {"attr": "value"},
    "missing attr": {},
}

def test_data(self):
    for case, data in testdata.items():
        with self.subTest(case=case):
            pass  # deserialize/serialize data here

The value is that A) the error message will now include case="missing attr", and B) all cases are run even if one sub-test fails.

An additional decision here is what checks need to be made:

  • at least do a round of deserialize + serialize and compare the result to the input
  • maybe also compare against an object created with the constructor rather than deserialization? This requires more test data
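The round-trip check in the first bullet can be captured in a tiny helper. A minimal sketch, with a toy Role class standing in for a real Metadata API class (the helper and class names here are illustrative, not from the codebase):

```python
import copy

def assert_roundtrip(cls, data):
    """Deserialize `data`, serialize it back, and check the result
    matches the original dictionary exactly."""
    obj = cls.from_dict(copy.deepcopy(data))
    result = obj.to_dict()
    assert result == data, f"roundtrip mismatch: {result!r} != {data!r}"
    return obj

# Toy stand-in for a Metadata API class, for demonstration only.
class Role:
    def __init__(self, keyids, threshold):
        self.keyids = keyids
        self.threshold = threshold

    @classmethod
    def from_dict(cls, role_dict):
        return cls(role_dict["keyids"], role_dict["threshold"])

    def to_dict(self):
        return {"keyids": self.keyids, "threshold": self.threshold}

role = assert_roundtrip(Role, {"keyids": ["abc"], "threshold": 1})
```

Deep-copying the input before deserializing matters: it ensures the comparison catches a from_dict that mutates its argument.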

MVrachev (Collaborator) commented

I started working on tests for this.
The way I imagine it:

  1. The tests will focus on positive cases with valid data.
  2. Each test will cover both deserialization and serialization.
  3. Every class should be tested.
  4. Every attribute in each class should be tested against a couple of variations of valid data.
    4.1 If there is a limited number of options for an attribute, test with all valid options.
    4.2 If there are no limits on what value an attribute can take (for example, a 64-character string), test only a couple of values.
  5. Each time an attribute has many valid options, generate random valid data.
    For example, to generate a 64-character string containing lowercase letters and digits:
''.join(random.choice(string.ascii_lowercase + string.digits) for _ in range(64))
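The one-liner in step 5 could be wrapped in a small helper to make the intent explicit (the function name random_lowercase_digits is made up for illustration):

```python
import random
import string

def random_lowercase_digits(length: int = 64) -> str:
    """Return a random string of lowercase letters and digits of the
    given length, e.g. to stand in for a keyid-like value in tests."""
    alphabet = string.ascii_lowercase + string.digits
    return "".join(random.choice(alphabet) for _ in range(length))
```

Calling random.seed() with a fixed value in test setup keeps any failures involving such generated data reproducible.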

jku (Member, Author) commented May 20, 2021

My advice is not to get into detailed tests where different variations of attributes are tested, at least not yet: it's too easy to end up with more and more test code where no one is really sure what actually gets tested.

I'd like to see readable test cases where I can immediately see, e.g., that Targets is tested with

  • empty targets dict
  • targets dict with data
  • no delegations
  • delegations

in a way that makes it easy to add new cases when we notice that something would be useful.

I'm hoping this could be accomplished so that only the test data varies from case to case, while the actual test code stays the same... You actually already do this in at least the invalid-delegations test (EDIT: actually that may have been me). So, a hypothetical example:

valid_target_cases = {
    "all attributes": TODO-TEST-DATA1,
    "empty targets": TODO-TEST-DATA2,
    "no delegations": TODO-TEST-DATA3,
    # Add things here when new ideas appear
}
def test_serialization_valid_targets(self):
    for case, data in valid_target_cases.items():
        with self.subTest(case=case):
            targets = Targets.from_dict(copy.deepcopy(data))
            self.assertDictEqual(targets.to_dict(), data)
            # maybe other checks?

This is of course worth it (compared to the current way of testing) only if the bits marked TODO are easy to construct and the whole thing remains readable. It may be worth experimenting with writing minimal JSON by hand, at least for the simplest classes... but also with generating the data -- maybe using functions written for the purpose, or just by defining fragments that can be re-used in many tests (VALID_DELEGATIONS = {"keys": {VALID_KEYID: VALID_KEY}, "roles": [VALID_ROLE]})?
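A sketch of that fragment idea, assuming made-up but shape-plausible values (none of these constants come from the real test suite, and the dicts only loosely imitate TUF metadata shapes):

```python
import copy

# Hypothetical reusable fragments shared across many test cases.
VALID_KEYID = "f" * 64
VALID_KEY = {
    "keytype": "ed25519",
    "scheme": "ed25519",
    "keyval": {"public": "a" * 64},
}
VALID_ROLE = {
    "name": "role1",
    "keyids": [VALID_KEYID],
    "threshold": 1,
    "terminating": False,
    "paths": ["targets/*"],
}
VALID_DELEGATIONS = {"keys": {VALID_KEYID: VALID_KEY}, "roles": [VALID_ROLE]}

# Each case stays short and readable because it re-uses fragments;
# deepcopy keeps cases independent if a test mutates its data.
valid_targets_cases = {
    "no delegations": {"targets": {}},
    "with delegations": {
        "targets": {},
        "delegations": copy.deepcopy(VALID_DELEGATIONS),
    },
}
```

The payoff is that a reader scanning valid_targets_cases can tell at a glance which shapes are covered, without tracing any generation code.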

I'm waving my hands wildly, as you can see (so no hard opinions here), but I do believe more comprehensive testing requires some level of parameterization: otherwise the test code becomes a mess no one wants to touch.

MVrachev (Collaborator) commented

I have a couple of questions/observations here:

  1. If I create simple tests aiming to cover all possible valid (de)serialization cases, then we have that coverage, but it's spread across different tests. Each time I make an attribute optional, I add a test case for it.
    Then, if we want clear, comprehensive (de)serialization tests, we should delete the other tests covering the same cases.
    Examples: 71c4992, 139bfc0 and de2644f.
  2. If we are going to test serialization and deserialization separately, I imagine it's better to put this in a different test class, like TestSerializationDeserialization, where we could use local variables to store valid values (as you mentioned above).
  3. Maybe it would be even better to move those tests into a separate module?

@MVrachev MVrachev added the backlog Issues to address with priority for current development goals label May 25, 2021
@sechkova sechkova added this to the weeks22-23 milestone May 26, 2021
jku (Member, Author) commented May 27, 2021

What I would like to see in serialization testing is that each component has well-defined test sets and a minimal amount of code. These test datasets should be easy to read, so that when I open the file I can quickly check whether a case is handled and, if not, easily create a new test case.

There are two main issues with the current code, and these complaints partly apply to the PR here:

  • understanding which cases are tested is difficult, because test data is read from external files and/or modified in the test code itself
  • the test implementation scales badly: adding a new case means either modifying the external file, which is dangerous because it's used by many other tests (modifying it will have unexpected consequences), or adding code to the test function, which makes it even more complex and harder to understand

There are probably many ways to address these. I have a quick prototype of DelegatedRole testing (it's three commits on top of this PR: one is just a generic example, one is the helper decorator, and the last commit contains the actual DelegatedRole test): https://github.com/jku/tuf/commits/test-example

  • Uses a decorator to make the test code itself simpler -- the test now literally only needs to deal with one DelegatedRole
  • The code only contains three test cases (to match the code it replaces), but it seems to scale acceptably well to 10 or more cases, and adding tests for invalid cases seems simple as well
  • The test function does not need setUp(), setUpClass(), temp directories, or file reads: it's self-contained
  • It's easy to read the test dataset: compare this to the implementation it replaces, which had two bugs in the test data that I could not find until I print-debugged the test. The new test dataset probably has bugs too, but they can be found just by reading the test.
  • I did not try to go advanced here, but this model also allows more interesting things: maybe we want to include an actual DelegatedRole object in the test dataset along with the JSON. Assuming our classes had __eq__() implemented, we could then compare the object instances in the test.
  • It's obvious that some classes' serialized forms will not fit on one line, and in some cases we may want test data more realistic than the one in my example. I would expect that some light data generation, and a few lines per test case, could still keep things acceptable. E.g. in the targets test data we could use "keys": {"k1": K1, "k2": K2}, "roles": [DR1, DR2] to fill in needed data with constant snippets, keeping the test dataset readable enough that a reader can still figure out what a particular case is testing.

I did not spend a lot of time on this, so there are probably weird decisions in the code (does it make sense for the test input to be str? 🤷 does the decorator really need three nested functions? 🤷 how does one actually fill constant snippets into JSON, since f-strings and JSON both want to use double quotes? 🤷), so please consider it a prototype.
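A decorator along those lines might look like the following sketch. This is not the actual prototype code: the decorator name and dataset are hypothetical, and a plain JSON round-trip stands in for Metadata API from_dict/to_dict calls. The "three nested functions" arise because the decorator takes an argument:

```python
import functools
import json
import unittest

def run_sub_tests_with_dataset(dataset):
    """Decorator factory: run the wrapped test once per dataset entry,
    each inside self.subTest, passing the case data as an argument."""
    def real_decorator(function):           # receives the test function
        @functools.wraps(function)
        def wrapper(self):                  # replaces the test function
            for case, data in dataset.items():
                with self.subTest(case=case):
                    function(self, data)
        return wrapper
    return real_decorator

class TestExample(unittest.TestCase):
    # str values sidestep the quoting clash between f-strings and JSON.
    valid_cases = {
        "int attr": '{"n": 1}',
        "list attr": '{"n": [1, 2]}',
    }

    @run_sub_tests_with_dataset(valid_cases)
    def test_roundtrip(self, case_data):
        parsed = json.loads(case_data)
        # Stand-in for from_dict/to_dict: serialize back and compare.
        self.assertEqual(json.loads(json.dumps(parsed)), parsed)
```

With this shape, adding a case means adding one dictionary entry; the test body itself never changes, and a failure report names the offending case.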

MVrachev added a commit to MVrachev/tuf that referenced this issue Jun 3, 2021
Jussi, in his comment here:
theupdateframework#1391 (comment)
proposed using decorators when creating comprehensive tests
for metadata serialization.
The main problems he pointed out are:
1) a lot of code is needed to generate the data for each case
2) the test implementation scales badly: when you want to add new
cases to your tests, you have to add code as well
3) the dictionary format is not visible - we are loading external files
and assuming they are unchanged and valid

In this change, I am using a decorator with an argument, which complicates
the implementation of the decorator and requires three nested functions,
but the advantage is that it resolves the above three problems:
1) we don't need new code when adding a new test case
2) only a small amount of hardcoded data is required for each new test
3) the dictionaries are all in the test module, with no need to
create new directories or copy data.

Signed-off-by: Martin Vrachev <[email protected]>
@sechkova sechkova modified the milestones: weeks22-23, weeks24-25 Jun 9, 2021
MVrachev added a commit to MVrachev/tuf that referenced this issue Jun 9, 2021
MVrachev added a commit to MVrachev/tuf that referenced this issue Jun 9, 2021
MVrachev added a commit to MVrachev/tuf that referenced this issue Jun 9, 2021
MVrachev added a commit to MVrachev/tuf that referenced this issue Jun 10, 2021
MVrachev added a commit to MVrachev/tuf that referenced this issue Jun 14, 2021
The idea of this commit is to separate (de)serialization testing out of
test_api.py and make sure we are testing from_dict/to_dict with all
possible valid data for all classes.

Signed-off-by: Martin Vrachev <[email protected]>
MVrachev added a commit to MVrachev/tuf that referenced this issue Jun 21, 2021
MVrachev added a commit to MVrachev/tuf that referenced this issue Jun 22, 2021
MVrachev added a commit to MVrachev/tuf that referenced this issue Jun 22, 2021
MVrachev added a commit to MVrachev/tuf that referenced this issue Jun 22, 2021
MVrachev added a commit to MVrachev/tuf that referenced this issue Jun 22, 2021
@jku jku closed this as completed in #1416 Jun 23, 2021
4 participants