Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Adding unit tests, golden files, and supplemental labels #10

Closed
wants to merge 50 commits into from

Conversation

esimpsons3ti
Copy link
Collaborator

@esimpsons3ti esimpsons3ti commented Apr 9, 2024

This pull request will introduce the unit tests for the indexing tools.

Fixes #7

Fixes #6

Fixes #5

@esimpsons3ti esimpsons3ti marked this pull request as ready for review April 23, 2024 15:09
@esimpsons3ti
Copy link
Collaborator Author

This is now in a state to be actually looked at. The changes also now include new functionality in the current program to do the following:

  • Now reads elements file as glob statements
  • square brackets are replaced by angled brackets
  • Comments within label are now removed automatically
  • And more!

Let me know if you have any questions.

@esimpsons3ti
Copy link
Collaborator Author

I have added the new command line argument function, now named --dump-available-elements.

Fixes #8

Copy link
Collaborator

@rfrenchseti rfrenchseti left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • I haven't tried to run the code, so these comments are just from visual inspection.
  • I know it's a pain, but give some consideration to what I suggested earlier about creating the golden files by hand before running the tests. Right now I think the golden files are being generated by the code, and there are cases where they are wrong. If you had written them by hand first, you would have written the correct answer and then found a bug when running the test. It takes a lot longer to do than just auto-generating them and looking at them to see if you think they're correct, but you will catch more bugs this way.
  • All files related to testing should go under the single test_files directory. There shouldn't be a separate top-level samples directory. Then you can create sub-directories under there for different purposes. For example, test_files/sample_input and test_files/expected. Then you don't need to add _success to the end of your filenames, although it doesn't hurt to leave it there.
  • Your input files seem to be split between two locations, e.g. test_files/tester_label_1.xml and samples/sample_elements.txt. You could put all these under test_files/sample_input or divide them further like test_files/sample_input/element_files and test_files/sample_input/labels.
  • The clean headers option needs to do something with < and > since those are also illegal identifiers in strings. I think it would be easiest just to replace <1> with _1.

)

def test_sort_by(golden_file, new_file, cmd_line):
# Create a temporary directory in the same location as the test_files directory
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do you want it in the same location as the test_files directory? Just let the OS put it where it normally puts temp files.

)
]
)

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would not have a blank line between the @pytest.mark.parametrize and the function definition. That way it's clear they are bound together as one entity.




# Set parameters values that you would like to pass into test_sort_by.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Rather than explaining how @pytest.mark.parametrize works, which people should already know if they're looking at a pytest file, I think it would be more useful to explain what you're actually testing in each section.


# These two variables are the same for all tests, so we can either declare them as
# global variables, or get the root_dir at the setup stage before running each test
root_dir = Path(__file__).resolve().parent.parent
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

By convention, global variables are usually written in all-caps.

assert config_object['pds:ASCII_Short_String_Collapsed']['anticipated'] == 'anticipated_alt'


# now, a bad config file
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Indent comment

i += 1
label_results[key_new] = label_results.pop(key)


Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Too many blank lines

'{'+namespace+'}', prefixes[namespace]+':')
xml_results[key_new] = xml_results[key]
del xml_results[key]
elements = ()
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a real nit, but since tuples are immutable they are not meant to be appended to. If you append an object to a tuple with +=, it creates a whole new tuple from scratch, copying over all the old data. However, lists are mutable and are meant to be extended. So x.append(y) is an efficient operation, while x += (y,) (where x is a tuple) is less efficient.

for part in parts:
if '<' in part:
part = part.split('<')
elements += (part[0],)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you really only want to include elements that have <n> at the end?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

At this point in the code, all elements within every XPath header should have <n> at the end.

for item in elements:
file.write("%s\n" % item)
else:

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Remove blank line

negated.append(pat)
else:
positive.append(pat)
for key, value in input_dict.items():
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One thing that might be useful here is allowing the "verbose" option to give details about what is happening here. For example, you could print:

Adding elements due to: **/*a*
  /pds:/whatever
  /pds:/whatever
Deleting elements due to: !**/*ab*
  /whatever

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Write unit tests for the index tools Need "golden copy" index files Need "golden copy" labels
2 participants