Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Metadata Survey Updates #460

Merged
merged 10 commits into from
Oct 4, 2022
Merged

Metadata Survey Updates #460

merged 10 commits into from
Oct 4, 2022

Conversation

cassidysymons
Copy link
Collaborator

Updating the Metadata repo to reflect recent survey changes:

  1. Adding MyFoodRepo, the Polyphenol FFQ, and the Spain FFQ to the ignored survey templates, as they're external surveys and can't be retrieved from the database.
  2. Adding all of the fields in the locally hosted Cooking Oils & Oxalate-rich Foods survey to the EBI_REMOVE constant, as they don't exist in Qiita.

Updating the Metadata repo to reflect recent survey changes:
1) Adding MyFoodRepo, the Polyphenol FFQ, and the Spain FFQ to the ignored survey templates, as they're external surveys and can't be retrieved from the database.
2) Adding all of the fields in the locally hosted Cooking Oils & Oxalate-rich Foods survey to the EBI_REMOVE constant, as they don't exist in Qiita.
@wasade
Copy link
Member

wasade commented Sep 22, 2022

Can we add the fields to Qiita so they are accessible for analysis?

@cassidysymons
Copy link
Collaborator Author

Can we add the fields to Qiita so they are accessible for analysis?

I'm all for that as long as it won't cause any problems. Is that something I can/should do, or do I request it from someone else?

@wasade
Copy link
Member

wasade commented Sep 22, 2022 via email

@wasade
Copy link
Member

wasade commented Sep 22, 2022

It's not, see

# gather the categories currently used in qiita. we have to have parity
# with the categories when pushing
cats_in_qiita = qclient.get('/api/v1/study/10317/samples/info')
cats_in_qiita = set(cats_in_qiita['categories'])

@antgonza, is there an easy way to add metadata columns to a qiita study w/o actually needing to create and specify the metadata? In this case, we have variables that need to exist in qiita, but we dont yet have samples that express those variables

@antgonza
Copy link

Programmatically: You could add a fake sample with all the columns you need and then delete it; this will keep the extra columns around.

If this is only going to be performed once: You can open the DB and insert into QIITA_COLUMN_NAME the new columns you want to add, like this.

Note that this will add new and fully empty columns to all the samples - IMOO not useful for analysis.

Hope this helps.

@wasade
Copy link
Member

wasade commented Sep 22, 2022

Thanks! Is the programmatic method exposed and usable by an API? Is it possible to set a default value for historical samples where the variable is not collected (e.g., Missing: not collected or something)? I'm hoping there is an automated solution here rather than asking you to do it manually whenever new columns need to be specified

cc @sejsong, this surrounds new questions to the questionnaire

@antgonza
Copy link

Yes for adding/updating samples/metadata but not for deleting (I think deleting samples can be problematic). No way to define a default value, you would need to add that value for all historical samples.

IMOO the best would be that the TMI pushes whatever changes it requires directly, either automatically or by hand ... and if there is no end-point add one (as long as it makes sense for the rest of the system).

@wasade
Copy link
Member

wasade commented Sep 22, 2022

If I understand the existing API correctly, the metadata category has to already exist? See https://github.com/qiita-spots/qiita/blob/master/qiita_pet/handlers/rest/study_samples.py#L171-L179

Would the Qiita dev team be able to expose endpoints to allow for creation of metadata variables?

@antgonza
Copy link

Good point, I didn't remember that - I wonder who added that restriction 😜.

Anyway, I'll be happy to review/merge new PRs with endpoints.

@sejsong
Copy link

sejsong commented Sep 22, 2022

@antgonza is this something you might be able to do? I'm not sure we have the expertise and bandwidth on the TMI dev team to do this the most efficiently.
I'm assuming this is something that will be useful for Qiita more widely?

@antgonza
Copy link

I'm not sure if it will be useful for other Qiita users (as they do not interact directly with the API - they would need access to the UCSD systems to do it). Also, that original functionality was added specifically for and by the AGP/TMI-dev-team. Now IMOO the best will be that whoever implements that functionality for this repo should issue a PR to qiita to make sure the 2 PRs work properly; does that make sense?

Anyway, I think having an internal meeting to decide what and whom should add new code + timelines for TMI would be beneficial.

@wasade
Copy link
Member

wasade commented Sep 23, 2022

Just spoke with Se Jin about the motivation for this. It seems surprising that samples are failing from the other surveys, but plausible. Definitely recommend testing against qiita-rc. I thought the code was setup to fail on a per survey basis, not per barcode basis, but there may be edge cases I'm misremembering

@cassidysymons
Copy link
Collaborator Author

A "No such survey" error is causing a hardfail in the qiita metadata push:

qiita metadata push errors

[ { "hardfail": "Traceback (most recent call last):\n File "/Users/api_user/microsetta-private-api-2022.9/microsetta_private_api/tasks.py", line 76, in update_qiita_metadata\n n_pushed, error = qiita.push_metadata_to_qiita()\n File "/Users/api_user/microsetta-private-api-2022.9/microsetta_private_api/repo/qiita_repo.py", line 90, in push_metadata_to_qiita\n formatted, error = retrieve_metadata(to_push)\n File "/Users/api_user/microsetta-private-api-2022.9/microsetta_private_api/repo/metadata_repo/_repo.py", line 109, in retrieve_metadata\n survey_templates, st_errors = _fetch_observed_survey_templates(fetched)\n File "/Users/api_user/microsetta-private-api-2022.9/microsetta_private_api/repo/metadata_repo/_repo.py", line 159, in _fetch_observed_survey_templates\n survey, error = _fetch_survey_template(template_id)\n File "/Users/api_user/microsetta-private-api-2022.9/microsetta_private_api/repo/metadata_repo/_repo.py", line 190, in _fetch_survey_template\n survey_template = survey_template_repo.get_survey_template(\n File "/Users/api_user/microsetta-private-api-2022.9/microsetta_private_api/repo/survey_template_repo.py", line 136, in get_survey_template\n raise NotFound("No such survey")\nwerkzeug.exceptions.NotFound: 404 Not Found: No such survey\n" } ]

@wasade
Copy link
Member

wasade commented Sep 23, 2022

Thanks. It may be good to wrap _fetch_survey_template so errors in the future can be gracefully caught and reported, it looks like this is bringing down the entire metadata push which is bad

@cassidysymons
Copy link
Collaborator Author

@wasade I added a try/catch block in _fetch_survey_template to gracefully handle the NotFound exception. Given the urgency to get the metadata push back up and the timeline for having Qiita changes live and available to use, do you see any issues with merging this code as-written? It's admittedly a temporary solution, but it resolves our issue for the time being and we can revisit the Qiita changes in a few weeks.

@wasade
Copy link
Member

wasade commented Sep 28, 2022

I think that strategy works, I'm unsure if e needs to be str or Exception. However, it is going to be painful to recover the responses and update the metadata in Qiita so this is a bandaid and it will hurt when it gets ripped off.

@cassidysymons
Copy link
Collaborator Author

@wasade When you have a minute can you please do a final review on this?

Copy link
Member

@wasade wasade left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure if the error return is valid. It may be nice to assert it with a unit test

'OTHER_ANIMALS_FREE_TEXT', 'OILS_FREQUENCY_VEGETABLE',
'OILS_FREQUENCY_ANIMAL', 'OILS_FREQUENCY_OTHER',
'OILS_FREQUENCY_MARGARINE', 'OILS_FREQUENCY_OXALATE'
'OILS_FREQUENCY_SOY']
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can a TODO note be added regarding when these were added, and that we're blocked pending an update on Qiita's API?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.


return info, None
return info, error
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does it work to return type Exception here? I don't know how flexible the system is with the return types but it would be good to make sure that this does work. I think other areas of the code return dict or None

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I cast the Exception to string, which goes into an array in the calling function (if an error is actually returned).

@cassidysymons
Copy link
Collaborator Author

I'm not sure if the error return is valid. It may be nice to assert it with a unit test

I added a unit test that intentionally calls the function with a remote survey ID to observe that an error is returned with a value other than "None." Since I've specifically cast the Exception to string before returning it, I don't think we need any further test coverage there. What are your thoughts?


# verify that _fetch_survey_template returns an error, reflecting
# that it's a remote survey for which we can't extract local data
self.assertNotEqual(errors, None)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't errors be somethign different than None?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm asserting that errors isn't None in this case using assertNotEqual, which represents the intended behavior. We don't strictly care what the content of the error is, but having a non-None value there will prevent the metadata push from attempting to use that survey.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, ya read that fast :) thanks

@wasade wasade merged commit 9f86303 into master Oct 4, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants