Make a more generalized 'upstream' connector in composition objects/tables? #234

tanaes · 2018-05-09T16:47:16Z

Yesterday, @mortonjt and I were working together on issue #77 and ran into a challenge that @ElDeveloper and I also encountered in issue #195: different composition types use differently named tables or methods to access their constituent samples.

In issue #195, this led to the following code:

https://github.com/jdereus/labman/blob/0dd5afef6a980ff25d6b420618828a2bddae383c/labman/gui/handlers/process_handlers/quantification_process.py#L57-L67

In issue #77, we're trying to follow compositions 'upstream', but this requires us to have a lookup table that references the current composition type so that we can know the appropriate column name to query for the upstream composition:

https://github.com/mortonjt/labman/blob/33277e138dde41a5789759df20e3c08c01ddfad2/labman/db/plate.py#L580-L607

Would it be possible to generalize this somehow? @AmandaBirmingham, do you have any thoughts on this? Is it possible, for example, to add a column name alias to the composition type tables so that one could SELECT upstream_composition_id FROM qiita.gdna_composition instead of SELECT sample_composition_id FROM qiita.gdna_composition?
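One way to picture the 'column name alias' idea is a view per composition table that re-exposes the type-specific column under a shared name. The following is only a minimal sketch: it uses an in-memory SQLite database instead of the real PostgreSQL schema, and the table layouts, the `*_upstream` view names, and the `upstream_ids` helper are all assumptions made for illustration, not labman's actual schema or API.

```python
import sqlite3

# In-memory stand-in for the real database; the schema below is hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE gdna_composition (
    gdna_composition_id INTEGER PRIMARY KEY,
    sample_composition_id INTEGER NOT NULL
);
CREATE TABLE compressed_gdna_composition (
    compressed_gdna_composition_id INTEGER PRIMARY KEY,
    gdna_composition_id INTEGER NOT NULL
);

-- Each view renames the type-specific columns to a shared pair of names.
-- No data is copied, so there is nothing to keep in sync on updates.
CREATE VIEW gdna_composition_upstream AS
    SELECT gdna_composition_id AS composition_id,
           sample_composition_id AS upstream_composition_id
    FROM gdna_composition;
CREATE VIEW compressed_gdna_composition_upstream AS
    SELECT compressed_gdna_composition_id AS composition_id,
           gdna_composition_id AS upstream_composition_id
    FROM compressed_gdna_composition;

INSERT INTO gdna_composition VALUES (10, 1), (11, 2);
INSERT INTO compressed_gdna_composition VALUES (20, 10), (21, 11);
""")

# Table names cannot be bound as SQL parameters, so check them against a
# fixed set before interpolating them into the query text.
KNOWN_TABLES = {"gdna_composition", "compressed_gdna_composition"}


def upstream_ids(conn, table):
    """Return the upstream composition ids for every row of `table`."""
    if table not in KNOWN_TABLES:
        raise ValueError(f"unknown composition table: {table}")
    sql = (f"SELECT upstream_composition_id FROM {table}_upstream "
           "ORDER BY composition_id")
    return [row[0] for row in conn.execute(sql)]


print(upstream_ids(conn, "gdna_composition"))
print(upstream_ids(conn, "compressed_gdna_composition"))
```

With views like these, the caller no longer needs a lookup table from composition type to column name; only the view definitions know the real column names.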

josenavas · 2018-05-09T17:46:22Z

IMO this should be pushed down to the DB by creating a function that does this. That will localize all the code in a single place. An example of how this is done is the function "qiita.get_plate_studies" in the db_patch_manual.sql file. That approach is much faster than the currently implemented approach, it doesn't create duplicated information, and there is no need to modify the DB.

tanaes · 2018-05-09T17:59:55Z

Agree that it would be good to have this in the db. I'm worried about maintainability having all this plate type-specific logic though -- that function is at the end of a 2.5 MB file! And if we change any of our processes, we'll have to update the logic accordingly.

How hard would it be to modify these tables in a way that could allow this type of access to be generalized? I know almost no SQL to be sure, but was imagining something like a 'column name alias'.

josenavas · 2018-05-09T18:16:57Z

That is possible, but then you have to make sure to maintain consistency on updates, since it is data duplication. The code that ensures that consistency would be doing something similar to what the function mentioned above does, and hence would also need to be updated on changes.

That file is horrible, and it should probably be broken down by functionality to aid development. Note, however, that once the system is in production, any update that you need to make to such a function needs to go through a patch to the database, not a direct modification of that file. That file initializes the DB so that work can start at day 0.

AmandaBirmingham · 2018-05-10T20:53:08Z

@tanaes asked "Is it possible, for example, to add a column name alias to the composition type tables so that one could SELECT upstream_composition_id FROM qiita.gdna_composition instead of SELECT sample_composition_id FROM qiita.gdna_composition?"

Let's be careful here: although it is the case that MOST of the (current) compositions have only one upstream composition, some of them have more than one--and there is nothing about a composition which limits it to ONE "upstream" composition id. For example, library_prep_16s_composition--one of the composition types you reference in the code snippet above--records two "upstream" composition ids: normalized_gdna and primer.

That said: in the current structure, any composition table that "inherits" from gdna_composition leads back to a single sample_composition. Perhaps there could be a "sample_composition" property on all *gdna_composition objects that handles the traceback in a way specific to that particular composition type, but hides it from the user?

tanaes · 2018-05-23T22:12:31Z

If it's a method of the object though, doesn't that require us to go through Python (and hence a separate SQL call) for each composition in turn?

josenavas · 2018-05-24T13:36:27Z

I don't think I understand your question. Those Python methods just call an SQL method, rather than backtracking the entire chain. An alternative to reduce the code duplication of such a method (which is 3 or 4 LOC) would be to create a subclass of Composition that just adds that method and make all the classes that have a "sample_composition" inherit from it. I'm not sure if that actually reduces complexity or adds to it...
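As a rough illustration of that subclass idea, the sketch below puts a shared `sample_composition_id` property on an intermediate class and lets each composition type supply only its own traceback query. Everything here is assumed for the example: the `Composition` stand-in, the `SampleDerivedComposition` name, the `_sample_sql` attribute, and the simplified SQLite schema are hypothetical and do not reflect labman's real classes or tables.

```python
import sqlite3


class Composition:
    """Simplified stand-in for a composition base class (hypothetical)."""

    def __init__(self, conn, composition_id):
        self._conn = conn
        self.id = composition_id


class SampleDerivedComposition(Composition):
    """Adds one shared property; subclasses only provide the SQL."""

    _sample_sql = None  # set by each concrete composition type

    @property
    def sample_composition_id(self):
        # A single SQL call resolves the whole chain for this composition.
        row = self._conn.execute(self._sample_sql, (self.id,)).fetchone()
        return None if row is None else row[0]


class GDNAComposition(SampleDerivedComposition):
    _sample_sql = """
        SELECT sample_composition_id
        FROM gdna_composition
        WHERE gdna_composition_id = ?"""


class CompressedGDNAComposition(SampleDerivedComposition):
    # The extra hop through gdna_composition is hidden from the caller.
    _sample_sql = """
        SELECT g.sample_composition_id
        FROM compressed_gdna_composition c
        JOIN gdna_composition g
          ON g.gdna_composition_id = c.gdna_composition_id
        WHERE c.compressed_gdna_composition_id = ?"""


conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE gdna_composition (
    gdna_composition_id INTEGER PRIMARY KEY,
    sample_composition_id INTEGER NOT NULL
);
CREATE TABLE compressed_gdna_composition (
    compressed_gdna_composition_id INTEGER PRIMARY KEY,
    gdna_composition_id INTEGER NOT NULL
);
INSERT INTO gdna_composition VALUES (10, 1);
INSERT INTO compressed_gdna_composition VALUES (20, 10);
""")

print(GDNAComposition(conn, 10).sample_composition_id)
print(CompressedGDNAComposition(conn, 20).sample_composition_id)
```

Note that this still issues one query per composition object, so it addresses the naming problem but not the per-well call count.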

tanaes · 2018-05-24T14:43:13Z

It's not the code duplication I'm concerned about, it's that you'd have to call the python function to call the SQL e.g. 384 times for one plate to get the 'upstream' plates, which would be very slow. Those sorts of calls have kept biting us in terms of performance.

josenavas · 2018-05-24T14:56:30Z

Oh I see - note that such behavior is a result of the ability to "cherry pick" samples from plates. An alternative is to create bulk operations and/or move some of the functionality from python to SQL. I think in that case it just depends on the nature of the operation being made.

For example, for issue #77, ideally you can perform that entire operation in SQL and just return the final result to Python, rather than encoding the functionality in Python. In issue #195 a similar argument can be made: it may be useful to create a bulk operation at the plate level that returns some kind of useful structure containing all the sample ids and their locations on that plate. Note that those types of functions create jumps in the OOP structure, but performance is a valid (and common) argument in their favor.

Another alternative is to exploit parallelization. A fair number of operations are read-only, and hence can be performed at the same time.
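A plate-level bulk operation of the kind described for issue #195 could look roughly like the sketch below: one query returns every well position and sample id for a plate, instead of one call per well. The `well` and `sample_composition` layouts and the `plate_layout` helper are assumptions for illustration (again on in-memory SQLite), not the real labman schema.

```python
import sqlite3

# Hypothetical, simplified schema: wells belong to plates, and each
# sample composition sits in one well.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE well (
    well_id INTEGER PRIMARY KEY,
    plate_id INTEGER NOT NULL,
    row_num INTEGER NOT NULL,
    col_num INTEGER NOT NULL
);
CREATE TABLE sample_composition (
    sample_composition_id INTEGER PRIMARY KEY,
    well_id INTEGER NOT NULL,
    sample_id TEXT NOT NULL
);
INSERT INTO well VALUES (1, 7, 1, 1), (2, 7, 1, 2), (3, 8, 1, 1);
INSERT INTO sample_composition VALUES
    (100, 1, 'sample.A'), (101, 2, 'sample.B'), (102, 3, 'sample.C');
""")


def plate_layout(conn, plate_id):
    """Return [(row, col, sample_id), ...] for a plate in a single query."""
    sql = """
        SELECT w.row_num, w.col_num, sc.sample_id
        FROM well w
        JOIN sample_composition sc ON sc.well_id = w.well_id
        WHERE w.plate_id = ?
        ORDER BY w.row_num, w.col_num"""
    return conn.execute(sql, (plate_id,)).fetchall()


print(plate_layout(conn, 7))
```

The number of round trips stays at one whether the plate has 96 or 384 wells, which is the point of pushing the join into SQL.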

tanaes · 2018-05-24T15:05:49Z

Yes, and that's what we've been trying to do. But really this is way beyond my SQL-fu. When @mortonjt and I tried, we were stymied by the fact that different composition types had different column names for specifying the input compositions, which made it impossible to write a simple, generalized 'what are the plates upstream of this plate' query. Any suggestions?
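For what it's worth, if the type-specific columns were exposed through one uniform relation of (composition, upstream composition) pairs, the 'plates upstream of this plate' question could be answered with a single recursive query. The sketch below assumes such a relation exists: `composition_edge`, `composition_plate`, and `upstream_plates` are hypothetical names on an in-memory SQLite database, and in the real schema `composition_edge` would more plausibly be a view that unions the per-type tables. A composition with two upstream inputs (such as a library prep with gDNA and primer) simply contributes two rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Hypothetical uniform relation: one row per (composition, input) pair.
CREATE TABLE composition_edge (
    composition_id INTEGER NOT NULL,
    upstream_composition_id INTEGER NOT NULL
);
-- Hypothetical mapping from each composition to the plate it sits on.
CREATE TABLE composition_plate (
    composition_id INTEGER PRIMARY KEY,
    plate_id INTEGER NOT NULL
);
-- Plate 1: samples 1, 2. Plate 2: gDNA 10, 11. Plate 4: primer 30.
-- Plate 3: library prep 20, which has two upstream compositions.
INSERT INTO composition_plate VALUES
    (1, 1), (2, 1), (10, 2), (11, 2), (30, 4), (20, 3);
INSERT INTO composition_edge VALUES
    (10, 1), (11, 2), (20, 10), (20, 30);
""")


def upstream_plates(conn, plate_id):
    """Return the ids of all plates upstream of `plate_id`, in one query."""
    sql = """
        WITH RECURSIVE upstream(composition_id) AS (
            -- Direct inputs of the compositions on the starting plate.
            SELECT e.upstream_composition_id
            FROM composition_plate p
            JOIN composition_edge e ON e.composition_id = p.composition_id
            WHERE p.plate_id = ?
            UNION
            -- Inputs of inputs, repeated until nothing new is found.
            SELECT e.upstream_composition_id
            FROM upstream u
            JOIN composition_edge e ON e.composition_id = u.composition_id
        )
        SELECT DISTINCT p.plate_id
        FROM upstream u
        JOIN composition_plate p ON p.composition_id = u.composition_id
        ORDER BY p.plate_id"""
    return [row[0] for row in conn.execute(sql, (plate_id,))]


print(upstream_plates(conn, 3))
```

PostgreSQL supports the same `WITH RECURSIVE` syntax, so the query shape should carry over, though it has not been tested against the real schema.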

josenavas · 2018-05-24T15:28:10Z

I may be able to put some suggestions together by the beginning of next week. Note, however, that I will not be able to test the code locally.

charles-cowart · 2019-04-09T18:38:08Z

Migrating to nice-to-have, as this issue isn't impeding current functionality.

AmandaBirmingham added code refactor scope:large labels Nov 16, 2018

charles-cowart added this to the Nice to haves milestone Apr 9, 2019
