Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Sources #671

Open
sjanssen2 opened this issue May 6, 2017 · 11 comments
Open

Sources #671

sjanssen2 opened this issue May 6, 2017 · 11 comments

Comments

@sjanssen2
Copy link
Contributor

Hi,
in order to model the "source" concept in our DB, I suggest to add in the following two tables.

A source should have a unique, automatically generated ID, the user can give it a name (in the case of a human source that is the participant_name) and it is owned by one ag_login_id.

CREATE TABLE ag."source" (
	source_id            uuid DEFAULT ag.uuid_generate_v4() ,
	source_name          varchar(100)  ,
	ag_login_id          uuid  ,
	CONSTRAINT pk_source UNIQUE ( source_id )
 );

The second table, stores the many-to-many relation between barcodes and surveys, where each of those connections points to one source, hence the source_id column.

CREATE TABLE ag.source_barcodes_surveys (
	source_id            uuid  NOT NULL,
	barcode              varchar  ,
	survey_id            varchar
 );

This design should allow for the creation of empty sources by only giving the source a name. Later, the user might fill out surveys for this source. And in a third step, the user can assign barcodes to a source (we should ensure that this assignment is only possible if the source has been assigned to a main survey, which is our way to ensure consent).

@josenavas @wasade please let me know your thoughts about this design.

@sjanssen2
Copy link
Contributor Author

The full SQL code to insert the two tables is:

CREATE TABLE ag."source" (
	source_id            uuid DEFAULT ag.uuid_generate_v4() ,
	source_name          varchar(100)  ,
	ag_login_id          uuid  ,
	CONSTRAINT pk_source UNIQUE ( source_id )
 );

CREATE INDEX idx_source_1 ON ag."source" ( ag_login_id );

COMMENT ON TABLE ag."source" IS 'A "source" is the human, animal or environment that was swapped for microbial analysis.';
COMMENT ON COLUMN ag."source".source_id IS 'A unique ID to identify the source.';
COMMENT ON COLUMN ag."source".source_name IS 'Sources can come with names, e.g. names for the humans that have been swapped.';
COMMENT ON COLUMN ag."source".ag_login_id IS 'Points to the "user" that owns this source.';

CREATE TABLE ag.source_barcodes_surveys (
	source_id            uuid  NOT NULL,
	barcode              varchar  ,
	survey_id            varchar
 );

CREATE INDEX idx_source ON ag.source_barcodes_surveys ( barcode );
CREATE INDEX idx_source_0 ON ag.source_barcodes_surveys ( survey_id );
CREATE INDEX idx_source_barcodes_surveys ON ag.source_barcodes_surveys ( source_id );

COMMENT ON COLUMN ag.source_barcodes_surveys.source_id IS 'Points to information about the source, like its name.';
COMMENT ON COLUMN ag.source_barcodes_surveys.barcode IS 'Points to barcode(s) that are assigned to this source.';
COMMENT ON COLUMN ag.source_barcodes_surveys.survey_id IS 'Points to survey(s) that are assigned to this source.';

ALTER TABLE ag.source_barcodes_surveys ADD CONSTRAINT fk_source_barcode FOREIGN KEY ( barcode ) REFERENCES barcodes.barcode( barcode );
ALTER TABLE ag.source_barcodes_surveys ADD CONSTRAINT fk_source_barcodes_surveys FOREIGN KEY ( source_id ) REFERENCES ag."source"( source_id );

@wasade
Copy link
Member

wasade commented May 6, 2017 via email

@sjanssen2
Copy link
Contributor Author

After thinking about it, I tend to agree with you about the identity of my "source" and the already existing ag.ag_login_surveys table.
Am I right that also samples from animals need to have a main survey in our system?

@wasade
Copy link
Member

wasade commented May 6, 2017 via email

@sjanssen2
Copy link
Contributor Author

table ag_animal_survey was one of those completely disconnected to our codebases. I dropped it with my last PR.

@wasade
Copy link
Member

wasade commented May 6, 2017 via email

@sjanssen2
Copy link
Contributor Author

Regarding the "source" table:
Table "ag.ag_login_surveys" does not directly have an ID for a source. It would be the combination of ag_login_id and participant_name (where the later is a user string). Is it a good design to not have some primary key like "source_id" for the concept of a source?

@wasade
Copy link
Member

wasade commented May 6, 2017 via email

@sjanssen2
Copy link
Contributor Author

ohm. I don't know the exact definition of primary key, but the DB schema tells me that "survey_id" in "ag.ag_login_surveys" is the primary key.

@josenavas
Copy link
Member

@wasade the problem with using tuples as primary key is that if you need to reference it you have to duplicate the tuple.

@sjanssen2 you're right, in the dbschema only one columns is labeled as primary key, but if the other column has a "unique" constraint that is making a "virtual" tuple primary key.

I would suggest adding a different id using bigserial rather than UUID to be able to easily reference it.

@sjanssen2
Copy link
Contributor Author

and note that "survey_id" is the primary key, which is not in the tuple of "ag_login_id, participant_name"

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants