
Tired: message.assignment_id; Wired: message.campaign_contact_id #1311

Merged
merged 13 commits into main
Jan 14, 2020

Conversation

schuyler1d
Collaborator

@schuyler1d schuyler1d commented Dec 18, 2019

move code from using message.assignment_id => message.campaign_contact_id; also adds message.messageservice_sid

TODO:

  • migration and schema defining the new indexes desired on message
  • migration documentation

Description

Schema changes for better scaling in the prototype scaling branch

Architecture Proposal

This creates a schema change: many places in the code joined on message.assignment_id = campaign_contact.assignment_id AND message.contact_number = campaign_contact.cell; those lookups now join directly on message.campaign_contact_id.
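
For illustration, a rough sketch of the join change (hypothetical queries; the table and column names are the ones used in this PR, but the actual queries in the code vary):

-- old pattern: indirect two-column join
SELECT message.*
FROM message
JOIN campaign_contact
  ON message.assignment_id = campaign_contact.assignment_id
 AND message.contact_number = campaign_contact.cell;

-- new pattern: direct join on the new foreign key
SELECT message.*
FROM message
JOIN campaign_contact
  ON message.campaign_contact_id = campaign_contact.id;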

Motivation

The old join pattern has some negative performance implications:

  • Updating assignments requires a bulk update (and lock) of the message table -- a costly and dangerous thing with high throughput
  • Large difficulties in creating a scaling structure that can look up records based on campaign_contact_id

There is precedent and experience with this approach in the Politics Rewired branch

Proposal Details

This varies slightly from the Politics Rewired structure. In ours, we also add a message.messageservice_sid column. This allows us to track something in the table that can participate in the lookup for an incoming message. In the future, orgs (and/or contacts) can have different messageservice_sids -- this can separate accounting aspects and also track different orgs texting the same contact. In Twilio, a messageservice_sid corresponds to the Message Service used. However, it could also be a 'from' number (and is for Nexmo in this PR) -- i.e. anything unique that is known before sending and is identifiable from an incoming message.
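
As a hypothetical sketch (not code from this PR), an incoming message could then be matched back to its contact with a lookup along these lines, which the new (contact_number, messageservice_sid) index on message is meant to support; the literal values are just examples:

SELECT campaign_contact_id
FROM message
WHERE contact_number = '+15555551234'        -- the incoming 'from' number (example value)
  AND messageservice_sid = 'MG_example_sid'  -- the Message Service (or Nexmo 'from' number) that received it (example value)
ORDER BY id DESC
LIMIT 1;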

Deferred/Future Work

  1. We do not yet create the multi-tenant algorithms or structure for looking up a messageservice_sid or a particular organization/setup.
  2. We do not yet add any caching changes that this work will make easier
  3. We do not address a discovered issue with Nexmo support (deprecated), which requires providing/supplying a phone number to send from in the current code base -- instead we just document it inline.

Checklist:

  • I have manually tested my changes on desktop and mobile
  • The test suite passes locally with my changes
  • [na] If my change is a UI change, I have attached a screenshot to the description section of this pull request
  • My change is 300 lines of code or less, or has a documented reason in the description why it’s longer
  • I have made any necessary changes to the documentation
  • I have added tests that prove my fix is effective or that my feature works
  • My PR is labeled [WIP] if it is in progress

@schuyler1d schuyler1d changed the title [WIP] Tired: message.assignment_id; Wired: message.campaign_contact_id Tired: message.assignment_id; Wired: message.campaign_contact_id Dec 20, 2019
strangeways previously approved these changes Dec 21, 2019
@ibrand
Collaborator

ibrand commented Dec 23, 2019

@schuyler1d noticing the test suite is failing here, do you know why?

@ibrand ibrand added the S-needs tests to pass label Dec 23, 2019
schuyler1d and others added 2 commits December 24, 2019 11:50
The deleted test was replaced by __test__/server/api/createOptOut.test.js
@schuyler1d
Collaborator Author

> @schuyler1d noticing the test suite is failing here, do you know why?

@ibrand the main test test/server/texter.test/correctOptOut.test.js has some loading issues (basically we should fully deprecate getGql in tests and move to runGql). Larry did this in this PR where he writes a better test with runGql

To fix this in my branch, I ran git cherry-pick 22e2d727e3d3ad3d1457160f0cdc96442bc50fc7 (which needs a remote add of Larry's fork) and pushed. I believe that will fix this in my branch and also work during the merge. 🤞

@schuyler1d
Collaborator Author

nope, it's something else:

FAIL __test__/server/api/lib/twilio.test.js
  ✕ postMessageSend success should save message and update contact state (269ms)
    error: duplicate key value violates unique constraint "pg_type_typname_nsp_index"
      at Connection.Object.<anonymous>.Connection.parseE (node_modules/pg/lib/connection.js:567:11)
      at Connection.Object.<anonymous>.Connection.parseMessage (node_modules/pg/lib/connection.js:391:17)
      at Socket.<anonymous> (node_modules/pg/lib/connection.js:129:22)

I think I see the issue (basically we need to use transactions in twilio.postMessageSave -- after trying to save 'just once').

@schuyler1d
Collaborator Author

ok, tests passing -- letting the default migrations stay transactional in all cases, which makes tests pass more easily; also, anyone with a bigger db to migrate would have to run these migrations manually anyway.

@schuyler1d
Collaborator Author

schuyler1d commented Dec 24, 2019

@ibrand please review these instructions; they would also go wherever we want them for the RELEASE_NOTES.

Instructions for migrating large database instances (>1 million messages/contacts)

  1. Make sure SUPPRESS_MIGRATIONS=1 in your environment
  2. Before deploying the code, you will want to do most of these changes manually beforehand, ideally while the system is down. If you are planning to do these migrations before deploying the new code, we strongly recommend merging this PR to your production instance, separate from the schema changes, BEFORE these steps. It removes a few bugs which caused Postgres to deadlock on message table updates. The instructions below assume Postgres.

2.1 Create the additional columns:

ALTER TABLE message ADD COLUMN campaign_contact_id integer NULL REFERENCES campaign_contact(id);
ALTER TABLE message ADD COLUMN messageservice_sid text NULL;
ALTER TABLE message ALTER COLUMN assignment_id DROP NOT NULL;

These should be non-blocking and 'quick' -- i.e. downtime is probably unnecessary.

2.2 Once those complete, you will want to create the campaign_contact_id INDEX:

CREATE INDEX CONCURRENTLY message_campaign_contact_id_index ON message (campaign_contact_id);

The CONCURRENTLY keyword should make this possible without downtime, but sometimes caution is best.

2.3 (LOCKING: Best with system unstressed/down) Next you want to start filling in the campaign_contact_id. You may want to prioritize live or recent campaigns with additional qualifications or do this in batches with a command similar to:

UPDATE message
SET campaign_contact_id = campaign_contact.id
FROM campaign_contact
WHERE message.assignment_id = campaign_contact.assignment_id
  AND message.contact_number = campaign_contact.cell
  AND message.id IN (
    SELECT id
    FROM message
    WHERE campaign_contact_id IS NULL
    LIMIT 1000000
  );

Note the "LIMIT 100000" is doing it in batches of 1 million. Another strategy would be to find the lowest campaign_contact_id value for your live campaigns and add a WHERE campaign_contact.id > XXX where XXX is that value (or do the same thing for message.id > XXX).

This WILL LOCK the MESSAGE table -- and thus stop processing of events, so you will want to do this off-hours, ideally with the system down.

Once you have done this with the majority of messages, especially messages for live campaigns, you should now be ready for real downtime.

2.4 (LOCKING: Requires downtime) For the final steps, definitely take the system down if you haven't already. Complete the updates to messageservice_sid and campaign_contact_id; then run these final statements:

CREATE INDEX CONCURRENTLY cell_messageservice_sid_idx ON message (contact_number, messageservice_sid);
DROP INDEX message_contact_number_index;
INSERT INTO knex_migrations (name, batch, migration_time) VALUES
  ('20191217125726_add_message_campaign_contact_id.js', 3, now()),
  ('20191217130355_change_message_indexes.js', 3, now()),
  ('20191217130000_message_migrate_data.js', 3, now());
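
The batch value (3 here) depends on your own knex_migrations history; if in doubt, you can check the current highest batch first:

SELECT max(batch) FROM knex_migrations;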

2.5 Deploy the new code!
2.6 As soon as possible (after the system is working), run:

DROP INDEX message_assignment_id_index;

NOTES:

  • One of the main challenges, besides migrating and indexing a large table, is that having many indexes on the message table, with so many inserts, can tax the insertions too much and slow the system down. So while we are indexing assignment_id AND contact_number AND contact_number+messageservice_sid AND campaign_contact_id all at once, the system is likely to be over-stressed. This is why we stagger the index creation and then drop the unnecessary indexes as soon as we have the system up again. Obviously, if ALL these steps are done during a single downtime event, that would work too -- but staggering them can allow shorter, stepped periods of downtime or migration.
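
If it helps while staggering these steps, a standard Postgres catalog query (not part of this PR) shows which indexes currently exist on message:

SELECT indexname, indexdef
FROM pg_indexes
WHERE tablename = 'message';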

@ibrand
Collaborator

ibrand commented Dec 26, 2019

@schuyler1d Thank you for writing up such thorough instructions! I'm especially into the notes around what locks the tables and what doesn't. One thing that might be helpful to add is to give ballpark time estimates. For example:
2.1 (should only take around an hour for large systems)
2.3 (likely to take overnight for large systems)

ibrand previously approved these changes Dec 26, 2019
@ibrand ibrand temporarily deployed to spoke-review-db-message-db1x3j December 26, 2019 18:47
@ibrand ibrand added the S-ready for stage-main (qa) label and removed the S-needs tests to pass label Dec 26, 2019
@ibrand ibrand mentioned this pull request Dec 27, 2019
@schuyler1d schuyler1d added the S-qa staging round label Jan 2, 2020
@ibrand ibrand temporarily deployed to spoke-review-db-message-db1x3j January 6, 2020 21:51
@ibrand ibrand temporarily deployed to spoke-review-db-message-db1x3j January 13, 2020 18:35
@schuyler1d schuyler1d mentioned this pull request Jan 13, 2020
@ibrand ibrand merged commit bd03a35 into main Jan 14, 2020