How to test channel-related issues

This was originally written to help test the channel broker, but it may also be useful for other channel-related issues.

Pre-requisites

Get the Twilio Simulator up and running on STG 🆗
Get the LGWSim up and running on STG 🆗

Without these simulators, it is not possible to run a survey at a realistic scale (more than 20000 respondents).

Channel configuration

The sim channels should already exist in Verboice and Nuntium (if not, see the simulators' README files to learn how to create them).

You can test different behaviors:

Set up a small project batch limit (e.g. 10), a small channel capacity (e.g. 5) and a few respondents (e.g. 100) to make the survey run slowly, so you have time to investigate what is happening in the different interfaces (Surveda, Verboice / Nuntium) as well as in the application logs in Rancher.
Set up a larger project batch limit (e.g. 200) and channel capacity (e.g. 100) to force the survey to run much faster, and check that it correctly processes a large number of respondents (e.g. 20000+).
You can set up surveys running in parallel, either against distinct channels or using the same channel. Be aware that Surveda pushes a full batch limit of respondents at once for Survey A, then for Survey B, and so on, so when surveys share a channel each one advances in turn, one after the other!

You can also alter the channel limits in the Verboice / Nuntium UI to process queues slowly (e.g. 2) or faster (e.g. 100). Note that modifying the channel limits may not take effect while there are still messages waiting in the queue.

For a realistic setting, the channel capacity in Surveda should be around twice the channel limit in Nuntium / Verboice, as we want the queues to be able to queue another call or SMS message as soon as possible.
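The rule of thumb above can be sketched in a few lines of Python. This is only an illustration: the helper name `suggested_surveda_capacity` and the `factor` parameter are hypothetical, and the factor of 2 is the "around twice" guideline from this page, not a hard requirement.

```python
def suggested_surveda_capacity(channel_limit: int, factor: float = 2.0) -> int:
    """Return a Surveda channel capacity of roughly `factor` times the
    channel limit configured in Nuntium / Verboice (hypothetical helper)."""
    if channel_limit < 1:
        raise ValueError("channel_limit must be at least 1")
    return max(1, round(channel_limit * factor))


if __name__ == "__main__":
    # Print a suggested capacity for a slow, a medium and a fast channel limit.
    for limit in (2, 50, 100):
        print(f"channel limit {limit} -> Surveda capacity {suggested_surveda_capacity(limit)}")
```

Treat the result as a starting point and adjust it after watching the queues during a test run.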

Create simulator friendly questionnaires

For LGWSim, follow these rules
For the Twilio simulator, follow these rules

See the Basic quex for example: https://surveda-stg.instedd.org/projects/343/questionnaires/3066/edit

A better questionnaire should have more questions, which will lead to more break-offs (no replies) and, in turn, to more retries and fallbacks to another mode.

Configure survey

Select a simulation questionnaire
Define the primary mode and the fallback mode (or no fallback)
Add 50 to 100k respondents (make sure phone_numbers are unique and different from those used in other tests, so the contact attempts are easy to identify)
Select the SIM channels (for each mode)
Set up the schedule (the survey should run in a time window in which the testers can review the logs)
- The survey should last more than 1 day, ideally a few days (3 to 7).
- To make sure the survey runs for several days, take the following into consideration:
  - Simulator ENV variables:
    - SLEEP_SECONDS
    - DELAY_REPLY_PERCENT
    - DELAY_REPLY_MIN_SECONDS
    - DELAY_REPLY_MAX_SECONDS
  - Contact window (from, to in survey schedule)
  - Number of respondents
  - Adding more delay to the responses and using a small contact window will help.
Define cutoff rules: at least a number of completes, and ideally some quotas, to evaluate whether the channel broker somehow affects quota completion
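
For the respondents step above, a small script can generate a large list of unique phone numbers that are easy to tell apart from other tests. This is a minimal sketch under stated assumptions: the `phone_number` column header, the `99900` prefix and the helper names are hypothetical, so adapt them to whatever respondent file format your Surveda instance accepts.

```python
import csv
import io


def generate_phone_numbers(count: int, prefix: str = "99900", width: int = 6) -> list[str]:
    """Return `count` unique numbers sharing a test-specific prefix.

    The prefix and width are arbitrary illustration values: pick a prefix
    that no other test uses so contact attempts are easy to identify.
    """
    if count < 0 or count > 10 ** width:
        raise ValueError(f"count must be between 0 and {10 ** width}")
    # Zero-padded sequential suffixes guarantee uniqueness within the prefix.
    return [f"{prefix}{i:0{width}d}" for i in range(count)]


def to_csv(numbers: list[str], header: str = "phone_number") -> str:
    """Serialize the numbers as a one-column CSV (header name is an assumption)."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow([header])
    for number in numbers:
        writer.writerow([number])
    return buffer.getvalue()


if __name__ == "__main__":
    numbers = generate_phone_numbers(20000)
    with open("respondents.csv", "w", newline="") as f:
        f.write(to_csv(numbers))
```

Using a different prefix for each test run keeps the contact attempts of parallel surveys separate in the logs.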

Expectations

The number of active + queued calls in Verboice should be at most Surveda's channel capacity.
The number of queued messages in Nuntium should be at most a bit more than Surveda's channel capacity (in my tests, around +5%). This is by design: we immediately send replies regardless of the channel capacity, which can push the queue over the channel capacity (we still block new replies).
Multiple surveys with a large number of respondents, running on different schedules (they don't start/stop at the same time), should run correctly whether or not other surveys are running (never blocked).
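
The first two expectations can be written down as a simple check to apply to numbers read manually from the Verboice and Nuntium interfaces. This is a sketch only: `QueueSnapshot` and `check_expectations` are hypothetical names, nothing here queries Verboice or Nuntium, and the 5% tolerance is the approximate overshoot observed in the tests mentioned above, not a guaranteed bound.

```python
from dataclasses import dataclass


@dataclass
class QueueSnapshot:
    """Counts read by hand from the Verboice / Nuntium UIs at one point in time."""
    verboice_active: int
    verboice_queued: int
    nuntium_queued: int


def check_expectations(snapshot: QueueSnapshot, channel_capacity: int,
                       nuntium_tolerance: float = 0.05) -> list[str]:
    """Return a list of violated expectations (empty if everything looks fine)."""
    violations = []

    # Verboice: active + queued calls must not exceed Surveda's channel capacity.
    verboice_total = snapshot.verboice_active + snapshot.verboice_queued
    if verboice_total > channel_capacity:
        violations.append(
            f"Verboice: {verboice_total} active + queued calls exceed capacity {channel_capacity}"
        )

    # Nuntium: a small overshoot is expected because replies are sent immediately.
    nuntium_max = channel_capacity * (1 + nuntium_tolerance)
    if snapshot.nuntium_queued > nuntium_max:
        violations.append(
            f"Nuntium: {snapshot.nuntium_queued} queued messages exceed capacity "
            f"{channel_capacity} by more than {nuntium_tolerance:.0%}"
        )
    return violations


if __name__ == "__main__":
    snapshot = QueueSnapshot(verboice_active=40, verboice_queued=55, nuntium_queued=104)
    for problem in check_expectations(snapshot, channel_capacity=100):
        print(problem)
```

An empty result means the snapshot is consistent with the expectations; any message returned is worth cross-checking against the application logs in Rancher.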
