Simulations

Sealed Envelope can carry out simulations of the randomisation system using an automated testing programme. The randomisations generated by this approach are available for download on the specification page.

How are the simulations produced?

A data specification document is provided to the automated testing programme. This defines the data to be submitted to the randomisation form. The testing programme submits this data to the randomisation form to simulate a randomisation taking place. This process is repeated a set number of times (known as replications or reps) to produce the simulated dataset.

Data specification document

Here is an example of a data specification:

{
  "sample_size": 400,
  "fields": {
    "siteId": {
      "min": 1,
      "max": 10,
      "type": "int"
    },
    "dob": {
      "format": "d/m/Y",
      "min": "1 Jan 2000",
      "max": "31 Dec 2010",
      "type": "date"
    },
    "initials": {
      "type": "string",
      "length": 2
    },
    "eligible": {
      "value": ["Yes"],
      "type": "enum"
    },
    "gender": {
      "weight": [2, 1],
      "value": ["Male", "Female"],
      "type": "enum"
    },
    "consent": {
      "value": ["Yes"],
      "type": "enum"
    },
    "severity": {
      "weight": [1, 2],
      "value": [ "Low", "High"],
      "type": "enum"
    }
  },
  "stubName": "mytrial"
}

You can make the data submitted to the form reflect the expected distributions of individual variables in your trial more closely by changing the weight parameter on categorical variables. The weights are listed in the same order as the values. For example, if you expect twice as many women as men to be recruited, the weighting on gender would be set to [1, 2].

You can ask Sealed Envelope to make these changes and re-run the simulation.
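To illustrate how a specification like the one above could drive data generation, here is a minimal Python sketch. It is not Sealed Envelope's testing programme: the function names generate_value and generate_record are hypothetical, and the way each field type is interpreted (inclusive integer ranges, uppercase letters for strings, uniformly distributed dates, equal weights when none are given) is an assumption made for illustration. The gender weights are set to [1, 2] as in the example above.

```python
import random
import string
from datetime import datetime, timedelta

# Hypothetical helper: draws one value for a field described in the style of
# the data specification above. How each "type" is read is an assumption.
def generate_value(field, rng):
    kind = field["type"]
    if kind == "int":
        # Assumed: min and max are both inclusive
        return rng.randint(field["min"], field["max"])
    if kind == "string":
        # Assumed: random uppercase letters of the requested length
        return "".join(rng.choice(string.ascii_uppercase)
                       for _ in range(field["length"]))
    if kind == "enum":
        # A missing "weight" is taken to mean equal weights
        return rng.choices(field["value"], weights=field.get("weight"), k=1)[0]
    if kind == "date":
        # Assumed: dates are uniform between min and max, inclusive
        start = datetime.strptime(field["min"], "%d %b %Y")
        end = datetime.strptime(field["max"], "%d %b %Y")
        day = start + timedelta(days=rng.randint(0, (end - start).days))
        # "d/m/Y" in the specification is read here as day/month/year
        return day.strftime("%d/%m/%Y")
    raise ValueError(f"unknown field type: {kind}")

def generate_record(fields, rng):
    return {name: generate_value(field, rng) for name, field in fields.items()}

spec_fields = {
    "siteId": {"min": 1, "max": 10, "type": "int"},
    "dob": {"format": "d/m/Y", "min": "1 Jan 2000", "max": "31 Dec 2010",
            "type": "date"},
    "initials": {"type": "string", "length": 2},
    "gender": {"weight": [1, 2], "value": ["Male", "Female"], "type": "enum"},
}

rng = random.Random(2012)  # fixed seed so the sketch is repeatable
records = [generate_record(spec_fields, rng) for _ in range(400)]
print(records[0])
```

With weights of [1, 2], roughly two thirds of the generated records should be Female, although the exact split varies from run to run.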

Analysing the simulated data

You can download the simulated data and import it into a spreadsheet or statistics package for analysis. You can check, for instance, that the randomisation protocol is balancing the treatment groups within strata. If you want to make changes to the randomisation protocol or carry out more simulations, you should contact Sealed Envelope.
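As a rough idea of what a balance check involves, the following Python sketch counts records for every combination of a stratification factor and treatment group. The function name crosstab, the column names gender and group, and the four records are all made up for illustration; a real check would use the downloaded dataset.

```python
from collections import Counter

def crosstab(records, row_key, col_key):
    """Count records for every (row value, column value) combination."""
    return Counter((rec[row_key], rec[col_key]) for rec in records)

# Tiny made-up dataset, for illustration only
records = [
    {"gender": "Male", "group": "Active"},
    {"gender": "Male", "group": "Control"},
    {"gender": "Female", "group": "Active"},
    {"gender": "Male", "group": "Active"},
]

table = crosstab(records, "gender", "group")
for (row, col), n in sorted(table.items()):
    print(row, col, n)
```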

Example

In this example a simulation has been carried out using the data specification above. The randomisation protocol was minimisation on gender, severity and age group with a 25% chance that a purely random allocation will be made. This is equivalent to using a biased coin with an 87.5% chance of choosing the treatment that reduces imbalance, because that treatment is always chosen in the 75% of allocations made by minimisation and half the time in the 25% made at random (75% + 12.5% = 87.5%). The analysis was carried out using Stata.
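The protocol described above can be sketched in Python as follows. This is an illustration under stated assumptions rather than the randomisation system's actual code: the function name allocate is hypothetical, and the rule it encodes (pick uniformly at random with probability 0.25, otherwise pick the group with the lowest marginal total, breaking ties at random) is our reading of the description.

```python
import random
from fractions import Fraction

P_RANDOM = Fraction(1, 4)  # chance of a purely random allocation (from the text)

# Probability that the imbalance-reducing group is chosen when one exists:
# always under minimisation, half the time under random allocation.
p_preferred = (1 - P_RANDOM) * 1 + P_RANDOM * Fraction(1, 2)
print(float(p_preferred))  # 0.875

def allocate(totals, rng, p_random=0.25):
    """Pick a group given marginal totals, e.g. {"Active": 12, "Control": 9}.

    Assumed rule: with probability p_random choose uniformly at random;
    otherwise choose the group with the lowest total, breaking ties at random.
    """
    arms = sorted(totals)
    if rng.random() < p_random:
        return rng.choice(arms)
    lowest = min(totals.values())
    return rng.choice([arm for arm in arms if totals[arm] == lowest])

rng = random.Random(1)  # fixed seed so the sketch is repeatable
picks = [allocate({"Active": 12, "Control": 9}, rng) for _ in range(10000)]
print(picks.count("Control") / len(picks))  # should be close to 0.875
```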

First we import the simulated dataset.

insheet using mytrialRandom.2012-10-31.150000.tsv
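If you are not using Stata, the same file can be read with most tools that understand tab-separated values. A minimal Python sketch is shown below; the function name load_simulation is hypothetical, and it assumes the file has a header row and is UTF-8 encoded.

```python
import csv

def load_simulation(path):
    """Read a tab-separated file with a header row into a list of dicts."""
    with open(path, newline="", encoding="utf-8") as handle:
        return list(csv.DictReader(handle, delimiter="\t"))

# Usage, with the file name from the insheet command above:
# records = load_simulation("mytrialRandom.2012-10-31.150000.tsv")
# print(len(records), records[0])
```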

Now let's start exploring the dataset.

. tab gender

     gender |      Freq.     Percent        Cum.
------------+-----------------------------------
     Female |        124       31.00       31.00
       Male |        276       69.00      100.00
------------+-----------------------------------
      Total |        400      100.00

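We can see that gender has been generated approximately in line with the weightings in the data specification (2:1 Male:Female). Because the values are drawn at random, the observed split (69% Male) is close to, but not exactly, two thirds.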
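One way to judge whether the observed split is compatible with the 2:1 weighting is a chi-squared goodness-of-fit test. The Python sketch below applies it to the counts in the table above; it is an optional extra check that was not part of the original analysis.

```python
from math import erfc, sqrt

observed = {"Male": 276, "Female": 124}  # counts from the table above
weights = {"Male": 2, "Female": 1}       # weights from the data specification

n = sum(observed.values())
total_weight = sum(weights.values())
expected = {k: n * w / total_weight for k, w in weights.items()}

chi2 = sum((observed[k] - expected[k]) ** 2 / expected[k] for k in observed)

# With two categories there is 1 degree of freedom, for which the upper tail
# probability of the chi-squared distribution equals erfc(sqrt(chi2 / 2)).
p_value = erfc(sqrt(chi2 / 2))
print(round(chi2, 2), round(p_value, 2))  # 0.98 0.32
```

A p-value of about 0.32 gives no reason to doubt that gender was generated with the specified weights.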

. li initials gender severity dob agegroup in 1/5

     +---------------------------------------------------------------+
     | initials   gender   severity          dob            agegroup |
     |---------------------------------------------------------------|
  1. |       QO     Male       High   08/08/2001   6.5 years or over |
  2. |       MT     Male        Low   29/09/2002   6.5 years or over |
  3. |       YZ     Male       High   06/12/2003   6.5 years or over |
  4. |       PK     Male        Low   15/11/2009          <6.5 years |
  5. |       MH   Female       High   29/09/2003   6.5 years or over |
     +---------------------------------------------------------------+

Initials and date of birth (dob) have been generated as random strings and dates. The agegroup variable was calculated by the randomisation system from the date of birth, so it did not need to be included in the data specification.
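A derived variable of this kind could be computed along the following lines. This Python sketch is only a guess at the rule: the function name age_group is hypothetical, the reference date of 31 Oct 2012 is taken from the timestamp in the file name, and age is approximated as days divided by 365.25. The randomisation system may calculate age differently.

```python
from datetime import date, datetime

def age_group(dob_text, on_date, cutoff_years=6.5):
    """Classify a d/m/Y date of birth relative to a cut-off age in years."""
    dob = datetime.strptime(dob_text, "%d/%m/%Y").date()
    age_years = (on_date - dob).days / 365.25  # approximate age in years
    if age_years < cutoff_years:
        return f"<{cutoff_years} years"
    return f"{cutoff_years} years or over"

reference = date(2012, 10, 31)  # assumed date of the simulated randomisations
for dob in ["08/08/2001", "29/09/2002", "06/12/2003", "15/11/2009", "29/09/2003"]:
    print(dob, age_group(dob, reference))
```

Under these assumptions the five dates of birth listed above fall into the same age groups as in the Stata output.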

. tab gender group

           |         group
    gender |    Active    Control |     Total
-----------+----------------------+----------
    Female |        62         62 |       124
      Male |       138        138 |       276
-----------+----------------------+----------
     Total |       200        200 |       400


. tab severity group

           |         group
  severity |    Active    Control |     Total
-----------+----------------------+----------
      High |       138        139 |       277
       Low |        62         61 |       123
-----------+----------------------+----------
     Total |       200        200 |       400


. tab agegroup group

                  |         group
         agegroup |    Active    Control |     Total
------------------+----------------------+----------
6.5 years or over |        94         96 |       190
       <6.5 years |       106        104 |       210
------------------+----------------------+----------
            Total |       200        200 |       400

The minimisation has kept the treatment groups closely balanced within each of the three minimisation factors. By way of contrast, the balance within sites, which is not controlled by minimisation, varies quite widely:

. tab siteid group

           |         group
    siteId |    Active    Control |     Total
-----------+----------------------+----------
         1 |        20         22 |        42
         2 |        21         23 |        44
         3 |        22         23 |        45
         4 |        14         17 |        31
         5 |        16          6 |        22
         6 |        18         22 |        40
         7 |        18         26 |        44
         8 |        26         27 |        53
         9 |        25         18 |        43
        10 |        20         16 |        36
-----------+----------------------+----------
     Total |       200        200 |       400
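The contrast between the minimisation factors and site can be summarised as the largest absolute difference between Active and Control counts within any level. The Python sketch below uses the counts from the four tables above; the function name max_imbalance is hypothetical.

```python
# (Active, Control) counts copied from the tables above
tables = {
    "gender": {"Female": (62, 62), "Male": (138, 138)},
    "severity": {"High": (138, 139), "Low": (62, 61)},
    "agegroup": {"6.5 years or over": (94, 96), "<6.5 years": (106, 104)},
    "siteId": {1: (20, 22), 2: (21, 23), 3: (22, 23), 4: (14, 17), 5: (16, 6),
               6: (18, 22), 7: (18, 26), 8: (26, 27), 9: (25, 18), 10: (20, 16)},
}

def max_imbalance(levels):
    """Largest absolute Active minus Control difference across levels."""
    return max(abs(active - control) for active, control in levels.values())

for factor, levels in tables.items():
    print(factor, max_imbalance(levels))
# gender 0
# severity 1
# agegroup 2
# siteId 10
```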

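We can check the minimisation algorithm by calculating the marginal totals at each observation. For each group, this is the number of earlier participants allocated to that group who share the current participant's level of a minimisation factor, summed over the three factors: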

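* Marginal totals for each group, as they stood before each allocation
gen Active=0
gen Control=0
* Observation 1 has no earlier observations, so its totals stay at 0
forvalues i=2/400 {
    foreach group of varlist Active Control {
        local total 0
        foreach factor of varlist gender severity agegroup {
            * Earlier observations in this group with the same factor level
            qui count if `factor'==`factor'[`i'] & group=="`group'" & _n<`i'
            local total = `total' + r(N)
        }
        qui replace `group'=`total' in `i'
    }
}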
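For readers more familiar with Python, the same calculation can be sketched as below. It is intended to mirror the Stata loop above, but keeps running counts instead of recounting earlier observations each time. The function name marginal_totals and the three-record dataset are made up for illustration.

```python
def marginal_totals(records, factors, arms, group_key="group"):
    """For each record, sum over the factors the number of earlier records
    in each arm that share this record's level of the factor."""
    counts = {}  # (factor, level, arm) -> number of earlier records
    scores = []
    for rec in records:
        scores.append({
            arm: sum(counts.get((f, rec[f], arm), 0) for f in factors)
            for arm in arms
        })
        # Only now add the current record, so it never counts towards itself
        for f in factors:
            key = (f, rec[f], rec[group_key])
            counts[key] = counts.get(key, 0) + 1
    return scores

# Tiny made-up dataset, for illustration only
records = [
    {"gender": "Male", "severity": "High", "group": "Active"},
    {"gender": "Male", "severity": "Low", "group": "Control"},
    {"gender": "Female", "severity": "High", "group": "Active"},
]
for score in marginal_totals(records, ["gender", "severity"], ["Active", "Control"]):
    print(score)
# {'Active': 0, 'Control': 0}
# {'Active': 1, 'Control': 0}
# {'Active': 1, 'Control': 0}
```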

Control should be preferred by minimisation when its marginal total is lower than that for the Active group:

. tab group if Control < Active

      group |      Freq.     Percent        Cum.
------------+-----------------------------------
     Active |         20       11.70       11.70
    Control |        151       88.30      100.00
------------+-----------------------------------
      Total |        171      100.00

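The proportion allocated to Control in this situation (0.883) is very close to the expected value of 0.875. We can check this by calculating an exact binomial confidence interval (in Stata 14 and later the equivalent command is cii proportions 171 151):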

. cii 171 151

                                                         -- Binomial Exact --
    Variable |        Obs        Mean    Std. Err.       [95% Conf. Interval]
-------------+---------------------------------------------------------------
             |        171    .8830409    .0245759         .825158    .9270753

The 95% confidence interval is consistent with 0.875. The same analysis for the Active group is:

. tab group if Active < Control

      group |      Freq.     Percent        Cum.
------------+-----------------------------------
     Active |        137       87.82       87.82
    Control |         19       12.18      100.00
------------+-----------------------------------
      Total |        156      100.00

. cii 156 137

                                                         -- Binomial Exact --
    Variable |        Obs        Mean    Std. Err.       [95% Conf. Interval]
-------------+---------------------------------------------------------------
             |        156    .8782051    .0261849        .8163508    .9250541

So again the confidence interval includes the expected proportion of 0.875.

Finally, where the scores are tied, the group should be chosen at random:

. tab group if Active == Control

      group |      Freq.     Percent        Cum.
------------+-----------------------------------
     Active |         43       58.90       58.90
    Control |         30       41.10      100.00
------------+-----------------------------------
      Total |         73      100.00

. cii 73 43

                                                         -- Binomial Exact --
    Variable |        Obs        Mean    Std. Err.       [95% Conf. Interval]
-------------+---------------------------------------------------------------
             |         73    .5890411    .0575852        .4676846    .7029424

The confidence interval includes the expected value of 0.5.
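The three exact binomial intervals above can be cross-checked outside Stata. The Python sketch below computes a Clopper-Pearson interval by bisection using only the standard library; the function names binom_cdf and exact_ci are hypothetical, and the sketch assumes that Stata's "Binomial Exact" interval is the Clopper-Pearson interval.

```python
from math import comb

def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k + 1))

def exact_ci(n, k, level=0.95):
    """Clopper-Pearson interval for k successes out of n trials."""
    alpha = 1 - level

    def solve(func, target):
        # func is increasing in p on [0, 1]; find p where func(p) == target
        lo, hi = 0.0, 1.0
        for _ in range(60):
            mid = (lo + hi) / 2
            if func(mid) < target:
                lo = mid
            else:
                hi = mid
        return (lo + hi) / 2

    # Lower limit: P(X >= k) equals alpha / 2
    lower = 0.0 if k == 0 else solve(
        lambda p: 1 - binom_cdf(k - 1, n, p), alpha / 2)
    # Upper limit: P(X <= k) equals alpha / 2, i.e. P(X > k) equals 1 - alpha / 2
    upper = 1.0 if k == n else solve(
        lambda p: 1 - binom_cdf(k, n, p), 1 - alpha / 2)
    return lower, upper

for n, k in [(171, 151), (156, 137), (73, 43)]:
    lower, upper = exact_ci(n, k)
    print(n, k, round(lower, 4), round(upper, 4))
```

If the assumption about the interval type holds, the limits should agree closely with those in the Stata output.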

Page updated 4 Oct 2017