CRF data may be downloaded in either CSV or Stata fixed format via the Download link in the top menu. The download page shows a list of forms in the CRF and provides links to download the data for each form individually or for all forms (as a zip file).
A data dictionary can also be viewed, showing the fields for each table (there is one table per form). The field name, data type and label are displayed for each field.
Fields containing personally identifiable information (PII) that have been configured in the CRF builder to be stored in an encrypted format will be downloaded with AES-256 encryption applied. This means these fields cannot be viewed or analysed without decryption. Decryption can be carried out using common tools such as OpenSSL. See Decryption of downloaded data for detailed instructions.
The data for each form is provided in comma-separated values (CSV) format, which is a plain text file that can be opened in many spreadsheet or statistical programs. The first row contains a header with the question labels for each column.
Every file contains a patient identifier field (identifier) and a subject ID field (patientId) so that data stored on the same subject in different forms can be linked together. In general the id field should be ignored: it simply records the order in which forms were added to the database and is not related to the subject.
Subforms store the data from repeating sections of forms. These are downloaded as separate files from the parent form. Records should be linked to the parent form via the column labelled "Parent record. Foreign key: <parent-table-name>.id". This should be matched to the id field in the parent table. Although subforms also contain the subject ID field, and this could be used to match records to the parent form, it is not recommended in case the parent form is repeatable.
The data for each form is provided in Stata fixed format, which is a plain text file format with a dictionary ‘header’ that describes the format of the rows. Each row contains information from one saved form with a subject identifier field to identify the subject record it belongs to. The data can be easily imported into Stata using the infile command.
For example, to import the data from a baseline form called Interviewers questions, the following infile command would be used in Stata:
infile using InterviewersQuestionsVER1_Baseline.dct, clear
compress
where InterviewersQuestionsVER1_Baseline.dct is the full filesystem path to the downloaded file. The compress command is recommended to reduce the storage space allocated to each variable.
The example below shows some interview data downloaded in Stata fixed format. There are two rows below the dictionary header because data have been entered for only two subjects so far:
dictionary {
str244 identifier `"Patient identifier"'
long id `"id"'
long patientId `"Subject id"'
str244 userIdentifier `"User who created row"'
str244 lastUserIdentifier `"User who last updated row"'
str244 created `"Timestamp for row creation (UTC)"'
str244 updated `"Date & time of last update to row (UTC)"'
str244 question1 `"Sex - Questions"'
str244 question2 `"Marital status - Questions"'
str244 question3 `"If other, please specify - Questions"'
str244 question4 `"Have you had any previous episodes of depression? - Depression"'
str244 question5 `"If so, how many - Depression. Number (up to 2 digits)"'
str244 question6 `"Duration of current episode in weeks - Depression. Number (up to 3 digits)"'
str244 question7 `"Are you using any treatments for depression at the moment? - Depression"'
str244 question8 `"Treatment/Medication Name - Depression"'
str244 reasonForEdit `"Reason for editing row"'
str244 notes `"Notes"'
str244 validationOverrides `"Justifications for overriding validation"'
str244 validationStatus `"Validation status"'
str244 validationNotes `"Validation notes"'
str244 _dateEntered `"Date of study entry yyyy-mm-dd"'
str244 _dateWithdrew `"Date of withdrawal from follow-up - Withdrawal."'
str244 _site `"Site"'
str244 _country `"Country"'
str244 _visit `"Visit"'
}
"T5617" 1 1 "Sealed Envelope support (ID 1)" "Sealed Envelope support (ID 1)" "2016-03-23 11:36:19" "2016-03-23 11:36:19" "Male" "Partner - Living with" "" "Yes" "3" "3" "No" "" "" "" "{}" "Not validated" "" "2015-12-27" "" "1: UCL" "United Kingdom" "Baseline"
"T1719" 2 2 "Sealed Envelope support (ID 1)" "Sealed Envelope support (ID 1)" "2016-03-23 12:51:18" "2016-03-23 12:51:18" "Female" "Married" "" "No" "" "2" "No" "" "" "" "{}" "Not validated" "" "2016-01-31" "" "1: UCL" "United Kingdom" "Baseline"
The data is imported and compressed, and the output from Stata’s describe command can be seen in the screenshot. The variable names and variable descriptions have been picked up automatically from the dictionary header.
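For readers who want to inspect a Stata fixed format file outside Stata, the layout of the sample file above can be read with a short Python sketch. The helper read_dct is hypothetical, and the sample is an abbreviated copy of the file shown earlier. The sketch assumes that each dictionary line has the form: type, name, label in `"..."' quotes. It also assumes that data values are separated by single spaces, and that string values are double-quoted with no embedded quotes. Real files may need more careful handling.

```python
import csv
import re

# Abbreviated copy of the sample file, for illustration only.
SAMPLE_DCT = '''dictionary {
str244 identifier `"Patient identifier"'
long id `"id"'
long patientId `"Subject id"'
str244 question1 `"Sex - Questions"'
}
"T5617" 1 1 "Male"
"T1719" 2 2 "Female"
'''

# One dictionary line: storage type, variable name, label in `"..."' quotes.
FIELD_RE = re.compile(r"""^(\S+)\s+(\S+)\s+`"(.*)"'$""")
INTEGER_TYPES = ("byte", "int", "long")


def read_dct(text):
    """Return (fields, rows): fields are (type, name, label) tuples, rows are dicts."""
    lines = text.splitlines()
    # The dictionary header ends at the first line containing only "}".
    end = next(i for i, line in enumerate(lines) if line.strip() == "}")
    fields = []
    for line in lines[1:end]:
        match = FIELD_RE.match(line.strip())
        if match:
            fields.append(match.groups())
    rows = []
    for values in csv.reader(lines[end + 1:], delimiter=" ", quotechar='"'):
        if not values:  # ignore blank lines
            continue
        row = {}
        for (stype, name, _label), value in zip(fields, values):
            # Convert integer-typed fields; leave strings (and blanks) as text.
            row[name] = int(value) if stype in INTEGER_TYPES and value else value
        rows.append(row)
    return fields, rows


fields, rows = read_dct(SAMPLE_DCT)
for row in rows:
    print(row["identifier"], row["patientId"], row["question1"])
```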
Category variables are stored as strings, so they can be tabulated without needing value labels. They can be encoded as numeric variables if storage space is an issue.
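The idea behind encoding a category variable can be sketched in Python: each distinct string is replaced by a small integer code, and a lookup table keeps the original text. This is only an illustration of the concept, with a hypothetical helper encode_categories and made-up values. Within Stata itself the encode command does this job, assigning codes in alphabetical order of the strings.

```python
def encode_categories(values):
    """Map each distinct non-empty string to an integer code, in sorted order.

    Returns the encoded list (None marks a missing value) and a
    code -> label lookup table.
    """
    labels = sorted({v for v in values if v != ""})
    codes = {label: code for code, label in enumerate(labels, start=1)}
    encoded = [codes.get(v) for v in values]
    value_labels = {code: label for label, code in codes.items()}
    return encoded, value_labels


# Made-up values for a category variable such as question1 (Sex).
sex = ["Male", "Female", "Female", "", "Male"]
encoded, value_labels = encode_categories(sex)
print(encoded)
print(value_labels)
```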
This format provides a pair of Stata files per form: the raw data and a .do file to process the data. The data is imported by running the .do file within Stata. There are some differences from the Stata fixed format described above that make analysis more convenient: categorical variables are stored as numeric values with value labels attached, and additional numeric variables are created for date fields.
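To show what a numeric date variable might hold, the sketch below converts a yyyy-mm-dd string into a day count. The text above does not state which numeric representation the .do file produces. The sketch assumes Stata's standard daily date convention, in which a date is stored as the number of days elapsed since 1 January 1960. The helper to_stata_daily is hypothetical.

```python
from datetime import date

# Assumption: Stata's daily date convention counts days from 1 January 1960.
STATA_EPOCH = date(1960, 1, 1)


def to_stata_daily(text):
    """Convert 'yyyy-mm-dd' to days since 1 January 1960; '' gives None."""
    if text == "":
        return None  # blank date fields are treated as missing
    return (date.fromisoformat(text) - STATA_EPOCH).days


# A value in the style of the _dateEntered field from the sample data.
print(to_stata_daily("2016-01-31"))
```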