CRF data may be downloaded in Stata fixed format via the ‘Download’ link in the sidebar. The download page shows a list of forms in the CRF and provides links to download the data for each form individually or for all forms (as a zip file).
The data for each form is provided in Stata fixed format, which is a plain text file format with a dictionary ‘header’ that describes the format of the rows. Each row contains information from one saved form, with a patientId field to identify the patient record it belongs to.
The data can be easily imported into Stata using the infile command.
For example, to import the withdrawal data the following infile command would be used in Stata:
infile using SeWithdrawal.dct, clear
compress
where SeWithdrawal.dct is the full filesystem path to the downloaded file. The compress command is recommended to reduce the storage space allocated to each variable.
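If the path contains spaces, it needs to be enclosed in double quotes. A minimal sketch, assuming a hypothetical download location (the directory shown is an illustration only, so substitute wherever the file was actually saved):

```stata
* Hypothetical location of the downloaded file; substitute your own path.
* Forward slashes are accepted by Stata on all platforms.
infile using "C:/Users/me/Downloads/crf data/SeWithdrawal.dct", clear

* Shrink each variable to the smallest storage type that holds its values.
compress

* Check the imported variable names, storage types and labels.
describe
```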
The study entry data below has been downloaded in Stata fixed format. There are two rows below the dictionary header because data for only two patients has been entered so far:
dictionary {
long id
long patientId `"Parent patient. Foreign key: patient table.id"'
str244 userIdentifier `"User who created row"'
str244 lastUserIdentifier `"User who last updated row"'
long siteId `"Site patient recruited at"'
str244 created `"Timestamp for row creation"'
str244 updated `"Date & time of last update to row"'
str244 reasonForEdit `"Reason for editing row"'
str244 notes `"Notes"'
str244 validationOverrides `"Justifications for overriding validation"'
str244 validationStatus `"Validation status"'
str244 validationNotes `"Validation notes"'
str244 question1 `"1. Identifier"'
str244 question2 `"2. Date of study entry. dd/mm/yyyy"'
str244 question3 `"3. Sex"'
str244 question4 `"4. Date of birth. dd/mm/yyyy"'
str244 question5 `"5. Weight. kg. Number (up to 3 digits)"'
str244 question6 `"6. Height. metres. Number to 2 decimal places"'
str244 question7 `"7. Waist measurement. cm. Number (up to 5 digits)"'
str244 question8 `"8. Medication"'
str244 question9 `"9. Systolic Blood Pressure. mmHg. Number (up to 3 digits)"'
str244 question10 `"10. Diastolic Blood Pressure. mmHg. Number (up to 3 digits)"'
str244 question11 `"11. Rough intake of calories. calories (kcal). Number (up to 5 digits)"'
str244 identifier "Patient identifier"
str244 dateEnteredStudy "Date entered study"
str244 dateWithdrew "Date withdrew"
str244 siteName "Site"
str244 countryName "Country"
}
1 1 "Preview Administrator account" "Preview Administrator account" 1 "2013-07-29 16:27:38" "2013-07-29 16:27:38" "" "" "{'question11':{'reason':'Correct value','error':'Please enter a value between 14000 and 25900.','value':'28000'}}" "Not validated" "" "J0085" "2013-06-12" "Female" "14/07/1982" "100" "1.6" "78" "None" "180" "" "28000" "J0085" "2013-06-12" "" "UCL" "United Kingdom"
2 2 "Preview Administrator account" "Preview Administrator account" 1 "2013-07-29 17:01:09" "2013-07-29 17:01:09" "" "" "{}" "Not validated" "" "JP0456" "2013-07-05" "Male" "19/03/1957" "104" "1.76" "97" "None" "160" "95" "24896" "JP0456" "2013-07-05" "" "UCL" "United Kingdom"
The data is imported and compressed, and the output from Stata’s describe command can be seen in the screenshot. The variable names and variable descriptions have been picked up automatically from the dictionary header.
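Because the dictionary declares every answer as a string, numeric and date answers may need converting before analysis. The following is a sketch only, assuming the study entry variables from the example file above. The new variable names dob and createdAt are illustrative and not part of the download:

```stata
* Convert numeric answers stored as strings; empty strings become missing.
destring question5 question6 question7 question9 question10 question11, replace

* Date of birth is held as dd/mm/yyyy text in the example rows.
generate dob = date(question4, "DMY")
format dob %td

* Row timestamps look like "2013-07-29 16:27:38" in the example rows.
generate double createdAt = clock(created, "YMDhms")
format createdAt %tc
```

destring leaves a variable unchanged if it finds non-numeric characters, so it is worth running describe afterwards to confirm which variables were converted.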
Category variables are stored as strings, so they can be tabulated without needing value labels. They can be converted to labelled numeric variables with Stata’s encode command if storage space is an issue.
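As a rough illustration of encoding a category variable, the sketch below uses question3 (Sex) from the example study entry file. The new variable name sex is an assumption chosen for this example:

```stata
* Create a labelled numeric copy of the string variable question3 (Sex).
encode question3, generate(sex)

* The tabulation shows the original text as value labels.
tabulate sex

* Optionally drop the string version once the encoded copy has been checked.
drop question3
```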
Note that the following changes are made to the data during conversion into Stata download format: