Stata downloads

CRF data may be downloaded in Stata fixed format via the ‘Download’ link in the sidebar. The download page shows a list of forms in the CRF and provides links to download the data for each form individually or for all forms (as a zip file).

The data for each form is provided in Stata fixed format, a plain text file format with a dictionary ‘header’ that describes the format of the rows. Each row holds the data from one saved form and includes a patientId field identifying the patient record it belongs to. The data can be imported into Stata using the infile command.

For example, to import the withdrawal data the following infile command would be used in Stata:

infile using SeWithdrawal.dct, clear
compress

Here SeWithdrawal.dct stands for the full filesystem path to the downloaded file. Running compress after the import is recommended because it reduces the storage space allocated to each variable.
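As a slightly fuller sketch, the import can be placed in a short do-file that also saves the result as a Stata dataset. The folder C:/crf-downloads is a hypothetical location used only for illustration; substitute the folder the file was actually downloaded to.

```stata
* Hypothetical download folder: replace with your own path
cd "C:/crf-downloads"

* Read the dictionary header and the data rows that follow it in the same file
infile using "SeWithdrawal.dct", clear

* Shrink each variable to the smallest storage type that holds its values
compress

* Check the imported variable names and labels
describe

* Save as a Stata dataset for later sessions
save "SeWithdrawal.dta", replace
```

Quoting the file name keeps the command working when the path contains spaces.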

[Screenshot: Stata download page]

Example

The study entry data has been downloaded in Stata fixed format. There are two rows below the dictionary header because data for only two patients has been entered so far:

dictionary {
  long id
  long patientId `"Parent patient. Foreign key: patient table.id"'
  str244 userIdentifier `"User who created row"'
  str244 lastUserIdentifier `"User who last updated row"'
  long siteId `"Site patient recruited at"'
  str244 created `"Timestamp for row creation"'
  str244 updated `"Date & time of last update to row"'
  str244 reasonForEdit `"Reason for editing row"'
  str244 notes `"Notes"'
  str244 validationOverrides `"Justifications for overriding validation"'
  str244 validationStatus `"Validation status"'
  str244 validationNotes `"Validation notes"'
  str244 question1 `"1. Identifier"'
  str244 question2 `"2. Date of study entry. dd/mm/yyyy"'
  str244 question3 `"3. Sex"'
  str244 question4 `"4. Date of birth. dd/mm/yyyy"'
  str244 question5 `"5. Weight. kg. Number (up to 3 digits)"'
  str244 question6 `"6. Height. metres. Number to 2 decimal places"'
  str244 question7 `"7. Waist measurement. cm. Number (up to 5 digits)"'
  str244 question8 `"8. Medication"'
  str244 question9 `"9. Systolic Blood Pressure. mmHg. Number (up to 3 digits)"'
  str244 question10 `"10. Diastolic Blood Pressure. mmHg. Number (up to 3 digits)"'
  str244 question11 `"11. Rough intake of calories. calories (kcal). Number (up to 5 digits)"'
  str244 identifier "Patient identifier"
  str244 dateEnteredStudy "Date entered study"
  str244 dateWithdrew "Date withdrew"
  str244 siteName "Site"
  str244 countryName "Country"
}
1 1 "Preview Administrator account" "Preview Administrator account" 1 "2013-07-29 16:27:38" "2013-07-29 16:27:38" "" "" "{'question11':{'reason':'Correct value','error':'Please enter a value between 14000 and 25900.','value':'28000'}}" "Not validated" "" "J0085" "2013-06-12" "Female" "14/07/1982" "100" "1.6" "78" "None" "180" "" "28000" "J0085" "2013-06-12" "" "UCL" "United Kingdom"
2 2 "Preview Administrator account" "Preview Administrator account" 1 "2013-07-29 17:01:09" "2013-07-29 17:01:09" "" "" "{}" "Not validated" "" "JP0456" "2013-07-05" "Male" "19/03/1957" "104" "1.76" "97" "None" "160" "95" "24896" "JP0456" "2013-07-05" "" "UCL" "United Kingdom"

The data is imported and compressed, and the output from Stata’s describe command can be seen in the screenshot. The variable names and variable descriptions have been picked up automatically from the dictionary header.

[Screenshot: study entry data imported into Stata]

Category variables are stored as strings, so they can be tabulated without needing value labels. They can be encoded as labelled numeric variables if storage space is an issue.
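A minimal sketch of encoding a category variable, assuming the study entry data from the example above is loaded. The new variable name sex is an illustrative choice, not something the download provides.

```stata
* Tabulate the string variable directly
tabulate question3

* Create a numeric variable whose value labels are taken from the strings
encode question3, generate(sex)

* The tabulation still shows the category text via the value labels
tabulate sex

* Optionally drop the original string variable to save space
drop question3
```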


Conversion notes

The following changes are made to the data during conversion into the Stata download format:

  • All strings are truncated at 244 characters
  • Newlines are replaced by spaces
  • Double quotes are replaced by single quotes
  • Dates and times are imported into Stata as strings. Use Stata’s date and time conversion functions, such as date() and clock(), as required
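
The string dates and times can be converted after import along the following lines. This is a sketch that assumes the study entry data from the example is loaded and that values follow the formats seen in its two rows; the new variable names (createdTs, dateOfBirth, entryDate, weightKg) are illustrative only. Numeric answers also arrive as strings in that example, so a destring step is included.

```stata
* Timestamps such as "2013-07-29 16:27:38": datetimes must be stored as double
generate double createdTs = clock(created, "YMDhms")
format createdTs %tc

* Dates entered as dd/mm/yyyy, such as "14/07/1982"
generate dateOfBirth = date(question4, "DMY")
format dateOfBirth %td

* Dates in yyyy-mm-dd form, such as "2013-06-12"
generate entryDate = date(dateEnteredStudy, "YMD")
format entryDate %td

* Numeric answers held as strings; empty strings become missing values
destring question5, generate(weightKg)
```

Values that do not match the mask given to date() or clock() are set to missing, so it is worth checking for unexpected missing values after conversion.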
Page updated 27 Jan 2014