By Tony Brady
xtab is a generalization of the standard Stata tabulate command that performs one-way tabulations of longitudinal data.
Longitudinal data refers to information on clusters that is contained in multiple records. Examples are:
Cluster | Record |
---|---|
Family | Person (mother, father, child etc) |
Country | GDP by year |
Patient | Follow-up appointment |
The records in two of these examples are ordered within cluster (follow-up appointment and GDP by year), but in the family example they are not. In Stata, longitudinal data that is ordered by time is called cross-sectional time-series (xt) data. xtab is suited to both ordered and unordered longitudinal data.
To follow this example in Stata type:
use http://www.sealedenvelope.com/stata/long.dta
in the Stata command window.
Patients in a clinical trial were regularly monitored. Systolic blood pressure (sbp) was measured at each visit and patients were asked whether they were currently taking beta-blockers (beta). Here's an extract of the data (long.dta):
idnum | date | sex | sbp | beta | region |
---|---|---|---|---|---|
109 | 27 Feb 92 | Male | 180 | No | London |
109 | 24 Sep 92 | . | 140 | . | London |
109 | 25 Mar 93 | . | 156 | Yes | London |
109 | 23 Sep 93 | . | 150 | . | London |
110 | 27 Feb 92 | Male | 160 | No | Scotland |
110 | 22 Oct 92 | . | 120 | . | Scotland |
110 | 22 Apr 93 | . | 130 | . | Scotland |
110 | 28 Oct 93 | . | 130 | . | Scotland |
110 | 28 Apr 94 | . | 130 | . | Scotland |
110 | 27 Oct 94 | . | 152 | . | Scotland |
110 | 5 Jan 95 | . | 132 | . | Scotland |
110 | 27 Apr 95 | . | 164 | . | Scotland |
111 | 27 Feb 92 | Male | 130 | Yes | Scotland |
112 | 27 Feb 92 | Male | 148 | No | Scotland |
112 | 17 Dec 92 | . | 146 | No | Scotland |
Longitudinal datasets must always contain a variable that identifies the clusters. In this example the variable is idnum, which contains a unique patient identifying number. All records with the same idnum belong to the same patient. This is the variable you should name in the i() option of xtab and other xt commands. Alternatively, you can declare the cluster identifier to Stata up front using the iis command. This is recommended because you then don't have to type the i() option every time you use xtab.
. iis idnum
. xtab sex
is equivalent to:
. xtab sex, i(idnum)
Either way, we get the following output:
The tabulation is at the cluster level rather than the individual record level. It tells us there are 15 clusters in this dataset: 14 men and one woman. It turns out that we get the same output from the usual Stata command:
. tab sex
because the sex variable is missing for all records except the first record within each cluster. This is not the case for the region variable, and using Stata's tabulate command gives very different results from xtab:
The xtab results tell us 11 patients are from Scotland, 3 are from London and 1 is from Leicester.
It's useful to distinguish between variables containing information that is constant within a cluster and those where the information can change within a cluster. We call these static and dynamic variables respectively.
In our example dataset idnum, sex and region are static variables, whilst all others (date, sbp and beta) are dynamic.
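One way to check whether a variable behaves as static or dynamic is to count its distinct non-missing values within each cluster. The lines below are a minimal sketch using built-in Stata commands only. They assume long.dta is in memory, and the names pickone, nvals and onepercluster are made up for illustration.

```stata
* flag one record per distinct (patient, region) pair;
* records where region is missing are flagged 0
egen byte pickone = tag(idnum region)
* number of distinct non-missing region values seen for each patient
egen nvals = total(pickone), by(idnum)
* flag one record per patient so each cluster is counted once
egen byte onepercluster = tag(idnum)
* if every patient shows 1 here, region looks static in this dataset
tabulate nvals if onepercluster
```

Replacing region with beta in the first line should show some patients with two distinct values, which is what marks a variable as dynamic.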
The default behaviour of xtab is to tabulate the number of clusters where a value has ever appeared. This produces the kind of table we would naturally expect for static variables, like those we've already seen for sex and region. Missing values are ignored unless we specifically ask for them with the missing option.
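To make the 'ever' rule concrete, the same cluster-level counts can be approximated with built-in commands by keeping one record per distinct combination of cluster and value. This is a sketch of the idea under that assumption, not a description of how xtab is implemented. The variable name ever_region is hypothetical.

```stata
* flag one record for each distinct (patient, region) pair;
* records where region is missing are flagged 0 and so drop out
egen byte ever_region = tag(idnum region)
* each flagged record stands for one patient in whom that region ever appears
tabulate region if ever_region
```

For a dynamic variable a patient can be flagged under more than one value, so the percentages reported by tabulate are relative to the flagged records rather than to the number of clusters, and need not agree with those from xtab.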
When using xtab on dynamic variables, we need to remember that xtab is in 'ever' mode by default in order to interpret the output correctly:
Here we see that the numbers in the Yes and No categories of beta-blocker use sum to more than the total of 15. This is because some patients have either started or stopped using beta-blockers during the follow-up period. What we can say is that about a quarter of patients have used beta-blockers at some time during the trial. We might be interested in knowing how many patients have not taken beta-blockers at all during the trial:
So 11 patients have no experience of beta-blockers. The occasion() option can also be used to tabulate a particular record within the cluster. This is only relevant for dynamic variables. A common summary is of patient characteristics at baseline:
Notice that the t() option is required since xtab needs to know how records are to be ordered within cluster to be able to choose the first record. The time variable can be specified in advance, allowing the t() option to be omitted from xtab:
. tis date
. xtab beta, occasion(1)
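The reason an ordering variable is needed can be seen by picking out the baseline record by hand. The following is a rough sketch with built-in commands, assuming date is stored as a numeric Stata date so that sorting on it gives chronological order. The variable name visit is made up for illustration, and the treatment of missing values may differ from xtab's.

```stata
* number each patient's records in date order: 1 = first visit
bysort idnum (date): generate visit = _n
* beta-blocker use on each patient's first record;
* tabulate leaves out records where beta is missing
tabulate beta if visit == 1
```

Without date in the bysort prefix, the order of records within a patient would be arbitrary and "first record" would not be well defined.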
The number of patients using beta-blockers at the end of the trial can be identified with the occasion(last) option:
We can see that the beta-blocker variable is missing for most patients at the last follow-up visit. Two patients were followed up only once or twice. Tabulating beta-blocker use at the third follow-up visit therefore excludes these two patients from the total:
To obtain xtab type the following into Stata:
net from https://www.sealedenvelope.com/
and follow the instructions on screen. This will ensure the files are installed in the right place and you can easily uninstall the command later if you wish.