A Stata program to tabulate clusters

By Tony Brady

xtab is a generalization of the standard Stata tabulate command that performs one-way tabulations of longitudinal data.

Longitudinal data refers to information on clusters that is contained in multiple records. Examples are:

Cluster    Record
Family     Person (mother, father, child etc.)
Country    GDP by year
Patient    Follow-up appointment

The records in two of these examples are ordered within cluster (follow-up appointment and GDP by year), but in the family example they are not. In Stata, longitudinal data that is ordered by time is called cross-sectional time-series (xt) data. xtab is suited to both ordered and unordered longitudinal data.


To follow this example in Stata, type:

use http://www.sealedenvelope.com/stata/long.dta

in the Stata command window.

Patients in a clinical trial were regularly monitored. Systolic blood pressure (sbp) was measured at each visit and patients were asked whether they were currently taking beta-blockers (beta). Here's an extract of the data (long.dta):

idnum  date       sex   sbp  beta  region
109    27 Feb 92  Male  180  No    London
109    24 Sep 92  .     140  .     London
109    25 Mar 93  .     156  Yes   London
109    23 Sep 93  .     150  .     London
110    27 Feb 92  Male  160  No    Scotland
110    22 Oct 92  .     120  .     Scotland
110    22 Apr 93  .     130  .     Scotland
110    28 Oct 93  .     130  .     Scotland
110    28 Apr 94  .     130  .     Scotland
110    27 Oct 94  .     152  .     Scotland
110     5 Jan 95  .     132  .     Scotland
110    27 Apr 95  .     164  .     Scotland
111    27 Feb 92  Male  130  Yes   Scotland
112    27 Feb 92  Male  148  No    Scotland
112    17 Dec 92  .     146  No    Scotland

Longitudinal datasets must always contain a variable that identifies the clusters. In this example the variable is idnum, which contains a unique patient identifying number. All records with the same idnum belong to the same patient. This is the variable you should name in the i() option of xtab and other xt commands. Alternatively, you can declare the cluster identifier to Stata up front using the iis command. This is recommended because you then don't have to type the i() option every time you use xtab.

. iis idnum
. xtab sex

is equivalent to:

. xtab sex, i(idnum)

Either way, we get the following output:

xtab output

The tabulation is at the cluster level rather than the individual record level. It tells us there are 15 clusters in this dataset: 14 men and one woman. It turns out that we get the same output from the usual Stata command:

. tab sex

because the sex variable is missing for all records except the first record within each cluster. This is not the case for the region variable, and using Stata's tabulate command gives very different results to xtab:


The xtab results tell us 11 patients are from Scotland, 3 are from London and 1 is from Leicester.
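The difference between the two tables comes down to what is being counted: records or patients. As a rough sketch of the idea (not xtab's actual implementation), the cluster-level count can be reproduced with built-in commands by flagging one record per patient and region with egen's tag() function. The toy data below is typed in from the extract above, so it covers only four patients; the variable name onepercluster is made up for illustration.

```stata
* Toy data: idnum and region for the four patients shown in the extract
clear
input idnum str8 region
109 "London"
109 "London"
109 "London"
109 "London"
110 "Scotland"
110 "Scotland"
110 "Scotland"
110 "Scotland"
110 "Scotland"
110 "Scotland"
110 "Scotland"
110 "Scotland"
111 "Scotland"
112 "Scotland"
112 "Scotland"
end

* Record-level table: every visit is counted, so patient 110
* contributes eight records to Scotland
tabulate region

* Flag exactly one record for each distinct (idnum, region) pair
egen byte onepercluster = tag(idnum region)

* Cluster-level table: each patient is counted once per region
tabulate region if onepercluster
```

In this toy data the record-level table shows 4 London and 11 Scotland records, while the cluster-level table shows 1 London and 3 Scotland patients.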

Static vs. dynamic variables

It's useful to distinguish between variables containing information that is constant within a cluster and those where the information can change within a cluster. We call these static and dynamic variables respectively.

In our example dataset idnum, sex and region are static variables, whilst all others are dynamic (date, sbp and beta).

The default behaviour of xtab is to tabulate the number of clusters where a value has ever appeared. This produces the kind of table we would naturally expect for static variables, like those we've already seen for sex and region. Missing values are ignored unless we specifically ask for them with the missing option.
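To make the 'ever' idea concrete, here is a minimal sketch using only built-in commands. It is an illustration of the counting rule, not of how xtab works internally. It assumes beta is stored as a numeric variable coded 0 = No and 1 = Yes (the coding in long.dta may differ), and it uses a hypothetical visit counter in place of date. The rows follow the extract above, with patient 110 cut down to two visits.

```stata
* Toy data: beta-blocker use by visit (0 = No, 1 = Yes, . = missing)
clear
input idnum visit beta
109 1 0
109 2 .
109 3 1
109 4 .
110 1 0
110 2 .
111 1 1
112 1 0
112 2 0
end
label define yesno 0 "No" 1 "Yes"
label values beta yesno

* Flag one record per distinct (patient, value) pair.
* By default tag() never flags records where beta is missing,
* which mirrors ignoring missing values.
egen byte ever = tag(idnum beta)

* 'Ever' table: a patient appears once under every value
* they have ever recorded
tabulate beta if ever
```

Patient 109 has recorded both No and Yes, so they are counted in both rows: the table shows 3 under No and 2 under Yes, which is 5 entries for only 4 patients.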

To interpret the output for dynamic variables correctly, we need to remember that xtab is in 'ever' mode by default:


Here we see that the numbers in the Yes and No categories of beta-blocker use sum to more than the total of 15. This is because some patients have either started or stopped using beta-blockers during the follow-up period. What we can say is that about a quarter of patients have used beta-blockers at some time during the trial. We might be interested in knowing how many patients have not taken beta-blockers at all during the trial:


So 11 patients have no experience of beta-blockers.

The occasion() option can also be used to tabulate a particular record within the cluster. This is only relevant for dynamic variables. A common summary is of patient characteristics at baseline:

First record

Notice that the t() option is required, since xtab needs to know how records are ordered within each cluster to be able to choose the first record. The time variable can be specified in advance with the tis command, allowing the t() option to be omitted from xtab:

. tis date
. xtab beta, occasion(1)
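The reason an ordering variable is needed becomes clear if we try to pick out the first record by hand. The following is a sketch with built-in commands, under the assumptions that beta is coded 0 = No and 1 = Yes and that a hypothetical visit counter stands in for date; it is not a description of xtab's internals.

```stata
* Toy data, deliberately entered out of time order for patient 109
clear
input idnum visit beta
109 3 1
109 1 0
109 2 .
110 1 0
110 2 .
111 1 1
112 1 0
112 2 0
end
label define yesno 0 "No" 1 "Yes"
label values beta yesno

* Sort by visit within patient, then flag the earliest record.
* Without (visit) the "first" record would depend on the
* arbitrary order of the data.
bysort idnum (visit): generate byte isfirst = (_n == 1)

* Baseline table: one record per patient
tabulate beta if isfirst, missing
```

Here the baseline table counts each of the four patients exactly once (3 No, 1 Yes), and patient 109 is correctly classified by visit 1 rather than by the record that happens to come first in the file.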

The number of patients using beta-blockers at the end of the trial can be identified with the occasion(last) option:

Last record

We can see that the beta-blocker variable is missing for most patients at the last follow-up visit. Two patients were followed up only once or twice. Tabulating beta-blocker use at the third follow-up visit therefore excludes these two patients from the total:

Third record
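The way patients drop out of the total can be seen in a small hand-rolled version of the same idea. This is only a sketch with built-in commands, assuming beta is coded 0 = No and 1 = Yes and using a hypothetical visit counter in place of date; the variable names seq and islast are made up for illustration.

```stata
* Toy data: patients 111 and 112 have fewer than three records
clear
input idnum visit beta
109 1 0
109 2 .
109 3 1
109 4 .
110 1 0
110 2 .
110 3 .
111 1 1
112 1 0
112 2 0
end
label define yesno 0 "No" 1 "Yes"
label values beta yesno

* Number the records within each patient in time order
bysort idnum (visit): generate int seq = _n

* Flag the final record of each patient
bysort idnum (visit): generate byte islast = (_n == _N)

* Third record: only patients with at least three records appear
tabulate beta if seq == 3, missing

* Last record: every patient appears once, often with beta missing
tabulate beta if islast, missing
```

In this toy data only patients 109 and 110 contribute to the third-record table, whereas all four patients appear in the last-record table, two of them with beta missing.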


To obtain xtab, type the following into Stata:

net from https://www.sealedenvelope.com/

and follow the instructions on screen. This will ensure the files are installed in the right place and you can easily uninstall the command later if you wish.