Randomise v2.0
Permutation-based nonparametric inference
Developed in collaboration with Thomas Nichols
Permutation methods (also known as randomisation methods) are used for inference (thresholding) on statistic maps when the null distribution is not known. The null distribution is unknown either because the noise in the data does not follow a simple distribution, or because non-standard statistics are used to summarize the data. randomise is a simple permutation program enabling modelling and inference using the standard GLM design setup as used, for example, in FEAT. It can output voxelwise and cluster-based tests, and also offers variance smoothing as an option. For more detail on permutation testing in neuroimaging see Nichols and Holmes (2002).
Test Statistics in Randomise
randomise produces a test statistic image (e.g., tstat1) and sets of P-value images (stored as 1-P for more convenient visualization, as bigger is then "better"). The table below shows the filename suffixes for each of the different test statistics available.
Voxel-wise uncorrected P-values are only appropriate when a single voxel is selected a priori (i.e., you don't need to worry about multiple comparisons across voxels). The significance of suprathreshold clusters (defined by the cluster-forming threshold) can be assessed either by cluster size or cluster mass. Size is just cluster extent measured in voxels. Mass is the sum of all statistic values within the cluster. Cluster mass has been reported to be more sensitive than cluster size (Bullmore et al, 1999; Hayasaka & Nichols, 2003).
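Both cluster statistics are straightforward to compute once suprathreshold voxels are grouped into connected components. A toy illustration (the data and threshold are made up; real clusters are 3D connected components, but a 1D run suffices to show the definitions):

```python
from itertools import groupby

# Made-up 1D "statistic image" containing two suprathreshold clusters.
stat = [0.0, 3.0, 4.0, 0.0, 0.0, 2.5, 2.6, 2.7, 0.0]
thresh = 2.3  # cluster-forming threshold

# Group consecutive suprathreshold voxels into clusters.
clusters = [
    list(run)
    for above, run in groupby(stat, key=lambda v: v > thresh)
    if above
]

sizes = [len(c) for c in clusters]   # cluster size: extent in voxels
masses = [sum(c) for c in clusters]  # cluster mass: sum of statistic values
print(sizes)   # [2, 3]
```

Here the second cluster is larger by extent (3 voxels vs 2) but the two have nearly equal mass, illustrating how the two statistics can rank clusters differently.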
Accounting for Repeated Measures
Permutation tests do not easily accommodate correlated datasets (e.g., temporally smooth timeseries), as null-hypothesis exchangeability is essential. However, the case of "repeated measurements", or more than one measurement per subject in a multisubject analysis, can sometimes be accommodated.
randomise allows the definition of exchangeability blocks, as specified by the -g (group labels) option described below.

Confound Regressors
Unlike with the previous version of randomise, you no longer need
to treat confound regressors in a special way (e.g. putting them in
a separate design matrix). You can now include them in the main
design matrix, and randomise will work out from your contrasts how
to deal with them. For each contrast, an "effective regressor" is
formed using the original full design matrix and the contrast, as
well as a new set of "effective confound regressors", which are then
pre-removed from the data before the permutation testing begins. One
side-effect of the new, more powerful, approach is that the full set
of permutations is run for each contrast separately, increasing the
time that randomise takes to run.
More information on the theory behind randomise can be found
in the Background Theory section below.
A typical simple call to randomise uses the syntax given in the USING randomise section below. Two programs, also described below, make it easy to create the design matrix and contrast files: the Glm GUI and the design_ttest2 script. randomise has a number of thresholding/output options, described below.
"FWE-corrected" means that the family-wise error rate is controlled. If only FWE-corrected P-values less than 0.05 are accepted, the chance of one or more false positives occurring anywhere in the image is no more than 5%. Equivalently, one has 95% confidence of no false positives in the image.
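Permutation-based FWE-corrected voxel-wise P-values are conventionally obtained from the permutation distribution of the image-wise maximum statistic (see Nichols and Holmes, 2002); this is the basis of the _max_* outputs. A toy numpy sketch, with random data standing in for real statistic images:

```python
import numpy as np

rng = np.random.default_rng(1)
n_perm, n_vox = 1000, 500

# Null statistic "images", one per permutation, plus an observed image.
perm_stats = rng.normal(size=(n_perm, n_vox))
observed = rng.normal(size=n_vox)

# For each permutation, record the maximum statistic over all voxels.
max_dist = perm_stats.max(axis=1)

# FWE-corrected P per voxel: fraction of permutations whose image-wise
# maximum meets or exceeds the observed value at that voxel.
p_fwe = (1 + (max_dist[:, None] >= observed[None, :]).sum(axis=0)) / (1 + n_perm)

# randomise stores these as 1-P images, so that bigger is "better".
one_minus_p = 1 - p_fwe
```

Because the correction uses the maximum over the whole image, a voxel is only significant if it would be surprising anywhere in the image, which is what controls the family-wise error rate.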
Note that these output images are 1-P images, where a value of 1 is
therefore most significant (arranged this way to make display and
thresholding convenient). Thus to "threshold at p<0.01", threshold
the output images at 0.99 etc.
If your design is simply all 1s (for example, a single group of subjects) then randomise needs to work in a different way. Normally it generates random samples by randomly permuting the rows of the design; however, in this case it does so by randomly inverting the sign of the 1s. In this case, then, instead of specifying design and contrast matrices on the command line, use the -1 option.

You can potentially improve the estimation of the variance that feeds into the final "t" statistic image by using the variance smoothing option -v <std>, where you need to specify the spatial extent of the smoothing in mm.

If randomise hangs or crashes when initially reading in your data, the problem may be large data (or a small amount of RAM). In this case the -M option may help; it uses a slower, but more memory-efficient, way of loading the data.

One-Sample T-test
To perform a nonparametric one-sample t-test (e.g., on COPEs created by FEAT FMRI analysis), create a 4D image of all of the images. There should be no repeated measures, i.e., there should only be one image per subject. Because this is a single-group simple design you don't need a design matrix or contrasts; just use the -1 option (see the EXAMPLES section below). If you have fewer than 20 subjects (approx. 20 DF), then you will usually see an increase in power by using variance smoothing (-v). Note also that randomise will automatically select one-sample mode for appropriate design/contrast combinations.
Two-Sample Unpaired T-test
To perform a nonparametric two-sample t-test, create a 4D image of all of the images, with the subjects in the right order! Create appropriate design.mat and design.con files. Once you have your design files, run the corresponding command from the EXAMPLES section below.
Two-Sample Unpaired T-test with nuisance variables.
To perform a nonparametric two-sample t-test in the presence of nuisance variables, create a 4D image of all of the images. Create appropriate design.mat and design.con files, where the design matrix includes the nuisance variables as extra regressors that are (appropriately) ignored by your contrast. Once you have your design files, the call is as before (see the EXAMPLES section below).
Repeated measures ANOVA
Following the ANOVA: 1-factor 4-levels (Repeated Measures) example from the FEAT manual, assume we have 2 subjects with 1 factor at 4 levels. We therefore have eight input images and we want to test whether there is any difference over the 4 levels of the factor. The design matrix, contrasts and grouping file are given in the EXAMPLES section below.
Create a group.dat file for the grouping variable, as shown in the EXAMPLES section below.

The number of permutations can be computed for each group, and then multiplied together to find the total number of permutations. We use the one-way ANOVA computation for 4 levels, and hence (1+1+1+1)!/(1!×1!×1!×1!) = 24 possible permutations for one subject, and hence 24 × 24 = 576 total permutations.
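This arithmetic can be checked in a few lines of Python (the multinomial helper is ours, not part of randomise):

```python
from math import factorial

def multinomial(counts):
    """Number of distinct orderings of observations with the given
    per-level counts: (sum of counts)! / product of counts!."""
    total = factorial(sum(counts))
    for c in counts:
        total //= factorial(c)
    return total

per_subject = multinomial([1, 1, 1, 1])  # 4 levels, one observation each: 4! = 24
total = per_subject ** 2                 # two subjects, permuted independently
print(per_subject, total)                # 24 576
```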
The call is then similar to the above examples (see the EXAMPLES section below).

A standard nonparametric test is exact, in that the false positive rate is exactly equal to the specified α level. Using randomise with a GLM that corresponds to one of the simple statistical models listed in the Counting Permutations table below will result in exact inference.
Permutation tests for the General Linear Model
For an arbitrary GLM randomise uses the method of
Kennedy (1995, as cited in Anderson & Robinson, 2001).
Based on the contrast (or set of contrasts defining an F
test), the design matrix is automatically partitioned into tested
effects and nuisance (confound) effects.
It is no longer necessary for you to create a separate confound design
matrix; just include the nuisance regressors (EVs) in your main design
matrix, as you would in FEAT.
The data are first fit to the nuisance effects alone and
nuisance-only residuals are formed. These residuals are permuted,
creating
an (approximate) realization of data under the null hypothesis. This
realization is fit to the full model and the desired test
statistic is computed as usual. This process is repeated to build a
distribution of test statistics equivalent under the null hypothesis
specified by the contrast(s). For the simple models above, this
method is equivalent to the standard exact tests; otherwise, it
accounts for nuisance variation present under the null.
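A toy numpy sketch of this nuisance-residual permutation scheme, on made-up data (one tested regressor, one nuisance covariate plus intercept; all names and sizes are illustrative, not randomise internals):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 40
age = rng.normal(size=n)                           # nuisance covariate
group = np.repeat([0.0, 1.0], n // 2)              # tested effect
X = np.column_stack([group, age, np.ones(n)])      # full design
y = 0.5 * age + rng.normal(size=n)                 # null is true: no group effect

def tstat(X, y, col=0):
    """t statistic for one regressor of a least-squares GLM fit."""
    beta = np.linalg.lstsq(X, y, rcond=None)[0]
    resid = y - X @ beta
    sigma2 = resid @ resid / (X.shape[0] - X.shape[1])
    covb = sigma2 * np.linalg.inv(X.T @ X)
    return beta[col] / np.sqrt(covb[col, col])

# Partition: fit the nuisance-only model and form its residuals.
Z = X[:, 1:]                                       # nuisance (age + intercept)
beta_z = np.linalg.lstsq(Z, y, rcond=None)[0]
resid_z = y - Z @ beta_z

t_obs = tstat(X, y)
null_ts = []
for _ in range(1000):
    # Permute the nuisance-only residuals to get an approximate null
    # realization, then refit the *full* model and recompute the statistic.
    y_star = Z @ beta_z + rng.permutation(resid_z)
    null_ts.append(tstat(X, y_star))

# Two-sided Monte Carlo P-value.
p = (1 + np.sum(np.abs(null_ts) >= abs(t_obs))) / (1 + len(null_ts))
```

Since the simulated data contain no group effect, p should usually be non-significant; replacing y with one that includes a group difference would drive it small.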
This approximate permutation test is asymptotically exact, meaning that
the results become more accurate with an ever-growing sample size
(for a fixed number of regressors). For large sample sizes, with
50-100 or more degrees of freedom, the P-values should be highly
accurate. When the sample size is low and there are many
nuisance regressors, accuracy could be a problem. (The accuracy is
easily assessed by generating random noise data and fitting it to your
design; the uncorrected P-values should be uniformly spread between
zero and one; the test will be invalid if there is an excess of small
P-values and conservative if there is a deficit of small P-values.)
Monte Carlo Permutation Tests
A proper "exact" test arises from evaluating every possible
permutation. Often this is not feasible; for example, a simple correlation
with 12 scans has nearly half a billion possible permutations.
Instead, a random sample of possible permutations can be used,
creating a Monte Carlo permutation test. On average the Monte Carlo
test is exact and will give similar results to carrying out all
possible permutations.
If the number of possible permutations is
large, one can show that a true, exhaustive P-value of p will
produce Monte Carlo P-values between p ±
2√(p(1-p)/n) about 95% of the time,
where n is the number of Monte Carlo permutations.
The table below shows confidence limits for p=0.05 for various
n. At least 5,000 permutations are required to reduce the
uncertainty appreciably, though 10,000 permutations are required to
reduce the margin-of-error to below 10% of the nominal alpha.
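The margin-of-error formula above can be evaluated directly in Python; it reproduces the ± column of the table of confidence limits:

```python
from math import sqrt

def mc_margin(p, n):
    """Approximate 95% margin of error, 2*sqrt(p*(1-p)/n), of a Monte Carlo
    P-value based on n random permutations, around true P-value p."""
    return 2 * sqrt(p * (1 - p) / n)

for n in (100, 1000, 5000, 10000, 50000):
    print(n, round(mc_margin(0.05, n), 4))
# n=10000 gives 0.0044, i.e. the margin first drops below 10% of 0.05
```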
Counting Permutations
Exchangeability under the null hypothesis justifies the permutation
of the data. For n scans, there are n! (n factorial,
n×(n-1)×(n-2)×...×2) possible ways of
shuffling the data. For some designs, though, many of these shuffles
are redundant. For example, in a two-sample t-test, permuting two
scans within a group will not change the value of the test statistic.
The number of possible permutations for different designs are given
below.
REFERENCES

MJ Anderson & J Robinson. Permutation Tests for Linear Models. Aust. N.Z. J. Stat., 43(1):75-88, 2001.

ET Bullmore, J Suckling, S Overmeyer, S Rabe-Hesketh, E Taylor & MJ Brammer. Global, voxel, and cluster tests, by theory and permutation, for a difference between two groups of structural MR images of the brain. IEEE TMI, 18(1):32-42, 1999.

S Hayasaka & TE Nichols. Validating cluster size inference: random field and permutation methods. NeuroImage, 20:2343-2356, 2003.

PE Kennedy. Randomization tests in econometrics. J. Bus. Econom. Statist., 13:84-95, 1995.

TE Nichols & AP Holmes. Nonparametric Permutation Tests for Functional Neuroimaging: A Primer with Examples. Human Brain Mapping, 15:1-25, 2002.
-g <group_labels_file> : defines exchangeability blocks. If specified, the
program will only permute observations within a block, i.e., only
observations with the same group label will be exchanged. See the
repeated measures example below for more detail.
USING randomise
randomise -i <4D_input_data> -o <output_rootname> -d design.mat -t design.con -m <mask_image> -n 5000 -D

design.mat and design.con are text files containing the design matrix and list of contrasts required; they follow the same format as generated by FEAT (see below for examples). The -n 5000 option tells randomise to generate 5000 permutations of the data when building up the null distribution to test against. The -D option tells randomise to demean the data before continuing - this is necessary if you are not modelling the mean in the design matrix.
Two helper programs make it easy to create design.mat / design.con files. The first is the Glm GUI, which allows the specification of designs in the same way as in FEAT, and the second, design_ttest2, is a simple script for generating design files for the two-group unpaired t-test case.
These filename extensions are summarized in the table below.

<output>_vox_tstat / <output>_vox_fstat : voxel-wise uncorrected 1-P images.

<output>_max_tstat / <output>_max_fstat : voxel-wise FWE-corrected 1-P images.

<output>_maxc_tstat / <output>_maxc_fstat : cluster-extent FWE-corrected 1-P images. To use this option, use -c <thresh> for t contrasts and -F <thresh> for F contrasts, where the threshold is used to form supra-threshold clusters of voxels.

<output>_maxcmass_tstat / <output>_maxfmass_fstat : cluster-mass FWE-corrected 1-P images. To use this option, use -C <thresh> for t contrasts and -S <thresh> for F contrasts.

TFCE (Threshold-Free Cluster Enhancement): you can use the -tfce option in fslmaths to test this on an existing stats image. See the TFCE research page for more information. For the moment we recommend leaving the default TFCE parameters unchanged.
Output image          Voxel-wise                Cluster-wise (extent)       Cluster-wise (mass)                 TFCE
Test statistic        _tstat / _fstat           n/a                         n/a                                 _tfce_tstat
1 - Uncorrected P     _vox_tstat / _vox_fstat   n/a                         n/a                                 n/a
1 - FWE-corrected P   _max_tstat / _max_fstat   _maxc_tstat / _maxc_fstat   _maxcmass_tstat / _maxfmass_fstat   _max_tfce_tstat
-1 : single-group (sign-flipping) mode; use this instead of specifying design and contrast matrices.

-v <std> : variance smoothing, where <std> specifies the spatial extent of the smoothing in mm.

-M : uses a slower, but more memory-efficient, way to load the data; may help if randomise hangs or crashes while reading large data.
EXAMPLES
randomise -i OneSamp4D -o OneSampT -1
Note that you do not need the -D option (as the mean is in the
model), and you can omit the -n option, so that the default 5000
permutations will be performed.
randomise -i OneSamp4D -o OneSampT -1 -v 5
which does a 5mm HWHM variance smoothing.
randomise -i TwoSamp4D -o TwoSampT -d design.mat -t design.con -m mask
For the nuisance-variable case, use design.mat and design.con files
where your design matrix has additional nuisance variables that are
(appropriately) ignored by your contrast:
randomise -i TwoSamp4D -o TwoSampT -d design.mat -t design.con -m mask
For the repeated measures ANOVA example, the design matrix looks like
1 0 1 0 0
1 0 0 1 0
1 0 0 0 1
1 0 0 0 0
0 1 1 0 0
0 1 0 1 0
0 1 0 0 1
0 1 0 0 0
where the first two columns model subject means and the 3rd through
5th columns model the categorical effect (note the different
arrangement of rows relative to the FEAT example). Three t-contrasts
for the categorical effect
0 0 1 0 0
0 0 0 1 0
0 0 0 0 1
are selected together into a single F-contrast
1 1 1
Create a group.dat file for the grouping variable that looks like
1
1
1
1
2
2
2
2
This will ensure that permutations will only occur within subject,
respecting the repeated measures structure of the data.
randomise -i TwoSamp4D -o TwoSampT -d design.mat -t design.con
-f design.fts -m mask -g group.dat
BACKGROUND THEORY
Use of almost any GLM other than the simple models listed in the
Counting Permutations table below will result in only approximately exact
inference. In particular, when the model includes both the effect
tested (e.g., difference in FA between two groups) and nuisance
variables (e.g., age), exact tests are not generally available.
Permutation tests rely on an assumption of exchangeability; with the
models above, the null hypothesis implies complete exchangeability of
the observations. When there are nuisance effects, however, the null
hypothesis no longer assures the exchangeability of the data
(e.g. even when the null hypothesis of no FA difference is true, age
effects imply that you can't permute the data without altering the
structure of the data).
n         Confidence limits for p=0.05
100       0.0500 ± 0.0436
1,000     0.0500 ± 0.0138
5,000     0.0500 ± 0.0062
10,000    0.0500 ± 0.0044
50,000    0.0500 ± 0.0019
The number of permutations is specified with the -n option. If this
number is greater than or equal to the number of possible permutations,
an exhaustive test is run. If it is less than the number of possible
permutations, a Monte Carlo permutation test is performed. The default
is 5000, though if time permits, 10000 is recommended.
Model                                      Sample size(s)   Number of permutations
One-sample t-test on difference measures   n                2^n
Two-sample t-test                          n1, n2           (n1+n2)! / (n1! × n2!)
One-way ANOVA                              n1, ..., nk      (n1+...+nk)! / (n1! × ... × nk!)
Simple correlation                         n                n!
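The formulae in the table are easy to evaluate directly (the function names here are ours, purely for illustration):

```python
from math import factorial

def one_sample(n):
    """Sign flips on n difference measures: 2^n."""
    return 2 ** n

def two_sample(n1, n2):
    """Distinct group relabellings: (n1+n2)! / (n1! * n2!)."""
    return factorial(n1 + n2) // (factorial(n1) * factorial(n2))

def correlation(n):
    """All orderings of n scans: n!."""
    return factorial(n)

print(correlation(12))    # 479001600 -- the "nearly half a billion" noted above
print(two_sample(10, 10)) # 184756
print(one_sample(12))     # 4096
```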
Copyright © 2004-2007, University of Oxford. Written by T. Behrens, S. Smith, M. Webster and T. Nichols.