CHAPTER 3

Two-Sided Tests: General Applications

3.1 A Unified Formulation

In Chapter 2 we presented two-sided tests for a two-treatment comparison when normally distributed observations of common known variance are collected in equally sized groups. The tests were defined in terms of significance levels for testing the null hypothesis after each new group of observations, and power was guaranteed by setting the maximum sample size as a multiple of that required by a fixed sample test. We shall see that group sequential tests described in this manner can be applied to other testing problems with little or no modification. The reason for this lies in the common form of the joint distribution of the sequence of standardized test statistics, $\{Z_1, \ldots, Z_K\}$.

Suppose a group sequential study with up to $K$ analyses yields the sequence of test statistics $\{Z_1, \ldots, Z_K\}$. We say that these statistics have the canonical joint distribution with information levels $\{I_1, \ldots, I_K\}$ for the parameter $\theta$ if:

(i) $(Z_1, \ldots, Z_K)$ is multivariate normal,
(ii) $E(Z_k) = \theta \sqrt{I_k}$, $k = 1, \ldots, K$, and
(iii) $\mathrm{Cov}(Z_{k_1}, Z_{k_2}) = \sqrt{I_{k_1}/I_{k_2}}$, $1 \le k_1 \le k_2 \le K$.    (3.1)

Note this implies that $\{Z_1, \ldots, Z_K\}$ is a Markov sequence, and this will be important in simplifying calculations for group sequential tests. In this section we show that several common testing problems give rise to sequences of test statistics with this joint distribution.

It should be noted that (3.1) specifies the conditional distribution of $\{Z_1, \ldots, Z_K\}$ given $\{I_1, \ldots, I_K\}$. When, for some reason, there is an element of randomness in the sample sizes observed at analyses 1 to $K$, we consider properties of tests conditional on the sequence of $I_k$ values actually observed. Thus, as a group sequential test progresses, it is important to ensure that the value of $I_k$ is not influenced by the values of statistics $Z_1, \ldots, Z_{k-1}$ seen previously, as this could destroy the property (3.1) on which error rate calculations are based; this is a subtle point which need not trouble us now, but we shall consider it in greater depth in Section 7.4 and Chapter 17.

The following examples illustrate a
general result that sequences of standardized test statistics obtained from maximum likelihood estimates of a parameter in a normal linear model follow the canonical joint distribution (3.1). We shall discuss this result in Section 3.4 and prove it later in Section 11.3. The general result also extends to the case of correlated observations, hence the same joint distribution arises in the group sequential analysis of longitudinal data in which repeated measurements on the same subject are correlated. We shall see in Section 11.6 that (3.1) arises, approximately, in the maximum likelihood analysis of data following many parametric models, including generalized linear models. Further, (3.1) applies asymptotically in other special situations, including the analysis of right-censored survival data using a group sequential log-rank test or by repeatedly fitting Cox's proportional hazards regression model, and the analysis of accumulating data in multiple $2 \times 2$ tables using a repeated Mantel-Haenszel test.

© 2000 by Chapman & Hall/CRC

3.1.1 Parallel Two-Treatment Comparison

We generalize the example of Chapter 2 and allow different variances and unequal numbers of subjects on the two treatment arms. Thus, we have responses $X_{Ai} \sim N(\mu_A, \sigma_A^2)$, $i = 1, 2, \ldots$, for subjects allocated to treatment A and $X_{Bi} \sim N(\mu_B, \sigma_B^2)$, $i = 1, 2, \ldots$, for those on treatment B. For $k = 1, \ldots, K$, let $n_{Ak}$ and $n_{Bk}$ denote the cumulative number of observations on treatments A and B, respectively, at the time of the $k$th analysis. The natural estimate of $\mu_A - \mu_B$ is

$$\bar{X}_A^{(k)} - \bar{X}_B^{(k)} = \frac{1}{n_{Ak}} \sum_{i=1}^{n_{Ak}} X_{Ai} - \frac{1}{n_{Bk}} \sum_{i=1}^{n_{Bk}} X_{Bi} \sim N\left(\mu_A - \mu_B,\ \frac{\sigma_A^2}{n_{Ak}} + \frac{\sigma_B^2}{n_{Bk}}\right).$$

Note that, here, the stated distribution of $\bar{X}_A^{(k)} - \bar{X}_B^{(k)}$ is simply the marginal distribution for given $n_{Ak}$ and $n_{Bk}$, not, for example, the conditional distribution of $\bar{X}_A^{(k)} - \bar{X}_B^{(k)}$ given that the test continues up to analysis $k$, which arises in calculations of a group sequential test's error probabilities. We shall follow the same practice elsewhere, so, unless explicitly stated otherwise, the “distribution” of an estimate or test statistic should be taken to mean its marginal distribution.

We
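define the information for $\mu_A - \mu_B$ to be

$$I_k = (\sigma_A^2/n_{Ak} + \sigma_B^2/n_{Bk})^{-1},$$

the reciprocal of our estimate's variance, and use this to create the standardized statistic at analysis $k$ for testing $H_0$: $\mu_A = \mu_B$,

$$Z_k = (\bar{X}_A^{(k)} - \bar{X}_B^{(k)}) \sqrt{I_k}, \quad k = 1, \ldots, K.$$

The vector $(Z_1, \ldots, Z_K)$ is multivariate normal since each $Z_k$ is a linear combination of the independent normal variates $X_{Ai}$ and $X_{Bi}$, $i = 1, 2, \ldots$, and marginally, $Z_k \sim N(\theta \sqrt{I_k}, 1)$, $k = 1, \ldots, K$, where $\theta = \mu_A - \mu_B$. It remains to establish the covariance of the $Z_k$s. For $k_1 \le k_2$,

$$\mathrm{Cov}(Z_{k_1}, Z_{k_2}) = \mathrm{Cov}\left(\{\bar{X}_A^{(k_1)} - \bar{X}_B^{(k_1)}\} \sqrt{I_{k_1}},\ \{\bar{X}_A^{(k_2)} - \bar{X}_B^{(k_2)}\} \sqrt{I_{k_2}}\right)$$
$$= \left(\frac{1}{n_{Ak_1}} \frac{1}{n_{Ak_2}}\, n_{Ak_1} \sigma_A^2 + \frac{1}{n_{Bk_1}} \frac{1}{n_{Bk_2}}\, n_{Bk_1} \sigma_B^2\right) \sqrt{I_{k_1}} \sqrt{I_{k_2}}$$
$$= (I_{k_2})^{-1} \sqrt{I_{k_1}} \sqrt{I_{k_2}} = \sqrt{I_{k_1}/I_{k_2}},$$

as required. Thus, $\{Z_1, \ldots, Z_K\}$ have the canonical joint distribution with information levels $\{I_1, \ldots, I_K\}$ for $\theta = \mu_A - \mu_B$.

3.1.2 Testing the Mean of a Single Population

Suppose observations $X_i \sim N(\mu, \sigma^2)$, $i = 1, 2, \ldots$, are independent, $\sigma^2$ is known, and we wish to test the hypothesis $H_0$: $\mu = \mu_0$. If $n_k$ observations are available at analysis $k$, we estimate $\mu$ by

$$\bar{X}^{(k)} = \frac{1}{n_k} \sum_{i=1}^{n_k} X_i \sim N\left(\mu,\ \frac{\sigma^2}{n_k}\right)$$

and define $I_k = \{\mathrm{Var}(\bar{X}^{(k)})\}^{-1} = n_k/\sigma^2$, the information for $\mu$ at analysis $k$. The standardized test statistics are then

$$Z_k = (\bar{X}^{(k)} - \mu_0) \sqrt{I_k}, \quad k = 1, \ldots, K.$$

Since each $Z_k$ is a linear combination of the independent normal variates $X_i$, the vector $(Z_1, \ldots, Z_K)$ is multivariate normal. Marginally, $Z_k \sim N(\theta \sqrt{I_k}, 1)$, $k = 1, \ldots, K$, where $\theta = \mu - \mu_0$. Finally, for $k_1 \le k_2$,

$$\mathrm{Cov}(Z_{k_1}, Z_{k_2}) = \mathrm{Cov}\left(\{\bar{X}^{(k_1)} - \mu_0\} \sqrt{I_{k_1}},\ \{\bar{X}^{(k_2)} - \mu_0\} \sqrt{I_{k_2}}\right) = \frac{1}{n_{k_1}} \frac{1}{n_{k_2}}\, n_{k_1} \sigma^2 \sqrt{I_{k_1}} \sqrt{I_{k_2}} = \sqrt{I_{k_1}/I_{k_2}}$$

and we see that $\{Z_1, \ldots, Z_K\}$ have the canonical joint distribution with information levels $\{I_1, \ldots, I_K\}$ for $\theta = \mu - \mu_0$.

3.1.3 Paired Two-Treatment Comparison

In a two-treatment comparison, it can be advantageous to control the variance in response attributable to known prognostic factors by using a “matched pairs” design. Subjects are paired so that both subjects in the same pair have similar values of the prognostic factors; one subject in each pair is randomly selected to receive treatment A and the other receives treatment B. Let $X_{Ai}$ and $X_{Bi}$ denote the responses of the subjects in pair $i$ receiving treatments A and B, respectively. We suppose the differences within pairs are normally distributed,

$$X_{Ai} - X_{Bi} \sim N(\mu_A - \mu_B,\ \tilde{\sigma}^2), \quad i = 1, 2, \ldots, \qquad (3.2)$$

and the variance, $\tilde{\sigma}^2$, is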
known. If the variance of an individual observation is $\sigma^2$ and the correlation between the responses of subjects in the same pair is $\rho$, then $\tilde{\sigma}^2 = 2(1 - \rho)\sigma^2$. Thus, if matching of subjects within pairs achieves a moderately large positive correlation, $\tilde{\sigma}^2$ can be significantly less than the variance, $2\sigma^2$, of the difference in response between two randomly selected subjects allocated to treatments A and B. We consider the problem of testing the null hypothesis $H_0$: $\mu_A = \mu_B$ in a group sequential test where observations are taken in up to $K$ groups. If $n_k$ pairs of observations are available at the $k$th analysis, we estimate $\theta = \mu_A - \mu_B$ by

$$\frac{1}{n_k} \sum_{i=1}^{n_k} (X_{Ai} - X_{Bi}) \sim N\left(\mu_A - \mu_B,\ \frac{\tilde{\sigma}^2}{n_k}\right),$$

the information for $\theta$ is $I_k = n_k/\tilde{\sigma}^2$, the reciprocal of this estimate's variance, and the standardized test statistic is

$$Z_k = \frac{1}{\sqrt{n_k \tilde{\sigma}^2}} \sum_{i=1}^{n_k} (X_{Ai} - X_{Bi}), \quad k = 1, \ldots, K. \qquad (3.3)$$

The differences $X_{Ai} - X_{Bi}$ play the same role as the observations $X_i$ in the single-population situation of Section 3.1.2, and essentially the same argument shows that $\{Z_1, \ldots, Z_K\}$ have the canonical joint distribution with information levels $\{I_1, \ldots, I_K\}$ for $\theta = \mu_A - \mu_B$.

3.1.4 Two-Period Crossover Trial

If between-patient variation is high but it is possible to observe a single subject's response to more than one treatment, the necessary sample size may be reduced by using a crossover design in which inferences are based on within-patient comparisons. Here, we describe group sequential analysis of a two-treatment comparison in a two-period crossover trial. General results for group sequential analysis of linear models, which we shall present in Section 3.4, can be applied to give a similar sequential treatment of more complex crossover designs.

In a two-treatment, two-period crossover trial each subject is allocated treatments A and B in a randomly chosen order. After a subject's response to each treatment has been observed, one calculates the difference (Response on treatment A) $-$ (Response on treatment B). Let $X_i$, $i = 1, 2, \ldots$, denote the values of this difference for subjects receiving treatment A first and $Y_i$, $i = 1, 2, \ldots$, the values for subjects who
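receive treatment B first. The usual normal model is then

$$X_i \sim N(\theta + \phi,\ \sigma^2) \quad \text{and} \quad Y_i \sim N(\theta - \phi,\ \sigma^2), \quad i = 1, 2, \ldots,$$

where $\theta$ represents the treatment difference and $\phi$ a period effect. We consider a group sequential test of $H_0$: $\theta = 0$ when the variance $\sigma^2$ is assumed known.

For each $k = 1, \ldots, K$, suppose observed values $X_1, \ldots, X_{n_{Xk}}$ and $Y_1, \ldots, Y_{n_{Yk}}$ are available at the $k$th analysis. We can estimate $\theta$ by

$$\frac{1}{2}\left\{\bar{X}^{(k)} + \bar{Y}^{(k)}\right\} = \frac{1}{2}\left\{\frac{1}{n_{Xk}} \sum_{i=1}^{n_{Xk}} X_i + \frac{1}{n_{Yk}} \sum_{i=1}^{n_{Yk}} Y_i\right\} \sim N\left(\theta,\ \frac{\sigma^2}{4 n_{Xk}} + \frac{\sigma^2}{4 n_{Yk}}\right),$$

the information for $\theta$ is the reciprocal of the above variance,

$$I_k = 4 \left(\frac{\sigma^2}{n_{Xk}} + \frac{\sigma^2}{n_{Yk}}\right)^{-1}, \qquad (3.4)$$

and the standardized statistic for testing $H_0$: $\theta = 0$ is

$$Z_k = \frac{1}{2} (\bar{X}^{(k)} + \bar{Y}^{(k)}) \sqrt{I_k}. \qquad (3.5)$$

It is straightforward to check that $(Z_1, \ldots, Z_K)$ is multivariate normal, $Z_k \sim N(\theta \sqrt{I_k}, 1)$, $k = 1, \ldots, K$, and

$$\mathrm{Cov}(Z_{k_1}, Z_{k_2}) = \sqrt{I_{k_1}/I_{k_2}}, \quad 1 \le k_1 \le k_2 \le K.$$

So we see, once again, that $\{Z_1, \ldots, Z_K\}$ have the canonical joint distribution with information levels $\{I_1, \ldots, I_K\}$ for $\theta$.

3.2 Applying the Tests with Equal Group Sizes

3.2.1 General Form of the Tests

Suppose it is required to test a null hypothesis $H_0$: $\theta = 0$ with two-sided Type I error probability $\alpha$ and power $1 - \beta$ at $\theta = \pm\delta$. We consider group sequential tests in which up to $K$ analyses are permitted and standardized statistics $Z_k$, $k = 1, \ldots, K$, are available at these analyses. For a general testing problem, we assume $\{Z_1, \ldots, Z_K\}$ have the canonical joint distribution (3.1) with information levels $\{I_1, \ldots, I_K\}$ for $\theta$. We start by considering the case of equal group sizes which produce equally spaced information levels.

When a fixed sample test is based on a standardized statistic with distribution $Z \sim N(\theta \sqrt{I}, 1)$, Type I error $\alpha$ is obtained by rejecting $H_0$ if $|Z| > \Phi^{-1}(1 - \alpha/2)$. Here $\Phi$ denotes the standard normal cdf. Setting $I$ equal to

$$I_{f,2} = \{\Phi^{-1}(1 - \alpha/2) + \Phi^{-1}(1 - \beta)\}^2 / \delta^2 \qquad (3.6)$$

ensures that $E(Z) = \pm\{\Phi^{-1}(1 - \alpha/2) + \Phi^{-1}(1 - \beta)\}$ for $\theta = \pm\delta$ and, hence, power $1 - \beta$ is attained at $\theta = \pm\delta$. The subscript 2 in $I_{f,2}$ is used to distinguish the two-sided case from the one-sided case, which we shall consider in Chapter 4. A group sequential test requires a larger maximum sample size and we set a maximum information level, $I_{max} = R\, I_{f,2}$, where $R$ is greater than one and depends on $K$, $\alpha$, $\beta$ and the type of group sequential boundary being used. With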
equally spaced information levels, we then have

$$I_k = (k/K)\, I_{max} = (k/K)\, R\, I_{f,2} = \frac{k}{K}\, R\, \frac{\{\Phi^{-1}(1 - \alpha/2) + \Phi^{-1}(1 - \beta)\}^2}{\delta^2}, \quad k = 1, \ldots, K.$$

Under $H_0$: $\theta = 0$, the standardized statistics $\{Z_1, \ldots, Z_K\}$ have the null joint distribution:

(i) $(Z_1, \ldots, Z_K)$ is multivariate normal,
(ii) $E(Z_k) = 0$, $k = 1, \ldots, K$, and
(iii) $\mathrm{Cov}(Z_{k_1}, Z_{k_2}) = \sqrt{k_1/k_2}$, $1 \le k_1 \le k_2 \le K$.    (3.7)

The following alternative distribution arises when $\theta = \pm\delta$:

(i) $(Z_1, \ldots, Z_K)$ is multivariate normal,
(ii) $E(Z_k) = \pm\{\Phi^{-1}(1 - \alpha/2) + \Phi^{-1}(1 - \beta)\} \sqrt{kR/K}$, $k = 1, \ldots, K$, and
(iii) $\mathrm{Cov}(Z_{k_1}, Z_{k_2}) = \sqrt{k_1/k_2}$, $1 \le k_1 \le k_2 \le K$.    (3.8)

Each of the tests introduced in Chapter 2 has the form:

After group $k = 1, \ldots, K - 1$
    if $|Z_k| \ge c_k$    stop, reject $H_0$
    otherwise             continue to group $k + 1$,
after group $K$
    if $|Z_K| \ge c_K$    stop, reject $H_0$
    otherwise             stop, accept $H_0$.    (3.9)

Since the null hypothesis is rejected at stage $k$ if $|Z_k|$, a standard normal variate under $H_0$, exceeds the critical value $c_k$, such tests can be interpreted as “repeated significance tests”, applying the two-sided significance level $2\{1 - \Phi(c_k)\}$ to the data at analysis $k$. A test's Type I error probability is

$$\Pr\{|Z_k| \ge c_k \text{ for some } k = 1, \ldots, K\},$$

evaluated when $\{Z_1, \ldots, Z_K\}$ follow the null distribution (3.7). Each type of test, Pocock, O'Brien & Fleming, etc., uses a different sequence of critical values, $\{c_1, \ldots, c_K\}$, but all are chosen to ensure the Type I error probability is equal to the specified value, $\alpha$, when (3.7) holds. The power of the test at $\theta = \pm\delta$ is

$$\Pr\left\{\bigcup_{k=1}^{K} \left(|Z_j| < c_j \text{ for } j = 1, \ldots, k - 1 \text{ and } |Z_k| \ge c_k\right)\right\}$$

[...] using the original sequence of critical values $\{c_1, \ldots, c_K\}$.

Table 3.1 shows the Type I error probability and power actually achieved by these tests for a variety of sequences of cumulative sample sizes on each treatment, $\{n_1, \ldots, n_5\}$. Results were obtained by numerical computation. The first example for each test follows the original design: Type I error is exactly 0.05 and power at $\mu_A - \mu_B = \pm 1$ is a little greater than 0.9 because of upward rounding of the group sizes to integer values. Examples 2 and 3 for each test have equal group sizes and, since $\{Z_1, \ldots, Z_5\}$ still have the standard null joint distribution (3.7), Type I error is equal to 0.05; power is affected by the group sizes, increasing as $n_K$ increases. In Examples 4 and 5, the final sample size is exactly as
planned and power is close to its value in Example 1 for each test. The unequal group sizes have a greater effect on the Type I error: if the cumulative sample sizes are bunched together, as in each Example 4, test statistics at the five analyses are more highly correlated and the Type I error rate decreases, whereas the patterns of cumulative sample sizes in each Example 5 reduce correlations and increase Type I error. The effects on Type I error are smallest for the O'Brien & Fleming test in which values $c_k$ are high for small $k$ and there is little probability of rejecting $H_0$ at the early analyses. Finally, Examples 6 and 7 for each test show haphazard variations in group size; in each case the Type I error probability is very close to 0.05 and, since the final sample size is close to its design value, power is close to 0.9.

It is evident from (3.1) that the observed information levels $\{I_1, \ldots, I_K\}$ determine the joint distribution of $\{Z_1, \ldots, Z_K\}$ and, hence, a test's attained Type I error probability and power. Thus, the results of Table 3.1 also apply to other applications of the same tests when the same sequences $\{I_1, \ldots, I_5\}$ arise.

Table 3.2 shows further results from a systematic study of the effects of variations in observed information on attained error probabilities. Experiments were designed for Pocock or O'Brien & Fleming tests with Type I error probability

Table 3.1 Type I error probability and power achieved by group sequential tests when group sizes are not equal to their design values. Tests are two-treatment comparisons with $\sigma^2 = 4$ and total numbers of observations in the first $k$ groups on each treatment arm $n_{Ak} = n_{Bk} = n_k$, $k = 1, \ldots, K$. The first example for each test has group sizes equal to their design values.

Pocock test

Example   $\{n_k; k = 1, \ldots, K\}$   Type I error   Power at $\mu_A - \mu_B = \pm 1$
1         21, 42, 63, 84, 105           0.050          0.910
2         18, 36, 54, 72, 90            0.050          0.860
3         23, 46, 69, 92, 115           0.050          0.934
4         30, 50, 55, 86, 105           0.046          0.909
5         12, 31, 57, 81, 105           0.054          0.909
6         13, 42, 56, 78, 99            0.051          0.892
7         26, 40, 63, 96, 110           0.049          0.923

O'Brien & Fleming test

Example   $\{n_k; k = 1, \ldots, K\}$   Type I error   Power at $\mu_A - \mu_B = \pm 1$
1         18, 36, 54, 72, 90            0.050          0.912
2         16, 32, 48, 64, 80            0.050          0.877
3         20, 40, 60, 80, 100           0.050          0.937
4         26, 39, 50, 76, 90            0.049          0.911
5         10, 27, 55, 66, 90            0.051          0.912
6         11, 38, 59, 65, 83            0.049          0.888
7         27, 40, 57, 73, 96            0.051          0.928

Wang & Tsiatis test, $\Delta = 0.25$

Example   $\{n_k; k = 1, \ldots, K\}$   Type I error   Power at $\mu_A - \mu_B = \pm 1$
1         18, 36, 54, 72, 90            0.050          0.901
2         16, 32, 48, 64, 80            0.050          0.864
3         20, 40, 60, 80, 100           0.050          0.929
4         26, 39, 50, 76, 90            0.049          0.901
5         10, 27, 55, 66, 90            0.052          0.901
6         11, 38, 59, 65, 83            0.048          0.875
7         27, 40, 57, 73, 96            0.050          0.919

Table 3.2 Properties of two-sided tests designed for information levels $I_k = (k/K)\, I_{max}$, $k = 1, \ldots, K$, but applied with $I_k = \pi (k/K)^r I_{max}$, $k = 1, \ldots, K$. Attained Type I error probabilities and power are shown for Pocock and O'Brien & Fleming tests with $K$ analyses, designed to achieve Type I error rate $\alpha = 0.05$ and power $1 - \beta = 0.9$ at $\theta = \pm\delta$.

                   Pocock tests            O'Brien & Fleming
$r$     $\pi$      Type I error   Power    Type I error   Power

$K = 2$
0.80    0.9        0.048          0.868    0.049          0.867
        1.0        0.048          0.901    0.049          0.900
        1.1        0.048          0.926    0.049          0.925
1.00    0.9        0.050          0.867    0.050          0.867
        1.0        0.050          0.900    0.050          0.900
        1.1        0.050          0.926    0.050          0.925
1.25    0.9        0.052          0.865    0.050          0.868
        1.0        0.052          0.899    0.050          0.900
        1.1        0.052          0.925    0.050          0.925

$K = 5$
0.80    0.9        0.047          0.867    0.048          0.866
        1.0        0.047          0.901    0.048          0.899
        1.1