CHAPTER 6

Equivalence Tests

6.1 Introduction

Sometimes the goal of a clinical trial is to establish equivalence between two treatments rather than the superiority in efficacy of one over the other. For example, if a new therapy is less toxic or less expensive than the standard, it may not be necessary to prove it is also more effective; instead, it can suffice to demonstrate that it is equally effective or that it is less effective than the standard by at most a small amount. Dunnett & Gent (1977) give an example of a trial in which an objective was to determine whether certain aspects of health care could be delivered equally well by a system involving a triage process by nurse-practitioners as by conventional primary physician care methods. Other examples include: lumpectomy versus radical mastectomy for breast cancer treatment, a local rice-based beverage versus WHO oral rehydration solution for diarrhea in infants in Mexico, and low-dose AZT treatment versus standard dose during pregnancy and delivery to prevent mother-to-child HIV transmission in a developing country. We call these “one-sided equivalence” problems.

A “two-sided equivalence” problem can arise in the area of bioequivalence testing. Here, a pharmaceutical manufacturer hopes to demonstrate that a new preparation of a drug has the same bioavailability properties as a standard, within a small tolerance limit, as a step toward proving that the new and standard preparations have equal therapeutic effects. Demonstrating “bioequivalence” in this way can greatly reduce the amount of experimentation required for approval of a new drug.

Chinchilli (1996) classifies equivalence problems into the following categories:

Population equivalence: the responses for both treatments have the same probability distribution;
Average equivalence: the responses have the same mean;
Individual equivalence: the responses are “approximately” the same for a large proportion of subjects when they receive either treatment.

In Sections 6.2 and 6.3 we shall be concerned principally with
average bioequivalence; we consider tests of individual equivalence in Section 6.4.

Of course, to conclude equivalence, it is not sufficient that the data fail to reject a null hypothesis of equality. This might simply be due to a lack of power of the study, and a practically significant difference could still exist despite the lack of statistically significant evidence. Thus, if a hypothesis test of equality is to be employed as a means of establishing equivalence, this test’s Type II error probability, representing the probability of wrongly declaring equivalence when a practically significant difference exists, must be restricted to an acceptable level. An alternative approach is to test a specific hypothesis of non-equality or “inequivalence” and conclude equivalence if this hypothesis is rejected. In this case, the error requirements have a more familiar form since priority is given to the test’s Type I error rate, i.e., the probability of rejecting the hypothesis of inequivalence when this hypothesis is true. We shall illustrate both these approaches: in the following section we adopt the second method, creating a hypothesis of inequivalence in order to test for one-sided equivalence, and in Section 6.3 we take the first approach, basing a test for two-sided equivalence on a test of a null hypothesis of no treatment difference.

© 2000 by Chapman & Hall/CRC

6.2 One-Sided Tests of Equivalence

Let $\mu_A$ and $\mu_B$ denote the true means of the primary response variable for a new and standard treatment, respectively, where higher responses are more favorable. Suppose previous studies have shown that the new treatment has less toxic side effects than the standard. In view of this known advantage of the new treatment, we can define a hypothesis of inequivalence as $H_I\colon \mu_A \le \mu_B - \delta$, where $\delta$ is a positive constant and a decrease in mean response up to $\delta$ is deemed acceptable for the new treatment in view of its reduced toxicity. The hypothesis of equivalence is then $H_E\colon \mu_A > \mu_B - \delta$, and it is appropriate to conduct a one-sided equivalence test to choose between $H_I$ and
$H_E$. Note that $H_E$ includes instances where $\mu_A$ is considerably greater than $\mu_B$ since in this one-sided case we intend “equivalence” of the two treatments to mean that the new treatment is at least as good, overall, as the standard.

In essence, this problem is the same as the one-sided hypothesis testing problem of Chapter 4, and if we define $\theta = \mu_A - \mu_B + \delta$, the methods of Chapter 4 can be applied directly. Then, $H_0\colon \theta = 0$ occurs at the upper limit of $H_I$, $\mu_A - \mu_B = -\delta$, and so the Type I error rate of the test of $H_0$ becomes the maximum probability of wrongly declaring equivalence. The power at $\theta = \delta$ is of particular interest as this represents the probability of declaring equivalence when $\mu_A$ and $\mu_B$ are exactly equal. Since the new treatment has other advantages over the standard, a high power to decide in favor of the new treatment is desirable when both treatments are equal with regard to the primary response.

6.3 Two-Sided Tests of Equivalence: Application to Comparative Bioavailability Studies

6.3.1 The Testing Problem

As mentioned above, two-sided equivalence tests arise most frequently in bioequivalence studies, so we shall frame our discussion here in that context. Westlake (1979) reviews comparative bioavailability trials for testing the bioequivalence of two competing formulations of a drug. In a typical trial, a single dose is administered to a subject and blood samples are drawn at various times over the next 24, 36 or 48 hours. This leads to a sequence of drug concentration levels in the blood, or “drug profile”, in which the drug level usually rises to a peak during the initial absorption phase and then decays. A univariate statistic summarizing this response might be the “area under the curve” (AUC) or the peak concentration level ($C_{\max}$). It is often reasonable to assume these quantities are log-normally distributed. The experimental design may utilize two parallel patient samples, as in Section 3.1.1, but it is more usual to employ a two-period crossover design as described in Section 3.1.4. The crossover design has the advantage of
removing the between-subject component of variability, resulting in a much more sensitive test, and this in turn permits use of smaller sample sizes. A “washout” interval between the administrations of the two drugs in a given patient should be of sufficient length to eliminate any carry-over effect of the drug given first. In some cases it is possible to ascertain the presence of carry-over effects by measuring concentration levels during the washout period. Another possible complication is the existence of treatment-by-period interactions. More sophisticated designs have been proposed which need to be analyzed by more complex linear models incorporating such carry-over and interaction effects; see, for example, Jones & Kenward (1989, Section 1.6). In general, group sequential versions of these designs can be adapted from those proposed in this section using techniques for normal linear models described in Section 3.4 and Chapter 11. Here we shall confine ourselves to the standard $2 \times 2$ crossover design of Section 3.1.4.

Consider the design of a group sequential two-treatment, two-period crossover trial. For $i = 1, 2, \ldots$, let $X_i$ denote the natural logarithm of the ratio of response on treatment A to response on treatment B for the $i$th subject receiving treatment A first. Similarly, define $Y_i$ to be the logarithm of the same ratio for the $i$th subject receiving treatment B first. Response here may be defined as the AUC, $C_{\max}$, or other summary feature of the drug profile. The usual model treats the responses as log-normally distributed with

$$X_i \sim N(\theta + \phi, \sigma^2) \quad\text{and}\quad Y_i \sim N(\theta - \phi, \sigma^2), \qquad i = 1, 2, \ldots, \tag{6.1}$$

where $\theta$ is the treatment difference and $\phi$ a period effect.

In a two-sided test of equivalence, the treatments A and B are to be considered equivalent if $|\theta| < \delta$. Here, $\delta$ should be “determined from the practical aspects of the problem in such a way that the treatments can be considered for all practical purposes to be equivalent if their true difference is unlikely to exceed the specified $\delta$” (Dunnett & Gent, 1977, p. 594). Let us impose the error probability requirements

$$\Pr\nolimits_{\theta = \pm\delta}\{\text{Declare equivalence}\} \le \beta \tag{6.2}$$

and

$$\Pr\nolimits_{\theta = 0}\{\text{Do not declare equivalence}\} \le \alpha. \tag{6.3}$$

Some authors would interchange the symbols $\alpha$ and $\beta$ here, but we use the above choice for consistency with the notation for two-sided tests in Chapter 5. The important point is that $\beta$ in (6.2) represents the “consumer’s risk” since wrongly declaring equivalence may lead to an unsuitable preparation being allowed onto the market. Values of $\beta$ and $\delta$ must be chosen to satisfy the appropriate regulatory agency. Recommended choices are often $\beta = 0.05$ and $\delta = \log(1.25)$ so that $|\theta| < \delta$ implies that, on the antilog scale, the treatment effect ratio lies in the range 0.8 to 1.25. The probability $\alpha$ in (6.3) is the “manufacturer’s risk” and, since the benefits of proving equivalence are so great, one would expect a small value of $\alpha$ to be chosen in order to ensure a high probability of a positive conclusion when two treatments really are equivalent.

6.3.2 Using the Power Family of Inner Wedge Tests

If $\sigma^2$ is known, we may directly utilize the techniques of Chapter 5 to design a group sequential crossover trial to test for two-sided equivalence. The number of analyses, $K$, and the shape parameter $\Delta$ of the power family test must be chosen. Suppose for now that the study can be organized so that equal numbers of observations following each treatment sequence accrue between successive analyses. The factor $R_W(K, \alpha, \beta, \Delta)$, taken from Table 5.1, 5.2 or 5.3, is used to determine the required sample size. Constants $C_{W1}(K, \alpha, \beta, \Delta)$ and $C_{W2}(K, \alpha, \beta, \Delta)$ are also read from the table and used to determine critical values $a_k$ and $b_k$ for $k = 1, \ldots, K$. The stopping rule (5.1) is then applied with the modification that the decision “accept $H_0$” is replaced by “declare equivalence” and “reject $H_0$” is replaced by “declare non-equivalence”.

As an example, suppose we specify $\alpha = \beta = 0.05$, $\delta = \log(1.25) = 0.223$, $\Delta = 0$ and a balanced design with $K = 4$ equally sized groups. Suppose also that the within-subject coefficient of variation (CV) of the responses on the original antilog scale is known to be 24%; Hauck, Preston & Bois (1997, p. 91) suggest this as a “moderate range” value for a CV
for AUC. It follows that $\sigma^2 \approx 2 \times 0.24^2 = 0.115$. After $n_k$ observations on each treatment sequence, the estimate

$$\hat\theta^{(k)} = \bigl(\bar X^{(k)} + \bar Y^{(k)}\bigr)/2$$

has variance $\sigma^2/(2 n_k)$, so the information for $\theta$ is $I_k = 2 n_k/\sigma^2$. From (3.6), the information needed for a fixed sample size test is

$$I_{f,2} = (1.960 + 1.645)^2 / 0.223^2 = 261.3,$$

which requires $261.3 \times \sigma^2/2 = 261.3 \times 0.115/2 = 15.0$ subjects on each treatment sequence. For our sequential test we multiply $I_{f,2}$ by $R_W(4, 0.05, 0.05, 0) = 1.055$ to obtain the maximum information level 275.7, and solving $2 n_4/0.115 = 275.7$ gives the final sample size per treatment sequence $n_4 = 15.9$. Rounding this value of $n_4$ up to 16, we see the study should be designed with four subjects on each treatment sequence in each of the four groups. The test is implemented by applying rule (5.1) to the standardized statistics

$$Z_k = \hat\theta^{(k)} \sqrt{I_k} = \bigl(\bar X^{(k)} + \bar Y^{(k)}\bigr) \sqrt{n_k/(2\sigma^2)}, \qquad k = 1, \ldots, 4,$$

taking acceptance of $H_0$ to indicate declaration of equivalence. The critical values $a_k$ and $b_k$ are obtained from (5.2) using $C_{W1}(4, 0.05, 0.05, 0) = 1.995$ and $C_{W2}(4, 0.05, 0.05, 0) = 1.708$ from Table 5.3. With $\delta = 0.223$ this gives

$$(a_1, a_2, a_3, a_4) = (-1.56,\ 0.21,\ 1.25,\ 2.01)$$

and

$$(b_1, b_2, b_3, b_4) = (3.99,\ 2.82,\ 2.30,\ 1.995).$$

Since $a_1$ is negative, it is not possible to stop and declare equivalence at the first analysis. Due to the rounding of $n_4$ to an integer sample size, the values of $a_4$ and $b_4$ are not exactly equal; although the difference is slight, one could choose to use the lower value, $b_4$, as the critical value for $|Z_4|$ in order to protect against the key error of wrongly declaring equivalence. (This minor difficulty can be avoided altogether by following the method for dealing with unequal group sizes explained in the next section.) The result of applying this test is a noticeable saving in expected total sample size across the range of $\theta$ values: expected sample sizes are 24.1, 26.0 and 21.6 under $\theta = 0$, $\delta/2$ and $\delta$, respectively, compared to the 30 subjects required by a fixed sample procedure.

6.3.3 Adapting to Unequal Group Sizes and Unknown Variance

To satisfy real practical needs, methods must be able to handle unequal numbers on each treatment at
any stage in a parallel design or unequal numbers on each sequence (AB or BA) during a crossover design. Later, in Section 7.2.5, we shall discuss how unequal increments in information can be handled using an “error spending” approach to create two-sided tests with an inner wedge. Here, we describe an approximate approach using power family inner wedge tests. It is also important to deal with normal responses of unknown variance, and power family inner wedge tests can be adapted to this problem too. In dealing with either unequal increments in information or unknown variance, special modifications are needed to ensure the probability of wrongly declaring equivalence remains within a specified limit. In Chapters 4 and 5 we adapted power family tests in a way which preserved the Type I error rate by expressing the stopping boundary in terms of significance levels under $H_0\colon \theta = 0$. We shall follow a similar approach, working with significance levels against the hypotheses $\theta = \delta$ and $\theta = -\delta$ in order to preserve the power of the original test. Our development here parallels that of Jennison & Turnbull (2000a).

Suppose we wish to construct a two-sided equivalence test satisfying

$$\Pr\nolimits_{\theta = \pm\delta}\{\text{Declare equivalence}\} \le \beta$$

for specified $\delta$ and $\beta$, using a group sequential design with $K$ analyses based on a power family inner wedge test with shape parameter $\Delta$. Supposing, for now, that $\sigma^2$ is known, we can also plan to satisfy a second error condition

$$\Pr\nolimits_{\theta = 0}\{\text{Do not declare equivalence}\} \le \alpha,$$

noting that this condition will not be met precisely if actual information levels differ from their design values.

We take as our starting point the two-sided test of $H_0\colon \theta = 0$ with Type I error probability $\alpha$ and power $1 - \beta$ at $\theta = \pm\delta$. Thus, we plan to achieve information levels

$$I_k = (k/K)\, R_W(K, \alpha, \beta, \Delta)\, I_{f,2}$$

at analyses 1 to $K$. If these information levels arise exactly, the test defined by rule (5.1) with $a_k$ and $b_k$, $k = 1, \ldots, K$, given by (5.2) will have Type I error probability $\alpha$ and power $1 - \beta$ at $\theta = \pm\delta$ and the equivalence test will have exactly the desired error probabilities.

When observed information levels
differ from their planned values, we use the significance level approach to maintain properties under $\theta = \pm\delta$. Consider first the case $\theta = \delta$: since $Z_k - \delta\sqrt{I_k}$ has a standard normal distribution under $\theta = \delta$, we can proceed by expressing the boundary as a sequence of critical values for $Z_k - \delta\sqrt{I_k}$ in a form which does not explicitly involve the information levels $I_1, \ldots, I_K$. However, we must simultaneously treat the case $\theta = -\delta$. Fortunately, a typical test generates little probability of crossing the lower boundary arms if $\theta = \delta$ or the upper arms if $\theta = -\delta$, and there is no serious loss of accuracy in defining the upper section by considering $\theta = \delta$ and the lower section by considering $\theta = -\delta$.

To simplify notation, we shall omit the arguments of $C_{W1}$ and $C_{W2}$ in the following derivation. For $\theta = \delta$, the upper part of the stopping boundary defined by (5.1) with $a_k$ and $b_k$ as specified in (5.2) is based on comparing

$$Z_k \quad\text{versus}\quad C_{W1}\,(k/K)^{\Delta - 1/2} \quad\text{and}\quad \delta\sqrt{I_k} - C_{W2}\,(k/K)^{\Delta - 1/2},$$

i.e., comparing

$$Z_k - \delta\sqrt{I_k} \quad\text{versus}\quad -\delta\sqrt{I_k} + C_{W1}\,(k/K)^{\Delta - 1/2} \quad\text{and}\quad -C_{W2}\,(k/K)^{\Delta - 1/2},$$

at each analysis $k = 1, \ldots, K$. Under the planned sequence of information levels, $I_k = (k/K)\,(C_{W1} + C_{W2})^2/\delta^2$ and we can wr