Identification of unknown target genes using ChIP
Identification of unknown target genes of human transcription factors using chromatin immunoprecipitation

Amy S. Weinmann and Peggy J. Farnham*

McArdle Laboratory for Cancer Research, University of Wisconsin Medical School, 1400 University Avenue, Madison, WI 53706, USA

Accepted 15 January 2002

Abstract

The standard chromatin immunoprecipitation (ChIP) assay is used to examine the specific association of transcription factors with DNA in the context of living cells. Here we review two modifications to this protocol which are designed to identify novel target genes of transcription factors in mammalian cells. The main advantage to both of these approaches is that only DNA sequences directly bound by a factor within the context of a living cell will be identified. Therefore, artifacts associated with overexpression and/or alterations in signaling pathways are avoided. The first modification we describe, a ChIP cloning strategy, can be used to isolate any genomic fragment specifically associated with a particular factor. It requires no special equipment or reagents other than a high-affinity antibody to be used for immunoprecipitation of the factor of interest. However, it is most useful for the isolation of a small number of genomic targets. In contrast, the second modification, which combines ChIP with specialized CpG microarrays, is ideal for a more global analysis of target genes. Advantages, common problems, and detailed protocols for these two ChIP techniques are discussed. © 2002 Elsevier Science (USA). All rights reserved.

Keywords: Chromatin immunoprecipitation; Transcription factors; Target genes; CpG microarrays; Shotgun cloning; E2F

1. Introduction

As the sequencing of the human genome nears completion, the challenge that faces the scientific community is to decipher the underlying meaning behind these precisely ordered nucleotides. One portion of this sequence will provide the genetic information to create the large number of proteins required for maintaining the critical functions for diverse cell types. In
addition, a significant fraction of the genome will provide the information required to direct the precise timing and pattern of expression for these proteins. The precise regulation of gene expression deserves great attention because the inappropriate expression of a single gene product can result in dramatic consequences which can include uncontrolled cellular proliferation leading to cancer.

* Corresponding author. Fax: (608) 262-2824. E-mail address: farnham@oncology.wisc.edu (P.J. Farnham).
Methods 26 (2002) 37-47. PII: S1046-2023(02)00006-3

New strategies focused on understanding gene regulation are being developed to exploit the vast amount of information now available from the human genome sequencing efforts. Computer-assisted genome inspection strategies are being used to predict genes and regulatory regions. Although these approaches can provide valuable information, it is important to remember that primary DNA sequence provides only a small fraction of the information actually contained within the nuclear environment. Recent advances have highlighted the critical role that the chromatin environment plays in regulating gene expression. The modification of histone proteins can provide information to target transcription factors to specific regions of DNA [1]. For example, the specific acetylation or methylation of the core histone proteins can influence transcription factor binding. It is widely believed that hyperacetylated regions of the genome are more accessible to protein binding than hypoacetylated regions. Therefore, the same primary DNA sequence can be recognized and bound by proteins in one case (i.e., hyperacetylated nucleosomes) whereas in the opposing case (i.e., hypoacetylated nucleosomes) it is unavailable for protein recognition. This simple example makes it clear that to understand the regulatory regions contained within the human genome, strategies to examine transcription factor-DNA interactions within the context of living cells
will provide the most accurate information possible.

It has been predicted that at least 2000 transcriptional activators are encoded by the human genome [2]. To make use of this information, it is now important to determine the sets of genes regulated by each of these factors. A common approach used to identify the target genes that are regulated by an individual factor is to couple the overexpression or underexpression of that factor to cDNA microarray analysis [3]. Although this approach allows for the isolation of a large set of potential target genes, the data need to be interpreted with caution for several reasons. First, the genes identified may not be direct target genes of the overexpressed factor, but instead may be isolated as the result of indirect regulation due to overall alterations of gene expression patterns. In addition, it is unclear that the genes regulated by a factor at levels vastly greater than normal, biologically relevant concentrations are in fact true target genes. Therefore, our studies have focused on the development of new approaches designed to examine the direct targets of a site-specific transcription factor in the context of physiologically relevant conditions.

In this article, we describe the development of two methods designed to identify direct target genes of mammalian transcription factors. Each method is based on the chromatin immunoprecipitation (ChIP) procedure, which allows for an examination of protein-DNA interactions in the context of living cells. Briefly, in the standard ChIP procedure, cells are treated with formaldehyde to crosslink proteins that are in close association with DNA, and as the procedure proceeds, specific protein-DNA complexes are isolated by immunoprecipitation. Following reversal of the crosslinks and purification of the DNA specifically associated with the protein of interest, specific DNA sequences can be examined by PCR with gene-specific primers. Therefore, when using the standard ChIP procedure, one must first suspect that a promoter might be bound by
the transcription factor of interest to be able to design primers to a specific DNA sequence. This approach is of great use when confirming that a protein is bound to a gene previously characterized by other means, such as a mutational analysis of a promoter. However, this standard ChIP protocol cannot be used to identify unknown target promoters associated with a given factor.

To modify the ChIP procedure for the isolation of novel target sites, a method needed to be developed to examine DNA sequences specifically precipitated with an antibody to a desired protein with no prior knowledge of its target genes. For this means, we have developed two separate procedures: the first allows for the isolation of individual target genes [4] and the second provides a more global approach [5]. Both strategies have been designed to identify target genes that are directly bound by the factor of interest in the context of the natural cellular environment.

2. Description of methods

2.1. ChIP cloning

Our first modification to the chromatin immunoprecipitation procedure was designed to clone individual promoter or enhancer fragments bound by a human transcription factor (Fig. 1). Although gene-specific primers are commonly used to analyze the precipitated chromatin, the precipitated samples contain a large subset of the genomic fragments bound by a given factor. Therefore, we reasoned that the preparation of a plasmid library containing the precipitated fragments would allow for the identification of novel binding sites. However, several changes to the standard chromatin immunoprecipitation protocol were required when moving from a primer-specific analysis to a shotgun cloning strategy.

The first problem that needed to be solved was eliminating as much of the nonspecific DNA in the immunoprecipitation reaction as possible. In the standard ChIP procedure highly abundant repeat regions of the genome are precipitated nonspecifically, as illustrated by their presence in immunoprecipitation reactions that do not contain an antibody. This
is generally not a problem because gene-specific primers are used to amplify by polymerase chain reaction the desired target, and repetitive DNA elements will have no effect on this analysis. However, in the cloning procedure, nonspecific DNA will have an equal chance of being cloned. Therefore, we first performed two sequential immunoprecipitations with aliquots of the same transcription factor-specific antibody in an attempt to decrease the amount of nonspecific DNA. It is important to confirm that the second immunoprecipitation was successful before proceeding with the cloning procedure. We have found that some antibodies are unable to efficiently recognize protein complexes following elution and redilution. It is possible that some proteins do not renature appropriately for antibody recognition. Therefore, this step must be closely monitored prior to attempting the cloning portion of the procedure.

The amount of DNA following the second immunoprecipitation step is very small. Some chromatin immunoprecipitation cloning protocols modified for use in the yeast system have used PCR amplification steps to increase the amount of DNA available for cloning [6]. We have chosen not to perform a PCR amplification step to avoid cloning only easily amplified sequences. In general, it is very difficult to amplify sequences with a high GC content and a significant percentage of mammalian promoter regions are GC-rich [7]. Therefore, a PCR amplification step may preferentially amplify AT-rich, nonpromoter sequences, ultimately creating a false abundance of these sequences in the cloning pool. To avoid this potential bias, we feel it best not to include a PCR amplification step. Therefore, to compensate for the low yield after the second immunoprecipitation step, several identical, parallel immunoprecipitation reactions are performed and then pooled following the DNA purification steps.

Our next alteration to the ChIP procedure was to modify the immunoprecipitated DNA fragments to become
competent for cloning. During the sonication step, the DNA is sheared, which creates random overhangs at the 5′ and 3′ ends. These overhangs need to be modified for efficient cloning. One potential method is to digest the DNA with a restriction enzyme. However, restriction enzyme digestion will exclude from cloning any DNA fragment that does not contain two sites for that enzyme that flank the binding site of interest. To avoid this problem, we used T4 DNA polymerase to create blunt-ended DNA fragments that could then be cloned into a blunted vector for further characterization.

Another consideration in adapting the standard ChIP protocol for cloning target genes is the size of the chromatin. In a standard ChIP experiment, it is desirable to shear the chromatin to a relatively small size to ensure monitoring of only the binding sites located in close proximity to the gene-specific primers. In our initial ChIP cloning experiments, we found that the majority of DNA fragments cloned were very small in size (200-300 bp). When these fragments were sequenced, the vast majority corresponded to AT-rich repetitive sequences. The validity of these sequences was tested in standard ChIP experiments and they were shown to be nonspecifically precipitated (as determined by having a high signal in the no antibody reaction; e.g., see Fig. 2A, nonspecific). Therefore, it appeared that the small fragments were most likely due to nonspecific precipitation of highly repetitive elements. Although a large fraction of the nonspecific DNA was removed in the second immunoprecipitation step, it is very difficult to completely eliminate it. In an attempt to more efficiently distinguish specific from nonspecific clones, we prepared larger chromatin (1-2 kb) and then analyzed only cloned DNA fragments of at least 500 bp. Using this criterion as an initial screen in an E2F ChIP cloning experiment, 11 of 14 genomic fragments were confirmed to be specifically bound by E2F family members in living cells [4]. Thus, size selection appears to aid in the screening
procedure.

Fig. 1. Chromatin immunoprecipitation cloning schematic. Flowchart outlining the uses for the chromatin immunoprecipitation assay including the standard ChIP protocol and the two modifications designed to identify novel target genes.

It is extremely important to validate each clone obtained in the ChIP cloning method. The first step to validate the clones is to perform independent, standard ChIP experiments using clone-specific primers. This analysis will eliminate false positives which can be isolated by random chance or due to nonspecific precipitation (Fig. 2A). As noted above, it is impossible to completely eliminate all repetitive elements and some false positives will inevitably be due to the isolation of these elements. Despite these shortcomings, the addition of the second immunoprecipitation step and the selection of large cloned inserts did allow for the successful isolation of genomic fragments specifically bound by members of the E2F transcription factor family in living cells [4]. The confirmed positives in these experiments were of two types: low- and high-level binding (see Fig. 2B). We define high-level binding as those clones that have a specific IP signal intensity greater than or equal to that of a standardized aliquot (0.2%) of the total. Low-level binding is defined as a clone that has a reproducible signal in the specific IP sample greater than that of the no antibody control but less than that of the standardized aliquot of the total. In addition, we found that 3 of the 11 confirmed genomic fragments mapped to promoter regions [4]. Thus, we were able to successfully isolate promoters specifically bound by E2F family members using the ChIP cloning method. Although the isolation of novel transcription factor targets was accomplished, it became clear that the strategy used was limited to the identification of only a small number of target genes because of the difficulty encountered in the screening procedure. If a more global analysis of
transcription factor target genes is the goal of an analysis, a new method for screening the isolated targets needed to be developed.

2.2. ChIP-CpG microarray

The most promising high-throughput screening method is a microarray-based approach. Recently, the coupling of chromatin immunoprecipitation with genomic microarray analysis has been successfully performed in the yeast system to uncover a significant number of genomic sites directly bound by specific transcription factors in the context of living cells [6,8,9]. However, a similar analysis in mammalian cells is more difficult due to the lack of a comparable genomic microarray because of the vastly greater size associated with mammalian genomes. Therefore, to perform a similar analysis in mammalian cells, a suitable genomic microarray first needed to be determined.

Most commercially available microarrays contain cDNA sequences. Therefore, if the goal is to isolate regulatory regions (i.e., promoters or enhancers) bound by a specific factor, a cDNA microarray will not be useful. To circumvent this proble