Computational tools for genome editing using CRISPR single guide RNAs.
Instructors: Sergey Prykhozhij and Vinothkumar Rajan
Link to the workshop presentation with exercise comments and answers
I. Targeting sgRNAs to the genome and gene inactivation
A. General aspects relevant to sgRNA targeting
1. Parameters for sgRNA target site searches
As you learned from the presentation, a search for sgRNA target sites requires an input sequence and a specification of the Protospacer Adjacent Motif (PAM) sequence as well as the 5’-dinucleotide. Most current applications use Cas9, so if a program does not give you a choice, you can assume that the PAM sequence is “NGG” and the 5’-dinucleotide is “NN”. However, you will often want to limit yourself to sgRNA sites beginning with “GG” because this ensures that T7-synthesized sgRNAs match the target perfectly, and you may also want to change the PAM sequence if you start using a different CRISPR system. Another parameter is the length of the sgRNA site.
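To make these parameters concrete, here is a minimal Python sketch of a target site search. It is not the algorithm of any of the websites in this workshop; the function name find_sites, its parameter names and the demo sequence are all assumptions made for illustration. It scans both strands for a site of the given length that starts with the chosen 5’-dinucleotide and is followed by the chosen PAM.

```python
import re

# IUPAC codes allowed in the PAM / 5'-dinucleotide patterns (small subset)
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T",
         "N": "[ACGT]", "R": "[AG]", "Y": "[CT]"}


def revcomp(seq):
    """Reverse complement of an upper-case A/C/G/T string."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def find_sites(seq, pam="NGG", five_prime="NN", length=20):
    """Return (strand, start, site, pam) tuples for all matching sites.

    start is the 0-based coordinate, on the input sequence, of the leftmost
    base covered by site + PAM. All names here are hypothetical.
    """
    seq = seq.upper()
    to_re = lambda s: "".join(IUPAC[c] for c in s)
    body = "[ACGT]{%d}" % (length - len(five_prime))
    # Lookahead so that overlapping sites are all reported
    pattern = re.compile("(?=(%s%s)(%s))" % (to_re(five_prime), body, to_re(pam)))
    sites = []
    for strand, s in (("+", seq), ("-", revcomp(seq))):
        for m in pattern.finditer(s):
            start = m.start()
            if strand == "-":
                # Convert the reverse-strand index to input coordinates
                start = len(seq) - (start + length + len(pam))
            sites.append((strand, start, m.group(1), m.group(2)))
    return sites


# Made-up demo sequence: one 20-nt site starting with GG, followed by TGG
demo = "GGACGTACGTACGTACGTACTGGA"
for site in find_sites(demo, pam="NGG", five_prime="GG"):
    print(site)
```

Changing five_prime from "NN" to "GG" can only shrink the list of sites, which is the effect you will see later when you change this setting in a design program.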
Exercise 1:
Visit the following websites to see implicit sgRNA site definition:
To see more explicit approaches, please visit these websites:
You can clearly see the differences in approaches and it can be very beneficial to have more flexibility in specifying how a program should search.
2. Checking for possible off-targets
Expression of wild-type Cas9 with sgRNAs can result in cleavage of both the intended site and unintended (off-target) sites. Off-target sites tend to be very similar to the intended site. Off-target cleavage can be greatly reduced by using a paired design with Cas9 D10A (nickase) or dCas9-FokI constructs. We will do a nickase targeting design in a later exercise. If you are not planning to use the nickase strategy, or are unable to, it is very helpful to know how to predict the propensity of your newly identified sgRNAs to induce off-target cleavage by Cas9. Let us try this in the following exercise.
Exercise 2:
2.1
The easiest way to assess the off-target potential of individual sgRNAs is to use software that already implements an off-target checking algorithm. CHOPCHOP is an example of a program with this functionality built in. Please go to the CHOPCHOP website and enter either your favourite gene symbol or nr6a1b. You can also use arbitrary genomic ranges and RefSeq accession numbers.
You can now either look at the CHOPCHOP webpage that you have just generated or at the following screenshot on the current webpage. The last column of the sgRNA table contains information on the off-target sgRNA sites. The numbers in the header cells indicate the number of mismatches compared to the current site and the numbers in the data cells are the actual numbers of mismatched sgRNA target sites. You can clearly see which sgRNAs are more specific to your sequence and choose some of them for testing.
2.2
The second possibility is that you have a bunch of sgRNAs previously designed using another program or with a special workflow so it is not feasible to fully replicate this design in CHOPCHOP or a similar program.
What can one do in this situation? The answer is that you can use software designed to check for off-target sites for existing sgRNAs. To give you some practice doing this, you can follow this simple workflow:
- Open this text file with the already designed sgRNAs for the gabrr1 gene and copy its contents to the clipboard.
- Go to Cas-OFFinder and paste your previously copied sequences. You also need to select the genome (Danio rerio in this case).
- Set the mismatch number initially to 2 and observe the number of mismatch instances the program returns. The last column is the number of mismatches. If it is 0, it is simply the original target site.
- Increase the mismatch number to 3 or 4 and observe an increase in number of mismatch instances. The output of the program is somewhat inconvenient because it does not allow one to figure out which sgRNAs are highly specific and does not tabulate mismatch statistics for each original sgRNA.
- The solution to this problem is that either you input very few sgRNA sequences or generate some simple script that can tabulate mismatch instances for each sgRNA.
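A simple script of the kind mentioned in the last step could look like the following Python sketch. It assumes that the results have been saved as tab-separated text in which the first column is the query sgRNA sequence and the last column is the number of mismatches; please check this against the file you actually download. The function name tabulate_offtargets and the example rows are invented for illustration.

```python
import csv
import io
from collections import Counter, defaultdict


def tabulate_offtargets(text):
    """Count hits per query sgRNA, broken down by number of mismatches.

    Assumes tab-separated rows: first column = query sequence,
    last column = mismatch count (an assumption about the file layout).
    """
    table = defaultdict(Counter)
    for row in csv.reader(io.StringIO(text), delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank and comment lines
        try:
            mismatches = int(row[-1])
        except ValueError:
            continue  # skip header lines without a numeric last column
        table[row[0]][mismatches] += 1
    return table


# Invented example rows, not real Cas-OFFinder output
example = (
    "GGAAACCCTTTGGGAAACCCNGG\tchr1\t100\tGGAAACCCTTTGGGAAACCCAGG\t+\t0\n"
    "GGAAACCCTTTGGGAAACCCNGG\tchr7\t5200\tGGAAACCCTTTGcGAAtCCCTGG\t-\t2\n"
    "GGTTTCCCAAAGGGTTTCCCNGG\tchr3\t880\tGGTTTCCCAAAGGGTTTCCCCGG\t+\t0\n"
)

for guide, counts in tabulate_offtargets(example).items():
    summary = ", ".join("%d mm: %d" % (k, counts[k]) for k in sorted(counts))
    print(guide, summary)
```

An sgRNA whose only hit has 0 mismatches (its own target site) is the most specific at the chosen mismatch threshold.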
3. Predicting sgRNA efficacy
This is still one of the trickier problems in CRISPR/Cas9 technology – how to predict which sgRNAs are really worth making? This problem is still largely unsolved. There are two main ways to approach this:
- You just need to try more sgRNAs to get something working, but it may be very expensive for large projects or when viewed at a global scale. Since many of us are already doing this, this deserves no further attention.
- Statistics on large-scale quantitative readouts can help generate a scoring function or at least to identify some trends which may help in designing successful sgRNAs. We will now look at several examples of trends in sgRNA sequences:
- Adenines in the first 2 positions of sgRNA lower its efficacy up to several fold (Gagnon et al., 2014)
- When the last 2 nucleotides are GG, efficacy is much higher compared to sgRNAs shifted to 5’ by 3 nucleotides (Farboud & Meyer, 2015)
- T in the last 4 nucleotides reduces sgRNA efficacy, C either leads to a small reduction or improves efficacy, A is relatively neutral and G is beneficial to the efficacy. See this figure for an example (Wang et al., 2014).
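The trends listed above can be checked for a candidate sgRNA with a few lines of Python. This is only a sketch that reports which features are present; it is not a scoring function from any of the cited papers, and the function name, the returned keys and the reading of "adenines in the first 2 positions" as "at least one A" are assumptions.

```python
def sequence_trend_flags(site):
    """Report sequence features of an sgRNA site (PAM not included)."""
    s = site.upper()
    last_four = s[-4:]
    return {
        # Gagnon et al., 2014: A in the first 2 positions lowers efficacy
        "a_in_first_two": "A" in s[:2],
        # Farboud & Meyer, 2015: sites ending in GG are more effective
        "ends_with_gg": s.endswith("GG"),
        # Wang et al., 2014: T near the 3' end is unfavourable, G favourable
        "t_in_last_four": last_four.count("T"),
        "g_in_last_four": last_four.count("G"),
    }


# Made-up 20-nt site used only to show the output format
print(sequence_trend_flags("AACGTACGTACGTACGTTGG"))
```

Such flags are best used to rank otherwise equivalent sites, not to reject sites outright.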
Exercise 3:
There is currently one publicly available program which attempts to give numerical predictions for the efficacy of an sgRNA based on its sequence as well as its immediate neighbourhood. It was created by John Doench and co-workers (Doench et al., 2014) and is implemented in the sgRNA Designer website.
Please go to sgRNA Designer and paste the contents of this file or any other DNA sequence into the input form of this website.
Download the results file, open it with Notepad or another program and look at the last column of the table. It contains scores for individual sgRNAs. These scores had some predictive value in a cell culture model. Their value in zebrafish has not been firmly established, but together with other identified trends, the scores may help you identify better sgRNAs.
B. Specific applications of CRISPR/Cas9 for gene inactivation
1. Generation of frameshift mutations using one sgRNA with wild-type Cas9 or using paired sgRNAs with Cas9 (nickase)
The start of any CRISPR/Cas9 mutational or knock-in strategy is the identification of target sites. In this section we will focus only on the actual design of sgRNAs and will leave out other considerations such as off-targets, scoring and the design of PCR assays for checking the efficiency of targeting.
Exercise 4:
This is a very simple exercise to give you experience with the basic procedure of sgRNA design and some variations of the design parameters.
- Copy the sequence of the smad2 cDNA from this file, go to the CRISPR MultiTargeter page and paste the sequence into the input text area. Without changing any parameters, go to the bottom of the page and press “Submit”.
- After getting the results page, please feel free to press all of the links to open up the graphical overview and tables with the target site descriptions.
- Now let’s change the 5’-dinucleotide to “GG” and run the program with the same parameters. You can observe that the number of sgRNA target sites decreased to 28.
- Let’s go back to “NN”, change the type of sgRNA design to “Two single-strand nicks by Nickase” and run the program again. Now you can see that you got several hundred possible pairs. This large number arises because every suitable combination of two target sites counts as a separate design. To limit this number, you can either submit a smaller sequence fragment or constrain the spacing between the target sites depending on your opinion about how best to position the individual target sites.
2. Region excision: small-scale (100s nt – several kb) and large-scale (chromosomal deletions)
In many cases, introducing a small deletion or insertion into an exon of a gene is not sufficient to inactivate it. This can be the case for many non-coding RNAs or protein-coding genes with alternative start sites and many transcript isoforms as well as regulatory regions which span hundreds of nucleotides to many kilobases in the genome. In such cases, it is possible to generate inactivating mutations by deleting the targeted genomic regions via simultaneously inducing double-strand breaks (DSB) at 2 distant locations. This has been recently achieved in zebrafish using TALENs and CRISPR/Cas9 techniques (Xiao et al., 2013).
Exercise 5:
In this exercise, we will simulate the generation of a deletion mutant of the xbp gene.
Let us say that we have designed sgRNAs targeting exons 1 and 4 of this gene and that we also have corresponding primers for detection. Please go to Ensembl and take a look at the exons and introns of this gene:
ENSDARG00000035622
After checking the following diagram, please do a calculation needed for the design of a deletion experiment.
What is the expected length of the PCR product if targeting works (insertions and small deletions can be disregarded)?
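The arithmetic behind this question can be sketched in Python. The expected product is the wild-type amplicon minus the segment between the two cut sites. The function name and all coordinates below are hypothetical and do not come from the gene in this exercise; substitute the positions you read from Ensembl.

```python
def expected_deletion_amplicon(fwd_start, rev_end, cut1, cut2):
    """Return (wild-type length, deletion-allele length) of a PCR product.

    fwd_start / rev_end: first and last genomic base covered by the primers.
    cut1 / cut2: the bases cut1+1 .. cut2 are assumed to be removed, so the
    deleted segment is cut2 - cut1 bases long. Small indels are ignored.
    """
    if not fwd_start <= cut1 < cut2 <= rev_end:
        raise ValueError("cut sites must lie between the primers, cut1 < cut2")
    wild_type = rev_end - fwd_start + 1
    return wild_type, wild_type - (cut2 - cut1)


# Hypothetical coordinates for illustration only
wt, deleted = expected_deletion_amplicon(1000, 5000, 1200, 4700)
print("wild type: %d bp, deletion allele: %d bp" % (wt, deleted))
```

If the wild-type amplicon is too long to amplify under your PCR conditions, only the deletion allele will give a band, which is often what makes this assay convenient.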
3. Inactivating duplicated genes with single sgRNAs and specific targeting of transcript isoforms
This section can be best covered by an exercise and is described in my paper (Prykhozhij et al., 2015).
Exercise 6:
6.1
We are now going to look at two special cases of sgRNA design. In particular, we will design sgRNAs which can match both copies of duplicated gene pairs. Below you can see an illustration of the sgRNA design principle for this workflow:
An example of how this can be done is provided on the demo page of the corresponding workflow of CRISPR MultiTargeter. You simply need to go to the following page and press “Submit” at the bottom: http://multicrispr.net/multigene_input_demo.html
6.2
Another application of CRISPR MultiTargeter is to design transcript isoform-specific sgRNAs. The concept of such a design is illustrated in this figure:
For a demonstration of transcript isoform-specific design, please go to the following page and press “Submit” at the bottom: http://multicrispr.net/transcripts_demo.html.
4. Insertion of Stop-codon single-stranded DNA cassettes
To introduce the next section on defined mutations, we will look at a method of inactivating genes using a combination of DNA cleavage and homology-directed repair. It has been noticed that some sgRNAs fail to induce mutations capable of completely inactivating their target genes (Gagnon et al., 2014). The authors of this study proposed that this can be corrected by including an oligo with stop codons and short homology arms in the injection mix. This idea is illustrated in the following figure from this paper:
II. Insertion of defined mutations
1. Finding sgRNA sites near the desired mutation site
The motivation for inserting a defined mutation into the genome of a model system species or a cell line is relatively straightforward: you want to model the condition caused by this mutation as closely as possible to the original situation. Despite the apparent simplicity, it is still much harder to engineer defined mutations in the genome of model species than to simply generate a frameshift mutation. We will now look at what it takes to insert a precise mutation into the genome.
The first step is to find sgRNA sites as close as possible to the defined mutations. You can either find single sgRNAs or paired sgRNAs. The paired designs in this context provide both better specificity and strongly improved precision of targeting. In the remainder of this section, we will look at designing a strategy to insert defined mutations into the pycr1b zebrafish gene corresponding to the human PYCR2 mutations recently published (Nakayama et al., 2015).
Exercise 7:
In this exercise we will look at how to design sgRNAs near the site in zebrafish pycr1b that corresponds to the human PYCR2 R119C mutation identified in the paper mentioned above. The corresponding Arg residue is conserved in the zebrafish protein, so the aim will be to generate a Pycr1b R119C mutant. But first, let us look at designing sgRNAs within 50 nt of the mutation. Below is the exon sequence encoding the R119 residue (the codon is labeled in red):
AAGCTGCTGCAGTACCGTGAGTCTCCTAAAGTGATGCGATGCATGACGAACACCCCGGTGGTGGTGCGCGAGGGGGCGACGGTGTACGCCACAGGCACACACGCACATCTGGAGGACGGCAAACTGCTGGAGCAGCTGATGGCCAGCGTGGGCTTCTGCACCGAGGTGGAGGAGGACCTGATCGATGCCGTCACTGGACTCAGCGGCAGCGGACCCGCATAT
- Please copy the above exon sequence and go to http://multicrispr.net/basic_input.html
- Run the algorithm for the individual sgRNA design. In the tutorial web page, copy 3 codons starting from the labeled residue and search on the CRISPR MultiTargeter output page to visualize the location of the mutation site relative to the sgRNA sites.
- You can also run the nickase algorithm on the same sequence by toggling the design type. Also locate the codon to be mutated. For the nickase designs, you can select designs that overlap the target mutation.
2. Choosing the strategy
To insert a defined mutation into a gene, we need a donor DNA to enable efficient homology-directed repair. The most popular option for point mutations is to use relatively short (100-120 nt) single-stranded oligodeoxynucleotides (ssODNs) containing the intended mutation and additional silent mutations capable of inactivating the sgRNA sites. In practice, this strategy works better if you can design an efficient sgRNA pair for targeting with Cas9 nickase, because using one sgRNA with wild-type Cas9 results in most insertions containing additional indels at the sgRNA target site. The second option is to use a larger (1 kb) fragment with the intended defined mutation in the middle and with additional silent mutations as applicable. The advantage of this strategy is that it will work even if the sgRNA target sites are further than 50 nucleotides from the defined mutation, and it can work well even with single sgRNA designs. One interesting peculiarity is that this strategy seems to benefit greatly from excision of the targeting construct from its plasmid backbone using the same CRISPR/Cas9 technology inside the cell. Please see this figure for an example (Irion et al., 2014).
In our Pycr1b R119C example you probably noticed that it may be possible to use the ssODN strategy. However, you cannot make this decision based only on the bioinformatics analysis, but rather you should select several most promising sgRNAs in the proximity of the mutation site and test them either with wild-type or nickase Cas9 and decide which ones are sufficiently potent to support your targeting experiment. Only then can you go ahead and complete your strategy choice and design the construct.
3. Defining additional mutations in the targeting oligo/vector
A final consideration is to make sure that you can design silent mutations to inactivate sgRNA sites. The best site for sgRNA site inactivation is the PAM sequence. To make this step more obvious, we will do an exercise related to Pycr1b R119C mutation insertion.
Exercise 8:
Let us imagine that we did some testing of sgRNAs identified in the previous step and found a pair of efficient sgRNAs for nickase targeting strategy. Here is our preliminary ssODN with the mutation introduced but still without the silent mutations introduced. Since this ssODN will be made chemically, you have the absolute freedom within the constraints of the genetic code to make any modifications that will satisfy your purposes.
ssODN with R119C mutation (intron sequence is in lower case, sgRNA sites are in magenta and the mutation is in red)
Genetic code:
Look at the mutation site and sgRNA sites above. Keep in mind that the first sgRNA site is in the reverse orientation which means that the PAM sequence is at 5’ and has a “CCG” sequence.
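The search for silent PAM-inactivating changes can also be done programmatically. The Python sketch below uses the standard genetic code and lists every single-base substitution at the critical PAM positions (the “GG” of a forward “NGG”, or the “CC” of a reverse-orientation “CCN”) that leaves the encoded amino acid unchanged. The function name, its arguments and the short demo sequence are assumptions for illustration; the demo is not the ssODN from this exercise.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}


def silent_pam_mutations(cds, critical_positions):
    """List silent single-base changes at the given 0-based positions.

    cds must start at the first base of a codon (in frame).
    Returns (position, old codon, new codon, amino acid) tuples.
    """
    cds = cds.upper()
    hits = []
    for pos in critical_positions:
        offset = pos % 3
        codon = cds[pos - offset:pos - offset + 3]
        if len(codon) < 3:
            continue  # incomplete codon at the end of the sequence
        for base in "ACGT":
            if base == cds[pos]:
                continue
            new_codon = codon[:offset] + base + codon[offset + 1:]
            if CODON_TABLE[new_codon] == CODON_TABLE[codon]:
                hits.append((pos, codon, new_codon, CODON_TABLE[codon]))
    return hits


# Made-up in-frame fragment CTG GAA with a PAM "TGG" at positions 1-3,
# so the two critical G bases are at positions 2 and 3.
for hit in silent_pam_mutations("CTGGAA", [2, 3]):
    print(hit)
```

In this demo only the G at the wobble position of the Leu codon can be changed silently. When no silent change is available in the PAM, the same function can be pointed at positions inside the sgRNA site instead.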
4. Strategy implementation: expected efficiency, fidelity of mutagenesis and genotyping
Once you have introduced all your mutations into an ssODN or a vector, you can start your targeting experiment. Here are several considerations about the most important features of these targeting experiments.
Efficiency factors:
- sgRNA efficacy (percentage of DNA sites cut).
- For targeting using ssODNs, the actual lengths of homology arms depend on the sgRNAs chosen and strongly influence the efficiency.
- For vector-based targeting, excision of the targeting construct in vivo seems to have the strongest influence. The length of homology arms must also play a role but might be secondary since there was a report of successful knock-ins with just 40 bp of homology when using the excision technique (Hisano et al., 2015).
Fidelity:
Both nickase-based targeting with ssODNs and vector-based targeting with any CRISPR/Cas9 genomic targeting have been reported not to introduce any additional mutations. In contrast, ssODNs with wild-type Cas9 generally lead to indel mutations in a high percentage of clones. Even so, about 20% of HDR-repaired clones were still correct, meaning that it may still be possible to screen for the correct clones.
Genotyping:
Genotyping is one of the trickier aspects of this kind of targeting because of the small size of the introduced changes. There are two potential strategies to successfully genotype the defined mutations:
- Make a specific PCR assay which will work only when the mutation is present. This requires that one of the primers ends at the mutation site. You would also need to make sure that your PCR product can only be amplified from the successfully targeted genome and not from the targeting construct.
- Check if your intended mutation or silent mutations abolish existing or introduce new restriction sites. If they do not affect any sites, you can introduce such a mutation elsewhere yourself. You can use such an affected restriction enzyme to perform your genotyping after amplifying the desired genomic region by PCR. Make sure that your PCR product is specific to the targeted genome.
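The restriction site check in the second strategy is easy to automate. The Python sketch below compares the wild-type and edited sequences for a small hand-picked set of common enzymes with palindromic recognition sites; the enzyme selection, the function names and the demo sequences are assumptions for illustration, and a real analysis should use a complete enzyme list.

```python
# A few well-known enzymes; all sites are palindromic, so scanning one
# strand is sufficient.
ENZYMES = {
    "EcoRI": "GAATTC",
    "BamHI": "GGATCC",
    "HindIII": "AAGCTT",
    "PstI": "CTGCAG",
    "XhoI": "CTCGAG",
    "NcoI": "CCATGG",
}


def count_sites(seq, site):
    """Count occurrences of a recognition site, including overlapping ones."""
    seq = seq.upper()
    return sum(1 for i in range(len(seq) - len(site) + 1)
               if seq[i:i + len(site)] == site)


def site_changes(wild_type, edited, enzymes=ENZYMES):
    """Return {enzyme: (sites before, sites after)} for enzymes that differ."""
    changes = {}
    for name, site in enzymes.items():
        before = count_sites(wild_type, site)
        after = count_sites(edited, site)
        if before != after:
            changes[name] = (before, after)
    return changes


# Made-up sequences: a single A-to-G change destroys an EcoRI site
print(site_changes("AAGAATTCAA", "AAGAGTTCAA"))
```

An enzyme reported here distinguishes the two alleles after PCR amplification and digestion, provided that it does not cut elsewhere in the amplicon in a way that obscures the difference.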
III. Insertion (knock-in) of additional gene parts
Homology-independent knock-ins:
- Exons
- Promoters
- Introns
Homologous recombination (HR) knock-ins:
- Considerations for sgRNA site(s) selection and its mutation inside the targeting construct, size of the homology arms, detection of knock-ins
- Efficiency and ways to improve it
Due to limited time, these topics will be covered mainly in the workshop presentation rather than in the exercises. We will first examine recent papers on development of new CRISPR/Cas9-based knock-in techniques. The only exercise I would like to provide is on how to use the E-CRISP website for designing sgRNAs for N- or C-terminal tagging using the HR knock-in technique shown in the last slide.
Exercise 9:
The possibility to label endogenous proteins with fluorescent proteins or other tags such as affinity tags has been an important tool in many model systems. It has not been directly available in zebrafish until recently. In this exercise we will be looking for sgRNAs which could help us perform N- or C-terminal tagging of the zebrafish six6a gene.
- Go to the E-CRISP web page: http://www.e-crisp.org/E-CRISP/designcrispr.html.
- Select “Danio rerio (Zebrafish, Zv9.77)” in the “1. Select organism:” section.
- Enter “six6a” in “Search by gene symbol”, press “SEARCH” button and then click on the link that appears.
- In the “3. Start an application.” select the medium stringency for your search.
- Press “Display advanced options” button under the previous section.
- In the “5. Design purpose:” select “C-terminal tagging” and click a small tick box above.
- Scroll to the end of the page and press “Start sgRNA search” button.
At the time of writing, this search identified 20 sgRNAs inside the last exon of six6a. Please use this software on your own favourite genes.
References:
- Efficient mutagenesis by Cas9 protein-mediated oligonucleotide insertion and large-scale assessment of single-guide RNAs. Gagnon JA, Valen E, Thyme SB, Huang P, Ahkmetova L, Pauli A, Montague TG, Zimmerman S, Richter C, Schier AF. PLoS One. 2014 May 29;9(5):e98186.
- Dramatic enhancement of genome editing by CRISPR/Cas9 through improved guide RNA design. Farboud B, Meyer BJ. Genetics. 2015 Apr;199(4):959-71.
- Genetic screens in human cells using the CRISPR-Cas9 system. Wang T, Wei JJ, Sabatini DM, Lander ES. Science. 2014 Jan 3;343(6166):80-4.
- Rational design of highly active sgRNAs for CRISPR-Cas9-mediated gene inactivation. Doench JG, Hartenian E, Graham DB, Tothova Z, Hegde M, Smith I, Sullender M, Ebert BL, Xavier RJ, Root DE. Nat Biotechnol. 2014 Dec;32(12):1262-7.
- Chromosomal deletions and inversions mediated by TALENs and CRISPR/Cas in zebrafish. Xiao A, Wang Z, Hu Y, Wu Y, Luo Z, Yang Z, Zu Y, Li W, Huang P, Tong X, Zhu Z, Lin S, Zhang B. Nucleic Acids Res. 2013 Aug;41(14):e141.
- CRISPR MultiTargeter: a web tool to find common and unique CRISPR single guide RNA targets in a set of similar sequences. Prykhozhij SV, Rajan V, Gaston D, Berman JN. PLoS One. 2015 Mar 5;10(3):e0119372.
- Mutations in PYCR2, Encoding Pyrroline-5-Carboxylate Reductase 2, Cause Microcephaly and Hypomyelination. Nakayama T, Al-Maawali A, El-Quessny M, Rajab A, Khalil S, Stoler JM, Tan WH, Nasir R, Schmitz-Abe K, Hill RS, Partlow JN, Al-Saffar M, Servattalab S, LaCoursiere CM, Tambunan DE, Coulter ME, Elhosary PC, Gorski G, Barkovich AJ, Markianos K, Poduri A, Mochida GH. Am J Hum Genet. 2015 May 7;96(5):709-19.
- Precise and efficient genome editing in zebrafish using the CRISPR/Cas9 system. Irion U, Krauss J, Nüsslein-Volhard C. Development. 2014 Dec;141(24):4827-30.







