Troubleshooting Your Data

As mentioned in other sections of this website, the two most common causes for failure to get good or any sequence data for your samples are purity and concentration of your template DNA. If you are having trouble getting good sequencing results for your samples, you may first want to look through our Sequencing Basics section for some recommendations on template preparation and quantitation. If it appears that you have done everything correctly and followed our suggestions, then look below for some additional reasons why you might obtain less than optimal DNA sequence data quality. We’ve listed various causes, solutions and, for some, pictorial representations of what these specific problems might look like. Many causes and solutions may look rather obvious and just involve common sense, but you’d be surprised how many times we’ve heard "how could I have done that?"...

No Sequence Data

Cause: priming site not present
Solutions:  if you’ve chosen one of the sequencing facility’s vector primers, make sure it is present in your vector. While many of the primers we provide are quite common to many different vectors (e.g. T7, M13-48R), others are specific to one particular type (e.g. GL primer 1 can be used with the pGL2 vector but not the pGL3 vector). Doublecheck your plasmid maps/sequences.

- if you’ve designed your own custom primer from previous sequence data, make sure you were using a reliable area of sequence - look for sharp, well-defined peaks with no ambiguity. Avoid areas where the peaks are broader and not well separated - this will occur towards the end of the sequence where the fragments are larger and the polymer cannot adequately resolve single nucleotides, causing inaccurate basecalling.

Cause:  not enough or no DNA/primer in tube
Solutions:  doublecheck your quantitations, stock concentrations and dilutions. Check our What kinds of DNA can we sequence and how much do we need? section to make sure you’ve provided the appropriate amount of DNA and/or primer. While our sequencers are very sensitive and can detect a range of DNA concentrations, there is still a "threshold" amount that must be reached to obtain any sequence data.

Cause:  inhibitory contaminant
Solutions:  the cycle sequencing reaction used to amplify samples for automated sequencing is very sensitive to the presence of certain contaminants, some of which will completely inhibit our sequencing enzyme. Please check the Contaminant chart in the Template preparation and purification section for a list of potential inhibitors and the amounts that are tolerable. You may need to reprep your sample to sufficiently remove one or more inhibitory components to obtain any sequence data.

Cause:  expired reagents
Solutions:  this one falls under the common sense category - check the expiration dates on your reagents and replace any that are out of date.

Noisy Data with Weak Signal

"Noisy" data can be identified by the presence of multiple peaks and numerous "N"s within your sequence. The Sequencing Analysis program assigns an "N" as a base identification when there are two or more peaks present at one position. This "N" may signify the legitimate occurrence of two nucleotides, as in the case of a heterozygote, but may also be seen when background noise is high or when multiple products are present. When your sample exhibits weak signal, the software attempts to compensate by boosting the signal of sample bands to detectable levels. However, the background noise will also be artificially amplified, giving a poor signal-to-noise ratio. Background noise appears as many smaller, undefined peaks under your sequence peaks of interest. This noise is always present, but with well-prepared samples of good signal strength, it will be undetectable. To determine if your noisy data may be due to weak signal, look at your ABI trace file. If you are looking at a paper chromatogram, look towards the top and middle of your trace for a line that says "Signal". If the file is on your computer, click the "A" radio button in the bottom left-hand corner, which is visible when you have opened up the trace file within a viewing program, such as EditView or Chromas. Scroll down to the line that says "Signal" and you will see the four nucleotides followed by numbers in parentheses. These numbers represent the average signal strength of each nucleotide and their values should, optimally, be between 200 and 400. If they are much less than 100, then you can assume your noisy data is at least partially due to its weak signal.

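As a rough illustration of the signal-strength check described above, the following Python sketch classifies the four average "Signal" values from a trace and reports the fraction of "N" basecalls. The function names and category labels are hypothetical, and using the mean of the four values as the deciding number is an assumption; the cut-offs (100, and 200 to 400) are the ones quoted in this section.

```python
# Hypothetical helpers; not part of any sequencing software.
# Cut-offs come from the text: 200-400 is optimal, well under 100 is weak.

def classify_signal(signal):
    """Classify average signal strength.

    signal: dict mapping each base ("A", "C", "G", "T") to the average
    signal value shown on the "Signal" line of the trace.
    Assumption: the mean of the four values drives the decision.
    """
    mean = sum(signal.values()) / len(signal)
    if mean < 100:
        return "weak"
    if mean < 200:
        return "borderline"
    if mean <= 400:
        return "optimal"
    return "above optimal range"


def n_fraction(seq):
    """Fraction of basecalls that are N (0.0 for an empty sequence)."""
    return seq.upper().count("N") / len(seq) if seq else 0.0


if __name__ == "__main__":
    print(classify_signal({"A": 48, "C": 61, "G": 39, "T": 55}))
    print(n_fraction("ACGTNNACGT"))
```

A "weak" result combined with a high N fraction would point towards the causes listed below rather than towards a true mixed template.
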
Cause:  not enough DNA
Solutions:  doublecheck your quantitations, stock concentrations, calculations and dilutions. Check our What kinds of DNA can we sequence and how much do we need? section to make sure you’ve provided the appropriate amount of DNA and/or primer.

Cause:  inhibitory contaminant (e.g. salts, phenol)
Solutions:  the cycle sequencing reaction used to amplify samples for automated sequencing is very sensitive to the presence of certain contaminants, some of which can partially or completely inhibit our sequencing enzyme. Please check the Contaminant chart in the Template preparation and purification section for a list of potential inhibitors and the amounts that are tolerable. You may need to re-purify your sample to sufficiently remove one or more inhibitory components to obtain better sequence data.

Cause: degraded DNA from nucleases, repeated freeze-thaw, excessive UV light exposure, bisulfite treatment.
Solutions:  Nuclease contamination in a template preparation as well as repeated freeze-thaw cycles can degrade DNA over time. Even low amounts of nucleases can extensively degrade DNA depending on storage conditions and temperatures, as well as the length of time the DNA is stored. Generally, re-isolation and purification of the template DNA will be necessary to obtain good DNA sequence. When extracting PCR products from a gel, prolonged exposure to UV light will degrade and nick the DNA. Limit the time and UV intensity as much as possible to prevent degradation. When treating DNA with bisulfite for methylation experiments, it is important to avoid long incubations at higher temperatures as substantial amounts of DNA will be degraded in this process.

Cause:  trend in worsening data?
Solutions:  if you have previously been able to obtain good sequence data but begin to see a deterioration in quality that gets progressively worse, you may have some contamination in one or more reagents, or have some reagents that have reached the end of their usefulness. Make up fresh stocks of commonly used reagents, such as buffers, and always use high quality distilled water in your preparations.

Cause:  inefficient primer binding (low Tm, degenerate primers, mismatch)
Solutions:  the Tm of a primer is defined as the temperature at which 50% of the oligonucleotide and its perfect complement are in duplex. The Tm of an oligo can be roughly calculated by using the formula:

Tm = 2°C(A+T) + 4°C(G+C)

This is the most commonly used formula for calculating Tm, though it is not the most accurate as it does not factor in salt or formamide concentrations. A good website to check out if you are interested in some detailed theory behind Tm calculations is http://www.sigma-genosys.com/oligo_meltingtemp.asp.

In our cycle sequencing reaction, our primer/template annealing step occurs at 50ºC. Thus, if your primer Tm is much lower than 50ºC, hybridization to its complementary template will be much less efficient and fewer extending fragments will be generated. Raise your primer Tm to within the range of 52ºC-58ºC by adding bases to the 5’ or 3’ end. Degenerate primers and those with mismatched bases will also show decreased hybridization efficiency due to reduced stability of primer binding, and if degeneracy or mismatches occur at or near the 3’ end of your primer, it is highly likely that your sequencing attempt will fail.
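
The rough Tm formula above, together with the suggested 52ºC-58ºC range, can be sketched in Python as follows. The function names are hypothetical and the example primer is arbitrary; the formula ignores salt and formamide, as noted above, so treat the result as an estimate only.

```python
# Hypothetical helpers illustrating Tm = 2(A+T) + 4(G+C).
# The 52-58 default range is the one suggested in the text.

def wallace_tm(primer):
    """Rough Tm in degrees C from base composition alone."""
    p = primer.upper()
    at = p.count("A") + p.count("T")
    gc = p.count("G") + p.count("C")
    return 2 * at + 4 * gc


def check_sequencing_primer(primer, low=52, high=58):
    """Return (tm, verdict) relative to the suggested Tm range."""
    tm = wallace_tm(primer)
    if tm < low:
        verdict = "below range"
    elif tm > high:
        verdict = "above range"
    else:
        verdict = "in range"
    return tm, verdict


if __name__ == "__main__":
    # Arbitrary 17-base example: 8 A/T and 9 G/C.
    print(check_sequencing_primer("GTAAAACGACGGCCAGT"))
```

Because the formula only counts bases, it cannot account for degenerate positions or mismatches; those still need to be checked by eye.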

Multiple Peaks Within Your Sequence

The presence of multiple peaks within your sequence can be caused by numerous factors. To help determine the cause, it can be useful to look at two aspects - where the multiple peaks begin, as well as the overall signal strength of your sample. As mentioned above in the Noisy data with weak signal section, samples with low signal strength can have artificially high background noise that can give the appearance of multiple peaks. However, if your average signal strength numbers are above 100 or so, it’s unlikely that background interference is your only problem. We’ve broken down this section into two parts, based on where your multiple peaks begin.

From the beginning

Cause:  multiple priming sites involving vectors
Solution:  your primer may have a secondary hybridization site that may be identical or closely related, with different nucleotide sequences following each site, giving superimposed bands within your sequence. If the priming sites are identical (such as when more than one T7 promoter site is present), the double peaks will be strong from the outset. The fragments may also show shifted migration so that the double peaks are not directly on top of one another but will be offset to one side or the other due to the differing mobility patterns of the strands with dissimilar nucleotide composition. In other instances, a secondary priming site may not be exactly the same, but may differ by a few internal bases. In this case, the mismatched primer may not hybridize as efficiently but can still anneal and extend, and give rise to less intense fragments that can be seen underneath your peaks of interest. In both cases, it’s necessary to screen both your vector and insert carefully to look for sequences that may match or be similar to your proposed primer. You may need to choose another vector primer on the same end of the multiple cloning site or redesign your custom primer. When choosing another primer is difficult, such as when primer walking through a repetitive area, try to find a primer that has a 3’-base match specific to your area of interest which can help act as an "anchor".
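
One way to screen a vector or insert sequence for identical or near-identical priming sites, as suggested above, is sketched below in Python. The function names, the mismatch limit and the number of 3’ bases required to match exactly are all assumptions chosen for illustration; the text does not specify how many internal mismatches a primer will tolerate.

```python
# Hypothetical screening helper; parameters are illustrative assumptions.

COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def revcomp(seq):
    """Reverse complement; unknown characters become N."""
    return "".join(COMPLEMENT.get(b, "N") for b in reversed(seq.upper()))


def find_priming_sites(primer, template, max_mismatches=2, anchor=3):
    """List (strand, start, mismatches) for sites the primer might use.

    A site is reported when the last `anchor` bases at the 3' end match
    exactly and the whole primer has at most `max_mismatches` mismatches.
    Starts on the "-" strand are 0-based positions in the reverse complement.
    """
    primer = primer.upper()
    n = len(primer)
    hits = []
    for strand, target in (("+", template.upper()), ("-", revcomp(template))):
        for i in range(len(target) - n + 1):
            window = target[i:i + n]
            if window[-anchor:] != primer[-anchor:]:
                continue
            mismatches = sum(1 for a, b in zip(primer, window) if a != b)
            if mismatches <= max_mismatches:
                hits.append((strand, i, mismatches))
    return hits


if __name__ == "__main__":
    template = "CCGATTACAGGTTTTGATAACAGGCC"
    for hit in find_priming_sites("GATTACAGG", template):
        print(hit)
```

More than one hit suggests the primer may give superimposed sequence; a single exact hit is what you want to see.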

Cause:  multiple priming sites in PCR
Solution:  this may occur when one or both of the PCR primers hybridizes to more than one position on the template DNA, giving rise to multiple PCR products. Often this will be obvious when visualizing the PCR products on an agarose gel as there will be more than one band present. In this case, gel purification of the desired product will be necessary. One can run into difficulty, however, when the products are very similar in size, which may arise when amplifying related or repetitive DNA, and do not separate well on the gel. In this case, optimization of the PCR reaction may be necessary or redesign of the PCR primers in order to choose a more specific priming site.

Cause:  PCR primers acting as both forward and reverse
Solution:  sometimes, a PCR product may be generated when one primer functions as both the forward and reverse primer in the PCR reaction, giving rise to an artifactual product. This is fairly easy to detect when sequencing the PCR product as one primer will give double peaks from the start, while the other fails to give any sequence data. Redesign your set of PCR primers.

Cause:  residual PCR primers and/or dNTPs
Solution:  as two primers are present in the PCR reaction, incomplete removal of these primers can lead to double peaks within the sequencing data. Both primers will act as sequencing primers and lead to superimposed bands which correspond to the complementary strands from opposite orientations. It is critical to remove excess primers and dNTPs from the PCR reaction by purification (look at our Template preparation and purification section for our recommendations on PCR purification). If you attempt direct sequencing of PCR products without purification, by diluting an aliquot of your PCR product with water to lower the concentration of residual primers and dNTPs (a method which we do not recommend), then it is imperative to optimize your PCR reaction so that primers and dNTPs are used in limiting amounts and are mostly consumed by the end of the PCR.

Cause:  primers with high Tm
Solution:  primers that have a Tm much higher (>65ºC) than our suggested 52ºC-58ºC often do not function well as sequencing primers. When primers have a Tm that high, it is often a result of increased G-C content or because the primer is quite long, both factors that can increase the potential for primer secondary structure formation. If possible, choose another primer with a lower Tm. If that is not optimal, let us know and we can perform a two-step cycle sequencing method that eliminates the lowest temperature 50ºC annealing step and proceeds from the 96ºC denaturing step to the 60ºC extension step. The 60ºC step, in this case, will function as both the annealing and extension step. This can sometimes improve sequencing results.

Cause:  primers with n-1 population

Solution:  this problem is not uncommon and can result from poor quality synthesis of sequencing primers. Primers are synthesized from the 3’end to the 5’end and when synthesis is inefficient, there can be a significant population of less than full-length primers - n-1s, which are full-length primers minus one base, plus other shorter derivatives. These primers have a common 3’end but different 5’ends, thus chains that terminate at the same position will have different lengths and will run at different positions on the gel. Primers that have degraded from the 3’end will also give this appearance. It is easy to spot this problem within the sequencing chromatogram as each position will contain the true peak as well as the peak immediately to the right of it, giving the appearance of "shadow" peaks. Whatever the cause of the n-1s, it will be necessary to resynthesize the primer to obtain an oligo of suitable quality for sequencing. When high-quality reagents and proper protocols are utilized during oligo synthesis, cartridge or HPLC purification of the primers is usually not necessary for typical oligos (<30 bp), but sometimes additional purification can be beneficial.

Begin farther into the sequence

Cause:  mixed plasmid prep
Solution:  a plasmid prep that is contaminated by more than one product, such as two vectors with different inserts or vector with insert and vector without, will generally show an early section of clean sequence data (common vector multiple cloning site sequence) followed by double peaks. Occasionally, a plasmid may contain more than one vector molecule or may encounter spontaneous deletions or insertions during growth. The point at which the double peaks begin corresponds to the start of the insert cloning site. To avoid this problem, it’s important to carefully pick a single colony from your growth plate, restreaking if necessary, to be sure that your colony is completely clonal. You should follow this up with a restriction digest of your plasmid run out on an agarose gel to ensure vector and insert are present as expected.

Cause:  homopolymeric regions

Solution:  regions that contain long stretches of a single nucleotide can be difficult to sequence through accurately. Short stretches of homopolymeric regions are generally not difficult to get through, but longer sections can be challenging. Sequence data up to and including the polynucleotide region may be fine, but the last base of the poly region and all peaks following it may show a wave-like, stuttering pattern of double peaks that cannot be interpreted. This tends to be more problematic in PCR products, but can also occur when sequencing plasmids, especially when trying to sequence the polyA region of cDNA. This difficulty is thought to arise due to enzyme "slippage" when the growing strand does not stay paired correctly with the template DNA during polymerization through the homopolymer region, thus giving rise to fragments of varying lengths that have the same sequence after this area. When sequencing cloned DNA with a homopolymer region, several options can be tried. In our BigDye Terminator sequencing chemistry, dTTP has been replaced with dUTP, which lowers the melting temperature of DNA. However, it also has the effect of increasing the predisposition of slippage to occur through polyT regions (where A is in the template strand). An alternative sequencing chemistry can be used - dRhodamine chemistry - where dTTP is still in the reaction, and generally gives better results through these polyA regions. Alternatively, an oligo dT(12-15) primer that contains a wobble base (A, G or C) on the 3’ end can be used to anchor the primer in place at the end of the polyA region and give clean sequence following. Sequencing the opposite strand can sometimes be more successful, especially when going through a polyG region as the polyC strand is often easier to get through. 
Sometimes designing a new primer that is closer to the homopolymeric region can help, as nucleotide concentration and enzyme activity will be in a more optimal range when extending the smaller fragments in the cycle sequencing reaction. And lastly, we can try adjusting our cycle sequencing conditions as higher annealing temperatures and longer extension times can sometimes be useful in cases like this. Similar approaches can be used when trying to sequence PCR products with homopolymeric regions, but, in the end, it may sometimes be necessary to clone the PCR product in order to read through the repetitive stretch.
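
To locate homopolymer stretches in a known or expected sequence before choosing a primer or strand, a short Python sketch such as the following can help. The function name is hypothetical, and the default minimum run length of 8 is an assumption; the text above does not define how long a stretch must be before it becomes troublesome.

```python
import re

# Hypothetical helper; min_len=8 is an illustrative assumption.


def homopolymer_runs(seq, min_len=8):
    """Return (start, base, length) for each single-base run >= min_len.

    Start positions are 0-based.
    """
    pattern = re.compile(r"([ACGT])\1{%d,}" % (min_len - 1))
    return [
        (m.start(), m.group(1), len(m.group(0)))
        for m in pattern.finditer(seq.upper())
    ]


if __name__ == "__main__":
    example = "ACGT" + "A" * 12 + "GCGCTTT"
    print(homopolymer_runs(example))
```

Knowing where a run starts makes it easier to place a new primer close to it, or to decide whether sequencing the opposite strand is worth trying.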

Cause:  compression
Solution:  compressions can sometimes be observed when a region of secondary structure forms in the amplified strand of DNA, leading to an alteration in the electrophoretic mobility of the DNA strand. This can appear as overlapping fragments after a certain point and can resemble a contaminated plasmid prep, but the contaminated prep will show double peaks beginning at the insertion site. To relax this compression, we can sometimes alter cycle sequencing conditions or use additives to denature the secondary structure. Alternatively, you can linearize your DNA or use 7-deaza-dGTP in a PCR reaction to help relieve the compression.

Cause:  frame shift mutation
Solution:  a frame shift mutation can occur when one or more bases are inserted into or deleted from the template DNA. If multiple products are present in your sample, whether it be plasmid DNA or PCR product, you will see clean sequence up to the point of the mutation, followed by double peaks caused by the shift in the nucleotide sequence. In the case of plasmid DNA, it will be necessary to re-isolate your DNA to get a pure clone containing only one of the molecules. With PCR products, you will need to gel purify the two products in order to separate them.

Truncated Sequences

Truncated sequences can be characterized as abrupt or gradual. Abrupt truncations will show strong, clean signal up to a point and then drop sharply over the course of a few nucleotides to much weaker or no detectable signal. Gradual truncations will show good sequence data initially but then begin to taper off to progressively weaker, smaller peaks until there is nothing but background noise. The nature of the truncation can sometimes help to determine its cause.

Cause:  secondary structure
Solution:   G-C rich, and to a lesser degree, A-T rich, DNA is predisposed to secondary structure formation, as strong hydrogen bonding between G and C nucleotides can cause the template DNA to loop or bend and anneal to complementary sequences, forming hairpins that can restrict the passage of the sequencing polymerase and thus be very difficult to sequence through reliably. These hairpins may not melt at our cycle sequencing temperatures and can cause premature termination of sequence data. Secondary structure may appear as a sharp termination of signal with no sequence data after, or if the loop has been relaxed slightly, you may see strong signal that drops abruptly but may have some weaker peaks following that are still quite accurate. With the newest formulation of BigDye Terminator chemistries (v3.1), some G-C rich difficulties have improved dramatically, but unfortunately it hasn’t solved everything. There is not one solution that resolves every secondary structure problem, but there are several we can try and usually one will allow us to read through it. The first thing we usually try is to add a DNA denaturant such as betaine, formamide or DMSO to our sequencing reaction to help melt the duplex formation and allow the polymerase to pass through. Changing our cycle sequencing parameters to include a higher denaturation temperature (98ºC vs 96ºC) and elimination of the 50ºC annealing step is sometimes useful, as is a preincubation step at 96ºC-98ºC for 10 minutes though additional reagents must be used to compensate for inactivation of enzyme at higher temperatures. Placing a primer as close to the hairpin loop as possible to help force its unwinding has also worked in the past. Sequencing the opposite strand can sometimes lead to a huge improvement. If these solutions don’t work, we may suggest you try linearizing your DNA with restriction enzymes to help relax the hairpin. 
And if you are trying to PCR up a very G-C rich region, addition of betaine or DMSO to your PCR reaction can help, as can substitution of 7-deaza dGTP for 75% of the dGTP in your PCR reaction. And if all else fails, you can try manual radioactive sequencing as a last resort.
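
If you have a reference sequence for your template, a quick scan for G-C rich stretches can indicate where secondary structure is most likely to cause trouble. The Python sketch below is only an illustration: the function names are hypothetical, and the window size, step and 70% threshold are assumptions, since no numeric definition of "G-C rich" is given above.

```python
# Hypothetical helpers; window, step and threshold are illustrative assumptions.


def gc_fraction(seq):
    """Fraction of G and C bases (0.0 for an empty sequence)."""
    s = seq.upper()
    return (s.count("G") + s.count("C")) / len(s) if s else 0.0


def gc_rich_windows(seq, window=50, step=25, threshold=0.70):
    """Return (start, gc_fraction) for windows at or above the threshold.

    Start positions are 0-based; only full-length windows are examined.
    """
    hits = []
    for start in range(0, len(seq) - window + 1, step):
        gc = gc_fraction(seq[start:start + window])
        if gc >= threshold:
            hits.append((start, gc))
    return hits


if __name__ == "__main__":
    example = "AT" * 25 + "GC" * 25 + "AT" * 25
    print(gc_rich_windows(example))
```

A flagged window does not prove that a hairpin forms there; it only marks a region where the approaches described above may be worth requesting.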

Cause:  linearized DNA
Solution:  if your DNA has been cut with one or more restriction enzymes, the sequence data will sharply end at the recognition site of the enzyme that cut at the 3’ end of your insert. Did you accidentally send us digested DNA? Run it out on a gel to see.

Cause:  too much DNA
Solution: while there is a range of DNA concentrations we can sequence reliably, too much DNA will cause premature termination of signal. Overloading of DNA will exhibit early top-heavy peaks followed by rapidly weakening peak height and strength. This occurs because the dNTPs in the cycle sequencing reaction will be distributed among too many extending chains and will be depleted early on, resulting in an excessive amount of short fragments. Overloading is of special concern when sequencing on our 3100 capillary system, as it is much more sensitive to DNA concentration and less tolerant of DNA overloading. Severe overloading will also reduce the lifespan of our (very expensive) capillary arrays. In addition, if your template is impure, higher concentrations of DNA can be accompanied by higher amounts of contaminants that can further worsen your DNA sequence quality. So please quantitate your template DNA carefully and check our Methods for quantitation section for our recommendations.

Cause:  salts
Solution:  excessive amounts of salts will also give rise to premature termination and may look similar to DNA overloading, with strong signal followed by progressively weakening signal. Salts have an inhibitory effect on the processivity of the sequencing Taq polymerase, which can lead to an overabundance of short fragments, or if the salt concentration is too great, the enzyme will be completely inhibited with no sequence data obtained. If salts are potentially a problem, perform an ethanol precipitation for salt removal.

Cause:  repetitive regions

Solution:  the nucleotide composition, as well as the size, of a repetitive region can play a large role in the success of sequencing through such an area. In general, G-C and G-T (often seen in bisulfite-treated DNA) repeats tend to be the most troublesome though, as mentioned before, the newest version of Applied Biosystems BigDye Terminator v3.1 contains some modifications that have allowed for some striking improvements in certain previously difficult templates. However, there are still some that remain a pain. In general, one can sequence partially through the repetitive region before the signal begins to fade and eventually becomes unreadable. This may be due to premature dNTP depletion, secondary structure formation or enzyme slippage. Various methods can be tried to sequence the repeat entirely, and many are similar to those we would use for G-C rich templates that form secondary structures, including the addition of betaine or DMSO and/or alterations in cycle sequencing parameters. If the repeat region is not excessively large, sequencing from the opposite strand to complete the region can be successful, especially if the complementary strand has a nucleotide composition that is more efficiently extended. However, if the region is large, it may be difficult to complete its entire sequence and determine the exact number of repeats present. Alternative methods, such as directed deletions or the use of an in vitro transposon system, may need to be utilized.
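
To see where dinucleotide repeats such as G-C or G-T occur in an expected sequence, and roughly how many units they contain, something like the following Python sketch can be used. The function name is hypothetical and the default of six repeat units is an assumption; the text gives no cut-off for when a repeat becomes difficult.

```python
import re

# Hypothetical helper; min_units=6 is an illustrative assumption.


def dinucleotide_repeats(seq, min_units=6):
    """Return (start, unit, units) for tandem dinucleotide repeats.

    Start positions are 0-based. Runs of a single base (e.g. AAAA...)
    are skipped, since those are homopolymers rather than repeats.
    """
    pattern = re.compile(r"([ACGT]{2})\1{%d,}" % (min_units - 1))
    hits = []
    for m in pattern.finditer(seq.upper()):
        unit = m.group(1)
        if unit[0] == unit[1]:
            continue
        hits.append((m.start(), unit, len(m.group(0)) // 2))
    return hits


if __name__ == "__main__":
    example = "ACCA" + "GT" * 8 + "CCAT"
    print(dinucleotide_repeats(example))
```

The reported unit count comes from the reference sequence only; as noted above, the true number of repeats in a large region can be hard to confirm from the sequencing data itself.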

Sequencing Artifacts

With our 3100 capillary system it appears we are more susceptible to sequencing artifacts than we once were with slab-based gel sequencers and it’s mostly due to the increased sensitivity of this instrument as well as instrument design. While we do everything we can to minimize these artifacts, they occasionally do occur. HOWEVER, there is definitely a link between template preparation and the degree to which these artifacts can be a problem, SO... the cleaner and more accurately quantified your sample is, the less of a problem these artifacts will be! Clean samples with strong signal are generally not affected by these artifacts, and if they are, many times the true peaks can be identified and corrected. We do visually inspect every chromatogram and edit what we can. But when we do spot an artifact that was probably due to an instrument or post-cycle sequencing cleanup issue, and we can’t be sure of the correct basecalling, and your sample had good signal strength and no other obvious problems, we will repeat the sample, UPON REQUEST, at no charge. We want to maintain our standards of high quality and will do what we have to do to keep it that way. So we ask that you inspect your chromatogram as well and if there is a sequencing artifact that causes difficulty with your analysis, and meets the above criteria, please let us know right away and we will rerun the sample as soon as we can. We store reacted samples for several days (DNA and primers for 2-3 months) but then discard them, so please let us know as quickly as you can if you wish a repeat.

Artifact:  "dye blobs"

Solution:  dye blobs are unincorporated dye terminator molecules that have passed through the cleanup columns and remain in solution with the purified DNA loaded onto the sequencers. They are most often seen with samples that have low signal strength. Samples with weak signal usually either 1) did not have enough DNA, so there was less starting template to amplify and label, thus leaving behind a greater proportion of unincorporated dye molecules, or 2) contained contaminants that inhibited the sequencing reaction; it’s theorized that certain contaminants may have a predisposition to bind to these dye clumps. We have also noticed a pattern where certain customer samples, as a whole, are more likely to contain dye blobs regardless of signal strength. In general, dye blobs appear as broad, undefined peaks of one or two colors (usually red and blue in 3100 data) with the true DNA peaks underneath, and tend to occur relatively early in the data - generally before 50-60 bp - so for many, they aren’t much of a problem as that is still vector sequence. Repetition of samples with dye blobs is generally not too successful, as they don’t often go away but sometimes do become less intense. With very weak samples, oftentimes there’s not much we can do to fix the data. With samples of average signal strength, however, they are usually easily correctable as the true peaks are often visible beneath.

Artifact: "spikes"

Solution: "spikes" are seen as multicolored peaks within the sequence that usually obscure just one or two nucleotides worth of data, and occur in samples run on the capillary-based sequencers such as the 3100. They are caused by tiny air bubbles within the liquid polymer or by small pieces of dried polymer that have flaked off and entered a capillary. Again, there seems to be a slight predisposition for some customer samples to experience these artifacts and, when they do occur, they are much more pronounced in samples with weak signal. When a sample has strong signal, they are often not detectable, but there are times when they can be very visible. The good thing is that these are almost always correctable upon rerunning. So, please let us know if you want a repeat because of a spike - for those of you only interested in a small separate region that is not affected by something like this, there would be no need for rerunning, but for those who are looking at an entire reading frame, for example, we realize that this would be a problem. So, as we can’t know everybody’s experiments and regions of interest, we ask that you help us and let us know when this problem affects your analyses and we will quickly repeat it for you.

Artifact:   Loss of resolution

Solution:   samples that exhibit loss of resolution (or LOR) will initially show sharp, well-defined peaks that, after a time, begin to widen and progressively deteriorate in quality to a point where these broad peaks become unreadable. The occurrence of this problem may sometimes be instrument-related and at other times may be sample-related. When the cause is instrument-related, it is generally due to improper capillary filling when fresh polymer is being pumped through the array. By modifying a parameter in the software that led to slower, more complete filling of the capillaries, we have been able to dramatically reduce the incidence of LOR. When the problem is sample-related, it is thought to be due to a (currently) unknown contaminant. It’s been speculated that this contaminant is more likely to be present when columns available in some commercial miniprep kits are heavily overloaded. As it’s usually difficult to determine the exact cause of sporadic LOR, we automatically rerun samples that exhibit LOR and, almost always, the sample will run fine the second time.

Homopolymeric Regions

- see the discussion of homopolymeric regions above, under Multiple Peaks Within Your Sequence.

Missing/Extra Bases

When you are analyzing your sequence data and it appears that one or more bases have been inserted or deleted, the first thing you should always do is visually inspect your chromatogram. Oftentimes, the ambiguity is due to incorrect software analysis of the peaks and will usually occur either early in the sequence or much later when resolution of large fragments is less than optimal. Analysis issues are usually easily correctable. Read over our Interpreting your chromatograms section first for a basic overview on how to quickly evaluate a trace and get a sense of its overall quality. Then look below for some more specific examples of software issues, as well as some less common situations where nucleotides may really be missing or inserted.

Cause:  basecalling difficulties
Solution:  as mentioned above, our Sequencing Analysis software is not always 100% accurate when assigning base designations, and these errors will most often occur either very early in the sequence - the first 50 bp or so - or much later in the sequence, for different reasons. Early in the sequence, the smallest fragments show some minor variability in migration that can throw off the calculations that the software uses to adjust proper peak spacing, with certain nucleotide combinations being more susceptible than others. As mentioned before, we visually inspect all trace files and we will manually edit the alterations that we spot, and most are very easy to fix. Our manual edits will appear as lowercase letters within the sequence. However, there are times when we may miss some so we always STRONGLY suggest YOU also look over your chromatogram files to doublecheck the sequence data. What we look for, early on, are peaks that migrate very close together with a spacing gap that may occur on either side. In cases like this, the computer may insert an "N" to compensate for the odd peak spacing, though there is really no nucleotide there. Alternatively, when there is a spacing gap like this, the software may interpret a small rise in background noise as a peak and mistakenly insert an extra base where there shouldn’t be one. Sometimes when the peaks are very close together, there is also a tendency for the software to miss one of them entirely and leave out one nucleotide of the sequence. And lastly, some of the excess unincorporated dye molecules will migrate at the same position as the very first bases of your sequence, sometimes obscuring the first 10-20 bases. In addition, the smallest fragments are not always very sharply resolved, a problem that seems to be more pronounced on capillary-based sequencers. 
For all the reasons mentioned above, it’s always best to choose a primer that is at least 40-50 bases away from your sequence of interest so that you can be sure you are in a region of the highest accuracy. Certain nucleotide combinations are more likely to show odd migration in the first 40-50 bp and will sometimes jumble together or not separate well. Doublecheck sequence data where C’s are followed by A’s, where A’s are followed by G’s and where there are two or three A’s in a row.
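If you keep your sequence as plain text, the checks described above can be roughly automated before you go back to the chromatogram. The following is only a minimal Python sketch under stated assumptions: the function name flag_early_positions and the 50-base window are illustrative choices, not part of our Sequencing Analysis software, and the sketch looks at the base calls only, never at the trace itself. It lists early positions where a C is followed by an A, an A is followed by a G, or two or more A’s occur in a row, along with any "N" calls and any lowercase (manually edited) bases.

```python
import re

def flag_early_positions(seq, window=50):
    """Return sorted (position, motif, reason) tuples for base calls worth
    re-checking within the first `window` bases. Positions are 1-based.
    Hypothetical helper: it inspects the text sequence only, not the trace."""
    early = seq[:window]
    upper = early.upper()
    flags = []
    # Lookahead keeps overlapping pairs, e.g. "CAG" yields both CA and AG.
    for m in re.finditer(r"(?=(CA|AG))", upper):
        flags.append((m.start() + 1, m.group(1), "migration-prone pair"))
    # Runs of two or more A's, reported once per run.
    for m in re.finditer(r"A{2,}", upper):
        flags.append((m.start() + 1, m.group(0), "run of A's"))
    for i, base in enumerate(early):
        if base.upper() == "N":
            flags.append((i + 1, base, "ambiguous call"))
        elif base.islower():
            # Lowercase letters mark manual edits in the delivered sequence.
            flags.append((i + 1, base, "manual edit"))
    return sorted(flags)

if __name__ == "__main__":
    for position, motif, reason in flag_early_positions("GGCAGTTAAANTcG"):
        print(position, motif, reason)
```

A flagged position is not necessarily wrong; it is simply a place where the peaks deserve a second look.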

You may also find that occurrences of extra or missing bases become more frequent towards the end of the sequence. There is a limit to the resolving power of the polymer when separating the largest fragments, so while the signal intensity of the later bands may still be quite strong, the peaks will become broader and less sharp. The latest version of Sequencing Analysis software includes improved algorithms that allow for better interpretation of the spacing of these larger fragments, and it has shown increased accuracy of the basecalling farther out, but there is still a limit. With the use of our modified POP-7 protocols and a 50cm array on our 3100s, we often get 900-950 bases or more of >98% accuracy on well-prepared templates.

Cause:  site-directed mutagenesis primers
Solution:  when making a DNA primer, synthesis of the oligo proceeds from the 3’ end to the 5’ end. During the synthesis procedure, truncations may occur when a specific base fails to be added to the growing oligo chain. The DNA synthesis chemistry generally allows for these failure sequences to be capped and not extended any further, but this process is not always 100% efficient and some of these truncations will continue to elongate, with the internal base deleted. As a result, a completed DNA synthesis will contain not only the desired full-length product, but potentially also a mixed population of internal single-base deletion sequences. Purification of the primer will remove the majority of these failure sequences, but a small proportion may still remain. So, when using a synthetic primer for site-directed mutagenesis, there is the potential for picking a clone that contains one of these deletion products and not your full-length primer. If this should occur, try picking a few other clones for sequencing; often you’ll find one that does contain the desired mutagenesis primer. If you find your clones consistently contain the same deletion in your primer region, an error may have occurred when the primer sequence was programmed for synthesis, and you should contact the synthesis company for a new primer. For site-directed mutagenesis primers, it’s always advisable to have your oligo purified by HPLC to minimize the population of failure sequences.
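To screen several clones at once, a clone read can be compared against the intended mutagenesis primer and against every version of that primer with one internal base removed. The following is a minimal Python sketch, not a validated tool: the function names deletion_variants and classify_clone are hypothetical, the read is assumed to be in the same orientation as the primer, and only exact matches are considered. When the deleted base lies within a run of identical bases, the exact position cannot be determined, so all candidate positions are reported.

```python
def deletion_variants(primer):
    """Map each internal single-base-deletion sequence to the 1-based
    primer positions whose removal produces it (hypothetical helper)."""
    variants = {}
    # Skip the first and last base: only internal deletions are considered.
    for i in range(1, len(primer) - 1):
        shorter = primer[:i] + primer[i + 1:]
        variants.setdefault(shorter, []).append(i + 1)
    return variants

def classify_clone(read, primer):
    """Return (status, positions) for one clone read.
    Assumes read and primer are in the same orientation."""
    read, primer = read.upper(), primer.upper()
    if primer in read:
        return ("full-length", [])
    hits = []
    for variant, positions in deletion_variants(primer).items():
        if variant in read:
            hits.extend(positions)
    if hits:
        return ("single-base deletion", sorted(hits))
    return ("not found", [])

if __name__ == "__main__":
    primer = "ACGTTGCA"  # illustrative sequence only
    print(classify_clone("TTTACGTTGCATTT", primer))
    print(classify_clone("TTTACGTGCATTT", primer))
```

If every clone you screen reports the same deletion position, that points to the primer itself rather than to a chance failure sequence.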

Cause: rearrangements
Solution: occasionally, plasmids may experience rearrangements or deletions, particularly when the insert is lethal to the cells, when there are duplicated regions within the sequence, when the DNA has been CIP-treated for dephosphorylation, or sometimes even when the cells are grown in large culture volumes. The numerous solutions to these difficulties are beyond the scope of this website, but if you run into problems like this, it may be useful to check the archives of the Methods and Reagents electronic newsgroup located at http://www.bio.net/hypermail/methods/ and search for "rearrangements."