Complete Amplicon Sequencing: Data Retrieval

Upon completion of the NGS run, data are analyzed, demultiplexed and subsequently entered into our automated de novo assembly pipeline. PCR amplicons or DNA fragments are assembled using MGH CCIB's de novo assembler UltraCycler v1.0 (Brian Seed and Huajun Wang, unpublished). Once the assembly output has been manually inspected and has passed our QC standards, results are made available through our secure data server. Researchers will receive an automated email notification as soon as their data can be accessed through our website.

Please note that your data will only be available for three months after they are released! We strongly encourage our users to download their data as soon as they are available. Our data server is only a temporary storage site and does not allow long-term archiving of NGS data. All Complete Amplicon Sequencing data generated at our core facility are subject to deletion without notice after three months.

Accessing Your Data:

To access our data server, please log into your account and click on the My Results button. Please note that the file format depends on the length of the PCR amplicon/DNA fragment!

>600 bp Amplicons

Results are provided in exactly the same format as described for our Complete Plasmid Sequencing service. Under the assumption that the sample contains only one highly abundant amplicon, we try to assemble the NGS reads into a single contig. If multiple amplicons are present, they must not share significant sequence similarity, as this could make the assembly results extremely difficult to interpret.

You have the following options to download your data files:

  • Download an uncompressed file (*.seq): a single concatenated text file (in FASTA format) of all nucleotide sequences generated for the corresponding order.

  • Download a compressed file (*.sit) containing one FASTA format sequence file (*.seq) and one EXCEL file (with coverage information) for each sample of the corresponding order. For each sample, the raw NGS data (in FASTQ format) are also provided. Please note that there will be one FASTQ file for each sample. Paired-end reads (2 x 150 b) can be read from a single FASTQ file in which the entries for the first read (1) and second read (2) from each pair alternate. The first read in each pair comes before the second.
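The interleaved layout described above can be read pair by pair. The following is a minimal Python sketch, not our pipeline's code. It assumes each FASTQ record occupies exactly four lines (header, sequence, "+" separator, quality), and the function names and the two example reads are hypothetical.

```python
import io
from itertools import islice


def read_fastq(handle):
    """Yield (header, sequence, separator, quality) tuples from a FASTQ stream.

    Assumes every record spans exactly four lines (no wrapped sequences).
    """
    while True:
        lines = [line.rstrip("\n") for line in islice(handle, 4)]
        if not lines:
            return  # clean end of file
        if (len(lines) < 4
                or not lines[0].startswith("@")
                or not lines[2].startswith("+")):
            raise ValueError("malformed FASTQ record")
        yield tuple(lines)


def read_pairs(handle):
    """Yield (read1, read2) tuples from an interleaved paired-end FASTQ stream."""
    records = read_fastq(handle)
    for first in records:
        # In an interleaved file, read 2 directly follows read 1 of the same pair.
        second = next(records, None)
        if second is None:
            raise ValueError("odd number of records; file is not interleaved")
        yield first, second


# Hypothetical two-record example standing in for a downloaded FASTQ file.
example = io.StringIO(
    "@pair1/1\nACGTACGT\n+\nIIIIIIII\n"
    "@pair1/2\nTTGACCAA\n+\nIIIIIIII\n"
)
for read1, read2 in read_pairs(example):
    print(read1[0], read2[0])
```

For a real file, replace the example stream with an opened FASTQ file, for instance with open(path) as handle.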

400-600 bp Amplicons

For amplicons between 400 and 600 bp in size, we try to detect all possible variants with a frequency of >1%. Results are provided in a format similar to that described for our CRISPR Sequencing service, with slight modifications.

For each sample of your corresponding order, you can download a compressed file (*.sit) containing the raw NGS data in FASTQ format (there will be one FASTQ file for each sample). Paired-end reads (2 x 150 b) can be read from a single FASTQ file in which the entries for the first read (1) and second read (2) from each pair alternate. The first read in each pair comes before the second. In addition, a FASTA file of algorithmically called "variants" as well as a Multiple Alignment file will be provided for each sample. Further, an EXCEL file providing sequencing depth and coverage information for each individual base can be downloaded. A sample data package illustrating the data delivery can be found here.

Note regarding heterogeneous lengths in samples submitted for NGS:
If you are submitting DNA samples of different lengths for NGS, please keep in mind that the analysis process does not guarantee the representation of input DNA of different lengths. For example, if the submission is an equal mix by mass of a long PCR amplicon (X) and a short PCR amplicon (Y), the sequence data will reflect the opposing consequences of two influences: (i) the input DNA contains fragments in an X:Y numerical proportion favoring the smaller fragments (Y), because equal mass means more molecules of the shorter fragment, and (ii) the analysis results will underrepresent certain fragments because of size bias in the creation of the fragment library.
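Influence (i) can be illustrated with a short calculation. This is a sketch only: the fragment lengths (2000 bp and 500 bp) and masses are hypothetical, it assumes the number of molecules is proportional to mass divided by length, and it does not model the library size bias of influence (ii).

```python
def molecule_fractions(lengths_bp, masses_ng):
    """Return the fraction of molecules contributed by each fragment.

    Assumes molecule count is proportional to mass / length, which holds
    for double-stranded DNA fragments of similar base composition.
    """
    relative = {name: masses_ng[name] / lengths_bp[name] for name in lengths_bp}
    total = sum(relative.values())
    return {name: value / total for name, value in relative.items()}


# Hypothetical equal-mass mix of a long (X) and a short (Y) amplicon.
fractions = molecule_fractions(
    lengths_bp={"X": 2000, "Y": 500},
    masses_ng={"X": 10.0, "Y": 10.0},
)
for name, fraction in fractions.items():
    print(f"{name}: {fraction:.0%} of molecules")
```

With these example values the mix contains one X molecule for every four Y molecules, even though both fragments were supplied at equal mass.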

Decompression Software:
To open compressed *.sit files (which are actually .zip files) on your computer, you can use Stuffit Expander (free) or WinZip (trial version). To download, please select the appropriate link below. You can also use the Linux unzip command.

Stuffit Expander
WinZip for Windows
WinZip for Mac
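Because the *.sit files are ZIP archives, they can also be unpacked with a script. The following is a minimal sketch using Python's standard zipfile module. The function name and the file names in the usage comment are hypothetical.

```python
import zipfile
from pathlib import Path


def extract_results(archive_path, target_dir):
    """Extract a downloaded *.sit (ZIP) archive and return the member names."""
    archive = Path(archive_path)
    # Check the content rather than the extension, since the file is named *.sit.
    if not zipfile.is_zipfile(archive):
        raise ValueError(f"{archive} is not a ZIP archive")
    target = Path(target_dir)
    target.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(archive) as zf:
        zf.extractall(target)
        return zf.namelist()


# Usage (hypothetical file names):
#   names = extract_results("my_order.sit", "my_order_results")
#   print(names)
```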

Please note:
*.seq files are plain text files containing your sequence in FASTA format and can be opened with any software capable of viewing plain text or FASTA format files (e.g., a text editor such as NotePad, or a word processor such as Word). You may also change the file extension from *.seq to *.txt.
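Since *.seq files are plain FASTA text, a concatenated file can be split into individual sequences with a few lines of code. This is a minimal sketch: the function name and the example records are hypothetical, and it assumes each sequence starts with a ">" header line, as in standard FASTA.

```python
def parse_fasta(text):
    """Return a dict mapping each FASTA header (without '>') to its sequence."""
    records = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines between records
        if line.startswith(">"):
            name = line[1:].strip()
            if name in records:
                raise ValueError(f"duplicate header: {name}")
            records[name] = []
        elif name is None:
            raise ValueError("sequence data found before the first header")
        else:
            records[name].append(line)  # sequences may wrap over several lines
    return {header: "".join(parts) for header, parts in records.items()}


# Hypothetical content of a concatenated *.seq file with two samples.
example = ">sample_1\nACGTACGT\nACGT\n>sample_2\nTTGACCAA\n"
for header, sequence in parse_fasta(example).items():
    print(header, len(sequence))
```

To use it on a downloaded file, read the file as text first, for instance with pathlib.Path(path).read_text().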