MGH CCIB
 
CRISPR Sequencing: Data Retrieval

Upon completion of the NGS run, data are analyzed and subsequently made available through our secure data server. Our customers will receive an automated email notification as soon as the data can be accessed through our website.

Please note that your data will only be available for three months after release! We strongly encourage our users to download their data as soon as they become available. Our data server is only a temporary storage site and does not support long-term archiving of NGS data. All CRISPR Sequencing data generated at our core facility are subject to deletion without notice three months after release.

Accessing Your Data:
To access our data server, please log into your account and click the My Results button. For each sample in your order, you can then download a compressed file (*.sit) containing the raw NGS data in FASTQ format (one FASTQ file per sample). Paired-end reads (2 × 150 bases) are delivered in a single interleaved FASTQ file: the entries for read 1 and read 2 of each pair alternate, and read 1 always precedes read 2. In addition, a FASTA file of algorithmically called CRISPR "variants" and a Multiple Alignment file are provided for each sample. A sample data package illustrating the data delivery can be found here. For a short description of the algorithm we use to perform CRISPR variant detection, please see here.
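To illustrate the interleaved layout, the following minimal Python sketch walks such a file two records at a time. It is illustrative only and assumes standard four-line FASTQ records (no wrapped sequence lines); it relies purely on record order and does not compare read names. The function names are hypothetical.

```python
from itertools import islice
from typing import Iterator, TextIO, Tuple

# A FASTQ record is four lines: header, sequence, "+" separator, qualities.
Record = Tuple[str, str, str, str]


def read_fastq(handle: TextIO) -> Iterator[Record]:
    """Yield 4-line FASTQ records; assumes sequences are not line-wrapped."""
    while True:
        lines = [line.rstrip("\r\n") for line in islice(handle, 4)]
        if not lines:
            return  # clean end of file
        if len(lines) < 4 or not lines[0].startswith("@"):
            raise ValueError("truncated or malformed FASTQ record")
        yield (lines[0], lines[1], lines[2], lines[3])


def read_pairs(handle: TextIO) -> Iterator[Tuple[Record, Record]]:
    """Yield (read 1, read 2) tuples from an interleaved FASTQ file.

    Uses record order only (read 1 precedes read 2); read names are not
    checked, since no particular naming scheme is assumed here.
    """
    records = read_fastq(handle)
    for read1 in records:
        read2 = next(records, None)
        if read2 is None:
            raise ValueError("odd number of records: last read 1 has no mate")
        yield read1, read2
```

An odd number of records raises an error, which is a quick sanity check that a file was not truncated during download.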

If you choose to perform your own analysis and need to split each FASTQ file into separate read 1 and read 2 files before importing them into your preferred analysis pipeline, you can use this simple Python program.
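As an independent illustration of what such a splitting step involves (this is not the program linked above), a minimal sketch can route alternating four-line records to two output files. It assumes unwrapped four-line records with read 1 first; `split_interleaved` is a hypothetical name.

```python
from itertools import islice
from typing import TextIO


def split_interleaved(src: TextIO, out_r1: TextIO, out_r2: TextIO) -> int:
    """Copy alternating 4-line FASTQ records to out_r1 and out_r2.

    Assumes unwrapped 4-line records with read 1 preceding read 2.
    Returns the number of read pairs written.
    """
    n_records = 0
    while True:
        record = list(islice(src, 4))
        if not record:
            break
        if len(record) < 4:
            raise ValueError("truncated FASTQ record at end of file")
        # Records 0, 2, 4, ... are read 1; records 1, 3, 5, ... are read 2.
        (out_r1 if n_records % 2 == 0 else out_r2).writelines(record)
        n_records += 1
    if n_records % 2:
        raise ValueError("odd number of records; input is not fully paired")
    return n_records // 2
```

To use it, open a sample's FASTQ file for reading and two new files (for example, hypothetical `sample_R1.fastq` and `sample_R2.fastq`) for writing, then pass all three handles; the return value is the number of pairs written.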

Note regarding heterogeneous lengths in samples submitted for NGS:
If you are submitting DNA samples containing fragments of different lengths for NGS, please keep in mind that the analysis process does not guarantee that each length will be represented in proportion to its abundance in the input. For example, if the submission is an equal mix by mass of 300 bp and 200 bp fragments (e.g., PCR amplicons), the sequence data will reflect two opposing influences: (i) because a 200 bp fragment has two-thirds the mass of a 300 bp fragment, the input contains 200 bp and 300 bp fragments in a 3:2 numerical ratio, favoring the 200 bp fragments; and (ii) the analysis process will underrepresent the 200 bp fragments because of size bias in the creation of the fragment library.
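Influence (i) is simple arithmetic: assuming the fragments have similar base composition, mass per molecule scales with length, so molecule counts scale with mass divided by length. A minimal sketch of that calculation follows; it does not model the library-preparation size bias in (ii), and `molar_fractions` is a hypothetical helper.

```python
from fractions import Fraction
from typing import Dict


def molar_fractions(mass_by_length: Dict[int, float]) -> Dict[int, Fraction]:
    """Convert a mass-based mix {length_bp: mass} into molecule-count fractions.

    Assumes mass per molecule is proportional to length (similar base
    composition), so the number of molecules is proportional to mass / length.
    """
    molecules = {bp: Fraction(mass) / bp for bp, mass in mass_by_length.items()}
    total = sum(molecules.values())
    return {bp: n / total for bp, n in molecules.items()}


# Equal masses (arbitrary units) of 300 bp and 200 bp amplicons:
frac = molar_fractions({300: 1, 200: 1})
print(frac[200], frac[300])  # 3/5 2/5, i.e. 3:2 in favor of the 200 bp fragments
```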

Decompression Software:
To open compressed *.sit files (which are actually .zip files) on your computer, you can use StuffIt Expander (free) or WinZip (trial version). To download either program, please select the appropriate link below. You can also use the Linux unzip command.
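Because the *.sit archives are ZIP files, Python's standard `zipfile` module can also unpack them regardless of the file extension. A minimal sketch, assuming the archive is a standard ZIP file; the function name is illustrative:

```python
import zipfile
from pathlib import Path
from typing import List


def extract_sit(archive: str, dest: str) -> List[str]:
    """Extract a *.sit delivery archive, assuming it is a ZIP file underneath.

    Returns the names of the archive members that were extracted.
    """
    if not zipfile.is_zipfile(archive):
        raise ValueError(f"{archive} is not a ZIP archive")
    Path(dest).mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(archive) as zf:
        zf.extractall(dest)
        return zf.namelist()
```

For example, `extract_sit("sample1.sit", "sample1")` (hypothetical names) does roughly what `unzip sample1.sit -d sample1` does on the command line.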

StuffIt Expander
WinZip for Windows
WinZip for Mac


Please note:
*.seq files are plain text files containing your sequence in FASTA format. They can be opened with any software capable of displaying plain text or FASTA files (e.g., text editors such as Notepad, or word processors such as Word). You may also change the file extension from *.seq to *.txt.
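For scripted use, *.seq files can be parsed directly without renaming them. Assuming it follows the same standard FASTA layout, the per-sample FASTA file of called variants can be read the same way. A minimal Python sketch; `read_fasta` is a hypothetical helper, not a provided tool:

```python
from typing import Iterator, List, Optional, TextIO, Tuple


def read_fasta(handle: TextIO) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from FASTA text.

    Header lines start with ">"; a sequence may span several lines,
    which are joined. Blank lines are ignored.
    """
    header: Optional[str] = None
    chunks: List[str] = []
    for raw in handle:
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is None:
            raise ValueError("sequence data found before the first '>' header")
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)
```

For example, opening a hypothetical `sample.seq` and looping over `read_fasta(fh)` yields each header together with its full, unwrapped sequence.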