MGH   
CCIB
 
 
A simple Python program for separating paired-end NGS reads


For each sample in your order, we provide a compressed file (*.sit) containing the raw NGS data in FASTQ format, one FASTQ file per sample. Paired-end reads (2 x 150 b) are stored interleaved in a single FASTQ file: the entries for the first read (read 1) and the second read (read 2) of each pair alternate, and read 1 always comes before read 2. Each FASTQ entry occupies four lines, so each read pair occupies eight consecutive lines. If you want to separate a FASTQ file into read 1 and read 2, you can use a simple Python program. It takes the FASTQ file name as its only argument and creates two files with the suffixes _R1 and _R2 added to the original file name, before the .fastq extension.
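To make the interleaved layout concrete, here is a minimal sketch of how line positions map to read 1 and read 2. It is only an illustration, not the separation program itself. It assumes the usual four-line FASTQ entry (header, sequence, '+', qualities). The read names, sequences, and the helper name read_number are made up for this example.

```python
# Hypothetical interleaved FASTQ content with two read pairs.
# Read names and sequences are invented for illustration only.
interleaved = [
    "@pair1/1", "ACGT", "+", "IIII",  # read 1 of pair 1 (lines 0-3)
    "@pair1/2", "TTGA", "+", "IIII",  # read 2 of pair 1 (lines 4-7)
    "@pair2/1", "GGCA", "+", "IIII",  # read 1 of pair 2 (lines 8-11)
    "@pair2/2", "CCAT", "+", "IIII",  # read 2 of pair 2 (lines 12-15)
]


def read_number(line_index):
    """Return 1 or 2 for the read that a 0-based line index belongs to.

    A pair spans 8 lines: the first 4 are read 1, the last 4 are read 2.
    """
    return 1 if line_index % 8 < 4 else 2


r1_lines = [line for i, line in enumerate(interleaved) if read_number(i) == 1]
r2_lines = [line for i, line in enumerate(interleaved) if read_number(i) == 2]

# Every fourth line of each list is a header line.
print(r1_lines[0::4])
print(r2_lines[0::4])
```

The same modulo-8 rule is what allows a separation program to route each line without parsing the FASTQ entries themselves.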

The program is as follows:

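import sys

# Path to the interleaved FASTQ file, given as the first command-line argument
input_file = sys.argv[1]

# Remove only a trailing '.fastq' so the rest of the path is left untouched
base_name = input_file
if base_name.endswith('.fastq'):
    base_name = base_name[:-len('.fastq')]

with open(input_file) as reads, \
     open('%s_R1.fastq' % base_name, 'w') as r1, \
     open('%s_R2.fastq' % base_name, 'w') as r2:
    for i, line in enumerate(reads):
        # A FASTQ entry is 4 lines, so a read pair is 8 lines:
        # lines 0-3 of every 8 belong to read 1, lines 4-7 to read 2
        if i % 8 < 4:
            r1.write(line)
        else:
            r2.write(line)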
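
After separating a file, it can be worth confirming that both output files contain the same number of reads. The following is a minimal sketch of such a check, assuming four lines per FASTQ entry. The function names count_records and check_split are hypothetical helpers introduced here and are not part of the program above.

```python
def count_records(path):
    """Count FASTQ entries in a file, assuming four lines per entry."""
    with open(path) as handle:
        n_lines = sum(1 for _ in handle)
    if n_lines % 4 != 0:
        raise ValueError('%s: line count %d is not a multiple of 4'
                         % (path, n_lines))
    return n_lines // 4


def check_split(r1_path, r2_path):
    """Return the number of read pairs, or raise if the files disagree."""
    n1 = count_records(r1_path)
    n2 = count_records(r2_path)
    if n1 != n2:
        raise ValueError('read 1 has %d entries but read 2 has %d'
                         % (n1, n2))
    return n1
```

For example, calling check_split('sample_R1.fastq', 'sample_R2.fastq') (file names chosen for illustration) returns the number of read pairs when the two files match. A mismatch usually means the input file was truncated or was not interleaved as expected.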

Questions to hwang12@mgh.harvard.edu