MGH DNA Core

Research Services


AAV Genome Sequencing: Data Retrieval Details

Explanation of Delivered Data Files

For each sample you will receive nine data files, each named with the TUBE_ID from your sample submission form as its prefix (shown as "prefix" in the file names below). Together these files provide raw data, processed results, and interpretive summaries to support rAAV quality control and production assessment. Below is an overview of each file, its contents, and recommended uses.

prefix_combined_reference.fasta
Multi-FASTA file that includes all reference sequences associated with the production of the AAV vector:
- transgene plasmid
- Rep-Cap plasmid
- helper plasmid
- host genome
Suggested Use:
Load into genome viewers like IGV alongside the tagged BAM files to visualize how reads align across all reference components
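Before loading the combined reference into a viewer, it can help to confirm which records it contains. The following is a minimal sketch using only the Python standard library; the record names in the demo are hypothetical placeholders, and the real names depend on the references supplied with your submission.

```python
def fasta_lengths(lines):
    """Return {record_name: sequence_length} for a multi-FASTA given as an iterable of lines."""
    lengths = {}
    name = None
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Use the first whitespace-delimited token of the header as the record name.
            tokens = line[1:].split()
            name = tokens[0] if tokens else ""
            lengths[name] = 0
        elif name is not None:
            lengths[name] += len(line)
    return lengths


# Demo with made-up records (names and sequences are placeholders, not real references).
demo = [">transgene_plasmid demo", "ACGT", "AC", ">helper_plasmid", "GGG"]
print(fasta_lengths(demo))
```

To run it on a delivered file, pass an open file handle instead of the demo list, for example `fasta_lengths(open("prefix_combined_reference.fasta"))`.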
prefix_output_cat.fastq
Raw, untrimmed FASTQ reads after demultiplexing
No trimming or filtering has been applied
prefix_tagged_bams
BAM files containing reads mapped to the combined reference and classified by AAV subgenome type
- Alignments were produced by minimap2; only primary alignments are shown.
- Supplementary alignments indicate split reads (single reads mapping to multiple, non-contiguous regions of the genome).
Suggested Use:
Open in IGV along with the combined reference to:
- Visualize alignment by subgenome class
- Inspect coverage, variants and structural features
prefix_trimmed_aav_per_read_info.tsv
TSV file listing each read and its classified AAV subgenome type
- Subgenome types include:
  - Full ssAAV
  - Partial ssAAV, including:
    - Genome Duplication Mutants (GDM)
    - 3' Incomplete Genome types (3' ICG)
    - 5' Incomplete Genome types (5' ICG)
    - Partial ICG (lacking ITRs)
  - Full scAAV
  - Partial scAAV, including:
    - 3' Snapback Genomes (3' SBG)
    - 5' Snapback Genomes (5' SBG)
    - SBG (unresolved orientation)
  - Backbone Contamination
  - Complex
  - Unknown
  Reference diagrams are available in the EPI2ME AAV Workflow GitHub repository.
Suggested Use:
- Assess rAAV quality and subgenomic structures
- Filter or summarize with spreadsheet tools or programmatically (e.g. Python/R)
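As one way to summarize the per-read table programmatically, the sketch below counts reads per subgenome class and converts the counts to percentages. The column name `Assigned_genome_type` is an assumption; check the header line of your own file and pass the actual column name if it differs.

```python
import csv
from collections import Counter


def count_genome_types(lines, type_column="Assigned_genome_type"):
    """Count reads per subgenome class in a tab-separated per-read table.

    `type_column` is an assumed column name; adjust it to match the file header.
    """
    reader = csv.DictReader(lines, delimiter="\t")
    if reader.fieldnames is None or type_column not in reader.fieldnames:
        raise KeyError(f"column {type_column!r} not found in header: {reader.fieldnames}")
    counts = Counter()
    for row in reader:
        counts[row[type_column]] += 1
    return counts


def as_percentages(counts):
    """Convert a Counter of classes into percentages of the total read count."""
    total = sum(counts.values())
    if total == 0:
        return {}
    return {label: 100.0 * n / total for label, n in counts.items()}
```

For a delivered file, open it in text mode and pass the handle, for example `count_genome_types(open("prefix_trimmed_aav_per_read_info.tsv", newline=""))`.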
prefix_trimmed_bam_info.tsv
Per-read alignment summary produced by seqkit bam. Includes:
- Read mapping reference, position, orientation
- Mapping quality, read length, alignment accuracy
- Clipping information and alignment flags
Suggested Use:
- Detect low-quality or split reads
- Analyze structural features or mapping artifacts
- Group by reference to compare across subgenomes
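Grouping by reference and flagging low-quality alignments could look like the sketch below. The column names `Ref` and `MapQual` are assumptions based on typical seqkit bam output, and the mapping quality cutoff of 20 is an arbitrary illustrative threshold; verify both against your file.

```python
import csv
from collections import defaultdict


def summarize_by_reference(lines, ref_col="Ref", mapq_col="MapQual", min_mapq=20):
    """Count alignments per reference and how many fall below a mapping-quality cutoff.

    Column names and the cutoff are assumptions; adjust them to the actual header.
    """
    reader = csv.DictReader(lines, delimiter="\t")
    summary = defaultdict(lambda: {"alignments": 0, "low_mapq": 0})
    for row in reader:
        entry = summary[row[ref_col]]
        entry["alignments"] += 1
        # float() tolerates both integer and decimal formatting of the quality value.
        if float(row[mapq_col]) < min_mapq:
            entry["low_mapq"] += 1
    return dict(summary)
```

The result is a plain dictionary keyed by reference name, which makes it easy to compare, for example, transgene alignments against helper or host alignments.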
prefix_trimmed_nanostat_output.txt
Read quality summary after trimming, including:
- Total reads and bases
- Mean read length and quality
- N50 statistics
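For readers unfamiliar with N50: it is the read length L such that reads of length L or longer contain at least half of all sequenced bases. The sketch below computes it from a list of read lengths and includes a simple helper that pulls lengths from a FASTQ file, assuming the common four-lines-per-record layout. Values computed from the untrimmed FASTQ describe reads before trimming, so they need not match this post-trimming summary.

```python
def fastq_read_lengths(lines):
    """Return read lengths from FASTQ lines, assuming exactly four lines per record."""
    return [len(line.strip()) for i, line in enumerate(lines) if i % 4 == 1]


def n50(lengths):
    """Return the N50 of a collection of read lengths (0 for an empty collection)."""
    ordered = sorted(lengths, reverse=True)
    half = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half:
            return length
    return 0


def mean_length(lengths):
    """Return the mean read length (0.0 for an empty collection)."""
    lengths = list(lengths)
    return sum(lengths) / len(lengths) if lengths else 0.0
```

For example, read lengths of 2, 3, 4, 5 and 6 total 20 bases; the two longest reads (6 and 5) already cover 11 bases, so the N50 is 5.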
prefix_trimmed.transgene_plasmid_consensus.fasta
Consensus sequence of the transgene plasmid, polished using medaka
prefix_trimmed.transgene_plasmid_sorted.vcf
Variant calls showing differences between the input transgene plasmid and the transgene consensus sequence

Suggested Use:
- Validate plasmid integrity of the transgene
- Identify mutations, insertions, or deletions
- Visualize alongside tagged BAM file(s) and combined reference in IGV
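To get a quick tally of mutations, insertions, and deletions from the VCF, one option is to compare the lengths of the REF and ALT alleles on each record. This is a minimal sketch that relies only on the standard VCF column order (CHROM, POS, ID, REF, ALT, ...); the category labels are chosen here for illustration and are not taken from the workflow.

```python
from collections import Counter


def classify_variant(ref, alt):
    """Label a REF/ALT pair by comparing allele lengths."""
    if len(ref) == 1 and len(alt) == 1:
        return "SNV"
    if len(ref) < len(alt):
        return "insertion"
    if len(ref) > len(alt):
        return "deletion"
    return "MNV"  # equal-length substitution longer than one base


def count_variant_types(lines):
    """Count variant types across the data lines of a VCF given as an iterable of lines."""
    counts = Counter()
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and header lines
        fields = line.rstrip("\n").split("\t")
        ref, alts = fields[3], fields[4]
        for alt in alts.split(","):
            # Skip missing, overlapping-deletion, and symbolic alleles.
            if alt in (".", "*") or alt.startswith("<"):
                continue
            counts[classify_variant(ref, alt)] += 1
    return counts
```

An empty result means the consensus matched the input transgene plasmid at every called position.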
prefix_wf-aav-qc-report.html
Comprehensive interactive HTML report summarizing the AAV analysis workflow.

Includes:
- Read quality: yield, length, and quality scores
- Contamination assessment: mapped vs. unmapped reads; breakdown by reference (host, helper, Rep-Cap, transgene)
- Truncation analysis: start/end mapping positions within the ITR-to-ITR region
- AAV subgenome summary: frequency of each subgenome class (from the per-read classification)