AAV Genome Sequencing: Data Retrieval Details
Explanation of Delivered Data Files
For each sample you will receive nine data files, each named with the TUBE_ID specified in your sample submission form (written as "prefix" in the file names below). Together, these files provide raw data, processed results, and interpretive summaries to support rAAV quality control and production assessment. Below is an overview of each file, its contents, and recommended uses.
- prefix_combined_reference.fasta
Multi-FASTA file that includes all reference sequences associated with the production of the AAV vector:
- transgene plasmid
- Rep-Cap plasmid
- helper plasmid
- host genome
Suggested Use:
Load into a genome viewer such as IGV alongside the tagged BAM files to visualize how reads align across all reference components.
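Before loading the reference into IGV, it can help to confirm which records it contains and how long each one is. The minimal Python sketch below assumes a standard multi-FASTA layout (header lines start with `>`, sequences may wrap across lines) and takes the record ID as the first word of each header. The function name `fasta_record_lengths` is hypothetical.

```python
def fasta_record_lengths(lines):
    """Return {record_id: sequence_length} for each record in (multi-)FASTA text.

    `lines` is any iterable of text lines, such as an open file handle.
    """
    lengths = {}
    current = None
    for line in lines:
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            # Record ID = first whitespace-delimited token of the header line.
            current = line[1:].split()[0]
            lengths[current] = 0
        elif current is not None:
            # Sequence lines may wrap, so accumulate their lengths.
            lengths[current] += len(line)
    return lengths
```

For example, `with open("TUBE_ID_combined_reference.fasta") as fh: print(fasta_record_lengths(fh))`, substituting your own TUBE_ID, should list the transgene, Rep-Cap, helper, and host records.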
- prefix_output_cat.fastq
Raw FASTQ reads after demultiplexing
No trimming or filtering has been applied.
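Because this file is untrimmed, its read and base counts may differ from the post-trimming figures in the NanoStat summary. A minimal Python sketch for counting reads and bases is shown below. It assumes the common four-line FASTQ layout (header, sequence, `+` line, quality) with no line-wrapped sequences; `fastq_summary` is a hypothetical helper name.

```python
from itertools import islice


def fastq_summary(lines):
    """Count reads and bases in FASTQ text laid out as 4-line records."""
    n_reads = 0
    n_bases = 0
    it = iter(lines)
    while True:
        record = list(islice(it, 4))
        # Stop at end of input, tolerating trailing blank lines.
        if not record or all(not line.strip() for line in record):
            break
        if len(record) < 4 or not record[0].startswith("@") or not record[2].startswith("+"):
            raise ValueError(f"Malformed FASTQ record near read {n_reads + 1}")
        n_reads += 1
        n_bases += len(record[1].strip())
    mean_length = n_bases / n_reads if n_reads else 0.0
    return {"reads": n_reads, "bases": n_bases, "mean_length": mean_length}
```

Pass an open file handle, for example `fastq_summary(open("TUBE_ID_output_cat.fastq"))`, substituting your own prefix.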
- prefix_tagged_bams
BAM files containing reads mapped to the combined reference and classified by AAV subgenome type
- Reads were aligned with minimap2; only primary alignments are shown.
- Supplementary alignments, where present, indicate split reads (single reads that map to multiple, non-contiguous regions of the genome).
Suggested Use:
Open in IGV along with the combined reference to:
- Visualize alignment by subgenome class
- Inspect coverage, variants and structural features
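Whether a BAM record is a primary, secondary, or supplementary alignment is encoded in its SAM FLAG field (column 2 of `samtools view` output). The small Python sketch below decodes that field using the bit values defined in the SAM specification; the helper names are hypothetical. On the command line, `samtools view -F 0x904` is a common way to keep only mapped primary alignments (0x904 = unmapped + secondary + supplementary).

```python
# SAM FLAG bits as defined in the SAM specification (names follow samtools).
FLAG_BITS = {
    0x1: "PAIRED",
    0x2: "PROPER_PAIR",
    0x4: "UNMAP",
    0x8: "MUNMAP",
    0x10: "REVERSE",
    0x20: "MREVERSE",
    0x40: "READ1",
    0x80: "READ2",
    0x100: "SECONDARY",
    0x200: "QCFAIL",
    0x400: "DUP",
    0x800: "SUPPLEMENTARY",
}


def describe_flag(flag: int) -> list:
    """Return the names of all bits set in a SAM FLAG value, lowest bit first."""
    return [name for bit, name in FLAG_BITS.items() if flag & bit]


def is_primary_mapped(flag: int) -> bool:
    """True if the record is mapped and neither secondary nor supplementary."""
    return not (flag & (0x4 | 0x100 | 0x800))
```

For example, a flag of 2064 (2048 + 16) decodes to a reverse-strand supplementary alignment, i.e. one piece of a split read.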
- prefix_trimmed_aav_per_read_info.tsv
TSV file listing each read and its classified AAV subgenome type
- Subgenome types include:
- Full ssAAV
- Partial ssAAV, including:
- Genome Duplication Mutants (GDM)
- 3' Incomplete Genome types (3' ICG)
- 5' Incomplete Genome types (5' ICG)
- Partial ICG (lacking ITRs)
- Full scAAV
- Partial scAAV, including:
- 3' Snapback Genomes (3' SBG)
- 5' Snapback Genomes (5' SBG)
- SBG (unresolved orientation)
- Backbone Contamination
- Complex
- Unknown
Reference diagrams are available in the EPI2ME AAV Workflow GitHub repository.
Suggested Use:
- Assess rAAV quality and subgenomic structures
- Filter or summarize with spreadsheet tools or programmatically (e.g. Python/R)
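A minimal Python sketch for tallying reads per subgenome class is shown below. It assumes the file is tab-separated with a header row; the column name `subgenome_type` is a placeholder, not the confirmed header, so check your file and pass the correct name via `class_column`. The function name is hypothetical.

```python
import csv
from collections import Counter


def subgenome_frequencies(lines, class_column="subgenome_type"):
    """Return (class, count, percent) tuples, most frequent class first.

    `class_column` is a placeholder; set it to the column name in your file.
    """
    reader = csv.DictReader(lines, delimiter="\t")
    if reader.fieldnames is None or class_column not in reader.fieldnames:
        raise KeyError(f"Column {class_column!r} not found; available: {reader.fieldnames}")
    counts = Counter(row[class_column] for row in reader)
    total = sum(counts.values())
    return [(cls, n, 100.0 * n / total) for cls, n in counts.most_common()]
```

Usage: `with open("TUBE_ID_trimmed_aav_per_read_info.tsv", newline="") as fh: print(subgenome_frequencies(fh, class_column="..."))`, filling in the real column name.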
- prefix_trimmed_bam_info.tsv
Per-read alignment summary produced by seqkit bam. Includes:
- Read mapping reference, position, orientation
- Mapping quality, read length, alignment accuracy
- Clipping information and alignment flags
Suggested Use:
- Detect low-quality or split reads
- Analyze structural features or mapping artifacts
- Group by reference to compare across subgenomes
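The grouping step can be sketched in Python as below. The column names `Ref`, `Acc`, and `MapQual` are assumptions based on typical `seqkit bam` output and should be checked against the header of your file; the MAPQ cutoff of 20 is an arbitrary illustrative threshold, and the function name is hypothetical. Counts are per alignment record, so a split read can contribute more than once.

```python
import csv
from collections import defaultdict
from statistics import mean


def summarize_by_reference(lines, ref_col="Ref", acc_col="Acc", mapq_col="MapQual", min_mapq=20):
    """Per-reference record count, mean alignment accuracy, and low-MAPQ count.

    Column names and the `min_mapq` threshold are assumptions; adjust them
    to match your file and your own quality criteria.
    """
    accuracies = defaultdict(list)
    low_mapq = defaultdict(int)
    for row in csv.DictReader(lines, delimiter="\t"):
        ref = row[ref_col]
        accuracies[ref].append(float(row[acc_col]))
        if float(row[mapq_col]) < min_mapq:
            low_mapq[ref] += 1  # flag potentially ambiguous placements
    return {
        ref: {"records": len(vals), "mean_acc": mean(vals), "low_mapq": low_mapq[ref]}
        for ref, vals in accuracies.items()
    }
```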
- prefix_trimmed_nanostat_output.txt
Read quality summary after trimming, including:
- Total reads and bases
- Mean read length and quality
- N50 statistics
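The N50 reported here is the length L such that reads of length L or longer together contain at least half of all sequenced bases. The short Python sketch below illustrates that definition for any list of read lengths; it is for understanding the metric, not a reimplementation of NanoStat.

```python
def n50(lengths):
    """Smallest length L such that reads of length >= L hold at least half of all bases."""
    ordered = sorted(lengths, reverse=True)
    half = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half:
            return length
    return 0  # empty input
```

For read lengths 2 through 10 (54 bases in total), the three longest reads (10 + 9 + 8 = 27) reach half the bases, so the N50 is 8.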
- prefix_trimmed.transgene_plasmid_consensus.fasta
Consensus sequence of the transgene plasmid, polished using medaka
- prefix_trimmed.transgene_plasmid_sorted.vcf
Variant calls showing differences between the input transgene plasmid and the transgene consensus sequence
Suggested Use:
- Validate plasmid integrity of the transgene
- Identify mutations, insertions, or deletions
- Visualize alongside tagged BAM file(s) and combined reference in IGV
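To get a quick overview of the kinds of differences in the VCF, the minimal Python sketch below classifies each ALT allele by comparing REF and ALT lengths, using the fixed VCF columns (CHROM, POS, ID, REF, ALT, ...). This length rule is a simplification: multi-base substitutions and symbolic alleles are reported as "other". The function name is hypothetical.

```python
def classify_variants(lines):
    """Return (chrom, pos, ref, alt, kind) for each ALT allele in VCF text."""
    results = []
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue  # skip meta-information, header, and blank lines
        fields = line.rstrip("\n").split("\t")
        chrom, pos, ref, alts = fields[0], int(fields[1]), fields[3], fields[4]
        for alt in alts.split(","):
            if alt.startswith("<") or alt in (".", "*"):
                kind = "other"  # symbolic, missing, or spanning-deletion allele
            elif len(ref) == 1 and len(alt) == 1:
                kind = "SNV"
            elif len(alt) > len(ref):
                kind = "insertion"
            elif len(alt) < len(ref):
                kind = "deletion"
            else:
                kind = "other"  # e.g. multi-base substitution
            results.append((chrom, pos, ref, alt, kind))
    return results
```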
- prefix_wf-aav-qc-report.html
Interactive HTML report summarizing the results of the AAV analysis workflow.
Includes:
- Read quality: yield, length, and quality scores
- Contamination assessment: mapped vs. unmapped reads; breakdown by reference (host, helper, Rep-Cap, transgene)
- Truncation analysis: start/end mapping positions within the ITR-to-ITR region
- AAV subgenome summary: frequency of each subgenome class (from the per-read classification)