RNA-seq Workflow¶
RNA-seq workflows require the following files:
- A config file (described below)
- Reference genome FASTA file
- Reference genome GTF file
- A experimental config file (described below)
Config File A config file is a tab separated text file that includes information regarding the name, location, and input of your experiment.
Single-End Config This is the config file format for single-ended data.
sample1 sample1_rep1 /path/to/sample1_rep1.fastq.gz sample1 sample1_rep2 /path/to/sample1_rep2.fastq.gz sample1 sample1_rep3 /path/to/sample1_rep2.fastq.gz sample2 sample2_rep1 /path/to/sample2_rep1.fastq.gz sample2 sample2_rep2 /path/to/sample2_rep2.fastq.gz sample2 sample2_rep3 /path/to/sample1_rep2.fastq.gz
The columns represent:
- MergeID: The merge ID that will be used should your files be merged together. Should be the same for all replicates.
- ID: The ID that will be used to name the majority of your files that are not merged. Recommended to be used to differentiate between different technical replicates.
- Path: The path to the fastq file to be processed. Can be gzipped or not.
Pair-End Config This is the config file format for pair-ended data.
sample1 sample1_rep1 /path/to/sample1_rep1_R1.fastq.gz /path/to/sample1_rep1_R2.fastq.gz sample1 sample1_rep2 /path/to/sample1_rep2_R1.fastq.gz /path/to/sample1_rep2_R2.fastq.gz sample1 sample1_rep3 /path/to/sample1_rep3_R1.fastq.gz /path/to/sample1_rep3_R2.fastq.gz sample2 sample2_rep1 /path/to/sample2_rep1_R1.fastq.gz /path/to/sample2_rep1_R2.fastq.gz sample2 sample2_rep2 /path/to/sample2_rep2_R1.fastq.gz /path/to/sample2_rep2_R2.fastq.gz sample2 sample2_rep3 /path/to/sample2_rep1_R1.fastq.gz /path/to/sample2_rep3_R2.fastq.gz
The columns represent:
- MergeID: The merge ID that will be used should your files be merged together. Should be the same for all replicates.
- ID: The ID that will be used to name the majority of your files that are not merged. Recommended to be used to differentiate between different technical replicates.
- Path1: The path to the first fastq file to be processed. Can be gzipped or not.
- Path2: The path to the second fastq file to be processed. Can be gzipped or not.
Experimental Config File A experimental config file is a tab separated text file that has sample and condition information. Your experimental config file must be ordered in the same way that your config file is. (sample1 in config file == sample1 in expInfo file). Additionally you must include the headers in the expInfo config file.
sample condition ctrl1 WT ctrl2 WT ctrl3 WT ko1 KO ko2 KO ko3 KO
The headers “sample” and “condition” must be included.
The columns represent:
- SampleID: Refers to the ID in your config file. Used to differentiate between different replicates.
- Condition: Refers to the condition or experimental variable of your sample. CIPHER currently only supports two condition DGE analysis.
- Simple RNA-seq Tutorial (single-end, 75 length reads, frFirstStrand)
nextflow run /path/to/main.nf --mode rna --config /path/to/config.txt --fasta /path/to/fasta.fa --gtf /path/to/gtf.gtf --lib s --readLen 75 --strandInfo frFirstStrand --expInfo exp_config.txt
- Simple RNA-seq Tutorial (pair-end, 75 length reads)
nextflow run /path/to/main.nf --mode rna --config /path/to/config.txt --fasta /path/to/fasta.fa --gtf /path/to/gtf.gtf --lib p --readLen 75 --strandInfo frFirstStrand --expInfo exp_config.txt
- Simple RNA-seq Tutorial (single-end, 75 length reads, use star aligner instead of default bbmap, use 5 threads)
nextflow run /path/to/main.nf --mode rna --config /path/to/config.txt --fasta /path/to/fasta.fa --gtf /path/to/gtf.gtf --lib s --readLen 75 --strandInfo frFirstStrand --expInfo exp_config.txt --aligner star --threads 5