GRO-seq Workflow
All workflows require the following files:
- A config file (described below)
- Reference genome FASTA file
- Reference genome GTF file
Config File
A config file is a tab-separated text file that describes each sample in your experiment: its name, the location of its FASTQ file(s), and its control (input), if any.
Single-End Config
This is the config file format for single-end data.
sample1	sample1_rep1	/path/to/sample1_rep1.fastq.gz	-	sample1
sample1	sample1_rep2	/path/to/sample1_rep2.fastq.gz	-	sample1
sample2	sample2_rep1	/path/to/sample2_rep1.fastq.gz	-	sample2
sample2	sample2_rep2	/path/to/sample2_rep2.fastq.gz	-	sample2
The columns represent:
- MergeID: The ID used to name merged output if your files are merged. Use the same MergeID for all replicates of a sample.
- ID: The ID used to name most output files that are not merged. Use it to distinguish technical replicates.
- Path: The path to the FASTQ file to be processed. The file can be gzipped or uncompressed.
- ControlID: The ID of the control to use for peak calling and other downstream analysis. Use “-” (without quotes) if a sample has no control.
- Mark: The ID that identifies the type of mark or histone being processed. Use “input” if the line describes a control; otherwise, use the MergeID.
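One way to avoid tab and column-count mistakes is to generate the config from a script. The following is a minimal sketch, assuming the file has no header line (the example above shows none) and that the column order matches the list above. The names `SINGLE_END_COLUMNS` and `render_config` are hypothetical helpers, not part of the pipeline.

```python
import csv
import io

# Column order taken from the list above (assumed to be positional, no header).
SINGLE_END_COLUMNS = ["MergeID", "ID", "Path", "ControlID", "Mark"]

# Sample rows mirroring the example config: no controls, so ControlID is "-"
# and Mark repeats the MergeID.
rows = [
    ("sample1", "sample1_rep1", "/path/to/sample1_rep1.fastq.gz", "-", "sample1"),
    ("sample1", "sample1_rep2", "/path/to/sample1_rep2.fastq.gz", "-", "sample1"),
    ("sample2", "sample2_rep1", "/path/to/sample2_rep1.fastq.gz", "-", "sample2"),
    ("sample2", "sample2_rep2", "/path/to/sample2_rep2.fastq.gz", "-", "sample2"),
]


def render_config(rows):
    """Return the rows as tab-separated text, one sample per line."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter="\t", lineterminator="\n")
    for row in rows:
        if len(row) != len(SINGLE_END_COLUMNS):
            raise ValueError(f"expected {len(SINGLE_END_COLUMNS)} fields, got {len(row)}: {row}")
        writer.writerow(row)
    return buffer.getvalue()


config_text = render_config(rows)
print(config_text, end="")
```

To save the result, write `config_text` to a file and pass that file's path to `--config`.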
Paired-End Config
This is the config file format for paired-end data.
sample1	sample1_rep1	/path/to/sample1_rep1_R1.fastq.gz	/path/to/sample1_rep1_R2.fastq.gz	-	sample1
sample1	sample1_rep2	/path/to/sample1_rep2_R1.fastq.gz	/path/to/sample1_rep2_R2.fastq.gz	-	sample1
sample2	sample2_rep1	/path/to/sample2_rep1_R1.fastq.gz	/path/to/sample2_rep1_R2.fastq.gz	-	sample2
sample2	sample2_rep2	/path/to/sample2_rep2_R1.fastq.gz	/path/to/sample2_rep2_R2.fastq.gz	-	sample2
The columns represent:
- MergeID: The ID used to name merged output if your files are merged. Use the same MergeID for all replicates of a sample.
- ID: The ID used to name most output files that are not merged. Use it to distinguish technical replicates.
- Path1: The path to the first FASTQ file of the pair. The file can be gzipped or uncompressed.
- Path2: The path to the second FASTQ file of the pair. The file can be gzipped or uncompressed.
- ControlID: The ID of the control to use for peak calling and other downstream analysis. Use “-” (without quotes) if a sample has no control.
- Mark: The ID that identifies the type of mark or histone being processed. Use “input” if the line describes a control; otherwise, use the MergeID.
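Before launching a run, it can help to check a paired-end config against the column rules above. The sketch below is illustrative only: `check_paired_end_config` is a hypothetical helper, not something the pipeline provides. It checks the six-column layout and the Mark rule as described above. The requirement that each ID be unique is an assumption, made because the ID is used to name output files.

```python
import csv
import io

# Column order taken from the list above (assumed positional, no header line).
PAIRED_END_COLUMNS = ["MergeID", "ID", "Path1", "Path2", "ControlID", "Mark"]


def check_paired_end_config(text):
    """Return a list of human-readable problems found in a paired-end config."""
    problems = []
    seen_ids = set()
    reader = csv.reader(io.StringIO(text), delimiter="\t")
    for line_no, fields in enumerate(reader, start=1):
        if not fields:
            continue  # skip blank lines
        if len(fields) != len(PAIRED_END_COLUMNS):
            problems.append(
                f"line {line_no}: expected {len(PAIRED_END_COLUMNS)} columns, found {len(fields)}"
            )
            continue
        row = dict(zip(PAIRED_END_COLUMNS, fields))
        # Assumption: IDs name per-sample output files, so duplicates would clash.
        if row["ID"] in seen_ids:
            problems.append(f"line {line_no}: duplicate ID {row['ID']!r}")
        seen_ids.add(row["ID"])
        # Mark is either "input" (control line) or the line's own MergeID.
        if row["Mark"] not in ("input", row["MergeID"]):
            problems.append(
                f"line {line_no}: Mark {row['Mark']!r} is neither 'input' nor the MergeID"
            )
    return problems


example = (
    "sample1\tsample1_rep1\t/path/to/sample1_rep1_R1.fastq.gz\t/path/to/sample1_rep1_R2.fastq.gz\t-\tsample1\n"
    "sample1\tsample1_rep2\t/path/to/sample1_rep2_R1.fastq.gz\t-\tsample1\n"  # Path2 missing
)
for problem in check_paired_end_config(example):
    print(problem)
```

A missing column is the most likely mistake when converting a single-end config to paired-end, which is why the second example line omits Path2.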
- Simple GRO-seq Tutorial (single-end, 75 bp reads)
nextflow run /path/to/main.nf --mode gro --config /path/to/config.txt --fasta /path/to/fasta.fa --gtf /path/to/gtf.gtf --lib s --readLen 75
- Simple GRO-seq Tutorial (paired-end, 75 bp reads)
nextflow run /path/to/main.nf --mode gro --config /path/to/config.txt --fasta /path/to/fasta.fa --gtf /path/to/gtf.gtf --lib p --readLen 75
- Simple GRO-seq Tutorial (single-end, 75 bp reads, using the bowtie2 aligner instead of the default bbmap, with 5 threads)
nextflow run /path/to/main.nf --mode gro --config /path/to/config.txt --fasta /path/to/fasta.fa --gtf /path/to/gtf.gtf --lib s --readLen 75 --aligner bowtie2 --threads 5
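When launching many runs from a script, the commands above can be assembled programmatically. The following is a minimal sketch that uses only the flags shown in the examples above. `build_gro_command` is a hypothetical helper, and the paths are the same placeholders used in this section.

```python
import shlex


def build_gro_command(main_nf, config, fasta, gtf, paired, read_len,
                      aligner=None, threads=None):
    """Return the nextflow invocation as an argument list."""
    argv = [
        "nextflow", "run", main_nf,
        "--mode", "gro",
        "--config", config,
        "--fasta", fasta,
        "--gtf", gtf,
        "--lib", "p" if paired else "s",  # p = paired-end, s = single-end
        "--readLen", str(read_len),
    ]
    # Optional flags are added only when requested, so the pipeline
    # defaults (for example, the bbmap aligner) otherwise apply.
    if aligner is not None:
        argv += ["--aligner", aligner]
    if threads is not None:
        argv += ["--threads", str(threads)]
    return argv


argv = build_gro_command(
    "/path/to/main.nf", "/path/to/config.txt", "/path/to/fasta.fa", "/path/to/gtf.gtf",
    paired=False, read_len=75, aligner="bowtie2", threads=5,
)
print(shlex.join(argv))
```

The list form can be passed directly to `subprocess.run(argv, check=True)`, which avoids shell quoting problems if a path contains spaces.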