GRO-seq Workflow

All workflows require the following files:

A config file (described below)
Reference genome FASTA file
Reference genome GTF file

Config File

A config file is a tab-separated text file that records the name, location, and input of your experiment.

Single-End Config

This is the config file format for single-end data.

sample1         sample1_rep1    /path/to/sample1_rep1.fastq.gz  -       sample1
sample1         sample1_rep2    /path/to/sample1_rep2.fastq.gz  -       sample1
sample2         sample2_rep1    /path/to/sample2_rep1.fastq.gz  -       sample2
sample2         sample2_rep2    /path/to/sample2_rep2.fastq.gz  -       sample2

The columns represent:

MergeID: The ID used to name output files when your files are merged together. It should be the same for all replicates.
ID: The ID used to name most of the files that are not merged. Use it to tell technical replicates apart.
Path: The path to the fastq file to be processed. The file can be gzipped or uncompressed.
ControlID: The ID of the control to use for peak calling and other downstream analysis. Use “-” (without quotes) if a sample has no control.
Mark: The ID that identifies the type of mark or histone being processed. Use “input” if the line refers to a control; otherwise, use the MergeID.
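As a minimal sketch of the five-column layout described above, the following Python snippet writes a single-end config with real tab separators, which are easy to lose when copying examples by hand. The helper name write_single_end_config is hypothetical and not part of the workflow; the sample names and paths are the placeholders from the example.

```python
import csv
import io


def write_single_end_config(rows, handle):
    """Write rows as tab-separated single-end config lines.

    Each row is (MergeID, ID, Path, ControlID, Mark).
    Hypothetical helper, not part of the workflow itself.
    """
    writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
    for row in rows:
        if len(row) != 5:
            raise ValueError(f"expected 5 columns, got {len(row)}: {row!r}")
        writer.writerow(row)


rows = [
    ("sample1", "sample1_rep1", "/path/to/sample1_rep1.fastq.gz", "-", "sample1"),
    ("sample1", "sample1_rep2", "/path/to/sample1_rep2.fastq.gz", "-", "sample1"),
]

# Write to an in-memory buffer here; pass an open file handle to write to disk.
buffer = io.StringIO()
write_single_end_config(rows, buffer)
config_text = buffer.getvalue()
print(config_text, end="")
```

To create the file on disk, open it with open("config.txt", "w", newline="") and pass that handle instead of the buffer.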

Paired-End Config

This is the config file format for paired-end data.

sample1         sample1_rep1    /path/to/sample1_rep1_R1.fastq.gz /path/to/sample1_rep1_R2.fastq.gz     -       sample1
sample1         sample1_rep2    /path/to/sample1_rep2_R1.fastq.gz /path/to/sample1_rep2_R2.fastq.gz     -       sample1
sample2         sample2_rep1    /path/to/sample2_rep1_R1.fastq.gz /path/to/sample2_rep1_R2.fastq.gz     -       sample2
sample2         sample2_rep2    /path/to/sample2_rep2_R1.fastq.gz /path/to/sample2_rep2_R2.fastq.gz     -       sample2

The columns represent:

MergeID: The ID used to name output files when your files are merged together. It should be the same for all replicates.
ID: The ID used to name most of the files that are not merged. Use it to tell technical replicates apart.
Path1: The path to the first fastq file to be processed. The file can be gzipped or uncompressed.
Path2: The path to the second fastq file to be processed. The file can be gzipped or uncompressed.
ControlID: The ID of the control to use for peak calling and other downstream analysis. Use “-” (without quotes) if a sample has no control.
Mark: The ID that identifies the type of mark or histone being processed. Use “input” if the line refers to a control; otherwise, use the MergeID.
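The column rules above can be checked before launching a run. The sketch below is a hypothetical helper, check_paired_config, that is not part of the workflow. It verifies that each line has six tab-separated columns and that Mark is either "input" or equal to MergeID. The check that Path1 and Path2 differ is an added assumption rather than a documented requirement.

```python
def check_paired_config(text):
    """Return a list of problems found in paired-end config text.

    Hypothetical pre-flight check; the workflow may apply its own rules.
    """
    problems = []
    for number, line in enumerate(text.splitlines(), start=1):
        if not line.strip():
            continue  # ignore blank lines
        fields = line.split("\t")
        if len(fields) != 6:
            problems.append(
                f"line {number}: expected 6 tab-separated columns, found {len(fields)}"
            )
            continue
        merge_id, sample_id, path1, path2, control_id, mark = fields
        # Assumption: the two mates should live in different files.
        if path1 == path2:
            problems.append(f"line {number}: Path1 and Path2 are identical")
        # From the column description: Mark is "input" or the MergeID.
        if mark != "input" and mark != merge_id:
            problems.append(
                f"line {number}: Mark {mark!r} is neither 'input' nor MergeID {merge_id!r}"
            )
    return problems


example = (
    "sample1\tsample1_rep1\t/path/to/sample1_rep1_R1.fastq.gz"
    "\t/path/to/sample1_rep1_R2.fastq.gz\t-\tsample1\n"
)
for problem in check_paired_config(example):
    print(problem)
```

An empty list means the text passed these checks. It does not guarantee that the fastq files exist or that the workflow will accept the config.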

Simple GRO-seq Tutorial (single-end, 75 bp reads)

nextflow run /path/to/main.nf --mode gro --config /path/to/config.txt --fasta /path/to/fasta.fa --gtf /path/to/gtf.gtf --lib s --readLen 75

Simple GRO-seq Tutorial (paired-end, 75 bp reads)

nextflow run /path/to/main.nf --mode gro --config /path/to/config.txt --fasta /path/to/fasta.fa --gtf /path/to/gtf.gtf --lib p --readLen 75

Simple GRO-seq Tutorial (single-end, 75 bp reads, using the bowtie2 aligner instead of the default bbmap, with 5 threads)

nextflow run /path/to/main.nf --mode gro --config /path/to/config.txt --fasta /path/to/fasta.fa --gtf /path/to/gtf.gtf --lib s --readLen 75 --aligner bowtie2 --threads 5
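When launching many runs from a script, it can help to assemble the command programmatically. The following is a minimal sketch that uses only the flags shown in the commands above. The function build_gro_command is hypothetical, and the restriction of --lib to "s" or "p" is inferred from the examples rather than from a documented list of accepted values.

```python
import shlex


def build_gro_command(main_nf, config, fasta, gtf, lib, read_len,
                      aligner=None, threads=None):
    """Build the nextflow argument list for a GRO-seq run.

    Hypothetical helper; it only mirrors the flags used in the examples.
    """
    if lib not in ("s", "p"):
        raise ValueError("lib must be 's' (single-end) or 'p' (paired-end)")
    command = [
        "nextflow", "run", main_nf,
        "--mode", "gro",
        "--config", config,
        "--fasta", fasta,
        "--gtf", gtf,
        "--lib", lib,
        "--readLen", str(read_len),
    ]
    # Optional flags are added only when requested, so defaults stay in effect.
    if aligner is not None:
        command += ["--aligner", aligner]
    if threads is not None:
        command += ["--threads", str(threads)]
    return command


command = build_gro_command(
    "/path/to/main.nf", "/path/to/config.txt",
    "/path/to/fasta.fa", "/path/to/gtf.gtf",
    lib="s", read_len=75, aligner="bowtie2", threads=5,
)
print(shlex.join(command))
```

The returned list can be passed directly to subprocess.run(command, check=True), which avoids shell quoting problems with paths that contain spaces.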