Data Analysis Pipelines

We now support running a range of community-provided and vendor-supplied pipelines for projects sequenced by us.

FAQ

Can we get a custom reference genome for our project? Yes, we can build a custom reference genome for collaborative projects.
How do you transfer data? We add a new analysis directory to the Globus collection for analysis files. Files are available for 30 days only.
Will you run these pipelines automatically? No, we run them for specific projects on user request.
How do you run NF-core pipelines on HPC? We use a custom configuration file to run NF-core workflows on Imperial College's HPC and track runs via a local installation of Nextflow Tower (community edition).
Can we access your Nextflow Tower server? We are open to discussion.
How do you manage multiple pipeline runs on HPC? We use Apache Airflow as the orchestration manager and queue multiple pipelines with the correct queue and pool information.
Can you run these pipelines for externally sequenced fastq files? No. We can run them only for projects sequenced by us. Internally, we organise our files following the ENA metadata model, which helps us configure these pipelines programmatically. However, you can ask for our help if you are trying to run these pipelines on externally sequenced data.
Can you add XYZ pipeline to this list? Yes. We are continuously adding new pipelines to the above list. Feel free to suggest a pipeline if your project needs a new analysis.
Why are some of these pipelines marked Untested? We have checked them only with test data and have not yet used them for a real project.

Snakemake RNA-Seq workflow

Pipeline name: snakemake-workflows/rna-seq-star-deseq2
Supported versions: Latest (>= 2.0.0)
Default reference genome: Ensembl species
Required inputs:

We need the following details to configure and run this pipeline:

  • List of sample IGF ids
  • Reference genome to use from Ensembl (e.g., Homo sapiens)
  • Sample groups for DESeq2 analysis

Reference genome

  • Species name (e.g., Homo sapiens)
  • Ensembl release number (e.g., 110), to use a specific annotation version
  • Genome build tag (e.g., GRCh38)

Sample metadata

Simple metadata:
  sample_id,condition
  IGF001,untreated
  IGF002,treated
Complex metadata:
  sample_id,treatment_1,treatment_2
  IGF001,untreated,untreated
  IGF002,untreated,treated
  IGF003,untreated,treated

Sample group info

Simple group: Check this example. For more information on simple contrasts, check this page.
  Group: treated-vs-untreated
    variable_of_interest: condition
    level_of_interest: treated
Complex group: Check this example. For more information on complex contrast specification, check this page.
  Group: treatment_1_alone
    variable_of_interest: treatment_1
    level_of_interest: treated
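
Before a run is configured, it can help to confirm that a contrast actually refers to a column and level present in the sample metadata. The following is a minimal sketch of such a check, not part of the pipeline itself. The function name `check_contrast`, the dictionary layout of the contrast, and the "at least two levels" rule are assumptions made for illustration.

```python
import csv
import io

# Sample metadata copied from the simple metadata example.
METADATA_CSV = """sample_id,condition
IGF001,untreated
IGF002,treated
"""

# Contrast copied from the simple group example (dictionary layout is hypothetical).
CONTRAST = {
    "name": "treated-vs-untreated",
    "variable_of_interest": "condition",
    "level_of_interest": "treated",
}


def check_contrast(metadata_csv, contrast):
    """Return a list of problems; an empty list means the contrast fits the metadata."""
    rows = list(csv.DictReader(io.StringIO(metadata_csv)))
    variable = contrast["variable_of_interest"]
    level = contrast["level_of_interest"]
    if not rows or variable not in rows[0]:
        return [f"column '{variable}' not found in metadata"]
    problems = []
    levels = {row[variable] for row in rows}
    if level not in levels:
        problems.append(f"level '{level}' not found in column '{variable}'")
    # Assumption: a comparison needs at least two distinct levels.
    if len(levels) < 2:
        problems.append(f"column '{variable}' has fewer than two levels")
    return problems


print(check_contrast(METADATA_CSV, CONTRAST))
```

An empty list means the metadata and the contrast agree; any message in the list points at the column or level that needs attention.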
Pipeline description:
  • Trim (if configured) and align individual fastq groups (i.e., sample + flowcell lane) using STAR
  • Combine gene counts using a custom tool, i.e., combine counts from results/star/{unit.sample_name}_{unit.unit_name}/ReadsPerGene.out.tab files and merge them to results/counts/all.tsv
  • Create normalized gene counts via DESeq2 using this R script, and output an RDS file as well.
  • Compare individual groups via DESeq2 using this R script.
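
The count-combining step above can be sketched in plain Python. This is an illustration only and not the workflow's own script: the function names, the strandedness keys and the choice to sum lane-level units per sample are assumptions. It relies on the STAR ReadsPerGene.out.tab layout, which has four tab-separated columns (gene id, then unstranded, forward-stranded and reverse-stranded counts) and starts with four N_* summary rows.

```python
import csv
import io

# Column index by library strandedness (keys are hypothetical labels).
STRAND_COLUMN = {"none": 1, "yes": 2, "reverse": 3}


def read_star_counts(handle, strandedness="none"):
    """Return {gene_id: count} for one ReadsPerGene.out.tab file."""
    column = STRAND_COLUMN[strandedness]
    counts = {}
    for row in csv.reader(handle, delimiter="\t"):
        # N_unmapped, N_multimapping, N_noFeature and N_ambiguous rows
        # are summary statistics, not genes.
        if not row or row[0].startswith("N_"):
            continue
        counts[row[0]] = int(row[column])
    return counts


def merge_counts(per_unit):
    """Sum counts over units; per_unit maps (sample, unit) -> {gene: count}."""
    merged = {}
    for (sample, _unit), counts in per_unit.items():
        totals = merged.setdefault(sample, {})
        for gene, count in counts.items():
            totals[gene] = totals.get(gene, 0) + count
    return merged


def to_tsv(merged):
    """Render merged counts as a gene-by-sample TSV string."""
    samples = sorted(merged)
    genes = sorted({gene for counts in merged.values() for gene in counts})
    lines = ["\t".join(["gene"] + samples)]
    for gene in genes:
        lines.append("\t".join([gene] + [str(merged[s].get(gene, 0)) for s in samples]))
    return "\n".join(lines) + "\n"


# Two made-up lanes of the same sample, held in memory for illustration.
lane1 = "N_unmapped\t9\t9\t9\nGENE_A\t7\t0\t7\nGENE_B\t2\t1\t1\n"
lane2 = "N_unmapped\t4\t4\t4\nGENE_A\t3\t0\t3\nGENE_B\t5\t2\t3\n"
per_unit = {
    ("IGF001", "lane1"): read_star_counts(io.StringIO(lane1)),
    ("IGF001", "lane2"): read_star_counts(io.StringIO(lane2)),
}
print(to_tsv(merge_counts(per_unit)))
```

In a real run the handles would come from the per-unit ReadsPerGene.out.tab files rather than in-memory strings, and the strandedness would follow the library preparation protocol.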
Output results:
  • STAR aligned BAM files
  • Combined gene counts for all samples in .tsv format
  • MultiQC report
  • DESeq2 output files and PCA plots (based on analysis configuration)

NF-core RNA-Seq workflow

Pipeline name: nf-co.re/rnaseq
Supported versions: Latest (>= 3.12.0)
Default reference genome: iGenome. Please note that the default iGenome annotations are currently out-of-date. E.g., human iGenomes annotations are from Ensembl release 75 (February 2014). Check out our FAQ section for more information about custom reference genomes.
Required inputs:

We need the following details to configure and run this pipeline:

  • List of sample IGF ids and replicate information
  • Reference genome from iGenome (e.g., GRCh38)
  • List of parameters for NF-core pipeline run

List of sample IGF ids, e.g.:

  IGF001
  IGF002

List of NF-core RNA-Seq pipeline parameters, e.g.:

  --aligner star_rsem
  --deseq2_vst
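
To show where a parameter list like the one above ends up, here is a rough sketch that assembles a `nextflow run` command line. It is not the command used on our HPC. The function name `build_nfcore_command` and the file names `samplesheet.csv`, `results` and `hpc.config` are hypothetical; `-r`, `-c`, `--input` and `--outdir` are standard Nextflow and NF-core options.

```python
import shlex


def build_nfcore_command(pipeline, revision, samplesheet, outdir, config, extra_params):
    """Assemble a `nextflow run` command line as a list of arguments."""
    command = [
        "nextflow", "run", pipeline,
        "-r", revision,        # pin the pipeline release
        "-c", config,          # site-specific configuration file
        "--input", samplesheet,
        "--outdir", outdir,
    ]
    # Each entry is one parameter line, e.g. "--aligner star_rsem".
    for param in extra_params:
        command.extend(shlex.split(param))
    return command


command = build_nfcore_command(
    pipeline="nf-core/rnaseq",
    revision="3.12.0",
    samplesheet="samplesheet.csv",  # hypothetical path
    outdir="results",               # hypothetical path
    config="hpc.config",            # hypothetical file name
    extra_params=["--aligner star_rsem", "--deseq2_vst"],
)
print(shlex.join(command))
```

Keeping the command as a list of arguments avoids shell-quoting problems if it is later passed to a job scheduler or to `subprocess.run`.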
Output results: NF-core RNA-Seq output

Single cell RNA-Seq data analysis for 10x Genomics libraries

Pipeline name: Cellranger
Supported versions: Latest (>= 7.2)
Default reference genome: Reference genomes from 10x Genomics.
Required inputs:

We need the following details to configure and run this pipeline:

  • List of sample IGF ids and their feature types, i.e. Gene Expression, VDJ-B or VDJ-T
  • Cellranger multi group information
  • Reference genome information

List of sample IGF ids and feature types, e.g.:

  #igf_id,feature_types,cellranger_group
  IGF001,Gene Expression,Group1
  IGF002,VDJ-B,Group1
  IGF003,Gene Expression,Group2
  IGF004,VDJ-T,Group2
  IGF005,Gene Expression,Group3
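
The group column decides which libraries are analysed together in one Cellranger multi run. The sketch below shows one way to read such a sheet and collect libraries per group; the function name `group_libraries` is hypothetical, and the allowed feature types are simply those listed in the required inputs.

```python
import csv
import io
from collections import defaultdict

# Sample sheet copied from the example above.
SAMPLES_CSV = """#igf_id,feature_types,cellranger_group
IGF001,Gene Expression,Group1
IGF002,VDJ-B,Group1
IGF003,Gene Expression,Group2
IGF004,VDJ-T,Group2
IGF005,Gene Expression,Group3
"""

ALLOWED_FEATURE_TYPES = {"Gene Expression", "VDJ-B", "VDJ-T"}


def group_libraries(samples_csv):
    """Return {group: [(igf_id, feature_type), ...]} in sheet order."""
    # Drop the leading '#' so the first line can serve as the CSV header.
    reader = csv.DictReader(io.StringIO(samples_csv.lstrip("#")))
    groups = defaultdict(list)
    for row in reader:
        feature_type = row["feature_types"]
        if feature_type not in ALLOWED_FEATURE_TYPES:
            raise ValueError(f"{row['igf_id']}: unknown feature type '{feature_type}'")
        groups[row["cellranger_group"]].append((row["igf_id"], feature_type))
    return dict(groups)


for group, libraries in group_libraries(SAMPLES_CSV).items():
    print(group, libraries)
```

Each resulting group corresponds to one combined analysis, so Group3 in this example would be a gene-expression-only run.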
Output results:

Single cell Gene expression and ATAC-Seq multiome data analysis for 10x Genomics libraries

Pipeline name: Cellranger-ARC
Supported versions: Latest (>= 2.0.2)
Default reference genome: Reference genomes from 10x Genomics.
Required inputs:

We need the following details to configure and run this pipeline:

  • List of sample IGF ids and their library type, i.e. Gene Expression or Chromatin Accessibility
  • Cellranger-ARC group information
  • Reference genome information

List of sample IGF ids and library type, e.g.:

  #igf_id,feature_types,cellranger_group
  IGF001,Gene Expression,Group1
  IGF002,Chromatin Accessibility,Group1
  IGF003,Gene Expression,Group2
  IGF004,Chromatin Accessibility,Group2
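
A multiome analysis pairs a Gene Expression library with a Chromatin Accessibility library, so it is worth checking that every group in the sheet contains both. The following is a small sketch of that check under this pairing assumption; the function name `incomplete_groups` is hypothetical.

```python
import csv
import io

# Sample sheet copied from the example above.
SAMPLES_CSV = """#igf_id,feature_types,cellranger_group
IGF001,Gene Expression,Group1
IGF002,Chromatin Accessibility,Group1
IGF003,Gene Expression,Group2
IGF004,Chromatin Accessibility,Group2
"""

# Assumption: each group needs one library of each type.
REQUIRED_TYPES = {"Gene Expression", "Chromatin Accessibility"}


def incomplete_groups(samples_csv):
    """Return {group: [missing library types]} for groups lacking a required type."""
    reader = csv.DictReader(io.StringIO(samples_csv.lstrip("#")))
    seen = {}
    for row in reader:
        seen.setdefault(row["cellranger_group"], set()).add(row["feature_types"])
    return {
        group: sorted(REQUIRED_TYPES - types)
        for group, types in seen.items()
        if REQUIRED_TYPES - types
    }


print(incomplete_groups(SAMPLES_CSV))
```

An empty dictionary means every group is complete; otherwise the values name the library type that still needs to be supplied.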
Output results:

NF-core ATAC-Seq workflow

Pipeline name: nf-co.re/atacseq
Supported versions: Latest (>= 2.1.2)
Default reference genome: iGenome. Please note that the default iGenome annotations are currently out-of-date. E.g., human iGenomes annotations are from Ensembl release 75 (February 2014). Check out our FAQ section for more information about custom reference genomes.
Required inputs:

We need the following details to configure and run this pipeline:

  • List of sample IGF ids
  • Control samples for peak calling (if any)
  • Reference genome from iGenome (e.g., GRCh38)
  • List of parameters for NF-core pipeline run

List of sample IGF ids and replicates, e.g.:

  #igf_id,sample_group,replicate_id
  IGF001,CONTROL,1
  IGF002,CONTROL,2
  IGF003,CONTROL,3

Peak calling control samples (if any), e.g.:

  #igf_id,sample_group,replicate_id,control,control_replicate
  IGF001,CONTROL,1,,
  IGF002,CONTROL,2,,
  IGF003,CONTROL,3,,
  IGF004,TREATMENT,1,CONTROL,1
  IGF005,TREATMENT,2,CONTROL,2
  IGF006,TREATMENT,3,CONTROL,3
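
In the sheet above, the control and control_replicate columns point back at another row through its sample_group and replicate_id. A quick way to catch a dangling reference is sketched below; the function name `missing_controls` is hypothetical and the check is an illustration, not the pipeline's own validation.

```python
import csv
import io

# Sample sheet copied from the peak calling control example above.
SAMPLES_CSV = """#igf_id,sample_group,replicate_id,control,control_replicate
IGF001,CONTROL,1,,
IGF002,CONTROL,2,,
IGF003,CONTROL,3,,
IGF004,TREATMENT,1,CONTROL,1
IGF005,TREATMENT,2,CONTROL,2
IGF006,TREATMENT,3,CONTROL,3
"""


def missing_controls(samples_csv):
    """Return (igf_id, control, control_replicate) for references with no matching row."""
    rows = list(csv.DictReader(io.StringIO(samples_csv.lstrip("#"))))
    available = {(row["sample_group"], row["replicate_id"]) for row in rows}
    missing = []
    for row in rows:
        if not row["control"]:
            continue  # this sample has no control assigned
        key = (row["control"], row["control_replicate"])
        if key not in available:
            missing.append((row["igf_id"],) + key)
    return missing


print(missing_controls(SAMPLES_CSV))
```

For the example sheet the result is empty, since every TREATMENT replicate refers to a CONTROL replicate that is present.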

List of NF-core ATAC-Seq pipeline parameters, e.g.:

  --trim_nextseq 20
  --aligner bwa
  --narrow_peak
Output results: NF-core ATAC-Seq output

GeoMx NGS Pipeline

Pipeline name: GeoMx NGS Pipeline
Supported versions: Latest (>= 2.3.3.10)
Required inputs:

We need the following details to configure and run this pipeline:

  • List of sample IGF ids and DSP ids
  • GeoMx experiment configuration file
  • Probe assay metadata (PKC files) describing the gene targets present in the data; PKC files can be found here

List of sample IGF ids and DSP ids, e.g.:

  #igf_id,dsp_id
  IGF001,DSP001
  IGF002,DSP002
Output results:
  • DCC count files
  • DCC count QC based on the latest version of these templates

NF-core Sarek workflow

Pipeline name: nf-co.re/sarek
Supported versions: Latest (>= 3.4.0)
Default reference genome: iGenome - GATK.GRCh38. Please note that the default iGenome annotations are currently out-of-date. E.g., human iGenomes annotations are from Ensembl release 75 (February 2014). Check out our FAQ section for more information about custom reference genomes.
Required inputs:

We need the following details to configure and run this pipeline:

  • List of sample IGF ids
  • Patient ID: Custom patient ID for normal and tumor samples
  • Sex: Sex chromosomes of the patient, i.e. XX, XY or NA; only used for Copy-Number Variation analysis in a tumor/pair
  • Status: Normal/tumor status of sample; can be 0 (normal) or 1 (tumor)
  • Reference genome from iGenome (e.g., GRCh38)
  • List of parameters for NF-core pipeline run

List of sample IGF ids with patient, sex and status, e.g.:

  #igf_id,patient,sex,status
  IGF001,ID1,XX,0
  IGF002,ID2,XY,0
  IGF003,ID3,XX,0
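
The sex and status columns accept only a few values, and the patient column ties normal and tumor samples together. The sketch below checks the values against those listed in the required inputs and summarises samples per patient; the function name `summarise_patients` is hypothetical and this is not the pipeline's own sample sheet validation.

```python
import csv
import io

# Sample sheet copied from the example above.
SAMPLES_CSV = """#igf_id,patient,sex,status
IGF001,ID1,XX,0
IGF002,ID2,XY,0
IGF003,ID3,XX,0
"""

ALLOWED_SEX = {"XX", "XY", "NA"}
STATUS_LABEL = {"0": "normal", "1": "tumor"}


def summarise_patients(samples_csv):
    """Return ({patient: {"normal": [...], "tumor": [...]}}, problems)."""
    reader = csv.DictReader(io.StringIO(samples_csv.lstrip("#")))
    patients, problems = {}, []
    for row in reader:
        if row["sex"] not in ALLOWED_SEX:
            problems.append(f"{row['igf_id']}: unexpected sex '{row['sex']}'")
        label = STATUS_LABEL.get(row["status"])
        if label is None:
            problems.append(f"{row['igf_id']}: unexpected status '{row['status']}'")
            continue
        entry = patients.setdefault(row["patient"], {"normal": [], "tumor": []})
        entry[label].append(row["igf_id"])
    return patients, problems


patients, problems = summarise_patients(SAMPLES_CSV)
print(patients)
print(problems)
```

For a tumor/normal design, the same summary makes it easy to see which patients have both a normal and a tumor sample listed.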

List of NF-core Sarek pipeline parameters, e.g.:

  --genome GATK.GRCh38
  --wes
  --intervals path_to_intervals_file
  --save_mapped
  --concatenate_vcfs
  --tools haplotypecaller,snpeff,vep
Output results: NF-core Sarek output

NF-core smRNASeq workflow

Pipeline name: nf-co.re/smrnaseq
Supported versions: Latest (>= 2.2.4)
Default reference genome: iGenome. Please note that the default iGenome annotations are currently out-of-date. E.g., human iGenomes annotations are from Ensembl release 75 (February 2014). Check out our FAQ section for more information about custom reference genomes.
Required inputs:

We need the following details to configure and run this pipeline:


List of sample IGF ids, e.g.:

  #igf_id
  IGF001
  IGF002
  IGF003

List of NF-core smRNAseq pipeline parameters, e.g.:

  --genome GRCh38
Output results: NF-core smRNAseq output

NF-core Methylseq workflow

Pipeline name: nf-co.re/methylseq
Supported versions: Latest (>= 2.5.0)
Default reference genome: iGenome. Please note that the default iGenome annotations are currently out-of-date. E.g., human iGenomes annotations are from Ensembl release 75 (February 2014). Check out our FAQ section for more information about custom reference genomes.
Required inputs:

We need the following details to configure and run this pipeline:

  • List of sample IGF ids
  • Reference genome from iGenome (e.g., GRCh38)
  • List of parameters for NF-core pipeline run

List of sample IGF ids, e.g.:

  #igf_id
  IGF001
  IGF002
  IGF003

List of NF-core methylseq pipeline parameters, e.g.:

  --genome GRCh38
Output results: NF-core methylseq output