differentialabundance: Parameters

Define where the pipeline should find input data and save output data.

A string identifier used to name result files in the output directory

required

type: string

default: study

A string identifying the technology used to produce the data

required

type: string

Path to CSV/TSV file containing information about the samples in the experiment.

required

type: string

pattern: ^\S+\.(csv|tsv)$

A CSV/TSV/YML/YAML file describing sample contrasts to compare groups.

type: string

pattern: ^\S+\.(csv|tsv|yml|yaml)$
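
The two `pattern` entries above restrict which file paths are accepted for the sample sheet and the contrasts file. As a rough illustration of what they allow, the sketch below checks a few made-up paths against the same regular expressions. The helper name `matches` and the example paths are hypothetical, and the sketch assumes the patterns are applied with search semantics, as is usual for JSON Schema validation.

```python
import re

# Patterns copied verbatim from the parameter descriptions above.
SAMPLESHEET_PATTERN = re.compile(r"^\S+\.(csv|tsv)$")
CONTRASTS_PATTERN = re.compile(r"^\S+\.(csv|tsv|yml|yaml)$")


def matches(pattern, path):
    """Return True if the path satisfies the pattern (hypothetical helper)."""
    return pattern.search(path) is not None


print(matches(SAMPLESHEET_PATTERN, "data/samples.csv"))  # True
print(matches(SAMPLESHEET_PATTERN, "my samples.csv"))    # False: \S+ forbids whitespace
print(matches(SAMPLESHEET_PATTERN, "contrasts.yaml"))    # False: YAML is not a sample sheet format
print(matches(CONTRASTS_PATTERN, "contrasts.yaml"))      # True
```

Because `\S+` excludes whitespace, paths containing spaces are rejected by both patterns.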

The output directory where the results will be saved. You have to use absolute paths to storage on Cloud infrastructure.

required

type: string

Type of abundance measure used, platform-dependent.

required

type: string

To how many digits should numeric output in different modules be rounded? If -1 or null, will not round.

type: integer

default: 4
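
The rounding rule above can be sketched as follows. This is only an illustration: the function name `round_output` is hypothetical, and the sketch assumes that "digits" means decimal places, which the description does not state explicitly.

```python
def round_output(value, digits=4):
    """Round a numeric value, or return it unchanged when rounding is disabled.

    Assumption: digits counts decimal places; -1 or None disables rounding.
    """
    if digits is None or digits == -1:
        return value
    return round(value, digits)


print(round_output(3.14159265))        # 3.1416
print(round_output(3.14159265, -1))    # 3.14159265
print(round_output(3.14159265, None))  # 3.14159265
```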

Ways of providing your abundance values

TSV/CSV-format abundance matrix

type: string

pattern: ^\S+\.(tsv|csv)$|\S*proteinGroups\.txt$

(RNA-seq only): optional transcript/gene length matrix with samples and transcript_ids/gene_ids as in the abundance matrix.

type: string

Alternative to matrix: a compressed CEL files archive such as often found in GEO

type: string

Use SOFT files from GEO by providing the GSE study identifier

type: string

Column in the sample sheet to be used as the primary sample identifier

required

type: string

default: sample

Type of observation

required

type: string

default: sample

Column in the sample sheet to be used as the identifier for observations. If unset, --observations_id_col is used.

type: string

Options related to features

Feature ID attribute in the abundance table as well as in the GTF file (e.g. the gene_id field)

required

type: string

default: gene_id

Feature name attribute in the abundance table as well as in the GTF file (e.g. the gene symbol field)

required

type: string

default: gene_name

Type of feature. Often ‘gene’

required

type: string

default: gene

When set, use the control features in scaling/normalization (currently only supported for differential_method deseq2)

type: boolean

A text file listing technical features (e.g. spikes)

type: string

Comma-separated string, specifies feature metadata columns to be used for exploratory analysis, platform-specific

type: string

default: gene_id,gene_name,gene_biotype

Supply your own feature annotations. Can be derived from the GTF (rnaseq) or from the Bioconductor annotation package (affy arrays).

type: string

pattern: ^\S+\.(csv|tsv)$

Analysis options related to the use of paramsheet to run multiple combinations of analyses (see usage docs for details).

Name of the paramset to run. In profile mode, set by the analysis profile for output directory naming. In paramsheet mode, selects which paramset(s) to run (comma-separated).

type: string

Path to a paramsheet YAML file. Setting this activates multi-run (paramsheet) mode where paramsheet values take priority over CLI flags.

type: string

pattern: ^\S+\.(yaml|yml)$

Options for processing of affy arrays with justRMA()

Column of the sample sheet containing the Affymetrix CEL file name

type: string

default: file

logical value. If set to true, apply background correction using RMA.

type: boolean

default: true

integer value indicating which RMA background to use

type: integer

default: 2

logical value. If TRUE, then works on the PM matrix in place as much as possible, good for large datasets.

type: boolean

Used to specify the name of an alternative cdf package. If set to NULL, then the usual cdf package based on Affymetrix’ mappings will be used.

type: string

logical value. If TRUE, a matrix of probe annotations will be derived.

type: boolean

default: true

Should the spots marked as ‘MASKS’ be set to NA?

type: boolean

Should the spots marked as ‘OUTLIERS’ be set to NA?

type: boolean

If TRUE, then overrides what is in rm.mask and rm.outliers.

type: boolean

Genome annotation file in GTF format

type: string

pattern: ^\S+\.gtf(\.gz)?

If a GTF file is supplied, which feature type to use

type: string

default: transcript

If a GTF file is supplied, which field should go first in the converted output table

type: string

default: gene_id

Options for processing of proteomics MaxQuant tables with the Proteus R package

Prefix of the column names of the MaxQuant proteingroups table in which the intensity values are saved; the prefix has to be followed by the sample names that are also found in the samplesheet. Default: ‘LFQ intensity’; will search for both the prefix as entered and the prefix followed by one whitespace.

type: string

default: LFQ intensity

Normalization function to use on the MaxQuant intensities.

type: string

Which method to use for plotting sample distributions of the MaxQuant intensities; one of ‘violin’, ‘dist’, ‘box’.

type: string

Should a loess line be added to the plot of mean-variance relationship of the conditions? Default: true.

type: boolean

default: true

Valid R palette name

type: string

default: Set1

Options related to filtering upstream of differential analysis

Minimum abundance value. Set to false to disable abundance filtering.

required

type: integer,boolean

Minimum observations that must pass the threshold to retain the row/feature (e.g. gene).

type: number

default: 1

A minimum proportion of observations, given as a number between 0 and 1, that must pass the threshold. Overrides minimum_samples

type: number

An optional grouping variable to be used to calculate a min_samples value

type: string

A minimum proportion of observations, given as a number between 0 and 1, that must have a value (not NA) to retain the row/feature (e.g. gene).

type: number

default: 0.5

Minimum observations that must have a value (not NA) to retain the row/feature (e.g. gene). Overrides filtering_min_proportion_not_na.

type: number
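
The filtering options above combine a count-based and a proportion-based rule for both the abundance threshold and the NA check. The sketch below shows one way these rules could fit together for a single feature. It is not the pipeline's implementation: the function and argument names are hypothetical, the default `min_abundance=1` is chosen only for illustration, and the use of `>=` comparisons and rounding up of proportions are assumptions. The optional grouping variable is not modelled.

```python
import math


def keep_feature(values, min_abundance=1, min_samples=1, min_proportion=None,
                 min_proportion_not_na=0.5, min_samples_not_na=None):
    """Decide whether one feature (row) is retained. None stands for NA."""
    n = len(values)
    present = [v for v in values if v is not None]

    # NA filter: an explicit count overrides the proportion.
    if min_samples_not_na is not None:
        required_not_na = min_samples_not_na
    else:
        required_not_na = math.ceil(min_proportion_not_na * n)
    if len(present) < required_not_na:
        return False

    # Abundance filter is disabled when the threshold is set to False.
    if min_abundance is False:
        return True

    # Abundance filter: a proportion overrides the minimum sample count.
    if min_proportion is not None:
        required_passing = math.ceil(min_proportion * n)
    else:
        required_passing = min_samples
    passing = sum(1 for v in present if v >= min_abundance)
    return passing >= required_passing


print(keep_feature([5, 0, 0, 0]))                      # True: one sample passes
print(keep_feature([5, 0, 0, 0], min_proportion=0.5))  # False: two samples needed
print(keep_feature([5, None, None, None]))             # False: too many NAs
```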

Set to run IMMUNEDECONV

type: boolean

Set method to run with IMMUNEDECONV. Available options can be found in ‘https://omnideconv.org/immunedeconv/articles/immunedeconv.html’

type: string

default: quantiseq

Set function to run with IMMUNEDECONV. Available options can be found in ‘https://omnideconv.org/immunedeconv/articles/immunedeconv.html’

type: string

default: deconvolute

Options related to data exploration

Clustering method used in dendrogram creation

required

type: string

default: ward.D2

Correlation method used in dendrogram creation

required

type: string

default: spearman

Number of features selected before certain exploratory analyses. If -1, will use all features.

required

type: integer

default: 500

Length of the whiskers in boxplots as multiple of IQR. Defaults to 1.5.

type: number

default: 1.5

Threshold on MAD score for outlier identification

type: integer

default: -5

How should the main grouping variable be selected? ‘auto_pca’, ‘contrasts’, or a valid column name from the observations table.

required

type: string

default: auto_pca

Specifies assay names to be used for matrices, platform-specific.

hidden

type: string

default: raw,normalised,variance_stabilised

Specifies final assay to be used for exploratory analysis, platform-specific

hidden

type: string

default: variance_stabilised

Assays for which to compute log2 values during exploratory analysis. Not necessary for maxquant data as this is controlled by the pipeline.

type: string

default: raw,normalised

Valid R palette name

required

type: string

default: Set1

Options related to differential operations

Differential analysis method

type: string

Advanced option: the suffix associated with tabular differential results tables. Will by default use the appropriate suffix according to the study_type.

type: string

The feature identifier column in differential results tables

required

type: string

default: gene_id

The fold change column in differential results tables

required

type: string

default: log2FoldChange

The p value column in differential results tables

type: string

default: pvalue

The q value column in differential results tables (adjusted p values/q values).

required

type: string

default: padj

Minimum fold change used to calculate differential feature numbers. Note that this number will be log2 transformed

required

type: number

default: 2

Maximum p value used to calculate differential feature numbers

required

type: number

default: 1

Maximum q value used to calculate differential feature numbers

required

type: number

default: 0.05
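
The three thresholds above (fold change, p value, q value) jointly decide which features are counted as differential. The sketch below illustrates the effect of the log2 transformation of the fold change threshold, using the defaults listed above. The function and argument names are hypothetical, and the sketch assumes that the absolute log2 fold change is compared and that boundaries are inclusive; the pipeline may count up- and down-regulated features separately.

```python
import math


def is_differential(log2_fold_change, pvalue, padj,
                    min_fold_change=2, max_pval=1, max_qval=0.05):
    """Return True if a feature passes all three thresholds (illustrative only)."""
    # The fold change threshold is given on the linear scale and log2 transformed,
    # so the default of 2 corresponds to an absolute log2 fold change of 1.
    log2_threshold = math.log2(min_fold_change)
    return (abs(log2_fold_change) >= log2_threshold
            and pvalue <= max_pval
            and padj <= max_qval)


print(is_differential(1.2, 0.001, 0.01))  # True
print(is_differential(0.8, 0.001, 0.01))  # False: below log2(2) = 1
print(is_differential(-1.5, 0.2, 0.04))   # True: the default p value cutoff of 1 never excludes
print(is_differential(1.5, 0.001, 0.2))   # False: q value too high
```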

Where a features file (GTF) has been provided, what attribute to use to name features

type: string

default: gene_name

Indicate whether or not fold changes are on the log scale (default is to assume they are)

type: boolean

default: true

Valid R palette name

required

type: string

default: Set1

In differential analysis (DESeq2 or limma), subset to the contrast samples before modelling variance?

type: boolean

test parameter passed to DESeq()

type: string

fitType parameter passed to DESeq()

type: string

sfType parameter passed to DESeq()

type: string

‘minReplicatesForReplace’ parameter passed to DESeq()

type: integer

default: 7

useT parameter passed to DESeq()

type: boolean

independentFiltering parameter passed to results()

type: boolean

default: true

lfcThreshold parameter passed to results()

type: number

altHypothesis parameter passed to results()

type: string

default: greaterAbs

pAdjustMethod parameter passed to results()

type: string

default: BH

alpha parameter passed to results()

type: number

default: 0.1

minmu parameter passed to results()

type: number

default: 0.5

variance stabilisation method to use when making a variance stabilised matrix

type: string

Shrink fold changes in results?

type: boolean

default: true

type: integer

blind parameter for rlog() and/or vst()

type: boolean

default: true

nsub parameter passed to vst()

type: integer

default: 1000

passed to lmFit(), positive integer giving the number of times each distinct probe is printed on each array.

type: number

passed to lmFit(), positive integer giving the spacing between duplicate occurrences of the same probe, spacing=1 for consecutive rows.

type: string

Sample sheet column to be used to derive a vector or factor specifying a blocking variable on the arrays for limma::lmFit(); however, for random effects models, DREAM is the recommended approach in this pipeline

type: string

passed to limma::lmFit(), the inter-duplicate or inter-technical replicate correlation; however for random effects models, DREAM is the recommended approach in this pipeline

type: string

passed to lmFit(), the fitting method

type: string

passed to eBayes(), a numeric value between 0 and 1, assumed proportion of genes which are differentially expressed

type: number

default: 0.01

passed to eBayes(), logical, should an intensity-dependent trend be allowed for the prior variance?

type: boolean

passed to eBayes(), logical, should the estimation of df.prior and var.prior be robustified against outlier sample variances?

type: boolean

passed to eBayes, comma separated string of two values, assumed lower and upper limits for the standard deviation of log2-fold-changes for differentially expressed genes

type: string

default: 0.1,4

passed to eBayes, comma separated string of length 1 or 2, giving left and right tail proportions of x to Winsorize. Used only when robust=TRUE.

type: string

default: 0.05,0.1
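
The two eBayes options above are supplied as comma-separated strings rather than as numeric vectors. A minimal sketch of how such a string might be split and checked is shown below. The helper name `parse_numeric_list` is hypothetical, and the pipeline's own parsing may differ.

```python
def parse_numeric_list(value, allowed_lengths):
    """Split a comma-separated string into floats and check the number of values."""
    parts = [float(part) for part in value.split(",")]
    if len(parts) not in allowed_lengths:
        raise ValueError(
            f"expected {allowed_lengths} value(s), got {len(parts)}: {value!r}"
        )
    return parts


# Standard deviation limits: exactly two values (lower, upper).
print(parse_numeric_list("0.1,4", allowed_lengths=(2,)))       # [0.1, 4.0]
# Winsorization tail proportions: one or two values.
print(parse_numeric_list("0.05,0.1", allowed_lengths=(1, 2)))  # [0.05, 0.1]
```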

passed to topTable(), minimum absolute log2-fold-change required

type: integer

passed to topTable(), logical, should 95% confidence intervals be output for logFC? Alternatively, can take a numeric value between zero and one specifying the confidence level required.

type: boolean

passed to topTable(), method used to adjust the p-values for multiple testing.

type: string

cutoff value for adjusted p-values. Only genes with lower p-values are listed.

type: number

default: 1

Turns on and off usage of voom normalization in the Limma module.

type: boolean

type: integer

default: 1

type: integer

type: boolean

type: number

default: 0.01

type: string

default: 0.1,4

type: boolean

type: string

default: 0.05,0.1

type: string

default: adaptive

type: boolean

type: string

default: BH

Functional analysis method

type: string

Gene sets in GMT or GMX format. For GSEA, multiple comma-separated input files in either format are possible. For gprofiler2, a single file in GMT format is possible; this has lowest priority and will be overridden by --gprofiler2_token and --gprofiler2_organism.

type: string

Permutation type

type: string

Number of permutations

type: integer

default: 1000

Enrichment statistic

type: string

Metric for ranking genes

type: string

Gene list sorting mode

type: string

Gene list ordering mode

type: string

Max size: exclude larger sets

type: integer

default: 500

Min size: exclude smaller sets

type: integer

default: 15
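
The two size limits above exclude gene sets that are too large or too small. The sketch below applies them to a dictionary of gene sets using the defaults listed above. The function name and example data are hypothetical, and the sketch uses the raw length of each set; GSEA itself may count only the genes that are present in the dataset.

```python
def filter_gene_sets(gene_sets, min_size=15, max_size=500):
    """Keep gene sets whose size lies within [min_size, max_size]."""
    return {
        name: genes
        for name, genes in gene_sets.items()
        if min_size <= len(genes) <= max_size
    }


# Hypothetical gene sets of different sizes.
gene_sets = {
    "too_small": [f"g{i}" for i in range(10)],
    "just_right": [f"g{i}" for i in range(15)],
    "too_large": [f"g{i}" for i in range(501)],
}
print(sorted(filter_gene_sets(gene_sets)))  # ['just_right']
```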

Normalisation mode

type: string

Randomization mode

type: string

Make detailed geneset report?

type: boolean

default: true

Use median for class metrics

type: boolean

Number of markers

type: integer

default: 100

Plot graphs for the top sets of each phenotype

type: integer

default: 20

Seed for permutation

type: string

default: timestamp

Save random ranked lists

type: boolean

Make a zipped file with all reports

type: boolean

Short name of the organism that is analyzed, e.g. hsapiens for Homo sapiens.

type: string

Should only significant enrichment results be considered?

type: boolean

default: true

Should underrepresentation be measured instead of overrepresentation?

type: boolean

The method that should be used for multiple testing correction.

type: string

On which source databases to run the gprofiler query

type: string

Whether to include evcodes in the results.

type: boolean

Maximum q value used for significance testing.

type: number

default: 0.05

Token that should be used as a query.

type: string

Path to CSV/TSV/TXT file that should be used as a background list of genes for the query; alternatively, ‘auto’ (default) or ‘false’.

type: string

default: auto

pattern: ^\S+\.(csv|tsv|txt)$|auto|false
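
The pattern above combines a file-path alternative with the literal values ‘auto’ and ‘false’. Only the first alternative is anchored, so under search semantics (assumed here, as is usual for JSON Schema validation) any value that merely contains ‘auto’ or ‘false’ also matches. The sketch below demonstrates this with made-up values; the helper name is hypothetical.

```python
import re

# Pattern copied verbatim from the parameter description above.
BACKGROUND_PATTERN = re.compile(r"^\S+\.(csv|tsv|txt)$|auto|false")


def is_accepted(value):
    """Return True if the value satisfies the pattern under search semantics."""
    return BACKGROUND_PATTERN.search(value) is not None


print(is_accepted("background_genes.txt"))  # True: file path alternative
print(is_accepted("auto"))                  # True: literal alternative
print(is_accepted("background.bed"))        # False: wrong extension
print(is_accepted("autosomes.bed"))         # True: contains 'auto', which is unanchored
```

In practice it is safest to pass either a CSV/TSV/TXT path or exactly ‘auto’ or ‘false’.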

Which column to use as gene IDs in the background matrix.

type: string

How to calculate the statistical domain size.

type: string

How many genes must be differentially expressed in a pathway for it to be considered enriched? Default 1.

type: integer

default: 1

Valid R palette name

type: string

default: Blues

Path to TSV file containing network file for decoupler

type: string

pattern: ^\S+\.(tsv)$

Removes sources of a net with less than min_n targets

type: integer

default: 5
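
The minimum-target rule above removes every source of the network that has fewer than min_n targets. A small sketch of that filter on a list of (source, target) pairs is given below. The function name and the example network are hypothetical, and this is not decoupler's own code.

```python
from collections import defaultdict


def filter_network(edges, min_n=5):
    """Drop all edges whose source has fewer than min_n distinct targets."""
    targets = defaultdict(set)
    for source, target in edges:
        targets[source].add(target)
    return [(s, t) for s, t in edges if len(targets[s]) >= min_n]


# Hypothetical network: TF_A has six targets, TF_B only two.
edges = [("TF_A", f"gene{i}") for i in range(6)]
edges += [("TF_B", "gene1"), ("TF_B", "gene2")]

kept = filter_network(edges)
print(sorted({source for source, _ in kept}))  # ['TF_A']
```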

Comma-separated list of methods to use (e.g., ‘ora,ulm’)

type: string

default: ulm

Should a Shiny app be built?

type: boolean

default: true

Should the app be deployed to shinyapps.io?

type: boolean

Your shinyapps.io account name

type: string

The name of the app to push to in your shinyapps.io account

type: string

Qmd report template from which to create the pipeline report

required

type: string

default: ${projectDir}/assets/differentialabundance_report.qmd

pattern: ^\S+\.(Rmd|qmd|ipynb)$

Email address for completion summary.

type: string

pattern: ^([a-zA-Z0-9_\-\.]+)@([a-zA-Z0-9_\-\.]+)\.([a-zA-Z]{2,5})$

A logo to display in the report instead of the generic pipeline logo.

hidden required

type: string

default: ${projectDir}/docs/images/nf-core-differentialabundance_logo_light.png

CSS to use to style the output, in lieu of the default nf-core styling

hidden required

type: string

default: ${projectDir}/assets/nf-core_style.css

A markdown file containing citations to include in the final report

type: string

default: ${projectDir}/CITATIONS.md

A title for reporting outputs

type: string

An author for reporting outputs

type: string

Semicolon-separated string of contributor info that should be listed in the report.

type: string

A description for reporting outputs

type: string

Whether to generate a scree plot in the report

type: boolean

default: true

Skip generation of reports

type: boolean

Reference genome related files and options required for the workflow.

Name of iGenomes reference.

type: string

Do not load the iGenomes reference config.

hidden

type: boolean

The base path to the igenomes reference files

hidden

type: string

default: s3://ngi-igenomes/igenomes/

Parameters used to describe centralised config profiles. These should not be edited.

Git commit id for Institutional configs.

hidden

type: string

default: master

Base directory for Institutional configs.

hidden

type: string

default: https://raw.githubusercontent.com/nf-core/configs/master

Institutional config name.

hidden

type: string

Institutional config description.

hidden

type: string

Institutional config contact information.

hidden

type: string

Institutional config URL link.

hidden

type: string

Less common options for the pipeline, typically set in a config file.

Display version and exit.

hidden

type: boolean

Method used to save pipeline results to output directory.

type: string

Email address for completion summary, only when pipeline fails.

type: string

pattern: ^([a-zA-Z0-9_\-\.]+)@([a-zA-Z0-9_\-\.]+)\.([a-zA-Z]{2,5})$

Send plain-text email instead of HTML.

hidden

type: boolean

Do not use coloured log outputs.

hidden

type: boolean

Incoming hook URL for messaging service

hidden

type: string

Whether to validate parameters against the schema at runtime

type: boolean

default: true

Base URL or local path to location of pipeline test dataset files

hidden

type: string

default: https://raw.githubusercontent.com/nf-core/test-datasets/

Suffix to add to the trace report filename. Default is the date and time in the format yyyy-MM-dd_HH-mm-ss.

hidden

type: string

Display the help message.

type: boolean,string

Display the full detailed help message.

type: boolean

Display hidden parameters in the help message (only works when --help or --help_full are provided).

type: boolean
