BCCDC-PHL auto-ncov

auto_ncov.core

This module includes the core functionality for finding input datasets and running analyses.

auto_ncov.core.analyze_run(config: dict[str, object], run: dict[str, object]) → None

Initiate an analysis on one directory of fastq files. We assume that the directory of fastq files is named using a sequencing run ID.

Runs the pipeline as defined in the config, with parameters configured for the run to be analyzed. Skips any analyses that have already been initiated (whether completed or not).

Some pipelines may specify, through their ‘dependencies’ config, that they depend on the outputs of another pipeline. For those pipelines, we confirm that all of the upstream analyses they depend on are complete. If any are not, the analysis is skipped.

Parameters
  • config (dict[str, object]) – Application config. Required keys: [“analysis_output_dir”, “analysis_work_dir”, “pipelines”]. Optional keys: [“send_notification_emails”, “notification_email_addresses”]

  • run (dict[str, object]) – Sequencing run to analyze. Required keys: [“run_id”, “fastq_directory”, “analysis_parameters”]

Returns

None

Return type

NoneType
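
As an illustration of the two dictionaries described above, the following sketch builds a minimal config and run and checks that the required keys are present. It is not part of auto_ncov: `missing_keys` is a hypothetical helper, and every value (paths, run ID, the empty `pipelines` entry) is a placeholder chosen for the example.

```python
# Required keys, as listed in the parameter descriptions above.
CONFIG_REQUIRED = ("analysis_output_dir", "analysis_work_dir", "pipelines")
RUN_REQUIRED = ("run_id", "fastq_directory", "analysis_parameters")


def missing_keys(mapping: dict[str, object], required: tuple[str, ...]) -> list[str]:
    """Hypothetical helper: return the required keys that are absent from `mapping`."""
    return [key for key in required if key not in mapping]


# Placeholder values, for illustration only.
config = {
    "analysis_output_dir": "/path/to/analysis_by_run",
    "analysis_work_dir": "/path/to/work",
    "pipelines": [],  # the structure of a pipeline entry is not shown here
    "send_notification_emails": False,  # optional key
}
run = {
    "run_id": "220101_M00123_0001_000000000-AAAAA",  # made-up run ID
    "fastq_directory": "/path/to/fastq_by_run/220101_M00123_0001_000000000-AAAAA",
    "analysis_parameters": {},
}

if not missing_keys(config, CONFIG_REQUIRED) and not missing_keys(run, RUN_REQUIRED):
    print("config and run contain all required keys")
```

Checking the keys up front gives a clear message about a malformed config before any pipeline is launched.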

auto_ncov.core.check_analysis_dependencies_complete(pipeline: dict[str, object], analysis: dict[str, object], analysis_run_output_dir: str) → bool

Check that all of the entries in the pipeline’s dependencies config have completed. If so, return True. Return False otherwise.

Pipeline completion is determined by the presence of an analysis_complete.json file in the analysis output directory.

Parameters
  • pipeline (dict[str, object]) –

  • analysis (dict[str, object]) –

  • analysis_run_output_dir (str) –

Returns

Whether or not all of the pipelines listed in dependencies have completed.

Return type

bool
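
The completion check can be sketched as follows. This is a simplified illustration, not the library's implementation: how the real function derives each dependency's output directory from `pipeline`, `analysis` and `analysis_run_output_dir` is not documented here, so the hypothetical `dependencies_complete` takes the directories directly.

```python
import os


def dependencies_complete(dependency_output_dirs: list[str]) -> bool:
    """Return True only if every directory contains analysis_complete.json.

    Assumption: the caller has already resolved the output directory of
    each upstream pipeline. An empty list counts as complete.
    """
    return all(
        os.path.exists(os.path.join(output_dir, "analysis_complete.json"))
        for output_dir in dependency_output_dirs
    )
```

Because `all()` of an empty sequence is True, a pipeline with no dependencies is treated as ready to run in this sketch.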

auto_ncov.core.find_fastq_dirs(config, check_symlinks_complete=True) → Iterator[Optional[dict[str, object]]]

Iterate over the contents of fastq_by_run_dir from config and identify directories whose names match the Illumina run ID format. If check_symlinks_complete is True, also check that a symlinks_complete.json file exists in the directory. Any directory whose name appears on the list of excluded runs is skipped.

Parameters

config (dict[str, object]) – Application config. Required keys: [“fastq_by_run_dir”, “excluded_runs”]. Optional keys: [“analyze_runs_in_reverse_order”]

Returns

A run directory to analyze, or None. If not None, keys are: [“run_id”, “fastq_directory”, “analysis_parameters”]

Return type

Iterator[Optional[dict[str, object]]]
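
A minimal sketch of this directory scan is shown below. Several details are assumptions made for the example rather than documented behaviour: the regular expression used for the run ID format, `excluded_runs` being a collection of directory names, the sort order, and the empty `analysis_parameters`. The function name `find_run_dirs` is hypothetical, and unlike the documented return type this sketch never yields None.

```python
import os
import re
from typing import Iterator

# Assumed run ID pattern: 6-digit date, instrument ID, run counter, flowcell ID.
RUN_ID_REGEX = re.compile(r"^\d{6}_[A-Z0-9]+_\d+_[A-Z0-9-]+$")


def find_run_dirs(config: dict[str, object], check_symlinks_complete: bool = True) -> Iterator[dict[str, object]]:
    """Yield one dictionary per run directory that passes all checks."""
    fastq_by_run_dir = str(config["fastq_by_run_dir"])
    # Assumption: excluded_runs is a collection of run directory names.
    excluded = set(config.get("excluded_runs", []))
    reverse = bool(config.get("analyze_runs_in_reverse_order", False))

    for name in sorted(os.listdir(fastq_by_run_dir), reverse=reverse):
        path = os.path.join(fastq_by_run_dir, name)
        if not os.path.isdir(path) or not RUN_ID_REGEX.match(name):
            continue  # not a run directory
        if name in excluded:
            continue  # explicitly excluded
        marker = os.path.join(path, "symlinks_complete.json")
        if check_symlinks_complete and not os.path.exists(marker):
            continue  # fastq symlinks are not finished yet
        yield {
            "run_id": name,
            "fastq_directory": path,
            "analysis_parameters": {},  # contents not documented here
        }
```

Writing the scan as a generator lets the caller start analyzing the first run before the rest of the directory has been inspected.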

auto_ncov.core.scan(config: dict[str, object]) → Iterator[Optional[dict[str, object]]]

Scan for all existing runs and pass them along to be analyzed.

Parameters

config (dict[str, object]) – Application config.

Returns

A run directory to analyze, or None. If not None, keys are: [“run_id”, “fastq_directory”, “analysis_parameters”]

Return type

Iterator[Optional[dict[str, object]]]

auto_ncov.config

This module includes the functionality for loading the application config.

auto_ncov.config.load_config(config_path: str) → dict[str, object]

Load the application config file.

Parameters

config_path (str) – Path to config file.

Returns

A dictionary containing configuration data.

Return type

dict[str, object]
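
The sketch below shows one way such a loader could look. The config file format is not stated in this reference, so JSON is an assumption, and `load_json_config` is a hypothetical name rather than the function documented above.

```python
import json


def load_json_config(config_path: str) -> dict[str, object]:
    """Read a JSON file and return its top-level object as a dictionary.

    Assumption: the config file is JSON with an object at the top level.
    """
    with open(config_path, "r") as f:
        config = json.load(f)
    if not isinstance(config, dict):
        raise ValueError(f"Expected a JSON object at the top level of {config_path}")
    return config
```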

auto_ncov.metadata

This module includes methods used to collect metadata necessary for running ncov-tools.

auto_ncov.metadata.collect_run_metadata(config: dict[str, object], run_id: str) → list[dict[str, object]]

Collect the metadata needed by ncov-tools (Ct score, collection date) for a specific run. Metadata is collected from a pre-generated .csv file that includes metadata for all libraries.

Parameters
  • config (dict[str, object]) – Application config. Required keys: [“fastq_by_run_dir”, “metadata_file”].

  • run_id (str) – The identifier for the run whose metadata is to be collected.

Returns

A list of dictionaries representing metadata for the libraries on the current run. Keys: [“sample”, “ct”, “date”]

Return type

list[dict[str, object]]

auto_ncov.metadata.combine_ct_values(metadata: dict[str, dict[str, object]]) → dict[str, dict[str, object]]

Take the available Ct values and select one based on a pre-defined order of preference.

Parameters

metadata (dict[str, dict[str, object]]) – Dictionary of all available metadata, indexed by container ID.

Returns

The same metadata that is passed as input, with an additional ct_combo field on each metadata entry.

Return type

dict[str, dict[str, object]]
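
Selecting a value by order of preference can be sketched as below. The field names in `CT_FIELDS_BY_PREFERENCE` are invented for the example, since the actual Ct fields and their order are not listed in this reference. `add_ct_combo` is likewise a hypothetical name.

```python
# Hypothetical field names, listed from most to least preferred.
CT_FIELDS_BY_PREFERENCE = ("ct_assay_a", "ct_assay_b", "ct_assay_c")


def add_ct_combo(metadata: dict[str, dict[str, object]]) -> dict[str, dict[str, object]]:
    """Add a ct_combo field to each entry, using the first non-empty Ct value."""
    for entry in metadata.values():
        entry["ct_combo"] = None  # stays None when no Ct value is available
        for field in CT_FIELDS_BY_PREFERENCE:
            value = entry.get(field)
            if value not in (None, ""):
                entry["ct_combo"] = value
                break
    return metadata
```

In this sketch the input dictionary is modified in place and also returned, which matches the description of the result as the same metadata with one additional field.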

auto_ncov.metadata.get_run_library_ids(config: dict[str, object], run_id: str) → list[str]

Iterate over fastq files in a directory. Identify all library IDs for the fastq files in the run. If an Undetermined fastq file is present, exclude it.

Parameters
  • config (dict[str, object]) – Application config. Required keys: [“fastq_by_run_dir”].

  • run_id (str) – The identifier for the run whose library IDs are to be collected.

Returns

A list of library IDs for the run.

Return type

list[str]
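
The following sketch illustrates the idea on a single directory. It assumes that the library ID is the part of each fastq filename before the first underscore (as in `<library>_S1_L001_R1_001.fastq.gz`). That naming rule is an assumption for the example, and `library_ids_from_fastq_dir` is a hypothetical name.

```python
import os


def library_ids_from_fastq_dir(fastq_dir: str) -> list[str]:
    """Return the sorted, de-duplicated library IDs found in `fastq_dir`."""
    library_ids = set()
    for filename in os.listdir(fastq_dir):
        if not filename.endswith((".fastq", ".fastq.gz")):
            continue  # ignore anything that is not a fastq file
        # Assumption: the library ID precedes the first underscore.
        library_id = filename.split("_")[0]
        if library_id == "Undetermined":
            continue  # reads that could not be assigned to a library
        library_ids.add(library_id)
    return sorted(library_ids)
```

A set is used because paired-end runs typically have two files (R1 and R2) per library, and each library should be reported once.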

auto_ncov.metadata.load_metadata(config: dict[str, object]) → dict[str, dict[str, object]]

Load metadata from a pre-populated metadata csv file.

Parameters

config (dict[str, object]) – Application config. Required keys: [“metadata_file”].

Returns

All available metadata, indexed by container ID.

Return type

dict[str, dict[str, object]]
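
Indexing csv rows by container ID could look like the sketch below. The column name `container_id` is an assumption, since the layout of the metadata file is not described in this reference, and `load_metadata_csv` is a hypothetical name.

```python
import csv


def load_metadata_csv(metadata_file: str, id_column: str = "container_id") -> dict[str, dict[str, object]]:
    """Read a csv file with a header row and index its rows by `id_column`."""
    metadata: dict[str, dict[str, object]] = {}
    with open(metadata_file, newline="") as f:
        for row in csv.DictReader(f):
            # If a container ID appears more than once, the last row wins.
            metadata[row[id_column]] = dict(row)
    return metadata
```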

auto_ncov.metadata.select_run_metadata(all_metadata: dict[str, dict[str, object]], run_library_ids) → list[dict[str, object]]

Given all available metadata and a list of library IDs for the current run, select only the metadata for the current run. Any library ID starting with POS or NEG will have null metadata (represented with NA), as these represent positive and negative controls.

Parameters

all_metadata (dict[str, dict[str, object]]) – Dictionary, where keys are container ID and values are dictionary with keys: [“ct_combo”, “collection_date”]

Returns

A list of dictionaries representing metadata for the libraries on the current run. Keys: [“sample”, “ct”, “date”]

Return type

list[dict[str, object]]
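
A sketch of the selection step follows. Two points are assumptions for the example and not documented behaviour: library IDs are looked up directly as container IDs, and a library with no metadata entry is also reported as NA. The name `select_for_run` is hypothetical.

```python
def select_for_run(all_metadata: dict[str, dict[str, object]], run_library_ids: list[str]) -> list[dict[str, object]]:
    """Build one {"sample", "ct", "date"} record per library on the run."""
    selected = []
    for library_id in run_library_ids:
        # Positive and negative controls carry no metadata.
        if library_id.startswith(("POS", "NEG")):
            selected.append({"sample": library_id, "ct": "NA", "date": "NA"})
            continue
        # Assumption: the library ID is used as the container ID key.
        entry = all_metadata.get(library_id, {})
        selected.append({
            "sample": library_id,
            "ct": entry.get("ct_combo", "NA"),
            "date": entry.get("collection_date", "NA"),
        })
    return selected


records = select_for_run(
    {"C1": {"ct_combo": "24.1", "collection_date": "2022-01-01"}},
    ["C1", "POS-1", "NEG-1"],
)
for record in records:
    print(record)
```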