BCCDC-PHL auto-ncov


auto_ncov.core

This module includes the core functionality for finding input datasets and running analyses.

auto_ncov.core.analyze_run(config: dict[str, object], run: dict[str, object]) → None

Initiate an analysis on one directory of fastq files. We assume that the directory of fastq files is named using a sequencing run ID.

Runs each pipeline defined in the config, with parameters configured for the run being analyzed. Skips any analyses that have already been initiated (whether completed or not).

Some pipelines may specify, through their ‘dependencies’ config, that they depend on the outputs of another pipeline. For those pipelines, we confirm that all of the upstream analyses they depend on are complete; otherwise, the analysis is skipped.

Parameters

config (dict[str, object]) – Application config. Required keys: [“analysis_output_dir”, “analysis_work_dir”, “pipelines”]. Optional keys: [“send_notification_emails”, “notification_email_addresses”]
run (dict[str, object]) – Sequencing run to analyze. Required keys: [“run_id”, “fastq_directory”, “analysis_parameters”]

Returns

None

Return type

NoneType
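The skip rules above can be sketched as a filter over the configured pipelines. This is a minimal sketch, not the auto-ncov implementation: it assumes each pipeline config has a hypothetical ‘name’ key, and it receives the set of already-initiated pipeline names and a dependency-check callable as arguments instead of deriving them from the analysis output directory.

```python
from typing import Any, Callable, Iterable


def pipelines_to_launch(
    pipelines: Iterable[dict[str, Any]],
    initiated: set[str],
    dependencies_complete: Callable[[dict[str, Any]], bool],
) -> list[dict[str, Any]]:
    """Return the pipelines that should be started for one run.

    Assumption: each pipeline config has a hypothetical 'name' key.
    """
    to_launch = []
    for pipeline in pipelines:
        name = str(pipeline["name"])
        # Skip analyses that were already initiated, whether or not they finished.
        if name in initiated:
            continue
        # Pipelines with a 'dependencies' config wait until every upstream analysis is complete.
        if pipeline.get("dependencies") and not dependencies_complete(pipeline):
            continue
        to_launch.append(pipeline)
    return to_launch
```

Because already-initiated analyses are skipped even if they never finished, a failed analysis would need its output removed before it is retried.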

auto_ncov.core.check_analysis_dependencies_complete(pipeline: dict[str, object], analysis: dict[str, object], analysis_run_output_dir: str) → bool

Check that all of the entries in the pipeline’s dependencies config have completed. If so, return True. Return False otherwise.

Pipeline completion is determined by the presence of an analysis_complete.json file in the analysis output directory.

Parameters

pipeline (dict[str, object]) –
analysis (dict[str, object]) –
analysis_run_output_dir (str) –

Returns

Whether or not all of the pipelines listed in dependencies have completed.

Return type

bool
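The completion rule (an analysis_complete.json file in each dependency's output directory) can be sketched as follows. This sketch assumes the caller has already resolved each dependency to its output directory under analysis_run_output_dir; how auto-ncov names those directories is not described here, so that mapping is left out.

```python
import os


def dependencies_complete(dependency_output_dirs: list[str]) -> bool:
    """Return True if every dependency output directory contains analysis_complete.json."""
    return all(
        os.path.isfile(os.path.join(output_dir, "analysis_complete.json"))
        for output_dir in dependency_output_dirs
    )
```

Note that all() over an empty list is True, so a pipeline with no dependencies is treated as ready to run.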

auto_ncov.core.find_fastq_dirs(config, check_symlinks_complete=True) → Iterator[Optional[dict[str, object]]]

Iterate over the contents of the fastq_by_run_dir directory from the config, and identify directories whose names match the Illumina run ID format. If check_symlinks_complete is True, only include directories that contain a symlinks_complete.json file. Also exclude any directory whose name appears in the list of excluded runs.

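Parameters: config (dict[str, object]) – Application config. Required keys: [“fastq_by_run_dir”, “excluded_runs”]. Optional keys: [“analyze_runs_in_reverse_order”]
check_symlinks_complete (bool) – If True (the default), only yield run directories that contain a symlinks_complete.json file.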
Returns: A run directory to analyze, or None. If not None, keys are: [“run_id”, “fastq_directory”, “analysis_parameters”]
Return type: Iterator[Optional[dict[str, object]]]
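A sketch of the directory scan described above. The run ID regex is an approximation of the common Illumina format (for example 220101_M01234_0123_000000000-ABCDE) and may not match the pattern auto-ncov actually uses. The sketch also assumes that runs are visited in sorted name order, which analyze_runs_in_reverse_order flips, and it leaves analysis_parameters as an empty placeholder because its contents are not documented here.

```python
import os
import re
from typing import Any, Iterator

# Approximate Illumina run ID: YYMMDD date, instrument ID, run number, flowcell ID.
ILLUMINA_RUN_ID_REGEX = re.compile(r"^\d{6}_[A-Z0-9]+_\d+_[A-Z0-9-]+$")


def find_fastq_dirs_sketch(
    config: dict[str, Any], check_symlinks_complete: bool = True
) -> Iterator[dict[str, Any]]:
    fastq_by_run_dir = str(config["fastq_by_run_dir"])
    excluded_runs = set(config["excluded_runs"])
    # Assumption: "reverse order" means reverse sorted order of directory names.
    reverse = bool(config.get("analyze_runs_in_reverse_order", False))
    for run_id in sorted(os.listdir(fastq_by_run_dir), reverse=reverse):
        run_dir = os.path.join(fastq_by_run_dir, run_id)
        # Only consider directories whose names look like Illumina run IDs.
        if not os.path.isdir(run_dir) or not ILLUMINA_RUN_ID_REGEX.match(run_id):
            continue
        if run_id in excluded_runs:
            continue
        # Skip runs whose fastq symlinks have not been marked complete.
        if check_symlinks_complete and not os.path.exists(
            os.path.join(run_dir, "symlinks_complete.json")
        ):
            continue
        yield {
            "run_id": run_id,
            "fastq_directory": os.path.abspath(run_dir),
            # Placeholder: how analysis_parameters is populated is not documented here.
            "analysis_parameters": {},
        }
```

Since Illumina run IDs start with the date, sorting by name approximates chronological order.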

auto_ncov.core.scan(config: dict[str, object]) → Iterator[Optional[dict[str, object]]]

Scanning involves looking for all existing runs and passing them along to be analyzed.

Parameters: config (dict[str, object]) – Application config.
Returns: A run directory to analyze, or None. If not None, keys are: [“run_id”, “fastq_directory”, “analysis_parameters”]
Return type: Iterator[Optional[dict[str, object]]]
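One way to connect scanning to analysis is a driver loop that hands each non-None run to the analysis step. This hypothetical sketch takes the scan and analysis functions as callables (so auto_ncov.core.scan and auto_ncov.core.analyze_run could be passed in); whether the package's own entry point is structured this way, or how often it rescans, is not stated in this reference.

```python
from typing import Any, Callable, Iterator, Optional

Run = dict[str, Any]


def scan_and_analyze(
    config: dict[str, Any],
    scan: Callable[[dict[str, Any]], Iterator[Optional[Run]]],
    analyze_run: Callable[[dict[str, Any], Run], None],
) -> int:
    """Analyze every run yielded by scan, skipping None values.

    Returns the number of runs that were passed along for analysis.
    """
    count = 0
    for run in scan(config):
        if run is None:
            continue
        analyze_run(config, run)
        count += 1
    return count
```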

auto_ncov.config

This module includes functionality for loading the application config.

auto_ncov.config.load_config(config_path: str) → dict[str, object]

Load the application config file.

Parameters: config_path (str) – Path to config file.
Returns: A dictionary containing configuration data.
Return type: dict[str, object]
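A minimal sketch of config loading, assuming the config file is JSON (the file format is not stated here). The missing_keys helper is hypothetical; it shows how the “Required keys” listed for each function in this reference could be checked up front.

```python
import json
from typing import Any


def load_config_sketch(config_path: str) -> dict[str, Any]:
    # Assumption: the config file is a JSON object.
    with open(config_path) as f:
        config = json.load(f)
    if not isinstance(config, dict):
        raise ValueError(f"Expected a JSON object at the top level of {config_path}")
    return config


def missing_keys(config: dict[str, Any], required: list[str]) -> list[str]:
    """Hypothetical helper: return the required keys that are absent from config."""
    return [key for key in required if key not in config]
```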

auto_ncov.metadata

This module includes methods used to collect metadata necessary for running ncov-tools.

auto_ncov.metadata.collect_run_metadata(config: dict[str, object], run_id: str) → list[dict[str, object]]

Collect the metadata needed by ncov-tools (Ct score, collection date) for a specific run. Metadata is collected from a pre-generated .csv file that includes metadata for all libraries.

Parameters

config (dict[str, object]) – Application config. Required keys: [“fastq_by_run_dir”, “metadata_file”].
run_id (str) – The identifier for the run whose metadata is to be collected.

Returns

A list of dictionaries representing metadata for the libraries on the current run. Keys: [“sample”, “ct”, “date”]

Return type

list[dict[str, object]]

auto_ncov.metadata.combine_ct_values(metadata: dict[str, dict[str, object]]) → dict[str, dict[str, object]]

For each metadata entry, take the available Ct values and select one based on a pre-defined order of preference.

Parameters: metadata (dict[str, dict[str, object]]) – Dictionary of all available metadata, indexed by container ID.
Returns: The same metadata that is passed as input, with an additional ct_combo field on each metadata entry.
Return type: dict[str, dict[str, object]]
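A sketch of preference-ordered selection. The Ct field names (ct_a, ct_b, ct_c) and their order are hypothetical placeholders, since the actual fields and preference order are not documented here. As described above, the input dictionary is updated in place with a ct_combo field and returned.

```python
from typing import Any, Optional, Sequence

# Hypothetical Ct field names, from most to least preferred.
CT_PREFERENCE = ("ct_a", "ct_b", "ct_c")
# Values treated as "no Ct value available".
MISSING = (None, "", "NA")


def combine_ct_values_sketch(
    metadata: dict[str, dict[str, Any]],
    preference: Sequence[str] = CT_PREFERENCE,
) -> dict[str, dict[str, Any]]:
    for entry in metadata.values():
        ct_combo: Optional[Any] = None
        # Take the first available value in order of preference.
        for field in preference:
            value = entry.get(field)
            if value not in MISSING:
                ct_combo = value
                break
        entry["ct_combo"] = ct_combo
    return metadata
```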

auto_ncov.metadata.get_run_library_ids(config: dict[str, object], run_id: str) → list[str]

Iterate over the fastq files in the run's directory and identify the library IDs present on the run. If an Undetermined fastq file is present, exclude it.

Parameters: config (dict[str, object]) – Application config. Required keys: [“fastq_by_run_dir”].
run_id (str) – The identifier for the run whose library IDs are to be collected.
Returns: A list of library IDs for the run.
Return type: list[str]
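A sketch of library ID extraction. For simplicity it takes the run's fastq directory directly instead of building it from fastq_by_run_dir and run_id, and it assumes standard Illumina file names of the form <library_id>_S1_L001_R1_001.fastq.gz, taking everything before the first underscore as the library ID. Library IDs that themselves contain underscores would be truncated under this assumption.

```python
import os


def get_run_library_ids_sketch(fastq_directory: str) -> list[str]:
    library_ids = set()
    for filename in os.listdir(fastq_directory):
        if not filename.endswith((".fastq", ".fastq.gz")):
            continue
        # Assumption: the library ID is the part of the file name before the first underscore.
        library_id = filename.split("_")[0]
        # Reads that could not be assigned to a library are not a library.
        if library_id == "Undetermined":
            continue
        # R1 and R2 files for the same library collapse into one ID.
        library_ids.add(library_id)
    return sorted(library_ids)
```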

auto_ncov.metadata.load_metadata(config: dict[str, object]) → dict[str, dict[str, object]]

Load metadata from a pre-populated metadata csv file.

Parameters: config (dict[str, object]) – Application config. Required keys: [“metadata_file”].
Returns: All available metadata, indexed by container ID
Return type: dict[str, dict[str, object]]
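A minimal sketch of loading a metadata CSV into a dictionary indexed by container ID, using the standard library's csv.DictReader. The column name container_id is a hypothetical placeholder; if the same container ID appears on several rows, the last row wins in this sketch.

```python
import csv
from typing import Any


def load_metadata_sketch(
    metadata_file: str, id_column: str = "container_id"
) -> dict[str, dict[str, Any]]:
    metadata: dict[str, dict[str, Any]] = {}
    with open(metadata_file, newline="") as f:
        # Each row becomes a dict keyed by the CSV header names.
        for row in csv.DictReader(f):
            metadata[row[id_column]] = row
    return metadata
```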

auto_ncov.metadata.select_run_metadata(all_metadata: dict[str, dict[str, object]], run_library_ids) → list[dict[str, object]]

Given all available metadata and a list of library IDs for the current run, select only the metadata for the current run. Any library ID starting with POS or NEG will have null metadata (represented with NA), as these represent positive and negative controls.

Parameters: all_metadata (dict[str, dict[str, object]]) – Dictionary where keys are container IDs and values are dictionaries with keys: [“ct_combo”, “collection_date”]
run_library_ids – List of library IDs for the current run.
Returns: A list of dictionaries representing metadata for the libraries on the current run. Keys: [“sample”, “ct”, “date”]
Return type: list[dict[str, object]]
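A sketch of the selection step. It assumes, hypothetically, that each library ID is also the container ID used as the metadata key, and that libraries with no metadata entry are reported as NA. Controls whose IDs start with POS or NEG always get NA, as described above.

```python
from typing import Any


def select_run_metadata_sketch(
    all_metadata: dict[str, dict[str, Any]],
    run_library_ids: list[str],
) -> list[dict[str, Any]]:
    selected = []
    for library_id in run_library_ids:
        # Positive and negative controls get null metadata, written as "NA".
        if library_id.startswith(("POS", "NEG")):
            selected.append({"sample": library_id, "ct": "NA", "date": "NA"})
            continue
        # Assumption: library ID == container ID; unknown libraries fall back to NA.
        entry = all_metadata.get(library_id, {})
        selected.append({
            "sample": library_id,
            "ct": entry.get("ct_combo") or "NA",
            "date": entry.get("collection_date") or "NA",
        })
    return selected
```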