BCCDC-PHL auto-ncov
auto_ncov.core
This module includes the core functionality for finding input datasets and running analyses.
- auto_ncov.core.analyze_run(config: dict[str, object], run: dict[str, object]) -> None
Initiate an analysis on one directory of fastq files. We assume that the directory of fastq files is named using a sequencing run ID.
Runs the pipeline as defined in the config, with parameters configured for the run to be analyzed. Skips any analyses that have already been initiated (whether completed or not).
Some pipelines may specify that they depend on the outputs of another through their ‘dependencies’ config. For those pipelines, we confirm that all of the upstream analyses that we depend on are complete, or the analysis will be skipped.
- Parameters
config (dict[str, object]) – Application config. Required keys: [“analysis_output_dir”, “analysis_work_dir”, “pipelines”]. Optional keys: [“send_notification_emails”, “notification_email_addresses”]
run (dict[str, object]) – Sequencing run to analyze. Required keys: [“run_id”, “fastq_directory”, “analysis_parameters”]
- Returns
None
- Return type
NoneType
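As an illustration of the inputs described above, here is a minimal sketch of a config and a run dictionary, with a check that the required keys are present. The helper `missing_keys` is hypothetical and not part of auto_ncov, all values are placeholders, and the value types chosen for "pipelines" and "analysis_parameters" are assumptions.

```python
# Required keys as listed in the parameter descriptions above.
REQUIRED_CONFIG_KEYS = ("analysis_output_dir", "analysis_work_dir", "pipelines")
REQUIRED_RUN_KEYS = ("run_id", "fastq_directory", "analysis_parameters")


def missing_keys(mapping, required):
    """Hypothetical helper: return the required keys absent from mapping, in order."""
    return [key for key in required if key not in mapping]


# Placeholder values only; the types of "pipelines" and
# "analysis_parameters" are assumed, not taken from auto_ncov.
config = {
    "analysis_output_dir": "/path/to/analysis_output",
    "analysis_work_dir": "/path/to/analysis_work",
    "pipelines": [],
    "send_notification_emails": False,  # optional key
}
run = {
    "run_id": "EXAMPLE_RUN_ID",
    "fastq_directory": "/path/to/fastq_by_run/EXAMPLE_RUN_ID",
    "analysis_parameters": {},
}

config_problems = missing_keys(config, REQUIRED_CONFIG_KEYS)
run_problems = missing_keys(run, REQUIRED_RUN_KEYS)
```

Both lists are empty here, which is the condition under which the dictionaries could reasonably be passed on to analyze_run.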
- auto_ncov.core.check_analysis_dependencies_complete(pipeline: dict[str, object], analysis: dict[str, object], analysis_run_output_dir: str) -> bool
Check that all of the entries in the pipeline’s dependencies config have completed. If so, return True. Return False otherwise.
Pipeline completion is determined by the presence of an analysis_complete.json file in the analysis output directory.
- Parameters
pipeline (dict[str, object]) –
analysis (dict[str, object]) –
analysis_run_output_dir (str) –
- Returns
Whether or not all of the pipelines listed in dependencies have completed.
- Return type
bool
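The completion check described above can be sketched roughly as follows. This is not the auto_ncov implementation: the function name `dependencies_complete` is hypothetical, and it assumes the caller has already resolved each upstream pipeline to its output directory.

```python
import os
import tempfile


def dependencies_complete(dependency_output_dirs):
    """Return True only if every directory contains analysis_complete.json."""
    return all(
        os.path.isfile(os.path.join(directory, "analysis_complete.json"))
        for directory in dependency_output_dirs
    )


def demo():
    """Check one upstream directory before and after the marker file is written."""
    with tempfile.TemporaryDirectory() as tmp:
        upstream = os.path.join(tmp, "upstream_pipeline")
        os.makedirs(upstream)
        before = dependencies_complete([upstream])
        with open(os.path.join(upstream, "analysis_complete.json"), "w") as f:
            f.write("{}")
        after = dependencies_complete([upstream])
    return before, after
```

Because `all()` of an empty sequence is True, a pipeline with no dependencies is treated as ready in this sketch.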
- auto_ncov.core.find_fastq_dirs(config, check_symlinks_complete=True) -> Iterator[Optional[dict[str, object]]]
Iterate over the contents of fastq_by_run_dir from the config and identify directories whose names match the Illumina run ID format. If check_symlinks_complete is True, also require that a symlinks_complete.json file exists in the directory. Any directory whose name appears on the list of excluded runs is skipped.
- Parameters
config (dict[str, object]) – Application config. Required keys: [“fastq_by_run_dir”, “excluded_runs”]. Optional keys: [“analyze_runs_in_reverse_order”]
- Returns
A run directory to analyze, or None. If not None, keys are: [“run_id”, “fastq_directory”, “analysis_parameters”]
- Return type
Iterator[Optional[dict[str, object]]]
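A minimal sketch of the directory scan described above. The regular expression is an assumed approximation of the Illumina run ID layout (date, instrument ID, run counter, flowcell ID), the function name `find_run_dirs` is hypothetical, and the "analysis_parameters" key is left out because its contents are not described here.

```python
import os
import re
import tempfile

# Assumed pattern: 6-digit date, instrument ID, run counter, flowcell ID.
RUN_ID_REGEX = re.compile(r"^\d{6}_[A-Z0-9]+_\d+_[A-Z0-9-]+$")


def find_run_dirs(fastq_by_run_dir, excluded_runs, check_symlinks_complete=True):
    """Yield one dict per directory that looks like a usable sequencing run."""
    for entry in sorted(os.listdir(fastq_by_run_dir)):
        path = os.path.join(fastq_by_run_dir, entry)
        if not os.path.isdir(path):
            continue  # plain files are ignored
        if not RUN_ID_REGEX.match(entry):
            continue  # name does not look like a run ID
        if entry in excluded_runs:
            continue  # explicitly excluded
        marker = os.path.join(path, "symlinks_complete.json")
        if check_symlinks_complete and not os.path.isfile(marker):
            continue  # fastq symlinking has not finished
        yield {"run_id": entry, "fastq_directory": path}
```

Writing the scan as a generator lets the caller begin analyzing the first run without waiting for the whole directory to be inspected.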
- auto_ncov.core.scan(config: dict[str, object]) -> Iterator[Optional[dict[str, object]]]
Scanning involves looking for all existing runs, and passing them along to be analyzed.
- Parameters
config (dict[str, object]) – Application config.
- Returns
A run directory to analyze, or None. If not None, keys are: [“run_id”, “fastq_directory”, “analysis_parameters”]
- Return type
Iterator[Optional[dict[str, object]]]
auto_ncov.config
This module includes functionality for loading the application config.
- auto_ncov.config.load_config(config_path: str) -> dict[str, object]
Load the application config file.
- Parameters
config_path (str) – Path to config file.
- Returns
A dictionary containing configuration data.
- Return type
dict[str, object]
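The on-disk format of the config file is not stated here. Assuming it is JSON, loading it into a dictionary could look like the sketch below; `load_json_config` is a hypothetical stand-in, not the real load_config, and the keys written by the demo are placeholders.

```python
import json
import os
import tempfile


def load_json_config(config_path):
    """Read a JSON file and return its contents as a dictionary (assumed format)."""
    with open(config_path) as f:
        return json.load(f)


def demo():
    """Write a small placeholder config to a temporary file and load it back."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "config.json")
        with open(path, "w") as f:
            json.dump({"fastq_by_run_dir": "/path/to/fastq_by_run", "excluded_runs": []}, f)
        return load_json_config(path)
```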
auto_ncov.metadata
This module includes methods used to collect metadata necessary for running ncov-tools.
- auto_ncov.metadata.collect_run_metadata(config: dict[str, object], run_id: str) -> list[dict[str, object]]
Collect the metadata needed by ncov-tools (Ct score, collection date) for a specific run. Metadata is collected from a pre-generated .csv file that includes metadata for all libraries.
- Parameters
config (dict[str, object]) – Application config. Required keys: [“fastq_by_run_dir”, “metadata_file”].
run_id (str) – The identifier for the run whose metadata is to be collected.
- Returns
A list of dictionaries representing metadata for the libraries on the current run. Keys: [“sample”, “ct”, “date”]
- Return type
list[dict[str, object]]
- auto_ncov.metadata.combine_ct_values(metadata: dict[str, dict[str, object]]) -> dict[str, dict[str, object]]
Take the available Ct values for each metadata entry and select one based on a pre-defined order of preference.
- Parameters
metadata (dict[str, dict[str, object]]) – Dictionary of all available metadata, indexed by container ID.
- Returns
The same metadata that is passed as input, with an additional ct_combo field on each metadata entry.
- Return type
dict[str, dict[str, object]]
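Selecting one Ct value by order of preference can be sketched as follows. The field names in `CT_PREFERENCE` and the function name `add_ct_combo` are hypothetical; the actual Ct fields and their ordering are not given here.

```python
# Hypothetical Ct field names, most preferred first.
CT_PREFERENCE = ("ct_e_gene", "ct_rdrp_gene", "ct_n_gene")


def add_ct_combo(metadata, preference=CT_PREFERENCE):
    """Add a ct_combo field to each entry: the first non-missing preferred Ct value."""
    for record in metadata.values():
        record["ct_combo"] = next(
            (record[field] for field in preference if record.get(field) is not None),
            None,  # no Ct value available for this entry
        )
    return metadata


example = {
    "C1": {"ct_e_gene": None, "ct_rdrp_gene": 24.5},
    "C2": {"ct_e_gene": 18.0, "ct_rdrp_gene": 30.1},
    "C3": {},
}
combined = add_ct_combo(example)
```

In this sketch the input dictionary is modified in place and returned, which matches the description of the return value as the same metadata with one extra field.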
- auto_ncov.metadata.get_run_library_ids(config: dict[str, object], run_id: str) -> list[str]
Iterate over fastq files in a directory. Identify all library IDs for the fastq files in the run. If an Undetermined fastq file is present, exclude it.
- Parameters
config (dict[str, object]) – Application config. Required keys: [“fastq_by_run_dir”].
run_id (str) – The identifier for the run whose library IDs are to be collected.
- Returns
A list of library IDs for the run.
- Return type
list[str]
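A rough sketch of extracting library IDs from fastq filenames. It assumes the library ID is the part of the filename before the first underscore, which is an assumption about the naming scheme rather than something stated here; `library_ids_from_filenames` is a hypothetical name.

```python
def library_ids_from_filenames(filenames):
    """Return the sorted, de-duplicated library IDs found in fastq filenames."""
    library_ids = set()
    for name in filenames:
        if not name.endswith((".fastq", ".fastq.gz")):
            continue  # not a fastq file
        # Assumption: the library ID precedes the first underscore.
        library_id = name.split("_")[0]
        if library_id == "Undetermined":
            continue  # reads that could not be assigned to a library
        library_ids.add(library_id)
    return sorted(library_ids)


example_files = [
    "A123_S1_L001_R1_001.fastq.gz",
    "A123_S1_L001_R2_001.fastq.gz",
    "NEG1_S2_L001_R1_001.fastq.gz",
    "Undetermined_S0_L001_R1_001.fastq.gz",
    "notes.txt",
]
example_ids = library_ids_from_filenames(example_files)
```

Using a set means the R1 and R2 files of a paired-end library contribute a single ID.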
- auto_ncov.metadata.load_metadata(config: dict[str, object]) -> dict[str, dict[str, object]]
Load metadata from a pre-populated metadata .csv file.
- Parameters
config (dict[str, object]) – Application config. Required keys: [“metadata_file”].
- Returns
All available metadata, indexed by container ID
- Return type
dict[str, dict[str, object]]
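Indexing rows of a .csv file by container ID might look like the sketch below. The column name `container_id` and the other column headers are assumptions about the metadata file layout, and `index_by_container_id` is a hypothetical name.

```python
import csv
import io


def index_by_container_id(csv_file, id_column="container_id"):
    """Read rows from an open .csv file and key them by the assumed ID column."""
    return {row[id_column]: row for row in csv.DictReader(csv_file)}


# In-memory stand-in for the metadata file; the headers are assumed.
example_csv = io.StringIO(
    "container_id,collection_date,ct_combo\n"
    "C1,2021-01-05,24.5\n"
    "C2,2021-01-06,18.0\n"
)
all_metadata = index_by_container_id(example_csv)
```

Note that `csv.DictReader` returns every value as a string, so numeric fields such as Ct values would need an explicit conversion.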
- auto_ncov.metadata.select_run_metadata(all_metadata: dict[str, dict[str, object]], run_library_ids) -> list[dict[str, object]]
Given all available metadata and a list of library IDs for the current run, select only the metadata for the current run. Any library ID starting with POS or NEG will have null metadata (represented with NA), as these represent positive and negative controls.
- Parameters
all_metadata (dict[str, dict[str, object]]) – Dictionary, where keys are container IDs and values are dictionaries with keys: [“ct_combo”, “collection_date”]
run_library_ids – List of library IDs for the current run.
- Returns
A list of dictionaries representing metadata for the libraries on the current run. Keys: [“sample”, “ct”, “date”]
- Return type
list[dict[str, object]]
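The selection step described above can be sketched as follows. Two assumptions are made that are not stated here: library IDs are looked up directly as container IDs, and libraries with no metadata entry also receive NA. The name `select_for_run` is hypothetical.

```python
def select_for_run(all_metadata, run_library_ids):
    """Build one {sample, ct, date} record per library on the run."""
    selected = []
    for library_id in run_library_ids:
        # Positive and negative controls carry no sample metadata.
        is_control = library_id.startswith(("POS", "NEG"))
        # Assumption: the library ID doubles as the container ID.
        record = {} if is_control else all_metadata.get(library_id, {})
        selected.append({
            "sample": library_id,
            "ct": record.get("ct_combo", "NA"),
            "date": record.get("collection_date", "NA"),
        })
    return selected


example_metadata = {"C1": {"ct_combo": 24.5, "collection_date": "2021-01-05"}}
example_selection = select_for_run(example_metadata, ["C1", "NEG1", "POS1"])
```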