BCCDC-PHL Auto IRIDA Azure Upload

auto_irida_azure_upload.core

Core functionality.

auto_irida_azure_upload.core.downsample_reads(config, run_id, samplesheet)

Downsample reads.

Parameters

config (dict[str, object]) – Application config.
run_id (str) – Sequencing run ID.
samplesheet (list[dict[str, str]]) – Downsampling samplesheet: [ { ID: str, R1: str, R2: str, GENOME_SIZE: str, COVERAGE: str } ]
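
The samplesheet structure above can be checked before it is passed on. The following is a minimal sketch, not part of the package: the helper name `missing_samplesheet_keys` and the example values (paths, genome size, coverage) are hypothetical, and the expected format of the GENOME_SIZE and COVERAGE strings is not specified in this reference.

```python
# Keys that each samplesheet record is documented to carry.
REQUIRED_KEYS = {"ID", "R1", "R2", "GENOME_SIZE", "COVERAGE"}


def missing_samplesheet_keys(samplesheet):
    """Return {record index: sorted missing keys} for incomplete records.

    Hypothetical helper for illustration only.
    """
    problems = {}
    for index, record in enumerate(samplesheet):
        missing = REQUIRED_KEYS - record.keys()
        if missing:
            problems[index] = sorted(missing)
    return problems


# Placeholder values; the real value formats are an assumption.
example_samplesheet = [
    {
        "ID": "sample-01",
        "R1": "/data/sample-01_R1.fastq.gz",
        "R2": "/data/sample-01_R2.fastq.gz",
        "GENOME_SIZE": "5.0",
        "COVERAGE": "100",
    },
    {"ID": "sample-02", "R1": "/data/sample-02_R1.fastq.gz"},  # incomplete record
]

problems = missing_samplesheet_keys(example_samplesheet)
```

Here the first record is complete and the second is reported with its missing keys.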

auto_irida_azure_upload.core.find_fastq(run, library_id, read_type)

Find the fastq file for a specific library on a specific run.

Parameters

run (dict[str, str]) – Sequencing run. Keys: [‘sequencing_run_id’, ‘path’, ‘instrument_type’]
library_id (str) – Library ID
read_type (str) – Read type (‘R1’ or ‘R2’)

Returns

Path to fastq file

Return type

Optional[str]
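
As an illustration of the lookup described above, here is a minimal sketch that returns a path or None. It is not the package's implementation: the function name `find_fastq_sketch` is hypothetical, and it assumes Illumina-style file names of the form `<library_id>_*_<read_type>_*.fastq.gz` located anywhere below the run's ‘path’.

```python
import glob
import os
import tempfile
from typing import Optional


def find_fastq_sketch(run: dict, library_id: str, read_type: str) -> Optional[str]:
    """Return the first matching fastq path below run['path'], or None.

    Assumption: files are named <library_id>_*_<read_type>_*.fastq.gz.
    """
    pattern = os.path.join(
        glob.escape(run["path"]),
        "**",
        f"{glob.escape(library_id)}_*_{read_type}_*.fastq.gz",
    )
    matches = sorted(glob.glob(pattern, recursive=True))
    return matches[0] if matches else None


# Small demonstration against a temporary directory tree.
with tempfile.TemporaryDirectory() as tmp:
    fastq_dir = os.path.join(tmp, "Data", "Intensities", "BaseCalls")
    os.makedirs(fastq_dir)
    for name in ("lib-01_S1_L001_R1_001.fastq.gz", "lib-01_S1_L001_R2_001.fastq.gz"):
        open(os.path.join(fastq_dir, name), "w").close()

    demo_run = {"sequencing_run_id": "demo-run", "path": tmp, "instrument_type": "miseq"}
    found_r1 = find_fastq_sketch(demo_run, "lib-01", "R1")
    found_missing = find_fastq_sketch(demo_run, "lib-99", "R1")
```

Returning None for a missing file mirrors the documented Optional[str] return type, so callers need to handle that case explicitly.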

auto_irida_azure_upload.core.find_run_dirs(config, check_upload_complete=True)

Find sequencing run directories under the ‘run_parent_dirs’ listed in the config.

Parameters

config (dict[str, object]) – Application config.
check_upload_complete (bool) – Check for presence of ‘upload_complete.json’ file.

Returns

Run directory. Keys: [‘sequencing_run_id’, ‘path’, ‘instrument_type’]

Return type

Iterator[Optional[dict[str, str]]]
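
The directory scan described above might look roughly like the sketch below. Several things here are assumptions rather than documented behaviour: the function name `find_run_dirs_sketch`, the run ID regular expressions (illustrative only), and the reading of `check_upload_complete` as "skip runs that lack ‘upload_complete.json’". Unlike the documented return type, this sketch never yields None.

```python
import os
import re
import tempfile
from typing import Iterator

# Illustrative run ID patterns; the patterns used by the package may differ.
RUN_ID_PATTERNS = {
    "miseq": re.compile(r"^\d{6}_M\d+_\d+_\d{9}-[A-Z0-9]+$"),
    "nextseq": re.compile(r"^\d{6}_VH\d+_\d+_[A-Z0-9]+$"),
}


def find_run_dirs_sketch(config: dict, check_upload_complete: bool = True) -> Iterator[dict]:
    """Yield one dict per run directory found under config['run_parent_dirs']."""
    for parent in config["run_parent_dirs"]:
        for entry in sorted(os.scandir(parent), key=lambda e: e.name):
            if not entry.is_dir():
                continue
            instrument_type = next(
                (name for name, pattern in RUN_ID_PATTERNS.items() if pattern.match(entry.name)),
                None,
            )
            if instrument_type is None:
                continue  # not a recognised run directory name
            marker = os.path.join(entry.path, "upload_complete.json")
            if check_upload_complete and not os.path.exists(marker):
                continue  # assumption: runs without the marker are skipped
            yield {
                "sequencing_run_id": entry.name,
                "path": entry.path,
                "instrument_type": instrument_type,
            }


# Demonstration with two run-like directories and one unrelated directory.
with tempfile.TemporaryDirectory() as tmp:
    miseq_dir = os.path.join(tmp, "220101_M00123_0001_000000000-ABCDE")
    nextseq_dir = os.path.join(tmp, "220102_VH00123_2_AAAB1234")
    for path in (miseq_dir, nextseq_dir, os.path.join(tmp, "notes")):
        os.makedirs(path)
    open(os.path.join(miseq_dir, "upload_complete.json"), "w").close()

    demo_config = {"run_parent_dirs": [tmp]}
    runs_checked = list(find_run_dirs_sketch(demo_config))
    runs_unchecked = list(find_run_dirs_sketch(demo_config, check_upload_complete=False))
```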

auto_irida_azure_upload.core.prepare_downsampling_samplesheet(config, run)

Prepare a SampleSheet to use for downsampling.

Parameters

config (dict[str, object]) – Application config.
run (dict[str, str]) – Sequencing run to prepare SampleSheet.csv file for. Keys: [‘sequencing_run_id’, ‘path’, ‘instrument_type’]

Returns

Downsampling samplesheet [ { ID: str, R1: str, R2: str, GENOME_SIZE: str, COVERAGE: str } ]

Return type

list[dict[str, str]]

auto_irida_azure_upload.core.prepare_samplelist(config, run, downsampled_reads={})

Prepare a SampleList for a specific run.

Parameters

config (dict[str, object]) – Application config.
run (dict[str, str]) – Sequencing run to prepare SampleList.csv file for. Keys: [‘sequencing_run_id’, ‘path’, ‘instrument_type’]

Returns

List of samples to upload. Keys: [‘Sample_Name’, ‘Project_ID’, ‘File_Forward’, ‘File_Forward_Absolute_Path’, ‘File_Reverse’, ‘File_Reverse_Absolute_Path’]

Return type

list[dict[str, str]]
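
To make the shape of one returned record concrete, the sketch below builds a single sample entry with the keys listed above. The helper name `build_sample_record` and the example paths are hypothetical, and the sketch assumes that ‘File_Forward’ and ‘File_Reverse’ hold bare file names while the ‘…_Absolute_Path’ keys hold full paths, which the key names suggest but this reference does not state.

```python
import os


def build_sample_record(sample_name, project_id, forward_path, reverse_path):
    """Build one sample-list record (hypothetical helper for illustration)."""
    return {
        "Sample_Name": sample_name,
        "Project_ID": project_id,
        # Assumption: the short keys hold file names only.
        "File_Forward": os.path.basename(forward_path),
        "File_Forward_Absolute_Path": os.path.abspath(forward_path),
        "File_Reverse": os.path.basename(reverse_path),
        "File_Reverse_Absolute_Path": os.path.abspath(reverse_path),
    }


record = build_sample_record(
    "sample-01",
    "42",
    "runs/demo-run/sample-01_S1_L001_R1_001.fastq.gz",
    "runs/demo-run/sample-01_S1_L001_R2_001.fastq.gz",
)
```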

auto_irida_azure_upload.core.prepare_upload_dir(config, run, sample_list)

Prepare upload directory for run.

Parameters

config (dict[str, object]) – Application config.
run (dict[str, str]) – Sequencing run to prepare upload directory for. Keys: [‘sequencing_run_id’, ‘path’, ‘instrument_type’]
sample_list (list[dict[str, str]]) – List of samples to upload. Keys: [‘Sample_Name’, ‘Project_ID’, ‘File_Forward’, ‘File_Forward_Absolute_Path’, ‘File_Reverse’, ‘File_Reverse_Absolute_Path’]

Returns

Upload dir path

Return type

path
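
One way an upload directory could be staged from a sample list is sketched below. This is an assumption-heavy illustration, not the package's behaviour: the function name `prepare_upload_dir_sketch`, the `staging_dir` argument, the use of symlinks, and the SampleList.csv column set are all hypothetical. It also assumes a platform where creating symlinks is permitted.

```python
import csv
import os
import tempfile

# Assumed column set for the written SampleList.csv.
SAMPLELIST_COLUMNS = ["Sample_Name", "Project_ID", "File_Forward", "File_Reverse"]


def prepare_upload_dir_sketch(staging_dir, run, sample_list):
    """Create <staging_dir>/<run id>, link the fastq files, write SampleList.csv."""
    upload_dir = os.path.join(staging_dir, run["sequencing_run_id"])
    os.makedirs(upload_dir, exist_ok=True)

    for sample in sample_list:
        for read in ("Forward", "Reverse"):
            link_path = os.path.join(upload_dir, sample[f"File_{read}"])
            if not os.path.lexists(link_path):
                os.symlink(sample[f"File_{read}_Absolute_Path"], link_path)

    samplelist_path = os.path.join(upload_dir, "SampleList.csv")
    with open(samplelist_path, "w", newline="") as handle:
        # extrasaction="ignore" drops the absolute-path keys from the CSV.
        writer = csv.DictWriter(handle, fieldnames=SAMPLELIST_COLUMNS, extrasaction="ignore")
        writer.writeheader()
        writer.writerows(sample_list)

    return upload_dir


# Demonstration in a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    source_dir = os.path.join(tmp, "source")
    os.makedirs(source_dir)
    sample = {
        "Sample_Name": "sample-01",
        "Project_ID": "42",
        "File_Forward": "sample-01_R1.fastq.gz",
        "File_Forward_Absolute_Path": os.path.join(source_dir, "sample-01_R1.fastq.gz"),
        "File_Reverse": "sample-01_R2.fastq.gz",
        "File_Reverse_Absolute_Path": os.path.join(source_dir, "sample-01_R2.fastq.gz"),
    }
    for key in ("File_Forward_Absolute_Path", "File_Reverse_Absolute_Path"):
        open(sample[key], "w").close()

    demo_run = {"sequencing_run_id": "demo-run", "path": source_dir, "instrument_type": "miseq"}
    upload_dir = prepare_upload_dir_sketch(os.path.join(tmp, "staging"), demo_run, [sample])

    staged_files = sorted(os.listdir(upload_dir))
    forward_is_link = os.path.islink(os.path.join(upload_dir, "sample-01_R1.fastq.gz"))
    with open(os.path.join(upload_dir, "SampleList.csv")) as handle:
        samplelist_header = handle.readline().strip()
```

Linking rather than copying keeps staging cheap for large fastq files, at the cost of requiring the source files to stay in place until the upload finishes.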

auto_irida_azure_upload.core.scan(config: dict[str, object]) → Iterator[Optional[dict[str, object]]]

Scanning involves looking for all existing runs and storing them to the database, then looking for all existing symlinks and storing them to the database. At the end of a scan, we should be able to determine which (if any) symlinks need to be created.

Parameters

config (dict[str, object]) – Application config.

Returns

A run directory to analyze, or None

Return type

Iterator[Optional[dict[str, object]]]

auto_irida_azure_upload.core.upload_run(config, run, upload_dir)

Initiate an analysis on one directory of fastq files.

auto_irida_azure_upload.samplesheet

Functions for parsing Illumina SampleSheet files.

auto_irida_azure_upload.samplesheet.choose_samplesheet_to_parse(samplesheet_paths: list[str], instrument_type: str, run_id: str)

A run directory may have multiple SampleSheet.csv files in it. Choose only one to parse.

Parameters

samplesheet_paths (list[str]) – List of paths to SampleSheet.csv files
instrument_type (str) – Instrument type, should be one of: “miseq”, “nextseq”
run_id (str) – Sequencing run ID

auto_irida_azure_upload.samplesheet.find_samplesheets(run_dir, instrument_type)

Find SampleSheets in run directory.

Parameters

run_dir (str) – Path to sequencing run directory
instrument_type (str) – Instrument type (‘miseq’ or ‘nextseq’)

auto_irida_azure_upload.samplesheet.parse_samplesheet(samplesheet_path: str, instrument_type: str) → dict[str, object]

Parameters

samplesheet_path (str) – Path to SampleSheet file.
instrument_type (str) – Instrument type (‘miseq’ or ‘nextseq’)
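
Since the module also exposes `parse_samplesheet_miseq` and `parse_samplesheet_nextseq`, selecting a parser by instrument type can be pictured as a simple dispatch. The sketch below uses stand-in parser stubs and a hypothetical function name, `parse_samplesheet_sketch`; raising ValueError for an unknown instrument type is an assumption, not documented behaviour.

```python
def _parse_miseq_stub(samplesheet_path):
    # Stand-in for a real MiSeq parser.
    return {"instrument_type": "miseq", "path": samplesheet_path}


def _parse_nextseq_stub(samplesheet_path):
    # Stand-in for a real NextSeq parser.
    return {"instrument_type": "nextseq", "path": samplesheet_path}


PARSERS = {
    "miseq": _parse_miseq_stub,
    "nextseq": _parse_nextseq_stub,
}


def parse_samplesheet_sketch(samplesheet_path, instrument_type):
    """Dispatch to the parser registered for instrument_type."""
    parser = PARSERS.get(instrument_type)
    if parser is None:
        # Assumption: unknown instrument types are treated as an error.
        raise ValueError(f"Unsupported instrument type: {instrument_type!r}")
    return parser(samplesheet_path)


parsed_miseq = parse_samplesheet_sketch("SampleSheet.csv", "miseq")
parsed_nextseq = parse_samplesheet_sketch("SampleSheet.csv", "nextseq")
```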

auto_irida_azure_upload.samplesheet.parse_samplesheet_miseq(samplesheet_path)

Parse a MiSeq SampleSheet. Returns None if parsing fails.

Parameters

samplesheet_path (str) – Path to SampleSheet file.

Returns

Parsed SampleSheet

Return type

Optional[dict[str, object]]
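
For orientation, a section-based parse of an Illumina-style SampleSheet that returns None on failure could be sketched as follows. This is not the package's parser: the function name `parse_miseq_samplesheet_sketch`, the shape of the returned dict (‘header’, ‘reads’, ‘data’), and the rule that a missing [Data] section counts as failure are assumptions for illustration.

```python
import csv
import os
import tempfile
from typing import Optional


def parse_miseq_samplesheet_sketch(samplesheet_path: str) -> Optional[dict]:
    """Parse bracketed sections of a SampleSheet; return None on failure."""
    try:
        with open(samplesheet_path, newline="") as handle:
            # Drop rows that are empty or contain only empty cells.
            rows = [row for row in csv.reader(handle) if any(cell.strip() for cell in row)]
    except (OSError, UnicodeDecodeError):
        return None

    sections = {}
    current = None
    for row in rows:
        first = row[0].strip()
        if first.startswith("[") and first.endswith("]"):
            current = first[1:-1].lower()  # e.g. "[Data]" -> "data"
            sections[current] = []
        elif current is not None:
            sections[current].append([cell.strip() for cell in row])

    if not sections.get("data"):
        return None  # assumption: a sheet without [Data] rows is unusable

    column_names, *sample_rows = sections["data"]
    return {
        "header": {row[0]: (row[1] if len(row) > 1 else "") for row in sections.get("header", [])},
        "reads": [int(row[0]) for row in sections.get("reads", []) if row[0].isdigit()],
        "data": [dict(zip(column_names, row)) for row in sample_rows],
    }


DEMO_SHEET = """[Header]
Experiment Name,demo-run
[Reads]
151
151
[Data]
Sample_ID,Sample_Name,Sample_Project
S1,sample-01,42
S2,sample-02,42
"""

with tempfile.TemporaryDirectory() as tmp:
    sheet_path = os.path.join(tmp, "SampleSheet.csv")
    with open(sheet_path, "w") as handle:
        handle.write(DEMO_SHEET)
    parsed = parse_miseq_samplesheet_sketch(sheet_path)
    parsed_missing = parse_miseq_samplesheet_sketch(os.path.join(tmp, "absent.csv"))
```

Returning None instead of raising matches the documented Optional return type, so a caller can skip a run whose SampleSheet cannot be read.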

auto_irida_azure_upload.samplesheet.parse_samplesheet_nextseq(samplesheet_path)

auto_irida_azure_upload.util

Utility functions that may be useful in other modules.