BCCDC-PHL Auto Illumina Run QC Check

auto_illumina_run_qc_check.core

This module includes the core functionality for performing QC checks on an Illumina sequencing run.

auto_illumina_run_qc_check.core.find_run_dirs(config, check_upload_complete=True)

Find sequencing run directories under the ‘run_parent_dirs’ listed in the config.

Parameters

config (dict[str, object]) – Application config.
check_upload_complete (bool) – Check for presence of ‘upload_complete.json’ file.

Returns

Run directory. Keys: [‘sequencing_run_id’, ‘path’, ‘instrument_type’]

Return type

Iterator[Optional[dict[str, str]]]
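
The lookup described above can be sketched roughly as follows. This is not the package's implementation: the function name `find_run_dirs_sketch` and the two run ID patterns are assumptions made for illustration, and unlike the documented return type, this sketch never yields None.

```python
import os
import re
from typing import Iterator

# Assumed run ID patterns (hypothetical); the real package may match differently.
MISEQ_RUN_ID = re.compile(r"^\d{6}_M\d{5}_\d+_\d{9}-[A-Z0-9]{5}$")
NEXTSEQ_RUN_ID = re.compile(r"^\d{6}_VH\d{5}_\d+_[A-Z0-9]{9}$")


def find_run_dirs_sketch(config: dict, check_upload_complete: bool = True) -> Iterator[dict]:
    """Yield one dict per run directory found under config['run_parent_dirs']."""
    for parent_dir in config["run_parent_dirs"]:
        # Sort by name so results come back in a stable order.
        for entry in sorted(os.scandir(parent_dir), key=lambda e: e.name):
            if not entry.is_dir():
                continue
            if MISEQ_RUN_ID.match(entry.name):
                instrument_type = "miseq"
            elif NEXTSEQ_RUN_ID.match(entry.name):
                instrument_type = "nextseq"
            else:
                continue  # directory name does not look like a run ID
            marker = os.path.join(entry.path, "upload_complete.json")
            if check_upload_complete and not os.path.exists(marker):
                continue  # no marker file, so treat the upload as unfinished
            yield {
                "sequencing_run_id": entry.name,
                "path": os.path.abspath(entry.path),
                "instrument_type": instrument_type,
            }
```

Passing check_upload_complete=False in this sketch also yields runs that lack the 'upload_complete.json' marker.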

auto_illumina_run_qc_check.core.get_sum_sample_fastq_file_sizes(run)

Get the sum of all sample fastq file sizes in the run directory.

Parameters: run (dict[str, str]) – Run directory. Keys: [‘sequencing_run_id’, ‘path’, ‘instrument_type’]
Returns: Sum of all sample fastq file sizes in the run directory.
Return type: float
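
A minimal sketch of such a size sum is shown here. It assumes that sample files end in '.fastq.gz', that they may sit anywhere under the run path, and that files whose names start with 'Undetermined' are not sample files; none of these details are confirmed by the documentation above, and the name `sum_sample_fastq_file_sizes_sketch` is hypothetical.

```python
import glob
import os


def sum_sample_fastq_file_sizes_sketch(run: dict) -> float:
    """Sum the sizes, in bytes, of sample fastq files under run['path']."""
    # Assumption: search the whole run directory recursively.
    pattern = os.path.join(run["path"], "**", "*.fastq.gz")
    total = 0
    for fastq_path in glob.glob(pattern, recursive=True):
        # Assumption: 'Undetermined' files hold unassigned reads, not samples.
        if os.path.basename(fastq_path).startswith("Undetermined"):
            continue
        total += os.path.getsize(fastq_path)
    # The documented return type is float, so convert the byte count.
    return float(total)
```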

auto_illumina_run_qc_check.core.qc_check(config, run)

Initiate an analysis on one directory of fastq files.

Parameters

config (dict[str, object]) – Application config.
run (dict[str, str]) – Run directory. Keys: [‘sequencing_run_id’, ‘path’, ‘instrument_type’]

Returns

None

Return type

None

auto_illumina_run_qc_check.core.scan(config: dict[str, object]) → Iterator[Optional[dict[str, object]]]

Scanning involves looking for all existing runs and storing them to the database, then looking for all existing symlinks and storing them to the database. At the end of a scan, we should be able to determine which (if any) symlinks need to be created.

Parameters: config (dict[str, object]) – Application config.
Returns: A run directory to analyze, or None
Return type: Iterator[Optional[dict[str, object]]]
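
Because the iterator may yield None, a caller has to skip those entries before handing a run to qc_check. The driver below is one possible way to combine the two functions, not the package's own entry point; `run_qc_checks_once` is a hypothetical name, and the two functions are passed in as arguments so the sketch stays self-contained.

```python
from typing import Callable, Iterator, Optional


def run_qc_checks_once(
    config: dict,
    scan: Callable[[dict], Iterator[Optional[dict]]],
    qc_check: Callable[[dict, dict], None],
) -> int:
    """Run one scan pass and return the number of runs that were checked."""
    checked = 0
    for run in scan(config):
        if run is None:
            continue  # nothing to analyze for this iteration
        qc_check(config, run)
        checked += 1
    return checked
```

In real use, the scan and qc_check arguments would be the functions from auto_illumina_run_qc_check.core.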

auto_illumina_run_qc_check.parsers

This module includes functions for parsing the interop summary and RunParameters.xml files.

auto_illumina_run_qc_check.parsers.parse_interop_summary(summary_lines)

Parse an interop summary csv file into a dict.

Parameters: summary_lines (list[str]) – A list of lines from an interop summary csv file.
Returns: A dict containing the parsed interop summary. Keys: [‘ClusterDensity’, ‘ErrorRate’, ‘IntensityCycle1’, ‘PercentAligned’, ‘PercentGtQ30’, ‘ProjectedTotalYield’, ‘YieldTotal’, ‘Reads’, ‘LanesByRead’]
Return type: dict[str, object]

auto_illumina_run_qc_check.parsers.parse_lanes_by_read(summary_lines)

Parse a read summary csv file into a list of dicts.

Parameters: summary_lines (list[str]) – A list of lines from a read summary csv file.
Returns: A list of dicts containing the parsed read summary. Keys: [‘ReadNumber’, ‘LaneNumber’, ‘Surface’, ‘TileCount’, ‘Density’, ‘DensityDeviation’, ‘PercentPf’, ‘PercentPfDeviation’, ‘Reads’, ‘ReadsPf’, ‘PercentGtQ30’, ‘Yield’, ‘CyclesError’, ‘PercentAligned’, ‘PercentAlignedDeviation’, ‘ErrorRate’, ‘ErrorRateDeviation’, ‘ErrorRate35’, ‘ErrorRate35Deviation’, ‘ErrorRate75’, ‘ErrorRate75Deviation’, ‘ErrorRate100’, ‘ErrorRate100Deviation’, ‘IntensityCycle1’, ‘IntensityCycle1Deviation’, ‘PhasingSlope’, ‘PhasingOffset’, ‘PrePhasingSlope’, ‘PrePhasingOffset’, ‘ClusterDensity’, ‘Occupancy’]
Return type: list[dict[str, object]]

auto_illumina_run_qc_check.parsers.parse_read_line(read_line, read_number)

Parse a line from a read summary csv file into a dict.

Parameters

read_line (str) – A line from a read summary csv file.
read_number (int) – The read number.

Returns

A dict containing the parsed read line. Keys: [‘ReadNumber’, ‘LaneNumber’, ‘Surface’, ‘TileCount’, ‘Density’, ‘DensityDeviation’, ‘PercentPf’, ‘PercentPfDeviation’, ‘Reads’, ‘ReadsPf’, ‘PercentGtQ30’, ‘Yield’, ‘CyclesError’, ‘PercentAligned’, ‘PercentAlignedDeviation’, ‘ErrorRate’, ‘ErrorRateDeviation’, ‘ErrorRate35’, ‘ErrorRate35Deviation’, ‘ErrorRate75’, ‘ErrorRate75Deviation’, ‘ErrorRate100’, ‘ErrorRate100Deviation’, ‘IntensityCycle1’, ‘IntensityCycle1Deviation’, ‘PhasingSlope’, ‘PhasingOffset’, ‘PrePhasingSlope’, ‘PrePhasingOffset’, ‘ClusterDensity’, ‘Occupancy’]

Return type

dict[str, object]

auto_illumina_run_qc_check.parsers.parse_read_summary(summary_lines)

Parse a read summary csv file into a list of dicts.

Parameters: summary_lines (list[str]) – A list of lines from a read summary csv file.
Returns: A list of dicts containing the parsed read summary. Keys: [‘ReadNumber’, ‘IsIndexed’, ‘TotalCycles’, ‘YieldTotal’, ‘ProjectedTotalYield’, ‘PercentAligned’, ‘ErrorRate’, ‘IntensityCycle1’, ‘PercentGtQ30’]
Return type: list[dict[str, object]]

auto_illumina_run_qc_check.parsers.parse_read_summary_line(read_summary_line)

Parse a line from a read summary csv file into a dict.

Parameters: read_summary_line (str) – A line from a read summary csv file.
Returns: A dict containing the parsed read summary line. Keys: [‘ReadNumber’, ‘IsIndexed’, ‘TotalCycles’, ‘YieldTotal’, ‘ProjectedTotalYield’, ‘PercentAligned’, ‘ErrorRate’, ‘IntensityCycle1’, ‘PercentGtQ30’]
Return type: dict[str, object]
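
To illustrate the shape of the returned dict, here is a sketch of a line parser. The column order, the value types, and the 'Y'/'N' encoding of IsIndexed are all assumptions made for this example; only the key names come from the documentation above, and the real parser may read a different layout.

```python
# Hypothetical column order and converters; only the key names are documented.
READ_SUMMARY_FIELDS = [
    ("ReadNumber", int),
    ("IsIndexed", lambda value: value.lower() in ("y", "true")),
    ("TotalCycles", int),
    ("YieldTotal", float),
    ("ProjectedTotalYield", float),
    ("PercentAligned", float),
    ("ErrorRate", float),
    ("IntensityCycle1", float),
    ("PercentGtQ30", float),
]


def parse_read_summary_line_sketch(read_summary_line: str) -> dict:
    """Split one comma-separated line and convert each value by position."""
    values = [value.strip() for value in read_summary_line.strip().split(",")]
    if len(values) != len(READ_SUMMARY_FIELDS):
        raise ValueError(
            f"expected {len(READ_SUMMARY_FIELDS)} fields, got {len(values)}"
        )
    return {
        key: convert(value)
        for (key, convert), value in zip(READ_SUMMARY_FIELDS, values)
    }
```

Under these assumptions, a whole-file parser would apply the same function to each data line and collect the results in a list.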

auto_illumina_run_qc_check.parsers.parse_run_parameters_xml(run_parameters_xml_path, instrument_type)

Parse a run parameters xml file into a dict.

Parameters

run_parameters_xml_path (str) – The path to the run parameters xml file.
instrument_type (str) – The instrument type. One of [‘miseq’, ‘nextseq’].

Returns

A dict containing the parsed run parameters. Keys: [‘flowcell_version’]

Return type

dict[str, object]
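
A rough sketch of this kind of lookup, using the standard library's xml.etree.ElementTree, is given below. The element names in `FLOWCELL_VERSION_TAGS` are placeholders chosen for illustration and are not confirmed by the documentation; the function name is hypothetical as well.

```python
import xml.etree.ElementTree as ET

# Hypothetical element names per instrument type, for illustration only.
FLOWCELL_VERSION_TAGS = {
    "miseq": "ReagentKitVersion",
    "nextseq": "FlowCellVersion",
}


def parse_run_parameters_xml_sketch(run_parameters_xml_path: str, instrument_type: str) -> dict:
    """Return {'flowcell_version': ...}, or None as the value when not found."""
    tag = FLOWCELL_VERSION_TAGS[instrument_type]
    root = ET.parse(run_parameters_xml_path).getroot()
    # './/tag' searches all descendants of the root element.
    element = root.find(f".//{tag}")
    if element is not None and element.text:
        flowcell_version = element.text.strip()
    else:
        flowcell_version = None
    return {"flowcell_version": flowcell_version}
```

Keeping the instrument-specific names in one mapping means the lookup code is shared between 'miseq' and 'nextseq'.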

auto_illumina_run_qc_check.parsers.parse_run_stats(summary_lines)

Parse a run stats csv file into a dict.

Parameters: summary_lines (list[str]) – A list of lines from a run stats csv file.
Returns: A dict containing the parsed run stats. Keys: [‘PercentGtQ30’, ‘ProjectedTotalYield’, ‘YieldTotal’, ‘ErrorRate’, ‘PercentAligned’, ‘Occupancy’, ‘Reads’]
Return type: dict[str, object]

auto_illumina_run_qc_check.config

This module includes functions for loading the application config file.
