BCCDC-PHL Auto Illumina Run QC Check
auto_illumina_run_qc_check.core
This module includes the core functionality for performing QC checks on an Illumina sequencing run.
- auto_illumina_run_qc_check.core.find_run_dirs(config, check_upload_complete=True)
Find sequencing run directories under the ‘run_parent_dirs’ listed in the config.
- Parameters
config (dict[str, object]) – Application config.
check_upload_complete (bool) – Check for presence of ‘upload_complete.json’ file.
- Returns
Run directory. Keys: [‘sequencing_run_id’, ‘path’, ‘instrument_type’]
- Return type
Iterator[Optional[dict[str, str]]]
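The lookup described above can be sketched as follows. This is a minimal illustration, not the package's implementation. The helper name `find_run_dirs_sketch`, the rule that every immediate subdirectory counts as a run, the use of the directory name as the run ID, and the `"unknown"` placeholder for `instrument_type` are all assumptions. Only the `run_parent_dirs` config key, the `upload_complete.json` marker, and the returned keys come from the documentation above.

```python
import os
from typing import Iterator


def find_run_dirs_sketch(config: dict, check_upload_complete: bool = True) -> Iterator[dict]:
    """Yield one dict per run directory found under config['run_parent_dirs'].

    Assumption: every immediate subdirectory is treated as a run, and the
    directory name is used as the sequencing run ID.
    """
    for parent_dir in config.get("run_parent_dirs", []):
        if not os.path.isdir(parent_dir):
            continue  # ignore parent directories that do not exist
        for entry in sorted(os.scandir(parent_dir), key=lambda e: e.name):
            if not entry.is_dir():
                continue
            marker = os.path.join(entry.path, "upload_complete.json")
            if check_upload_complete and not os.path.exists(marker):
                continue  # no marker file yet, so skip this run for now
            yield {
                "sequencing_run_id": entry.name,
                "path": entry.path,
                "instrument_type": "unknown",  # placeholder; detection is not shown here
            }
```

Passing `check_upload_complete=False` to the sketch yields every subdirectory, including runs without the marker file.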
- auto_illumina_run_qc_check.core.get_sum_sample_fastq_file_sizes(run)
Get the sum of all sample fastq file sizes in the run directory.
- Parameters
run (dict[str, str]) – Run directory. Keys: [‘sequencing_run_id’, ‘path’, ‘instrument_type’]
- Returns
Sum of all sample fastq file sizes in the run directory.
- Return type
float
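One way to compute such a sum is sketched below. The helper name `sum_sample_fastq_sizes_sketch` is hypothetical, and two choices are assumptions rather than documented behaviour: fastq files are searched for anywhere under the run path, and files whose names start with `Undetermined` are not counted as sample files.

```python
import os


def sum_sample_fastq_sizes_sketch(run: dict) -> float:
    """Return the total size in bytes of sample fastq files under run['path'].

    Assumptions: files are matched by the suffixes '.fastq' and '.fastq.gz',
    and 'Undetermined*' files are excluded because they belong to no sample.
    """
    total_bytes = 0
    for dirpath, _dirnames, filenames in os.walk(run["path"]):
        for filename in filenames:
            if not filename.endswith((".fastq", ".fastq.gz")):
                continue
            if filename.startswith("Undetermined"):
                continue
            total_bytes += os.path.getsize(os.path.join(dirpath, filename))
    # The documented return type is float, so convert the byte count.
    return float(total_bytes)
```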
- auto_illumina_run_qc_check.core.qc_check(config, run)
Initiate an analysis on one directory of fastq files.
- Parameters
config (dict[str, object]) – Application config.
run (dict[str, str]) – Run directory. Keys: [‘sequencing_run_id’, ‘path’, ‘instrument_type’]
- Returns
None
- Return type
None
- auto_illumina_run_qc_check.core.scan(config: dict[str, object]) → Iterator[Optional[dict[str, object]]]
Scanning involves looking for all existing runs and storing them to the database, then looking for all existing symlinks and storing them to the database. At the end of a scan, we should be able to determine which (if any) symlinks need to be created.
- Parameters
config (dict[str, object]) – Application config.
- Returns
A run directory to analyze, or None
- Return type
Iterator[Optional[dict[str, object]]]
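Because the iterator may yield None, a caller has to skip those items before running a check. The loop below is a minimal sketch of that pattern. `process_scan_results` is a hypothetical helper, not part of the package, and it takes the scan results and the check function as arguments so that it stays self-contained.

```python
from typing import Callable, Iterable, Optional


def process_scan_results(
    scan_results: Iterable[Optional[dict]],
    check: Callable[[dict], None],
) -> int:
    """Call check(run) for every non-None item and return how many were checked."""
    checked = 0
    for run in scan_results:
        if run is None:
            continue  # nothing to analyze for this item
        check(run)
        checked += 1
    return checked
```

With the functions documented above, the call would presumably look like `process_scan_results(scan(config), lambda run: qc_check(config, run))`.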
auto_illumina_run_qc_check.parsers
This module includes functions for parsing the interop summary and RunParameters.xml files.
- auto_illumina_run_qc_check.parsers.parse_interop_summary(summary_lines)
Parse an interop summary csv file into a dict.
- Parameters
summary_lines (list[str]) – A list of lines from an interop summary csv file.
- Returns
A dict containing the parsed interop summary. Keys: [‘ClusterDensity’, ‘ErrorRate’, ‘IntensityCycle1’, ‘PercentAligned’, ‘PercentGtQ30’, ‘ProjectedTotalYield’, ‘YieldTotal’, ‘Reads’, ‘LanesByRead’]
- Return type
dict[str, object]
- auto_illumina_run_qc_check.parsers.parse_lanes_by_read(summary_lines)
Parse a read summary csv file into a list of dicts.
- Parameters
summary_lines (list[str]) – A list of lines from a read summary csv file.
- Returns
A list of dicts containing the parsed read summary. Keys: [‘ReadNumber’, ‘LaneNumber’, ‘Surface’, ‘TileCount’, ‘Density’, ‘DensityDeviation’, ‘PercentPf’, ‘PercentPfDeviation’, ‘Reads’, ‘ReadsPf’, ‘PercentGtQ30’, ‘Yield’, ‘CyclesError’, ‘PercentAligned’, ‘PercentAlignedDeviation’, ‘ErrorRate’, ‘ErrorRateDeviation’, ‘ErrorRate35’, ‘ErrorRate35Deviation’, ‘ErrorRate75’, ‘ErrorRate75Deviation’, ‘ErrorRate100’, ‘ErrorRate100Deviation’, ‘IntensityCycle1’, ‘IntensityCycle1Deviation’, ‘PhasingSlope’, ‘PhasingOffset’, ‘PrePhasingSlope’, ‘PrePhasingOffset’, ‘ClusterDensity’, ‘Occupancy’]
- Return type
list[dict[str, object]]
- auto_illumina_run_qc_check.parsers.parse_read_line(read_line, read_number)
Parse a line from a read summary csv file into a dict.
- Parameters
read_line (str) – A line from a read summary csv file.
read_number (int) – The read number.
- Returns
A dict containing the parsed read line. Keys: [‘ReadNumber’, ‘LaneNumber’, ‘Surface’, ‘TileCount’, ‘Density’, ‘DensityDeviation’, ‘PercentPf’, ‘PercentPfDeviation’, ‘Reads’, ‘ReadsPf’, ‘PercentGtQ30’, ‘Yield’, ‘CyclesError’, ‘PercentAligned’, ‘PercentAlignedDeviation’, ‘ErrorRate’, ‘ErrorRateDeviation’, ‘ErrorRate35’, ‘ErrorRate35Deviation’, ‘ErrorRate75’, ‘ErrorRate75Deviation’, ‘ErrorRate100’, ‘ErrorRate100Deviation’, ‘IntensityCycle1’, ‘IntensityCycle1Deviation’, ‘PhasingSlope’, ‘PhasingOffset’, ‘PrePhasingSlope’, ‘PrePhasingOffset’, ‘ClusterDensity’, ‘Occupancy’]
- Return type
dict[str, object]
- auto_illumina_run_qc_check.parsers.parse_read_summary(summary_lines)
Parse a read summary csv file into a list of dicts.
- Parameters
summary_lines (list[str]) – A list of lines from a read summary csv file.
- Returns
A list of dicts containing the parsed read summary. Keys: [‘ReadNumber’, ‘IsIndexed’, ‘TotalCycles’, ‘YieldTotal’, ‘ProjectedTotalYield’, ‘PercentAligned’, ‘ErrorRate’, ‘IntensityCycle1’, ‘PercentGtQ30’]
- Return type
list[dict[str, object]]
- auto_illumina_run_qc_check.parsers.parse_read_summary_line(read_summary_line)
Parse a line from a read summary csv file into a dict.
- Parameters
read_summary_line (str) – A line from a read summary csv file.
- Returns
A dict containing the parsed read summary line. Keys: [‘ReadNumber’, ‘IsIndexed’, ‘TotalCycles’, ‘YieldTotal’, ‘ProjectedTotalYield’, ‘PercentAligned’, ‘ErrorRate’, ‘IntensityCycle1’, ‘PercentGtQ30’]
- Return type
dict[str, object]
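A line parser of this kind might look like the sketch below. The column layout is an assumption, not taken from the package: the line is assumed to read `Read N` or `Read N (I)` (for an index read), followed by yield, projected yield, percent aligned, error rate, cycle 1 intensity, and percent ≥ Q30. `TotalCycles` is left out because this assumed layout has no column for it, and the function name `parse_read_summary_line_sketch` is hypothetical.

```python
import re


def parse_read_summary_line_sketch(read_summary_line: str) -> dict:
    """Parse one per-read summary line under the assumed column layout."""
    fields = [field.strip() for field in read_summary_line.strip().split(",")]
    if len(fields) < 7:
        raise ValueError(f"Expected at least 7 fields: {read_summary_line!r}")
    # Assumed label format: 'Read 1' or, for index reads, 'Read 2 (I)'.
    match = re.match(r"Read (\d+)( \(I\))?$", fields[0])
    if match is None:
        raise ValueError(f"Not a per-read summary line: {read_summary_line!r}")
    numeric = [float(value) for value in fields[1:7]]
    return {
        "ReadNumber": int(match.group(1)),
        "IsIndexed": match.group(2) is not None,
        "YieldTotal": numeric[0],
        "ProjectedTotalYield": numeric[1],
        "PercentAligned": numeric[2],
        "ErrorRate": numeric[3],
        "IntensityCycle1": numeric[4],
        "PercentGtQ30": numeric[5],
    }
```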
- auto_illumina_run_qc_check.parsers.parse_run_parameters_xml(run_parameters_xml_path, instrument_type)
Parse a run parameters xml file into a dict.
- Parameters
run_parameters_xml_path (str) – The path to the run parameters xml file.
instrument_type (str) – The instrument type. One of [‘miseq’, ‘nextseq’].
- Returns
A dict containing the parsed run parameters. Keys: [‘flowcell_version’]
- Return type
dict[str, object]
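The sketch below shows one way such a parser could be written with the standard library. The element names in `FLOWCELL_ELEMENT_BY_INSTRUMENT` are assumptions chosen for illustration and have not been checked against the package or against real instrument output. The function name `parse_run_parameters_xml_sketch` is likewise hypothetical.

```python
import xml.etree.ElementTree as ET

# Assumed mapping from instrument type to the XML element holding the
# flowcell version. These element names are illustrative placeholders.
FLOWCELL_ELEMENT_BY_INSTRUMENT = {
    "miseq": "ReagentKitVersion",
    "nextseq": "FlowCellMode",
}


def parse_run_parameters_xml_sketch(run_parameters_xml_path: str, instrument_type: str) -> dict:
    """Return {'flowcell_version': ...}, or None as the value if the element is absent."""
    if instrument_type not in FLOWCELL_ELEMENT_BY_INSTRUMENT:
        raise ValueError(f"Unsupported instrument type: {instrument_type!r}")
    root = ET.parse(run_parameters_xml_path).getroot()
    # './/' searches all descendants of the root element.
    element = root.find(".//" + FLOWCELL_ELEMENT_BY_INSTRUMENT[instrument_type])
    flowcell_version = None
    if element is not None and element.text:
        flowcell_version = element.text.strip()
    return {"flowcell_version": flowcell_version}
```

Returning None for a missing element, rather than raising, is a design choice of this sketch so that a caller can decide how to treat an incomplete file.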
- auto_illumina_run_qc_check.parsers.parse_run_stats(summary_lines)
Parse a run stats csv file into a dict.
- Parameters
summary_lines (list[str]) – A list of lines from a run stats csv file.
- Returns
A dict containing the parsed run stats. Keys: [‘PercentGtQ30’, ‘ProjectedTotalYield’, ‘YieldTotal’, ‘ErrorRate’, ‘PercentAligned’, ‘Occupancy’, ‘Reads’]
- Return type
dict[str, object]
auto_illumina_run_qc_check.config
This module includes functions for loading the application config file.