Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add calibration calculators #2609

Open
wants to merge 111 commits into
base: main
Choose a base branch
from
Open
Show file tree
Hide file tree
Changes from 98 commits
Commits
Show all changes
111 commits
Select commit Hold shift + click to select a range
58d868c
added stats extractor parent component
Apr 29, 2024
edfccc6
added stats extractor based on sigma clipping
Apr 29, 2024
d19168c
added cut of outliers
Apr 30, 2024
49f0f8a
update docs
May 2, 2024
9035fb4
formatted with black
May 2, 2024
8a0e89f
added pass for __call__ function
May 2, 2024
82b29e9
added unit tests
May 3, 2024
1805cb2
added changelog
May 3, 2024
a4cea6a
fix lint
May 3, 2024
16e5920
Small commit for prototyping
May 7, 2024
63d1774
Removed unneeded functions
May 10, 2024
5768ed7
Merge branch 'stats_extractor' of https://github.com/cta-observatory/…
May 23, 2024
e6ba177
I altered the class variables to th evariance statistics extractor
May 23, 2024
ddf342b
added a container for mean variance images and fixed docustring
May 24, 2024
0beb975
I changed the container type for the StarVarianceExtractor
May 27, 2024
b4df919
fix pylint
May 31, 2024
3483fd4
change __call__() to _extract()
May 31, 2024
5bbedda
minor renaming
May 31, 2024
14a0910
use pytest.fixture for Extractors
May 31, 2024
f126748
reduce duplicated code of the call function
May 31, 2024
0ec81b9
edit description of StatisticsContainer
TjarkMiener Jun 12, 2024
3910c22
added feature to shift the extraction sequence
TjarkMiener Jun 12, 2024
5c6595b
fix boundary case for the last chunk
TjarkMiener Jun 12, 2024
bc65984
I made prototypes for the CalibrationCalculators
Jun 12, 2024
d7a65a3
I made PSFModel a generic class
Jun 12, 2024
ba00bf3
fix tests
TjarkMiener Jun 12, 2024
d3aae32
fix ruff
TjarkMiener Jun 12, 2024
825b25b
I fixed some variable names
Jun 12, 2024
a5b878d
Added a method for calibrating variance images
Jun 13, 2024
e8ebdd8
Merge branch 'stats_extractor' into CalibrationCalculators
Jun 13, 2024
78481cc
Commit before push for tjark
Jun 14, 2024
41a4246
added StatisticsCalculator
TjarkMiener Jun 28, 2024
010aea6
make faulty_pixels_threshold and chunk_shift as traits
TjarkMiener Jul 5, 2024
26ff8a4
added stats extractor parent component
Apr 29, 2024
d142995
added stats extractor based on sigma clipping
Apr 29, 2024
0737b7f
added cut of outliers
Apr 30, 2024
225346b
update docs
May 2, 2024
74c72bb
formatted with black
May 2, 2024
faad542
added pass for __call__ function
May 2, 2024
9867431
Small commit for prototyping
May 7, 2024
3a85ffa
Removed unneeded functions
May 10, 2024
69ebd1c
added unit tests
May 3, 2024
96b2dbb
added changelog
May 3, 2024
a007773
fix lint
May 3, 2024
96fe563
I altered the class variables to th evariance statistics extractor
May 23, 2024
67d34c4
added a container for mean variance images and fixed docustring
May 24, 2024
f1429da
I changed the container type for the StarVarianceExtractor
May 27, 2024
c9ec02b
fix pylint
May 31, 2024
444e370
change __call__() to _extract()
May 31, 2024
1c4cadb
minor renaming
May 31, 2024
0ebb464
use pytest.fixture for Extractors
May 31, 2024
bbbd891
reduce duplicated code of the call function
May 31, 2024
5deef74
I made prototypes for the CalibrationCalculators
Jun 12, 2024
f09bbe1
I made PSFModel a generic class
Jun 12, 2024
b8f0a4f
I fixed some variable names
Jun 12, 2024
39fca21
Added a method for calibrating variance images
Jun 13, 2024
b114638
edit description of StatisticsContainer
TjarkMiener Jun 12, 2024
5d18f61
added feature to shift the extraction sequence
TjarkMiener Jun 12, 2024
4df8c94
fix boundary case for the last chunk
TjarkMiener Jun 12, 2024
86eb15c
fix tests
TjarkMiener Jun 12, 2024
6200653
fix ruff
TjarkMiener Jun 12, 2024
4f604c5
Commit before push for tjark
Jun 14, 2024
6dfce15
added StatisticsCalculator
TjarkMiener Jun 28, 2024
8e873d3
make faulty_pixels_threshold and chunk_shift as traits
TjarkMiener Jul 5, 2024
bf36b90
Merge branch 'CalibrationCalculators' of https://github.com/cta-obser…
Aug 7, 2024
f7d3223
solved merge conflicts
TjarkMiener Aug 7, 2024
91812da
I made prototypes for the CalibrationCalculators
Jun 12, 2024
326c202
I made PSFModel a generic class
Jun 12, 2024
fa029f7
I fixed some variable names
Jun 12, 2024
7c90e0f
Added a method for calibrating variance images
Jun 13, 2024
8c44a42
Commit before push for tjark
Jun 14, 2024
bff611a
added StatisticsCalculator
TjarkMiener Jun 28, 2024
d85dd50
make faulty_pixels_threshold and chunk_shift as traits
TjarkMiener Jul 5, 2024
c522434
adding interpolator and Merge branch 'CalibrationCalculators' of http…
Aug 8, 2024
9192790
Removed Pointing Calculator
Aug 20, 2024
aecc5dc
Removing PointingCalculator, PSF model and interpolators
Aug 20, 2024
e3113ab
implement the StatisticsCalculator
TjarkMiener Aug 23, 2024
b9dc929
removed the helper function to get the start and end slices
TjarkMiener Aug 24, 2024
ef920da
polish docstrings
TjarkMiener Aug 25, 2024
9bfc16a
further polishing of docstrings
TjarkMiener Aug 25, 2024
aee5c10
fix typo
TjarkMiener Aug 25, 2024
1484eca
move check of config settings before loading of data
TjarkMiener Aug 25, 2024
601cbb4
moved Interpolator outside
TjarkMiener Aug 25, 2024
0347153
removed Interpolator artifacts
TjarkMiener Aug 25, 2024
b2208d5
removed reading part with TableLoader
TjarkMiener Aug 25, 2024
61936e5
removed writing part
TjarkMiener Aug 25, 2024
52ecd8a
add unit tests for calculators
TjarkMiener Aug 26, 2024
4ffd39e
add unit tests for calculators
TjarkMiener Aug 26, 2024
c404a4a
split __call__ function into two function for the first and second pass
TjarkMiener Aug 26, 2024
a59664e
fix docs and add changelog
TjarkMiener Aug 26, 2024
0ed515d
removed check for any faulty chunk
TjarkMiener Aug 27, 2024
48c686b
removed base class CalibrationCalculator and only have one Statistics…
TjarkMiener Aug 27, 2024
3bae046
fix fstring in logging
TjarkMiener Aug 28, 2024
aa5666a
bug fix
TjarkMiener Aug 28, 2024
ef53ebb
removed print
TjarkMiener Aug 28, 2024
3ca7252
removed me from CODEOWNERS
TjarkMiener Sep 3, 2024
4af4e74
Merge branch 'main' into CalibrationCalculators
TjarkMiener Sep 23, 2024
1e684b1
renamed to PixelStatisticsCalculator
TjarkMiener Oct 8, 2024
0bd7702
remove option to pass stats aggregator in constructor
TjarkMiener Oct 15, 2024
83ee522
rename n_pix to n_pixels
TjarkMiener Oct 15, 2024
6a29808
factored out _find_outliers()
TjarkMiener Oct 15, 2024
55451d2
use fraction instead of percentage for faulty_pixels_threshold
TjarkMiener Oct 15, 2024
5b8ed0b
combine outlier mask for two gains in a more readable way
TjarkMiener Oct 15, 2024
6e0d00a
update tests to set everything from the config
TjarkMiener Oct 15, 2024
fea13eb
remove redundant import
TjarkMiener Oct 15, 2024
1a99eb4
use np.flatnonzero instead of np.where
TjarkMiener Oct 15, 2024
e8e7f37
polish docstrings
TjarkMiener Oct 15, 2024
bc4fac9
also store the outlier mask for each detector
TjarkMiener Oct 15, 2024
88977ed
fix ruff formatting
TjarkMiener Oct 15, 2024
9a0914d
validate outlier_detector_list trait
TjarkMiener Oct 16, 2024
f300846
rename faulty_pixels_threshold to faulty_pixels_fraction
TjarkMiener Oct 17, 2024
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
11 changes: 11 additions & 0 deletions docs/api-reference/monitoring/calculator.rst
Original file line number Diff line number Diff line change
@@ -0,0 +1,11 @@
.. _calibration_calculator:

**********************
Calibration Calculator
**********************


Reference/API
=============

.. automodapi:: ctapipe.monitoring.calculator
3 changes: 2 additions & 1 deletion docs/api-reference/monitoring/index.rst
Original file line number Diff line number Diff line change
Expand Up @@ -10,7 +10,7 @@ Monitoring data are time-series used to monitor the status or quality of hardwar

This module provides some code to help to generate monitoring data from processed event data, particularly for the purposes of calibration and data quality assessment.

Currently, only code related to :ref:`stats_aggregator` and :ref:`outlier_detector` is implemented here.
Code related to :ref:`stats_aggregator`, :ref:`calibration_calculator`, and :ref:`outlier_detector` is implemented here.


Submodules
Expand All @@ -20,6 +20,7 @@ Submodules
:maxdepth: 1

aggregator
calculator
interpolation
outlier

Expand Down
1 change: 1 addition & 0 deletions docs/changes/2609.features.rst
Original file line number Diff line number Diff line change
@@ -0,0 +1 @@
Add calibration calculators which aggregates statistics, detects outliers, handles faulty data chunks.
1 change: 1 addition & 0 deletions src/ctapipe/calib/camera/calibrator.py
Original file line number Diff line number Diff line change
Expand Up @@ -2,6 +2,7 @@
Definition of the `CameraCalibrator` class, providing all steps needed to apply
calibration and image extraction, as well as supporting algorithms.
"""

from functools import cache

import astropy.units as u
Expand Down
6 changes: 3 additions & 3 deletions src/ctapipe/monitoring/aggregator.py
Original file line number Diff line number Diff line change
Expand Up @@ -57,16 +57,16 @@ def __call__(
and call the relevant function of the particular aggregator to compute aggregated statistic values.
The chunks are generated in a way that ensures they do not overflow the bounds of the table.
- If ``chunk_shift`` is None, chunks will not overlap, but the last chunk is ensured to be
of size `chunk_size`, even if it means the last two chunks will overlap.
of size ``chunk_size``, even if it means the last two chunks will overlap.
- If ``chunk_shift`` is provided, it will determine the number of samples to shift between the start
of consecutive chunks resulting in an overlap of chunks. Chunks that overflows the bounds
of the table are not considered.

Parameters
----------
table : astropy.table.Table
table with images of shape (n_images, n_channels, n_pix)
and timestamps of shape (n_images, )
table with images of shape (n_images, n_channels, n_pix), event IDs and
timestamps of shape (n_images, )
masked_pixels_of_sample : ndarray, optional
boolean array of masked pixels of shape (n_pix, ) that are not available for processing
chunk_shift : int, optional
Expand Down
327 changes: 327 additions & 0 deletions src/ctapipe/monitoring/calculator.py
Original file line number Diff line number Diff line change
@@ -0,0 +1,327 @@
"""
Definition of the ``PixelStatisticsCalculator`` class, providing all steps needed to
calculate the montoring data for the camera calibration.
"""

import numpy as np
from astropy.table import Table, vstack

from ctapipe.core import TelescopeComponent
from ctapipe.core.traits import (
ComponentName,
Dict,
Float,
Int,
List,
TelescopeParameter,
)
from ctapipe.monitoring.aggregator import StatisticsAggregator
from ctapipe.monitoring.outlier import OutlierDetector

__all__ = [
"PixelStatisticsCalculator",
]


class PixelStatisticsCalculator(TelescopeComponent):
"""
Component to calculate statistics from calibration events.

The ``PixelStatisticsCalculator`` is responsible for calculating various statistics from
calibration events, such as pedestal and flat-field data. It aggregates statistics,
detects outliers, and handles faulty data periods.
This class holds two functions to conduct two different passes over the data with and without
overlapping aggregation chunks. The first pass is conducted with non-overlapping chunks,
while overlapping chunks can be set by the ``chunk_shift`` parameter for the second pass.
The second pass over the data is only conducted in regions of trouble with a high percentage
of faulty pixels exceeding the threshold ``faulty_pixels_threshold``.
"""

stats_aggregator_type = TelescopeParameter(
trait=ComponentName(
StatisticsAggregator, default_value="SigmaClippingAggregator"
),
default_value="SigmaClippingAggregator",
help="Name of the StatisticsAggregator subclass to be used.",
).tag(config=True)

outlier_detector_list = List(
trait=Dict(),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a sensible default configuration that actually performs outlier detection?

The default value seems to be "no outlier detection".

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This component is used to aggregate the stats for the pedestal, flatfield, timecalibration, and pedestalvariance. Each of this aggregation has sensible distinct default configuration. Should we prioritize one of the aggregation over the others and set the default config for that aggregation?

default_value=None,
allow_none=True,
help=(
"List of dicts containing the name of the OutlierDetector subclass to be used, "
TjarkMiener marked this conversation as resolved.
Show resolved Hide resolved
"the aggregated statistic value to which the detector should be applied, "
"and the validity range of the detector. "
"E.g. ``[{'apply_to': 'std', 'name': 'RangeOutlierDetector', 'validity_range': [2.0, 8.0]},]``."
),
).tag(config=True)

chunk_shift = Int(
default_value=None,
allow_none=True,
help=(
"Number of samples to shift the aggregation chunk for the calculation "
"of the statistical values. Only used in the second_pass(), since the "
"first_pass() is conducted with non-overlapping chunks (chunk_shift=None)."
),
).tag(config=True)

faulty_pixels_threshold = Float(
default_value=10.0,
allow_none=True,
help=(
"Threshold in percentage of faulty pixels over the camera "
"to identify regions of trouble."
),
).tag(config=True)

def __init__(
self,
subarray,
config=None,
parent=None,
stats_aggregator=None,
**kwargs,
):
"""
Parameters
----------
subarray: ctapipe.instrument.SubarrayDescription
Description of the subarray. Provides information about the
camera which are useful in calibration. Also required for
configuring the TelescopeParameter traitlets.
config: traitlets.loader.Config
Configuration specified by config file or cmdline arguments.
Used to set traitlet values.
This is mutually exclusive with passing a ``parent``.
parent: ctapipe.core.Component or ctapipe.core.Tool
Parent of this component in the configuration hierarchy,
this is mutually exclusive with passing ``config``
stats_aggregator: ctapipe.monitoring.aggregator.StatisticsAggregator
The ``StatisticsAggregator`` to use. If None, the default via the
configuration system will be constructed.
"""
super().__init__(subarray=subarray, config=config, parent=parent, **kwargs)
self.subarray = subarray

# Initialize the instances of StatisticsAggregator
self.stats_aggregator = {}
TjarkMiener marked this conversation as resolved.
Show resolved Hide resolved
if stats_aggregator is None:
for _, _, name in self.stats_aggregator_type:
self.stats_aggregator[name] = StatisticsAggregator.from_name(
name, subarray=self.subarray, parent=self
)
else:
TjarkMiener marked this conversation as resolved.
Show resolved Hide resolved
name = stats_aggregator.__class__.__name__
self.stats_aggregator_type = [("type", "*", name)]
self.stats_aggregator[name] = stats_aggregator

# Initialize the instances of OutlierDetector
self.outlier_detectors = {}
if self.outlier_detector_list is not None:
for outlier_detector in self.outlier_detector_list:
TjarkMiener marked this conversation as resolved.
Show resolved Hide resolved
self.outlier_detectors[
outlier_detector["apply_to"]
] = OutlierDetector.from_name(
name=outlier_detector["name"],
validity_range=outlier_detector["validity_range"],
subarray=self.subarray,
parent=self,
)

def first_pass(
self,
table,
tel_id,
masked_pixels_of_sample=None,
col_name="image",
) -> Table:
"""
Calculate the monitoring data for a given set of events with non-overlapping aggregation chunks.

This method performs the first pass over the provided data table to calculate
various statistics for calibration purposes. The statistics are aggregated with
non-overlapping chunks (``chunk_shift`` set to None), and faulty pixels are detected
using a list of outlier detectors.


Parameters
----------
table : astropy.table.Table
DL1-like table with images of shape (n_images, n_channels, n_pix), event IDs and
timestamps of shape (n_images, )
tel_id : int
Telescope ID for which the calibration is being performed
masked_pixels_of_sample : ndarray, optional
TjarkMiener marked this conversation as resolved.
Show resolved Hide resolved
Boolean array of masked pixels of shape (n_pix, ) that are not available for processing
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

n_pixels? Or (n_channels, n_pixels)? In principle, high gain and low gain can be independently fail.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, please use n_pixels

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This boolean mask corresponds to hardware failure, so i think it should be (n_pixels,). We treat high and low gain independently later on software level when aggregating the stats for each type of calibration.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But the hardware of the two channels is at least partly independent, at least in LST (same PMT but e.g. different DRS4 chips). So they can also fail independently.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually the shape is (n_channels, n_pixels) as you say. It's propagated to the aggregator (np.ma.array(images, mask=masked_pixels_of_sample)), where masked_pixels_of_sample should have the same shape as images (which is of shape (n_channels, n_pixels)). I will update all the docstrings accordingly.

With my current setup using lstchain-data_r0_to_dl1 script (for writing R1 files for calib events) and ctapipe-process (for integrating and extracting DL1a images for calib events), the information about broken pixels due to hardware failure is lost. Once we write in ACADA DL0 files (cosmic and calib events) we should have access to this information.

col_name : str
Column name in the table from which the statistics will be aggregated

Returns
-------
astropy.table.Table
Table containing the aggregated statistics, their outlier masks, and the validity of the chunks
"""
# Get the aggregator
aggregator = self.stats_aggregator[self.stats_aggregator_type.tel[tel_id]]
# Pass through the whole provided dl1 table
aggregated_stats = aggregator(
table=table,
masked_pixels_of_sample=masked_pixels_of_sample,
col_name=col_name,
chunk_shift=None,
)
# Detect faulty pixels with multiple instances of ``OutlierDetector``
outlier_mask = np.zeros_like(aggregated_stats["mean"], dtype=bool)
for aggregated_val, outlier_detector in self.outlier_detectors.items():
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe we should store a bitmask / boolean array of which of the detectors rejected the sample as outlier?

Like this, you cannot really understand why an event was rejected.

So maybe, compute all masks for all detectors in an nd array, then apply or in the end.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We do not reject here on event-level but only on pixel-level. Currently we have for each calibration type (pedestal, flatfield, timecalibration, pedestalvariance) an outlier mask with shape of (n_channels, n_pixels). Storing per detectors for each calibration type seems a little bit heavy and would make the downstream code a little bit more complex. I would only add this if that is really needed for the monitoring of the data quality.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not saying that you should use that mask for the analysis. Just for understanding why a pixel was deemed unusable. Which is very important information e.g. to distinguish a hardware error from just reduced HV due to a star.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay, makes sense! I just pushed a commit. Does it look good to you?

outlier_mask = np.logical_or(
outlier_mask,
outlier_detector(aggregated_stats[aggregated_val]),
)
# Add the outlier mask to the aggregated statistics
aggregated_stats["outlier_mask"] = outlier_mask
# Get valid chunks and add them to the aggregated statistics
aggregated_stats["is_valid"] = self._get_valid_chunks(outlier_mask)
return aggregated_stats

def second_pass(
self,
table,
valid_chunks,
tel_id,
masked_pixels_of_sample=None,
col_name="image",
) -> Table:
"""
Conduct a second pass over the data to refine the statistics in regions with a high percentage of faulty pixels.

This method performs a second pass over the data with a refined shift of the chunk in regions where a high percentage
of faulty pixels were detected during the first pass. Note: Multiple first passes of different calibration events are
performed which may lead to different identification of faulty chunks in rare cases. Therefore a joined list of faulty
chunks is recommended to be passed to the second pass(es) if those different passes use the same ``chunk_size``.

Parameters
----------
table : astropy.table.Table
DL1-like table with images of shape (n_images, n_channels, n_pix), event IDs and timestamps of shape (n_images, ).
valid_chunks : ndarray
Boolean array indicating the validity of each chunk from the first pass.
Note: This boolean array can be a ``logical_and`` from multiple first passes of different calibration events.
tel_id : int
Telescope ID for which the calibration is being performed.
masked_pixels_of_sample : ndarray, optional
Boolean array of masked pixels of shape (n_pix, ) that are not available for processing.
col_name : str
Column name in the table from which the statistics will be aggregated.

Returns
-------
astropy.table.Table
Table containing the aggregated statistics after the second pass, their outlier masks, and the validity of the chunks.
"""
# Check if the chunk_shift is set for the second pass
if self.chunk_shift is None:
raise ValueError(
"chunk_shift must be set if second pass over the data is requested"
)
# Check if at least one chunk is faulty
if np.all(valid_chunks):
raise ValueError(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this really an error? Or should it just return as a no-op?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The check is needed because if all chunks are valid, an empty list would be vstack later on, which results an error without a proper message. I would prefer to break here and put a check for the validity of chunks outside this function, to avoid having multiple returns in one function. I find it really hard to follow the logic of a function with multiple returns (maybe just a personal taste). If you insists I can modify it to return a no-op. However, if everything went well in the first_pass, the second_pass should not be called in my opinion.

"All chunks are valid. The second pass over the data is redundant."
)
# Get the aggregator
aggregator = self.stats_aggregator[self.stats_aggregator_type.tel[tel_id]]
# Conduct a second pass over the data
aggregated_stats_secondpass = []
faulty_chunks_indices = np.where(~valid_chunks)[0]
TjarkMiener marked this conversation as resolved.
Show resolved Hide resolved
for index in faulty_chunks_indices:
# Log information of the faulty chunks
self.log.info(
"Faulty chunk detected in the first pass at index '%s'.", index
)
# Calculate the start of the slice depending on whether the previous chunk was faulty or not
slice_start = (
aggregator.chunk_size * index
if index - 1 in faulty_chunks_indices
else aggregator.chunk_size * (index - 1)
)
# Set the start of the slice to the first element of the dl1 table if out of bound
# and add one ``chunk_shift``.
slice_start = max(0, slice_start) + self.chunk_shift
# Set the end of the slice to the last element of the dl1 table if out of bound
# and subtract one ``chunk_shift``.
slice_end = min(len(table) - 1, aggregator.chunk_size * (index + 2)) - (
self.chunk_shift - 1
)
# Slice the dl1 table according to the previously calculated start and end.
table_sliced = table[slice_start:slice_end]
# Run the stats aggregator on the sliced dl1 table with a chunk_shift
# to sample the period of trouble (carflashes etc.) as effectively as possible.
# Checking for the length of the sliced table to be greater than the ``chunk_size``
# since it can be smaller if the last two chunks are faulty. Note: The two last chunks
# can be overlapping during the first pass, so we simply ignore them if there are faulty.
if len(table_sliced) > aggregator.chunk_size:
aggregated_stats_secondpass.append(
aggregator(
table=table_sliced,
masked_pixels_of_sample=masked_pixels_of_sample,
col_name=col_name,
chunk_shift=self.chunk_shift,
)
)
# Stack the aggregated statistics of each faulty chunk
aggregated_stats_secondpass = vstack(aggregated_stats_secondpass)
# Detect faulty pixels with multiple instances of OutlierDetector of the second pass
outlier_mask_secondpass = np.zeros_like(
TjarkMiener marked this conversation as resolved.
Show resolved Hide resolved
aggregated_stats_secondpass["mean"], dtype=bool
)
for (
aggregated_val,
outlier_detector,
) in self.outlier_detectors.items():
outlier_mask_secondpass = np.logical_or(
outlier_mask_secondpass,
outlier_detector(aggregated_stats_secondpass[aggregated_val]),
)
# Add the outlier mask to the aggregated statistics
aggregated_stats_secondpass["outlier_mask"] = outlier_mask_secondpass
aggregated_stats_secondpass["is_valid"] = self._get_valid_chunks(
outlier_mask_secondpass
)
return aggregated_stats_secondpass

def _get_valid_chunks(self, outlier_mask):
"""
Identify valid chunks based on the outlier mask.

This method processes the outlier mask to determine which chunks of data
are considered valid or faulty. A chunk is marked as faulty if the percentage
of outlier pixels exceeds a predefined threshold ``faulty_pixels_threshold``.

Parameters
----------
outlier_mask : numpy.ndarray
Boolean array indicating outlier pixels. The shape of the array should
match the shape of the aggregated statistics.

Returns
-------
numpy.ndarray
Boolean array where each element indicates whether the corresponding
chunk is valid (True) or faulty (False).
"""
# Check if the camera has two gain channels
if outlier_mask.shape[1] == 2:
# Combine the outlier mask of both gain channels
TjarkMiener marked this conversation as resolved.
Show resolved Hide resolved
outlier_mask = np.logical_or(
TjarkMiener marked this conversation as resolved.
Show resolved Hide resolved
outlier_mask[:, 0, :],
outlier_mask[:, 1, :],
)
# Calculate the fraction of faulty pixels over the camera
faulty_pixels_percentage = (
np.count_nonzero(outlier_mask, axis=-1) / np.shape(outlier_mask)[-1]
) * 100.0
TjarkMiener marked this conversation as resolved.
Show resolved Hide resolved
# Check for valid chunks if the threshold is not exceeded
valid_chunks = faulty_pixels_percentage < self.faulty_pixels_threshold
return valid_chunks
Loading