Add calibration calculators #2609
base: main
Changes from 98 commits
New documentation page:

@@ -0,0 +1,11 @@
.. _calibration_calculator:

**********************
Calibration Calculator
**********************


Reference/API
=============

.. automodapi:: ctapipe.monitoring.calculator
New changelog entry:

@@ -0,0 +1 @@
Add calibration calculators, which aggregate statistics, detect outliers, and handle faulty data chunks.
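The changelog entry above summarizes a three-step pipeline: aggregate per-pixel statistics in chunks, detect outlier pixels, and flag faulty chunks. The following is a minimal, ctapipe-independent numpy sketch of that flow; the chunk size, validity range, and threshold are purely illustrative, not ctapipe defaults:

```python
import numpy as np

rng = np.random.default_rng(42)
n_images, n_pix = 100, 10
images = rng.normal(loc=5.0, scale=2.0, size=(n_images, n_pix))
images[50:75, 3] = 100.0  # inject a stuck pixel in the third chunk

# Aggregate statistics over non-overlapping chunks
# (ctapipe uses sigma-clipped aggregators instead of a plain std)
chunk_size = 25
n_chunks = n_images // chunk_size
chunks = images.reshape(n_chunks, chunk_size, n_pix)
std = chunks.std(axis=1)  # shape (n_chunks, n_pix)

# Range-based outlier detection on the aggregated statistic, in the spirit
# of the RangeOutlierDetector example from the trait's help string
validity_range = (0.5, 8.0)
outlier_mask = (std < validity_range[0]) | (std > validity_range[1])

# A chunk is valid only if the percentage of outlier pixels stays below
# a faulty_pixels_threshold (10 % here, the trait's default)
faulty_pixels_threshold = 10.0
faulty_pct = 100.0 * outlier_mask.sum(axis=1) / n_pix
is_valid = faulty_pct < faulty_pixels_threshold
```

With one stuck pixel out of ten (10 %, not strictly below the 10 % threshold), the third chunk is marked invalid while the others remain valid.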
New module (327 lines):

@@ -0,0 +1,327 @@
""" | ||
Definition of the ``PixelStatisticsCalculator`` class, providing all steps needed to | ||
calculate the montoring data for the camera calibration. | ||
""" | ||
|
||
import numpy as np | ||
from astropy.table import Table, vstack | ||
|
||
from ctapipe.core import TelescopeComponent | ||
from ctapipe.core.traits import ( | ||
ComponentName, | ||
Dict, | ||
Float, | ||
Int, | ||
List, | ||
TelescopeParameter, | ||
) | ||
from ctapipe.monitoring.aggregator import StatisticsAggregator | ||
from ctapipe.monitoring.outlier import OutlierDetector | ||
|
||
__all__ = [ | ||
"PixelStatisticsCalculator", | ||
] | ||
|
||
|
||
class PixelStatisticsCalculator(TelescopeComponent): | ||
""" | ||
Component to calculate statistics from calibration events. | ||
|
||
The ``PixelStatisticsCalculator`` is responsible for calculating various statistics from | ||
calibration events, such as pedestal and flat-field data. It aggregates statistics, | ||
detects outliers, and handles faulty data periods. | ||
This class holds two functions to conduct two different passes over the data with and without | ||
overlapping aggregation chunks. The first pass is conducted with non-overlapping chunks, | ||
while overlapping chunks can be set by the ``chunk_shift`` parameter for the second pass. | ||
The second pass over the data is only conducted in regions of trouble with a high percentage | ||
of faulty pixels exceeding the threshold ``faulty_pixels_threshold``. | ||
""" | ||
|
||
stats_aggregator_type = TelescopeParameter( | ||
trait=ComponentName( | ||
StatisticsAggregator, default_value="SigmaClippingAggregator" | ||
), | ||
default_value="SigmaClippingAggregator", | ||
help="Name of the StatisticsAggregator subclass to be used.", | ||
).tag(config=True) | ||
|
||
outlier_detector_list = List( | ||
trait=Dict(), | ||
default_value=None, | ||
allow_none=True, | ||
help=( | ||
"List of dicts containing the name of the OutlierDetector subclass to be used, " | ||
TjarkMiener marked this conversation as resolved.
Show resolved
Hide resolved
|
||
"the aggregated statistic value to which the detector should be applied, " | ||
"and the validity range of the detector. " | ||
"E.g. ``[{'apply_to': 'std', 'name': 'RangeOutlierDetector', 'validity_range': [2.0, 8.0]},]``." | ||
), | ||
).tag(config=True) | ||
|
||
chunk_shift = Int( | ||
default_value=None, | ||
allow_none=True, | ||
help=( | ||
"Number of samples to shift the aggregation chunk for the calculation " | ||
"of the statistical values. Only used in the second_pass(), since the " | ||
"first_pass() is conducted with non-overlapping chunks (chunk_shift=None)." | ||
), | ||
).tag(config=True) | ||

    faulty_pixels_threshold = Float(
        default_value=10.0,
        allow_none=True,
        help=(
            "Threshold in percentage of faulty pixels over the camera "
            "to identify regions of trouble."
        ),
    ).tag(config=True)

    def __init__(
        self,
        subarray,
        config=None,
        parent=None,
        stats_aggregator=None,
        **kwargs,
    ):
        """
        Parameters
        ----------
        subarray: ctapipe.instrument.SubarrayDescription
            Description of the subarray. Provides information about the
            camera that is useful in calibration. Also required for
            configuring the TelescopeParameter traitlets.
        config: traitlets.loader.Config
            Configuration specified by config file or cmdline arguments.
            Used to set traitlet values.
            This is mutually exclusive with passing a ``parent``.
        parent: ctapipe.core.Component or ctapipe.core.Tool
            Parent of this component in the configuration hierarchy;
            this is mutually exclusive with passing ``config``.
        stats_aggregator: ctapipe.monitoring.aggregator.StatisticsAggregator
            The ``StatisticsAggregator`` to use. If None, the default via the
            configuration system will be constructed.
        """
        super().__init__(subarray=subarray, config=config, parent=parent, **kwargs)
        self.subarray = subarray

        # Initialize the instances of StatisticsAggregator
        self.stats_aggregator = {}
        if stats_aggregator is None:
            for _, _, name in self.stats_aggregator_type:
                self.stats_aggregator[name] = StatisticsAggregator.from_name(
                    name, subarray=self.subarray, parent=self
                )
        else:
            name = stats_aggregator.__class__.__name__
            self.stats_aggregator_type = [("type", "*", name)]
            self.stats_aggregator[name] = stats_aggregator

        # Initialize the instances of OutlierDetector
        self.outlier_detectors = {}
        if self.outlier_detector_list is not None:
            for outlier_detector in self.outlier_detector_list:
                self.outlier_detectors[
                    outlier_detector["apply_to"]
                ] = OutlierDetector.from_name(
                    name=outlier_detector["name"],
                    validity_range=outlier_detector["validity_range"],
                    subarray=self.subarray,
                    parent=self,
                )

    def first_pass(
        self,
        table,
        tel_id,
        masked_pixels_of_sample=None,
        col_name="image",
    ) -> Table:
        """
        Calculate the monitoring data for a given set of events with non-overlapping aggregation chunks.

        This method performs the first pass over the provided data table to calculate
        various statistics for calibration purposes. The statistics are aggregated with
        non-overlapping chunks (``chunk_shift`` set to None), and faulty pixels are detected
        using a list of outlier detectors.

        Parameters
        ----------
        table : astropy.table.Table
            DL1-like table with images of shape (n_images, n_channels, n_pix), event IDs and
            timestamps of shape (n_images, )
        tel_id : int
            Telescope ID for which the calibration is being performed
        masked_pixels_of_sample : ndarray, optional
            Boolean array of masked pixels of shape (n_pix, ) that are not available for processing
        col_name : str
            Column name in the table from which the statistics will be aggregated

        Returns
        -------
        astropy.table.Table
            Table containing the aggregated statistics, their outlier masks, and the validity of the chunks
        """

[Review thread on masked_pixels_of_sample]
Comment: "Also, please use ..." (truncated in source)
Comment: "This boolean mask corresponds to hardware failure, so I think it should be (n_pixels,). We treat high and low gain independently later on the software level when aggregating the stats for each type of calibration."
Comment: "But the hardware of the two channels is at least partly independent, at least in LST (same PMT but e.g. different DRS4 chips). So they can also fail independently."
Comment: "Actually the shape is ... With my current setup using ..." (truncated in source)
        # Get the aggregator
        aggregator = self.stats_aggregator[self.stats_aggregator_type.tel[tel_id]]
        # Pass through the whole provided dl1 table
        aggregated_stats = aggregator(
            table=table,
            masked_pixels_of_sample=masked_pixels_of_sample,
            col_name=col_name,
            chunk_shift=None,
        )
        # Detect faulty pixels with multiple instances of ``OutlierDetector``
        outlier_mask = np.zeros_like(aggregated_stats["mean"], dtype=bool)
        for aggregated_val, outlier_detector in self.outlier_detectors.items():
            outlier_mask = np.logical_or(
                outlier_mask,
                outlier_detector(aggregated_stats[aggregated_val]),
            )
        # Add the outlier mask to the aggregated statistics
        aggregated_stats["outlier_mask"] = outlier_mask
        # Get valid chunks and add them to the aggregated statistics
        aggregated_stats["is_valid"] = self._get_valid_chunks(outlier_mask)
        return aggregated_stats

[Review thread on the outlier-mask loop]
Comment: "Maybe we should store a bitmask / boolean array of which of the detectors rejected the sample as outlier? Like this, you cannot really understand why an event was rejected. So maybe, compute all masks for all detectors in an nd array, then apply ..." (truncated in source)
Comment: "We do not reject here on event-level but only on pixel-level. Currently we have for each calibration type (pedestal, flatfield, timecalibration, pedestalvariance) an outlier mask with shape of ..." (truncated in source)
Comment: "I am not saying that you should use that mask for the analysis. Just for understanding why a pixel was deemed unusable. Which is very important information e.g. to distinguish a hardware error from just reduced HV due to a star."
Comment: "Okay, makes sense! I just pushed a commit. Does it look good to you?"

    def second_pass(
        self,
        table,
        valid_chunks,
        tel_id,
        masked_pixels_of_sample=None,
        col_name="image",
    ) -> Table:
        """
        Conduct a second pass over the data to refine the statistics in regions with a high percentage of faulty pixels.

        This method performs a second pass over the data with a refined shift of the chunk in regions where a high percentage
        of faulty pixels was detected during the first pass. Note: Multiple first passes over different calibration events are
        performed, which may lead to different identification of faulty chunks in rare cases. Therefore, a joined list of faulty
        chunks is recommended to be passed to the second pass(es) if those different passes use the same ``chunk_size``.

        Parameters
        ----------
        table : astropy.table.Table
            DL1-like table with images of shape (n_images, n_channels, n_pix), event IDs and timestamps of shape (n_images, ).
        valid_chunks : ndarray
            Boolean array indicating the validity of each chunk from the first pass.
            Note: This boolean array can be a ``logical_and`` from multiple first passes of different calibration events.
        tel_id : int
            Telescope ID for which the calibration is being performed.
        masked_pixels_of_sample : ndarray, optional
            Boolean array of masked pixels of shape (n_pix, ) that are not available for processing.
        col_name : str
            Column name in the table from which the statistics will be aggregated.

        Returns
        -------
        astropy.table.Table
            Table containing the aggregated statistics after the second pass, their outlier masks, and the validity of the chunks.
        """
        # Check if the chunk_shift is set for the second pass
        if self.chunk_shift is None:
            raise ValueError(
                "chunk_shift must be set if second pass over the data is requested"
            )
        # Check if at least one chunk is faulty
        if np.all(valid_chunks):
            raise ValueError(
                "All chunks are valid. The second pass over the data is redundant."
            )

[Review thread on this ValueError]
Comment: "Is this really an error? Or should it just return as a no-op?"
Comment: "The check is needed because if all chunks are valid, an empty list would be ..." (truncated in source)
        # Get the aggregator
        aggregator = self.stats_aggregator[self.stats_aggregator_type.tel[tel_id]]
        # Conduct a second pass over the data
        aggregated_stats_secondpass = []
        faulty_chunks_indices = np.where(~valid_chunks)[0]
        for index in faulty_chunks_indices:
            # Log information of the faulty chunks
            self.log.info(
                "Faulty chunk detected in the first pass at index '%s'.", index
            )
            # Calculate the start of the slice depending on whether the previous chunk was faulty or not
            slice_start = (
                aggregator.chunk_size * index
                if index - 1 in faulty_chunks_indices
                else aggregator.chunk_size * (index - 1)
            )
            # Set the start of the slice to the first element of the dl1 table if out of bound
            # and add one ``chunk_shift``.
            slice_start = max(0, slice_start) + self.chunk_shift
            # Set the end of the slice to the last element of the dl1 table if out of bound
            # and subtract one ``chunk_shift``.
            slice_end = min(len(table) - 1, aggregator.chunk_size * (index + 2)) - (
                self.chunk_shift - 1
            )
            # Slice the dl1 table according to the previously calculated start and end.
            table_sliced = table[slice_start:slice_end]
            # Run the stats aggregator on the sliced dl1 table with a chunk_shift
            # to sample the period of trouble (carflashes etc.) as effectively as possible.
            # Checking for the length of the sliced table to be greater than the ``chunk_size``
            # since it can be smaller if the last two chunks are faulty. Note: The two last chunks
            # can be overlapping during the first pass, so we simply ignore them if they are faulty.
            if len(table_sliced) > aggregator.chunk_size:
                aggregated_stats_secondpass.append(
                    aggregator(
                        table=table_sliced,
                        masked_pixels_of_sample=masked_pixels_of_sample,
                        col_name=col_name,
                        chunk_shift=self.chunk_shift,
                    )
                )
        # Stack the aggregated statistics of each faulty chunk
        aggregated_stats_secondpass = vstack(aggregated_stats_secondpass)
        # Detect faulty pixels with multiple instances of OutlierDetector of the second pass
        outlier_mask_secondpass = np.zeros_like(
            aggregated_stats_secondpass["mean"], dtype=bool
        )
        for (
            aggregated_val,
            outlier_detector,
        ) in self.outlier_detectors.items():
            outlier_mask_secondpass = np.logical_or(
                outlier_mask_secondpass,
                outlier_detector(aggregated_stats_secondpass[aggregated_val]),
            )
        # Add the outlier mask to the aggregated statistics
        aggregated_stats_secondpass["outlier_mask"] = outlier_mask_secondpass
        aggregated_stats_secondpass["is_valid"] = self._get_valid_chunks(
            outlier_mask_secondpass
        )
        return aggregated_stats_secondpass

    def _get_valid_chunks(self, outlier_mask):
        """
        Identify valid chunks based on the outlier mask.

        This method processes the outlier mask to determine which chunks of data
        are considered valid or faulty. A chunk is marked as faulty if the percentage
        of outlier pixels exceeds the predefined threshold ``faulty_pixels_threshold``.

        Parameters
        ----------
        outlier_mask : numpy.ndarray
            Boolean array indicating outlier pixels. The shape of the array should
            match the shape of the aggregated statistics.

        Returns
        -------
        numpy.ndarray
            Boolean array where each element indicates whether the corresponding
            chunk is valid (True) or faulty (False).
        """
        # Check if the camera has two gain channels
        if outlier_mask.shape[1] == 2:
            # Combine the outlier mask of both gain channels
            outlier_mask = np.logical_or(
                outlier_mask[:, 0, :],
                outlier_mask[:, 1, :],
            )
        # Calculate the fraction of faulty pixels over the camera
        faulty_pixels_percentage = (
            np.count_nonzero(outlier_mask, axis=-1) / np.shape(outlier_mask)[-1]
        ) * 100.0
        # Check for valid chunks if the threshold is not exceeded
        valid_chunks = faulty_pixels_percentage < self.faulty_pixels_threshold
        return valid_chunks
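The gain-combining and thresholding logic of `_get_valid_chunks` can be exercised in isolation. The sketch below reimplements it in plain numpy (assuming the trait's default `faulty_pixels_threshold` of 10.0), so no ctapipe installation is needed to follow the behavior:

```python
import numpy as np

faulty_pixels_threshold = 10.0  # default of the Float trait above


def get_valid_chunks(outlier_mask):
    # Combine the outlier masks of both gain channels, if present,
    # mirroring PixelStatisticsCalculator._get_valid_chunks
    if outlier_mask.ndim == 3 and outlier_mask.shape[1] == 2:
        outlier_mask = np.logical_or(outlier_mask[:, 0, :], outlier_mask[:, 1, :])
    # Percentage of faulty pixels per chunk over the camera
    faulty_pixels_percentage = (
        np.count_nonzero(outlier_mask, axis=-1) / np.shape(outlier_mask)[-1]
    ) * 100.0
    return faulty_pixels_percentage < faulty_pixels_threshold


# Two chunks, two gain channels, 20 pixels: after OR-ing the gains, the
# second chunk has 3 distinct faulty pixels (15 % > 10 %), so it is invalid.
mask = np.zeros((2, 2, 20), dtype=bool)
mask[1, 0, [0, 1]] = True   # high gain flags pixels 0 and 1
mask[1, 1, [1, 2]] = True   # low gain flags pixels 1 and 2
print(get_valid_chunks(mask))  # [ True False]
```

Note how pixel 1, flagged in both gains, is counted only once after the `logical_or`.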
[Review thread on the default configuration]
Comment: "Is there a sensible default configuration that actually performs outlier detection? The default value seems to be 'no outlier detection'."
Comment: "This component is used to aggregate the stats for the pedestal, flatfield, timecalibration, and pedestalvariance. Each of these aggregations has a sensible, distinct default configuration. Should we prioritize one of the aggregations over the others and set the default config for that aggregation?"
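To make the discussion above concrete, an explicit configuration that does enable outlier detection might look like the following. Everything beyond the `RangeOutlierDetector` example quoted in the trait's help string (the `chunk_shift` value in particular) is a hypothetical illustration, not a ctapipe default:

```python
# Hypothetical explicit configuration for PixelStatisticsCalculator,
# shaped after the traits defined in the module above. The numeric values
# are illustrative assumptions, not recommended calibration settings.
config = {
    "PixelStatisticsCalculator": {
        "stats_aggregator_type": [("type", "*", "SigmaClippingAggregator")],
        "outlier_detector_list": [
            {
                "apply_to": "std",
                "name": "RangeOutlierDetector",
                "validity_range": [2.0, 8.0],
            },
        ],
        "faulty_pixels_threshold": 10.0,
        "chunk_shift": 100,  # illustrative; required only for second_pass()
    }
}
```

With `outlier_detector_list` left at its default of `None`, no detector is constructed and every chunk would pass as valid, which is exactly the reviewer's concern.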