
Migration of flatfield and pedestal components from lstchain #2519

Conversation

TjarkMiener (Member)

This PR draft adds a migration proposal for the flatfield and pedestal components from lstchain into ctapipe. Shared functionality is moved into a parent component, while the different subcomponents implement the _process() method for the conventional processing of flatfield and pedestal events. Additional subcomponents with more complex processing routines for flatfield and pedestal events can be added in the future.
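A minimal sketch of what that structure could look like (the class names FlatFieldExtractor and PedestalExtractor appear later in this discussion; the base-class name and method bodies are purely illustrative, not the actual PR code):

```python
from abc import abstractmethod

from ctapipe.core import TelescopeComponent


class CalibrationExtractor(TelescopeComponent):  # hypothetical parent component
    """Holds the shared buffering/statistics logic for calibration events."""

    def __call__(self, event):
        # shared bookkeeping (buffers, counters, statistics) would go here,
        # then the event-type-specific work is delegated to the subcomponent
        return self._process(event)

    @abstractmethod
    def _process(self, event):
        """Event-type-specific processing, implemented by each subcomponent."""


class FlatFieldExtractor(CalibrationExtractor):
    def _process(self, event):
        ...  # conventional processing of flatfield events


class PedestalExtractor(CalibrationExtractor):
    def _process(self, event):
        ...  # conventional processing of pedestal events
```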

Tjark Miener added 5 commits February 27, 2024 10:17
This commit adds a migration proposal for the flatfield and pedestal components from lstchain into ctapipe.

Shared functionality is moved into a parent component, while the different subcomponents hold the _process() function for the conventional processing of flatfield and pedestal events.

Additional subcomponents with complex processing routines of flatfield and pedestal events can be added in the future.
Add TODOs for comments that need further discussion.
maxnoe (Member) commented Feb 27, 2024

Hi @TjarkMiener thanks, this is much needed.

However, I think we should first discuss a bit what we want the general API of these classes to look like.

I think we certainly don't want to keep the current one.

  • Do we need an abstract base class for the calibration like we have currently?

    This would depend on whether we can use the same general structure and algorithm per telescope. Will all telescopes use the F-Factor method?

  • Do we use a custom tool to compute these calibration coefficients, or do we integrate into ctapipe-process?

    I think a specific tool is the way to go.

  • What structure do we need?

    Currently we have three classes: one for pedestal events, one for flatfield events, and one tying these two together to compute the final calibration coefficients. That makes sense, I think, but there is also a lot of duplication between the flatfield and pedestal base classes in lstchain currently. This should be refactored.

  • How do we store the output?

  • How do we read it back in for calibration?

There are a couple of related open issues:

sample_size : int
Minimal number of calibration events requested for the statistics
update_frequency : int
Calculation frequency of the statistics
Member:

This value is not clear to me from the explanation given.

Contributor:

I agree, this is not clear

self.broken_pixels = [] # masked pixels per event in sample

# load the waveform charge extractor
self.extractor = ImageExtractor.from_name(
Member:

Since you have image_extractor as a TelescopeParameter, you need to create all types requested, not just the one type.

See e.g.:

for _, _, name in self.image_extractor_type:
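For illustration, a sketch of creating one extractor per requested type, following the pattern used elsewhere in ctapipe for TelescopeParameter-configured image extractors; this is a fragment meant for the component's __init__, and the attribute name image_extractors is an assumption:

```python
# build one ImageExtractor instance per type listed in the
# image_extractor_type TelescopeParameter, keyed by name
self.image_extractors = {}
for _, _, name in self.image_extractor_type:
    self.image_extractors[name] = ImageExtractor.from_name(
        name, subarray=self.subarray, parent=self
    )
```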

return {
"sample_time": (
trigger_times[0] + (trigger_times[-1] - trigger_times[0]) / 2
).unix
Member:

Please don't convert to unix here, just keep it as an astropy Time object.
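A minimal sketch of the suggested change, reusing the variable names from the snippet above and simply returning the astropy Time object instead of its .unix value:

```python
# midpoint of the sample window, kept as an astropy Time object
sample_time = trigger_times[0] + (trigger_times[-1] - trigger_times[0]) / 2
return {"sample_time": sample_time}
```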

"""

def __init__(self, **kwargs):
super().__init__(**kwargs)
Member:

You can completely remove this method if it does nothing other than call the base class __init__.

self.arrival_times = self.arrival_times[self.update_frequency :]
self.broken_pixels = self.broken_pixels[self.update_frequency :]

@abstractmethod
Member:

private methods should not be abstract

mon.tel[self.tel_id].flatfield container
"""

container = event.mon.tel[self.tel_id].flatfield
maxnoe (Member), Feb 27, 2024:

I don't think we should set the new container on the event.

Instead, let the class keep a list of computed calibrations and append to it when a new calibration is computed.
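A sketch of that alternative, with hypothetical attribute and helper names, meant as fragments of the extractor component rather than standalone code:

```python
def __init__(self, **kwargs):
    super().__init__(**kwargs)
    # accumulate computed calibrations here instead of writing them
    # into event.mon.tel[...]
    self.calibrations = []

def _process(self, event):
    container = self._compute_flatfield(event)  # hypothetical helper
    self.calibrations.append(container)
    return container
```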

self.sample_masked_pixels,
)

flatfield_container = {
Member:

We have had very bad experiences with modifying containers in place. You should create and fill a fresh container.

You can also directly pass the fields to the container itself.
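A sketch of that suggestion; FlatFieldContainer comes from ctapipe.containers, but the exact set of fields filled here, and the variables on the right-hand side, are illustrative:

```python
from ctapipe.containers import FlatFieldContainer

# create and fill a fresh container, passing the fields directly,
# instead of mutating an existing container in place
container = FlatFieldContainer(
    sample_time=sample_time,
    n_events=n_events,
    charge_median=charge_median,
    charge_std=charge_std,
)
```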

ff_charge_failing_pixels = np.logical_or(
container.charge_median_outliers, container.charge_std_outliers
)
event.mon.tel[self.tel_id].pixel_status.flatfield_failing_pixels = (
Member:

Again, we shouldn't modify the input event. That's for the code that reads back the calibration coefficients and interpolates / matches them to events when applying the calibration.


# append valid dl1 events
self.n_events_in_buffer += 1
self.trigger_times.append(event.trigger.tel[self.tel_id].time)
Member:

self.tel_id seems not to be defined anymore.

Also: these components should be able to handle multiple telescopes, I think, so we would keep e.g. a dict from tel_id to the per-telescope properties.
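For illustration, a sketch of per-telescope bookkeeping with dicts keyed by tel_id (attribute names are hypothetical; these are fragments of the component code):

```python
from collections import defaultdict

# in __init__: one buffer per telescope instead of a single self.tel_id
self.n_events_in_buffer = defaultdict(int)
self.trigger_times = defaultdict(list)

# when appending a valid event for a given telescope:
self.n_events_in_buffer[tel_id] += 1
self.trigger_times[tel_id].append(event.trigger.tel[tel_id].time)
```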


# append valid dl1 events
self.n_events_in_buffer += 1
self.trigger_times.append(event.trigger.tel[self.tel_id].time)
Member:

It also looks like you are implementing a deque with the logic in clean:
https://docs.python.org/3/library/collections.html#collections.deque

E.g. if you always want to keep the most recent N events, use a deque(maxlen=N) and just append to it.

Once you reach maxlen, the oldest event will be discarded.
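A small, self-contained illustration of that pattern (the buffer size and contents are made up):

```python
from collections import deque

buffer = deque(maxlen=5)  # keep only the 5 most recent entries

for i in range(8):
    buffer.append(i)  # once the deque is full, the oldest item is dropped

print(list(buffer))  # -> [3, 4, 5, 6, 7]
```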

TjarkMiener (Member, Author):

Thanks for the comments, @maxnoe! I was unaware of such great Python functionality. I will definitely start using deque for this task in the future!

FrancaCassol (Contributor)

> Hi @TjarkMiener thanks, this is much needed.
>
> However, I think we should first discuss a bit what we want the general API of these classes to look like.

Hi @maxnoe,

I think for the moment @TjarkMiener just plans to optimise the present code. However, I agree on the points you raised; it is probably worth developing a more general view before starting.

maxnoe (Member) commented Feb 29, 2024

@FrancaCassol I think it doesn't make much sense to apply any optimization to the existing code, as we know we need to replace it completely relatively soon.

maxnoe (Member) commented Feb 29, 2024

The logic is already somewhat different to what is implemented in LST currently, so clearly this at least has some new development.

kosack (Contributor) left a comment

I have a general question: these classes are for computing running pedestals and flatfields, but they process one event at a time and have mechanisms for bunching events and keeping running statistics. Now that we know all calibration events will be in their own file, and that for offline processing we can do a first pass on that file to compute coefficients, is the current design with an event loop really the best one? The only use case I can see for that is real-time analysis, but that is not what we have; we can compute all calibrations after the run has finished and can even loop many times if needed.

I would think it is much more efficient and easier to implement the computation of calibration coefficients using a procedure that first pre-processes and writes out the events to an HDF5 file, and then, using TableLoader, applies functions to all events at once, similar to how ctapipe-apply-models works. Should we maybe have a design discussion about this? We can still of course integrate this method for the time being, but I suspect it would be much less code and allow fancier algorithms if we process column-wise rather than row-wise (per event). Doing a boxcar average for the pedestals, for example, would be trivial to implement.
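A sketch of what such column-wise processing could look like; the file name, the telescope id, and the assumption that DL1 image charges have already been written to the file are all illustrative, and the TableLoader options may differ between ctapipe versions:

```python
import numpy as np
from ctapipe.io import TableLoader

# assumes DL1 image charges were already written to this (hypothetical) file
loader = TableLoader("calibration_events.dl1.h5", dl1_images=True)
events = loader.read_telescope_events([1])  # tel_id 1, as an example

charges = np.asarray(events["image"])  # shape: (n_events, n_pixels)

# column-wise boxcar (moving) average over N consecutive events, per pixel
N = 100
kernel = np.ones(N) / N
running_pedestal = np.apply_along_axis(
    lambda column: np.convolve(column, kernel, mode="valid"),
    axis=0,
    arr=charges,
)
```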

TjarkMiener (Member, Author)

This PR draft was meant to refactor the current lstchain components for flatfield and pedestal to remove repeated code and start a discussion on the migration. In addition, I added this config parameter, which allows the stats to be calculated after a given number of iterations and not only once the sample buffer is filled. Originally I had in mind to follow what @kosack suggested (pre-process/integrate charges and arrival times, dump dl1-like events, then read the table and calculate the coefficients). I dropped this idea because, with a presumed 2x 100 Hz calibration rate and an obs block duration of 20-25 min, reading the whole table would exceed the RAM limitation of 2 GB per job. However, as we discussed in a separate thread, the calibration rate will likely be reduced in the near future. So I guess we could reconsider the suggestion of integrating and temporarily storing dl1-like calibration data, reading it afterward with the TableLoader, and applying functions to all events at once.

The idea I followed in this PR for the design of the components was to be able to flexibly add subcomponents that could host the "fancy algorithms" via the _process() function. You could in principle increase the sample_size and then apply the complex functions to the self.charges and self.arrival_times arrays, as you would for all the events read with the TableLoader. Those complex functions could even be applied at a pixel-wise level, e.g. we might want to treat pixels that are affected by stars differently. The current subcomponents FlatFieldExtractor and PedestalExtractor were targeting the cat-B processing, while fancier subcomponents can be added in the future for cat-C processing.

I like the idea proposed in #1226 for the on-the-fly computation of flatfield and pedestals, even though we might want to store the monitored charge statistics of the calibration events for data quality monitoring.

maxnoe (Member) commented Mar 7, 2024

The TableLoader also supports chunked loading, which allows you to iterate through chunks while keeping consecutive chunks in memory; you only need roughly as many events in memory as your window size.
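For illustration, a sketch of chunked reading (the file name and chunk size are made up, and the exact return type of the chunked reader differs between ctapipe versions):

```python
from ctapipe.io import TableLoader

loader = TableLoader("calibration_events.dl1.h5", dl1_images=True)

# iterate over the file in chunks of roughly 10000 events; depending on the
# ctapipe version each item is either a table or a (start, stop, table) chunk
for chunk in loader.read_telescope_events_chunked(10000):
    ...  # update the running pedestal/flatfield statistics from this chunk
```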

TjarkMiener (Member, Author) commented Mar 8, 2024

True, good point! My initial plan was to follow a workflow like this: integrate charges and extract arrival times via the process tool ($ ctapipe-process ... --write_images --no-write_parameters), and then write a tool that calculates the camera calibration coefficients based on the TableLoader. For that, I would need to make the process tool compatible with calibration events, i.e. allow two-gain integration. By using the process tool we would directly ensure that we follow the standard ctapipe DL1 data model, which would then probably need some adjustments to allow two-gain images and peak_times. I think the dl1-like files for the calibration events can be stored temporarily. One advantage would also be that those dl1-like calibration files, rather than the waveforms of the calibration events, could be sent directly to the off-site data center after production. A naive guess from my side would be that we don't want to repeat the charge integration and arrival time extraction for the calibration events each time we calculate camera calibration coefficients (on- or off-site).

I'm closing this draft for now and will work on the suggestion above in a separate branch. In case you think we should follow the design of this PR, feel free to comment and I will reopen this draft. Thanks for the discussion and suggestions!
