
Migration of flatfield and pedestal components from lstchain #2519

Conversation

TjarkMiener (Member)

This PR draft adds a migration proposal for the flatfield and pedestal components from lstchain into ctapipe. Shared functionality is moved into a parent component, while the different subcomponents implement the _process() method for the conventional processing of flatfield and pedestal events. Additional subcomponents with more complex processing routines for flatfield and pedestal events can be added in the future.
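A minimal sketch of what that structure could look like (the class names FlatFieldExtractor and PedestalExtractor appear later in this discussion; the base-class name and method bodies are purely illustrative, not the actual PR code):

```python
from abc import abstractmethod

from ctapipe.core import TelescopeComponent


class CalibrationExtractor(TelescopeComponent):  # hypothetical parent component
    """Holds the shared buffering/statistics logic for calibration events."""

    def __call__(self, event):
        # shared bookkeeping (buffers, counters, statistics) would go here,
        # then the event-type-specific work is delegated to the subcomponent
        return self._process(event)

    @abstractmethod
    def _process(self, event):
        """Event-type-specific processing, implemented by each subcomponent."""


class FlatFieldExtractor(CalibrationExtractor):
    def _process(self, event):
        ...  # conventional processing of flatfield events


class PedestalExtractor(CalibrationExtractor):
    def _process(self, event):
        ...  # conventional processing of pedestal events
```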

Tjark Miener added 5 commits February 27, 2024 10:17
This commit adds a migration proposal for the flatfield and pedestal components from lstchain into ctapipe.

Shared functionality is moved into a parent component, while the different subcomponents hold the _process() function for the conventional processing of flatfield and pedestal events.

Additional subcomponents with complex processing routines of flatfield and pedestal events can be added in the future.
Add TODOs for comments that need further discussion.
maxnoe (Member) commented Feb 27, 2024

Hi @TjarkMiener thanks, this is much needed.

However, I think we should first discuss a bit what we want the general API of these classes to look like.

I think we certainly don't want to keep the current one.

  • Do we need an abstract base class for the calibration like we have currently?

    This would depend on whether we can use the same general structure and algorithm per telescope. Will all telescopes use the F-Factor method?

  • Do we use a custom tool to compute these calibration coefficients, or do we integrate into ctapipe-process?

    I think a specific tool is the way to go.

  • What structure do we need?

    Currently we have three classes: one for pedestal events, one for flatfield events, and one tying these two together to compute the final calibration coefficients. That makes sense, I think, but there is also a lot of duplication between the flatfield and pedestal base classes in lstchain currently. This should be refactored.

  • How do we store the output?

  • How do we read it back in for calibration?

There are a couple of related open issues:

sample_size : int
Minimal number of calibration events requested for the statistics
update_frequency : int
Calculation frequency of the statistics
Member:

This value is not clear to me from the explanation given.

Contributor:

I agree, this is not clear

self.broken_pixels = [] # masked pixels per event in sample

# load the waveform charge extractor
self.extractor = ImageExtractor.from_name(
Member:

Since you have image_extractor as a TelescopeParameter, you need to create all types requested, not just the one type.

See e.g.:

for _, _, name in self.image_extractor_type:
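For illustration, a sketch of creating one extractor per requested type, following the pattern used elsewhere in ctapipe for TelescopeParameter-configured image extractors; this is a fragment meant for the component's __init__, and the attribute name image_extractors is an assumption:

```python
# build one ImageExtractor instance per type listed in the
# image_extractor_type TelescopeParameter, keyed by name
self.image_extractors = {}
for _, _, name in self.image_extractor_type:
    self.image_extractors[name] = ImageExtractor.from_name(
        name, subarray=self.subarray, parent=self
    )
```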

return {
"sample_time": (
trigger_times[0] + (trigger_times[-1] - trigger_times[0]) / 2
).unix
Member:

Please don't convert to unix here, just keep it as an astropy Time object.
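A minimal sketch of the suggested change, reusing the variable names from the snippet above and simply returning the astropy Time object instead of its .unix value:

```python
# midpoint of the sample window, kept as an astropy Time object
sample_time = trigger_times[0] + (trigger_times[-1] - trigger_times[0]) / 2
return {"sample_time": sample_time}
```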

"""

def __init__(self, **kwargs):
super().__init__(**kwargs)
Member:

You can completely remove this method if it does nothing other than call the base class __init__.

self.arrival_times = self.arrival_times[self.update_frequency :]
self.broken_pixels = self.broken_pixels[self.update_frequency :]

@abstractmethod
Member:

private methods should not be abstract

mon.tel[self.tel_id].flatfield container
"""

container = event.mon.tel[self.tel_id].flatfield
maxnoe (Member), Feb 27, 2024:

I don't think we should set the new container on the event.

Instead, let the class keep a list of computed calibrations and append to it when a new calibration is computed.
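A sketch of that alternative, with hypothetical attribute and helper names, meant as fragments of the extractor component rather than standalone code:

```python
def __init__(self, **kwargs):
    super().__init__(**kwargs)
    # accumulate computed calibrations here instead of writing them
    # into event.mon.tel[...]
    self.calibrations = []

def _process(self, event):
    container = self._compute_flatfield(event)  # hypothetical helper
    self.calibrations.append(container)
    return container
```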

self.sample_masked_pixels,
)

flatfield_container = {
Member:

We have had very bad experiences with modifying containers in place. You should create and fill a fresh container.

You can also directly pass the fields to the container itself.
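A sketch of that suggestion; FlatFieldContainer comes from ctapipe.containers, but the exact set of fields filled here, and the variables on the right-hand side, are illustrative:

```python
from ctapipe.containers import FlatFieldContainer

# create and fill a fresh container, passing the fields directly,
# instead of mutating an existing container in place
container = FlatFieldContainer(
    sample_time=sample_time,
    n_events=n_events,
    charge_median=charge_median,
    charge_std=charge_std,
)
```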

ff_charge_failing_pixels = np.logical_or(
container.charge_median_outliers, container.charge_std_outliers
)
event.mon.tel[self.tel_id].pixel_status.flatfield_failing_pixels = (
Member:

Again, we shouldn't modify the input event. That's for the code that reads back the calibration coefficients and interpolates / matches them to events when applying the calibration.


# append valid dl1 events
self.n_events_in_buffer += 1
self.trigger_times.append(event.trigger.tel[self.tel_id].time)
Member:

self.tel_id seems not to be defined anymore.

Also: these components should be able to handle multiple telescopes, I think, so we would keep e.g. a dict from tel_id to the per-telescope properties.
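For illustration, a sketch of per-telescope bookkeeping with dicts keyed by tel_id (attribute names are hypothetical; these are fragments of the component code):

```python
from collections import defaultdict

# in __init__: one buffer per telescope instead of a single self.tel_id
self.n_events_in_buffer = defaultdict(int)
self.trigger_times = defaultdict(list)

# when appending a valid event for a given telescope:
self.n_events_in_buffer[tel_id] += 1
self.trigger_times[tel_id].append(event.trigger.tel[tel_id].time)
```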


# append valid dl1 events
self.n_events_in_buffer += 1
self.trigger_times.append(event.trigger.tel[self.tel_id].time)
Member:

It also looks like you are implementing a deque with the logic in clean:
https://docs.python.org/3/library/collections.html#collections.deque

E.g. if you always want to keep the most recent N events, use a deque(maxlen=N) and just append to it.

Once you reach maxlen, the oldest event will be discarded.
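A small, self-contained illustration of that pattern (the buffer size and contents are made up):

```python
from collections import deque

buffer = deque(maxlen=5)  # keep only the 5 most recent entries

for i in range(8):
    buffer.append(i)  # once the deque is full, the oldest item is dropped

print(list(buffer))  # -> [3, 4, 5, 6, 7]
```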

TjarkMiener (Member, Author):

Thanks for the comments, @maxnoe! I was unaware of such great Python functionality. I will definitely start using deque for this task in the future!

FrancaCassol (Contributor)

> Hi @TjarkMiener thanks, this is much needed.
>
> However, I think we should first discuss a bit what we want the general API of these classes to look like.

Hi @maxnoe,

I think for the moment @TjarkMiener just plans to optimise the present code. However, I agree on the points you raised; it is probably worth developing a more general view before starting.

maxnoe (Member) commented Feb 29, 2024

@FrancaCassol I think it doesn't make much sense to apply any optimization to the existing code, as we know we need to replace it completely relatively soon.

maxnoe (Member) commented Feb 29, 2024

The logic is already somewhat different to what is implemented in LST currently, so clearly this at least has some new development.

kosack (Contributor) left a comment

I have a general question: these classes are for computing running pedestals and flatfields, but they process one event at a time and have mechanisms for bunching events and keeping running statistics. Now that we know all calibration events will be in their own file, and that for offline processing we can do a first pass on that file to compute coefficients, is the current design with an event loop really the best one? The only use case I can see for that is real-time analysis, but that is not what we have; we can compute all calibrations after the run has finished and can even loop many times if needed.

I would think it is much more efficient and easier to implement the computation of calibration coefficients using a procedure that first pre-processes and writes out the events to an HDF5 file, and then, using TableLoader, applies functions to all events at once, similar to how ctapipe-apply-models works. Should we maybe have a design discussion about this? We can still of course integrate this method for the time being, but I suspect it would be much less code and allow fancier algorithms if we process column-wise rather than row-wise (per event). Doing a boxcar average for the pedestals, for example, would be trivial to implement.
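A sketch of what such column-wise processing could look like; the file name, the telescope id, and the assumption that DL1 image charges have already been written to the file are all illustrative, and the TableLoader options may differ between ctapipe versions:

```python
import numpy as np
from ctapipe.io import TableLoader

# assumes DL1 image charges were already written to this (hypothetical) file
loader = TableLoader("calibration_events.dl1.h5", dl1_images=True)
events = loader.read_telescope_events([1])  # tel_id 1, as an example

charges = np.asarray(events["image"])  # shape: (n_events, n_pixels)

# column-wise boxcar (moving) average over N consecutive events, per pixel
N = 100
kernel = np.ones(N) / N
running_pedestal = np.apply_along_axis(
    lambda column: np.convolve(column, kernel, mode="valid"),
    axis=0,
    arr=charges,
)
```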

TjarkMiener (Member, Author)

This PR draft was meant to refactor the current lstchain components for flatfield and pedestal to remove repeated code and start a discussion on the migration. In addition, I added this config parameter, which allows the stats to be calculated after a given number of iterations and not only once the sample buffer is filled. Originally I had in mind to follow what @kosack suggested (pre-process/integrate charges and arrival times, dump dl1-like events, then read the table and calculate the coefficients). I dropped this idea because, with a presumed 2x 100 Hz calibration rate and an obs block duration of 20-25 min, reading the whole table would exceed the RAM limitation of 2 GB per job. However, as we discussed in a separate thread, the calibration rate will likely be reduced in the near future. So I guess we could reconsider the suggestion of integrating and temporarily storing dl1-like calibration data, reading it afterward with the TableLoader, and applying functions to all events at once.

The idea I followed in this PR for the design of the components was to be able to flexibly add subcomponents that could host the "fancy algorithms" via the _process() function. You could in principle increase the sample_size and then apply the complex functions to the self.charges and self.arrival_times arrays, as you would for all the events read with the TableLoader. Those complex functions could even be applied at a pixel-wise level, e.g. we might want to treat pixels that are affected by stars differently. The current subcomponents FlatFieldExtractor and PedestalExtractor were targeting the cat-B processing, while fancier subcomponents can be added in the future for cat-C processing.

I like the idea proposed in #1226 for the on-the-fly computation of flatfield and pedestals, even though we might want to store the monitored charge statistics of the calibration events for data quality monitoring.

maxnoe (Member) commented Mar 7, 2024

The TableLoader also supports chunked loading, which allows you to iterate through chunks while keeping consecutive chunks in memory; you only need roughly as many events in memory as your window size.
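For illustration, a sketch of chunked reading (the file name and chunk size are made up, and the exact return type of the chunked reader differs between ctapipe versions):

```python
from ctapipe.io import TableLoader

loader = TableLoader("calibration_events.dl1.h5", dl1_images=True)

# iterate over the file in chunks of roughly 10000 events; depending on the
# ctapipe version each item is either a table or a (start, stop, table) chunk
for chunk in loader.read_telescope_events_chunked(10000):
    ...  # update the running pedestal/flatfield statistics from this chunk
```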

TjarkMiener (Member, Author) commented Mar 8, 2024

True, good point! My initial plan was to follow a workflow like this: integrate charges and extract arrival times via the process tool ($ ctapipe-process ... --write_images --no-write_parameters), and then write a tool that calculates the camera calibration coefficients based on the TableLoader. For that, I would need to make the process tool compatible with calibration events, i.e. allow two-gain integration. By using the process tool we would directly ensure that we follow the standard ctapipe DL1 data model, which would then probably need some adjustments to allow two-gain images and peak_times. I think the dl1-like files for the calibration events can be stored temporarily. One advantage would also be that those dl1-like calibration files, rather than the waveforms of the calibration events, could be sent directly to the off-site data center after production. A naive guess from my side would be that we don't want to repeat the charge integration and arrival time extraction for the calibration events each time we calculate camera calibration coefficients (on- or off-site).

I'm closing this draft for now and will work on the suggestion above in a separate branch. In case you think we should follow the design of this PR, feel free to comment and I will reopen this draft. Thanks for the discussion and suggestions!
