-
Notifications
You must be signed in to change notification settings - Fork 269
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Migration of flatfield and pedestal components from lstchain #2519
Conversation
This commit adds a migration proposal for the flatfield and pedestal components from lstchain into ctapipe. Shared functionality is moved into a parent component, while the different subcomponents hold the _process() function for the conventional processing of flatfield and pedestal events. Additional subcomponents with complex proccesing routines of flatfield and pedestal events can be added in the future.
Add TODOs for comments that need further dicussion.
Hi @TjarkMiener thanks, this is much needed. However, I think we should first discuss a bit how we want the general API of these classes to look like. I think we certainly don't want to keep the current one.
There are a couple of related open issues:
|
sample_size : int | ||
Minimal number of calibration events requested for the statistics | ||
update_frequency : int | ||
Calculation frequency of the statistics |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This value is not clear to me from the explanation given.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree, this is not clear
self.broken_pixels = [] # masked pixels per event in sample | ||
|
||
# load the waveform charge extractor | ||
self.extractor = ImageExtractor.from_name( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since you have image_extractor
as a TelescopeParameter
, you need to create all types requested, not just the one type.
See e.g.:
for _, _, name in self.image_extractor_type: |
return { | ||
"sample_time": ( | ||
trigger_times[0] + (trigger_times[-1] - trigger_times[0]) / 2 | ||
).unix |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please don't convert to unix here, just keep it as astropy time object
""" | ||
|
||
def __init__(self, **kwargs): | ||
super().__init__(**kwargs) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You can completely remove this method if it does nothing else than calling the base class init
self.arrival_times = self.arrival_times[self.update_frequency :] | ||
self.broken_pixels = self.broken_pixels[self.update_frequency :] | ||
|
||
@abstractmethod |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
private methods should not be abstract
mon.tel[self.tel_id].flatfield container | ||
""" | ||
|
||
container = event.mon.tel[self.tel_id].flatfield |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think we should set the new container on the event.
Instead, let the class keep a list of computed calibrations and append to it when a new calibration is computed.
self.sample_masked_pixels, | ||
) | ||
|
||
flatfield_container = { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We made very bad experiences with modifying containers in place. You should create and fill a fresh container.
You can also directly pass the fields to the container itself.
ff_charge_failing_pixels = np.logical_or( | ||
container.charge_median_outliers, container.charge_std_outliers | ||
) | ||
event.mon.tel[self.tel_id].pixel_status.flatfield_failing_pixels = ( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Again, we shouldn't modify the input event. That's for the code that reads back the calibration coefficients and interpolates / matches them to events when applying the calibration.
|
||
# append valid dl1 events | ||
self.n_events_in_buffer += 1 | ||
self.trigger_times.append(event.trigger.tel[self.tel_id].time) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
self.tel_id
seems not to be defined anymore.
Also: these components should be able to handle multiple telescopes I think, so we keep e.g. a dict from tel_id to the properties.
|
||
# append valid dl1 events | ||
self.n_events_in_buffer += 1 | ||
self.trigger_times.append(event.trigger.tel[self.tel_id].time) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It also looks like you are implementing a deque with the logic in clean:
https://docs.python.org/3/library/collections.html#collections.deque
E.g. if you always want to keep the most recent N events, use a deque(maxsize=N)
and just append
to it.
Once you reach maxsize, the oldest event will be discarded.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the comments @maxnoe ! I was unaware of such great python functionality. I will definitely start using deque
for this task in the future!
Hi @maxnoe, I think for the moment @TjarkMiener just plans to optimised the present code. However, I agree on the points you raised, it is probably worth to develop a more general view before starting. |
@FrancaCassol I think it doesn't make much sense to apply any optimization to the existing code, as we know we need to replace it completely relatively soon. |
The logic is already somewhat different to what is implemented in LST currently, so clearly this at least has some new development. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have a general question: these classes are for computing running pedestals and flatfields, but they process each event at at time and have mechanisms for bunching events, and keeping running statistics. Now that we know all calibration events will be in their own file and that for offline processing we can do a first pass on that file to compute coefficents, is the current design with an event-loop really the best one? The only use case I can see for that is for real-time analysis, but that is not what we have, we can compute all calibration after the run has finished and can even loop many times if needed.
I would think it is much more efficient and easy to implement the computation of calibration coefficients using a procedure that first pre-processes and writes out the events to an HDF5 file, and then using TableLoader, apply functions to all events at once, similar to how ctapipe-apply-models
works. Should we maybe have a design discussion about this? We can still of course integrate this method for the time being, but I suspect it would be much less code and allow fancier algorithms if we process column-wise rather than row-wise(per event). For example doing a boxcar average for the pedestals for example would be trivial to implement.
This PR draft was meant to refactor the current lstchain components for flatfield and pedestal to remove repeated code and start a discussion on the migration. In addition, I added this config parameter, which allows the calculation of the stats after a given iteration and not once the sample buffer is filled. Originally I had in mind to follow what @kosack suggested (pre-process/integrate charges and arrival times, dump dl1-like events, and then table read plus the calculation of coefficients). I dropped this idea because a presumed 2x 100Hz calibration rate and an obs block duration of 20-25 mins would exceed the RAM limitation of 2GB per job when reading the whole table. However, as we discussed in a separate thread, the calibration rate will be likely reduced in the near future. So I guess we could reconsider the suggestion of integrating and temporarily storing dl1-like calibration data to read them afterward using the TableLoader and apply functions to all events at once. The idea I followed in this PR for the design of the components was to be able to flexibly add subcomponents that could host the "fancy algorithms" via the I like the idea proposed in #1226 for the on-the-fly computation of flatfield and pedestals, even though we might want to store the charge statistics monitoring of the calibration events for the data quality monitoring. |
The table loader also supports chunked loading, which allows you to iterate through chunks and keeping consecutive chunks in memory, you only need roughly the amount of events of your windows sizes. |
True, good point! My initial plan was to follow a workflow like: integrating charges and extracting arrival times via the I'm closing this draft for now and working on the suggestion above in a separate branch. In case you think we should follow the design of this PR feel free to comment and I will reopen this draft. Thanks for the discussion and suggestion! |
This PR draft adds a migration proposal for the flatfield and pedestal components from lstchain into ctapipe. Shared functionality is moved into a parent component, while the different subcomponents hold the _process() function for the conventional processing of flatfield and pedestal events. Additional subcomponents with complex processing routines of flatfield and pedestal events can be added in the future.