Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

A time clustering algorithm for image cleaning #2401

Open
wants to merge 81 commits into
base: main
Choose a base branch
from
Open
Show file tree
Hide file tree
Changes from all commits
Commits
Show all changes
81 commits
Select commit Hold shift + click to select a range
0d9eb7c
Add more nitpick ignores to fix docs build after traitlets update
maxnoe Oct 19, 2023
bdf210e
Add ctapipe-train-disp-reconstructor config to quickstart tool
LukasBeiske Oct 20, 2023
02b3589
Add changelog
LukasBeiske Oct 20, 2023
d06051e
Add note about performance of provided ml configs
LukasBeiske Oct 20, 2023
56a22cd
Changed some numpy calls following the numpy 2.0 migration guide
Tobychev Sep 27, 2023
4fe8f7e
Changelog
Tobychev Sep 27, 2023
1541808
Fix import order
Tobychev Sep 27, 2023
9ed2d29
Fix import order
Tobychev Sep 27, 2023
4e77e6f
Do not use hidden attribute _models in apply tool
LukasBeiske Oct 23, 2023
92f26c0
Fix changelog entry
LukasBeiske Oct 23, 2023
3c27db1
Use double backticks in changelog
LukasBeiske Oct 23, 2023
7209cad
make definition of h_max consistent
kosack Sep 20, 2023
dd9f8ac
get tests to run
kosack Sep 20, 2023
a1b4132
revert change to conftest for now
kosack Sep 21, 2023
fc8c5db
deal with case where there is no reference_location
kosack Sep 21, 2023
786b84e
removed checks for reference_location (not needed)
kosack Sep 28, 2023
1570f51
added initial changelog
kosack Sep 28, 2023
a3499dd
remove one more redundant check for reference loc
kosack Sep 28, 2023
7c04f78
fix bug where return value was incorrect
kosack Sep 28, 2023
3499c16
better description of h_max
kosack Sep 29, 2023
d3c4784
fix doctring format
kosack Sep 29, 2023
6e8208e
use relative references in changelog
kosack Sep 29, 2023
45d3615
fixed phrasing
kosack Oct 3, 2023
cbafead
define magic number
kosack Oct 23, 2023
a6253f5
use np.average
kosack Oct 23, 2023
aeeb2f4
Add option to override obs_id in SimTelEventSource
maxnoe Oct 19, 2023
18e164b
Use intmin for invalid pixel positions, allow empty arguments
maxnoe Sep 19, 2023
4c56992
Fix broken urls in README
aknierim Oct 10, 2023
5c7159d
Fix navigation_with_keys warning
aknierim Oct 25, 2023
745feae
Check if tel_id exists when looking up TelescopeParameters
maxnoe Oct 26, 2023
936ea10
Check pointing coordinate frame in DispReconstructor
LukasBeiske Oct 26, 2023
02fa105
Use CoordinateFrameType enum
LukasBeiske Oct 26, 2023
28e8a15
Change warning message
LukasBeiske Oct 26, 2023
45bf05d
Just raise and error if training data not in altaz
LukasBeiske Oct 27, 2023
8be21c5
Move the check of the coordinate frame to the right place
LukasBeiske Oct 27, 2023
5eb2f06
Add docstring for ctapipe-train-disp-reconstructor
LukasBeiske Oct 24, 2023
02fdae7
Add changelog
LukasBeiske Oct 24, 2023
0f16707
Add disp train tool to sphinx docs
LukasBeiske Oct 27, 2023
ab18fcc
Fix docstrings for disp train tool
LukasBeiske Oct 27, 2023
7618775
Fix unwanted default columns in disp output
maxnoe Oct 31, 2023
b01c10e
Fix docstring
maxnoe Oct 31, 2023
903646c
More nitpick ignores to fix docs build
maxnoe Oct 31, 2023
ad9fdb1
run test only if labels don't contain 'documentation'
nbiederbeck Jan 19, 2023
24c32bd
Trying variation of the conditional
Tobychev Sep 27, 2023
abf99a8
Trying new magic conditional, see if this works as intended.
Tobychev Oct 5, 2023
c431de2
Trying new magic conditional, see if this works as intended.
Tobychev Oct 5, 2023
d2c0347
Trying new magic conditional, see if this works as intended.
Tobychev Oct 5, 2023
2e79dca
More tests of how to formulate conditional
Tobychev Oct 5, 2023
954d0fe
Think I understand the problem now
Tobychev Oct 5, 2023
40c0e44
Fixing syntax errors
Tobychev Oct 5, 2023
3dc4e9b
Messing about with syntax
Tobychev Oct 5, 2023
442f3dc
Adding skip to more stages
Tobychev Oct 5, 2023
3cf3d84
run test only if labels don't contain 'documentation'
nbiederbeck Jan 19, 2023
98657d2
Trying variation of the conditional
Tobychev Sep 27, 2023
eee886e
Trying new magic conditional, see if this works as intended.
Tobychev Oct 5, 2023
35f2d77
Trying new magic conditional, see if this works as intended.
Tobychev Oct 5, 2023
c70d797
Trying new magic conditional, see if this works as intended.
Tobychev Oct 5, 2023
3547162
More tests of how to formulate conditional
Tobychev Oct 5, 2023
ad2127c
Think I understand the problem now
Tobychev Oct 5, 2023
8f7d639
Rebasing on main
Tobychev Oct 5, 2023
60c406e
Messing about with syntax
Tobychev Oct 5, 2023
5c68368
Fix conflict with main
Tobychev Oct 5, 2023
87374fc
Adding skip to more stages
Tobychev Oct 5, 2023
7638195
Changed the label triggering skipping tests
Tobychev Oct 6, 2023
3a57132
Think I understand the problem now
Tobychev Oct 5, 2023
3aa8a73
Fix conflict with main
Tobychev Oct 5, 2023
065c3d2
Rebasing on main
Tobychev Oct 5, 2023
f51466b
Change label
Tobychev Oct 31, 2023
c743b04
Update ci.yml
Tobychev Oct 31, 2023
5fd6952
Update ci.yml
Tobychev Oct 31, 2023
f83615e
Propose cep3: remove image parameters in camera frame
maxnoe Sep 22, 2023
37e5ea8
added time clustering algortihm
clara-escanuela Sep 19, 2023
55cf8ba
minpts is integer
clara-escanuela Sep 19, 2023
7e8daf8
added unit tests and references
clara-escanuela Sep 19, 2023
895ff97
added unit tests and references
clara-escanuela Sep 19, 2023
a9f81e9
add docs
clara-escanuela Sep 20, 2023
357d02b
style changes
clara-escanuela Sep 20, 2023
722dfcf
improved documentation
clara-escanuela Sep 27, 2023
3c799aa
hard cut on pe
clara-escanuela Sep 30, 2023
b9ab65e
added more units tests
clara-escanuela Jan 17, 2024
afc1704
updated time cleaning
clara-escanuela Feb 15, 2024
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
12 changes: 8 additions & 4 deletions .github/workflows/ci.yml
Original file line number Diff line number Diff line change
Expand Up @@ -75,28 +75,28 @@ jobs:
key: ctapipe-test-data

- name: Prepare mamba installation
if: matrix.install-method == 'mamba'
if: matrix.install-method == 'mamba' && contains(github.event.pull_request.labels.*.name, 'documentation-only') == false
env:
PYTHON_VERSION: ${{ matrix.python-version }}
run: |
# setup correct python version
sed -i -e "s/- python=.*/- python=$PYTHON_VERSION/g" environment.yml

- name: mamba setup
if: matrix.install-method == 'mamba'
if: matrix.install-method == 'mamba' && contains(github.event.pull_request.labels.*.name, 'documentation-only') == false
uses: mamba-org/setup-micromamba@v1
with:
environment-file: environment.yml
cache-downloads: true

- name: Python setup
if: matrix.install-method == 'pip'
if: matrix.install-method == 'pip' && contains(github.event.pull_request.labels.*.name, 'documentation-only') == false
uses: actions/setup-python@v4
with:
python-version: ${{ matrix.python-version }}
check-latest: true

- if: ${{ matrix.install-method == 'pip' && runner.os == 'macOS' }}
- if: ${{ matrix.install-method == 'pip' && runner.os == 'macOS' }} && contains(github.event.pull_request.labels.*.name, 'documentation-only') == false
name: Fix Python PATH on macOS
# See https://github.com/actions/setup-python/issues/132 and
# https://github.com/actions/setup-python/issues/132#issuecomment-779406058
Expand All @@ -107,6 +107,7 @@ jobs:
tee -a ~/.bash_profile <<<'export PATH="$pythonLocation/bin:$PATH"'

- name: Install dependencies
if: contains(github.event.pull_request.labels.*.name, 'documentation-only') == false
run: |
python --version
pip install pytest-cov restructuredtext-lint pytest-xdist 'coverage!=6.3.0'
Expand All @@ -115,14 +116,17 @@ jobs:
pip freeze

- name: Static codechecks
if: contains(github.event.pull_request.labels.*.name, 'documentation-only') == false
run: |
restructuredtext-lint README.rst

- name: ctapipe-info
if: contains(github.event.pull_request.labels.*.name, 'documentation-only') == false
run: |
ctapipe-info --all

- name: Tests
if: contains(github.event.pull_request.labels.*.name, 'documentation-only') == false
run: |
cd $(mktemp -d)
pytest -n auto --dist loadscope \
Expand Down
4 changes: 2 additions & 2 deletions README.rst
Original file line number Diff line number Diff line change
Expand Up @@ -71,11 +71,11 @@ or via::

pip install ctapipe

**Note**: to install a specific version of ctapipe take look at the documentation `here <https://ctapipe.readthedocs.org/en/latest/getting_started_users/>`__.
**Note**: to install a specific version of ctapipe take look at the documentation `here <https://ctapipe.readthedocs.io/en/latest/user-guide/index.html>`__.

**Note**: ``mamba`` is a C++ reimplementation of conda and can be found `here <https://github.com/mamba-org/mamba>`__.

Note this is *pre-alpha* software and is not yet stable enough for end-users (expect large API changes until the first stable 1.0 release).

Developers should follow the development install instructions found in the
`documentation <https://ctapipe.readthedocs.org/en/latest/getting_started/>`__.
`documentation <https://ctapipe.readthedocs.io/en/latest/developer-guide/getting-started.html>`__.
10 changes: 8 additions & 2 deletions ctapipe/containers.py
Original file line number Diff line number Diff line change
Expand Up @@ -772,6 +772,7 @@ class SimulationConfigContainer(Container):
Configuration parameters of the simulation
"""

run_number = Field(np.int32(-1), description="Original sim_telarray run number")
corsika_version = Field(nan, description="CORSIKA version * 1000")
simtel_version = Field(nan, description="sim_telarray version * 1000")
energy_range_min = Field(
Expand Down Expand Up @@ -920,12 +921,17 @@ class ReconstructedGeometryContainer(Container):
"reconstructed core position uncertainty along tilted frame Y axis",
unit=u.m,
)
h_max = Field(nan * u.m, "reconstructed height of the shower maximum", unit=u.m)
h_max = Field(
nan * u.m,
"reconstructed vertical height above sea level of the shower maximum",
unit=u.m,
)
h_max_uncert = Field(nan * u.m, "uncertainty of h_max", unit=u.m)

is_valid = Field(
False,
(
"direction validity flag. True if the shower direction"
"Geometry validity flag. True if the shower geometry"
"was properly reconstructed by the algorithm"
),
)
Expand Down
4 changes: 3 additions & 1 deletion ctapipe/core/telescope_component.py
Original file line number Diff line number Diff line change
Expand Up @@ -217,8 +217,10 @@ def __getitem__(self, tel: Optional[Union[int, str]]):
try:
return self._value_for_tel_id[tel]
except KeyError:
if tel not in self._subarray.tel:
raise KeyError(f"No telescope with id {tel} in subarray")
raise KeyError(
f"TelescopeParameterLookup: no "
"TelescopeParameterLookup: no "
f"parameter value was set for telescope with tel_id="
f"{tel}. Please set it explicitly, "
f"or by telescope type or '*'."
Expand Down
13 changes: 13 additions & 0 deletions ctapipe/core/tests/test_telescope_component.py
Original file line number Diff line number Diff line change
Expand Up @@ -342,3 +342,16 @@ class Foo(TelescopeComponent):

f = Foo(mock_subarray, bar=[("type", "*", 1), ("id", 1, None)])
assert f.bar.tel[1] is None


def test_telescope_parameter_nonexistent_telescope(mock_subarray):
class Foo(TelescopeComponent):
bar = IntTelescopeParameter(
default_value=None,
allow_none=True,
).tag(config=True)

foo = Foo(subarray=mock_subarray)

with pytest.raises(KeyError, match="No telescope with id 0"):
foo.bar.tel[0]
104 changes: 104 additions & 0 deletions ctapipe/image/cleaning.py
Original file line number Diff line number Diff line change
Expand Up @@ -15,6 +15,7 @@
__all__ = [
"tailcuts_clean",
"dilate",
"time_clustering",
"mars_cleaning_1st_pass",
"fact_image_cleaning",
"apply_time_delta_cleaning",
Expand All @@ -27,6 +28,7 @@
from abc import abstractmethod

import numpy as np
from sklearn.cluster import DBSCAN

from ..core import TelescopeComponent
from ..core.traits import (
Expand Down Expand Up @@ -111,6 +113,71 @@ def tailcuts_clean(
)


def time_clustering(
geom,
image,
time,
minpts=5,
eps=1.0,
time_scale_ns=4.0,
space_scale_m=0.25,
hard_cut_pe=4,
):
"""
Clean an image by selecting pixels which pass a time clustering algorithm using DBSCAN.
Tobychev marked this conversation as resolved.
Show resolved Hide resolved
Previously used for HESS [timecleaning]_.
Tobychev marked this conversation as resolved.
Show resolved Hide resolved

As a neighbor-based image extractor algorithm can lead to biases in the time reconstruction of noise pixels,
specially those next to the shower, a cut in the minimum signal image is applied.

DBSCAN runs with the reconstructed times and pixel positions after scaling. Scaling is needed because eps
is not dimension dependent. If scaling is performed properly, eps can be set to 1. DBSCAN returns the
cluster IDs of each point. Pixels associated to cluster ID -1 are classified as noise.

Parameters
----------
geom: `ctapipe.instrument.CameraGeometry`
Camera geometry information
image: array
pixel charge information
time: array
pixel timing information
minpts: int
Minimum number of points to consider a cluster
eps: float
Minimum distance in dbscan
time_scale_ns: float
Time scale in ns
space_scale: float
Space scale in m
clara-escanuela marked this conversation as resolved.
Show resolved Hide resolved
hard_cut_pe: float
Hard cut in the number of signal pe

Returns
-------
A boolean mask of *clean* pixels.
"""
precut_mask = image > hard_cut_pe

arr = np.zeros(len(image), dtype=float)
arr[~precut_mask] = -1

pix_x = geom.pix_x.value[precut_mask] / space_scale_m
clara-escanuela marked this conversation as resolved.
Show resolved Hide resolved
pix_y = geom.pix_y.value[precut_mask] / space_scale_m

X = np.column_stack((time[precut_mask] / time_scale_ns, pix_x, pix_y))

labels = DBSCAN(eps=eps, min_samples=minpts).fit_predict(X)

# no_clusters = len(np.unique(labels))-1 # Could be used for gh separation

y = np.array(arr[(arr == 0)])
clara-escanuela marked this conversation as resolved.
Show resolved Hide resolved
y[(labels == -1)] = -1
arr[arr == 0] = y
mask = arr == 0 # we keep these events
return mask


def mars_cleaning_1st_pass(
geom,
image,
Expand Down Expand Up @@ -532,6 +599,43 @@ def __call__(
)


class TimeCleaner(ImageCleaner):
"""
Clean images using the time clustering cleaning method
"""

space_scale_m = FloatTelescopeParameter(
default_value=0.25, help="Pixel space scaling parameter in m"
Copy link
Member

@maxnoe maxnoe Feb 5, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we determine the scale parameters from the information in CameraDescription?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not sure. That parameter needs optimization, it could 2x or 3x the pixel spacing of the camera and teh optimized number depends on the telescope

).tag(config=True)
time_scale_ns = FloatTelescopeParameter(
default_value=4.0, help="Time scale parameter in ns"
).tag(config=True)
minpts = IntTelescopeParameter(
default_value=5, help="minimum number of points to form a cluster"
).tag(config=True)
eps = FloatTelescopeParameter(
default_value=1.0, help="minimum distance in DBSCAN"
).tag(config=True)
hard_cut_pe = FloatTelescopeParameter(
default_value=2.5, help="Hard cut in the number of pe"
).tag(config=True)

def __call__(
self, tel_id: int, image: np.ndarray, arrival_times=None
) -> np.ndarray:
"""Apply HESS image cleaning. see ImageCleaner.__call__()"""
return time_clustering(
geom=self.subarray.tel[tel_id].camera.geometry,
image=image,
time=arrival_times,
eps=self.eps.tel[tel_id],
space_scale_m=self.space_scale_m.tel[tel_id],
time_scale_ns=self.time_scale_ns.tel[tel_id],
minpts=self.minpts.tel[tel_id],
hard_cut_pe=self.hard_cut_pe.tel[tel_id],
)


class MARSImageCleaner(TailcutsImageCleaner):
"""
1st-pass MARS-like Image cleaner (See `ctapipe.image.mars_cleaning_1st_pass`)
Expand Down
67 changes: 67 additions & 0 deletions ctapipe/image/tests/test_cleaning.py
Original file line number Diff line number Diff line change
Expand Up @@ -334,6 +334,73 @@ def test_apply_time_delta_cleaning(prod3_lst):
assert (test_mask == td_mask).all()


def test_time_cleaning():
geom = CameraGeometry.from_name("LSTCam")
charge = np.zeros(geom.n_pixels, dtype=np.float64)
peak_time = np.zeros(geom.n_pixels, dtype=np.float64)

core_pixel = 10
core_neighbors = geom.neighbors[core_pixel]

charge[core_pixel], charge[core_neighbors] = 15, 5
peak_time[core_pixel], peak_time[core_neighbors] = (0.0, 0.0)

mask = cleaning.time_clustering(
geom,
charge,
peak_time,
eps=1.0,
space_scale_m=0.25,
time_scale_ns=4.0,
hard_cut_pe=10,
minpts=5,
)

assert np.count_nonzero(mask) == 0

mask = cleaning.time_clustering(
geom,
charge,
peak_time,
eps=1.0,
space_scale_m=0.25,
time_scale_ns=4.0,
hard_cut_pe=1,
minpts=5,
)

assert np.count_nonzero(mask) == 7

mask = cleaning.time_clustering(
geom,
charge,
peak_time,
eps=1.0,
space_scale_m=0.25,
time_scale_ns=4.0,
hard_cut_pe=1,
minpts=8,
)

assert np.count_nonzero(mask) == 0

charge[core_pixel], charge[core_neighbors] = 15, 5
peak_time[core_pixel], peak_time[core_neighbors] = (0.0, 30.0)

mask = cleaning.time_clustering(
geom,
charge,
peak_time,
eps=1.0,
space_scale_m=0.25,
time_scale_ns=4.0,
hard_cut_pe=1,
minpts=5,
)

assert np.count_nonzero(mask) == 6


def test_time_constrained_clean():
geom = CameraGeometry.from_name("LSTCam")
charge = np.zeros(geom.n_pixels, dtype=np.float64)
Expand Down
20 changes: 7 additions & 13 deletions ctapipe/instrument/camera/geometry.py
Original file line number Diff line number Diff line change
Expand Up @@ -800,7 +800,7 @@ def pixel_moment_matrix(self):
x = self.pix_x.value
y = self.pix_y.value

return np.row_stack(
return np.vstack(
[
x,
y,
Expand Down Expand Up @@ -954,12 +954,13 @@ def position_to_pix_index(self, x, y):
pix_indices: Pixel index or array of pixel indices. Returns -1 if position falls
outside camera
"""

if not self._all_pixel_areas_equal:
logger.warning(
" Method not implemented for cameras with varying pixel sizes"
)
unit = x.unit
scalar = x.ndim == 0

points_searched = np.dstack([x.to_value(unit), y.to_value(unit)])
circum_rad = self._pixel_circumradius[0].to_value(unit)
kdtree = self._kdtree
Expand All @@ -969,8 +970,9 @@ def position_to_pix_index(self, x, y):
del dist
pix_indices = pix_indices.flatten()

invalid = np.iinfo(pix_indices.dtype).min
# 1. Mark all points outside pixel circumeference as lying outside camera
pix_indices[pix_indices == self.n_pixels] = -1
pix_indices[pix_indices == self.n_pixels] = invalid

# 2. Accurate check for the remaing cases (within circumference, but still outside
# camera). It is first checked if any border pixel numbers are returned.
Expand Down Expand Up @@ -1006,17 +1008,9 @@ def position_to_pix_index(self, x, y):
)
del dist_check
if index_check != insidepix_index:
pix_indices[index] = -1

# print warning:
for index in np.where(pix_indices == -1)[0]:
logger.warning(
" Coordinate ({} m, {} m) lies outside camera".format(
points_searched[0][index, 0], points_searched[0][index, 1]
)
)
pix_indices[index] = invalid

return pix_indices if len(pix_indices) > 1 else pix_indices[0]
return np.squeeze(pix_indices) if scalar else pix_indices

@staticmethod
def simtel_shape_to_type(pixel_shape):
Expand Down
Loading