
LI L2 Accumulated products retrieved from archive ("ARC" 10-min files) have faulty reading #2859

ameraner opened this issue Jul 19, 2024 · 2 comments · Fixed by #2867
ameraner commented Jul 19, 2024

Describe the bug
The li_l2_nc reader currently produces faulty results when reading LI L2 accumulated products (AF, AFA, AFR) retrieved from the archive (files containing ARC in the filename, covering a 10-minute sensing window).

This happens because the reader is unaware that these files actually contain 20 separate time steps of the accumulated products; with the default with_area_definition=True regridding mode, the reader simply regrids all pixels in the file onto the 2 km grid with a "last pixel on top" approach, so later time steps overwrite earlier ones and the result is wrong.

NOTE: This issue does not affect the 30-second files disseminated via EUMETCast; those are read correctly.
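To make the failure mode concrete, here is a toy sketch (illustrative only, not the reader's actual code) contrasting fancy-index assignment, which keeps only the last value written to each grid cell, with proper accumulation:

```python
import numpy as np

# Toy setup: a 1-D "grid" of 6 cells and pixels from several time steps
# that map onto grid cells; three of the pixels land in cell 2.
grid_idx = np.array([2, 2, 4, 2])
values = np.array([1.0, 3.0, 5.0, 7.0])

# "Last pixel on top": fancy-index assignment keeps only the last value
# written to each cell (7.0 in cell 2), which is the faulty behavior.
last_on_top = np.zeros(6)
last_on_top[grid_idx] = values

# Correct accumulation: sum every contribution into its cell,
# so cell 2 ends up as 1 + 3 + 7 = 11.
summed = np.zeros(6)
np.add.at(summed, grid_idx, values)
```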

Expected behavior
When loading the main variable (e.g. accumulated_flash_area) without further inputs, the reader should sum up all time steps inside the file. UPDATE: this is fixed by #2867.

Then, ideally, the reader should know how to split the data inside the files into 20 separate datasets, so that each time step can be accessed individually. A clean solution would likely go via the DataID/DataQuery and/or a Scene.load kwarg; this is captured in issue #2878.
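As an illustration of the splitting idea, here is a minimal sketch assuming each pixel carries an integer accumulation-step index; the variable names below are hypothetical, not the actual LI L2 file schema:

```python
import numpy as np

# Hypothetical per-pixel step index (0..n_steps-1) and pixel values;
# a real ARC file would have 20 steps, we use 3 here for brevity.
step_idx = np.array([0, 0, 1, 2, 2, 2])
values = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])

# One pixel array per accumulation window, so each time step
# could be exposed as its own dataset.
n_steps = int(step_idx.max()) + 1
per_step = [values[step_idx == i] for i in range(n_steps)]

# Summing every step reproduces the whole-file total.
total = float(sum(arr.sum() for arr in per_step))
```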

Actual results
A single grid of accumulated products, assembled with the faulty and confusing "last pixel on top" mechanism.

markjinz commented Aug 6, 2024

Hello @ameraner, this is indeed the issue I mentioned to you last time. Is it possible to gather the 20 body chunks without changing the reader?
For example, we could wait for 10 minutes and collect the 20 chunks to make it similar to the file found in Eumetcast (ARC)!

ameraner commented Aug 6, 2024

Hi @markjinz, the best way to do what you need currently, as far as I know, is to use the MultiScene with a custom blend function, as I indicated in #2853. The relevant code would be:

import xarray as xr

from satpy import MultiScene
from satpy.dataset import combine_metadata


def nan_sum(datasets):
    """Blend function that sums a list of DataArrays, ignoring NaNs."""
    attrs = combine_metadata(*[data_arr.attrs for data_arr in datasets])
    # concatenate the datasets along a new dimension so we can sum over it
    concat_ds = xr.concat(datasets, dim="sum_dim")
    # skipna=True keeps all valid pixels; min_count=1 returns NaN where no
    # active pixel is found; keep_attrs=True preserves e.g. the area attribute
    sum_ds = concat_ds.sum(dim="sum_dim", skipna=True, keep_attrs=True, min_count=1)
    sum_ds.attrs = attrs
    return sum_ds


# li_filenames: your list of LI L2 files; li_dataset: the dataset name to
# load, e.g. "accumulated_flash_area"
liscn = MultiScene.from_files(li_filenames, reader="li_l2_nc")
liscn.load([li_dataset])
li_scn_b = liscn.blend(nan_sum)

Note that, due to the bug described in this issue, the results will not be the same; they will match once the PR linked in this issue is merged.
