Results on stationary MRMS-only data #29

ValterFallenius · 2022-04-22T14:12:25Z

I have finalized my model with a simpler setup than I initially planned. With 8 lead times at 15-minute spacing the model learns something, but after rigorous testing the results are still pretty poor. See the F1-score plotted below, compared with persistence:
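As a minimal sketch of how an F1-versus-persistence comparison like this can be computed: the rain threshold of 0.2 mm/h, the array shapes, the toy data, and all function names below are assumptions for illustration, not the code used to produce the plot. Persistence here means repeating the last observed frame for every lead time.

```python
import numpy as np

def f1_score(pred, target):
    """F1 for boolean rain masks; returns 0.0 if there are no positives anywhere."""
    tp = np.logical_and(pred, target).sum()
    fp = np.logical_and(pred, ~target).sum()
    fn = np.logical_and(~pred, target).sum()
    denom = 2 * tp + fp + fn
    return float(2 * tp / denom) if denom > 0 else 0.0

def f1_per_lead_time(forecast, observed, threshold=0.2):
    """forecast, observed: arrays of shape (lead_times, H, W), assumed in mm/h."""
    return [
        f1_score(forecast[t] >= threshold, observed[t] >= threshold)
        for t in range(observed.shape[0])
    ]

def persistence_forecast(last_observed, n_lead_times):
    """Baseline: repeat the last observed frame for every lead time."""
    return np.repeat(last_observed[None], n_lead_times, axis=0)

# Toy data (hypothetical): a random rain field drifting 2 pixels per lead time.
rng = np.random.default_rng(0)
last_frame = rng.gamma(0.3, 2.0, size=(64, 64))
future = np.stack([np.roll(last_frame, shift=2 * (t + 1), axis=1) for t in range(8)])

persistence_f1 = f1_per_lead_time(persistence_forecast(last_frame, 8), future)
perfect_f1 = f1_per_lead_time(future, future)
print("persistence F1 per lead time:", np.round(persistence_f1, 3))
```

A model's forecast array can be passed to `f1_per_lead_time` in place of the persistence forecast to get the second curve.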

I suspect I have too few training samples. The MetNet authors used 1.7M samples before they stopped observing overfitting. I have trained a network on 4,400 samples that does something, but it doesn't perform nearly as well as MetNet. At the beginning of my project I decided to make it easy for myself and work with a stationary model, which led to less available data.

My available data before preprocessing: 5 years, 365 days, one sample every 90 minutes (16 per day) ---> ~30,000 samples

After filtering out all samples with fewer than 5 pixels of rain in any lead time, only 4,400 samples remain.

MetNet has an input patch of 1024 km x 1024 km and a total coverage of 7000 km x 2500 km, which gives ~15 non-overlapping geographical locations.

MetNet's available data: 1.5 years, 365 days, one sample every 90 minutes, 15 non-overlapping geographical locations ---> 131,000 samples

The same filtering leaves only ~20,000 samples.

Since this is far fewer than 1.7M, we can assume they do not use non-overlapping geographical locations; instead, patches are presumably sampled at random locations, which yields many more data points.
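The sample-count estimates above can be reproduced with a few lines of arithmetic. This is only a sketch: it assumes that the fraction of samples surviving the rain filter on the Swedish data (4,400 out of roughly 30,000) also applies to the MetNet data, which is the assumption implied by "the same filtering". The helper name `raw_samples` is made up for illustration.

```python
SAMPLE_SPACING_MIN = 90
samples_per_day = 24 * 60 // SAMPLE_SPACING_MIN  # 16 samples per day

def raw_samples(years, locations=1):
    """Number of samples before any rain filtering."""
    return int(years * 365 * samples_per_day * locations)

# Swedish stationary dataset: 5 years, one location.
swedish_raw = raw_samples(5)
kept_fraction = 4_400 / swedish_raw  # fraction left after the rain filter

# MetNet: 1.5 years, ~15 non-overlapping locations.
metnet_raw = raw_samples(1.5, locations=15)
# Assumption: the same kept fraction applies to the MetNet data.
metnet_kept = round(metnet_raw * kept_fraction)

print(f"Swedish raw samples: {swedish_raw:,}")
print(f"Kept fraction:       {kept_fraction:.1%}")
print(f"MetNet raw samples:  {metnet_raw:,}")
print(f"MetNet kept samples: {metnet_kept:,} (vs. 1.7M reported)")
```

The gap between this estimate and 1.7M is what points to overlapping, randomly placed patches.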

Some examples of successes and failures:

I am contemplating implementing a non-stationary model; however, this would require time that I might not have, since my thesis is due in 1 month.

JackKelly · 2022-04-22T20:14:34Z

Maintainer

Thanks loads for sharing!

Another option might be to pre-train MetNet on a different dataset (e.g. the same dataset that the MetNet authors used) and then fine-tune on your dataset. I think @jacobbieker is busy uploading at least some of the relevant data.

Another option might be to use a simpler model? We were recently involved in an ML competition where the task was to predict the next 2 hours of satellite imagery. U-Nets did really well. Here's the winning ML model: https://github.com/jmather625/climatehack

3 replies

jacobbieker Apr 23, 2022
Maintainer

Yeah, I'm working on adding an expanded copy of the MetNet and MetNet-2 data. Currently, only the US MRMS data is available, here: https://huggingface.co/datasets/openclimatefix/mrms You can just download the data files individually, as the dataset script isn't ready yet. My computer is currently uploading the full 2019 data and most of the 2018/2016 data, and the full 2017 and 2021 data plus the first few months of 2022 are already uploaded. So right now that should allow for about 2.25 years of data, and soon 4-4.5 years, I think either later today or early next week.

Each year is the 2-minutely PrecipRate output from MRMS for the whole continental US, so roughly 250,000 measurements for each year.
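As a quick sanity check of that figure (a sketch that assumes a complete, gap-free year at exactly one frame every 2 minutes):

```python
MINUTES_PER_YEAR = 365 * 24 * 60
FRAME_SPACING_MIN = 2  # 2-minutely PrecipRate output

frames_per_year = MINUTES_PER_YEAR // FRAME_SPACING_MIN
print(f"{frames_per_year:,} frames per full year")
```

This gives 262,800 frames for a full year, in line with the rough 250,000 quoted above.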

jacobbieker Apr 23, 2022
Maintainer

I have metadata with kerchunk for streaming in data from GOES-16/17 but there will be more work before I can get it to work correctly and upload to HF, so it might or might not be ready in time for your thesis, unfortunately.

ValterFallenius Apr 30, 2022
Author

Hello again,

I have implemented a network where the frame is moving, providing me with 16 times more data (non-overlapping targets) on the Swedish dataset. I probably won't be using this for my thesis since the initial results don't seem to improve that much, but it will be interesting to see what it does.
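One way to picture the 16 non-overlapping targets is as a 4 x 4 grid of target tiles over the domain. The sketch below is only an illustration of that idea: the grid layout, the 256 x 256 domain, the 64-pixel tile size, and the function names are all assumptions, not the actual implementation.

```python
import numpy as np

def non_overlapping_origins(domain_h, domain_w, tile):
    """Top-left (row, col) of every tile that fits in the domain without overlap."""
    return [
        (r, c)
        for r in range(0, domain_h - tile + 1, tile)
        for c in range(0, domain_w - tile + 1, tile)
    ]

def crop_target(frames, origin, tile):
    """Cut one target tile out of a (time, H, W) stack of frames."""
    r, c = origin
    return frames[:, r:r + tile, c:c + tile]

# Hypothetical sizes: 8 lead times over a 256 x 256 pixel domain, 64 x 64 pixel targets.
frames = np.zeros((8, 256, 256), dtype=np.float32)
origins = non_overlapping_origins(256, 256, 64)
targets = [crop_target(frames, o, 64) for o in origins]
print(len(targets), "targets per timestamp, each of shape", targets[0].shape)
```

Each timestamp then contributes 16 training samples instead of one, at the cost of the model no longer being tied to a single location.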

This is not so important, but as I am writing my thesis I wanted to cite this repo, and I just found out it's possible to set up an automatic citation generator in GitHub by adding a CITATION.cff file to a repo. I already did a manual BibTeX citation, but it could be a good idea for you, especially so credit is given where credit is due. Are there more people working on MetNet than you two, @JackKelly @jacobbieker?

/Valter
