Results on stationary MRMS-only data #29
ValterFallenius
started this conversation in
Show and tell
Replies: 1 comment 3 replies
-
Thanks loads for sharing! Another option might be to pre-train MetNet on a different dataset (e.g. the same dataset that the MetNet authors used) and then fine-tune on your dataset. I think @jacobbieker is busy uploading at least some of the relevant data. Another option might be to use a simpler model? We were recently involved in an ML competition where the task was to predict the next 2 hours of satellite imagery. U-Nets did really well. Here's the winning ML model: https://github.com/jmather625/climatehack
-
I have finalized my model with a simpler setup than I initially planned. With 8 lead times at 15-minute spacing the model achieves something, but after rigorous testing the results are still pretty poor. See the F1 score plotted below, compared with persistence:
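For reference, a minimal sketch of how an F1 comparison against persistence could be computed, assuming forecasts and observations are already binary rain masks. The function names, and the convention that persistence repeats the last observed frame for every lead time, are illustrative assumptions and not the exact evaluation code behind the plot.

```python
import numpy as np


def f1_score(pred, target):
    """F1 score between two binary rain masks (True = rain)."""
    pred = np.asarray(pred, dtype=bool)
    target = np.asarray(target, dtype=bool)
    tp = np.sum(pred & target)    # rain predicted and observed
    fp = np.sum(pred & ~target)   # rain predicted, none observed
    fn = np.sum(~pred & target)   # rain observed, none predicted
    denom = 2 * tp + fp + fn
    # Assumption: score 0.0 when there is no rain in either mask.
    return float(2 * tp / denom) if denom > 0 else 0.0


def persistence_f1(last_observed, targets):
    """F1 per lead time when the last observed frame is reused as the forecast."""
    return [f1_score(last_observed, t) for t in targets]


def model_f1(predictions, targets):
    """F1 per lead time for model predictions, one mask per lead time."""
    return [f1_score(p, t) for p, t in zip(predictions, targets)]
```

Plotting `model_f1` and `persistence_f1` against lead time gives the kind of comparison described above; the model is only useful at lead times where its curve sits above the persistence curve.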
I suspect I have too few training samples. MetNet used 1.7M data samples before they stopped observing overfitting. I have trained a network on 4,400 samples that does something, but it doesn't perform nearly as well as MetNet. At the beginning of my project I decided to make it easy for myself and work with a stationary model; this led to less available data.
My available data before preprocessing: 5 years, 365 days, one sample every 90 minutes (16 per day) ---> ~30,000 samples
After filtering out all samples with fewer than 5 pixels of rain in any lead time, only 4,400 samples remain.
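A minimal sketch of such a rain-pixel filter, under one reading of the rule: a sample is dropped if any of its lead times has fewer than 5 rainy pixels. The function name, the `(lead_times, H, W)` array layout, and the `rain_threshold` parameter are assumptions made for illustration.

```python
import numpy as np


def keep_sample(target, min_rain_pixels=5, rain_threshold=0.0):
    """Decide whether a sample has enough rain to be kept.

    target: array of shape (lead_times, H, W) holding rain rates.
    A pixel counts as rainy if its value exceeds rain_threshold
    (hypothetical parameter; the real cutoff is not stated).
    """
    rainy = np.asarray(target) > rain_threshold
    # Count rainy pixels separately for each lead time.
    per_lead_time = rainy.reshape(rainy.shape[0], -1).sum(axis=1)
    # Keep only if every lead time reaches the minimum count.
    return bool(np.all(per_lead_time >= min_rain_pixels))
```

If the intended rule is instead "keep the sample when at least one lead time has enough rain", replacing `np.all` with `np.any` gives that looser variant.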
MetNet has an input patch of size 1024 km x 1024 km and a total coverage of 7000 km x 2500 km; this gives ~15 non-overlapping geographical locations.
MetNet's available data: 1.5 years, 365 days, 90 minute samples, 15 non-overlapping geographical locations ---> 131,000 samples
The same filtering leaves only ~20,000 samples.
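The back-of-the-envelope counts above can be reproduced as follows. This is only a sketch of the arithmetic; it assumes that the fraction of samples surviving the rain filter on the MetNet data would match the one observed here (4,400 out of roughly 30,000), which is a guess rather than a measured value.

```python
SAMPLES_PER_DAY = 24 * 60 // 90  # one sample every 90 minutes -> 16 per day


def raw_samples(years, locations=1):
    """Number of samples before filtering."""
    return int(years * 365 * SAMPLES_PER_DAY * locations)


own_raw = raw_samples(5)            # 29,200, rounded to ~30,000 above
metnet_raw = raw_samples(1.5, 15)   # 131,400, rounded to ~131,000 above

# Assumption: the same share of samples survives the rain filter.
keep_fraction = 4_400 / 30_000
metnet_kept = round(metnet_raw * keep_fraction)  # roughly 19,000 to 20,000

print(own_raw, metnet_raw, metnet_kept)
```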
Since this is far less than 1.7M, we can assume they do not use non-overlapping geographical locations; instead, patches are randomly sampled, which yields many more data points.
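To illustrate why random sampling gives so many more data points than a fixed grid, here is a minimal sketch of random patch cropping from a larger frame stack. It is not taken from the MetNet code; the function name, the `(T, H, W)` layout, and uniform sampling of the crop origin are assumptions.

```python
import numpy as np


def random_crop(frames, crop_h, crop_w, rng):
    """Cut a randomly placed spatial patch out of a (T, H, W) frame stack.

    Returns the patch and its (top, left) origin in pixels.
    """
    _, height, width = frames.shape
    # Upper bounds are exclusive, so the patch always fits inside the frame.
    top = int(rng.integers(0, height - crop_h + 1))
    left = int(rng.integers(0, width - crop_w + 1))
    patch = frames[:, top:top + crop_h, left:left + crop_w]
    return patch, (top, left)


def count_positions(height, width, crop_h, crop_w):
    """Compare non-overlapping tiles with all possible crop origins."""
    tiles = (height // crop_h) * (width // crop_w)
    origins = (height - crop_h + 1) * (width - crop_w + 1)
    return tiles, origins
```

Neighbouring crops overlap heavily, so the extra samples are far from independent, but every time step can still contribute many distinct patches instead of a handful of fixed ones.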
Some examples of successes and fails:
I am contemplating implementing a non-stationary model; however, this would require time that I might not have, since my thesis is due in 1 month.