Reporting of results that can be validated by others; with code that is reproducible #121
Comments
Thanks @tomasvanoyen for all this. Did you have some ICON and satellite results as well?
Hi @peterdudfield, I also made some runs to compare results with and without satellite input. Specifically:
The results are a bit strange with respect to the role of satellite images. Following the steps I mention above, you should be able to reproduce them. I would be interested to hear if I made a mistake somewhere (hopefully!) or whether something else is going on.
Thanks @tomasvanoyen, this is really interesting. Really appreciate you doing this. The results I have with satellite are here and I agree, something doesn't match up. Your error bars look a lot bigger than mine, so I'm just wondering: how many samples did you train your model with?
@tomasvanoyen, thanks for this. It is really helpful. I also get that kind of result. I'm only using the existing uk_pv.py configs, in which I only set the NWP data source to None in get_data_source_kwargs(). The model then just uses a history of PV output and the last available image from Eumetsat, where we take the mean of irradiance in a 0.5 degree lat/long square around the pv_id locations. Thinking about it, it makes sense that the error increases quickly across the horizons, as there is no information about the future. The NWP could provide some information for each horizon, but we don't include that here. @peterdudfield, the big difference here is that Tomas does not include NWP data. Note that adding the Open Meteo data from the API helps significantly and plays a similar role to the NWP.
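The "mean of irradiance in a 0.5 degree square" step described above could be sketched roughly like this. Note this is a minimal illustration, not the repo's actual code: the grid/field names (`grid_lats`, `grid_lons`, `irradiance`) and the function itself are assumptions for illustration only.

```python
# Illustrative sketch: average a satellite irradiance field over a
# 0.5-degree lat/long square centred on a PV site.
# All names here are hypothetical, not from pv-site-prediction itself.

def mean_irradiance_around_site(grid_lats, grid_lons, irradiance,
                                site_lat, site_lon, half_width=0.25):
    """Mean of all grid cells within +/- half_width degrees of the site.

    irradiance[i][j] corresponds to (grid_lats[i], grid_lons[j]).
    A 0.5-degree square means half_width = 0.25 on each side.
    """
    total, count = 0.0, 0
    for i, lat in enumerate(grid_lats):
        for j, lon in enumerate(grid_lons):
            if abs(lat - site_lat) <= half_width and abs(lon - site_lon) <= half_width:
                total += irradiance[i][j]
                count += 1
    return total / count if count else float("nan")
```

As the comment thread notes, this collapses the whole image to one scalar per site, which is one reason a simple model cannot extract much from the satellite data beyond current conditions.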
Thanks @lorenpelli, that's very interesting; do you have any plots to show some results? Open Meteo is indeed useful for some point forecasts. I seem to remember, @tomasvanoyen, you said you ran it with some ICON data? Do you have the results for this?
Sure, here are my results: uk_pv_sat_only: NWP set to None, keeping only satellite and PV output data. It's interesting that the satellite data brings nothing (even in the short term) when using Open Meteo data.
Very interesting. Yeah, Open Meteo unfortunately doesn't give historic data, or maybe it does now but you have to pay for it. Silly question, but is it worth checking the values of the satellite variables that go into the model, just to check they are not NaNs or filled in as zeros?
Here is an example of the features fed into the model (including Open Meteo variables):
Looks like it has real satellite numbers and no NaNs in there.
How many examples did you train with? One thing you might want to try is restricting to just 8 hours. The full 48-hour model might struggle to fully understand the satellite's usefulness. This is what I did here. I should know this, but is there a feature that says what horizon it is predicting? We might need this to help the satellite data improve the early horizons.
Thanks for the advice, I'll try training on 8 hours. I was using all the default parameters from the current repo. I'll keep you posted. The forecast_horizons variable is indeed in the features.
Thanks @lorenpelli |
Here is the evaluation of Satellite+OpenMeteo for 8-hour horizons. The range of MAE is very similar to what is described in https://github.com/openclimatefix/pv-site-prediction/tree/main/exp_reports/013_satellite#backtest-using-satellite.
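For anyone reproducing the MAE-per-horizon comparison above, a minimal sketch of the metric is below. The `(horizon, prediction, target)` tuple layout is an assumption for illustration, not the repo's evaluation code.

```python
# Illustrative sketch: mean absolute error grouped by forecast horizon,
# the metric compared against the linked backtest tables.
from collections import defaultdict

def mae_per_horizon(records):
    """records: iterable of (horizon, prediction, target) tuples."""
    errors = defaultdict(list)
    for horizon, pred, target in records:
        errors[horizon].append(abs(pred - target))
    return {h: sum(errs) / len(errs) for h, errs in sorted(errors.items())}
```

Grouping by horizon matters here because, as discussed above, the error grows quickly with horizon when no NWP information is available, and an overall MAE would hide that shape.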
@peterdudfield, how do you explain that pv-site-prediction has such high MAE results compared, for example, to PVNet? On the one hand, I find pv-site-prediction quite elaborate and well done, but on the other hand the results obtained are not very good compared to a simple LSTM or neural net using the same features. Example of just using Open Meteo on my SMA inverter output with an LSTM regressor:
TL;DR This is because the models used inside pv-site-prediction are quite simple. The approach here is to use feature engineering alongside a simple model, whereas with models like PVNet you let the big deep learning model do the feature engineering for you. The former will be way faster and require less data, but the latter might find more subtle patterns that are hard to feature engineer. The game-changer features with the pv-site-prediction approach are the NWP. This is because a fancy physics simulation has already done the calculations of how much sun, cloud, etc. there will be at a given point and time, and that's basically what we need for accurate predictions at that given point and time. Without NWP you would need to learn these patterns from satellite images and PV output, and that's just not possible. Also, to hope to replace NWP with satellite, you would need to use a big satellite map and replace our simple model (random forest) with a bigger deep learning model that can learn fancier patterns on images, but then that's PVNet! If you use the
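The "feature engineering plus simple model" point above can be illustrated with a toy example: if an NWP-style feature (predicted irradiance) almost determines PV output, even a one-parameter model fitted in closed form recovers the relationship. Everything below is synthetic data, not from any of the repos discussed.

```python
# Toy illustration (synthetic data): when an engineered NWP feature
# nearly determines PV output, a trivially simple model suffices.

def fit_slope(xs, ys):
    """Least-squares slope through the origin: y ~ slope * x."""
    numerator = sum(x * y for x, y in zip(xs, ys))
    denominator = sum(x * x for x in xs)
    return numerator / denominator

# Synthetic "predicted irradiance" (W/m^2) and PV output (kW).
irradiance = [0.0, 200.0, 400.0, 600.0, 800.0]
pv_output = [0.0, 0.5, 1.0, 1.5, 2.0]

slope = fit_slope(irradiance, pv_output)
prediction = slope * 500.0  # predicted output for a new irradiance value
```

Without such a feature, the model would have to infer future cloud cover from past satellite images alone, which is exactly the hard problem PVNet throws a large deep network at.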
Thanks @simlmx for this, very useful. @lorenpelli, I'm interested in your results above. Did you test this on one of the uk_pv sites, or was this your own? Does the data look different somehow? If the sites are different, would you be able to run the LSTM on the uk_pv sites and see if you can compare the results?
@peterdudfield, the LSTM results were trained on my own installation, not on the uk_pv sites. I will try to test an LSTM in the pv-site-prediction framework and keep you posted. It will be easier to see whether the LSTM really adds value here or whether the good results on my site were just due to the data themselves (or something else).
On the one hand, I truly applaud the Open Climate Fix initiative in general. On the other, I have spent a fair amount of (spare) time trying to reproduce any result, and it turns out not to be straightforward, to say the least.
One of the major blocking issues is that the NWP data used (I believe across all repos) comes from the Met Office, which is a closed data source. As such, reported results cannot be reproduced, and hence the code cannot be validated. This is a major drawback for the appeal of the work.
I believe it would be a major step forward if an effort were also made to provide code, data, and experiments that are all readily open and can therefore be directly reproduced and validated.
For instance, all experiments reported in this repo at "exp_reports/013_satellite" include NWP source data. It would be a major step forward if there were also an experiment with only satellite data. The Google bucket provides open access to that data source, so other developers could pull the code, run that experiment, and at least be sure that the code works as reported.
Best regards,
Tomas