Python implementation of the Atmospheric Lidar Data Augmentation (ALiDAn) framework and a PyTorch-based learning pipeline for lidar analysis.
ALiDAn is an end-to-end physics- and statistics-based simulation framework of lidar measurements [1]. This framework aims to promote the study of dynamic phenomena from lidar measurements and set new benchmarks.
The repository also includes a spatiotemporal and synergistic lidar calibration approach [2], which forms a learning pipeline for additional algorithms such as aerosol inversion, aerosol typing, etc.
Note: This repository is still under final preparations. It will hold the supplemental data and code for the papers [1] and [2]. To receive a notification when the code is ready, you are welcome to add our repository to your "star" & "watch" repositories :)
[1] Adi Vainiger, Omer Shubi, Yoav Schechner, Zhenping Yin, Holger Baars, Birgit Heese, Dietrich Althausen, "ALiDAn: Spatiotemporal and Multi-Wavelength Atmospheric Lidar Data Augmentation", IEEE Transactions on Geoscience and Remote Sensing, vol. 60, pp. 1-17, 2022.
[2] Adi Vainiger, Omer Shubi, Yoav Schechner, Zhenping Yin, Holger Baars, Birgit Heese, Dietrich Althausen, "Supervised learning calibration of an atmospheric lidar", IEEE International Geoscience and Remote Sensing Symposium, 2022.
I. Czerninski, Y. Sde Chen, M. Tzabari, Y. Bertschy, M. Fisher, J. Hofer, A. Floutsi, R. Hengst, I. Talmon, and D. Yagodin, The Taub Foundation, Ollendorff Minerva Center. The authors acknowledge the financial contributions and the inspiring framework of the ERC Synergy Grant “CloudCT” (Number 810370).
pyALiDAn derives data from measurements, reanalyses, and assimilation databases such as PollyNet, AERONET by NASA, GDAS by NOAA, ERA5, etc. Such data varies by geographic location, spatially, temporally, and spectrally. For data handling and visualization we chose xarray, pandas, and seaborn. SQLite is used for information extraction from databases, and ARLreader is used to read the NOAA ARL data. Additional scientific packages are used for physics or machine learning models, such as SciPy, lidar_molecular and more. The learning section relies on PyTorch, PyTorch Lightning and Ray. These are excellent learning packages, with many tutorials available for anyone not yet familiar with them.
We are grateful to the developers and creators of the above libraries.
To get the code, simply clone it:
git clone https://github.com/Addalin/learning_lidar.git
Then, to set up the environment:
cd learning_lidar
conda env create -f environment.yml
Activate it with:
conda activate lidar
Run python setup.py develop to locally install the lidar learning package. This is not currently necessary, but can assist with missing paths when running scripts from the command line.
Each script can be run separately. They all use the command line format, with the base arguments --station_name, --start_date, --end_date, --plot_results, --save_ds, and additional arguments based on the specific script.
For example, to run generation main:
python generation_main.py --station_name haifa --start_date 2017-09-01 --end_date 2017-10-31 --plot_results --save_ds
Where relevant, use the --use_km_unit flag to work in km units instead of m units.
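The shared arguments described above could be wired up with argparse roughly as follows. This is a minimal sketch under assumptions, not the repository's actual parser: the helper names `parse_date` and `build_base_parser`, the default station, and the YYYY-MM-DD date format (taken from the example command) are illustrative.

```python
import argparse
from datetime import datetime


def parse_date(text: str) -> datetime:
    """Parse a YYYY-MM-DD string (format assumed from the example command)."""
    return datetime.strptime(text, "%Y-%m-%d")


def build_base_parser() -> argparse.ArgumentParser:
    """Hypothetical helper collecting the base arguments shared by the scripts."""
    parser = argparse.ArgumentParser(description="Base arguments (illustrative sketch)")
    parser.add_argument("--station_name", type=str, default="haifa")
    parser.add_argument("--start_date", type=parse_date, required=True)
    parser.add_argument("--end_date", type=parse_date, required=True)
    # Boolean switches: present on the command line means True.
    parser.add_argument("--plot_results", action="store_true")
    parser.add_argument("--save_ds", action="store_true")
    parser.add_argument("--use_km_unit", action="store_true")
    return parser


# Parse the same arguments as in the generation_main.py example.
args = build_base_parser().parse_args(
    ["--station_name", "haifa", "--start_date", "2017-09-01",
     "--end_date", "2017-10-31", "--plot_results", "--save_ds"]
)
print(args.station_name, args.start_date.date(), args.end_date.date())
```

A script with extra options would add its own arguments to the parser returned by the helper before calling `parse_args()`.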
Under learning_lidar:
In general, each subfolder corresponds to a process, and each standalone script is in a separate file with a corresponding <script_name>_utils file for subroutines. There is a general utils folder, and additional minor scripts and notebooks not mentioned here.
- Main script is preprocessing/preprocessing.py, which converts raw data into a clean format.
- Specifically, it can be used to:
  - download and convert GDAS files with the --download_gdas and --convert_gdas flags
  - generate molecular (--generate_molecular_ds), lidar (--generate_lidar_ds) or raw lidar (--generate_raw_lidar_ds) datasets
  - automatically unzip downloaded TROPOS lidar data with --unzip_lidar_tropos
- Generates ALiDAn data. generation/generation_main.py is a wrapper for the different parts of the process and can be used to run everything at once for a given period. It includes:
  - Background signal (generate_bg_signals)
  - Angstrom exponent and optical depth (read_AERONET_data)
  - KDE estimation (KDE_estimation_sample)
  - Lidar constant (generate_LC_pattern)
  - Density generation (generate_density)
  - Signal generation (daily_signals_generation)
- Additional code:
  - Figures output and validation of ALiDAn [1] are under [generation/ALiDAn Notebooks](generation/ALiDAn Notebooks).
  - Large parts of the code were initially written as notebooks, then manually converted to py files.
    - For example, the original notebooks are under generation/legacy.
    - generate_bg_signals has been converted to py, but not yet generalized to any time period, so the original notebook is still in the main generation folder.
    - overlap.ipynb hasn't been converted to py yet. Overlap is an additional part of the generation process.
    - Figures that were necessary for the paper are saved under the figures subdirectory. This is only relevant if the --plot_results flag is present.
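For readers unfamiliar with the Angstrom exponent step listed above, the quantity relates aerosol optical depth (AOD) at two wavelengths through the power law AOD(λ) ∝ λ^(-α). The following is a minimal sketch of that standard relation only; it is not the code in read_AERONET_data, the function names are hypothetical, and the numeric values are made up for illustration.

```python
import math


def angstrom_exponent(aod_1: float, aod_2: float, wl_1: float, wl_2: float) -> float:
    """Angstrom exponent from AOD at two wavelengths (same units for both)."""
    return -math.log(aod_1 / aod_2) / math.log(wl_1 / wl_2)


def extrapolate_aod(aod_ref: float, wl_ref: float, wl_target: float, exponent: float) -> float:
    """Scale AOD from wl_ref to wl_target using the power law."""
    return aod_ref * (wl_target / wl_ref) ** (-exponent)


# Illustrative values only (not taken from AERONET or from the papers).
alpha = angstrom_exponent(aod_1=0.20, aod_2=0.10, wl_1=355.0, wl_2=532.0)
aod_1064 = extrapolate_aod(aod_ref=0.10, wl_ref=532.0, wl_target=1064.0, exponent=alpha)
print(f"Angstrom exponent: {alpha:.3f}, AOD at 1064 nm: {aod_1064:.4f}")
```

A positive exponent means the optical depth decreases with wavelength, which is why the extrapolated value at the longer wavelength is smaller than the reference.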
- Main script is dataseting/dataseting.py
- Flags:
  - --do_dataset to create a csv of the records
  - --extend_dataset to add additional info to the dataset
  - --do_calibration_dataset to create a calibration dataset from the extended df
  - --create_train_test_splits to create train test splits
  - --calc_stats to calculate mean, min, max, std statistics
  - --create_time_split_samples to split up the dataset into small intervals
  - Note: use --generated_mode to apply the operations on the generated data (vs the raw TROPOS data)
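The statistics named for --calc_stats (mean, min, max, std) can be illustrated with the standard library. This is a sketch under assumptions and not the repository's implementation: the function name `calc_stats`, the input values, and the choice of the population standard deviation are all illustrative.

```python
import statistics
from typing import Dict, List


def calc_stats(values: List[float]) -> Dict[str, float]:
    """Return mean, min, max and std of a list of values."""
    return {
        "mean": statistics.fmean(values),
        "min": min(values),
        "max": max(values),
        # Population standard deviation; whether the repository uses the
        # population or the sample estimator is an assumption here.
        "std": statistics.pstdev(values),
    }


# Hypothetical values standing in for one column of the dataset.
column_values = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
stats = calc_stats(column_values)
print(stats)
```

Statistics of this kind are commonly computed on the training split only and then reused to normalize the test split, so that no test information leaks into training.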
The learning pipeline is designed to receive two data types: raw lidar measurements by pollyXT and measurements simulated by ALiDAn. The implementation is oriented to lidar calibration. However, other models can be applied as well.
- Deep learning module to predict 'Y' given 'X'.
- Makes use of parameters from run_params.py.
- Configure the params as desired, then run the NN with python main_lightning.py
- The models are implemented with PyTorch Lightning, currently only calibCNN.py.
- analysis_LCNet_results extracts the raw results from a results folder and displays many comparisons of the different trials. NOTE: currently analysis_LCNet_results.ipynb holds old results with messy code. Updated code is at analysis_LCNet_results_no_overlap.ipynb and this is the notebook that should be used!
- model_validation.py is a script that has barely been used so far, but is meant to load a pretrained model and use it to reproduce results.
- The data folder contains both data necessary for the generation, and csv files that are created in the dataseting stage and needed as input for the learning phase. Specifically:
  - stations.csv defines stations, currently also relevant when working on a different computer.
  - dataset_<station_name>_<start_date>_<end_date>.csv contains links to the actual data paths. Each row is a record.
- There are many todos in the code, some of which are crucial for certain stages, and some 'nice to have'.
- run_script.sh can be used as an example of how to run parts of the code from the terminal with command-line arguments, for example for different dates.
- Paths_lidar_learning.pptx describes the planned changes to the data paths, which are meant to be much more organized, easier to maintain and less dependent.
- pyALiDAn_dev is a private folder of ongoing research.
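As an illustration of the dataset_<station_name>_<start_date>_<end_date>.csv naming described above, the sketch below builds such a file name and reads records row by row. It rests on assumptions: the helper name `dataset_csv_name`, the ISO date formatting, and the column names in the in-memory example are hypothetical and may not match the files produced by the dataseting stage.

```python
import csv
import io
from datetime import date


def dataset_csv_name(station_name: str, start_date: date, end_date: date) -> str:
    """Build a file name following dataset_<station_name>_<start_date>_<end_date>.csv."""
    # ISO date formatting (YYYY-MM-DD) is an assumption of this sketch.
    return f"dataset_{station_name}_{start_date.isoformat()}_{end_date.isoformat()}.csv"


name = dataset_csv_name("haifa", date(2017, 9, 1), date(2017, 10, 31))

# In-memory stand-in for the csv file; the column names are hypothetical.
fake_csv = io.StringIO(
    "date,wavelength,lidar_path\n"
    "2017-09-01,355,path/to/lidar_355.nc\n"
    "2017-09-01,532,path/to/lidar_532.nc\n"
)
records = list(csv.DictReader(fake_csv))  # each row is one record
print(name, len(records))
```

With a real file, the `io.StringIO` object would be replaced by an `open(...)` call on the csv under the data folder.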