Implementation of other sensors #134

Open
wants to merge 12 commits into
base: main
from

Conversation

yuvraajnarula
Contributor

@yuvraajnarula yuvraajnarula commented Jan 31, 2025

Pull Request

Description

Added implementations for other sensor datasets, along with unit tests for the sensor data processing functionality.

This PR adds comprehensive unit tests for the sensor data processing module, specifically testing the SensorDataset class and collate_fn function. The tests cover various sensor types including AMSU-A, ATMS, MHS, IASI, and CrIS.

Key changes:

  • Added test_sensor_dataset to verify basic dataset functionality
  • Added test_collate_function to ensure proper data batching
  • Added test_sensor_datasets to test multiple sensor types
  • Implemented mock DataCatalog fixture for consistent testing
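
As a rough illustration of the mock-catalog idea in the last bullet, here is a minimal sketch using only the standard library. It is not the fixture from this PR: the lookup-by-name interface and the `load_rows` method are assumptions made up for the example, not the real nnja API, and the row values simply echo the test output shown further down.

```python
from unittest.mock import MagicMock

# Hypothetical rows standing in for one sensor's observations; the column
# names mirror the descriptors and variables used in this PR's tests.
FAKE_ROWS = [
    {
        "OBS_TIMESTAMP": 1738300000.0,
        "LAT": 45.0,
        "LON": -120.0,
        "TMBR_00001": 250.0,
        "TMBR_00002": 260.0,
    }
    for _ in range(100)
]


def make_mock_catalog(rows):
    """Build a stand-in catalog whose entries all return the same rows.

    Assumption: the code under test looks datasets up with catalog[name]
    and then calls a loader method. `load_rows` is a made-up name for that
    loader, not part of the real nnja API.
    """
    fake_dataset = MagicMock()
    fake_dataset.load_rows.return_value = rows

    catalog = MagicMock()
    # MagicMock supports configuring magic methods such as __getitem__.
    catalog.__getitem__.return_value = fake_dataset
    return catalog


catalog = make_mock_catalog(FAKE_ROWS)
rows = catalog["amsu-1bamua-NC021023"].load_rows()
print(len(rows))  # 100
```

In a pytest suite, a builder like this would typically be wrapped in a fixture so every test gets the same in-memory data without touching external storage.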

Fixes #128

Test Results

tests/test_nnjai.py::test_sensor_dataset PASSED                [ 33%]
tests/test_nnjai.py::test_collate_function PASSED             [ 66%]
tests/test_nnjai.py::test_sensor_datasets PASSED              [100%]
================ 3 passed in 7.78s ================

Test output

Testing sensor: amsu-1bamua-NC021023
Dataset length for amsu-1bamua-NC021023: 100
First item from amsu-1bamua-NC021023:
{'timestamp': tensor(1.7383e+09), 'latitude': tensor(45.), 'longitude': tensor(-120.), 'metadata': tensor([250., 260.])}
Metadata for amsu-1bamua-NC021023: tensor([250., 260.])


Testing sensor: atms-atms-NC021203
Dataset length for atms-atms-NC021203: 100
First item from atms-atms-NC021203:
{'timestamp': tensor(1.7383e+09), 'latitude': tensor(45.), 'longitude': tensor(-120.), 'metadata': tensor([250., 260.])}
Metadata for atms-atms-NC021203: tensor([250., 260.])


Testing sensor: mhs-1bmhs-NC021027
Dataset length for mhs-1bmhs-NC021027: 100
First item from mhs-1bmhs-NC021027:
{'timestamp': tensor(1.7383e+09), 'latitude': tensor(45.), 'longitude': tensor(-120.), 'metadata': tensor([250., 260.])}
Metadata for mhs-1bmhs-NC021027: tensor([250., 260.])


Testing sensor: iasi-mtiasi-NC021241
Dataset length for iasi-mtiasi-NC021241: 100
First item from iasi-mtiasi-NC021241:
{'timestamp': tensor(1.7383e+09), 'latitude': tensor(45.), 'longitude': tensor(-120.), 'metadata': tensor([250., 260.])}
Metadata for iasi-mtiasi-NC021241: tensor([250., 260.])


Testing sensor: cris-crisf4-NC021206
Dataset length for cris-crisf4-NC021206: 100
First item from cris-crisf4-NC021206:
{'timestamp': tensor(1.7383e+09), 'latitude': tensor(45.), 'longitude': tensor(-120.), 'metadata': tensor([250., 260.])}
Metadata for cris-crisf4-NC021206: tensor([250., 260.])

Testing Details

  • Created mock data fixtures to avoid external data dependencies
  • Verified tensor shapes, data types, and structures
  • Tested data loading and processing for multiple sensor types
  • Validated collate function for batch processing
  • All tests pass successfully with pytest
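
To make the batching check above concrete, here is a dependency-free sketch of what a collate function for dictionary samples does. It is an assumption about the general shape, not the PR's `collate_fn`: the real one presumably stacks tensors (for example with `torch.stack`), whereas this version groups values into plain lists. The name `collate_samples` is hypothetical.

```python
def collate_samples(batch):
    """Group a list of per-sample dicts into one dict of per-field lists."""
    if not batch:
        raise ValueError("cannot collate an empty batch")
    keys = batch[0].keys()
    # Every sample is expected to carry the same fields as the first one.
    return {key: [sample[key] for sample in batch] for key in keys}


# Sample layout copied from the test output above, with floats and lists
# standing in for tensors.
sample = {
    "timestamp": 1738300000.0,
    "latitude": 45.0,
    "longitude": -120.0,
    "metadata": [250.0, 260.0],
}
batch = collate_samples([sample, dict(sample), dict(sample)])
print(len(batch["latitude"]))  # 3
```

A test for the real function would make the same kind of assertion on the leading (batch) dimension of each stacked tensor.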

Checklist:

  • Code follows OCF's coding style guidelines
  • Performed self-review of code
  • No documentation changes needed (test-only changes)
  • Added comprehensive test suite
  • Checked code for misspellings

The PR focuses on improving test coverage for the sensor data processing pipeline, ensuring reliable data handling across different sensor types while maintaining code quality standards.

@yuvraajnarula yuvraajnarula changed the title from "Implementation of other tensors" to "Implementation of other sensors" Jan 31, 2025
Member

@jacobbieker jacobbieker left a comment

Thanks for this! A good improvement over the original one, just a few changes I think, then we can merge this in.

@@ -1,4 +1,5 @@
"""Main import for the complete models"""

from .data.nnjai_wrapp import SensorDataset, collate_fn
Member

I don't think we should actually import this here, since we have nnja as an optional dependency. To import this dataset, as in the tests, we should just do from graph_weather.data.nnja_ai import SensorDataset, for example. I also renamed this on the main branch, but could you rename nnjai_wrapp.py to nnja_ai.py? I think that is more consistent naming.
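
For context on why a top-level import is a problem with an optional dependency: if the package `__init__` imports a module that needs nnja, then importing the package fails for anyone who has not installed it. The change requested here is simply to drop the top-level import. The sketch below only illustrates the general guard pattern; `optional_import` and `HAS_NNJA` are hypothetical names, and `nnja` as the import name is an assumption.

```python
import importlib


def optional_import(module_name):
    """Return the imported module, or None if it is not installed."""
    try:
        return importlib.import_module(module_name)
    except ImportError:
        return None


# Assumption: the optional dependency is importable as "nnja".
nnja = optional_import("nnja")
HAS_NNJA = nnja is not None

if not HAS_NNJA:
    print("nnja is not installed; NNJA datasets are unavailable")
```

Keeping the dataset in its own submodule, imported only by the code that needs it, avoids the need for a guard like this in the package root.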

)


class SensorDataset(Dataset):
Member

Suggested change
- class SensorDataset(Dataset):
+ class NNJADataset(Dataset):

We might (probably will, as I am looking into adding other, non-NNJA sensors) have other sensors that won't fit in this dataset. I think naming this just NNJADataset is more descriptive and makes it easier to see where this data is coming from.

@jacobbieker jacobbieker mentioned this pull request Feb 8, 2025
6 tasks
@yuvraajnarula
Contributor Author

Could you please tell me the status of any changes for this PR?

Member

@jacobbieker jacobbieker left a comment

A few changes are needed, I think: updating the tests and removing an old file. After that, it should be good to merge. If you can also resolve the merge conflicts, that would be great.

@@ -0,0 +1,106 @@
"""
Member

Could you remove this file, as it's not used now?

@@ -0,0 +1,166 @@
"""
Tests for the nnjai_wrapp module in the graph_weather package.
Member

Suggested change
- Tests for the nnjai_wrapp module in the graph_weather package.
+ Tests for the nnja_ai module in the graph_weather package.

time = datetime(2021, 1, 1, 0, 0) # Using datetime object instead of string
primary_descriptors = ["OBS_TIMESTAMP", "LAT", "LON"]
additional_variables = ["TMBR_00001", "TMBR_00002"]
dataset = SensorDataset(dataset_name, time, primary_descriptors, additional_variables, sensor_type="AMSU")
Member

Do these tests still pass? I think they need to be updated for the new names. Ideally, there should also be tests for all the different sensor types in NNJADataset. You should be able to do that with pytest.mark.parametrize, I think.
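
A minimal sketch of what such a parametrized test could look like, assuming the five dataset names from the test output in the PR description. `FakeNNJADataset` is a hypothetical stand-in so the example runs without nnja or torch; the real test would construct NNJADataset against the mocked catalog instead.

```python
import pytest

# Dataset names taken from the test output in the PR description.
SENSOR_DATASETS = [
    "amsu-1bamua-NC021023",
    "atms-atms-NC021203",
    "mhs-1bmhs-NC021027",
    "iasi-mtiasi-NC021241",
    "cris-crisf4-NC021206",
]


class FakeNNJADataset:
    """Hypothetical stand-in for NNJADataset that returns fixed samples."""

    def __init__(self, dataset_name, n_samples=100):
        self.dataset_name = dataset_name
        self.n_samples = n_samples

    def __len__(self):
        return self.n_samples

    def __getitem__(self, index):
        if not 0 <= index < self.n_samples:
            raise IndexError(index)
        return {
            "timestamp": 1738300000.0,
            "latitude": 45.0,
            "longitude": -120.0,
            "metadata": [250.0, 260.0],
        }


@pytest.mark.parametrize("dataset_name", SENSOR_DATASETS)
def test_dataset_returns_expected_fields(dataset_name):
    # pytest runs this function once per entry in SENSOR_DATASETS.
    dataset = FakeNNJADataset(dataset_name)
    assert len(dataset) == 100
    item = dataset[0]
    assert set(item) == {"timestamp", "latitude", "longitude", "metadata"}
    assert len(item["metadata"]) == 2
```

Each dataset name shows up as its own test case in the pytest report, so a failure for one sensor type does not hide the results for the others.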

Development

Successfully merging this pull request may close these issues.

Add support for NNJA-AI Data
2 participants