Product values differ between Intel and AMD platforms #79

Open
collinss-jpl opened this issue Apr 2, 2024 · 0 comments
Labels: bug (Something isn't working)

When running the DSWx-NI SAS Interface delivery on an Intel-based EC2 instance (for example c6i.2xlarge), comparing the output products against the expected products with the dswx_comparison.py script produces the following failures:

Comparing files:
    file 1: sample_data/expected_output/OPERA_L3_DSWx-NI_T11SLS_20110226T061749Z_20240329T181033Z_LSAR_30_v0.1_B01_WTR.tif
    file 2: sample_data/output_dir/OPERA_L3_DSWx-NI_T11SLS_20110226T061749Z_20240402T154233Z_LSAR_30_v0.1_B01_WTR.tif
[OK]Comparing number of bands
Comparing DSWx bands...
[FAIL]     Band 1 - Water classification (WTR)"
/data/home/collinss/OPERA/DSWx-NI/interface/0.1/dswx_comparison.py:59: RuntimeWarning: overflow encountered in ubyte_scalars
  if (abs(image_1[i, j] - image_2[i, j]) >=
            * input 1 has value "0" in position (x: 2615, y: 460) whereas input 2 has value "1" in the same position.

Comparing files:
    file 1: sample_data/expected_output/OPERA_L3_DSWx-NI_T11SLS_20110226T061749Z_20240329T181033Z_LSAR_30_v0.1_B02_BWTR.tif
    file 2: sample_data/output_dir/OPERA_L3_DSWx-NI_T11SLS_20110226T061749Z_20240402T154233Z_LSAR_30_v0.1_B02_BWTR.tif
[OK]Comparing number of bands
Comparing DSWx bands...
[FAIL]     Band 1 - Binary Water classification (BWTR)"
/data/home/collinss/OPERA/DSWx-NI/interface/0.1/dswx_comparison.py:59: RuntimeWarning: overflow encountered in ubyte_scalars
  if (abs(image_1[i, j] - image_2[i, j]) >=
            * input 1 has value "0" in position (x: 2615, y: 460) whereas input 2 has value "1" in the same position.

Comparing files:
    file 1: sample_data/expected_output/OPERA_L3_DSWx-NI_T11SLS_20110226T061749Z_20240329T181033Z_LSAR_30_v0.1_B03_CONF.tif
    file 2: sample_data/output_dir/OPERA_L3_DSWx-NI_T11SLS_20110226T061749Z_20240402T154233Z_LSAR_30_v0.1_B03_CONF.tif
[OK]Comparing number of bands
Comparing DSWx bands...
[FAIL]     Band 1 - Confidence values (CONF)"
            * input 1 has value "2" in position (x: 2610, y: 313) whereas input 2 has value "1" in the same position.

Comparing files:
    file 1: sample_data/expected_output/OPERA_L3_DSWx-NI_T11SLS_20110226T061749Z_20240329T181033Z_LSAR_30_v0.1_B04_DIAG.tif
    file 2: sample_data/output_dir/OPERA_L3_DSWx-NI_T11SLS_20110226T061749Z_20240402T154233Z_LSAR_30_v0.1_B04_DIAG.tif
[OK]Comparing number of bands
Comparing DSWx bands...
[FAIL]     Band 1 - Diagnostic layer (DIAG)"
/data/home/collinss/OPERA/DSWx-NI/interface/0.1/dswx_comparison.py:59: RuntimeWarning: overflow encountered in ubyte_scalars
  if (abs(image_1[i, j] - image_2[i, j]) >=
            * input 1 has value "99" in position (x: 2527, y: 0) whereas input 2 has value "100" in the same position.

Comparing files:
    file 1: sample_data/expected_output/OPERA_L3_DSWx-NI_T11SLT_20110226T061749Z_20240329T181033Z_LSAR_30_v0.1_B01_WTR.tif
    file 2: sample_data/output_dir/OPERA_L3_DSWx-NI_T11SLT_20110226T061749Z_20240402T154233Z_LSAR_30_v0.1_B01_WTR.tif
[OK]Comparing number of bands
Comparing DSWx bands...
[FAIL]     Band 1 - Water classification (WTR)"
            * input 1 has value "1" in position (x: 2260, y: 68) whereas input 2 has value "0" in the same position.

Comparing files:
    file 1: sample_data/expected_output/OPERA_L3_DSWx-NI_T11SLT_20110226T061749Z_20240329T181033Z_LSAR_30_v0.1_B02_BWTR.tif
    file 2: sample_data/output_dir/OPERA_L3_DSWx-NI_T11SLT_20110226T061749Z_20240402T154233Z_LSAR_30_v0.1_B02_BWTR.tif
[OK]Comparing number of bands
Comparing DSWx bands...
[FAIL]     Band 1 - Binary Water classification (BWTR)"
            * input 1 has value "1" in position (x: 2260, y: 68) whereas input 2 has value "0" in the same position.

Comparing files:
    file 1: sample_data/expected_output/OPERA_L3_DSWx-NI_T11SLT_20110226T061749Z_20240329T181033Z_LSAR_30_v0.1_B03_CONF.tif
    file 2: sample_data/output_dir/OPERA_L3_DSWx-NI_T11SLT_20110226T061749Z_20240402T154233Z_LSAR_30_v0.1_B03_CONF.tif
[OK]Comparing number of bands
Comparing DSWx bands...
[FAIL]     Band 1 - Confidence values (CONF)"
            * input 1 has value "6" in position (x: 3617, y: 16) whereas input 2 has value "0" in the same position.

Comparing files:
    file 1: sample_data/expected_output/OPERA_L3_DSWx-NI_T11SLT_20110226T061749Z_20240329T181033Z_LSAR_30_v0.1_B04_DIAG.tif
    file 2: sample_data/output_dir/OPERA_L3_DSWx-NI_T11SLT_20110226T061749Z_20240402T154233Z_LSAR_30_v0.1_B04_DIAG.tif
[OK]Comparing number of bands
Comparing DSWx bands...
[FAIL]     Band 1 - Diagnostic layer (DIAG)"
            * input 1 has value "54" in position (x: 2046, y: 0) whereas input 2 has value "51" in the same position.

Comparing files:
    file 1: sample_data/expected_output/OPERA_L3_DSWx-NI_T11SMS_20110226T061749Z_20240329T181033Z_LSAR_30_v0.1_B04_DIAG.tif
    file 2: sample_data/output_dir/OPERA_L3_DSWx-NI_T11SMS_20110226T061749Z_20240402T154233Z_LSAR_30_v0.1_B04_DIAG.tif
[OK]Comparing number of bands
Comparing DSWx bands...
[FAIL]     Band 1 - Diagnostic layer (DIAG)"
/data/home/collinss/OPERA/DSWx-NI/interface/0.1/dswx_comparison.py:59: RuntimeWarning: overflow encountered in ubyte_scalars
  if (abs(image_1[i, j] - image_2[i, j]) >=
            * input 1 has value "89" in position (x: 33, y: 0) whereas input 2 has value "90" in the same position.

Comparing files:
    file 1: sample_data/expected_output/OPERA_L3_DSWx-NI_T11SMT_20110226T061749Z_20240329T181033Z_LSAR_30_v0.1_B01_WTR.tif
    file 2: sample_data/output_dir/OPERA_L3_DSWx-NI_T11SMT_20110226T061749Z_20240402T154233Z_LSAR_30_v0.1_B01_WTR.tif
[OK]Comparing number of bands
Comparing DSWx bands...
[FAIL]     Band 1 - Water classification (WTR)"
            * input 1 has value "1" in position (x: 343, y: 338) whereas input 2 has value "0" in the same position.

Comparing files:
    file 1: sample_data/expected_output/OPERA_L3_DSWx-NI_T11SMT_20110226T061749Z_20240329T181033Z_LSAR_30_v0.1_B02_BWTR.tif
    file 2: sample_data/output_dir/OPERA_L3_DSWx-NI_T11SMT_20110226T061749Z_20240402T154233Z_LSAR_30_v0.1_B02_BWTR.tif
[OK]Comparing number of bands
Comparing DSWx bands...
[FAIL]     Band 1 - Binary Water classification (BWTR)"
            * input 1 has value "1" in position (x: 343, y: 338) whereas input 2 has value "0" in the same position.

Comparing files:
    file 1: sample_data/expected_output/OPERA_L3_DSWx-NI_T11SMT_20110226T061749Z_20240329T181033Z_LSAR_30_v0.1_B03_CONF.tif
    file 2: sample_data/output_dir/OPERA_L3_DSWx-NI_T11SMT_20110226T061749Z_20240402T154233Z_LSAR_30_v0.1_B03_CONF.tif
[OK]Comparing number of bands
Comparing DSWx bands...
[FAIL]     Band 1 - Confidence values (CONF)"
            * input 1 has value "6" in position (x: 285, y: 16) whereas input 2 has value "0" in the same position.

Comparing files:
    file 1: sample_data/expected_output/OPERA_L3_DSWx-NI_T11SMT_20110226T061749Z_20240329T181033Z_LSAR_30_v0.1_B04_DIAG.tif
    file 2: sample_data/output_dir/OPERA_L3_DSWx-NI_T11SMT_20110226T061749Z_20240402T154233Z_LSAR_30_v0.1_B04_DIAG.tif
[OK]Comparing number of bands
Comparing DSWx bands...
[FAIL]     Band 1 - Diagnostic layer (DIAG)"
            * input 1 has value "50" in position (x: 25, y: 0) whereas input 2 has value "47" in the same position.
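
A side note on the RuntimeWarning in the output above: "overflow encountered in ubyte_scalars" is what NumPy reports when two uint8 scalars are subtracted and the result wraps around (0 - 1 becomes 255). The warning is separate from the product mismatches themselves. Below is a minimal sketch of a wraparound-free comparison, assuming the bands are loaded as 2-D uint8 arrays. The function name `first_mismatch` and the `tolerance` parameter are hypothetical and are not taken from dswx_comparison.py.

```python
import numpy as np


def first_mismatch(image_1, image_2, tolerance=0):
    """Return (x, y, value_1, value_2) for the first pixel whose absolute
    difference exceeds `tolerance`, or None if the two bands agree.

    Hypothetical helper, not the logic used by dswx_comparison.py.
    """
    # Widen to a signed type before subtracting so that uint8 values
    # cannot wrap around (for uint8, 0 - 1 would otherwise give 255).
    diff = np.abs(image_1.astype(np.int32) - image_2.astype(np.int32))

    # np.nonzero returns indices in row-major order, so element 0 is the
    # first differing pixel when scanning row by row.
    rows, cols = np.nonzero(diff > tolerance)
    if rows.size == 0:
        return None

    i, j = int(rows[0]), int(cols[0])
    return j, i, int(image_1[i, j]), int(image_2[i, j])
```

Casting once per band also avoids the per-pixel Python loop, and the same `diff` array could be reused to count how many pixels differ rather than reporting only the first one.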

When running the same test and comparisons on an AMD-based EC2 instance (c6a.2xlarge), all tests pass cleanly. This suggests that the DSWX-SAR code is sensitive to floating-point precision or rounding differences between Intel and AMD processors, producing slightly different (incorrect?) values on Intel machines. Note that similar behavior has also been observed when running the DSWx-S1 SAS.
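
To illustrate how a rounding difference could flip a classified pixel, here is a small sketch. It is an assumption about the mechanism, not a confirmed diagnosis of DSWX-SAR: floating-point addition is not associative, so if the same sum is evaluated in a different order on two platforms, the results can differ in the last bit, and a value sitting on a threshold lands on the other side. The `classify` function and the 0.6 threshold are made up for the example.

```python
def classify(value, threshold=0.6):
    """Hypothetical water/no-water decision: 1 if above threshold, else 0."""
    return 1 if value > threshold else 0


# The same three terms, summed in two different orders.
left_to_right = (0.1 + 0.2) + 0.3
right_to_left = 0.1 + (0.2 + 0.3)

# The two sums differ in the last bit of the double-precision result...
print(left_to_right == right_to_left)

# ...which is enough to change the class at the threshold.
print(classify(left_to_right), classify(right_to_left))
```

A one-bit difference like this would be consistent with the isolated single-pixel mismatches in WTR and BWTR, though the larger CONF and DIAG differences may have another cause.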

This is a potential issue since we sometimes allocate both Intel and AMD instance types in the same auto-scaling worker pool in the OPERA SDS.

A set of sample DSWx-NI outputs generated on an Intel instance can be downloaded from s3://opera-dev-lts-fwd-collinss/acceptance_test/dswx_ni/interface_0.1/

collinss-jpl added the bug label on Apr 2, 2024