
cda.csv row count discrepancy #69

Open
jjturner opened this issue Dec 29, 2021 · 4 comments

@jjturner

Row count of the cda.csv I downloaded when I obtained the book, which is also the result of my COPY command:

  • 440510 rows of data

COPY operation as illustrated in the book:

  • 387546 rows of data
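To reproduce the file-side count without going through PostgreSQL, here is a minimal Python sketch. It assumes cda.csv has exactly one header row and that blank lines should not be counted; `count_data_rows` is a hypothetical helper, not something from the book. It uses the `csv` module so that a quoted field containing a newline is counted as part of one record rather than as two lines.

```python
import csv
from pathlib import Path


def count_data_rows(csv_path: str) -> int:
    """Count data rows in a CSV file, excluding the header row.

    Assumption: the first record is a header. Blank lines are skipped,
    and quoted fields that contain newlines stay inside a single record.
    """
    with Path(csv_path).open(newline="", encoding="utf-8", errors="replace") as f:
        reader = csv.reader(f)
        next(reader, None)  # skip the header row, if the file is not empty
        return sum(1 for row in reader if row)  # csv.reader yields [] for blank lines
```

Calling `count_data_rows("cda.csv")` on each downloaded copy gives a number to compare against the 440510 and 387546 figures above.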
@robconery
Contributor

I think something went off the rails a little bit with 1.1. I might have to revert.

robconery added the bug label on Jan 3, 2022
@greywidget

I agree with the above counts (I also get 440510), but I think this is more than just a row count discrepancy.

If you follow the links in the PDF (Ring Dust | Calculating Cassini's Speed) and then page down three pages, you will see this SQL:

select time_stamp,
  x_velocity,
  y_velocity,
  z_velocity,
  sqrt(
    (x_velocity * x_velocity) +
    (y_velocity * y_velocity) +
    (z_velocity * z_velocity)
  )::numeric(10,2) as v_kms
from cda.impacts
where x_velocity <> -99.99;

This produces the data shown on the next page of the PDF, where the first two rows have the following timestamps:

  • 2005-04-04 18:12:57.6-07
  • 2005-04-04 20:32:38.4-07

I don't believe I have this data in my file.
I've downloaded the data several times, on both Windows and Mac, by following the Archives for the Cassini Mission link at the red4 archive, and I've unzipped it with several utilities.

The cda.csv file has 440510 data rows, and that is the count that ends up in both my import.cda and cda.impacts tables.

By my calculation, a timestamp with a date of 2005-04-04 should have an impact_event_time in cda.csv that begins with 2005-094, but there is no such text in cda.csv.
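The day-of-year arithmetic above can be checked with a short Python sketch: 2005 is not a leap year, so April 4 is day 31 + 28 + 31 + 4 = 94. The sketch also covers one caveat. The book's timestamps are displayed with a -07 offset, and if impact_event_time in cda.csv is recorded in UTC (an assumption this thread does not confirm), then 18:12:57.6-07 on 2005-04-04 falls on 2005-04-05 in UTC, i.e. day 095. The helper names and the `BOOK_OFFSET` constant are hypothetical, chosen for illustration.

```python
from datetime import date, datetime, timedelta, timezone

# Hypothetical constant: the -07 offset shown on the book's timestamps.
BOOK_OFFSET = timezone(timedelta(hours=-7))


def doy_prefix(d: date) -> str:
    """Return the 'YYYY-DDD' day-of-year prefix for a calendar date."""
    # %j is the zero-padded day of the year (001-366).
    return d.strftime("%Y-%j")


def utc_doy_prefix(ts: datetime) -> str:
    """Convert an offset-aware timestamp to UTC, then return its 'YYYY-DDD' prefix.

    Assumption: impact_event_time in cda.csv is stored in UTC.
    """
    return ts.astimezone(timezone.utc).strftime("%Y-%j")
```

For example, `doy_prefix(date(2005, 4, 4))` gives `"2005-094"`, while `utc_doy_prefix(datetime(2005, 4, 4, 18, 12, 57, 600000, tzinfo=BOOK_OFFSET))` gives `"2005-095"`, so searching for both prefixes is a cheap way to rule out a time-zone mismatch.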

The earliest data I can find in my downloaded cda.csv is for 2005-01-01, and if I run the following SQL:

with t1 as (
  select time_stamp,
    x_velocity, y_velocity, z_velocity,
    sqrt(
      (x_velocity * x_velocity) +
      (y_velocity * y_velocity) +
      (z_velocity * z_velocity)
    )::numeric(10, 2) as v_kms
  from cda.impacts
  where x_velocity <> -99.99
)
select * from t1
order by time_stamp;

I get data that exactly matches what is shown in closed issue #43.

Could someone else check their download of cda.csv and confirm or deny the presence of data for 2005-04-04?
I would like my data to match what is shown in the PDF if I am to carry on with the rest of the tutorial.
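For anyone checking their own download, a minimal Python sketch can report whether a given day-of-year prefix appears anywhere in cda.csv, along with the earliest and latest `YYYY-DDD` tokens in the file. It assumes, as described above, that impact_event_time values begin with a `YYYY-DDD` token; the regex could also match similar-looking text in other columns, and `scan_day_prefixes` is a hypothetical helper.

```python
import re
from pathlib import Path
from typing import Optional, Tuple

# Matches a 'YYYY-DDD' token such as 2005-094, not followed by another digit.
# Assumption: impact_event_time values in cda.csv start with this pattern.
DOY_PATTERN = re.compile(r"\b(\d{4}-\d{3})(?!\d)")


def scan_day_prefixes(csv_path: str, wanted: str) -> Tuple[bool, Optional[str], Optional[str]]:
    """Return (wanted_found, earliest_token, latest_token) for a CSV file.

    For zero-padded 'YYYY-DDD' strings, string order equals chronological order.
    """
    found = False
    earliest: Optional[str] = None
    latest: Optional[str] = None
    with Path(csv_path).open(encoding="utf-8", errors="replace") as f:
        for line in f:
            if wanted in line:
                found = True
            for token in DOY_PATTERN.findall(line):
                if earliest is None or token < earliest:
                    earliest = token
                if latest is None or token > latest:
                    latest = token
    return found, earliest, latest
```

Running `scan_day_prefixes("cda.csv", "2005-094")` answers the question directly; an earliest token of `2005-001` would be consistent with the 2005-01-01 start date reported above.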

@robconery
Contributor

I accessed my own data archives and can confirm that I have the same count as both of you for the CDA CSV file. I remember that when I was preparing the downloads I was worried about file sizes, so I planned to trim columns and records that weren't needed (the CDA data is gigantic); evidently that trimmed version went into the first release. The second release, however, appears to have more records in it.

I'm still trying to figure out what's going on, and I will! I normally leave myself exhaustive notes about the choices I made, but I can't seem to locate anything for the CDA, mostly because I use the INMS data for the rest of the book.

To be clear: the choice was between gigs and gigs of CDA data that we then pare down, or me clipping the data down to just what we need... not an easy choice, and now we can see why :).

Stay tuned...

@greywidget

Thanks @robconery, I appreciate you looking at this.

Yeah, I can see that the CDA extract process changed over time, which is a good thing!
I don't really want to be pulling down all that RAW data :-)

Nice one.
