SAS na values management #20301

Open
kharazu opened this issue Mar 12, 2018 · 9 comments
Labels
Enhancement, IO SAS, Missing-data

Comments

@kharazu

kharazu commented Mar 12, 2018

Problem description

Hi, I'm opening this issue to request an NA-values parameter for read_sas(), similar to what read_csv offers through na_values and keep_default_na. Being able to set NA values directly at import time would avoid time-consuming workarounds.

@TomAugspurger
Contributor

What values does sas use for missing values? Is this configurable when the data is written?

TomAugspurger added the IO Data, IO SAS, and Missing-data labels Mar 12, 2018
@spedygiorgio

A possible solution would be to add na_values or keep_default_na parameters, as in read_csv.

@kharazu
Author

kharazu commented Mar 14, 2018

@TomAugspurger SAS uses dots (.) for missing values, and read_sas already handles that. What I meant is what @spedygiorgio described: I'd like a parameter that lets me decide whether a value (e.g. 'NA') should not be automatically loaded into pandas as NaN, as read_csv allows via na_values and keep_default_na.
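For readers unfamiliar with the read_csv behaviour being referred to, here is a minimal sketch of how na_values and keep_default_na interact there. The CSV text and column names are made up for illustration; nothing here applies to read_sas, which, per this thread, has no equivalent parameters.

```python
import io

import pandas as pd

# Hypothetical data: "NA" is a legitimate country code, "." marks a missing score.
csv_text = "country,score\nNA,1.0\nUS,.\n"

# Default behaviour: "NA" is on pandas' built-in list of NA strings,
# so it is loaded as a missing value.
default = pd.read_csv(io.StringIO(csv_text))

# keep_default_na=False disables the built-in list; na_values then names
# the only string that should be treated as missing.
custom = pd.read_csv(io.StringIO(csv_text), keep_default_na=False, na_values=["."])

print(default["country"].isna().tolist())
print(custom["country"].tolist())
print(custom["score"].isna().tolist())
```

With the second call, the 'NA' string survives as text while the dot is still read as missing, which is the kind of control being requested for SAS files.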

@TomAugspurger
Contributor

TomAugspurger commented Mar 14, 2018 via email

@spedygiorgio

Any news from this?

@TomAugspurger
Contributor

TomAugspurger commented May 7, 2018 via email

@spedygiorgio

spedygiorgio commented May 7, 2018 via email

@sasutils

If someone wants to work on this, here is some information on how SAS stores missing values.

SAS has 28 missing numeric values. The normal missing value is written in code as a period; the special missing values are written as a period followed by a letter or an underscore. SAS stores all numbers as 8-byte floating-point numbers (in IEEE format on non-IBM mainframe systems). To store the missing values, they picked specific bit combinations that are invalid IEEE numbers.

Here is a table of the values they use, as 16-character hex strings. I am not sure in what order the 8 bytes of each floating-point number are stored in the SAS dataset, but LITTLE is the order in which SAS stores them in memory on a PC.

number=. little=0000000000D1FFFF big=FFFF1D0000000000
number=._ little=0000000000D2FFFF big=FFFF2D0000000000
number=.A little=0000000000BEFFFF big=FFFFEB0000000000
number=.B little=0000000000BDFFFF big=FFFFDB0000000000
number=.C little=0000000000BCFFFF big=FFFFCB0000000000
number=.D little=0000000000BBFFFF big=FFFFBB0000000000
number=.E little=0000000000BAFFFF big=FFFFAB0000000000
number=.F little=0000000000B9FFFF big=FFFF9B0000000000
number=.G little=0000000000B8FFFF big=FFFF8B0000000000
number=.H little=0000000000B7FFFF big=FFFF7B0000000000
number=.I little=0000000000B6FFFF big=FFFF6B0000000000
number=.J little=0000000000B5FFFF big=FFFF5B0000000000
number=.K little=0000000000B4FFFF big=FFFF4B0000000000
number=.L little=0000000000B3FFFF big=FFFF3B0000000000
number=.M little=0000000000B2FFFF big=FFFF2B0000000000
number=.N little=0000000000B1FFFF big=FFFF1B0000000000
number=.O little=0000000000B0FFFF big=FFFF0B0000000000
number=.P little=0000000000AFFFFF big=FFFFFA0000000000
number=.Q little=0000000000AEFFFF big=FFFFEA0000000000
number=.R little=0000000000ADFFFF big=FFFFDA0000000000
number=.S little=0000000000ACFFFF big=FFFFCA0000000000
number=.T little=0000000000ABFFFF big=FFFFBA0000000000
number=.U little=0000000000AAFFFF big=FFFFAA0000000000
number=.V little=0000000000A9FFFF big=FFFF9A0000000000
number=.W little=0000000000A8FFFF big=FFFF8A0000000000
number=.X little=0000000000A7FFFF big=FFFF7A0000000000
number=.Y little=0000000000A6FFFF big=FFFF6A0000000000
number=.Z little=0000000000A5FFFF big=FFFF5A0000000000
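As a rough illustration of how a reader could recognise these patterns, the sketch below decodes the little column with Python's standard struct module. It assumes the 8 bytes arrive in the little order shown above (the comment itself is unsure about the on-disk order), and classify and TAG_TO_LABEL are hypothetical names, not part of pandas. Two observations drawn from the table rather than from any SAS documentation: every listed pattern decodes as an IEEE NaN, and for . and .A through .Z the distinguishing byte equals 0xFF minus the character's ASCII code, while ._ (0xD2) does not follow that rule.

```python
import math
import string
import struct

# Distinguishing byte (index 5 in little order) -> SAS missing label,
# transcribed from the table above. This mapping is an assumption based
# solely on that table.
TAG_TO_LABEL = {0xD1: ".", 0xD2: "._"}
# For .A through .Z the table's byte equals 0xFF minus the letter's ASCII code.
TAG_TO_LABEL.update({0xFF - ord(c): "." + c for c in string.ascii_uppercase})


def classify(raw):
    """Return the SAS missing label for 8 little-order bytes, or None."""
    if len(raw) != 8:
        raise ValueError("expected exactly 8 bytes")
    (value,) = struct.unpack("<d", raw)
    if not math.isnan(value):
        return None  # an ordinary number
    # Listed patterns: five zero bytes, one tag byte, then FF FF.
    if raw[:5] != b"\x00" * 5 or raw[6:] != b"\xff\xff":
        return None  # some other NaN, not one of the listed patterns
    return TAG_TO_LABEL.get(raw[5])


for hex_string in ("0000000000D1FFFF", "0000000000BEFFFF", "0000000000A5FFFF"):
    raw = bytes.fromhex(hex_string)
    # raw[::-1] swaps whole bytes (two hex digits at a time).
    print(hex_string, raw[::-1].hex().upper(), classify(raw))
```

Note that reversing byte order swaps whole bytes, so the byte-swapped form of 0000000000D1FFFF is FFFFD10000000000. The big column in the table above looks character-reversed instead (FFFF1D...), so it may be worth double-checking before relying on it.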

@killerontherun1
Contributor

killerontherun1 commented May 19, 2019

Still open. Are you interested in working on it?

On Mon, May 7, 2018 at 7:48 AM, Giorgio Alfredo Spedicato wrote: Any news from this?

Hi,

I can see this thread hasn't been active for a while. I'd like to try my hand at this if it is not resolved.

jbrockmendel removed the IO Data label Dec 1, 2019