SAS na values management #20301

kharazu · 2018-03-12T14:25:09Z

Problem description

Hi, I'm writing this report to request an NA-values parameter for read_sas(). For CSV files this is already possible by setting na_values or keep_default_na in read_csv(). It would be really useful to set NA values directly for SAS imports as well, without time-consuming workarounds.

TomAugspurger · 2018-03-12T14:58:16Z

What values does SAS use for missing values? Is this configurable when the data is written?

spedygiorgio · 2018-03-13T10:39:31Z

A possible solution could be adding na_values or keep_default_na as per read_csv

kharazu · 2018-03-14T08:32:40Z

@TomAugspurger SAS uses dots (.) for missing values, but read_sas already handles that. What I meant is what @spedygiorgio already said: I'd like a parameter that lets me decide whether a value (e.g. 'NA') should not be automatically loaded into pandas as NaN, just as read_csv allows via na_values or keep_default_na.
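
For reference, here is a minimal sketch of the read_csv behaviour this request points to. The CSV text, the column names, and the 'unknown' sentinel are made up for illustration; only read_csv, na_values, and keep_default_na come from the discussion above.

```python
import io

import pandas as pd

# Illustrative data: 'NA' is in pandas' default NA list, 'unknown' is not.
csv_text = "id,status\n1,NA\n2,ok\n3,unknown\n"

# Default behaviour: the string 'NA' is loaded as NaN.
default = pd.read_csv(io.StringIO(csv_text))

# Turn off the default NA strings and name our own sentinel instead,
# so 'NA' stays a literal string and only 'unknown' becomes NaN.
custom = pd.read_csv(
    io.StringIO(csv_text),
    keep_default_na=False,
    na_values=["unknown"],
)

print(default["status"].isna().tolist())
print(custom["status"].isna().tolist())
```

The request is for an equivalent switch on read_sas; nothing here implies that such a parameter exists today.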

TomAugspurger · 2018-03-14T12:59:51Z

Sounds good. I'm just trying to get a sense for whether that actually happens in practice. I'm sure you have more experience with reading SAS files than I do :)

spedygiorgio · 2018-05-07T12:48:42Z

Any news from this?

TomAugspurger · 2018-05-07T12:53:37Z

Still open. Are you interested in working on it?

spedygiorgio · 2018-05-07T12:57:15Z

Unfortunately I am not so skilled :-(

sasutils · 2018-06-24T17:35:57Z

If someone wants to work on this, here is some information on how SAS stores missing values.

SAS has 28 missing numeric values. The ordinary missing value is written in code as a period. The special missing values are written as a period followed by a letter or an underscore. SAS stores all numbers as 8-byte floating-point numbers (in IEEE format on systems other than IBM mainframes). To store the missing values, SAS uses specific bit patterns that are not ordinary IEEE numbers.
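
As a quick sanity check on the count, the 28 codes can be enumerated like this. It is only a sketch; the names MISSING_CODES and DOUBLE_WIDTH are made up for illustration.

```python
import string
import struct

# The ordinary missing value, the underscore variant, then .A through .Z.
MISSING_CODES = [".", "._"] + ["." + letter for letter in string.ascii_uppercase]

# Width in bytes of a standard-size IEEE 754 double, matching the
# 8-byte storage described above.
DOUBLE_WIDTH = struct.calcsize("<d")

print(len(MISSING_CODES))
print(DOUBLE_WIDTH)
```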

Here is a table of the values as 16-character hex strings. I am not sure in what order the 8 bytes of each floating-point number are stored in the SAS dataset, but LITTLE is the order in which SAS stores them in memory on a PC.

number=. little=0000000000D1FFFF big=FFFFD10000000000
number=._ little=0000000000D2FFFF big=FFFFD20000000000
number=.A little=0000000000BEFFFF big=FFFFBE0000000000
number=.B little=0000000000BDFFFF big=FFFFBD0000000000
number=.C little=0000000000BCFFFF big=FFFFBC0000000000
number=.D little=0000000000BBFFFF big=FFFFBB0000000000
number=.E little=0000000000BAFFFF big=FFFFBA0000000000
number=.F little=0000000000B9FFFF big=FFFFB90000000000
number=.G little=0000000000B8FFFF big=FFFFB80000000000
number=.H little=0000000000B7FFFF big=FFFFB70000000000
number=.I little=0000000000B6FFFF big=FFFFB60000000000
number=.J little=0000000000B5FFFF big=FFFFB50000000000
number=.K little=0000000000B4FFFF big=FFFFB40000000000
number=.L little=0000000000B3FFFF big=FFFFB30000000000
number=.M little=0000000000B2FFFF big=FFFFB20000000000
number=.N little=0000000000B1FFFF big=FFFFB10000000000
number=.O little=0000000000B0FFFF big=FFFFB00000000000
number=.P little=0000000000AFFFFF big=FFFFAF0000000000
number=.Q little=0000000000AEFFFF big=FFFFAE0000000000
number=.R little=0000000000ADFFFF big=FFFFAD0000000000
number=.S little=0000000000ACFFFF big=FFFFAC0000000000
number=.T little=0000000000ABFFFF big=FFFFAB0000000000
number=.U little=0000000000AAFFFF big=FFFFAA0000000000
number=.V little=0000000000A9FFFF big=FFFFA90000000000
number=.W little=0000000000A8FFFF big=FFFFA80000000000
number=.X little=0000000000A7FFFF big=FFFFA70000000000
number=.Y little=0000000000A6FFFF big=FFFFA60000000000
number=.Z little=0000000000A5FFFF big=FFFFA50000000000
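
For anyone picking this up, here is a rough sketch of what the table implies for a reader. It uses three rows of the little column only and assumes the bytes are available in little-endian order; the helper names decode_little and missing_code are hypothetical and are not part of pandas. Each pattern decodes to an IEEE NaN, so a reader that wants to tell the codes apart would likely need to compare the raw bytes before they are converted to a float.

```python
import math
import struct

# Three rows copied from the little column of the table above.
LITTLE_HEX = {
    ".": "0000000000D1FFFF",
    ".A": "0000000000BEFFFF",
    ".Z": "0000000000A5FFFF",
}


def decode_little(hex_string):
    """Interpret 8 little-endian bytes (given as hex) as an IEEE 754 double."""
    return struct.unpack("<d", bytes.fromhex(hex_string))[0]


def missing_code(raw, table=LITTLE_HEX):
    """Return the SAS missing code whose byte pattern equals raw, else None."""
    for code, hex_string in table.items():
        if raw == bytes.fromhex(hex_string):
            return code
    return None


# Every pattern is a NaN once decoded, so the code is lost at that point.
for code, hex_string in LITTLE_HEX.items():
    print(code, math.isnan(decode_little(hex_string)))

# Matching on the raw bytes keeps the distinction between the codes.
print(missing_code(bytes.fromhex("0000000000BEFFFF")))
print(missing_code(struct.pack("<d", 1.5)))
```

Whether the bytes on disk arrive in this order depends on the file, which is the open question noted above, so the "<d" format is an assumption.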

killerontherun1 · 2019-05-19T11:06:29Z

> Still open. Are you interested in working on it?

Hi,

I can see this thread hasn't been active for a while. I'd like to try my hand at this if it hasn't been resolved.

TomAugspurger added the IO Data, IO SAS, and Missing-data labels Mar 12, 2018

jbrockmendel removed the IO Data label Dec 1, 2019

mroeschke added the Enhancement label Apr 5, 2020
