SAS na values management #20301
Comments
What values does SAS use for missing values? Is this configurable when the data is written? |
A possible solution could be adding na_values or keep_default_na as per read_csv |
@TomAugspurger SAS uses dots (.) for missing values, but read_sas already handles that. What I meant is what @spedygiorgio already said: I'd like a parameter that lets me decide whether a value (e.g. 'NA') should be kept as-is instead of being automatically loaded into pandas as NaN, just as read_csv allows via na_values or keep_default_na. |
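For reference, here is a minimal sketch of the read_csv behaviour being asked for. The sample data and column names are made up for illustration; `na_values` and `keep_default_na` are the existing read_csv parameters, and nothing here implies read_sas accepts them.

```python
import io

import pandas as pd

# Hypothetical sample data: "NA" is a legitimate code here, "." means missing.
csv_text = "id,code\n1,NA\n2,.\n3,US\n"

# Default behaviour: "NA" is in pandas' default missing-value list, so it becomes NaN.
default_df = pd.read_csv(io.StringIO(csv_text))

# Disable the default list and declare only "." as missing, so "NA" stays a string.
custom_df = pd.read_csv(
    io.StringIO(csv_text),
    keep_default_na=False,
    na_values=["."],
)

print(default_df)
print(custom_df)
```

With the second call, the 'NA' string survives the import and only the dot is converted to NaN.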
Sounds good. I'm just trying to get a sense for whether that actually happens in practice. I'm sure you have more experience with reading SAS files than I do :)
|
Any news on this? |
Still open. Are you interested in working on it?
|
Unfortunately I am not so skilled :-(
|
If someone wants to work on this, here is information on how SAS stores missing values. SAS has 28 missing numeric values. The ordinary missing value is written in code as a period; the special missing values are written as a period followed by a letter or an underscore. SAS stores all numbers as 8-byte floating-point numbers (in IEEE format on systems other than IBM mainframes). To store the missing values, SAS picked specific bit patterns that are not valid IEEE numbers. Here is a table of the values they use, as 16-character hex strings. I am not sure in what order the 8 bytes of each floating-point number are stored in the SAS dataset, but LITTLE is the order in which SAS stores them in memory on a PC.
number=. little=0000000000D1FFFF big=FFFF1D0000000000 |
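As a rough sketch of what this means for a reader implementation, the two hex strings above can be decoded with Python's standard `struct` module. The helper names below are hypothetical, and the byte order used inside an actual SAS dataset is still an open question, as noted above. The point is only that these patterns decode to NaN as floats, so telling the 28 missing values apart would require looking at the raw bits rather than the decoded float.

```python
import math
import struct

# Hex strings copied verbatim from the comment above.
LITTLE_HEX = "0000000000D1FFFF"
BIG_HEX = "FFFF1D0000000000"


def decode_double(hex_string, byte_order):
    """Interpret a 16-character hex string as an 8-byte IEEE 754 double.

    byte_order uses struct notation: "<" is little-endian, ">" is big-endian.
    """
    raw = bytes.fromhex(hex_string)
    (value,) = struct.unpack(byte_order + "d", raw)
    return value


def mantissa_bits(hex_string, byte_order):
    """Return the low 52 bits (the mantissa), which is where NaNs can differ."""
    raw = bytes.fromhex(hex_string)
    (bits,) = struct.unpack(byte_order + "Q", raw)
    return bits & ((1 << 52) - 1)


little_value = decode_double(LITTLE_HEX, "<")
big_value = decode_double(BIG_HEX, ">")

# Both patterns are NaN once decoded, so the float alone cannot identify
# which SAS missing value was stored.
print(math.isnan(little_value), math.isnan(big_value))
print(hex(mantissa_bits(LITTLE_HEX, "<")), hex(mantissa_bits(BIG_HEX, ">")))
```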
Hi, I can see this thread hasn't been active for a while. I'd like to try my hand at this if it is still unresolved. |
Problem description
Hi, I'm writing this report to request an NA-values management parameter for read_sas(), similar to what read_csv offers for .csv tables through na_values and keep_default_na. It would be really useful to set NA values directly for this kind of import, without time-consuming workarounds.