

read_stata issue #11526

Closed
benjello opened this issue Nov 5, 2015 · 11 comments
Labels
Compat pandas objects compatibility with Numpy or Python functions IO Stata read_stata, to_stata

@benjello
Contributor

benjello commented Nov 5, 2015

I get the following error when reading a batch of Stata files:

ValueError: Version of given Stata file is not 104, 105, 108, 113 (Stata 8/9), 114 (Stata 10/11), 115 (Stata 12), 117 (Stata 13), or 118 (Stata 14)

I do not know how to check the Stata version of the file, but I suspect it is fairly recent.
Opening the file with Stata 12 and saving it solves the problem.
But I do not have Stata installed on my machine, and I cannot do that every time.
How can I check the version of the file?
Would anybody be kind enough to have a look at the problematic file (by personal email, please)?
I suspect an encoding problem: the data is a sample of a French survey with many strings containing accents, etc.

Thanks for the help

@jreback
Contributor

jreback commented Nov 5, 2015

Try with encoding='utf-8' and see if that works.

@sinhrks sinhrks added the IO Stata read_stata, to_stata label Nov 7, 2015
@benjello
Contributor Author

Same error.

@jreback
Copy link
Contributor

jreback commented Nov 12, 2015

This seems peculiar to your case and out of scope for pandas.

@jreback jreback closed this as completed Nov 12, 2015
@torstees

I have the same problem with DTA files generated by SAS. Stata users have no problem opening these files, and neither do I if I use R, but I can't open them up using pandas (v 0.18.1)

@jorisvandenbossche
Member

@torstees Can you provide a reproducible example? (eg example file that fails to open)

@torstees

Here you go. I included a minimal python and R script along with the output I'm seeing on current OSX and a recent linux system. The dataset was exported with the current version of SAS using proc export. It contains a simple list of ids and nothing else.
testdata.zip

@benjello
Contributor Author

benjello commented Sep 5, 2016

It is very likely that the Stata files I use are generated by SAS, since it is the main software used by the French statistical institute. I tried @torstees's data and got the same error.

@jorisvandenbossche
Member

cc @bashtage or @kshedden if you could have a look

@kshedden
Contributor

kshedden commented Sep 5, 2016

The dta file format code for the file supplied by @torstees is 111. According to the R docs here:

https://stat.ethz.ch/R-manual/R-devel/library/foreign/html/read.dta.html

format 111 corresponds to Stata 7SE. But the SAS docs linked below state that SAS writes dta files compatible with Stata 8 and later.

http://support.sas.com/documentation/cdl/en/acpcref/63184/HTML/default/viewer.htm#a003103776.htm

There is no mention of format version 111 in the Stata dta format docs:

http://www.stata.com/help.cgi?dta
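For readers without a Stata installation (so without dtaversion), the format code mentioned above can also be read straight from the file header. Below is a minimal Python sketch, assuming the header layouts Stata documents: formats 117 and later begin with an XML-like <stata_dta><header><release>NNN</release> tag, while older formats store the code as the first byte of the file. The function names and the SUPPORTED_BY_ERROR_MESSAGE set are hypothetical illustrations, not part of pandas; the set is simply copied from the ValueError quoted at the top of this issue, so other pandas releases may accept different codes.

```python
# Hypothetical helpers for illustration only; not a pandas API.

# Format codes listed in the ValueError quoted in this issue.
SUPPORTED_BY_ERROR_MESSAGE = {104, 105, 108, 113, 114, 115, 117, 118}


def format_code_from_header(head: bytes) -> int:
    """Return the .dta format code found in the first bytes of a file."""
    if head.startswith(b"<stata_dta>"):
        # Formats 117+ start with <stata_dta><header><release>NNN</release>.
        marker = b"<release>"
        start = head.index(marker) + len(marker)
        end = head.index(b"</release>", start)
        return int(head[start:end])
    if not head:
        raise ValueError("empty input: no .dta header found")
    # Older formats (e.g. 111, 113, 114, 115) store the code in byte 0.
    return head[0]


def dta_format_version(path: str) -> int:
    """Read the start of a .dta file and return its format code."""
    with open(path, "rb") as f:
        return format_code_from_header(f.read(64))
```

Calling dta_format_version(path) on the file attached by @torstees should, per the format code reported above, return 111, which is absent from the list in the error message.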

@kshedden
Contributor

kshedden commented Sep 5, 2016

I also noticed that the References section of the same page:

https://stat.ethz.ch/R-manual/R-devel/library/foreign/html/read.dta.html

states that the spec for dta files written by Stata 7 is contained in the printed programming manual. I can't find it online.

@kshedden
Contributor

kshedden commented Sep 5, 2016

The small SAS program below exports a Stata dta file. You can then use Stata to check its format version with dtaversion tmp.dta. Under SAS 9.1 and Stata 14.1 I get:

. dtaversion tmp.dta
  (file "tmp.dta" is .dta-format 111 from Stata 7)

Here is the SAS program (you need to have a small "tmp.csv" file in the working directory):

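libname mydata ".";

/* Read the small CSV into the WORK data set tmp */
proc import datafile="tmp.csv"
    dbms=csv
    out=tmp;
run;

/* Write tmp out as a Stata dta file, overwriting any existing tmp.dta */
proc export data=tmp
    file="tmp.dta"
    dbms=stata replace;
run;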

@jreback jreback reopened this Sep 6, 2016
@jreback jreback added this to the 0.19.0 milestone Sep 6, 2016
@jreback jreback added Bug Compat pandas objects compatibility with Numpy or Python functions and removed Bug labels Sep 6, 2016
kshedden added a commit to kshedden/pandas that referenced this issue Sep 11, 2016