Home
This Wiki is my blog/memory dump/recipe book where I have recorded my process of learning to use python. It is not written to be pretty, rather it includes output so that it is clear what happens when certain commands are run. I hope it will save others time and angst on their python journey. As of June 7, 2020, I have transferred to NOAA National Marine Fisheries Northeast Fisheries Science Center. This Wiki has been ported there, and there my python saga continues... https://github.com/mmartini-noaa/MartiniStuff/wiki
Or what Marinna learned the super hard way. Especially if you are already an experienced programmer. On this page is what the cool kids may forget to tell you. You need a foundation of tools that are the environment in which you will use python, in order to learn python quickly. Sure, you can install just any old python and massage some ASCII files, make some matplotlib plots. We (and you) will want to do more than that - much more, once you catch the python bug.
Install anaconda using this guide: http://ioos.github.io/notebooks_demos/other_resources/
But don't start learning python yet!! First, learn about environments.
We like this: https://medium.freecodecamp.org/why-you-need-python-environments-and-how-to-manage-them-with-conda-85f155f4353c.
You'll need to know how to make new ones, delete old ones and update them. How to use pip (yes, pip, sometimes it will be pip and not conda) to install something cool in a new environment and if it doesn't work out, get rid of the corrupted environment. How to make a .yml file of your best working environment so you can recreate it later if all goes to H in a hand basket. Because it will.
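Here is a rough cheat sheet of the conda commands behind that list, written as a small python dry run so it only prints the commands and touches nothing. The environment name myenv and the file name environment.yml are placeholders I made up for illustration, so swap in your own.

```python
import shlex

env = "myenv"  # hypothetical environment name, pick your own

# each entry is the argument list for one conda command
commands = {
    "create": ["conda", "create", "--name", env, "python=3"],
    "update": ["conda", "update", "--name", env, "--all"],
    "export": ["conda", "env", "export", "--name", env, "--file", "environment.yml"],
    "recreate": ["conda", "env", "create", "--file", "environment.yml"],
    "remove": ["conda", "env", "remove", "--name", env],
}

# dry run: print each command instead of executing it
for step, argv in commands.items():
    print(f"{step:9s} {shlex.join(argv)}")
```

Type the printed commands at the Anaconda prompt, or pass one of the argument lists to subprocess.run(argv, check=True) if you would rather drive conda from python.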
We like conda because it is a monitored sandbox for you to work in. We like the IOOS environment because people are behind the scenes checking for package compatibility and fixing issues. Pip sometimes has to compile the package locally, and nobody has checked the result against the rest of your environment, so functionality is not guaranteed.
Here's the API listing: https://conda.io/docs/commands.html#conda-environment-commands
Here's what's going on under the hood: https://github.com/mmartini-usgs/MartiniStuff/wiki/Environment-management-nuts-and-bolts
Next, learn the ins and outs of Jupyter-Notebook.
Jupyter should have been installed as part of the Anaconda installation you did in the first part of this page. Here is the documentation. There are some handy capabilities for beginners, such as "tab complete" which will list all the methods of an object (myobject.) and "shift tab" to get documentation on a method (myobject.themethod( ))
https://jupyter-notebook.readthedocs.io/en/stable/
Introductory stuff: https://jupyter-notebook.readthedocs.io/en/stable/notebook.html
Windows folks, we are sorry, Jupyter is incapable of seeing directories above its startup directory or on other drives. So learn where your config file is and choose a starting place. If you aren't sure in which directory to find your jupyter_notebook_config.py file, you can type
jupyter --config-dir
and if you don't find the file there, you can create it by typing
jupyter notebook --generate-config
Then set:
c.NotebookApp.notebook_dir = 'c:/users/myusername'
or set it to whatever path is most convenient for you to start in. I use my data drive, e:
Or set your starting point each time you run jupyter notebook by: jupyter notebook --notebook-dir=%CD%
where "%CD%" inserts your current directory, or you can specify a directory, such as jupyter notebook --notebook-dir="e://"
I also like to set c.NotebookApp.webbrowser_open_new = 1
to open jupyter in a new browser window
and c.NotebookApp.browser = 'C:/Program Files (x86)/Google/Chrome/Application/chrome.exe %s'
to force use of chrome.
This is documented by our own Rich Signell in https://stackoverflow.com/questions/39252884/jupyter-notebook-starting-directory/41513268?r=SearchResults&s=1|59.4092#41513268
For Jupyter you will also want these extensions:
conda install jupyter_contrib_nbextensions
And in a Jupyter-notebook under Edit -> nbextensions config you will want to install the gist extension
You should use "quit" to exit Jupyter Notebook from your browser, or it will leave kernels running.
See this for more info: http://jupyter-notebook.readthedocs.io/en/latest/config.html
Need a learning Jupyter link here
Understand what a magic is. If you haven't worked in LINUX, learn about ls, and some other unix specific expressions that are handy for python
In a nutshell, don't use it. My attempts to use its shiny Windows interface have ruined more anaconda environments and gotten me into heaps of trouble. Fixing those is what taught me about environments; I don't recommend that learning path. So if you are uncomfortable with Anaconda prompt command line stuff, practice until you are comfortable. See Environments above.
You are still not ready for python!
Now, you have a chicken and egg problem. You need git on your local machine to interact with github, and you need github to have something with which to use git.
Git is essential to coding in the modern world, especially in the science world. It's another repository and code maintenance system. Here is where to get it: https://git-scm.com/ With Git comes Git bash, a LINUX bash shell, so now you have LINUX on your windows box, too! I work on windows, so I like Tortoise Git https://tortoisegit.org/ which has some very helpful explanations in its help files.
Learn git basics here:
- https://matthew-brett.github.io/curious-git/
- https://docs.scipy.org/doc/numpy/dev/gitwash/development_setup.html
- https://www.codecademy.com/learn/learn-git
- https://www.asmeurer.com/git-workflow/ and that page has a very important hint about how to set your fork up so that it is easy for you to know it is yours.
Learn how to use branches in git. A good example is here: http://xarray.pydata.org/en/stable/contributing.html#creating-a-branch
If you are coming from SVN to git, things are a little different. In SVN we used to commit changes early and often to the server; with git you do not want to push every small change. Branches help manage this. You commit locally, then push a group of changes less often. This will make it easier for any pull requests to be evaluated later. If you have created a fork, you will not be able to merge pull requests yourself unless you have write permission in the repository into which you want to merge the changes from your fork. Further explanation about this can be found here: https://git-scm.com/book/en/v2/Git-Tools-Rewriting-History and here: https://help.github.com/en/articles/about-pull-request-merges
If you have been using commits in SVN as a means to back up your work off your local computer, you will now need another strategy to accomplish that goal, for git commits stay on your local machine until you push them to a remote. That is good for the repository, I suppose, but not good if your hard drive suddenly dies.
You will get yourself knotted up in git, and when you do, this will help: https://ohshitgit.com/
So you have git. Let's get some code... and set up a place to stash your own creations. Get yourself an account at Github: https://github.com/ USGS folks, we have a naming convention going over on github, your first initial+last name+dash+usgs. For example, this account: https://github.com/rsignell-usgs. See if you can match that avatar pic for coolness.
Github can't handle large files. See the sidebar on Git LFS (https://github.com/mmartini-usgs/MartiniStuff/wiki/Git-LFS) for notes.
Finally -- python!
Even though I had been to some boot camps and read some books, I found this course worth the time: https://www.udemy.com/complete-python-bootcamp/learn/v4/overview I took a week of focused work to listen to the videos and do the exercises. As of summer of 2019, the python community is moving on to python 3, and many packages are abandoning python 2.
- As I worked with python early on, I found https://realpython.com to be very helpful.
- I still keep a copy of the Python Pocket Reference by Mark Lutz handy.
- listen to the Talk Python to Me podcast, it has many gems: https://talkpython.fm/home
If, by now, you are wanting to play with USGS Coastal and Marine data using the new STGlib (https://github.com/dnowacki-usgs/stglib) then I recommend you learn more sophisticated importing: https://realpython.com/absolute-vs-relative-python-imports/
So you have written code in Jupyter Notebooks and maybe even run something from the Anaconda prompt. And it didn't work. And you are using debugging techniques on the command line. How frustrating is that? Wish you had that nice IDE environment MATLAB provides? Well you can (though some people never use an IDE). Here I am keeping track of the pros and cons of IDEs I have tried: https://github.com/mmartini-usgs/MartiniStuff/wiki/IDEs-and-python
There. Those are the basic tools. Now before you get all excited with this shiny new tool, read this: https://www.fastcompany.com/28121/they-write-right-stuff and then go play...
So you are coding now... and it's time to update, how do you do that?
Here's how I like to update my conda installation. Open your anaconda prompt, and do these in the order in which they are listed:
1. conda update -n root conda
   to update conda itself (newer versions of conda call the root environment base)
2. conda update --all
   to update all root packages
3. activate IOOS
   to get into the IOOS environment
4. conda update --all
   to update packages in the IOOS environment
5. Repeat 3 and 4 for each environment you have.
Don't know what you have? Use
conda env list
I live in the Windows environment. There are many things I like about Windows. This is not one of them. Python string literals don't handle windows paths gracefully: yes, you always need \\ and it's a pain. '\' is the escape character for strings in python, and python will always interpret a single '\' in a string literal as such and will treat the following character as "special". Not even the pathlib module will solve this problem, because the string is mangled before pathlib ever sees it, especially if you have directories or file names that begin with a number. For instance, in this example a data mooring number is interpreted as an octal escape:
Given:
inpath = Path("E:\data\Sandwich\10811_V20784\processed201809scaled")
Get:
OSError: [Errno 22] Invalid argument: b'E:\\data\\Sandwich\x08811_V20784\\processed201809scaled'
This is a pain, especially if you have files that start with digits. Here is a trick:
import os
winpath = "E:\data\Matanzas\11109_Vec5086Blue"
print(winpath)
# using raw strings is key
rwinpath = r"E:\data\Matanzas\11109_Vec5086Blue"
print(rwinpath)
# output will be
# E:\data\MatanzasI09_Vec5086Blue
# E:\data\Matanzas\11109_Vec5086Blue
See the difference?
Now replace will work:
print(winpath.replace(os.path.sep, '/'))
print(rwinpath.replace(os.path.sep, '/'))
# the output on Windows, where os.path.sep is a backslash, will be
# E:/data/MatanzasI09_Vec5086Blue
# E:/data/Matanzas/11109_Vec5086Blue
See the difference?
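Here is a small sketch of what is going on and one way around it, using the Matanzas path from above. It leans on PureWindowsPath from pathlib, which is my own addition to the trick (not something the example above uses) and which parses windows paths on any operating system, so you can try this on LINUX or a Mac too.

```python
from pathlib import PureWindowsPath

# a backslash followed by up to three octal digits is a single character
assert "\111" == "I"      # octal 111 is decimal 73, the letter I
assert "\10" == "\x08"    # octal 10 is decimal 8, the backspace character

# so without a raw string the mooring number is already damaged
mangled = "Matanzas\11109"
print(mangled)            # MatanzasI09

# a raw string keeps every backslash exactly as typed
raw = r"E:\data\Matanzas\11109_Vec5086Blue"
path = PureWindowsPath(raw)

print(path.as_posix())    # E:/data/Matanzas/11109_Vec5086Blue
print(path.name)          # 11109_Vec5086Blue

# forward slashes in the literal avoid the escape problem altogether
same = PureWindowsPath("E:/data/Matanzas/11109_Vec5086Blue")
print(same == path)       # True
```

The point is that pathlib is fine once it gets a clean string. The damage happens when python reads the literal, so either use a raw string or type forward slashes in the first place.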
Xarray is powerful. It can average 3 million samples of data in less than 3 minutes without iteration. It expects well mannered data though. This is where I have had trouble. Take the time to understand python and netcdf data files before you prepare data for it. Most of the online tutorials are using data that is already clean and well mannered, so if you are using raw data you are preparing from scratch, you need to be patient. Some good tutorials are:
- https://ueapy.github.io/introduction-to-xarray.html
- http://pure.iiasa.ac.at/id/eprint/14952/1/xarray-tutorial-egu2017-answers.pdf
- https://rabernat.github.io/research_computing/xarray.html
Xarray has some idiosyncrasies to be aware of. I discuss those here: https://github.com/mmartini-usgs/MartiniStuff/wiki/Xarray-things-to-know
If you are coming out of MATLAB, C or some other language, be ready to try and forget some things you know. I have gotten into more trouble making assumptions about python based on what I know about programming.
Some examples:
If you have programmed in some other language that has a scan-something to go with print-something, don't extrapolate this to python. There's no scanf in python. There are clever ways to do what scanf does, and no one got around to writing a scanf. In fact, format strings are only a recent addition to python. You do not want to know how much time I spent looking for a scanf method in python that does not exist. So just let that one go.
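Two of those clever ways, as a minimal sketch. The input line here is made up for illustration and the field names are my own. The first approach splits on whitespace and converts each field, the second uses a regular expression with named groups for text that is less tidy.

```python
import re

line = "10811 4.323 Sandwich"  # hypothetical input: mooring number, value, site

# option 1: split on whitespace and convert each field yourself,
# roughly what a scanf with an int, a float and a string would do
fields = line.split()
mooring = int(fields[0])
value = float(fields[1])
site = fields[2]
print(mooring, value, site)

# option 2: a regular expression with named groups
pattern = re.compile(
    r"(?P<mooring>\d+)\s+(?P<value>[-+]?\d*\.?\d+)\s+(?P<site>\S+)"
)
match = pattern.match(line)
if match:
    parsed = (int(match["mooring"]), float(match["value"]), match["site"])
    print(parsed)
```

Splitting is usually enough for whitespace delimited data files. The regular expression earns its keep when fields run together or carry units and punctuation.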
These are how one formats one's output. Such as:
the_float = 4.323
the_int = 6
the_str = 'str'
# this gives you precise format control and is like other languages
print('this number is a float %4.2f and this is an int %2d and a string %s' % (the_float, the_int, the_str))
print(f'this number is a float {the_float} and this is an int {the_int} and a string {the_str}')
# the above is very nice, until you have a netCDF file variable like ds['time']
# and the quotes are an issue within the {}, so then try
print('this number is a float {} and this is an int {} and a string {}'.format(the_float, the_int, the_str))
The canonical page is here: https://www.python.org/dev/peps/pep-0498/
And the comprehensive python 3 doc on string formatting is here: https://docs.python.org/3/library/string.html
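Two things the examples above do not show, sketched here with the same variables. First, an f-string accepts a format spec after a colon, so you get the same control as the % style. Second, the quote trouble with something like ds['time'] goes away if the f-string uses the other kind of quote. The ds below is just a plain dictionary standing in for a netCDF dataset, an assumption for illustration only.

```python
the_float = 4.323
the_int = 6
the_str = 'str'

# a format spec after the colon works like %4.2f and %2d
formatted = f'float {the_float:4.2f} int {the_int:2d} string {the_str}'
print(formatted)

# hypothetical stand-in for a netCDF dataset: a plain dictionary
ds = {'time': 12.5}

# double quotes outside, single quotes for the key inside the {}
print(f"time is {ds['time']:.1f}")

# str.format can index into its argument with no quotes around the key
print('time is {0[time]:.1f}'.format(ds))
```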
Python is object oriented. That does not mean that if something is said to inherit from a parent class that it will inherit all the methods from that class. Once again with python - do not assume.
This is a cool thing - get a dictionary of what is local and global in scope by using globals() and locals(). And watch out: I have had an occasion where something outside a function's scope was altered by the function, simply by how I defined the function.
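A minimal sketch of both points. The function names are made up, and the mutable argument shown is only one common way a function can change something outside its own scope. I am not claiming it is the exact case described above.

```python
def show_scope(samples):
    scale = 2.0
    # locals() is a dictionary of the names defined inside this call
    return sorted(locals())

def append_flag(samples):
    # the function receives the caller's list object, not a copy,
    # so appending here changes the list the caller sees
    samples.append(-999)

def rebind(samples):
    # building a new list and assigning it only changes the local name
    samples = samples + [-999]
    return samples

data = [1, 2, 3]
print(show_scope(data))          # ['samples', 'scale']
print('data' in globals())       # True when run as a script

append_flag(data)
print(data)                      # [1, 2, 3, -999]

untouched = [1, 2, 3]
rebind(untouched)
print(untouched)                 # [1, 2, 3]
```

If you do not want a function to touch your data, pass it a copy, for example append_flag(list(data)).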
Such as stglib. Here you will have to install pip and use it to add stglib to an activated environment. Ideally, you will clone your shiny, new and perfect IOOS environment before doing this. The process goes something like this (warning: my syntax is not exact):
conda create --name myIOOS --clone IOOS
activate myIOOS
conda install pip
cd directory_to_put_stglib
git clone stgliburl
cd directory_of_stglib
pip install -e . --no-deps
Or, you can add stglib to the bottom of your environment.yml file - see environment nuts and bolts in this wiki. Here is the list of packages I now use: https://github.com/mmartini-usgs/MartiniStuff/wiki/A-list-of-packages-I-use
References and other sources:
- Eric Firing's Data Analysis Class pages: https://currents.soest.hawaii.edu/ocn_data_analysis/index.html
- Talk Python to me on the Anaconda Distribution: https://talkpython.fm/episodes/show/198/catching-up-with-the-anaconda-distribution
Written by Marinna Martini https://github.com/mmartini-usgs