Code Style Guide
We're writing Python code for people who are not Python experts, which is the main factor that motivates this code style guide. The TLDR version of the guide is:
- Use PEP 8.
- Use lower_snake_case in non-Python structs.
- Associated functions, templates, and URL routes should share the same name.
- Import one thing per line.
- Use Google docstrings.
- Use type-hints for function signatures.
- Add more comments than you usually would.
- Prioritize simplicity and readability (more than you usually would).
Here is the longer version of the style guide:
We're sticking strictly to the PEP 8 Style Guide for this project, which means all Python code should adhere to the following:
- 80 character line limit in all Python files.
- 4 space character indents; no tab indents.
- Use CamelCase for class names.
- Use lower_snake_case for functions, variables and methods.
- Use UPPER_SNAKE_CASE for constants.
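As a small illustration of the naming rules above, here is a sketch that uses all three cases in one place. The names `MAX_RETRY_COUNT`, `WeatherStation`, and `count_retries_left` are made up for this example and do not come from the project.

```python
# UPPER_SNAKE_CASE for constants.
MAX_RETRY_COUNT = 3


class WeatherStation:
    """CamelCase for class names (hypothetical example class)."""

    def __init__(self, station_name: str):
        # lower_snake_case for variables and attributes.
        self.station_name = station_name

    def describe_station(self) -> str:
        """lower_snake_case for methods."""
        return f'Station: {self.station_name}'


def count_retries_left(retries_used: int) -> int:
    """lower_snake_case for functions."""
    return MAX_RETRY_COUNT - retries_used
```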
We will also use the Google Style Guide for Python (which does not conflict with PEP 8).
In this project, we prefer single-quoted strings ('). As per the Google Style Guide, however, we still use three double quotes (""") for docstrings.
For YAML files, JSON files, column names, and other non-Python structures, columns and keys should use the lower_snake_case format:
Good:
import pandas as pd
df = pd.DataFrame({'my_column_name': [1, 2, 3]})
Bad:
import pandas as pd
df = pd.DataFrame({'MyColumnName': [1, 2, 3]})
Giving everything a predictable name makes the project easier to navigate. Here is how we do that.
If you have a web page located at /hello, then the function wrapped by bp.route('/hello') should be named hello(), and it should render a template named hello.html.
For pages that route to /, use index.html as the template and index() as the function name.
Folders in the /templates directory should follow the same structure as the routes. So if you are working in a blueprint that uses the URL prefix /foo and there are two web pages tied to that blueprint that route to / and /bar, then:
- The templates should be /templates/foo/index.html and /templates/foo/bar.html respectively.
- The functions should be named index() and bar() respectively, and they should be in a module called /blueprints/foo.py.
There are two exceptions to this so far:
- /blueprints/flagging.py is essentially the "index" of the website even though it is not indicated as such in the name. I'm not 100% sure this was the best decision, but it's forgivable as the website is called "flagging."
- /templates/base.html is not tied to any functions; it is simply the base template of the website from which all other templates extend.
There may be more exceptions to this in the future, but in general the principle holds: make everything as predictable and as consistent as you can!
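The naming convention for routes, functions, and templates can be sketched as a small function. This is only an illustration: `expected_names` is a hypothetical helper that does not exist in the project, and it only handles single-segment routes such as / and /bar.

```python
def expected_names(blueprint_prefix: str, route: str) -> dict:
    """Hypothetical helper showing the names a route should map to.

    Args:
        blueprint_prefix (str): URL prefix of the blueprint, e.g. '/foo'.
            Use '' or '/' if the blueprint has no prefix.
        route (str): A single-segment route, e.g. '/' or '/bar'.

    Returns:
        dict: The expected function name and template path.
    """
    # The route '/' maps to 'index'; the route '/bar' maps to 'bar'.
    page = route.strip('/') or 'index'
    # The blueprint prefix becomes a folder inside /templates.
    folder = blueprint_prefix.strip('/')
    if folder:
        template = f'/templates/{folder}/{page}.html'
    else:
        template = f'/templates/{page}.html'
    return {'function': page, 'template': template}
```

For example, `expected_names('/foo', '/bar')` gives the function name `bar` and the template path `/templates/foo/bar.html`, which matches the blueprint example described earlier.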
Good:
from flask import Flask
from flask import render_template
Bad:
from flask import Flask, render_template
The reason is that the former is more readable. Yes, it will result in much more text, but a few extra bytes or kilobytes never hurt anyone.
Docstrings are triple-quoted strings placed directly below a function's signature. They describe what the function does: what its inputs are, what its outputs are, its side-effects, and other expected behaviors:
def hello(name: str) -> str:
    """This function outputs a greeting to the person whose name is in the first
    positional argument."""
    return f'Hello, {name}!'
The docstring is stored in __doc__:
>>> print(hello.__doc__)
This function outputs a greeting to the person whose name is in the first
positional argument.
There are a few styles for writing docstrings. Sphinx docstrings (sometimes called reST docstrings) are the most popular style because they strike a balance between aesthetics and ease of manual typing. But for this project, we will use Google docstrings because they are the most readable and aesthetically appealing, and that is a high priority for our codebase. (I personally prefer Sphinx, so if you're sad then I understand!)
# Google style docstring example
def hello(name: str) -> str:
    """This function outputs a greeting to the person whose name is in the first
    positional argument.

    Args:
        name (str): The name of a person.

    Returns:
        str: A greeting addressed to `name`.
    """
    return f'Hello, {name}!'
In PyCharm, you can change the default docstring format with Settings -> Tools -> Python Integrated Tools -> Docstring format.
This code base is written in Python 3.7, so we definitely support type hints. Additionally, type hints can help make code easier to understand.
# Good use of type hints (because it is in function signature)
def hello(name: str) -> str:
    return f'Hello, {name}!'
However, do not utilize type hints for variable assignment outside the context of defining function signatures. They look weird to people who aren't Java and C coders, and they look doubly weird to beginner programmers.
# Bad use of type hints (because it is for normal variable assignment)
my_var: str = 'Foo bar'
Docstrings should be used to describe a function's inputs, outputs, side-effects, and other expected behaviors. Comments should be used when docstrings are not appropriate, which is usually to describe lines of code or chunks of code, why those lines/chunks exist and what those lines/chunks do.
You should add more comments than you usually would in this code because of the intended audience. Don't go overboard, but don't be shy either.
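As a rough sketch of how the two fit together, the function below uses a docstring for the inputs and outputs, and comments for the reasoning behind individual lines. The function `average_reading` is a hypothetical example and is not part of the project; it assumes the list contains at least one non-missing reading.

```python
def average_reading(readings: list) -> float:
    """Calculate the average of a list of readings, skipping missing ones.

    Args:
        readings (list): Numeric readings. Missing readings are `None`.

    Returns:
        float: The average of the non-missing readings.
    """
    # Drop the missing values first. Otherwise `sum()` would raise a
    # TypeError when it tries to add `None` to a number.
    valid_readings = []
    for reading in readings:
        if reading is not None:
            valid_readings.append(reading)

    # Divide the total by the number of valid readings to get the average.
    return sum(valid_readings) / len(valid_readings)
```

The docstring tells a reader what to pass in and what comes back, while the comments explain why each step exists.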
We are writing code that will be maintained by people who are not Python experts, which means the code needs to be very easily readable.
A good rule of thumb is to ask yourself: "Would a Python beginner be able to understand this code quickly?" If the answer is no, then ask "is there a way to write my code so a Python beginner could understand it quickly?" If the answer to that question is yes, then write your code that way.
An example can be found in the keys.py file. We load our keys from the zip file with a relatively straightforward function:
# Contents of `keys.py`
from flask import current_app
import zipfile
import yaml


def load_keys_from_vault(
        vault_password: str,
        vault_file: str = VAULT_FILE
) -> dict:
    pwd = bytes(vault_password, 'utf-8')
    with zipfile.ZipFile(vault_file) as f:
        with f.open('keys.yml', pwd=pwd, mode='r') as keys_file:
            d = yaml.load(keys_file, Loader=yaml.BaseLoader)
    return d
You might think that one way to make this code more flexible, so that we can add more files to vault.zip with corresponding loader functions, is to abstract it a little. So you create a function called open_from_vault that creates a little context manager for opening files inside of vault.zip:
# Hypothetical contents of `keys.py`
from flask import current_app
from contextlib import contextmanager
from typing import Generator
import zipfile
import yaml


@contextmanager
def open_from_vault(
        archived_file: str,
        vault_password: str,
        vault_file: str = VAULT_FILE
) -> Generator:
    pwd = bytes(vault_password, 'utf-8')
    with zipfile.ZipFile(vault_file) as f:
        vault_file = f.open(archived_file, pwd=pwd, mode='r')
        try:
            yield vault_file
        finally:
            vault_file.close()


def load_keys_from_vault(
        vault_password: str,
        vault_file: str = VAULT_FILE
) -> dict:
    with open_from_vault('keys.yml', vault_password, vault_file) as keys_file:
        d = yaml.load(keys_file, Loader=yaml.BaseLoader)
    return d
There's nothing wrong with this code. In fact, in a large company with a bunch of Python engineers, the latter, more abstracted code will often be preferable. But between using contextmanager, yield, Generator, try, and finally, we're getting pretty deep into Python world. Because the former block of code is a lot easier to grasp if you are a Python beginner, we prefer to use the former code. Even if we add another file or two to vault.zip, we will still most likely prefer the former code style and add a little repetition to the code.
Here are 4 tips that can make your code easier for a beginner to grasp:
- Avoid metaprogramming modules such as contextlib, importlib, and functools when possible.
- Don't be overly scared of a little WET code (repetitive code, the opposite of DRY).
- Isolate complicated-but-necessary code from the easy-to-understand code.
- Make sure complicated-but-necessary code has clear behavior and input-output specifications, so that someone reading the code doesn't need to read every line of the source code to understand what it's doing.
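As a rough sketch of what a little WET code might look like, here are two loader functions that repeat the same zip-opening steps instead of sharing an abstraction. Everything here is an assumption for illustration: `load_settings_from_vault` and the file names `keys.json` and `settings.json` are hypothetical, and the standard-library `json` module stands in for `yaml` so the sketch runs without extra packages.

```python
import json
import zipfile


def load_keys_from_vault(vault_password: str, vault_file: str) -> dict:
    """Load the contents of `keys.json` from the vault (hypothetical file).

    Args:
        vault_password (str): Password for the zip file.
        vault_file (str): Path to the zip file.

    Returns:
        dict: The parsed contents of `keys.json`.
    """
    pwd = bytes(vault_password, 'utf-8')
    with zipfile.ZipFile(vault_file) as f:
        with f.open('keys.json', pwd=pwd, mode='r') as keys_file:
            d = json.load(keys_file)
    return d


def load_settings_from_vault(vault_password: str, vault_file: str) -> dict:
    """Load the contents of `settings.json` from the vault (hypothetical file).

    Args:
        vault_password (str): Password for the zip file.
        vault_file (str): Path to the zip file.

    Returns:
        dict: The parsed contents of `settings.json`.
    """
    # These lines repeat the function above on purpose. A beginner can
    # read either function from top to bottom without jumping elsewhere.
    pwd = bytes(vault_password, 'utf-8')
    with zipfile.ZipFile(vault_file) as f:
        with f.open('settings.json', pwd=pwd, mode='r') as settings_file:
            d = json.load(settings_file)
    return d
```

The two functions differ only in the file name they open. That repetition costs a few extra lines, but each function stays readable on its own.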