Welcome to the contributing guide for the observatory-platform! We welcome contributions to the project; please see below for details on how to contribute.
- Python: version and code style guidelines.
- Documentation: docstrings and readthedocs.org documentation.
- Unit Tests: how to write unit tests and where to store data.
- License and Copyright: what licenses are ok and not ok and how to use the automatic license checker.
- Development Workflow: how to develop a new feature.
- Deployment: how to deploy the package to PyPI.
The Observatory Platform is written in Python.
A minimum version of Python 3.10 is required.
The code style should conform to the Python PEP 8 Style Guide for Python Code. Additional code style guidelines specific to this project include:
- Function parameters and return types should be annotated with type hints to make the API definition as clear as possible to end users of the library. The Python documentation page typing — Support for type hints provides a good starting point for learning how to use type hints.
- A maximum line length of 120 characters.
- Formatting with Black.
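As a small sketch of these guidelines in practice (the function name and behaviour are hypothetical, not part of the project's API), a fully type-hinted, Black-formatted function might look like:

```python
from typing import List


def concat_names(names: List[str], separator: str = ", ") -> str:
    # Type hints on parameters and the return value make the API clear to users.
    return separator.join(names)
```

Running `black` with a line length of 120 (for example, `black -l 120 .`) keeps formatting consistent across the project.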
Python docstrings should be written for all classes, methods and functions using the Sphinx docstring format. See the Sphinx tutorial Writing docstrings for more details. Additional documentation style guidelines specific to this project include:
The instructions below show how to set up automatic docstring template generation in popular IDEs:

- Visual Studio Code: install the autoDocstring plugin. To enable docstrings to be generated in the Sphinx format, click Code > Preferences > Settings > Extensions > AutoDocstring Settings and choose sphinx from the Docstring Format dropdown.
- PyCharm: PyCharm supports docstring generation out of the box, however, the Sphinx docstring format may need to be explicitly specified. Click PyCharm > Preferences > Tools > Python Integrated Tools and under the Docstrings heading, choose reStructuredText from the Docstring format dropdown.
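For reference, a minimal sketch of a Sphinx-format docstring (the function itself is a hypothetical stub, not part of the project):

```python
def download_file(url: str, timeout: int = 30) -> bytes:
    """Download a file and return its contents.

    :param url: the URL of the file to download.
    :param timeout: the request timeout in seconds.
    :return: the contents of the file as bytes.
    :raises ValueError: if the URL is empty.
    """
    if not url:
        raise ValueError("url must not be empty")
    return b""  # real download logic omitted from this sketch
```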
The Observatory Platform documentation, hosted at observatory-platform.readthedocs.io, is built from the files in the docs directory.
An overview of the technologies used to build the documentation:
- Generated with Sphinx.
- Theme: sphinx-rtd-theme.
- Documentation can be written in both Markdown and reStructuredText. Markdown is preferred and is parsed with recommonmark.
- API reference documentation is automatically generated from docstrings with sphinx-autoapi.
The Read the Docs documentation is built automatically on every pull request and push.
Make sure that the Observatory Platform is installed with the docs requirements:
pip install -e .[docs]
Navigate to the docs directory:
cd docs
Build the documentation with the following command:
make html
The documentation should be generated in the docs/_build directory. You can open the file docs/_build/index.html in a browser to preview what the documentation will look like.
Unit tests are written with the Python unittest framework.
The code snippet below is an example of a simple unit test. Create a class, in this example TestString, which represents a batch of tests to implement (for instance, all of the tests for a class). Implement tests for each function as a new method beginning with test_, so that the Python unittest framework can find the methods to test. For instance, the code snippet below tests the concatenation functionality of the Python string class in the function test_concatenate. See the Python unittest framework documentation for more details.
import unittest


class TestString(unittest.TestCase):
    def test_concatenate(self):
        expected = "hello world"
        actual = "hello" + " " + "world"
        self.assertEqual(expected, actual)
- Test datasets should be kept in the observatory-platform/fixtures folder, which is stored in Git LFS.
- The Python unittest framework looks for files named test*.py, so make sure to put test_ at the start of your test filename. For example, the unit tests for the file gc_utils.py are contained in the file called test_gc_utils.py.
The unit tests should be kept in the folder called tests at the root level of the project. The tests directory mimics the folder structure of the observatory_platform Python package folder. For example, as illustrated in the figure below, the tests for the code in the file observatory-platform/observatory_platform/utils/gc_utils.py are contained in the file observatory-platform/tests/observatory_platform/utils/test_gc_utils.py.
An example of project and test directory structure:
|-- observatory-platform
|-- .github
|-- observatory_platform
|-- scripts
|-- telescopes
|-- utils
|-- __init__.py
|-- gc_utils.py
...
|-- __init__.py
|-- tests
|-- observatory_platform
|-- scripts
|-- telescopes
|-- utils
|-- __init__.py
|-- test_gc_utils.py
...
|-- __init__.py
|-- data
|-- __init__.py
|-- .gitignore
|-- CONTRIBUTING.md
...
Unit tests for the Observatory Platform's dependent repositories (oaebu-workflows, academic-observatory-workflows) are stored differently. Tests for any code should be stored in a directory named tests, which shares a directory with the code it is testing. For example, as illustrated in the figure below, the tests for the code in the file oaebu-workflows/oaebu_workflows/telescopes/oapen_metadata_telescope.py are contained in the file oaebu-workflows/oaebu_workflows/telescopes/tests/test_oapen_metadata_telescope.py.
An example of project and test directory structure:
|-- oaebu-workflows
|-- .github
|-- .gitignore
|-- oaebu_workflows
|-- onix.py
|-- telescopes
|-- __init__.py
|-- oapen_metadata_telescope.py
|-- tests
|-- __init__.py
|-- test_oapen_metadata_telescope.py
|-- tests
|-- __init__.py
|-- test_onix.py
|-- fixtures
|-- oapen_metadata
|-- test_data.json
|-- onix
|-- test_data.json
|-- CONTRIBUTING.md
To test code that makes HTTP requests:
- VCR.py is used to test code that makes HTTP requests, enabling the tests to work offline without calling the real endpoints. VCR.py records the HTTP requests made by a section of code and stores the results in a file called a 'cassette'. When the same section of code is run again, the HTTP requests are read from the cassette, rather than calling the real endpoint.
- Cassettes should be committed into the folder observatory-platform/fixtures/cassettes.
To run the unit tests from the command line, execute the following command from the root of the project; it will automatically discover all of the unit tests:
python -m unittest
How to enable popular IDEs to run the unit tests:

- PyCharm: PyCharm supports test discovery and execution for the Python unittest framework. You may need to configure PyCharm to use the unittest framework. Click PyCharm > Preferences > Tools > Python Integrated Tools and under the Testing heading, choose Unittests from the Default test runner dropdown.
- VSCode: VSCode supports test discovery and execution for the Python unittest framework. Under the Testing panel, configure the tests to use the unittest framework and to search for tests with filenames beginning with test_.
A .env file will also need to be configured with the following variables set:
- GOOGLE_APPLICATION_CREDENTIALS - The path to your Google Cloud Project credentials
- TEST_GCP_PROJECT_ID - Your GCP Project ID
- TEST_GCP_DATA_LOCATION - The location of your GCP Project
- TEST_GCP_BUCKET_NAME - The name of your GCP testing bucket
Some tests may also require access to Amazon Web Services. For these tests, the following additional environment variables are required:
- AWS_ACCESS_KEY_ID - Your AWS access key ID
- AWS_SECRET_ACCESS_KEY - Your AWS secret key
- AWS_DEFAULT_REGION - The AWS region
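One common pattern for tests that depend on these variables is to skip them when the variables are unset, rather than fail. The helper and test class below are a hypothetical sketch, not part of the project's codebase:

```python
import os
import unittest

# Variables a cloud-dependent test might require (names taken from the list above).
REQUIRED_VARS = ["TEST_GCP_PROJECT_ID", "TEST_GCP_DATA_LOCATION", "TEST_GCP_BUCKET_NAME"]


def missing_env_vars(names):
    """Return the subset of environment variable names that are not set."""
    return [name for name in names if not os.environ.get(name)]


class TestCloudStorage(unittest.TestCase):
    def setUp(self):
        # Skip (rather than fail) when credentials are not configured locally.
        missing = missing_env_vars(REQUIRED_VARS)
        if missing:
            self.skipTest(f"Missing environment variables: {', '.join(missing)}")
```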
Contributors agree to release their source code under the Apache 2.0 license. Contributors retain their copyright.
Make sure to place the license header (below) at the top of each Python file, customising the year, copyright owner and author fields.
Typically, if you create the contribution as a part of your work at a company (including a University), you should put the copyright owner as the company and not yourself. You may put yourself as the author. If you created the contribution independently and you own the copyright to the contribution, then you should place yourself as the copyright owner as well as the author.
# Copyright [yyyy] [name of copyright owner]
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# Author: [name of author]
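If you want to check your files before pushing, a simple script can verify that each Python file begins with the header. This helper is a hypothetical sketch, not a tool shipped with the project:

```python
from pathlib import Path

# A phrase that appears in the Apache 2.0 header shown above.
HEADER_MARKER = "Licensed under the Apache License, Version 2.0"


def has_license_header(path: Path, num_lines: int = 20) -> bool:
    """Return True if the header marker appears in the first num_lines lines of the file."""
    with open(path, encoding="utf-8") as f:
        head = [f.readline() for _ in range(num_lines)]
    return any(HEADER_MARKER in line for line in head)
```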
You may depend on a third party package if the package has a license from the unencumbered, permissive, notice, or reciprocal lists, or if the license is LGPL. You must not depend on any package that has a license in the restricted or banned license lists (unless the license is LGPL).
Common examples of these licenses include Creative Commons "Attribution-ShareAlike" (CC BY-SA), Creative Commons "Attribution-NoDerivs" (CC BY-ND), the GNU GPL and the AGPL. These licenses are incompatible with the Apache 2.0 license that the Observatory Platform is released with.
The licenses of the dependencies in requirements.txt are checked with liccheck on each push and pull request as part of the Python package GitHub Action.
- The lists of authorized and unauthorized licenses are documented in strategy.ini.
- The Python package GitHub Action will fail if any unknown or unauthorized licenses are detected.
- If the license checker fails:
  - Check the license of the dependency.
  - If the dependency has a license that may be depended on, add the exact license text, in lower case, under authorized_licenses in strategy.ini.
  - If liccheck could not find license information for a package, and you can manually verify that the license is OK to be depended on, then add the package and version under the [Authorized Packages] section in strategy.ini, along with a comment stating what the license is and a URL to the license.
  - If the dependency has a license that must not be depended on, then don't use that dependency.
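For illustration, a strategy.ini fragment might look like the following. The license names and the package entry are hypothetical examples, not the project's actual configuration:

```ini
[Licenses]
authorized_licenses:
    bsd
    new bsd
    apache
    apache 2.0
    mit

[Authorized Packages]
# Example: a package whose license metadata liccheck cannot detect,
# manually verified as BSD-licensed (link the license here).
uuid: 1.30
```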
Code written by third parties (not the contributor) can be directly integrated into the project, although it is best avoided if possible. The requirements for integrating third party code include:

- The code requires one of the open source licenses from the unencumbered, permissive or notice lists.
- The third party open source code should be contained in its own file and not copied into other files. You may modify the code in the file and make changes.
- The license for the third party open source code must be specified at the top of the file and in the LICENSES_THIRD_PARTY file.
You must not include any code from a project with a license in the restricted or banned license lists published by Google. Common examples of these licenses include Creative Commons "Attribution-ShareAlike" (CC BY-SA), Creative Commons "Attribution-NoDerivs" (CC BY-ND), the GNU GPL and the AGPL. These licenses are incompatible with the Apache 2.0 license that the Observatory Platform is released with.
This section explains the development workflow used in the Observatory Platform project and its dependent repositories.
The observatory-platform and its dependent repositories each have only one long-lived branch: main. This branch is in continuous development; all official releases are made from a point in time of the main branch. The branching strategy employed is not unlike the popular GitHub Flow and trunk-based development strategies, whereby all feature branches are created from and merged into main.
GitHub has an official guide on how to contribute to open-source projects when you do not have direct access to the repository.
If you do have direct repository access, the general workflow for working on a feature is as follows:
- Clone the project locally.
- Create a feature branch, branching off the main branch. The branch name should be descriptive of its changes.
- Once your feature is ready and before making your pull request, make sure to rebase your changes onto the latest origin/main commit. It is a good idea to rebase regularly. It is preferred that bloated commits are squashed using the interactive flag when rebasing.
- Make your pull request:
  - Tag at least one reviewer in the pull request, so that they receive a notification and know to review it.
  - The guide on How to write the perfect pull request might be helpful.
Detailed instructions on how to use Git to accomplish this process are given below.
Clone the Observatory Platform Github project, with either HTTPS:
git clone https://github.com/The-Academic-Observatory/observatory-platform.git
Or SSH (you need to setup an SSH keypair on Github for this to work):
git clone git@github.com:The-Academic-Observatory/observatory-platform.git
Checkout main:
git checkout main
Then, create a new feature branch from main:
git checkout -b <your-feature-name>
Before rebasing, make sure that you have the latest changes from the origin main branch.
Fetch the latest changes from origin:
git fetch --all
Checkout the local main branch:
git checkout main
Merge the changes from origin/main onto your local branch:
git merge origin/main
Git should just need to fast forward the changes, because we make our changes from feature branches.
Checkout your feature branch:
git checkout <your-feature-name>
Rebase your feature branch (the branch currently checked out) onto main:
git rebase -i main
This section contains information about how to deploy the project to PyPI.
Follow the instructions below to deploy the package to PyPI.
In setup.py, update version and download_url with the latest version number and the latest GitHub release download URL respectively:

setup(
    version='19.12.0',
    download_url='https://github.com/The-Academic-Observatory/observatory-platform/v19.12.0.tar.gz',
)
Commit these changes, push and make a new release on Github.
Enter the package folder:
cd observatory-platform
Ensure any dependencies are installed:
pip3 install -r requirements.txt
Create a source distribution for the package:
python3 setup.py sdist
Install twine, which we will use to upload the release to PyPI:
pip3 install twine
Use twine to upload the release to PyPI:
twine upload dist/*