Add developer documentation about dependency management and rules to follow.
poikilotherm committed Nov 14, 2018
1 parent b52425b commit e00b23c
Showing 4 changed files with 225 additions and 2 deletions.
3 changes: 2 additions & 1 deletion doc/sphinx-guides/source/conf.py
@@ -40,7 +40,8 @@
 'sphinx.ext.autodoc',
 'sphinx.ext.intersphinx',
 'sphinx.ext.ifconfig',
-'sphinx.ext.viewcode'
+'sphinx.ext.viewcode',
+'sphinx.ext.graphviz'
 ]

# Add any paths that contain templates here, relative to this directory.
221 changes: 221 additions & 0 deletions doc/sphinx-guides/source/developers/dependencies.rst
@@ -0,0 +1,221 @@
=====================
Dependency Management
=====================

.. contents:: |toctitle|
    :local:

Dataverse is (currently) a Java EE 7 based application that uses many additional libraries for special purposes.
These provide features such as SWORD API support, S3 storage and many others.

Besides writing the code that glues the individual pieces together, developers need to declare the dependencies they use
for the Maven-based build system. As any Maven user knows, this happens inside the "Project Object Model" (POM), which
lives in ``pom.xml`` at the root of the project repository. Recursive and convergent dependency resolution makes
dependency management with Maven very easy. But sometimes, in projects with many large dependencies like Dataverse,
you have to help Maven make the right choices.

Terms
-----

As a developer, you should familiarize yourself with the following terms:

- **Direct dependencies**: things *you use* yourself in your own code for Dataverse.
- **Transitive dependencies**: things *others use* for things you use, pulled in recursively.
  See also the `Maven docs <https://maven.apache.org/guides/introduction/introduction-to-dependency-mechanism.html#Transitive_Dependencies>`_.

.. graphviz::

    digraph {
        rankdir="LR";
        node [fontsize=10]

        yc [label="Your Code"]
        da [label="Direct Dependency A"]
        db [label="Direct Dependency B"]
        ta [label="Transitive Dependency TA"]
        tb [label="Transitive Dependency TB"]
        tc [label="Transitive Dependency TC"]
        dtz [label="Direct/Transitive Dependency Z"]

        yc -> da -> ta;
        yc -> db -> tc;
        da -> tb -> tc;
        db -> dtz;
        yc -> dtz;
    }

Direct dependencies
-------------------

Within the POM, any direct dependencies live within the ``<dependencies>`` tag:

.. code:: xml

    <dependencies>
        <dependency>
            <groupId>org.example</groupId>
            <artifactId>example</artifactId>
            <version>1.1.0</version>
            <scope>compile</scope>
        </dependency>
    </dependencies>

Anytime you add a ``<dependency>``, Maven will try to fetch it from the defined/configured repositories and use it
within the build lifecycle. You have to define a ``<version>``, but ``<scope>`` is optional because it defaults to ``compile``.
(See `Maven docs: Dependency Scope <https://maven.apache.org/guides/introduction/introduction-to-dependency-mechanism.html#Dependency_Scope>`_.)


During fetching, Maven will analyse all transitive dependencies (see graph above) and, if necessary, fetch those, too.
Everything downloaded once is cached locally by default, so nothing needs to be fetched again and again, as long as the
dependency definition does not change.

**Rules to follow:**

1. Only declare direct dependencies for **things you are actually using** in your code.
2. **Clean up** direct dependencies that are no longer in use. Otherwise they bloat the deployment package!
3. Mind the **scope**. Do not include "testing only" dependencies in the package: doing so will hurt you in IDEs [#ide]_ and bloat things.
4. Avoid using different dependencies for the **same purpose**, e.g. different JSON parsing libraries.
5. Refactor your code to **use Java EE** standards as much as possible.
6. When you rely on big SDKs or similar large libraries, try to **include the smallest portion possible**. Complete SDK
   bundles are typically heavyweight and most of the time unnecessary.
7. **Don't include transitive dependencies.** [#ide2]_

   * Exception: if you are relying on it in your code (see *Z* in the graph above), you must declare it. See below
     for proper handling in these (rare) cases.
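
Rule 3 can be illustrated with a minimal sketch. It assumes JUnit 4 as the test framework, purely for illustration;
the coordinates and version are not taken from the Dataverse POM, so use whatever the project actually declares:

.. code-block:: xml

    <dependencies>
        <!-- Assumed example: a library that is only needed to compile and run tests. -->
        <dependency>
            <groupId>junit</groupId>
            <artifactId>junit</artifactId>
            <version>4.12</version>
            <!-- "test" scope keeps the artifact off the main compile classpath
                 and out of the deployment package. -->
            <scope>test</scope>
        </dependency>
    </dependencies>

With the scope set this way, an IDE that imports the POM should no longer offer these classes for autocompletion in
production code.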


Transitive dependencies
-----------------------

Maven is convenient for developers because it handles recursive resolution, downloading and adding of "dependencies of dependencies".
But as life is a box of chocolates, you might find yourself in *version conflict hell* sooner rather than later, without even
knowing it, while experiencing unintended side effects.

When you look at the graph above, imagine *B* and *TB* rely on different *versions* of *TC*. How does Maven decide
which version it will include? Easy: the version requested nearest to your project in the dependency tree wins:

.. graphviz::

    digraph {
        rankdir="LR";
        node [fontsize=10]

        yc [label="Your Code"]
        db [label="Direct Dependency B"]
        dtz1 [label="Z v1.0"]
        dtz2 [label="Z v2.0"]

        yc -> db -> dtz1;
        yc -> dtz2;
    }

In this case, version "2.0" will be included. If you know something about semantic versioning, alarm bells should be ringing in your mind right now.
How do we know that *B* is compatible with *Z v2.0* when it depends on *Z v1.0*?

Another scenario that gets us into trouble: indirect use of transitive dependencies. Imagine the following: we rely on *Z*
in our code, but do not include a direct dependency for it within the POM. Now *B* is updated and drops its dependency
on *Z*. Suddenly *Z* is no longer on our classpath and our code stops compiling. You definitely don't want to head down that road.

**Follow the rules to be safe:**

1. Do **not use transitive deps implicitly**: add a direct dependency for transitive deps you re-use in your code.
2. On every build, check that no implicit usage was added by accident.
3. **Explicitly declare versions** of transitive dependencies in use by multiple direct dependencies.
4. On every build, check that there are no convergence problems hiding in the shadows.
5. **Run tests** on every build to verify these explicit combinations work.
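
The checks in rules 2 and 4 can be automated in the build. The following is a minimal sketch, not the project's actual
configuration: it assumes the Maven Enforcer Plugin (``dependencyConvergence`` rule) and the Maven Dependency Plugin
(``analyze-only`` goal). The plugin versions and execution ids are placeholders chosen for illustration.

.. code-block:: xml

    <build>
        <plugins>
            <!-- Rule 4: fail the build when different paths in the dependency tree
                 request different versions of the same artifact. -->
            <plugin>
                <groupId>org.apache.maven.plugins</groupId>
                <artifactId>maven-enforcer-plugin</artifactId>
                <version>3.4.1</version> <!-- assumed version, pick a current release -->
                <executions>
                    <execution>
                        <id>enforce-convergence</id> <!-- hypothetical id -->
                        <goals>
                            <goal>enforce</goal>
                        </goals>
                        <configuration>
                            <rules>
                                <dependencyConvergence/>
                            </rules>
                        </configuration>
                    </execution>
                </executions>
            </plugin>
            <!-- Rule 2: report dependencies that are used but undeclared,
                 or declared but unused. -->
            <plugin>
                <groupId>org.apache.maven.plugins</groupId>
                <artifactId>maven-dependency-plugin</artifactId>
                <version>3.6.1</version> <!-- assumed version, pick a current release -->
                <executions>
                    <execution>
                        <id>analyze-dependencies</id> <!-- hypothetical id -->
                        <goals>
                            <goal>analyze-only</goal>
                        </goals>
                        <configuration>
                            <failOnWarning>true</failOnWarning>
                        </configuration>
                    </execution>
                </executions>
            </plugin>
        </plugins>
    </build>

The dependency analysis works on bytecode, so it can miss dependencies that are only used via reflection or at runtime.
Treat its warnings as hints to review rather than as the final word.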

Managing transitive dependencies in ``pom.xml``
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Maven can manage versions of transitive dependencies in four ways:

1. Make a transitive dependency a direct one, which needs a ``<version>`` tag. Typically a bad idea; don't do that.
2. Use ``<optional>`` or ``<exclusion>`` tags on direct dependencies that request the transitive dependency.
   This is a *last resort* that you really should avoid. Not explained or used here. `See Maven docs <https://maven.apache.org/guides/introduction/introduction-to-optional-and-excludes-dependencies.html>`_.
3. Explicitly declare the dependency in ``<dependencyManagement>`` and add a ``<version>`` tag.
4. For more complex transitive dependencies, reuse a "Bill of Materials" (BOM) within ``<dependencyManagement>``
   and add a ``<version>`` tag. Many larger, widely used projects provide one, making the POM much less bloated.

Examples to follow:

.. code-block:: xml
    :linenos:

    <properties>
        <aws.version>1.11.172</aws.version>
        <!-- We need to ensure that our chosen version is compatible with every dependency relying on it.
             This is manual work and needs testing, but a good investment in stability and up-to-date dependencies. -->
        <jackson.version>2.9.6</jackson.version>
        <joda.version>2.10.1</joda.version>
    </properties>

    <!-- Transitive dependencies, bigger library "bill of materials" (BOM) and
         versions of dependencies used both directly and transitively are managed here. -->
    <dependencyManagement>
        <dependencies>
            <!-- First example for case 4. Only one part of the SDK (S3) is used and transitive deps
                 of that are again managed by the upstream BOM. -->
            <dependency>
                <groupId>com.amazonaws</groupId>
                <artifactId>aws-java-sdk-bom</artifactId>
                <version>${aws.version}</version>
                <type>pom</type>
                <scope>import</scope>
            </dependency>
            <!-- Second example for case 4 and an example for explicit direct usage of a transitive dependency.
                 Jackson is used by AWS SDK and others, but we also use it in Dataverse. -->
            <dependency>
                <groupId>com.fasterxml.jackson</groupId>
                <artifactId>jackson-bom</artifactId>
                <version>${jackson.version}</version>
                <scope>import</scope>
                <type>pom</type>
            </dependency>
            <!-- Example for case 3. Joda is not used in Dataverse (as of writing this). -->
            <dependency>
                <groupId>joda-time</groupId>
                <artifactId>joda-time</artifactId>
                <version>${joda.version}</version>
            </dependency>
        </dependencies>
    </dependencyManagement>

    <!-- Declare any DIRECT dependencies here.
         In case the dependency is both transitive and direct (e.g. some common lib for logging),
         manage the version above and add the direct dependency here WITHOUT version tag, too.
    -->
    <dependencies>
        <dependency>
            <groupId>com.amazonaws</groupId>
            <artifactId>aws-java-sdk-s3</artifactId>
            <!-- no version here as managed by BOM above! -->
        </dependency>
        <!-- Should be refactored and removed once on Java EE 8 -->
        <dependency>
            <groupId>com.fasterxml.jackson.core</groupId>
            <artifactId>jackson-core</artifactId>
            <!-- no version here as managed above! -->
        </dependency>
        <!-- Should be refactored and removed once on Java EE 8 -->
        <dependency>
            <groupId>com.fasterxml.jackson.core</groupId>
            <artifactId>jackson-databind</artifactId>
            <!-- no version here as managed above! -->
        </dependency>
    </dependencies>

Helpful tools
~~~~~~~~~~~~~

TODO


.. [#ide] Modern IDEs import your Maven POM and offer import autocompletion for classes based on direct dependencies
   in the model. You might end up using legacy or repackaged classes because of a wrong scope.
.. [#ide2] This is going to bite you in modern IDEs when importing classes from transitive dependencies by "autocompletion accident".

----

Previous: :doc:`documentation` | Next: :doc:`debugging`
2 changes: 1 addition & 1 deletion doc/sphinx-guides/source/developers/documentation.rst
@@ -86,4 +86,4 @@ In order to make it clear to the crawlers that we only want the latest version d

 ----

-Previous: :doc:`testing` | Next: :doc:`debugging`
+Previous: :doc:`testing` | Next: :doc:`dependencies`
1 change: 1 addition & 0 deletions doc/sphinx-guides/source/developers/index.rst
@@ -19,6 +19,7 @@ Developer Guide
 sql-upgrade-scripts
 testing
 documentation
+dependencies
 debugging
 coding-style
 deployment