From 7b072aaa6957c3b5b457ec98dcd346b875bcf922 Mon Sep 17 00:00:00 2001 From: Tibor Simko Date: Tue, 17 May 2016 10:02:25 +0200 Subject: [PATCH] docs: richer documentation * Enriches documentation by the information taken from wiki pages. Useful for future plugging into Read The Docs. (addresses #13) * Note: the information is taken rather literally so far, modulo formatting changes. An update of the documentation content, reflecting the recent code base changes is still to come. Signed-off-by: Tibor Simko --- README.rst | 111 +++++--------------------------- docs/advanced-features.rst | 54 ++++++++++++++++ docs/big-picture.rst | 65 +++++++++++++++++++ docs/cache.rst | 59 +++++++++++++++++ docs/configuration.rst | 87 +++++++++++++++++++++++++ docs/getting-started.rst | 103 ++++++++++++++++++++++++++++++ docs/handler.rst | 112 +++++++++++++++++++++++++++++++++ docs/http-response-headers.rst | 54 ++++++++++++++++ docs/index.rst | 12 +++- docs/introduction.rst | 76 ++++++++++++++++++++++ docs/memento.rst | 54 ++++++++++++++++ docs/usage.rst | 12 ---- 12 files changed, 690 insertions(+), 109 deletions(-) create mode 100644 docs/advanced-features.rst create mode 100644 docs/big-picture.rst create mode 100644 docs/cache.rst create mode 100644 docs/configuration.rst create mode 100644 docs/getting-started.rst create mode 100644 docs/handler.rst create mode 100644 docs/http-response-headers.rst create mode 100644 docs/introduction.rst create mode 100644 docs/memento.rst delete mode 100644 docs/usage.rst diff --git a/README.rst b/README.rst index 9a2e1e2..28378d7 100644 --- a/README.rst +++ b/README.rst @@ -4,6 +4,9 @@ Memento TimeGate .. image:: https://img.shields.io/travis/mementoweb/timegate.svg :target: https://travis-ci.org/mementoweb/timegate +About +----- + Make your web resources `Memento `__ compliant in a few easy steps. @@ -11,110 +14,28 @@ The Memento framework enables datetime negotiation for web resources. 
Knowing the URI of a Memento-compliant web resource, a user can select a date and see what it was like around that time. -Introduction ------------- - -In order to support Memento, a web server must obviously have accessible -archives of its online resources. And it must also have a piece of -software that handles the datetime negotiation according to the Memento -protocol for those resources. - -But in such datetime negotiation server, only a small proportion of the -code is specific to the particular web resources it handles. The main -part of logic will be very similar throughout many implementations. -TimeGate isolates the core components and functionality. With it, -there's no need to implement, or to re-implement the same logic and -algorithms over and over again. Its architecture is designed to accept -easy-to-code plugins to match any web resources. - -From now on, this documentation will refer to the web server where -resources and archives are as the **web server** and to the Memento -TimeGate datetime negotiation server as the **TimeGate**. - -- Suppose you have a web resource accessible in a web server by some - URI. We call the resource the **Original Resource** and refer to its - URI as **URI-R**. -- Suppose a web server has a snapshot of what this URI-R looked like in - the past. We call such a snapshot a **Memento** and we refer to its - URI as **URI-M**. There could be many snapshots of URI-R, taken at - different moments in time, each Memento i with its distinct URI-Mi. - The Mementos do not necessary need to be in the same web server as - the Original Resources. - -Example -------- - -.. figure:: https://raw.githubusercontent.com/mementoweb/timegate/master/docs/uris_example.png - :alt: Image - -There are only two steps to make such resource Memento compliant. - -Step 1: Setting up TimeGate ---------------------------- - -The first thing to do is to set up the TimeGate for the specific web -server. - -* Run the TimeGate with your custom handler. 
The handler is the - piece of code that is specific to how the web server manages Original - Resources and Mementos. It needs to implement either one of the - following: - - - Given a URI-R, return the list of URI-Ms along with their respective dates. - - Given a URI-R and a datetime, return one single URI-M along with its date. - -Step 2: Providing the headers ------------------------------ - -The second thing to do is to provide Memento's HTTP headers at the web -server. - -* Add HTTP headers required by the Memento protocol to responses from the - Original Resource and its Mementos: - - - For the Original Resource, add a "Link" header that points at its TimeGate - - For each Memento, add a "Link" header that points at the TimeGate - - For each Memento, add a "Link" header that points to the Original Resource - - For each Memento, add a Memento-Datetime header that conveys the snapshot datetime - -Using the previous example, and supposing a TimeGate is running at -``http://example.com/timegate/``, Memento HTTP response headers for the -Original Resource and one Memento look as follows. |Image| - -And that's it! With the TimeGate, datetime negotiation is now possible -for these resources. - -How it works +Installation ------------ -Read the `big -picture `__ -to understand how it works and what are the requirements. - -Getting Started ---------------- - -Start by `reading the -guide `__ -for comprehensive information about how to use TimeGate for your own web -resources. +Memento TimeGate is on PyPI so all you need is: :: -Requirements ------------- + pip install -e git+https://github.com/mementoweb/timegate.git#egg=TimeGate + uwsgi --http :9999 -s /tmp/mysock.sock --module timegate.application --callable application -- `Python `__ -- `uWSGI `__ Documentation ------------- -See the `wiki `__. 
+The documentation is readable at http://timegate.readthedocs.io or can be built +using Sphinx: :: + + pip install timegate[docs] + python setup.py build_sphinx + -License +Testing ------- -See the -`LICENSE `__ -file. +Running the test suite is as simple as: :: -.. |Image| image:: https://raw.githubusercontent.com/mementoweb/timegate/master/docs/headers_example.png + ./run-tests.sh
diff --git a/docs/advanced-features.rst b/docs/advanced-features.rst
new file mode 100644
index 0000000..2a1b237
--- /dev/null
+++ b/docs/advanced-features.rst
@@ -0,0 +1,54 @@
+.. _advanced_features:
+
+TimeMaps
+========
+
+The TimeGate can easily be used as a TimeMap server too.
+
+Requirements
+------------
+
+For that there are two requirements:
+
+- The Handler must implement the ``get_all_mementos(uri_r)`` function to return
+  the entire history of an Original Resource.
+
+- The ``conf/config.ini`` file must have the variable ``use_timemap = true``.
+
+Resulting links
+---------------
+
+Once this setup is in place, the TimeGate responses' ``Link`` header
+will contain two new relations, for two different formats (MIME types):
+
+- ``; rel="timemap"; type="application/link-format"``
+  `Link TimeMaps `_
+
+- ``; rel="timemap"; type="application/json"`` JSON
+  TimeMaps
+
+Where ``HOST`` is the base URI of the program and ``URI-R`` is the URI
+of the Original Resource.
+
+Example
+-------
+
+For example, suppose ``http://www.example.com/resourceA`` is the URI-R
+of an Original Resource. And suppose the TimeGate/TimeMap server's
+``host`` configuration is set to ``http://timegate.example.com``. Then,
+HTTP responses from the TimeGate will include the following:
+
+- ``; rel="timemap"; type="application/link-format"``
+- ``; rel="timemap"; type="application/json"``
+
+Now a user can issue an ``HTTP GET`` request on one of those links and the
+server's response will have a ``200 OK`` status code and its body will
+be the TimeMap.
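As a rough illustration of the two ``timemap`` relations, the sketch below
builds the link values from the TimeGate's ``host`` setting and a URI-R. It
assumes the ``/timemap/link/`` and ``/timemap/json/`` paths given in the
configuration documentation. The helper name ``timemap_links`` is hypothetical
and is not part of TimeGate.

```python
def timemap_links(host, uri_r):
    """Return the two TimeMap link values for an Original Resource.

    Hypothetical helper: the path layout is assumed from the configuration
    docs (HOST/timemap/link/URI-R and HOST/timemap/json/URI-R).
    """
    base = host.rstrip("/")
    formats = (
        ("link", "application/link-format"),
        ("json", "application/json"),
    )
    return [
        '<{0}/timemap/{1}/{2}>; rel="timemap"; type="{3}"'.format(
            base, name, uri_r, mime
        )
        for name, mime in formats
    ]


links = timemap_links(
    "http://timegate.example.com", "http://www.example.com/resourceA"
)
for link in links:
    print(link)
```

Each printed value is one entry that could appear in the ``Link`` header of a
TimeGate response when ``use_timemap = true``.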
+
+HandlerErrors
+=============
+
+Custom error messages can be sent to the client using the custom
+exception module: ``from errors.timegateerrors import HandlerError``.
+For instance, a custom message with HTTP status ``400`` and body
+``Custom error message`` can be sent using:
+``raise HandlerError("Custom error message", status=400)``. Raising a
+``HandlerError`` will stop the request and not return any Memento to the
+client.
diff --git a/docs/big-picture.rst b/docs/big-picture.rst
new file mode 100644
index 0000000..28ca111
--- /dev/null
+++ b/docs/big-picture.rst
@@ -0,0 +1,65 @@
+.. _big_picture:
+
+Big picture
+===========
+
+Definitions
+-----------
+
+From now on, this documentation will refer to the web server where
+resources and archives reside as the **web server** and to the Memento
+TimeGate datetime negotiation server as the **TimeGate**.
+
+- Suppose you have a web resource accessible in a web server by some
+  URI. We call the resource the **Original Resource** and refer to its
+  URI as **URI-R**.
+- Suppose a web server has a snapshot of what this URI-R looked like in
+  the past. We call such a snapshot a **Memento** and we refer to its
+  URI as **URI-M**. There could be many snapshots of URI-R, taken at
+  different moments in time, each with its distinct URI-M. The
+  Mementos do not necessarily need to be on the same web server as the
+  Original Resources.
+
+Client, Server and TimeGate
+---------------------------
+
+This figure represents the current situation. Without datetime
+negotiation, the client has to find by hand the URIs of the previous
+versions of a web resource, if they exist:
+
+|client_server.png|
+
+To make these web resources Memento compliant, two things need to be
+added. The new components of the system are the TimeGate and Memento
+HTTP headers on the web server's side:
+
+|client_server_tg.png|
+
+With these links, the client now gets the address of the TimeGate when
+retrieving an Original Resource or a Memento.
+Then, they can use datetime negotiation with the
+TimeGate to get the URI of an archived version (``URI-M2``) of the
+Original Resource at a specific point in time (``T2``):
+
+|sequence.png|
+
+Architecture
+------------
+
+The TimeGate will manage the framework's logic in a generic manner.
+However, every web server has its specific way to store snapshots and to
+construct URI-Ms. Thus, a specific plugin must be written for every web
+server. Such a plugin is called a handler. A handler will typically talk
+to an API to return the list of URI-Ms given a URI-R, but there are
+several alternatives to this setup.
+
+.. figure:: architecture.png
+   :alt: architecture.png
+
+   architecture.png
+
+The system can be seen as three components.
+
+- The Memento user who wishes to retrieve an older version of a
+  resource.
+- The web server where the active version (original URI) and revisions
+  (Mementos) can be accessed. This entity must provide a way to access
+  these versions, typically through an API.
+- The TimeGate, which itself is composed of two main elements:
+
+  - One API-specific handler
+  - The generic TimeGate code
+
+.. |client_server.png| image:: client_server.png
+.. |client_server_tg.png| image:: client_server_tg.png
+.. |sequence.png| image:: sequence.png
diff --git a/docs/cache.rst b/docs/cache.rst
new file mode 100644
index 0000000..a3df38d
--- /dev/null
+++ b/docs/cache.rst
@@ -0,0 +1,59 @@
+.. _cache:
+
+Cache
+=====
+
+The TimeGate comes with a built-in cache that is activated by default. Change
+this behavior by editing the configuration file. See :ref:`configuration`.
+
+Populating the cache
+--------------------
+
+The cache stores only TimeMaps, which are the return values of the handler
+function ``get_all_mementos()``:
+
+- If the Handler does not have ``get_all_mementos()`` implemented, the
+  cache will never be filled.
+- If the Handler has both the functions ``get_all_mementos()`` and
+  ``get_memento()``, only TimeMap requests will fill the cache.
+  All TimeGate requests will use ``get_memento()``, whose result will not
+  be cached.
+
+Cache HIT conditions
+--------------------
+
+- A cached TimeMap can be used to respond to a TimeMap request from
+  a client if it is fresh enough. The tolerance for freshness can be
+  defined in the configuration file.
+- A cached TimeMap can also be used to respond to a TimeGate request
+  from a client. In this case, it is not the request's time that must
+  lie within the tolerance bounds, but the requested datetime.
+
+Force Fresh value
+-----------------
+
+If the request contains the header ``Cache-Control: no-cache``, then the
+TimeGate will not return anything from cache.
+
+Example
+-------
+
+Suppose you have a TimeMap that was cached at time ``T``. Suppose you
+have a tolerance of ``d`` seconds. A TimeMap request arrives at time
+``R1``. A TimeGate request arrives at time ``R2`` with requested
+datetime ``j``. These requests do **not** contain the header
+``Cache-Control: no-cache``.
+
+- A TimeMap request will be served from cache only if it arrives within
+  the tolerance: ``R1 <= T+d``.
+- A TimeGate request will be served from cache only if the requested
+  datetime happens within the tolerance: ``j <= T+d``, no matter ``R2``.
+  This means that even if a cached value is old, the cache can still
+  respond to TimeGate requests for requested datetimes up to time
+  ``T+d``.
+- All other requests will be cache misses.
+
+Cache size
+----------
+
+There is no "maximum size" parameter. The reason for this is that the
+cache size will depend on the average size of TimeMaps, which itself
+depends on the length of each URI-M they contain, and their average
+count. These variables will depend on your system. The cache can be
+managed using the ``cache_max_values`` parameter, which indirectly
+affects its size.
diff --git a/docs/configuration.rst b/docs/configuration.rst
new file mode 100644
index 0000000..839a594
--- /dev/null
+++ b/docs/configuration.rst
@@ -0,0 +1,87 @@
+.. _configuration:
+
+Configuring the server
+======================
+
+Edit the `config
+file `__:
+``conf/config.ini``.
+
+Mandatory field
+---------------
+
+``host`` is the server's base URI. This is the URI on which the TimeGate
+is deployed. No default value.
+
+Example:
+
+- Suppose TimeGate is running at ``http://tg.example.com`` and
+  ``URI-R`` refers to an Original Resource's URI.
+
+- The program will respond to TimeGate requests at
+  ``http://tg.example.com/timegate/URI-R``
+
+- The program will respond to ``TimeMap`` requests at
+  ``http://tg.example.com/timemap/link/URI-R`` and
+  ``http://tg.example.com/timemap/json/URI-R`` if the feature is enabled.
+  See :ref:`advanced_features`.
+
+Important field
+---------------
+
+``is_vcs`` The type of archive affects the best Memento selection
+algorithm. Default ``false``.
+
+- When ``false``, the history is considered to be snapshots taken at some
+  points in time, thus the best Memento is the *absolute* closest to the
+  requested date.
+- When ``true``, the history the handler returns is considered to be from a
+  version control system. In other words, the history represents every
+  change that was made to the Original Resource and the exact datetimes of
+  the change. In this case, the best Memento for a requested datetime T
+  will be the closest *before* T.
+
+Other fields
+------------
+
+- ``handler_class`` (Optional) Python module path to a handler class.
+  This is useful if the handler is composed of several classes or to
+  quickly switch between handlers. If this parameter is not provided,
+  the program will search for handler classes in ``core.handler``. For
+  example:
+  ``handler_class = core.handler_examples.wikipedia.WikipediaHandler``
+- ``api_time_out`` Time, in seconds, before a request to an API times
+  out when using the ``Handler.request()`` function. Default 6 seconds.
+- ``base_uri`` (Optional) String that will be prepended to requested
+  URI if missing.
+  This can be used to shorten the request URI and to
+  avoid repeating the base URI that is common to all resources. Default
+  empty.
+
+  - For example, suppose the TimeGate is deployed at
+    ``http://tg.example.com``
+  - Suppose every Original Resource ``URI-Ri`` has the following format
+    ``http://resource.example.com/res/URI-Ri``
+  - Then, setting ``base_uri = http://resource.example.com/res/`` will
+    allow short requests such as
+    ``http://tg.example.com/timegate/URI-Ri`` instead of
+    ``http://tg.example.com/timegate/http://resource.example.com/res/URI-Ri``.
+
+- ``use_timemap`` When ``true``, the TimeGate adds TimeMap links to
+  its (non-error) responses. Default ``false``.
+
+Cache parameters
+----------------
+
+- ``cache_activated`` When ``true``, the cache stores the entire
+  history of an Original Resource from handlers that allow batch
+  ``get_all_mementos(uri_r)`` requests. It can then respond from cache
+  if the value is fresh enough. If a request contains the header
+  ``Cache-Control: no-cache`` the server will not respond from cache.
+  When ``false`` the cache files are not created. Default ``true``.
+- ``cache_refresh_time`` Tolerance, in seconds, for which it is assumed
+  that a history didn't change. Any TimeGate request for a datetime
+  past this (or any TimeMap request past this) will trigger a refresh
+  of the cached history. Default 86400 seconds (one day).
+- ``cache_directory`` Relative path for data files. Do not add any
+  other file to this directory as they could be deleted. Each file
+  represents an entire history of an Original Resource. Default
+  ``cache/``.
+- ``cache_max_values`` Maximum number of URI-Rs for which the entire
+  history is stored. This is then the number of files in the
+  ``cache_directory``. Default 250.
+
+See :ref:`cache`.
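The effect of the ``is_vcs`` setting described above can be sketched as
follows. This is only an illustration of the two selection rules, not
TimeGate's actual implementation. The function name ``best_memento`` and the
example URIs are hypothetical. Treating a change made exactly at the requested
datetime as "before" it, and falling back to the first version when nothing
precedes the request, are assumptions.

```python
from datetime import datetime


def best_memento(mementos, requested, is_vcs=False):
    """Pick the best (uri_m, date) pair for the requested datetime.

    mementos is a non-empty list of (uri_m, datetime) pairs.
    """
    if is_vcs:
        # Version control history: latest change at or before the request.
        earlier = [m for m in mementos if m[1] <= requested]
        if earlier:
            return max(earlier, key=lambda m: m[1])
        # Assumption: nothing precedes the request, use the first version.
        return min(mementos, key=lambda m: m[1])
    # Snapshot history: smallest absolute distance to the request.
    return min(mementos, key=lambda m: abs(m[1] - requested))


history = [
    ("http://example.com/v1", datetime(2016, 1, 1)),
    ("http://example.com/v2", datetime(2016, 3, 1)),
]
requested = datetime(2016, 2, 20)
snapshot_choice = best_memento(history, requested)
vcs_choice = best_memento(history, requested, is_vcs=True)
print(snapshot_choice[0], vcs_choice[0])
```

With the snapshot rule the later version wins because it is only ten days away
from the request, while the version control rule returns the earlier version,
which was the current one at the requested datetime.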
diff --git a/docs/getting-started.rst b/docs/getting-started.rst
new file mode 100644
index 0000000..35f8bf3
--- /dev/null
+++ b/docs/getting-started.rst
@@ -0,0 +1,103 @@
+Getting Started
+===============
+
+Memento TimeGate
+----------------
+
+TimeGate is a `WSGI `__
+application server that allows simple implementation of
+`Memento `__ capabilities for web resources
+having accessible revisions. It manages all the content negotiation
+logic, from request processing, best Memento query and selection to HTTP
+response.
+
+To make web resources that are accessible on a web server fully Memento
+compliant, two things need to be done:
+
+- TimeGate is generic: a custom handler must be plugged in to match the
+  specific web server.
+- The Memento framework uses specific HTTP headers: they must be added to
+  the resource's web server responses.
+
+Steps
+-----
+
+The big picture
+~~~~~~~~~~~~~~~
+
+The first thing to do is to understand how the program is
+structured. See :ref:`big_picture`.
+
+Installing the server
+~~~~~~~~~~~~~~~~~~~~~
+
+The code can be obtained
+`here `__. Download a
+zip or tar.gz archive into a directory of your choice.
+
+Decompress the zip files using:
+
+.. code:: bash
+
+    $ unzip timegate-.zip
+
+Decompress tar.gz files using:
+
+.. code:: bash
+
+    $ tar xvzf timegate-.tar.gz
+
+Install the dependencies using:
+
+.. code:: bash
+
+    $ echo 'uWSGI>=2.0.3 ConfigParser>=3.3.0r2 python-dateutil>=2.1 requests>=2.2.1 werkzeug>=0.9.6 lxml>=3.4.1' | xargs pip install
+
+Running the TimeGate
+~~~~~~~~~~~~~~~~~~~~
+
+Then try starting the TimeGate server with one of the handlers that are
+already provided. To run it, first navigate to the directory:
+
+.. code:: bash
+
+    $ cd timegate-
+
+Then, there are two possibilities:
+
+- Either execute
+  ``uwsgi --http :9999 --wsgi-file core/application.py --master`` to
+  deploy the TimeGate on ``localhost:9999``. Add the option
+  ``--pidfile /path/to/file.pid`` to store the process ID in a file.
+- Or edit the uWSGI launch configuration in ``conf/timegate.ini`` and then
+  execute ``uwsgi conf/timegate.ini``.
+
+To stop the server:
+
+- Simply use ``CTRL+C`` if it is running in the foreground.
+- Or execute ``uwsgi --stop /path/to/file.pid`` if you have
+  stored the PID to run it in the background.
+- If by mistake the PID is not stored but the TimeGate is still running,
+  list all uwsgi processes using ``ps ux | grep uwsgi``, identify the
+  TimeGate process from the ``COMMAND`` column and kill it using
+  ``kill -INT ``.
+
+Handler
+~~~~~~~
+
+Once the server is successfully running with an example handler that was
+provided, edit it or create a new one (see :ref:`handler`) that returns the list
+of all URI-Ms given a URI-R of an Original Resource you wish to make Memento
+compliant.
+
+Memento Headers
+~~~~~~~~~~~~~~~
+
+The Memento protocol mainly works with HTTP headers. Now add the required
+headers (see :ref:`http_response_headers`) to your web server's HTTP responses.
+
+Configuring the TimeGate
+~~~~~~~~~~~~~~~~~~~~~~~~
+
+Finally, enter the TimeGate's ``HOST`` location in the ``config.ini`` file (see
+:ref:`configuration`). Also edit the other parameters' default values to
+your preferences.
+
+Memento compliance
+~~~~~~~~~~~~~~~~~~
+
+That's it. The basic Memento functionalities are here and your web
+server is now Memento compliant. See :ref:`advanced_features`.
diff --git a/docs/handler.rst b/docs/handler.rst
new file mode 100644
index 0000000..30daa80
--- /dev/null
+++ b/docs/handler.rst
@@ -0,0 +1,112 @@
+.. _handler:
+
+Resource-specific Handler
+=========================
+
+A handler is a Python class that is plugged into the generic TimeGate to
+fit any specific technique a web server has to manage its Original
+Resources and Mementos. Its role is simple: to retrieve the list of
+URI-Ms (with their archival dates) given a URI-R. It typically does so
+by connecting to an API.
+
+Alternatives
+------------
+
+- If no API is present: The list can be retrieved in many different
+  ways: page scraping, rule-based or even in a static manner. Anything
+  will do.
+- If the history cannot be retrieved entirely: The handler can
+  implement an alternative function that returns one single URI-M and
+  its archival datetime given both URI-R and the datetime the user
+  requested.
+- If the TimeGate's algorithms that select the best Memento for a
+  requested date do not apply to the system: Implementing the
+  alternative function could also be used to bypass these algorithms.
+  This is particularly useful if there are performance concerns,
+  special cases or access restrictions for Mementos.
+
+Requirements
+------------
+
+.. image:: code_architecture.png
+
+A handler is required to have the following:
+
+- It must be a Python file placed in the ``core.handler`` module (which is
+  the ``core/handler/`` folder). And it must be unique. If several
+  classes are needed, or to switch quickly between handlers, consider
+  adding the handler module path manually in the configuration
+  file. (See :ref:`configuration`.)
+- A handler must extend the ``core.handler_baseclass.Handler``
+  base-class.
+- Implement at least one of the following:
+
+  - ``get_all_mementos(uri_r)`` class function: This function is called
+    by the TimeGate to retrieve the history of an Original Resource
+    ``uri_r``. The parameter ``uri_r`` is a Python string representing
+    the requested URI-R. The return value must be a list of 2-tuples:
+    ``[(uri_m1, date1), (uri_m2, date2), ...]``. Each pair
+    ``(uri_m, date)`` contains the URI of an archived version of R
+    ``uri_m``, and the date at which it was archived ``date``.
+  - ``get_memento(uri_r, requested_date)`` class function (alternative):
+    This function will be called by the TimeGate to retrieve the best
+    Memento for ``uri_r`` at the date ``requested_date``.
+    Use it if the API cannot
+    return the entire history for a resource efficiently or to bypass the
+    TimeGate's best Memento selection. The parameter ``uri_r`` is a
+    Python string representing the requested URI-R. The parameter
+    ``requested_date`` is a Python ``datetime.datetime`` object. In this
+    case, the return value will contain only one 2-tuple: ``(uri_m, date)``
+    which is the best Memento that the handler could provide taking into
+    account the limits of the API.
+
+- Input parameters:
+
+  - All parameter values ``uri_r`` are Python strings representing the
+    user's requested URI-R.
+  - All parameter values ``requested_date`` are ``datetime.datetime``
+    objects representing the user's requested datetime.
+
+- Output return values:
+
+  - All return values ``uri_m`` must be strings.
+  - All return values ``date`` must be strings representing dates. Prefer
+    the `ISO 8601 `__ format for
+    the dates.
+
+- Note that:
+
+  - If both functions are implemented,
+    ``get_memento(uri_r, requested_date)`` will always be used for
+    TimeGate requests.
+  - If the TimeMap advanced feature (see :ref:`advanced_features`) is enabled,
+    ``get_all_mementos(uri_r)`` must be implemented.
+
+Example
+-------
+
+A simple example handler is provided in ``core/handler/`` and can be
+edited to match your web server's requirements:
+
+- See `example.py `__
+  which returns static lists.
+
+Other handler examples are provided for real-world APIs in
+``core/handler_examples/``, for instance:
+
+- `arXiv.py
+  `__
+  where the Original Resources are the e-prints of http://arxiv.org/
+- `wikipedia.py
+  `__
+  where the Original Resources are the articles of https://www.wikipedia.org/
+- `github.py
+  `__
+  where the Original Resources are the repositories, trees (branches and
+  directories), files and raw files.
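A handler meeting the requirements above might look like the following minimal
sketch. To keep it runnable on its own, it defines a stand-in ``Handler`` class
in place of the real ``core.handler_baseclass.Handler``. The class name
``StaticExampleHandler``, the ``HISTORY`` data and the choice to return an
empty list for unknown URI-Rs are all hypothetical and are not taken from the
bundled ``example.py``.

```python
class Handler(object):
    """Stand-in for core.handler_baseclass.Handler so the sketch runs alone."""


class StaticExampleHandler(Handler):
    """Hypothetical handler that serves a fixed, in-memory history."""

    # Assumed data: URI-R mapped to a list of (URI-M, ISO 8601 date) pairs.
    HISTORY = {
        "http://www.example.com/resourceA": [
            ("http://www.example.com/resourceA_v1", "2013-08-15T12:00:00Z"),
            ("http://www.example.com/resourceA_v2", "2014-02-01T08:30:00Z"),
        ],
    }

    def get_all_mementos(self, uri_r):
        # Return the whole history as a list of (uri_m, date) 2-tuples.
        # Assumption: an unknown URI-R simply yields an empty list here.
        return list(self.HISTORY.get(uri_r, []))


handler = StaticExampleHandler()
mementos = handler.get_all_mementos("http://www.example.com/resourceA")
print(mementos)
```

In a real deployment the class would extend the TimeGate base class and query
the archive's API (or scrape pages) instead of reading a static dictionary.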
+
+Other scraping handler examples are provided for real-world resources
+without any API:
+
+- `can.py
+  `__
+  where the Original Resources are the archives stored in
+  http://www.collectionscanada.gc.ca/webarchives/
diff --git a/docs/http-response-headers.rst b/docs/http-response-headers.rst
new file mode 100644
index 0000000..e365568
--- /dev/null
+++ b/docs/http-response-headers.rst
@@ -0,0 +1,54 @@
+.. _http_response_headers:
+
+Memento and HTTP
+================
+
+The Memento framework requires specific HTTP headers in order to work
+properly. They must be added to the server's response headers for any
+Original Resource or Memento request.
+
+Intuitively, a user needs to be able to know which server to contact to
+do the time negotiation. Hence a link to the TimeGate is needed from
+both the Original Resource and the Mementos. Additionally, a Memento is
+defined by the Original Resource it is a snapshot of, and the datetime
+at which it was created. Thus, it carries a link to its Original
+Resource and datetime information.
+
+Example
+-------
+
+Let's take the following example: Suppose a server is handling requests
+for the following URIs:
+
+.. image:: uris_example.png
+
+Each time a server responds to requests for any of these URIs, standard
+HTTP headers are returned. With Memento, the following headers are
+added:
+
+- For the Original Resource, add a "Link" header that points at
+  its TimeGate
+- For each Memento, add a "Link" header that points at the TimeGate
+- For each Memento, add a "Link" header that points to the
+  Original Resource
+- For each Memento, add a Memento-Datetime header that
+  conveys the snapshot datetime
+
+Using the previous example, and supposing a TimeGate server is running
+at ``http://example.com/timegate/``, Memento HTTP response headers for
+the Original Resource and one Memento look as follows:
+
+.. image:: uris_example.png
+
+To sum up
+---------
+
+- The ``Memento-Datetime:`` header is a Memento-specific header whose
+  value is the `rfc1123 `__-date of
+  the Memento.
+
+  - It must be included in any response to a Memento request.
+  - It cannot be in an Original Resource response.
+
+- The ``Link:`` header is a standard header to which new values are
+  added.
+
+  - A link to the TimeGate with relation ``rel="timegate"`` must be
+    included in all Memento and Original Resource responses.
+  - A link to the Original Resource with relation ``rel="original"`` must
+    be included in all Memento responses.
+  - A link with relation ``rel="original"`` cannot be in an Original
+    Resource response.
diff --git a/docs/index.rst b/docs/index.rst
index b8a3a0c..d737ee3 100644
--- a/docs/index.rst
+++ b/docs/index.rst
@@ -8,6 +8,7 @@ .. include:: ../README.rst + :end-before: Installation User's Guide ------------
@@ -18,9 +19,16 @@ TimeGate. .. toctree:: :maxdepth: 2 + introduction installation - usage - + big-picture + getting-started + memento + http-response-headers + handler + configuration + cache + advanced-features API Reference -------------
diff --git a/docs/introduction.rst b/docs/introduction.rst
new file mode 100644
index 0000000..8bc2df2
--- /dev/null
+++ b/docs/introduction.rst
@@ -0,0 +1,76 @@
+Introduction
+============
+
+Introduction
+------------
+
+In order to support Memento, a web server must obviously have accessible
+archives of its online resources. And it must also have a piece of
+software that handles the datetime negotiation according to the Memento
+protocol for those resources.
+
+But in such a datetime negotiation server, only a small proportion of the
+code is specific to the particular web resources it handles. The main
+part of the logic will be very similar throughout many implementations.
+TimeGate isolates the core components and functionality. With it,
+there's no need to implement, or to re-implement the same logic and
+algorithms over and over again.
+Its architecture is designed to accept
+easy-to-code plugins to match any web resources.
+
+From now on, this documentation will refer to the web server where
+resources and archives reside as the **web server** and to the Memento
+TimeGate datetime negotiation server as the **TimeGate**.
+
+- Suppose you have a web resource accessible in a web server by some
+  URI. We call the resource the **Original Resource** and refer to its
+  URI as **URI-R**.
+- Suppose a web server has a snapshot of what this URI-R looked like in
+  the past. We call such a snapshot a **Memento** and we refer to its
+  URI as **URI-M**. There could be many snapshots of URI-R, taken at
+  different moments in time, each Memento i with its distinct URI-Mi.
+  The Mementos do not necessarily need to be on the same web server as
+  the Original Resources.
+
+Example
+-------
+
+.. figure:: uris_example.png
+
+There are only two steps to make such a resource Memento compliant.
+
+Step 1: Setting up TimeGate
+---------------------------
+
+The first thing to do is to set up the TimeGate for the specific web
+server.
+
+* Run the TimeGate with your custom handler. The handler is the
+  piece of code that is specific to how the web server manages Original
+  Resources and Mementos. It needs to implement at least one of the
+  following:
+
+  - Given a URI-R, return the list of URI-Ms along with their respective dates.
+  - Given a URI-R and a datetime, return one single URI-M along with its date.
+
+Step 2: Providing the headers
+-----------------------------
+
+The second thing to do is to provide Memento's HTTP headers at the web
+server.
+
+* Add HTTP headers required by the Memento protocol to responses from the
+  Original Resource and its Mementos:
+
+  - For the Original Resource, add a "Link" header that points at its TimeGate
+  - For each Memento, add a "Link" header that points at the TimeGate
+  - For each Memento, add a "Link" header that points to the Original Resource
+  - For each Memento, add a Memento-Datetime header that conveys the snapshot datetime
+
+Using the previous example, and supposing a TimeGate is running at
+``http://example.com/timegate/``, Memento HTTP response headers for the
+Original Resource and one Memento look as follows:
+
+.. image:: headers_example.png
+
+And that's it! With the TimeGate, datetime negotiation is now possible
+for these resources.
diff --git a/docs/memento.rst b/docs/memento.rst
new file mode 100644
index 0000000..45aa8b9
--- /dev/null
+++ b/docs/memento.rst
@@ -0,0 +1,54 @@
+Memento Framework
+=================
+
+Resources on the web change over time. While many servers keep archives
+of what these resources looked like in the past, it is often difficult
+for the user to retrieve the URI of such an archive for a specific point
+in time.
+
+The `Memento Framework `__ removes the
+need for the user to do the search by hand.
+
+Components
+----------
+
+- Suppose a web resource is located at some URI. We call the resource
+  the **Original Resource** and refer to its URI as the **URI-R**. This
+  is the resource for which a user wants to find a prior version.
+- A prior version of an Original Resource is called a **Memento** and
+  we refer to its URI as the **URI-M**. There could be many Mementos
+  for one Original Resource, each having its own URI-Mi and each
+  encapsulating the state of the Original Resource at a specific point
+  in time.
+- The **TimeGate** is the application which selects the best Memento of
+  an Original Resource for a given datetime. This is where datetime
+  negotiation happens.
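To show how these three components point at each other, the sketch below
builds the Memento-related response headers for an Original Resource and for
one of its Mementos. It is an illustration under assumptions, not the code a
web server has to use: the function names are hypothetical, the TimeGate URI is
assumed to be the TimeGate base URI followed by the URI-R, and the
``Memento-Datetime`` value is produced with Python's
``email.utils.format_datetime``, which emits an RFC 1123 style date.

```python
from datetime import datetime, timezone
from email.utils import format_datetime


def original_headers(timegate, uri_r):
    """Headers added to a response for the Original Resource."""
    return {"Link": '<{0}{1}>; rel="timegate"'.format(timegate, uri_r)}


def memento_headers(timegate, uri_r, snapshot):
    """Headers added to a response for one Memento of uri_r.

    snapshot must be a timezone-aware UTC datetime.
    """
    link = '<{0}{1}>; rel="timegate", <{1}>; rel="original"'.format(
        timegate, uri_r
    )
    return {
        "Link": link,
        "Memento-Datetime": format_datetime(snapshot, usegmt=True),
    }


timegate = "http://example.com/timegate/"
uri_r = "http://www.example.com/resourceA"
snapshot = datetime(2016, 5, 17, 10, 2, 25, tzinfo=timezone.utc)

headers_r = original_headers(timegate, uri_r)
headers_m = memento_headers(timegate, uri_r, snapshot)
print(headers_r["Link"])
print(headers_m["Link"])
print(headers_m["Memento-Datetime"])
```

Note that only the Memento response carries ``rel="original"`` and
``Memento-Datetime``, while both responses link to the TimeGate.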
+
+Requirements
+------------
+
+- The first requirement is that Original Resources and Mementos must
+  be accessible through their respective and unique URIs.
+- Also, the framework relies on HTTP headers. Headers of requests
+  from/to the TimeGate are taken care of. However, Original Resources and
+  Mementos require new headers to be added. (See :ref:`http_response_headers`.)
+
+The Generic TimeGate
+--------------------
+
+The TimeGate is where most of the Memento magic happens. And its
+implementation is likely to be extremely similar from one server to
+another. In this sense, its processing of HTTP request and response
+headers, its algorithms and logic can be abstracted and made generic.
+The only server-specific part is the management of URIs and datetimes.
+This TimeGate can therefore fit any web resource if it is provided a
+way to retrieve the history of a specific Original Resource. This is done
+using a custom handler. (See :ref:`handler`.)
+
+More about Memento
+------------------
+
+- Details about Memento are available in the `RFC
+  7089 `__.
+- A `quick intro `__ is
+  available on Memento's website.
diff --git a/docs/usage.rst b/docs/usage.rst
deleted file mode 100644
index 00abfa8..0000000
--- a/docs/usage.rst
+++ /dev/null
@@ -1,12 +0,0 @@
-..
-    This file is part of TimeGate
-    Copyright (C) 2016 CERN.
-
-    TimeGate is free software; you can redistribute it and/or modify
-    it under the terms of the Revised BSD License; see LICENSE file for
-    more details.
-
-Usage
-=====
-
-.. automodule:: timegate