Supporting Neptune in the Databuilder (#13)
* RFC for adding Neptune support to the Databuilder.
  Signed-off-by: Andrew <andrjc4@vt.edu>
* Fix typos
  Signed-off-by: Andrew <andrjc4@vt.edu>
* update with pr info
  Signed-off-by: Andrew <andrjc4@vt.edu>
* update with pr info
  Signed-off-by: Andrew <andrjc4@vt.edu>
1 parent 193f897, commit 6a18663
Showing 1 changed file with 61 additions and 0 deletions.
@@ -0,0 +1,61 @@
- Feature Name: Amazon Neptune Databuilder support
- Start Date: 2020-11-10
- RFC PR: [amundsen-io/rfcs#13](https://github.com/amundsen-io/rfcs/pull/13)
- Amundsen Issue: [amundsen-io/amundsen#0000](https://github.com/amundsen-io/amundsen/issues/0000) (leave this empty for now)

# Amazon Neptune Databuilder Support

## Summary

This RFC proposes adding support for Amazon Neptune, Amazon's graph database, to the Amundsen databuilder.

## Motivation

Amundsen currently supports Neptune only in the metadata proxy. This RFC proposes adding Neptune support to the databuilder so that Amundsen supports Neptune throughout its stack.

## Guide-level Explanation (aka Product Details)

Currently the Amundsen databuilder library supports only the Neo4j datastore. This RFC adds loaders, publishers, and serializers to the library so that Neptune is supported as well. The new components keep the same interfaces as the existing ones, so switching between Neo4j and Neptune is as simple as swapping the components.

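A minimal sketch of what "swapping the components" could look like. The `Publisher` protocol, the two stub classes, and `run_job` are hypothetical stand-ins invented for illustration; they are not the databuilder's actual classes or signatures.

```python
from typing import Callable, Dict, List, Protocol


class Publisher(Protocol):
    """Hypothetical shared interface (assumption, not databuilder's real base class)."""

    def publish(self, csv_paths: List[str]) -> str: ...


class StubNeo4jPublisher:
    """Stand-in for an existing Neo4j publisher."""

    def publish(self, csv_paths: List[str]) -> str:
        return f"neo4j <- {len(csv_paths)} file(s)"


class StubNeptunePublisher:
    """Stand-in for the proposed Neptune publisher."""

    def publish(self, csv_paths: List[str]) -> str:
        return f"neptune <- {len(csv_paths)} file(s)"


# Because both classes satisfy the same interface, the backend is a lookup.
PUBLISHERS: Dict[str, Callable[[], Publisher]] = {
    "neo4j": StubNeo4jPublisher,
    "neptune": StubNeptunePublisher,
}


def run_job(backend: str, csv_paths: List[str]) -> str:
    """Build the publisher for the chosen backend and publish the given CSVs."""
    publisher = PUBLISHERS[backend]()
    return publisher.publish(csv_paths)
```

The calling code stays the same for either backend; only the `backend` key changes.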
## UI/UX-level Explanation

Not applicable.

## Reference-level Explanation (aka Technical Details)

To support Neptune in the databuilder, several additions are needed:

- A Neptune serializer that converts `GraphNodes` and `GraphRelationships` into the format that Neptune's bulk data loader expects.

- A `FsNeptuneCSVLoader`, similar to the `FsNeo4jCSVLoader`, that writes the `GraphNodes` and `GraphRelationships` into CSVs that the publisher can consume.

- A `NeptuneCsvBulkPublisher` that takes the CSVs generated by the `FsNeptuneCSVLoader` and publishes them to Neptune. Publishing breaks down into two steps:
  1. Uploading the CSV files to Amazon S3.
  2. Making a request to Neptune's bulk loader endpoint that points at the S3 files (details: https://docs.aws.amazon.com/neptune/latest/userguide/bulk-load.html).

  Thanks to the team at Square, most of the process of publishing Amundsen data to Neptune is already implemented in the Neptune bulk loader API in the https://github.com/amundsen-io/amundsengremlin repo.

- The amundsengremlin repo added as a dependency.

- Tests covering the Neptune models, loader, and publisher.

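The serializer and publisher steps described above can be sketched as follows. This is an illustration under stated assumptions, not the proposed implementation: nodes and relationships are represented as plain dicts rather than the databuilder's graph models, and every function name is hypothetical. The `~id`, `~label`, `~from`, and `~to` headers, the `name:Type` property columns, and the loader request fields follow Neptune's documented bulk-load format for Gremlin data.

```python
import csv
import io
from typing import Any, Dict, List


def neptune_type(value: Any) -> str:
    """Map a Python value to a Neptune CSV column type (bool checked before int)."""
    if isinstance(value, bool):
        return "Bool"
    if isinstance(value, int):
        return "Long"
    if isinstance(value, float):
        return "Double"
    return "String"


def format_value(value: Any) -> str:
    """Render a cell value; booleans are written in lowercase."""
    if isinstance(value, bool):
        return "true" if value else "false"
    return str(value)


def nodes_to_vertex_csv(nodes: List[Dict[str, Any]]) -> str:
    """Render dicts with 'id', 'label', and 'properties' as a vertex CSV."""
    # Collect typed property columns across all nodes in first-seen order.
    columns: Dict[str, str] = {}
    for node in nodes:
        for key, value in node["properties"].items():
            columns.setdefault(key, f"{key}:{neptune_type(value)}")
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["~id", "~label", *columns.values()])
    for node in nodes:
        props = node["properties"]
        cells = [format_value(props[k]) if k in props else "" for k in columns]
        writer.writerow([node["id"], node["label"], *cells])
    return buf.getvalue()


def relationships_to_edge_csv(rels: List[Dict[str, str]]) -> str:
    """Render dicts with 'id', 'start', 'end', and 'label' as an edge CSV."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["~id", "~from", "~to", "~label"])
    for rel in rels:
        writer.writerow([rel["id"], rel["start"], rel["end"], rel["label"]])
    return buf.getvalue()


def build_loader_request(s3_uri: str, iam_role_arn: str, region: str) -> Dict[str, str]:
    """Build the JSON body for a POST to the cluster's /loader endpoint."""
    return {
        "source": s3_uri,          # S3 prefix holding the uploaded CSVs
        "format": "csv",           # Gremlin CSV load format
        "iamRoleArn": iam_role_arn,
        "region": region,
        "failOnError": "FALSE",
    }
```

In practice the CSVs would first be uploaded to S3 and the request body sent to the Neptune loader endpoint; the amundsengremlin repo already covers much of that flow, so this sketch only shows the data shapes involved.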
## Drawbacks

This RFC adds support for another datastore, which brings in additional components and increases the size of the codebase. In addition, the https://github.com/amundsen-io/amundsengremlin repo will be added as a dependency, which brings its own complexity.

## Alternatives

Taking no action is the main alternative. The dependencies of https://github.com/amundsen-io/amundsengremlin could be separated so that the metadata proxy and the databuilder don't share the same requirements, but that seems unnecessary for now.

## Prior art

N/A

## Unresolved questions

N/A

## Future possibilities

None.