Feature Proposal: Support MySQL as metadata store in Databuilder #760

Closed
xuan616 opened this issue Oct 17, 2020 · 6 comments
Labels
keep fresh Disables stalebot from closing an issue

Comments

@xuan616
Member

xuan616 commented Oct 17, 2020

My workplace plans to use Amundsen with MySQL as the metadata store, but Amundsen currently supports only graph DBs as the backend metadata store. We plan to do the dev work to integrate Amundsen with MySQL. This will require changes in both databuilder and the metadata service; I created this issue to track the work needed in databuilder.

Expected Behavior or Use Case

The workflow would be similar to the existing one, and the backend store would have relational DB options such as MySQL.

Service or Ingestion ETL

Data loader, publisher and models in Databuilder

Possible Implementation

  1. Currently, Amundsen does not support any relational DB as a backend store, so we first have to add ORM support for relational DBs with 'SQLAlchemy'. To handle possible metadata schema changes, we also have to add DB schema version management with 'alembic'.

  2. We may need new models that work with the ORM and that probably have different fields from the existing models. I would also like to know whether it is possible to update all current models so that they work with both graph and relational DBs. I assume we will still use the current ETL logic to push metadata to MySQL, so for the models used with MySQL, would it be good to develop a new iterator that works on each row instead of on 'node' and 'relationship'?

  3. Add a data loader that works with MySQL

  4. Add a MySQL publisher

All of the above is one possible implementation; I am looking forward to a better solution. If I missed anything or the proposal is not appropriate, please advise. Thanks.
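To make item 2 more concrete, here is a minimal sketch of a single model that exposes a row iterator next to the node and relationship iterators. It is only an illustration under assumptions: `TableModel`, `ColumnInfo`, `iter_nodes`, `iter_relations`, `iter_rows`, and the table names `table_metadata` and `column_metadata` are hypothetical and are not existing Databuilder APIs.

```python
from dataclasses import dataclass
from typing import Dict, Iterator, List


@dataclass
class ColumnInfo:
    name: str
    col_type: str


@dataclass
class TableModel:
    """Hypothetical model that can serialize itself for a graph or a relational store."""
    database: str
    schema: str
    name: str
    columns: List[ColumnInfo]

    @property
    def key(self) -> str:
        return f"{self.database}://{self.schema}/{self.name}"

    def iter_nodes(self) -> Iterator[Dict[str, str]]:
        # Graph view: one node for the table and one node per column.
        yield {"LABEL": "Table", "KEY": self.key, "name": self.name}
        for col in self.columns:
            yield {"LABEL": "Column", "KEY": f"{self.key}/{col.name}",
                   "name": col.name, "col_type": col.col_type}

    def iter_relations(self) -> Iterator[Dict[str, str]]:
        # Graph view: one edge from the table to each of its columns.
        for col in self.columns:
            yield {"START_KEY": self.key, "END_KEY": f"{self.key}/{col.name}",
                   "TYPE": "COLUMN"}

    def iter_rows(self) -> Iterator[Dict[str, str]]:
        # Relational view: each record names its target table, and the
        # table-to-column edge becomes a foreign-key field (table_rk).
        yield {"__table__": "table_metadata", "rk": self.key, "name": self.name}
        for col in self.columns:
            yield {"__table__": "column_metadata", "rk": f"{self.key}/{col.name}",
                   "name": col.name, "col_type": col.col_type, "table_rk": self.key}


model = TableModel("hive", "core", "users",
                   [ColumnInfo("id", "bigint"), ColumnInfo("email", "string")])
for row in model.iter_rows():
    print(row)
```

With this shape, a MySQL loader could consume `iter_rows()` while the existing graph loader keeps consuming nodes and relationships, so the extract and transform steps would not need to change.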

Example Screenshots (if appropriate):

Context

We are not sure whether there is a plan on the Amundsen side to support relational DBs. In order to use Amundsen with MySQL as the metadata engine in my workplace, we plan to do the dev work to support MySQL as a backend store.

@feng-tao
Member

will take a look next week.

@feng-tao
Member

Also, we should create an RFC in https://github.com/amundsen-io/rfcs. And it would be good to be more explicit about how to map the graph model to the ORM model.

In terms of databuilder, @AndrewCiambrone has a proposal (amundsen-io/rfcs#5) to make the entity more generic.

@feng-tao
Member

Here are a few additional thoughts:
1) We initially targeted the graph model because it is easier to model connections between metadata entities through a graph (e.g. we could easily add more structured metadata, such as update time or create time, to an entity by just adding a new node connected to that entity, e.g. a dataset). It would be good to understand what the ORM model looks like (e.g. will it be a denormalized model in which each entity has all its metadata inside a single schema?). It would be good to make this explicit in the RFC.
2) Depending on 1), I haven't thought about how to modify the existing model into an ORM model. cc @jinhyukchang as well
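As one possible answer to the question in 1), the sketch below shows a normalized mapping: the "extra node connected to the entity" becomes an extra table with a foreign key back to the entity. Everything here is an assumption for illustration. The table and column names are hypothetical, and the standard-library `sqlite3` module stands in for MySQL so that the example runs without a server.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite needs this to enforce FKs
conn.executescript("""
-- The entity itself (a Table node in the graph model).
CREATE TABLE table_metadata (
    rk   TEXT PRIMARY KEY,
    name TEXT NOT NULL
);
-- Additional metadata (a Timestamp node in the graph model). The graph
-- edge is replaced by the table_rk foreign key.
CREATE TABLE table_timestamp (
    rk           TEXT PRIMARY KEY,
    last_updated INTEGER NOT NULL,
    table_rk     TEXT NOT NULL REFERENCES table_metadata(rk)
);
""")

conn.execute("INSERT INTO table_metadata VALUES (?, ?)",
             ("hive://core/users", "users"))
conn.execute("INSERT INTO table_timestamp VALUES (?, ?, ?)",
             ("hive://core/users/timestamp", 1602892800, "hive://core/users"))
conn.commit()

# Reading an entity together with its extra metadata is a join
# instead of a graph traversal.
result = conn.execute("""
    SELECT t.name, ts.last_updated
    FROM table_metadata AS t
    LEFT JOIN table_timestamp AS ts ON ts.table_rk = t.rk
""").fetchone()
print(result)
```

In this layout, adding a new kind of metadata means adding a new table (a schema change) rather than a new node type. A denormalized layout would instead put `last_updated` directly on `table_metadata`, trading the join for wider rows and more frequent changes to the entity table.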

@xuan616
Member Author

xuan616 commented Oct 19, 2020

@feng-tao Thanks for your reply. Yes, a graph DB has some advantages as a metadata store, and it is good to know about the work of @AndrewCiambrone, which will bring more generic graph DB models. For the ORM model, we will probably have to make a trade-off between normalized and denormalized models for the metadata store because of concerns such as performance and consistency. The model will also need to support schema upgrades to handle possible changes in the metadata structure. I am trying to figure out how to map the graph DB model to the ORM model, probably on top of the upcoming abstraction layer, so that I can come up with an explicit plan.
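To illustrate what "support schema upgrades" could mean in practice, here is a rough sketch of versioned, ordered migrations. It is not alembic and does not use alembic's API; it is a standard-library stand-in that shows the same idea, using SQLite's `PRAGMA user_version` to record the schema version. The `MIGRATIONS` list, the `upgrade` function, and the table layout are hypothetical.

```python
import sqlite3

# Ordered, append-only list of schema changes. The position in the list
# (1-based) is the schema version that the statement produces.
MIGRATIONS = [
    "CREATE TABLE table_metadata (rk TEXT PRIMARY KEY, name TEXT NOT NULL)",
    "ALTER TABLE table_metadata ADD COLUMN description TEXT",
]


def upgrade(conn: sqlite3.Connection) -> int:
    """Apply every migration newer than the stored version; return the new version."""
    current = conn.execute("PRAGMA user_version").fetchone()[0]
    for version, statement in enumerate(MIGRATIONS[current:], start=current + 1):
        conn.execute(statement)
        # PRAGMA does not accept bound parameters; version is always an int here.
        conn.execute(f"PRAGMA user_version = {version}")
    conn.commit()
    return conn.execute("PRAGMA user_version").fetchone()[0]


db = sqlite3.connect(":memory:")
print(upgrade(db))  # applies both migrations
print(upgrade(db))  # nothing left to apply, version is unchanged
```

A real setup with alembic would keep each change in its own revision file and track the applied revision in the target database, but the contract is the same: metadata structure changes ship as ordered upgrades that can be applied to an existing store.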

dorianj pushed a commit to dorianj/amundsen that referenced this issue Apr 25, 2021
Signed-off-by: Marcos Iglesias Valle <golodhros@gmail.com>
@stale

stale bot commented May 6, 2021

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs.

@stale stale bot added the stale label May 6, 2021
feng-tao pushed a commit that referenced this issue May 7, 2021
Signed-off-by: Marcos Iglesias Valle <golodhros@gmail.com>
@dorianj dorianj added the keep fresh Disables stalebot from closing an issue label May 19, 2021
@stale stale bot removed the stale label May 19, 2021
zacr pushed a commit to SaltIO/amundsen that referenced this issue May 13, 2022
Signed-off-by: Marcos Iglesias Valle <golodhros@gmail.com>
hansadriaans pushed a commit to DataChefHQ/amundsen that referenced this issue Jun 30, 2022
Signed-off-by: Marcos Iglesias Valle <golodhros@gmail.com>
4 participants