BestPractice
Always reuse existing APIs where they exist! In most cases there is no need to write our own, although we should remain mindful of the dependencies we are committing to.
Sometimes a streaming approach is possible (e.g. emit the triple(s) for each line, then discard the line).
In some cases it may be necessary to load all lines into an in-memory model, but in general this should be avoided.
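A minimal sketch of the streaming approach, assuming a hypothetical two-column TSV source (`id`, `label`) and a placeholder `http://example.org/` namespace. Each line is turned into N-Triples and written out immediately, so memory use does not grow with file size. The function names are illustrative only, not part of any existing codebase.

```python
import csv
import io
from typing import Iterator, List, TextIO

# Hypothetical namespace, for illustration only.
BASE = "http://example.org/"
RDFS_LABEL = "<http://www.w3.org/2000/01/rdf-schema#label>"


def line_to_triples(row: List[str]) -> Iterator[str]:
    """Turn one parsed TSV row (id, label) into N-Triples lines."""
    gene_id, label = row[0], row[1]
    subject = f"<{BASE}gene/{gene_id}>"
    # Escape backslashes and double quotes so the literal is valid N-Triples.
    escaped = label.replace("\\", "\\\\").replace('"', '\\"')
    yield f'{subject} {RDFS_LABEL} "{escaped}" .'


def stream_convert(src: TextIO, out: TextIO) -> int:
    """Write triples for each line as soon as it is read; nothing is accumulated."""
    count = 0
    # QUOTE_NONE: treat quote characters in a TSV field as ordinary text.
    for row in csv.reader(src, delimiter="\t", quoting=csv.QUOTE_NONE):
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comment lines
        for triple in line_to_triples(row):
            out.write(triple + "\n")
            count += 1
        # The row is discarded on the next iteration, so memory use stays flat.
    return count
```

For example, `stream_convert(open("genes.tsv"), sys.stdout)` would print one triple per data line. An in-memory model only becomes necessary when a triple depends on other lines (e.g. de-duplication or joins across rows).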
TODO: investigate the best library to use for XML sources.
One possibility is to use XSLT, but in my experience this leads to scalability issues; a programmatic approach is usually best.
Slurp everything into memory, or use a SAX-type (streaming) approach?
e.g. generateDS: http://pythonhosted.org/generateDS/
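As a point of comparison for the SAX-type option, Python's standard-library `xml.etree.ElementTree.iterparse` parses incrementally while still handing back convenient element objects. The sketch below assumes a hypothetical layout of `<record id="...">` elements with a `<name>` child; it is one possible streaming approach, not a recommendation over the libraries still to be investigated.

```python
import io
import xml.etree.ElementTree as ET
from typing import Iterator, Tuple


def stream_records(source, tag: str = "record") -> Iterator[Tuple[str, str]]:
    """Yield (id, name) pairs from <record> elements without keeping the full tree.

    The <record>/<name> structure is a hypothetical example layout.
    """
    # An 'end' event fires once an element and all its children are parsed.
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag == tag:
            yield elem.get("id"), (elem.findtext("name") or "")
            # Drop the processed record's children and text. The emptied
            # element stays attached to the root, which is fine for a sketch.
            elem.clear()
```

Each yielded pair could be fed straight into a triple writer such as the line-based one above, so the document never has to be held in memory as a whole, unlike a generated data-binding layer that builds the full object tree.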
It may be possible to 'convert' JSON simply by providing a JSON-LD context; the JSON then translates naturally to RDF (e.g. via Apache Jena RIOT, and possibly an equivalent in Python rdflib).
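A minimal sketch of the idea, assuming a hypothetical context that maps plain JSON keys to IRIs (`id` is aliased to `@id`). `add_context` only attaches the context; `toy_to_triples` is a deliberately tiny stand-in for a real JSON-LD processor that handles flat objects with string values and nothing else. In practice the output of `to_jsonld_text` would be handed to a real processor such as Jena RIOT or the JSON-LD parser in recent rdflib versions.

```python
import json
from typing import Dict, List, Tuple

# Hypothetical context: example.org IRIs stand in for real shared vocabularies.
CONTEXT: Dict[str, str] = {
    "id": "@id",  # keyword alias: the "id" key supplies the subject IRI
    "name": "http://www.w3.org/2000/01/rdf-schema#label",
    "symbol": "http://example.org/vocab/symbol",
}


def add_context(record: dict) -> dict:
    """Attach the JSON-LD context to a plain JSON record, leaving the data untouched."""
    return {"@context": CONTEXT, **record}


def to_jsonld_text(record: dict) -> str:
    """Serialize the record as JSON-LD text for a real processor to consume."""
    return json.dumps(add_context(record), indent=2)


def toy_to_triples(doc: dict) -> List[Tuple[str, str, str]]:
    """Toy expansion: flat objects, string values, terms defined in the context only."""
    ctx = doc["@context"]
    id_keys = {k for k, v in ctx.items() if v == "@id"} | {"@id"}
    subject = next(doc[k] for k in id_keys if k in doc)
    triples = []
    for key, value in doc.items():
        if key == "@context" or key in id_keys:
            continue
        if key in ctx:  # keys missing from the context are dropped, as in JSON-LD
            triples.append((subject, ctx[key], value))
    return triples
```

Note that the original JSON is not rewritten at all; all of the mapping knowledge lives in the context, which is what makes this route attractive compared with writing a converter.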
We want to avoid writing our own ORMs. For SQL databases we must first ask: if the database is widely used (e.g. ENSEMBL), is there an existing API we can use?
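If no existing API turns out to be available, a plain DB-API cursor can still be streamed row by row without any ORM layer. The sketch below uses `sqlite3` purely as a stand-in for the real database and assumes a hypothetical `gene(id, symbol)` table; other DB-API 2.0 drivers expose the same `cursor()`, `execute()` and `fetchmany()` calls.

```python
import sqlite3
from typing import Iterator

RDFS_LABEL = "<http://www.w3.org/2000/01/rdf-schema#label>"


def rows_to_triples(conn, batch_size: int = 1000) -> Iterator[str]:
    """Stream N-Triples from a hypothetical gene(id, symbol) table, no ORM involved."""
    cur = conn.cursor()
    cur.execute("SELECT id, symbol FROM gene ORDER BY id")
    while True:
        # Fetch in batches so the full result set is never held in memory.
        rows = cur.fetchmany(batch_size)
        if not rows:
            break
        for gene_id, symbol in rows:
            escaped = symbol.replace("\\", "\\\\").replace('"', '\\"')
            yield f'<http://example.org/gene/{gene_id}> {RDFS_LABEL} "{escaped}" .'
```

This keeps the same streaming shape as the file-based approach, with the SQL query doing the work an ORM mapping would otherwise do.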
For HTML, see Beautiful Soup. Also, if a source has already been ingested into DISCO, continue with the disco2turtle route for now.