Dendro 0.2 "Bambusa" : New identifiers, Cut/Paste operations, Social Dendro, CKAN exporting
Pre-release
Dendro v0.2-beta "Bambusa"
New features
The Dendro Research Data Management platform now provides a complete overhaul of the database, a social timeline, automatic data extraction and more!
Database and abstraction layer overhaul
- The identifiers for every resource have been migrated so that they no longer convey meaning. This was a huge change to the entire database layer, because we replaced identifiers like /project/project1/data/folder1/file2.txt with /r/file/[[UUID]]. This change was necessary because the old identifiers would break whenever a file was moved between folders. As a consequence, the API was heavily overhauled and no longer depends on regular expressions to determine the types of the resources to be modified.
- Queries now go through openlink/virtuoso7's JDBC interface. This greatly reduces the overhead of HTTP connections and JSON serialization/deserialization to and from the Virtuoso database, and results in much lower CPU usage. The original reason for the change was that Virtuoso's HTTP SPARQL endpoint would randomly throw 404 errors during tests and in operation.
- The cache was greatly optimized. Ontologies are loaded on startup and saved into the cache for faster boot-ups and access.
- Resource fetching by type now uses MongoDB as a cache to reduce accesses to Virtuoso whenever possible.
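To illustrate why opaque identifiers survive a move, here is a minimal sketch. It is not Dendro's implementation: the helpers mintUri, createFile and moveFile and the in-memory resources map are hypothetical, and storing the type on the record is an assumption made for the example.

```javascript
const { randomUUID } = require("crypto");

// Hypothetical in-memory stand-in for the stored resource records.
const resources = new Map();

// Mint an identifier of the form /r/<type>/<uuid>; it carries no path information.
function mintUri(type) {
  return `/r/${type}/${randomUUID()}`;
}

// Create a file record. Name, parent and type live in the record (an assumption),
// so nothing has to be parsed out of the identifier.
function createFile(name, parentUri) {
  const uri = mintUri("file");
  resources.set(uri, { type: "file", name, parent: parentUri });
  return uri;
}

// Moving a file only updates its parent; the identifier stays the same.
function moveFile(uri, newParentUri) {
  resources.get(uri).parent = newParentUri;
  return uri;
}
```

With a path-based identifier, the same move would have changed the identifier itself and broken every reference to it.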
Projects
- Project archival via the BagIt 0.97 format (with an accepted pull request to little9/gladstone).
- Project restore via BagIt 0.97 format: Users can import a new Dendro project from a BagIt file produced by another Dendro. Other BagIt files can be imported, but metadata in the file and folder hierarchy will not be restored.
- Projects can now be deleted.
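For readers unfamiliar with BagIt, a bag holds its payload under data/, a bagit.txt declaration, and a manifest listing one checksum per payload file. The sketch below builds those two text files with Node's crypto module. It only illustrates the format and is not how Dendro or gladstone produce bags; buildManifest and its input shape are hypothetical.

```javascript
const { createHash } = require("crypto");

// Bag declaration (bagit.txt) for a BagIt 0.97 bag.
const bagDeclaration =
  "BagIt-Version: 0.97\nTag-File-Character-Encoding: UTF-8\n";

// Build the contents of manifest-md5.txt.
// `files` maps a path relative to data/ to that file's content (string or Buffer).
function buildManifest(files) {
  return Object.entries(files)
    .map(([relativePath, content]) => {
      const md5 = createHash("md5").update(content).digest("hex");
      // One line per payload file: checksum, whitespace, path inside the bag.
      return `${md5}  data/${relativePath}`;
    })
    .join("\n");
}
```

On restore, recomputing each checksum and comparing it against the manifest is what allows a receiving instance to detect a corrupted bag.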
Files
- Files and folders can now be moved, even between projects, if the user has permission to modify both the source and target project.
- Files and folders can now be renamed
- Rewrote the file upload area interface
- Progress bars for MD5 calculation during file uploads (library pull request)
Users
- Users are now given a generated avatar by default (similar to GitHub's generated thumbnails)
- Contributors can be added to a project via an autocomplete box in the administer screen that searches for their username (no more URI pasting!)
Dataset exporting
- Exporting of a Dendro folder to a CKAN instance as a new dataset has been completely rewritten by @NelsonPereira1991 and covered by tests. Now supports dataset refreshing after the initial upload to CKAN and correct file-level metadata.
- Exporting to EUDAT's B2Share has been updated. An in-house client for the B2Share API (https://github.com/feup-infolab/node-b2share-v2) is now used to export datasets from Dendro to the platform.
Data extraction
- Dendro now automatically extracts data from certain file types (.xls, .csv, etc.) and places the data in a separate MongoDB database. One collection is created per file with data, allowing very large files (with an arbitrary number of lines) to be processed in a streaming fashion. The data previews were also overhauled and already use this new "data paging" API. Before this, they would hang whenever a large file was being previewed.
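The idea of row-by-row ingestion followed by paged reads can be sketched as below. This is an illustration under assumptions, not Dendro's API: the collections map stands in for the per-file MongoDB collections, csvRows is a naive parser that ignores quoted fields, and ingestRows and getPage are hypothetical names.

```javascript
// Hypothetical in-memory stand-in for "one collection per file".
const collections = new Map();

// Naive CSV row generator: yields one row at a time and skips blank lines.
// It does not handle quoted fields or embedded commas.
function* csvRows(text) {
  for (const line of text.split("\n")) {
    if (line.trim() !== "") {
      yield line.split(",");
    }
  }
}

// Consume rows one by one. A real system would insert each row into the
// database here instead of keeping it in an array.
function ingestRows(fileUri, rowIterable) {
  const rows = [];
  collections.set(fileUri, rows);
  for (const row of rowIterable) {
    rows.push(row);
  }
  return rows.length;
}

// Return a single page of rows (zero-based page number).
function getPage(fileUri, page, pageSize) {
  const rows = collections.get(fileUri) || [];
  const start = page * pageSize;
  return rows.slice(start, start + pageSize);
}
```

A preview then requests only the page it displays, for example getPage(uri, 0, 50), so the size of the file no longer determines how long the preview takes to respond.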
Privacy in searchable contents
- Whenever the creator of a project (its administrator) makes changes to the privacy settings of the project, all resources are reindexed automatically so that files and folders start showing up in the search results (if the new visibility is public or metadata only). If the project's visibility was changed to "private", all resources are removed from the index so that they no longer show up in search results. This functionality was covered by tests.
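The reindexing rule above reduces to a small decision: index everything when the new visibility is public or metadata only, otherwise remove everything. A minimal sketch follows, with assumptions stated plainly: the visibility strings, the shape of the project object, the Set used as a search index and the function applyPrivacyChange are all hypothetical and not taken from Dendro's code.

```javascript
// Assumed string values for the visibilities that keep resources searchable.
const SEARCHABLE_VISIBILITIES = new Set(["public", "metadata_only"]);

// Apply a privacy change to a project and update the search index to match.
// `index` is a Set of resource identifiers standing in for the real index.
function applyPrivacyChange(index, project, newVisibility) {
  project.visibility = newVisibility;
  const searchable = SEARCHABLE_VISIBILITIES.has(newVisibility);
  for (const uri of project.resources) {
    if (searchable) {
      index.add(uri); // resource becomes (or stays) visible in search results
    } else {
      index.delete(uri); // private projects leave no trace in the index
    }
  }
  return index;
}
```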
New features for administrators
- Logs are now accessible via the administration screen (/administer), allowing the Dendro administrator to see the server logs directly from the browser.
- Dendro administrators can now edit the server configuration directly from the browser using a JSON editor, and can also reboot the server to apply changes immediately. Administrators should be careful: a mistake in the configuration will leave your Dendro unusable until you fix it via SSH. Dendro automatically makes a backup of the configuration whenever you make a change, so it is a relatively easy fix.
Social Dendro
- Dendro now shows a separate timeline of interactions and uploads in the projects that a user participates in. Called Social Dendro, this extension stemmed from @NelsonPereira1991's work on his Master's thesis and has matured into an interesting view of project activity. It shows posts, comments and likes about the activity of other users in those projects. Check it out!
Code Quality
- We have set up ESLint rules and linted the entire codebase, with configurations added in the different folders of the project.
Logging
- Logging is now handled by winstonjs. We have different log levels and output to both files and the console, with separate error and info logs in the /logs folder, also separated by running environment (test/development/production).
Technical
- 53% Test Coverage
- Codacy Rating: A
- Updated dependencies to current ones
- Tests currently run in Jenkins in 3m33s on average, on a 2.5 GHz Core 2 Quad Q9300 machine with 5GB RAM and a 480GB OCZ SSD (affectionately called "Chaço", meaning "Junker" in Portuguese).