Add write_dwc() function #257

Merged
merged 39 commits into from
Dec 9, 2022

Conversation

peterdesmet
Member

Implements #256

@jreubens @jonasmortelmansvliz @damianooldoni @jdpye @sarahcd here's a first version of the write_dwc() function that transforms acoustic telemetry data to Darwin Core that can be harvested by OBIS and GBIF. It follows the recommendations we discussed, which are based on conventions introduced for Movebank data in the movepub write_dwc() function. An eml.xml file is not created.

You can test it by installing:

devtools::install_github("inbo/etn#257")

@jdpye @sarahcd this will work on LifeWatch RStudio Server only for now, but you can consult the SQL file in this PR to see the transformation.

Then:

library(etn)
con <- connect_to_etn()
write_dwc(
  animal_project_code = "2014_demer" # or other project
  # optional parameters are: directory, rights_holder, license
)

If all goes well, a dwc_occurrence.csv will be written to disk.

@peterdesmet
Member Author

peterdesmet commented Nov 23, 2022

Note: dataGeneralizations currently has the fixed value subsampled by hour. The goal is to make this subsampled by hour: first of 3 record(s) cf. Movebank.

Calculating the number of records per group is, however, much slower than the current approach. Do you consider it worthwhile to provide this information at the cost of performance? Update: currently blocked by #259.

Using the function on project 2013_albertkanaal which has 6 million records currently takes 39 secs.

@peterdesmet
Member Author

For those who cannot run the function, here's the DwC file dwc_occurrence.csv for the following query:

write_dwc(animal_project_code = "2014_demer", rights_holder = "INBO")

@peterdesmet
Member Author

peterdesmet commented Nov 29, 2022

Regarding #257 (comment):

Count per group is now included as subsampled by hour: first of 6 record(s) (cf. Movebank). Duplicates are excluded. The function takes the same amount of time as previously (35 seconds for a big project like 2013_albertkanaal).
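For readers without database access, the subsampling described above can be illustrated with a minimal base R sketch. This is not the PR's implementation (the actual transformation lives in the SQL file); the data frame, its column names (animal_id, date_time) and the grouping by animal and hour are assumptions made purely for illustration.

```r
# Hypothetical detections; column names and values are assumptions
detections <- data.frame(
  animal_id = c(1, 1, 1, 1, 2),
  date_time = as.POSIXct(
    c("2022-11-23 10:05:00", "2022-11-23 10:05:00", "2022-11-23 10:40:00",
      "2022-11-23 11:15:00", "2022-11-23 10:20:00"),
    tz = "UTC"
  )
)

# Exclude exact duplicates before counting
detections <- unique(detections)

# Sort so the first row in each group is the earliest detection
detections <- detections[order(detections$animal_id, detections$date_time), ]

# Build a group key per animal and hour
hour <- format(detections$date_time, "%Y-%m-%d %H", tz = "UTC")
group <- paste(detections$animal_id, hour)

# Number of records in each group, repeated for every row of that group
n <- ave(seq_along(group), group, FUN = length)

# Keep only the first record per group and describe the generalization
first <- !duplicated(group)
subsampled <- detections[first, ]
subsampled$dataGeneralizations <- paste0(
  "subsampled by hour: first of ", n[first], " record(s)"
)
```

In this toy input, the duplicated 10:05 detection is dropped first, so the 10:00 hour of animal 1 is counted as two records rather than three.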

@peterdesmet peterdesmet mentioned this pull request Nov 29, 2022
59 tasks
@peterdesmet
Member Author

Only a couple of pending questions to finalize this: #256 (comment)

@peterdesmet peterdesmet requested a review from PietrH December 2, 2022 08:11
@peterdesmet
Member Author

peterdesmet commented Dec 2, 2022

All Darwin Core mapping issues are resolved (#256).

@jreubens @jonasmortelmansvliz @jdpye note that the current implementation only exports an Occurrence core (cf. Movebank data), not an Extended measurement or fact extension. The latter could be added in the future, but it should then be discussed what to express in there (scope potentially huge).

This PR is pending

  • Technical review by @PietrH
  • Minor version bump
  • Rebuilding of website

The requested views have been removed from the database, see #226
Member

@PietrH PietrH left a comment

Before merging I suggest the following changes:

  • Replace expect_equal with expect_identical where appropriate
  • Allow write_dwc() to return objects rather than files

The latter is similar to inbo/camtraptor#181.
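One way such a "return objects rather than files" option could look is sketched below. The helper name write_or_return() and the convention that directory = NULL means "return instead of write" are assumptions for illustration; they are not taken from this PR's code.

```r
# Hypothetical helper, not the etn implementation:
# return the data when directory is NULL, otherwise write it to disk
write_or_return <- function(dwc_occurrence, directory = ".") {
  if (is.null(directory)) {
    # Return a named list so further data frames could be added later
    return(list(dwc_occurrence = dwc_occurrence))
  }
  if (!dir.exists(directory)) {
    dir.create(directory, recursive = TRUE)
  }
  path <- file.path(directory, "dwc_occurrence.csv")
  utils::write.csv(dwc_occurrence, path, row.names = FALSE, na = "")
  # Return the file path invisibly so the call does not print it
  invisible(path)
}

# Usage: get the data back as an object
result <- write_or_return(data.frame(a = 1), directory = NULL)

# Usage: write to a temporary directory instead
out_dir <- file.path(tempdir(), "dwc_example")
written <- write_or_return(data.frame(a = 1), directory = out_dir)
```

Returning objects makes the function easier to test, since assertions can run on the data frame directly without reading a file back from disk.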

@peterdesmet
Member Author

@PietrH all comments addressed.

@peterdesmet peterdesmet requested a review from PietrH December 9, 2022 14:38
Member

@PietrH PietrH left a comment

Looks good to me.

@peterdesmet peterdesmet merged commit 6a302d7 into main Dec 9, 2022
@peterdesmet peterdesmet deleted the dwc branch December 9, 2022 15:44
@sarahcd

sarahcd commented Jan 6, 2023

Super belatedly, I checked out the SQL and the example output dwc_occurrence (without access to the LifeWatch server); it all looks sensible to me. Happy to coordinate with ETN at any point on aligning future changes with Movebank. Thank you for all the work @peterdesmet!

Successfully merging this pull request may close these issues.

with changes to helper functions, collapse_transformer() is no longer in use
4 participants