
Wrapping around galah functions #207

Closed
fontikar opened this issue Sep 21, 2023 · 2 comments
Labels
bug Something isn't working

Comments

@fontikar

fontikar commented Sep 21, 2023

Describe the bug
Hi {galah} team 👋
Firstly I want to say, I love {galah} as an interface to GBIF nodes, I use it all the time for my work 😄
So much so that I wanted to build my own wrapper function around {galah} functions, so I don't have to type out the same query every time I download an update of the data.

My function looks like this (I have listed galah in my DESCRIPTION file so that its functions are imported):

#' Default ALA query
#'
#' @param taxa Taxon name(s) to search for, passed to galah_identify().
#' @param years Year(s) of records to keep, used in galah_filter().
query <- function(taxa, years){
  galah::galah_call() |> 
    galah::galah_identify(taxa) |> 
    galah::galah_filter(
      spatiallyValid == TRUE, 
      species != "",
      decimalLatitude != "",
      year == years,
      basisOfRecord == c("HUMAN_OBSERVATION", "PRESERVED_SPECIMEN")
    ) |> 
    galah::galah_select(
      recordID, species, genus, family, decimalLatitude, decimalLongitude, 
      coordinateUncertaintyInMeters, eventDate, datasetName, basisOfRecord, 
      references, institutionCode, recordedBy, outlierLayerCount, isDuplicateOf, sounds
    )
}

Unfortunately, my query() function returns a strange error 😞

Error in `FUN()`:
! Can't subset columns with `galah::galah_filter(...)`.
✖ `galah::galah_filter(...)` must be numeric or character, not a <tbl_df/tbl/data.frame> object.
Run `rlang::last_trace()` to see where the error occurred.

I've tried to do some digging in my reprex below:

galah version

galah_1.5.3

To Reproduce

# Calling galah functions via namespace as you would in writing a wrapper function for an R package

query <- function(taxa, years){
  galah::galah_call() |> 
    galah::galah_identify(taxa) |> 
    galah::galah_filter(
      spatiallyValid == TRUE, 
      species != "",
      decimalLatitude != "",
      year == years,
      basisOfRecord == c("HUMAN_OBSERVATION", "PRESERVED_SPECIMEN")
    ) |> 
    galah::galah_select(
      recordID, species, genus, family, decimalLatitude, decimalLongitude, 
      coordinateUncertaintyInMeters, eventDate, datasetName, basisOfRecord, 
      references, institutionCode, recordedBy, outlierLayerCount, isDuplicateOf,sounds
    )
}

# Set inputs
taxa = "Orthoptera"
years = seq(1923, 2023)

query(taxa, years)
#> Error in `FUN()`:
#> ! Can't subset columns with `galah::galah_filter(...)`.
#> ✖ `galah::galah_filter(...)` must be numeric or character, not a <tbl_df/tbl/data.frame> object.
#> Backtrace:
#>      ▆
#>   1. ├─global query(taxa, years)
#>   2. │ └─galah::galah_select(...)
#>   3. │   └─galah:::parse_select(dots, group_chosen)
#>   4. │     ├─base::unlist(...)
#>   5. │     └─base::lapply(...)
#>   6. │       └─galah (local) FUN(X[[i]], ...)
#>   7. │         └─tidyselect::eval_select(a, data = df)
#>   8. │           └─tidyselect:::eval_select_impl(...)
#>   9. │             ├─tidyselect:::with_subscript_errors(...)
#>  10. │             │ └─rlang::try_fetch(...)
#>  11. │             │   └─base::withCallingHandlers(...)
#>  12. │             └─tidyselect:::vars_select_eval(...)
#>  13. │               └─tidyselect:::walk_data_tree(expr, data_mask, context_mask)
#>  14. │                 └─tidyselect:::as_indices_sel_impl(...)
#>  15. │                   └─tidyselect:::as_indices_impl(...)
#>  16. │                     └─vctrs::vec_as_subscript(x, logical = "error", call = call, arg = arg)
#>  17. └─rlang::cnd_signal(x)

# Break up the galah query; each part works on its own
# Call
galah::galah_call() |> 
  galah::galah_identify(taxa) 
#> An object of type `data_request` containing:
#> 
#> $identify
#> # A tibble: 1 × 1
#>   identifier                                                               
#>   <chr>                                                                    
#> 1 https://biodiversity.org.au/afd/taxa/0192736e-0955-4830-9977-61e07c843b28

# Filter
galah::galah_filter(
  spatiallyValid == TRUE, 
  species != "",
  decimalLatitude != "",
  year == years,
  basisOfRecord == c("HUMAN_OBSERVATION", "PRESERVED_SPECIMEN")
) 
#> # A tibble: 5 × 4
#>   variable        logical value                                            query
#>   <chr>           <chr>   <chr>                                            <chr>
#> 1 spatiallyValid  ==      "TRUE"                                           "(sp…
#> 2 species         !=      "\"\""                                           "(sp…
#> 3 decimalLatitude !=      "\"\""                                           "(de…
#> 4 year            ==      "c(\\\"1923\\\", \\\"1924\\\", \\\"1925\\\", \\… "(ye…
#> 5 basisOfRecord   ==      "c(\\\"HUMAN_OBSERVATION\\\", \\\"PRESERVED_SPE… "(ba…

# Select
galah::galah_select(
  recordID, species, genus, family, decimalLatitude, decimalLongitude, 
  coordinateUncertaintyInMeters, eventDate, datasetName, basisOfRecord, 
  references, institutionCode, recordedBy, outlierLayerCount, isDuplicateOf, sounds
)
#> # A tibble: 16 × 2
#>    name                          type 
#>    <chr>                         <chr>
#>  1 recordID                      field
#>  2 species                       field
#>  3 genus                         field
#>  4 family                        field
#>  5 decimalLatitude               field
#>  6 decimalLongitude              field
#>  7 coordinateUncertaintyInMeters field
#>  8 eventDate                     field
#>  9 datasetName                   field
#> 10 basisOfRecord                 field
#> 11 references                    field
#> 12 institutionCode               field
#> 13 recordedBy                    field
#> 14 outlierLayerCount             field
#> 15 isDuplicateOf                 field
#> 16 sounds                        field

# Start joining the different parts together
# Identify + filter: the output is missing the identifier info and the data_request structure
galah::galah_call() |> 
  galah::galah_identify(taxa)  |> 
  galah::galah_filter(
    spatiallyValid == TRUE, 
    species != "",
    decimalLatitude != "",
    year == years,
    basisOfRecord == c("HUMAN_OBSERVATION", "PRESERVED_SPECIMEN")
  )  
#> # A tibble: 5 × 4
#>   variable        logical value                                            query
#>   <chr>           <chr>   <chr>                                            <chr>
#> 1 spatiallyValid  ==      "TRUE"                                           "(sp…
#> 2 species         !=      "\"\""                                           "(sp…
#> 3 decimalLatitude !=      "\"\""                                           "(de…
#> 4 year            ==      "c(\\\"1923\\\", \\\"1924\\\", \\\"1925\\\", \\… "(ye…
#> 5 basisOfRecord   ==      "c(\\\"HUMAN_OBSERVATION\\\", \\\"PRESERVED_SPE… "(ba…

# Filter + select
galah::galah_filter(
  spatiallyValid == TRUE, 
  species != "",
  decimalLatitude != "",
  year == years,
  basisOfRecord == c("HUMAN_OBSERVATION", "PRESERVED_SPECIMEN")
) |> 
  galah::galah_select(
    recordID, species, genus, family, decimalLatitude, decimalLongitude, 
    coordinateUncertaintyInMeters, eventDate, datasetName, basisOfRecord, 
    references, institutionCode, recordedBy, outlierLayerCount, isDuplicateOf, sounds
  )
#> Error in `FUN()`:
#> ! Can't subset columns with `galah::galah_filter(...)`.
#> ✖ `galah::galah_filter(...)` must be numeric or character, not a <tbl_df/tbl/data.frame> object.
#> Backtrace:
#>      ▆
#>   1. ├─galah::galah_select(...)
#>   2. │ └─galah:::parse_select(dots, group_chosen)
#>   3. │   ├─base::unlist(...)
#>   4. │   └─base::lapply(...)
#>   5. │     └─galah (local) FUN(X[[i]], ...)
#>   6. │       └─tidyselect::eval_select(a, data = df)
#>   7. │         └─tidyselect:::eval_select_impl(...)
#>   8. │           ├─tidyselect:::with_subscript_errors(...)
#>   9. │           │ └─rlang::try_fetch(...)
#>  10. │           │   └─base::withCallingHandlers(...)
#>  11. │           └─tidyselect:::vars_select_eval(...)
#>  12. │             └─tidyselect:::walk_data_tree(expr, data_mask, context_mask)
#>  13. │               └─tidyselect:::as_indices_sel_impl(...)
#>  14. │                 └─tidyselect:::as_indices_impl(...)
#>  15. │                   └─vctrs::vec_as_subscript(x, logical = "error", call = call, arg = arg)
#>  16. └─rlang::cnd_signal(x)

Created on 2023-09-21 with reprex v2.0.2

Expected behaviour

library(galah)
#> 
#> Attaching package: 'galah'
#> The following object is masked from 'package:stats':
#> 
#>     filter

# The same query run directly (not wrapped in a function), with galah attached
galah_call() |>                               
  galah_identify("Orthoptera") |>   
  galah_filter(
    spatiallyValid == TRUE,
    species != "",
    decimalLatitude != "",
    year == seq(1923, 2023),
    basisOfRecord == c("HUMAN_OBSERVATION", "PRESERVED_SPECIMEN")
  ) |> 
  galah_select(
    recordID, species, genus, family, decimalLatitude, decimalLongitude, 
    coordinateUncertaintyInMeters, eventDate, datasetName, basisOfRecord, 
    references, institutionCode, recordedBy, outlierLayerCount, isDuplicateOf,sounds
  )
#> An object of type `data_request` containing:
#> 
#> $identify
#> # A tibble: 1 × 1
#>   identifier                                                               
#>   <chr>                                                                    
#> 1 https://biodiversity.org.au/afd/taxa/0192736e-0955-4830-9977-61e07c843b28
#> 
#> $select
#> # A tibble: 16 × 2
#>    name                          type 
#>    <chr>                         <chr>
#>  1 recordID                      field
#>  2 species                       field
#>  3 genus                         field
#>  4 family                        field
#>  5 decimalLatitude               field
#>  6 decimalLongitude              field
#>  7 coordinateUncertaintyInMeters field
#>  8 eventDate                     field
#>  9 datasetName                   field
#> 10 basisOfRecord                 field
#> 11 references                    field
#> 12 institutionCode               field
#> 13 recordedBy                    field
#> 14 outlierLayerCount             field
#> 15 isDuplicateOf                 field
#> 16 sounds                        field
#> 
#> $filter
#> # A tibble: 5 × 4
#>   variable        logical value                                            query
#>   <chr>           <chr>   <chr>                                            <chr>
#> 1 spatiallyValid  ==      "TRUE"                                           "(sp…
#> 2 species         !=      "\"\""                                           "(sp…
#> 3 decimalLatitude !=      "\"\""                                           "(de…
#> 4 year            ==      "c(\\\"1923\\\", \\\"1924\\\", \\\"1925\\\", \\… "(ye…
#> 5 basisOfRecord   ==      "c(\\\"HUMAN_OBSERVATION\\\", \\\"PRESERVED_SPE… "(ba…

Additional context
It seems like the data_request object is not being passed from galah_identify() to galah_filter() and galah_select().
I've cross-posted this issue in my own repo here.
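A note for anyone debugging this: R's native pipe is rewritten when the code is parsed, so `x |> f(y)` is simply `f(x, y)`. The base-R sketch below does not load or call galah; it only quotes a shortened version of the namespaced pipeline and inspects it. It shows that each step's output is handed to the next function as its first positional argument. Because galah_filter() and galah_select() take their inputs through `...`, the piped-in object becomes the first element of `...`, which matches the backtrace above, where galah_select() hands `galah::galah_filter(...)` to tidyselect as if it were a column selection. The open question is therefore why that first argument is not recognised as a data_request here, not whether it is passed at all.

```r
# Base R only (needs R >= 4.1 for the native pipe); galah is NOT loaded or called.
# quote() captures the expression without evaluating it, so `taxa` and
# `years` do not need to exist.
piped <- quote(
  galah::galah_call() |>
    galah::galah_identify(taxa) |>
    galah::galah_filter(year == years)
)

# The parser has already rewritten the pipe into one nested call.
print(piped)

# The outer call is galah_filter() ...
piped[[1]]   # galah::galah_filter

# ... its first argument is the previous step of the chain ...
piped[[2]]   # galah::galah_identify(galah::galah_call(), taxa)

# ... and the filter condition only comes second.
piped[[3]]   # year == years
```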

@fontikar fontikar added the bug Something isn't working label Sep 21, 2023
@fontikar
Author

My current workaround (inspired by @shandiya) is to build each galah_ query "chunk" separately and join them manually as a list:

# Create my own data_request object
create_data_request <- function(taxa, years){
  identify <- galah::galah_call() |> 
    galah::galah_identify(taxa)
  
  filter <- galah::galah_filter(
    spatiallyValid == TRUE, 
    species != "",
    decimalLatitude != "",
    year == years,
    basisOfRecord == c("HUMAN_OBSERVATION", "PRESERVED_SPECIMEN")
  )
  
  select <- galah::galah_select(
    recordID, species, genus, family, decimalLatitude, decimalLongitude, 
    coordinateUncertaintyInMeters, eventDate, datasetName, basisOfRecord, 
    references, institutionCode, recordedBy, outlierLayerCount, isDuplicateOf,sounds
  )
  
  identify$filter <- filter
  identify$select <- select
  
  return(identify)
}

# Set inputs
taxa = "Orthoptera"
years = seq(1923, 2023)

create_data_request(taxa, years) |> galah::atlas_counts()
#> # A tibble: 1 × 1
#>   count
#>   <int>
#> 1 55936

Created on 2023-09-21 with reprex v2.0.2

@mjwestgate
Collaborator

Hi Fonti! Thanks heaps for this, and good to hear that galah is proving useful! We're working on the next release right now, and I think this has been solved already. That said, the next version isn't ready yet, so I don't have a fix that I can point you to immediately. Instead I'll put this on our work list and ping here with a solution once it's ready.

re: timelines, we should have something for you to try next week, and are aiming for release in a month or so. M
