
More careful handling of NA in tf #5

Closed
fabian-s opened this issue Dec 23, 2018 · 11 comments
Labels: help wanted (Extra attention is needed), priority, question (Further information is requested)


@fabian-s

Things to think about:

  • can regular tfd's consist entirely of NAs (cf. zoom.tfd when zooming into an interval between two adjacent arg values)? -> currently not
  • should it be possible to explicitly define NA values for regions / args where measurements are known to be invalid or not applicable? -> currently not possible; can be handled by using suitable evaluators that don't interpolate there but yield NAs instead.

also: needs explicit documentation.
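A minimal base-R sketch of such an evaluator, under stated assumptions: the name approx_linear_gapped is hypothetical and not part of tf, and the (x, arg, evaluations) argument order is only assumed to mirror evaluators like tf_approx_linear (check tf's documentation before plugging it in). It interpolates linearly but returns NA inside gaps of the grid wider than max_gap and outside the observed range, so regions known to be invalid stay NA instead of being bridged.

```r
# Hypothetical evaluator (not part of tf): linear interpolation that yields NA
# inside gaps of `arg` wider than `max_gap` and outside the range of `arg`.
approx_linear_gapped <- function(x, arg, evaluations, max_gap = Inf) {
  # rule = 1: no extrapolation, values outside range(arg) become NA
  y <- stats::approx(arg, evaluations, xout = x, method = "linear", rule = 1)$y
  # index of the last grid point <= x (0 if x lies left of the grid)
  left <- findInterval(x, arg)
  inside <- left >= 1 & left < length(arg)
  gap <- rep(0, length(x))
  gap[inside] <- arg[left[inside] + 1] - arg[left[inside]]
  # blank out interpolated values in wide gaps, but keep observed points
  y[inside & gap > max_gap & !(x %in% arg)] <- NA
  y
}

arg <- c(0, 0.1, 0.2, 0.6, 0.7)
evaluations <- 10 * arg
approx_linear_gapped(c(0.05, 0.4, 0.65, 0.2, 0.8), arg, evaluations, max_gap = 0.15)
```

Here 0.4 falls into the wide gap between 0.2 and 0.6 and becomes NA, 0.8 lies beyond the grid and stays NA, while 0.2 itself is an observed point and is kept.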

fabian-s added the help wanted and question labels on Dec 23, 2018
@fabian-s

> mean(data_irreg)
tfd[1] on (1,100) based on Inf to -Inf (mean: NaN) evaluations each
inter-/extrapolation by tf_approx_linear 
[1]: NA

this is really ugly; the result should always keep the domain etc.

fabian-s transferred this issue from tidyfun/tidyfun on May 10, 2022
@fabian-s

  • would boil down to allowing different domains within a tf vector; don't want that

  • need more documentation / tests

@fabian-s

  • what should mean/sd etc. actually do for irregular inputs?
> f <- tf_sparsify(tf_rgp(4))
> f
tfd[4] on (0,1) based on 24 to 29 (mean: 26) evaluations each
inter-/extrapolation by tf_approx_linear 
[1]: (0.00, 0.68);(0.02, 0.81);(0.04, 0.95); ...
[2]: (0.00, -2.4);(0.04, -2.6);(0.08, -2.8); ...
[3]: (0.00, 0.79);(0.02, 0.82);(0.06, 0.68); ...
[4]: (0.02,-0.84);(0.04,-0.76);(0.06,-0.70); ...
> mean(f)
tfd[1] on (0,1) based on 5 to 5 (mean: 5) evaluations each
inter-/extrapolation by tf_approx_linear 
[1]: (0.08,-0.41);(0.18,-0.39);(0.54, 0.13); ...
  • check how irregular the grids are & warn if pointwise operations are a bad idea
  • message about interpolating to a common grid first
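One possible shape for these last two points, as a base-R sketch only: mean_on_common_grid() is a hypothetical helper (not tf's API), curves are represented as plain lists of arg/value vectors, and "share of arg values observed in every curve" with a 50% threshold is an assumed, deliberately crude irregularity measure. It messages when the grids barely overlap, then interpolates every curve to a user-supplied common grid before taking the pointwise mean.

```r
# Hypothetical helper (not part of tf): pointwise mean of irregularly observed
# curves, each given as list(arg = <numeric>, value = <numeric>).
mean_on_common_grid <- function(curves, grid, min_shared = 0.5) {
  args <- lapply(curves, `[[`, "arg")
  # crude irregularity measure: share of distinct arg values observed in every curve
  shared <- Reduce(intersect, args)
  share <- length(shared) / length(unique(unlist(args)))
  if (share < min_shared) {
    message("Grids share only ", round(100 * share), "% of their arg values; ",
            "interpolating all curves to a common grid before averaging.")
  }
  # linear interpolation onto `grid`, NA where a curve is not observed (no extrapolation)
  values <- vapply(
    curves,
    function(f) stats::approx(f$arg, f$value, xout = grid, rule = 1)$y,
    numeric(length(grid))
  )
  # pointwise mean over the curves available at each grid point (NaN if none are)
  rowMeans(matrix(values, nrow = length(grid)), na.rm = TRUE)
}

curves <- list(
  list(arg = c(0, 0.5, 1), value = c(0, 1, 2)),
  list(arg = c(0, 0.25, 0.75, 1), value = c(2, 2, 2, 2))
)
mean_on_common_grid(curves, grid = c(0, 0.5, 1))
```

With these two curves only 2 of 5 distinct arg values are shared, so the message fires before averaging. Interpolation is linear here purely for illustration; which inter-/extrapolation is appropriate is exactly the open question in this thread.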

@fabian-s

fabian-s commented Jan 8, 2024

also this:

library(tf)
#> 
#> Attaching package: 'tf'
#> The following objects are masked from 'package:stats':
#> 
#>     sd, var

d = data.frame(time = 1, value = NA_real_, id = "1")

x = tfd(d, arg = "time", value = "value", id = "id")

x
#> tfd[0] on (NA,NA)
#> Error in attr(f, "arg")[[1]]: subscript out of bounds

fabian-s added this to the "put it on CRAN" milestone on Jan 8, 2024
@fabian-s

the current implementation (branch 5-NAhandling @ c7e351cf) takes care of these, mostly by being more careful about when to return an "empty prototype" and of what kind; see also the comment on #33

@jeff-goldsmith: same request as in the other issue: have you come across other problems in this vein? If not, I'm inclined to close this as done for now.

@jeff-goldsmith

I think the three scenarios I mentioned in the other issue would apply here (irregular device wear times -> some overlap; irregular sampling times in ambulatory BP -> not a lot of overlap; data after registration -> no overlap).

Is the message to users effectively "we're not going to assume a particular model or approach if you have irregular data"? That's probably reasonable and fair, although it might be frustrating in some cases...

@fabian-s

> Is message to users effectively "we're not going to assume a particular model or approach if you have irregular data"? That's probably reasonable and fair, although it might be frustrating in some cases...

I think so, yes. Let's see if we can come up with ideas that would make it less frustrating without making too many assumptions about how to inter-/extrapolate irregular data.

@fabian-s

fabian-s commented Feb 22, 2024

these also seem somewhat suboptimal:

> x <- tf_rgp(2, arg = seq(0, 1, length.out = 11))
> (x*NA)[1]
tfd[1] on (0,1) based on 11 evaluations each
interpolation by tf_approx_linear 
1: NA
> c(x, NA)[3]
tfd[1] on (0,1) based on 11 evaluations each
interpolation by tf_approx_linear 
[1]: (0.0,NULL);(0.1,NULL);(0.2,NULL); ...
> str(c(NA, x))
List of 3
 $  : logi NA
 $ 1: num [1:11] 4.156 3.667 2.643 1.558 0.736 ...
 $ 2: num [1:11] -0.0453 -0.389 -0.6657 -0.91 -0.7647 ...

also related to / affected by #77

@fabian-s

re #77: see vctrs::vec_detect_missing(), vec_any_missing()
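For reference, how these two helpers treat list-based vectors, which is the relevant case since tf vectors are stored as lists of evaluations (see the str() output above): a list element counts as missing only if it is NULL, so an element that is a bare NA is not detected. A minimal sketch on plain lists, assuming a vctrs version that provides both functions:

```r
library(vctrs)

# plain lists standing in for list-based storage: a NULL element vs. an NA element
x <- list(c(1, 2), NULL, NA)

vec_detect_missing(x)               # per-element flags: only the NULL element is missing
vec_any_missing(x)                  # TRUE because of the NULL element
vec_any_missing(list(c(1, 2), NA))  # an NA element alone does not count as missing
```

So whether an "NA function" in a tf vector is picked up by these helpers depends on it being represented as NULL (or handled by a class-specific method), not on it containing NA values.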

@fabian-s

NB: using vec_c() instead of c() gives the desired behavior for c(NA, x) etc.
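Why this happens, sketched with a toy vctrs class standing in for a tfd (the class name "toy" is made up for illustration): base c() dispatches on its first argument only, so c(NA, x) ends up in c.default(), which drops the class and other attributes of x, whereas vctrs::vec_c() first determines the common type, treats the lone logical NA as an unspecified missing value and casts it to a missing element of that type.

```r
library(vctrs)

# toy vctrs class standing in for a tf vector (hypothetical, for illustration)
x <- new_vctr(c(1.5, 2.5), class = "toy")

# base c() dispatches on its first argument: a logical NA selects c.default(),
# which strips the "toy" class and attributes
class(c(NA, x))

# vec_c() finds the common type first; the lone NA is "unspecified" and is
# cast to a missing toy element, so the class survives
y <- vec_c(NA, x)
class(y)
vec_detect_missing(y)
```

c(x, NA) does dispatch to the class's own c() method because x comes first, which is why only the NA-first order falls back to a bare list in the str() output above.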

@fabian-s

most of this seems fixed to me, will open issues for specific edge cases as they come up
