Discrepancy in MVPA between different input methods #1178
Comments
I assumed that GGIR uses the same code for both data formats.
Note that saving numeric data to csv can in itself introduce some tiny rounding errors. |
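The rounding effect mentioned above can be seen with base R alone. The sketch below is only an illustration and is not GGIR code: it writes a few made-up acceleration values to a temporary CSV with write.csv, reads them back, and compares the two. It assumes the default write.csv behaviour of storing at most 15 significant digits.

```r
# Made-up acceleration values in g; 1/3 and -2/3 need more than 15 significant digits.
acc <- data.frame(x = c(1/3, 0.1, -2/3))

csv_file <- tempfile(fileext = ".csv")
write.csv(acc, csv_file, row.names = FALSE)
acc_back <- read.csv(csv_file)

exact_match  <- identical(acc$x, acc_back$x)         # bit-for-bit comparison
near_match   <- isTRUE(all.equal(acc$x, acc_back$x)) # comparison with a tolerance
max_abs_diff <- max(abs(acc$x - acc_back$x))         # size of the rounding error

print(c(exact_match = exact_match, near_match = near_match))
print(max_abs_diff)
unlink(csv_file)
```

If a CSV export rounds to fewer digits than this, the error grows accordingly.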
@pinweichen have you tried running one of the data files where you see this issue through the latest version of GGIR? I know you said you need GGIR 3.0-0 for your project, but there have been a few changes to raw data handling since last October when that version came out. |
If you insist on using an older version of GGIR then you also have to live with all the bugs and inconsistencies it had. If you want the issue to be fixed, use the latest GGIR version. If the issue is still present in the latest GGIR version then please clarify, as this is currently unclear to me. |
Hi Vincent and Lena,
I adapted my custom csv file to the current version of the read.myacc.csv function in GGIR (v. 3.1-2). With no modification to the package, I ran my csv through GGIR and compared it with the gt3x run. However, the part 1 run became much slower, and the same problem persists: the sphere calibration used much more data than when reading gt3x, and the MVPA is incorrect compared to the gt3x version.
In addition, the new result contains data that extends beyond the original recording. The results show that imputation was done even though rmc.imputegaps = FALSE and rmc.doresample = FALSE. This additional-data error only exists in GGIR 3.1-2, not in GGIR 3.0-0, when the same data was run.
Here is an example of the data (attached) that I fed into GGIR. The data is usually 7 days or 14 days long. I would like to check with you if these header settings and the format are correct. I'm happy to provide the original data if that helps debug this.
Here are the parameter settings:
do.cal = T,
------------Custom csv settings------------
rmc.firstrow.header = 1,
------------
strategy = 1, def.noc.sleep = 1, threshold.lig = c(30), threshold.mod = c(100), threshold.vig = c(400),
Visual report#=====================
Here is the GGIR 3.1-2 gt3x original report.
Thank you again for your help.
Best Regards, |
Hi Benny, I now see this issue is still open. Can you share the actual csv and gt3x file with me? |
Yes, I can. Sorry for the delay. I will send it to your dropbox if that works. |
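Thanks for sharing the file. It seems you are not storing the data in a format that function read.myacc.csv can understand, and the timezone specification is also not correct. Please always first check that read.myacc.csv on its own generates correct output.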
If you want me to investigate further, then I am happy to do that as a paid consultancy as this goes beyond simple bug fixing and user support that I try to do for free (it is free software but I need to pay my bills at the end of the month). |
Hi Dr. van Hees,
Thank you for taking the time to examine this and sorry for this delayed
reply.
1) Regarding the timestamps, I forgot to mention that I set the parameter
rmc.format.time = "%y-%m-%dT%H:%M:%OS%z" and was able to run that. However,
I also tried your explanation in the document and could run GGIR with no
problem.
2) For the timezone, I tried "America/New_York", "EST", and "EST5EDT" and
it seemed to run just fine.
3) There are 2 reasons why I'm testing the gt3x to CSV file.
First, the recent update on the ActiGraph CenterPoint API system allowed us
to download raw CSV files (not gt3x) one day per file. However, these
files can't measure sleep metrics properly in GGIR because they were cut
into one day per file. We can't load them into the GGIR without combining
multiple files into one (usually 14 files for 14 days). Because I'm
modifying the header, I want to make sure the result I loaded as a CSV file
will be the same as the gt3x file. Hence, I created this experiment. If
GGIR could be adapted to read the latest CenterPoint raw csv format (one
day per file), I can provide some example data.
Second, Jonathan Mitchell from UPenn and I at Children's Hospital of
Philadelphia are creating an internal actigraphy platform for the
researchers at our hospital. We are loading GGIR onto the platform as the
go-to algorithm for the researchers. However, we wish to make this platform
modular so that we can incorporate other algorithms such as some machine
learning sleep metrics or MIMS units. We are trying to understand the
preprocessing steps and how they can be varied so we have some standard
processing across different algorithms. If this platform setup goes well,
we will encourage more healthcare researchers to use GGIR for their
actigraphy analysis. In the future, we will submit grants to NIH for the
support of this work with you as the paid consultant if you are interested.
Thank you again for your help.
Best,
Benny Pin-Wei Chen
…On Wed, Oct 2, 2024 at 5:35 AM Vincent van Hees ***@***.***> wrote:
Thanks for sharing the file. It seems you are not storing the data in
a format that function read.myacc.csv can understand and also the timezone
specification is not correct.
Please always first check that read.myacc.csv on its own generates
correct output. In this test file
<https://github.com/wadpac/GGIR/blob/master/tests/testthat/test_read.myacc.csv.R>
and in the documentation
<https://wadpac.github.io/GGIR/articles/readmyacccsv.html#usage-of-the-read-myacc-csv-function>
you will find examples. There is no point in comparing MVPA estimates and
all the other aspects of the GGIR output without first reviewing this
initial file reading step.
- Your timestamps are in format 2023-09-06T08:00:00.000-0400 while the
  documentation does not indicate that this format can be read by
  read.myacc.csv. I have tried using your parameter set as input to
  read.myacc.csv and for me the timestamps in the output do not match
  the timestamps in the data. If you want this timestamp format to be
  recognisable then I am happy to work on that as a paid consultancy.
- You are specifying rmc.desiredtz while the function gives a warning
  that this argument will be deprecated, please use desiredtz. It looks
  like this needs to be updated in the documentation. Further, you are
  setting rmc.desiredtz to "EST5EDT" which is not a valid timezone
  definition, see documentation and examples for valid values. In your case
  it may need to be "America/New_York" or similar, because that allows GGIR
  to automatically account for DST.
- Further, I am still not sure I understand why you want to convert gt3x
  to csv. You only risk inconsistencies without a clear advantage for
  research. For deriving calibration coefficients GGIR does not use a
  specific standardised amount of data across data formats as we define data
  volumes in different ways across formats (rows, pages, seconds, blocks),
  which will explain some (minor) differences in derived calibration values
  and by that in all acceleration values. It could be standardised but I
  never did as this would be extra work without a clear advantage for
  research. So, if you want to compare and take out this influence then maybe
  turn calibration off with do.cal = FALSE.
If you want me to investigate further, then I am happy to do that as a
paid consultancy as this goes beyond simple bug fixing and user support
that I try to do for free (it is free software but I need to pay my bills
at the end of the month).
|
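The timestamp and timezone points raised in this exchange can be checked in base R before involving GGIR at all. The sketch below parses the timestamp format quoted in the thread with as.POSIXct and shows how a region-based timezone name handles daylight saving time. It says nothing about what read.myacc.csv itself accepts through rmc.format.time, which is a separate question; the two test dates are arbitrary choices.

```r
# Example timestamp in the format quoted in this thread.
ts <- "2023-09-06T08:00:00.000-0400"

# %Y = four-digit year, %OS = seconds with fraction, %z = UTC offset such as -0400.
parsed <- as.POSIXct(ts, format = "%Y-%m-%dT%H:%M:%OS%z", tz = "UTC")
print(parsed)  # displayed in UTC

# %y expects a two-digit year, so the same string fails to parse and yields NA.
parsed_y <- as.POSIXct(ts, format = "%y-%m-%dT%H:%M:%OS%z", tz = "UTC")
print(is.na(parsed_y))

# A region-based zone name lets R apply daylight saving time automatically.
tz_name <- "America/New_York"
print(tz_name %in% OlsonNames())
winter <- format(as.POSIXct("2023-01-15 12:00:00", tz = tz_name), "%z")
summer <- format(as.POSIXct("2023-07-15 12:00:00", tz = tz_name), "%z")
print(c(winter = winter, summer = summer))
```

Comparing a handful of parsed values against the raw file in this way is a cheap first check that the time column is being interpreted as intended.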
In relation to the split recordings, did you consider parameter maxrecordinginterval? This was implemented for exactly the kind of scenario you describe where shorter recordings need to be appended.
Sounds good about the platform. Let me know if you run into any issues. I am generally very happy to accommodate other algorithms and research interests where possible. The difficulty I had with MIMSunit was its speed, which I tried to work on for a short period (mHealthGroup/MIMSunit#36), but with extra efforts by the community it may be possible to make it sufficiently fast to be used more widely. |
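A minimal sketch of how maxrecordinginterval might be passed to GGIR follows. The folder paths are hypothetical and the value 1 is an assumption chosen only for illustration; the parameter documentation should be consulted for its unit and exact behaviour. The call is guarded so that nothing runs unless GGIR is installed and the data folder exists.

```r
# Hypothetical paths; replace with your own folders.
datadir   <- "path/to/daily_recordings"
outputdir <- "path/to/output"

if (requireNamespace("GGIR", quietly = TRUE) && dir.exists(datadir)) {
  GGIR::GGIR(
    datadir = datadir,
    outputdir = outputdir,
    maxrecordinginterval = 1  # assumed value; see the parameter documentation
  )
}
```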
Thank you, Dr. van Hees. I was able to use maxrecordinginterval for data
directly downloaded from CenterPoint UI system, which still contains the
ActiGraph header. But those downloading from CenterPoint API usually don't
have a header. I was able to use rmc settings we discussed previously to
read them into GGIR after I combined them. I will conduct another round of
tests to make sure the two download methods do not produce different
results. I can keep you updated on this if you are interested.
For the MIMSunit adaptation, we were trying to keep GGIR analysis and the
MIMSunit algorithm parallel. This is because MIMSunit usually pairs with
SWaN software to label sleep, wake, and nonwear. In your MIMS unit
adaptation, did you keep the non-wear and sleep algorithm which is based on
the ENMO units separate from the MIMS unit calculation? If our platform
wants to make them interchangeable, I was wondering if your team has made
some progress already. I'm happy to contribute to this part of the work if
you allow it.
Best,
Benny
…On Thu, Oct 24, 2024 at 2:16 AM Vincent van Hees ***@***.***> wrote:
In relation to the split recordings, did you consider parameter
maxrecordinginterval
<https://wadpac.github.io/GGIR/articles/GGIRParameters.html#maxrecordinginterval>?
This was implemented for exactly the kind of scenario you describe where
shorter recordings need to be appended.
Sounds good about the platform. Let me know if you run into any issues. I
am generally very happy to accommodate other algorithms and research
interests where possible. The difficulty I had with MIMSunit was its speed,
which I tried to work on (mHealthGroup/MIMSunit#36), but with extra
efforts by the community it may be possible to make it sufficiently fast to
be used more widely.
Best, Vincent
|
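Combining headerless one-day-per-file CSVs, as described above, can be sketched in base R. The file layout here (a UNIX time column plus three axes) and all values are made up for illustration; real CenterPoint API files may be structured differently. The checks at the end are the kind of sanity test that seems worth running before passing a combined file to GGIR.

```r
# Build two small made-up daily files in a fresh temporary folder.
tmp_dir <- tempfile("daily_csv_demo")
dir.create(tmp_dir)
day1 <- data.frame(time = 1694000000L + 0:2, x = c(0.01, 0.02, 0.03), y = 0, z = 1)
day2 <- data.frame(time = 1694086400L + 0:2, x = c(0.04, 0.05, 0.06), y = 0, z = 1)
write.csv(day2, file.path(tmp_dir, "day2.csv"), row.names = FALSE)
write.csv(day1, file.path(tmp_dir, "day1.csv"), row.names = FALSE)

# Read every daily file, stack them, and sort by time.
files <- list.files(tmp_dir, pattern = "\\.csv$", full.names = TRUE)
combined <- do.call(rbind, lapply(files, read.csv))
combined <- combined[order(combined$time), ]
rownames(combined) <- NULL

# Sanity checks: duplicated timestamps and the largest gap between samples.
has_duplicates <- any(duplicated(combined$time))
gaps <- diff(combined$time)
print(nrow(combined))
print(has_duplicates)
print(max(gaps))  # seconds between the end of day 1 and the start of day 2

# Write the combined recording outside the input folder.
combined_file <- tempfile(fileext = ".csv")
write.csv(combined, combined_file, row.names = FALSE)
```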
I only focussed on trying to speed up the MIMS units calculation outside GGIR, which happens based on raw data only and does not involve sleep or nonwear detection. Please note that GGIR does not use ENMO for sleep or nonwear detection.
Just to clarify that I work alone as freelancer. GGIR as open-source project benefits from other contributors, but I do not think I can call them my team. I have not looked at MIMSunit since that exchange in 2021, but one group reached out to me recently to ask for the possibility to hire me to pick up this work. I have given them a quote and they will try to get funding for it. |
Thank you, Dr. van Hees. Do you mind answering a few questions on sleep
detection and non-wear detection? We are currently writing a GGIR-based
sleep algorithms validation paper (reference with PSG). I would like to
clarify a few things about your comment that "GGIR does not use ENMO for
sleep or nonwear detection."
(1) Epoch-by-Epoch comparison between GGIR-based sleep algorithms (i.e.,
vanHees2015, Cole-Kripke, Sadeh) and PSG-based scoring.
For this purpose, I used the sleep and wake labels from sib.cla.sum in the
ms3.out folder and compared them with the 30-second PSG data. Is this the
proper data from which to extract epoch-by-epoch GGIR data? Does this data
take into account the Sustained Inactivity Bout Detection?
(2) We know that non-wear and sleep were labeled separately. However, in
the g.sib.det function, non-wear labels were used. If non-wear time is
detected between sleep onset and wake onset, will it be removed from the
final sleep estimates in the part 4 reports, such as total sleep time and
WASO? If so, can we use the ignorenonwear parameter to recalculate the
sleep time without being affected by the non-wear label?
(3) We tried to create a diagram to explain the GGIR sleep and non-wear
process to the readers. We want to make sure we are describing GGIR
correctly. We are wondering if you can provide comments on the figure
below. Specifically, do the sleep algorithms like CK or your 2015
algorithm only search for sleep-wake within the Sustained Inactivity Bouts
(sleep period: red dotted lines)? The paper is still in production and we
welcome any comments.
[image: Slide1.jpeg]
Thank you.
Best,
Benny
…On Fri, Nov 8, 2024 at 3:18 AM Vincent van Hees ***@***.***> wrote:
> In your MIMS unit adaptation, did you keep the non-wear and sleep
> algorithm which is based on the ENMO units separate from the MIMS unit
> calculation?
I only focussed on trying to speed up the MIMS units calculation outside
GGIR, which happens based on raw data only and does not involve sleep or
nonwear detection. Please note that GGIR does not use ENMO for sleep or
nonwear detection.
> I was wondering if your team has made some progress already.
Just to clarify that I work alone as freelancer. GGIR as open-source
project benefits from other contributors, but I do not think I can call
them my team. I have not looked at MIMSunit since that exchange in 2021,
but one group reached out to me recently to ask for the possibility to hire
me to pick up this work. I have given them a quote and they will try to get
funding for it.
|
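The epoch-by-epoch comparison described in question (1) comes down to counting agreements between two label vectors on the same time grid. The sketch below uses made-up labels for ten 30-second epochs and base R only; it does not assume any particular GGIR output file as the source of the accelerometer labels.

```r
# Made-up labels for ten 30-second epochs: 1 = sleep, 0 = wake.
psg <- c(1, 1, 1, 0, 0, 1, 1, 0, 1, 1)  # reference scoring
acc <- c(1, 1, 0, 0, 1, 1, 1, 0, 1, 1)  # accelerometer-based classification

tp <- sum(acc == 1 & psg == 1)  # sleep in both
tn <- sum(acc == 0 & psg == 0)  # wake in both
fp <- sum(acc == 1 & psg == 0)  # scored sleep, reference says wake
fn <- sum(acc == 0 & psg == 1)  # scored wake, reference says sleep

accuracy    <- (tp + tn) / length(psg)
sensitivity <- tp / (tp + fn)  # proportion of reference sleep detected
specificity <- tn / (tn + fp)  # proportion of reference wake detected

print(round(c(accuracy = accuracy, sensitivity = sensitivity, specificity = specificity), 3))
```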
You will not find sleep classification in ms3.out folder, because sleep / daytime rest distinction only happens in part 4. Part 3 output only has the sustained inactivity bouts (rest periods). Part 5 has the option to export the time series with all information (sleep/nonwear) included, and these time series are consistent with the classifications used in part 4 and 5. I am overwhelmed at the moment by people reaching out to me to ask for free help and advice with their research. GGIR is open-source and I cannot be taking responsibility for everyone's problems without payment. Happy to help you as part of a paid consultancy. Alternatively, see https://wadpac.github.io/GGIR/ for elaborate documentation on how GGIR works. |
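The part 5 time series export mentioned above is controlled through GGIR parameters. The sketch below shows one way to request it, under the assumption that save_ms5rawlevels, save_ms5raw_format and save_ms5raw_without_invalid behave as described in the GGIR parameter documentation; the paths are hypothetical, and the call is guarded so that it only runs when GGIR and the data folder are available.

```r
# Hypothetical paths; replace with your own folders.
datadir   <- "path/to/recordings"
outputdir <- "path/to/output"

if (requireNamespace("GGIR", quietly = TRUE) && dir.exists(datadir)) {
  GGIR::GGIR(
    datadir = datadir,
    outputdir = outputdir,
    save_ms5rawlevels = TRUE,            # export the part 5 time series
    save_ms5raw_format = "csv",          # write them as csv files
    save_ms5raw_without_invalid = FALSE  # keep invalid (e.g. non-wear) periods in the export
  )
}
```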
Hi Dr. van Hees,
Thank you for your help. We'd like to acquire your consultation service.
Would you mind providing an estimate for the following scope?
The scope of this service will focus on understanding the GGIR methods for
our upcoming publication. We want to make sure that we are describing the
GGIR method correctly as well as asking some questions regarding the
parameter settings. We also hope that you can review a figure we create for
GGIR methods.
Thank you.
Best,
Benny
…On Mon, Nov 11, 2024 at 4:42 AM Vincent van Hees ***@***.***> wrote:
You will not find sleep classification in ms3.out folder, because sleep /
daytime rest distinction only happens in part 4. Part 3 output only has the
sustained inactivity bouts (rest periods).
Part 5 has the option to export the time series with all information
(sleep/nonwear) included, and these time series are consistent with the
classifications used in part 4 and 5.
I am overwhelmed at the moment by people reaching out to me to ask for
free help and advice with their research. GGIR is open-source and I cannot
be taking responsibility for everyone's problems without payment. Happy to
help you as part of a paid consultancy. Alternatively, see
https://wadpac.github.io/GGIR/ for elaborate documentation on how GGIR
works.
|
Hi there,
I was testing the read.myacc.csv functionality of GGIR and noticed a discrepancy in the results for the same file depending on whether I input it as gt3x or as a customized csv. I am building a customized pipeline to organize data from different actigraphy devices with similar preprocessing steps (e.g., resampling, imputation, or calibration).
Here are my two comparisons:
(1) I input gt3x files directly into GGIR and get one result.
(2) I converted a gt3x file into a CSV using read.gt3x::read.gt3x. Then I placed a one-line header on top that includes the information needed for the "rmc." parameters. I was able to run it through GGIR and obtained a result.
However, I noticed there is a discrepancy in MVPA values between the two input methods. The intensity is larger when I specify rmc.doresample = T and rmc.check4timegaps = T. If I use the approx function to impute all time gaps myself and turn off rmc.doresample and rmc.check4timegaps, the intensity becomes so small that in part 2 there is barely any MVPA.
My question is whether there are any calibration or normalization steps that exist for gt3x but that I'm missing for the customized CSV files. I know there is a CSV input for ActiGraph data. However, I would like to standardize some preprocessing for all data from different actigraphy brands. I want to use the custom CSV input for GGIR and produce results similar to those I get when I input gt3x directly.
I borrowed the debug issue format here.
To Reproduce
version of GGIR (3.0-0). We started the project when this version was available.
Sensor brand: ActiGraph
Data format: customized CSV with imputation steps using approx function
Approximate recording duration 7 days
Are you using a sleep diary to guide sleep detection: NO
Copy of R command used:
I customized some parts of the rmc. functions to fit the header name reading of my customized csv.
rmc.firstrow.header = 1,
rmc.header.length = 1,
rmc.firstrow.acc = 2, # first row is header
rmc.col.time = 1,
rmc.col.acc = 2:4,
rmc.unit.time = "UNIXsec",
rmc.headername.sf = "Sampling_frequency",
rmc.headername.sn = "sensor_type",
rmc.headername.recordingid = "filename",
rmc.header.structure = "std",
rmc.doresample = T,
rmc.check4timegaps = T
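Before running the full pipeline, the settings above can be passed to read.myacc.csv on its own to inspect what the reading step returns. The sketch below copies the values from this report; the file path and the rmc.nrow value are hypothetical, and because the report mentions customised rmc functions, some values (for example rmc.header.structure = "std") may not work unchanged with a stock GGIR installation. The call is guarded so that it only runs when GGIR and the file are available.

```r
csv_file <- "path/to/custom_file.csv"  # hypothetical path

if (requireNamespace("GGIR", quietly = TRUE) && file.exists(csv_file)) {
  out <- GGIR::read.myacc.csv(
    rmc.file = csv_file,
    rmc.nrow = 1000,               # assumed: read only a small chunk for inspection
    rmc.firstrow.header = 1,
    rmc.header.length = 1,
    rmc.firstrow.acc = 2,          # first row is the header
    rmc.col.time = 1,
    rmc.col.acc = 2:4,
    rmc.unit.time = "UNIXsec",
    rmc.headername.sf = "Sampling_frequency",
    rmc.headername.sn = "sensor_type",
    rmc.headername.recordingid = "filename",
    rmc.header.structure = "std",  # value from the report; may depend on the customised code
    rmc.doresample = TRUE,
    rmc.check4timegaps = TRUE
  )
  # Inspect the structure of the returned object, then compare timestamps
  # and acceleration values against the raw file by eye.
  str(out, max.level = 1)
}
```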
Have you tried processing your data based on GGIR's default argument values? Does the issue you report still appear?
Yes, I have. The file can run. The results are different.
Expected behavior
I'm hoping to have the same GGIR results between the gt3x direct input and the customized csv of the same file.
I provide config files if that helps.
The original gt3x input
config_direct_input.csv
The customized csv with all time gaps imputed
config_custom_impute.csv
The customized csv without customized impute but turned on rmc.doresample and rmc.check4timegaps.
config_custom_csv_resample_on_timegap_on.csv
I can also provide example data and output folders if you need them.
Desktop (please complete the following information):
Thank you very much.