Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Calculate Medicare Hierarchical Condition Categories HCC #87

Merged
merged 24 commits into from
Aug 31, 2016
Merged

Calculate Medicare Hierarchical Condition Categories HCC #87

merged 24 commits into from
Aug 31, 2016

Conversation

anobel
Copy link
Collaborator

@anobel anobel commented Apr 27, 2016

addresses issue #31

New Function

icd_comorbid_cc : assigns CCs and then applies hierarchy to generate HCCs. Slightly different than the current icd_comorbid functions in a few ways.

  • Uses crosswalk stored in a dataframe, as opposed to the generated comorbidity list mappings.
  • The imported dataframe has columns for year and icd version, as the mappings change over time. The function therefore needs a date input as well.
  • The CMS specification lists out each ICD, instead of the parent codes, so no expansion is done. I have not checked if all the children of each code are present, so I'm not sure it is wise to condense the ICD children to parents and back

New Data

Because we can't directly download the CMS data (stored in nested zip files, would be a headache to navigate through from within the function), saved the relevant files (crosswalks and SAS code) as plain text and parse it into data frames for use. The raw data, code for parsing 'parse-hcc.R', and final .RData objects all there. This is a also a bit different than the approach used for the other mappings, so I kept it separate from the parse-comorbid.R file.

To Do

  • Labels for HCCs have been included in the mappingNames.R file, but have not been assigned to any HCCs yet. Also need to manually create abbreviations (there are 189 HCCs!)

@jackwasey
Copy link
Owner

Thanks for your significant contribution. I'll certainly add you as an author.

We'll have to fix up a few things: firstly it will need some tests before I merge it. Also, I've been very strict with myself regarding code formatting, hence the automated lintr comments, and I think being brutally consistent has helped keep the code clear: this is why the Wercker build fails. In Rstudio, select all, then ctrl-shift-A will probably get you most of the way there. ctrl-alt-shift-D will give formatting diagnostics which are similar to the lintr output. devotols::lint() replicates what Wercker does.

The raw data text files should go in inst/data-raw and these are then included in the github repo, but excluded from the built package. The processed data then goes into data, as you did. The call to generate the data then goes in icd:::update_everything.

The travis build fails with an error in the examples:
"Error in as.Date(x[[date]]) : object 'admitdate' not found"

I haven't yet dug into your code in detail, but I did notice that you had combined the ICD-9 and ICD-10 data using an additional column containing 9 or 10. To be consistent with the rest of the package, I would separate these out into two data structures named icd9_... and icd10_....

Thanks for your great work.

@anobel
Copy link
Collaborator Author

anobel commented Apr 28, 2016

will start working through these issues

@anobel
Copy link
Collaborator Author

anobel commented May 2, 2016

@jackwasey passed Wercker but I can't figure out why it failed travisCI....

@jackwasey
Copy link
Owner

I see this error buried in the logs from running the examples:

Error in icd_comorbid_hcc(pt, date) :

Assertion on 'x' failed: Must be of type 'data.frame', not 'closure'.

Calls: icd_comorbid_hcc ... icd_guess_version.data.frame ->
assert_data_frame -> makeAssertion -> mstop

This probably means it thought pt was a code block, not a data frame.
Will look into if you don't see the problem in the example. This
example would make a good test, since it already demonstrates a
failure, assuming the test itself is correct.

On Monday, May 2, 2016, anobel odisho notifications@github.com wrote:

@jackwasey https://github.com/jackwasey passed Wercker but I can't
figure out why it failed travisCI....


You are receiving this because you were mentioned.
Reply to this email directly or view it on GitHub
#87 (comment)

@anobel
Copy link
Collaborator Author

anobel commented May 2, 2016

ah, simple fix. the name i used for the sample dataframe in the example (pt) is also a function in base r. renamed, will re-check travis

rename variables to match style
update example
@anobel
Copy link
Collaborator Author

anobel commented May 3, 2016

can't seem to see where travis is failing...

@jackwasey
Copy link
Owner

will take a look

On Mon, May 2, 2016 at 8:54 PM, anobel odisho notifications@github.com
wrote:

can't seem to see where travis is failing...


You are receiving this because you were mentioned.
Reply to this email directly or view it on GitHub
#87 (comment)

@jackwasey
Copy link
Owner

I think it's just failing on warnings, but I don't immediately see why there are new warnings about undocumented objects, unless this is a co-incidental change in the R CMD check,

@jackwasey
Copy link
Owner

The wercker error means you need icd::icd_map_cc_hcc instead, just because it is not defined in an enclosing code block. Or you can write (outside any functions, I think) utils::globalVariables("icd_map_cc_hcc"), but the first method also makes Rstudio syntax checking happier.

R/comorbid.R:635:16: warning: no visible binding for global variable ‘icd_map_cc_hcc’
hierarchy <- icd_map_cc_hcc
^~~~~~~~~~~~~~
R/comorbid.R:711:16: warning: no visible binding for global variable ‘icd_map_cc_hcc’
hierarchy <- icd_map_cc_hcc
^~~~~~~~~~~~~~

@anobel
Copy link
Collaborator Author

anobel commented May 4, 2016

hopefully that should do it!

@jackwasey
Copy link
Owner

The travis automated build script was updated, and now includes a bug which affects us: it tried to install all dependencies and fails if it can't despite me setting _R_CHECK_FORCE_SUGGESTS_=FALSE. The travis update now fails if it didn't install everything. I'll file a bug and think about a work-around. There is now a new R release 3.3, which means we may have new CRAN check problems in 3.4 development version (which we have to pass to get any changes to CRAN).

@jackwasey
Copy link
Owner

I don't know the best way to do this, but I made some relatively minor changes to your excellent pull request. I did this by starting a new branch in icd called anobel-master which I have pushed to.

I do not know enough about HCC to really know whether it produces what is intended. I started an empty test script with the hope that you would write some basic tests.

I also re-based on master, so updates to travis should work.

In the example you add to icd_comorbid, The result seems to show the same code for every date/visit pair. I would have expected different codes in some cases. Should we sort the result by visit id? Or by date? Or let the user decide. I guess we implicitly sort by hcc now.

Now there is a date column, it is unclear whether we should check that the date (presumably the admission date) is the same for each identical visit ID.

It looks really good, and I think with some tests to catch basic simple use, and tricky cases, it can be merged. E.g. what happens if you give it a list of completely invalid ICD codes? What if there is only a single visit? What if there are valid ICD codes given but ones which are not used by the HCC. Mix of valid and invalid codes?

@anobel
Copy link
Collaborator Author

anobel commented May 17, 2016

Because I'm not assigned as a contributor, I can't push changes to any forks on your repo. So to make these changes, I cloned your fork, made changes, replaced my local files, and pushed back to my own repo (which updates the pull request).

I had a mistake in the icd_comorbid example, which is why it showed so many codes for each date/visit combo. That should be fixed (but I count do proper testing because of the weird forking situation).

Its OK for there to be different dates for each visit_id, this allows flexibility for the visit_id to actually refer to a patient ID. We only use the visit_date to extract the year and use the correct year ICD9/10. No need to check date for uniqueness.

I put in some basic tests. One feature of CC/HCCs is that they are only defined for a select number of ICDs, not all ICDs. therefore, if the ICD does not exist in the ICD-CC crosswalk, it just doesn't get assigned a CC/HCC, which is the expected functionality. If there are invalid ICDs, they just get ignored.

I hope that makes sense.

@jackwasey
Copy link
Owner

Makes sense, and it looks great. I don't have any time until at least
Friday or Saturday, but will take a good look as soon as I can. Thanks.

PS, if you didn't already, try and pull in the few updates I made to the
master branch, if you didn't already: I fixed a couple of travis and CRAN
check things a while ago. There is a git way to do this, something like
rebase or cherry pick, but I have to figure it out myself every time. There
still may be new changes that are needed on my side to get travis to
complete, as it seems to warn now on some things it used not to warn about,
and thus fails the build.

On Mon, May 16, 2016 at 9:34 PM, anobel odisho notifications@github.com
wrote:

Because I'm not assigned as a contributor, I can't push changes to any
forks on your repo. So to make these changes, I cloned your fork, made
changes, replaced my local files, and pushed back to my own repo (which
updates the pull request).

I had a mistake in the icd_comorbid example, which is why it showed so
many codes for each date/visit combo. That should be fixed (but I count do
proper testing because of the weird forking situation).

Its OK for there to be different dates for each visit_id, this allows
flexibility for the visit_id to actually refer to a patient ID. We only use
the visit_date to extract the year and use the correct year ICD9/10. No
need to check date for uniqueness.

I put in some basic tests. One feature of CC/HCCs is that they are only
defined for a select number of ICDs, not all ICDs. therefore, if the ICD
does not exist in the ICD-CC crosswalk, it just doesn't get assigned a
CC/HCC, which is the expected functionality. If there are invalid ICDs,
they just get ignored.

I hope that makes sense.


You are receiving this because you were mentioned.
Reply to this email directly or view it on GitHub
#87 (comment)

@anobel
Copy link
Collaborator Author

anobel commented May 18, 2016

although it may not be reflected in the build history, I copied the anobel-master branch you had made and built upon that (as of yesterday morning), so should be up to date

@anobel
Copy link
Collaborator Author

anobel commented Jun 15, 2016

just wanted to check in and see if you'd had a chance to review?

@jackwasey
Copy link
Owner

Absolutely haven't forgotten your work. Should have time to devote to it in
the next week or two.

On Wednesday, June 15, 2016, anobel odisho notifications@github.com wrote:

just wanted to check in and see if you'd had a chance to review?


You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
#87 (comment), or mute
the thread
https://github.com/notifications/unsubscribe/AHeCzrigWcHbdvhP3IzZa-q-E_GrJlm3ks5qMFB0gaJpZM4IRaZ2
.

@anobel
Copy link
Collaborator Author

anobel commented Jun 16, 2016

no prob!

@jackwasey
Copy link
Owner

I fixed a minor documentation error you made, which prevented some other documentation compiling: this led to all the warnings which caused the travis failure. However, there still seems to be some data which is not documented, and thus causes warnings which will fail travis (and of course CRAN).

Undocumented data sets:
  ‘icd_map_cc_hcc’ ‘icd_names_cc’

I changed the name you used in the documentation from some other data, called icd_map_hcc to icd_map_cc which is what I saw you used in your code. This may be incorrect, but I'm not clear enough on the meaning of these different things yet to decide what you meant. Perhaps it should have ben icd_map_cc_hcc, but that would result in icd_map_cc not having documentation.

Nearly there!

@jackwasey
Copy link
Owner

jackwasey commented Jul 4, 2016

The only thing stopping this passing now is documenting. The wercker build is actually passing, not sure why it doesn't appear so in the section below.

checking for missing documentation entries ... WARNING
Undocumented code objects:
‘icd_map_cc_hcc’ ‘icd_names_cc’
Undocumented data sets:
‘icd_map_cc_hcc’ ‘icd_names_cc’

add aliases for icd_map_cc_hcc and icd_names_cc
@anobel
Copy link
Collaborator Author

anobel commented Aug 31, 2016

Sorry for the extended absence. this was a simple fix, had just left these out as aliases. have fixed this (hopefully). would love to get this PR completed!

@jackwasey
Copy link
Owner

Excellent, thanks. I'm just about to drop a CRAN version so that's perfect
timing.

On Tue, Aug 30, 2016 at 9:31 PM, anobel odisho notifications@github.com
wrote:

Sorry for the extended absence. this was a simple fix, had just left these
out as aliases. have fixed this (hopefully). would love to get this PR
completed!


You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
#87 (comment), or mute
the thread
https://github.com/notifications/unsubscribe-auth/AHeCzntRZKAUhqAlDm05Ok9enFJkDvMtks5qlNl-gaJpZM4IRaZ2
.

@jackwasey jackwasey merged commit a537470 into jackwasey:master Aug 31, 2016
jackwasey pushed a commit that referenced this pull request Sep 1, 2016
@jackwasey
Copy link
Owner

One small thing to fix: I removed all stringr code from the package due its big dependency tree, and it is often slower than base R. I fixed the str_extract reference you had. The str_split_fixed still needs to be done. As I don't know the data format you are processing, it'll be much easier if you can replace this with some kind of vapply and strsplit using base R. I've checked in everything up to this point in the icd master branch. Would you be able to submit a pull request just to fix this issue? Once it is in, the package should compile and be ready for CRAN. Thanks!

@jackwasey
Copy link
Owner

Anobel,

I meant to credit you in the last CRAN release with your great HCC work. I'm sorry I didn't. If you want me to do this, please let me know your email address here or via my name at jackwasey .com

Best wishes,
Jack

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants