Enhance messy_linelist() #199

Merged
merged 8 commits into from
Feb 21, 2025


joshwlambert
Member

This PR closes #191 and closes #192 by enhancing the messy_linelist() function.

The prop_missing setting in messy_linelist() now introduces missing values to <data.frame> cells not already NA, if the missing_value specified is NA (default). This replaces the random placement of missing values in the previous implementation of messy_linelist().

A new internal function, .add_missing(), is added. It takes a more sophisticated approach to inserting the user-specified missing value (missing_value) into the line list <data.frame>:

  • It samples elements of the <data.frame> to make missing depending on the missing_value and doesn't sample NA elements if the missing_value is NA.
  • It performs type coercion in cases where the type of the missing_value and the type of the <data.frame> column differ to avoid unwanted coercions, e.g. numeric to Date.
  • It records the type coercions and warns the user about them.
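
The three steps above can be sketched in base R as follows. This is only an illustration under stated assumptions, not the code merged in this PR: the name add_missing_sketch() is hypothetical, and coercing a mismatched column to character is an assumed strategy for avoiding unwanted coercions.

```r
# Hypothetical sketch, NOT the package's .add_missing() implementation.
# Sets a proportion of cells to `missing_value`, skipping cells that are
# already NA when `missing_value` is itself NA.
add_missing_sketch <- function(df, prop_missing = 0.1, missing_value = NA) {
  stopifnot(
    is.data.frame(df),
    is.numeric(prop_missing), length(prop_missing) == 1,
    prop_missing >= 0, prop_missing <= 1,
    length(missing_value) == 1
  )
  na_mask <- is.na(df) # logical matrix with the same shape as df
  # When inserting NA, only cells that are not already NA are eligible
  eligible <- if (is.na(missing_value)) which(!na_mask) else seq_along(na_mask)
  n_missing <- round(prop_missing * length(eligible))
  idx <- eligible[sample.int(length(eligible), n_missing)]
  pos <- arrayInd(idx, dim(na_mask)) # column 1 = row index, column 2 = column index

  coerced <- character(0)
  for (j in unique(pos[, 2])) {
    rows <- pos[pos[, 2] == j, 1]
    type_differs <- !identical(class(df[[j]]), class(missing_value))
    if (!is.na(missing_value) && type_differs) {
      # Assumed strategy: coerce the column to character up front so that,
      # for example, a numeric missing value is never turned into a Date
      df[[j]] <- as.character(df[[j]])
      coerced <- c(coerced, names(df)[j])
    }
    df[[j]][rows] <- missing_value
  }
  if (length(coerced) > 0) {
    warning("Columns coerced to character: ", toString(coerced), call. = FALSE)
  }
  df
}

# Usage with a toy line list (made-up data)
set.seed(1)
toy <- data.frame(
  id = 1:10,
  age = c(NA, 21:29),
  date_onset = as.Date("2025-01-01") + 0:9
)
messy_na <- add_missing_sketch(toy, prop_missing = 0.2)
messy_chr <- suppressWarnings(
  add_missing_sketch(toy, prop_missing = 0.2, missing_value = "Unknown")
)
```

With the default NA, column types are left untouched because assigning NA never changes a column's class. Coercion and the warning only come into play for a non-NA missing_value.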

The int_as_word setting in messy_linelist() has been updated to prop_int_as_word to allow users to control the proportion of integer values that are converted to words using english::words().
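
A minimal sketch of what a proportion-based conversion might look like is given below. The function name int_as_word_sketch() is hypothetical, and a small lookup vector covering 0 to 10 stands in for english::words() so the example runs without extra packages. Values outside that range are simply left as digits here, which is a limitation of the sketch rather than of the PR.

```r
# Hypothetical sketch, NOT the package's implementation.
# Converts a random proportion of whole-number values to English words.
int_as_word_sketch <- function(x, prop_int_as_word = 0.5) {
  stopifnot(
    is.numeric(x),
    is.numeric(prop_int_as_word), length(prop_int_as_word) == 1,
    prop_int_as_word >= 0, prop_int_as_word <= 1
  )
  # Stand-in for english::words(); only covers 0 to 10
  lookup <- c(
    "zero", "one", "two", "three", "four", "five",
    "six", "seven", "eight", "nine", "ten"
  )
  # Only non-missing whole numbers inside the lookup range can be converted
  convertible <- which(!is.na(x) & x %% 1 == 0 & x >= 0 & x <= 10)
  n_convert <- round(prop_int_as_word * length(convertible))
  idx <- convertible[sample.int(length(convertible), n_convert)]
  out <- as.character(x) # the result is a character vector, NA stays NA
  out[idx] <- lookup[x[idx] + 1]
  out
}

# Usage with made-up values
set.seed(42)
counts <- c(1L, 5L, 10L, 25L, NA, 3L)
counts_messy <- int_as_word_sketch(counts, prop_int_as_word = 0.5)
```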

inconsistent_id is added as a setting to messy_linelist(). It is off (FALSE) by default; when switched on, it appends random three-letter prefixes or suffixes to a random ~10% sample of $ids.
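
The behaviour described for inconsistent_id could look roughly like the sketch below. The function name inconsistent_id_sketch(), the prop argument, the use of lowercase letters, and the absence of a separator between the ID and the affix are all assumptions made for illustration.

```r
# Hypothetical sketch, NOT the package's implementation.
# Adds a random three-letter prefix or suffix to a random sample of IDs.
inconsistent_id_sketch <- function(ids, prop = 0.1) {
  stopifnot(is.numeric(prop), length(prop) == 1, prop >= 0, prop <= 1)
  ids <- as.character(ids)
  n_change <- round(prop * length(ids))
  idx <- sample.int(length(ids), n_change)
  # One random three-letter string per sampled ID
  affix <- vapply(
    seq_len(n_change),
    function(i) paste(sample(letters, 3, replace = TRUE), collapse = ""),
    character(1)
  )
  # Decide at random whether each affix is a prefix or a suffix
  as_prefix <- sample(c(TRUE, FALSE), n_change, replace = TRUE)
  ids[idx] <- ifelse(as_prefix, paste0(affix, ids[idx]), paste0(ids[idx], affix))
  ids
}

# Usage with made-up IDs
set.seed(1)
raw_ids <- 1:50
messy_ids <- inconsistent_id_sketch(raw_ids)
```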

Input checking for prop_missing and missing_value is added to messy_linelist().

New unit tests are added to test the behaviour of missing_value in messy_linelist() since adding the internal .add_missing() function.

@joshwlambert joshwlambert added the enhancement New feature or request label Feb 21, 2025
@joshwlambert joshwlambert merged commit 18e1e89 into main Feb 21, 2025
9 checks passed
@joshwlambert joshwlambert deleted the updt-messy branch February 21, 2025 11:58