Commit

implements an option to preserve order of original list, rather than sort alphabetically
sts10 committed Aug 6, 2022
1 parent 7130e05 commit 9c4f904
Showing 6 changed files with 67 additions and 16 deletions.
18 changes: 17 additions & 1 deletion Cargo.lock

Some generated files are not rendered by default.

3 changes: 2 additions & 1 deletion Cargo.toml
@@ -1,6 +1,6 @@
 [package]
 name = "tidy"
-version = "0.2.22"
+version = "0.2.30"
 authors = ["sts10 <sschlinkert@gmail.com>"]
 edition = "2021"

@@ -9,3 +9,4 @@ clap = { version = "3.0.14", features = ["derive"] }
 memchr = "2.4"
 radix_fmt = "1.0.0"
 rand = "0.8.4"
+itertools = "0.10.3"
10 changes: 9 additions & 1 deletion readme.markdown
@@ -21,7 +21,7 @@ Given a text file with a word list, this tool will create a new word list in whi
 - duplicate lines (words) are removed
 - empty lines have been removed
 - whitespace from beginning and end of words is deleted
-- words are sorted alphabetically
+- words are sorted alphabetically (though this can be optionally prevented -- see below)

 and print that new word list to the terminal or to a new text file.

@@ -43,6 +43,8 @@ Optionally, the tool can...
 - print corresponding dice rolls before words, separated by a tab. Dice can have 2 to 36 sides. (`-D`)
 - print information about the new list, such as entropy per word, to the terminal (`-A`)

+If you do NOT want Tidy to sort list alphabetically, you can use the `--no-alpha` option.
+
## Usage

```txt
@@ -122,6 +124,10 @@ OPTIONS:
 Path for outputted list file. If none given, generated word list will be printed to
 terminal
+-O, --no-alpha
+Do NOT sort outputted list alphabetically. Preserves original list order. Note that
+duplicates lines and blank lines will still be removed
 -P, --remove-prefix
 Remove prefix words from new list
@@ -182,6 +188,8 @@ OPTIONS:

 - `tidy -lPi -o new_list.txt inputted_word_list.txt` Same as above, but the added `-i` flag deletes any integers in words. Words with integers in them are not removed, only the integers within them. For example, "11326 agency" becomes "agency".

+- `tidy -lPiO -o new_list.txt inputted_word_list.txt` Same as above, but the added `-O` flag preserves the original order of the list, rather than sort it alphabetically. Note that duplicates and blank lines are still removed.
+
 - `tidy -I -o new_list.txt inputted_word_list.txt` Using the `-I` flag removes any words with integers from the list. For example, "hello1" would be removed from the list.

 - `tidy -AA -I -o new_list.txt inputted_word_list.txt` Adding `-AA` prints some information about the created list to the terminal.
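The behaviour the README changes describe (trim, drop blank lines, drop duplicates, sort unless `-O` is given) can be modelled in a few lines. This is a simplified sketch and not Tidy's implementation: `tidy_sketch` and its parameters are hypothetical names, and it uses only the standard library.

```rust
use std::collections::HashSet;

/// Simplified model of the README's description: trim each line, drop blank
/// lines and duplicates, and sort only when `sort_alphabetically` is true.
/// (Hypothetical helper, not part of Tidy.)
fn tidy_sketch(lines: &[&str], sort_alphabetically: bool) -> Vec<String> {
    let mut seen = HashSet::new();
    let mut words: Vec<String> = lines
        .iter()
        .map(|line| line.trim().to_string())
        .filter(|word| !word.is_empty())
        // `insert` returns false for a word already seen, so only the
        // first occurrence of each word is kept.
        .filter(|word| seen.insert(word.clone()))
        .collect();
    if sort_alphabetically {
        words.sort();
    }
    words
}
```

Calling `tidy_sketch(&input, false)` corresponds to passing `-O`: the surviving words stay in their original order, while duplicates and blank lines are still dropped.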
32 changes: 19 additions & 13 deletions src/lib.rs
@@ -15,6 +15,7 @@ pub struct TidyRequest {
     pub list: Vec<String>,
     pub take_first: Option<usize>,
     pub take_rand: Option<usize>,
+    pub sort_alphabetically: bool,
     pub to_lowercase: bool,
     pub should_straighten_quotes: bool,
     pub should_remove_prefix_words: bool,
@@ -232,13 +233,13 @@ pub fn tidy_list(req: TidyRequest) -> Vec<String> {
         None => tidied_list,
     };
     tidied_list = if req.should_remove_suffix_words {
-        remove_suffix_words(sort_and_dedup(&mut tidied_list))
+        remove_suffix_words(dedup_without_sorting(&mut tidied_list))
     } else {
         tidied_list
     };

     tidied_list = if req.should_remove_prefix_words {
-        remove_prefix_words(sort_and_dedup(&mut tidied_list))
+        remove_prefix_words(dedup_without_sorting(&mut tidied_list))
     } else {
         tidied_list
     };
@@ -250,7 +251,12 @@ pub fn tidy_list(req: TidyRequest) -> Vec<String> {
     };

     // Sort and dedup here
-    tidied_list = sort_and_dedup(&mut tidied_list);
+    // tidied_list = sort_and_dedup(&mut tidied_list);
+
+    if req.sort_alphabetically {
+        tidied_list.sort();
+    }
+    tidied_list = dedup_without_sorting(&mut tidied_list);

     // User can cut words from nearly finished list.
     // Does so randomly.
@@ -264,7 +270,11 @@ pub fn tidy_list(req: TidyRequest) -> Vec<String> {
         None => tidied_list,
     };
     // Finally, sort and dedup list (for final time)
-    tidied_list = sort_and_dedup(&mut tidied_list);
+    // tidied_list = sort_and_dedup(&mut tidied_list);
+    if req.sort_alphabetically {
+        tidied_list.sort();
+    }
+    tidied_list = dedup_without_sorting(&mut tidied_list);
     tidied_list
 }

@@ -433,15 +443,11 @@ fn straighten_quotes(input: &str) -> String {
     result
 }

-/// Alphabetizes and de-duplicates a Vector of `String`s.
-///
-/// For Rust's [`dedup()`](https://doc.rust-lang.org/std/vec/struct.Vec.html#method.dedup)
-/// function to remove all duplicates, the Vector needs to be
-/// [`sort()`](https://doc.rust-lang.org/std/vec/struct.Vec.html#method.sort)ed first.
-fn sort_and_dedup(list: &mut Vec<String>) -> Vec<String> {
-    list.sort();
-    list.dedup();
-    list.to_vec()
+use itertools::Itertools;
+/// De-duplicates a Vector of `String`s while maintaining list order.
+fn dedup_without_sorting(list: &mut [String]) -> Vec<String> {
+    let dedup: Vec<String> = list.iter().unique().map(|s| s.to_string()).collect();
+    dedup.to_vec()
 }

 /// Remove prefix words from the given Vector of `String`s.
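The removed doc comment explains why the old helper sorted first: `Vec::dedup` compares neighbouring elements only, so on an unsorted list it leaves non-adjacent duplicates in place. The sketch below contrasts the two behaviours using only the standard library. `dedup_preserving_order` and `adjacent_dedup` are hypothetical names, and the `HashSet` approach is a stand-in for the `itertools` `unique()` call in the commit, not the code the commit uses.

```rust
use std::collections::HashSet;

/// Hypothetical std-only stand-in for `dedup_without_sorting`:
/// keeps the first occurrence of each word and preserves input order.
fn dedup_preserving_order(list: &[String]) -> Vec<String> {
    let mut seen: HashSet<&String> = HashSet::new();
    list.iter()
        // `insert` returns false if the word was already in the set.
        .filter(|word| seen.insert(*word))
        .cloned()
        .collect()
}

/// Applies `Vec::dedup` to a copy of the list without sorting it first.
/// Only *consecutive* duplicates are removed.
fn adjacent_dedup(list: &[String]) -> Vec<String> {
    let mut copy = list.to_vec();
    copy.dedup();
    copy
}
```

On an already sorted list the two functions return the same result, which is consistent with the commit keeping a plain `sort()` followed by the order-preserving dedup when sorting is enabled.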
6 changes: 6 additions & 0 deletions src/main.rs
@@ -61,6 +61,11 @@ struct Args {
     #[clap(long = "samples")]
     samples: bool,

+    /// Do NOT sort outputted list alphabetically. Preserves original list order.
+    /// Note that duplicates lines and blank lines will still be removed.
+    #[clap(short = 'O', long = "no-alpha")]
+    no_alpha: bool,
+
     /// Lowercase all words on new list
     #[clap(short = 'l', long = "lowercase")]
     to_lowercase: bool,
@@ -218,6 +223,7 @@ fn main() {
         list: make_vec_from_filenames(&opt.inputted_word_list),
         take_first: opt.take_first,
         take_rand: opt.take_rand,
+        sort_alphabetically: !opt.no_alpha,
         to_lowercase: opt.to_lowercase,
         should_straighten_quotes: opt.straighten_quotes,
         should_remove_prefix_words: opt.remove_prefix_words,
14 changes: 14 additions & 0 deletions tests/list_manipulation_tests.rs
@@ -65,6 +65,7 @@ mod list_manipulation_tests {
     fn can_sort_words_alphabetically() {
         let this_tidy_request = TidyRequest {
             list: make_lists().0,
+            sort_alphabetically: true,
             ..Default::default()
         };
         let new_list = tidy_list(this_tidy_request);
@@ -73,6 +74,19 @@ mod list_manipulation_tests {
         assert!(new_list[new_list.len() - 1] == "zookeeper".to_string());
     }

+    #[test]
+    fn respect_option_to_not_sort_alphabetically() {
+        let this_tidy_request = TidyRequest {
+            list: make_lists().0,
+            sort_alphabetically: false,
+            ..Default::default()
+        };
+        let new_list = tidy_list(this_tidy_request);
+        assert!(new_list[0] == "zookeeper".to_string());
+        assert!(new_list.contains(&"apple".to_string()));
+        assert!(new_list[new_list.len() - 1] == "station".to_string());
+    }
+
     #[test]
     fn removes_blank_lines() {
         let this_tidy_request = TidyRequest {
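One plausible reason the existing sorting test now sets `sort_alphabetically: true` is that the struct-update syntax `..Default::default()` fills every unspecified field from `Default`, and the default for `bool` is `false`. The sketch below illustrates this under the assumption that `TidyRequest` derives `Default` (the diff does not show the derive). `RequestSketch` and `RequestSortedByDefault` are hypothetical types, not part of Tidy.

```rust
/// Hypothetical, trimmed-down stand-in for `TidyRequest`.
/// With a derived `Default`, every `bool` field starts as `false`.
#[derive(Default, Debug)]
struct RequestSketch {
    list: Vec<String>,
    sort_alphabetically: bool,
    to_lowercase: bool,
}

/// Alternative sketch: a manual `Default` that keeps sorting enabled
/// unless the caller explicitly opts out.
#[derive(Debug)]
struct RequestSortedByDefault {
    list: Vec<String>,
    sort_alphabetically: bool,
}

impl Default for RequestSortedByDefault {
    fn default() -> Self {
        RequestSortedByDefault {
            list: Vec::new(),
            sort_alphabetically: true,
        }
    }
}
```

Under the derived variant, library callers and tests that rely on `..Default::default()` get an unsorted list unless they set the field, which matches the explicit `true` added to `can_sort_words_alphabetically`. The manual variant is one possible design choice if sorted output should remain the default for library users as well as for the CLI.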
