Migrate to `phf`, add `no-std` support #9

mkroening · 2022-12-22T13:18:30Z

Closes #7.

This PR is structured as follows:

Migrate code generation from generate_table.py to build.rs.

Previously, generate_table.py was run once and the generated code was committed to the repository.
Now, build.rs generates the code from assets/code_tables.json on the fly for every user before compilation. The resulting code is not stored persistently anywhere and is regenerated after cargo clean.

This step does not change the generated code or any dependencies.
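As a rough illustration of this step, the sketch below renders a table as Rust source text and writes it to a directory. It is not the code from this PR: `render_table`, `generate`, the file name `code_table.rs`, and the table name and contents are hypothetical, and the JSON parsing is replaced by a hard-coded slice.

```rust
use std::fmt::Write as _;
use std::fs;
use std::path::Path;

/// Renders one table as Rust source text.
/// `name` and `entries` are hypothetical placeholders for the parsed JSON data.
fn render_table(name: &str, entries: &[char]) -> String {
    let mut out = String::new();
    writeln!(out, "pub static {name}: [char; {}] = [", entries.len()).unwrap();
    for c in entries {
        // `{:?}` on a `char` yields a quoted, escaped character literal.
        writeln!(out, "    {c:?},").unwrap();
    }
    out.push_str("];\n");
    out
}

/// Writes the generated source into `out_dir`.
/// A real build script would parse assets/code_tables.json here instead of
/// using the hard-coded slice below.
fn generate(out_dir: &Path) -> std::io::Result<()> {
    let code = render_table("DECODING_TABLE_EXAMPLE", &['Ç', 'ü', 'é']);
    fs::write(out_dir.join("code_table.rs"), code)
}
```

In a build script, `main` would call `generate` with the directory named by the `OUT_DIR` environment variable that Cargo sets, and the library would pull the file in with `include!(concat!(env!("OUT_DIR"), "/code_table.rs"));`.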

Migrate to phf.

This change migrates the code generation for hash maps from the manual lazy_static and ahash::AHashMap based code to a phf_codegen and phf::Map based one.

This commit and the following ones are for supporting no-std environments:

Enable Rust 2021 to enable Cargo resolver version 2.

Without this, phf_codegen in build-dependencies enables the std
feature for phf_shared in the normal dependencies scope as well.

See https://doc.rust-lang.org/cargo/reference/features.html#feature-resolver-version-2

The next two commits are for refactoring all functionality that requires an allocator into one module for easier gating.

Move string functions into string module.
Move TableType string methods to string module.

Add no_std support.

This makes all functionality depend on the alloc crate instead of std, making it suitable for no-std environments.
Additionally, the alloc functionality can be disabled via Cargo features for no-std environments without alloc.
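The split between allocator-free and allocator-dependent functionality might look roughly like the sketch below. The function names and the 128-entry table layout are assumptions made for illustration, not the crate's actual API. The crate-level `#![no_std]` attribute and the `#[cfg(feature = "alloc")]` gate are only mentioned in comments so that the sketch also builds as an ordinary program.

```rust
// In a real crate root, `#![no_std]` would come first; it is left out here.
extern crate alloc;

use alloc::string::String;

/// Core-only: decodes a single byte through a fixed table, no allocator needed.
/// Bytes below 128 are treated as ASCII; the rest index into `table`.
pub fn decode_byte(table: &[char; 128], byte: u8) -> char {
    if byte < 128 {
        byte as char
    } else {
        table[(byte - 128) as usize]
    }
}

/// Needs an allocator because it builds a `String`.
/// In a feature-gated crate this item would carry `#[cfg(feature = "alloc")]`.
pub fn decode_string(table: &[char; 128], bytes: &[u8]) -> String {
    bytes.iter().map(|&b| decode_byte(table, b)).collect()
}
```

Keeping everything that returns `String` or `Vec` in one module means a single `#[cfg(feature = "alloc")]` on that module is enough to disable it.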

tats-u · 2022-12-24T15:38:12Z

Are you trying to rewrite generate_table.py in Rust? If build.rs does just what generate_table.py does, I think you should rename it to something like generate_tables.rs. The name build is too generic.

mkroening · 2022-12-25T11:05:49Z

> Are you trying to rewrite generate_table.py in Rust? If the build.rs does just what generate_table.py, I think you should rename build.rs as generate_tables.rs for example. build is too generic.

build.rs is a Cargo build script. It has to be called build.rs. It's executed before compilation of the source files, and I used it to replace generate_table.py in the first commit, since I deemed it most suitable for migrating to phf in the second commit.

I rebased on latest master and applied your early suggestions. I pushed it commit by commit to have CI runs for each one of them.
I'll also update the description of this PR and mark this as ready for review.

Would you like me to split this PR up or are you okay with keeping it this big?

mkroening · 2022-12-26T00:36:55Z

> If the build.rs does just what generate_table.py, I think you should rename build.rs as generate_tables.rs for example. build is too generic.

I just found out that you can actually use a different name for the build script via the package.build key. I have never seen a name other than build.rs before, though, and I would find things easier myself if we kept the name build.rs for the build script.

It's up to you to decide, though. :)

tats-u · 2022-12-26T15:14:50Z

build.rs

+struct CodeTables {
+    created: String,
+    tables: Vec<(u16, Table)>,
+}


I want a doc comment that shows this is the parsed content of code_tables.json. This is just a draft.

/// Parsed code tables from `code_tables.json`
struct CodeTables {
    /// ISO 8601 timestamp of when the tables in the JSON were created
    created: String,
    /// Main part of the code tables
    ///
    /// `([code page], [table definition])`
    tables: Vec<(u16, Table)>,
}

The name ParsedCodeTablesJsonContent might be better, because this is not the tables themselves and it also has a timestamp.

Also, could you tell me why you think a Vec of tuples is better than a map (Map, HashMap, or something similar)?

I called it CodeTables, because the file is called code_tables.json. I also wanted to avoid a long name.

I chose Vec, because I want the tables to be sorted and stable in the generated output.

@mkroening Autocompletion in modern editors takes much of the pain out of typing long names, though.

> because I want the tables to be sorted and stable in the generated output.

Why not BTreeMap? Also, you may want to state in a comment that a sort is needed.

I was more concerned about reading, but if you want, I can rename it to ParsedCodeTablesJsonContent.

I chose Vec, because it is faster and simpler if we only sort it once and don't mutate it. BTreeMap would also work fine though and I can migrate to it if you want me to. :)

@mkroening I'm glad to hear your opinion. You don't have to change it.

> I chose Vec, because it is faster

Building a BTreeMap has the same $O(n \log n)$ complexity as sorting a Vec, but I didn't know that the combination of Vec and sort is faster.

They should scale the same, and BTreeMap makes sure that the collection is kept sorted. Vec provides better cache locality, since its items are stored in a row. So it's about nice access patterns when iterating. For the BTreeMap, we'd jump around in memory a lot.

All of this is super irrelevant for our use case, since this is not at all performance sensitive, and benchmarks might prove me wrong in this case. So my choice was rather arbitrary and mainly out of habit. :)
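To make the two options from this thread concrete, here is a minimal sketch of both. The `&'static str` values are a hypothetical stand-in for the `Table` type, and the function names are made up for this example.

```rust
use std::collections::BTreeMap;

/// Option 1: sort the vector once, then treat it as read-only.
/// Entries end up contiguous in memory, in ascending code page order.
fn sorted_vec(mut tables: Vec<(u16, &'static str)>) -> Vec<(u16, &'static str)> {
    tables.sort_unstable_by_key(|&(code_page, _)| code_page);
    tables
}

/// Option 2: let a `BTreeMap` keep the entries ordered by key as they are inserted.
fn sorted_map(tables: Vec<(u16, &'static str)>) -> BTreeMap<u16, &'static str> {
    tables.into_iter().collect()
}
```

Both yield the same iteration order as long as the code pages are unique. With duplicate keys, the map keeps only the last value for each key, while the vector keeps every entry.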

tats-u · 2022-12-26T15:21:46Z

@mkroening I see, and you can go ahead (you can stick to build.rs) if you add sufficient comments.

Enable Rust 2021

We need the version 2 feature resolver for no_std support. Otherwise, `phf_codegen` in `build-dependencies` enables the `std` feature for `phf_shared` in the normal `dependencies` scope as well.

https://doc.rust-lang.org/cargo/reference/features.html#feature-resolver-version-2

mkroening · 2023-01-03T11:49:04Z

Sorry for taking so long. I addressed your concerns. :)

@mkroening

- Use `phf` (perfect hash map) instead of `ahash` (#7, #9) by @mkroening
  - The actual type of `OEMCPHashMap` will change. I suppose that you do not have to modify your code, though a recompile might be needed (I do not know much about this).
- (Use `clippy` as lints (#8) by @mkroening)
- Remove unused dependency on `hfs_nfd`

tats-u · 2023-02-20T15:31:12Z

This change is now available in 1.2.0. Sorry to keep you waiting for a long time.

mkroening · 2023-02-20T17:53:23Z

No worries! Thanks a lot! :)

- Revert changes (#9) that should have been 2.0.0 (#11)
- Update `ahash`
- Remove unused `hfs_nfd`

Note: #8 is included

@mkroening

- Use phf (perfect hash map) instead of ahash (#7, #9) by @mkroening Actual type of OEMCPHashMap will be changed.

tats-u · 2023-02-21T15:54:54Z

Sorry, I'd like you to use 2.0.0(-beta.1) instead. I found that this patch is a breaking change, contrary to my expectations.

@mkroening

- Use `once_cell` instead of `lazy_static` for test
- Comply with Clippy rules
- More small fixes

The following changes have been included since 2.0.0-beta.1.

- Use phf (perfect hash map) instead of ahash (#7, #9) by @mkroening Actual type of OEMCPHashMap will be changed.

mkroening force-pushed the phf branch from 8a08600 to 9fd1674 on December 22, 2022 18:43

tats-u reviewed Dec 24, 2022

Cargo.toml (resolved)

.github/workflows/master.yml (outdated, resolved)

mkroening force-pushed the phf branch 2 times, most recently from 161eb9e to 470c35c on December 25, 2022 10:56

mkroening changed the title from "Migrate to phf" to "Migrate to phf, add no-std support" on Dec 25, 2022

mkroening marked this pull request as ready for review December 25, 2022 11:21

mkroening force-pushed the phf branch from c4edde4 to f10feee on December 26, 2022 00:31

tats-u reviewed Dec 26, 2022

mkroening added 6 commits January 3, 2023 12:44

Generate code_table in build script

657c66f

Migrate to phf

86fdd02

Move string functions into string module

fc8a893

Move TableType string methods to string module

a899765

Add no_std support

71326b9

mkroening force-pushed the phf branch from a73496f to 657c66f on January 3, 2023 11:45

mkroening mentioned this pull request Jan 3, 2023

Provide char-based API #10

Open

tats-u approved these changes Jan 4, 2023

tats-u merged commit 47af156 into tats-u:master Jan 5, 2023

mkroening deleted the phf branch January 5, 2023 15:55

tats-u added a commit that referenced this pull request Feb 21, 2023

Version 1.3.0

2b53f10

- Revert changes (#9) that should have been 2.0.0 (#11)
- Update `ahash`
- Remove unused `hfs_nfd`

Note: #8 is included

tats-u added a commit that referenced this pull request Feb 21, 2023

Version 2.0.0-beta.1

7f3094b

- Use phf (perfect hash map) instead of ahash (#7, #9) by @mkroening Actual type of OEMCPHashMap will be changed.
