Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

builtin: fix runes.to_upper() (fix #22742) #22755

Merged
merged 10 commits into from
Nov 6, 2024

Conversation

yuyi98
Copy link
Member

@yuyi98 yuyi98 commented Nov 4, 2024

This PR fix runes.to_upper() (fix #22742).

  • Implement rune.to_upper()/to_lower()/to_title().
  • Fix runes.to_upper().
  • Add tests.
fn main() {
	s1 := 'água'.to_upper()
	println(s1)
	s2 := 'ÁGUA'.to_lower()
	println(s2)
}

PS D:\Test\v\tt1> v run .        
ÁGUA
água

Copy link
Contributor

@StunxFS StunxFS left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Excellent!

@spytheman
Copy link
Member

The change is nice, but unfortunately it affects builtin in a non trivial way, by bloating the generated .c source for it, and its binary size :-|, and builtin is used by practically all V programs.

Here are some comparisons between master and it, for hello_world.v with and without -skip-unused:

#0 13:03:13 ^ master /space/v/oo>./v -skip-unused -o hw_master_skip_unused examples/hello_world.v
#0 13:03:27 ^ master /space/v/oo>./v -o hw_master examples/hello_world.v
#0 13:03:32 ^ master /space/v/oo>ls -larS hw_*
-rw-rw-r-- 1 delian delian 221988 Nov  5 12:48 hw_master_skip_unused.c
-rw-rw-r-- 1 delian delian 246839 Nov  5 12:56 hw_pr_shortened_skip_unused.c
-rw-rw-r-- 1 delian delian 246839 Nov  5 13:01 hw_pr_shortened_skip_unused_2.c
-rw-rw-r-- 1 delian delian 260311 Nov  5 12:49 hw_pr_skip_unused.c
-rwxrwxr-x 1 delian delian 445184 Nov  5 13:03 hw_master_skip_unused
-rw-rw-r-- 1 delian delian 450586 Nov  5 12:48 hw_master.c
-rw-rw-r-- 1 delian delian 476070 Nov  5 12:56 hw_pr_shortened.c
-rw-rw-r-- 1 delian delian 476152 Nov  5 13:01 hw_pr_shortened_2.c
-rwxrwxr-x 1 delian delian 482008 Nov  5 13:01 hw_pr_shortened_skip_unused_2
-rwxrwxr-x 1 delian delian 482008 Nov  5 13:02 hw_pr_shortened_skip_unused
-rw-rw-r-- 1 delian delian 489542 Nov  5 12:49 hw_pr.c
-rwxrwxr-x 1 delian delian 719432 Nov  5 13:03 hw_master
-rwxrwxr-x 1 delian delian 757320 Nov  5 13:02 hw_pr
-rwxrwxr-x 1 delian delian 757552 Nov  5 13:01 hw_pr_shortened_2
#0 13:03:34 ^ master /space/v/oo>

I've managed to reduce the .C size expansion a bit, by renaming upper_lower to ul and start to s, end to e, upper_offset to uo, lower_offset to lo, title_offset to to (the biggest contributor to the .c bloat is the initialization of the rune_maps constant in _vinit, that repeats the field names a lot), but still ... as it is, a simple program will grow with ~37KB, even though it does not use that new functionality at all :-| .

Can we move the new versions, that require that table, outside of builtin, to encoding.utf8 where they will not affect all V programs?

@JalonSolov
Copy link
Contributor

Does the size of the generated C file really matter that much? I don't usually pay any attention at all the the size of the .c files. I am more interested in the speed of the final code than the size of the intermediate steps.

If it also affects the size of the final executable, that's not too big a deal right now, either. At some point, we will have better unused code deletion, which will help tremendously.

@spytheman
Copy link
Member

spytheman commented Nov 5, 2024

Yes, it does matter, because it slows down compilation for short programs.
Most tests are very short, and there are thousands of them.
Increasing their binary size and compile time requirements with 5-10%, for a feature that will not be used much, does not seem like a good trade off to me.

@spytheman
Copy link
Member

At some point, we will have better unused code deletion, which will help tremendously.

Ideally yes. Pragmatically, it does increase the size (and slows down compilation) right now.

@spytheman
Copy link
Member

You could say, that features like this, can serve as motivators for improving the code generation, as a second order effect, and you will be right 🤷🏻‍♂️ .

@spytheman
Copy link
Member

The speed of the final code (at runtime) is not affected much - the size is increased mainly by the new const table for the rune rules.

@spytheman
Copy link
Member

spytheman commented Nov 5, 2024

Here is a small comparison while compiling/running tests, between master and the PR here.
I've chosen vlib/builtin/, vlib/math/, vlib/flag/, vlib/encoding/, because those are relatively small tests, that do not do much IO, or network operations (i.e. their runtime is small compared to the compile time, and is mostly stable).

#0 16:46:23 ^ master /space/v/oo>
#0 16:46:23 ^ master /space/v/oo>./v self
V self compiling ...
V built successfully as executable "v".
#0 16:46:28 ^ master /space/v/oo>./v self
V self compiling ...
V built successfully as executable "v".
#0 16:46:31 ^ master /space/v/oo>./v -progress test vlib/builtin/ vlib/math/ vlib/flag/ vlib/encoding/
---- Testing... ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 OK    [117/117] C:   520.1 ms, R:     3.035 ms vlib/math/vec/vec4_test.v
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Summary for all V _test.v files: 117 passed, 117 total. Elapsed time: 55923 ms, on 1 job. Comptime: 49685 ms. Runtime: 6179 ms.
#0 16:47:30 ^ master /space/v/oo>./v -progress test vlib/builtin/ vlib/math/ vlib/flag/ vlib/encoding/
---- Testing... ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 OK    [117/117] C:   516.2 ms, R:     3.041 ms vlib/math/vec/vec4_test.v
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Summary for all V _test.v files: 117 passed, 117 total. Elapsed time: 53822 ms, on 1 job. Comptime: 47581 ms. Runtime: 6183 ms.
#0 16:48:26 ^ master /space/v/oo>./v -progress test vlib/builtin/ vlib/math/ vlib/flag/ vlib/encoding/
---- Testing... ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 OK    [117/117] C:   520.4 ms, R:     2.985 ms vlib/math/vec/vec4_test.v
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Summary for all V _test.v files: 117 passed, 117 total. Elapsed time: 53661 ms, on 1 job. Comptime: 47442 ms. Runtime: 6163 ms.
#0 16:49:26 ^ master /space/v/oo>
#0 16:49:28 ^ master /space/v/oo>git switch -
Switched to branch 'implement_rune_to_upper'
#0 16:49:33 ^ implement_rune_to_upper /space/v/oo>git pull --rebase origin master
From github.com:vlang/v
 * branch                master     -> FETCH_HEAD
Current branch implement_rune_to_upper is up to date.
#0 16:49:38 ^ implement_rune_to_upper /space/v/oo>./v self
V self compiling ...
V built successfully as executable "v".
#0 16:49:42 ^ implement_rune_to_upper /space/v/oo>./v self
V self compiling ...
V built successfully as executable "v".
#0 16:49:46 ^ implement_rune_to_upper /space/v/oo>./v -progress test vlib/builtin/ vlib/math/ vlib/flag/ vlib/encoding/
---- Testing... ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 OK    [117/117] C:   537.2 ms, R:     3.167 ms vlib/math/vec/vec4_test.v
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Summary for all V _test.v files: 117 passed, 117 total. Elapsed time: 57700 ms, on 1 job. Comptime: 51363 ms. Runtime: 6276 ms.
#0 16:50:46 ^ implement_rune_to_upper /space/v/oo>./v -progress test vlib/builtin/ vlib/math/ vlib/flag/ vlib/encoding/
---- Testing... ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 OK    [117/117] C:   532.8 ms, R:     2.996 ms vlib/math/vec/vec4_test.v
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Summary for all V _test.v files: 117 passed, 117 total. Elapsed time: 55495 ms, on 1 job. Comptime: 49182 ms. Runtime: 6255 ms.
#0 16:51:44 ^ implement_rune_to_upper /space/v/oo>./v -progress test vlib/builtin/ vlib/math/ vlib/flag/ vlib/encoding/
---- Testing... ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 OK    [117/117] C:   535.7 ms, R:     3.065 ms vlib/math/vec/vec4_test.v
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Summary for all V _test.v files: 117 passed, 117 total. Elapsed time: 55455 ms, on 1 job. Comptime: 49141 ms. Runtime: 6255 ms.
#0 16:52:41 ^ implement_rune_to_upper /space/v/oo>

@spytheman
Copy link
Member

I also tried testing with v run .github/workflows/compare_pr_to_master.v , but that is not very suitable for measuring changes like this, because it runs the old and new versions of the compiler, on the same codebase (the one from the PR), where both will have to deal with the new table etc.

(for what is worth, the new compiler version is slightly faster in that particular situation, compared to the version on master, even though both produce bigger binaries)

@spytheman
Copy link
Member

spytheman commented Nov 5, 2024

On the other hand, this PR does fix the correctness of the methods, and one can argue, that is more important, compared to the increased resources needed for compilation and storage.

@spytheman
Copy link
Member

I wonder if we can compress the rule table somehow 🤔 .

@medvednikov
Copy link
Member

That's why I think it should be moved to a unicode module, like in Go.

On the other hand Go's string.ToUpper() imports unicode, so that doesn't help.

@JalonSolov
Copy link
Contributor

I wonder if we can compress the rule table somehow 🤔 .

Don't see why not. Make it a fixed array of fixed arrays of 5 ints.

Slightly more complex code to deal with it, but it removes the overhead of all the structs, instead making it one large block of memory.

It'll load faster, as well.

@medvednikov
Copy link
Member

I'm fine with making str.to_upper() work only with ASCII and print a warning for non-ASCII: use unicode.to_upper()

@spytheman
Copy link
Member

spytheman commented Nov 5, 2024

The table is ~2.3KB when compressed with xz, so there is a lot of redundancy in it.

Before compressing, the .c delta for the _const_rune_maps is ~20KB, so a general purpose text compression algorithm achieves ~9x compression rate. With some effort, we can do similar or even better.

@spytheman
Copy link
Member

spytheman commented Nov 5, 2024

Turning it into a fixed array (without restructuring into a flat array), did not work (I hit a cgen error edge case, which perhaps needs a separate issue).

@spytheman
Copy link
Member

A flat array of just integers works fine, and reduces the overhead (compared to master, for hello_world) a lot:

  • the overhead in generated .c is now down to 12KB (from 38KB) with a flat array of plain integers
  • the overhead in the binary size is down to 9.25KB (from ~36KB)

@spytheman
Copy link
Member

#0 18:12:49 ^ implement_rune_to_upper /space/v/oo>ls -larS mhw* ohw* nhw* fhw*
-rw-rw-r-- 1 delian delian 221620 Nov  5 17:41 mhw_su.c
-rw-rw-r-- 1 delian delian 233769 Nov  5 18:08 fhw_su.c
-rw-rw-r-- 1 delian delian 243791 Nov  5 17:52 nhw_su.c
-rw-rw-r-- 1 delian delian 259943 Nov  5 17:31 ohw_su.c
-rwxrwxr-x 1 delian delian 444384 Nov  5 17:41 mhw_su
-rw-rw-r-- 1 delian delian 449928 Nov  5 17:41 mhw.c
-rwxrwxr-x 1 delian delian 452760 Nov  5 18:08 fhw_su
-rw-rw-r-- 1 delian delian 462710 Nov  5 18:08 fhw.c
-rw-rw-r-- 1 delian delian 472732 Nov  5 17:52 nhw.c
-rwxrwxr-x 1 delian delian 481208 Nov  5 17:31 ohw_su
-rwxrwxr-x 1 delian delian 481208 Nov  5 17:52 nhw_su
-rw-rw-r-- 1 delian delian 488884 Nov  5 17:31 ohw.c
-rwxrwxr-x 1 delian delian 718192 Nov  5 17:41 mhw
-rwxrwxr-x 1 delian delian 727664 Nov  5 18:08 fhw
-rwxrwxr-x 1 delian delian 756080 Nov  5 17:31 ohw
-rwxrwxr-x 1 delian delian 756080 Nov  5 17:52 nhw
#0 18:12:50 ^ implement_rune_to_upper /space/v/oo>
  • the mhw variants are from master
  • the fhw variants are from my latest changes (using a flat fixed array of ints), on the PR
  • the ohw variants are from the original code in the PR
  • the nhw variants are from an intermediate version, where only the fields and const name were shortened

@spytheman
Copy link
Member

As a bonus, the flat fixed array of ints, is initialized statically, not in _vinit(), and it does not need to allocate at runtime.

@StunxFS
Copy link
Contributor

StunxFS commented Nov 5, 2024

Hm, if the problem is the size of the .c file, then why not have the compiler generate a .c file for each module?

@JalonSolov
Copy link
Contributor

Flat fixed array is even better, though it makes the code even more interesting, but pretty standard for C-style programming.

… to a flat fixed array of ints, to reduce overhead
@spytheman
Copy link
Member

#0 18:39:10 ^ master /space/v/oo>./v self
V self compiling ...
V built successfully as executable "v".
#0 18:39:15 ^ master /space/v/oo>./v -progress test vlib/builtin/ vlib/math/ vlib/flag/ vlib/encoding/
---- Testing... ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 OK    [117/117] C:   520.0 ms, R:     3.023 ms vlib/math/vec/vec4_test.v
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Summary for all V _test.v files: 117 passed, 117 total. Elapsed time: 56013 ms, on 1 job. Comptime: 49761 ms. Runtime: 6195 ms.
#0 18:40:15 ^ master /space/v/oo>./v -progress test vlib/builtin/ vlib/math/ vlib/flag/ vlib/encoding/
---- Testing... ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 OK    [117/117] C:   518.7 ms, R:     3.067 ms vlib/math/vec/vec4_test.v
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Summary for all V _test.v files: 117 passed, 117 total. Elapsed time: 53739 ms, on 1 job. Comptime: 47483 ms. Runtime: 6193 ms.
#0 18:41:10 ^ master /space/v/oo>./v -progress test vlib/builtin/ vlib/math/ vlib/flag/ vlib/encoding/
---- Testing... ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 OK    [117/117] C:   517.7 ms, R:     3.216 ms vlib/math/vec/vec4_test.v
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Summary for all V _test.v files: 117 passed, 117 total. Elapsed time: 53697 ms, on 1 job. Comptime: 47447 ms. Runtime: 6191 ms.
#0 18:42:06 ^ master /space/v/oo>
#0 18:42:07 ^ master /space/v/oo>git switch -
Switched to branch 'implement_rune_to_upper'
#0 18:42:09 ^ implement_rune_to_upper /space/v/oo>./v self
V self compiling ...
V built successfully as executable "v".
#0 18:42:13 ^ implement_rune_to_upper /space/v/oo>./v -progress test vlib/builtin/ vlib/math/ vlib/flag/ vlib/encoding/
---- Testing... ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 OK    [117/117] C:   522.4 ms, R:     3.075 ms vlib/math/vec/vec4_test.v
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Summary for all V _test.v files: 117 passed, 117 total. Elapsed time: 56407 ms, on 1 job. Comptime: 50158 ms. Runtime: 6188 ms.
#0 18:43:20 ^ implement_rune_to_upper /space/v/oo>./v -progress test vlib/builtin/ vlib/math/ vlib/flag/ vlib/encoding/
---- Testing... ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 OK    [117/117] C:   525.5 ms, R:     3.017 ms vlib/math/vec/vec4_test.v
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Summary for all V _test.v files: 117 passed, 117 total. Elapsed time: 54323 ms, on 1 job. Comptime: 48084 ms. Runtime: 6179 ms.
#0 18:44:15 ^ implement_rune_to_upper /space/v/oo>./v -progress test vlib/builtin/ vlib/math/ vlib/flag/ vlib/encoding/
---- Testing... ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 OK    [117/117] C:   525.1 ms, R:     3.091 ms vlib/math/vec/vec4_test.v
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Summary for all V _test.v files: 117 passed, 117 total. Elapsed time: 54199 ms, on 1 job. Comptime: 47961 ms. Runtime: 6181 ms.
#0 18:45:10 ^ implement_rune_to_upper /space/v/oo>

^ that is with the flat fixed array. The compiletime and runtime overhead is now smaller.

@spytheman spytheman force-pushed the implement_rune_to_upper branch from c0180b1 to 0bfb5e3 Compare November 5, 2024 16:48
@spytheman
Copy link
Member

spytheman commented Nov 5, 2024

Hm, if the problem is the size of the .c file, then why not have the compiler generate a .c file for each module?

That is planned too (as an extension of -usecache), but does not work as well as it could right now, since the parser and the checker still have to process all .v files to get the type information, even when ultimately the produced .c that will get compiled and linked is smaller.

I.e. you will get a smaller cost for the C compilation with -usecache, but nearly the full cost for parsing/checking, even though none of the .v files for the module dependencies have changed.

The overhead of the generated binary at the end remains though even with -usecache, because the table is in builtin, and that gets imported by everything.

@spytheman
Copy link
Member

Just tested v test-all on the same machine, on master:

---- Summary of `v test-all`: -----------------------------------------------------------------
Total runtime: 1010710 ms

on the PR:

---- Summary of `v test-all`: -----------------------------------------------------------------
Total runtime: 1114622 ms

i.e. for me, the full test suite is still 10% slower now :-| .

@JalonSolov
Copy link
Contributor

Still not that bad, considering the CI is what mainly runs test-all. Even making a compiler change locally, you don't run test-all after every minor change, only when you're ready to push to a PR.

@spytheman
Copy link
Member

spytheman commented Nov 5, 2024

Still not that bad, considering the CI is what mainly runs test-all. Even making a compiler change locally, you don't run test-all after every minor change, only when you're ready to push to a PR.

Yes, and after re-running it again, I got a much closer result (+ ~2% overhead) ... Also some of the tests do network ops (i.e. high variability of the runtime), so the diff is not so bad after all, and it can be optimized later, after merging.

@spytheman
Copy link
Member

I'll wait till the full CI passes here (because of the unsafe{} pointer arithmetic involved in the latest commit), then merge it.

@spytheman
Copy link
Member

I think also that some of the overhead, can be explained not just by the increased .c file size by the new table, but also by the change to string.to_upper and string.to_lower to use a linear scan for non ASCII characters, and then the new rune based implementation to handle the non utf8 strings.

The compiler may have to be changed, to selectively use a .to_upper_ascii and .to_lower_ascii methods, that will work as before, to avoid the non ASCII check completely (since we do not allow non ASCII characters for the identifiers/keywords etc)

@yuyi98
Copy link
Member Author

yuyi98 commented Nov 6, 2024

@spytheman I think we can get rid of the to_title item and shrink it.

@spytheman
Copy link
Member

spytheman commented Nov 6, 2024

There were some entries, for which the upper offset was != than the title one:

       0x1C4, 0x1C4, 0, 2, 1,
       0x1C5, 0x1C5, -1, 1, 0,
       0x1C6, 0x1C6, -2, 0, -1,
       0x1C7, 0x1C7, 0, 2, 1,
       0x1C8, 0x1C8, -1, 1, 0,
       0x1C9, 0x1C9, -2, 0, -1,
       0x1CA, 0x1CA, 0, 2, 1,
       0x1CB, 0x1CB, -1, 1, 0,
       0x1CC, 0x1CC, -2, 0, -1,
       0x1F1, 0x1F1, 0, 2, 1,
       0x1F2, 0x1F2, -1, 1, 0,
       0x1F3, 0x1F3, -2, 0, -1,
       0x10D0, 0x10FA, 3008, 0, 0,
       0x10FD, 0x10FF, 3008, 0, 0,

They could be handled separately in pub fn (c rune) to_title() rune { if needed (we do not have tests for them currently).

@spytheman
Copy link
Member

hm, interesting ... tcc can not handle const integer values, used in const fixed array inits, failing with: error: initializer element is not constant, for code looking like this:

typedef int Array_fixed_int_3 [3];
const int _const_ul = -3;
Array_fixed_int_3 _const_abc = {0, 1, _const_ul};

gcc and clang have no problems with the above.

@spytheman spytheman merged commit b30be51 into vlang:master Nov 6, 2024
71 checks passed
@yuyi98
Copy link
Member Author

yuyi98 commented Nov 6, 2024

@spytheman Thank you very much for doing a lot of optimization!

@yuyi98 yuyi98 deleted the implement_rune_to_upper branch November 6, 2024 15:30
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

string to_upper() latin (utf8?) characters error
6 participants