
Unicode codepoint flags for custom regexes #7245

Merged · 10 commits · May 17, 2024

Conversation

jaime-m-p
Collaborator

Use flags for each Unicode category (\p{N}, \p{L}, \p{Z}, ...) instead of the CODEPOINT_TYPE_* definitions.

Also included are helper flags for common regex classes like \s (only this one for now), \d, \w, ...

This simplifies writing custom regexes.

All flags are precomputed in unicode-data.cpp generated by gen-unicode-data.py.

@mofosyne added the labels "Review Complexity: Medium" (generally requires more time to grok, but manageable at beginner-to-medium expertise level) and "enhancement" (new feature or request) on May 13, 2024
Contributor

github-actions bot commented May 14, 2024

📈 llama.cpp server for bench-server-baseline on Standard_NC4as_T4_v3 for phi-2-q4_0: 568 iterations 🚀

  • Concurrent users: 8, duration: 10m
  • HTTP request: avg=8218.58ms p(95)=19284.72ms fails=, finish reason: stop=510 truncated=58
  • Prompt processing (pp): avg=98.81tk/s p(95)=467.94tk/s
  • Token generation (tg): avg=34.6tk/s p(95)=46.94tk/s
  • ggml-org/models/phi-2/ggml-model-q4_0.gguf parallel=8 ctx-size=16384 ngl=33 batch-size=2048 ubatch-size=256 pp=1024 pp+tg=2048 branch=codepoint-flags commit=2642da0ca8883994d20a73bebcd80f6f59b06c69

prompt_tokens_seconds

[Chart: llamacpp:prompt_tokens_seconds — llama.cpp bench-server-baseline on Standard_NC4as_T4_v3, duration=10m, 568 iterations]
predicted_tokens_seconds

[Chart: llamacpp:predicted_tokens_seconds — llama.cpp bench-server-baseline on Standard_NC4as_T4_v3, duration=10m, 568 iterations]

kv_cache_usage_ratio

[Chart: llamacpp:kv_cache_usage_ratio — llama.cpp bench-server-baseline on Standard_NC4as_T4_v3, duration=10m, 568 iterations]
requests_processing

[Chart: llamacpp:requests_processing — llama.cpp bench-server-baseline on Standard_NC4as_T4_v3, duration=10m, 568 iterations]

@ggerganov
Owner

Looks like the tokenizer tests are failing on Windows for some reason:

https://github.com/ggerganov/llama.cpp/actions/runs/9096294810/job/25001393493?pr=7245#step:12:2583

@jaime-m-p
Collaborator Author

> Looks like the tokenizer tests are failing on Windows for some reason:
>
> https://github.com/ggerganov/llama.cpp/actions/runs/9096294810/job/25001393493?pr=7245#step:12:2583

I can't debug this locally; is it possible to skip all but the failing test?

I reviewed the previous logs, but that test was not executed, so I think I'm going to start from a clean point and redo all the commits until I see the failure.

Also, I found that compiling the tests with BUILD_SHARED_LIBS ON fails with a missing pthread_create. I will check later, but it seems to need the -pthread flag or the -lpthread lib.

@jaime-m-p
Collaborator Author

The problem is the stack size limit on Windows.

According to the MSVC /STACK documentation: "For ARM64, x86, and x64 machines, the default stack size is 1 MB."

sizeof(std::array<codepoint_flags, MAX_CODEPOINTS>) ≈ 2 MB.

(Review thread on unicode-data.h — outdated, resolved)
@jaime-m-p
Collaborator Author

I think I'm done here.

Now I have the base to fix the tokenizers.
The brute-force test found failing cases while testing more models (even the llama-3 custom regex is failing).

@jaime-m-p merged commit b43272a into ggerganov:master on May 17, 2024
62 of 66 checks passed
Nexesenex pushed a commit to Nexesenex/croco.cpp that referenced this pull request May 18, 2024
* Replace CODEPOINT_TYPE_* with codepoint_flags
* Update and bugfix brute force random test
* Deterministic brute force random test
* Unicode normalization NFD
* Get rid of BOM
3 participants