NOTE: we currently support an interpreted mode, plus source code generation for Python and Julia.
It is straightforward to compile a compiled_unit
into source code for C#, F#, and other languages. See CodeGen.Python.fs
for how to write a custom backend; a short sketch of driving the generators appears at the end of the example below.
Thanks to the Fable.Python compiler, this project adapts code from the excellent OCaml sedlex
into a lexer generator that (currently) targets both Python and F#.
The most impressive feature of sedlex
is that it statically analyses the regular expressions built from its lexer combinators, correctly ordering, merging, and optimizing the lexical rules. In short, sedlex
is both correct and efficient.
For Python users: the generated Python code lives in the fable_sedlex
directory; copy it directly into your Python package to use the lexer generator.
For .NET (C#, F#) users: copying Sedlex.fs
gives you the lexer generator in .NET.
For users of other programming languages: you can access a compiled_unit
and write your own code generator.
from dataclasses import dataclass

from fable_sedlex.sedlex import *
from fable_sedlex.code_gen_python import codegen_python
from fable_sedlex.code_gen_julia import codegen_julia
from fable_sedlex.code_gen import show_doc
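# Token patterns are built from sedlex's regex combinators:
#   pchar/pchars          - a single character / any of a set of characters
#   pinterval             - a character range (by code point)
#   pseq/por              - sequence / alternation
#   pstar/pplus/popt      - zero-or-more / one-or-more / optional
#   pcompl                - complement of a single-character pattern
#   pany / peof / pstring - any character / end of input / literal string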
digit = pinterval(ord('0'), ord('9'))
dquote = pchar('"')
backslash = pchar('\\')
dot = pchar('.')
exp = pseq([por(pchar('E'), pchar('e')), pplus(digit)])
flt = pseq([pstar(digit), dot, pplus(digit), popt(exp)])
integral = pseq([pplus(digit), popt(exp)])
string_lit = pseq([dquote, pstar(por(pcompl(dquote), pseq([backslash, pany]))), dquote])
space = pchars(["\n", "\t", "\r", ' '])
EOF_ID = 0
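# build() statically compiles the rule list into a compiled_unit (cu).
# Each rule pairs a pattern with an action: Lexer_discard drops the match
# (used here for whitespace), Lexer_tokenize(i) emits a token with id i.
# The trailing string ("my error") is the error message for failed matches.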
cu = build(
    [
        (space, Lexer_discard),
        (flt, Lexer_tokenize(1)),
        (integral, Lexer_tokenize(2)),
        (string_lit, Lexer_tokenize(3)),
        (pstring("+"), Lexer_tokenize(4)),
        (pstring("+="), Lexer_tokenize(4)),
        (peof, Lexer_tokenize(EOF_ID)),
    ],
    "my error",
)
# julia_code = codegen_julia("using Sedlex", cu)
@dataclass
class MyToken:
    token_id: int
    lexeme: str
    line: int
    col: int
    span: int
    offset: int
    file: str
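# inline_thread runs cu in interpreted mode (no code generation) and calls
# the given constructor with (token_id, lexeme, line, col, span, offset, file),
# matching MyToken's field order above.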
f = inline_thread(cu, lambda args: MyToken(*args))
# we can also generate Python source code for an equivalent lexer via:
#   `code = show_doc(codegen_python(cu))`
# see `test_codegen_py.py`, `generated.py` and `test_run_generated.py`
# for more details
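# from_ustring makes a lexer buffer from a unicode string; the raw string
# below keeps the backslash, so the escape branch of string_lit is exercised.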
buf = from_ustring(r'123 2345 + += 2.34E5 "sada\"sa" ')
tokens = []
while True:
    try:
        x = f(buf)
    except Exception as e:
        # a failed match raises; print the error before re-raising
        print(e)
        raise
    if x is None:
        # None means the matched rule's action was Lexer_discard (whitespace)
        continue
    if x.token_id == EOF_ID:
        break
    tokens.append(x)
print(tokens)
> python test.py
[MyToken(token_id=2, lexeme='123', line=0, col=3, span=3, offset=0, file=''), MyToken(token_id=2, lexeme='2345', line=0, col=8, span=4, offset=4, file=''), MyToken(token_id=4, lexeme='+', line=0, col=10, span=1, offset=9, file=''), MyToken(token_id=4, lexeme='+=', line=0, col=13, span=2, offset=11, file=''), MyToken(token_id=1, lexeme='2.34E5', line=0, col=20, span=6, offset=14, file=''), MyToken(token_id=3, lexeme='"sada\\"sa"', line=0, col=31, span=10, offset=21, file='')]
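A minimal sketch of driving the code generators directly, reusing the cu built above; the calls mirror the comments in the example, and writing the result to generated.py is purely illustrative:

python_code = show_doc(codegen_python(cu))
with open("generated.py", "w") as out:  # illustrative destination
    out.write(python_code)

julia_code = codegen_julia("using Sedlex", cu)  # the first argument is the Julia prelude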