A hello from a similar project 👋🏻 #56
Hi @federicotdn! It is so cool to see your project! I think there could be a lot for me to learn from your implementation. I read through your design doc and ran some sample programs. I think GitHub issues are a fine medium for discussion, so if it is okay with you, I have a few questions/observations.

Why Go?

Why did you choose Go for this project? Having a built-in garbage collector seems useful. Also, goroutines can make parallelism easier. However, I feel like the tricky part of adding goroutines to Emacs is not implementing the interpreter to support them, but defining the semantics in Elisp. I have written about that here and am curious what your thoughts are about using threads in Emacs. We currently support multi-threaded Elisp in the most basic sense with the `go` function, but most of the Elisp semantics have yet to be nailed down.

Elisp string compatibility

It looks like you are trying to be "bug compatible" with Emacs, because you say that "The current main design guide for Pimacs is to implement an Elisp interpreter that allows for running any Elisp script without modifications". How do you plan to handle mutable strings, given that Go strings are immutable? I am also curious what you plan to do about the encoding of "raw bytes" in multibyte strings.

Debugger

The interpreter in Rune was initially structured similarly to Pimacs, in that we don't use

Tagging

How are you passing many arguments?

When calling a

Emacs subroutines.json

I thought this was a very cool idea. I currently don't have any way to track which subroutines we are missing, or whether we have the wrong number of arguments, etc. Your solution makes it easy to upgrade the Emacs version, because you can always find new functions or ones that have changed. I eventually want to have a test runner that can run directly against GNU Emacs to compare functionality.
(I'm reading through the project code and your blog posts a bit; I'll get back to you!)
We also have a design doc, but it is not as well organized as yours.
Ok, getting back to you!

Why Go?

Mainly, I find Go programming fun, so I thought it would be a cool project to try it out on. I also thought that re-implementing Emacs is such a massive project that I'm going to need all the help I can get. The GC is great to have out of the box, plus Go's syntax is quite simple, so rewriting all the subroutines would (in theory) be quicker than with a more complex language. It also has a good library ecosystem, which helps as well. More than anything, I was focusing on developer efficiency. The concurrency utilities are of course very handy as well, but I wasn't thinking of them so much when I started (given that concurrency in Emacs is not thaaat important after all).

I read your blog post about concurrency, and the design doc! It is definitely a hard problem to nail down. Personally, I went with having some things shared among goroutines, and others not:
goroutine-local:
So with this approach, each goroutine also has its own obarray. This allows binding variables like

I haven't really tested this design, so I don't know if it would work in practice. Your approach seems much more thought-out (and probably tested as well). Have you looked into how Emacs does it? I believe it does have a per-thread struct where it stores some state, but maybe it's not using it for much.

Elisp string compatibility

Yes, they are immutable! One can actually cheat by using the

So instead my

```go
type lispString struct {
	valMut []byte
	val    string
	size_  int
}
```

where
To keep the struct as small as possible, I use

If size is

This was as small as I could get the strings without potentially breaking things inside the Go runtime itself. I bet you can do something much more compact with Rust. How would you describe the one in Rune at the moment? Regarding Emacs' custom UTF-8 variant (let's call it EU8), I thought about it and decided that it wasn't worth the complexity. By using plain UTF-8 I can use Go's existing libraries (plus, the language uses UTF-8 for string literals as well), so everything fits in nicely. Do you know how common it is to store raw bytes within multibyte strings? I feel like it can't be very common, right? The Emacs docs also state that:
What would be examples of "characters not unified with Unicode"? I wonder if Emacs makes use of this range itself. But in any case, I have a strong feeling that normal UTF-8 should cover the majority of use cases. But yeah, strings and encodings are a very cool topic to think about.

Debugger

That is an interesting point about the debugger. My Rust knowledge is basic, so I couldn't find how you are currently managing the stack aspects in Rune. But if you say it could be a problem, then you are probably right; I haven't gotten around to implementing a debugger yet. For catching errors, I do think I managed to get it working though: when calling

Thank you for the comment about

Tagging

For this I took a veeeery easy solution, but probably a very memory-wasteful one as well. Basically, all my Elisp objects are structs that also satisfy an interface called

I pass around

I find the Rune approach very cool though, like in:

```rust
#[defun(name = "%")]
pub(crate) fn remainder(x: i64, y: i64) -> i64 {
    // TODO: Handle markers
    x % y
}
```

The fact that you can operate directly with

```go
func (ec *execContext) modulo(x, y lispObject) (lispObject, error) {
	// Signal wrong-type-argument unless both arguments are integers.
	if !integerp(x) {
		return ec.wrongTypeArgument(ec.s.integerp, x)
	} else if !integerp(y) {
		return ec.wrongTypeArgument(ec.s.integerp, y)
	}
	return xInteger(x) % xInteger(y), nil
}
```

Also, using tagged pointers is clearly much more memory-efficient! In my case, nil is just a

Passing too many arguments

For this I convert the Elisp list of arguments into a Go slice of

This is a very basic approach though; maybe there is a better way of doing it.

emacs_subroutines.json

Thanks. I also figured that I would "tie" Pimacs to a certain Emacs version. This is configured here: https://github.com/federicotdn/pimacs/blob/main/tools/extract/config.json. This file determines the contents of emacs_subroutines.json, but also the version of the Elisp files that I copy over! So then everything is consistent. How does Rune decide which version of Emacs to target?

More topics

UI/UX

I also share your idea of separating the core part of the editor from the UI, like in the xi editor. However, I've looked around extensively and the status of GUI libraries is not great, for Go at least. The problem is that you need an extremely flexible library: after all, you need to implement all Emacs features, which do not really follow the guidelines of any native GUI toolkit. So then you could maybe use something a bit more low-level, like an immediate-mode library, but then you need to implement a lot of things yourself. Text rendering in particular seems like a scary topic. I am curious whether you've had ideas on this yet. A UI for the terminal seems a bit easier to achieve, and still very useful (thankfully). Maybe that would be a good starting point for both projects.

Gap buffer vs. Rope

I agree that the gap buffer is fast enough. The fact that gap buffers are easy to implement is a huge plus. Additionally, I think every design decision that makes Rune/Pimacs internally more similar to Emacs is also a good thing in the long term; it can potentially make it easier to port over more code from Emacs (even though technically how a buffer is implemented should not really matter, in practice things tend to be more complicated). The Emacs codebase has a lot of surprises.
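For anyone unfamiliar with the structure, here is a minimal Go sketch of a gap buffer, assuming plain bytes rather than Emacs' multibyte text. The text lives in one array with an empty gap at the insertion point, so edits near point are cheap and moving point only shifts the bytes between the old and new positions. The `gapBuffer` type and its methods are illustrative names, not code from Pimacs or Rune.

```go
package main

import "fmt"

// gapBuffer (illustrative name) stores the text as buf[:gapStart] followed
// by buf[gapEnd:]; the bytes in between are the gap at the insertion point.
type gapBuffer struct {
	buf      []byte
	gapStart int
	gapEnd   int
}

func newGapBuffer(capacity int) *gapBuffer {
	return &gapBuffer{buf: make([]byte, capacity), gapEnd: capacity}
}

// length is the number of text bytes, i.e. everything except the gap.
func (g *gapBuffer) length() int { return len(g.buf) - (g.gapEnd - g.gapStart) }

// moveGap relocates the gap so it starts at text position pos.
// Go's built-in copy handles overlapping source/destination correctly.
func (g *gapBuffer) moveGap(pos int) {
	if pos < g.gapStart {
		// Text in [pos, gapStart) moves to just before gapEnd.
		n := g.gapStart - pos
		copy(g.buf[g.gapEnd-n:g.gapEnd], g.buf[pos:g.gapStart])
		g.gapStart, g.gapEnd = pos, g.gapEnd-n
	} else if pos > g.gapStart {
		// The first n bytes after the gap move to the gap's start.
		n := pos - g.gapStart
		copy(g.buf[g.gapStart:g.gapStart+n], g.buf[g.gapEnd:g.gapEnd+n])
		g.gapStart, g.gapEnd = pos, g.gapEnd+n
	}
}

// grow reallocates so the gap can hold at least need more bytes.
func (g *gapBuffer) grow(need int) {
	newCap := 2*len(g.buf) + need
	nb := make([]byte, newCap)
	copy(nb, g.buf[:g.gapStart])
	tail := len(g.buf) - g.gapEnd
	copy(nb[newCap-tail:], g.buf[g.gapEnd:])
	g.buf, g.gapEnd = nb, newCap-tail
}

// insert places s at text position pos (no bounds checking in this sketch).
func (g *gapBuffer) insert(pos int, s string) {
	g.moveGap(pos)
	if g.gapEnd-g.gapStart < len(s) {
		g.grow(len(s))
	}
	copy(g.buf[g.gapStart:], s)
	g.gapStart += len(s)
}

// deleteRange removes n bytes starting at pos by widening the gap.
func (g *gapBuffer) deleteRange(pos, n int) {
	g.moveGap(pos)
	g.gapEnd += n
}

func (g *gapBuffer) String() string {
	return string(g.buf[:g.gapStart]) + string(g.buf[g.gapEnd:])
}

func main() {
	g := newGapBuffer(4)
	g.insert(0, "hello world")
	g.insert(5, ",")
	g.deleteRange(0, 1)
	g.insert(0, "H")
	fmt.Println(g.String(), g.length()) // Hello, world 12
}
```

The trade-off shows up in `moveGap`: jumping far away costs a copy proportional to the distance, which is usually fine for the localized editing an editor sees.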
Bytecode / source interpreter

My understanding is that currently Rune can read Elisp source code, and it can also compile it to bytecode and interpret it from there. Is that correct? Can Rune read .elc files generated by Emacs? Would that even be desirable?

Project philosophy

I am very surprised at how similar both projects are in terms of design and purpose! It seems like we arrived at very similar ideas independently, using very different tools.
I am going to respond in multiple comments as I get time.

Threading

You seem to have arrived at a similar conclusion about what should be shared vs. thread-local. In Rune this is how we split it up.

Shared globally
All interned symbols are shared between all threads. They (currently) have an atomic field where the function pointer is stored, so functions are shared as well. Values of symbols are thread-local and stored in a hashmap. I think properties will be shared as well.

Thread-local
We don't share any Lisp objects between threads, because doing so makes reasoning about your program and resolving bugs much harder. We also have a local obarray for uninterned symbols. The value of a symbol is first looked up in the thread's variable cache, and if it is not found there, the thread requests the value from the "main" thread. If it is not found there either, it is a void variable. As I mentioned in my post, buffers are kind of unique in that they are shared, but only a single thread can have one open at any time. This is controlled by a mutex. I still have not decided how to handle buffer-locals though (does each thread get a copy, or does accessing a buffer-local share it with that thread?).
String type

Strings in Rune are defined very similarly to Pimacs (see rune/src/core/object/string.rs, lines 11 to 24 at commit 2d03e7b).
We have an enum that can either be a UTF-8 string or a raw byte array. I agree with you: I don't think supporting raw bytes is worth the extra complexity.
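A minimal Go sketch of that distinction, assuming a string is either multibyte (UTF-8 text) or unibyte (raw bytes). The practical consequence is that "length" and indexing mean characters in one case and bytes in the other. The `lispStr` type and its methods are hypothetical, not the actual Rune enum or Pimacs' `lispString`.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// lispStr is a hypothetical Elisp string: either multibyte (valid UTF-8
// text) or unibyte (arbitrary raw bytes). A []byte backing slice also makes
// in-place mutation possible, unlike Go's immutable string type.
type lispStr struct {
	multibyte bool
	data      []byte
}

func newMultibyte(s string) *lispStr { return &lispStr{multibyte: true, data: []byte(s)} }
func newUnibyte(b []byte) *lispStr   { return &lispStr{data: append([]byte(nil), b...)} }

// length counts characters the way Elisp's `length` does: decoded code
// points for multibyte strings, bytes for unibyte ones.
func (s *lispStr) length() int {
	if s.multibyte {
		return utf8.RuneCount(s.data)
	}
	return len(s.data)
}

// aref returns the i-th character (a code point or a byte value).
// It is O(n) for multibyte strings because UTF-8 is variable-width.
func (s *lispStr) aref(i int) rune {
	if !s.multibyte {
		return rune(s.data[i])
	}
	for _, r := range string(s.data) {
		if i == 0 {
			return r
		}
		i--
	}
	panic("args-out-of-range")
}

func main() {
	m := newMultibyte("héllo")
	u := newUnibyte([]byte("héllo"))
	fmt.Println(m.length(), u.length())       // 5 6
	fmt.Println(string(m.aref(1)), u.aref(1)) // é 195
}
```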
I don't think it is that common, but Emacs has support for it. If you want to be fully compatible with Emacs Lisp, you would need to handle it though. For example:

```elisp
(string-to-multibyte (unibyte-string 128))
```
These are the raw bytes. For example, if you go to a buffer and eval

Tagging

the
A
UI
I will admit that I am pretty ignorant about GUIs. The Rust ecosystem does not have great options either. There are some new projects like xilem that are exploring this space. That said, most of the requirements for an Emacs GUI seem pretty simple: we need fast text layout and rendering in multiple panes (what Emacs would call windows). It doesn't have fancy widgets, graphics, or animations. I think using something lower-level (like GTK) is probably the way to go. As far as TUIs go, I am trying to avoid going down that path first. A lot of the functionality in Emacs is hamstrung by being a "TUI first" program. For example it treating

Bytecode / source interpreter

Rune is currently at the point where the interpreter can bootstrap enough Elisp to load the byte compiler (written in Lisp) and then compile the functions. The bytecode VM is also implemented and is run as well. We don't load bytecode functions (.elc) directly, but it would be trivial to implement that. However, given that Rust is a memory-safe language, I would like to have some way to verify bytecode that we load before we execute it (I have crashed Emacs many times playing around with bytecode).

Project philosophy

I do find it interesting that we arrived at the same approach on many things, especially around concurrency. I think it would be so cool to have the capability in Emacs.
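One possible shape for that verification step, sketched in Go under heavy assumptions: the instruction set below is a hypothetical five-opcode toy, not Emacs' real bytecode, and the check only handles straight-line code (no jumps). It rejects out-of-range constant indices, stack underflow, and exceeding a declared maximum stack depth. Emacs byte-code objects do record a maximum stack depth (the DEPTH argument of `make-byte-code`), which is the kind of value such a check could validate.

```go
package main

import (
	"errors"
	"fmt"
)

// Hypothetical opcodes for illustration only; these are NOT Emacs' opcodes.
type opcode byte

const (
	opConst   opcode = iota // push constants[arg]
	opAdd                   // pop 2, push 1
	opDup                   // pop 1, push 2
	opDiscard               // pop 1
	opReturn                // pop 1 and return
)

type instr struct {
	op  opcode
	arg int
}

// verify walks straight-line code and rejects programs that would read a
// missing constant, underflow the stack, exceed maxDepth, or fall off the
// end without returning.
func verify(code []instr, nConsts, maxDepth int) error {
	depth := 0
	for pc, in := range code {
		var pops, pushes int
		switch in.op {
		case opConst:
			if in.arg < 0 || in.arg >= nConsts {
				return fmt.Errorf("pc %d: constant index %d out of range", pc, in.arg)
			}
			pushes = 1
		case opAdd:
			pops, pushes = 2, 1
		case opDup:
			pops, pushes = 1, 2
		case opDiscard:
			pops = 1
		case opReturn:
			if depth < 1 {
				return fmt.Errorf("pc %d: stack underflow", pc)
			}
			return nil
		default:
			return fmt.Errorf("pc %d: unknown opcode %d", pc, in.op)
		}
		if depth < pops {
			return fmt.Errorf("pc %d: stack underflow", pc)
		}
		depth += pushes - pops
		if depth > maxDepth {
			return fmt.Errorf("pc %d: stack depth %d exceeds declared max %d", pc, depth, maxDepth)
		}
	}
	return errors.New("code does not end with a return")
}

func main() {
	ok := []instr{{opConst, 0}, {opConst, 1}, {opAdd, 0}, {opReturn, 0}}
	bad := []instr{{opConst, 0}, {opAdd, 0}, {opReturn, 0}}
	fmt.Println(verify(ok, 2, 2))  // <nil>
	fmt.Println(verify(bad, 1, 2)) // pc 1: stack underflow
}
```

Real bytecode has jumps, so a real verifier would also need to propagate stack depths to branch targets and check that they agree where control flow merges; this sketch leaves that out.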
This has been cool to read! Thanks both!
Amazing, thank you for taking the time to reply with so many details. I'm going to be keeping an eye on this project to see how it progresses. If some day I decide to resume learning Rust, maybe I can try submitting a PR here!
@federicotdn I have been thinking there might be a way for the projects to collaborate. I have been wanting to create a tool to test the implementation against GNU Emacs, since that is an accurate measure of correctness. I was planning on creating a tool that can extract the arguments for builtin functions from Emacs and then generate inputs for them. We would then feed these into both GNU Emacs and Rune and ensure that the outputs match. Essentially, property testing against Emacs. That will hopefully help flush out a bunch of issues, and anything that is found can be added to normal unit tests. This got me thinking that you already have a great start on this with your extract tool in Pimacs! It can already parse the source files of GNU Emacs and format them nicely in JSON. We could build on this to make a tool that reads the JSON, generates the inputs, and then runs them against both GNU Emacs and the other implementation. This could be used for both Rune and Pimacs. What do you think?
Yes! That sounds like a good idea. I'm curious as to how you would generate inputs for each function though, as you would need to know the type of each argument beforehand, and also have a set of expected outputs for those arguments. So basically you would need hand-written test cases for each function. And at that point, automatic extraction of all functions stops being so useful 🤔. Unless you have something else in mind. In Pimacs I was using the JSON file to quickly test that any subroutine defined in Pimacs with the same name as one in Emacs had the same set of attributes: arity, lack of return value, etc. (well, actually I only got to test the arity, but the rest would not be so difficult to add). (https://github.com/federicotdn/pimacs/blob/main/core/exec_context_test.go#L75) But yeah, in any case of course we can bring the tool into Rune 👍🏻
The idea is that you could generate random input for a function, then feed it into both Emacs and the alternative implementation and make sure they produce the same output. So the expected output is whatever Emacs returns (including errors). We already know the arity of functions thanks to your script, and that would be enough for a first pass. But Rune also has type annotations on the builtin functions. Those could be fed into the JSON and then used to generate more specific inputs. For example if we picked the

I see this as a low-effort way to flush out bugs, because we don't need to define manual tests or types.
Aah ok, I see, then more or less the flow would be:
To me, the fact that you are already using Emacs itself in this process makes the JSON file less valuable, since you can probably evaluate some Elisp code within Emacs that will return the subroutine metadata anyway. For example you could do:
Or maybe you could do two separate things: one where you use the JSON file as a very quick arity/subroutine-name check, and then a slower one using the second method described above. I can imagine that for ~1500 subroutines, running multiple tests against both Rune and Emacs might get slow!
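For the slower, Emacs-driven path, arity can be read from a real Emacs with the built-in `func-arity`, which returns `(MIN . MAX)` where MAX is a number, `many`, or `unevalled`. A minimal Go sketch that builds a batch invocation and parses the printed result; it assumes an `emacs` binary on PATH when the command is actually run, and `arityCommand`/`parseArity` are hypothetical helper names that only handle this one output shape.

```go
package main

import (
	"fmt"
	"os/exec"
	"strconv"
	"strings"
)

// arityCommand builds (but does not run) a batch Emacs invocation that
// prints the arity of fn, e.g. "(1 . 1)" for car.
func arityCommand(emacs, fn string) *exec.Cmd {
	expr := fmt.Sprintf("(princ (func-arity '%s))", fn)
	return exec.Command(emacs, "-Q", "--batch", "--eval", expr)
}

// parseArity converts func-arity's printed "(MIN . MAX)" into two ints;
// MAX = -1 encodes `many` or `unevalled`.
func parseArity(out string) (int, int, error) {
	s := strings.TrimSpace(out)
	s = strings.TrimSuffix(strings.TrimPrefix(s, "("), ")")
	parts := strings.Split(s, " . ")
	if len(parts) != 2 {
		return 0, 0, fmt.Errorf("unexpected func-arity output %q", out)
	}
	lo, err := strconv.Atoi(parts[0])
	if err != nil {
		return 0, 0, err
	}
	if parts[1] == "many" || parts[1] == "unevalled" {
		return lo, -1, nil
	}
	hi, err := strconv.Atoi(parts[1])
	return lo, hi, err
}

func main() {
	cmd := arityCommand("emacs", "substring")
	fmt.Println(strings.Join(cmd.Args, " "))
	// With Emacs on PATH, cmd.Output() should return "(1 . 3)"; parse a sample:
	lo, hi, _ := parseArity("(1 . 3)")
	fmt.Println(lo, hi) // 1 3
}
```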
You are right that we could just do this all from Rust: have it parse the arity and types, then generate the input forms, run the Emacsen, and report the results. But the idea is that if you instead write out a JSON file with that information, you can feed it into a Python program that does the generation and comparison of functions. That way Rune could have a small program to create the JSON, and Pimacs could have one to create the JSON. This lets both projects share a common Python test framework to compare against GNU Emacs.
Sounds good. I can start in the following weeks by bringing the Python script over to this repo and outputting the JSON somewhere. In this regard:
I would think that the Emacs source code as well as the actual Emacs executables should be arguments to the script. That way, if a project is tracking a different version (we are currently on 29.1), there are no built-in assumptions. I think it is safe to ignore MS-DOS or anything else that does not seem very useful. I am hoping that as I dive into this, there will be more useless functions that I can just throw out. The easiest code to maintain is code that does not exist! As far as the output directory goes, Rust usually puts things in target/, but we could create another one called output/ if we wanted.
Hello! I have also been working on a tiny Emacs reimplementation project (https://github.com/gudzpoz/Juicemacs) and learned a lot from the design docs and implementations of Rune and Pimacs!

Strings

When implementing ELisp strings, however, I ended up first using the

The last straw: "characters not unified with Unicode"

(Well, the haystack under the last straw was that I needed to implement my own regexp engine.) It was not easy to come up with an example, or at least I didn't find any when searching in English. But following a Japanese blog, I managed to produce an example with the

Basically, when decoded with

Edit: just found another example right in Emacs' source code:

Reinventing strings

I don't use the UTF-8-like encoding as Emacs does, however. For example, nowadays Java uses compact strings to encode Latin-1 strings in bytes and use only 16-bit

Compared to UTF-8-like encoding, these "transparently compacting" methods simplify things quite a lot. (Also, this makes fetching by codepoint naturally an

The switch from

Auto extraction & codegen

I have been using the

For example, the initialization of the

I have also thought about other ways. For example, it might be possible to let Emacs initialize itself and replace its

Anyway, most of the operations in

Part of the codegen (https://github.com/gudzpoz/Juicemacs/blob/main/elisp/src/main/java/party/iroiro/juicemacs/elisp/runtime/ELispGlobals.java):

```java
// Each error symbol gets its condition list and its error message.
ERROR.putProperty(ERROR_CONDITIONS, new ELispCons(ERROR, NIL));
ERROR.putProperty(ERROR_MESSAGE, new ELispString("error"));
QUIT.putProperty(ERROR_CONDITIONS, new ELispCons(QUIT, NIL));
QUIT.putProperty(ERROR_MESSAGE, new ELispString("Quit"));
MINIBUFFER_QUIT.putProperty(ERROR_CONDITIONS, new ELispCons(MINIBUFFER_QUIT, new ELispCons(QUIT, NIL)));
MINIBUFFER_QUIT.putProperty(ERROR_MESSAGE, new ELispString("Quit"));
USER_ERROR.putProperty(ERROR_CONDITIONS, new ELispCons(USER_ERROR, new ELispCons(ERROR, NIL)));
USER_ERROR.putProperty(ERROR_MESSAGE, new ELispString(""));
WRONG_LENGTH_ARGUMENT.putProperty(ERROR_CONDITIONS, new ELispCons(WRONG_LENGTH_ARGUMENT, new ELispCons(ERROR, NIL)));
WRONG_LENGTH_ARGUMENT.putProperty(ERROR_MESSAGE, new ELispString("Wrong length argument"));
WRONG_TYPE_ARGUMENT.putProperty(ERROR_CONDITIONS, new ELispCons(WRONG_TYPE_ARGUMENT, new ELispCons(ERROR, NIL)));
WRONG_TYPE_ARGUMENT.putProperty(ERROR_MESSAGE, new ELispString("Wrong type argument"));
```
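The generated properties follow Emacs' convention: each error symbol's `error-conditions` property lists the symbol itself followed by its parent conditions, and a `condition-case` handler catches a signal when the handler's condition name is a member of that list (the name `t` matches any condition). A minimal Go sketch of that matching rule, with a plain map standing in for symbol property lists; `errorConditions` and `handles` are hypothetical names.

```go
package main

import "fmt"

// errorConditions stands in for each error symbol's error-conditions
// property, with the same lists the generated Java code above sets up.
var errorConditions = map[string][]string{
	"error":               {"error"},
	"quit":                {"quit"},
	"minibuffer-quit":     {"minibuffer-quit", "quit"},
	"user-error":          {"user-error", "error"},
	"wrong-type-argument": {"wrong-type-argument", "error"},
}

// handles reports whether a condition-case handler for `handler` catches
// a signal of errorSymbol; the condition name t matches everything.
func handles(handler, errorSymbol string) bool {
	if handler == "t" {
		return true
	}
	for _, c := range errorConditions[errorSymbol] {
		if c == handler {
			return true
		}
	}
	return false
}

func main() {
	fmt.Println(handles("error", "wrong-type-argument")) // true
	fmt.Println(handles("error", "minibuffer-quit"))     // false: quit is not an error
	fmt.Println(handles("quit", "minibuffer-quit"))      // true
}
```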
Hi @gudzpoz, when you are talking about "characters not unified with Unicode", I assume you are talking about this. Basically, Emacs uses a custom encoding to handle raw bytes in text. I have not tried to tackle this issue yet because, as you said, it would require reimplementing a bunch of primitives. I am trying to see how long I can avoid it 😁. I had never thought about using a script to generate code from the Emacs source, but that would save a bunch of effort for some symbols. If we could export the output into some common format (maybe JSON), then it could be used in multiple projects, and each could handle transforming it into the correct output code.
On the raw bytes: Emacs uses `#x110000..#x3FFFFF` for both "characters that are not unified with Unicode and raw 8-bit bytes", and I just wanted to provide an example of the former case, although not-yet-unified code points might be much rarer than raw bytes. As for the Python scripts, I've set up a repo for them here (https://github.com/gudzpoz/emacs-extractor). Currently, it seems to be basically working (tested against some source files in Emacs 30), and I've generated runnable Java code with the extracted info. To extract the initialization logic, it also extracts some `#define`/`enum` constants as well as `DEFVAR_*` variables. I haven't tested it against Emacs 29 yet, but I hope the convoluted config file it uses is enough for adapting it... (BTW, I just watched your EmacsConf talk, and it was great! It's also very interesting to see three Emacs implementations at EmacsConf. 😃)
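For the raw-byte half of that range: Emacs represents a raw 8-bit byte B (#x80..#xFF) in multibyte text as the character B + #x3FFF00, i.e. the top 128 codes #x3FFF80..#x3FFFFF. That is why `(string-to-multibyte (unibyte-string 128))` yields a string whose single character is #x3FFF80. A minimal Go sketch of the mapping, mirroring the BYTE8_TO_CHAR / CHAR_TO_BYTE8 helpers in Emacs' src/character.h; the Go names are our own.

```go
package main

import "fmt"

const (
	maxUnicodeChar = 0x10FFFF // last Unicode code point
	max5ByteChar   = 0x3FFF7F // last character code that is not a raw byte
	maxChar        = 0x3FFFFF // largest Emacs character code
)

// byte8ToChar maps a raw byte (expected to be 0x80..0xFF) to its Emacs
// character code.
func byte8ToChar(b byte) rune { return rune(b) + 0x3FFF00 }

// isRawByteChar reports whether c encodes a raw 8-bit byte.
func isRawByteChar(c rune) bool { return c > max5ByteChar && c <= maxChar }

// charToByte8 recovers the raw byte from a raw-byte character.
func charToByte8(c rune) (byte, bool) {
	if !isRawByteChar(c) {
		return 0, false
	}
	return byte(c - 0x3FFF00), true
}

// isUnicode separates real Unicode code points from Emacs' extension range.
func isUnicode(c rune) bool { return c >= 0 && c <= maxUnicodeChar }

func main() {
	c := byte8ToChar(128)
	fmt.Printf("#x%X %d\n", c, c) // #x3FFF80 4194176
	b, ok := charToByte8(c)
	fmt.Println(b, ok, isUnicode(c)) // 128 true false
}
```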
Hey! I found your project some weeks ago and found it super interesting. I am doing something similar, but in Go: https://github.com/federicotdn/pimacs.
Your project is more active than mine, but I thought it would be interesting to share ideas. Go and Rust are very different languages, of course, but some challenges are more or less the same regardless of the language. Some of the conversations For example, I've seen the following topics discussed on this repo that I've also thought about!
I've written about some of my design decisions here: https://github.com/federicotdn/pimacs/blob/main/etc/design.md. As part of the project, I've written a Python script called extract.py that creates a JSON file with information about all the subroutine declarations in Emacs; perhaps that could be useful for Rune as well! (https://github.com/federicotdn/pimacs/blob/main/test/data/emacs_subroutines.json)

I'm aware that a GitHub issue is not a great medium for these types of discussions, so please feel free to close it, as it is not really an "issue" with Rune itself. Cheers!