Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Buffer performance improvements #791

Closed
Tyriar opened this issue Jul 13, 2017 · 73 comments
Closed

Buffer performance improvements #791

Tyriar opened this issue Jul 13, 2017 · 73 comments
Assignees
Labels
area/performance type/plan A meta issue that consists of several sub-issues type/proposal A proposal that needs some discussion before proceeding
Milestone

Comments

@Tyriar
Copy link
Member

Tyriar commented Jul 13, 2017

Problem

Memory

Right now our buffer is taking up too much memory, particularly for an application that launches multiple terminals with large scrollbacks set. For example, the demo using a 160x24 terminal with 5000 scrollback filled takes around 34mb memory (see microsoft/vscode#29840 (comment)), remember that's just a single terminal and 1080p monitors would likely use wider terminals. Also, in order to support truecolor (#484), each character will need to store 2 additional number types which will almost double the current memory consumption of the buffer.

Slow fetching of a row's text

There is the other problem of needing to fetch the actual text of a line swiftly. The reason this is slow is due to the way that the data is laid out; a line contains an array of characters, each having a single character string. So we will construct the string and then it will be up for garbage collection immediately afterwards. Previously we didn't need to do this at all because the text is pulled from the line buffer (in order) and rendered to the DOM. However, this is becoming an increasingly useful thing to do though as we improve xterm.js further, features like the selection and links both pull this data. Again using the 160x24/5000 scrollback example, it takes 30-60ms to copy the entire buffer on a Mid-2014 Macbook Pro.

Supporting the future

Another potential problem in the future is when we look at introducing some view model which may need to duplicate some or all of the data in the buffer, this sort of thing will be needed to implement reflow (#622) properly (#644 (comment)) and maybe also needed to properly support screen readers (#731). It would certainly be good to have some wiggle room when it comes to memory.

This discussion started in #484, this goes into more detail and proposes some additional solution.

I'm leaning towards solution 3 and moving towards solution 5 if there is time and it shows a marked improvement. Would love any feedback! /cc @jerch, @mofux, @rauchg, @parisk

1. Simple solution

This is basically what we're doing now, just with truecolor fg and bg added.

// [0]: charIndex
// [1]: width
// [2]: attributes
// [3]: truecolor bg
// [4]: truecolor fg
type CharData = [string, number, number, number, number];

type LineData = CharData[];

Pros

  • Very simple

Cons

  • Too much memory consumed, would nearly double our current memory usage which is already too high.

2. Pull text out of CharData

This would store the string against the line rather than the line, this would probably see very large gains in selection and linkifying and would be more useful as time goes on having quick access to a line's entire string.

interface ILineData {
  // This would provide fast access to the entire line which is becoming more
  // and more important as time goes on (selection and links need to construct
  // this currently). This would need to reconstruct text whenever charData
  // changes though. We cannot lazily evaluate text due to the chars not being
  // stored in CharData
  text: string;
  charData: CharData[];
}

// [0]: charIndex
// [1]: attributes
// [2]: truecolor bg
// [3]: truecolor fg
type CharData = Int32Array;

Pros

  • No need to reconstruct the line whenever we need it.
  • Lower memory than today due to the use of an Int32Array

Cons

  • Slow to update individual characters, the entire string would need to be regenerated for single character changes.

3. Store attributes in ranges

Pulling the attributes out and associating them with a range. Since there can never be overlapping attributes, this can be laid out sequentially.

type LineData = CharData[]

// [0]: The character
// [1]: The width
type CharData = [string, number];

class CharAttributes {
  public readonly _start: [number, number];
  public readonly _end: [number, number];
  private _data: Int32Array;

  // Getters pull data from _data (woo encapsulation!)
  public get flags(): number;
  public get truecolorBg(): number;
  public get truecolorFg(): number;
}

class Buffer extends CircularList<LineData> {
  // Sorted list since items are almost always pushed to end
  private _attributes: CharAttributes[];

  public getAttributesForRows(start: number, end: number): CharAttributes[] {
    // Binary search _attributes and return all visible CharAttributes to be
    // applied by the renderer
  }
}

Pros

  • Lower memory than today even though we're also storing truecolor data
  • Can optimize application of attributes, rather than checking every single character's attribute and diffing it to the one before
  • Encapsulates the complexity of storing the data inside an array (.flags instead of [0])

Cons

  • Changing attributes of a range of characters inside another range is more complex

4. Put attributes in a cache

The idea here is to leverage the fact that there generally aren't that many styles in any one terminal session, so we should not create as few as necessary and reuse them.

// [0]: charIndex
// [1]: width
type CharData = [string, number, CharAttributes];

type LineData = CharData[];

class CharAttributes {
  private _data: Int32Array;

  // Getters pull data from _data (woo encapsulation!)
  public get flags(): number;
  public get truecolorBg(): number;
  public get truecolorFg(): number;
}

interface ICharAttributeCache {
  // Never construct duplicate CharAttributes, figuring how the best way to
  // access both in the best and worst case is the tricky part here
  getAttributes(flags: number, fg: number, bg: number): CharAttributes;
}

Pros

  • Similar memory usage to today even though we're also storing truecolor data
  • Encapsulates the complexity of storing the data inside an array (.flags instead of [0])

Cons

  • Less memory savings than the ranges approach

5. Hybrid of 3 & 4

type LineData = CharData[]

// [0]: The character
// [1]: The width
type CharData = [string, number];

class CharAttributes {
  private _data: Int32Array;

  // Getters pull data from _data (woo encapsulation!)
  public get flags(): number;
  public get truecolorBg(): number;
  public get truecolorFg(): number;
}

interface CharAttributeEntry {
  attributes: CharAttributes,
  start: [number, number],
  end: [number, number]
}

class Buffer extends CircularList<LineData> {
  // Sorted list since items are almost always pushed to end
  private _attributes: CharAttributeEntry[];
  private _attributeCache: ICharAttributeCache;

  public getAttributesForRows(start: number, end: number): CharAttributeEntry[] {
    // Binary search _attributes and return all visible CharAttributeEntry's to
    // be applied by the renderer
  }
}

interface ICharAttributeCache {
  // Never construct duplicate CharAttributes, figuring how the best way to
  // access both in the best and worst case is the tricky part here
  getAttributes(flags: number, fg: number, bg: number): CharAttributes;
}

Pros

  • Protentially the fastest and most memory efficient
  • Very memory efficient when the buffer contains many blocks with styles but only from a few styles (the common case)
  • Encapsulates the complexity of storing the data inside an array (.flags instead of [0])

Cons

  • More complex than the other solutions, it may not be worth including the cache if we already keep a single CharAttributes per block?
  • Extra overhead in CharAttributeEntry object
  • Changing attributes of a range of characters inside another range is more complex

6. Hybrid of 2 & 3

This takes the solution of 3 but also adds in a lazily evaluates text string for fast access to the line text. Since we're also storing the characters in CharData we can lazily evaluate it.

type LineData = {
  text: string,
  CharData[]
}

// [0]: The character
// [1]: The width
type CharData = [string, number];

class CharAttributes {
  public readonly _start: [number, number];
  public readonly _end: [number, number];
  private _data: Int32Array;

  // Getters pull data from _data (woo encapsulation!)
  public get flags(): number;
  public get truecolorBg(): number;
  public get truecolorFg(): number;
}

class Buffer extends CircularList<LineData> {
  // Sorted list since items are almost always pushed to end
  private _attributes: CharAttributes[];

  public getAttributesForRows(start: number, end: number): CharAttributes[] {
    // Binary search _attributes and return all visible CharAttributes to be
    // applied by the renderer
  }

  // If we construct the line, hang onto it
  public getLineText(line: number): string;
}

Pros

  • Lower memory than today even though we're also storing truecolor data
  • Can optimize application of attributes, rather than checking every single character's attribute and diffing it to the one before
  • Encapsulates the complexity of storing the data inside an array (.flags instead of [0])
  • Faster access to the actual line string

Cons

  • Extra memory due to hanging onto line strings
  • Changing attributes of a range of characters inside another range is more complex

Solutions that won't work

  • Storing the string as an int inside an Int32Array will not work as it takes far to long to convert the int back to a character.
@Tyriar Tyriar added type/enhancement Features or improvements to existing features type/proposal A proposal that needs some discussion before proceeding labels Jul 13, 2017
@Tyriar Tyriar self-assigned this Jul 13, 2017
@mofux
Copy link
Contributor

mofux commented Jul 13, 2017 via email

@parisk
Copy link
Contributor

parisk commented Jul 13, 2017

Great proposal. I agree with that 3. is the best way to go for now as it lets us save memory while supporting true color as well.

If we reach there and things continue to go well, we can then optimize as proposed in 5. or in any other way that comes in our minds at that time and makes sense.

3. is great 👍.

@parisk
Copy link
Contributor

parisk commented Jul 13, 2017

@mofux, while there is definitely a use case for using disk-storage-backed techniques to reduce the memory footprint, this can degrade the user experience of the library in browser environments that ask the user for permission for using disk storage.

@mofux
Copy link
Contributor

mofux commented Jul 13, 2017

Regarding Supporting the future:
The more I think about it, the more the idea of having a WebWorker that does all the heavy work of parsing the tty data, maintaining the line buffers, matching links, matching search tokens and such appeals to me. Basically doing the heavy work in a separate background thread without blocking the ui. But I think this should be part of a separate discussion maybe towards a 4.0 release 😉

@AndrienkoAleksandr
Copy link
Contributor

AndrienkoAleksandr commented Jul 13, 2017

+100 about WebWorker in the future, but I think we need change list versions of browsers which we are supporting because not all off them can use it...

@Tyriar
Copy link
Member Author

Tyriar commented Jul 13, 2017

When I say Int32Array, this will be a regular array if it is not supported by the environment.

@mofux good thinking with WebWorker in the future 👍

@AndrienkoAleksandr yeah if we wanted to use WebWorker we would need to still support the alternative as well via feature detection.

@jerch
Copy link
Member

jerch commented Jul 13, 2017

Wow nice list :)

I also tend to lean towards 3. since it promises a big cut in memory consumption for over 90% of the typical terminal usage. Imho memory optimisation should be the main goal at this stage. Further optimization for specific use cases might be applicable on top of this (what comes to my mind: "canvas like apps" like ncurses and such will use tons of single cell updates and kinda degrade the [start, end] list over time).

@AndrienkoAleksandr yeah I like the webworker idea too since it could lift some burden from the mainthread. Problem here is (beside the fact that it might not be supported by all wanted target systems) is the some - the JS part is not such a big deal anymore with all the optimizations, xterm.js has seen over the time. Real issue performance-wise is the layouting/rendering of the browser...

@mofux The paging thing to some "foreign memory" is a good idea, though it should be part of some higher abstraction and not of the "gimme an interactive terminal widget thing" that xterm.js is. This could be achieved by an addon imho.

Offtopic: Did some tests with arrays vs. typedarrays vs. asm.js. All I can say - OMG, it is like 1 : 1,5 : 10 for simple variable loads and sets (on FF even more). If pure JS speed really starts to hurt, "use asm" might be there for the rescue. But I would see this as a last resort since it would imply fundamental changes. And webassembly is not yet ready to ship.

@Tyriar
Copy link
Member Author

Tyriar commented Jul 13, 2017

Offtopic: Did some tests with arrays vs. typedarrays vs. asm.js. All I can say - OMG, it is like 1 : 1,5 : 10 for simple variable loads and sets (on FF even more)

@jerch to clarify, is that arrays vs typedarrays is 1:1 to 1:5?

@jerch
Copy link
Member

jerch commented Jul 13, 2017

Woops nice catch with the comma - i meant 10:15:100 speed wise. But only on FF typed arrays were slightly faster than normal arrays. asm is at least 10 times faster than js arrays on all browsers - tested with FF, webkit (Safari), blink/V8 (Chrome, Opera).

@Tyriar
Copy link
Member Author

Tyriar commented Jul 13, 2017

@jerch cool, a 50% speed up from typedarrays in addition to better memory would definitely be worth investing in for now.

Tyriar added a commit to Tyriar/xterm.js that referenced this issue Jul 15, 2017
Very fragile right now, only works for basic IO

Part of xtermjs#791
@jerch
Copy link
Member

jerch commented Jul 15, 2017

Idea for memory saving - maybe we could get rid of the width for every character. Gonna try to implement a less expensive wcwidth version.

@Tyriar
Copy link
Member Author

Tyriar commented Jul 15, 2017

@jerch we need to access it quite a bit, and we can't lazy load it or anything because when reflow comes we will need the width of every character in the buffer. Even if it was fast we might still want to keep it around.

Might be better to make it optional, assuming 1 if it's not specified:

type CharData = [string, number?]; // not sure if this is valid syntax

[
  // 'a'
  ['a'],
  // '文'
  ['文', 2],
  // after wide
  ['', 0],
  ...
]

@jerch
Copy link
Member

jerch commented Jul 16, 2017

@Tyriar Yeah - well since Ive already written it, please have a look at it in PR #798
Speedup is 10 to 15 times on my computer for the cost of 16k bytes for the lookup table. Maybe a combination of both is possible if still needed.

@Tyriar
Copy link
Member Author

Tyriar commented Jul 21, 2017

Some more flags we'll support in the future: #580

@Tyriar Tyriar mentioned this issue Aug 3, 2017
13 tasks
@parisk parisk added this to the 3.0.0 milestone Aug 3, 2017
Tyriar added a commit to Tyriar/xterm.js that referenced this issue Aug 7, 2017
Tyriar added a commit to Tyriar/xterm.js that referenced this issue Aug 7, 2017
Tyriar added a commit to Tyriar/xterm.js that referenced this issue Aug 7, 2017
@Tyriar
Copy link
Member Author

Tyriar commented Aug 7, 2017

Another thought: Only the bottom portion of the terminal (Terminal.ybase to Terminal.ybase + Terminal.rows) is dynamic. The scrollback which makes up the bulk of the data is completely static, perhaps we can leverage this. I didn't know this until recently, but even things like delete lines (DL, CSI Ps M) don't bring the scrollback back down but rather insert another line. Similarly, scroll up (SU, CSI Ps S) deletes the item at Terminal.scrollTop and inserts an item at Terminal.scrollBottom.

Managing the bottom dynamic portion of the terminal independently and pushing to scrollback when the line is pushed out could lead to some significant gains. For example, the bottom portion could be more verbose to favor modifying of attributes, faster access, etc. whereas the scrollback can be more of an archival format as proposed in the above.

@Tyriar
Copy link
Member Author

Tyriar commented Aug 8, 2017

Another thought: it's probably a better idea to restrict CharAttributeEntry to rows as that's how most applications seem to work. Also if the terminal is resized then "blank" padding is added to the right which doesn't share the same styles.

eg:

screen shot 2017-08-07 at 8 51 52 pm

To the right of the red/green diffs are unstyled "blank" cells.

Tyriar added a commit to Tyriar/xterm.js that referenced this issue Aug 9, 2017
@Tyriar Tyriar mentioned this issue Aug 21, 2017
10 tasks
@jerch
Copy link
Member

jerch commented Dec 17, 2018

Regarding a possible UTF8 input encoding and the internal buffer layout I did a rough test. To rule out the much higher impact of the canvas renderer on the total runtime I did it with the upcoming webgl renderer. With my ls -lR /usr/lib benchmark I get the following results:

The playground branch does an early conversion from UTF8 to UTF32 before doing the parsing and storing (conversion adds ~ 30 ms). The speedup is mainly gained by the 2 hot functions during input flow, EscapeSequenceParser.parse (120 ms vs. 35 ms) and InputHandler.print (350 ms vs. 75 ms). Both benefit alot from the typed array switch by saving .charCodeAt calls.
I also compared these results with an UTF16 intermediate typed array - EscapeSequenceParser.parse is slightly faster (~ 25 ms) but InputHandler.print falls behind due to the needed surrogate pairing and codepoint lookup in wcwidth (120 ms).
Also note that I am already at the limit the system can provide the ls data (i7 with SSD) - the gained speedup adds idle time instead of making the run any faster.

Summary:
Imho the fastest input handling we can get is a mixture of UTF8 transport + UTF32 for buffer representation. While the UTF8 transport has the best byte packrate for typical terminal input and removes nonsense conversions from pty through several layers of buffers up to Terminal.write, the UTF32 based buffer can store the data pretty fast. The latter comes with a slightly higher memory footprint than UTF16 while UTF16 is slightly slower due to more complicated char handling with more indirections.

Conclusion:
We should go with the UTF32 based buffer layout for now. We should also consider switching to UTF8 input encoding, but this still needs some more thinking about the API changes and implications for integrators (seems electron's ipc mechanism cannot handle binary data without BASE64 encoding and JSON wrapping, which would counteract the perf efforts).

@jerch
Copy link
Member

jerch commented Dec 18, 2018

Buffer layout for the upcoming true color support:

Currently the typed array based buffer layout is the following (one cell):

|    uint32_t    |    uint32_t    |    uint32_t    |
|      attrs     |    codepoint   |     wcwidth    |

where attrs contains all needed flags + 9 bit based FG and BG colors. codepoint uses 21 bits (max. is 0x10FFFF for UTF32) + 1 bit to indicate combining chars and wcwidth 2 bits (ranges from 0-2).

Idea is to rearrange the bits to a better packrate to make room for the additional RGB values:

  • put wcwidth into unused high bits of codepoint
  • split attrs into a FG and BG group with 32 bit, distribute flags into unused bits
|             uint32_t             |        uint32_t         |        uint32_t         |
|              content             |            FG           |            BG           |
| comb(1) wcwidth(2) codepoint(21) | flags(8) R(8) G(8) B(8) | flags(8) R(8) G(8) B(8) |

Proside of this approach is the relatively cheap access to every value by one index acess and max. 2 bit operations (and/or + shift).

The memory footprint is stable to the current variant, but still quite high with 12 bytes per cell. This could be further optimized by sacrificing some runtime with switching to UTF16 and an attr indirection:

|        uint16_t        |              uint16_t               |
|    BMP codepoint(16)   | comb(1) wcwidth(2) attr pointer(13) |

Now we are down to 4 bytes per cell + some room for the attrs. Now the attrs could be recycled for other cells, too. Yay mission accomplished! - Ehm, one second...

Comparing the memory footprint the second approach clearly wins. Not so for the runtime, there are three major factors that increase the runtime alot:

  • attr indirection
    The attr pointer needs one additional memory lookup into another data container.
  • attr matching
    To really save room with the second approach the given attr would have to be matched against already saved attrs. This is a cumbersome action, a direct approach by simply looking through all existing values is in O(n) for n attrs saved, my RB tree experiments ended in almost no memory benefit while still being in O(log n), compared to the index access in the 32 bit approach with O(1). Plus a tree has a worse runtime for few elements saved (pays off somewhat around >100 entries with my RB tree impl).
  • UTF16 surrogate pairing
    With a 16 bit typed array we have to degrade to UTF16 for the codepoints, which introduced runtime penalty as well (as decribed on the comment above). Note that higher than BMP codepoints hardly occur, still the check alone whether a codepoint would form a surrogate pair adds ~ 50 ms.

The sexiness of the second approach is the additional memory saving. Therefore I tested it with the playground branch (see comment above) with a modified BufferLine implementation:

grafik

Yeah, we are kinda back to where we started from before changing to UTF8 + typed arrays in the parser. Memory usage dropped from ~ 1.5 MB to ~ 0.7 MB though (demo app with 87 cells and 1000 lines scrollback).

From here on its a matter of saving memory vs. speed. Since we already saved alot memory by switching from js arrays to typed arrays (dropped from ~ 5.6 MB to ~ 1.5 MB for the C++ heap, cutting off the toxic JS Heap behavior and GC) I think we should go here with the speedier variant. Once memory usage gets a pressing issue again we still can switch over to a more compact buffer layout as described in the second approach here.

@mofux
Copy link
Contributor

mofux commented Dec 18, 2018

I agree, let's optimise for speed as long as memory consumption is not a concern. I'd also like to avoid the indirection as far as possible because it makes the code harder to read and maintain. We already have quite a lot of concepts and tweaks in our codebase that make it hard for people (including me 😅) to follow the code flow - and bringing in more of these should always be justified by a very good reason. IMO saving another megabyte of memory doesn't justify it.

Nevertheless, I'm really enjoying reading and learning from your exercises, thanks for sharing it in such a great detail!

@jerch
Copy link
Member

jerch commented Dec 18, 2018

@mofux Yeah thats true - code complexity is much higher (UTF16 surrogate read ahead, intermediate codepoint calcs, tree container with ref counting on attr entries).
And since the 32 bit layout is mostly flat memory (only combining chars need indirection), there are more optimizations possible (also part of #1811, not yet tested for the renderer).

@PerBothner
Copy link
Contributor

There is one big advantage of indirecting to an attr object: It is much more extensible. You can add annotations, glyphs, or custom painting rules. You can store link information in a possibly-cleaner and more efficient way. Perhaps define an ICellPainter interface that knows how to render its cell, and that you can also hang custom properties on.

One idea is to use two arrays per BufferLine: a Uint32Array, and ICellPainter array, with one element each for each cell. The current ICellPainter is a property of the parser state, and so you just reuse the same ICellPainter as long as the color/attribute state doesn't change. If you need to add special properties to a cell, you first clone the ICellPainter (if it might be shared).

You can pre-allocate ICellPainter for the most common color/attribute combinations - at the very least have a unique object corresponding to the default colors/attributes.

Style changes (such as changing default foreground/background colors) can be implemented by just updating the corresponding ICellPainter instance(s), without having to update each cell.

There are possible optimizations: For example use different ICellPainter instances for single-width and double-width characters (or zero-width characters). (That saves 2 bits in each Uint32Array element.) There are 11 available attribute bits in Uint32Array (more if we optimize for BMP characters). These can be used to encode the most common/useful color/attribute combinations, which can be used to index the most common ICellPainter instances. If so, the ICellPainter array can be allocated lazily - i.e. only if some cell in the line requires a "less-common" ICellPainter.

One could also remove the _combined array for non-BMP characters, and store those in the ICellPainter. (That requires a unique ICellPainter for each non-BMP character, so there is a tradeoff here.)

@jerch
Copy link
Member

jerch commented Dec 19, 2018

@PerBothner Yeah an indirection is more versatile and thus better suited for uncommon extras. But since they are uncommon Id like not to optimize for them in the first place.

Few notes on what I've tried in several testbeds:

  • cell string content
    Coming myself from C++ I tried to look at the problem as I would in C++, so I started with pointers for the content. This was a simple string pointer, but pointing most of the time to a single char string. What a waste. My first optimization therefore was to get rid of the string abstraction by directly saving the codepoint instead of the address (much easier in C/C++ than in JS). This almost doubled the read/write access while saving 12 bytes per cell (8 bytes pointer + 4 bytes on string, 64bit with 32bit wchar_t). Sidenote - half of the speed boost here is cache related (cache misses due to random string locations). This became clear with my workaround for combining cell content - a chunk of memory I indexed into when codepoint had the combined bit set (access was faster here due to better cache locality, tested with valgrind). Carried over to JS the speed boost was not that great due to the needed string to number conversion (still faster though), but the mem saving was even greater (guess due to some additional management room for JS types). Problem was the global StringStorage for the combined stuff with the explicit memory management, a big antipattern in JS. A quickfix for that was the _combined object, which delegates the cleanup to the GC. Its still a subject to change and btw is meant to store arbitrary cell related string content (did this with graphemes in mind, but we will not see them soon as they are not supported by any backend). So this is the place to store additional string content on a cell by cell basis.
  • attrs
    With the attrs I started "think big" - with a global AttributeStorage for all attrs ever used in all terminal instances (see https://github.com/jerch/xterm.js/tree/AttributeStorage). Memory wise this worked out pretty good, mainly because ppl use only a small set of attrs even with true color support. Performance was not so good - mainly due ref counting (every cell had to peek into this foreign memory twice) and attr matching. And when I tried to adopt the ref thing to JS it felt just wrong - the point I pressed the "STOP" button. In between it turned out that we already saved tons of memory and GC calls by switching to typed array, thus the slightly more costly flat memory layout can pay off its speed advantage here.
    What I tested yday (last comment) was a second typed array on line level for the attrs with the tree from https://github.com/jerch/xterm.js/tree/AttributeStorage for the matching (pretty much like your ICellPainter idea). Well the results are not promising, therefore I lean towards the flat 32 bit layout for now.

Now this flat 32 bit layout turns out to be optimized for the common stuff and uncommon extras are not possible with it. True. Well we still have markers (not used to them so I cannot tell right now what they are capable of), and yepp - there are still free bits in the buffer (which is a good thing for future needs, e.g. we could use them as flags for special treatment and such).

Tbh for me its a pity that the 16 bit layout with attrs storage performs that bad, halving the memory usage is still a big deal (esp. when ppl start to use scroll lines >10k), but the runtime penalty and the code complexity outweight the higher mem needs atm imho.

Can you elaborate on the ICellPainter idea? Maybe I missed some crucial feature so far.

@PerBothner
Copy link
Contributor

My goal for DomTerm was to enable and encourage richer interaction just what is enabled by a traditional terminal emulator. Using web technologies enables many interesting things, so it would be a shame to just focus on being a fast traditional terminal emulator. Especially since many use cases for xterm.js (such as REPLs for IDEs) can really benefit from going beyond simple text. Xterm.js does well on the speed side (is anyone complaining about speed?), but it does not do so well on features (people are complaining about missing truecolor and embedded graphics, for example). I think it may be worthwhile to focus a bit more on flexibility and slightly less on performance.

"Can you elaborate on the ICellPainter idea?"

In general, ICellPainter encapsulates all the per-cell data except the character code/value, which comes from the Uint32Array. That is for "normal" character cells - for embedded images and other "boxes" the character code/value may not make sense.

interface ICellPainter {
    drawOnCanvas(ctx: CanvasRenderingContext2D, code: number, x: number, y: number);
    // transitional - to avoid allocating IGlyphIdentifier we should replace
    //  uses by pair of ICellPainter and code.  Also, a painter may do custom rendering,
    // such that there is no 'code' or IGlyphIdentifier.
    asGlyph(code: number): IGlyphIdentifier;
    width(): number; // in pixels for flexibility?
    height(): number;
    clone(): ICellPainter;
}

Mapping a cell to ICellPainter can be done various ways. The obvious is for each BufferLine to have a ICellPainter array, but that requires an 8-byte pointer (at least) per cell. One possibility is to combine the _combined array with the ICellPainter array: If the IS_COMBINED_BIT_MASK is set, then the ICellPainter also includes the combined string. Another possible optimization is to use the available bits in the Uint32Array as an index into an array: That adds some extra complication and indirection, but saves space.

@mofux
Copy link
Contributor

mofux commented Dec 20, 2018

I'd like to encourage us to check if we can do it the way monaco-editor does it (I think they found a really smart and performant way). Instead of storing such information in the buffer, they allow you to create decorations. You create a decoration for a row / column range and it will stick to that range:

// decorations are buffer-dependant (we need to know which buffer to decorate)
const decoration = buffer.createDecoration({
  type: 'link',
  data: 'https://www.google.com',
  range: { startRow: 2, startColumn: 5, endRow: 2, endColumn: 25 }
});

Later on a renderer could pick up those decorations and draw them.

Please check out this small example that shows how the monaco-editor api looks like:
https://microsoft.github.io/monaco-editor/playground.html#interacting-with-the-editor-line-and-inline-decorations

For things like rendering pictures inside the terminal monaco uses a concept of view zones that can be seen (among other concepts) in an example here:
https://microsoft.github.io/monaco-editor/playground.html#interacting-with-the-editor-listening-to-mouse-events

@jerch
Copy link
Member

jerch commented Dec 20, 2018

@PerBothner Thx for clarification and the sketchup. A few notes on that.

We eventually plan to move the input chain + buffer into a webworker in the future. Thus the buffer is meant to operate on an abstract level and we cannot use any render/representation related stuff there yet like pixel metrics or any DOM nodes. I see your needs for this due to DomTerm being highly customizable, but I think we should do that with an enhanced internal marker API and can learn here from monaco/vscode (thx for th pointers @mofux).
I really would like to keep the core buffer clean of uncommon things, maybe we should discuss possible marker strategies with a new issue?

I am still not satisfied with the outcome of the 16 bit layout test results. Since a final decision is not yet pressing (we wont see any of this before 3.11), I gonna keep testing it with a few changes (its still the more intruiging solution for me than the 32 bit variant).

@Tyriar
Copy link
Member Author

Tyriar commented Dec 21, 2018

|             uint32_t             |        uint32_t         |        uint32_t         |
|              content             |            FG           |            BG           |
| comb(1) wcwidth(2) codepoint(21) | flags(8) R(8) G(8) B(8) | flags(8) R(8) G(8) B(8) |

I also think we should go with something close to this to start, we can explore other options later but this will probably be the easiest to get up and running. Attribute indirection definitely has promise IMO as there typically aren't that many distinct attributes in a terminal session.

I'd like to encourage us to check if we can do it the way monaco-editor does it (I think they found a really smart and performant way). Instead of storing such information in the buffer, they allow you to create decorations. You create a decoration for a row / column range and it will stick to that range:

Something like this is where I'd like to see things go. One idea I had along these lines was to allow embedders to attach DOM elements to ranges to enable custom things to be drawn. There are 3 things I can think of at the moment that I'd like to accomplish with this:

  • Draw link underlines this way (will simplify how they are drawn significantly)
  • Allow markers on rows, like a * or something
  • Allow rows to "flash" to indicate something happened

All of these could be achieved with an overlay and it's a pretty approachable type of API (exposing a DOM node) and can work regardless of renderer type.

I'm not sure we want to get into the business of allowing embedders to change how background and foreground colors are drawn.


@jerch I'll put this on the 3.11.0 milestone as I consider this issue finished when we remove the JS array implementation which is planned for then. #1796 is also planned to be merged then, but this issue was always meant to be about improving the buffer's memory layout.

Also, a lot of this later discussion would probably be better had over at #484 and #1852 (created as there wasn't a decorations issue).

@Tyriar Tyriar added this to the 3.11.0 milestone Dec 21, 2018
@jerch jerch mentioned this issue Jan 3, 2019
@Tyriar Tyriar modified the milestones: 3.11.0, 3.12.0 Feb 1, 2019
@Tyriar Tyriar modified the milestones: 3.12.0, 3.13.0 Mar 8, 2019
@jerch
Copy link
Member

jerch commented Apr 25, 2019

@Tyriar Woot - finally closed 😅

@Tyriar
Copy link
Member Author

Tyriar commented Apr 25, 2019

🎉 🕺 🍾

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/performance type/plan A meta issue that consists of several sub-issues type/proposal A proposal that needs some discussion before proceeding
Projects
None yet
Development

No branches or pull requests

7 participants