Data Processing

⚠⚠ ⚠⚠ ⚠⚠

This wiki was used in the early days of CKEditor 5 development and may be severely outdated.

Refer to the official CKEditor 5 documentation for up-to-date information.

Input and Output "Data"

As long as the proper plugins are provided, CKEditor can accept any kind of data as input and produce any kind of data as output. Here, input means the data loaded when the editor is created, and output means the final data the editor produces when it is destroyed.

The following chart illustrates the data processing workflow:

To set the input data and retrieve the output data at runtime through the API, call the editor.setData() and editor.getData() methods, respectively.
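A minimal sketch of the runtime calls, assuming nothing beyond the two method names mentioned in this section. `DataEditor`, `InMemoryEditor`, and `replaceContent` are hypothetical names introduced so the snippet runs without the editor itself; a real editor instance would take the stand-in's place.

```typescript
// Hypothetical shape: only the two methods named in the text are assumed.
interface DataEditor {
  setData(data: string): void;
  getData(): string;
}

// Stand-in so the sketch runs without CKEditor; it only stores the string.
class InMemoryEditor implements DataEditor {
  private stored = "";

  setData(data: string): void {
    this.stored = data;
  }

  getData(): string {
    return this.stored;
  }
}

// Replace the content at runtime, then read the current data back.
function replaceContent(editor: DataEditor, data: string): string {
  editor.setData(data);
  return editor.getData();
}

const demoEditor = new InMemoryEditor();
const roundTripped = replaceContent(demoEditor, "<p>Hello</p>");
```

With a real editor, the string returned by getData() would be produced by the configured Data Processor, so it may not be byte-identical to what was passed to setData().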

The "Data Processor"

Because it accepts different data formats, CKEditor needs a way to transform such input into a common format used while editing. Since CKEditor is a web-oriented application, HTML is that format.

The API responsible for transforming input data into HTML, and HTML back into the output data format, is called the "Data Processor". Data Processors are usually plugins that specialize in one data format, such as Markdown or wiki markup.

On input, the Data Processor expects a string in the data format and converts it into a DOM Document. On output, it does the opposite: it converts a DOM Document back into a string in the data format.
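The two directions can be sketched as a small interface. Everything here is an assumption made for illustration: `SketchDataProcessor`, `toDocument`, `toData`, and the toy markup format are hypothetical, and a plain array of `SketchElement` objects stands in for the DOM Document so the snippet runs outside a browser.

```typescript
// Plain stand-in for a DOM element, so the sketch runs outside a browser.
interface SketchElement {
  tag: string;  // e.g. "h1" or "p"
  text: string; // text content of the element
}

// Hypothetical interface: string in the data format <-> document tree.
interface SketchDataProcessor {
  toDocument(data: string): SketchElement[];
  toData(doc: SketchElement[]): string;
}

// Toy format: blocks are separated by blank lines; "# " marks a level-1 heading.
const toyMarkupProcessor: SketchDataProcessor = {
  toDocument(data: string): SketchElement[] {
    const result: SketchElement[] = [];
    for (const block of data.split(/\n\s*\n/)) {
      const trimmed = block.trim();
      if (trimmed === "") {
        continue; // skip empty blocks
      }
      if (trimmed.indexOf("# ") === 0) {
        result.push({ tag: "h1", text: trimmed.slice(2).trim() });
      } else {
        result.push({ tag: "p", text: trimmed });
      }
    }
    return result;
  },

  toData(doc: SketchElement[]): string {
    return doc
      .map((el) => (el.tag === "h1" ? "# " + el.text : el.text))
      .join("\n\n");
  },
};

const toySource = "# Title\n\nFirst paragraph.";
const toyDocument = toyMarkupProcessor.toDocument(toySource);
const toyOutput = toyMarkupProcessor.toData(toyDocument);
```

For this input the output string equals the source string. That round trip is a property of the toy format, not a guarantee that every Data Processor would make.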

HTML Produced by the Data Processor

The Data Processor is not expected to filter the DOM it produces. It should work on the source data and do its best to produce the HTML that best reflects it, without regard for how the editor will use it. Still, some best practices should be followed:

- The document structure should have proper semantics:
  - Paragraphs should be inside <p>.
  - Headers and titles should be inside <h[1-6]>.
  - Lists should be clean.
  - Tables should be used for tabular data.
- The focus should be on semantics, not formatting.
- Deprecated elements should be avoided.
- Whenever possible, the default strategy used by CKEditor features should prevail. One example is the kind of element used to represent text color.

In any case, the Data Processor should not be too strict about the above. It should focus on converting reliably from the original format to HTML. It is then the editor's job to filter and manipulate the DOM produced by the Data Processor according to its features and configuration.

What if "Data" is "HTML"?

HTML is, however, the most common "data" format used for input and output in CKEditor. In that case, the role of the "HTML Data Processor" is simply to create a DOM Document from the HTML string received on input and to "stringify" it again on output.

Document (Data) Model Conversion

On load, the Data Processor's role is to pass the editor the DOM Document that represents the input "data". This DOM Document then serves as the basis for loading the content into the editor for editing.

CKEditor 5 introduces a new internal representation of the document that is no longer based on the DOM. The conversion from the DOM Document to the internal Document Model is done by a "Converter". The Converter is coupled with editor features, which can translate the DOM into the internal data format and, based on that data, render it properly for editing.
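One way to picture the coupling between the Converter and editor features is a registry in which each feature declares the element it understands. This is only a sketch under stated assumptions: `ConverterSketch`, `InputNode`, `ModelNode`, and `register` are hypothetical names, the model shape is invented for illustration, and plain objects stand in for DOM nodes.

```typescript
// Plain stand-in for a DOM node, so the sketch runs outside a browser.
interface InputNode {
  tag: string;
  text?: string;
  children?: InputNode[];
}

// Hypothetical internal model node; the real model shape is not defined here.
interface ModelNode {
  type: string;
  text: string;
}

// A feature contributes a function that turns one kind of element into a model node.
type FeatureConverter = (node: InputNode) => ModelNode;

class ConverterSketch {
  private handlers: { [tag: string]: FeatureConverter } = {};

  // Called by a feature to declare which element it understands.
  register(tag: string, handler: FeatureConverter): void {
    this.handlers[tag] = handler;
  }

  // Depth-first walk. A claimed element is handed to its feature as a whole.
  // An unclaimed element produces nothing itself, but its children are visited.
  convert(root: InputNode): ModelNode[] {
    if (Object.prototype.hasOwnProperty.call(this.handlers, root.tag)) {
      return [this.handlers[root.tag](root)];
    }
    let out: ModelNode[] = [];
    for (const child of root.children || []) {
      out = out.concat(this.convert(child));
    }
    return out;
  }
}

const sketchConverter = new ConverterSketch();
sketchConverter.register("p", (n) => ({ type: "paragraph", text: n.text || "" }));
sketchConverter.register("h1", (n) => ({ type: "heading1", text: n.text || "" }));

const modelNodes = sketchConverter.convert({
  tag: "body",
  children: [
    { tag: "h1", text: "Title" },
    { tag: "p", text: "Body text." },
    { tag: "marquee", text: "No feature handles this element." },
  ],
});
```

In this sketch, content that no registered feature claims never reaches the model, which is one possible reading of the editor filtering the DOM according to its features.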

Security Concerns

- It is assumed that the initial data loaded into the editor IS safe.
- It is assumed that the output of the editor MAY NOT be safe. Any security filtering must be done by the application when it receives the data on the server side.
- There must be features that protect against script execution while the editor is in use. These are handled by Converters when they build the Document Model.
- Whenever an HTML string is parsed, the DomParser class should be used.
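As an illustration of the kind of protection a Converter could apply while building the Document Model, the sketch below drops script elements and inline event-handler attributes from a tree. It is not a complete filter and is not the editor's actual mechanism: `UntrustedNode` and `stripActiveContent` are hypothetical names, and plain objects stand in for DOM nodes.

```typescript
// Plain stand-in for a DOM element, so the sketch runs outside a browser.
interface UntrustedNode {
  tag: string;
  attributes: { [key: string]: string };
  children: UntrustedNode[];
}

// Returns a cleaned copy of the tree, or null if the node itself must be dropped.
// Illustration only: it covers <script> elements and on* attributes, nothing else.
function stripActiveContent(node: UntrustedNode): UntrustedNode | null {
  if (node.tag.toLowerCase() === "script") {
    return null;
  }

  // Keep every attribute except inline event handlers such as onclick.
  const safeAttributes: { [key: string]: string } = {};
  for (const key of Object.keys(node.attributes)) {
    if (!/^on/i.test(key)) {
      safeAttributes[key] = node.attributes[key];
    }
  }

  // Clean the children recursively and keep only the survivors.
  const safeChildren: UntrustedNode[] = [];
  for (const child of node.children) {
    const cleaned = stripActiveContent(child);
    if (cleaned !== null) {
      safeChildren.push(cleaned);
    }
  }

  return { tag: node.tag, attributes: safeAttributes, children: safeChildren };
}

const cleanedTree = stripActiveContent({
  tag: "div",
  attributes: { onclick: "run()", class: "note" },
  children: [
    { tag: "script", attributes: {}, children: [] },
    { tag: "p", attributes: {}, children: [] },
  ],
});
```

The function returns a new tree instead of mutating its input, so the original structure stays available for comparison. Server-side filtering of the editor output is still required, as stated above.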
