Add support for position encoding
dbaeumer committed Apr 1, 2022
1 parent 1e19ebf commit f9c85d5
Showing 2 changed files with 57 additions and 6 deletions.
55 changes: 51 additions & 4 deletions _specifications/lsp/3.17/general/initialize.md
@@ -489,6 +489,35 @@ interface ClientCapabilities {
* @since 3.16.0
*/
markdown?: MarkdownClientCapabilities;

/**
* The position encodings supported by the client. Client and server
* have to agree on the same position encoding to ensure that offsets
* (e.g. character position in a line) are interpreted the same on both
* sides.
*
* To keep the protocol backwards compatible the following applies: if
* the value 'utf-16' is missing from the array of position encodings,
* servers can assume that the client supports UTF-16. UTF-16 is
* therefore a mandatory encoding.
*
* If omitted it defaults to ['utf-16'].
*
* For the following standard Unicode encodings these values should be
* used:
*
* UTF-8: 'utf-8'
* UTF-16: 'utf-16'
*
* Implementation considerations: since the conversion from one encoding
* into another requires the content of the file / line the conversion
* is best done where the file is read which is usually on the server
* side.
*
* @since 3.17.0
* @proposed
*/
positionEncodings?: ('utf-16' | string)[];

michaelpj (Contributor), Apr 8, 2022:

Given that this is used in multiple places, would it make sense to define it more like CodeActionKinds, with an explicit type?

dbaeumer (Author, Member), Apr 11, 2022:

Good idea. @michaelpj interested to provide a PR?

};

/**
@@ -534,18 +563,21 @@ interface InitializeResult {

```typescript
/**
* Known error codes for an `InitializeError`;
* Known error codes for an `InitializeErrorCodes`;
*/
export namespace InitializeError {
export namespace InitializeErrorCodes {

/**
* If the protocol version provided by the client can't be handled by the
* server.
* If the protocol version provided by the client can't be handled by
* the server.
*
* @deprecated This initialize error got replaced by client capabilities.
* There is no version handshake in version 3.0x
*/
export const unknownProtocolVersion: 1 = 1;
}

export type InitializeErrorCodes = 1;
```

* error.data:
@@ -568,6 +600,21 @@ The server can signal the following capabilities:

```typescript
interface ServerCapabilities {

/**
* The position encoding the server picked from the encodings offered
* by the client via the client capability `general.positionEncodings`.
*
* If the client didn't provide any position encodings the only valid
* value that a server can return is 'utf-16'.
*
* If omitted it defaults to 'utf-16'.
*
* @since 3.17.0
* @proposed
*/
positionEncoding?: 'utf-16' | string;

/**
* Defines how text documents are synced. Is either a detailed structure
* defining each notification or for backwards compatibility the
8 changes: 6 additions & 2 deletions _specifications/lsp/3.17/types/textDocuments.md
@@ -1,8 +1,12 @@
#### <a href="#textDocuments" name="textDocuments" class="anchor"> Text Documents </a>

The current protocol is tailored for textual documents whose content can be represented as a string. There is currently no support for binary documents. A position inside a document (see Position definition below) is expressed as a zero-based line and character offset. The offsets are based on a UTF-16 string representation. So a string of the form `a𐐀b` the character offset of the character `a` is 0, the character offset of `𐐀` is 1 and the character offset of b is 3 since `𐐀` is represented using two code units in UTF-16. To ensure that both client and server split the string into the same line representation the protocol specifies the following end-of-line sequences: '\n', '\r\n' and '\r'.
The current protocol is tailored for textual documents whose content can be represented as a string. There is currently no support for binary documents. A position inside a document (see Position definition below) is expressed as a zero-based line and character offset.

Positions are line end character agnostic. So you can not specify a position that denotes `\r|\n` or `\n|` where `|` represents the character offset.
> New in 3.17
Prior to 3.17 the offsets were always based on a UTF-16 string representation. So in a string of the form `a𐐀b` the character offset of the character `a` is 0, the character offset of `𐐀` is 1 and the character offset of `b` is 3, since `𐐀` is represented using two code units in UTF-16. Since 3.17 clients and servers can agree on a different string encoding representation (e.g. UTF-8). The client announces its supported encodings via the client capability [`general.positionEncodings`](#clientCapabilities). The value is an array of position encodings the client supports, in decreasing order of preference (i.e. the encoding at index `0` is the most preferred one). To stay backwards compatible the only mandatory encoding is UTF-16, represented via the string `utf-16`. The server can pick one of the encodings offered by the client and signals that encoding back to the client via the initialize result's property [`capabilities.positionEncoding`](#serverCapabilities). If the string value `utf-16` is missing from the client's capability `general.positionEncodings`, servers can safely assume that the client supports UTF-16. If the server omits the position encoding in its initialize result, the encoding defaults to the string value `utf-16`. Implementation considerations: since the conversion from one encoding into another requires the content of the file / line, the conversion is best done where the file is read, which is usually on the server side.
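
The negotiation and the offset difference described above can be illustrated with a minimal sketch. The helper names `pickPositionEncoding` and `utf16ToUtf8Offset` are hypothetical and not part of the specification; only the capability names and the `utf-16` / `utf-8` string values come from the text. The sketch assumes the server itself can always handle UTF-16.

```typescript
// Hypothetical type alias mirroring the shape used in the capability.
type PositionEncoding = 'utf-16' | string;

// Server side: pick the first client-offered encoding the server also supports.
// The client lists its encodings in decreasing order of preference.
function pickPositionEncoding(
    clientEncodings: PositionEncoding[] | undefined,
    serverSupported: PositionEncoding[]
): PositionEncoding {
    // If the capability is omitted it defaults to ['utf-16'].
    const offered = clientEncodings !== undefined ? clientEncodings : ['utf-16'];
    for (const encoding of offered) {
        if (serverSupported.indexOf(encoding) !== -1) {
            return encoding;
        }
    }
    // 'utf-16' is mandatory, so it is a valid answer even when not listed.
    return 'utf-16';
}

// Convert a UTF-16 code unit offset within a line into a UTF-8 byte offset.
// An offset that falls inside a surrogate pair is rounded up past the pair.
function utf16ToUtf8Offset(line: string, utf16Offset: number): number {
    let utf8 = 0;
    let i = 0;
    while (i < utf16Offset && i < line.length) {
        const codePoint = line.codePointAt(i)!;
        // Number of bytes this code point occupies in UTF-8.
        utf8 += codePoint < 0x80 ? 1 : codePoint < 0x800 ? 2 : codePoint < 0x10000 ? 3 : 4;
        // Code points above U+FFFF occupy two UTF-16 code units.
        i += codePoint > 0xffff ? 2 : 1;
    }
    return utf8;
}
```

For the string `a𐐀b`, the character `b` sits at UTF-16 offset 3, and `utf16ToUtf8Offset` maps that to byte offset 5, because `𐐀` takes four bytes in UTF-8. This also shows why the conversion needs the line content: the mapping depends on which characters precede the position.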

To ensure that both client and server split the string into the same line representation the protocol specifies the following end-of-line sequences: '\n', '\r\n' and '\r'. Positions are agnostic to the line-end characters. So you cannot specify a position that denotes `\r|\n` or `\n|` where `|` represents the character offset.
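
As a rough illustration of the line-splitting rule, the following sketch splits a document on the three end-of-line sequences named above. The function name `splitLines` is hypothetical and not taken from the specification.

```typescript
// Split text into lines, treating '\r\n', '\n' and '\r' as line terminators.
// '\r\n' is listed first in the alternation so that it is consumed as a
// single line break rather than as two separate ones.
function splitLines(text: string): string[] {
    return text.split(/\r\n|\n|\r/);
}
```

Because the terminators are removed from the resulting lines, a character offset can only refer to the line content, which matches the statement that a position cannot point between `\r` and `\n`.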

```typescript
export const EOL: string[] = ['\n', '\r\n', '\r'];