Add support for position encoding

microsoft · Apr 1, 2022 · f9c85d5 · michaelpj · Apr 8, 2022 · dbaeumer
1 parent 1e19ebf
commit f9c85d5

Showing 2 changed files with 57 additions and 6 deletions.
diff --git a/_specifications/lsp/3.17/general/initialize.md b/_specifications/lsp/3.17/general/initialize.md
@@ -489,6 +489,35 @@ interface ClientCapabilities {
 		 * @since 3.16.0
 		 */
 		markdown?: MarkdownClientCapabilities;
+
+		/**
+		 * The position encodings supported by the client. Client and server
+		 * have to agree on the same position encoding to ensure that offsets
+		 * (e.g. character position in a line) are interpreted the same on both
+		 * sides.
+		 *
+		 * To keep the protocol backwards compatible the following applies: if
+		 * the value 'utf-16' is missing from the array of position encodings
+		 * servers can assume that the client supports UTF-16. UTF-16 is
+		 * therefore a mandatory encoding.
+		 *
+		 * If omitted it defaults to ['utf-16'].
+		 *
+		 * For the following standard Unicode encodings these values should be
+		 * used:
+		 *
+		 * UTF-8: 'utf-8'
+		 * UTF-16: 'utf-16'
+		 *
+		 * Implementation considerations: since the conversion from one encoding
+		 * into another requires the content of the file / line the conversion
+		 * is best done where the file is read which is usually on the server
+		 * side.
+		 *
+		 * @since 3.17.0
+		 * @proposed
+		 */
+		positionEncodings?: ('utf-16' | string)[];
 	};
 
 	/**
@@ -534,18 +563,21 @@ interface InitializeResult {
 
 ```typescript
 /**
- * Known error codes for an `InitializeError`;
+ * Known error codes for an `InitializeErrorCodes`;
  */
-export namespace InitializeError {
+export namespace InitializeErrorCodes {
+
 	/**
-	 * If the protocol version provided by the client can't be handled by the
-	 * server.
+	 * If the protocol version provided by the client can't be handled by
+	 * the server.
 	 *
 	 * @deprecated This initialize error got replaced by client capabilities.
 	 * There is no version handshake in version 3.0x
 	 */
 	export const unknownProtocolVersion: 1 = 1;
 }
+
+export type InitializeErrorCodes = 1;
 ```
 
 * error.data:
@@ -568,6 +600,21 @@ The server can signal the following capabilities:
 
 ```typescript
 interface ServerCapabilities {
+
+	/**
+	 * The position encoding the server picked from the encodings offered
+	 * by the client via the client capability `general.positionEncodings`.
+	 *
+	 * If the client didn't provide any position encodings the only valid
+	 * value that a server can return is 'utf-16'.
+	 *
+	 * If omitted it defaults to 'utf-16'.
+	 *
+	 * @since 3.17.0
+	 * @proposed
+	 */
+	positionEncoding?: 'utf-16' | string;
+
 	/**
 	 * Defines how text documents are synced. Is either a detailed structure
 	 * defining each notification or for backwards compatibility the

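The negotiation described by `general.positionEncodings` and `capabilities.positionEncoding` might be sketched on the server side as follows. This is only an illustration under stated assumptions: `pickPositionEncoding` and its `serverSupported` parameter are hypothetical names, not part of the specification, and the "first match wins" rule follows the decreasing-preference ordering described in the `textDocuments.md` change in this commit.

```typescript
// Hypothetical helper, not part of the specification: returns the value a
// server could report in `capabilities.positionEncoding`.
type PositionEncoding = 'utf-16' | string;

function pickPositionEncoding(
	clientEncodings: PositionEncoding[] | undefined,
	serverSupported: PositionEncoding[]
): PositionEncoding {
	// Clients list encodings in decreasing preference, so the first
	// encoding the server also supports is chosen.
	for (const encoding of clientEncodings || []) {
		if (serverSupported.indexOf(encoding) !== -1) {
			return encoding;
		}
	}
	// 'utf-16' is mandatory for clients and is the default when the
	// capability is omitted or nothing else matches.
	return 'utf-16';
}
```

For example, `pickPositionEncoding(['utf-8', 'utf-16'], ['utf-16', 'utf-8'])` yields `'utf-8'`, while `pickPositionEncoding(undefined, ['utf-8'])` yields `'utf-16'`. The fallback assumes the server can actually handle UTF-16 positions, since that is the only encoding it may return in this case.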
diff --git a/_specifications/lsp/3.17/types/textDocuments.md b/_specifications/lsp/3.17/types/textDocuments.md
@@ -1,8 +1,12 @@
 #### <a href="#textDocuments" name="textDocuments" class="anchor"> Text Documents </a>
 
-The current protocol is tailored for textual documents whose content can be represented as a string. There is currently no support for binary documents. A position inside a document (see Position definition below) is expressed as a zero-based line and character offset. The offsets are based on a UTF-16 string representation. So a string of the form `a𐐀b` the character offset of the character `a` is 0, the character offset of `𐐀` is 1 and the character offset of b is 3 since `𐐀` is represented using two code units in UTF-16. To ensure that both client and server split the string into the same line representation the protocol specifies the following end-of-line sequences: '\n', '\r\n' and '\r'.
+The current protocol is tailored for textual documents whose content can be represented as a string. There is currently no support for binary documents. A position inside a document (see Position definition below) is expressed as a zero-based line and character offset.
 
-Positions are line end character agnostic. So you can not specify a position that denotes `\r|\n` or `\n|` where `|` represents the character offset.
+> New in 3.17
+
+Prior to 3.17 the offsets were always based on a UTF-16 string representation. So in a string of the form `a𐐀b` the character offset of the character `a` is 0, the character offset of `𐐀` is 1 and the character offset of `b` is 3 since `𐐀` is represented using two code units in UTF-16. Since 3.17 clients and servers can agree on a different string encoding representation (e.g. UTF-8). The client announces its supported encodings via the client capability [`general.positionEncodings`](#clientCapabilities). The value is an array of position encodings the client supports, in decreasing order of preference (i.e. the encoding at index `0` is the most preferred one). To stay backwards compatible the only mandatory encoding is UTF-16, represented via the string `utf-16`. The server can pick one of the encodings offered by the client and signals that encoding back to the client via the initialize result's property [`capabilities.positionEncoding`](#serverCapabilities). If the string value `utf-16` is missing from the client's capability `general.positionEncodings`, servers can safely assume that the client supports UTF-16. If the server omits the position encoding in its initialize result, the encoding defaults to the string value `utf-16`. Implementation considerations: since the conversion from one encoding into another requires the content of the file / line, the conversion is best done where the file is read, which is usually on the server side.
+
+To ensure that both client and server split the string into the same line representation the protocol specifies the following end-of-line sequences: '\n', '\r\n' and '\r'. Positions are line-ending agnostic, so you cannot specify a position that denotes `\r|\n` or `\n|` where `|` represents the character offset.
 
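To make the `a𐐀b` example concrete, here is a minimal sketch of converting a UTF-16 character offset into the corresponding UTF-8 offset for a single line. `utf16ToUtf8Offset` is a hypothetical helper, not something the specification defines. It assumes the line content is available as a JavaScript string, and that an offset falling inside a surrogate pair is rounded forward past the pair.

```typescript
// Hypothetical helper: maps a UTF-16 code unit offset within `line` to the
// equivalent offset counted in UTF-8 code units (bytes).
function utf16ToUtf8Offset(line: string, utf16Offset: number): number {
	let utf8Offset = 0;
	let i = 0;
	while (i < utf16Offset && i < line.length) {
		const unit = line.charCodeAt(i);
		const next = i + 1 < line.length ? line.charCodeAt(i + 1) : 0;
		const isSurrogatePair =
			unit >= 0xd800 && unit <= 0xdbff && next >= 0xdc00 && next <= 0xdfff;
		if (isSurrogatePair) {
			// Two UTF-16 code units encode one code point above U+FFFF,
			// which takes four bytes in UTF-8.
			utf8Offset += 4;
			i += 2;
			continue;
		}
		if (unit <= 0x7f) {
			utf8Offset += 1;
		} else if (unit <= 0x7ff) {
			utf8Offset += 2;
		} else {
			// Remaining BMP code units (including lone surrogates, by
			// assumption) are counted as three bytes.
			utf8Offset += 3;
		}
		i += 1;
	}
	return utf8Offset;
}
```

For `a𐐀b`, the UTF-16 offset 3 of `b` maps to the UTF-8 offset 5, because `a` takes one byte and `𐐀` takes four. The function needs the line text itself, which is why the conversion is best done on the side that reads the file.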
 ```typescript
 export const EOL: string[] = ['\n', '\r\n', '\r'];