Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add cell field to JSON output format #7664

Merged
merged 1 commit into from
Oct 13, 2023
Merged

Conversation

dhruvmanila
Copy link
Member

@dhruvmanila dhruvmanila commented Sep 26, 2023

Summary

This PR adds a new cell field to the JSON output format which indicates the Notebook cell this diagnostic (and fix) belongs to. It also updates the location for the diagnostic and fixes as per the NotebookIndex. It will be used in the VSCode extension to display the diagnostic in the correct cell.

The diagnostic and edit start and end source locations are translated for the notebook as per the NotebookIndex. The end source location for an edit needs some special handling.

Edit end location

To understand this, the following context is required:

  1. Visible lines in Jupyter Notebook vs JSON array strings: The newline is part of the string in the JSON format. This means that if there are 3 visible lines in a cell where the last line is empty then the JSON would contain 2 strings in the source array, both ending with a newline:

JSON format:

[
	"# first line\n",
	"# second line\n",
]

Notebook view:

1 # first line
2 # second line
3
  1. If an edit needs to remove an entire line including the newline, then the end location would be the start of the next row.

To remove a statement in the following code:

import os

The edit would be:

start: row 1, col 1
end: row 2, col 1

Now, here's where the problem lies. The notebook index doesn't have any information for row 2 because it doesn't exists in the actual notebook. The newline was added by Ruff to concatenate the source code and it's removed before writing back. But, the edit is computed looking at that newline.

This means that while translating the end location for an edit belong to a Notebook, we need to check if both the start and end location belongs to the same cell. If not, then the end location should be the first character of the next row and if so, translate that back to the last character of the previous row. Taking the above example, the translated location for Notebook would be:

start: row 1, col 1
end: row 1, col 10

Test Plan

Add test cases for notebook output in the JSON format and update existing snapshots.

@dhruvmanila
Copy link
Member Author

dhruvmanila commented Sep 26, 2023

Current dependencies on/for this PR:

This comment was auto-generated by Graphite.

@dhruvmanila dhruvmanila force-pushed the dhruv/notebook-index-stdin branch from 4c2e6e8 to a91285c Compare September 26, 2023 06:31
@dhruvmanila dhruvmanila force-pushed the dhruv/cell-idx-json-output branch from 7cee56f to 6f67097 Compare September 26, 2023 06:32
@codspeed-hq
Copy link

codspeed-hq bot commented Sep 26, 2023

CodSpeed Performance Report

Merging #7664 will not alter performance

Comparing dhruv/cell-idx-json-output (7aca63a) with main (1e184e6)

Summary

✅ 25 untouched benchmarks

@dhruvmanila dhruvmanila force-pushed the dhruv/cell-idx-json-output branch from 6f67097 to a5a2f5b Compare September 26, 2023 09:37
Base automatically changed from dhruv/notebook-index-stdin to main September 29, 2023 20:37
@dhruvmanila dhruvmanila force-pushed the dhruv/cell-idx-json-output branch from a5a2f5b to 5ffe5ae Compare October 10, 2023 07:59
@dhruvmanila dhruvmanila changed the base branch from main to dhruv/notebook-one-indexed October 11, 2023 17:36
@dhruvmanila dhruvmanila mentioned this pull request Oct 11, 2023
3 tasks
@dhruvmanila dhruvmanila force-pushed the dhruv/cell-idx-json-output branch from ba90763 to 98fcfbe Compare October 11, 2023 18:21
@dhruvmanila dhruvmanila changed the base branch from dhruv/notebook-one-indexed to dhruv/notebook-output-test October 12, 2023 03:59
@dhruvmanila dhruvmanila force-pushed the dhruv/cell-idx-json-output branch from 98fcfbe to e057a7b Compare October 12, 2023 03:59
@dhruvmanila dhruvmanila force-pushed the dhruv/cell-idx-json-output branch from e057a7b to 4385153 Compare October 12, 2023 04:16
@dhruvmanila dhruvmanila added the linter Related to the linter label Oct 12, 2023
@dhruvmanila dhruvmanila marked this pull request as ready for review October 12, 2023 04:17
@github-actions
Copy link
Contributor

github-actions bot commented Oct 12, 2023

PR Check Results

Ecosystem

ℹ️ ecosystem check detected changes. (+6, -6, 0 error(s))

airflow (+4, -4)

- [*] 16066 fixable with the `--fix` option (6343 hidden fixes can be enabled with the `--unsafe-fixes` option).
+ [*] 16069 fixable with the `--fix` option (6343 hidden fixes can be enabled with the `--unsafe-fixes` option).
- airflow/providers/google/cloud/hooks/bigquery.py:1580:35: PYI055 Multiple `type` members in a union. Combine them into one, e.g., `type[CopyJob | QueryJob | LoadJob | ExtractJob]`.
+ airflow/providers/google/cloud/hooks/bigquery.py:1580:35: PYI055 [*] Multiple `type` members in a union. Combine them into one, e.g., `type[CopyJob | QueryJob | LoadJob | ExtractJob]`.
- airflow/providers/google/cloud/hooks/bigquery.py:1587:14: PYI055 Multiple `type` members in a union. Combine them into one, e.g., `type[CopyJob | QueryJob | LoadJob | ExtractJob]`.
+ airflow/providers/google/cloud/hooks/bigquery.py:1587:14: PYI055 [*] Multiple `type` members in a union. Combine them into one, e.g., `type[CopyJob | QueryJob | LoadJob | ExtractJob]`.
- airflow/providers/smtp/hooks/smtp.py:103:15: PYI055 Multiple `type` members in a union. Combine them into one, e.g., `type[smtplib.SMTP_SSL | smtplib.SMTP]`.
+ airflow/providers/smtp/hooks/smtp.py:103:15: PYI055 [*] Multiple `type` members in a union. Combine them into one, e.g., `type[smtplib.SMTP_SSL | smtplib.SMTP]`.

bokeh (+2, -2)

- [*] 17800 fixable with the `--fix` option (4727 hidden fixes can be enabled with the `--unsafe-fixes` option).
+ [*] 17801 fixable with the `--fix` option (4727 hidden fixes can be enabled with the `--unsafe-fixes` option).
- src/bokeh/core/validation/decorators.py:67:13: PYI055 Multiple `type` members in a union. Combine them into one, e.g., `type[Error | Warning]`.
+ src/bokeh/core/validation/decorators.py:67:13: PYI055 [*] Multiple `type` members in a union. Combine them into one, e.g., `type[Error | Warning]`.

Rules changed: 1
Rule Changes Additions Removals
PYI055 8 4 4

crates/ruff_notebook/src/index.rs Outdated Show resolved Hide resolved
Base automatically changed from dhruv/notebook-output-test to main October 13, 2023 00:54
@dhruvmanila dhruvmanila force-pushed the dhruv/cell-idx-json-output branch from 4385153 to 7aca63a Compare October 13, 2023 00:56
@dhruvmanila dhruvmanila enabled auto-merge (squash) October 13, 2023 00:57
@dhruvmanila dhruvmanila merged commit 66179af into main Oct 13, 2023
@dhruvmanila dhruvmanila deleted the dhruv/cell-idx-json-output branch October 13, 2023 01:06
dhruvmanila added a commit to astral-sh/ruff-lsp that referenced this pull request Nov 8, 2023
## Summary

This PR adds support for Jupyter Notebook. It requires client support
for LSP 3.17 which contains the [Notebook
support](https://microsoft.github.io/language-server-protocol/specifications/lsp/3.17/specification/#notebookDocument_synchronization).

### Implementation

#### Context

* `Document`: LSP type representing a text file (Python file for Ruff).
* `TextDocument`: `pygls` representation of the LSP `Document`. This is
an abstraction created from a `Document` which provides some useful
methods like getting the file path, source code, etc.
* New in 3.17: `NotebookDocument` type was added to represent a Notebook
which consists of a list of cells (`NotebookCell`). Note that these are
all LSP types coming from `lsprotocol`.
* In `pygls`, a Notebook cell is represented as a text document
(`TextDocument`).

There are methods provided by `pygls` to get the object:
* `get_text_document` - Returns a `TextDocument` which either represents
a Python file or a Notebook cell
* `get_notebook_document` - Returns a `NotebookDocument` either using
the Notebook URI or a cell URI. For cell URI, it returns the
`NotebookDocument` containing the cell.

#### Document

A new `Document` type was created to facilitate the implementation. This
represents either a Python file, a Notebook or a Notebook cell. There
are various constructor methods which should be used to create this
type:
* For a URI representing a Python file, use either `from_uri` or
`from_text_document`.
* For a URI representing a Notebook file, use either `from_uri` or
`from_notebook_document`.
* For a URI representing a Notebook cell, use either
`from_cell_or_text_uri` or `from_notebook_cell`.

#### Notebook JSON

Ruff expects the source content of a Notebook file to be in JSON format
following the [Notebook format specification] but the protocol uses it's
own abstraction and doesn't store the JSON format. This means that we
need to create a JSON string representing the Notebook from the
available information. This doesn't need all the information as Ruff
only uses the cell source and version information. So, we create a
minimal JSON string representing the Notebook document and pass it to
Ruff.

<details><summary>An example JSON string representing a Notebook
Document:</summary>
<p>

```json
{
  "metadata": {},
  "nbformat": 4,
  "nbformat_minor": 5,
  "cells": [
    {
      "cell_type": "code",
      "metadata": null,
      "outputs": [],
      "source": "import random\nimport math"
    },
    {
      "cell_type": "code",
      "metadata": null,
      "outputs": [],
      "source": "try:\n    print()\nexcept ValueError:\n    pass"
    },
    {
      "cell_type": "code",
      "metadata": null,
      "outputs": [],
      "source": "import random\nimport pprint\n\nrandom.randint(10, 20)"
    },
    {
      "cell_type": "code",
      "metadata": null,
      "outputs": [],
      "source": "foo = 1\nif foo is 1:\n    msg = f\"Invalid foo: {foo}\"\n    raise ValueError(msg)"
    }
  ]
}
```

</p>
</details> 

**We need to pass in every cell including the markdown cell to get an
accurate information like the cell number.**

For the cell document kind, the source value is a JSON string containing
just a single code cell. This is required as code actions and formatting
work at both cell and notebook level.

### Configuration

For VSCode users, the `notebook.*` configuration is used to run the
formatter or code actions on save:

```jsonc
{
  // Enable formatting the entire Notebook on save
  "notebook.formatOnSave.enabled": true,
  // Run the enabled code actions on the entire Notebook on save
  "notebook.codeActionsOnSave": {
    "source.fixAll": true,
    "source.organizeImports.ruff": true
  },
}
```

The way the above settings work in VSCode is that the editor runs the
actions in parallel for every cell. This has the illusion that it was
run on the entire Notebook. The commands defined by us (`Ruff: Organize
imports` and `Ruff: Fix all auto-fixable problems`) are run on the
entire Notebook at once. This is important because in the latter case
the `ruff` command is invoked `n` number of times where `n` is the
number of cells while for the former it's run only once.

### Commands

#### Builtin

* `Ruff: Organize Imports`: Works at Notebook level
* `Ruff: Fix all auto-fixable problems`: Works at Notebook level

#### VSCode specifc

* `Format Cell`: Formats the current cell
* `Notebook: Format Notebook`: Formats the entire Notebook by running
the formatter for every cell
* `Organize Imports`: Runs the `source.organizeImports` code action on
every cell in parallel
* `Fix All`: Runs the `source.fixAll` code action on every cell in
parallel

## Feature checklist

- [x] Code actions
  - [x] Organize imports
  - [x] Fix all
  - [x] Each fixable diagnostics
  - [x] Disable rule comment
- [x] Code action resolve
- [x] Commands
  - [x] `ruff.applyAutofix`
  - [x] `ruff.applyOrganizeImports`
  - [x] `ruff.applyFormat`
- [x] Diagnostics
  - [x] On open
  - [x] On close
  - [x] On save
  - [x] On change
- [x] Formatting
- [x] Hover

## Test Plan

Manually testing for all the features mentioned above.

### How to run this locally?

1. Clone https://github.com/astral-sh/ruff-lsp and
https://github.com/astral-sh/ruff-vscode in the same directory
2. Checkout this branch `git checkout dhruv/notebook` in the `ruff-lsp`
repository
3. Install the requirements for both repositories
4. For `ruff-vscode`, uninstall `ruff-lsp` (`pip uninstall --yes
ruff-lsp`) as we'd want to use the local version. To install the local
`ruff-lsp` version in `ruff-vscode`, follow [Modifying the
LSP](https://github.com/astral-sh/ruff-vscode#modifying-the-lsp).
5. Open VSCode from `ruff-vscode` directory -> "Run and Debug" section
from the sidebar -> "Debug Extension and Python" config.

This will then open a VSCode development session which can be used to
test out the notebook features.

**Test notebooks:**
* Formatting:
https://gist.github.com/dhruvmanila/7803e5a3b98c414505384db415a635a0
* Diagnostics, Code actions, Commands:
https://gist.github.com/dhruvmanila/54c65870f167a56558d4701f57f53042

**Requires: astral-sh/ruff#7664 which was
released in `v0.1.0`**

fixes: #267 
closes: astral-sh/ruff-vscode#256
closes: astral-sh/ruff-vscode#314
closes: astral-sh/ruff-vscode#51
azurelotus0926 added a commit to azurelotus0926/ruff-lsp that referenced this pull request Jun 27, 2024
## Summary

This PR adds support for Jupyter Notebook. It requires client support
for LSP 3.17 which contains the [Notebook
support](https://microsoft.github.io/language-server-protocol/specifications/lsp/3.17/specification/#notebookDocument_synchronization).

### Implementation

#### Context

* `Document`: LSP type representing a text file (Python file for Ruff).
* `TextDocument`: `pygls` representation of the LSP `Document`. This is
an abstraction created from a `Document` which provides some useful
methods like getting the file path, source code, etc.
* New in 3.17: `NotebookDocument` type was added to represent a Notebook
which consists of a list of cells (`NotebookCell`). Note that these are
all LSP types coming from `lsprotocol`.
* In `pygls`, a Notebook cell is represented as a text document
(`TextDocument`).

There are methods provided by `pygls` to get the object:
* `get_text_document` - Returns a `TextDocument` which either represents
a Python file or a Notebook cell
* `get_notebook_document` - Returns a `NotebookDocument` either using
the Notebook URI or a cell URI. For cell URI, it returns the
`NotebookDocument` containing the cell.

#### Document

A new `Document` type was created to facilitate the implementation. This
represents either a Python file, a Notebook or a Notebook cell. There
are various constructor methods which should be used to create this
type:
* For a URI representing a Python file, use either `from_uri` or
`from_text_document`.
* For a URI representing a Notebook file, use either `from_uri` or
`from_notebook_document`.
* For a URI representing a Notebook cell, use either
`from_cell_or_text_uri` or `from_notebook_cell`.

#### Notebook JSON

Ruff expects the source content of a Notebook file to be in JSON format
following the [Notebook format specification] but the protocol uses it's
own abstraction and doesn't store the JSON format. This means that we
need to create a JSON string representing the Notebook from the
available information. This doesn't need all the information as Ruff
only uses the cell source and version information. So, we create a
minimal JSON string representing the Notebook document and pass it to
Ruff.

<details><summary>An example JSON string representing a Notebook
Document:</summary>
<p>

```json
{
  "metadata": {},
  "nbformat": 4,
  "nbformat_minor": 5,
  "cells": [
    {
      "cell_type": "code",
      "metadata": null,
      "outputs": [],
      "source": "import random\nimport math"
    },
    {
      "cell_type": "code",
      "metadata": null,
      "outputs": [],
      "source": "try:\n    print()\nexcept ValueError:\n    pass"
    },
    {
      "cell_type": "code",
      "metadata": null,
      "outputs": [],
      "source": "import random\nimport pprint\n\nrandom.randint(10, 20)"
    },
    {
      "cell_type": "code",
      "metadata": null,
      "outputs": [],
      "source": "foo = 1\nif foo is 1:\n    msg = f\"Invalid foo: {foo}\"\n    raise ValueError(msg)"
    }
  ]
}
```

</p>
</details> 

**We need to pass in every cell including the markdown cell to get an
accurate information like the cell number.**

For the cell document kind, the source value is a JSON string containing
just a single code cell. This is required as code actions and formatting
work at both cell and notebook level.

### Configuration

For VSCode users, the `notebook.*` configuration is used to run the
formatter or code actions on save:

```jsonc
{
  // Enable formatting the entire Notebook on save
  "notebook.formatOnSave.enabled": true,
  // Run the enabled code actions on the entire Notebook on save
  "notebook.codeActionsOnSave": {
    "source.fixAll": true,
    "source.organizeImports.ruff": true
  },
}
```

The way the above settings work in VSCode is that the editor runs the
actions in parallel for every cell. This has the illusion that it was
run on the entire Notebook. The commands defined by us (`Ruff: Organize
imports` and `Ruff: Fix all auto-fixable problems`) are run on the
entire Notebook at once. This is important because in the latter case
the `ruff` command is invoked `n` number of times where `n` is the
number of cells while for the former it's run only once.

### Commands

#### Builtin

* `Ruff: Organize Imports`: Works at Notebook level
* `Ruff: Fix all auto-fixable problems`: Works at Notebook level

#### VSCode specifc

* `Format Cell`: Formats the current cell
* `Notebook: Format Notebook`: Formats the entire Notebook by running
the formatter for every cell
* `Organize Imports`: Runs the `source.organizeImports` code action on
every cell in parallel
* `Fix All`: Runs the `source.fixAll` code action on every cell in
parallel

## Feature checklist

- [x] Code actions
  - [x] Organize imports
  - [x] Fix all
  - [x] Each fixable diagnostics
  - [x] Disable rule comment
- [x] Code action resolve
- [x] Commands
  - [x] `ruff.applyAutofix`
  - [x] `ruff.applyOrganizeImports`
  - [x] `ruff.applyFormat`
- [x] Diagnostics
  - [x] On open
  - [x] On close
  - [x] On save
  - [x] On change
- [x] Formatting
- [x] Hover

## Test Plan

Manually testing for all the features mentioned above.

### How to run this locally?

1. Clone https://github.com/astral-sh/ruff-lsp and
https://github.com/astral-sh/ruff-vscode in the same directory
2. Checkout this branch `git checkout dhruv/notebook` in the `ruff-lsp`
repository
3. Install the requirements for both repositories
4. For `ruff-vscode`, uninstall `ruff-lsp` (`pip uninstall --yes
ruff-lsp`) as we'd want to use the local version. To install the local
`ruff-lsp` version in `ruff-vscode`, follow [Modifying the
LSP](https://github.com/astral-sh/ruff-vscode#modifying-the-lsp).
5. Open VSCode from `ruff-vscode` directory -> "Run and Debug" section
from the sidebar -> "Debug Extension and Python" config.

This will then open a VSCode development session which can be used to
test out the notebook features.

**Test notebooks:**
* Formatting:
https://gist.github.com/dhruvmanila/7803e5a3b98c414505384db415a635a0
* Diagnostics, Code actions, Commands:
https://gist.github.com/dhruvmanila/54c65870f167a56558d4701f57f53042

**Requires: astral-sh/ruff#7664 which was
released in `v0.1.0`**

fixes: #267 
closes: astral-sh/ruff-vscode#256
closes: astral-sh/ruff-vscode#314
closes: astral-sh/ruff-vscode#51
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
linter Related to the linter
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants