Only remove the unescape prefix character when parsing
* Currently, when parsing with the `unescape_formulas` option, the first
  character of a cell is removed whenever the second character is one of
  the characters in `@escape_formula_start`, regardless of what the first
  character actually is. This causes issues when the data is something
  like this:

  ```
  "A-1","B+2","C=3","'@d"
  ```

  which should parse to this:

  ```
  ["A-1", "B+2", "C=3", "@d"]
  ```

  but parses to this instead:

  ```
  ["-1", "+2", "=3", "@d"]
  ```

  This change will check that the first character is actually the escape
  character we want before removing it.

* Minor documentation fix: the decoder docs now name the option
  `:unescape_formulas` instead of `:escape_formulas`
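
To make the before/after behaviour concrete, here is a minimal standalone sketch of the two matching strategies described above. The module `UnescapeSketch` and its function names are hypothetical and exist only for illustration; the attribute values are copied from the diff, and the real logic lives in `CSV.Decoding.Parser`.

```elixir
defmodule UnescapeSketch do
  # Values mirror @escape_formula_start and @escape_formula_prefix in the diff.
  @escape_formula_start ["=", "-", "+", "@"]
  @escape_formula_prefix "'"

  # Previous behaviour: match any formula character and, if the first match
  # sits at byte offset 1, drop the first byte whatever it happens to be.
  def unescape_before(field) do
    pattern = :binary.compile_pattern(@escape_formula_start)

    case :binary.match(field, pattern) do
      {1, _} -> binary_part(field, 1, byte_size(field) - 1)
      _ -> field
    end
  end

  # New behaviour: match the prefix together with a formula character
  # ("'=", "'-", "'+", "'@") and drop the first byte only when that
  # two-byte sequence starts at offset 0.
  def unescape_after(field) do
    pattern =
      @escape_formula_start
      |> Enum.map(fn char -> @escape_formula_prefix <> char end)
      |> :binary.compile_pattern()

    case :binary.match(field, pattern) do
      {0, _} -> binary_part(field, 1, byte_size(field) - 1)
      _ -> field
    end
  end
end

fields = ["A-1", "B+2", "C=3", "'@d"]

IO.inspect(Enum.map(fields, &UnescapeSketch.unescape_before/1))
# prints ["-1", "+2", "=3", "@d"]

IO.inspect(Enum.map(fields, &UnescapeSketch.unescape_after/1))
# prints ["A-1", "B+2", "C=3", "@d"]
```

Only the field that really starts with the `'` prefix is shortened by the second function; the other three pass through unchanged.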
jgmchan authored and beatrichartz committed Jan 2, 2025
1 parent 9eec472 commit e6a7943
Showing 5 changed files with 21 additions and 6 deletions.
2 changes: 1 addition & 1 deletion lib/csv/decoding/decoder.ex
@@ -37,7 +37,7 @@ defmodule CSV.Decoding.Decoder do
   * `:validate_row_length` – When set to `true`, will take the first row of
     the csv or its headers and validate that following rows are of the same
     length. Defaults to `false`.
-  * `:escape_formulas` – When set to `true`, will remove formula escaping
+  * `:unescape_formulas` – When set to `true`, will remove formula escaping
     inserted to prevent [CSV Injection](https://owasp.org/www-community/attacks/CSV_Injection).
   ## Examples
7 changes: 5 additions & 2 deletions lib/csv/decoding/parser.ex
@@ -88,11 +88,14 @@ defmodule CSV.Decoding.Parser do
     unescape_formulas = options |> Keyword.get(:unescape_formulas, @unescape_formulas)

     if unescape_formulas do
-      formula_pattern = :binary.compile_pattern(@escape_formula_start)
+      formula_pattern =
+        @escape_formula_start
+        |> Enum.map(fn char -> @escape_formula_prefix <> char end)
+        |> :binary.compile_pattern()

       fn field ->
         case :binary.match(field, formula_pattern) do
-          {1, _} -> binary_part(field, 1, byte_size(field) - 1)
+          {0, _} -> binary_part(field, 1, byte_size(field) - 1)
           _ -> field
         end
       end
1 change: 1 addition & 0 deletions lib/csv/defaults.ex
@@ -18,6 +18,7 @@ defmodule CSV.Defaults do
       @escape_formulas false
       @unescape_formulas false
       @escape_formula_start ["=", "-", "+", "@"]
+      @escape_formula_prefix "'"
     end
   end
 end
2 changes: 1 addition & 1 deletion lib/csv/encoding/encode.ex
@@ -47,7 +47,7 @@ defimpl CSV.Encode, for: BitString do

     data =
       if escape_formulas and String.starts_with?(data, @escape_formula_start) do
-        "'" <> data
+        @escape_formula_prefix <> data
       else
         data
       end
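
For context on the encoding side, the following is a small standalone sketch of the escaping step this hunk touches. The module `EscapeSketch` and the function `escape/1` are hypothetical names used only here; the attribute values are taken from `CSV.Defaults` as shown in the diff, and the real implementation is the `CSV.Encode` protocol for `BitString`.

```elixir
defmodule EscapeSketch do
  # Values copied from CSV.Defaults in the diff.
  @escape_formula_start ["=", "-", "+", "@"]
  @escape_formula_prefix "'"

  # Prepend the prefix only when the field begins with a formula character.
  # String.starts_with?/2 accepts a list of candidate prefixes.
  def escape(data) when is_binary(data) do
    if String.starts_with?(data, @escape_formula_start) do
      @escape_formula_prefix <> data
    else
      data
    end
  end
end

IO.inspect(EscapeSketch.escape("=1+1"))
# prints "'=1+1"

IO.inspect(EscapeSketch.escape("X-1"))
# prints "X-1"
```

Because the encoder and the parser now both read `@escape_formula_prefix`, the character added here is the same one the parser looks for when unescaping.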
15 changes: 13 additions & 2 deletions test/decoding/parser_test.exs
@@ -151,14 +151,25 @@ defmodule DecodingTests.ParserTest do
   end

   test "removes escaping for formula when unescape_formulas is set to true" do
-    input = [["=1+1", ~S(=1+2";=1+2), ~S(=1+2'" ;,=1+2)], ["-10+7"], ["+10+7"], ["@A1:A10"]]
+    input = [
+      ["=1+1", ~S(=1+2";=1+2), ~S(=1+2'" ;,=1+2)],
+      ["-10+7"],
+      ["+10+7"],
+      ["@A1:A10"],
+      ["X-1"],
+      ["B+1"],
+      ["C=1"]
+    ]

     assert encode_decode_loop([input], escape_formulas: true, unescape_formulas: true) == [
              ok: [
                "=1+1=1+2\";=1+2=1+2'\" ;,=1+2",
                "-10+7",
                "+10+7",
-               "@A1:A10"
+               "@A1:A10",
+               "X-1",
+               "B+1",
+               "C=1"
              ]
            ]
   end