Commit

fix escapes in backtick str; change csv parse algo
1. Change RemesPath indexers to reduce the number of backslash escapes
    needed to get keys containing special characters.
    For instance, previously @.`\\n\\\\a\"` was required
    to match the key "\n\\a\"", whereas now @.`\n\\a"` matches it.
2. Relatedly, RemesPath previously could not handle
    a literal '\\' char right before the closing backtick. This is fixed.
3. Add csv_regex function to get the regex used in s_csv.
4. Add "includeFullMatchAsFirstItem" parameter to s_fa.
5. Update CSV parsing to accept only RFC 4180-conforming files.
6. Update CSV-dumping algorithm to output
    RFC 4180-conforming files.
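Item 5 restricts CSV parsing to RFC 4180. The plugin's own parser is regex-based (item 3 exposes the regex used by `s_csv`) and is not part of this diff, so the following is only a point of comparison: a minimal C# sketch of an RFC 4180 record parser written as a state machine, assuming `,` as the delimiter and `"` as the quote character. `Rfc4180Sketch` and `ParseRecords` are hypothetical names.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

public static class Rfc4180Sketch
{
    // Parses CSV text into records (lists of fields) following RFC 4180.
    // Throws FormatException on input that RFC 4180 does not allow.
    public static List<List<string>> ParseRecords(string text, char delim = ',', char quote = '"')
    {
        var records = new List<List<string>>();
        var record = new List<string>();
        var field = new StringBuilder();
        bool inQuotes = false;
        int recordStart = 0; // index where the current record begins
        int i = 0;
        while (i < text.Length)
        {
            char c = text[i];
            if (inQuotes)
            {
                if (c != quote)
                {
                    field.Append(c); // delimiters, CR and LF are literal inside quotes
                    i++;
                }
                else if (i + 1 < text.Length && text[i + 1] == quote)
                {
                    field.Append(quote); // "" inside quotes is one literal quote
                    i += 2;
                }
                else
                {
                    inQuotes = false; // closing quote
                    i++;
                    // only a delimiter, a line break, or the end of input may follow
                    if (i < text.Length && text[i] != delim && text[i] != '\r' && text[i] != '\n')
                        throw new FormatException($"unexpected character after closing quote at index {i}");
                }
            }
            else if (c == quote)
            {
                // a quote may only open a field, never appear inside an unquoted one
                if (field.Length > 0)
                    throw new FormatException($"quote inside unquoted field at index {i}");
                inQuotes = true;
                i++;
            }
            else if (c == delim)
            {
                record.Add(field.ToString());
                field.Clear();
                i++;
            }
            else if (c == '\r' || c == '\n')
            {
                record.Add(field.ToString());
                field.Clear();
                records.Add(record);
                record = new List<string>();
                // RFC 4180 specifies CRLF; this sketch also accepts a lone CR or LF
                i += (c == '\r' && i + 1 < text.Length && text[i + 1] == '\n') ? 2 : 1;
                recordStart = i;
            }
            else
            {
                field.Append(c);
                i++;
            }
        }
        if (inQuotes)
            throw new FormatException("unterminated quoted field");
        // the last record may or may not end with a line break
        if (recordStart < text.Length)
        {
            record.Add(field.ToString());
            records.Add(record);
        }
        return records;
    }
}
```

A regex can encode the same rules as a single pattern; the state machine is shown here only because it makes each RFC 4180 rule (quoted delimiters and line breaks, doubled quotes, nothing allowed after a closing quote) an explicit branch.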
molsonkiko committed Dec 2, 2023
1 parent 110cbb7 commit 38891d5
Showing 18 changed files with 354 additions and 175 deletions.
11 changes: 6 additions & 5 deletions CHANGELOG.md
@@ -25,7 +25,6 @@ and this project adheres to [Semantic Versioning](http://semver.org/).

### To Be Changed

- __ADD ABILITY TO REGEX SEARCH DOCUMENT UNPARSED DOCUMENT (USING REMESPATH)__
- If there's a validation error inside of an `anyOf` list of schemas (i.e. JSON doesn't validate under *any* of the schemas), the error message is rather uninformative, and says only "the JSON didn't validate under any of the schemas", but not *why* it didn't validate.
- *(Note to future devs)*: Resist the temptation to fool around with the StringBuilder initial capacity for the ToString method of `Dtype.STR` JNodes. I tried, and it doesn't help performance.
- Mark dark mode icons that look less out of place
@@ -47,16 +46,17 @@ and this project adheres to [Semantic Versioning](http://semver.org/).

1. Option to customize which [toolbar icons](/docs/README.md#toolbar-icons) are displayed, and their order.
2. [For loops in RemesPath](/docs/RemesPath.md#for-loopsloop-variables-added-in-v60)
3. [`bool, s_csv` and `s_fa` vectorized arg functions](/docs/RemesPath.md#vectorized-functions) and [`randint` non-vectorized arg function](/docs/RemesPath.md#non-vectorized-functions) to RemesPath.
3. [`bool, s_csv` and `s_fa` vectorized arg functions](/docs/RemesPath.md#vectorized-functions) and [`randint` and `csv_regex` non-vectorized arg functions](/docs/RemesPath.md#non-vectorized-functions) to RemesPath.
4. Make second argument of [`s_split` RemesPath function](/docs/RemesPath.md#vectorized-functions) optional; 1-argument variant splits on whitespace.
5. Right-click dropdown menu in [error form](/docs/README.md#error-form-and-status-bar), allowing export of errors to JSON or refreshing the form.
6. The parser is now much better at recovering when an object is missing its closing `'}'` or an array is missing its closing `']'`.

### Changed

1. The way object keys are represented internally has been changed (*this has no effect on the GUI-based API, but only for developers*). Previously, when pretty-printing and compressing JSON, object keys would be output as is (without escaping special characters), meaning that *prior to v6.0, some strings were not valid object keys (again, this did not affect parsing of JSON, but only some programmatic applications that constructed JOBjects directly without parsing).* Now all strings are valid object keys.
2. When using the JSON-to-CSV form to create CSV files, newline characters will no longer be escaped in strings. Instead, strings containing newlines will be wrapped in quotes, which should be sufficient to allow most CSV parsers to handle them correctly.
3. Made [`offer_to_show_lint` setting](/docs/README.md#parser-settings) (which controls whether a prompt is shown when errors are found) true by default, so that a fresh installation will show the prompt.
1. When using the JSON-to-CSV form to create CSV files, newline characters will no longer be escaped in strings. Instead, strings containing newlines will be wrapped in quotes, which should be sufficient to allow most CSV parsers to handle them correctly.
2. Made [`offer_to_show_lint` setting](/docs/README.md#parser-settings) (which controls whether a prompt is shown when errors are found) true by default, so that a fresh installation will show the prompt.
3. Change RemesPath indexers to reduce the number of backslash escapes needed to get keys containing special characters like `"a\\b"` or `"\"foo\"\tbar"`. For instance, previously ``@.`\\n\\\\a\"` `` would be required to match the key `"\n\\a\""`, whereas now `` @.`\n\\a"` `` matches it.
4. Running a RemesPath query only causes an attempted re-parsing of the document if the treeview's current file is open.

### Fixed

@@ -68,6 +68,7 @@ and this project adheres to [Semantic Versioning](http://semver.org/).
6. access violations when loading [error form](/docs/README.md#error-form-and-status-bar)
7. unnecessary prompt when manually reloading [error form](/docs/README.md#error-form-and-status-bar)
8. issue with trying to view error form when the error form was already open
9. RemesPath backtick strings now can have a literal `\` character just before the closing backtick. Previously this was impossible because of a regex-writing bug.

## [5.8.0] - 2023-10-09

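Fix 9 attributes the old limitation to a regex-writing bug. The plugin's actual tokenizer regex is not shown in this diff, so the following is only a hypothetical C# illustration of this class of bug: a pattern that treats any backtick preceded by a backslash as escaped cannot end a string with a literal (escaped) backslash, whereas a pattern that consumes each escape sequence as a two-character unit can. `BacktickRegexSketch` is a hypothetical name.

```csharp
using System.Text.RegularExpressions;

public static class BacktickRegexSketch
{
    // Flawed idea: a backtick ends the string unless a backslash precedes it.
    // In `a\\` the closing backtick follows the second backslash, so it is
    // wrongly treated as escaped and the string never terminates.
    public static readonly Regex Naive = new Regex(@"`.*?(?<!\\)`");

    // Sounder idea: the body is a run of (any char except ` and \) or
    // (backslash + any char). An escaped backslash \\ is consumed as one unit,
    // so it cannot escape the closing backtick.
    public static readonly Regex Robust = new Regex(@"`(?:[^`\\]|\\.)*`", RegexOptions.Singleline);
}
```

Both patterns accept an escaped backtick such as `` `a\`b` ``; they differ only when an escaped backslash sits right before the closing backtick, which is exactly the case the fix describes.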
4 changes: 2 additions & 2 deletions JsonToolsNppPlugin/Forms/TreeViewer.cs
@@ -343,8 +343,8 @@ private static string TextForTreeNode(string key, JNode node)
private void SubmitQueryButton_Click(object sender, EventArgs e)
{
if (json == null) return;
if (shouldRefresh)
RefreshButton.PerformClick();
if (shouldRefresh && Npp.notepad.GetCurrentFilePath() == fname)
RefreshButton.PerformClick(); // as of v6.0, queries only trigger auto-refresh if current fname is open; this avoids accidental refreshes with different document
bool usesSelections = UsesSelections();
string query = QueryBox.Text;
JNode queryFunc;
60 changes: 51 additions & 9 deletions JsonToolsNppPlugin/JSONTools/JNode.cs
@@ -180,6 +180,8 @@ public enum DocumentType
INI,
/// <summary>regex search results</summary>
REGEX,
/// <summary>csv files (differs from REGEX only in handling of quoted values)</summary>
CSV,
}

/// <summary>
@@ -339,11 +341,26 @@ public static void CharToSb(StringBuilder sb, char c)
/// <returns></returns>
public static string StrToString(string s, bool quoted)
{
int slen = s.Length;
int ii = 0;
for (; ii < slen; ii++)
{
char c = s[ii];
if (c < 32 || c == '\\' || c == '"')
break;
}
if (ii == slen)
return quoted ? $"\"{s}\"" : s;
var sb = new StringBuilder();
if (quoted)
sb.Append('"');
foreach (char c in s)
CharToSb(sb, c);
if (ii > 0)
{
ii--;
sb.Append(s, 0, ii);
}
for (; ii < slen; ii++)
CharToSb(sb, s[ii]);
if (quoted)
sb.Append('"');
return sb.ToString();
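The new `StrToString` (and the `StrToSb` helper added further down) avoids building a `StringBuilder` for the common case of a string with nothing to escape: it scans for the first character below 32, `\`, or `"`, returns early if there is none, and otherwise copies the clean prefix in one `Append` call before escaping the rest. Below is a self-contained C# sketch of the same fast-path technique; `AppendEscaped` is a simplified stand-in (an assumption) for the plugin's `CharToSb`, whose body is not in this diff, and `JsonEscapeSketch` is a hypothetical name.

```csharp
using System.Text;

public static class JsonEscapeSketch
{
    // Simplified stand-in for CharToSb (assumption): escapes the characters
    // JSON requires to be escaped and appends everything else unchanged.
    static void AppendEscaped(StringBuilder sb, char c)
    {
        switch (c)
        {
            case '\\': sb.Append("\\\\"); break;
            case '"': sb.Append("\\\""); break;
            case '\n': sb.Append("\\n"); break;
            case '\r': sb.Append("\\r"); break;
            case '\t': sb.Append("\\t"); break;
            case '\b': sb.Append("\\b"); break;
            case '\f': sb.Append("\\f"); break;
            default:
                if (c < 32) sb.Append("\\u").Append(((int)c).ToString("x4"));
                else sb.Append(c);
                break;
        }
    }

    public static string Escape(string s, bool quoted)
    {
        // fast path: find the first char that needs escaping
        int first = 0;
        while (first < s.Length && s[first] >= 32 && s[first] != '\\' && s[first] != '"')
            first++;
        if (first == s.Length)
            return quoted ? "\"" + s + "\"" : s; // nothing to escape, no StringBuilder needed
        var sb = new StringBuilder(s.Length + 8);
        if (quoted) sb.Append('"');
        sb.Append(s, 0, first); // clean prefix copied in one call
        for (int i = first; i < s.Length; i++)
            AppendEscaped(sb, s[i]);
        if (quoted) sb.Append('"');
        return sb.ToString();
    }
}
```

The sketch copies the clean prefix `s[0..first)` directly. The diff's version decrements `ii` before `sb.Append(s, 0, ii)`, so the last clean character is routed through `CharToSb` instead; assuming `CharToSb` passes such characters through unchanged, the output is the same.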
@@ -358,6 +375,36 @@ public static string UnescapedJsonString(string str, bool strAlreadyQuoted)
return (string)new JsonParser().ParseString(strAlreadyQuoted ? str : $"\"{str}\"").value;
}

/// <summary>
/// adds the escaped JSON representation (BUT WITHOUT ENCLOSING QUOTES) of a string s to StringBuilder sb.
/// </summary>
public static void StrToSb(StringBuilder sb, string s)
{
int ii = 0;
int slen = s.Length;
// if s contains no control chars
for (; ii < slen; ii++)
{
char c = s[ii];
if (c < 32 || c == '\\' || c == '"')
break;
}
if (ii == slen)
sb.Append(s);
else
{
if (ii > 0)
{
ii--;
sb.Append(s, 0, ii);
}
for (; ii < slen; ii++)
{
CharToSb(sb, s[ii]);
}
}
}

/// <summary>
/// Compactly prints the JSON.<br></br>
/// If sort_keys is true, the keys of objects are printed in alphabetical order.<br></br>
@@ -371,12 +418,7 @@ public virtual string ToString(bool sort_keys = true, string key_value_sep = ":
{
case Dtype.STR:
{
var sb = new StringBuilder();
sb.Append('"');
foreach (char c in (string)value)
CharToSb(sb, c);
sb.Append('"');
return sb.ToString();
return StrToString((string)value, true);
}
case Dtype.FLOAT:
{
@@ -849,7 +891,7 @@ public static string FormatKey(string key, KeyStyle style = KeyStyle.Python)
{
if (DOT_COMPATIBLE_REGEX.IsMatch(key))
return $".{key}";
string key_dubquotes_unescaped = key.Replace("\\", "\\\\").Replace("`", "\\`");
string key_dubquotes_unescaped = key.Replace("\\\"", "\"").Replace("`", "\\`");
return $"[`{key_dubquotes_unescaped}`]";
}
case KeyStyle.JavaScript:
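The new Python-style branch of `FormatKey` appears to expect `key` in JSON-escaped form without enclosing quotes: it rewrites `\"` to a bare `"`, which backtick strings now accept, and escapes backticks. A minimal C# sketch of that branch follows; the dot-compatibility pattern is an assumption standing in for `DOT_COMPATIBLE_REGEX`, which is not shown in this diff, and `FormatKeySketch` is a hypothetical name.

```csharp
using System.Text.RegularExpressions;

public static class FormatKeySketch
{
    // Assumed stand-in for DOT_COMPATIBLE_REGEX (its real pattern is not in this diff).
    static readonly Regex DotCompatible = new Regex(@"^[_a-zA-Z][_a-zA-Z0-9]*$");

    // escapedKey: the key's JSON-escaped representation without enclosing quotes,
    // e.g. the key made of newline, 'a', '"' is passed in as \na\"
    public static string FormatPythonStyle(string escapedKey)
    {
        if (DotCompatible.IsMatch(escapedKey))
            return "." + escapedKey;
        // backtick strings accept bare double quotes, but a literal backtick must be escaped
        string inner = escapedKey.Replace("\\\"", "\"").Replace("`", "\\`");
        return "[`" + inner + "`]";
    }
}
```

By contrast, the removed line doubled every backslash, which matches the old requirement described in the changelog of writing ``@.`\\n\\\\a\"` `` to reach the key `"\n\\a\""`.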
3 changes: 1 addition & 2 deletions JsonToolsNppPlugin/JSONTools/JsonParser.cs
@@ -7,7 +7,6 @@
using System.Text;
using System.Text.RegularExpressions;
using JSON_Tools.Utils;
using static System.Windows.Forms.LinkLabel;

namespace JSON_Tools.JSON_Tools
{
@@ -717,7 +716,7 @@ public string ParseKey(string inp)
{
sb.Append('\\');
sb.Append(nextChar);
ii += 1;
ii++;
}
else if (nextChar == 'u')
{
38 changes: 17 additions & 21 deletions JsonToolsNppPlugin/JSONTools/JsonTabularize.cs
@@ -795,30 +795,26 @@ public JArray BuildTable(JNode obj, Dictionary<string, object> schema, string ke
return new JArray(0, result);
}

/// <summary>
/// If a string contains the delimiter or a newline, append (the string wrapped in quotes) to sb<br></br>
/// Otherwise, append s to sb.
/// </summary>
/// <param name="s"></param>
/// <param name="delim"></param>
/// <param name="quote_char"></param>
/// <returns></returns>
private void ApplyQuotesIfNeeded(StringBuilder sb, string s, char delim, char quote_char, string newline)
/// <summary>
/// If string s contains the delimiter, '\r', '\n', or a literal quote character, append (the string wrapped in quotes) to sb.<br></br>
/// If s contains literal quote character, it is escaped by doubling it up according to the CSV RFC 4180 (https://www.ietf.org/rfc/rfc4180.txt)<br></br>
/// Otherwise, append s to sb unchanged
/// </summary>
/// <param name="s"></param>
/// <param name="delim"></param>
/// <param name="quote_char"></param>
/// <returns></returns>
private void ApplyQuotesIfNeeded(StringBuilder sb, string s, char delim, char quote_char)
{
if (s.IndexOf(delim) >= 0 || s.IndexOf(newline) >= 0)
if (s.IndexOfAny(new char[] {delim, '\r', '\n', quote_char}) >= 0)
{
// if the string contains the delimiter or a newline, we need to wrap it in quotes
// we also need to escape all literal quote characters in the string
sb.Append(quote_char);
foreach (char c in s)
for (int ii = 0; ii < s.Length; ii++)
{
char c = s[ii];
sb.Append(c);
if (c == quote_char)
{
sb.Append('\\');
sb.Append(quote_char);
}
else
sb.Append(c);
sb.Append(quote_char);
}
sb.Append(quote_char);
}
@@ -850,7 +846,7 @@ public string TableToCsv(JArray table, char delim = ',', char quote_char = '"',
for (int ii = 0; ii < header.Length; ii++)
{
string col = header[ii];
ApplyQuotesIfNeeded(sb, col, delim, quote_char, newline);
ApplyQuotesIfNeeded(sb, col, delim, quote_char);
if (ii < header.Length - 1) sb.Append(delim);
}
sb.Append(newline);
@@ -870,7 +866,7 @@ public string TableToCsv(JArray table, char delim = ',', char quote_char = '"',
switch (val.type)
{
case Dtype.STR:
ApplyQuotesIfNeeded(sb, (string)val.value, delim, quote_char, newline);
ApplyQuotesIfNeeded(sb, (string)val.value, delim, quote_char);
break; // only apply quotes if internal delim
case Dtype.DATE:
sb.Append(((DateTime)val.value).ToString("yyyy-MM-dd"));
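This page shows the removed and added lines of `ApplyQuotesIfNeeded` interleaved, so here is a self-contained C# sketch of the quoting rule stated in its new doc comment: quote a field only if it contains the delimiter, `\r`, `\n`, or the quote character, and double every embedded quote character. `CsvQuoteSketch`, `QuoteIfNeeded` and `Row` are hypothetical names; this is not the plugin's code verbatim.

```csharp
using System.Text;

public static class CsvQuoteSketch
{
    // Appends field s to sb, quoting it per RFC 4180 only when necessary.
    public static void QuoteIfNeeded(StringBuilder sb, string s, char delim = ',', char quote = '"')
    {
        if (s.IndexOfAny(new[] { delim, '\r', '\n', quote }) < 0)
        {
            sb.Append(s); // nothing special: emit unchanged
            return;
        }
        sb.Append(quote);
        foreach (char c in s)
        {
            sb.Append(c);
            if (c == quote)
                sb.Append(quote); // a literal quote is escaped by doubling it
        }
        sb.Append(quote);
    }

    // Joins one row of fields into a CSV line (without the trailing line break).
    public static string Row(params string[] fields)
    {
        var sb = new StringBuilder();
        for (int i = 0; i < fields.Length; i++)
        {
            if (i > 0) sb.Append(',');
            QuoteIfNeeded(sb, fields[i]);
        }
        return sb.ToString();
    }
}
```

Unlike the removed lines, which escaped an embedded quote as `\"` and only quoted fields containing the delimiter or the newline string, this doubles the quote, which is what RFC 4180 readers expect; a field that contains only a quote character is now quoted as well.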
15 changes: 5 additions & 10 deletions JsonToolsNppPlugin/JSONTools/RemesPath.cs
@@ -1427,7 +1427,7 @@ private static object GetSingleIndexerListValue(JNode ind)
{
switch (ind.type)
{
case Dtype.STR: return (string)ind.value;
case Dtype.STR: return JNode.StrToString((string)ind.value, false);
case Dtype.INT: return Convert.ToInt32(ind.value);
case Dtype.SLICE: return ((JSlicer)ind).slicer;
case Dtype.REGEX: return ((JRegex)ind).regex;
@@ -1512,7 +1512,7 @@ private Obj_Pos ParseIndexer(List<object> toks, int pos, int end, IndexerStart i
}
else
{
children.Add(jt.value);
children.Add(JNode.StrToString((string)jt.value, false));
}
return new Obj_Pos(new VarnameList(children), pos + 1);
}
@@ -2155,17 +2155,12 @@ private Obj_Pos ParseProjection(List<object> toks, int pos, int end, JQueryConte
{
throw new RemesPathException("Mixture of values and key-value pairs in object/array projection");
}
if (key.type == Dtype.STR)
if (key.value is string keystr)
{
opo = ParseExprFunc(toks, pos + 1, end, context);
JNode val = (JNode)opo.obj;
pos = opo.pos;
string keystr_in_quotes = key.ToString();
string keystr = keystr_in_quotes.Substring(1, keystr_in_quotes.Length - 2);
// do proper JSON string representation of characters that should not be in JSON keys
// (e.g., '\n', '\t', '\f')
// in case the user uses such a character in the projection keys in their query
children.Add(new KeyValuePair<string, JNode>(keystr, val));
children.Add(new KeyValuePair<string, JNode>(JNode.StrToString(keystr, false), val));
is_object_proj = true;
nt = PeekNextToken(toks, pos - 1, end);
if (!(nt is char))
@@ -2176,7 +2171,7 @@ private Obj_Pos ParseProjection(List<object> toks, int pos, int end, JQueryConte
}
else
{
throw new RemesPathException($"Object projection keys must be string, not {JNode.FormatDtype(key.type)}");
throw new RemesPathException($"Object projection keys must be string, not type {JNode.FormatDtype(key.type)} (value {key.ToString()})");
}
}
else
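Several RemesPath changes (`GetSingleIndexerListValue`, the key list built in `ParseIndexer`, and object-projection keys in `ParseProjection`) now pass string keys through `JNode.StrToString(..., false)`. Together with `ParseKey` appearing to keep escape sequences verbatim, this suggests that object keys are held in JSON-escaped form, so a lookup has to escape the user-supplied key the same way before consulting the dictionary. A minimal C# sketch of that invariant under this assumption; `EscapedKeyObject` is hypothetical and uses `int` values in place of `JNode`.

```csharp
using System.Collections.Generic;
using System.Text;

// Hypothetical container whose keys are stored JSON-escaped, without quotes (assumption).
public class EscapedKeyObject
{
    private readonly Dictionary<string, int> children = new Dictionary<string, int>();

    // JSON escaping of a raw key, without enclosing quotes
    public static string Escape(string raw)
    {
        var sb = new StringBuilder(raw.Length);
        foreach (char c in raw)
        {
            switch (c)
            {
                case '\\': sb.Append("\\\\"); break;
                case '"': sb.Append("\\\""); break;
                case '\n': sb.Append("\\n"); break;
                case '\r': sb.Append("\\r"); break;
                case '\t': sb.Append("\\t"); break;
                case '\b': sb.Append("\\b"); break;
                case '\f': sb.Append("\\f"); break;
                default:
                    if (c < 32) sb.Append("\\u").Append(((int)c).ToString("x4"));
                    else sb.Append(c);
                    break;
            }
        }
        return sb.ToString();
    }

    // insertion and lookup both escape the raw key, so they always agree
    public void Add(string rawKey, int value) => children[Escape(rawKey)] = value;

    public bool TryGet(string rawKey, out int value) => children.TryGetValue(Escape(rawKey), out value);

    // exposes the stored (escaped) keys
    public IEnumerable<string> StoredKeys => children.Keys;
}
```

Escaping at every entry point keeps a single canonical form per key, so a key containing a newline is found whether it came from parsed JSON or from a RemesPath indexer.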