SUBSEP #4

Closed
greencardamom opened this issue May 8, 2018 · 7 comments

@greencardamom

query_json() assumes SUBSEP is set to "," (comma), but SUBSEP is not defined anywhere. In Gawk, the default value of SUBSEP is the string "\034", which causes unusual display output. I've never seen JSON that didn't use a comma, so I will hard-code it for my purposes, but maybe it's worth keeping it as a variable and documenting somewhere that it is adjustable. A couple of other functions use SUBSEP and might also have trouble with Gawk.

Nice work BTW, I'll be using some of the functions.

@dubiousjim
Owner

Hi thanks for the feedback.

I think that 1-byte string (\034) is generally the default value of SUBSEP. And awk automatically translates code of the form A[i, j] into A[i SUBSEP j]. That's part of the standard original behavior of awk; I'd be surprised if any of the contemporary implementations change it.
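A quick way to check this is the minimal sketch below. The array name A and the keys "x" and "y" are only illustrative, and the shell wrapper is just there to make it runnable. On the common implementations it should print 1 1: the two spellings name the same element, and SUBSEP is one character long.

```shell
out=$(awk 'BEGIN {
    A["x", "y"] = 1                      # stored under the key "x" SUBSEP "y"
    same = (("x" SUBSEP "y") in A)       # 1 if both spellings name the same element
    printf "%d %d\n", same, length(SUBSEP)
}')
echo "$out"
```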

What part of the code gave you the impression that I was assuming SUBSEP was set to comma? I acknowledge I don't have this stuff loaded into my working memory right now, but I don't think I was making that assumption.

@greencardamom
Author

Well, I don't know then. Try running query_json() with this:

{"results": [{"url": "http://nytimes.com", "timestamp": "20070101", "archived_snapshots": {"closest": {"status": "200", "available": true, "url": "http://web.archive.org/web/20061231083247/http://www.nytimes.com:80/", "timestamp": "20061231083247"}}}]}

It produces weird output. Only by setting SUBSEP to comma does it work.

@dubiousjim
Owner

BEGIN {
  s = "{\"results\": [{\"url\": \"http://nytimes.com\", \"timestamp\": \"20070101\", \"archived_snapshots\": {\"closest\": {\"status\": \"200\", \"available\": true, \"url\": \"http://web.archive.org/web/20061231083247/http://www.nytimes.com:80/\", \"timestamp\": \"20061231083247\"}}}]}"
  query_json(s, A)
  dump(A)
}

produces:

[results,1,archived_snapshots,closest,status]=<200>
[results,1,url]=http://nytimes.com
[results,1,timestamp]=<20070101>
[results,1,archived_snapshots,closest,timestamp]=<20061231083247>
[results,0]=<1>
[results,1,archived_snapshots,closest,available]=<1>
[results,1,archived_snapshots,closest,url]=http://web.archive.org/web/20061231083247/http://www.nytimes.com:80/

There's no need to change the definition of SUBSEP. What results were you expecting?

After running query_json, your array is filled with an object whose single member is results. This object has an array of length A[results, 0], which is 1. The first (and only) element of that array, is unpacked at A[results, 1, ...]. Its url element can be retrieved as A[results, 1, url].

Every time I say something like A[key1, key2], Awk silently translates that into A[key1 SUBSEP key2], because it doesn't really have multidimensional arrays, only this mechanism that uses SUBSEP to pretend it does.

@dubiousjim
Owner

Sorry, when I just wrote A[results, 1, url], that really needs to be A["results", 1, "url"].
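As a rough sketch of walking that layout: the array below is filled in by hand to mirror a few of the entries dump() printed above, so it is a stand-in for the real query_json() call, not its actual output. The loop relies only on the count stored at index 0.

```shell
out=$(awk 'BEGIN {
    # Hand-built stand-in for what query_json() stored in the example above.
    A["results", 0] = 1
    A["results", 1, "url"] = "http://nytimes.com"
    A["results", 1, "timestamp"] = "20070101"

    n = A["results", 0]                  # element count stored at index 0
    for (i = 1; i <= n; i++)
        print i ": " A["results", i, "url"] " @ " A["results", i, "timestamp"]
}')
echo "$out"
```

With the single hand-built element, this prints 1: http://nytimes.com @ 20070101.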

@greencardamom
Author

Ahh dump(). I was doing:

for(i in A) print "A[" i "] = " A[i]

It appears that because the index ('i') contains \034, it prints to the screen a sort of Chinese-looking block character that overwrites other characters, giving a strange result. It's supposed to be non-printing, but it prints. It wasn't the comma exactly; rather, any character other than \034 fixed it.

In dump():

gsub(SUBSEP, ",", j)

which corrects it for display purposes. It was not clear why that strange output was occurring; I narrowed it down to SUBSEP and thought it required a comma, but really the problem was that SUBSEP needs to be gsub()'d before displaying. I just read more carefully about multi-dimensional arrays and SUBSEP and you are right, this is how it's done, and it makes sense. My only suggestion, since this could be a common gotcha, would be an inline comment saying to use dump() to properly display the results of query_json().
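For anyone hitting the same thing, here is a minimal sketch of the loop I was using, with the same substitution applied to a copy of the key before printing. The single array entry is filled in by hand as sample data, and the variable k is just an illustrative name.

```shell
out=$(awk 'BEGIN {
    A["results", 1, "url"] = "http://nytimes.com"   # hand-built sample entry
    for (i in A) {
        k = i
        gsub(SUBSEP, ",", k)     # swap the unprintable separator for a comma, on a copy
        print "A[" k "] = " A[i]
    }
}')
echo "$out"
```

This prints A[results,1,url] = http://nytimes.com, and the original key i is left untouched so it can still be used for the lookup.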

@dubiousjim
Owner

Well, dump is a debugging function, with its own idiosyncrasies (like wrapping the values inside < and >). I didn't expect that the normal use of query_json would be to dump the results; it's just there to help with development.

I expected the normal use would be something like:

query_json(s, A)
url = A["results", 1, "url"]
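If the key might not be there, one option is to test for it first with awk's multi-subscript form of "in". This is only a sketch: the array is filled in by hand as a stand-in for the query_json(s, A) call, and the "(missing)" placeholder and the variable named missing are illustrative.

```shell
out=$(awk 'BEGIN {
    A["results", 1, "url"] = "http://nytimes.com"   # stand-in for query_json(s, A)
    if (("results", 1, "url") in A)
        url = A["results", 1, "url"]
    else
        url = "(missing)"
    # A plain lookup of an absent key would create it, so test with "in" first.
    missing = (("results", 2, "url") in A) ? "present" : "absent"
    print url, missing
}')
echo "$out"
```

Here the first key exists and the second does not, so it prints http://nytimes.com absent.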

@greencardamom
Author

Yes, but how would one know that the key to look up is
A["results", 1, "url"]
without first doing a debug dump to see the output :) At least the first time using the function, to understand what it's producing, or with more complex JSON that is not so easy to parse mentally.
