Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fails on flow sequences #114

Closed
lewisl opened this issue May 3, 2021 · 4 comments · Fixed by #116
Closed

Fails on flow sequences #114

lewisl opened this issue May 3, 2021 · 4 comments · Fixed by #116
Labels

Comments

@lewisl
Copy link

lewisl commented May 3, 2021

I have a longish yml file. With block sequences, it works. I converted the blocks to flow because there were very few and very short entries in each sequence. All of the popular YAML parse/validate/prettify tools accept the entire file; it validates as correct; and it can successfully be converted to json, toml,

YAML.jl fails with an angry sequence of error messages that I can't make out (
YAML_error_msgs.txt
).

I broke the file down to many fewer nodes. The shortest fragment works:

5:                                  # agegrp
  25:                               # sickday
    7:                              # fromcond
      probs: [0.682, 0.318]
      outcomes: [3, 4]
    8:
      probs: [0.676, 0.324]
      outcomes: [3, 4]

YAML.load_file("mini1.yml) converts to a julia dict I called mini. Using PrettyPrint to output shows it is correct (effectively an ascii "repr" of json):

{
  5 : {
        25 : {
               7 : {
                     "probs" : [0.682, 
                                0.318],
                     "outcomes" : [3, 
                                   4],
                   },
               8 : {
                     "probs" : [0.676, 
                                0.324],
                     "outcomes" : [3, 
                                   4],
                   },
             },
      },
}

here is a slightly longer form:

5:                                  # agegrp
  25:                               # sickday
    7:                              # fromcond
      probs: [0.682, 0.318]
      outcomes: [3, 4]
    8:
      probs: [0.676, 0.324]
      outcomes: [3, 4]
  19:
    8:
      probs: [0.49, 0.24, 0.27]
      outcomes: [3, 8, 4]
  14:
    6:
      probs: [0.7, 0.3]
      outcomes: [3, 7]
    7:
      probs: [0.7, 0.1, 0.2]
      outcomes: [3, 7, 8]
    8:
      probs: [0.12, 0.67, 0.21]
      outcomes: [3, 8, 4]
  9:
    5:
      probs: [0.5, 0.5]
      outcomes: [3, 7]
    6:
      probs: [0.4, 0.6]
      outcomes: [6, 7]
    7:
      probs: [0.6, 0.4]
      outcomes: [7, 8]
  5:
    5:
      probs: [0.1, 0.5, 0.4]
      outcomes: [5, 6, 7]
      

This stops at the 5: to 25: to 8: to outcomes node. It effectively aborts after the first leaf node with no error message of any kind. Further, it won't generate the julia datastructure at all.

There are 4 more top level nodes like node 5 above. This leads to massive failure.

Finally, it almost works when converting the little arrays (sequences) into block sequences:

1:
  5:
    5:
      probs:
        - 0.4
        - 0.5
        - 0.1
      outcomes:
        - 5
        - 6
        - 7

Except bizarrely the leading integer 1 at the top of the file is converted to a string representation of a utf8 symbol: "\ufeff1" which should be in Julia '\ufeff1' but this could be the way Jupyter wants to output it. The character is this thingee: 󾿱 You can see that the stream approach is much more readable and shows the matchup of the probability and the outcome arrays.

Visual Studio Code is pretty happy that the integer 1 is actually a 1. So, the conversion is causing the problem.

Maybe this is failing on numeric scalar keys? But, it works on the small fragment and it mostly works on the longer example. Some of the convert to json tools sort of insist that the yml BE valid json, but converting to a dict shouldn't have that restriction.

Perhaps I can make a little gist, preferably a jupyter notebook, for you of several examples if this isn't enough to see the problem.

For, now I've attached the files in a zip archive ymlfiles.zip, which contains:

new.yml => shows the stream version
new_block.yml => shows the block version
mini2.yml => shows the stream version for a single top level node

@kescobo kescobo added the bug label May 3, 2021
hexane360 added a commit to hexane360/YAML.jl that referenced this issue May 22, 2021
@hexane360
Copy link
Contributor

It looks like there's two issues going on here.

First, "mini2.yml" starts with a UTF8-BOM (inserted by some windows utilities like notepad.exe). This was fixed in PR #107 when support for alternate encodings was added. It loads fine with the current github version. (To use the latest version in your project, use either pkg add YAML#master or pkg develop YAML)

Second, "new.yml" contains a trailing comma on line 62:

outcomes: [3, 7,]

Looking at the YAML spec, this is explicitly allowed in both flow sequences and mappings. I'll open a PR shortly to fix this behavior.

hexane360 added a commit to hexane360/YAML.jl that referenced this issue May 22, 2021
hexane360 added a commit to hexane360/YAML.jl that referenced this issue May 22, 2021
@lewisl
Copy link
Author

lewisl commented May 22, 2021 via email

@lewisl
Copy link
Author

lewisl commented May 23, 2021 via email

@hexane360
Copy link
Contributor

I'm glad you got your code working.

To match the YAML spec, there are two components:

  • A BOM at the beginning of a document shouldn't be parsed as part of the document. This is fixed in the current master branch.
  • Trailing commas should be allowed in flow collections. This should be fixed in PR Fix trailing commas in flow sequences #116.

Because the trailing commas bug still exists (and the resulting error message is nigh-unreadable), I think we should keep this issue open. When PR #116 is merged we can close the issue.

kescobo pushed a commit that referenced this issue May 28, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants