Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

print uint64 using inttypes.h PRIu64 instead of llu #3423

Closed
jangorecki opened this issue Feb 20, 2019 · 8 comments · Fixed by #3525
Closed

print uint64 using inttypes.h PRIu64 instead of llu #3423

jangorecki opened this issue Feb 20, 2019 · 8 comments · Fixed by #3525
Labels
Milestone

Comments

@jangorecki
Copy link
Member

jangorecki commented Feb 20, 2019

%llu format works fine when using from R C API functions like Rprintf, but it fails on Windows when using in plain C like sprintf. We could use PRIu64 instead, defined in inttypes.h.
According to http://r.789695.n4.nabble.com/package-config-file-for-windows-td921218.html inttypes.h is available on CRAN's Windows machines.
According to pocoproject/poco#1426 inttypes.h is available on Solaris.

We need this to be able to construct a message from plain C code, where message body contains length of a vector, thus can be int64. This is used for deferred error/warnings/verbose messages from code that runs in parallel regions.

@mattdowle
Copy link
Member

mattdowle commented Apr 24, 2019

See https://github.com/Rdatatable/data.table/blob/master/src/fread.c#L30
There llu is defined to be unsigned long long int. Then use "%llu" in format string as usual, but cast the size_t or uint64_t arguments using (llu) prefix.

@jangorecki
Copy link
Member Author

@mattdowle Is it going to work on windows 32bit?
I tried sprintf("%llu", (long long unsigned) nx); for uint_fast64_t nx and it was causing problems for win32.

@mattdowle
Copy link
Member

mattdowle commented Apr 25, 2019

@jangorecki The line in fread.c and in my comment above uses unsigned long long int. Why have you used long long unsigned?

@jangorecki
Copy link
Member Author

jangorecki commented Apr 26, 2019

@mattdowle unsigned long long int doesn't seems to work on windows (both 32 and 64)

froll.c:41:3: warning: unknown conversion type character 'l' in format [-Wformat=]
   if (verbose) sprintf(ans->message[0], "%s%s: running for input length %llu, window %d, hasna %d, narm %d\n", ans->message[0], __func__, (unsigned long long int)nx, k, hasna, (int) narm);
   ^
froll.c:41:3: warning: format '%d' expects argument of type 'int', but argument 5 has type 'uint_fast64_t' [-Wformat=]
froll.c:41:3: warning: too many arguments for format [-Wformat-extra-args]

@mattdowle
Copy link
Member

mattdowle commented Apr 26, 2019

I see what you mean and given those warnings about int types, it is reasonable to think it's about int types. But note the last warning (which is at the same point 41:3) : warning: too many arguments for format [-Wformat-extra-args]. If there's an extra argument (or something like that) then that could cause other compiler warnings to be misleading.

The first thing I noticed was that the target pointer (ans->message[0]) appeared again as the first argument after the format specifier, to achieve writing to the end of the string. That's going to cause trouble because sprintf() will need to read and write to the same part of the string at the same time. Maybe that should be possible and should work, but it isn't clean, it's not efficient, and probably not expected by compiler optimizer writers. So lets just remove that for starters. I added an end() function which just calls strchr() and then wrapped the target with end() each time.

Also we should never use sprintf. Always snprintf and pass the amount of space available. To avoid overwrites. In fread.c it uses snprintf not sprintf. I changed all sprintf to snprintf.

I made these changes to the PR and it's passing now: 17d2c52 and 605d0b0. This shows that unsigned long long int does work on Windows including 32bit and those warnings were misleading.

I left the 500 that I added hard coded just as a first step to make sure it passed CI first. Could you clear that up from here please. Maybe end() could subtract the length that strchr() returned from 4096 allocated and store the remaining space in a global variable. Then pass that global as the 2nd argument to snprintf. However, I'm not sure how much of this is in parallel, so a single global may not work. Perhaps store a usedLength in ans-> too then and wrap the snprintf() calls into an appendMessage helper function or something. If you define the 4096 somewhere in just one place (a .h) then use that macro definition in the two places to ensure there can't be mismatch in future if the 4096 is ever changed in one place but not the other.

@jangorecki
Copy link
Member Author

jangorecki commented Apr 29, 2019

AFAIU snprintf the 500 value is fine there, assuming none of the message has more than 500 chars. Even if such message would happen, the behaviour is defined and result in message being truncated. Writing those messages goes in parallel so, yes, global var wouldn't work. AFAIU the code that is now in this branch is enough, we don't need to store amount of characters used in extra variable. Second argument of snprintf, the 500, is not the number of chars that will be written to that buffer, but a maximum number, thus assuming that none of message is >500 and all messages in total are <=4096 (or precisely 4096-n_messages because each snprintf consume extra byte for \0), then we are safe as it is now.

@mattdowle
Copy link
Member

mattdowle commented Apr 29, 2019

Yes the 500 applies to each message and it's the maximum written (if the message is longer it is safely truncated to 500). There's an implicit upper bound of 8 messages per buffer then (8*500 < 4096). In practice, it will safely hold many more than 8 messages since the messages are a lot shorter than 500. Ok since you're happy with it (sounds like you know the maximum number of messages is 8 or less), I'll merge it so we can move on. Maybe limiting the number of messages is the easiest way to make it rock solid. That could be an int[] variable added to the ans structure alongside the message[][].

@jangorecki
Copy link
Member Author

jangorecki commented Apr 29, 2019

Hold on. It won't be enough as user may provide arbitrary number of columns or windows to calculate. Need to investigate.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants