Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

StringInputStream gets stuck at large enough stdin input #555

Closed
DartBot opened this issue Nov 22, 2011 · 7 comments
Closed

StringInputStream gets stuck at large enough stdin input #555

DartBot opened this issue Nov 22, 2011 · 7 comments
Assignees
Labels
area-core-library SDK core library issues (core, async, ...); use area-vm or area-web for platform specific libraries.

Comments

@DartBot
Copy link

DartBot commented Nov 22, 2011

This issue was originally filed by joaoped...@gmail.com


What steps will reproduce the problem?

  1. Run this program with the txt file attached:

main() {
  var a;
  var ios = new StringInputStream(stdin);
  while ((a = ios.read()) !== null) {
    print(a.length);
  }
  exit(0);
}

  1. Run it like this:

dart_release wrangler.dart < mega_sample.txt

  1. It should get stuck there.

What is the expected output? What do you see instead?

It should print the length of the input string. But it gets stuck like in an endless loop and doesn't print anything and doesn't exit either.

What version of the product are you using? On what operating system?

Bleeding-edge Dart. Linux manga 3.0.0-13-generic-pae #­22-Ubuntu SMP Wed Nov 2 15:17:35 UTC 2011 i686 i686 i386 GNU/Linux

Please provide any additional information below.

I'm attaching the sample file. Here's the output when the program works for a slightly smaller input file:

$ time dart_release wrangler.dart < mega_sample.txt
837083

real 0m4.274s
user 0m4.236s
sys 0m0.036s

And when it gets stuck and has to be canceled:

$ wc mega_sample.txt
  4960 10416 840472 mega_sample.txt

$ time dart_release wrangler.dart < mega_sample.txt
^C

real 0m52.132s
user 0m52.083s
sys 0m0.032s


Attachment:
mega_sample.txt (820.77 KB)

@DartBot
Copy link
Author

DartBot commented Nov 22, 2011

This comment was originally written by drfibonacci@google.com


Added Area-Library, Triaged labels.

@whesse
Copy link
Contributor

whesse commented Nov 22, 2011

This is due to extremely high new space usage by StringInputStream. Using the
--gc-verbose option (found by using the --print-flags VM option - there is no --help option for the VM), we see that reading this 820 KB file takes more than 20 MB in new space - it performs scavenges in a loop, without making progress, with the default new space allocation of 32 MB (half of that is 16 MB, the default max new space heap size). It completes if we specify --new-gen-heap-size=44. This is a bug - this is requiring more than 25 bytes of live allocation per byte read from the file, and it is not getting promoted to old space.


cc @sgjesse.
cc @madsager.

@madsager
Copy link
Contributor

Set owner to @sgjesse.
Added Accepted label.

@whesse
Copy link
Contributor

whesse commented Nov 22, 2011

The problem is the StringBuffer _result field of _StringDecoderBase in string_stream.dart. Characters get added individually to this buffer as long
as the decoder has not exhausted its input. If we limit the size of this StringBuffer, and stop decoding when it reaches a certain size, then the classes
that use this don't retry the read to get more data, and return null before reporting all the data.

Once toString has been called on the buffer, then the buffer can be freed, and we are using much less memory per character. We need some way to do this after about every 10000 - 100000 characters.

I think we need a 2-level scheme of StringBuffers, so that every 100,000 characters get written to a string, and those strings are accumulated in a higher level StringBuffer. Other than that, we would need a way in which all uses of a decoder would keep calling the decoder until they got all the data in large chunks.

@whesse
Copy link
Contributor

whesse commented Nov 22, 2011

A simpler solution would be to use List<int> charcodes to accumulate the string, instead of a StringBuffer, since Lists can be efficiently extended, and create the string from it when it is read from the decoder. There is no good reason to use a StringBuffer, since we are adding a character at a time, by knowing its character code.

Or maybe StringBuffer itself should know to concatenate itself to a single string, and
free all the substrings, when it gets to be thousands of items long. I don't see an issue with that, unless we need to copy the initial big string thousands of times while accumulating a megabyte string in the buffer. That brings us back to the two-level tree solution, but implemented in StringBuffer, not its users.

@whesse
Copy link
Contributor

whesse commented Nov 22, 2011

Added a CL that fixes this, http://codereview.chromium.org/8639004/, by accumulating the character codes in a List<int>, not a StringBuffer. StringBuffers are mainly useful for accumulating many strings, of varying lengths. List<int> is better for single characters.


cc @sgjesse.
Set owner to @whesse.

@whesse
Copy link
Contributor

whesse commented Nov 23, 2011

Fixed by commit r1777, but note bug 571 : http://code.google.com/p/dart/issues/detail?id=571
FileInputStream and stdin do not work correctly with StringInputStream at the present time. The read handler and close handler of StringInputStream will not be called.


Added Fixed label.

@DartBot DartBot added Type-Defect area-core-library SDK core library issues (core, async, ...); use area-vm or area-web for platform specific libraries. labels Nov 23, 2011
nex3 pushed a commit that referenced this issue Aug 31, 2016
Fix file separators on Windows
copybara-service bot pushed a commit that referenced this issue Nov 9, 2023
Revisions updated by `dart tools/rev_sdk_deps.dart`.

collection (https://github.com/dart-lang/collection/compare/e8d7e92..f309148):
  f309148  2023-11-08  Kevin Moore  Latest lints, require Dart 3.1, use mixin (#322)

fixnum (https://github.com/dart-lang/fixnum/compare/3279f5d..6b0888c):
  6b0888c  2023-11-08  Kevin Moore  Update to latest lints and fix (#122)

markdown (https://github.com/dart-lang/markdown/compare/efb73b3..3774ad0):
  3774ad0  2023-11-07  Kevin Moore  Fix formatting for latest Dart dev SDK (#565)
  da11847  2023-11-07  Mosc  Fix beginning of line detection in `AutolinkExtensionSyntax` (#555)

tools (https://github.com/dart-lang/tools/compare/d898ad1..dd46ef2):
  dd46ef2  2023-11-09  Elias Yishak  Enum + event constructor added for `flutterCommandResult` (#199)

Change-Id: I8aa33f52c8afd50b60b225ad890f3e82945bcc50
Reviewed-on: https://dart-review.googlesource.com/c/sdk/+/335303
Reviewed-by: Konstantin Shcheglov <scheglov@google.com>
Auto-Submit: Devon Carew <devoncarew@google.com>
Commit-Queue: Konstantin Shcheglov <scheglov@google.com>
This issue was closed.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area-core-library SDK core library issues (core, async, ...); use area-vm or area-web for platform specific libraries.
Projects
None yet
Development

No branches or pull requests

3 participants