Commit
perf: Cache CSV stream schema (#363)
The stream's `schema` property is accessed multiple times for each record (see `Stream._generate_record_messages()`, for instance). Since the schema should be static, this change caches it, resulting in a significant performance improvement.

Testing with a sample 2,000,000-row dataset (`people-2000000` from https://github.com/datablist/sample-csv-files) reduced the read time from 441 seconds to 48 seconds, roughly a 9x improvement in throughput.

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
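
For illustration only, here is a minimal sketch of the caching idea described in the message above. It is not the code from this commit: the classes `UncachedStream` and `CachedStream`, the helper `_build_schema`, the `schema_builds` counter, and `read_records` are all hypothetical names. It assumes the schema can be derived once from the CSV header and uses the standard library's `functools.cached_property` to avoid rebuilding it on every record.

```python
from functools import cached_property


class _BaseStream:
    """Hypothetical stand-in for a CSV stream; not the real SDK class."""

    def __init__(self, header):
        self.header = list(header)
        self.schema_builds = 0  # counts how often the schema is rebuilt

    def _build_schema(self):
        # Stand-in for the potentially expensive schema construction.
        self.schema_builds += 1
        return {
            "properties": {
                name: {"type": ["string", "null"]} for name in self.header
            }
        }


class UncachedStream(_BaseStream):
    @property
    def schema(self):
        # Rebuilt on every access.
        return self._build_schema()


class CachedStream(_BaseStream):
    @cached_property
    def schema(self):
        # Built on first access, then stored on the instance.
        return self._build_schema()


def read_records(stream, rows):
    """Turn rows into dicts, touching `stream.schema` once per record."""
    records = []
    for row in rows:
        properties = stream.schema["properties"]
        records.append(dict(zip(properties, row)))
    return records


rows = [("1", "a"), ("2", "b"), ("3", "c")]
uncached = UncachedStream(["id", "name"])
cached = CachedStream(["id", "name"])
uncached_records = read_records(uncached, rows)
cached_records = read_records(cached, rows)
```

In this sketch the uncached stream rebuilds its schema once per row, while the cached stream builds it a single time. Caching is only safe under the assumption stated in the commit message, namely that the schema does not change during a sync.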