Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[vm/ffi] Implement TypedData unwrapping to Pointers in Leaf calls #44589

Closed
dcharkes opened this issue Jan 5, 2021 · 10 comments
Closed

[vm/ffi] Implement TypedData unwrapping to Pointers in Leaf calls #44589

dcharkes opened this issue Jan 5, 2021 · 10 comments
Labels
area-vm Use area-vm for VM related issues, including code coverage, and the AOT and JIT backends. library-ffi type-performance Issue relates to performance or code size

Comments

@dcharkes
Copy link
Contributor

dcharkes commented Jan 5, 2021

After implementing leaf calls (#36707), we should support passing typed data into FFI trampolines.

Leaf calls will not permit the garbage collector to run, which means that it is safe to pass a pointer to underlying data to C, provided that C does not hold on to that pointer after the FFI call returns. This removes the need to copy the data before passing it to C.

Pointer<NativeFunction<Void Function(Pointer<Int8>)>> functionPointer;

final myFunction = functionPointer.asFunction<void Function(Uint8List)>(isLeaf: true);
// or
final myFunction = functionPointer.asFunction<void Function(ByteBuffer)>(isLeaf: true);

Note that the first option needs to take Uint8List.offsetInBytes into account, so that is a double unwrapping.

@blaugold
Copy link
Contributor

Some APIs use slices to pass strings or raw bytes:

struct Slice {
  void *buf;
  size_t size;
};

Maybe this pattern can be considered when designing this feature.

@dcharkes
Copy link
Contributor Author

Pointer<NativeFunction<Void Function(Pointer<Int8>)>> functionPointer;

final myFunction = functionPointer.asFunction<void Function(Uint8List)>(isLeaf: true);
// or
final myFunction = functionPointer.asFunction<void Function(ByteBuffer)>(isLeaf: true);

Note that the first option needs to take Uint8List.offsetInBytes into account, so that is a double unwrapping.

We should probably implement both.

@mraleph
Copy link
Member

mraleph commented Dec 16, 2022

Some additional interesting questions here is whether we want to allow unwrapping of external typed data even on non-leaf calls and whether we want to allow long term sharing of typed data backing buffer with native code.

When embedder creates external typed data through Dart_NewExternalTypedDataWithFinalizer they essentially pass ownership of the buffer to Dart - so it should be feasible to create an API that can facilitate sharing of the ownership with native.

One possibility here could be to allow unwrapping external typed data into a reference counted slice, like so:

typedef struct _Dart_SliceVtbl {
  void (*acquire)(struct _Dart_Slice* slice);
  void (*release)(struct _Dart_Slice* slice);
} Dart_SliceVtbl;

typedef struct _Dart_Slice {
  Dart_SliceVtbl* vtbl;
  void* peer;
  void* buffer;
  size_t size;
} Dart_Slice;

I have not entirely decided whether it should be passed by value or by pointer to the native side.

If the native code wants to keep the slice around it can do slice.vtbl->acquire(&slice) and the slice.buffer and .peer will remain valid even after the FFI call returns back to Dart. Later when native side is done it can call slice.vtbl->release(&slice). At this point buffer might be released if there are no other references to it.

@haberman
Copy link

haberman commented Jun 2, 2023

This would greatly accelerate protobuf parsing using https://github.com/protocolbuffers/upb. Profiles currently show ~40% of the overall parsing cost is just copying the data into an FFI buffer.

A memory copy function a la #43968 would also help greatly, but avoiding the malloc()+free()+copy() will be ideal in cases where the input buffer is small enough to be parsed as a leaf function.

@dcharkes
Copy link
Contributor Author

Some additional interesting questions here is whether we want to allow unwrapping of external typed data even on non-leaf calls and whether we want to allow long term sharing of typed data backing buffer with native code.

@mraleph If we need to do some kind of memory management for non-leaf functions and we're not passing the actual buffer, then that means we can't share the API between leaf and non-leaf functions.

If we can't share the API between leaf and non-leaf functions, we should go and implement at the leaf functions (and not have it blocked on an API design for non-leaf functions).

@dcharkes
Copy link
Contributor Author

This would also be useful for ICU4X https://github.com/unicode-org/icu4x/blob/1c0b95ef95d869841fec45ed0e7b107b47c8afad/ffi/capi/dart/package/lib/src/GraphemeClusterSegmenter.g.dart#L69-L88

The code would then look as follows:

  GraphemeClusterBreakIteratorUtf16 segmentUtf16(Uint16List input) {
    final alloc = ffi2.Arena();

    final result = _ICU4XGraphemeClusterSegmenter_segment_utf16(
        _underlying, input, input._length);
    alloc.releaseAll();
    return GraphemeClusterBreakIteratorUtf16._(result);
  }

  // ignore: non_constant_identifier_names
  static final _ICU4XGraphemeClusterSegmenter_segment_utf16 = _capi<
          ffi.NativeFunction<
              ffi.Pointer<ffi.Opaque> Function(
                  ffi.Pointer<ffi.Opaque>,
                  ffi.Pointer<ffi.Uint16>,
                  ffi.Size)>>('ICU4XGraphemeClusterSegmenter_segment_utf16')
      .asFunction<
          ffi.Pointer<ffi.Opaque> Function(ffi.Pointer<ffi.Opaque>,
              Uint16List, int)>(isLeaf: true);

Thanks @robertbastian for reporting!

Because the length would be in elements (not in bytes), it would make more sense to go for Uint8List and friends rather than ByteBuffer.

Some APIs use slices to pass strings or raw bytes

Hm, just passing a struct by value doesn't work, because while populating the struct Dart might do a GC. So the unwrapping can only happen in the FfiCall itself. A workaround would be to wrap the native code, which should become much easier with the native assets feature. @blaugold Any specific use case in mind here? Would wrapping the native code work?

@blaugold
Copy link
Contributor

For my specific case, I have to pass around this memory slice struct to make use of the API of Fleece. Currently, I'm allocating native memory and copy to it from Dart before passing it to Fleece, but with this issue resolved that would not be necessary anymore. 💪 I already need to use some native code to glue things together, so wrapping would work without much pain and be even easier once native assets become stable. 🔜🤞🤓

copybara-service bot pushed a commit that referenced this issue Dec 12, 2023
Fixes issues reported by:

TEST=pkg/analyzer/test/verify_diagnostics_test.dart

* Fixes error messages on asFunction.
* Fixes validation by introducing Uint8List to mock sdk.
* Removes unused typed_data import in example snippet.
* Fixes error locations in example snippets

Bug: #44589
Change-Id: I82261eea5e55cbafa686865a92131a449a7642f5
Cq-Include-Trybots: luci.dart.try:pkg-linux-debug-try,pkg-linux-release-try,pkg-mac-release-try
Reviewed-on: https://dart-review.googlesource.com/c/sdk/+/340921
Commit-Queue: Daco Harkes <dacoharkes@google.com>
Reviewed-by: Martin Kustermann <kustermann@google.com>
copybara-service bot pushed a commit that referenced this issue Jan 31, 2024
This reverts: https://dart-review.googlesource.com/c/sdk/+/338620

We'd like to support this use case with a different API. See the
discussion in #54739.

TEST=tests/ffi

Bug: #44589
Bug: #54771
Change-Id: Ic22fbcab14d374bb9c81bba1f1bf6ae2dfc9e674
Cq-Include-Trybots: luci.dart.try:vm-aot-android-release-arm64c-try,vm-aot-android-release-arm_x64-try,vm-aot-linux-debug-x64-try,vm-aot-linux-debug-x64c-try,vm-aot-mac-release-arm64-try,vm-aot-mac-release-x64-try,vm-aot-obfuscate-linux-release-x64-try,vm-aot-optimization-level-linux-release-x64-try,vm-aot-win-debug-arm64-try,vm-aot-win-debug-x64c-try,vm-aot-win-release-x64-try,vm-appjit-linux-debug-x64-try,vm-asan-linux-release-x64-try,vm-checked-mac-release-arm64-try,vm-eager-optimization-linux-release-ia32-try,vm-eager-optimization-linux-release-x64-try,vm-ffi-android-debug-arm-try,vm-ffi-android-debug-arm64c-try,vm-ffi-qemu-linux-release-arm-try,vm-ffi-qemu-linux-release-riscv64-try,vm-fuchsia-release-x64-try,vm-kernel-linux-debug-x64-try,vm-kernel-precomp-linux-release-x64-try,vm-linux-debug-ia32-try,vm-linux-debug-x64-try,vm-linux-debug-x64c-try,vm-mac-debug-arm64-try,vm-mac-debug-x64-try,vm-msan-linux-release-x64-try,vm-reload-linux-debug-x64-try,vm-reload-rollback-linux-debug-x64-try,vm-ubsan-linux-release-x64-try,vm-win-debug-arm64-try,vm-win-debug-x64-try,vm-win-debug-x64c-try,vm-win-release-ia32-try
Reviewed-on: https://dart-review.googlesource.com/c/sdk/+/349340
Commit-Queue: Daco Harkes <dacoharkes@google.com>
Reviewed-by: Martin Kustermann <kustermann@google.com>
@dcharkes dcharkes reopened this Jan 31, 2024
@dcharkes
Copy link
Contributor Author

We want to support this use case with a different API, see the discussions in #54739.

copybara-service bot pushed a commit that referenced this issue Feb 2, 2024
Refactor to avoid conflicts later with concurrent work on:
- #45508
- #44589

Change-Id: I7b7ea2e4ec29115da42b0c196a2952c3cd5d3fa6
Reviewed-on: https://dart-review.googlesource.com/c/sdk/+/349901
Commit-Queue: Martin Kustermann <kustermann@google.com>
Auto-Submit: Daco Harkes <dacoharkes@google.com>
Reviewed-by: Martin Kustermann <kustermann@google.com>
copybara-service bot pushed a commit that referenced this issue Feb 9, 2024
For getting the address of `Struct` fields and `Array` elements
it would be useful to access the `_typedDataBase` field in these
'wrappers' in a unified way.

This CL makes `Array` extend `_Compound`.
Since `Array` is not a `SizedNativeType`, the implements clauses
for `Struct` and `Union` are moved to these classes.

Moreover, any references to 'compound' which only apply to struct or
union are updated.

TEST=test/ffi

Bug: #44589
Bug: #54739
Bug: #41237

CoreLibraryReviewExempt: No API change, only refactoring.
Change-Id: Ib9d8bccd4872df04bcc67731e4052f826ab70af4
Cq-Include-Trybots: luci.dart.try:vm-aot-android-release-arm64c-try,vm-aot-android-release-arm_x64-try,vm-aot-linux-debug-x64-try,vm-aot-linux-debug-x64c-try,vm-aot-mac-release-arm64-try,vm-aot-mac-release-x64-try,vm-aot-obfuscate-linux-release-x64-try,vm-aot-optimization-level-linux-release-x64-try,vm-aot-win-debug-arm64-try,vm-aot-win-debug-x64-try,vm-appjit-linux-debug-x64-try,vm-asan-linux-release-x64-try,vm-checked-mac-release-arm64-try,vm-eager-optimization-linux-release-ia32-try,vm-eager-optimization-linux-release-x64-try,vm-ffi-android-debug-arm-try,vm-ffi-android-debug-arm64c-try,vm-ffi-qemu-linux-release-arm-try,vm-ffi-qemu-linux-release-riscv64-try,vm-fuchsia-release-x64-try,vm-kernel-linux-debug-x64-try,vm-kernel-precomp-linux-release-x64-try,vm-linux-debug-ia32-try,vm-linux-debug-x64-try,vm-linux-debug-x64c-try,vm-mac-debug-arm64-try,vm-mac-debug-x64-try,vm-msan-linux-release-x64-try,vm-reload-linux-debug-x64-try,vm-reload-rollback-linux-debug-x64-try,vm-ubsan-linux-release-x64-try,vm-win-debug-arm64-try,vm-win-debug-x64-try,vm-win-release-ia32-try
Reviewed-on: https://dart-review.googlesource.com/c/sdk/+/350960
Reviewed-by: Martin Kustermann <kustermann@google.com>
Reviewed-by: Liam Appelbe <liama@google.com>
copybara-service bot pushed a commit that referenced this issue Mar 6, 2024
And use these new getters inside field reads/writes.
This outlines this currently inlined code into calls to a helper
methods. This may make the kernel a little smaller and easier to
read. Also, it removes the need to compute the (now outlined)
expression on arbitrary call sites in the FFI transformer

TEST=test/ffi
TEST=all the expect files

Bug: #44589
Bug: #41237
Change-Id: I53d69778ee50186944229550f89f10c22452e13f
Cq-Include-Trybots: luci.dart.try:vm-aot-android-release-arm64c-try,vm-aot-android-release-arm_x64-try,vm-aot-linux-debug-x64-try,vm-aot-linux-debug-x64c-try,vm-aot-mac-release-arm64-try,vm-aot-mac-release-x64-try,vm-aot-obfuscate-linux-release-x64-try,vm-aot-optimization-level-linux-release-x64-try,vm-appjit-linux-debug-x64-try,vm-asan-linux-release-x64-try,vm-checked-mac-release-arm64-try,vm-eager-optimization-linux-release-ia32-try,vm-eager-optimization-linux-release-x64-try,vm-ffi-android-debug-arm-try,vm-ffi-android-debug-arm64c-try,vm-ffi-qemu-linux-release-arm-try,vm-ffi-qemu-linux-release-riscv64-try,vm-fuchsia-release-x64-try,vm-kernel-linux-debug-x64-try,vm-kernel-precomp-linux-release-x64-try,vm-linux-debug-ia32-try,vm-linux-debug-x64-try,vm-linux-debug-x64c-try,vm-mac-debug-arm64-try,vm-mac-debug-x64-try,vm-msan-linux-release-x64-try,vm-reload-linux-debug-x64-try,vm-reload-rollback-linux-debug-x64-try,vm-ubsan-linux-release-x64-try
Reviewed-on: https://dart-review.googlesource.com/c/sdk/+/353261
Commit-Queue: Daco Harkes <dacoharkes@google.com>
Reviewed-by: Martin Kustermann <kustermann@google.com>
copybara-service bot pushed a commit that referenced this issue Mar 26, 2024
This CL changes compounds (structs, unions, and arrays) to be backed
by a TypedDataBase and an int offset.

Before this CL, the compounds where only backed by a TypedDataBase.
This leads to the following issues:

1. Access to nested structs required code for allocating new typed data
   views or pointers, which the optimizer then had to prevent from
   being allocated after inlining.
2. Runtime branching on whether the TypedDataBase was a Pointer or
   TypedData increased code size and prevented inlining.
   #54892
   This could not be properly optimized if in AOT both typed
   data and pointer were flowing into the same compound.
3. Constructing TypedData views required calculating the length of the
   view.

After this CL, accessing nested compounds will lead to accesses on the
original TypedDataBase with an extra offset.

This removes the polymorphism on TypedData vs Pointer, because the
final int/float/Pointer accesses in nested compounds operate on
TypedDataBase.

Also, it simplifies creating an `offsetBy` accessor, because it will
no longer have to be polymorphic in typed data vs pointer, nor will it
have to calculate the length of the field.

Implementation details:

* The changes in the CFE and patch files are straightforward.
* VM: Struct-by-value returns (or callback params) are initialized
  with an offsetInBytes of 0.
* VM: Struct-by-value arguments (and callback return) need to read out
  the offsetInBytes. Before this CL we were passing in the TypedData
  as tagged value. With this CL we are passing the TypedData as tagged
  value and the offset as unboxed int, from 1 IL input to 2 IL
  inputs. (The alternative would have been to take the compound as
  a tagged value, but that would have prevented optimizations from not
  allocating the compound object in the optimizer.
  The FfiCallInstr is updated to have two definitions for the case
  where we were passing in the TypedData previously.
  The NativeReturnInstr is refactored to be able to take two inputs
  instead of 1. (Note that we don't have VariadicInstr only
  VariadicDefinition in the code base. So the instruction is _not_
  implemented as variadic, rather as having a fixed length of 2.)
* dart2wasm does no longer support nested compounds due to the
  compound implementation only storing a pointer address.
  #55083

Intending to land this after
https://dart-review.googlesource.com/c/sdk/+/353101.

TEST=test/ffi

CoreLibraryReviewExempt: VM and WASM-only implementation change.
Closes: #54892
Bug: #44589
Change-Id: I8749e21094bf8fa2d5ff1e48b6b002c375232eb5
Cq-Include-Trybots: dart-internal/g3.dart-internal.try:g3-cbuild-try
Cq-Include-Trybots: dart/try:vm-aot-android-release-arm64c-try,vm-aot-android-release-arm_x64-try,vm-aot-linux-debug-x64-try,vm-aot-linux-debug-x64c-try,vm-aot-mac-release-arm64-try,vm-aot-mac-release-x64-try,vm-aot-obfuscate-linux-release-x64-try,vm-aot-optimization-level-linux-release-x64-try,vm-aot-win-debug-arm64-try,vm-aot-win-debug-x64-try,vm-aot-win-debug-x64c-try,vm-appjit-linux-debug-x64-try,vm-asan-linux-release-x64-try,vm-checked-mac-release-arm64-try,vm-eager-optimization-linux-release-ia32-try,vm-eager-optimization-linux-release-x64-try,vm-ffi-android-debug-arm-try,vm-ffi-android-debug-arm64c-try,vm-ffi-qemu-linux-release-arm-try,vm-ffi-qemu-linux-release-riscv64-try,vm-fuchsia-release-x64-try,vm-linux-debug-ia32-try,vm-linux-debug-x64-try,vm-linux-debug-x64c-try,vm-mac-debug-arm64-try,vm-mac-debug-x64-try,vm-msan-linux-release-x64-try,vm-reload-linux-debug-x64-try,vm-reload-rollback-linux-debug-x64-try,vm-ubsan-linux-release-x64-try,vm-win-debug-arm64-try,vm-win-debug-x64-try,vm-win-debug-x64c-try,vm-win-release-ia32-try
Reviewed-on: https://dart-review.googlesource.com/c/sdk/+/354226
Reviewed-by: Tess Strickland <sstrickl@google.com>
@dcharkes
Copy link
Contributor Author

dcharkes commented Apr 3, 2024

The next question is whether we should do out of bounds checks:

void testAddressOfInt8ListElementAtOutOfBounds() {
  const length = 20;
  const dontCare = -1;
  final typedData = makeInt8List(length);
  Expect.throws(() {
    takeInt8Pointer(typedData[30].address, dontCare);
  });
  Expect.throws(() {
    takeInt8Pointer(typedData[-10].address, dontCare);
  });
}

@Native<Int8 Function(Pointer<Int8>, Size)>(
    symbol: 'TakeInt8Pointer', isLeaf: true)
external int takeInt8Pointer(Pointer<Int8> pointer, int length);

Int8List makeInt8List(int length) {
  final typedData = Int8List(length);
  for (int i = 0; i < length; i++) {
    final value = i % 2 == 0 ? i : -i;
    typedData[i] = value;
  }
  return typedData;
}

Both for TypedData and Array (#54739).

Pros:

  • More safety, these are Dart objects and we know the bounds.

Cons:

  • Slower & increased code size.
  • Fake safety, if a Pointer is passed, we're likely iterating over it but we have no way to check whether the length argument in the example is within bounds.

@greenrobot
Copy link

Safe by default, unsafe if the API user indicates they know what they are doing (e.g. special function)? Essentially the reserve of what C++ is doing; which is usually a good guideline. 🙃

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area-vm Use area-vm for VM related issues, including code coverage, and the AOT and JIT backends. library-ffi type-performance Issue relates to performance or code size
Projects
None yet
Development

No branches or pull requests

5 participants