[cdac] Implement NibbleMap lookup and tests #108403

Merged
merged 10 commits into from
Oct 8, 2024
69 changes: 69 additions & 0 deletions docs/design/datacontracts/ExecutionManager.md
@@ -0,0 +1,69 @@
# Contract ExecutionManager

This contract is for mapping a PC address to information about the
managed method corresponding to that address.


## APIs of contract

**TODO**

## Version 1

**TODO** Methods

### NibbleMap

Version 1 of this contract depends on a "nibble map" data structure
that maps a code address in a contiguous subsection of
the address space to a pointer to the start of the code sequence containing it.
It takes advantage of the fact that code starts are aligned and
spaced apart, so each start address can be represented as a 4-bit nibble value.

Given a contiguous region of memory in which we lay out a collection of non-overlapping code blocks that are
not too small (so that two adjacent ones aren't too close together) and where the start of each code block is preceded by a code header aligned on some power of 2,
we can break up the whole memory space into buckets of a fixed size (32 bytes in the current implementation), where
each bucket either has a code block header or not.
Thinking of each code block header address as a hex number, we can view it as: `[index, offset, zeros]`
where each index gives us a bucket and the offset gives us the position of the header within the bucket.
We encode each offset into a 4-bit nibble, reserving the special value 0 to mark the places in the map where a method doesn't start.
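
The encoding described above can be sketched as follows. This is a minimal illustration, not the cDAC implementation: the type and member names are hypothetical, and it assumes the 32-byte buckets and 4-byte header alignment that this document uses in its examples.

```csharp
using System;

// Illustrative sketch only; the type and member names here are hypothetical.
public static class NibbleMapEncodingSketch
{
    private const ulong BytesPerBucket = 32; // bucket size given in the text
    private const ulong CodeAlignment = 4;   // assumed alignment of code block headers

    // Split an address (relative to the start of the mapped region) into the
    // bucket index and the non-zero nibble value stored for that bucket.
    public static (ulong MapIndex, uint Nibble) Encode(ulong relativeAddress)
    {
        if (relativeAddress % CodeAlignment != 0)
            throw new ArgumentException("Code header addresses must be aligned.", nameof(relativeAddress));

        ulong mapIndex = relativeAddress / BytesPerBucket;
        ulong offsetInBucket = relativeAddress % BytesPerBucket;
        // Add 1 so that 0 stays reserved for "no method starts in this bucket".
        uint nibble = (uint)(offsetInBucket / CodeAlignment) + 1;
        return (mapIndex, nibble);
    }

    // Recover the relative address from a bucket index and a non-zero nibble value.
    public static ulong Decode(ulong mapIndex, uint nibble)
    {
        if (nibble == 0)
            throw new ArgumentException("A zero nibble marks a bucket with no method start.", nameof(nibble));

        return mapIndex * BytesPerBucket + (nibble - 1) * CodeAlignment;
    }
}
```

With these assumptions, nibble values range from 1 to 8, so they always fit in 4 bits, and `Decode` is the exact inverse of `Encode` for aligned addresses.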

To find the start of a method given an address we first convert it into a bucket index (giving the map unit)
and an offset which we can then turn into the index of the nibble that covers that address.
If the nibble is non-zero, we have the start of a method and it is near the given address.
If the nibble is zero, we have to search backward first through the current map unit, and then through previous map
units until we find a non-zero nibble.
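
Locating the nibble for a map index might look like the sketch below. It assumes, as the worked example in this document does, 32-bit map units that each hold eight nibbles numbered from the most significant end. The names are hypothetical, and the map is modeled as a plain `uint[]` rather than memory read from a target.

```csharp
using System;

// Illustrative sketch only; names are hypothetical and this is not the cDAC implementation.
public static class NibbleMapUnitSketch
{
    private const ulong NibblesPerMapUnit = 8; // a 32-bit map unit holds eight 4-bit nibbles
    private const ulong BitsPerNibble = 4;

    // Shift that brings the nibble for mapIndex down to the low 4 bits of its map unit.
    // Nibbles are numbered from the most significant end, so the first nibble has shift 28.
    public static int NibbleShift(ulong mapIndex) =>
        (int)((NibblesPerMapUnit - 1 - mapIndex % NibblesPerMapUnit) * BitsPerNibble);

    // Read the 4-bit value stored for the given map index.
    public static uint ReadNibble(uint[] map, ulong mapIndex) =>
        (map[mapIndex / NibblesPerMapUnit] >> NibbleShift(mapIndex)) & 0xF;

    // Overwrite the 4-bit value stored for the given map index.
    public static void WriteNibble(uint[] map, ulong mapIndex, uint nibble)
    {
        if (nibble > 0xF)
            throw new ArgumentOutOfRangeException(nameof(nibble));

        int shift = NibbleShift(mapIndex);
        ulong unit = mapIndex / NibblesPerMapUnit;
        map[unit] = (map[unit] & ~(0xFu << shift)) | (nibble << shift);
    }
}
```

For instance, map index 9 falls in the second map unit (9 / 8 = 1) and is the second nibble of that unit (9 % 8 = 1), which gives a shift of 24.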

For example (all code addresses are relative to some unspecified base):

Suppose there is code starting at address 304 (0x130)

* Then the map index will be 304 / 32 = 9 and the byte offset will be 304 % 32 = 16
* Because addresses are 4-byte aligned, the nibble value will be 1 + 16 / 4 = 5 (we reserve 0 to mean no method).
* So the map unit containing index 9 will contain the value 0x5 << 24, or 0x05000000 (each 32-bit map unit holds 8 nibbles, so map index 9 means we want the second nibble in the second map unit, and we number the nibbles starting from the most significant)


Now suppose we do a lookup for address 306 (0x132)

* The map index will be 306 / 32 = 9 and the byte offset will be 306 % 32 = 18
* The nibble value will be 1 + 18 / 4 = 5 (using integer division)
* To do the lookup, we will load the map unit containing map index 9 (the second 32-bit unit in the map) and get the value 0x05000000
* We will then shift to focus on the nibble with map index 9 (which again has nibble shift 24), so
the map unit will be 0x00000005 and we will get the nibble value 5.
* Therefore we know that there is a method start at map index 9, nibble value 5.
* The map index corresponds to an offset of 9 * 32 = 288 bytes and the nibble value 5 corresponds to an offset of (5 - 1) * 4 = 16 bytes
* So the method starts at offset 288 + 16 = 304, which is the start of the method containing the address we were looking for.

Now suppose we do a lookup for address 302 (0x12E)

* The map index will be 302 / 32 = 9 and the byte offset will be 302 % 32 = 14
* The nibble value will be 1 + 14 / 4 = 4 (using integer division)
* To do the lookup, we will load the map unit containing map index 9 and get the value 0x05000000
* We will then shift to focus on the nibble with map index 9 (which again has nibble shift 24), so we will get
the nibble value 5.
* Therefore we know that there is a method start at map index 9, nibble value 5.
* But the address we're looking for is map index 9, nibble value 4.
* The method start in this bucket comes after the address, and we know that methods can't start within 32 bytes of each other, so the method we're looking for does not start in the current nibble.
* We will then try to shift to the previous nibble in the map unit (0x00000005 >> 4 = 0x00000000)
* Therefore we know there is no method start at any earlier map index in the current map unit.
* We will then align the map index to the start of the current map unit (map index 8) and move back to the last nibble of the previous map unit (map index 7)
* At that point, we scan backwards for a non-zero map unit and a non-zero nibble within the first non-zero map unit. Since there are none, we return null.
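
The lookups walked through above can be reproduced with a simplified sketch. The names are hypothetical and the map is modeled as a `uint[]`. Unlike the description above, this version checks one nibble at a time instead of skipping whole zero map units, which gives the same result with less bookkeeping. It is not the cDAC or runtime implementation.

```csharp
using System;

// Illustrative sketch only; names are hypothetical and this is a simplified
// nibble-by-nibble scan, not the cDAC or runtime implementation.
public static class NibbleMapLookupSketch
{
    private const ulong BytesPerBucket = 32;
    private const ulong CodeAlignment = 4;
    private const ulong NibblesPerMapUnit = 8;

    // Read the 4-bit value for a map index; nibbles are numbered from the most significant end.
    private static uint ReadNibble(uint[] map, ulong mapIndex)
    {
        int shift = (int)((NibblesPerMapUnit - 1 - mapIndex % NibblesPerMapUnit) * 4);
        return (map[mapIndex / NibblesPerMapUnit] >> shift) & 0xF;
    }

    private static ulong StartAddress(ulong mapIndex, uint nibble) =>
        mapIndex * BytesPerBucket + (nibble - 1) * CodeAlignment;

    // Returns the relative address of the closest method start at or before
    // relativeAddress, or null if the map records none.
    // Assumes relativeAddress lies within the region covered by map.
    public static ulong? FindMethodStart(uint[] map, ulong relativeAddress)
    {
        ulong mapIndex = relativeAddress / BytesPerBucket;
        uint addressNibble = (uint)(relativeAddress % BytesPerBucket / CodeAlignment) + 1;

        // The nibble covering the address counts only if the start is not after the address.
        uint nibble = ReadNibble(map, mapIndex);
        if (nibble != 0 && nibble <= addressNibble)
            return StartAddress(mapIndex, nibble);

        // Otherwise walk backward to the closest earlier non-zero nibble.
        while (mapIndex > 0)
        {
            mapIndex--;
            nibble = ReadNibble(map, mapIndex);
            if (nibble != 0)
                return StartAddress(mapIndex, nibble);
        }

        return null;
    }
}
```

With a two-unit map whose second unit is 0x05000000, `FindMethodStart(map, 306)` returns 304 and `FindMethodStart(map, 302)` returns null, matching the two walkthroughs.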
@@ -20,6 +20,7 @@ public enum DataType
pointer,

GCHandle,
CodePointer,
Thread,
ThreadStore,
GCAllocContext,
@@ -41,6 +41,13 @@ internal abstract class Target
/// <returns>Pointer read from the target</returns>
public abstract TargetPointer ReadPointer(ulong address);

/// <summary>
/// Read a code pointer from the target in target endianness
/// </summary>
/// <param name="address">Address to start reading from</param>
/// <returns>Code pointer read from the target</returns>
public abstract TargetCodePointer ReadCodePointer(ulong address);

/// <summary>
/// Read some bytes from the target
/// </summary>
@@ -0,0 +1,30 @@
// Licensed to the .NET Foundation under one or more agreements.
// The .NET Foundation licenses this file to you under the MIT license.
using System;

namespace Microsoft.Diagnostics.DataContractReader;

public readonly struct TargetCodePointer : IEquatable<TargetCodePointer>
{
public static readonly TargetCodePointer Null = new(0);
public readonly ulong Value;
public TargetCodePointer(ulong value) => Value = value;

public static implicit operator ulong(TargetCodePointer p) => p.Value;
public static implicit operator TargetCodePointer(ulong v) => new TargetCodePointer(v);

public static bool operator ==(TargetCodePointer left, TargetCodePointer right) => left.Value == right.Value;
public static bool operator !=(TargetCodePointer left, TargetCodePointer right) => left.Value != right.Value;

public override bool Equals(object? obj) => obj is TargetCodePointer pointer && Equals(pointer);
public bool Equals(TargetCodePointer other) => Value == other.Value;

public override int GetHashCode() => Value.GetHashCode();

public bool Equals(TargetCodePointer x, TargetCodePointer y) => x.Value == y.Value;
public int GetHashCode(TargetCodePointer obj) => obj.Value.GetHashCode();

public TargetPointer AsTargetPointer => new(Value);
Member

On ARM32 platforms should this strip off the thumb bit? For reference, on ARM32 Thumb2 targets, the lowest bit is typically set on a code address to indicate that the pointer refers to code using the Thumb2 instruction set instead of the ARM instruction set.

I see this as a potential problem around the conversion to ulong here, as well as the AsTargetPointer API.

Member Author

yea that's a good idea. Elsewhere (in the PrecodeStubs contract) I have an explicit helper that strips off the thumb bit:

    internal TargetPointer CodePointerReadableInstrPointer(TargetCodePointer codePointer)
    {
        // Mask off the thumb bit, if we're on arm32, to get the actual instruction pointer
        ulong instrPointer = (ulong)codePointer.AsTargetPointer & MachineDescriptor.CodePointerToInstrPointerMask.Value;
        return new TargetPointer(instrPointer);
    }

I couldn't decide if that's something we want on the TargetCodePointer or on the Target (or on a contract, as I've prototyped it so far).

I think on TargetCodePointer makes the most sense, but then I'll need to store the mask in the code pointer instance at creation time (or make the conversion to a TargetPointer depend on the current target), and I wasn't sure about the usability of that approach.


public override string ToString() => $"0x{Value:x}";
}
@@ -1,11 +1,16 @@
// Licensed to the .NET Foundation under one or more agreements.
// The .NET Foundation licenses this file to you under the MIT license.
using System;
using System.Diagnostics;

namespace Microsoft.Diagnostics.DataContractReader;


[DebuggerDisplay("{Hex}")]
public readonly struct TargetNUInt
{
public readonly ulong Value;
public TargetNUInt(ulong value) => Value = value;

internal string Hex => $"0x{Value:x}";
}
@@ -23,4 +23,6 @@ namespace Microsoft.Diagnostics.DataContractReader;
public bool Equals(TargetPointer other) => Value == other.Value;

public override int GetHashCode() => Value.GetHashCode();

public override string ToString() => $"0x{Value:x}";
}