Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Align inner loops #44370

Merged
merged 59 commits into from
Jan 12, 2021
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
Show all changes
59 commits
Select commit Hold shift + click to select a range
32236ac
Detect inner loop and add 10 bytes of padding at the beginning
kunalspathak Oct 8, 2020
a2ec5d1
generate nop in previous blocks
kunalspathak Oct 9, 2020
89b5812
TODO: figure out if anything needs to be done in optCanonicalizeLoop
kunalspathak Oct 9, 2020
a84b8be
Add COMPlus_JitAlignLoopMinBlockWeight and COMPlus_JitAlignLoopMaxCod…
kunalspathak Oct 9, 2020
a51759a
Reuse AlignLoops flag for dynamic loop alignment
kunalspathak Oct 10, 2020
2f57c78
Detect back edge and count no. of instructions before doing loop alig…
kunalspathak Oct 14, 2020
b4848f8
fix bugs
kunalspathak Oct 15, 2020
fe92c62
propagate the basic block flag
kunalspathak Oct 15, 2020
51c3171
Switch from instrCount to codeSize
kunalspathak Oct 16, 2020
0b227a8
JitAlignLoopWith32BPadding
kunalspathak Oct 16, 2020
a9c9764
Add emitLoopAlign32Bytes()
kunalspathak Oct 23, 2020
7f1f787
wip
kunalspathak Oct 23, 2020
aac18a4
Add logic to avoid emitting nop if not needed
kunalspathak Oct 28, 2020
5c3c40e
fix a condition
kunalspathak Oct 28, 2020
fb91d41
Several things:
kunalspathak Oct 30, 2020
0b8eb78
Added JitAlignLoopAdaptive algorithm
kunalspathak Nov 5, 2020
072a113
wip
kunalspathak Nov 5, 2020
7e1328a
revert emitarm64.cpp changes
kunalspathak Nov 5, 2020
d49e84d
fix errors during merge
kunalspathak Nov 5, 2020
f6b5135
fix build errors
kunalspathak Nov 5, 2020
256aae5
refactoring and cleanup
kunalspathak Nov 6, 2020
26c3c84
refactoring and build errors fix
kunalspathak Nov 7, 2020
6bb1a75
jit format
kunalspathak Nov 7, 2020
03f6ba6
one more build error
kunalspathak Nov 7, 2020
8036061
Add emitLoopAlignAdjustments()
kunalspathak Nov 11, 2020
809fc85
Update emitLoopAlignAdjustments to just include loopSize calc
kunalspathak Nov 11, 2020
5b9b7a0
Remove #ifdef ADAPTIVE_LOOP_ALIGNMENT
kunalspathak Nov 11, 2020
5584d71
Code cleanup
kunalspathak Nov 12, 2020
f66ee24
minor fixes
kunalspathak Nov 12, 2020
f2f935d
Fix issues:
kunalspathak Nov 12, 2020
99ea31f
Other fixes
kunalspathak Nov 12, 2020
8f64963
Remove align_loops flag from coreclr
kunalspathak Dec 4, 2020
1c85c3c
Review feedback
kunalspathak Dec 4, 2020
ef0b149
jit format
kunalspathak Dec 4, 2020
599ad43
Add FEATURE_LOOP_ALIGN
kunalspathak Dec 5, 2020
97fd373
remove special case for align
kunalspathak Dec 9, 2020
37b0cdb
Do not propagate BBF_LOOP_ALIGN in certain cases
kunalspathak Dec 14, 2020
c0cc8af
Introduce instrDescAlign and emitLastAlignedIgNum
kunalspathak Dec 15, 2020
26f7e61
Several changes:
kunalspathak Dec 15, 2020
305b812
jit format
kunalspathak Dec 16, 2020
f8bdfec
fix issue related to needLabel
kunalspathak Dec 16, 2020
a205cc0
align memory correctly in superpmi
kunalspathak Dec 17, 2020
d576e9c
Few more fixes:
kunalspathak Dec 17, 2020
28480d1
minor JITDUMP messages
kunalspathak Dec 18, 2020
bf03842
Review comments
kunalspathak Dec 18, 2020
dae9749
missing check
kunalspathak Dec 18, 2020
a153742
Mark the last align IG the one that has non-zero padding
kunalspathak Dec 18, 2020
ef02fbb
More review comments
kunalspathak Dec 19, 2020
e7e0d68
Propagate BBF_LOOP_ALIGN for compacting blocks
kunalspathak Dec 19, 2020
4b0e64d
Handle ALIGN_LOOP flag for loops that are unrolled
kunalspathak Dec 20, 2020
6fddce8
jit format
kunalspathak Dec 20, 2020
bad5685
Loop size upto last back-edge instead of first back-edge
kunalspathak Dec 21, 2020
8c30a96
Take loop weight in consideration
kunalspathak Dec 21, 2020
f32f560
remove align flag if loop is no longer valid
kunalspathak Jan 4, 2021
23177b0
Adjust loop block weight to 4 instead of 8
kunalspathak Jan 6, 2021
9f3cb2d
missing space after rebase
kunalspathak Jan 6, 2021
74620ed
fix the enum values after rebase
kunalspathak Jan 7, 2021
b100226
review feedback
kunalspathak Jan 9, 2021
dbeb7d6
Add missing #ifdef DEBUG
kunalspathak Jan 12, 2021
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
16 changes: 15 additions & 1 deletion src/coreclr/ToolBox/superpmi/superpmi/icorjitinfo.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -1609,7 +1609,21 @@ void MyICJI::allocMem(ULONG hotCodeSize, /* IN */
jitInstance->mc->cr->AddCall("allocMem");

// TODO-Cleanup: Could hot block size be ever 0?
*hotCodeBlock = jitInstance->mc->cr->allocateMemory(hotCodeSize);
size_t codeAlignment = sizeof(void*);
kunalspathak marked this conversation as resolved.
Show resolved Hide resolved
size_t hotCodeAlignedSize = static_cast<size_t>(hotCodeSize);

if ((flag & CORJIT_ALLOCMEM_FLG_32BYTE_ALIGN) != 0)
{
codeAlignment = 32;
}
else if ((flag & CORJIT_ALLOCMEM_FLG_16BYTE_ALIGN) != 0)
{
codeAlignment = 16;
}
hotCodeAlignedSize = ALIGN_UP_SPMI(hotCodeAlignedSize, codeAlignment);
hotCodeAlignedSize = hotCodeAlignedSize + (codeAlignment - sizeof(void*));
*hotCodeBlock = jitInstance->mc->cr->allocateMemory(hotCodeAlignedSize);
*hotCodeBlock = ALIGN_UP_SPMI(*hotCodeBlock, codeAlignment);

if (coldCodeSize > 0)
*coldCodeBlock = jitInstance->mc->cr->allocateMemory(coldCodeSize);
Expand Down
1 change: 0 additions & 1 deletion src/coreclr/inc/clrconfigvalues.h
Original file line number Diff line number Diff line change
Expand Up @@ -302,7 +302,6 @@ RETAIL_CONFIG_DWORD_INFO_EX(EXTERNAL_UseIBCFile, W("UseIBCFile"), 0, "", CLRConf
///
/// JIT
///
RETAIL_CONFIG_DWORD_INFO_DIRECT_ACCESS(UNSUPPORTED_JitAlignLoops, W("JitAlignLoops"), "Aligns loop targets to 8 byte boundaries")
CONFIG_DWORD_INFO_EX(INTERNAL_JitBreakEmit, W("JitBreakEmit"), (DWORD)-1, "", CLRConfig::EEConfig_default)
CONFIG_DWORD_INFO_DIRECT_ACCESS(INTERNAL_JitDebuggable, W("JitDebuggable"), "")
#if !defined(DEBUG) && !defined(_DEBUG)
Expand Down
50 changes: 25 additions & 25 deletions src/coreclr/inc/corjitflags.h
Original file line number Diff line number Diff line change
Expand Up @@ -79,45 +79,45 @@ class CORJIT_FLAGS
CORJIT_FLAG_BBINSTR = 29, // Collect basic block profile information
CORJIT_FLAG_BBOPT = 30, // Optimize method based on profile information
CORJIT_FLAG_FRAMED = 31, // All methods have an EBP frame
CORJIT_FLAG_ALIGN_LOOPS = 32, // add NOPs before loops to align them at 16 byte boundaries
CORJIT_FLAG_UNUSED12 = 32,
CORJIT_FLAG_PUBLISH_SECRET_PARAM = 33, // JIT must place stub secret param into local 0. (used by IL stubs)
kunalspathak marked this conversation as resolved.
Show resolved Hide resolved
CORJIT_FLAG_UNUSED12 = 34,
CORJIT_FLAG_UNUSED13 = 34,
CORJIT_FLAG_SAMPLING_JIT_BACKGROUND = 35, // JIT is being invoked as a result of stack sampling for hot methods in the background
CORJIT_FLAG_USE_PINVOKE_HELPERS = 36, // The JIT should use the PINVOKE_{BEGIN,END} helpers instead of emitting inline transitions
CORJIT_FLAG_REVERSE_PINVOKE = 37, // The JIT should insert REVERSE_PINVOKE_{ENTER,EXIT} helpers into method prolog/epilog
CORJIT_FLAG_UNUSED13 = 38,
CORJIT_FLAG_UNUSED14 = 38,
CORJIT_FLAG_TIER0 = 39, // This is the initial tier for tiered compilation which should generate code as quickly as possible
CORJIT_FLAG_TIER1 = 40, // This is the final tier (for now) for tiered compilation which should generate high quality code

#if defined(TARGET_ARM)
CORJIT_FLAG_RELATIVE_CODE_RELOCS = 41, // JIT should generate PC-relative address computations instead of EE relocation records
#else // !defined(TARGET_ARM)
CORJIT_FLAG_UNUSED14 = 41,
CORJIT_FLAG_UNUSED15 = 41,
#endif // !defined(TARGET_ARM)

CORJIT_FLAG_NO_INLINING = 42, // JIT should not inline any called method into this method

CORJIT_FLAG_UNUSED15 = 43,
CORJIT_FLAG_UNUSED16 = 44,
CORJIT_FLAG_UNUSED17 = 45,
CORJIT_FLAG_UNUSED18 = 46,
CORJIT_FLAG_UNUSED19 = 47,
CORJIT_FLAG_UNUSED20 = 48,
CORJIT_FLAG_UNUSED21 = 49,
CORJIT_FLAG_UNUSED22 = 50,
CORJIT_FLAG_UNUSED23 = 51,
CORJIT_FLAG_UNUSED24 = 52,
CORJIT_FLAG_UNUSED25 = 53,
CORJIT_FLAG_UNUSED26 = 54,
CORJIT_FLAG_UNUSED27 = 55,
CORJIT_FLAG_UNUSED28 = 56,
CORJIT_FLAG_UNUSED29 = 57,
CORJIT_FLAG_UNUSED30 = 58,
CORJIT_FLAG_UNUSED31 = 59,
CORJIT_FLAG_UNUSED32 = 60,
CORJIT_FLAG_UNUSED33 = 61,
CORJIT_FLAG_UNUSED34 = 62,
CORJIT_FLAG_UNUSED35 = 63
CORJIT_FLAG_UNUSED16 = 43,
CORJIT_FLAG_UNUSED17 = 44,
CORJIT_FLAG_UNUSED18 = 45,
CORJIT_FLAG_UNUSED19 = 46,
CORJIT_FLAG_UNUSED20 = 47,
CORJIT_FLAG_UNUSED21 = 48,
CORJIT_FLAG_UNUSED22 = 49,
CORJIT_FLAG_UNUSED23 = 50,
CORJIT_FLAG_UNUSED24 = 51,
CORJIT_FLAG_UNUSED25 = 52,
CORJIT_FLAG_UNUSED26 = 53,
CORJIT_FLAG_UNUSED27 = 54,
CORJIT_FLAG_UNUSED28 = 55,
CORJIT_FLAG_UNUSED29 = 56,
CORJIT_FLAG_UNUSED30 = 57,
CORJIT_FLAG_UNUSED31 = 58,
CORJIT_FLAG_UNUSED32 = 59,
CORJIT_FLAG_UNUSED33 = 60,
CORJIT_FLAG_UNUSED34 = 61,
CORJIT_FLAG_UNUSED35 = 62,
CORJIT_FLAG_UNUSED36 = 63
};

CORJIT_FLAGS()
Expand Down
10 changes: 5 additions & 5 deletions src/coreclr/inc/jiteeversionguid.h
Original file line number Diff line number Diff line change
Expand Up @@ -31,11 +31,11 @@
//
//////////////////////////////////////////////////////////////////////////////////////////////////////////

constexpr GUID JITEEVersionIdentifier = { /* 8e32c24d-62fe-4d78-ae73-eedddb928ee2 */
0x8e32c24d,
0x62fe,
0x4d78,
{0xae, 0x73, 0xee, 0xdd, 0xdb, 0x92, 0x8e, 0xe2}
constexpr GUID JITEEVersionIdentifier = { /* de81f48e-7701-45f2-a91b-1914f88dfd11 */
0xde81f48e,
0x7701,
0x45f2,
{0xa9, 0x1b, 0x19, 0x14, 0xf8, 0x8d, 0xfd, 0x11}
};

//////////////////////////////////////////////////////////////////////////////////////////////////////////
Expand Down
4 changes: 4 additions & 0 deletions src/coreclr/jit/block.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -505,6 +505,10 @@ void BasicBlock::dspFlags()
{
printf("cfe ");
}
if (bbFlags & BBF_LOOP_ALIGN)
{
printf("align ");
}
}

/*****************************************************************************
Expand Down
5 changes: 5 additions & 0 deletions src/coreclr/jit/block.h
Original file line number Diff line number Diff line change
Expand Up @@ -448,6 +448,7 @@ struct BasicBlock : private LIR::Range

#define BBF_PATCHPOINT MAKE_BBFLAG(36) // Block is a patchpoint
#define BBF_HAS_CLASS_PROFILE MAKE_BBFLAG(37) // BB contains a call needing a class profile
#define BBF_LOOP_ALIGN MAKE_BBFLAG(39) // Block is lexically the first block in a loop we intend to align.

// clang-format on

Expand All @@ -463,6 +464,10 @@ struct BasicBlock : private LIR::Range
{
return ((bbFlags & BBF_LOOP_HEAD) != 0);
}
bool isLoopAlign() const
{
return ((bbFlags & BBF_LOOP_ALIGN) != 0);
}

// Flags to update when two blocks are compacted

Expand Down
14 changes: 11 additions & 3 deletions src/coreclr/jit/codegencommon.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -2258,6 +2258,12 @@ void CodeGen::genGenerateMachineCode()

GetEmitter()->emitJumpDistBind();

#if FEATURE_LOOP_ALIGN
/* Perform alignment adjustments */

GetEmitter()->emitLoopAlignAdjustments();
#endif

/* The code is now complete and final; it should not change after this. */
}

Expand Down Expand Up @@ -2345,10 +2351,12 @@ void CodeGen::genEmitMachineCode()
#ifdef DEBUG
if (compiler->opts.disAsm || verbose)
{
printf("\n; Total bytes of code %d, prolog size %d, PerfScore %.2f, instruction count %d (MethodHash=%08x) for "
printf("\n; Total bytes of code %d, prolog size %d, PerfScore %.2f, instruction count %d, allocated bytes for "
"code %d (MethodHash=%08x) for "
"method %s\n",
codeSize, prologSize, compiler->info.compPerfScore, instrCount, compiler->info.compMethodHash(),
compiler->info.compFullName);
codeSize, prologSize, compiler->info.compPerfScore, instrCount,
GetEmitter()->emitTotalHotCodeSize + GetEmitter()->emitTotalColdCodeSize,
compiler->info.compMethodHash(), compiler->info.compFullName);
printf("; ============================================================\n\n");
printf(""); // in our logic this causes a flush
}
Expand Down
61 changes: 49 additions & 12 deletions src/coreclr/jit/codegenlinear.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -311,13 +311,6 @@ void CodeGen::genCodeForBBlist()

genUpdateCurrentFunclet(block);

#ifdef TARGET_XARCH
if (ShouldAlignLoops() && block->bbFlags & BBF_LOOP_HEAD)
{
GetEmitter()->emitLoopAlign();
}
#endif

genLogLabel(block);

// Tell everyone which basic block we're working on
Expand Down Expand Up @@ -356,6 +349,14 @@ void CodeGen::genCodeForBBlist()
needLabel = true;
}

#if FEATURE_LOOP_ALIGN
if (GetEmitter()->emitEndsWithAlignInstr())
{
// we had better be planning on starting a new IG
assert(needLabel);
}
#endif

if (needLabel)
{
// Mark a label and update the current set of live GC refs
Expand Down Expand Up @@ -667,10 +668,6 @@ void CodeGen::genCodeForBBlist()

switch (block->bbJumpKind)
{
case BBJ_ALWAYS:
inst_JMP(EJ_jmp, block->bbJumpDest);
break;

case BBJ_RETURN:
genExitCode(block);
break;
Expand Down Expand Up @@ -741,15 +738,55 @@ void CodeGen::genCodeForBBlist()
#endif // !FEATURE_EH_FUNCLETS

case BBJ_NONE:
case BBJ_COND:
case BBJ_SWITCH:
break;

case BBJ_ALWAYS:
inst_JMP(EJ_jmp, block->bbJumpDest);
FALLTHROUGH;

case BBJ_COND:

#if FEATURE_LOOP_ALIGN
// This is the last place where we operate on blocks and after this, we operate
// on IG. Hence, if we know that the destination of "block" is the first block
// of a loop and needs alignment (it has BBF_LOOP_ALIGN), then "block" represents
// end of the loop. Propagate that information on the IG through "igLoopBackEdge".
//
// During emitter, this information will be used to calculate the loop size.
// Depending on the loop size, decision of whether to align a loop or not will be taken.

if (block->bbJumpDest->isLoopAlign())
{
GetEmitter()->emitSetLoopBackEdge(block->bbJumpDest);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This condition seems odd to me: Presumably, you've already determined that an "isLoopAlign" block is the top of a loop. But now, any block that jumps to that is considered a back edge. What if the lexically first block of the loop is the target of branches that aren't loop branches: forward branches jumping to the loop top, or non-loop backward branches? (The non-loop backward branches might not matter since presumably your algorithm will hit an in-loop backward branch first, but possibly marking them could confuse other code? E.g., what if the first block in a non-first loop branched back to the top of the first loop? I'm not sure if all these cases can occur, due to loop canonicalization, etc., but this case seems not too specific)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

emitSetLoopBackEdge() checks the igNum and accordingly decides if the edge is a back-edge or not.

}
#endif
break;

default:
noway_assert(!"Unexpected bbJumpKind");
break;
}

#if FEATURE_LOOP_ALIGN

// If next block is the first block of a loop (identified by BBF_LOOP_ALIGN),
// then need to add align instruction in current "block". Also mark the
// corresponding IG with IGF_LOOP_ALIGN to know that there will be align
// instructions at the end of that IG.
//
// For non-adaptive alignment, add alignment instruction of size depending on the
// compJitAlignLoopBoundary.
// For adaptive alignment, alignment instruction will always be of 15 bytes.

if ((block->bbNext != nullptr) && (block->bbNext->isLoopAlign()))
{
assert(ShouldAlignLoops());

GetEmitter()->emitLoopAlignment();
}
#endif

#if defined(DEBUG) && defined(USING_VARIABLE_LIVE_RANGE)
if (compiler->verbose)
{
Expand Down
39 changes: 30 additions & 9 deletions src/coreclr/jit/compiler.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -2308,7 +2308,7 @@ void Compiler::compSetProcessor()
opts.compUseCMOV = jitFlags.IsSet(JitFlags::JIT_FLAG_USE_CMOV);
#ifdef DEBUG
if (opts.compUseCMOV)
opts.compUseCMOV = !compStressCompile(STRESS_USE_CMOV, 50);
opts.compUseCMOV = !compStressCompile(STRESS_USE_CMOV, 50);
#endif // DEBUG

#endif // TARGET_X86
Expand Down Expand Up @@ -2615,6 +2615,29 @@ void Compiler::compInitOptions(JitFlags* jitFlags)
opts.compDbgInfo = jitFlags->IsSet(JitFlags::JIT_FLAG_DEBUG_INFO);
opts.compDbgEnC = jitFlags->IsSet(JitFlags::JIT_FLAG_DEBUG_EnC);

#ifdef DEBUG
opts.compJitAlignLoopAdaptive = JitConfig.JitAlignLoopAdaptive() == 1;
opts.compJitAlignLoopBoundary = (unsigned short)JitConfig.JitAlignLoopBoundary();
opts.compJitAlignLoopMinBlockWeight = (unsigned short)JitConfig.JitAlignLoopMinBlockWeight();

opts.compJitAlignLoopForJcc = JitConfig.JitAlignLoopForJcc() == 1;
opts.compJitAlignLoopMaxCodeSize = (unsigned short)JitConfig.JitAlignLoopMaxCodeSize();
#else
opts.compJitAlignLoopAdaptive = true;
opts.compJitAlignLoopBoundary = DEFAULT_ALIGN_LOOP_BOUNDARY;
opts.compJitAlignLoopMinBlockWeight = DEFAULT_ALIGN_LOOP_MIN_BLOCK_WEIGHT;
#endif
if (opts.compJitAlignLoopAdaptive)
{
opts.compJitAlignPaddingLimit = (opts.compJitAlignLoopBoundary >> 1) - 1;
}
else
{
opts.compJitAlignPaddingLimit = opts.compJitAlignLoopBoundary - 1;
}

assert(isPow2(opts.compJitAlignLoopBoundary));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like opts.compJitAlignLoopBoundary doesn' get set to anything in release if we are not doing adaptive alignment.

And whatever value it does get set to should be overridable in debug, in case you want to experiment.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I missed setting opts.compJitAlignLoopBoundary for non-adaptive / release case. I will do it. Are you also suggesting to give a way to set compJitAlignLoopBoundary even for adaptive? Currently, the logic I have for adaptive is restricted to have a single align instruction of max 15 bytes. I will have to think and restructure some code to make that alignment boundary flexible for adaptive scenario. If you agree, I can do that in a separate PR?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, it can wait.


#if REGEN_SHORTCUTS || REGEN_CALLPAT
// We never want to have debugging enabled when regenerating GC encoding patterns
opts.compDbgCode = false;
Expand Down Expand Up @@ -3913,19 +3936,17 @@ void Compiler::compSetOptimizationLevel()
codeGen->setFrameRequired(true);
#endif

if (opts.jitFlags->IsSet(JitFlags::JIT_FLAG_RELOC))
if (opts.jitFlags->IsSet(JitFlags::JIT_FLAG_PREJIT))
{
codeGen->SetAlignLoops(false); // loop alignment not supported for prejitted code

// The zapper doesn't set JitFlags::JIT_FLAG_ALIGN_LOOPS, and there is
// no reason for it to set it as the JIT doesn't currently support loop alignment
// for prejitted images. (The JIT doesn't know the final address of the code, hence
// The JIT doesn't currently support loop alignment for prejitted images.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I realize this is preexisting code, but this seems like a roundabout way to check if we are prejitting; why not just check the prejit flag?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You mean use JIT_FLAG_PREJIT instead of JIT_FLAG_RELOC?

// (The JIT doesn't know the final address of the code, hence
// it can't align code based on unknown addresses.)
assert(!opts.jitFlags->IsSet(JitFlags::JIT_FLAG_ALIGN_LOOPS));

codeGen->SetAlignLoops(false); // loop alignment not supported for prejitted code
kunalspathak marked this conversation as resolved.
Show resolved Hide resolved
}
else
{
codeGen->SetAlignLoops(opts.jitFlags->IsSet(JitFlags::JIT_FLAG_ALIGN_LOOPS));
codeGen->SetAlignLoops(JitConfig.JitAlignLoops() == 1);
}
}

Expand Down
39 changes: 39 additions & 0 deletions src/coreclr/jit/compiler.h
Original file line number Diff line number Diff line change
Expand Up @@ -6367,6 +6367,8 @@ class Compiler

void optFindNaturalLoops();

void optIdentifyLoopsForAlignment();

// Ensures that all the loops in the loop nest rooted at "loopInd" (an index into the loop table) are 'canonical' --
// each loop has a unique "top." Returns "true" iff the flowgraph has been modified.
bool optCanonicalizeLoopNest(unsigned char loopInd);
Expand Down Expand Up @@ -9036,6 +9038,43 @@ XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
bool dspGCtbls; // Display the GC tables
#endif

// Default numbers used to perform loop alignment. All the numbers are choosen
// based on experimenting with various benchmarks.

// Default minimum loop block weight required to enable loop alignment.
#define DEFAULT_ALIGN_LOOP_MIN_BLOCK_WEIGHT 4

// By default a loop will be aligned at 32B address boundary to get better
// performance as per architecture manuals.
#define DEFAULT_ALIGN_LOOP_BOUNDARY 0x20
kunalspathak marked this conversation as resolved.
Show resolved Hide resolved

// For non-adaptive loop alignment, by default, only align a loop whose size is
// at most 3 times the alignment block size. If the loop is bigger than that, it is most
// likely complicated enough that loop alignment will not impact performance.
#define DEFAULT_MAX_LOOPSIZE_FOR_ALIGN DEFAULT_ALIGN_LOOP_BOUNDARY * 3

#ifdef DEBUG
// Loop alignment variables

// If set, for non-adaptive alignment, ensure loop jmps are not on or cross alignment boundary.
bool compJitAlignLoopForJcc;
#endif
// For non-adaptive alignment, minimum loop size (in bytes) for which alignment will be done.
unsigned short compJitAlignLoopMaxCodeSize;

// Minimum weight needed for the first block of a loop to make it a candidate for alignment.
unsigned short compJitAlignLoopMinBlockWeight;

// For non-adaptive alignment, address boundary (power of 2) at which loop alignment should
// be done. By default, 32B.
unsigned short compJitAlignLoopBoundary;

// Padding limit to align a loop.
unsigned short compJitAlignPaddingLimit;

// If set, perform adaptive loop alignment that limits number of padding based on loop size.
bool compJitAlignLoopAdaptive;

#ifdef LATE_DISASM
bool doLateDisasm; // Run the late disassembler
#endif // LATE_DISASM
Expand Down
Loading