Skip to content

DaymareOn/hdtSMP64

 
 

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

hdtSMP with CUDA for Skyrim SE/AE/VR

Fork of [https://github.com/Karonar1/hdtSMP64] by Karonar1, of fork of version by aers, from original code by hydrogensaysHDT

Changes

  • Managed AE version, and corresponding SKSE (currently 1.6.353 and 2.1.5).
  • Added maximum angle in ActorManager. A max angle can be used to specify physics on NPCs within a field of view. 0 degrees represents straight in front of the camera. Default is 45 which is treated as + or - 45 degrees, so 90 total degrees. 180 would be all around.
  • Merged the CUDA version in the master branch through preprocessor directives to allow a continuous evolution, and easier building.
  • Added CUDA support for several parts of collision detection (still a work in progress). This includes everything that had OpenCL support in earlier releases, as well as the final collision check. CPU collision is still fully supported, and is used as a fallback if a CUDA-capable GPU is not available.
  • Added distance check in ActorManager to disable NPCs more than a certain distance from the player. This resolves the massive FPS drop in certain cell transitions (such as Blue Palace -> Solitude). Default maximum distance is 500.
  • New method for dynamically updating timesteps. This is technically less stable in terms of physics accuracy, but avoids a problem where one very slow frame (for example, where NPCs are added to the scene) causes a feedback loop that drops the framerate. If you saw framerates drop to 12-13 in the Blue Palace or at the Fire Festival, this should help.
  • Added can-collide-with-bone to mesh definitions, as a natural counterpart to no-collide-with-bone.
  • Added new "external" sharing option, for objects that should collide with other NPCs, but not this one. Good for defining the whole body as a collision mesh for interactions between characters.
  • Significant refactoring of armor handling in ActorManager, to be much stricter about disabling systems and reducing gradual FPS loss.
  • Changed prefix mechanism to use a simple incrementing value instead of trying to use pointers to objects. Previously this could lead to prefix collisions with a variety of weird effects and occasional crashes.
  • Skeletons should remain active as long as they are in the same cell as (and close enough to) the player character. Resolves an issue where entering the Ancestor Glade often incorrectly marked skeletons as inactive and disabled physics.
  • Added smp list console command to list tracked NPCs without so much detail - useful for checking which NPCs are active in crowded areas. NPCs are now sorted according to active status, with active ones last.
  • New mechanism for remapping mesh names in the defaultBBPs.xml file, allowing much more concise ways of defining complex collision objects for lots of armors at once.
  • The code to scan defaultBBPs.xml can now handle the structure of facegen files, which means head part physics should work (with limitations) on NPCs without having to manually edit the facegen data.
  • New bones from facegen files should now be added to the head instead of the NPC root, so they should be positioned correctly if there is no physics for them or after a reset.

CUDA support

CUDA support is disabled by default, but can be enabled in configs.xml or from the console. It will automatically fall back to the CPU algorithm if you do not have any CUDA capable cards. However, it does not check capabilities of any cards it finds, so may crash if your card is too old. It was developed for a GeForce 10 series card, so should work on those or anything newer.

The absolute minimum required compute capability is 3.5, to support dynamic parallelism. However, the plugin is compiled for compute capability 5.0 for better performance with atomic operations. Therefore you will need at least a Maxwell card to use the stock plugin, or a late model Kepler card if you compile it yourself.

If you have more than one CUDA-capable card, you can select the one used for physics calculations in the configuration file. The code is designed to allow it to be changed at runtime, but there is currently no console command to do this. By default it will use device 0, which is usually the most powerful card available.

The following parts of the collision algortihm are currently GPU accelerated:

  • Vertex position calculations for skinned mesh bodies
  • Collider bounding box calculations for per-vertex and per-triangle shapes
  • Aggregate bounding box calculation for internal nodes of collider trees
  • Building collider lists for the final collision check
  • Sphere-sphere and sphere-triangle collision checks
  • Merging collision results (note that this may reduce performance for objects with lots of bones, as the merge buffer can get quite big - still working on this!)

The following parts are still CPU-based:

  • Converting collision results to manifolds for the Bullet library to work with
  • And, of course, the solver itself, which is part of the Bullet library, not the HDT-SMP plugin

This is still experimental, and may not give good performance. The old CPU collision algorithm was heavily optimized, so matching its framerate is not easy. You will need a high end GPU and a low end CPU to see any real performance benefits.

  • On a 6850K processor (6 cores, 3.6GHz) with a 1080Ti GPU, framerate in crowded areas is a little worse than with the CPU-only algorithm. Most of the time, both algorithms easily reach the framerate cap at 60fps.
  • On the same hardware with only two cores enabled, the total collision time is around 2-3 times faster on GPU than CPU. Of course, this translates to a less impressive framerate difference, because collision checking is only one part of the HDT-SMP algorithm.

Radeon support?

Nope, sorry. CUDA and nVidia cards are pretty much the industry standard for scientific computing, so that's what I use. In any case, I can't support GPU architectures that I don't have. The plugin should work fine in CPU mode with any type of card - I have tested it with Radeon 7850 and no nVidia drivers installed.

The plugin will use the most powerful available CUDA-capable card, regardless of whether it's being used for anything else. In theory, it's possible to have a Radeon as the primary graphics card, and an nVidia card in the same machine for CUDA and physics. I've tested this with two GeForce cards (a 1080Ti and a 1030) in the same system, but not a Radeon and a GeForce.

Note about NPC head parts

Head parts work fine for NPCs without valid facegen data, but this isn't very useful because it triggers the infamous dark face bug. Special restrictions apply to NPCs that do have facegen data:

  • Only one XML file can be used per face, even if was built from multiple physics-enabled parts.
  • NiStringExtraData nodes aren't automatically copied into the facegen file, so physics won't work automatically. Either do it manually in NifSkope or map one of the head parts to a file in defaultBBPs.xml.
  • Bones that aren't explicitly referenced by any mesh are removed when facegen data is generated, and can't be used as kinematic objects in constraints. Replace references to these with the NPC head (which should always be present). You may also need to set the frame of reference origin to the position of the missing bone relative to the head to get correct constraint behavior.

Console commands

The smp console command will print some basic information about the number of tracked and active objects. The plugin recognizes the following optional parameters:

  • smp reset reloads the configs.xml file, attempts to reload all meshes and reset the whole HDT-SMP system. However, it is a little buggy and may fail to reload some meshes or constraints properly.
  • smp gpu toggles the CUDA collision algorithm, if there is at least one CUDA device available. If there is no device available, it does nothing.
  • smp timing starts a timing sequence for the collision detection algorithm. The next 200 frames will switch between CPU and GPU collision. Once complete, mean and standard deviation of timings for the two collision algorithms are displayed on the console.
  • smp dumptree dumps the entire node tree of the current targeted NPC to the log file.
  • smp detail shows extended details of all tracked actors, including active and inactive armour and head parts.
  • smp list shows a more concise list of tracked actors.

Coming soon (maybe)

  • Reworked tag system for better compartmentalized .xml files.
  • Continued work on CUDA algorithms.

Known issues

  • Several options, including shape and collision definitions on bones, exist but don't seem to do anything.
  • Sphere-triangle collision check without penetration defined is obviously wrong, but fixing the test doesn't improve things. Needs further investigation.
  • Smp reset doesn't reload meshes correctly for some items (observed with Artesian Cloaks of Skyrim). Suspect references to the original triangle meshes are being dropped when they're no longer needed. We could keep ownership of the meshes, but it seems pretty marginal and a waste of memory. It also breaks some constraints for NPCs with facegen data.
  • It's not possible to refer to bones in facegen data that aren't used explicitly by at least one mesh. Most HDT hairs won't work as non-wig hair on NPCs without altering constraints. Probably possible but annoying to fix.
  • Physics enabled hair colours are sometimes wrong. This appears to happen there are two NPCs nearby using the same hair model.
  • Physics may be less stable due to the new dynamic timestep calculation.
  • Probably any open bug listed on Nexus that isn't resolved in changes, above. This list only contains issues I have personally observed.
  • Known bugs on the Nexus

Build Instructions

Here is the article by DayDreamingDay. These should be complete instructions to set up and build the plugin, without assuming prior coding experience. I'm checking everything out into D:\Dev-noAVX and building without AVX support, but of course you can use any directory.

You will need:

  • Visual Studio 2019 (any edition)
  • Git
  • CMake
  • CUDA 11.6

Open a VS2019 command prompt ("x64 Native Tools Command Prompt for VS2019"). Download and build Detours and Bullet:

d:
mkdir Dev-noAVX
cd Dev-noAVX
git clone https://github.com/microsoft/Detours.git
git clone https://github.com/bulletphysics/bullet3.git
cd Detours
nmake
cd ..\bullet3
cmake .

If you want AVX support in Bullet, use cmake-gui instead of cmake, and check the USE_MSVC_AVX box before clicking Configure then Generate. This should give a fairly significant performance boost if your CPU supports it.

  • Note that AVX support in Bullet and the HDT-SMP plugin itself are independent configuration options. Enable it in both for maximum performance; disable it in both for maximum compatibility.

Open D:\Dev-noAVX\bullet3\BULLET_PHYSICS.sln in Visual Studio, select the Release configuration, then Build -> Build solution.

Download skse64_2_00_19.7z and unpack into Dev-noAVX (source code is included in the official distribution), then get the HDT-SMP source:

cd D:\Dev-noAVX\skse64_2_00_19\src\skse64
git init
git remote add origin https://github/com/Karonar1/hdtSMP64.git
git fetch
git checkout master

Open D:\Dev-noAVX\skse64_2_00_19\src\skse64\hdtSMP64.sln in Visual Studio. If you are asked to retarget projects, just accept the defaults and click OK.

Open properties for the hdtSMP64 project. Select "All Configurations" at the top, and the C/C++ page. Add the following to Additional Include Directories (just delete anything that was there before):

  • D:\Dev-noAVX\Detours\include
  • D:\Dev-noAVX\bullet3\src
  • D:\Dev-noAVX\skse64_2_00_19\src

On the Linker -> General page, add the following to Additional Library Directories:

  • D:\Dev-noAVX\bullet3\lib\Release
  • D:\Dev-noAVX\Detours\lib.X64

Open properties for the skse64 project. Select General, and change Configuration Type from "Dynamic Library (.dll)" to "Static Library (.lib)".

Credits

  • hydrogensaysHDT - Creating this plugin
  • aers - fixes and improvements
  • ousnius - some fixes, and "consulting"
  • karonar1 - bug fixes (the real work), I'm just building it

About

hdt-smp for 64 bit Skyrim

Resources

License

Stars

Watchers

Forks

Packages

No packages published

Languages

  • C++ 93.7%
  • C 5.1%
  • Other 1.2%