mirror of https://github.com/microsoft/DirectXMath synced 2024-11-21 11:50:05 +00:00
Implementation
Chuck Walbourn edited this page 2024-10-06 09:54:25 -07:00

Be sure to review Microsoft Learn: Library Internals.

Compiler conformance

For Visual C++, the projects make use of the default C++11/C++14 mode rather than /std:c++17 mode. The library does not make use of newer C++17 language and library features such as string_view or static_assert without a message, although that may change in the future. The projects make use of /Wall, /permissive-, /Zc:__cplusplus, and /analyze to ensure a high level of C++ conformance.

For clang/LLVM for Windows, there is a CMakeLists.txt provided to validate the code and ensure a high level of conformance. This primarily means addressing warnings generated using /Wall -Wpedantic -Wextra.

Language extensions

DirectXMath is written using standard Intel-style intrinsics, which should be portable to other compilers. The ARM and ARM64 codepaths use ARM-style intrinsics (earlier versions of the library used Visual C++ specific __n64 and __n128), so these are also portable.

The DirectXMath library makes use of two commonly implemented extensions to Standard C++:

  • anonymous structs, which are widely supported and are part of the C11 standard. Note that the library also uses anonymous unions, but these are part of the C++ and C99 standard.
  • #pragma once rather than old-style #define-based include guards. This is not part of the standard, but it is widely supported.

Because of these, DirectXMath is not compatible with Visual C++'s /Za switch which enforces ISO C89 / C++11. It does work with /permissive-.
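As a rough illustration of the anonymous struct extension, the sketch below overlays named fields and an array on the same storage. Float2x2, Trace, and At are hypothetical names invented for this example and are not part of DirectXMath; a strictly conforming compiler mode may reject or warn about the nameless struct.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical 2x2 matrix type; the name and layout are illustrative only.
struct Float2x2
{
    union // anonymous union: Standard C++
    {
        struct // anonymous struct: a common compiler extension in C++
        {
            float _11, _12;
            float _21, _22;
        };
        float m[2][2]; // array view declared over the same storage
    };
};

static_assert(sizeof(Float2x2) == 4 * sizeof(float),
    "both views are expected to occupy one block of four floats");

// Uses the named members introduced by the anonymous struct.
inline float Trace(const Float2x2& a) noexcept
{
    return a._11 + a._22;
}

// Uses the array member of the anonymous union.
inline float At(const Float2x2& a, size_t row, size_t column) noexcept
{
    assert(row < 2 && column < 2);
    return a.m[row][column];
}
```

Because the members of the nameless struct are injected into the enclosing type, callers write a._11 rather than going through an intermediate member name.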

Naming conventions

  • PascalCase for class names, methods, functions, and enums.
  • camelCase for class member variables, struct members
  • UPPERCASE for preprocessor defines (and nameless enums)

The library does not generally make use of Hungarian notation, which has been deprecated for Win32 C++ APIs for many years, with the exception of a few uses of p for pointers and sz for strings.

Type usage

The use of Standard C++ types is preferred, including the fundamental types supplied by the language (e.g. int, unsigned int, size_t, ptrdiff_t, bool, true/false, char, wchar_t) with the addition of the C99 fixed-width types (e.g. uint32_t, uint64_t, intptr_t, uintptr_t).

Avoid using Windows "portability" types except when dealing directly with Win32 APIs: VOID, UINT, INT, DWORD, FLOAT, BOOL, TRUE/FALSE, WCHAR, CONST, etc.

Error reporting

As a low-level math library, DirectXMath does not make use of C++ exception handling or HRESULT COM-style error values. Generally, parameter validation is limited to assert macros. All functions should be annotated with noexcept.
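A minimal sketch of this error-reporting style follows. SumFloats is a hypothetical helper written for this example, not a DirectXMath function: it validates its parameter with assert only, reports nothing through exceptions or HRESULT values, and is marked noexcept.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper, not part of DirectXMath.
inline float SumFloats(const float* pSource, size_t count) noexcept
{
    // Parameter validation is debug-only; it compiles away when NDEBUG is defined.
    assert(pSource != nullptr);

    float total = 0.0f;
    for (size_t i = 0; i < count; ++i)
    {
        total += pSource[i];
    }
    return total;
}

// The noexcept operator confirms the annotation at compile time
// without evaluating the call.
static_assert(noexcept(SumFloats(nullptr, 0)), "SumFloats should be noexcept");
```

In a release build the assert disappears entirely, so the caller remains responsible for passing valid arguments.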

SAL annotation

The DirectXMath library makes extensive use of SAL2 annotations (_In_, _Outptr_opt_, etc.), which greatly improve the accuracy of the Visual C++ static code analysis (also known as PREFAST). The standard Windows headers #define them all to empty strings if not building with /analyze, so they have no effect on code generation.
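The sketch below shows what an annotated signature might look like, using Standard C++ fixed-width types for the parameters. FindMax is a hypothetical function for this example only. The empty fallback definitions for toolchains without sal.h are an assumption of this sketch, not necessarily how the library or its build setup handles non-Windows platforms.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

#if defined(_MSC_VER)
#include <sal.h>
#else
// Assumption for this sketch: without sal.h, define the annotations
// used below as empty so they have no effect.
#ifndef _In_reads_
#define _In_reads_(count)
#endif
#ifndef _Out_
#define _Out_
#endif
#endif

// Hypothetical function, not part of DirectXMath.
inline void FindMax(
    _In_reads_(count) const uint32_t* pValues,
    size_t count,
    _Out_ uint32_t* pResult) noexcept
{
    assert(pValues != nullptr && pResult != nullptr && count > 0);

    uint32_t best = pValues[0];
    for (size_t i = 1; i < count; ++i)
    {
        if (pValues[i] > best)
        {
            best = pValues[i];
        }
    }
    *pResult = best;
}
```

The annotations tell the analyzer that pValues is read for count elements and that pResult is written before the function returns; the generated code is the same with or without them.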

Calling-conventions

One of the more complicated aspects of DirectXMath's implementation is implementing the various calling conventions optimally for SIMD, which change per architecture. This is detailed on Microsoft Learn.

128-bit SIMD

XMVECTOR XM_CALLCONV XMVectorHermite(FXMVECTOR Position0, FXMVECTOR Tangent0, FXMVECTOR Position1, GXMVECTOR Tangent1, float t) noexcept;
  • XMVECTOR is the standard 128-bit SIMD register type, and we return it by value.

  • XM_CALLCONV is set to __vectorcall where supported and to __fastcall otherwise, unless the target compiler does not support __fastcall either.

  • FXMVECTOR is used for the first three SIMD parameters to support SIMD-passing behavior for __fastcall.

  • GXMVECTOR is used for the fourth SIMD parameter to support __vectorcall and the ARM ABI passing of the first four SIMD registers.

  • HXMVECTOR is used for the fifth and sixth SIMD parameters to support __vectorcall.

  • CXMVECTOR is used for all remaining SIMD parameters, which are passed by 'const ref'.

In configurations where the platform doesn't support 6 SIMD registers, the types that cannot be passed in registers are equivalent to CXMVECTOR.
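The idea behind these parameter aliases can be sketched without the library. Everything in the block below is hypothetical: Vec4 stands in for the SIMD register type, kVectorRegisters is an invented knob for how many vector arguments the ABI passes in registers, and PassAs, FVec4, GVec4, HVec4, CVec4, and Sum7 are names made up for this example. The real headers select their typedefs with preprocessor checks on the architecture and calling convention, so treat this only as a model of the by-value versus 'const ref' decision.

```cpp
#include <cassert>
#include <type_traits>

// Stand-in for the 128-bit SIMD type (hypothetical; not the real XMVECTOR).
struct Vec4
{
    float v[4];
};

// Hypothetical knob: how many vector arguments the ABI passes in registers.
constexpr int kVectorRegisters = 6;

// By value while the argument position still fits in a register, otherwise const ref.
template <int Needed>
using PassAs = typename std::conditional<
    (kVectorRegisters >= Needed), const Vec4, const Vec4&>::type;

using FVec4 = PassAs<3>;    // parameters 1 to 3
using GVec4 = PassAs<4>;    // parameter 4
using HVec4 = PassAs<6>;    // parameters 5 and 6
using CVec4 = const Vec4&;  // parameter 7 and beyond: always const ref

static_assert(!std::is_reference<FVec4>::value, "by value with six registers");
static_assert(std::is_reference<CVec4>::value, "always passed by const ref");

// Seven vector parameters, declared in the order the aliases are meant to be used.
inline Vec4 Sum7(FVec4 a, FVec4 b, FVec4 c, GVec4 d, HVec4 e, HVec4 f, CVec4 g) noexcept
{
    Vec4 r{};
    for (int i = 0; i < 4; ++i)
    {
        r.v[i] = a.v[i] + b.v[i] + c.v[i] + d.v[i] + e.v[i] + f.v[i] + g.v[i];
    }
    return r;
}
```

Setting kVectorRegisters to 3 in this model turns GVec4 and HVec4 into references while FVec4 stays by value, which mirrors the way the later aliases fall back first.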

4x4 Matrix

XMVECTOR XM_CALLCONV XMVector3Project(FXMVECTOR V, float ViewportX, float ViewportY, float ViewportWidth, float ViewportHeight, float ViewportMinZ, float ViewportMaxZ, FXMMATRIX Projection, CXMMATRIX View, CXMMATRIX World) noexcept;

Because of homogeneous vector aggregates (HVA), a matrix which consists of 4 SIMD values can be passed as if it were 4 individual SIMD values.

  • FXMMATRIX is generally used if there are 0, 1, or 2 XMVECTOR parameters preceding the matrix.

  • CXMMATRIX is used for all other matrix parameters, which are passed by 'const ref'.

Compiler directives

DirectXMath makes use of many preprocessor defines to target many different instruction sets and architectures.

A full table of defines can be found on Microsoft Learn.

inline XMVECTOR XM_CALLCONV XMVectorRound(FXMVECTOR V) noexcept
{
#if defined(_XM_NO_INTRINSICS_)

    XMVECTORF32 Result = { { {
            MathInternal::round_to_nearest(V.vector4_f32[0]),
            MathInternal::round_to_nearest(V.vector4_f32[1]),
            MathInternal::round_to_nearest(V.vector4_f32[2]),
            MathInternal::round_to_nearest(V.vector4_f32[3])
        } } };
    return Result.v;

#elif defined(_XM_ARM_NEON_INTRINSICS_)
#if defined(_M_ARM64) || defined(_M_HYBRID_X86_ARM64) || defined(_M_ARM64EC) || __aarch64__

    // ARM-NEON v8 implementation

#else

    // ARM-NEON v7 implementation

#endif
#elif defined(_XM_SSE4_INTRINSICS_)

    // SSE 4.1 implementation

#elif defined(_XM_SSE_INTRINSICS_)

    // SSE/SSE2 implementation (the minimum required for x86/x64)

#endif
}
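The same compile-time dispatch pattern can be tried in isolation. The following sketch is not library code: RoundFloat4 and SKETCH_USE_SSE4 are hypothetical names, and the check on the compiler-provided __SSE4_1__ and __AVX__ macros stands in for the library's own defines. It assumes the default floating-point rounding mode (round to nearest, ties to even) for the scalar path.

```cpp
#include <cassert>
#include <cmath>

// Stand-in for the library's instruction-set defines (assumption of this sketch).
#if defined(__SSE4_1__) || defined(__AVX__)
#include <smmintrin.h>
#define SKETCH_USE_SSE4 1
#endif

// Hypothetical function: rounds four floats to the nearest integer value.
inline void RoundFloat4(const float* pSource, float* pDest) noexcept
{
    assert(pSource != nullptr && pDest != nullptr);

#if defined(SKETCH_USE_SSE4)

    // SSE 4.1 path: one rounding instruction for all four lanes.
    __m128 v = _mm_loadu_ps(pSource);
    v = _mm_round_ps(v, _MM_FROUND_TO_NEAREST_INT | _MM_FROUND_NO_EXC);
    _mm_storeu_ps(pDest, v);

#else

    // Scalar path: std::nearbyint honors the current rounding mode.
    for (int i = 0; i < 4; ++i)
    {
        pDest[i] = std::nearbyint(pSource[i]);
    }

#endif
}
```

Only one branch is ever compiled, so the choice costs nothing at run time; the trade-off is that a binary built for SSE 4.1 will not run on a processor that lacks it.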

Instruction Set Usage

See this blog series for more details on how each is applied to DirectXMath:

Implementation macros

  • XM_ALIGNED_DATA is used to declare aligned data variables.

  • XM_ALIGNED_STRUCT is used to declare an aligned struct.
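To show how alignment macros of this kind are typically used, here is a sketch built on the standard alignas specifier. SKETCH_ALIGNED_DATA, SKETCH_ALIGNED_STRUCT, Float4A16, and IsAligned16 are hypothetical names; the library's actual definitions may use compiler-specific attributes instead, depending on the toolchain and language mode.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical equivalents of the alignment macros (assumed alignas-based).
#define SKETCH_ALIGNED_DATA(x) alignas(x)
#define SKETCH_ALIGNED_STRUCT(x) struct alignas(x)

// A 16-byte aligned struct, suitable for aligned SIMD loads and stores.
SKETCH_ALIGNED_STRUCT(16) Float4A16
{
    float x, y, z, w;
};

static_assert(alignof(Float4A16) == 16, "struct should be 16-byte aligned");

// Returns true when the address is a multiple of 16.
inline bool IsAligned16(const void* p) noexcept
{
    assert(p != nullptr);
    return (reinterpret_cast<uintptr_t>(p) & 15u) == 0;
}
```

A variable can be declared the same way, for example SKETCH_ALIGNED_DATA(16) float data[4]; wrapping the specifier in a macro keeps the declaration unchanged if the underlying spelling has to differ per compiler.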

x86/x64

  • XM_STREAM_PS, XM256_STREAM_PS, and XM_SFENCE are controlled by the _XM_NO_MOVNT_ define.

  • XM_PERMUTE_PS is _mm_permute_ps when building for AVX and _mm_shuffle_ps when building for SSE/SSE2.

  • XM_FMADD_PS and XM_FNMADD_PS are controlled by whether or not FMA3 is in use.

  • XM_LOADU_SI16 is a fix-up for older versions of GNUC which were missing _mm_loadu_si16.
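A macro that switches on FMA3 support might look roughly like the sketch below. SKETCH_FMADD_PS, SKETCH_HAVE_SSE, and MultiplyAdd4 are hypothetical names, and keying off the compiler-provided __FMA__ macro is an assumption of this example (some compilers never define it, in which case the multiply-then-add form is used). A scalar loop is included so the block also builds on targets without SSE.

```cpp
#include <cassert>
#include <cstddef>

#if defined(__SSE__) || defined(_M_X64) || (defined(_M_IX86_FP) && _M_IX86_FP >= 1)
#include <immintrin.h>
#define SKETCH_HAVE_SSE 1
#if defined(__FMA__)
// Fused multiply-add: a single instruction with a single rounding step.
#define SKETCH_FMADD_PS(a, b, c) _mm_fmadd_ps((a), (b), (c))
#else
// Fallback: separate multiply and add.
#define SKETCH_FMADD_PS(a, b, c) _mm_add_ps(_mm_mul_ps((a), (b)), (c))
#endif
#endif

// Hypothetical function: computes pDest[i] = pA[i] * pB[i] + pC[i] for four floats.
inline void MultiplyAdd4(const float* pA, const float* pB, const float* pC, float* pDest) noexcept
{
    assert(pA != nullptr && pB != nullptr && pC != nullptr && pDest != nullptr);

#if defined(SKETCH_HAVE_SSE)
    const __m128 r = SKETCH_FMADD_PS(_mm_loadu_ps(pA), _mm_loadu_ps(pB), _mm_loadu_ps(pC));
    _mm_storeu_ps(pDest, r);
#else
    for (size_t i = 0; i < 4; ++i)
    {
        pDest[i] = pA[i] * pB[i] + pC[i];
    }
#endif
}
```

Call sites stay identical in both configurations. The fused form rounds once rather than twice, so results can differ in the last bit between builds for inputs that are not exactly representable.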

ARM/ARM64

  • XM_PREFETCH is __prefetch or __builtin_prefetch for ARM/ARM64.