UE4 Mesh Pipeline & Auto Instancing

If you repost this article, please credit the author and the source.

Mesh Pipeline

Drag a mesh into the scene

The render thread creates all mesh processors.

Each processor calls AddMeshBatch and adds draw commands to the cache buckets.

For example:

Compute Visibility

  1. FrustumCull

  2. Calculate View.PrimitiveVisibilityMap, which marks all visible objects

  3. Create FRelevancePacket with View.PrimitiveVisibilityMap

  4. FRelevancePacket.MarkRelevant: take draw commands from the cache buckets

  5. FRelevancePacket.RenderThreadFinalize: add the draw commands to ViewCommands.MeshCommands
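The flow of the visibility steps above can be sketched in plain C++. This is only an illustration under simplifying assumptions: `CachedDrawCommand` and `GatherVisibleCommands` are hypothetical names, not engine types, and the real engine works per view and per pass on many threads.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a cached draw command; not an engine type.
struct CachedDrawCommand
{
    std::size_t PrimitiveIndex; // which primitive this command draws
    int CommandId;              // handle of the cached command
};

// Walk the cached commands and keep those whose primitive survived culling,
// mirroring "visibility map -> relevance -> view mesh commands".
std::vector<int> GatherVisibleCommands(const std::vector<bool>& PrimitiveVisibilityMap,
                                       const std::vector<CachedDrawCommand>& CachedCommands)
{
    std::vector<int> MeshCommands;
    for (const CachedDrawCommand& Command : CachedCommands)
    {
        assert(Command.PrimitiveIndex < PrimitiveVisibilityMap.size());
        if (PrimitiveVisibilityMap[Command.PrimitiveIndex])
        {
            MeshCommands.push_back(Command.CommandId);
        }
    }
    return MeshCommands;
}
```

The point of the sketch is that no draw command is rebuilt per frame: visibility only selects from commands that were cached when the mesh was added.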

DynamicMesh

SetupMeshPass

Insert commands into View.ParallelMeshDrawCommandPasses according to the pass type.

Each pass takes out its draw commands according to the pass type. For example:

Auto Instancing

TSet CachedMeshDrawCommandStateBuckets.

FMeshDrawCommandStateBucket:

MeshDrawCommandKeyFuncs:

Analyze which draw calls can be instanced

GetDynamicInstancingHash

MatchesForDynamicInstancing

Two draw commands can be instanced together when their key hashes (GetKeyHash) are equal and MatchesForDynamicInstancing returns true.

Mesh pipeline finalize: get the bucket ID of the incoming mesh draw command, assign the command to the matching bucket, and increment that bucket's count.

MeshPassProcessor.SubmitDraw: based on whether NumInstances of each mesh draw command is greater than 1, decide whether to issue an instanced draw call.
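The bucket logic described above (hash, equality test, per-bucket count, instanced draw when the count exceeds 1) can be sketched with a standard hash map. Everything here is a simplified assumption for illustration: `DrawState`, `DrawStateHash`, `MergeDraws` and `UsesInstancedDraw` are hypothetical names, and the three state fields stand in for whatever the engine actually compares.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <unordered_map>
#include <vector>

// Hypothetical draw state: only the fields that must match for two draws to merge.
struct DrawState
{
    std::uint32_t PipelineId;
    std::uint32_t VertexBufferId;
    std::uint32_t IndexBufferId;

    // Plays the role of MatchesForDynamicInstancing in this sketch.
    bool operator==(const DrawState& Other) const
    {
        return PipelineId == Other.PipelineId
            && VertexBufferId == Other.VertexBufferId
            && IndexBufferId == Other.IndexBufferId;
    }
};

// Plays the role of GetDynamicInstancingHash in this sketch.
struct DrawStateHash
{
    std::size_t operator()(const DrawState& State) const
    {
        std::size_t Hash = std::hash<std::uint32_t>{}(State.PipelineId);
        Hash = Hash * 31 + std::hash<std::uint32_t>{}(State.VertexBufferId);
        Hash = Hash * 31 + std::hash<std::uint32_t>{}(State.IndexBufferId);
        return Hash;
    }
};

struct MergedDraw
{
    DrawState State;
    std::uint32_t NumInstances;
};

// Assign every draw to a bucket keyed by its state and count draws per bucket.
std::vector<MergedDraw> MergeDraws(const std::vector<DrawState>& Draws)
{
    std::unordered_map<DrawState, std::size_t, DrawStateHash> BucketOf;
    std::vector<MergedDraw> Merged;
    for (const DrawState& Draw : Draws)
    {
        auto It = BucketOf.find(Draw);
        if (It == BucketOf.end())
        {
            BucketOf.emplace(Draw, Merged.size());
            Merged.push_back({Draw, 1});
        }
        else
        {
            ++Merged[It->second].NumInstances;
        }
    }
    return Merged;
}

// A bucket with more than one instance is submitted as an instanced draw.
bool UsesInstancedDraw(const MergedDraw& Draw)
{
    return Draw.NumInstances > 1;
}
```

A hash map needs both pieces: the hash narrows the search to one bucket, and the equality test rules out collisions, which is why the key functions expose a hash and a match function together.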

How to update the uniform buffer

GPU Scene stores per-object scene information, such as the world matrix, world position, bounding box, and lightmap data, in a single buffer on the GPU. Each object is identified by a PrimitiveId, which is used as the address into GPU Scene when querying parameters.

A large number of shaders must be adjusted accordingly. All the original shader parameters for each object must be obtained with primitive ID instead.

1. FPrimitiveSceneData

FPrimitiveSceneData must stay completely consistent with the FPrimitiveUniformShaderParameters data structure on the CPU side. In SceneData.ush, an FPrimitiveSceneData structure is added to replace the original per-object Primitive, and an accessor GetPrimitiveData(uint PrimitiveId) is added.

struct FPrimitiveSceneData
{
    float4x4 LocalToWorld;
    float4 InvNonUniformScaleAndDeterminantSign;
    float4 ObjectWorldPositionAndRadius;
    float4x4 WorldToLocal;
    float4x4 PreviousLocalToWorld;
    float4x4 PreviousWorldToLocal;
    float3 ActorWorldPosition;
    float UseSingleSampleShadowFromStationaryLights;
    float3 ObjectBounds;
    float LpvBiasMultiplier;
    float DecalReceiverMask;
    float PerObjectGBufferData;
    float UseVolumetricLightmapShadowFromStationaryLights;
    float UseEditorDepthTest;
    float4 ObjectOrientation;
    float4 NonUniformScale;
    float3 LocalObjectBoundsMin;
    float3 LocalObjectBoundsMax;
    uint LightingChannelMask;
    uint LightmapDataIndex;
    int SingleCaptureIndex;
};

2. Compatibility, for devices that do not support GPU Scene

// Route to Primitive uniform buffer

#define GetPrimitiveData(x) Primitive
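The two access paths can be mimicked on the CPU to make the idea concrete. This is a sketch under assumptions, not shader code from the engine: `PrimitiveData` and `ShaderContext` are hypothetical types, and the runtime flag `bUseGpuScene` stands in for what the shader decides at compile time through the macro.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical, heavily reduced version of the per-primitive data.
struct PrimitiveData
{
    float ObjectWorldPositionAndRadius[4];
};

// Hypothetical view of what a shader invocation can read.
struct ShaderContext
{
    std::vector<PrimitiveData> SceneBuffer; // GPU Scene path: one entry per primitive
    PrimitiveData Primitive;                // fallback path: the single per-draw uniform buffer
    bool bUseGpuScene;                      // assumption: a runtime flag instead of a compile-time macro
};

// With GPU Scene the id addresses the shared buffer; without it the id is
// ignored and the bound per-draw data is returned, like the macro above.
const PrimitiveData& GetPrimitiveData(const ShaderContext& Context, std::uint32_t PrimitiveId)
{
    if (Context.bUseGpuScene)
    {
        assert(static_cast<std::size_t>(PrimitiveId) < Context.SceneBuffer.size());
        return Context.SceneBuffer[PrimitiveId];
    }
    return Context.Primitive;
}
```

Because both paths share one accessor, shader code written against GetPrimitiveData does not need to know which path is active.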

3. PrimitiveId Buffer

The PrimitiveId is added as an additional vertex attribute.

To pass in the PrimitiveId, a property bSupportsPrimitiveIdStream is added to the vertex factory. When the vertex factory is initialized, a vertex stream is added to store the PrimitiveId.

void FHairStrandsVertexFactory::InitRHI()
{
    // Add a per-instance stream for the PrimitiveId and remember its stream index.
    Elements.Add(AccessStreamComponent(FVertexStreamComponent(&GPrimitiveIdDummy, 0, 0, sizeof(uint32), VET_UInt, EVertexStreamUsage::Instancing), 13));
    PrimitiveIdStreamIndex = Elements.Last().StreamIndex;
}

The vertex stream usage is set to Instancing, which ensures that the PrimitiveId vertex stream is addressed per instance instead of per vertex.
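To make the per-instance versus per-vertex distinction concrete, here is a small CPU model of how a stream element could be fetched. It is an illustrative sketch only: `StreamUsage`, `VertexStream` and `FetchUint` are hypothetical names, and real hardware performs this addressing in the input assembler.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

enum class StreamUsage { PerVertex, PerInstance };

// Hypothetical description of a bound vertex stream.
struct VertexStream
{
    const std::uint8_t* Data;
    std::size_t SizeInBytes;
    std::size_t OffsetInBytes; // where the bound range starts
    std::size_t StrideInBytes; // distance between consecutive elements
    StreamUsage Usage;
};

// A per-instance stream advances with the instance index, so every vertex of
// one instance reads the same element; a per-vertex stream advances with the vertex index.
std::uint32_t FetchUint(const VertexStream& Stream, std::size_t VertexIndex, std::size_t InstanceIndex)
{
    const std::size_t ElementIndex =
        (Stream.Usage == StreamUsage::PerInstance) ? InstanceIndex : VertexIndex;
    const std::size_t ByteOffset = Stream.OffsetInBytes + ElementIndex * Stream.StrideInBytes;
    assert(ByteOffset + sizeof(std::uint32_t) <= Stream.SizeInBytes);

    std::uint32_t Value = 0;
    std::memcpy(&Value, Stream.Data + ByteOffset, sizeof(Value));
    return Value;
}
```

With a per-instance stream of PrimitiveIds, instance 0 of a merged draw reads the first id of its range and instance 1 reads the next one, regardless of which vertex is being processed.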

4. BuildMeshDrawCommandPrimitiveIdBuffer

Assemble the PrimitiveIds of all mesh draw commands into one buffer, PrimitiveIds.

void BuildMeshDrawCommandPrimitiveIdBuffer(/* parameters elided */)
{
    //@todo - refactor into instance step rate in the RHI
    for (uint32 InstanceFactorIndex = 0; InstanceFactorIndex < InstanceFactor; InstanceFactorIndex++, PrimitiveIdIndex++)
    {
        //@todo - refactor into memcpy
        checkSlow(PrimitiveIdIndex < MaxPrimitiveId);
        PrimitiveIds[PrimitiveIdIndex] = VisibleMeshDrawCommand.DrawPrimitiveId;
    }
}
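How the id buffer and instancing fit together can be sketched as follows. The sketch assumes the visible commands are already sorted so that mergeable commands are adjacent, that a bucket id of -1 means "cannot be merged", and it leaves out the instance factor; `VisibleCommand`, `InstancedDraw`, `BuildResult` and `BuildPrimitiveIdBuffer` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical visible command: the bucket it belongs to and the primitive it draws.
struct VisibleCommand
{
    int StateBucketId;             // assumption: -1 means the command cannot be merged
    std::uint32_t DrawPrimitiveId;
};

// One submitted draw: where its ids start in the buffer and how many instances it has.
struct InstancedDraw
{
    int StateBucketId;
    std::size_t PrimitiveIdOffset; // index of the first id in PrimitiveIds
    std::uint32_t NumInstances;
};

struct BuildResult
{
    std::vector<std::uint32_t> PrimitiveIds;
    std::vector<InstancedDraw> Draws;
};

// Adjacent commands with the same bucket id collapse into one draw whose ids
// are laid out contiguously, so a per-instance stream can read them in order.
BuildResult BuildPrimitiveIdBuffer(const std::vector<VisibleCommand>& SortedCommands)
{
    BuildResult Result;
    for (const VisibleCommand& Command : SortedCommands)
    {
        const bool bCanMerge = !Result.Draws.empty()
            && Command.StateBucketId != -1
            && Result.Draws.back().StateBucketId == Command.StateBucketId;

        if (bCanMerge)
        {
            ++Result.Draws.back().NumInstances;
        }
        else
        {
            Result.Draws.push_back({Command.StateBucketId, Result.PrimitiveIds.size(), 1});
        }
        Result.PrimitiveIds.push_back(Command.DrawPrimitiveId);
    }
    return Result;
}
```

Each resulting draw only needs its offset into the id buffer and its instance count; the ids themselves never have to be copied again at submission time.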

5. SubmitDraw

The PrimitiveId buffer is passed in by binding it as a vertex buffer stream.

// Excerpt: bind each vertex stream, substituting the scene PrimitiveId buffer
// (at this draw's offset) for the PrimitiveId stream.
void FMeshDrawCommand::SubmitDraw(/* parameters elided */)
{

    for (int32 VertexBindingIndex = 0; VertexBindingIndex < MeshDrawCommand.VertexStreams.Num(); VertexBindingIndex++)
    {
        const FVertexInputStream& Stream = MeshDrawCommand.VertexStreams[VertexBindingIndex];
        if (MeshDrawCommand.PrimitiveIdStreamIndex != -1 && Stream.StreamIndex == MeshDrawCommand.PrimitiveIdStreamIndex)
        {
            RHICmdList.SetStreamSource(Stream.StreamIndex, ScenePrimitiveIdsBuffer, PrimitiveIdOffset);
            StateCache.VertexStreams[Stream.StreamIndex] = Stream;
        }
        else if (StateCache.VertexStreams[Stream.StreamIndex] != Stream)
        {
            RHICmdList.SetStreamSource(Stream.StreamIndex, Stream.VertexBuffer, Stream.Offset);
            StateCache.VertexStreams[Stream.StreamIndex] = Stream;
        }
    }
}

GPUScene Buffer

The buffer is uploaded and updated every frame during rendering.

A compute shader transfers the data into the GPU Scene buffer; it is implemented in ByteBuffer.cpp and ByteBuffer.ush.

void FScatterUploadBuilder::UploadTo(FRHICommandList& RHICmdList, FRWBufferStructured& DstBuffer)
{
    RHIUnlockVertexBuffer(ScatterBuffer.Buffer);
    RHIUnlockVertexBuffer(UploadBuffer.Buffer);

    ScatterData = nullptr;
    UploadData = nullptr;

    auto ShaderMap = GetGlobalShaderMap(GMaxRHIFeatureLevel);

    TShaderMapRef<FScatterCopyCS> ComputeShader(ShaderMap);

    const FComputeShaderRHIParamRef ShaderRHI = ComputeShader->GetComputeShader();
    RHICmdList.SetComputeShader(ShaderRHI);

    SetShaderValue(RHICmdList, ShaderRHI, ComputeShader->NumScatters, NumScatters);
    SetSRVParameter(RHICmdList, ShaderRHI, ComputeShader->ScatterBuffer, ScatterBuffer.SRV);
    SetSRVParameter(RHICmdList, ShaderRHI, ComputeShader->UploadBuffer, UploadBuffer.SRV);
    SetUAVParameter(RHICmdList, ShaderRHI, ComputeShader->DstBuffer, DstBuffer.UAV);

    RHICmdList.DispatchComputeShader(FMath::DivideAndRoundUp<uint32>(NumScatters, FScatterCopyCS::ThreadGroupSize), 1, 1);

    SetUAVParameter(RHICmdList, ShaderRHI, ComputeShader->DstBuffer, FUnorderedAccessViewRHIRef());
}
// ByteBuffer.ush
 ...
uint NumScatters;
Buffer<float4> UploadBuffer;
Buffer<uint> ScatterBuffer;

[numthreads(THREADGROUP_SIZE, 1, 1)]
void ScatterCopyCS( uint3 DispatchThreadId : SV_DispatchThreadID )
{
    uint ScatterIndex = DispatchThreadId.x;

    if (ScatterIndex < NumScatters)
    {
        uint DestIndex = ScatterBuffer.Load(ScatterIndex);
        uint SrcIndex = ScatterIndex;
        DstBuffer[DestIndex] = UploadBuffer.Load(SrcIndex);
    }
}
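The scatter copy is easier to follow when replayed on the CPU. The sketch below mirrors the shader above one element at a time and also reproduces the thread group count computed for the dispatch; `Float4`, `ScatterCopy` and `NumThreadGroups` are names chosen for this illustration, not engine functions.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

struct Float4
{
    float X, Y, Z, W;
};

// CPU replay of the scatter: element i of the upload buffer is written to the
// destination slot named by element i of the scatter buffer.
void ScatterCopy(const std::vector<std::uint32_t>& ScatterBuffer,
                 const std::vector<Float4>& UploadBuffer,
                 std::vector<Float4>& DstBuffer)
{
    assert(ScatterBuffer.size() == UploadBuffer.size());
    for (std::size_t ScatterIndex = 0; ScatterIndex < ScatterBuffer.size(); ++ScatterIndex)
    {
        const std::uint32_t DestIndex = ScatterBuffer[ScatterIndex];
        assert(DestIndex < DstBuffer.size());
        DstBuffer[DestIndex] = UploadBuffer[ScatterIndex];
    }
}

// Same rounding as DivideAndRoundUp: enough groups to cover every scatter.
std::uint32_t NumThreadGroups(std::uint32_t NumScatters, std::uint32_t ThreadGroupSize)
{
    return (NumScatters + ThreadGroupSize - 1) / ThreadGroupSize;
}
```

Only the entries listed in the scatter buffer are touched, so the per-frame cost scales with the number of changed float4 entries rather than with the size of the whole destination buffer. The rounding up is also why the shader checks ScatterIndex against NumScatters: the last group may contain threads with nothing to copy.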

End:

Summary from: Mesh Auto-Instancing on Mobile

Auto-instancing on mobile mainly benefits projects that are heavily CPU-bound rather than GPU-bound. While it is unlikely that enabling auto-instancing will harm a GPU-bound project, you are less likely to see significant performance improvements from using it.

If a game is heavily memory-bound, it may be more beneficial to turn off r.Mobile.UseGPUSceneTexture and use the buffer instead, with the understanding that it will not work on Mali devices.

My thoughts:

1. The buffer path will not work on Mali devices, which is bad news for Huawei phones.

2. Updating the buffer every frame has a cost.

With a very large number of primitives, executing the compute shader also puts some pressure on the GPU.

This may be why the documentation says: "While it is unlikely that enabling auto-instancing will harm a GPU-bound project, you are less likely to see significant performance improvements from using it."

3. GPU Scene: if the buffer is not large enough, it may have to use other storage such as a texture buffer, which is more expensive.