Mali-G78 is the 2021 GPU

24 cores, 25 percent perf up

Arm Mali GPUs are the number one shipping graphics processors, covering gaming, XR, AI, and entertainment. Arm shared its updated GPU roadmap with us and introduced Mali-G78.

Mali GPUs are used in about 85 percent of smart TVs and 50 percent of smartphones. Over a billion Mali GPUs shipped in 2019 alone. The new Mali-G78 can help Arm deliver PC- and console-like graphics, along with better XR and machine learning performance than previous generations.

25 percent better performance

On mixed, complex workloads, and counting architectural, process, and other improvements, Mali-G78 delivers about 25 percent better performance than Mali-G77. Arm compared the projected performance of Mali-G78 with existing 2019 devices. We had a long and pleasant chat with Stephen Barton, product manager, Client Line of Business; Daniel Kerry, principal engineer, Central Engineering; and Ian Hutchinson, director of outbound marketing, Client Line of Business, all at Arm.

When we asked Arm to elaborate on where the 25 percent performance improvement comes from, we learned that some 15 percent comes from shrinking the manufacturing process from the current 7nm to the soon-to-be-introduced 5nm. The rest comes from internal optimizations. The 5nm silicon will also bring a smaller die size.
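The article does not say whether the process and design gains add up or compound, so the size of "the rest" depends on the assumption. The short Python sketch below works out both readings; the function name `remaining_uplift_pct` is ours for illustration and is not an Arm figure or tool.

```python
def remaining_uplift_pct(total_pct: float, known_pct: float) -> float:
    """Uplift left over once a known gain is factored out of a total gain.

    Assumes the gains compound (multiply), which is an assumption of this
    sketch rather than something Arm stated.
    """
    total = 1 + total_pct / 100
    known = 1 + known_pct / 100
    return (total / known - 1) * 100


# Reading 1: gains simply add up (25 = 15 + rest).
additive_rest = 25 - 15

# Reading 2: gains compound (1.25 = 1.15 * rest).
compound_rest = remaining_uplift_pct(25, 15)

print(f"additive reading:  {additive_rest:.1f} percent from optimizations")
print(f"compound reading:  {compound_rest:.1f} percent from optimizations")
```

Either way, the internal optimizations account for roughly nine to ten percent of the total.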

Mali-G78 brings 15 percent more performance density, 10 percent better energy efficiency, and 15 percent machine learning performance uplift.


18- and 24-core versions

It supports up to 24 cores, allowing the highest-ever performance point. The ‘game-changing’ Asynchronous Top Level maximizes performance productivity on the cores. The new Fused Multiply-Add (FMA) unit was built from the ground up, as it is heavily used in graphics and ML processing, and the redesign results in a 30 percent energy reduction in the unit.

The Mali-G78 is Arm’s highest-performing GPU based on the Valhall architecture. Mali-G77 was also based on Valhall, with its focus on a superscalar engine, unified memory, and a simplified scalar ISA. One big change from G77 to G78 is the maximum core count, which increased from 16 to 24.


Asynchronous Top Level introduces two asynchronous clock domains: one for the shader cores and one for the job manager, tiler, MMU, control fabric, and L2 cache. With Asynchronous Top Level, the shaders can run two times faster than the rest of the GPU, allowing higher performance.
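To make the two-domain idea concrete, here is a toy Python model. It is only a sketch: the class name `GpuClocks`, the example frequencies, and the reading of "two times faster" as an upper bound on the shader-to-top-level ratio are all our assumptions, not details from Arm.

```python
from dataclasses import dataclass

# Assumption: the article's "two times faster" is treated as a maximum ratio.
MAX_SHADER_RATIO = 2.0


@dataclass(frozen=True)
class GpuClocks:
    """Two independently clocked domains (hypothetical model)."""

    top_level_mhz: float  # job manager, tiler, MMU, control fabric, L2 cache
    shader_mhz: float     # shader cores

    def __post_init__(self) -> None:
        if self.top_level_mhz <= 0 or self.shader_mhz <= 0:
            raise ValueError("clock frequencies must be positive")
        if self.shader_mhz > MAX_SHADER_RATIO * self.top_level_mhz:
            raise ValueError("shader clock exceeds 2x the top-level clock")

    @property
    def ratio(self) -> float:
        """How much faster the shader domain runs than the top level."""
        return self.shader_mhz / self.top_level_mhz


# Example with made-up frequencies: shaders at 800 MHz, top level at 500 MHz.
clocks = GpuClocks(top_level_mhz=500.0, shader_mhz=800.0)
print(f"shader cores run at {clocks.ratio:.2f}x the top-level clock")
```

The point of the model is only that the shader clock is a separate knob: it can be raised without dragging the tiler, MMU, and L2 cache along with it.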

Arm showed that the 24-core version of the GPU scores thirteen percent higher in benchmarks than the 18-core version. Once Arm uses Asynchronous Top Level, the performance of the 18-core version increases by an additional eight percent, and a whopping 23 percent over nominal 18-core performance.

Gaming performance

In gaming applications, the 24-core version scores eleven percent higher than the 18-core version, while the 18-core version with Asynchronous Top Level scores 14 percent higher. The 24-core Mali-G78 with Asynchronous Top Level scores 28 percent higher.
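The gaming figures can be lined up to see what each change contributes on its own. The Python sketch below assumes all three percentages are measured against the same nominal 18-core baseline, which the article implies but does not state outright; the table and function names are ours.

```python
# Gaming scores relative to the nominal 18-core configuration (= 1.00).
# Assumption: all three percentages in the text share that one baseline.
# Keys are (core count, Asynchronous Top Level enabled).
GAMING_SCORES = {
    (18, False): 1.00,
    (24, False): 1.11,
    (18, True): 1.14,
    (24, True): 1.28,
}


def async_gain_pct(cores: int) -> float:
    """Percent gain from enabling Asynchronous Top Level at a fixed core count."""
    return (GAMING_SCORES[(cores, True)] / GAMING_SCORES[(cores, False)] - 1) * 100


def core_gain_pct(async_on: bool) -> float:
    """Percent gain from going from 18 to 24 cores at a fixed async setting."""
    return (GAMING_SCORES[(24, async_on)] / GAMING_SCORES[(18, async_on)] - 1) * 100


for cores in (18, 24):
    print(f"{cores} cores: async adds {async_gain_pct(cores):.1f} percent")
for async_on in (False, True):
    label = "with" if async_on else "without"
    print(f"{label} async: 24 cores add {core_gain_pct(async_on):.1f} percent")
```

Under that assumption, enabling Asynchronous Top Level is worth about as much as the extra six cores, and slightly more in these gaming numbers.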

As in any mobile processor, Asynchronous Top Level raises the clock when needed to get the job done and hit the right frame rate, then returns to a standard clock for sustainable performance and energy consumption.

Average energy goes down by 10 percent in similar conditions when comparing G78 to last year’s G77.


Using Asynchronous Top Level can reduce energy consumption by six to thirteen percent.

The biggest benefits come in complex gaming scenes involving smoke, grass, and trees. Optimizing content can yield a five to seventeen percent performance increase in actual games over Mali-G77.

Arm’s Performance Advisor tool helps developers achieve higher performance on Arm hardware. Frame analysis lets developers easily understand bottlenecks. Last but not least, the new Mali-G78 GPU brings fifteen percent higher machine learning performance.

Machine learning on the GPU covers a variety of mobile use cases, including security (e.g., face unlock), video and camera modes, gaming, and augmented reality (AR). These workloads can also run in collaboration with an NPU.

Asynchronous Top Level boosts ML performance by clocking the shader cores higher. One can expect Mali-G78-based phones next year.