hoakley November 22, 2023 Macs, Technology

What has changed in CPU cores in M3 chips?

If you read the initial reviews of Apple’s new M3-based Macs, you’d be forgiven for thinking little had changed in their CPU cores, apart from a rejigging of numbers and an increase in the maximum frequency of their P cores. As my MacBook Pro 16-inch M3 Pro arrived three days early, this article presents a tentative first look at what has changed in their CPU cores, and from that, how you might choose the right chip for your next Apple silicon Mac. Like Apple, I’m going to make comparisons between M1 and M3 chips, as in most respects discussed here M2 CPU cores didn’t change much from those in the M1, and I’ve had and tested four different M1 models.

Cluster size

The most obvious difference between M1/M2 CPU cores and those in M3 chips is in the size of their clusters. In M1 and M2 chips, CPU cores are grouped into clusters of 2 or 4, within which cores share L2 cache and run at the same frequency. In M3 chips, clusters are composed of 4 cores (base M3) or 6 (certainly the M3 Pro, and I understand the M3 Max as well).

This change has an impact on chip selection.

macOS tries to allocate threads running at higher priorities, as set by their Quality of Service (QoS), to P cores whenever possible. This is what we want, as it ensures that the apps we’re running deliver best performance, albeit at higher power consumption. When the P cores are already fairly fully occupied, macOS may instead run high QoS threads on the E cores. While it has compensatory mechanisms for doing this (see below), it may mean that those threads run more slowly than we’d want.

If you already have an Apple silicon Mac and are wondering whether to upgrade to an M3 model, then you can use this as a way of working out which chip you’ll need. Load your current Mac up with the apps you normally use together when working, and watch their use in Activity Monitor’s CPU History window. If its P cores are fully occupied much of the time, and that workload often spills over to the E cores, then you should aim for an M3 with more P cores; if there’s always adequate spare capacity on the Mac’s P cores, then you probably wouldn’t get much added value from an M3 with more P cores.

This changed cluster size in M3 chips is significant, as it could have effects not only on performance, but also on power use. When running at full pelt, all six P cores in an M3 Pro cluster can use as much as 5.5 W, while six in an M1 Pro will use about 5.8 W.

E cores

From my preliminary measurements, E cores in an M3 Pro differ little from those in an M1 Pro, except for their frequency management, which is determined by macOS. M1 E cores have a maximum frequency of 2064 MHz, while those in M3 chips reach 2748 MHz. But, when running low QoS threads in the M1 Pro chip, E core frequency is set to 972 MHz, and that in the M3 Pro is 744 MHz, giving a ratio of 1.3 for M1/M3. Integer, floating point, NEON and Accelerate performance at those frequencies matches the difference in frequency, at 1.3-1.4. That means the M3’s E cores run background threads slightly more slowly than the M1 because macOS sets their frequency lower.

That isn’t true, though, when the E cores are being used to run high QoS threads that couldn’t be accommodated on P cores. Those are run at maximum frequency, which favours the M3 Pro by a factor of 1.3.

Replacing an M1 Pro with an M3 Pro thus slows background tasks, but accelerates high QoS tasks that have overflowed onto the E cores.
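
The arithmetic behind those two ratios is easy to reproduce. What follows is a minimal Python sketch that uses only the frequencies quoted above; the dictionary and function names are illustrative, not taken from any Apple tool.

```python
# E core frequencies in MHz, as quoted in this article.
E_CORE_MHZ = {
    "M1 Pro": {"low_qos": 972, "max": 2064},
    "M3 Pro": {"low_qos": 744, "max": 2748},
}


def frequency_ratio(numerator_mhz: int, denominator_mhz: int) -> float:
    """Return the ratio of two core frequencies, rounded to 2 decimal places."""
    return round(numerator_mhz / denominator_mhz, 2)


# Background (low QoS) threads: the M1 Pro runs its E cores faster.
background_m1_over_m3 = frequency_ratio(
    E_CORE_MHZ["M1 Pro"]["low_qos"], E_CORE_MHZ["M3 Pro"]["low_qos"]
)

# Overflowed high QoS threads run at maximum frequency: the M3 Pro is faster.
overflow_m3_over_m1 = frequency_ratio(
    E_CORE_MHZ["M3 Pro"]["max"], E_CORE_MHZ["M1 Pro"]["max"]
)

print(f"Low QoS, M1/M3: {background_m1_over_m3}")
print(f"High QoS overflow, M3/M1: {overflow_m3_over_m1}")
```

Both ratios round to about 1.3, but they point in opposite directions, which is why the same upgrade can slow background work while speeding up overflowed high QoS threads.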

P cores

There are greater differences between the P cores in an M1 Pro and those in an M3 Pro. M1 P cores have a maximum frequency of 3228 MHz, while M3 P cores run up to a maximum of 4056 MHz, a ratio of 1.26 in favour of the M3. A similar ratio is seen for integer and floating point performance, at 1.30 and 1.28 respectively, but vector performance using NEON or Apple’s Accelerate library is faster still on the M3 Pro P core, at ratios of 1.67 and 1.63.

This suggests that improved integer and floating point performance is largely (if not completely) the result of increased core frequency, but that there are likely to be further improvements in vector processing. Perhaps Apple has improved the design of the NEON unit in M3 P cores.
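
One rough way to separate the effect of frequency from any other improvement is to divide each measured performance ratio by the frequency ratio. The Python sketch below does that with the figures quoted above. It assumes that in-core performance scales linearly with frequency, which is a simplification, and its names are purely illustrative.

```python
# Maximum P core frequencies in MHz, as quoted in this article.
M1_P_MAX_MHZ = 3228
M3_P_MAX_MHZ = 4056

# Measured M3 Pro / M1 Pro performance ratios quoted above.
MEASURED_RATIOS = {
    "integer": 1.30,
    "floating point": 1.28,
    "NEON": 1.67,
    "Accelerate": 1.63,
}


def gain_beyond_frequency(measured_ratio: float, freq_ratio: float) -> float:
    """Performance ratio left over once the frequency ratio is divided out."""
    return measured_ratio / freq_ratio


p_frequency_ratio = M3_P_MAX_MHZ / M1_P_MAX_MHZ

residual_gain = {
    name: round(gain_beyond_frequency(ratio, p_frequency_ratio), 2)
    for name, ratio in MEASURED_RATIOS.items()
}

print(f"Frequency ratio: {p_frequency_ratio:.2f}")
for name, gain in residual_gain.items():
    print(f"{name}: {gain} beyond frequency")
```

Under that assumption, integer and floating point come out within a few percent of 1.0, while the two vector tests retain a factor of around 1.3, in keeping with a change in the vector hardware rather than clock speed alone.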

P v E

Aside from any improvement in vector processing in M3 P cores, M1 and M3 cores show different patterns of performance under load. These are perhaps clearest in the two charts below. Loads were applied using AsmAttic, which here runs tight loops of floating point arithmetic that remains in-core, accessing only registers and not memory. These charts show the time taken to complete one or more threads, each running 200 million loops of assembly code. Each thread is run as if on a single core at 100% active residency, i.e. it’s one core’s worth of performance, so 6 threads will fully load a 6-core P cluster.

[Chart: m1m3psummary]

This chart shows the total time to complete running all the threads, by the number of threads (effectively the number of cores), for an M1 Pro in red, and an M3 Pro in black. These threads were all run at maximum QoS (33), so were run preferentially on P cores. Those run on the 8 P cores in an M1 Pro (red) show a near-perfect linear relationship, with each thread fully occupying one core for a period of 1.3 seconds.

The lower black line shows equivalent results for the 6 P and 6 E cores in an M3 Pro. For 1-6 threads, these were all run on its P cores, then on an increasing number of its E cores as well. That is quite linear up to 6 threads, where the time taken is significantly less than that of the M1 Pro. By 6 threads, that difference is over 1 second; in the time the M1 Pro took to run 5 threads, the M3 Pro had almost completed 6.

From 6-8 threads, the two lines run in parallel, indicating that the M3 E cores were delivering similar performance to the P cores in the M1 Pro. You wouldn’t want to run more than 8 threads, though, on the 8P + 2E cores of the M1 Pro, as they would risk displacing background threads on the two E cores. On the M3 Pro, you can go safely up to a total of 10 threads, on 6 P and 4 E cores, without compromising background threads. Indeed, because the E cluster is running at maximum frequency, background tasks might even complete more quickly under that load.

[Chart: m1m3esummary]

Differences are reversed when running low QoS threads on the E cores, as shown here, again with the 2 E cores of the M1 Pro in red, and the 6 E cores of the M3 Pro in black.

The frequency of the M1 Pro E cores is increased when they’re running a second thread, which accounts for the small change in total time from 1-2 threads. However, with more than 2 threads, further threads are queued, and performance suffers as a result. The 6 E cores of the M3 Pro have three times the capacity for background threads, and although running them more slowly, they cope with up to 6 threads, beyond which those threads are queued, and the time required to complete them rises more rapidly.
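
The queueing effect can be illustrated with a deliberately crude model. The Python sketch below assumes that each E core runs one thread at a time, that all threads take the same time, and that frequency stays fixed, none of which is strictly true of macOS. It is a hypothetical illustration rather than a description of the scheduler.

```python
import math


def completion_waves(threads: int, cores: int) -> int:
    """Number of successive batches needed when each core runs one thread at a time."""
    if threads < 0 or cores < 1:
        raise ValueError("threads must be >= 0 and cores must be >= 1")
    return math.ceil(threads / cores)


# E cluster sizes quoted in this article.
E_CORES = {"M1 Pro": 2, "M3 Pro": 6}

for chip, cores in E_CORES.items():
    for threads in (2, 6, 8):
        waves = completion_waves(threads, cores)
        print(f"{chip}: {threads} low QoS threads on {cores} E cores -> {waves} wave(s)")
```

In this model, 6 background threads need three waves on the 2 E cores of the M1 Pro but only one on the 6 E cores of the M3 Pro, and a second wave starts on the M3 Pro only once there are more than 6 threads.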

CPU History

The most accessible window you have on core load and performance is CPU History in Activity Monitor. Although it can cast light on the use of different types of core, and help you decide whether your next Mac needs more cores, it’s also seriously misleading, as shown in the screenshot below.

[Screenshot: m1m3cpuhist]

This shows what happened during two tests using my app AsmAttic: in the first, responsible for the large blocks of green in the E cores, I ran a load of 6 threads at low QoS; in the second, reflected in the much narrower blocks for the P cores below, I ran the same load of 6 threads on the P cores. When the E cores were fully loaded, their frequency was 744 MHz, which is only a little above their idle frequency, but when the P cores were fully loaded, they were running close to their maximum, at just under 4000 MHz. This persistent failure in Activity Monitor to take core frequency into account gives seriously misleading impressions.
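
One simple way to allow for frequency would be to scale active residency by the ratio of current to maximum core frequency. The Python sketch below shows that adjustment using figures from this article. It isn't how Activity Monitor works, the function name is hypothetical, and 4000 MHz is used as a round approximation for "just under 4000 MHz".

```python
def frequency_adjusted_load(
    active_residency_pct: float, frequency_mhz: float, max_frequency_mhz: float
) -> float:
    """Scale active residency by how close the core is running to its maximum frequency."""
    if max_frequency_mhz <= 0:
        raise ValueError("max_frequency_mhz must be positive")
    return active_residency_pct * frequency_mhz / max_frequency_mhz


# Maximum frequencies in MHz for the M3 Pro, as quoted in this article.
M3_PRO_E_MAX_MHZ = 2748
M3_PRO_P_MAX_MHZ = 4056

# E cores fully loaded with low QoS threads at 744 MHz.
e_load = frequency_adjusted_load(100.0, 744, M3_PRO_E_MAX_MHZ)

# P cores fully loaded at roughly 4000 MHz (approximation, see above).
p_load = frequency_adjusted_load(100.0, 4000, M3_PRO_P_MAX_MHZ)

print(f"E cores: {e_load:.1f}% of maximum throughput")
print(f"P cores: {p_load:.1f}% of maximum throughput")
```

Both tests appear as 100% in CPU History, yet on this measure the E cores are delivering little more than a quarter of what they could, while the P cores are close to their limit.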

Summary

There’s much more to comparing CPU cores than multi-core benchmarks.
If you already have an Apple silicon Mac, observe patterns of use of P and E cores during normal use to determine whether you need a Mac with more cores.
CPU core cluster size has changed in M3 chips, from 2-4 to 4-6, which is likely to have extensive effects on performance and power use.
M3 E cores appear similar to those in the M1, but have a higher maximum frequency, and are run at lower frequency for background tasks.
M3 P cores appear to have improved performance in the vector (NEON) unit, and have a higher maximum frequency.
Increased E core count increases the capacity to accommodate overflow of high QoS threads from P cores.
macOS core management has also changed.

I will post further analyses of the M3 Pro chip’s CPU performance as I assess the data.

43 Comments


1

Enzo Vincenzo on November 22, 2023 at 8:21 am

Thank you! You were very helpful to me! Maybe I’ll buy a new iMac M3 and this time I’ll opt for a BTO purchase on the Apple Store in basic configuration, but with 16 GB of RAM and 2 TB of SSD.
Considering the power of any M3 Mac, I prefer to save by choosing the cheapest model, but immediately equipped with RAM and a large SSD whose choice outweighs the advantages of having milliseconds or seconds of difference, depending on the applications.
Obviously I’m speaking from my point of view and based on my needs. We’ve reached such a rapid level now that I think it’s hard to notice any difference unless you’re making heavy video or 3D applications.
After all, my late 2013 27″ iMac with i7 CPU (maximum RAM and video card upgrade during BTO purchase, and to which I added a fast 2TB SSD) has become so fast and powerful, thanks to OCLP and Sonoma, that it gives Geekbench results similar to the Catalina-supported 2020 iMac.

OT note: I would like to point out that with Sonoma and OCLP 1.2.1 Continuity Camera still doesn’t work and there are some minor bugs. But with Ventura everything works (on the system and with every application) just as if my iMac were an officially supported Mac. I still remain in Sonoma, waiting for Team OCLP to fix everything, little by little, like they did with Ventura back then.
If that’s not possible, I will reinstall Ventura to continue to enjoy the iMac, which is perfect, fast and high-performance, but which Apple proposes TO SCRAP if I try to include it in a possible trade-in proposal.

- 2
  
  hoakley on November 22, 2023 at 9:33 am
  
  Apple will recycle models traded in if they’re unable to resell them – it’s the only alternative. And there’s no market for Apple to sell reconditioned models that are ten years old, no matter how well they might still work. Neither do they try to stock replacement parts for such old hardware, so even its logic board isn’t going to be of any use. Sorry, that’s the way it is – if you want your Mac to go to a good home, then sell it privately, or give it to a charity that re-uses old computers.
  I think you’ll find an M3 iMac blows your socks off!
  Howard.
  
  - 3
    
    Enzo Vincenzo on November 22, 2023 at 1:40 pm
    
    I agree with you, Howard, in justifying Apple if it doesn’t (rightly) accept trade-ins of Macs that are more than a few years old.
    But I don’t agree with Apple’s invitation to scrap my Mac. For office use, and also for use in a doctor’s office, my Mac is in fact fast and snappy beyond any need. Even viewing X-rays and CT scans in 3D is fluid and fast, has no lag, and does not make you want to replace the Mac.
    Scrapping it, therefore, goes against the Planet… Even more so if you consider that thanks to OCLP and Ventura or Sonoma the Mac accepts security updates…
    So, if I buy the new iMac M3, I will do it to use it at home.
    
    - 4
      
      hoakley on November 22, 2023 at 5:32 pm
      
      I think that’s an excellent way to reuse it. Apple doesn’t scrap old Macs – they go for recycling, and these days almost everything in a Mac is fully recyclable. So it goes on to make more new Macs.
      Howard.
      
5

tomsax on November 22, 2023 at 12:13 pm

I can confirm that the 16-core M3 Max has one E-Cluster of 4 cores and 2 P-Clusters of 6 cores each.

- 6
  
  hoakley on November 22, 2023 at 12:39 pm
  
  Thank you very much. There’s just the Ultra to go then, when it’s released.
  Howard.
  
7

Monica Benneton-Smith on November 22, 2023 at 3:11 pm

Apple graciously gives us miraculously powerful hardware, yet time and time again we see it held back by horribly inefficient software.

In hindsight, Swift was a big mistake.

Apple should have gone with Rust instead.

Not only is Rust blazingly fast and memory-efficient, but it also has no runtime or garbage collector, and its rich type system and ownership model guarantee memory-safety and thread-safety, according to its web site.

Like Firefox proved to the world, it is also possible to gradually rewrite portions of existing C and C++ software using Rust.

It would have taken several years to do, obviously, but macOS, Cocoa, and the rest of Apple’s software should have gradually been rewritten using Rust.

Thanks to both the hardware and the software getting faster each year, Apple’s computers would then continually be getting even more powerful and faster than they already are.

Rust would have been a software performance multiplier in addition to the M* hardware performance multiplier.

- 8
  
  hoakley on November 22, 2023 at 5:36 pm
  
  Oh no – you’re not going to turn this into a religious debate about programming languages. For a start, it has absolutely nothing to do with this article. And, as you should well know, it’s not the language, but the compiler and the person coding with that language that determine the efficiency of the code run.
  And you’re welcome to code in Rust for macOS if that’s what you want.
  Now, can we please return to the topic of this article? Thank you for your understanding.
  Howard.
  (Over 35 years coding in almost everything from assembly language to APL.)
  
9

Manuel Garcia on November 22, 2023 at 4:47 pm

I received my 14″ MBPro with M3 Pro early (last Thursday), too. Interestingly, according to results from Geekbench 6, the CPU frequency was not 4.05 GHz; it was ~3.7 GHz (I don’t recall the exact frequency). Is the frequency not always set the same, or was it a problem with my hardware or software (Geekbench)? I don’t know and didn’t try to find out. It was a custom configuration (36 instead of 18 GB of memory), and I ended up just returning it and getting an M3 Max with 36 GB (14″ again), not because I necessarily needed the Max, but rather just the 36 GB of memory, and the M3 Max model was the only one in stock at the Apple Store with that amount of memory. Anyway, thanks for all your great columns and advice. I also really enjoyed reading your Don Quixote summaries and paintings.

- 10
  
  hoakley on November 22, 2023 at 5:42 pm
  
  Thank you.
  I don’t know where Geekbench got that figure from, but I have taken it from powermetrics, running assembly code. macOS has great control over the frequency of both P and E cores. When P cores are given a substantial load at high QoS, they will normally be run at maximum frequency as much as possible, as that’s what delivers the performance. The code used in my tests here is written in assembly language so that it runs tight loops in each core, with only register access, not memory. It’s carefully designed to measure maximum throughput in the core itself.
  I hope that you enjoy your M3 Max!
  Howard.
  
11

Simon on November 22, 2023 at 8:05 pm

A very nice comparison, Howard. Thank you.

If money is not an issue (pardon this somewhat academic question), would you expect any negative impact from over-spec’ing a MBP? Can we trust that the variable clock and macOS’ excellent management of the Mx cores will ensure that, even if we have purchased too many P cores for our typical workloads, the system will still run cool and with long battery life? Or can you see cases where people would indeed get better battery life and cooler systems just because they chose a lighter spec’ed system?

- 12
  
  hoakley on November 22, 2023 at 8:16 pm
  
  Thank you.
  When not in use, a cluster is effectively shut down, either at 0 MHz or idle frequency, and draws almost no power. So if your M3 Max’s second cluster of P cores never do anything except idle, they won’t cost it any battery endurance.
  Where the M3 Pro might profit is when threads overflow its single cluster of P cores. With no second P cluster to light up, the additional threads will then be run on the E cluster, not quite so fast, but with significantly lower power use. On the M3 Max, the second P cluster would light up, and probably drain the battery slightly faster.
  Power and energy calculations are quite nuanced, so I want to look more carefully at the data before saying whether that would be significant, but it appears likely.
  This is but the start of a long journey of discovery.
  Howard.
  
  - 13
    
    Jozef Remen on December 1, 2023 at 6:33 am
    
    As a Web developer, I’m running MySQL and reloading Safari/Chrome quite often. The MySQL DB is massive (14 GB) and when reloading the browser on an M1 Air, I can see all efficiency cores used to the max (albeit without frequency info), while only 2 performance cores are somewhat used (probably read operations on MySQL). Would that mean that an M3 Pro would bring me more benefit than an M3 Max, with faster browser reloading, which takes about 15 s now?
    The project is made using React, and compilation times (compiling is also something I do hundreds of times per day) have dropped from 22 s to 10 s using Node 18/19.x instead of 16.x. It’s compiled using esbuild, which is multicore (written in Go), but performance cores are used at just half of their max.
    As said, most of the time is taken by reloading the browser and contacting the local MySQL DB, utilizing all efficiency cores to the maximum.
    Would 6 efficiency cores help me?
    
    - 14
      
      hoakley on December 1, 2023 at 10:04 am
      
      With an extra two E cores, an M3 Pro could well improve the speed. However, it’s worth discovering why that’s being run on E cores in the first place. Is it a task that’s designated to run at low QoS in the background? Changing that to a higher QoS would run it on the P cores when available.
      One reason for the P cores only being used at half their maximum is a limit to the number of threads in the code. If that can be increased, then it should be able to use P cores faster. But other factors such as memory or disk access may also be limiting factors. Without knowing more about what’s limiting its performance, it’s hard to know whether an M3 Pro or Max would result in major improvements.
      And Chrome may be another story altogether, of course.
      You might find it best to try this out on an M3 Pro or Max before making a decision.
      Howard.
      
    - 15
      
      Jozef Remen on December 1, 2023 at 10:29 am
      
      Well, I have found that the problem when running on E cores was twofold.
      I had an old MAMP running under Rosetta, but also, as we have to stay on PHP 8.0.x, I had one taken from Homebrew, which was also running as x86 from my previous Intel Mac (as it was migrated). So, updating MAMP to ARM and also using PHP 8.0 from Homebrew, now getting packages from the arm64 repo, solved the problems. Reloading is now fast, and compilation in React too (using Node 19, as 16 has slow esbuild).
      I had not realized that all these parts were just moved from the previous installation as x86 compatible.
      Basically, I now need an M3 Pro/Max just to be able to connect an ultrawide 5k2k monitor, as the basic M1/M2/M3 can generate only a 6k buffer, so the scaled resolution would be small…
      
      Now to move the rest of homebrew utilities to arm… that is going to be funny friday again 😀
      
    - 16
      
      hoakley on December 1, 2023 at 12:39 pm
      
      Yes, ensuring all your code is running native rather than in Rosetta translation is also a major factor. While Rosetta does well, it’s nowhere near as good as an Arm port.
      Howard.
      
17

Simon on November 22, 2023 at 8:12 pm

> Although it can cast light on the use of different types of core, and
> help you decide whether your next Mac needs more cores, it’s also
> seriously misleading, as shown in the screenshot below.
> …
> This persistent failure in Activity Monitor to take core frequency into
> account gives seriously misleading impressions.

I do not disagree with this statement at all. But I am curious to hear more.

In this case specifically there is an indication, right? You know the task you have initiated is identical in both cases, so the integral needs to be the same. Is not the fact that 6 E cores show a much broader block than the thin stripes for 6 P cores already an indication that the clock for the P cluster must be running much higher than for the E cluster?

- 18
  
  hoakley on November 22, 2023 at 8:25 pm
  
  As I have established before, what Activity Monitor’s CPU History window shows is percent active residency. Not only does that fail to take into account differences between P and E cores, but it misleads within the same core.
  On an M3 Pro E core, low QoS threads will be run at 744 MHz, and quite possibly attain 100% active residency. That’s around a third of the total core throughput capacity, not 100%. When that E core is running high QoS threads that have overflowed from the P cores, it might have a lower active residency of 50%, but run at a frequency of 2748 MHz. But there, CPU History will show its load at 50% even though its throughput is far greater than at low QoS.
  I have long argued that the figures shown in Activity Monitor should be expressed as percentages of full active residency at maximum core frequency, which would allow much better semi-quantitative comparisons. With all the power of Apple silicon, you might have thought that making that simple arithmetic adjustment would have been straightforward.
  Howard.
  
19

Maynard Handley on November 23, 2023 at 7:18 am

Hi Howard, great work as usual!
I’m especially curious about the AMX capabilities.
Can you run
(a) some SGEMM type tests, to see if we get one AMX unit in the P cluster, or two?
(b) some FFT type tests, to see how much more performant the AMX unit now is when acting as a vector rather than a matrix unit?

Thanks!

- 20
  
  hoakley on November 23, 2023 at 5:27 pm
  
  Thank you – what an excellent idea. I presume the best (perhaps only) way to run these is using the Accelerate library. While I don’t have any problem finding a choice of suitable FFT-type functions, was there anything specific you’d suggest for SGEMM type tests?
  Howard.
  
  - 21
    
    Maynard Handley on November 24, 2023 at 12:23 am
    
    Yes, I was assuming just use basic SGEMM and FFT from Accelerate. The immediate questions of interest are as I suggested, nothing more than that, so just a basic comparison with M1 should answer most questions.
    
    One of the interesting developments in AMX is that it seems to be moving towards not just a matrix processor but a replacement for much AVX512 functionality (not permute, but FP compute). To this end there have been multiple tweaks (in the patent record) to
    – transport more instructions per cycle from a single core to AMX
    – perform multiple vector instructions per cycle on the AMX core (in principle you could do 8 of these per cycle. I think M1/M2 can do 2 per cycle so there is room to grow!)
    – not yet OoO functionality, but steps in that direction including a neat tweak that is kinda/sorta speculative and has the net result of “pre-loading” register values so eliminating load latency.
    
    The question then, of course, is how much of this stuff has yet been *implemented*?
    
    My hope is that SGEMM numbers tell us whether there is one vs two AMX units in the single P-cluster (I could argue for either choice as being sensible!)
    And that the FFT numbers (which I believe are most to all vector, not matrix, instructions) will tell us just how much additional vector work has moved to AMX/how much more performant AMX vector handling now is.
    
- 22
  
  hoakley on November 23, 2023 at 11:10 pm
  
  I’ve now got a Cholesky Decomposition and an FFT up and running, using Accelerate routines. Results from the former are strongly suggestive that they’re not running on the CPU core, but on a singleton processor. There are quite large differences between the M1 Max and the M3 Pro, which is far faster. More later.
  Howard.
  
23

Kris on November 24, 2023 at 3:33 pm

hi
you wrote. M3 P cores appear to have improved performance in the vector (NEON) unit by 1,63 % or so vs M1. And what % vs M2 please ?

- 24
  
  hoakley on November 24, 2023 at 3:34 pm
  
  Send me an M2 Pro and I’ll tell you.
  Howard
  
  - 25
    
    Christophe Lapierre on November 24, 2023 at 3:49 pm
    
    Will do. Joking aside, it would be useful for choosing between M3 and M2 Pro, since my software of choice, Corel Painter and Escape Motion Rebelle, benefits from NEON vector units. M3 seems appealing, and thank you for the information provided.
    
    - 26
      
      hoakley on November 24, 2023 at 3:52 pm
      
      No problem, but I can’t test what I don’t have, and I chose not to upgrade to an M2 because of the relatively small differences in performance. I don’t think you would see much difference either.
      Howard
      
- 27
  
  hoakley on November 24, 2023 at 5:35 pm
  
  Oh – and it’s not 1.63%. It’s a factor of 1.63, in other words M3 NEON delivers a performance of 163% compared to M1. That’s a lot more than the more general 130% or so.
  Howard.
  
  - 28
    
    Christophe Lapierre on November 24, 2023 at 6:07 pm
    
    Oops. Yes, that is what I understood, and what interests me. I mixed up the numbers in typing (x 1.63 or 163%, of course). You can tell I wasn’t good at maths, right? My bad. So thank you for the correction. My question remains whether that type of gain was linear from M1 to M2 to M3, or perhaps only a jump with the M3. We will never know, I presume. Not a big deal. An M3 Pro with enough memory / storage is out of my budget, and I will perhaps buy a second-hand M1 Max Mac Studio with 64 GB RAM and 500 GB internal storage, plus external storage if needed. Still a nice jump from my late 2013 i7 laptop, which is still useful for web browsing and office work.
    
    - 29
      
      hoakley on November 24, 2023 at 10:47 pm
      
      I doubt whether the NEON improvements are linear. They look to me like an improvement in hardware design, but whether that occurred in the M2 or M3, I don’t know. My guess would be that it’s recent, and in the step M2 to M3. But the big step here for you is going from Intel to Apple silicon – that’s the biggest gain of all.
      Howard.
      
    - 30
      
      Christophe Lapierre on November 25, 2023 at 12:26 pm
      
      Thank you for your insight.
      
31

Daniel on November 26, 2023 at 4:48 am

Congrats on your new 16″ MacBook Pro M3 Pro! I ordered the same a week ago, and it’s on the plane from Shanghai, due to arrive here in a couple of days. Is there anything you do to test the CPU, GPU, RAM, and other components for defects and rare intermittent errors? I remember years ago using Memtest86 to test newly-bought RAM modules. I assume Apple usually tests everything thoroughly in the factory before shipping, but some people reported surprisingly poor quality control earlier this year (defective screens, defective trackpads).
By the way, your website is my favourite place for info on Macs and macOS. And also my favourite place for info on painting. Thanks for all the excellent work on both topics!
– Daniel

- 32
  
  hoakley on November 26, 2023 at 7:40 am
  
  Thank you. Congratulations – I hope that you enjoy it when it arrives.
  No, I don’t try to do my own QA. I’m not sure who “some people” are, but that’s not my experience at all. Any perceived defects should quickly become obvious anyway. However, I do run CPU core performance tests and SSD benchmarks as a matter of routine to determine how my Apple silicon Macs perform, as you have read in the above article. I suppose you could run something like Geekbench just to confirm that you’re getting similar performance to others. But that doesn’t measure the maximum transfer rate of each of its TB ports, for instance. Life is too short to spend on such tests unless they have a real purpose.
  Howard.
  
33

idinfopedia on November 28, 2023 at 2:50 am

nice article

- 34
  
  hoakley on November 28, 2023 at 8:07 pm
  
  Thank you.
  Howard.
  
35

Jinwook Jeong on November 30, 2023 at 4:08 am

Thank you for a nice article. I have a question. Is there any automated way of analyzing CPU utilization patterns? It’s infeasible to stare at the CPU History window, which has a limited time range, and I think it is more reliable to decide based on daily and monthly statistics.

- 36
  
  hoakley on November 30, 2023 at 9:52 am
  
  Thank you.
  I don’t know of any automated way. I suppose you could script powermetrics to take periodic samples, but its output isn’t formatted to be easily ingested into any form of automatic analysis. I do all my analyses manually, which gives a good feel for the data, but is extremely time-consuming and laborious.
  Unfortunately, powermetrics seems the only accurate tool for doing this, and it requires root privileges too. Xcode’s Instruments can also do it, but again isn’t designed for this purpose.
  One day I’ll maybe get round to writing a more convenient app.
  Howard.
  
37

Michele Galvagno on February 16, 2024 at 6:44 pm

I suppose my M3 Max with 14 cores (10P 4E) has one 6-cluster and one 4-cluster.
How can I confirm that? In System Information I’ve only found the 14 cores info.
Maybe this machine is less energy efficient than the M3 Pro but its battery life is just insane!
Then, most music-making apps make poor use of E-cores, making the Pro variant a poor choice (for the money spent, not per se).

- 38
  
  hoakley on February 16, 2024 at 7:16 pm
  
  That depends which P cores have been disabled. It could be 6+4 or 5+5. You may get a better idea using the CPU History window in Activity Monitor. If not, then you can find out in Terminal using powermetrics. Let me know if you want to do that and I’ll cook up a set of command options for you to use.
  Howard.
  
  - 39
    
    Michele Galvagno on February 16, 2024 at 7:37 pm
    
    That would be great if you could.
    
    - 40
      
      hoakley on February 16, 2024 at 7:44 pm
      
      I’ve been using it this afternoon, but the options are so complex! Try
      sudo powermetrics -i 10 -n 1 --samplers cpu_power > ~/Documents/apmtest1.text
      where the -- before samplers is two hyphens (not a long dash), and substitute your own filename for apmtest1.text, which you can then read to discover the cores in each cluster, and their power use at that moment in time.
      Happy hunting!
      Howard.
      
    - 41
      
      Michele Galvagno on February 17, 2024 at 7:26 pm
      
      It’s a 5+5P and 4E cluster arrangement.
      If you want to see the results, I can send you the TXT via email.
      
    - 42
      
      hoakley on February 17, 2024 at 7:27 pm
      
      Well done – I believe you!
      That seems a better balance.
      Howard
      
- 43
  
  hoakley on February 16, 2024 at 7:23 pm
  
  Sorry – I forgot to address the question of the Pro and energy efficiency.
  A Max is pretty well as energy-efficient as a Pro so long as it runs all its user (high QoS) threads on just the first P cluster, so the second cluster sits idling all the time. That depends on how you use it, and how the apps you run concurrently use threading.
  Where the Pro comes into its own is when the threads overflow the first cluster of P cores. In a Max, they’ll then use the second cluster; in a Pro, they’ll use E cores instead, resulting in the use of considerably less power, and lower energy use. Remember that high QoS (user) threads aren’t confined to the P cores: when none is available, they’ll be allocated to the E cores instead. It’s the low QoS background threads that aren’t (generally) run on P cores, but have to wait for an E core to become available. As the Pro has a full cluster of E cores, compared to the Max’s 4, that enables the Pro to cope with higher background task loads.
  Howard.
  