tal (@tal@lemmy.today), edited

I hit this.

I had a 13900KF fail after a few months on an Asus Z790-based motherboard; I started seeing memory errors whenever more than one core was active (I checked this by disabling additional cores in grub during Linux boot). It worked fine prior to that. I believe that the failures were progressive to some degree: I initially saw only sporadic errors, then saw them increasingly frequently, until it wasn’t possible to even boot with multiple cores.

Memory testers didn’t trigger it. Doing builds with many cores did so consistently (building Cataclysm: Dark Days Ahead at -j32 was the first test case I could find that reliably failed on the failing processor, at some random point during the build, though never the same point). Starting up Stable Diffusion could also fail pretty consistently. I scripted up test cases using these to investigate downclocking memory and fiddling with other settings. Downclocking the memory may have helped a bit (I didn’t gather enough data to get solid figures), but by the end, even having it all the way down wasn’t sufficient to cleanly boot the system; you’d get errors just trying to mount the root filesystem. I tried different Linux kernels, including building my own out of the latest nightly code, and tried fiddling with the kernel preemption mode (on the off chance that it was a Linux bug triggered by multiple-core use).

I got a 14900KF to replace it and, assuming that the cause must have been overly aggressive motherboard defaults, made sure to turn off the default motherboard settings that Intel recommended against before inserting or ever using the chip. I had very hefty cooling on this. At first I thought it might be voltage drops, given the Stable Diffusion startup issues (maybe the GPU drawing power was a factor), or maybe even cooling (though temps seemed fine), but no: swapping the CPU made all the problems go away at first…and then it failed in the same way after a few months. I saw the same variety of problems as before, including the Linux kernel complaining about hardware bugs, memory errors, and kernel threads hanging, and the same progressive failures that got more frequent over time.
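For anyone wanting to gather the kind of data described above, here is a minimal sketch of a repeat-run harness. It is not the script actually used; the names `run_repeatedly`, `failure_rate`, and `RunResult` are made up for illustration, and the command you pass in (for example a `make -j32` build) is up to you. It just runs a command several times and records the exit code and duration of each run, so failure frequency can be compared between settings.

```python
import subprocess
import time
from dataclasses import dataclass


@dataclass
class RunResult:
    index: int        # which repetition this was (0-based)
    returncode: int   # exit status of the command; non-zero counts as a failure
    seconds: float    # wall-clock duration of the run


def run_repeatedly(cmd, runs):
    """Run `cmd` (a list of arguments) `runs` times and record each outcome.

    Hypothetical helper: output is discarded, only exit status and timing
    are kept.
    """
    results = []
    for i in range(runs):
        start = time.monotonic()
        proc = subprocess.run(
            cmd, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL
        )
        results.append(RunResult(i, proc.returncode, time.monotonic() - start))
    return results


def failure_rate(results):
    """Fraction of runs that exited non-zero (0.0 for an empty list)."""
    if not results:
        return 0.0
    failed = sum(1 for r in results if r.returncode != 0)
    return failed / len(results)
```

Something like `failure_rate(run_repeatedly(["make", "-j32"], 10))` would then give a rough figure per configuration, though a handful of runs is not enough for solid numbers when failures are sporadic.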

I never saw any problems with either CPU when running on a single core (maxcpus=1 passed to the Linux kernel), so at least I could get the system functional and stable, but obviously the performance was abysmal in that case. With even one additional core in use, the problems were present (unusably so, towards the end on each CPU).
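If you are logging test results across boots, it helps to record whether the kernel was actually started with the single-core limit. A small sketch, assuming a Linux system where the boot parameters are readable from `/proc/cmdline`; the function names `parse_maxcpus` and `read_cmdline` are hypothetical:

```python
def parse_maxcpus(cmdline):
    """Return the integer value of a maxcpus= token in a kernel command
    line string, or None if the token is absent or not a plain number."""
    for token in cmdline.split():
        if token.startswith("maxcpus="):
            value = token.split("=", 1)[1]
            if value.isdigit():
                return int(value)
    return None


def read_cmdline(path="/proc/cmdline"):
    """Read the running kernel's command line (Linux only)."""
    with open(path) as f:
        return f.read()
```

Calling `parse_maxcpus(read_cmdline())` on the running system returns the limit if one was set at boot, and `None` otherwise.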

Switched to an AMD motherboard and processor. Haven’t had any problems. I expect that I’ll continue using AMD processors moving forward unless they put some serious lemons out.

No change in DIMMs (and in fact, used the same DIMMs just fine with the AMD processor).

At least I know that I’m not just crazy and that a ton of other people are getting this too. And the fact that this guy has been running on a different chipset and has a large dataset running within safe specs does kind of rule out the motherboard being at fault – I didn’t try running a motherboard with another chipset and another CPU from that class. The guy did say that some CPUs in his dataset just don’t seem to experience the problems (I saw him say a “50% rate”), so maybe there’s some sort of problem with Intel’s manufacturing process rather than with their design, and whatever testing methodology they used didn’t deal well with that.

And the guy is very explicit that they saw progressive degradation too, and had tests with logged times for it. 14:50:

We have datacenter logs from where these systems first went online, and with these systems first going online six months ago, they would pass these specific tests. Re-running these specific tests on the exact same hardware, it will not pass. That’s wild.
