
33550336 ,
@33550336@lemmy.world avatar

quit vim

IzzyJ ,

This will feel extremely simple to some folks, but I was having a hell of a time getting Steam games running that had previously worked through Proton. I scoured the internet for solutions, tried installing proton-ge, and tested multiple versions. Eventually someone had the galaxy-brain idea to suggest installing WINE. For some reason, that fixed the problem real good.

MentalEdge ,
@MentalEdge@sopuli.xyz avatar

I manage a machine that runs both media transcodes and some video game servers.

The video game servers have to run in real-time, or very close to it. Otherwise players using them suffer noticeable lag.

Achieving this while an ffmpeg transcode was running was completely impossible, no matter what I did to limit ffmpeg’s use of CPU time. Even when I ran it at the lowest priority, it impacted the game server processes running at top priority. Even when I limited it to one thread, it was affecting things.

I couldn’t understand the problem. There was enough CPU time to go around to do both things, and the transcode wasn’t even time sensitive, while the game server was, so why couldn’t the Linux kernel just figure it out and schedule things in a way that made sense?

So, for the first time I read up on how computers actually handle processes, multi-tasking and CPU scheduling.

As ffmpeg is an application that uses ALL available CPU time until a task is done, I came to the conclusion that context switching was the culprit (CPU cores can only do one thing at a time; they just switch between tasks really fast, and that switching itself takes time). With zero processing headroom, the constant switching made the system fall behind on the video game processes. The scheduler wasn’t smart enough to keep a real-time process on schedule in the face of ffmpeg, which would occupy ALL available cycles.

I learned the solution was core pinning: manually assigning processes to specific cores of the CPU. I set ffmpeg to use only one core, since it doesn’t matter how fast it completes. And I set the game processes to use all but that one core, so they never end up queueing for CPU time on a core that doesn’t have the headroom to run their tasks within a reasonable time range.

This has completely solved the problem, as the game processes and ffmpeg no longer wait for CPU cycles in the same queue.
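
On Linux the same pinning can be done at launch with taskset -c 0 ffmpeg …, or from code with Python’s os.sched_setaffinity. A rough sketch of the idea (the PIDs here are made up for illustration, and you need sufficient permissions over the target processes):

```python
import os

# Hypothetical PIDs; in practice you would look these up (e.g. with pgrep).
FFMPEG_PID = 12345
GAME_SERVER_PID = 23456

total_cores = os.cpu_count()

# Pin the transcode to core 0 only; it doesn't matter how fast it finishes.
os.sched_setaffinity(FFMPEG_PID, {0})

# Pin the game server to every core *except* core 0, so it never
# queues for CPU time behind the transcode.
os.sched_setaffinity(GAME_SERVER_PID, set(range(1, total_cores)))

# Verify the masks took effect.
print(os.sched_getaffinity(FFMPEG_PID))       # {0}
print(os.sched_getaffinity(GAME_SERVER_PID))  # {1, 2, ..., total_cores - 1}
```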

Waffelson OP ,

This reminded me of how I used Process Lasso to restrict which processor cores certain programs could run on.

flambonkscious ,

Well that’s interesting… I’d have thought, possibly naively, that as long as a thread had work to do it would essentially behave like ffmpeg does?

Perhaps there’s something about the type of work though, that it’s very CPU-bound or something?

MentalEdge ,
@MentalEdge@sopuli.xyz avatar

I think the difference is simply that most processes only have a certain amount of work to get through in a given unit of time. As long as they can get enough CPU time, and get it soon enough after lining up for it, they can maintain real-time execution.

Very few workloads have that much to do for that long. But I would expect other similar workloads to present the same problem.

There is a useful stat which Linux tracks in addition to the simple CPU usage percentage. The “load average” is the average number of processes that are either running or queued up waiting for CPU time (on Linux it also counts tasks stuck in uninterruptible sleep, usually waiting on I/O).

As long as that number is lower than the number of available cores, it essentially means that whenever one process finishes a task, the next in line can get right on with its own.

If the load average is less than the number of cores available, that means the cores have idle time where they are essentially just waiting for a process to need them for something. Good for time-sensitive processes.

If the load average is above the number of cores, some processes have to sit through several turns of other processes before they can execute their own tasks. Interestingly, the load average can climb past this threshold well before the CPU hits 100% usage.

I found that I can let my system get up to a load average of about 1.5 times the number of available cores before it becomes noticeable when playing on one of the servers I run.

And whenever ffmpeg was running, the load average would spike to 10-20 times the number of cores. Not good.
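
If you want to watch this number yourself, it’s the first three values in /proc/loadavg (also shown by uptime), and one os.getloadavg() call away in Python. A small sketch of the rule of thumb described above (the 1.5x cutoff is just my own observed tolerance):

```python
import os

cores = os.cpu_count()
load1, load5, load15 = os.getloadavg()  # 1-, 5- and 15-minute load averages

print(f"1m load average: {load1:.2f} across {cores} cores")

# Headroom exists while the load average stays below the core count;
# in my case the servers tolerated up to roughly 1.5x before lagging.
if load1 < cores:
    print("cores have idle time; latency-sensitive work should be fine")
elif load1 < 1.5 * cores:
    print("queueing, but possibly not yet noticeable in-game")
else:
    print("processes are stacking up in the run queue; expect lag")
```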

flambonkscious ,

That makes complete sense - if you’ve got something ‘needy’, as soon as it’s queuing up, I imagine it snowballs, too…

10-20 times the core count is crazy, but I guess a lot of development effort has gone into parallelizing its execution, which of course works against your use case :)

MentalEdge , (edited )
@MentalEdge@sopuli.xyz avatar

Theoretically a load average could go as high as it likes; it’s essentially just the length of the task queue, after all.

Processes having to queue to get executed is no problem at all for lots of workloads. If you’re not running anything latency-sensitive, a huge load average isn’t a problem.

Also it’s not really a matter of parallelization. Like I mentioned, ffmpeg impacted other processes even when restricted to running in a single thread.

That’s because most other processes do their work in small chunks that complete within microseconds or milliseconds. Send a network request, parse some data, decode an image, poll a HID device, etc.

A transcode, meanwhile, can easily keep a CPU running full tilt for well over a second, working on just that one thing. Most processes show up and go “I need X amount of CPU time”, while ffmpeg shows up and goes “give me all available CPU time”, which is something the scheduler can’t actually quantify.

It’s like if someone showed up at a buffet and asked for all the food that no-one else is going to eat. How do you determine exactly how much that is, and thereby how much it is safe to give this person without giving away food someone else might’ve needed?

You don’t. Without CPU headroom it becomes very difficult for the task scheduler to maintain low system latency. It’ll do a pretty good job, but inevitably some CPU time that should have gone to other stuff will go to the process asking for as much as it can get.
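
You can watch this happen with a toy demo: a “latency-sensitive” loop that sleeps in 10 ms ticks and measures how late each wakeup is, first on an idle system, then with one busy-looping process per core eating all remaining CPU time. A sketch (exact numbers will vary by machine and scheduler):

```python
import multiprocessing as mp
import os
import time

def hog():
    """Burn CPU forever, like a transcode with no throttle."""
    while True:
        pass

def measure(label, seconds=3.0):
    """Sleep in 10 ms ticks and track the worst wakeup overshoot."""
    worst = 0.0
    end = time.monotonic() + seconds
    while time.monotonic() < end:
        t = time.monotonic()
        time.sleep(0.010)
        worst = max(worst, time.monotonic() - t - 0.010)
    print(f"{label}: worst wakeup overshoot {worst * 1000:.1f} ms")

if __name__ == "__main__":
    measure("idle system")
    # One hog per core: zero CPU headroom, just like a running transcode.
    hogs = [mp.Process(target=hog, daemon=True) for _ in range(os.cpu_count())]
    for p in hogs:
        p.start()
    measure("one hog per core")
```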

Maxxus ,

Maybe this goes a bit deeper than the question intended, but I’ve made and shared two patches that I had to apply locally for years before they were merged into the base packages.

The first was a patch in 2015 for SDL2 to stop the Sixaxis and other misbehaving controllers from using uninitialized axes and overwriting initialized ones. Merged in 2018.

The second was a patch in the spring of 2021 for Xft to stop it assuming that all glyphs in a monospaced font are the same size. Some fonts have ligatures, glyphs that represent multiple characters together, so those are actually some multiple of the base glyph size. Merged in the fall of 2022.

bane_killgrind ,

How dare you science in a kvetching discussion

GravitySpoiled ,

Grub.

Seriously. That was hard as shit, because I didn’t know what I was doing.

nul9o9 ,

I broke my bootloader fucking with UEFI settings. I was in a panic for a few hours because I hadn’t bothered to learn how that shit worked until then.

It sure was a relief when I got back into my system.

passepartout ,

Bricked my PC twice because of the bootloader and couldn’t repair it. From now on I just nuke my system if something is fucky and have a shell script handle installing packages etc.

Hyrulian ,
@Hyrulian@lemmy.world avatar

Around 2017 I spent three days on and off trying to diagnose why my laptop running elementary OS had no wifi support. I reinstalled the wifi drivers and everything, countless times. It worked for many days initially, then one day it just didn’t. Turns out I had accidentally flipped the hardware wifi toggle switch while the laptop was in my bag. I forgot it even had one. Womp womp.

Hawke ,

Womp womp.

I used to bullseye womp rats in my T-16 back home, they’re not much bigger than 2 meters.

passepartout ,

I had a friend come over to my place so I could fix her laptop’s wifi. After about an hour of searching for any setting in Windows that I could have missed, I coincidentally found a forum thread where someone pointed out that this could be due to a hardware wifi switch…

teft ,
@teft@lemmy.world avatar

I once exited vim without having to look up the commands.

z00s ,

Truly you are a god amongst men

flambonkscious ,

I suppose it’s statistically inevitable, I just didn’t think it would happen in my lifetime

reddithalation , (edited )

Single-GPU VM passthrough. It took a few days of troubleshooting, and I didn’t even try to make it undetectable by game anticheat; I hear that requires building your own kernel to get past some of the more advanced detection methods.

LordCrom ,

I used a GPS module to feed time data to a custom NTP service with no jitter. About as close as I could get to atomic clock precision for under $150.
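
For anyone wanting to try the same (LordCrom used a custom service; the common off-the-shelf recipe is gpsd feeding chrony or ntpd, with the module’s PPS pulse providing the actual precision): the NMEA text stream only gives you a coarse timestamp, parsed roughly like this sketch. The sentence below is the standard GPRMC example, not real data:

```python
from datetime import datetime, timezone

def nmea_utc(sentence: str) -> datetime:
    """Extract the UTC timestamp from an NMEA RMC sentence.

    Field 1 is hhmmss[.sss] UTC and field 9 is ddmmyy. The real
    precision comes from the PPS pulse, not this text, which arrives
    with serial-line delay.
    """
    fields = sentence.split(",")
    assert fields[0].endswith("RMC"), "expected an RMC sentence"
    hhmmss, ddmmyy = fields[1], fields[9]
    stamp = datetime.strptime(ddmmyy + hhmmss.split(".")[0], "%d%m%y%H%M%S")
    return stamp.replace(tzinfo=timezone.utc)

# The classic GPRMC example sentence from the NMEA documentation:
print(nmea_utc("$GPRMC,123519,A,4807.038,N,01131.000,E,022.4,084.4,230394,003.1,W*6A"))
# 1994-03-23 12:35:19+00:00
```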

DmMacniel ,

My session manager refused to start, and I was very close to reinstalling my system.

Waffelson OP ,

I had problems with the session manager too. My lightdm was broken and I tried everything to fix it: disabling, enabling, starting and stopping the service with systemctl, changing lightdm’s configuration, trying different lightdm greeters. But the problem wasn’t with lightdm, it was with Xorg. I don’t use Xorg anymore; now I use the terminal session manager “ly”, which works even without Xorg.

DumbAceDragon ,
@DumbAceDragon@sh.itjust.works avatar

Can’t think of the most difficult problem, but I have managed to solve a lot of problems with btrfs snapshots.

Nithanim ,

Jumping from the default kernel with ZFS to the xanmod kernel using a manually compiled version of ZFS. I don’t remember a whole lot, but it was quite… interesting. Next would be a suddenly vanished EFI partition, and my f* mainboard refusing to boot ZBM.

Bonus: my currently still unfixed problem is a very weird freezing/stuttering of the whole OS, and the only (useless) “lead” I have is: workqueue: fill_page_cache_func hogged CPU for >10000us 4 times, consider switching to WQ_UNBOUND

jack ,

Getting VR to work

cygon , (edited )

That might be it for me, too.

I run a distro with OpenRC instead of systemd, so I had to gain some understanding of udev permissions for USB devices and come up with my own udev rules for Steam because I couldn’t follow Valve’s setup guide.

DumbAceDragon ,
@DumbAceDragon@sh.itjust.works avatar

VR pretty much just worked for me with my Vive. I had some issues with weird stuttering and tearing, but I managed to find a solution in some config file.

AceFuzzLord , (edited )

I don’t know how I fixed it, but KDE Plasma 5.whatever on MX was acting up. It would let me log in, but I couldn’t do much else; it wouldn’t respond to my clicks or anything. Thankfully I could open Yakuake and install a different desktop environment. Then, one day while I was backing up files to do a reinstall, it started working again. I could use Plasma without issues. I have no clue what fixed it, though.

It also left one non-issue behind: my laptop no longer turns on automatically when I open the lid. But I’ll take that over having to reinstall and set everything back up.

sep ,

XFree86 was sometimes a mess. And I didn’t have a browser anymore when it refused to start, so it was man pages only.

I once rm -rf’d all the DB files of a running database. I recovered the files via their inodes, since they were all still held open by the running database. That was a mess.
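
For anyone who lands in the same spot: as long as the process still holds the deleted files open, their contents remain reachable under /proc/<pid>/fd. Roughly what that recovery looks like, as a sketch with a made-up PID and destination directory (for a live database you’d also want to quiesce writes first):

```python
import os
import shutil

DB_PID = 4242        # hypothetical PID of the still-running database
DEST = "/recovery"   # hypothetical destination, ideally on another disk

fd_dir = f"/proc/{DB_PID}/fd"
for fd in os.listdir(fd_dir):
    link = os.path.join(fd_dir, fd)
    target = os.readlink(link)
    # Deleted-but-open files show up as "/path/to/file (deleted)".
    if target.endswith(" (deleted)"):
        name = os.path.basename(target)[: -len(" (deleted)")]
        # Reading /proc/<pid>/fd/<n> streams the file's contents back out.
        shutil.copyfile(link, os.path.join(DEST, name))
        print(f"recovered {target} -> {os.path.join(DEST, name)}")
```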
