I’m familiar enough with Linux but never used an immutable distro. I recognize the technical difference between what you describe and “go delete a specific file in safe mode”. But how about the more generic statement? Is this much different from “boot in a special way and go fix the problem”? Is any easier or more difficult than what people had to do on windows?
Primarily it’s different because you would not have had to boot into any safe mode. You would have just booted from the last good image from like a day ago and deleted the current image and kept using the computer.
lol thanks for the answer. This is the really relevant bit isn’t it? My Linux machines have also never died this badly before. But I’ve seen windows do it a number of times before this whole fiasco.
That’s all well and good, but many of these Windows machines were headless or used by extremely non-technical people - think tills at your supermarket or airport check-in desks. Worse, some of these installations were running in the cloud, so console access would have been tricky.
The cloud systems would have been a problem. Any local systems, a non-technical user, could have easily done because their IT department could simply tell them, turn on your computer, and when it gets to this screen with these words, press the down arrow key one time and press enter, and your computer will boot normally.
Their willingness to do it would primarily come from the fact that they have a job to do, and if their co-workers are doing their jobs because they followed the instruction and they are not, then the boss is going to have a nice look at them.
This relies on the assumption that everyone else, or at least a significant portion, in the office managed to do it.
I'm not talking about whether or not they're actually physically capable of it, of course they are. Im talking about how people immediately shut down and pretend they can't follow simple directions the second something relates to a compute.
You clearly haven’t worked a help desk if you think even those simple instructions are something every end user is capable of or willing to do without issue.
I guess I had really good colleagues. I was the network administrator for a small not-for-profit organization and the only time people came to me with computer problems was when they had tried the things that they knew worked first. If the obvious answers did not fix the problem, then they would bring it to my attention.
It should be relatively straightforward to script the recovery of cloud VM images (even without snapshots). Good luck getting the unwashed masses to follow a script to manually enter recovery mode and delete files in a critical area of the OS.
Realistically, immutability wouldn't have made a difference. Definition updates like this are generally not considered part of the provisioned OS (since they change somewhere around hourly) and would go into /var or the like, which is mutable persistent state on nearly every otherwise immutable OS. Snapshots like Timeshift are more likely to help.
How does Falcon store these channel files on Linux? I don’t know how an immutable distro would handle this given CrowdStrike push several of these updates per day and presumably use their own infrastructure to deploy them.
I guess if you pay them enough they could customize the deployment to work with whatever infrastructure you have but it’s all proprietary so I have no idea if they’re really doing that anywhere.
If the sensor was using eBPF (as any modern sensor on Linux should) then the faulty update would have made the sensor crash, but the system would still be stable. But CrowdStrike has a long history of using stupid forms of integration, so I wouldn’t put it past them to also load a kernel module that fucks things up unless it’s blacklisted in the bootloader. Fortunately that kind of recovery is, if not routine, at least well documented and standardized.
It wouldn’t technically stop anything, it would just make your live Hell on Earth if you tried to add that self-updating ring-0 proprietary software in your servers.
But I guess what you are looking for is immutable infrastructure? That one would stop the problem.