People of lemmy, what was your “oh crap” moment?

Please just provide a short, medium or long story

Nomad , 13 hours ago

I was testing some code late at night in the test system. Rolled out the changes, logged on to the admin interface and wrote a short news article about how one of the more hated profs at the university had died suddenly and unexpectedly.

Result looks good, roll out changes to prod, about to call it quits for the night. Think to myself: common reason people get fired, maybe delete the story from the test system. Check test system, no story there… Uh-oh.

Story has been live for about three hours. Hope no spiders have caught it yet, hurry to delete it and learn how to purge all evidence from database.

Turns out the shithead admin had copied and pasted the server config for the test system from live and forgotten to change the admin rewrite rules to point at the test system. Phew…

nis , 15 hours ago

“We need to talk.”

dan , 21 hours ago (edited 6 hours ago)

I took down the home page of one of the top 5 websites for around 5 minutes.

There were two existing functions that were written by a different team: An encode method that took a name of something (only used internally, never shown to the user) and returned a numeric identifier for it, and a decode method that did the opposite.

Some existing code already used encode, but I had to use decode in my new code. Added the code, rolled it out to 80% of employees, and it seemed to work fine. Next day, I rolled it out to 5% public and it still seemed okay.

Once I rolled it out to everyone, it all broke.

Turns out that while the encode function used a static map built at build-time (and was thus just an O(1) lookup at runtime), decode connected to a database that was only ever designed for internal use. The DB only had ten replicas, which was nowhere near enough to handle hundreds of thousands of concurrent users.

Luckily, it’s commonplace to put changes behind feature flags, which is how I could roll it out just to employees initially. The devops team were able to find stack traces of the error in the prod logs, find my code, find the commit that added it, find the name of the killswitch, and disable my code, all before I even noticed that there was a problem. No code rollback needed.

That was probably 7 years ago now. Thankfully I haven’t made any mistakes as large as that one again!

Always use feature flags for major changes, especially if they’re risky!
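For anyone who hasn't worked with them, here is a minimal sketch of the kind of flag described above: a percentage rollout with separate employee and public stages, plus a killswitch that overrides everything. All names here (`FeatureFlag`, `run_behind_flag`, `use_decode`) are hypothetical and made up for illustration; this is not the system from the story, and real setups usually read flag state from a config service rather than from memory.

```python
import hashlib
from dataclasses import dataclass


@dataclass
class FeatureFlag:
    """Hypothetical in-memory flag, for illustration only."""
    name: str
    public_percent: int = 0     # share of public users who get the new code
    employee_percent: int = 0   # share of employees who get the new code
    killed: bool = False        # killswitch: overrides both percentages

    def _bucket(self, user_id: str) -> int:
        # Hash flag name + user id so each user lands in a stable bucket 0..99.
        digest = hashlib.sha256(f"{self.name}:{user_id}".encode()).hexdigest()
        return int(digest, 16) % 100

    def is_enabled(self, user_id: str, is_employee: bool = False) -> bool:
        if self.killed:
            return False
        percent = self.employee_percent if is_employee else self.public_percent
        return self._bucket(user_id) < percent


def run_behind_flag(flag, user_id, new_path, old_path, is_employee=False):
    """Call the new code path only when the flag allows it for this user."""
    if flag.is_enabled(user_id, is_employee):
        return new_path()
    return old_path()


# Staged rollout: employees first, then a small public slice, then everyone.
flag = FeatureFlag("use_decode", employee_percent=80)
flag.public_percent = 5
flag.public_percent = 100

# Something breaks in prod: flip the killswitch, no code rollback needed.
flag.killed = True
```

Because the bucket is derived from a hash of the user id, a given user stays in or out of the rollout between requests instead of flapping, and raising the percentage only ever adds users.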

JoMiran , 23 hours ago (edited 6 hours ago)

September 12, 2001. I accidentally shut down an entire production plant. Management didn’t even get mad. They closed the plant for the day, I kicked off the boot cycle (takes hours for the system to be ready for production again) and everyone went home to be with their families. Nobody’s head was on right that day anyway.

EDIT: A few years later I was testing some BigIP configs on a tertiary unit when suddenly the entire e-commerce site went down. Apparently this unit used to be a primary before being demoted and someone (not me) forgot to disable replication, so when I wiped all the rules from my “test unit” I inadvertently wiped all the rules from the production units too. Technically it wasn’t my fault but it was still an “oh fuck” moment.

JustAnOrdinaryCreep , 1 day ago

Yesterday I was waiting for the Tram.

As I stood there, I turned my head to the right and witnessed how a pigeon was hit by a car.

Kinda traumatizing, especially when the cars that followed ran over the carcass again and again.

Sir_Kevin , 1 day ago

Just one?

stink , 1 day ago

I pooped my pants 🤯
