Flip side of the coin: I had a sysadmin who wouldn’t increase the tmp size from 1 GB because ‘I don’t need more than that recommended size’. I deploy tons of ETL jobs, and they download GBs of files for processing to this globally known temp storage. I got it changed for one server after much back and forth, but on the other one I just overrode the temp directory in my config files for every script.
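For anyone stuck in the same spot, here is a minimal sketch of what a per-script override can look like in Python, assuming the jobs use the standard `tempfile` module. The `etl_scratch` folder name is hypothetical and only stands in for wherever the bigger disk is mounted.

```python
import os
import tempfile

# Hypothetical scratch location with more room than the 1 GB system tmp.
# A relative folder is used here only so the sketch runs anywhere.
SCRATCH_DIR = os.path.abspath("etl_scratch")
os.makedirs(SCRATCH_DIR, exist_ok=True)

# Option 1: pass dir= explicitly wherever the script creates temp files.
with tempfile.NamedTemporaryFile(dir=SCRATCH_DIR, suffix=".part") as download:
    download.write(b"pretend this is a multi-GB download")
    explicit_path = download.name

# Option 2: redirect the module-wide default once, near the top of the script.
# Later tempfile calls without dir= will then land in SCRATCH_DIR.
tempfile.tempdir = SCRATCH_DIR
default_dir = tempfile.gettempdir()
```

Setting the `TMPDIR` environment variable before the script starts should have a similar effect without touching the code, since `tempfile` consults it when picking its default directory.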
Yeah, Python is lacking in that area. Half a decade ago I set up an ML model for Tableau using Python, and things were fine until one day it just wouldn’t finish anymore. Turns out the model had grown, and Python filled up the RAM and the swap trying to load the whole model into memory.
During the pandemic I had some unoccupied Python graduates I wanted to teach data engineering to.
Initially I had them implement REST wrappers around Apache OpenNLP and spaCy and then compare the results on random data sets (Project Gutenberg, SharePoint, etc.).
I ended up stealing a grad data scientist because we couldn't find a difference (while there was a difference in confidence, the actual matches were identical).
spaCy required 1 vCPU and 12 GiB of RAM to produce the same result as OpenNLP, which was running on 0.5 vCPU and 4.5 GiB of RAM.
Two grads were assigned a Spring Boot/Camel/OpenNLP stack and two a spaCy/Flask application. It took both groups 4 weeks to get a working result.
The team slowly acquired lockdown staff, so I introduced MinIO/RabbitMQ/NiFi/Hadoop/Express/React and then different file types (not just raw UTF-8, but also doc, pdf, etc.) for NLP pipelines. They built a fairly complex NLP processing system with a data exploration UI.
I figured I had a group to help me figure out the best Python approach in the space, but Python’s limitations just led to stuff like needing a Kubernetes volume to host data.
Conversely, none of the data scientists we acquired were willing to code in anything but Python.
I tried arguing at my company at the time that there was a huge unsolved bit of market there (e.g. MLOps).
Alas, unless you can show profit on the first customer, no business will invest. Which is why I am trying to start a business.
Companies hate CapEx and love OpEx. That’s the main driver, as companies loathe hardware life cycle costs and prefer a pay-as-you-go model. It is more expensive, but it’s more budget friendly as you avoid sticker shock every 3-4 years.
Do you mean that it’s still the case that more resources are allocated than actually used or that the code does not need to be optimized anymore due to elastic compute?
If I remember correctly, that was the original idea of AWS, to offer their free capacity to paying customers.
Do you think that AWS in particular has this problem or Azure and GCP as well? I have mainly worked with DWHs in Snowflake, where you can adjust the compute capacity within seconds. So you pay almost exactly for the capacity you really need.
Not having to optimize queries is a good selling point for cloud-based databases, too.
It is certainly still cheaper than self-hosted on-premises infrastructure.
We worked with a business unit to predict how many people they would migrate onto their new system in weeks 1-2 … they controlled the migration through some complicated Salesforce code they had written.
We were told “half a million first week”. We reserved capacity to be ready to handle the onslaught.
Yeah, almost certainly the software only uses 4GB because it limits itself to what memory it has available.
I have seen this conversation pan out a few times already. It has always been because of that, and once the memory is expanded things work much better. (Personally I have never taken part in one, I guess that’s luck.)
This was excellent, but conveniently left off any discussion that npm can “un-un-publish” a programmer’s code against their wishes, and apparently without repercussions?
Absolutely they can un-unpublish, since the programmer has given everyone the right to use his code wherever they want with its open license. Npm can actually take the older version of the code and give it to everyone. It’s actually a good thing.
Thank fuck for that, because if they didn't, faker.js and node-ipc would have caused a lot of trouble, with the developers adding malware to a new version and later deleting the entire packages, breaking tons of projects. And those were anything but small packages.
All for the greater good, especially if it’s the choice between one guy’s desire to nuke their own code vs. tens or hundreds of thousands of projects that depend on it.
Sometimes I start my iterator at -1, since I only += 1 it under a condition that I know will be true on the first cycle. I then check array[iterator] and need it to be 0 to start with, of course.
I just have no idea how to not do this, but it looks so bad. I need an i8 instead of a u8 at least because of this.
I could tell you my recent scenario, but it wouldn’t get us anywhere, because I know that it’s avoidable. It would just mean running different logic for only the first element of my array, which is doable, but it’d make the code like 5 extra lines longer and harder to read and follow. So I simply choose to put -1 and boom, it’s fixed, it just works.
Another solution (without context) would be to add one more variable and one more check to my foreach, but that takes more memory and CPU. I usually choose i = -1; it’s ugly, but not as ugly as the other solutions would be.
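To make the pattern concrete, here is a rough Python sketch. The actual scenario wasn't shared, so grouping consecutive equal values is a made-up stand-in, and both function names are hypothetical. The second version keeps a reference to the current group instead of an index, which in a typed language would sidestep the signed counter without adding a first-element special case.

```python
def group_runs_with_sentinel(values):
    """Group consecutive equal values, starting the index at -1."""
    groups = []
    index = -1            # sentinel meaning "no group yet"
    previous = object()   # unique marker that never equals a real value
    for value in values:
        if value != previous:   # always true on the first cycle
            index += 1
            groups.append([])
        groups[index].append(value)
        previous = value
    return groups


def group_runs_without_index(values):
    """Same result, tracking the current group instead of an index."""
    groups = []
    current = None        # set on the first cycle, before it is ever used
    previous = object()
    for value in values:
        if value != previous:
            current = []
            groups.append(current)
        current.append(value)
        previous = value
    return groups
```

Both versions do the same amount of work per element; the difference is only whether the "nothing yet" state lives in a magic -1 or in a reference that gets assigned on the first pass.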
So you can have a local config and a team config. At commit time, the code rules your team has selected are enforced. So if I looked at my code on GitHub, it would look as expected by the team.
I love such formatters and wish they were even more widespread. In many cases, I really want consistency above all and it’s so dang hard to achieve that without an opinionated formatter. If the formatter isn’t opinionated enough, it just leads to countless human-enforced rules that waste time (and lead to an understandable chorus of “why can’t the formatter just do that for meeeee”).
Yeah, but outside of that, where the code is implemented or in documentation, tabs are still easier to look through. And it does look pretty as long as there aren’t too many nested functions.
That’s really interesting. So does that mean the interpreter just checks whether the current line is more indented, less indented, or equal vs. the preceding, without caring by how much?
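Assuming this is about Python, that is roughly right for indenting, with one catch on the way back out: the tokenizer keeps a stack of indentation levels, so going deeper can be by any amount, but a dedent has to land exactly on one of the enclosing levels. A small sketch to poke at it (the sample sources and helper names are made up for illustration):

```python
import io
import tokenize

# Each block picks its own width: 2 spaces, then 8 more. This is legal.
UNEVEN_BUT_VALID = (
    "def f(x):\n"
    "  if x:\n"
    "          return 'deep'\n"
    "  return 'shallow'\n"
)

# Dedenting to 1 space matches no enclosing level (0 or 4), so it is rejected.
BAD_DEDENT = (
    "if True:\n"
    "    a = 1\n"
    " b = 2\n"
)


def indent_events(source):
    """Return the INDENT/DEDENT token names the tokenizer emits for source."""
    tokens = tokenize.generate_tokens(io.StringIO(source).readline)
    return [tokenize.tok_name[tok.type] for tok in tokens
            if tok.type in (tokenize.INDENT, tokenize.DEDENT)]


def compiles(source):
    """Report whether source gets past Python's indentation checks."""
    try:
        compile(source, "<sketch>", "exec")
        return True
    except IndentationError:
        return False
```

Calling `compiles(UNEVEN_BUT_VALID)` succeeds while `compiles(BAD_DEDENT)` fails, and `indent_events` shows that the tokenizer only ever reports "one level in" or "one level out", never how many spaces were involved.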