Make your MIT-licensed library big enough that the corpos use it, then switch it to AGPL just before you add a really important and tricky feature they’ve been waiting for.
Problem: Given two list of length n, find what elements the two list have in common. (we assume that there are not duplicates within a single list)
Naive solution: For each element in the first list, check if it appears in the second.
Bogo solution: For each permutation of the first list and for each permutation of the second list, check if the first item in each list is the same. If so, report in the output (and make sure to only report it once).
lol, you’d really have to go out of your way in this scenario. First implement a way to get every single permutation of a list, then to ahead with the asinine solution. 😆 But yes, nice one! Your imagination is impressive.
unless the problem space includes all possible functions f , function f must itself have a complexity of at least n to use every number from both lists , else we can ignore some elements of either of the lists , therby lowering the complexity below O(n!²)
if the problem space does include all possible functions f , I feel like it will still be faster complexity wise to find what elements the function is dependant on than to assume it depends on every element , therefore either the problem cannot be solved in O(n!²) or it can be solved quicker
By “certain distance function”, I mean a specific function that forces the problem to be O(n!^2).
But fear not! I have an idea of such function.
So the idea of such function is the hamming distance of a hash (like sha256). The hash is computed iterably by h[n] = hash(concat(h[n - 1], l[n])).
This ensures:
We can save partial results of the hashing, so we don’t need to iterate through the entire lists for each permutation. Otherwise we would get another factor n in time complexity.
The cost of computing the hamming distance is O(1).
Order matters. We can’t cheat and come up with a way to exclude some permutations.
No idea of the practical use of such algorithm. Probably completely useless.
honestly I was very suspicious that you could get away with only calling the hash function once per permutation , but I couldn’t think how to prove one way or another.
so I implemented it, first in python for prototyping then in c++ for longer runs… well only half of it, ie iterating over permutations and computing the hash, but not doing anything with it. unfortunately my implementation is O(n²) anyway, unsure if there is a way to optimize it, whatever. code
as of writing I have results for lists of n ∈ 1 … 13 (13 took 18 minutes, 12 took about 1 minute, cant be bothered to run it for longer) and the number of hashes does not follow n! as your reasoning suggests, but closer to n! ⋅ n.
!desmos graph showing three graphs, labeled #hashes, n factorial and n factorial times n
anyway with your proposed function it doesn’t seem to be possible to achieve O(n!²) complexity
also dont be so negative about your own creation. you could write an entire paper about this problem imho and have a problem with your name on it. though I would rather not have to title a paper “complexity of the magic lobster party problem” so yeah
Good effort of actually implementing it. I was pretty confident my solution is correct, but I’m not as confident anymore. I will think about it for a bit more.
So in your code you do the following for each permutation:
for (int i = 0; i<n;i++) {
You’re iterating through the entire list for each permutation, which yields an O(n x n!) time complexity. My idea was an attempt to avoid that extra factor n.
I’m not sure how std implements permutations, but the way I want them is:
1 2 3 4 5
1 2 3 5 4
1 2 4 3 5
1 2 4 5 3
1 2 5 3 4
1 2 5 4 3
1 3 2 4 5
etc.
Note that the last 2 numbers change every iteration, third last number change every 2 iterations, fourth last iteration change every 2 x 3 iterations. The first number in this example change every 2 x 3 x 4 iterations.
This gives us an idea how often we need to calculate how often each hash need to be updated. We don’t need to calculate the hash for 1 2 3 between the first and second iteration for example.
The first hash will be updated 5 times. Second hash 5 x 4 times. Third 5 x 4 x 3 times. Fourth 5 x 4 x 3 x 2 times. Fifth 5 x 4 x 3 x 2 x 1 times.
So the time complexity should be the number of times we need to calculate the hash function, which is O(n + n (n - 1) + n (n - 1) (n - 2) + … + n!) = O(n!) times.
EDIT: on a second afterthought, I’m not sure this is a legal simplification. It might be the case that it’s actually O(n x n!), as there are n growing number of terms. But in that case shouldn’t all permutation algorithms be O(n x n!)?
The time complexity can be simplified as O(2.71828 x n!), which makes it O(n!), so it’s a legal simplification! (Although I thought wrong, but I arrived to the correct conclusion)
END EDIT.
We do the same for the second list (for each permission), which makes it O(n!^2).
Finally we do the hamming distance, but this is done between constant length hashes, so it’s going to be constant time O(1) in this context.
Maybe I can try my own implementation once I have access to a proper computer.
you forgot about updating the hashes of items after items which were modified , so while it could be slightly faster than O((n!×n)²) , not by much as my data shows .
in other words , every time you update the first hash you also need to update all the hashes after it , etcetera
so the complexity is O(n×n + n×(n-1)×(n-1)+…+n!×1) , though I dont know how to simplify that
I’m taking into account that when I update a hash, all the hashes to the right of it should also be updated.
Number of hashes is about 2.71828 x n! as predicted. The time seems to be proportional to n! as well (n = 12 is about 12 times slower than n = 11, which in turn is about 11 times slower than n = 10).
Interestingly this program turned out to be a fun and inefficient way of calculating the digits of e.
Since I decided to pack the hashes and previous number values into a single array and then forgot to actually properly format the values, the hash counts generated by my code were nonsense. Not sure why I did that honestly.
Also, my data analysis was trash, since even with the correct data, which as you noted is in a lineal correlation with n!, my reasoning suggests that its growing faster than it is.
Here is a plot of the incorrect ratios compared to the correct ones, which is the proper analysis and also clearly shows something is wrong.
!Desmos graph showing two data sets, one growing linearly labeled incorrect and one converging to e labeled #hashes
Anyway, and this is totally unrelated to me losing an internet argument and not coping well with that, I optimized my solution a lot and turns out its actually faster to only preform the check you are doing once or twice and narrow it down from there. The checks I’m doing are for the last two elements and the midpoint (though I tried moving that about with seemingly no effect ???) with the end check going to a branch without a loop. I’m not exactly sure why, despite the hour or two I spent profiling, though my guess is that it has something to do with caching?
Also FYI I compared performance with -O3 and after modifying your implementation to use sdbm and to actually use the previous hash instead of the previous value (plus misc changes, see patch).
Surely you could implement this via a sorting algorithm? If you can prove the distance function is a metric and both lists contains elements from the same space under that metric, isn’t the answer to sort both?
if I’m not mistaken , a example of a problem where O(n!²) is the optimal complexity is :
There are n traveling salespeople and n towns . find the path for each salesperson with each salesperson starting out in a unique town , such that the sum d₁ + 2 d₂ + … + n dₙ is minimised, where n is a positive natural number , dᵢ is the distance traveled by salesperson i and i is any natural number in the range 1 to n inclusive .
pre post edit, I realized you can implement a solution in 2(n!) :(
The alt text is nice, too: The weird sense of duty really good sysadmins have can border on the sociopathic, but it’s nice to know that it stands between the forces of darkness and your cat blog’s servers.
Oh that’s interesting. I started poking around with a Gameboy emulator guide implemented in Python that intended to emulate a Z80. Got any good resource recommendation in case I decide to pick this back up and inevitably get stuck?
When I was young, we didn’t have hex codes, we only had 1 and 0s. One time we where all out of 1s, and I had to code a whole Database system with only 0s!
If you want some modern day fun with this, try the Zachtronics programming games; TIS-100, Shenzhen I/O, and Exapunks.
Or, my personal favorite I only discovered somewhat recently, try Turing Complete. You start by designing all your logic gates from just a negate gate IIRC. You eventually build up an ALU and everything else you need and then create your own computer. Then you define your own assembly language and have to write programs in your assembly language that run on the computer you’ve designed to complete different tasks. It’s a highly underrated game, although it takes a certain type of person to enjoy.
I would say it’s very polished. It does everything you’d expect and has some nice QoL features. There was work on a big update that’d improve performance and things, but the last information about that was from Aug of last year as far as I can tell. That’s not a big deal though. The game works fine without it.
Another interesting low-level interpreter/emulated system to look into for anyone else trying to get started with this type of thing is the CHIP-8! It’s a pretty basic 8/16-bit instruction set (there are 35 opcodes, the instructions themselves are mostly simple) and there are tons of detailed guides on making one and writing roms for them.
In the modern world it’s completely subjective.
The lowest-level language is probably ASM/machine code, as many people at least edit that regularly, and the highest-level would be LLMs. They are the shittiest way to program, yes, but technically you just enter instructions and the LLM outputs eg. Python, which is then compiled to bytecode and run. Similar to how eg. Java works. And that’s the subjective part; many people (me included) don’t use LLMs as the only way to program, or only use them for convenience and some help, therefore the highest level language is probably either some drag-and-drop UI stuff (like scratch), or Python/JS. And the lowest level is either C/C++ (because “no one uses ASM anyway”), or straight up machine code.
My school had nothing about react, node, angular, angularJS, SaaS, etc. back in 2015.
We learned Perl, PHP, LAMP stack, SOAP based APIs and other “antiquated” things. Provided a solid foundation of fundamentals that I’ve built a nice career on.
It might have been by design to get a feel for the fundamentals. Or maybe it’s just because the people teaching it have probably left the industry and are teaching how they did it.
My department head was in his 70s and my professors all trended on the older side.
Just fyi, Randall who makes xkcd has a very permissive approach and offers hotlinks on the site for easy embedding. I think he prefers that you hotlink rather than reupload.
Exactly. For every level of abstraction, the abstractor is the high level and the abstractee is the lower level. Those aren’t real words perhaps, but you get what I’m saying. It’s all relative along the chain of abstraction.
Is it a chain though? I think it’s more of a branching network that (almost?) always is stopped at quantum physics and it’s theories or some form philosophy.
It’s not really abstraction though. It is more like syntactic sugar. In stead of 1000111011 you say ADD, but it is still the exact same thing. There is no functional, prgrammatical benefit of one over the other. It’s just that asm is readable by humans.
At least thats as far as I understand asm. I haven’t gone beyond NandToTetris
I would argue they don’t know what that means really. Assembly is pretty much a mapping of words to machine code. It’s just a way to make machine code easier to read. It doesn’t actually change how it works.
A compiler re-arranges and modifies things so what you write isn’t the same as the final program that is created. With assembly it is. It’s not really an abstraction, but a translation. It doesn’t move you further from the machine, it only makes it so you’re speaking the same language.
I’m sure there’s nothing wrong with the program at all =)
Modern webapp deployment approach is typically to have an automated continuous build and deployment pipeline triggered from source control, which deploys into a staging environment for testing, and then promotes the same precise tested artifacts to production. Probably all in the cloud too.
Compared to that, manually FTPing the files up to the server seems ridiculously antiquated, to the extent that newbies in the biz can’t even believe we ever did it that way. But it’s genuinely what we were all doing not so long ago.
It’s that automated workflow ? With human checkpoints ?
Like, a programmer will ‘hit save’ and drop his work in version control, which automatically lands in a development environment, is promoted to test, and lands in the queue of a tester, and so on ?
Like anything else, it’s good to know how to do it in many different ways, it may help you down the line.
In production in an oddball environment, I have a python script to ftp transfer to a black box with only ftp exposed as an option.
Another system rebuilds nightly only if code changes, publishing to a QC location. QC gives it a quick review (we are talking website here, QC is “text looks good and nothing looks weird”), clicks a button to approve, and it gets published the following night.
I’ve had hardware (again, black box system) where I was able to leverage git because it was the only command exposed. Aka, the command they forgot to lock down and are using to update their device. Their intent was to sneakernet a thumb drive over to it for updates, I believe in sneaker longevity and wanted to work around that.
So you should know how to navigate your way around in FTP, it’s a good thing! But I’d also recommend learning about all the other ways as well, it can help in the future.
(This comment brought to you by “I now feel older for having written it”, and “I swear I’m only in my fourties,”)
Nah, it’s probably more efficient to .tar.xz it and use netcat.
On a more serious note, I use sftp for everything, and git for actual big (but still personal) projects, but then move files and execute scripts manually.
And also, I cloned my old Laptops /dev/sda3 to my new Laptops /dev/main/root (on /dev/mapper/cryptlvm) via netcat over a Gigabit connection with netcat. It worked flawlessly. I love Linux and its Philosophy.
Netcat is basically just a utility to listen on a socket, or connect to one, and send or receive arbitrary data. And as, in Linux, everything is a file, which means you can handle every part of your system (eg. block devices [physical or virtual disks]) like a normal file, i.e. text, you can just transfer a block device (e.g. /dev/sda3) over raw sockets.
It’s perfectly fine for some private page etc. but when you make business software for customers that require 99,9% uptime with severe contractual penalties it’s probably too wonky.
Think of this like saying using a scythe to mow your lawn is antiquated. If your lawn is tiny then it doesn’t really matter. But we’re talking about massive “enterprise scale” lawns lol. You’re gonna want something you can drive.
Shitty companies did it like that back then - and shitty companies still don’t properly utilize what easy tools they have available for controlled deployment nowayads. So nothing really changed, just that the amount of people (and with that, amount of morons) skyrocketed.
I had automated builds out of CVS with deployment to staging, and option to deploy to production after tests over 15 years ago.
It’s good practice to run the deployment pipeline on a different server from the application host(s) so that the deployment instances can be kept private, unlike the public app hosts, and therefore can be better protected from external bad actors. It is also good practice because this separation of concerns means the deployment pipeline survives even if the app servers need to be torn down and reprovisioned.
Of course you will need some kind of agent running on the app servers to be able to receive the files, but that might be as simple as an SSH session for file transfer.
But it’s genuinely what we were all doing not so long ago
Jokes on you, my first job was editing files directly in production. It was for a webapp written in Classic ASP. To add a new feature, you made a copy of the current version of the page (eg index2_new.asp became index2_new_v2.asp) and developed your feature there by hitting the live page with your web browser.
When you were ready to deploy, you modified all the other pages to link to your new page
Of course, it’s going to be difficult to find a modern application where each individually deployed component isn’t at least 7MB of compiled source (and 50-200MB of container), compared to this single 7MB war that contained everything.
Except that FileZilla does come with bundled adware from their sponsors and they do want you to pay for the pro version. It probably is the shittiest GPL-licensed piece of software I can think of.
programmer_humor
Hot
This magazine is from a federated server and may be incomplete. Browse more on the original instance.