DYK that Kiwix was actually created by Wikipedia? Back in the late 2000s there was this gigantic effort to select and improve a ton of articles to make an offline "Wikipedia 1.0" release. The only remains of that effort are Kiwix, periodic backups, and an incredibly useful article-rating system.
There is a set of criteria for rating an article B, C, Start or Stub. These are called classes. Similarly, articles can be rated at one of four importance levels for a particular WikiProject.
There’s a banner on every article’s talk page. Any editor can boldly change an article’s rating to any of the above classes; if the change is reverted, the editors discuss it against the criteria.
Some WikiProjects have their own criteria for rating articles. Some of them even have a process to make an article A-class.
Before this system, Wikipedia already had processes to make an article a Good Article or Featured Article.
With GAs, a nominator adds a candidate to the backlog. Later, a reviewer scrutinizes the article against the criteria. Often, the reviewer asks the nominator to fix quite a few issues. If these issues are fixed promptly, or the reviewer thinks that only nitpicks remain, the article passes. If they aren’t fixed within a week, or the reviewer thinks that there are major problems, the article fails.
As with other processes, the nominator and reviewer can be anyone, though reviewers are usually experienced.
With FAs, a nominator brings the candidate to a noticeboard. Editors there then come to a consensus about whether the article should pass.
Both processes display a badge directly on passed articles.
Both processes have an associated re-review process where editors come to a consensus on whether the article would fail if it were nominated today.
There’s also an informal process called “peer review”, where someone just puts an article on a noticeboard and anyone can comment on its quality.
Articles are automatically sorted into categories by their rating and importance. Nowadays, editors usually look at these to decide which articles to focus on.
I am currently reading up on terrorists while in the States, and something tells me I will get my IP banned for it. But I have read a shitton, and I highly doubt it's just 100 GB. Otherwise you would see it more on piracy sites.
Where I am currently working I can't get high, but I can get drunk. I was neither when I wrote that. My ISP is very brutal about looking stuff up or downloading shit.
I know there are a few companies working on DNA storage. From the comment below about the entirety of Wikipedia and Wiki Commons, I’d say that’d be a pretty practical thing to store.
Probably a lot less. Keep in mind that whenever it answers a question the whole model is traversed multiple times; going through multiple GBs is not possible in the matter of seconds the model takes to answer.
I’d be surprised if it was significantly less. A comparable 70 billion parameter model from Llama requires about 120 GB to store. Supposedly the largest current ChatGPT goes up to 170 billion parameters, which would take a couple hundred GB to store. There are ways to trade off some accuracy in order to save a bunch of space, but you’re not going to get it under tens of GB.
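For anyone who wants to check the arithmetic, here is a rough sketch of how storage scales with parameter count. It only counts the weights, and the bytes-per-parameter values (2 for 16-bit, 1 for 8-bit, 0.5 for 4-bit) are my assumptions about common storage formats, not something the numbers above specify. The function name is made up for illustration.

```python
def model_size_gb(params_billion: float, bytes_per_param: float) -> float:
    """Approximate size of the weights alone, in GB (10^9 bytes).

    Billions of parameters times bytes per parameter gives GB directly.
    Ignores tokenizer files, metadata, and any runtime memory.
    """
    return params_billion * bytes_per_param


# Assumed storage formats: 16-bit, 8-bit, and 4-bit weights.
FORMATS = {"16-bit": 2.0, "8-bit": 1.0, "4-bit": 0.5}

for params in (70, 170):
    for name, nbytes in FORMATS.items():
        print(f"{params}B params at {name}: ~{model_size_gb(params, nbytes):.0f} GB")
```

Even with the most aggressive format assumed here, both sizes stay in the tens of GB, which matches the point about not getting it much smaller.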
These models really are going through that many GB of parameters once for every word in the output. GPUs and tensor processors are crazy fast. For comparison, think about how much data a GPU generates for a 4k60 video display. It's roughly 1.5 GB per second at 24-bit color. And the recommended memory speed required to generate that image is like 400 GB per second. Crazy fast.
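A back-of-the-envelope sketch of both numbers, in case it helps. The 3 bytes per pixel (24-bit color) and the 140 GB model size are assumptions I picked for illustration, and the tokens-per-second figure is only an upper bound that assumes every weight is read from memory once per output token, with no batching or other tricks.

```python
def display_bandwidth_gb_s(width: int, height: int, bytes_per_pixel: int, fps: int) -> float:
    """Raw pixel data produced per second for a display, in GB/s (10^9 bytes)."""
    return width * height * bytes_per_pixel * fps / 1e9


def max_tokens_per_second(memory_bandwidth_gb_s: float, model_size_gb: float) -> float:
    """Upper bound on tokens/s if the full set of weights is read once per token."""
    return memory_bandwidth_gb_s / model_size_gb


# 4k at 60 Hz, assuming 24-bit color (3 bytes per pixel).
print(f"4k60 output: {display_bandwidth_gb_s(3840, 2160, 3, 60):.2f} GB/s")

# 400 GB/s memory bandwidth (the figure quoted above) against an assumed 140 GB of weights.
print(f"Token rate bound: {max_tokens_per_second(400, 140):.1f} tokens/s")
```

So a few words per second from a model that size is plausible on that kind of memory bandwidth alone, under these assumptions.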
I mean, you can self-host your own local LLMs using something like Ollama. The performance will be bound by the disk space you have (the complexity of the model you’re able to store), and the performance of the CPU or GPU you are using to run it, but it does work just fine. Probably as good results as ChatGPT for most use cases.
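If you want to script against it, here is a minimal sketch of talking to a local Ollama server over HTTP. It assumes Ollama is running on its default port 11434 and that you have already pulled a model; the model name `llama3` is just a placeholder, and the helper names are mine.

```python
import json
import urllib.request

# Assumed default address of a locally running Ollama server.
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_request(prompt: str, model: str = "llama3") -> urllib.request.Request:
    """Build a non-streaming generate request for the local server.

    The model name is a placeholder; use whatever model you have pulled.
    """
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask(prompt: str, model: str = "llama3") -> str:
    """Send the prompt and return the generated text (needs the server running)."""
    with urllib.request.urlopen(build_request(prompt, model)) as resp:
        return json.loads(resp.read())["response"]


# Usage, with the server running:
#   print(ask("Explain what a .zim file is in one sentence."))
```

Nothing leaves your machine this way, since the request only goes to localhost.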
We do this at work (lots of sensitive data that we don’t want OpenAI to capitalize on) and it works pretty well. Hosted locally, set up by a data-security- and privacy-sensitive admin, who specifically configures it not to save any queries, even on the server. Bit slower than ChatGPT but not by much.
Download the Kiwix app for whatever OS you’re using, then open Kiwix, click on the folder icon in the app, and navigate to where the .zim file you downloaded is located. If you click it, it should automatically pop up and be viewable.
If you did that and it’s still failing, is it giving you a specific error or anything?
This saved my ass at my engineering chemistry exam (still a requirement, even for software engineers) where only offline tools were allowed. Love Kiwix!
Tin is a prima donna metal. Grows conductive whiskers if you use it as a conductor, gets brittle if it gets cold and just makes things softer when alloyed. It’s like it only wants to be looked at.