File indexing and search tool with specific features?

Hey Linux community,

I’m struggling with a file management issue and hoping you can help. I have a large media collection spread across multiple external hard drives. Often, when I’m looking for a specific file, I can’t remember which drive it’s on.

I’m looking for a file indexing and search tool that meets the following requirements:

  • Ability to scan multiple locations
  • Option to exclude specific folders or subfolders from both scan and search
  • File indexing for quicker searches
  • Capability to search indexed files even when the original drive is disconnected
  • Real-time updates as files change

Any recommendations for tools that meet most or all of these criteria? It would be a huge help in organizing and finding my media files.

Thanks in advance for any suggestions!

Evotech ,

Plex if it’s movies, stash if it’s porn…

refalo ,

hydrus network has entered the chat

ssm , (edited )

Ability to scan multiple locations


    find /path/one /path/two [expression]

Option to exclude specific folders or subfolders from both scan and search


    find /some/path -type d \( -name exclusion1 -o -name exclusion2 ... \) -prune -o [expression] -print
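For anyone who would rather script this than compose find expressions, the same two ideas (several starting points, some directory names skipped entirely) can be sketched in Python. The function name `walk_roots` and the directory names are made up for illustration; this is a minimal sketch, not a replacement for find.

```python
import os


def walk_roots(roots, excluded_names=()):
    """Yield file paths under every root, never descending into excluded directories."""
    excluded = set(excluded_names)
    for root in roots:
        for dirpath, dirnames, filenames in os.walk(root):
            # Editing dirnames in place tells os.walk not to descend into those folders.
            dirnames[:] = [d for d in dirnames if d not in excluded]
            for name in filenames:
                yield os.path.join(dirpath, name)
```

Calling `walk_roots(["/path/one", "/path/two"], {"exclusion1", "exclusion2"})` would list every file on both paths except those below a folder with one of the excluded names.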

File indexing for quicker searches

Not indexing, but you can make find faster through parallelization if you have the extension for xargs.


    # find -print0 is an extension which separates files found by '

solrize , (edited )

[search indexed files that are offline] One would hope this is not possible.

I think the idea is to store the search index in a separate place from the files. When indexing text, though, I’ve found that the index is comparable in size to the files themselves. It’s not entirely clear to me what OP wants to search. Something like email? Obviously, if it’s just metadata for media files (a kilobyte text description of a gigabyte video), then the search index can be tiny.
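To make the "index lives apart from the files" idea concrete, here is a minimal sketch using Python's built-in sqlite3 module. It records only the path, size and mtime of each file under a drive label you choose, so the database stays small and can be queried while the drive is unplugged. The table layout, the function names and the label are assumptions for illustration, not how any particular tool does it.

```python
import os
import sqlite3


def open_index(db_path):
    """Open (or create) the index database, kept somewhere other than the indexed drives."""
    con = sqlite3.connect(db_path)
    con.execute(
        "CREATE TABLE IF NOT EXISTS files ("
        " drive TEXT, path TEXT, size INTEGER, mtime REAL,"
        " PRIMARY KEY (drive, path))"
    )
    return con


def index_drive(con, drive_label, mount_point):
    """Replace the stored listing for one drive with what is currently under mount_point."""
    con.execute("DELETE FROM files WHERE drive = ?", (drive_label,))
    for dirpath, _dirnames, filenames in os.walk(mount_point):
        for name in filenames:
            full = os.path.join(dirpath, name)
            try:
                st = os.stat(full)
            except OSError:
                continue  # unreadable or vanished file: skip it
            rel = os.path.relpath(full, mount_point)
            con.execute(
                "INSERT OR REPLACE INTO files VALUES (?, ?, ?, ?)",
                (drive_label, rel, st.st_size, st.st_mtime),
            )
    con.commit()


def search(con, term):
    """Return (drive, path) pairs whose path contains term; no drive needs to be mounted."""
    pattern = "%" + term + "%"
    rows = con.execute(
        "SELECT drive, path FROM files WHERE path LIKE ? ORDER BY drive, path",
        (pattern,),
    )
    return rows.fetchall()
```

Run `index_drive` once per drive while it is connected; afterwards `search(con, "some title")` tells you which label to go and plug in. Note that `%` and `_` inside the search term act as SQL wildcards in this sketch.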

Real-time updates as files change

Would require a non-portable script that stores each file’s mtime in an array, compares the old mtime against the new one using stat, and then loops. Maybe implement it as a daemon.

That is what inotify is for.
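Python's standard library has no inotify binding, so the sketch below shows the polling approach from the quoted comment instead: take a snapshot of every file's mtime, then diff two snapshots. `snapshot` and `diff_snapshots` are hypothetical names chosen for this example; an inotify-based watcher would avoid the repeated full scans.

```python
import os


def snapshot(root):
    """Map each file path under root to its modification time in nanoseconds."""
    seen = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            try:
                seen[full] = os.stat(full).st_mtime_ns
            except OSError:
                pass  # file disappeared between listing and stat
    return seen


def diff_snapshots(old, new):
    """Return (added, removed, modified) lists of paths between two snapshots."""
    added = sorted(p for p in new if p not in old)
    removed = sorted(p for p in old if p not in new)
    modified = sorted(p for p in new if p in old and new[p] != old[p])
    return added, removed, modified
```

A daemon would call `snapshot` in a loop with a sleep in between and feed each diff to the indexer. On a large collection every pass touches every file, which is exactly the cost inotify is meant to avoid.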

I realize your overall answer was mostly snark, but the problems mentioned really do take some work to solve. For example, if you want to index email, you want the indexer to understand email headers so it can do the right things with the timestamps and other fields. You can’t just chuck everything into a big generic search engine and press “blend”.

I will also mention git-annex, which addresses a somewhat different problem, but it can help you manually track where your offline files are, more or less.

Euro , (edited )

Funnily enough I’ve been looking for a similar utility.

I use Jellyfin and YaCy for my local media and documents.

Jellyfin isn’t really a search engine, and it may or may not work if you disconnect the drives.

From my experience with shows and movies, it does great with metadata and with displaying what I have in my collection. However, it’s not as good for searching images/videos, as you have to search for the exact image/video name (unless the file has metadata).

YaCy, on the other hand, is much more like a traditional search engine, with an index and all. It’s great for documents (html, md, txt, even docx), but doesn’t do well with media files, as it can’t pull metadata, so you have to search all media by title.

I don’t think YaCy has real-time updating; if it does, I don’t know how to enable it.

Both YaCy and Jellyfin have a way to blacklist things, but they work completely differently: YaCy has a URL-based blacklist, while Jellyfin only displays stuff from folders you tell it to (basically a whitelist).

There was a program that I had stumbled across that was able to index a photos folder using image recognition to generate a description that you could search. I have since forgotten the name of the program but it does exist, and if I find it again I’ll update this comment.

Personally I want something that works like yacy for traditional documents, and can use image recognition for images, but I have yet to find it.

EDIT: I have found the program that does image recognition: sist2. I have tried it once before; from experience, the SQLite search is a bit janky but works decently enough imo. I haven’t tried the other indexing method.

breadsmasher ,

Are the files specific (media - movies, tv, music) or just any type of file at all?

CoderSupreme OP ,

Yes, they are all media but they are not specific to a single type of media. Today I may want to find a book and tomorrow a song with the same program. So the files can be literature, audio, movies, series, etc.
