Topic: Request - IQDB Database Export

Posted under e621 Tools and Applications

I'd like to request that the E621 IQDB data be added to the daily Database Exports. This seems like something that could easily be self-hosted, but to set up IQDB we need either the source images to generate the HAAR signatures (which would involve downloading tons of files), or ideally we just download the HAAR signatures from the existing E621 IQDB database and import them into self-hosted IQDB instances.

E621 IQDB Search has upload whitelists and rate limits, and requires CSRF authenticity tokens, all of which impede automated, especially bulk, reverse image searching. Other third-party reverse image searches also appear to have significant rate limits for bulk searching. This makes sense: I imagine it takes a fair bit of compute, so I understand why it's rate limited and somewhat locked down.

From what I can see in the IQDB source code, the service stores searchable hashes alongside a corresponding Post ID, with the hashes primarily residing under separate RGB channels. Given that E621 is self-hostable and an export of all posts is already provided, I'd like the IQDB HAAR hashes/signatures to be provided as well, which would permit efficient self-hosted reverse image searches of E621 posts from the database exports.

Option 1: Add IQDB sqlite dump to Database Exports
- Simply dump the sqlite database and serve it with the export so we could initialize a self-hosted IQDB with the sqlite db
- Probably best to convert to CSV instead and have it be iqdb-[date].csv.gz
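
To make Option 1 a bit more concrete, here is a rough sketch of what the sqlite-to-CSV step could look like. It is only an illustration: the function name `export_table_to_csv_gz`, the table name you pass in, and the choice to hex-encode binary signature columns are all my assumptions, not the actual IQDB schema or how the existing exports are produced.

```python
import csv
import gzip
import sqlite3


def export_table_to_csv_gz(db_path, table, out_path):
    """Dump every row of `table` to a gzip-compressed CSV; return the row count.

    Hypothetical helper: the real IQDB table layout is not assumed here,
    so the header is taken from whatever columns the table has.
    """
    conn = sqlite3.connect(db_path)
    try:
        # Table names cannot be bound as parameters, so quote the identifier.
        quoted = '"' + table.replace('"', '""') + '"'
        cursor = conn.execute(f"SELECT * FROM {quoted}")
        header = [col[0] for col in cursor.description]
        count = 0
        with gzip.open(out_path, "wt", newline="", encoding="utf-8") as fh:
            writer = csv.writer(fh)
            writer.writerow(header)
            for row in cursor:
                # Assumption: binary signature blobs are hex-encoded so the
                # CSV stays plain text.
                writer.writerow(
                    [v.hex() if isinstance(v, bytes) else v for v in row]
                )
                count += 1
        return count
    finally:
        conn.close()
```

Something like `export_table_to_csv_gz("iqdb.sqlite", "images", "iqdb-2024-01-01.csv.gz")` would then produce the dated file, where both the database path and the table name are placeholders.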

Option 2: Add iqdb_data column to the posts.csv of Database Exports
- Reuses existing CSV, adds new CSV column with the information needed to sync into a self-hosted IQDB instance and perform reverse searches
- Encode the data for the post we have in the IQDB, or maybe hit the images/[post_id] endpoint and feed the corresponding .hash into the iqdb_haar_hash column in the CSV. The latter would require additional decoding

My preference is Option 1. It's the simplest, could easily be converted to a sqlite INSERT, and won't mess with the other existing CSV exports. To promote self-hosting, we'd likely want an additional IQDB endpoint to add an image by a known hash, but this is something I could do on my own time.
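
For the "converted to a sqlite INSERT" part, a minimal sketch of the import side might look like the following. It assumes a gzip-compressed CSV with a `post_id` column and a hex-encoded `sig` column. That layout, the `images` table, and the `import_csv_gz` name are hypothetical and would need to match whatever the real export and the self-hosted IQDB instance actually use.

```python
import csv
import gzip
import sqlite3


def import_csv_gz(csv_path, conn, table="images"):
    """Load (post_id, hex signature) rows from a .csv.gz into SQLite.

    Hypothetical layout: columns `post_id` and `sig`, not the real IQDB schema.
    Returns the number of rows in the table after the import.
    """
    conn.execute(
        f'CREATE TABLE IF NOT EXISTS "{table}" '
        "(post_id INTEGER PRIMARY KEY, sig BLOB)"
    )
    with gzip.open(csv_path, "rt", newline="", encoding="utf-8") as fh:
        reader = csv.DictReader(fh)
        # Stream rows lazily so a multi-gigabyte file is never held in memory.
        rows = ((int(r["post_id"]), bytes.fromhex(r["sig"])) for r in reader)
        with conn:  # commits on success, rolls back on error
            conn.executemany(
                f'INSERT OR REPLACE INTO "{table}" (post_id, sig) VALUES (?, ?)',
                rows,
            )
    return conn.execute(f'SELECT COUNT(*) FROM "{table}"').fetchone()[0]
```

Using `INSERT OR REPLACE` keyed on the post ID means re-running the import against a newer daily export would update existing rows rather than duplicate them.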

Thanks!

Donovan DMC

Former Staff

I was going to say it would likely be too large, but based on my own IQDB for my booru (446464 bytes total with 1363 images), it averages out to about 327.6 bytes per image. That would make an export for e6's IQDB somewhere around 2 gigabytes, which is just above the compressed size of the posts export, and it would likely be halved when compressed, so size isn't an issue.
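
The back-of-the-envelope math above can be reproduced with a few lines. The sample figures (446464 bytes, 1363 images) are the ones quoted in this post. The total post count is NOT given in the thread, so the 6,000,000 below is a purely hypothetical round number chosen to line up with the "around 2 gigabytes" estimate.

```python
def estimate_export_bytes(sample_bytes, sample_images, total_posts):
    """Scale the average per-image size of a sample DB up to `total_posts`."""
    per_image = sample_bytes / sample_images
    return per_image, per_image * total_posts


# 446464 and 1363 come from the post; 6_000_000 is an assumed post count.
per_image, total = estimate_export_bytes(446_464, 1_363, 6_000_000)
print(f"{per_image:.1f} bytes per image")
print(f"{total / 1e9:.2f} GB uncompressed")
```

This is a linear extrapolation, so it ignores any fixed overhead in the database file and assumes every post has a signature of similar size.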

Option 2 really isn't possible since they don't live in the same database, and possibly not even on the same server. The posts export is a direct postgres dump and isn't modified any further; adding onto it like this would require a significant amount of processing time (it is NOT trivial to do anything with a 4 gigabyte CSV), and the exports are already bordering on taking an hour (they run around 7AM UTC, and from the timestamps take around 50 minutes to complete).

For the record, I would also like a dump of the IQDB database so I could avoid ratelimits, but I also don't see it happening any time soon.

Aacafah

Moderator

Problem is, the person who set up the current DB exports isn't working with us anymore (no bad blood or anything afaik; they're still on the Discord server & the site), and basically only 1 person has the access & familiarity to set it up: the sysadmin for all of BD's sites. As you can imagine, they're pretty busy.

For the record, I'm all for it, I just don't have the knowledge (nor time tbh) to do it myself, & don't know who does.