This afternoon I experimented with tag searching. Current implementation doesn’t scale beyond 6 tags due to parsing requirement. You can argue about binding variable all day but in the end it can be abused easily if the limit is lifted.
Here’s the original, awesomely monstrous query:
SELECT (SELECT name FROM tags WHERE id = pt0.tag_id) AS tag, Count(pt0.*) AS tag_count FROM posts_tags pt0, posts_tags pt1, posts_tags pt2, posts_tags pt3, posts_tags pt4, posts_tags pt5, posts_tags pt6, posts_tags pt7, posts_tags pt8, posts_tags pt9, posts_tags pt10 WHERE pt0.post_id = pt1.post_id AND (SELECT TRUE FROM POSTS p0 WHERE p0.id = pt0.post_id AND p0.status <> 'deleted') AND pt1.post_id = pt2.post_id AND pt1.post_id = pt3.post_id AND pt1.post_id = pt4.post_id AND pt1.post_id = pt5.post_id AND pt1.post_id = pt6.post_id AND pt1.post_id = pt7.post_id AND pt1.post_id = pt8.post_id AND pt1.post_id = pt9.post_id AND pt1.post_id = pt10.post_id AND pt1.tag_id = (SELECT id FROM tags WHERE name = 'pantsu') AND pt2.tag_id = (SELECT id FROM tags WHERE name = 'amaha_miu') AND pt3.tag_id = (SELECT id FROM tags WHERE name = 'angelina_nanatsu_sewell') AND pt4.tag_id = (SELECT id FROM tags WHERE name = 'uryuu_sakuno') AND pt5.tag_id = (SELECT id FROM tags WHERE name = 'sena_airi') AND pt6.tag_id = (SELECT id FROM tags WHERE name = 'seifuku') AND pt7.tag_id = (SELECT id FROM tags WHERE name = 'thighhighs') AND pt8.tag_id = (SELECT id FROM tags WHERE name = 'izumi_tsubasu') AND pt9.tag_id = (SELECT id FROM tags WHERE name = 'palette') AND pt10.tag_id = (SELECT id FROM tags WHERE name = 'mashiroiro_symphony') GROUP BY pt0.tag_id ORDER BY tag_count DESC;
A kawaii~ 10 tables join. The query itself is fast (~30 ms)… after parsing. It takes ~2500 ms with parsing.
There’s an alternative for that query – Toxi solution (reference):
SELECT Count(1), pt.tag_id, t.name FROM posts_tags pt JOIN tags t ON ( pt.tag_id = t.id ) WHERE post_id IN (SELECT post_id FROM posts_tags ptin WHERE tag_id IN ( 5858, 12271, 4264, 16822, 43495, 20773, 168, 16823, 1148, 539 ) GROUP BY post_id HAVING Count(post_id) = 10) AND (SELECT true FROM posts p WHERE p.id = pt.post_id AND p.status <> 'deleted') GROUP BY pt.tag_id, t.name ORDER BY 1 DESC;
The parsing is fast (instant) but the query itself is a bit slow (~100 ms). The reference above suggested fetching all data per tag but sure sounds like a waste of query.
And then after few hours, I figured out another solution: just nest the damn queries. Its per tag search goes like this:
select post_id from posts_tags where tag_id = 1148
And then we need to further filter it like this (e.g. tag_id 1148 and 5858):
SELECT post_id FROM posts_tags WHERE tag_id = 1148 AND post_id IN (SELECT post_id FROM posts_tags WHERE tag_id = 5858);
Each queries are fast and simple. The results from inner are also obviously cacheable (by db). To complete the query, just repeat the patter above ad infinitum (or 10 times) and apply relevant additional queries to obtain result just like the original query:
select count(post_id), (select name from tags where id = pt.tag_id) tag from posts_tags pt where post_id in (select post_id from posts_tags where tag_id = 66 and post_id in (select post_id from posts_tags where tag_id = 168 and post_id in (select post_id from posts_tags where tag_id = 16822 and post_id in (select post_id from posts_tags where tag_id = 16823 and post_id in (select post_id from posts_tags where tag_id = 43495 and post_id in (select post_id from posts_tags where tag_id = 20773 and post_id in (select post_id from posts_tags where tag_id = 539 and post_id in (select post_id from posts_tags where tag_id = 1148 and post_id in (select post_id from posts_tags where tag_id = 5858 and post_id in (select post_id from posts_tags where tag_id = 4264 and post_id in (select post_id from posts_tags where tag_id = 12271 ))))))))))) group by tag_id order by 1 desc;
Looks a bit like monster but really just simple queries nested 10 times. Parsing speed is awesomely fast and the query itself runs at ~30 ms just like the original query.
And even better, this query is possible over Ruby on Rails (and perhaps any ORM) though not in single query but split per level.
Still, don’t you need another query for tag_id lookup? and I see you’ don’t test the result for deleted post at your solution, might end up with orphaned tags… I humbly think that some search limit was set for good as there’s not gonna be any solution would scaled up nicely both in code and performance for this case. Not sure about recursive query though.
Just my 2 cents… I’ve been away from sql related works fo a while now. 😀
Put `post_id = (select id from posts p where p.id = pt.post_id and p.status ‘deleted’)` before `GROUP BY`. In fact, using recursive query is much faster after certain amount of keywords since the search range will be much smaller.
The tag_id lookup is primary key lookup which is almost free.