This afternoon I experimented with tag searching. Current implementation doesn’t scale beyond 6 tags due to parsing requirement. You can argue about binding variable all day but in the end it can be abused easily if the limit is lifted.
Here’s the original, awesomely monstrous query:
SELECT (SELECT name
FROM tags
WHERE id = pt0.tag_id) AS tag,
Count(pt0.*) AS tag_count
FROM posts_tags pt0,
posts_tags pt1,
posts_tags pt2,
posts_tags pt3,
posts_tags pt4,
posts_tags pt5,
posts_tags pt6,
posts_tags pt7,
posts_tags pt8,
posts_tags pt9,
posts_tags pt10
WHERE pt0.post_id = pt1.post_id
AND (SELECT TRUE
FROM POSTS p0
WHERE p0.id = pt0.post_id
AND p0.status <> 'deleted')
AND pt1.post_id = pt2.post_id
AND pt1.post_id = pt3.post_id
AND pt1.post_id = pt4.post_id
AND pt1.post_id = pt5.post_id
AND pt1.post_id = pt6.post_id
AND pt1.post_id = pt7.post_id
AND pt1.post_id = pt8.post_id
AND pt1.post_id = pt9.post_id
AND pt1.post_id = pt10.post_id
AND pt1.tag_id = (SELECT id
FROM tags
WHERE name = 'pantsu')
AND pt2.tag_id = (SELECT id
FROM tags
WHERE name = 'amaha_miu')
AND pt3.tag_id = (SELECT id
FROM tags
WHERE name = 'angelina_nanatsu_sewell')
AND pt4.tag_id = (SELECT id
FROM tags
WHERE name = 'uryuu_sakuno')
AND pt5.tag_id = (SELECT id
FROM tags
WHERE name = 'sena_airi')
AND pt6.tag_id = (SELECT id
FROM tags
WHERE name = 'seifuku')
AND pt7.tag_id = (SELECT id
FROM tags
WHERE name = 'thighhighs')
AND pt8.tag_id = (SELECT id
FROM tags
WHERE name = 'izumi_tsubasu')
AND pt9.tag_id = (SELECT id
FROM tags
WHERE name = 'palette')
AND pt10.tag_id = (SELECT id
FROM tags
WHERE name = 'mashiroiro_symphony')
GROUP BY pt0.tag_id
ORDER BY tag_count DESC;
A kawaii~ 10 tables join. The query itself is fast (~30 ms)… after parsing. It takes ~2500 ms with parsing.
There’s an alternative for that query – Toxi solution (reference):
SELECT Count(1),
pt.tag_id,
t.name
FROM posts_tags pt
JOIN tags t
ON ( pt.tag_id = t.id )
WHERE post_id IN (SELECT post_id
FROM posts_tags ptin
WHERE tag_id IN ( 5858, 12271, 4264, 16822,
43495, 20773, 168, 16823,
1148, 539 )
GROUP BY post_id
HAVING Count(post_id) = 10)
AND (SELECT true
FROM posts p
WHERE p.id = pt.post_id
AND p.status <> 'deleted')
GROUP BY pt.tag_id,
t.name
ORDER BY 1 DESC;
The parsing is fast (instant) but the query itself is a bit slow (~100 ms). The reference above suggested fetching all data per tag but sure sounds like a waste of query.
And then after few hours, I figured out another solution: just nest the damn queries. Its per tag search goes like this:
select post_id from posts_tags where tag_id = 1148
And then we need to further filter it like this (e.g. tag_id 1148 and 5858):
SELECT post_id
FROM posts_tags
WHERE tag_id = 1148
AND post_id IN (SELECT post_id
FROM posts_tags
WHERE tag_id = 5858);
Each queries are fast and simple. The results from inner are also obviously cacheable (by db). To complete the query, just repeat the patter above ad infinitum (or 10 times) and apply relevant additional queries to obtain result just like the original query:
select count(post_id), (select name from tags where id = pt.tag_id) tag
from posts_tags pt where
post_id in
(select post_id from posts_tags where
tag_id = 66 and post_id in
(select post_id from posts_tags where
tag_id = 168 and post_id in
(select post_id from posts_tags where
tag_id = 16822 and post_id in
(select post_id from posts_tags where
tag_id = 16823 and post_id in
(select post_id from posts_tags where
tag_id = 43495 and post_id in
(select post_id from posts_tags where
tag_id = 20773 and post_id in
(select post_id from posts_tags where
tag_id = 539 and post_id in
(select post_id from posts_tags where
tag_id = 1148 and post_id in
(select post_id from posts_tags where
tag_id = 5858 and post_id in
(select post_id from posts_tags where
tag_id = 4264 and post_id in
(select post_id from posts_tags where
tag_id = 12271
)))))))))))
group by tag_id order by 1 desc;
Looks a bit like monster but really just simple queries nested 10 times. Parsing speed is awesomely fast and the query itself runs at ~30 ms just like the original query.
And even better, this query is possible over Ruby on Rails (and perhaps any ORM) though not in single query but split per level.
Still, don’t you need another query for tag_id lookup? and I see you’ don’t test the result for deleted post at your solution, might end up with orphaned tags… I humbly think that some search limit was set for good as there’s not gonna be any solution would scaled up nicely both in code and performance for this case. Not sure about recursive query though.
Just my 2 cents… I’ve been away from sql related works fo a while now. 😀
Put `post_id = (select id from posts p where p.id = pt.post_id and p.status ‘deleted’)` before `GROUP BY`. In fact, using recursive query is much faster after certain amount of keywords since the search range will be much smaller.
The tag_id lookup is primary key lookup which is almost free.