Question Important: sphinxsearch delta problems with 1.1.5

Status
Not open for further replies.

janslu

Customer
I need help. I run a large board with 10+ million posts and use digitalpoint sphinx search. It worked perfectly until yesterday. Two hours after I installed dbseo 1.1.5 and enabled content tagging, the server that runs the indexer began to slow down: serious speed degradation, rising load and very high IO load. It took me a while to pin the problem on the indexer delta cron job (running every 5 minutes). I've been experimenting since then and I'm completely stuck. Here's what I know:
1. Until 1.1.4 it worked great: indexing deltas took a couple of seconds and caused no observable load.
2. Since 1.1.5, the time to generate each delta grows with every run after a full index refresh. Below is some output from the cron jobs. The first run was done before the full index rotation. The second was done a few minutes after the rotation; each subsequent run shows a small increase in actual docs but an enormous increase in attr values, and more time needed to sort and index them.


17:21

indexing index 'vb_postdelta'...
collected 3947 docs, 2.0 MB
collected 51120447 attr values
sorted 51.1 Mvalues, 100.0% done
sorted 0.3 Mhits, 100.0% done
total 3947 docs, 2048804 bytes
total 62.443 sec, 32810 bytes/sec, 63.20 docs/sec
total 20904623 reads, 9.075 sec, 0.0 kb/call avg, 0.0 msec/call avg
total 1276 writes, 5.381 sec, 1194.5 kb/call avg, 4.2 msec/call avg

17:50

collected 53 docs, 0.0 MB
collected 1722624 attr values
sorted 1.7 Mvalues, 100.0% done
sorted 0.0 Mhits, 100.0% done
total 53 docs, 19725 bytes
total 2.257 sec, 8736 bytes/sec, 23.47 docs/sec
total 713462 reads, 0.339 sec, 0.0 kb/call avg, 0.0 msec/call avg
total 54 writes, 0.126 sec, 951.0 kb/call avg, 2.3 msec/call avg

18:10

collected 83 docs, 0.0 MB
collected 3898263 attr values
sorted 3.9 Mvalues, 100.0% done
sorted 0.0 Mhits, 100.0% done
total 83 docs, 36709 bytes
total 5.419 sec, 6773 bytes/sec, 15.31 docs/sec
total 1592234 reads, 0.742 sec, 0.0 kb/call avg, 0.0 msec/call avg
total 108 writes, 0.275 sec, 1074.4 kb/call avg, 2.5 msec/call avg

18:30

collected 120 docs, 0.1 MB
collected 5414463 attr values
sorted 5.4 Mvalues, 100.0% done
sorted 0.0 Mhits, 100.0% done
total 120 docs, 53994 bytes
total 6.408 sec, 8425 bytes/sec, 18.72 docs/sec
total 2208323 reads, 1.012 sec, 0.0 kb/call avg, 0.0 msec/call avg
total 146 writes, 0.491 sec, 1103.7 kb/call avg, 3.3 msec/call avg

18:50

collected 158 docs, 0.1 MB
collected 5869297 attr values
sorted 5.9 Mvalues, 100.0% done
sorted 0.0 Mhits, 100.0% done
total 158 docs, 71372 bytes
total 8.821 sec, 8090 bytes/sec, 17.91 docs/sec
total 2393534 reads, 1.087 sec, 0.0 kb/call avg, 0.0 msec/call avg
total 157 writes, 0.885 sec, 1112.8 kb/call avg, 5.6 msec/call avg
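
One way to read these runs is to divide the attr values by the docs collected. A minimal Python sketch, with the numbers copied from the output above (the function name is only illustrative):

```python
# (docs, attr values) per run, copied from the indexer output above
runs = {
    "17:21": (3947, 51120447),
    "17:50": (53, 1722624),
    "18:10": (83, 3898263),
    "18:30": (120, 5414463),
    "18:50": (158, 5869297),
}


def attr_values_per_doc(docs: int, attr_values: int) -> int:
    """Whole number of attr values collected per indexed doc."""
    return attr_values // docs


for time, (docs, attr_values) in runs.items():
    per_doc = attr_values_per_doc(docs, attr_values)
    print(f"{time}: {docs:>5} docs, {per_doc:>6} attr values per doc")
```

Every run collects more than ten thousand attr values per doc, so the attr values, not the docs themselves, dominate the work.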

Can you think of any way the new version could influence post indexing?
 
To eliminate DBSEO as a variable, you can revert to v1.1.4 by re-uploading those files and re-importing the XML file. There were no DB changes between these versions, so no conflicts should occur.

Since there were no DB changes, I can't think of a reason why Sphinx would be affected.
 
It seems to be related to tags rather than the dbseo version. My board with 1.1.4 seems to have the same issue.
This is the query that generates the attr values:
Code:
SELECT post.postid AS doc_id, tagcontent.tagid
FROM vb_digitalpoint_sphinx_delta AS digitalpoint_sphinx_delta
RIGHT JOIN vb_tagcontent AS tagcontent
    ON (tagcontent.contentid = digitalpoint_sphinx_delta.primaryid)
LEFT JOIN vb_post AS post
    ON (tagcontent.contentid = post.threadid)
WHERE digitalpoint_sphinx_delta.contenttypeid = (
        SELECT contenttypeid FROM vb_contenttype AS contenttype WHERE class = 'Thread'
    )
    AND primaryid BETWEEN 51088 AND 71455;
I'll keep working on this, but at the moment I have no idea why adding tags causes this runaway increase in attr values...
 
The reason why that is causing a delay is the sheer number of tags that have been added. I recommend purging your tags, then re-adding tags but only for threads newer than, say, 1 year.

You can do both of these things via the DBSEO CP.
 
But the real problem here is the constant, rapid increase in indexed data AFTER dbseo has finished tagging. I suspect a bug in sphinxsearch that causes the same data to be indexed repeatedly during delta index runs, but that is only a wild guess at the moment. I will try to get help from digitalpoint...
 
I have tried to resolve the problem with digitalpoint, and it's not possible. One of the ways search works in vbulletin is search by tag, and tagging a thread tags all the posts inside it. What sets my forum apart is that users write in hyperlong threads spanning hundreds of pages. So whenever a long thread is tagged, the search queue receives roughly the number of tags times the number of posts inside. In my case that meant millions of posts to index. The full index rotation took an order of magnitude longer to generate (a couple of hours instead of minutes) and overloaded the server. The gradual increase in load was probably caused by additional threads being tagged, the growing number of tags, etc...
To sum it all up: vbulletin's decision to combine thread tags with post tags causes problems on forums like mine, where a single thread can contain many thousands of posts. Searching by tag doesn't make sense in such situations...
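
The fan-out can be reproduced on a toy database. This is only a sketch under stated assumptions: it uses SQLite in place of MySQL, two stripped-down tables that keep just the join columns, and a plain inner join on tagcontent.contentid = post.threadid. It leaves out the delta table and the RIGHT/LEFT JOINs of the real query, and the function name is hypothetical.

```python
import sqlite3


def count_attr_rows(posts_in_thread: int, tags_on_thread: int) -> int:
    """Count (postid, tagid) rows produced for one tagged thread.

    Simplified stand-in for the real schema: only the columns used
    by the join are kept, and everything lives in thread id 1.
    """
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE post (postid INTEGER PRIMARY KEY, threadid INTEGER);
        CREATE TABLE tagcontent (tagid INTEGER, contentid INTEGER);
        """
    )
    conn.executemany(
        "INSERT INTO post (postid, threadid) VALUES (?, 1)",
        [(i,) for i in range(1, posts_in_thread + 1)],
    )
    conn.executemany(
        "INSERT INTO tagcontent (tagid, contentid) VALUES (?, 1)",
        [(t,) for t in range(1, tags_on_thread + 1)],
    )
    # Each tag on the thread is paired with every post in the thread.
    (rows,) = conn.execute(
        "SELECT COUNT(*) FROM tagcontent "
        "JOIN post ON tagcontent.contentid = post.threadid"
    ).fetchone()
    conn.close()
    return rows


# Illustrative sizes only: one 5,000-post thread carrying 10 tags.
print(count_attr_rows(posts_in_thread=5000, tags_on_thread=10))
```

In this toy setup the row count is posts times tags for each tagged thread, which is how a handful of very long threads could account for millions of attr values.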
 
In that case you should probably clear the tagging DB in DBSEO and turn off the system, unfortunately :(
 
