While working on some projects, I wanted to develop a comment engine, so people could comment on posts. I know, I could just use a blog, but a blog wouldn't have fit these applications.
Additionally, I wanted comment threading and to be able to sort the hierarchy simply, using a minimum of RAM and CPU cycles.
For those of you who don't know what threading is, it's organizing articles/comments/entries in a hierarchy based on their relationship to a prior article/comment/post. For example:
|- Comment 1
|---- Comment 4 (reply to comment 1)
|------- Comment 8 (reply to comment 4)
|---- Comment 6 (reply to comment 1)
|- Comment 2
|---- Comment 3 (reply to comment 2)
|------- Comment 7 (reply to comment 3)
|---------- Comment 9 (reply to comment 7)
|- Comment 5
Now, so long as you know the parent (Comment four is the parent of Comment eight), you can recurse the comments and create a hierarchical array. But it's inefficient both on the PHP back end (recursion is laborious), and in terms of the SQL query that may have to pull way more comments than you want to display.
So, while it's a somewhat simple solution in terms of storing the comments, it's a memory and processor hog in terms of getting them out and displaying them. I was sure there had to be something more efficient.
I read about presort order traversal, recursive processes, and all sorts of stuff that made my head ache. Then, I *finally* saw a solution that was pure genius. Creating a "lineage string" for each comment.
Lineage is the parentage of the comment plus the comment's own unique id, using an autoincrement to create unique IDs. Parentage is the lineage of the parent. Let's create a super-simple table and show how this works.
CREATE TABLE `comments` (
`uid` INT( 4 ) UNSIGNED ZEROFILL NOT NULL AUTO_INCREMENT PRIMARY KEY ,
`parentage` VARCHAR( 220 ),
`content` TEXT NOT NULL ,
`post_id` VARCHAR( 10 )
) ENGINE = MYISAM ;
Here's the threaded list above with the parentage strings you'd store in parentheses. For each comment, if it wasn't a reply to another comment, it would have no parentage. If it was a reply to a comment, it would have that comment's parentage with that comment's own ID appended to the parentage.
|- Comment 1 ()
|---- Comment 4 (-0001)
|------- Comment 8 (-0001-0004)
|---- Comment 6 (-0001)
|- Comment 2 ()
|---- Comment 3 (-0002)
|------- Comment 7 (-0002-0003)
|---------- Comment 9 (-0002-0003-0007)
|- Comment 5 ()
Now, if you just sorted by parentage, your list would not really be threaded. It would sort as follows.
Comment 1 ()
Comment 2 ()
Comment 5 ()
Comment 4 (-0001)
Comment 6 (-0001)
Comment 8 (-0001-0004)
Comment 3 (-0002)
Comment 7 (-0002-0003)
Comment 9 (-0002-0003-0007)
For this to sort properly, you need the "lineage" (parentage with the comments own uid appended). Then you get...
|Comment 1 (-0001)
|--Comment 4 (-0001-0004)
|----Comment 8 (-0001-0004-0008)
|--Comment 6 (-0001-0006)
|Comment 2 (-0002)
|--Comment 3 (-0002-0003)
|----Comment 7 (-0002-0003-0007)
|------Comment 9 (-0002-0003-0007-0009)
|Comment 5 (-0005)
Now they sort properly.
Storing the full lineage string as an entry in the table itself is difficult and creates other problems, because you have to know the comment's ID when you store it. But the lineage is just the parentage and uid separated with a dash. So combine them during the SELECT.
SELECT *, CONCAT(parentage, '-', uid) as lineage FROM `trees` WHERE
post_id = "post50" ORDER BY lineage ASC LIMIT 0,5
You get the following selection (lineage in parentheses)...
Comment 1 (-0001)
Comment 4 (-0001-0004)
Comment 8 (-0001-0004-0008)
Comment 6 (-0001-0006)
Comment 2 (-0002)
When you want to show the next 5, you merely adjust the limit.
Spooling them out would be a simple matter of counting the dashes in the lineage string to determine the indentation level.
Now, of course, if a record was deleted and the autoincrement values reindexed, your whole table goes to Hell. So you have to be careful that doesn't happen.
And, with a 4-digit zerofill, as used above, you're limited to 9999 comments in the database before things break. But I just used 4-digit to keep the examples clean.
An important thing to understand with this method is that it only works for one threaded sorting method, oldest to newest. You can sort in other ways unthreaded, but it only threads and indents correctly in oldest to newest.