Posts Tagged “mysql”

You've seen them as a feature of Web 2.0 web sites, big bunches of keywords called tags or a tag cloud. You've got a large database of items and you'd like to add this feature to your site. It's easier than you might think.

Step 1: Getting The Tags From Your Users

The first part of this is getting the tags from your users. The standard method seems to be to give the users a text input field and letting them type in any tags they want, splitting up multiple tags with commas or carriage returns. We'll look at doing it with commas. First, the form field:

<b>Please enter your tags, separating each one with a comma.<b><br>
<input type="text" name="tags" size=50>

On the back-end form processing script, to convert that into a list of tags, you do this:

$taginput = $_POST["tags"];
$tagarray = explode(",",$taginput);

The explode command chops up that long string of tags into an array of tags, splitting at the commas. Now you've got your list of tags. You'll still need to do some minor clean-up on them before putting them into the database, but we'll get into that when we put them into it.

Step 2: Getting The Tags Into Your Database

There are multiple organizational methods that will work for this, but we're going to go for the simplest. It requires two database tables and assumes that the first table is one you already have, listing all your items you want to tag. That's Table 1 and we'll call it "items", calling your primary unique ID key "uid". These aren't names you have to use. But I have to use something for example purposes. For example purposes, we're also going to assume that the "uid" field in Table 1 is a number (probably an autoincrementing integer).

The second table looks something like this...

CREATE TABLE `tags` (
`uid` INT( 11 ) UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY ,
`item_id` INT( 11 ) NOT NULL,
`tag` VARCHAR( 50 ) NOT NULL
) ENGINE = MYISAM ;

What we're basically doing is creating a simple table that will store the ID of an item from Table 1 and an associated tag. We've set the maximum length of the tag to 50 characters. You can go up a bit or down a bit, but unless you're planning on letting people tag stuff with complete sentences, put a limit on length.

Now we go back to $tagarray from Part 1. I'm going to assume you've already put the code into your back-end script that connects it to your database and we'll go from there. I'm also going to assume that your form or some other method has fed an item ID to your script so we know which item will own this batch of tags and that you've checked it to make sure it's a number. We'll call that $itemid.

To put the tags into the database, we do the following:

for($i=0;$i<count($tagarray);$i++){
$usetag = mysql_real_escape_string(stripslashes(ltrim(rtrim($tagarray[$i]))));
if($usetag == "") continue;
$query = "INSERT INTO tags (item_id,tag) VALUES ($itemid,'$usetag')";
mysql_query($query);
}

Now what the heck is $usetag = mysql_real_escape_string(ltrim(rtrim(stripslashes($tagarray[$i])))) doing? It's doing clean-up. First rtrim and ltrim are cleaning off any white space on the ends of the tag. Then stripslashes is removing any slashes your web server might have added to sanitize the input. Last mysql_real_escape_string both sanitizes the input to try to prevent SQL injection attacks and puts in slashes where needed.

If you didn't do the stripslashes first, then mysql_real_escape_string would slash the slashes, causing problems in the future, like couldn't displaying as couldn\'t. If you've ever seen a site with backslashes showing up like that, it means they're not doing good slash management.

You'll also notice that right after that, we state that if $usetag == "" the script should "continue". In this instance, that means "quit executing the tag-handling code for this tag and start over with the next tag". That's to keep you from getting a bunch of entries with no tag value at all.

Now your tags are in the database. How do we get them out?

Step 3: Using The Tags

There are going to be three ways you'll want to use the tags:

  • Showing all the tags in your tag cloud

  • Showing all the tags for a specific item
  • Showing all the items for a specific tag

Showing all the tags in your tag cloud:

Here's your database query:

SELECT count(*) as num, tag FROM tags GROUP BY tag ORDER BY tag ASC

This will return a result set with all the different tags, counted, and sorted in alphabetical order. Here's how you'd turn them into a cloud. We're going to assume that the display page for any tagset is tagitems.php?tagvalue=[url encoded tag].

$query = "SELECT count(*) as num, tag FROM tags GROUP BY tag ORDER BY num DESC";
$result = mysql_query($query);
while($taginfo = mysql_fetch_row($result)){
     $numtags = $taginfo[0];
     $tagname = $taginfo[1];
     $tagurl = urlencode($tagname);
     $tagsize = 11 + intval($numtags/20);
     if ($tagsize > 24) $tagsize = 24;
     echo "<a href=\"tagitems.php?tagvalue=$tagurl\"><span style=\"font-size:" . $tagsize . "pt;\">$tagname</span> ($numtags)</a> ";
}

Now there are a couple of things we do here:

Because a tag might have spaces or other characters that could mess up a URL, we're creating the $tagurl version of the tag to use in the link to the tagitems.php display page. That converts the characters into code that won't mess up the URL. In the tagitems.php script, you'd just use urldecode to convert it back before using it for database queries.

Also, one thing tag clouds tend to do is make the text size correspond to the number of items to which the tag has been assigned. So with $tagsize = 11 + intval($numtags/20) we're setting a base text size of 11 point text and adding one point to it for every 20 items that use the tag. Then we use if ($tagsize > 24) $tagsize = 24 to set a maximum text size of 24 point text. If you want to change the numbers around, that's fine.

Showing all the tags for a specific item:

Now, let's assume that you've fed the script your item ID, made sure it was valid, and named it $itemid, we'd do this.

SELECT tag FROM tags WHERE item_id = $itemid GROUP BY tag

This is the simplest of all the queries. You actually don't need the GROUP BY in there, but I put it in just in case a duplicate tag slipped through. This will condense out any accidental duplicates.

Showing all items for a specific tag:

This is the hardest one because it requires a very simple join to get the item name from your items table. First we need to make sure you've fed your script the tag by some method and sanitized it with mysql_real_escape_string (and possibly stripslashes first). We'll assume that you've put the tag into $tagvalue, and in your items database, you've named the item's name field with item_name.

Again, this isn't how you must do it. But I have to establish some names for them for the sake of example and it's better than calling them John and Mary Smith.

Here's your database query:

SELECT items.item_name,tags.item_id FROM items,tags WHERE tags.tag = '$tagvalue' AND tags.item_id = items.uid GROUP BY tags.item_id

Since I was using fields from two different tables, you'll see I had to preface the field names with the table name. I also did the grouping thing again to warrant against any accidental duplication.

And There You Go

As with any project, you can make this more complex and more complicated, adding more rules and validation. To some extent, you probably should. There are bad people out there who may try to mess with your tag cloud for no other reason than they can.

But, as you can see, creating a tag cloud for your database of items is a lot simpler than it might seem on first blush. So if you're interested in doing that for your site, hopefully it's not so intimidating now.

  • Share/Bookmark

Comments 6 Comments »

Author's Note: Now the stuff below is not the code you'd use if you were building the next Digg. It's just a simple demonstration of the concepts. But if you're looking to start building your own rating script, it's an excellent start to get you familiar with how the most elemental functions work.

There are lots of pre-packaged scripts out there to do almost anything you want, but in my experience a lot of them either do too much or don't do enough. Finding that Goldilocks style of "just right" takes a lot of searching and experimenting, or just plain hacking of existing scripts. But for simple functions, writing your own script that does exactly what you want can often be quicker and more satisfying.

Rating is one of these things. What do I mean by "rating"? I mean giving people a selection of items and letting them rate them on some sort of scale: 1 to 5 stars, 1 to 10 points, Hot/Warm/Cold, etc.

Rating is a lot easier than you might think. While you can make it very complex, at its most basic, a rating system consists of three things:

  • A database table that lets you store a unique item ID and a score.
  • A script that stores those values in the table.
  • A script that shows the results.

If you want to get more complicated, you can add:

  • A database table to store information about the items.
  • Code in your storing script that makes sure the values being stored correspond to known items.
  • Code in your script that shows the results to display the item information.

If you want to get really complex, you can add:

  • Methods to prevent people voting for the same item more than once.
  • Methods to weight votes to make certain more active/trusted users' votes count more.
  • Ranking methods that use a formula based on the number of votes as well as the average to help an item with an average of 8.3 from 500 votes rank higher than an item with 5 votes, but an average of 9.

In this article, we'll leave the "more complicated" and "really complex" items aside and just concentrate on the basics.

So let's look at the three items in the simple approach.

Item 1: The Database Table

The table structure is simple...


CREATE TABLE `ratings` (
`uid` INT( 11 ) UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY ,
`item_id` VARCHAR( 30 ) NOT NULL,
`rating` TINYINT( 4 ) NOT NULL
) ENGINE = MYISAM ;

That's it. Your database table is created.

"uid" is just a unique ID for the database to use to keep everything in order. It's set to an integer with an autoincrement value, meaning you don't ever need to worry about it after creating the table.

"item_id" is a value of up to 30 characters that you can give to an item you're letting people rank. For example, if you call something "Cat Photo 1", you can give it that item ID in the database.

"rating" is the numerical value of the rating the user gives the item.

Item 2: The Script That Stores The Rating

To feed the data to the script, you'll need a form. Let's look at a form that lets your users give something a rating on a scale of 1 to 5:


<form method="post" action="store-rating.php">
<input type="hidden" name="item_id" value="cat photo 1">
<input type="select" name="rating">
<option value="1">1</option>
<option value="2">2</option>
<option value="3">3</option>
<option value="4">4</option>
<option value="5">5</option>
</select>
<input type="submit" value="Rate This">
</form>

Yup, that easy. Now we have to get the data from the form and put it into the database, using the store-rating.php script. First you'd put in the code that connects your script to the database. Then this is the code you'd use to store the rating.


$itemid = mysql_real_escape_string($_POST["item_id")];
$rating = intval($_POST["rating"]);
if(($rating < 1)||($rating>5)) die ("$rating is not a valid rating score");
$query = "INSERT INTO ratings(item_id,rating) VALUES ('$itemid',$rating)";
mysql_query($query);

Now, even though we're keeping this simple, we've put in a small bit of security. In line 1, we use mysql_real_escape_string to protect against someone trying to submit a MySQL command as an item ID (also known as a "SQL injection" attack). In line two, we make sure that the value put into $rating is a number. In line three, we make sure that the value of $rating is between 1 and 5 so someone doesn't try to sneak in a 0 or a 100 and skew the results.

Now the rating's stored. How do we put it to use?

Item 3: The Script That Shows The Results

This too is pretty simple. If you want to get the results for any item, you only need to do a simple query.

If you want to get all the ratings, you use this:


SELECT item_id,AVG(rating) FROM ratings GROUP BY item_id

That gives you a list of all the item IDs and their average scores. Let me break it down into more plain English:

  • SELECT item_id,AVG(rating) - Remember we created a table with three fields: uid, item_id, and rating. Remember also that I said we could safely ignore uid after that. All we care about is item_id and rating. So we're telling it to select those two fields from the table. Additionally, we're telling it that the rating value should be averaged from a group of values.
  • FROM ratings - This just tells it the name of the table it should use to get the values.
  • GROUP BY item_id - This is what makes the averaging possible. It groups all the entries with the same item ID making it possible for the AVG() function to average all the associated rating values.

You could spool them out with the following...


$query = "SELECT item_id,AVG(rating) FROM ratings GROUP BY item_id";
$result = mysql_query($query);
while($values = mysql_fetch_row($result)){
      $itemid = $values[0];
      $rating = number_format($values[1], 2, '.', '');
      echo "The score for $itemid is $rating.<br>\n";
}

That's pretty easy. MySQL does the averaging for you, so you don't need to do any complicated extra code. All you do is use the PHP "number_format" function to cut it down to two decimal places (so 3.571867453 will be displayed as 3.57).

If you wanted to get just the rating for "cat photo 1", then it would be...


SELECT item_id,AVG(rating) FROM ratings WHERE item_id = "cat photo 1" GROUP BY item_ID

The results returned would just be the one item instead of all of them.

And That's It

As I note at the beginning of this post, you're not going to be ready to go build your own social bookmarking site now. This is an introductory tutorial. But hopefully the idea of building your own rating script looks a lot more possible than it did before you started.

  • Share/Bookmark

Comments 4 Comments »

While working on some projects, I wanted to develop a comment engine, so people could comment on posts. I know, I could just use a blog, but a blog wouldn't have fit these applications.

Additionally, I wanted comment threading and to be able to sort the hierarchy simply, using a minimum of RAM and CPU cycles.

For those of you who don't know what threading is, it's organizing articles/comments/entries in a hierarchy based on their relationship to a prior article/comment/post. For example:


|- Comment 1
|---- Comment 4 (reply to comment 1)
|------- Comment 8 (reply to comment 4)
|---- Comment 6 (reply to comment 1)
|- Comment 2
|---- Comment 3 (reply to comment 2)
|------- Comment 7 (reply to comment 3)
|---------- Comment 9 (reply to comment 7)
|- Comment 5

Now, so long as you know the parent (Comment four is the parent of Comment eight), you can recurse the comments and create a hierarchical array. But it's inefficient both on the PHP back end (recursion is laborious), and in terms of the SQL query that may have to pull way more comments than you want to display.

So, while it's a somewhat simple solution in terms of storing the comments, it's a memory and processor hog in terms of getting them out and displaying them. I was sure there had to be something more efficient.

I read about presort order traversal, recursive processes, and all sorts of stuff that made my head ache. Then, I *finally* saw a solution that was pure genius. Creating a "lineage string" for each comment.

Lineage is the parentage of the comment plus the comment's own unique id, using an autoincrement to create unique IDs. Parentage is the lineage of the parent. Let's create a super-simple table and show how this works.


CREATE TABLE `comments` (
`uid` INT( 4 ) UNSIGNED ZEROFILL NOT NULL AUTO_INCREMENT PRIMARY KEY ,
`parentage` VARCHAR( 220 ),
`content` TEXT NOT NULL ,
`post_id` VARCHAR( 10 )
) ENGINE = MYISAM ;

Here's the threaded list above with the parentage strings you'd store in parentheses. For each comment, if it wasn't a reply to another comment, it would have no parentage. If it was a reply to a comment, it would have that comment's parentage with that comment's own ID appended to the parentage.


|- Comment 1 ()
|---- Comment 4 (-0001)
|------- Comment 8 (-0001-0004)
|---- Comment 6 (-0001)
|- Comment 2 ()
|---- Comment 3 (-0002)
|------- Comment 7 (-0002-0003)
|---------- Comment 9 (-0002-0003-0007)
|- Comment 5 ()

Now, if you just sorted by parentage, your list would not really be threaded. It would sort as follows.


Comment 1 ()
Comment 2 ()
Comment 5 ()
Comment 4 (-0001)
Comment 6 (-0001)
Comment 8 (-0001-0004)
Comment 3 (-0002)
Comment 7 (-0002-0003)
Comment 9 (-0002-0003-0007)

For this to sort properly, you need the "lineage" (parentage with the comments own uid appended). Then you get...


|Comment 1 (-0001)
|--Comment 4 (-0001-0004)
|----Comment 8 (-0001-0004-0008)
|--Comment 6 (-0001-0006)
|Comment 2 (-0002)
|--Comment 3 (-0002-0003)
|----Comment 7 (-0002-0003-0007)
|------Comment 9 (-0002-0003-0007-0009)
|Comment 5 (-0005)

Now they sort properly.

Storing the full lineage string as an entry in the table itself is difficult and creates other problems, because you have to know the comment's ID when you store it. But the lineage is just the parentage and uid separated with a dash. So combine them during the SELECT.

SELECT *, CONCAT(parentage, '-', uid) as lineage FROM `trees` WHERE
post_id = "post50" ORDER BY lineage ASC LIMIT 0,5

You get the following selection (lineage in parentheses)...


Comment 1 (-0001)
Comment 4 (-0001-0004)
Comment 8 (-0001-0004-0008)
Comment 6 (-0001-0006)
Comment 2 (-0002)

When you want to show the next 5, you merely adjust the limit.

Spooling them out would be a simple matter of counting the dashes in the lineage string to determine the indentation level.

Now, of course, if a record was deleted and the autoincrement values reindexed, your whole table goes to Hell. So you have to be careful that doesn't happen.

And, with a 4-digit zerofill, as used above, you're limited to 9999 comments in the database before things break. But I just used 4-digit to keep the examples clean.

An important thing to understand with this method is that it only works for one threaded sorting method, oldest to newest. You can sort in other ways unthreaded, but it only threads and indents correctly in oldest to newest.

  • Share/Bookmark

Comments 3 Comments »

Get an angel for your site An Angel Watches Over This Site