Posts Tagged “google”

If you're studying up on SEO, you'll find that one of the primary recommendations is to validate your HTML so that Google can understand it. That's all well and good when it comes to your HTML tags, but what about HTML entities?

HTML entities are special codes for special characters like the copyright symbol (©). Unlike tags, they're not bracketed: they start with an ampersand and end with a semicolon. If you want to insert a copyright symbol, you use &copy; in your body text. On the other hand, if you wanted to make it bold, you'd have to put bracketed tags around it like so: <b>&copy;</b> (©).
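Because entities begin with an ampersand, a literal ampersand in body text has to be written as an entity too (&amp;), or a validator will complain. Here's a minimal sketch of escaping text before it goes into a page. The escapeHtml helper and the sample title are made up for illustration; they are not from the original post.

```javascript
// Hypothetical helper: replace characters that have special meaning in HTML
// with their named entities.
function escapeHtml(text) {
  return text
    .replace(/&/g, "&amp;")  // must run first, or the entities added below get escaped again
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

// Sample input (an assumption, for illustration only).
const title = "Clip Art & Photos <new>";
console.log(escapeHtml(title));
```

The ampersand replacement runs first on purpose. If it ran last, it would turn the & in &lt; into &amp;lt;.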

As part of my "Search Engine Optimization" for the clip art library on my art site, I've been exploring using a sitemap to help ensure that all my pages get indexed.

My main stumbling block is that I have a lot of pages generated from different databases, plus a number of static pages. I thought rounding them all up into one sitemap would take a long time.

Imagine my relief when I read deeper on Google's page and found this quote: "This program does not replace our normal methods of crawling the web. Google still searches and indexes your sites the same way it has done in the past whether or not you use this program."

If you've got a lot of pages that are four or five clicks away from your home page, Google's less likely to find those pages and get around to crawling them. Their logic is that if they're that buried, they must not be that important. So those pages will be found more slowly and crawled less often.

A sitemap lets those pages say, "Hey, here I am, and I'm more important than it seems," to the Google spider. So for your pages that are one or two links away from the home page or get direct links from other sites, getting them in the sitemap may be of middling value, because the spider will find them easily and regularly anyway. But for the pages that live deeper in your directory structure, a sitemap may be a great tool for getting them a little daylight.

For me, that was a lot of relief. The scattered static pages, despite being harder to round up and put in a sitemap, are generally easier for Google to find. The pages that are harder for Google to find are generally easier to generate a sitemap file for. So, if I want to generate a quick and dirty sitemap with just the more buried content, I can do it without harming the SEO on the content I left out.

Nice.
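For a sense of what "quick and dirty" can look like, here's a minimal sketch that turns a list of database-driven page URLs into a sitemap file. It assumes the sitemaps.org 0.9 XML format; the example.com URLs, the record IDs, and the function names are all made up for illustration and are not from my site.

```javascript
// Escape characters that are not allowed raw inside XML text.
function escapeXml(text) {
  return text
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&apos;");
}

// Build a bare-bones sitemap: one <url><loc> entry per page.
function buildSitemap(urls) {
  const entries = urls.map((u) => `  <url><loc>${escapeXml(u)}</loc></url>`);
  return [
    '<?xml version="1.0" encoding="UTF-8"?>',
    '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">',
    ...entries,
    "</urlset>",
  ].join("\n");
}

// Hypothetical record IDs, standing in for rows pulled from a database.
const ids = [101, 102, 103];
const urls = ids.map((id) => `https://example.com/clipart/view?id=${id}&size=large`);

console.log(buildSitemap(urls));
```

Only the buried, database-generated pages go in. The static pages near the home page are left for the spider to find on its own.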

I've been reading different articles about what elements of a page Google indexes with an eye toward whether they index content that's added to the page via the JavaScript document.write() method. Not getting a conclusive answer, I decided to do my own test.

Why was I interested? Well, with all the "Web 2.0" technologies that rely on JavaScript (in the form of AJAX) to populate a page with content, it's important to know how it's treated to determine if the content is searchable. If it's not searchable, then it's not having an impact on search-driven traffic.

The test page had three pairs of nonsense words that, at the time of its creation, generated no hits in a Google search. One pair was placed in the page via straight HTML. One pair was placed in the page via a JavaScript that was part of the document. One pair was placed in the page via a JavaScript on a different server that was sourced from within the page (<script type="text/javascript" language="javascript" src="URL to script on other server">).
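The layout described above can be sketched roughly as follows. This is a reconstruction under assumptions, not the actual test page: the words and the remote script URL are placeholders, and buildTestPage is a made-up helper name.

```javascript
// Assemble a test page with three pairs of words, each delivered a different way.
// All words and the remote URL are placeholders, not the ones used in the real test.
function buildTestPage(htmlWords, inlineWords, remoteScriptUrl) {
  return [
    "<html><body>",
    // Pair 1: plain HTML text.
    `<p>${htmlWords.join(" ")}</p>`,
    // Pair 2: written into the page by a script that is part of the document.
    `<script type="text/javascript">document.write("${inlineWords.join(" ")}");</script>`,
    // Pair 3: written by a script hosted elsewhere, which would contain its own
    // document.write() call for the last two words.
    `<script type="text/javascript" src="${remoteScriptUrl}"></script>`,
    "</body></html>",
  ].join("\n");
}

const testPage = buildTestPage(
  ["htmlwordone", "htmlwordtwo"],
  ["inlinewordone", "inlinewordtwo"],
  "https://other-server.example/words.js"
);
console.log(testPage);
```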

The page was linked from a sitewide footer to ensure that Google found it, and was posted and linked on the evening of March 7th. Google alerts were set up for one word from each pair so Google would notify me by e-mail when it spotted a page containing those words.

An alert came in late on the evening of March 10th for "zonkdogfology", one of the words in the first pair (part of the straight HTML). By the time I got online in the early afternoon of March 11th, it was part of the Google index and a search for it turned up the page as the sole result.

I then searched for each of the six words at Google.

  • The two HTML words both generated a search result that included the page.
  • The two words inserted by a JavaScript in the page generated no search results.
  • The two words inserted by a remotely sourced JavaScript generated no search results.

Now, it's too early to say conclusively that Google will never index the JavaScript-generated content, barring a change in their search/indexing algorithms. I'll continue to monitor the situation over the next two weeks to give Google time for any secondary processing and distribution to all their datacenters. It is worth noting, though, that at least in the immediate term, content that is made part of your pages via JavaScript document.write() statements will not be searchable in Google.

GOING FORWARD: Over the next two weeks, I'll be watching to see two things. First, does the indexing change so this page shows up in searches for the four JavaScripted words? And second, how long does it take for MSN and Yahoo to pick up the page and how do they treat it?

Stay tuned.

Addendum: People have been asking why you'd want to index dynamic JavaScripted content... Look at the dozens of comments on this article. They're all going to be indexed by Google because the inclusion is server-side. That's got some value. Comments in general don't just enhance the user experience, but add indexable content to your page and can organically increase your keyword density.

If you're using an AJAX-powered comment module, particularly one that's remotely hosted like JS-Kit, then it's important to know what you're getting and what you're losing. Yes, you may be adding functionality to your page easily and enhancing the user experience, but if you don't get the comments indexed, you lose all that juicy keywordy goodness.

Granted, I didn't do a heavily AJAXed test with nodes and other constructs. I decided to test the simplest construct: document.write(). I may do other tests in the future, but this was a good place to start. See, in both instances of the JavaScript-inserted words, they were included in the scripts as discrete strings. If Google merely indexed the page and made the script text part of the searchable index, the two words from the script that's hardcoded into the page would become searchable. If it read the remote script and indexed it in the same manner, we might see those last two words showing up either in the test page or get the remote script as a hit.
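To make the two possibilities concrete, here's a toy sketch of two indexing models run against a cut-down page. It is only an illustration of the reasoning: nobody outside Google knows how their indexer actually works, the regex-based tag stripping is deliberately crude, and the page, words, and function names are all placeholders.

```javascript
// A cut-down stand-in for the test page (placeholder words and URL).
const page = [
  "<p>htmlword</p>",
  '<script type="text/javascript">document.write("inlineword");</script>',
  '<script type="text/javascript" src="https://other-server.example/words.js"></script>',
].join("\n");

// Split text into lowercase alphanumeric tokens.
function tokenize(text) {
  return text.toLowerCase().split(/[^a-z0-9]+/).filter(Boolean);
}

// Model A: throw away script elements and tags, index only the visible text.
function visibleTextIndex(html) {
  const withoutScripts = html.replace(/<script\b[^>]*>[\s\S]*?<\/script>/gi, " ");
  const withoutTags = withoutScripts.replace(/<[^>]+>/g, " ");
  return tokenize(withoutTags);
}

// Model B: index the raw source, so strings inside inline scripts count too.
function rawSourceIndex(html) {
  return tokenize(html);
}

console.log(visibleTextIndex(page).includes("htmlword"));   // true
console.log(visibleTextIndex(page).includes("inlineword")); // false
console.log(rawSourceIndex(page).includes("inlineword"));   // true
```

Under Model B the inline script's word would have become searchable. In my test it didn't, which is what Model A predicts. Neither model ever fetches the remote script, so neither can tell us anything about the last pair.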
