Does Google Index Dynamic JavaScripted Content?
Mar 11th, 2007 by Greg Bulmash
I've been reading different articles about what elements of a page Google indexes with an eye toward whether they index content that's added to the page via the JavaScript document.write() method. Not getting a conclusive answer, I decided to do my own test.
Why was I interested? Well, with all the "Web 2.0" technologies that rely on JavaScript (in the form of AJAX) to populate a page with content, it's important to know how it's treated to determine if the content is searchable. If it's not searchable, then it's not having an impact on search-driven traffic.
The test page had three pairs of nonsense words that, at the time of its creation, generated no hits in a Google search. Two were placed in the page via straight HTML. Two were placed in the page via a JavaScript that was part of the document. Two were placed in the page via a JavaScript on a different server that was sourced from within the page (<script type="text/javascript" language="javascript" src="URL to script on other server">).
The page was linked from a sitewide footer to ensure that Google found it, and was posted and linked on the evening of March 7th. Google alerts were set up for one word from each pair so Google would notify me by e-mail when it spotted a page containing those words.
An alert came in in the late evening of March 10th for "zonkdogfology", one of the words in the first pair (part of the straight HTML). By the time I got online in the early afternoon of March 11th, it was part of the Google index and a search for it turned up the page as the sole result.
I then searched for each of the six words at Google.
- The two HTML words both generated a search result that included the page.
- The two words inserted by a JavaScript in the page generated no search results.
- The two words inserted by a remotely sourced JavaScript generated no search results.
Now, it's too early to say conclusively that Google will never index the JavaScript-generated content, barring a change in their search/indexing algorithms. I'll continue to monitor the situation over the next two weeks to give Google time for any secondary processing and distribution to all their datacenters. It is worth noting though, that at least in the immediate term, content in your pages that is made part of the page via JavaScript document.write statements will not be searchable in Google.
GOING FORWARD: Over the next two weeks, I'll be watching to see two things. First, does the indexing change so this page shows up in searches for the four JavaScripted words? And second, how long does it take for MSN and Yahoo to pick up the page and how do they treat it?
Stay tuned.
Addendum: People have been asking why you'd want to index dynamic JavaScripted content... Look at the dozens of comments on this article. They're all going to be indexed by Google because the inclusion is server-side. That's got some value. Comments in general don't just enhance the user experience, but add indexable content to your page and can organically increase your keyword density.
If you're using an AJAX powered comment module, particularly one that's remotely hosted like JS-Kit, then it's important to know what you're getting and what you're losing. Yes, you may be adding functionality to your page easily and enhancing the user experience, but if you don't get the comments indexed, you lose all that juicy keywordy goodness.
Granted, I didn't do a heavily AJAXed test with nodes and other constructs. I decided to test the simplest construct... document.write(). I may do other tests in the future, but this was a good place to start. See, in both instances of the JavaScript-inserted words, they were included in the scripts as discrete strings. If Google merely indexed the page and made the script text part of the searchable index, the two words from the script that's hardcoded into the page would become searchable. If it read the remote script and indexed it in the same manner, we might see those last two words showing up either in the test page or with the remote script itself as a hit.
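To make that distinction concrete, here is a rough sketch of two of the ways an indexer could treat the test page: static markup only, or script source folded into the page text. It is an illustration only and says nothing about how Googlebot is actually built. The function names and the sample word `scriptedword` are hypothetical, and the regex-based tag stripping is a deliberate simplification.

```javascript
// Hypothetical test page: one word in plain HTML, one written by an inline script.
const page = [
  "<html><body>",
  "<p>zonkdogfology</p>",
  "<script>document.write('scriptedword');</script>",
  "</body></html>",
].join("\n");

// Indexer A: drops <script> elements entirely, then strips the remaining tags.
function indexStaticText(html) {
  return html
    .replace(/<script\b[^>]*>[\s\S]*?<\/script>/gi, " ")
    .replace(/<[^>]+>/g, " ")
    .split(/\s+/)
    .filter(Boolean);
}

// Indexer B: strips tags but keeps the script source as if it were page text.
function indexWithScriptSource(html) {
  return html
    .replace(/<[^>]+>/g, " ")
    .split(/[^A-Za-z0-9]+/)
    .filter(Boolean);
}

const staticWords = indexStaticText(page);
const sourceWords = indexWithScriptSource(page);

console.log(staticWords.includes("scriptedword")); // false
console.log(sourceWords.includes("scriptedword")); // true
```

The results reported above match the behaviour of Indexer A: the HTML words were found, and the script strings were not.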
[...] Original post by Greg Bulmash [...]
thank you for that, that's an interesting piece of research.
I've been dabbling with search engine technology myself, and this was one of the major stumpers, in order to really know what a javascript does you have to *run* the bugger, there really is no other way.
DHTML/AJAX are quickly becoming so prevalent that it will not be long before google will have to do something about this or they risk losing a very important part of the web as 'dark'.
thanks again, & best regards
Jacques Mattheij
ps: picked you up through /.'s firehose...
I think the words will show up after they have been added to the crawling pile. Sometimes it will take another 2 weeks before those links are visited.
Ed said: "think the words will show up after they have been added to the crawling pile. Sometimes it will take another 2 weeks before those links are visited."
I thought that might be the case, which is why I'm monitoring this for at least two more weeks. I also want to see how Yahoo and MSN Live treat this. Though they're not quite as big as Google when it comes to referring traffic, it's worth watching.
- Greg
"Going forward"...do you mean "in the future"? That aside, good bit of experimental work.
OK it was good of you to share these findings. In my experience, straight HTML with uncomplicated URLs is always the best indexed by any search engine.
This is the classic Halting problem outlined by Turing -- Google can't really index *everything* that's dynamically generated without running it -- and that could be dangerous. I'm sure some hacker would quickly code up an ECMAscript DoS attack on Google, waiting for their spiders to fall into the trap.
neat experiment. but, what sort of content needs to be inserted into your web pages using javascript, such that this content *should* be indexed?
i think it's fair to say that DHTML web applications shouldn't be indexed like documents. they have a potentially infinite number of states and the information on these states isn't really aligned with the semantics of searching the web, which is geared towards finding documents.
there's also an issue of fairness. if you're sourcing content from another site, then doesn't that content belong to *that* site, and thus should only contribute to that site's placement in the index?
finally, if both sites belong to you, then wouldn't you rather have the content incorporated on the server side for the sake of efficiency anyway?
interesting experiment, though most people don't use document.write these days, so I'm not sure what this shows. most dynamic sites tend to use innerHTML or dom methods to create nodes and content....either way, the crawler would basically have to be a full web client to support this kind of thing in any reasonable way...
I don't know if this is a big problem at all, or if it is just a quirk of the way Google's algorithms work. If I recall correctly, the philosophy behind Google's search is to depend less on Google's own interpretation of the content of the website, and more on what other sites say about your website.
zonkdogfology
I think you just undid your googlewhack.
Very interesting study. The results are no wonder, but a practical proof and study was in need.
Thanks
[...] Many new sites are adopting new technologies that make it easier and faster for the end user to browse their site. Gmail is the best example of a useful application of AJAX technology. You don’t need to refresh the whole page to search, browse, open or send emails. It is so good that it became even better than using Outlook. However, care should be taken when you apply those kinds of technologies. They are good for the user who is already on your site, and they are useful for specific kinds of applications, but what about people looking for your site? How will they find it? Assume you have a rich news site, and it’s fast and easy to use. All content is generated through AJAX. Users find it easy to pick what they want. But even though you may have the best piece of news, people searching Google will not find results from your site because they are all dynamically generated using AJAX, and web spiders usually aren’t smart enough to index AJAX-generated content. Someone did a practical test on Googlebot and how it indexes normal content, JavaScript-generated content, and AJAX content as well. See the results here. Thanks to Slashdot for posting the URL. [...]
@sam:
so long as neither of the other 2 pairs are copied to some html-coded site, the test environment will not be compromised.
unfortunately, searching for 'zonkdogfology' leads us to the site in question, and therefore the other 2 pairs.
best to set up a clean environment to test and report in a few weeks. perhaps even a rolling system w/ new words each day, then we'll know when things change.
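A rolling system like that needs a fresh nonsense word per day. The sketch below derives one deterministically from a date string, so the same day always yields the same word. The syllable list, the hashing scheme and the name `nonsenseWordFor` are arbitrary choices for illustration, and each generated word would still need a manual search to confirm it returns no hits before being used.

```javascript
// Arbitrary syllables used to build pronounceable nonsense words.
const SYLLABLES = ["zon", "dog", "fo", "kri", "bel", "mup", "tav", "shi", "quo", "nex"];

// Deterministically derive a nonsense word from a date string such as "2007-03-12".
function nonsenseWordFor(dateString, syllableCount = 5) {
  // Simple 32-bit string hash (FNV-1a style) used as a seed.
  let seed = 2166136261;
  for (const ch of dateString) {
    seed ^= ch.charCodeAt(0);
    seed = Math.imul(seed, 16777619) >>> 0;
  }
  let word = "";
  for (let i = 0; i < syllableCount; i++) {
    // Linear congruential step to move to the next pseudo-random value.
    seed = (Math.imul(seed, 1664525) + 1013904223) >>> 0;
    // Use the higher bits, which are better mixed than the low ones.
    word += SYLLABLES[(seed >>> 16) % SYLLABLES.length];
  }
  return word;
}

console.log(nonsenseWordFor("2007-03-12"));
```

Because the word depends only on the date, anyone rerunning the test later can regenerate the word for a given day and check when it first became searchable.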
No, Google (googlebot) does not index javascript in any way, nor css stylesheets (display:none?). Nor do other popular search engines. End of story.
If Google were to decide to create a javascript parser bot (damn complex when you think of all the libraries such as jquery/prototype) they would be supporting the so-called web developers who are doing it 'wrong' in the first place. Javascript must be implemented unobtrusively, which means the site is fully accessible to people without javascript-enabled browsers (Google, Braille readers for the blind, and about 4% of netizens). In 99% of cases this is implemented simply by attaching your javascript to the page's load event (window.onload) and using it for after-effects (ajax requests must always be backed up with a functional form post URL also).
None of the major Web search indexers run JavaScript embedded in your page. They probably don't even download the JavaScript. So, as you discovered, text generated by JavaScript won't be indexed by Google, Altavista, MSN, etc. It also usually won't be read by people using a text reader, e.g. blind users, which can be a difficulty for Web developers required to make pages that are accessible, e.g. for the US 508 legislation. Also be aware that some corporate (or educational) firewalls block JavaScript files, and/or disallow execution of JavaScript for the users in other ways.
Liam
An easy way to understand what a spider sees is to turn off javascript. If your site is heavily dependent on javascript to generate content, you'll see big holes in your pages where content used to be. What's left is what gets added to the Google index.
Google's inability to crawl javascript extends beyond doc.written content. For instance, using javascript to link to another page via window.open presents its own problems. Google will not follow this link and the linked page will not be added to the index. You run the risk of hiding entire sections of your site because of javascript linking.
There are less than ideal workarounds to this—like publishing a site map containing every destination. Sometimes you don't have control of the resulting HTML and a site map is a cheap fix. It won't help your PageRanks, but registering the site map with Google will give it a cue to discover pages it otherwise wouldn't find on its own.
I'm not against javascript linking per se but I would recommend augmenting your onclick handler with a conventional "blue link" to the page in question.
Change:
<a href="#" onclick="window.open('page.html'); return false;">
OR
<a href="javascript:void(window.open('page.html'))">
To:
<a href="page.html" onclick="window.open('page.html'); return false;">
Google will find page.html and you still get to keep your onclick javascript handler.
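One way to check a page for this problem is to list the links a crawler without JavaScript could follow. The sketch below is a simplified, assumed model of such a crawler: it reads only `href` values and ignores `#` and `javascript:` targets. The function name `crawlableLinks` is hypothetical, and parsing HTML with a regular expression is a shortcut that only suits small hand-written samples.

```javascript
// Collect href values from anchor tags, keeping only those that point at a
// real URL rather than "#" or a javascript: pseudo-URL.
function crawlableLinks(html) {
  const links = [];
  const anchor = /<a\b[^>]*?\bhref\s*=\s*"([^"]*)"/gi;
  let match;
  while ((match = anchor.exec(html)) !== null) {
    const href = match[1].trim();
    if (href === "" || href.startsWith("#")) continue;
    if (/^javascript:/i.test(href)) continue;
    links.push(href);
  }
  return links;
}

const hashLink = `<a href="#" onclick="window.open('page.html'); return false;">Open</a>`;
const scriptLink = `<a href="javascript:void(window.open('page.html'))">Open</a>`;
const plainLink = `<a href="page.html" onclick="window.open('page.html'); return false;">Open</a>`;

console.log(crawlableLinks(hashLink));   // []
console.log(crawlableLinks(scriptLink)); // []
console.log(crawlableLinks(plainLink));  // [ 'page.html' ]
```

Only the third form exposes page.html in the markup itself, which is the point of keeping a conventional href alongside the onclick handler.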
How exactly did you set up the Google alert to get informed when the Googlebot visited?
I used a jha menu for some time on my website. At first I ran into some indexing problems, and my page slowly dropped in the Google rankings.
I then tried a Flash menu controlling the main page.
Here are the sample menus available: http://www.nicolamarini.it/pagine/news.htm
One fix for the ranking was to add links to the "hidden" content on the page and then hide those links with CSS, but the problem persists when you have a dynamic site with JavaScript-generated links.
So (for now) I prefer not to use JavaScripted content to create critical links to other pages within the same site.
Not surprising but like Bashar said, the proof is useful.
Actually I am not sure if it makes sense for google to index the content generated using document.write... Let's consider this:
document.write(navigator.userAgent);
What should google index in this case? Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; InfoPath.1) for IE6 or maybe Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1.1) Gecko/20061204 Firefox/2.0.0.1 for Firefox 2?
One more reason why AJAX is a hack...
Do server side XSLT and some decent DHTML where it makes sense, and life will be a lot better.
This simply proves that google has indexed the strings in the HTML (JS) output that are on the page.
Now, if you echoed the letters separately onto the screen
document.write('d');
document.write('o');
document.write('g');
document.write('f');
document.write('u');
document.write('d');
document.write('g');
document.write('e');
and google indexed dogfudge, THEN I would believe that google is executing the javascript, instead of indexing it unparsed
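The difference between the two behaviours can be shown without a browser. In this sketch the script is run against a stand-in `document` object that only records what is written; `runWithFakeDocument` and the stand-in object are hypothetical names, and `new Function` is used purely for demonstration on trusted input, since it is not a sandbox.

```javascript
// Script body that writes the word one letter at a time.
const scriptSource = "dogfudge"
  .split("")
  .map((letter) => `document.write('${letter}');`)
  .join("\n");

// "Unparsed" view: search the raw script text for the whole word.
const foundInSource = scriptSource.includes("dogfudge");

// "Executed" view: run the script against a stand-in document object
// that records everything passed to write().
function runWithFakeDocument(source) {
  const written = [];
  const fakeDocument = { write: (text) => written.push(String(text)) };
  new Function("document", source)(fakeDocument);
  return written.join("");
}

const executedOutput = runWithFakeDocument(scriptSource);

console.log(foundInSource);  // false
console.log(executedOutput); // dogfudge
```

The word exists only in the executed output, so a hit for it would indicate that the script had actually been run.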
The basic rule still applies, then: look at it in a text browser (lynx) and see what you get; that is what Google sees. Interestingly, it is also a fair approximation of what a text reader will say. Keep all your core content standards compliant and you can't go far wrong with Google.
If you've got the kind of dynamic content that you want to be searchable then it's probably also the kind of content that you want to be bookmarkable.
Surely then the goal would be to make sure you implement a unique URL for each document state. This way you get the triple whammy:
1. Search engine visibility
2. Back button support
3. Bookmarkability and permalinks
Really Simple History seemed to do the trick: http://codinginparadise.org/projects/dhtml_history/README.html
Safari needs to be fixed to work with it, but as far as I know Safari doesn't support any dynamic back button solutions.
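The core of the unique-URL idea is a reversible mapping between application state and something that fits in a URL. The sketch below is not Really Simple History's API; `stateToHash` and `hashToState` are hypothetical helpers built on the standard `URLSearchParams` class. Note the assumption involved: a fragment by itself only helps with the back button and bookmarks, and for search visibility the same state would presumably also need a URL the server can render.

```javascript
// Serialize a small, flat state object into a URL fragment.
// Keys are sorted so the same state always produces the same fragment.
function stateToHash(state) {
  const params = new URLSearchParams();
  for (const key of Object.keys(state).sort()) {
    params.set(key, String(state[key]));
  }
  return "#" + params.toString();
}

// Parse a fragment back into a state object (all values come back as strings).
function hashToState(hash) {
  const params = new URLSearchParams(hash.replace(/^#/, ""));
  return Object.fromEntries(params.entries());
}

const hash = stateToHash({ view: "article", id: 42 });
console.log(hash);              // #id=42&view=article
console.log(hashToState(hash)); // { id: '42', view: 'article' }
```

In a browser the result would be assigned to `location.hash` when the state changes and read back on page load.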
Until someone runs the javascript, new content is not an official part of the page, so it won't be indexed... I don't think google will start running javascript since it is a security problem (and usually requires user input).
Btw if you want to add something dynamically to your page (with javascript) and want it indexed, try generating html (with php or something like it) which will have a javascript data structure in it and just reference it from javascript code... This way the data will be part of the html during indexing (but you lose real dynamics)
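A minimal sketch of that server-side approach follows, written in JavaScript rather than PHP. It goes one step beyond the suggestion: since the test in the article found that strings inside an inline script were not indexed, it also emits the same text as ordinary markup. That addition, along with the names `renderPage`, `escapeHtml` and `commentData`, is an assumption made for illustration.

```javascript
// Escape text for safe inclusion in HTML element content.
function escapeHtml(text) {
  return String(text)
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

// Build the page on the server: the comments appear once as ordinary markup
// and once as a data structure that client-side code can reference.
function renderPage(comments) {
  const items = comments.map((c) => `<li>${escapeHtml(c)}</li>`).join("\n");
  // Escape "<" so a comment containing "</script>" cannot end the script element.
  const json = JSON.stringify(comments).replace(/</g, "\\u003c");
  return [
    '<ul id="comments">',
    items,
    "</ul>",
    `<script>var commentData = ${json};</script>`,
  ].join("\n");
}

console.log(renderPage(["First comment", "Second <b>comment</b>"]));
```

Client-side code can then read `commentData` to add sorting or filtering without fetching anything, while the list markup remains readable with JavaScript turned off.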
w3c is phasing out document.write and document.writeln, you should be using xml nodes.
I think using innerhtml maybe frowned upon
It is also worth noting that the Google Cache of the page does not show either of the javascript interjections in its "text only" mode.
Looking at how other cached pages appear in text-only mode leads me to believe that the text-only cache is the page in the form Google digests it. If you look at the page source, you see that full comments from the original page are preserved, but many tags have been stripped out. Some of the preserved comments contain javascript code, as if they were placed there to protect non-javascript aware browsers from the code, but the script tags which would surround it have been stripped.
Cool.
That's a cool idea, thanks.
Here's a thought though: if in two weeks you do start seeing the javascript words on that page indexed, you won't know if it's from spidering that specific page and seeing the word there or from someone else's link to your now-public test page if they use the specific nonsense word in their link to you.
Hm,
that was to be expected, don't you think so? To find the words that are emitted by JavaScript the Google-Bot needs to behave like a browser, that means the bot needs to download the HTML *and* the JavaScript *and* to execute the JavaScript. And now the mess starts: is your script a clutter of "if (browser.name() ... browser.version())"? If so which browser do you expect the Google-Bot to emulate?
Regards,
Angelo
Good work, Greg. It's one of those things that I've sometimes wondered about, but never bothered to check
Russtopia said:
"This is the classic Halting problem outlined by Turing — Google can’t really index *everything* that’s dynamically generated without running it — and that could be dangerous. I’m sure some hacker would quickly code up an ECMAscript DoS attack on Google, waiting for their spiders to fall into the trap. "
Shoot me if I'm being foolish, but I don't think this should be a problem. I think Google should write a simple javascript parser which searches for javascript strings, and adds them to its index.
I don't think it really matters when/how/why the strings gets document.write()-ed. Of course, small words and phrases which aren't part of the real content (eg: if (something_bad) { alert("Don't do that"); } ) would get indexed as well, but that probably won't affect results too much. After all, a lot of static pages have misc. text which has no bearing on the pages content.
If Google was really feeling energetic, they could make a simple javascript parser which creates a flow-tree (perhaps limiting the max depth of the tree to prevent infinite loops and abuse). They could use the tree to find out which strings actually get printed to the page, and which are used internally by javascript (eg: document.getElementById("this string is never printed") ).
Anyway, that's just my $0.02.
Cheers, Colin
Russtopia; one compromise that won't expose Google to that sort of risk is to do a very simple parse/pseudo-execution of the Javascript. This would take into account all content generated by document.writes with static content, and by those with trivial initialised-string content (e.g. var a = "message", later on document.write("Hello, I wanted to say " + a)), etc.
How far this can go is up to Google; obviously it won't cover everything.
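As a rough idea of what such pseudo-execution could look like, the sketch below resolves `document.write()` calls whose arguments are double-quoted string literals or variables initialised with one, joined by `+`. Everything here is an assumption for illustration: `staticallyWrittenText` is a hypothetical name, the regular expressions ignore single quotes, escapes, and strings containing `+` or parentheses, and any call it cannot resolve is simply skipped.

```javascript
// Resolve document.write() calls whose arguments are string literals or
// variables initialised with a string literal. Anything else is skipped.
function staticallyWrittenText(source) {
  const constants = new Map();
  const declaration = /\bvar\s+([A-Za-z_$][\w$]*)\s*=\s*"([^"\\]*)"\s*;/g;
  let m;
  while ((m = declaration.exec(source)) !== null) {
    constants.set(m[1], m[2]);
  }

  const output = [];
  const call = /document\.write\(([^()]*)\)/g;
  while ((m = call.exec(source)) !== null) {
    const parts = m[1].split("+").map((part) => part.trim());
    let text = "";
    let resolved = true;
    for (const part of parts) {
      const literal = /^"([^"\\]*)"$/.exec(part);
      if (literal) {
        text += literal[1];
      } else if (constants.has(part)) {
        text += constants.get(part);
      } else {
        resolved = false; // not a trivial case; give up on this call
        break;
      }
    }
    if (resolved) output.push(text);
  }
  return output;
}

const sample = `
var message = "zonkdogfology";
document.write("Hello, I wanted to say " + message);
document.write(navigator.userAgent);
`;

console.log(staticallyWrittenText(sample));
// [ 'Hello, I wanted to say zonkdogfology' ]
```

The `navigator.userAgent` call is left unresolved, which is the intended behaviour: nothing is executed, so there is nothing for a hostile script to exploit.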
Very interesting observation indeed! In the era of web 2.0 ignoring javascript will be a crime!
Hey,
I think you'll find the script source will get indexed but I don't think it will get indexed with the page that's calling it. I have been playing about with a counter site. If you do a search for little counter in google you'll see that http://www.littlecounter.com/client/counter.php?clientID=9264 is indexed, and the only reference to it is a javascript source url.
Great work btw I hope to see the rest of the results soon.
Great work. This was a great experiment to implement.
I agree with you that over time, or in the near future, Google will start inspecting the content of all tags.
Keep up the good work.
Saeed
I tried hooking up adsense (with a new account) to an AJAX-laden site at one point. I did a trick to "refresh" the Google script when the content was updated each time. At first, the ads were appropriate to whatever dynamic content was present. After a couple of days that stopped and I got generic ads. The explanation from Google was that while the crawling index was being built, it was working off of whatever it happened to find when the script executed, but gradually switched over to their crawled index. Funny way of doing things. Clearly Google CAN handle dynamic content since it just scans the DOM, but they choose not to. In fact, it was politely pointed out that what I was attempting was a violation of their ToS.
Nice and useful. Can you do one for text that's hidden or in a div with display:none? I don't show all the content on a page at one time because it takes too much room. I'm guessing it doesn't get indexed, but I'd like to know for sure.
Very interesting study! One idea I embrace when coding dynamic/AJAX pages is that of extending functionality that's already there. This means that instead of relying on javascript to generate the 'base page', I have that pre-built in html. Then I use javascript to remove elements and replace them with dynamic sections. This ensures not only that people who have javascript disabled can use the site, but also that Google and other search engines will be able to index its content.
One other item to mention: you can still use meta tags to keyword your page and provide a limited sort of access to the data that would otherwise be dynamically generated in it.
As of 6:46 AM on March 12, [zonkdogfology] returns zero results at search.yahoo.com, search.live.com, and ask.com.
AOL search finds the word, but their search is powered by Google.
Looks like Google's indexing is a good bit faster than the other guys.
Now that you have some publicity, you have to be careful that no one links to your test site using the JavaScript-created words -- otherwise the JavaScript-created words might turn up your test site, even if Google ignores all JavaScript.
Instead of spending time on an experiment such as this - why not just ask google?
This would require at minimum a JavaScript lexer so it can find all of the strings. This would not be too difficult and would be O(n). It would not be able to distinguish real words from something like "SWITCH" "FIRST" "SECOND" if you're using them as defined constants (which I like to do in JavaScript, because nobody cares if JS is a little inefficient). Though google might ignore all strings of one word and all caps.
The other problem involves javascript that constructs strings. Say "one " + "day, " + "I " + "woke " + "up.". You cannot expect google to try to do stuff like this. Maybe a few special cases (the example I gave was simple), but to try to execute code in all ways imaginable so you can find out if there are other strings that can be derived is an NP-hard problem. Google should not try to do this. They should say that if you want your javascript to be used for their search engine, then make your strings easy to spot.
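A single-pass lexer of the kind described could look roughly like the sketch below. It is a simplification under stated assumptions: it handles only single- and double-quoted strings, does not recognise comments, regex literals or template literals, and the names `stringLiterals` and `likelyContent` are hypothetical. The all-caps filter follows the heuristic suggested above.

```javascript
// Single pass over the source collecting the contents of quoted strings.
function stringLiterals(source) {
  const found = [];
  let i = 0;
  while (i < source.length) {
    const quote = source[i];
    if (quote === '"' || quote === "'") {
      let value = "";
      i++; // step past the opening quote
      while (i < source.length && source[i] !== quote) {
        if (source[i] === "\\" && i + 1 < source.length) {
          i++; // keep the escaped character, drop the backslash
        }
        value += source[i];
        i++;
      }
      found.push(value);
    }
    i++; // step past the closing quote or a non-quote character
  }
  return found;
}

// Drop one-word, all-caps strings, which are likely to be constants.
function likelyContent(strings) {
  return strings.filter((s) => !/^[A-Z]+$/.test(s));
}

const code = `var mode = "SWITCH"; document.write("one " + "day, " + 'I woke up.');`;
const strings = stringLiterals(code);

console.log(strings);                // [ 'SWITCH', 'one ', 'day, ', 'I woke up.' ]
console.log(likelyContent(strings)); // [ 'one ', 'day, ', 'I woke up.' ]
```

Each character is visited once, so the scan is linear in the length of the script. It recovers the fragments but not the order or combinations in which a script would assemble them.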
@Russtopia: Turing didn't just 'outline' the problem, he proved it insolvable.
And, more importantly, running JavaScript for output isn't necessarily the halting problem, viz. we don't really care whether the JavaScript halts, but rather want to see some of the content on the page. While it is impossible to wait and see 'all' output, since, as tormp points out, the number of states of a DHTML application may be practically infinite, or at least as large as your database, there's nothing to stop Google from running, say, 500 or 1000 evaluation steps on the JavaScript to see if there is any immediate output.
Since Google could easily control the number of evaluation steps taken in the Googlebot's JavaScript interpreter, I hardly think they're at risk for a DoS.
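Bounded execution can be sketched in Node.js with the built-in `vm` module. Two assumptions should be stated plainly: the sketch limits wall-clock time through the `timeout` option rather than counting evaluation steps, and `vm` is not a security boundary, so this shows only the "stop after a budget" idea, not safe handling of hostile scripts. The name `writtenWithinBudget` and the stand-in `document` object are hypothetical.

```javascript
const vm = require("vm");

// Run a script against a stand-in document and return whatever it wrote
// before finishing or exceeding the time budget.
function writtenWithinBudget(source, budgetMs = 50) {
  const written = [];
  const sandbox = { document: { write: (text) => written.push(String(text)) } };
  try {
    vm.runInNewContext(source, sandbox, { timeout: budgetMs });
  } catch (err) {
    // A timeout (or any script error) ends the run; keep the partial output.
  }
  return written.join("");
}

// Writes one word, then loops forever.
const runaway = "document.write('zonkdogfology'); while (true) {}";

console.log(writtenWithinBudget(runaway)); // zonkdogfology
```

The runaway script is cut off after the budget expires, and the text written before the loop is still captured.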
The logic/premise of this article is interesting but flawed to a tiny degree. Any properly coded Ajax/document.write page should be coded in a degrading manner so that if the client does not support javascript, the page will still be completely accessible.
If this is the case, when the http page is served to google it will serve the degraded ajax-less version of it, detecting that "googlebot" isn't a proper client.
And as "8. Greg" said, to capture all the javascript would require a full client on Google's end and it could no longer simply crawl the web.
What's more, is that when you get into JavaScript code, there could be a virtually endless potential of code to be dynamically written. There would be no way to capture and accurately display all of the code, especially if by executing 10 different js events, the page will be displaying 10 different sections of content. And what if on top of those 10 sections, each section has 2 modifiers? Now we're up to 30 inline client-based renders for one page... It's just too much.
If it's a developer's concern, they just need to write better code.
I knew it!
I wanted to do a similar test a long time ago.
Thanks! A useful test that I have been meaning to get around to doing for some time, but hadn't!
It comes as no great surprise that Google doesn't do much to 'test' Javascript for output (because it would be extremely dangerous to do this arbitrarily, imho), but it's still a worthwhile experiment.
More to the point, it highlights the issue for us developers, and gives us clear guidance on what to do if we absolutely have to remain 'visible'. KISS principle, really, innit?
Well done, and thanks again.
Most badges and tag clouds use the document.write method of inserting data into the page. I've had a site up since the beginning of the year and the document.write content has yet to be indexed.
hard to believe, but it's still the only page that devotes itself to the study of zonks, dogs and fos. googlebot seems to be pretty resilient to slashdotting.
You may be interested in a paper that Tim Berners-Lee and I wrote on choosing the right language to use for your Web content. It makes the point that putting your content into imperative languages like JavaScript is going to greatly reduce the chance that the information can be repurposed, whether as input to a search engine crawler, or for other good uses, when compared to putting the same content into a declarative language such as HTML.
See the W3C Technical Architecture Group (TAG) Finding titled: The Rule of Least Power at http://www.w3.org/2001/tag/doc/leastPower.html
Noah
Google should be in the business of finding what is there not imposing its will upon what it is trying to "search". I like Google as a tool, not as a master.
Greg is right, normal people don't use document.write. Actually, I believe it's essentially deprecated. Creating dynamic nodes is the most supported way to fill in ajax content, followed closely by innerHTML.
Finally, all content on your page should be fully readable with javascript disabled (without any extra code if you coded your server-side code and DHTML properly). Obviously functional javascript applications, like GoogleMaps, are exempted.
I don't have much sympathy for "dynamic" sites that aren't indexed properly because of poor implementation and js hacks.
[...] Probably something most of us figured but Greg Bulmash did some test to answer the question, Does Google index dynamic javascripted content. The test page had three pairs of nonsense words that, at the time of its creation, generated no hits in a Google search. Two were placed in the page via straight HTML. Two were placed in the page via a JavaScript that was part of the document. Two were placed in the page via a JavaScript on a different server that was sourced from within the page. [...]
[...] In the new age of Web 2.0 development, you often come across sites that are beautiful to look at and have some fairly complicated AJAX interfaces. Tools like Scriptaculous and JQuery almost make it too easy to incorporate this type of functionality into your site. However, today I read a post from Brain Handles where they experimented with dynamic JavaScript content, specifically adding text to the page via document.write commands, to see whether Google parses the content. The results are quite interesting from an SEO standpoint… [...]
Even if Google does eventually index those words, I think it is still interesting from a search engine optimization standpoint that those words are not indexed as quickly as the others, or even on the same time frame for that matter.
I seem to remember that the googlebot does not enable client-side javascript. If so, then content that is generated via javascript upon loading is then not seen by the googlebot at all.
[...] Here’s an interesting article discussing whether the googlebots are able to index content published to a page using javascript. [...]
Yea Javascript sucks developers should really cut back on using it
it has become some type of showmanship to produce CMS/blog software
that depends on CSS and DHTML
its really stupid and really ugly stuff
just take a look at Drupal's website
even their own main pages have positioning problems
its cute to have dropdown menus and pages that will expand
to fit a browsers width
but seriously when you have stacks of boxes on the page
with content in them and they are all jumbled up unless you
open your browser full screen
I mean why even take time to design that
lets all go back to HTML 1.0 and do away with everything
that has become good design standards
DHTML and Javascript should be restricted to situations
where there is no other way to do things
not only because of cross browser problems
but just because its good design
Just because you know how to fireup a Gui Editor
and put a hundred DHTML boxes on your page
doesnt mean you know how to design an attractive site
that has good function
[...] via brainhandles [...]
You will never see js output in Google's cache, because Google will never run the javascript on your pages. Here's why:
indexing speed/cost.... google can't really afford to run everyone's slow ass javascript...it would slow down indexing a LOT. Javascript is slow as hell to begin with due to the fact that it's interpreted...add the fact that most js code out there is crap and can't be trusted to not go into infinite loops, etc... and there's no way google can afford to execute your js.
security .... google doesn't want to run random javascript code on their servers...who knows what that code might try to do. Sure you can sandbox the javascript engine, but unknown exploits could make google very vulnerable.
Google's indexing is based on the text it really touches. When the indexing is done it crawls the available content, which happens to be the html text. As for text written by javascript, Google ignores it for the simple reason that it may not be a basic part of the crawled page, and if it is, then it may be some other stuff, maybe some other link .... which is not relevant to the search keyword.
George Lee's blog: How to profit from Google and Slashdot
I just saw a Slashdot story, Googlebot and Document.write: someone was curious whether Google would actually take javascript document...
As a proud owner of a zonkdog, i'm glad to see that i'm not alone in my love of study of zonkdogs.
Perhaps we could meet up and discuss the finer points of zonkdogfology sometime.
i have some interesting slides.
[...] An Omega-level geek ran the test. Here's the link. [...]
you should do a similar test but more about content on submit, so when someone clicks a link and the content is put in the page, as this is how most 'Web 2.0' sites work. See if google reads that content. + Does the google search crawler/spider/bot have javascript parsing ability?
[...] Brain Handles’ Greg Bulmash experiments with this long-asked question. [...]
This is why fallback on ajax matters, if the content is generated by server side anyway why not make a link to it like the traditional way and only use javascript to fetch the content when javascript is available.
Going back to the article, google will index content even on ajax based side if they provide fallback for it.
Fallback matters, not every device/browser supports javascript.
We use a really complex templating engine and actually have two versions of the website, one for the user (which is laced with web2.0 goodness) and another for spiders which, in lieu of ajax just has the content rendered out for indexing...
works a treat!!!
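The selection step in a two-version setup like this might be sketched as below. The token list and the function names are hypothetical, user-agent strings vary and can be spoofed, and the approach only makes sense if both versions carry the same content, since showing crawlers different content than users is generally treated as cloaking.

```javascript
// Hypothetical list of crawler tokens; real user-agent strings vary and change.
const CRAWLER_TOKENS = ["googlebot", "slurp", "msnbot"];

// Case-insensitive check for any known crawler token in the user-agent header.
function isLikelyCrawler(userAgent) {
  const ua = String(userAgent || "").toLowerCase();
  return CRAWLER_TOKENS.some((token) => ua.includes(token));
}

// Pick which version of the page to render for this request.
function chooseTemplate(userAgent) {
  return isLikelyCrawler(userAgent) ? "static" : "ajax";
}

console.log(chooseTemplate("Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)")); // static
console.log(chooseTemplate("Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1.1) Gecko/20061204 Firefox/2.0.0.1")); // ajax
```

Serving the static version to every client that lacks JavaScript, rather than to a list of named bots, would avoid depending on user-agent strings at all.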
just for the record,
I wouldn't ever expect the likes of google to index javascript content; it's up to the developers to ensure their applications are as compatible as possible and are indexed according to their standards. Expecting a company like google to "get with the times" is quite an ask really!
[...] No, there is a reason for this post. I stumbled across an article on slashdot, that asks Does Google Index Dynamic javascript content, posted by Greg from brainhandles.com? In this post, the author made a test page with a few different nonsense words on it which generated no hits on Google. Unfortunately I feel his test will now be somewhat invalidated mainly due to buggers such as myself putting his nonsense words into Google. [...]
[...] Link: http://www.brainhandles.com/…(via Slashdot) [...]
[...] Back in March, I did an experiment on whether Google indexes content inserted into your page with JavaScript. Weeks later, the results are conclusive… no. The words inserted into the test page via JavaScript were never indexed. Lots of pages talking about the odd words I used come up in searches for them, but the page I created doesn’t come up in the Google results. [...]
Google will probably not change how they treat JavaScripted content because they "have to" or because good content might be hidden there.
For years, they were not indexing frames - as if it was just technologically impossible. Javascripted content is dangerous for them to run, but also may contain a large amount of SEO spam.
[...] In the new age of Web 2.0 development, oftentimes you come across sites that are beautiful to look at and have some fairly complicated AJAX interfaces. Tools like Scriptaculous and jQuery almost make it too easy to incorporate this type of functionality into your site. However, today I read a post from Brain Handles where they experimented with dynamic JavaScript content, specifically adding text to the page via document.write commands, to see whether Google parses the content. The results are quite interesting from an SEO standpoint… [...]
There are absolutely instances where Google does indeed index javascript content. This is most obvious in backlinks. If a site has a URL string to your site in a script tag, Google WILL index it, and can actually count it as a link to your site.
I won't promote any sites here, but this can be easily verified with a few "link:" lookups using a few SEO websites as your target.
[...] I have found an article with an experiment on indexing of JavaScript-generated content (AJAX). As I already knew, Google does not index such content. [...]
That's really bad news.
As for dynamic content that needs indexing, a simple example is reducing page source size by generating tables from templates in script.
For instance, I have a JS script that is given a matrix of values, works out the rowspans and colspans automatically, and conditionally outputs styles and the table via document.write.
That saves some bandwidth and it's WAY better than specifying layout and structure directly in code.
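A rough sketch of that kind of table generator, under simplifying assumptions: it only merges equal neighbours within a row into a `colspan` (no rowspans or styles), and the names `rowToCells` and `buildTable` are made up here rather than taken from the commenter's script.

```javascript
// Merge runs of equal adjacent values in a row into one cell with a span count.
function rowToCells(row) {
  const cells = [];
  for (const value of row) {
    const last = cells[cells.length - 1];
    if (last && last.value === value) {
      last.span += 1; // same value as the previous cell: widen it
    } else {
      cells.push({ value, span: 1 });
    }
  }
  return cells;
}

// Turn a matrix of values into table markup, adding colspan only where needed.
function buildTable(matrix) {
  const rows = matrix.map((row) => {
    const tds = rowToCells(row).map((cell) =>
      cell.span > 1
        ? `<td colspan="${cell.span}">${cell.value}</td>`
        : `<td>${cell.value}</td>`
    );
    return `<tr>${tds.join("")}</tr>`;
  });
  return `<table>${rows.join("")}</table>`;
}

// In a page, the result would be emitted the way the comment describes:
//   document.write(buildTable([["Q1", "Q1", "Q2"], [10, 20, 30]]));
```

The catch raised by the test in the post is that everything inside such a table only exists after the script runs, so a crawler that does not execute JavaScript never sees the values.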
It's OK if search engines cannot execute script to display a link written by document.write(), but I was wondering whether they pick up links that appear inside the JS source code?
Like ($link="http://www.domain.com")
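One way a crawler could find such links without running any script is a plain text scan for quoted URLs. The sketch below shows that idea only; it is an assumption about how this might be done, not a description of what Googlebot actually does, and `extractUrls` is a hypothetical helper.

```javascript
// Pull quoted absolute http(s) URLs out of script source without executing it.
function extractUrls(scriptSource) {
  // A quote, then a URL with no quotes or whitespace in it, then a closing quote.
  const pattern = /["'](https?:\/\/[^"'\s]+)["']/g;
  const urls = [];
  let match;
  while ((match = pattern.exec(scriptSource)) !== null) {
    urls.push(match[1]);
  }
  return urls;
}
```

A scan like this only finds URLs written out literally; a URL assembled from pieces at run time would be missed unless the script is actually executed.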
I think it's fair to say that DHTML web applications shouldn't be indexed like documents. They have a potentially infinite number of states, and the information in these states isn't really aligned with the semantics of searching the web, which is geared towards finding documents.
This is the classic Halting problem outlined by Turing -- Google can't really index *everything* that's dynamically generated without running it -- and that could be dangerous. I'm sure some hacker would quickly code up an ECMAScript DoS attack on Google, waiting for their spiders to fall into the trap.
[...] a discussion on the issue of Java and Google, along with a [...]
Both of the words that have been generated by Javascript on the local server *are* being found by a Google search now. I just read this article now for the first time so thought I'd point it out. I wonder if the page just needs to be marked as 'good' before it's allowed or something similar?
@Dan,
I went and checked, and they are showing up now. Furthermore, for the one I checked, the pages linking to my test page did not use that word to link to it.
I'll need to do a second test to see if this is perhaps a change in Googlebot's default behavior or if you're correct and it's that Googlebot can/does do it, but the page has to pass some tests which may take a while.
I'm trying to use JavaScript with a Flash movie that pulls content from plain HTML files. Since the Flash resides on index.html, I need a way to change the non-Flash content so that Google sees and indexes it. I use a technique called SWFAddress (http://asual.com/swfaddress/ ), which provides deep linking for Flash using anchors, and was trying to make JavaScript look at the URL and anchor and then load content depending on the anchor (i.e., if the user, in this case Google, doesn't have Flash installed). I guess the only way to make this happen while staying Google-friendly would be to use a server-side language like PHP. Any ideas?
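For the anchor-based loading asked about above, here is a minimal client-side sketch. The `/content/<name>.html` layout, the `content` container id, and the `hashToContentPath` helper are assumptions for illustration. Note that browsers do not send the part of the URL after `#` to the server, so a server-side script such as PHP cannot read the anchor from the request on its own.

```javascript
// Map a URL fragment such as "#/about" to a plain HTML file to load.
// The "/content/<name>.html" layout is an assumption for illustration.
function hashToContentPath(hash) {
  const name = hash.replace(/^#\/?/, "").replace(/\/+$/, "");
  if (!/^[a-z0-9-]+$/i.test(name)) {
    return "/content/home.html"; // empty or unexpected fragment: default page
  }
  return `/content/${name}.html`;
}

// Browser wiring: load the matching file now and whenever the anchor changes.
if (typeof window !== "undefined" && typeof document !== "undefined") {
  const load = () =>
    fetch(hashToContentPath(window.location.hash))
      .then((res) => res.text())
      .then((html) => {
        document.getElementById("content").innerHTML = html;
      });
  window.addEventListener("hashchange", load);
  load();
}
```

Since this still depends on script running, linking to the plain HTML files directly with ordinary links is what gives a crawler something to index.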