Robots.txt file guide

By David Callan

Every search engine ranks pages using its own algorithm, and it's for this reason that some people like to optimize pages for each particular search engine. Usually these pages are only slightly different, but that slight difference can make all the difference when it comes to ranking high. However, because search engine spiders crawl through sites indexing every page they can find, they might come across your search-engine-specific optimized pages and notice that they're very similar. The spiders may then think you're spamming and do one of two things: ban your site altogether or severely punish you in the form of lower rankings.

So what can you do to, say, stop Google indexing pages that are meant for Altavista? The solution is really quite simple, and I'm surprised that more webmasters who optimize for each search engine don't use it. It's done using a robots.txt file which resides on your webspace.

A robots.txt file is a vital part of any webmaster's battle against getting banned or punished by the search engines if he or she designs different pages for different search engines. The robots.txt file is just a simple text file, as the file extension suggests. It's created using a simple text editor like Notepad or Wordpad; complicated word processors such as Microsoft Word will only corrupt the file.

Here's the code you need to insert into the file. The User-Agent: and Disallow: parts are compulsory and never change, while the parts in brackets you'll have to change to suit the spider and the file which you want it to avoid:
User-Agent: (Spider Name)
Disallow: (File Name)

The User-Agent is the name of the search engine's spider and Disallow is the name of the file that you don't want that spider to crawl. I'm not entirely sure whether the code is case sensitive or not, but I do know that the code above works, so just to be sure check that the U and A are in caps, and likewise the D in Disallow. You have to start a new batch of code for each engine, but if you want to list multiple disallowed files you can place them one under another. For example, here's a batch for Slurp (Inktomi's spider) -

User-Agent: Slurp
Disallow: /internet-marketing-gg.html
Disallow: /advertising-secrets-gg.html
Disallow: /internet-marketing-al.html
Disallow: /advertising-secrets-al.html

In the above code I have disallowed Inktomi from spidering two pages optimized for Google (internet-marketing-gg.html & advertising-secrets-gg.html) and two pages optimized for Altavista (internet-marketing-al.html & advertising-secrets-al.html). If Inktomi were allowed to spider these pages as well as the pages specifically made for it, I'd run the risk of being banned or penalized, so it's always a good idea to use a robots.txt file.
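Since each engine gets its own batch, a complete robots.txt for the scheme above might look something like this sketch. Googlebot (Google's spider) and Scooter (Altavista's spider) are real User-Agent names, and the idea is to keep each engine away from the pages optimized for the others:

User-Agent: Googlebot
Disallow: /internet-marketing-al.html
Disallow: /advertising-secrets-al.html

User-Agent: Scooter
Disallow: /internet-marketing-gg.html
Disallow: /advertising-secrets-gg.html

User-Agent: Slurp
Disallow: /internet-marketing-gg.html
Disallow: /advertising-secrets-gg.html
Disallow: /internet-marketing-al.html
Disallow: /advertising-secrets-al.html

A blank line separates one batch from the next, and each batch applies only to the spider named in its User-Agent line.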
I mentioned earlier that the robots.txt file resides on your webspace, but where on your webspace? The root directory, that's where - the file should be reachable at www.yoursite.com/robots.txt. If you upload it to a sub-directory it won't work. If you want to block certain engines from certain files that do not reside in your root directory, you simply point to the right directory and then list the file as normal. For example (the directory name here is just for illustration) -

User-Agent: Slurp
Disallow: /articles/internet-marketing-gg.html
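As a side note, you don't have to list such files one by one. Ending the Disallow value at the directory keeps the spider out of everything inside it - a sketch, again assuming the illustrative /articles/ directory:

User-Agent: Slurp
Disallow: /articles/

This is handy if, say, you keep all the pages meant for one engine together in a single folder.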
Here are the spider names of a few of the big engines; do a search for 'search engine user agent names' on Google to find more.

Google - Googlebot
Altavista - Scooter
Inktomi - Slurp
Excite - ArchitextSpider

Be sure to check over the file before uploading it, as you may have made a simple mistake, which could mean your pages being indexed by engines you don't want indexing them or, even worse, none of your pages being indexed at all.
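That last disaster is easier to cause than you might think: a Disallow line with no value after it means 'disallow nothing', while a lone forward slash means 'disallow everything'. A quick sketch of the difference, using Googlebot as the example spider:

User-Agent: Googlebot
Disallow:

The empty Disallow above lets Googlebot spider every page on the site. Change it to the line below, however, and Googlebot is locked out of the whole site:

User-Agent: Googlebot
Disallow: /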
A little note before I go: I have listed the User-Agent names of a few of the big search engines, but in reality it's not worth creating different pages for more than six or seven search engines. It's very time consuming, and the results would be similar to those you'd get by creating different pages for only the top five. More is not always best.