Dragging Your Pages From Google’s Supplemental Index

Sometimes Google can be just too smart, and it’s a real pain. I found over 40 pages from my blog in the Supplemental Index, and I thought they were all unique content. Here’s how Google made me feel like a kid in detention writing out 50 times “I Will Make My Content Unique To Stop Myself Being Punished”. Sorry Sir!!

Now here’s the thing: the blog CONTENT is unique, but the PAGE content can be construed as duplicate content, hence the banishment to the SI. Google looks at the whole page, and this includes your template. So if your template has, say, 200 words and this is consistent throughout the site in headers, footers and sidebars, then a short blog post is going to flag as duplicate content.
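To put a rough number on that, here is a minimal Python sketch that works out what share of a page's words come from the shared template. The function names and the crude tag stripping are my own assumptions for illustration; this is not how Google measures duplication, only a quick way to see the template-to-post ratio.

```python
import re


def count_words(text: str) -> int:
    """Count whitespace-separated words after dropping HTML tags."""
    plain = re.sub(r"<[^>]+>", " ", text)
    return len(plain.split())


def template_share(template_text: str, post_text: str) -> float:
    """Fraction of the page's words that come from the shared template."""
    template_words = count_words(template_text)
    post_words = count_words(post_text)
    total = template_words + post_words
    return template_words / total if total else 0.0


if __name__ == "__main__":
    # Hypothetical page: a 200-word template wrapped around a 50-word post.
    template = "word " * 200
    post = "<p>" + "unique " * 50 + "</p>"
    print(f"Template share of page: {template_share(template, post):.0%}")
```

With a 200-word template and a 50-word post, most of the page is template text, which is the situation described above.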

The lesson, of course, is to write longer posts that match or exceed the word count of your template. And oh no, it doesn’t stop there: blogs are notorious for being flagged as having duplicate content. Even some of my RSS feeds have been supplementaled (is that a word?).

As far as WordPress blogs are concerned, use of the “more” tag and/or “the_excerpt” in the WP Loop is absolutely essential; otherwise the full post on your blog’s home page and the full post at its own address count as duplication.
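The idea behind the “more” tag can be sketched in a few lines of Python. This is only an illustration of the principle, not WordPress's own PHP code: it assumes the post body contains the `<!--more-->` marker that WordPress inserts, and the function name is hypothetical.

```python
MORE_TAG = "<!--more-->"


def home_page_teaser(post_html: str) -> str:
    """Return only the part of the post that precedes the more tag.

    If the post has no more tag, the whole post comes back, which is the
    duplicate case: the home page and the post page carry the same text.
    """
    teaser, _, _ = post_html.partition(MORE_TAG)
    return teaser.strip()


if __name__ == "__main__":
    post = "<p>Intro paragraph.</p><!--more--><p>Rest of the post.</p>"
    print(home_page_teaser(post))
```

The home page then carries only the teaser, while the full text lives at the post's own address.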

A Thank You: By the way, HUGE thanks go to Jeff over at Perishable Press (he is now one of the Good Guys) for taking the time to sort out a WP issue I was having using the “more” tag. That’s a story for another post. Thanks, Jeff.

And it doesn’t stop there.

You need a robots.txt file to keep Google away from what it perceives to be duplicated content, such as archives. The robots.txt file I am using came from JohnTP.com, and there is more information about it over there. Here is the file I am presently using; it goes in your root directory, by the way.

User-agent: *
# disallow files in /cgi-bin
Disallow: /cgi-bin/
Disallow: /comments/
Disallow: /z/j/
Disallow: /z/c/
# disallow all files ending in .php
Disallow: /*.php$
Disallow: /*.js$
Disallow: /*.inc$
Disallow: /*.css$
Disallow: /*.txt$
# disallow all files in /wp- directories
Disallow: /wp-*/
# disallow all files with ? in url
Disallow: /*?
# disallow any files that are stats related
Disallow: /stats*
Disallow: /about/legal-notice/
Disallow: /about/copyright-policy/
Disallow: /about/terms-and-conditions/
Disallow: /about/feed/
Disallow: /about/trackback/
Disallow: /contact/
Disallow: /tag
Disallow: /docs*
Disallow: /manual*
Disallow: /category/uncategorized*

# disable duggmirror
User-agent: duggmirror
Disallow: /

# allow google image bot to search all images
User-agent: Googlebot-Image
Allow: /*

# allow adsense bot on entire site
User-agent: Mediapartners-Google*
Allow: /*

Be sure to check your file with a robots.txt checking tool first to make sure it is working correctly, or you may end up blocking your whole site from the bots. The checking tool identifies exactly what is allowed and what is being blocked, so it’s really useful.
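If you want a quick sanity check before uploading, the wildcard rules can be approximated in Python. This is a minimal sketch under stated assumptions: it only handles Disallow patterns, treats `*` as "any run of characters" and a trailing `$` as "end of URL" in the way Google documents them, and ignores Allow lines and per-bot sections. The helper names and the sample paths are hypothetical, and it is no substitute for a proper checking tool.

```python
import re
from typing import Iterable


def pattern_to_regex(pattern: str) -> "re.Pattern[str]":
    """Translate a Google-style robots.txt path pattern into a regex.

    '*' matches any run of characters, a trailing '$' anchors the end of
    the path, and everything else is matched literally as a prefix.
    """
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    body = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.compile("^" + body + ("$" if anchored else ""))


def is_blocked(path: str, disallow_patterns: Iterable[str]) -> bool:
    """True if any non-empty Disallow pattern matches the path."""
    return any(
        pattern_to_regex(p).match(path) for p in disallow_patterns if p
    )


# A few of the Disallow rules from the file above.
RULES = ["/cgi-bin/", "/*.php$", "/wp-*/", "/*?", "/tag"]

if __name__ == "__main__":
    for path in ["/2007/05/my-post/", "/wp-admin/index.php",
                 "/index.php?p=12", "/tagging-tips/"]:
        status = "blocked" if is_blocked(path, RULES) else "allowed"
        print(f"{path}: {status}")
```

One thing a check like this brings out: rules are prefix matches, so `Disallow: /tag` also catches a hypothetical page such as `/tagging-tips/`, and a stray `Disallow: /` would block everything.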

Now we come to how I am dragging my pages from the Supplemental Index. On each post I am editing the content and adding enough unique words to reach a minimum of 500 words per post. This was when I started feeling like a schoolboy sitting in detention because my original essay wasn’t good enough. “Take 100 lines, boy, and make them unique,” bellowed the Google schoolmaster.

I am in the process of submitting each post to the Social Bookmarking and News sites, and I also intend to write enough articles that I can link to each post from the resource box.

Sorry sir, won’t do it again sir! Hope this post is unique, sir.

