"Truthful, Reliable And Researched Information
You Can Use With Total Confidence"

July 24, 2007

Dragging Your Pages From Google's Supplemental Index

Filed under: SEO Stuff — Ade Martin @ 4:31 am


Sometimes Google can be just too smart, and it’s a real pain. I found over 40 pages from my blog in the Supplemental Index, even though I thought they were all unique content. Here’s how Google made me feel like a kid in detention writing out 50 times “I Will Make My Content Unique To Stop Myself Being Punished”. Sorry Sir!!

Now here’s the thing: the blog CONTENT is unique, but the PAGE content can be construed as duplicate content, hence the banishment to the Supplemental Index (SI). Google looks at the whole page, and this includes your template. So if your template has, say, 200 words that are repeated throughout the site in headers, footers and sidebars, then a short blog post is going to flag as duplicate content.

The lesson, of course, is to write longer posts that match or exceed the word count of your template. And, oh no, it doesn’t stop there: blogs are notorious for being flagged as having duplicate content. Even some of my RSS feeds have been supplementaled (is that a word?).
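To put rough numbers on the template point, here is a small Python sketch. It is only arithmetic, not how Google actually scores pages: the function name `unique_share` and the idea of a plain word-count ratio are my own assumptions for illustration. The 200-word template figure is the example used above.

```python
def unique_share(post_words: int, template_words: int) -> float:
    """Fraction of the words on a page that come from the post itself.

    Hypothetical helper: a plain word-count ratio, used only to illustrate
    the point. It is NOT a measure Google is known to use.
    """
    total = post_words + template_words
    if total == 0:
        return 0.0
    return post_words / total


TEMPLATE_WORDS = 200  # the example template size used in the post

for post_words in (100, 200, 500):
    share = unique_share(post_words, TEMPLATE_WORDS)
    print(f"{post_words}-word post: {share:.0%} of the page is unique")
```

On these assumptions a 100-word post makes up only a third of the page, while a 500-word post makes up well over two thirds, which is the intuition behind writing longer posts.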

As far as WordPress blogs are concerned, using the “more” tag and/or “the_excerpt” in the WP Loop is absolutely essential. Otherwise the full post on your blog’s home page and the full post at its own permalink count as duplication.

A Thank You: By the way, HUGE thanks go to Jeff over at Perishable Press, who is now one of the Good Guys, for taking the time to sort out a WP issue I was having with the “more” tag. That’s a story for another post. Thanks Jeff.

And it doesn’t stop there.

You need a robots.txt file to keep Google away from what it perceives to be duplicated content, such as archives. The robots.txt file I am using came from JohnTP.com, and there is more information on it over there. Here is the file I am presently using. It goes in your root directory, by the way.

User-agent: *
# disallow files in /cgi-bin
Disallow: /cgi-bin/
Disallow: /comments/
Disallow: /z/j/
Disallow: /z/c/
# disallow all files ending in .php, .js, .inc, .css or .txt
Disallow: /*.php$
Disallow: /*.js$
Disallow: /*.inc$
Disallow: /*.css$
Disallow: /*.txt$
# disallow all files in /wp- directories
Disallow: /wp-*/
# disallow all files with ? in url
Disallow: /*?
# disallow any files that are stats related
Disallow: /stats*
Disallow: /about/legal-notice/
Disallow: /about/copyright-policy/
Disallow: /about/terms-and-conditions/
Disallow: /about/feed/
Disallow: /about/trackback/
Disallow: /contact/
Disallow: /tag
Disallow: /docs*
Disallow: /manual*
Disallow: /category/uncategorized*


# disable duggmirror
User-agent: duggmirror
Disallow: /

# allow google image bot to search all images
User-agent: Googlebot-Image
Disallow:
Allow: /*

# allow adsense bot on entire site
User-agent: Mediapartners-Google*
Disallow:
Allow: /*

Be sure to check your file with a robots.txt checking tool first to make sure it is working correctly, or you may end up blocking your whole site from the bots. The checking tool identifies exactly which URLs are allowed and which are blocked, so it’s really useful.
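For a rough offline sanity check before uploading, something like the following Python sketch can help. It is not a replacement for a real checking tool: the helper names `rule_to_regex` and `is_blocked` are my own, the sample paths are made up, and the matching only approximates the wildcard behaviour Google documents (`*` matches any run of characters, a trailing `$` anchors the end of the URL, and the longest matching rule wins, with Allow winning ties). Wildcards are an extension that the big search engines honour, so other bots may read the same file differently.

```python
import re


def rule_to_regex(pattern):
    """Turn a robots.txt path pattern into a compiled regex.

    '*' becomes "any run of characters"; a trailing '$' anchors the end.
    Everything else is matched literally, as a prefix of the URL path.
    """
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    body = ".*".join(re.escape(piece) for piece in pattern.split("*"))
    return re.compile(body + ("$" if anchored else ""))


def is_blocked(path, disallow, allow=()):
    """Return True if the longest rule matching `path` is a Disallow.

    Simplified sketch (an assumption, not a full parser): Allow rules are
    checked first and only a strictly longer rule replaces the current
    best match, so Allow wins ties.
    """
    best_len, blocked = -1, False
    for rules, verdict in ((allow, False), (disallow, True)):
        for pattern in rules:
            if not pattern:
                continue  # an empty "Disallow:" line blocks nothing
            if rule_to_regex(pattern).match(path) and len(pattern) > best_len:
                best_len, blocked = len(pattern), verdict
    return blocked


# A few of the rules from the file above, tried against made-up paths.
DISALLOW = ["/cgi-bin/", "/*.php$", "/wp-*/", "/*?", "/tag"]

for path in ("/2007/07/24/some-post/", "/index.php", "/index.php?p=12",
             "/wp-content/themes/", "/tag/seo/", "/tagline-ideas/"):
    verdict = "BLOCKED" if is_blocked(path, DISALLOW) else "allowed"
    print(f"{path} -> {verdict}")
```

Note that `Disallow: /tag` is a prefix match, so in this sketch a hypothetical post at `/tagline-ideas/` is blocked along with the tag pages. That kind of surprise is exactly what a check before uploading is meant to catch.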

Now we come to how I am dragging my pages from the Supplemental Index. On each post I am editing the content and adding sufficient unique words so as to have a minimum of 500 words for each post. This was when I started feeling like a schoolboy sitting in detention because my original essay wasn’t good enough. “Take 100 lines boy and make them unique” bellowed the Google schoolmaster.

I am in the process of submitting each post to the Social Bookmarking and News sites, and I also intend to write enough articles that I can link to each post from the resource box.

Sorry sir, Won’t do it again sir! Hope this post is unique sir.



Feel free to comment. Once a week I spend a couple of hours looking for and submitting good content to the Bookmarking and News sites.

This is not a bribe to leave a good comment; I don’t play the game that way. But when I go looking for content, it makes sense to start with the people who comment on my blog and visit their websites to take a look. This is how we play the game and help each other.


9 Comments

  1. I’d be very careful using:

    Disallow: /tag

    Removing archives is fine, but I would definitely avoid removing tag pages. Big G seems to like them very much nowadays, and it’s best to leave it up to her to decide who should go supplemental …

    Comment by Sante — July 24, 2007 @ 12:43 pm

  2. There is a duplicate content plugin–

    http://www.seologs.com/wordpress-duplicate-content-cure/

    Comment by Face Natural — July 24, 2007 @ 6:01 pm

  3. Thanks for your feedback. You know I have been looking into this Supplemental Index problem with WP blogs for ages and your couple of posts have, I think, just about brought me to the end of my journey - at least until the next Google or WP update anyway ;-)

    I checked up on the /tag issue and it appears to be good information, so it’s worth acting on. Why WordPress themselves put this information in I’ll never know. Just goes to show you can’t even trust the owners!

    Never thought of a duplicate content plugin for WP and I have found another one which I think will suit my purposes better called All In One SEO Pack at: http://wp.uberdose.com/2007/03/24/all-in-one-seo-pack/

    There is also a lot of information on this plugin due to the 744 comments posted. Wow! That’s a full time job…

    It appears that there is not one single solution to the problem of WP and duplicate content. You need a combination of two plugins and also a robots.txt file.

    Uberdose from uberdose.com recommends http://fucoder.com/code/permalink-redirect/ and a robots.txt looking like:

    User-agent: *
    Disallow: /wp-
    Disallow: /feed
    Disallow: /comments/feed
    Disallow: /feed/$
    Disallow: /*/feed/$
    Disallow: /*/feed/rss/$
    Disallow: /*/trackback/$
    Disallow: /*/*/feed/$
    Disallow: /*/*/feed/rss/$
    Disallow: /*/*/trackback/$
    Disallow: /*/*/*/feed/$
    Disallow: /*/*/*/feed/rss/$
    Disallow: /*/*/*/trackback/$

    (This is turning into a post not a comment)

    So now we have the feed, category, archives, 301’s taken care of leaving the tags and other bits of goodness for the SE’s to spider.

    Let’s see how this lot works out.

    Ade

    Comment by Ade Martin — July 25, 2007 @ 12:17 pm

  4. Maybe these links could be useful for your reference to avoid supplemental result:

    http://www.angellica2017.com/tips-how-to-avoiding-supplemental-result-in-wordpress-1
    http://www.angellica2017.com/tips-how-to-avoiding-supplemental-result-in-wordpress-2
    http://www.angellica2017.com/how-to-know-duplicate-content-and-supplemental-index
    http://www.angellica2017.com/how-to-optimizing-robotstxt-for-seo

    Sorry… I’m not spamming
    Just share knowledge from other sources….

    Comment by yosinta — July 25, 2007 @ 6:41 pm

  5. Thanks Yosinta,

    Pages read. I think Angellica is a Stomper.

    Comment by Ade Martin — July 25, 2007 @ 10:02 pm

  6. Another excellent article, Ade. I have been working on rescuing a few hundred of my pages from the dreaded supplemental index. So far, things seem to be going well, thanks to insightful articles such as this one. I am currently dissecting the article’s robots.txt rules and comparing them with my own. Lots of great stuff there - I had better get on with it then!

    Comment by Jeff — July 30, 2007 @ 4:43 pm

  7. Nicely done Ade, I hadn’t thought about changing the number of words or anything like that!

    Also the robots.txt file you are using above is originally from my post at http://www.askapache.com/seo/wordpress-robotstxt-seo.html

    Just yesterday I published an updated robots.txt file which I am still tweaking. http://www.askapache.com/seo/updated-robotstxt-for-wordpress.html

    Comment by AskApache — August 10, 2007 @ 7:45 pm

  8. It’s quite complicated staying on top of SEO, isn’t it?

    Thanks for the good material!

    Vincent Harrison
    http://coolmarketingproducts.blogspot.com

    Comment by Vincent Harrison — August 20, 2007 @ 10:39 pm

  9. Another great article about supplemental index
    http://seomization.blogspot.com/2007/09/supplemental-index.html

    Comment by matt — September 29, 2007 @ 11:03 am
