Published on October 8th, 2013 | by Derek Devlin
Google’s Penguin Algorithm Demystified
I find that I’m often repeating myself in long, drawn-out emails back and forth, so I wanted to put a post together that provides a layman’s explanation of what is going on with Penguin, hopefully explains the motives behind the algorithm from Google’s perspective, and shares some insight along the way.
It’s first useful to understand how the Google Web spam team works.
Google’s primary objective is to present search results that are relevant and of high quality. The rules that govern how webmasters should behave are set out in the Google Webmaster Guidelines.
Any action that violates the Google Webmaster Guidelines can be considered spam.
The first important thing to understand is that Google has separate teams that attempt to police the Webmaster guidelines and fight web spam:
Mathematicians | Software Engineers | Analysts:
One portion of the search quality team at Google are the computer science experts (engineers) whose job it is to detect patterns that can be associated with websites that are creating spam. The engineers consider both onsite and offsite factors but are primarily concerned with tackling problems that can be diagnosed at scale, across the whole web.
The Google engineer’s job is to develop complex formulas, known as algorithms, that can automatically detect the patterns associated with spam or a poor user experience, and then find ways to take action against that particular segment of websites, reducing their impact or visibility in the search engine results pages (SERPs).
Eric Enge refers to the pattern used in search algorithms as a website’s “signature”.
In Eric’s article (Google Doesn’t Care if it Ranks your site properly), he explains:
If your site has been hit by an algorithm, or is hit by one in the future, it means that it shares some characteristics with the types of sites that Google was targeting. Google targeted a specific set of test case sites, and overall search quality went up. Therefore, all the impacted sites have something in common with the test case sites.
I refer to this as having a signature that Google associates with poor quality sites. It doesn’t mean you have a poor quality site, just that you have something in common with them.
Human Quality Raters:
Google has thousands of employees located all around the world who effectively work as crowd-sourced “mystery shoppers” – these raters are paid by the hour to perform the menial task of manually checking the quality of search results.
It’s the feedback loop from these “search quality raters” that provides Google with vital information that has actually been cross-checked by real human beings rather than a computer program. This information is used to refine future algorithm changes. It’s also worth noting that search quality raters do not directly impact rankings.
If you’re interested in reading the guidelines search quality raters use, you can view an ‘official’ copy of the document here.
If you have been guilty of using excessively spammy tactics in the hope that you could “modify your rankings”, then Google may choose to take a “manual action” against your site.
Manual Actions, also known as penalties are more serious than algorithmic fluctuations.
Although… rather confusingly, it was probably the same algorithm that caught you, but because you exceeded its threshold of tolerance you got a manual action as a result (remember, Google likes to automate, even if they don’t like us SEOs doing the same #doaswesaynotaswedo!).
Websites that get manual actions or penalties fall within the ‘Black Hat SEO’ bucket, since they have clearly abused the Google Webmaster Guidelines to excess.
The result is that part or all of your website is de-indexed from Google until you can prove that you have cleaned up all violations.
Your only way back into the index is by submitting a ‘reconsideration request’ and then having a member of the web spam team review your case and confirm that you are clean and able to return to the index.
So where does Penguin fit into all this?
Penguin is an algorithm.
Just one of many algorithms that work together in harmony to help Google serve relevant search results.
It was developed, using sophisticated machine learning, as an automated detection system for offsite spam that works by interrogating websites’ backlink profiles.
If your backlink profile contains a high proportion of links built with spammy tactics, you are likely to be caught and have your rankings demoted as a result.
The algorithm runs periodically, acting like a filter to suppress or ‘mark down’ websites that match a particular set of criteria. Being caught means your site shows some degree of spammy behaviour, but not so much that it tripped the thresholds for an all-out manual penalty.
When Google runs the Penguin algorithm, they effectively push a button to run a computer program that works its way through the whole Google index, re-ordering each website based on the new parameters.
If your site drops significantly, then the algorithm has decided that your backlinks were artificially inflating your site’s authority.
The rankings you had as a result were not a true reflection of where you should have been sitting.
The links you obtained must have been in violation of the Google Webmaster Guidelines, and my feeling is that these links are given a negative score, thus suppressing your site so that you are unable to rank purely on the basis of adding more links.
Some have stated that they believe all Penguin does is devalue those bad links, thus cutting the PageRank you were benefiting from. I’m more inclined to believe that this is targeted suppression of your site, either at the keyword or page level.
The only way to remedy the problem is to remove the bad links, or correct whatever is over-optimised, which in turn removes the ‘negative ranking factor’ associated with your site.
People often refer to a loss of rankings after a Penguin update as a ‘penalty’ because it’s associated with a sustained period of lost traffic that isn’t lifted until something on or off the site is changed.
However, I don’t really like to use the word penalty in this scenario, because it is too easily confused with a manual action.
Matt Cutts said in a Webmaster Central blog post that “well under 2% of domains we’ve seen are manually removed for webspam” but how do you know if you have a manual action against your site?
You can diagnose whether you have a “manual penalty” by checking the Google Webmaster Tools (GWT) section called “Manual Actions”, which you’ll find under “Search Traffic”.
You will receive a notification inside GWT when the penalty is applied.
If you are in the clear, the Manual Actions page simply reports that no manual webspam actions have been found.
If you are in deep water, each manual action is listed individually – the example site I reviewed actually had two manual actions…
Yes… you can have multiple actions against your site at the same time!
The Negative Ranking Factor
(When a penalty’s not really a penalty)… well, sort of…
If your website sees a drop in organic visibility coinciding with a confirmed Penguin update, such as Penguin 2.0 or the recent Penguin 2.1, but you did not receive a notification in GWT, then your site is subject to a “negative ranking factor”.
Strictly speaking, I wouldn’t call this a penalty since you don’t have a confirmed manual action.
What you have experienced is an algorithmic demotion of your site, on the basis that you share a common pattern with other websites that were also negatively impacted; in turn, you stood out in some way from the vast majority of sites that were unaffected… there’s that darn signature we were talking about earlier!
To understand the most likely reasons why you ‘stood out’ against those sites that were unaffected, you should compare and contrast your link graph against high-ranking competitors… this is what the Link Research Tools were built for.
The suite is my go-to toolkit for profiling links and identifying ‘anomalies’, which can provide a key indicator of the areas that should be corrected to get your site back in line.
Given that Penguin hits are mostly algorithmic, you will not be notified in Google Webmaster Tools that you have an issue.
The only way to diagnose it is to compare the known dates when the update was released against your organic search visibility. If you see a sharp decline that coincides with the algorithm’s rollout and doesn’t recover, then it’s quite likely you have been negatively impacted – assuming all other factors remained constant.
On a visibility chart, this shows up as a sharp step down on the date of the refresh, with traffic flatlining at the lower level rather than recovering.
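If you track visibility data programmatically, that same check can be sketched in a few lines. The refresh dates are the confirmed ones listed later in this post; the drop threshold and comparison window are illustrative assumptions of mine, not anything Google has published:

```python
from datetime import date

# Confirmed Penguin refresh dates (see the update history in this post).
PENGUIN_DATES = [
    date(2012, 4, 24),   # 1.0
    date(2012, 5, 26),   # 1.1
    date(2012, 10, 5),   # 1.2
    date(2013, 5, 22),   # 2.0
    date(2013, 10, 4),   # 2.1
]

def penguin_suspects(visibility, drop_threshold=0.3, window=7):
    """Flag refreshes where average visibility in the week after the
    refresh fell by more than `drop_threshold` (30% by default) versus
    the week before. `visibility` maps date -> daily organic sessions."""
    suspects = []
    for refresh in PENGUIN_DATES:
        before = [v for d, v in visibility.items()
                  if 0 < (refresh - d).days <= window]
        after = [v for d, v in visibility.items()
                 if 0 <= (d - refresh).days < window]
        if not before or not after:
            continue  # no data around this refresh
        change = (sum(after) / len(after)) / (sum(before) / len(before)) - 1
        if change < -drop_threshold:
            suspects.append((refresh, round(change, 2)))
    return suspects
```

A drop flagged by a check like this is only circumstantial evidence, of course – you still have to rule out seasonality, tracking changes, and other algorithm updates.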
Can I recover my site from Penguin?
Yes… but not only do you need to remove the bad links, you also need new, high-quality, natural-looking links.
Sites that have been negatively impacted by Penguin most often do not make sudden “recoveries” in the true sense of the word, at least not as most webmasters would define recovery.
The rankings you had prior to the Penguin Algorithm being run were artificial.
They were based on links that violated the Google Webmaster Guidelines. In order to be able to rank again, you have to remove those links. That means your new rankings, although free from any suppression by Penguin, are likely to still be lower than before… unless, of course, you manage to build a similar number of links, this time making sure they are high quality and natural.
The reason for this is that although a link may be bad because it comes from a spammy domain, it can still pass PageRank (a vote for your site).
This is exactly why Google is working to make these links negative signals: they undermine the core formula for scoring sites – PageRank.
Removing a vast number of links decreases your PageRank score, even though the links were bad. Therefore, the only way to truly recover your rankings is not only to remove the links that are harming your site but also to build clean, natural links that restore your PageRank.
Build enough clean links to replace the spammy ones and your rankings will return at the next Penguin refresh (assuming you have removed and disavowed your bad links).
One of the really demoralising things about Penguin is that it does not work in real-time.
That is, we cannot make changes and instantly see whether our site responds positively or negatively. This is one of the big challenges. Instead, we need to wait for a new update or ‘refresh’ of Penguin to gauge our progress. At that point, all websites are reshuffled according to the Penguin parameters as before, but a new order is established, favouring websites that fall into the ‘clean’ link graph bucket.
This makes the process pretty frustrating for Webmasters sitting in limbo waiting to see if they will improve or decline. Especially when you consider that the official Penguin refreshes have thus far been coming at intervals of about 5-6 months apart.
How often does Google Penguin update?
Do I really have to wait 6 months?
Hopefully not… Glenn Gabe of GSQi has reported seeing sites improve during Panda refreshes. Although these aren’t confirmed Penguin refreshes, the feeling is that Google may run the Panda and Penguin algorithms at the same time, partly to keep SEOs guessing about what is going on, but also to give sites that have cleaned up a chance to climb back out of the hole.
Here is the confirmed update history for Penguin, you can see it’s pretty much 2 refreshes a year:
- 1.0 – 24th April 2012
- 1.1 – 26th May 2012
- 1.2 – 5th October 2012
- 2.0 – 22nd May 2013
- 2.1 – 4th October 2013
Google has attempted to make the process of cleaning your link profile more transparent and easier to manage by releasing a ‘link disavow tool’.
The purpose of the tool is to allow webmasters to upload a list of domains they have identified as low quality and untrusted. Google claims to ignore these links in future runs of its ranking algorithms.
Many SEOs question whether the disavow file is actually acted upon in its entirety.
Google has said the disavow file is a “hint” to them that the links should not be trusted, and that it’s not to be treated as a ‘get out of jail free card’. This implies they reserve the right not to take disavowed links into account.
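To make the mechanics concrete, the disavow file itself is just a plain text file uploaded through the tool: a `domain:` line disavows every link from that domain, a bare URL disavows a single page, and lines beginning with `#` are comments. The domains below are made-up examples:

```text
# Requested link removal from these sites; no response received.
domain:spamdirectory.example
domain:linkfarm.example

# Disavow a single page rather than the whole domain
http://article-network.example/spun-guest-post.html
```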
Having said that, Cyrus Shepard demonstrated that the disavow tool does work, and that it can be very damaging to your site if used unnecessarily. In an experiment, he disavowed every link to his blog just to see what would happen; his rankings tanked and, as of the time of writing, have not returned.
That being said, Google wants webmasters to clean up the web and the best way of knowing if a link will not be considered is to get it deleted from the website in question, which unfortunately relies on the goodwill of the site owner and a lot of manual work in contacting sites with link removal requests.
Fortunately, software exists to make this job a lot easier. The market leader, and my favourite tool for diagnosing link problems at scale, is Link Detox. Simply run the tool and it will grade your links using techniques similar to those Google uses.
Link Detox can handle up to 5 million links, which I think is pretty impressive. To my knowledge, no other tool offers this power.
Once you have identified the toxic links, you can then take action on these links by contacting sites for link removal.
Buzzstream was originally conceived as a link building tool, but we have found it invaluable for finding contact details for website owners and managing email communication between parties.
What’s different about Penguin 2 compared to Penguin 1.0?
Penguin 2.0 targets very similar “footprints” and “website signatures” as in Penguin 1.0.
The difference is that it’s more sophisticated and goes deeper.
It looks at links to internal pages as well as the homepage, whereas Penguin 1.0 only looked at links to the homepage. Google is trying to reduce its reliance on anchor text as a relevancy factor and is now favouring “brands”.
Be careful of:
- Over-optimised anchor text.
- Links from a ‘bad neighbourhood’.
- Too many links from irrelevant off-topic sites and sites outwith a geography that makes sense for your site.
- Black or grey hat tactics such as comment spam, links from spun content, guest posts from questionable sites.
- Site wide links with money anchor text skewing your anchor text density.
Over-optimisation of anchor text is the single most damaging factor!
It is the easiest for Google to measure and it is the most abused aspect of link building.
In just about every site I have analysed this has been the single biggest common theme, closely followed by excessive abuse of link directories.
My belief is that the effectiveness of anchor text as a ranking signal is decreasing rapidly and is only going to get less important, even for competitive terms.
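As a rough illustration of what ‘measuring’ anchor text over-optimisation looks like, here is a minimal sketch that classifies anchors in a backlink export into brand, money, and other buckets. The term lists and the export format are made-up assumptions; a real audit would use your own brand terms and target keywords:

```python
from collections import Counter

# Hypothetical term lists for an imaginary site selling widgets.
BRAND_TERMS = {"acme", "acme widgets", "acmewidgets.example"}
MONEY_TERMS = {"buy widgets", "cheap widgets", "best widgets online"}

def anchor_profile(backlinks):
    """Classify each (anchor_text, source_domain) pair as brand, money,
    or other, and return the share of each class - a crude proxy for
    how over-optimised a link profile is."""
    counts = Counter()
    for anchor, _domain in backlinks:
        a = anchor.strip().lower()
        if a in MONEY_TERMS:
            counts["money"] += 1
        elif a in BRAND_TERMS:
            counts["brand"] += 1
        else:
            counts["other"] += 1
    total = sum(counts.values())
    return {k: round(v / total, 2) for k, v in counts.items()}
```

If the “money” share dwarfs the shares you see for the sites ranking above you, that is exactly the kind of signature that stands out.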
Tips for future proof SEO – How to avoid being burned by Penguin:
- Based on the verticals I’ve analysed, the sites Google now favours in the SERPS have a very small proportion of links with the “money keywords”.
- This is very true of local and mid-trafficked keyword verticals where many high ranking sites don’t even have a single money anchor text link!
- Bigger keyword verticals appear to have a greater tolerance for money anchor text and so it may be acceptable to a degree.
- Without fully understanding your market and the tolerance accepted in your vertical, building exact match, money keyword anchor texts is SEO suicide.
- Anchor text is the single biggest indicator to Google that you have been trying to ‘modify your rankings’.
- Relevancy is being determined by your position in the link graph.
- The quality, subject matter, content and type of site linking to your website are very strong signals that you should be aware of.
- Internal linking is more important than ever before.
- Internal linking should be the focus of your anchor text efforts when targeting keywords, although make sure to keep it natural and diversified.
- Forget hammering offsite money anchor text links and focus on building your brand.
- Be conscious of the quality of the sites linking to you; the easier it is to place the link, the less likely it is to be of benefit to you!
- Use trust metrics such as “Cemper Trust” or “Majestic SEO’s Trust Flow” to gauge the worthiness of a link.
- Avoid getting embroiled in techniques that rely on any form of automation: mass blog comments, forum profile links, web 2.0 posting.
- Similarly, do-follow press releases and do-follow blogs with no editorial guidelines should be avoided.
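The trust-metric advice above can be turned into a simple triage step. This sketch assumes you have exported links annotated with a 0–100 trust score (as tools like Majestic’s Trust Flow provide); the cut-off values are illustrative assumptions only, and you would tune them against your own vertical:

```python
def triage_links(links, remove_below=10, review_below=25):
    """Split (url, trust_score) pairs into remove / review / keep buckets.
    Thresholds are illustrative assumptions, not published cut-offs."""
    buckets = {"remove": [], "review": [], "keep": []}
    for url, trust in links:
        if trust < remove_below:
            buckets["remove"].append(url)
        elif trust < review_below:
            buckets["review"].append(url)
        else:
            buckets["keep"].append(url)
    return buckets
```

The “remove” bucket becomes your outreach and disavow list; the “review” bucket gets a manual look before you cut anything that might be passing legitimate PageRank.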
Finally, if I can offer you one bit of advice…
The key to long-term sustainability is to ‘blend in’ – remember that signature I talked about?
Model good practice by aggregating the characteristics of the top-ranking sites in your vertical… if any aspect of your link profile stands out against the sites that Google favours, then chances are you are open to being swept up by an algorithm.
I’ve written this post in the hope that it explains some of the mystery surrounding how Google’s Penguin Algorithm works.