One of the most difficult decisions to make in any field is to consciously choose to miss a deadline. Over the past few months, a team of talented engineers, data scientists, project managers, editors, and marketers has worked toward releasing the new Page Authority (PA) on September 30, 2020. The new model improves on the current PA in nearly every respect, but our most recent quality control measures revealed an anomaly that we could not ignore.
As a result, we made the difficult decision to postpone the release of Page Authority 2.0. Let me take a moment to retrace how we got here, where that leaves us, and how we intend to proceed.
Seeing an old problem with fresh eyes
Historically, Moz has used the same method over and over again to build the Page Authority model (as well as Domain Authority). The advantage of this approach was its simplicity, but it left much to be desired.
Previous Page Authority models were trained against SERPs, trying to predict whether one URL would rank above another based on a set of link metrics calculated from the Link Explorer backlink index. The key problem with this type of model is that it cannot meaningfully resolve the maximum strength of a particularly powerful set of link metrics.
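To make that pairwise setup concrete, here is a minimal sketch of a SERP-trained comparison model. It is not Moz's actual pipeline; the feature names, the toy data, and the choice of logistic regression are illustrative assumptions.

```python
# A minimal sketch of the SERP-pairing approach described above (not Moz's
# actual pipeline). The feature names and toy data are illustrative assumptions.
import itertools
import pandas as pd
from sklearn.linear_model import LogisticRegression

FEATURES = ["linking_root_domains", "external_links"]  # assumed link metrics

# Toy data: two SERPs, three ranked URLs each (made-up numbers).
serp_results = pd.DataFrame({
    "serp_id":  [1, 1, 1, 2, 2, 2],
    "position": [1, 2, 3, 1, 2, 3],
    "linking_root_domains": [900, 300, 40, 120, 80, 15],
    "external_links":       [5000, 900, 60, 400, 150, 30],
})

def build_serp_pairs(results: pd.DataFrame):
    """For every ordered pair of URLs on the same SERP, emit the difference of
    their link metrics and a label: 1 if the first URL outranked the second."""
    X, y = [], []
    for _, serp in results.groupby("serp_id"):
        for a, b in itertools.permutations(serp.to_dict("records"), 2):
            X.append([a[f] - b[f] for f in FEATURES])
            y.append(1 if a["position"] < b["position"] else 0)
    return X, y

X, y = build_serp_pairs(serp_results)
pairwise_model = LogisticRegression().fit(X, y)
```

Because every training example comes from URLs that actually met on a SERP, the model only ever sees comparisons that search results happen to provide, which is exactly the limitation described next.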
For example, imagine the most powerful URLs on the Internet in terms of links: the homepages of Google, YouTube, and Facebook, or the share URLs of followed social network buttons. No SERPs pit these URLs against one another. Instead, these extremely powerful URLs usually rank first, followed by pages with dramatically lower metrics. Imagine if Michael Jordan, Kobe Bryant, and LeBron James each scrimmaged one-on-one against high school players. Each would win every time. But it would be very difficult to infer from those results whether Michael Jordan, Kobe Bryant, or LeBron James would win a one-on-one game against each other.
When tasked with revisiting Domain Authority, we ultimately chose a model with which we had a great deal of experience: the original SERP training method (albeit with a number of tweaks). For Page Authority, we decided to use a different training method altogether: predicting which page would have more organic traffic. This model offered some promising qualities, such as being able to compare URLs that do not appear on the same SERP, but it also raised other difficulties, such as pages with high link equity that simply sit in infrequently searched topic areas. We addressed many of these issues, for example by enhancing the training set to account for competitiveness using a non-link metric.
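As a rough illustration of the new target, the sketch below labels arbitrary URL pairs by which page has more organic traffic instead of by SERP position. The column names, the toy numbers, and the stand-in "topic_search_volume" competitiveness signal are assumptions, not Moz's actual training data or features.

```python
# A sketch of the traffic-comparison target described above, again with assumed
# column names and made-up numbers rather than Moz's real training data.
# "topic_search_volume" stands in for the unspecified non-link competitiveness
# signal mentioned above.
import random
import pandas as pd
from sklearn.ensemble import GradientBoostingClassifier

FEATURES = ["linking_root_domains", "external_links", "topic_search_volume"]

pages = pd.DataFrame({
    "url": ["a", "b", "c", "d"],
    "organic_traffic":      [12000, 300, 4500, 50],
    "linking_root_domains": [800, 40, 300, 10],
    "external_links":       [9000, 120, 2500, 25],
    "topic_search_volume":  [50000, 900, 20000, 400],
})

def build_traffic_pairs(pages: pd.DataFrame, n_pairs: int = 200, seed: int = 0):
    """Sample arbitrary URL pairs (no shared SERP required) and label each pair
    by which URL has more organic traffic."""
    rng = random.Random(seed)
    records = pages.to_dict("records")
    X, y = [], []
    for _ in range(n_pairs):
        a, b = rng.sample(records, 2)
        X.append([a[f] - b[f] for f in FEATURES])
        y.append(1 if a["organic_traffic"] > b["organic_traffic"] else 0)
    return X, y

X, y = build_traffic_pairs(pages)
traffic_model = GradientBoostingClassifier().fit(X, y)
```

The key difference from the earlier sketch is that any two URLs can be compared, which is what lets the model separate the Googles and Facebooks of the world from one another.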
Measuring the quality of the new Page Authority
The results were very encouraging. First, the new model clearly predicted the likelihood that one page would have more valuable organic traffic than another. This was expected, since the new model was aimed at that specific goal, while the current Page Authority only tries to predict whether one page will rank above another.
Second, we found that the new model predicted whether one page would rank above another better than the previous Page Authority. This was particularly pleasing, because it laid to rest our concern that the new training target would make the new model perform poorly on the old quality controls. How much better is the new model than the current PA at predicting SERPs? At every position interval (all the way down to position 4 vs. 5), the new model either tied or outperformed the current model. It never lost.
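For readers curious what such an interval check might look like, here is a hedged sketch that measures, for each adjacent pair of positions, how often a score orders the pair the same way the SERP did. The column names are assumptions about how the evaluation data might be laid out, not Moz's schema.

```python
# A sketch of the interval check described above: for adjacent SERP positions
# (1 vs. 2, 2 vs. 3, ... 4 vs. 5), how often does a given score order the pair
# the same way the SERP did? Column names (serp_id, position, old_pa, new_pa)
# are assumptions about the data layout.
import pandas as pd

def interval_accuracy(serp_results: pd.DataFrame, score_col: str, max_pos: int = 5):
    """Return {(upper, lower): share of SERPs where the higher-ranked URL also
    has the higher score}."""
    accuracy = {}
    for upper in range(1, max_pos):
        lower = upper + 1
        wins = total = 0
        for _, serp in serp_results.groupby("serp_id"):
            a = serp.loc[serp["position"] == upper, score_col]
            b = serp.loc[serp["position"] == lower, score_col]
            if a.empty or b.empty:
                continue
            total += 1
            if a.iloc[0] > b.iloc[0]:
                wins += 1
        accuracy[(upper, lower)] = wins / total if total else None
    return accuracy

# Compare, e.g., interval_accuracy(serp_results, "new_pa") against
# interval_accuracy(serp_results, "old_pa") at each position interval.
```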
Everything was looking great. Then we began to analyze outliers. I like to call this the “does it look stupid?” test. Machine learning makes mistakes just as humans do, but humans tend to make mistakes in very particular ways. When a person makes a mistake, we often understand exactly why the mistake was made. That is not the case with ML, especially neural networks. In the new model, we pulled URLs that happened to have zero organic traffic and included them in the training set so the model could learn from those errors. We quickly saw the bizarre 90+ PAs drop down to much more reasonable 60s and 70s… another victory. We then moved on to the final test.
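A minimal sketch of that kind of error-driven augmentation is below; the column names and thresholds are assumptions for illustration, not the actual retraining procedure.

```python
# A minimal sketch of the error-driven augmentation described above (assumed
# column names and thresholds; not the actual Moz procedure). Zero-traffic URLs
# that still receive suspiciously high scores are paired against genuinely
# high-traffic URLs, labeled as the losing side, and appended to the training pairs.
import pandas as pd

def zero_traffic_pairs(pages: pd.DataFrame, features, score_col="predicted_pa",
                       score_threshold=90, n_references=50):
    """Build extra (feature-difference, label) rows from high-scoring,
    zero-traffic URLs so the model can learn from those errors."""
    suspects = pages[(pages["organic_traffic"] == 0) &
                     (pages[score_col] >= score_threshold)]
    references = pages.nlargest(n_references, "organic_traffic")
    X_extra, y_extra = [], []
    for _, s in suspects.iterrows():
        for _, r in references.iterrows():
            X_extra.append([s[f] - r[f] for f in features])
            y_extra.append(0)  # the zero-traffic URL should lose this comparison
    return X_extra, y_extra
```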
The brand keyword problem
Some of the most popular keywords on the web are navigational. People search Google for Facebook, YouTube, and even Google itself. The search volume of these keywords is astronomical relative to other keywords. Consequently, a handful of very strong brands can have an outsized impact on models that use total search volume as part of their core training target.
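To see how lopsided that signal can be, here is a toy calculation with entirely made-up volumes; the point is the skew, not the specific numbers.

```python
# Toy illustration (entirely made-up numbers) of how a few navigational
# keywords can dominate a total-search-volume signal.
keyword_volume = {
    "facebook": 2_000_000_000,
    "youtube":  1_500_000_000,
    "google":   1_200_000_000,
    "best running shoes": 300_000,
    "how to tie a tie":   250_000,
}
brand_terms = {"facebook", "youtube", "google"}
total = sum(keyword_volume.values())
brand_share = sum(v for k, v in keyword_volume.items() if k in brand_terms) / total
print(f"Three brand terms account for {brand_share:.2%} of the total volume")
```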
The last test involved comparing the current Page Authority with the new Page Authority to determine whether there were any bizarre outliers (where PA shifted dramatically without an obvious cause). First, let’s look at a simple comparison between the log of linking root domains and Page Authority.
Not too shabby. We see a generally positive correlation between linking root domains and Page Authority. But can you spot the oddities? Go on, take a minute…
There are two anomalies that stand out in this chart:
There is a curious gap between the main distribution of URLs and the outliers above and below it. And the largest variance at a single score is at PA 99: there are a great many PA 99s spanning a wide range of linking root domains.
Here is a visualization that helps identify these anomalies:
The gray space between the green and red represents the odd gap between the bulk of the distribution and the outliers. The outliers (in red) tend to clump together, especially above the main distribution. And, of course, we can see the poor distribution at the top among the PA 99s.
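If you want to run a similar sanity check on your own data, here is a rough sketch of plotting a score against the log of linking root domains and flagging points that sit far from a simple fitted trend. It is not the code behind the charts in this post, and the inputs are assumed arrays of scores and linking-root-domain counts.

```python
# A sketch of the kind of chart and outlier flagging described above: plot a
# score against log10(linking root domains) and flag points far from a simple
# fitted trend line. Not the code used to produce the charts in this post.
import numpy as np
import matplotlib.pyplot as plt

def flag_outliers(pa, linking_root_domains, n_sigma=3.0):
    """Fit PA against log10(LRD) with a straight line and flag points whose
    residual exceeds n_sigma standard deviations."""
    log_lrd = np.log10(np.asarray(linking_root_domains, dtype=float) + 1)
    pa = np.asarray(pa, dtype=float)
    slope, intercept = np.polyfit(log_lrd, pa, 1)
    residuals = pa - (slope * log_lrd + intercept)
    outliers = np.abs(residuals) > n_sigma * residuals.std()
    plt.scatter(log_lrd[~outliers], pa[~outliers], s=4, c="green", label="main distribution")
    plt.scatter(log_lrd[outliers], pa[outliers], s=4, c="red", label="outliers")
    plt.xlabel("log10(linking root domains)")
    plt.ylabel("Page Authority")
    plt.legend()
    plt.show()
    return outliers
```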
Keep in mind that these issues are not enough to make the new Page Authority model less accurate than the current model. However, after further inspection, we found that the errors the model did produce were serious enough to adversely affect our customers’ decision-making. It is better to have a model that is off by a little everywhere (because the adjustments SEOs make are not incredibly fine-tuned) than to have a model that is correct almost everywhere but wildly wrong in a few cases.
Fortunately, we are fairly confident about where the problem lies. It seems that homepage PAs are disproportionately inflated, and the likely culprit is the training set. We cannot be certain this is the cause until we complete retraining, but it is a strong lead.
Good news and bad news
For now, we are in good shape insofar as we have multiple candidate models that outperform the existing Page Authority. We are at the bug-squashing stage, not the model-building stage. However, we will not release a new score unless we are confident it will steer our customers in the right direction. We care about the decisions our customers make based on our metrics, not just whether those metrics meet certain statistical standards.
Taking all of this into consideration, we decided to postpone the launch of Page Authority 2.0. This will give us the time needed to resolve these primary issues and produce a stellar metric. Frustrating? Yes, but also necessary.
As always, we thank you for your patience, and we look forward to producing the best Page Authority metric we have ever released.