1 out of 5 Google Knowledge Graph Entries for Trending Keywords are Outdated

In May, Google rolled out what they are calling the “Knowledge Graph”, a collection of information culled from a number of outside sources that appears in the right frame of the search page for many queries.

[Image: Knowledge Graph panel for LeBron James]

A major informational source for Google’s Knowledge Graph is Wikipedia.  Wikipedia had already been shown to be highly visible in the SERPs before Knowledge Graph, appearing on page one for 6 out of 10 informational keywords.  With the launch of Knowledge Graph it is more prominent still, now appearing at the top of the SERP for many informational queries, so we were curious to see how in sync Knowledge Graph results are with the actual Wikipedia entries.

[Image: Wikipedia entry for LeBron James]

Specifically, given that Wikipedia is user edited and therefore a fluid, ongoing source of information, we wanted to know what percentage of Knowledge Graph entries differed from their most recent Wikipedia entry.  And when the two do differ, how far behind the most recent entry is Google?

We hypothesized that ‘active’ queries—those queries that have experienced a recent spike in search activity—would show a higher mismatch rate than ‘normal’ keywords since a spike in search volume would stem from recent real world events around the subject, resulting in both more frequent and more recent Wikipedia editing.

For example, LeBron James’s Wikipedia entry would recently have been edited to reflect his winning the NBA championship.  The mismatch and lag between Knowledge Graph and Wikipedia therefore directly affect users: a substantial lag means searchers who rely on Google’s Knowledge Graph will not see information that reflects recent events about the subject.

Two Distinct Groups Evaluated

To measure the mismatch rate between the Knowledge Graph and Wikipedia for trending and ‘normal’ keywords, we built two keyword lists of 50 queries each.  Although a sample size of several hundred would have been ideal, a portion of the analysis was fairly manual and therefore time-consuming.  To ensure uniformity in the analysis and to select samples that are likely to have both a Wikipedia and a Knowledge Graph entry, we focused on ‘people’ keywords.

For each query we compared the Knowledge Graph result on the SERP to its Wikipedia entry and noted whether it was or was not an exact match.

When they did not match, we measured the lag distribution of the mismatched queries by using WikiBlame to determine when the change occurred and, subsequently, the number of days the Knowledge Graph was behind.
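The two checks described above can be sketched in a few lines of Python. This is only an illustration, not the tooling used for the study: the function names, the whitespace normalization, and the sample text and dates are all assumptions, and the date of the Wikipedia change is assumed to have been looked up by hand (for example with WikiBlame).

```python
from datetime import date


def normalize(text: str) -> str:
    """Collapse runs of whitespace so layout differences are not counted as mismatches."""
    return " ".join(text.split())


def is_exact_match(kg_text: str, wiki_text: str) -> bool:
    """True when the Knowledge Graph text equals the Wikipedia text after normalization."""
    return normalize(kg_text) == normalize(wiki_text)


def lag_days(wiki_changed_on: date, observed_on: date) -> int:
    """Days between the Wikipedia edit and the day the stale Knowledge Graph text was seen."""
    return (observed_on - wiki_changed_on).days


# Hypothetical values for illustration only, not data from the study.
kg_text = "Example Person is an  American athlete."
wiki_text = "Example Person is an American athlete and 2012 champion."

matched = is_exact_match(kg_text, wiki_text)          # the texts differ
lag = lag_days(date(2012, 6, 22), date(2012, 6, 25))  # edit date vs. observation date
print(matched, lag)
```

Normalizing whitespace is a design choice made here so that line breaks in the search page do not register as content differences; a stricter or looser definition of "exact match" would change the mismatch counts.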

1 out of 5 High Activity Queries Do Not Match

Looking at the results, we see that our hypothesis seems to hold up.  High activity queries, whose Wikipedia entries are likely to change on a more frequent basis, are mismatched far more often than lower activity queries, with one out of five (20%) not matching compared to a 4% mismatch for low activity keywords.
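As a quick check on the arithmetic, the percentages above can be turned back into counts. With 50 queries per list, 20% corresponds to 10 mismatched queries and 4% to 2, assuming every query in each list was scored. The helper below is a hypothetical sketch, not part of the original analysis.

```python
def mismatch_rate(mismatched: int, total: int) -> float:
    """Fraction of queries whose Knowledge Graph text did not match Wikipedia."""
    if total <= 0:
        raise ValueError("total must be positive")
    return mismatched / total


LIST_SIZE = 50  # queries per list, as stated in the article

# Counts implied by the reported percentages (assumes all 50 queries were scored).
high_activity = mismatch_rate(10, LIST_SIZE)
low_activity = mismatch_rate(2, LIST_SIZE)

print(f"high activity: {high_activity:.0%}")  # prints "high activity: 20%"
print(f"low activity: {low_activity:.0%}")    # prints "low activity: 4%"
```

With lists this small, a single query moves the rate by two percentage points, which is worth keeping in mind when comparing the two groups.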

[Chart: Knowledge Graph vs. Wikipedia mismatch rate, high activity and low activity queries]

Half of Mismatched Queries Lag By Two Days or More

When we dig deeper into the size of the lag between Knowledge Graph and Wikipedia for mismatched queries, we see that half of the mismatched queries are two or more days behind.  This finding may surprise readers even more than the percentage of mismatches and may ultimately say something about Google’s Knowledge Graph infrastructure (e.g. the frequency with which they can refresh data from Wikipedia).
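The "half are two or more days behind" figure is a simple share over the per-query lags. A minimal sketch of that calculation follows; the function name and the lag values are made up for illustration and are not the study's data.

```python
def share_lagging_at_least(lags_in_days: list[int], threshold_days: int) -> float:
    """Fraction of mismatched queries whose lag is at least threshold_days."""
    if not lags_in_days:
        raise ValueError("need at least one mismatched query")
    behind = sum(1 for lag in lags_in_days if lag >= threshold_days)
    return behind / len(lags_in_days)


# Hypothetical lags (in days) for four mismatched queries, for illustration only.
hypothetical_lags = [1, 1, 2, 4]
share = share_lagging_at_least(hypothetical_lags, threshold_days=2)
print(f"{share:.0%} of mismatched queries lag by 2+ days")
```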

[Chart: Knowledge Graph lag time for mismatched queries]

Google Can Do Better

Our analysis of both low and high activity queries tells us that Google and Wikipedia are mismatched for a substantial ‘one out of five’ high activity queries.  And when they are mismatched, Google is behind by two or more days half the time.  The implication is that searchers may not be seeing the most relevant information for their query.  For some context, in our LeBron James example, this means his Knowledge Graph entry could have been without reference to his recent championship for up to four days.

While a real-time Wikipedia update may ultimately not be practical, if Google is indeed positioning Knowledge Graph as the future of search, we have to believe that they can do better than the 2-4 day lag many of their mismatched keywords currently reflect.

About Nathan Safran

Nathan is the Director of Research at Conductor and leads Conductor’s research and content team. Nathan is a monthly columnist at Search Engine Land and Search Engine Watch. Nathan’s research on digital marketing has been widely covered in both industry publications and mainstream media such as Techcrunch, Venture Beat and the Washington Post. Prior to joining Conductor, Nathan was an analyst at Forrester Research.

  • http://martinlevinson.com martin h levinson

    Very interesting article.
    fyi Google my name, Martin H. Levinson, and a Google Knowledge Graph pops up with Steven H. Levinson as the author of my six books. Everyday for the past four months I have sent online feedback to Google about the problem. I have also sent letters to Eric Schmidt and Larry Page asking them to have Google correct its false Knowledge Graph results. So far, no response.
