We constantly work for content-rich websites where sometimes hundreds of new articles are published on a daily basis. Analyzing traffic trends on these large properties and creating actionable reports is still time-consuming and inefficient. This is also very true for businesses investing in content marketing that need to dissect their traffic and evaluate their marketing efforts against concrete business goals (e.g. increasing subscriptions, improving e-commerce sales and so on).
As a result of this experience, I am happy to share with you a Google Data Studio report that you can copy and personalize for your own needs.
Data is meant to help transform organizations by providing them with answers to pressing business questions and uncovering previously unseen trends. This is particularly true when your biggest asset is the content that you produce.
With the ongoing growth of digitized data and the explosion of web metrics, organizations usually face two challenges:
Finding what is truly relevant to tap a new business opportunity.
Making it simpler for the business user to prepare and share the data, without being a data scientist.
Semantic Web Analytics is about delivering on these promises: empowering business users and letting them uncover new insights from the analysis of their website traffic.
We are super lucky to have a community of fantastic clients that help us shape our product and keep pushing us ahead of the curve.
Before enabling this feature, both the team at Salzburgerland Tourismus and the team at TheNextWeb had already improved their Google Analytics tracking code to store entity data as events. This allowed us to experiment, ahead of time, with this functionality before making it available to all other subscribers.
What is Semantic Web Analytics?
Semantic Web Analytics is the use of named entities and linked vocabularies such as schema.org to analyze the traffic of a website.
The natural language processing that WordLift uses to mark up the content with linked entities enables us to classify articles and pages in Google Analytics with real-world objects, events, situations or even abstract concepts.
How to activate Semantic Web Analytics?
Starting with WordLift 3.20, entities annotated in webpages can also be sent to Google Analytics by enabling the feature in WordLift’s Settings panel.
Here is how this feature can be enabled.
You can also define the dimensions in Google Analytics that store entity data. This is particularly useful if you are already using custom dimensions.
As soon as the data starts flowing you will see a new category under Behaviour > Events in your Google Analytics.
Events in Google Analytics about named entities.
WordLift will trigger an event labeled with the title of the entity every time a page containing an annotation with that entity is opened.
Using these new events we can look at how content is consumed not only in terms of URLs and site categories but also in terms of entities. Moreover, we can investigate how articles are connected with entities and how entities are connected with articles.
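To make this two-way view concrete, here is a minimal Python sketch of how entity events could be rolled up once exported from Google Analytics. It is only an illustration: the rows, the field names (entity, page, hits) and the sample values are hypothetical, and this is neither WordLift's implementation nor the actual Google Analytics export format.

```python
from collections import defaultdict

# Hypothetical sample of exported event rows: one row per (entity, page) pair
# with the number of times the entity event fired. Field names are assumptions.
events = [
    {"entity": "Structured data", "page": "/blog/schema-markup", "hits": 120},
    {"entity": "Structured data", "page": "/blog/knowledge-graph", "hits": 80},
    {"entity": "Knowledge Graph", "page": "/blog/knowledge-graph", "hits": 80},
    {"entity": "Google Analytics", "page": "/blog/semantic-analytics", "hits": 45},
]


def hits_by_entity(rows):
    """Sum event hits per entity across all pages that mention it."""
    totals = defaultdict(int)
    for row in rows:
        totals[row["entity"]] += row["hits"]
    return dict(totals)


def entities_by_page(rows):
    """Map each page to the sorted list of entities annotated on it."""
    pages = defaultdict(set)
    for row in rows:
        pages[row["page"]].add(row["entity"])
    return {page: sorted(names) for page, names in pages.items()}


if __name__ == "__main__":
    # Entities ranked by the traffic they are associated with.
    ranked = sorted(hits_by_entity(events).items(), key=lambda kv: -kv[1])
    for entity, hits in ranked:
        print(f"{entity}: {hits}")
    # Entities connected to a single article.
    print(entities_by_page(events)["/blog/knowledge-graph"])
```

The first function answers "which concepts drive traffic", the second answers "which concepts is this article about"; both read the same event rows.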
Show me how this can impact my business
Making sense of data for a business user is about unlocking its power with interactive dashboards and beautiful reports. To inspire our clients, and once again with the help of online marketing ninjas like Martin Reichhart and Rainer Edlinger from Salzburgerland, we have built a dashboard using Google Data Studio – a free tool that helps you create comprehensive reports using data from different sources.
Using this dashboard we can immediately see, for each section of the website, which concepts are driving the traffic, which articles are associated with these concepts and where the traffic is coming from.
An overview of the entities that drive the traffic on our website.
Entities associated with an article about structured data.
This helps publishers and business owners analyze the value behind a given topic. It can be valuable for analyzing the behaviors and interests of a specific user group. For example, on travel websites, we can immediately see which topics are most relevant for, say, Italian-speaking and German-speaking travelers.
WordLift’s clients in the news and media sector are also using this data to build new relationships with advertisers and affiliated businesses. They can finally bring to meetings the exact traffic volumes they have for, say, content that mentions a specific product or a category of products. This helps them calculate in advance how this traffic can be monetized.
Are you ready to make sense of your Google Analytics data? Contact us and let’s get started!
Here is the recipe for a Semantic Web Analytics dashboard in Google Data Studio
With unlimited, free reports, it’s time to start playing immediately with Data Studio and entity data and see if and how it meets your organization’s needs.
To help with that, you can use the report I have just created as a starting point. Create your own interactive report and share it with colleagues and partners (even if they don’t have direct access to your Google Analytics).
Simply take this report, make a copy, and replace the data with your own!
1. Make a Copy of this file
Go to the File menu and click to make a copy of the report. If you have never used Data Studio before, click to accept the terms and conditions, and then redo this step.
2. Do Not Request Access
Click “Maybe Later” when Data Studio warns you that data sources are not attached. If you click “Resolve” by mistake, do not click to request access – instead, click “Done”.
3. Switch Edit Toggle On
Make sure the “Edit” toggle is switched on. Click the text link to view the current page settings. The GA Demo Account data will appear as an “Unknown” data source there.
4. Create A New Data Source
If you have not created any data sources yet, you’ll see only sample data under “Available Data Sources” – in that case, scroll down and click “Create New Data Source” to add your own GA data to the available list.
5. Select Your Google Analytics View
Choose the Google Analytics connector, and authorize access if you aren’t signed in to GA already. Then select your desired GA account, property, and the view from each column.
6. Connect to Your GA Data
Name your data source (at the top left), or let it default to the name of the GA view. Click the blue “Connect” button at the top right.
Are you ready to build your first Semantic Dashboard? Add me on LinkedIn and let’s get started!
We had the opportunity to interview Bill Slawski, Director of SEO Research at Go Fish Digital, Creator and Author of SEO by the Sea. Bill Slawski is among the most authoritative people in the SEO community, a hybrid between an academic researcher and a practitioner. He has been looking at how search engines work since 1996. With Andrea Volpini we took the chance to ask Bill a few questions to understand how SEO is evolving and why you should understand the current picture, to keep implementing a successful SEO strategy!
When did you start with SEO?
Bill Slawski: I started doing SEO in 1996. I also made my first site in 1996. A relative of one of the people I worked with on that site was selling computers for Digital Equipment Corp at that time. She sent us an email saying, “Hey, we just started this new website. You guys might like it.” It was the time in which AltaVista was a primary search engine. This was my first chance to see a search engine in action. My client said, “We need to be in this.” I tried to figure out how, and that was my first attempt at doing SEO!
After the launch of Google Discover, it seems that we live in a query-less world? How has SEO changed?
Bill Slawski: It has changed, but it hasn’t changed that much. I remember in 2007 giving a presentation in an SEO meetup on named entities. Things have been in the atmosphere. We just haven’t really brought them to the forefront and talked about them too much. Query-less searches example? You’re driving down the road 50 miles an hour, you wave your phone around in the air and it’s a signal to your phone asking you where you’re going. “Give me navigation, what’s ahead of us? What’s the traffic like? Are there detours?” And your phone can tell you that. It can say there’s a five-minute delay up ahead. You really don’t need a query for that.
What do you need then, if you don’t need a query?
Bill Slawski: Well, for Google Now to show you search suggestions, it needs to have some idea of what your search history is like, what you’re interested in. In Google Now, you can feed it information about your interests, but it can also look at what you’ve searched for in the past, what you look like you have an interest in. If you want to see certain information about a certain sports team or a movie or a TV series, you search for those things and it knows you have an interest in them.
Andrea Volpini: It’s a context that gets built around the user. In one analysis that we ran for one of our VIP customers, looking at the data from the Google Search Console, I found it extremely interesting that Discover had reached 42% of organic clicks! You can actually see that this big bump is due to the fact that Google started to account for this data. This might scare a lot of people in the SEO industry: if we live in a query-less world, how do you optimize for it?
Can we do SEO in a query-less world?
Bill Slawski: They (SEO practitioners) should be happy about it. They should be excited about it.
Andrea Volpini: I was super excited. When I saw it, for me, it was like a revelation, because I have always put a lot of effort into creating data and metadata. Even before we arrived at structured data, it was always a very important aspect of the websites that we built. I used to build CMSs, so I was really into creating data. But I underestimated the impact of content recommendation through Google Discover when it comes to the traffic of a new website. Did you expect something like this?
Bill Slawski: If you watch how Google is tracking trends, entity search, and you can identify which things are entities by them having an entity type associated with them, something other than just search term, so you search for a baseball team or a football team and you see search term is one category associated with it, and the other category might be professional Chicago baseball team. The professional Chicago baseball team is the entity. Google’s tracking entities. What this means is when they identify interests that you may have, they may do that somewhat broadly, and they may show you as a searcher in Google Now in Discover things related to that. If you write about some things with some level of generalization that might fit some of the broader categories that match a lot, you’re gonna show up in some of those discovery things.
It’s like when Google used to show headers in search results, “Search news now,” or “Top news now,” and identified your site or something you wrote as a blog post as something that fits the top news now category. You didn’t apply to have that. You were a beneficiary of Google’s recommendation.
Andrea Volpini: Yes. When I saw this, I started to look a little bit at the data in the Google Search Console of this client and then another client and then another client again. What I found out by comparing these first sites is that Google tends not to overlap Google Search and Discover, meaning that if a page is getting traffic from Google Search, it might not be featured on Discover. The pages that are featured on Discover and are also on Google Search are high ranking. But I found extremely interesting the fact that pages that didn’t receive any organic traffic had been picked up by Google Discover, as if Google is trying to differentiate these channels.
Is this two-level search effect widening?
Bill Slawski: I think they’re trying to broaden, we might say, our experience. Give us things that we’re not necessarily searching for, but are related. There’s at least one AI program I’ve worked with where it looks at my Twitter stream and recommends stories for me based upon what I’ve been tweeting. I see Google taking a role like that: “These are some other things they might be interested in that they haven’t been searching for. Let me show them to them.”
There’s a brilliant Google contributor video about the Semantic Search Engine. The first few minutes, he starts off saying, “Okay, I had trouble deciding what to name this video. I thought about The Discover Search Engine. Then I thought about A Decision Search Engine and realized Bing had already taken that. A Smart Search Engine. Well, that’s obvious.”
But capturing what we’re interested in is something Google’s seeming to try to do more of with the related questions that people also ask. We’re seeing Google trying to keep us on search results pages, clicking through, question after question, seeing things that are related that we’re interested in. Probably tracking every click that we make as to what we might have some interest in. With one box results, the same type of thing. They’ll keep on showing us one box results if we keep on clicking on them. If we stop clicking on them, they’ll change those.
Andrea Volpini: Where are we going with all of this? How do you see the role of SEO changing? What would you recommend to an SEO who starts today? What should they become? You told us how you started in ’96 with someone asking you to be on AltaVista, and I remember AltaVista quite well. I also worked with AltaVista myself, and we started to use AltaVista for intranets.
What would you recommend to someone that starts SEO today?
Bill Slawski: I’m gonna go back to 2005 to a project I worked on then. It was for Baltimore.org. It was a visitor’s center of Baltimore, the conference center. They wanted people to visit the city and see it and see everything they had to offer. They were trying to rank well for terms like Baltimore bars and Baltimore sports. They got in their heads that they wanted to rank well for Baltimore black history. We tried to optimize a page for Baltimore black history. We put the words “Baltimore Black History” on the page a few times. There were too many other good sites which were talking about Baltimore’s black history. We were failing miserably to rank well for that phrase. I turned to a copywriter and I said, “There are great places in Baltimore to see that have something to do with this history. Let’s write about those. Let’s create a walking tour of the city. Let’s show people the famous black churches and black colleges and the nine-foot-tall statue of Billie Holiday, the six townhomes that Frederick Douglass bought in his 60s.
“He was an escaped slave at one point in time, came back to Baltimore as he got older and a lot richer and started buying properties and became a businessman. Let’s show people those places. Let’s tell them how to get there.”
We created a page that was a walking tour of Baltimore. After three months, it was the sixth most visited page on that site, a site of about 300 pages or so. That was really good. That was successful. It got people to actually visit the city of Baltimore. They wanted to see those things.
Aaron Bradley ran this series of tweets the other day where one of the things he said was, “Don’t get worried about the switch in search engines to entities. Entities are all around us. They surround us. They’re everywhere. They’re everything you can write about. They’re web pages. They’re people. They’re places.”
It’s true. If we stick with a search based on words, on matching words in documents to words in queries, we’re missing the opportunity to write about things, to identify attributes and properties associated with those things, to tell people about what’s in the world around us, and they’re gonna search for those things. That’s the move search engines are making: being able to understand that you’re talking about something in particular and return information about that thing.
Andrea Volpini: The new SEO should basically become a contextual writer, someone who intercepts intents and can create good content around them.
Is there something else in the profession of SEO in 2020?
Bill Slawski: One of the things I read about recently was something called entity extraction. A search engine being able to read a page, identify all the things that are on that page that are being written about, and all the contexts that surround those things, all the classes, all the … The example in the post I wrote was about a baseball player, Bryce Harper. Bryce Harper was a Washington National. Bryce Harper hits home runs. That’s the context. He’s hit so many home runs over his career. Having a search engine able to take facts on a page, understand them, make a collection of those facts, and compare them to what’s said on other pages about the same entities, so it can fact-check. It can do the fact-checking by itself. It doesn’t need some news organization to do that.
Andrea Volpini: Well, this is the reason why, when we started our project, my initial idea was to create a semantic editor to let people create linked data. I didn’t look at SEO as a potential market, but then I immediately realized that all the interest was indeed coming from the SEO community. For instance, we created your entity on the WordLift website. This means that when we annotate the content with our tool, we have this permanent linked data ID. In the beginning, I thought it was natural to have permanent linked data IDs, because this was the way that the semantic web worked. But then I suddenly realized there is a very strong SEO effect in doing that because Google is also crawling this RDF that I’m publishing.
I saw a few months back that it’s actually a different class of IP that Google uses for crawling this data.
Do you think that it still makes sense to publish your own linked data ID, or it’s okay to use other IDs? Do you see value in publishing data with your own systems?
Bill Slawski: Something I haven’t really thought about too much. But it’s worth considering. I’ve seen people publishing those. I’ve tried to put one of those together, and I asked myself, “Why am I doing this? Is there gonna be value to it? Is it gonna be worthwhile?” But when I put together my homepage, a page about me, I wanted to try it, see what it was capable of, to see what it might show in search engines for doing that. Some of it showed some of it didn’t. It was interesting to experiment with and try and see what the rest of the world is catching onto when you do create that stuff.
Andrea Volpini: But this is actually how the entity of Gennaro Cuofano was born in the Knowledge Graph. We started to add a lot of references telling Google, “Here is Gennaro, who is also the author of these books.” As soon as we injected this information into our Knowledge Graph and into the pages, for Google it was easier to collect the data and fact-check and say, “Okay, this is the guy that wrote the book and now works for this company,” and so on and so forth.
Gennaro Cuofano: And Google provided a Knowledge Panel with a complete description. Before, it was not showing up in search, or at least it showed just partial information. It felt like, by providing this kind of information, we allowed the search engine, actually Google, to have a better context and fact-check the information, which gave authority to the information that I provided.
Bill Slawski: Have you looked at Microsoft’s Concept Graph?
Andrea Volpini: Yes! It’s even more advanced. I found it more advanced in a way. It’s also very quick in getting the information in. We have a much easier experience when someone wants to be in Bing, because as soon as we publish such data, Bing gets it into the panel.
Bill Slawski: It surprised me because, for a while, stuff that Microsoft Research in Asia was doing was disappearing. They put together Probase and it stopped. Nothing happened for a couple of years. It’s been revived into the Microsoft Concept Graph, which is good to see. It’s good to see they did something with all that work.
Gennaro Cuofano: Plus, we don’t know how much integration there is between Bing and the LinkedIn APIs.
Andrea Volpini: It’s pretty strong! Probably the quickest entry into Satori, the Knowledge Graph of Microsoft, is now for a person to be on LinkedIn, because it seems they’re using this information.
In what other ways can we currently use structured data for SEO?
Bill Slawski: One of the things I would say to that is augmentation queries. I mentioned those in the presentation. Google will not only look at queries associated with pages about a particular person, place or thing, but it will also look at query log information and at structured data associated with the page, and it will run queries based upon those. It’s doing some machine learning to try to understand what else might be interesting about pages of yours. If these augmentation queries, the test queries that it runs about your page, tend to do as well as the original queries for your page in terms of people selecting things, people clicking on things, it might combine the augmentation query results with the original query results when it shows them to people for your page.
One of the new things in the latest version, Schema 3.5, is the “knows about” attribute. As I mentioned, with the knows about attribute, you could be a plumber, you could know about drain repair. Some searchers will search for plumbers, and while they expect to see information just about Los Angeles plumbers, they may see a result from a Los Angeles plumber that talks about drain repair. That may be exactly what they’re looking for. That may expand search results, expand something relevant to your site that you’ve identified as an area of expertise, which I think is interesting. I like that structured data is capable of a result like that.
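A minimal sketch of what Bill describes, written as a small Python helper that emits JSON-LD. The types and properties (Plumber, areaServed, knowsAbout) are schema.org terms; the business name, the city value and the helper function itself are hypothetical and only there for illustration.

```python
import json


def plumber_markup(name, city, topics):
    """Build schema.org JSON-LD for a plumber with knowsAbout topics."""
    return {
        "@context": "https://schema.org",
        "@type": "Plumber",
        "name": name,            # hypothetical business name
        "areaServed": city,      # plain-text place name
        "knowsAbout": list(topics),  # areas of expertise declared by the site owner
    }


markup = plumber_markup("Example Plumbing Co.", "Los Angeles", ["Drain repair"])
# The resulting JSON would typically be embedded in the page inside a
# script tag of type application/ld+json.
print(json.dumps(markup, indent=2))
```

Whether and how a search engine uses the declared expertise is up to the search engine; the markup only states it in a machine-readable way.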
What is your favorite new addition to Schema 3.5?
Bill Slawski: FAQ page!
On Schema.org there’s such a wide range. They’re gonna update that every month now. But just having things like bed type is good.
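For readers who have not seen the FAQ page markup yet, here is a minimal sketch in Python that builds it as JSON-LD. FAQPage, Question, Answer, mainEntity and acceptedAnswer are schema.org terms; the helper name and the sample question are assumptions made for this example.

```python
import json


def faq_page(pairs):
    """Build schema.org FAQPage JSON-LD from (question, answer) pairs."""
    return {
        "@context": "https://schema.org",
        "@type": "FAQPage",
        "mainEntity": [
            {
                "@type": "Question",
                "name": question,
                "acceptedAnswer": {"@type": "Answer", "text": answer},
            }
            for question, answer in pairs
        ],
    }


# Hypothetical sample content.
faq = faq_page([
    ("What is entity extraction?",
     "Identifying the things a page is written about."),
])
print(json.dumps(faq, indent=2))
```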
What do you think is the right balance when I add structured data to my pages between an over-complicated data structuring and simplicity?
Bill Slawski: I did SEO a few years ago for a site about an apartment complex. It was having trouble renting units. It was a four-page site, and it showed off its dog park really well. It didn’t show off things like the fact that if you took the elevator to the basement, you got let out to the DC metro where you could travel all throughout Washington DC, northern Virginia, and southern Maryland and visit all 31 Smithsonian, and a lot of other things that are underground, underneath that part of Virginia. It was right next to what’s called Pentagon City, which is the largest shopping mall in Virginia. It’s four stories tall, all underground. You can’t see it from the street. Adding structured data to your page to identify those is something you can do. It’s probably something you should include on the page itself.
Maybe you want to include more information on your pages about entities and include them in structured data, too, in a way that is really precise. You’re using the language defined in Schema, which subject matter experts describe as something people might want to know. It defines it well. It defines it easily.
What you’re saying is do what you do with your content with your data. If you put emphasis on an aspect content-wise, then you should also do the proper markup for it?
Bill Slawski: Right! With the apartment complex I was talking about, location sells. It gets people to decide, “This is where I want to live.” Tell them about the area around them. Put that on your page and put that in your data. Don’t show pictures of the dog park if you want to tell them what the area schools are like and what the community’s like, what businesses are around, what opportunities there are. You can go to the basement of this apartment complex and ride to the local baseball stadium or the local football stadium. You’re blocks away. DC traffic is a nightmare. If you ride the metro line everywhere, you’re much better off…
Andrea Volpini: That’s big. In real estate, we say that having a metro station close by always increases the value of the property by 30%. It is definitely relevant. Something that is relevant for the business should also be taken into consideration when structuring the page.
Is it worth also exploring Schema which is not yet officially used by Google?
Bill Slawski: You can anticipate things that never happen. That’s possible. But sometimes, anticipating things correctly can be a competitive advantage if they come to fruition. You mentioned real estate. Have you seen things like walkability scores being used on realty sites? The idea that somebody can give you a metric that lets you easily compare one location to another based on what you can do without a car, it’s a nice feature. Being able to find out data about a location could be really useful.
Andrea Volpini: This is why, getting back to the linked data ID, having a linked data ID for the articles and the entities that describe the article becomes relevant, because then you can query the data yourself, and then you can make an analysis of which neighborhood has the least amount of traffic, and see, “Okay, did I write about this neighborhood or not?” This is also one of the experiments that we do these days: we bring the entity data from the page into Google Analytics to help the editorial team think about what traffic entities are generating across multiple pages. Entities in a way can also be used internally for organizing things and for saying, “Yes, in this neighborhood, for instance, we have the least amount of criminality” or things like that. You can start cross-checking data, not only waiting for Google to use the data. You can also use the data yourself.
Is there any other aspect worth mentioning about how to use structured data for SEO?
Bill Slawski: Mike Blumenthal wrote an article based upon something I wrote about, the thing about entity extraction. He said, “Hotels are entities, and if you put information about hotels, about bookings, about locations, about amenities onto your pages so that people can find them, so people can identify those things, you’re making their experience searching for things richer and more …”
Andrea Volpini: We had a case where we did exactly this for a lodging business. We saw that as soon as we started to add amenities as structured data, and most importantly, as soon as we started to actually add geographic references to the places that these locations were in, we saw an increase, and not only in pure traffic terms. The traffic went up, but we also saw an interesting phenomenon of queries becoming broader. The site, before adding structured data to the hotels and to the lodging business, received traffic from very few keywords. As soon as we started to add the structured data, typing amenities and services, and also added the Schema action for booking, we saw that Google was bringing a lot more traffic on long tail keywords for a lot of different locations where this business had hotels but had not been visible on Google.
Bill Slawski: It wasn’t just matching names of locations on your pages to names of locations in queries, it was Google understanding where you were located-
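As a rough sketch of the kind of markup discussed in this exchange (amenities, geographic references and a booking action), the Python helper below builds JSON-LD for a hotel. Hotel, GeoCoordinates, LocationFeatureSpecification, ReserveAction and EntryPoint are schema.org terms; choosing ReserveAction for "the Schema action for booking" is an assumption, and the hotel name, coordinates and URL are hypothetical.

```python
import json


def hotel_markup(name, latitude, longitude, amenities, booking_url):
    """Build schema.org JSON-LD for a hotel with amenities, geo and a booking action."""
    return {
        "@context": "https://schema.org",
        "@type": "Hotel",
        "name": name,
        # Geographic reference for the place the hotel is in.
        "geo": {
            "@type": "GeoCoordinates",
            "latitude": latitude,
            "longitude": longitude,
        },
        # One typed feature per amenity or service.
        "amenityFeature": [
            {"@type": "LocationFeatureSpecification", "name": amenity, "value": True}
            for amenity in amenities
        ],
        # Assumption: ReserveAction stands in for the booking action mentioned above.
        "potentialAction": {
            "@type": "ReserveAction",
            "target": {"@type": "EntryPoint", "urlTemplate": booking_url},
        },
    }


# Hypothetical sample values.
hotel = hotel_markup(
    "Example Alpine Hotel", 47.8, 13.04,
    ["Free WiFi", "Sauna"],
    "https://example.com/book",
)
print(json.dumps(hotel, indent=2))
```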
What do you think Schema Actions are useful for?
Bill Slawski: There was a patent that came out a couple of years ago where Google said, “You can circle an entity on a mobile device and you can register actions associated with those entities.” Somebody got the idea right and the concept wrong. They were thinking about touchscreens instead of voice. They never really rewrote that so that it was voice activated, so you could register actions with spoken queries instead of these touch queries. But I like the idea. Alexa has the programs, being able to register actions with your entities is not too different from what existed in Google before. Think about how you would optimize a local search page where you would make sure your address was in a postal format so that it was more likely to be found and used. Of course, if you wanted people to drive to a location, you’d want to give them driving directions, and that’s something you can register an action for now, but it’s already in there. It feels like you’re helping Google implement things that it should be implementing anyway, or you’re likely to be.
Andrea Volpini: Of course. I think that’s a very beautiful point, that we’re doing something that we should do. We’re now doing it for Google, but that’s the way it should be done. I like it. I like it a lot.
How much do you think structured data’s gonna help for voice search?
Bill Slawski: I can see Schema not being necessary because of other things going on, like the entity extraction, where Google is trying to identify. But Google tends to do things in a redundant way. They tend to have two different channels to get the same thing done. If one gets something correct and the other doesn’t, it fails to, they still have it covered. I think Schema gives them that chance. It gives site owners a chance to include things that maybe Google might have missed. If Google captures stuff and they have an organization like Schema behind them, which isn’t the search engine, it’s a bunch of volunteers who are subject matter experts in a lot of places or play those on TV, some are really good at that. Some of them miss some things. If you are a member of the Schema community mailing list, the conversations that take place where people call people on things, like, “Wouldn’t you do this for this? Wouldn’t you do that? Why aren’t you doing this?” It’s interesting to read those conversations.
Andrea Volpini: Absolutely. I always enjoy the Schema mailing list because, as you said, you get a different perspective, with different subject matter experts who of course need to declare what their content is about. Yeah, I see Schema as a site map for data. Even though Google can crawl the information, it always values the fact that there is someone behind it who is curating the data and who might add something that Google might have missed, as you say, but also gives it a chance to check and say, “Okay, is this true or not?”
Bill Slawski: You want a scalable web. It does make sense to have editors curating what gets listed. That potentially is an issue with Wikipedia at some point in the future. There’s only so much human edited knowledge it’s gonna handle. When some event changes the world overnight and some facts about some important things change, you don’t want human editors trying to catch up as quickly as they can to get it correct. You want some automated way of having that information updated. Will we see that? We have organizations like DeepMind mining sites like the DailyMail and CNN. They chose those not necessarily because they’re the best sources of news, but because they’re structured in a way that makes it easy to find that.
What should SEOs be looking at right now? What do they need to be very careful about?
Bill Slawski: It would be not to be intimidated by the search engine grabbing content from web pages and publishing it in knowledge panels. Look for the opportunities when they’re there. Google is a business, and as a business, they base what they do on advertising. But they’re not trying to steal your business. They may take advantage of business models that maybe need to be a little more sophisticated than “how tall is Abraham Lincoln?” You could probably build something a little bit more robust than that as a business model. But if Google’s stealing your business model from you in what they publish on knowledge panels, you should work around its business model and not be intimidated by it. Consider how much of an opportunity it is potentially to have a channel where you’re being focused upon, located easily, by people who might value your services.
The shift from keyword search to a queryless way to get information has arrived
Google Discover is an AI-driven content recommendation tool included with the Google Search app. Here is what we learned from the data available in the Google Search Console.
Google introduced Discover in 2017 and it claims that there are already 800M active users consuming content using this new application. A few days ago Google added statistical data on the traffic generated by Discover to the Google Search Console. This is meant to help webmasters, and publishers in general, understand what content is ranking best on this new platform and how it might be different from the content ranking on Google Search.
What was very shocking for me to see, on some of the large websites we work for with our SEO management service, is that between 25% and 42% of the total number of organic clicks are already generated by this new recommendation tool. I did expect Discover to drive a significant amount of organic traffic but I totally underestimated its true potential.
A snapshot from GSC on a news and media site
In Google’s AI-first approach, organic traffic is no longer solely dependent on queries typed by users in the search bar.
This has a tremendous impact on content publishers, business owners and the SEO industry as a whole.
Machine learning is working behind the scenes to harvest data about users’ behaviors, to learn from this data and to suggest what is relevant for them at a specific point in time and space.
Let’s have a look at how Google explains how Discover works.
[…] We’ve taken our existing Knowledge Graph—which understands connections between people, places, things and facts about them—and added a new layer, called the Topic Layer, engineered to deeply understand a topic space and how interests can develop over time as familiarity and expertise grow. The Topic Layer is built by analyzing all the content that exists on the web for a given topic and develops hundreds and thousands of subtopics. For these subtopics, we can identify the most relevant articles and videos—the ones that have shown themselves to be evergreen and continually useful, as well as fresh content on the topic. We then look at patterns to understand how these subtopics relate to each other, so we can more intelligently surface the type of content you might want to explore next.
Embrace Semantics and publish data that can help machines be trained.
Once again, the data that we produce sustains and nurtures this entire process. Here is an overview of the contextual data, besides the Knowledge Graph and the Topic Layer, that Google uses to train the system:
This research is limited to the data gathered from three websites only. While the sample was small, a few patterns emerged:
Google tends to distribute content between Google Search and Google Discover (the highest overlap I found was 13.5% – these are pages that, since Discover data has been collected on GSC, have received traffic from both channels)
Pages in Discover do not have the highest engagement in terms of bounce rate or average time on page when compared to all other pages on a website. They are relevant for a specific intent and well-curated but I didn’t see any correlation with social metrics.
Traffic seems to work with a 48-hour or 72-hour spike, as already seen for the top stories.
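The overlap figure in the first finding can be reproduced from two lists of pages, one per channel. Here is a minimal sketch, assuming you have already exported the pages that received Discover traffic and the pages that received Search traffic from GSC; the URLs below are hypothetical and only illustrate the calculation.

```python
def channel_overlap(discover_pages, search_pages):
    """Return the share of Discover pages that also received Search traffic."""
    discover = set(discover_pages)
    if not discover:
        return 0.0
    both = discover & set(search_pages)
    return len(both) / len(discover)


# Hypothetical page lists, for illustration only.
discover = ["/a", "/b", "/c", "/d"]
search = ["/b", "/x", "/y"]

share = channel_overlap(discover, search)
print(f"{share:.1%} of Discover pages also received Search traffic")
```

The denominator here is the number of pages featured in Discover, which is one possible reading of "overlap"; dividing by the union of both sets would give a lower figure.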
To optimize your content for Google Discover, here is what you should do.
1. Make sure you have an entity in the Google Knowledge Graph or an account on Google My Business
Either your business, or product, is already in the Google Knowledge Graph or it is not. If it is not, there are no chances that the content you are writing about for your company or product will appear in Discover (unless this content is bound to other broader topics). I am able to read articles about WordLift in my Discover stream since WordLift has an entity in the Google Knowledge Graph. From the configuration screenshot above we can actually see there are indeed more entities when I search for “WordLift”:
one related to Google My Business (WordLift Software Company in Rome is the label we use on GMB),
one from the Google Knowledge Graph (WordLift Company)
one presumably about the product (without any tagline)
one about myself as CEO of the company
So, get into the graph and make sure to curate your presence on Google My Business. Very interestingly we can see the relationship between myself and WordLift is such that when looking for WordLift, Google shows also Andrea Volpini as a potential topic of interest.
In these examples, we see that from Google Search I can start following people who are already in the Google Knowledge Graph, and what the user experience in Discover looks like for content related to the entity WordLift.
2. Focus on high-quality content and a great user experience
It is also good to remember that quality, in terms of both the content you write (alignment with Google’s content quality policies) and the user experience on your website, is essential. A website that loads on a mobile connection in 10 seconds or more is not going to be featured in Discover. A clickbait article, with more ads than content, is not going to be featured in Discover. An article written by copying other websites and patently infringing copyright laws is not likely to be featured in Discover.
3. Be relevant and write content that truly helps people by responding to their specific information need
Recommendation tools like Discover only succeed when they are capable of enticing the user to click on the suggested content. To do so effectively they need to work with content designed to answer a specific request. Let’s see a few examples: “I am interested in SEO” (entity “Search Engine Optimization”), or “I want to learn more about business models” (entity “Business Model”).
The more we can match the intent of the user, in a specific context (or micro-moment if you like), the more we are likely to be chosen by a recommendation tool like Discover.
4. Always use an appealing hi-res image and a great title
Images play a very important role in Google’s card-based UI as well as in Discover. Whether you are presenting a cookie recipe or an article, the image you choose will be presented to the user and will play its role in enticing the click. Besides the editorial quality of the image, I also suggest you follow the AMP requirements for images (the smallest side of the featured image should be at least 1,200 px). Similarly, a good title, much like in the traditional SERP, is super helpful in driving clicks.
5. Organize your content semantically
Much like Google does, using tools like WordLift, you can organize content with semantic networks and entities. This allows you to: a) help Google (and other search engines) gather more data about “your” entities b) organize your content the same way Google does (and therefore measure its performance by looking at topics and not pages and keywords) c) train our own ML models to help you make better decisions for your business.
Let me give you a few examples. If I provide, let’s say, information about our company and the industry we work in using entities that Google can crawl, Google’s AI will be able to connect content related to our business with people interested in “startups”, “SEO” and “artificial intelligence”. Machine learning, as we usually say, is hungry for data, and semantically rich data is what platforms like Discover use to learn how to be relevant.
If I look at the traffic I generate on my website, not only in terms of pages and keywords but using entities (as we do with our new search rankings dashboard or the Google Analytics integration) I can quickly see what content is relevant for a given topic and improve it.
Use entities to analyze how your content is performing on organic search
Below is a list of pages we have annotated with the entity “Artificial Intelligence”. Are these pages relevant for someone interested in AI? Can we do a better job in helping these people learn more about this topic?
A few of the articles tagged with the entity “Artificial Intelligence” and their respective query
Learn more about Google Discover – Questions & Answers
Below is a list of questions that I have answered in the past few days, as data from Discover was made available in GSC. I hope you’ll find it useful too.
How does Discover work from the end-user perspective?
The suggestions in Discover are entity-based. Google groups content that it believes relevant using entities in its Knowledge Graph (e.g. “WordLift”, “Andrea Volpini”, “Business” or “Search Engine Optimization”). Entities are called topics. The content-based user filtering algorithm behind Discover can be configured from a menu in the application (“Customize Discover”) and fine-tuned over time by providing direct feedback on the recommended content in the form of “Yes, I want more of this” or “No, I am not interested”. Using Reinforcement Learning (a specific branch of Machine Learning) and Neural Matching (different ways of understanding what the content is about), the algorithm is capable of creating a personalized feed of information from the web. New topics can be followed by clicking on the “+” sign.
Topics are organized in a hierarchy of categories and subcategories (such as “Sport”, “Technology”). Read more here on how to customize Google Discover.
How can I access Discover?
On Android, in most devices, accessing Discover is as simple as swiping, from the home screen to the right.
Is Google Discover available only in the US?
No, Google Discover is already available worldwide and in multiple languages, and it is part of the core search experience on all Android devices and on any iOS device with the Google Search app installed. Discover is also available in Google Chrome.
Do I have to be on Google News to be featured in Discover?
No, Google Discover also uses content that is not published on Google News. It is more likely that a news site will appear on Google Discover due to the amount of content published every day and the different topics that a news site usually covers.
Is evergreen content eligible for Discover, or only freshly updated articles?
Evergreen content, that fits a specific information need, is as important as newsworthy content. I spotted an article from FourWeekMBA.com (Gennaro’s blog on business administration and management) that was published 9 months ago under the entity “business”.
Does a page need to rank high on Google Search to be featured in Discover?
Quite interestingly, on a news website where I analyzed the GSC data, only 13.5% of the pages featured in Discover had received traffic on Google Search. Pages that received traffic on both channels had a position on Google Search <=8.
Correlation of Google Discover Clicks and Google Search Position
How can I measure the impact of Discover from Google Analytics?
A simple way is to download the .csv file containing all the pages listed in the Discover report in GSC and create an advanced filter in Google Analytics under Behaviour > Site Content > All pages with the following combination of parameters:
Filtering all pages that have received traffic from Discover in Google Analytics
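One way to build such a filter is to turn the exported list of pages into a single regular expression and paste it into the advanced filter with a "Matching RegExp" condition on the Page dimension. The sketch below is only an illustration: the column name `Page` and the sample URLs are assumptions, so adjust them to match the file you actually downloaded.

```python
import csv
import io
import re
from urllib.parse import urlparse


def build_page_regex(csv_text, url_column="Page"):
    """Build one regex matching the paths of all URLs listed in a CSV export."""
    reader = csv.DictReader(io.StringIO(csv_text))
    paths = []
    for row in reader:
        # Google Analytics reports pages as paths, so drop scheme and host.
        path = urlparse(row[url_column]).path or "/"
        if path not in paths:
            paths.append(path)
    return "^(" + "|".join(re.escape(p) for p in paths) + ")$"


# Hypothetical export content; the column name "Page" is an assumption.
sample = (
    "Page,Clicks\n"
    "https://example.com/seo-guide/,120\n"
    "https://example.com/ai-news/,80\n"
)

pattern = build_page_regex(sample)
print(pattern)
```

For a long list of pages the resulting expression can become very large, so it may be more practical to split the pages into several smaller filters.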
Discover is yet another important step in the evolution of search engines into answer and discovery machines that help us sift through today’s content multiverse.
Keep following us, and give WordLift a spin with our free trial!
One of the most fascinating features of deep neural networks applied to NLP is that, provided with enough examples of human language, they can generate text and help us discover many of the subtle variations in meanings. In a recent blog post by Google research scientist Brian Strope and engineering director Ray Kurzweil we read:
“The content of language is deeply hierarchical, reflected in the structure of language itself, going from letters to words to phrases to sentences to paragraphs to sections to chapters to books to authors to libraries, etc.”
Following this hierarchical structure, new computational language models aim at simplifying the way we communicate and have silently entered our daily lives. From Gmail’s “Smart Reply” feature to the keyboard in our smartphones, recurrent neural networks and character-level and word-level prediction using LSTM (Long Short-Term Memory) have paved the way for a new generation of agentive applications.
From keyword research to keyword generation
As usual with my AI-powered SEO experiments, I started with a concrete use case. One of our strongest publishers in the tech sector was asking us for new, unexplored search intents to invest in with articles and how-to guides. Search marketers, copywriters and SEOs have spent the last 20 years scouting for the right keyword to connect with their audience. While there is a large number of tools available for doing keyword research, I thought: wouldn’t it be better if our client could have a smart auto-complete to generate any number of keywords in their semantic domain, instead of keyword data generated by us? The way a search intent (or query) can be generated, I also thought, is quite similar to the way a title could be suggested during the editing phase of an article. And titles (or SEO titles), with a trained language model that takes into account what people search, could help us find the audience we’re looking for in a simpler way.
What makes RNNs “more intelligent” when compared to feed-forward networks is that, rather than working on a fixed number of steps, they compute sequences of vectors. They process not only the current input but also everything that they have perceived previously in time.
This characteristic makes them particularly efficient in processing human language (a sequence of letters, words, sentences, and paragraphs) as well as music (a sequence of notes, measures, and phrases) or videos (a sequence of images).
Here above you can see the difference between a recurrent neural network and a feed-forward neural network. Basically, RNNs have a short-term memory that allows them to store the information processed at previous time steps. The hidden state is looped back as part of the input. LSTMs are an extension of RNNs whose goal is to “prolong” or “extend” this internal memory – hence allowing them to remember previous words, previous sentences or any other value from the beginning of a long sequence.
The LSTM cell where each gate works like a perceptron.
Imagine a long article where I explain at the beginning that I am Italian, and this information is then followed by, let’s say, another 2,000 words. An LSTM is designed in such a way that it can “recall” that piece of information while processing the last sentence of the article and use it to infer, for example, that I speak Italian. A common LSTM cell is made of an input gate, an output gate and a forget gate. The cell remembers values over a time interval and the three gates regulate the flow of information into and out of the cell, much like a mini neural network. In this way, LSTMs can overcome the vanishing gradient problem of traditional RNNs.
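To make the role of the three gates more tangible, here is a toy sketch of a single LSTM step with a hidden size of 1, written in plain Python. It follows the standard textbook formulation rather than any specific library, and the weights are hand-picked for illustration (they are not trained values).

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def lstm_step(x, h_prev, c_prev, w):
    """One step of a scalar LSTM cell; returns the new hidden and cell state."""
    f = sigmoid(w["wf"] * x + w["uf"] * h_prev + w["bf"])    # forget gate
    i = sigmoid(w["wi"] * x + w["ui"] * h_prev + w["bi"])    # input gate
    o = sigmoid(w["wo"] * x + w["uo"] * h_prev + w["bo"])    # output gate
    g = math.tanh(w["wg"] * x + w["ug"] * h_prev + w["bg"])  # candidate value
    c = f * c_prev + i * g        # keep part of the old memory, add new info
    h = o * math.tanh(c)          # expose part of the memory as output
    return h, c


def run(steps, weights, c_start=1.0):
    """Feed `steps` empty inputs and return the final cell state."""
    h, c = 0.0, c_start
    for _ in range(steps):
        h, c = lstm_step(0.0, h, c, weights)
    return c


zeros = dict.fromkeys(
    ["wf", "uf", "bf", "wi", "ui", "bi", "wo", "uo", "bo", "wg", "ug", "bg"], 0.0
)
# Forget gate almost fully open, input gate almost closed: the cell "remembers".
remembering = dict(zeros, bf=10.0, bi=-10.0)
# Forget gate at 0.5: the stored value is halved at every step.
forgetting = dict(zeros, bi=-10.0)

c_kept = run(100, remembering)
c_lost = run(100, forgetting)
print(c_kept, c_lost)
```

With the forget gate close to 1 the stored value survives 100 steps almost intact, while with a forget gate of 0.5 it decays to practically zero. A trained network learns these gate values from data instead of having them set by hand.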
If you want to learn more in-depth on the mathematics behind recurrent neural networks and LSTMs, go ahead and read this article by Christopher Olah.
Let’s get started: “Io sono un compleanno!”
After reading Andrej Karpathy’s blog post I found a terrific Python library called textgenrnn by Max Woolf. This library is developed on top of TensorFlow and makes it super easy to experiment with Recurrent Neural Networks for text generation.
Before looking at generating keywords for our client I decided to learn text generation and how to tune the hyperparameters in textgenrnn by doing a few experiments.
AI is interdisciplinary by definition: the goal of every project is to bridge the gap between computer science and human intelligence.
I started my tests by throwing into the process a large text file in English that I found on Peter Norvig’s website (https://norvig.com/big.txt) and I ended up, thanks to the help of Priscilla (a clever content writer collaborating with us), “resurrecting” David Foster Wallace with his monumental Infinite Jest (provided in Italian from Priscilla’s ebook library and spiced up with some of her random writings).
At the beginning of the training process – in a character by character configuration – you can see exactly what the network sees: a nonsensical sequence of characters that, a few epochs (training iteration cycles) later, will transform into proper words.
As I became more accustomed to the training process I was able to generate the following phrase:
“Io sono un compleanno. Io non voglio temere niente? Come no, ancora per Lenz.”
“I’m a birthday. I don’t want to fear anything? And, of course, still for Lenz.”
David Foster Wallace
Unquestionably a great piece of literature 😅 that gave me the confidence to move ahead in creating a smart keyword suggest tool for our tech magazine.
The dataset used to train the model
As soon as I was confident enough to get things working (this basically means being able to find a configuration that – with the given dataset – could produce a language model with a loss value equal to or below 1.0), I asked Doreid, our SEO expert, to work on WooRank’s API and to prepare a list of 100,000 search queries that could be relevant for the website.
To scale up the number we began by querying Wikidata to get a list of software for Windows that our readers might be interested to read about. As for any ML project, data is the most strategic asset. So while we want to be able to generate never-seen-before queries, we also want to train the machine on something that is unquestionably good from the start.
The best way to connect words to concepts is to define a context for these words. In our specific use case, the context is primarily represented by software applications that run on the Microsoft Windows operating system. We began by slicing the Wikidata graph with a simple query that provided us with a list of 3,780+ software apps that run on Windows and 470+ related software categories. By expanding this list of keywords and categories, Doreid came up with a CSV file containing the training dataset for our generator.
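A query of this kind can be sketched as follows. This is not the exact query we ran: it assumes that the Wikidata property P306 ("operating system") and the item Q1406 ("Microsoft Windows") are the right identifiers, and the expansion templates are hypothetical placeholders for the much richer process described above.

```python
import json
import urllib.parse
import urllib.request

ENDPOINT = "https://query.wikidata.org/sparql"

# Assumption: P306 = "operating system", Q1406 = "Microsoft Windows".
QUERY = """
SELECT DISTINCT ?app ?appLabel WHERE {
  ?app wdt:P306 wd:Q1406 .
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}
"""


def extract_labels(payload):
    """Pull the English labels out of a SPARQL JSON response."""
    return [b["appLabel"]["value"] for b in payload["results"]["bindings"]]


def fetch_labels(query=QUERY):
    """Run the query against the public endpoint (requires network access)."""
    url = ENDPOINT + "?" + urllib.parse.urlencode({"query": query, "format": "json"})
    request = urllib.request.Request(
        url, headers={"User-Agent": "keyword-dataset-sketch/0.1"}
    )
    with urllib.request.urlopen(request) as response:
        return extract_labels(json.load(response))


def expand(labels, templates):
    """Combine each software name with each (hypothetical) query template."""
    return [t.format(app=label.lower()) for label in labels for t in templates]


# Offline example with a hand-written response, for illustration only.
sample_payload = {
    "results": {"bindings": [{"appLabel": {"value": "Google Chrome"}}]}
}
templates = ["how to install {app}", "how to remove {app}"]
seed_queries = expand(extract_labels(sample_payload), templates)
print(seed_queries)
```

Calling `fetch_labels()` instead of using the sample payload would hit the live Wikidata endpoint; the seed queries can then be written to a CSV file, one query per row.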
The first rows in the training dataset.
After several iterations, I was able to define the top performing configuration by applying the values below. I moved from character-level to word-level and this greatly increased the speed of the training. As you can see I have 6 layers with 128 cells on each layer and I am running the training for 100 epochs. This is indeed limited, depending on the size of the dataset, by the fact that Google Colab after 4 hours of training stops the session (this is also a gentle reminder that it might be the right time to move from Google Colab to Cloud Datalab – the paid version in Google Cloud).
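As a rough sketch, the values mentioned in this paragraph could be passed to textgenrnn as shown below. Only the four values stated in the text (word-level, 6 layers, 128 cells, 100 epochs) come from the experiment; the file name, the model name and the `is_csv` flag are assumptions, and every other hyperparameter is left at the library default. The parameter names follow the textgenrnn README as I understand it, so please double-check them against the version you install.

```python
# Values stated in the text: word-level model, 6 layers, 128 cells per layer.
MODEL_CFG = {
    "word_level": True,
    "rnn_layers": 6,
    "rnn_size": 128,
}

TRAIN_CFG = {
    "num_epochs": 100,   # stated in the text
    "new_model": True,   # train from scratch instead of fine-tuning
    "is_csv": True,      # assumption: the dataset is a one-column CSV file
}


def train(csv_path, model_name="keyword_generator"):
    """Train a new model; `csv_path` and `model_name` are hypothetical names."""
    # Imported lazily so the configuration can be inspected without TensorFlow.
    from textgenrnn import textgenrnn

    textgen = textgenrnn(name=model_name)
    textgen.train_from_file(csv_path, **TRAIN_CFG, **MODEL_CFG)
    return textgen


print({**MODEL_CFG, **TRAIN_CFG})
```

In a notebook you would then call something like `train("queries.csv")` and keep an eye on the loss value printed at each epoch.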
Here we see the initial keywords being generated while training the model
Rock & Roll, the fun part
After a few hours of training, the model was ready to generate our never-seen-before search intents with a simple Python script containing the following lines.
Here are a few examples of generated queries:
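A minimal sketch of such a script is shown here. It assumes the `generate` method of textgenrnn with its `n`, `prefix`, `temperature` and `return_as_list` arguments; the file names passed to `load_model` are hypothetical, and the helper works with any object exposing a compatible `generate` method.

```python
def load_model(weights_path, vocab_path, config_path):
    """Load a trained textgenrnn model; the three paths are hypothetical."""
    # Imported lazily so the helper below can be used without TensorFlow.
    from textgenrnn import textgenrnn

    return textgenrnn(
        weights_path=weights_path,
        vocab_path=vocab_path,
        config_path=config_path,
    )


def generate_queries(model, prefixes, n_per_prefix=3, temperature=0.5):
    """Generate queries for each prefix and drop duplicates, keeping order."""
    queries = []
    for prefix in prefixes:
        queries.extend(
            model.generate(
                n=n_per_prefix,
                prefix=prefix,
                temperature=temperature,
                return_as_list=True,
            )
        )
    return list(dict.fromkeys(q.strip() for q in queries))
```

With a trained model you could then run, for example, `generate_queries(load_model("model_weights.hdf5", "model_vocab.json", "model_config.json"), ["how to", "where to find"])`.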
where to find google drive downloads
where to find my bookmarks on google chrome
how to change your turn on google chrome
how to remove invalid server certificate error in google chrome
how to delete a google account from chrome
how to remove google chrome from windows 8 mode
how to completely remove google chrome from windows 7
how do i remove google chrome from my laptop
You can play with temperatures to improve the creativity of the results or provide a prefix to indicate the first words of the keyword that you might have in mind and let the generator figure out the rest.
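To see why temperature affects creativity, consider this small sketch of the general technique: the probabilities predicted for the next word are rescaled before one is picked. I am assuming textgenrnn follows a similar scheme internally; the three probabilities are made up for illustration.

```python
import math


def apply_temperature(probs, temperature):
    """Rescale a probability distribution (all values must be > 0)."""
    logits = [math.log(p) / temperature for p in probs]
    top = max(logits)
    exps = [math.exp(value - top) for value in logits]  # numerically stable
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical next-word probabilities.
probs = [0.7, 0.2, 0.1]

cold = apply_temperature(probs, 0.2)   # conservative: favours the top word
hot = apply_temperature(probs, 1.5)    # creative: flattens the distribution

print([round(p, 3) for p in cold])
print([round(p, 3) for p in hot])
```

A low temperature makes the generator pick the most likely word almost every time, which gives safe but repetitive queries. A high temperature gives less likely words a real chance, which produces more surprising (and sometimes nonsensical) queries.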
Takeaways and future work
“Smart Reply” suggestions can be applied to keyword research work, and it is worth assessing the quality of these suggestions in a systematic way in terms of:
validity – is this meaningful or not? Does it make sense for a human?
relevance – is this query really hitting on the target audience the website has? Or is it off-topic? and
impact – is this keyword well-balanced in terms of competitiveness and volume considering the website we are working for?
The initial results are promising: all of the initial 200+ generated queries were different from the ones in the training set and, by increasing the temperature, we could explore new angles on an existing topic (e.g. “where is area 51 on google earth?”) or even evaluate new topics (e.g. “how to watch android photos in Dropbox” or “advertising plugin for google chrome”).
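Checking that generated queries are really new is easy to automate. The sketch below compares generated queries with the training set after a light normalization (lowercasing and collapsing whitespace, which is my own assumption of what counts as "the same query"); the sample data is hypothetical.

```python
def normalize(query):
    """Lowercase a query and collapse repeated whitespace."""
    return " ".join(query.lower().split())


def novelty(generated, training):
    """Return the generated queries not seen in training, and their share."""
    seen = {normalize(q) for q in training}
    unique = list(dict.fromkeys(normalize(q) for q in generated))
    novel = [q for q in unique if q not in seen]
    share = len(novel) / len(unique) if unique else 0.0
    return novel, share


# Hypothetical data, for illustration only.
training = ["how to install google chrome", "how to remove google chrome"]
generated = [
    "How to remove  Google Chrome",
    "where to find my bookmarks on google chrome",
    "where to find my bookmarks on google chrome",
]

novel, share = novelty(generated, training)
print(novel, f"{share:.0%}")
```

This only covers the first step. Validity, relevance and impact still need a human reviewer or additional data such as search volume.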
It would be simply terrific to implement – with a Generative Adversarial Network (or using Reinforcement Learning) – a way to help the generator produce only valuable keywords (keywords that – given the website – are valid, relevant and impactful in terms of competitiveness and reach). Once again, it is crucial to define the right mix of keywords we need to train our model (can we source them from a graph as we did in this case? shall we only use the top ranking keywords from our best competitors? Should we mainly focus on long tail, conversational queries and leave out the rest?).
One thing that emerged very clearly is that experiments like this one (combining LSTMs and data sourcing using public knowledge graphs such as Wikidata) are a great way to shed some light on how Google might be working on improving the evaluation of search queries using neural nets. What is now called “Neural Matching” might most probably be just a sexy PR expression but, behind the recently announced capability of analyzing long documents and evaluating search queries, it is fair to expect that Google is using RNN architectures, contextual word embeddings, and semantic similarity. As deep learning and AI in general become more accessible (frameworks are open source and there is healthy open knowledge sharing in the ML/DL community), it becomes evident that Google leads the industry with the amount of data they have access to and the computational resources they control.
This experiment would not have been possible without textgenrnn by Max Woolf and TensorFlow. I am also deeply thankful to all of our VIP clients engaging in our SEO management services, our terrific VIP team: Laura, Doreid, Nevine and everyone else constantly “lifting” our startup, Theodora Petkova for challenging my robotic mind 😅 and my beautiful family for sustaining my work.
While attending this year’s “Da Zero a SEO” conference, held in Bologna from February 15th to 17th, where we were invited to serve both as a sponsor and as speakers, our CEO Andrea Volpini found himself in a very funny and serendipitous situation.
While sitting in the front row of the audience, he noticed that the young man right next to him was using WordLift’s plug-in at that very same moment. What better opportunity to get a warm, live impression from one of our customers than to establish a true relationship, tweet about how fate had brought them together and then ask him a few questions?
The performance of the website managed by Giuseppe has had a real surge in engagement in the last few months thanks to the benefits of WordLift. So we decided to ask him a few questions about his experience and to illustrate the project he manages with the help of our semantic plug-in, and the results of their SEO campaign.
When was the website launched, and what kind of website is PaolaReghenzi.it?
The website was launched in July 2017 and focuses on issues related to technical and cultural topics of photography, graphics, and most recently, the history of art and photography. Paola usually takes care of the editorial side of the project while the rest of the activities are carried out by Giuseppe.
How long have you been using WordLift? How did you learn about our plugin?
I started using WordLift in mid-November 2018. After doing some research in the area of Semantic SEO and structured data, we found WordLift.
Was it easy for you to understand and use it? What did it do the most to help you with?
It was very intuitive to be able to use the WordPress plug-in thanks to its many features: recognition of possible entities within the various articles and the ability to mark pages and articles as entities. We received the best results on the history of photography, where the photographers have been marked as “person” entities, managing to make it clearer.
Did you see an increase in traffic since you started using WordLift?
Since there were few articles published in 2018, the site had little traffic but it was fairly consistent. It lived off of the revenues gained from articles published in 2017. In November, I decided to try WordLift, populating the “vocabulary” section with different entities. Without adding or changing any type of content, but only thanks to structured data and the creation of triples, we noticed a significant and steady increase in organic traffic as well as an increase in impressions detected by the Search Console on fairly competitive keywords. Initially, I had dedicated little time to the plug-in, but, thanks to the performance that was above my expectations, I was able to go deeper into the subject and give a real boost to the website, knowing I had a strong ally like WordLift.
WordLift has played a key role in the growth strategy of Paola Reghenzi’s website and is now used to enrich every piece of new content. Furthermore, the automated markup of structured data and WordLift’s continuous support has contributed to increasing the organic visibility on the site: +24% in just three months! This means that users have increased by around 16% during this period, thanks to the markup implemented on the website and pages. What a success!
Make Your Website Smarter with AI-Powered SEO: just focus on your content and let the AI grow the traffic on your website!