Image SEO: optimizing images using machine learning

In this article, I will share my findings while attempting to use neural networks to describe the content of images. Images greatly contribute to a website’s SEO and improve the overall user experience. Fully optimizing images is about helping users, and search engines, better understand the content of an article.

The SEO community has always been keen to recommend that publishers invest in visual elements, and this has become even more important in 2019 as Google keeps revamping Google Image Search with new filters and new functionality.

Google’s Image Search user interface

There are several aspects that Google mentions in its list of best practices for images, but the work I’ve been focusing on for this article is about providing alt text and captions in a semi-automated way. Alt text and captions, in general, improve accessibility for people who use screen readers or have limited connectivity, and they help search engines understand what an article is about.

“Google Images and Video search is often overlooked, but they have massive potential.”

“We simply know that media search is way too ignored for what it’s capable doing for publishers so we’re throwing more engineers at it as well as more outreach.”

– Gary Illyes, Google’s Chief of Sunshine and Happiness & trends analyst

Let’s start with the basics of image SEO with this historical video from Matt Cutts who, back in 2007, explained to webmasters worldwide the importance of descriptive alt text in images.

Agentive SEO: AI that works for webmasters…sort of

The work we do at WordLift with our partner WooRank aims at building agentive technologies for digital marketers. I had the pleasure of meeting Christopher Noessel in San Francisco and learned from him the principles of agentive technology (Chris has written a terrific book called Designing Agentive Technologies that I recommend reading). One of the most important aspects of designing agentive tech is to focus on efficient workflows that augment human intelligence with the power of machines, taking into account the strengths and the limitations of today’s AI.

Courtney McGhee


The workflow to enrich image metadata in WordPress

In this experiment we proceed as follows:

  1. we start by downloading the XML export feed for media files using the WordPress Export tool
  2. we send a request to the Microsoft Vision APIs
  3. we store the results in a CSV file that we can later use to check and validate the outcome of the analysis with Google Sheets (or Excel) using the power of our natural intelligence
  4. we add the descriptions back to the CMS with an importer (I haven’t developed this part yet, but there are already plugins that import data stored in CSV files into the WordPress database).
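As a rough illustration of step 1, here is a minimal sketch (not the script published on GitHub) of how the media attachments could be pulled out of a WordPress export file with Python's standard library. The function name `extract_attachments`, the sample data and the namespace version (1.2) are assumptions made for this example; check the `xmlns:wp` attribute at the top of your own export.

```python
import xml.etree.ElementTree as ET

# Namespace used by WordPress export (WXR) files. The version suffix (1.2)
# is an assumption and may differ in your export file.
WP_NS = {"wp": "http://wordpress.org/export/1.2/"}


def extract_attachments(wxr_xml):
    """Return a list of {"title", "url"} dicts, one per media attachment."""
    root = ET.fromstring(wxr_xml)
    attachments = []
    for item in root.iter("item"):
        post_type = item.findtext("wp:post_type", default="", namespaces=WP_NS)
        if post_type != "attachment":
            continue  # skip posts, pages and other content types
        url = item.findtext("wp:attachment_url", default="", namespaces=WP_NS)
        if url:
            attachments.append(
                {"title": item.findtext("title", default=""), "url": url.strip()}
            )
    return attachments


# Tiny hand-written sample that mimics the structure of an export file.
SAMPLE = """<rss version="2.0" xmlns:wp="http://wordpress.org/export/1.2/">
  <channel>
    <item>
      <title>giraffes</title>
      <wp:post_type>attachment</wp:post_type>
      <wp:attachment_url>https://example.com/wp-content/uploads/giraffes.jpg</wp:attachment_url>
    </item>
    <item>
      <title>Hello world</title>
      <wp:post_type>post</wp:post_type>
    </item>
  </channel>
</rss>"""

print(extract_attachments(SAMPLE))
```

Only items whose post type is `attachment` are kept, so regular posts and pages in the same export are ignored.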

Purely relying on machines is not really an option to improve your image SEO and I will show you why. Nevertheless, a strong-willed editor with the code described in this article can curate hundreds of images in a few hours.

Keep on reading if you are interested in ML experiments, or simply jump to the end of the article to get the code I finally used to enrich the media library of one of the clients of our SEO managed services.

Get comfortable with experiments

Machine learning requires a new mindset, quite different from the one we have in traditional programming. You tend to write less code and to focus most of your attention on the data used for training the model but, in the end, will the model you are building be usable in a real-world environment? Can you really rely on it to improve your search rankings? Hard to say from the start.

The advantages of setting up your own pipeline for training an ML model are obvious – especially if, like us, you are building a product that thousands of people will use:

  • You are totally independent of external providers (this usually means you keep control of the costs)
  • You can fine-tune the data as well as the model for the needs of your users   

Armed with passion and enthusiasm, I set up a model for image captioning roughly following the architecture outlined in the article “Automatic Image Captioning using Deep Learning (CNN and LSTM) in PyTorch”, which is based on the results published in the “Show and Tell: A Neural Image Caption Generator” paper by Vinyals et al., 2014.

The implementation is based on a combination of two different networks:

  • A pre-trained ResNet-152 model that acts as an encoder. It transforms the image into a vector of features that is sent to the decoder.
  • A decoder that uses an LSTM network (LSTM stands for long short-term memory, a type of recurrent neural network) to compose the phrase that describes the feature vector received from the encoder. LSTMs, I learned along the way, are used by Google and Alexa for speech recognition; Google also uses them in the Google Assistant and in Google Translate.

One of the main datasets used for training image captioning models is called COCO. It is made of a vast number of images, and each image has 5 different captions that describe it.

I quickly realized that training the model on my laptop would have required almost 17 days non-stop with the CPU running at full throttle. I had to be realistic, so I downloaded the pre-trained model that was available.

RNNs are certainly not hardware friendly and use an enormous amount of resources for training.

Needless to say, I remained speechless as soon as everything was in place and I was ready to make the model talk for the first time. With the image below as input, the result was encouraging.

Unfortunately, as I moved forward with the experiments and went from the giraffes to a more mundane scene (the team in the office), the results were bizarre, to use a euphemism, and far from being usable in our competitive SEO landscape.

Don’t settle for less than the best model

As I kept experimenting with different images, while happy that I was now able to fully control all the parameters, I had to accept that this implementation of the Show and Tell paper was not good enough for our users. Great for generative poetry perhaps, but no good for SEO.

While I am still evaluating new alternatives (there is a very promising attention model implementation in TensorFlow that I would love to test), I had to focus on what the industry considers state-of-the-art for this specific task: the Microsoft Vision API. You can try it directly online on the http://captionbot.com website, and you will see that the results are significantly different from those of my homebrewed image captioning model in PyTorch.

Microsoft wisely offers a freemium model, and you have up to 5,000 API calls per month to get started without opening your wallet.

Fasten your seatbelts and run the analysis

In order to optimize the description of images for anyone running WordPress, I prepared a script in Python that uses the Microsoft Computer Vision API and that you can find on GitHub.

You will need an API key from Microsoft and the export of your WordPress Media Library in XML that can be generated using the WordPress Export Tool.
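To give an idea of what a single call involves, here is a minimal sketch using only Python's standard library. It is not the script from the GitHub repository: the function names are made up for this example, and the region (`westeurope`) and API version (`v2.0`) in the endpoint are assumptions that depend on your own Azure subscription.

```python
import json
import urllib.request

# Assumed endpoint: the region and the API version depend on your Azure
# subscription; use the values shown next to your API key.
ENDPOINT = "https://westeurope.api.cognitive.microsoft.com/vision/v2.0/describe"


def build_request(image_url, api_key, endpoint=ENDPOINT):
    """Prepare the POST request that asks the API to describe one image."""
    body = json.dumps({"url": image_url}).encode("utf-8")
    headers = {
        "Ocp-Apim-Subscription-Key": api_key,
        "Content-Type": "application/json",
    }
    return urllib.request.Request(endpoint, data=body, headers=headers, method="POST")


def best_caption(response_json):
    """Return (text, confidence) for the most confident caption, if any."""
    captions = response_json.get("description", {}).get("captions", [])
    if not captions:
        return "", 0.0
    top = max(captions, key=lambda c: c.get("confidence", 0.0))
    return top.get("text", ""), top.get("confidence", 0.0)


def describe_image(image_url, api_key):
    """Call the API (needs network access and a valid key)."""
    request = build_request(image_url, api_key)
    with urllib.request.urlopen(request, timeout=30) as response:
        return best_caption(json.load(response))
```

Keeping the request building and the response parsing in separate functions makes it possible to check both without spending any of the free API calls.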

The result of running the script is a CSV file that contains the URL of the image, the title of the image, the proposed description of the image and a confidence score. This confidence score is very useful to quickly filter the results and to focus your attention where it is needed the most (as you can see from the image below, there is a big difference between the first image, which has a score of 0.5, and the image right after it, which has a score of 0.8).
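The review step can be made a little easier by sorting the rows so that the least confident descriptions come first. The sketch below shows one way to do it; the column names, the `needs_review` flag and the 0.7 threshold are hypothetical choices for this example, not the format produced by the script on GitHub.

```python
import csv
import io

# Hypothetical column names and threshold, chosen for illustration only.
FIELDS = ["url", "title", "description", "confidence"]
REVIEW_BELOW = 0.7


def write_report(rows, out_file):
    """Write rows as CSV, lowest confidence first, with a needs_review flag."""
    writer = csv.DictWriter(out_file, fieldnames=FIELDS + ["needs_review"])
    writer.writeheader()
    for row in sorted(rows, key=lambda r: float(r["confidence"])):
        needs_review = "yes" if float(row["confidence"]) < REVIEW_BELOW else "no"
        writer.writerow(dict(row, needs_review=needs_review))


# Example with two made-up rows, written to an in-memory buffer.
rows = [
    {"url": "https://example.com/a.jpg", "title": "a",
     "description": "a giraffe standing in a field", "confidence": 0.8},
    {"url": "https://example.com/b.jpg", "title": "b",
     "description": "a group of people in a room", "confidence": 0.5},
]
buffer = io.StringIO()
write_report(rows, buffer)
print(buffer.getvalue())
```

An editor can then start from the top of the sheet, where the descriptions most likely to be wrong are grouped together.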

Once the data is validated by an editor using Excel or Google Sheets, it can be imported back into WordPress using any plugin that imports CSV files into the database, or with a custom script (which I still need to write).

Follow the instructions on GitHub or write me an email if you are interested in doing image SEO with the help of machine learning. The code is far from perfect and has only been tested on a couple of websites (please use it at your own risk).


Experimenting in ML is essential in today’s SEO automation workflows. A great wealth of resources including pre-trained machine learning models are available and can encode knowledge to help us in SEO tasks.

While the state-of-the-art neural network from Microsoft still interprets a young Bill Slawski (alongside an even younger Neil Patel) as … yes, a woman, with a proper workflow you can still get very useful results to scale up your SEO productivity for image tagging.

Bill Slawski and Neil Patel

In the coming weeks, we will keep on testing this approach and, hopefully, measure some positive impact in terms of organic traffic (this blog post is still very much a work in progress). It is also worth testing new ML networks that take advantage of hierarchical neural attention; these new approaches are superseding models based on RNN / LSTM (here is a good article on the topic).

Keep following us for more insights on SEO, or sign up for a free trial and get the full AI SEO experience.

Meet Doreid, our new SEO Expert!

WordLift is happy to announce a new member of the team – Doreid Haddad!

Doreid is from Syria and moved to Rome in 2014.

Quick Facts

  • Name: Doreid Haddad
  • Age: 29
  • Position at WordLift: SEO Expert
  • Spoken Languages: Arabic (Native Speaker), Italian (C2), English (C1).
  • Bio: An SEO Expert and Digital Marketing Specialist based in Rome. His expertise includes Digital Marketing, Search Engine Optimization, Search Engine Marketing, Keywords Research, and Conversion Rate Optimization. He can’t say no to pizza.


Let’s Get to Know Doreid

  • What’s your Superpower? Analysis and numbers, studying the main web metrics, keyword research and discovery, data analysis, competitor analysis, and content optimization to get results and managing the development process.
  • Where have you lived? Where did you grow up? I was born and grew up in Syria, then I moved to Lebanon, where I spent some time before settling in Italy in 2013. I worked in the hospitality and tourism sector, moving from hotels in my country to the Royal Group of Rome and finally to Marriott International, along with the digital marketing sector.
  • What do you like to do in your free time? Football, computer, TV & traveling.
  • If you could describe yourself with an app, what would it be and why? The Google Ads app, which keeps campaigns running smoothly no matter where your business takes you, because I am results-oriented: I constantly check in on the goal to determine how close or how far away we are and what it will take to make it happen.
  • If you could be in the movie of your choice, what movie would you choose and what character would you play? La Casa de Papel “Money Heist”, I think I would be a perfect fit for the role of “The Professor”.
  • 3 things you love the most about being a Wordlifter: Working with a highly skilled, passionate and well-organized team. Doing SEO in all the languages I speak for WordLift’s international clients. The variety: it is always changing and evolving, and I enjoy watching a creative idea grow into a successful business.


The Constant Evolution of Voice Search

The Future of Voice Search

Technology is all around us, and there is no escaping it. The best thing is that it is continually evolving, with technological breakthroughs seen almost on a daily basis. One of the technologies that is getting better every day and making our lives easier is voice search.

When Did It All Begin?

More than half a century ago, IBM introduced the IBM Shoebox, which was the first speech recognition tool. The father of voice recognition devices was able to recognize 16 words and the digits from 0 to 9. As you will see in the infographic below by SEOTribunal, voice recognition technology has come a long way since its beginnings. Mostly implemented by mobile device manufacturers, today’s voice technology gives users the ability to do online searches, find information about products, ask questions, ask for directions or for the weather forecast, and many other things just by talking to a device.

Evolution of voice search

Evolution of voice search 2

How Does Voice Search Work?

First of all, a voice search system processes and transcribes human speech into text, then analyzes that text in order to detect questions and commands. After that, it connects to external data sources such as search engines to find the relevant information and translates that information into a digestible format to fulfill the user’s intent.

What Are The Best Voice Search Engines?

There is a continuous battle between big players to make the best voice search engine. This is good news as voice search assistants are becoming more and more sophisticated, thus making our lives easier. Let’s take a look at the top brands manufacturing these devices.

  • Google Assistant is powered by AI and is primarily available on mobile and smart home devices. It was launched in 2016, and the thing which makes it different from its predecessors is that it can engage in two-way conversations.
  • Microsoft’s Cortana was released in April 2014. Available in multiple languages, it can set reminders, recognize natural speech, and answer questions using information found on Bing.
  • Amazon’s Echo has many capabilities, such as voice interaction, music playback, creating to-do lists, and streaming podcasts, to name just a few. The best thing about it is that it can be extended by installing other functions which are developed by third-party vendors.
  • Samsung’s Bixby is a voice-powered digital assistant introduced in 2017. It is a major reboot of S Voice. Aside from being used on smartphones and other mobile devices, Bixby is included in Samsung’s Family Hub 2.0 refrigerators, which are the first non-mobile products to include the assistant.
  • Apple’s Siri is part of Apple’s iOS, watchOS, macOS, and tvOS operating systems, as well as the HomePod. It uses voice queries and a natural-language user interface to perform actions such as answering questions, checking for information, navigating, and many other things.

What Lies Ahead?

As of January 2018, around 1 billion voice searches were made per month, and in the next couple of years, 50% of searches are expected to be made using voice-enabled technology. It is also predicted that over the next few years the voice recognition market is going to experience huge growth, reaching an estimated $601 million in 2019 alone.


We can definitely say that the future of voice-enabled technology is bright. The constant need for improvement is the driving force that pushes companies to produce the best voice assistants out there. Both the companies and their customers reap the benefits.


As the Marketing Manager at SEO Tribunal, part of Tina’s daily engagements involves raising awareness of the importance of digital marketing when it comes to the success of small businesses. Since her first step on this journey was in the field of content marketing, she still uses every opportunity she gets to put her thoughts into educational articles.

Hristina Nikolovska

Marketing Manager at SeoTribunal.com, Seo Tribunal
