{"id":10291,"date":"2023-03-01T12:16:16","date_gmt":"2023-03-01T11:16:16","guid":{"rendered":"https:\/\/wordlift.io\/blog\/en\/?p=10291"},"modified":"2023-03-01T12:35:56","modified_gmt":"2023-03-01T11:35:56","slug":"image-seo-using-ai","status":"publish","type":"post","link":"https:\/\/wordlift.io\/blog\/en\/image-seo-using-ai\/","title":{"rendered":"Image SEO: optimizing images using machine learning"},"content":{"rendered":"<p>In this article, I will share my findings while evolving how we use neural networks to describe the content of images. Images greatly contribute to a website\u2019s SEO and improve the <a class=\"wl-entity-page-link\" title=\"UX\" href=\"https:\/\/wordlift.io\/blog\/en\/entity\/user-experience\/\" data-id=\"http:\/\/data.wordlift.io\/wl0216\/entity\/user_experience;http:\/\/yago-knowledge.org\/resource\/User_experience;http:\/\/dbpedia.org\/resource\/User_experience;http:\/\/no.dbpedia.org\/resource\/Brukeropplevelse;http:\/\/de.dbpedia.org\/resource\/User_Experience;http:\/\/en.dbpedia.org\/resource\/User_experience;http:\/\/it.dbpedia.org\/resource\/User_Experience;http:\/\/es.dbpedia.org\/resource\/Experiencia_de_usuario;http:\/\/et.dbpedia.org\/resource\/Kasutajakogemus;http:\/\/id.dbpedia.org\/resource\/Pengalaman_pengguna;http:\/\/pl.dbpedia.org\/resource\/User_experience;http:\/\/da.dbpedia.org\/resource\/Brugeroplevelse;http:\/\/data.wordlift.io\/wl0216\/entity\/user_experience\" >user experience<\/a>. Fully optimizing images is about helping users, and search engines, better understand an article&#8217;s content or the product&#8217;s characteristics.<br \/>The SEO community has always been keen on recommending publishers and shop owners invest in visual elements. 
This has become even more important in 2023, as Google announced that Lens is now used more than 10 billion times a month.<\/p>\n<blockquote>\n<p>With our next generation of AI-powered technology, we\u2019re making it more visual, natural and intuitive to explore information.<\/p>\n<p>Elizabeth Reid, VP of Search at Google<\/p>\n<\/blockquote>\n<p><strong>Table of contents:\u00a0<\/strong><\/p>\n<ol>\n<li>Google&#8217;s Image SEO best practices in 2023<\/li>\n<li>What is Agentive SEO<\/li>\n<li>How to enrich image alt text on your website using AI\n<ul>\n<li>Evaluating language-vision models\n<ul>\n<li>Introducing LAVIS (short for LAnguage-VISion)<\/li>\n<\/ul>\n<\/li>\n<\/ul>\n<\/li>\n<li>Running the workflow for automatic image captioning\n<ul>\n<li>Content moderation<\/li>\n<\/ul>\n<\/li>\n<li>Get comfortable with experiments<\/li>\n<li>Don\u2019t settle for less than the best model\n<ul>\n<li>Visual Question Answering (VQA)<\/li>\n<\/ul>\n<\/li>\n<li>Conclusions<\/li>\n<li>How I worked a few years ago\n<ul>\n<li>Report from the first experiments using CNN and LSTM<\/li>\n<\/ul>\n<\/li>\n<li>Last but not least: Image SEO Resolution<\/li>\n<\/ol>\n<figure id=\"attachment_10292\" aria-describedby=\"caption-attachment-10292\" style=\"width: 800px\" class=\"wp-caption aligncenter\"><img decoding=\"async\" class=\"size-full wp-image-10292\" src=\"https:\/\/wordlift.io\/blog\/en\/wp-content\/uploads\/sites\/3\/2019\/02\/google-image-search.png\" alt=\"Google\u2019s Image Search user interface \" width=\"800\" height=\"463\" srcset=\"https:\/\/wordlift.io\/blog\/en\/wp-content\/uploads\/sites\/3\/2019\/02\/google-image-search.png 800w, https:\/\/wordlift.io\/blog\/en\/wp-content\/uploads\/sites\/3\/2019\/02\/google-image-search-300x174.png 300w, https:\/\/wordlift.io\/blog\/en\/wp-content\/uploads\/sites\/3\/2019\/02\/google-image-search-768x444.png 768w, https:\/\/wordlift.io\/blog\/en\/wp-content\/uploads\/sites\/3\/2019\/02\/google-image-search-150x87.png 150w\" sizes=\"(max-width: 
800px) 100vw, 800px\" \/><figcaption id=\"caption-attachment-10292\" class=\"wp-caption-text\">Google\u2019s Image Search user interface<\/figcaption><\/figure>\n<p>There are several aspects that Google mentions in its list of <a href=\"https:\/\/developers.google.com\/search\/docs\/appearance\/google-images?hl=en&amp;visit_id=638127477625825740-2430221202&amp;rd=1\">best practices for images<\/a> that have been recently updated, but the work I\u2019ve been focusing on, for this article, is about <strong>providing alt text and captions<\/strong> in a <em>semi-automated way<\/em>. Alt text and captions, in general, improve accessibility for people who use screen readers or have limited connectivity and <strong>help search engines understand what the content of an article is about or what product we are trying to sell.<\/strong><\/p>\n<blockquote cite=\"https:\/\/www.reddit.com\/r\/TechSEO\/comments\/ao3fmk\/i_am_gary_illyes_googles_chief_of_sunshine_and\/\">\n<p>We simply know that media search is way too ignored for what it\u2019s capable doing for publishers so we\u2019re throwing more engineers at it as well as more outreach.<span style=\"font-size: revert\">\u00a0<\/span><\/p>\n<p><span style=\"font-size: revert\"><em>Gary Illyes, Google&#8217;s Chief of Sunshine and Happiness &amp; trends analyst<\/em><\/span><\/p>\n<\/blockquote>\n<p><span style=\"font-weight: 400\">Let\u2019s start with <strong>the basics of Image SEO<\/strong> and this historical video from Matt Cutts, who, back in 2007, explained to webmasters worldwide the importance of <\/span><b>descriptive alt text<\/b><span style=\"font-weight: 400\"><\/span> in images.<\/p>\n<p><iframe title=\"Matt Cutts discusses the alt attribute\" width=\"500\" height=\"375\" src=\"https:\/\/www.youtube.com\/embed\/3NbuDpB_BTc?start=3&#038;feature=oembed\" frameborder=\"0\" allow=\"accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share\" 
referrerpolicy=\"strict-origin-when-cross-origin\" allowfullscreen><\/iframe><\/p>\n<h2>Google&#8217;s Image SEO Best Practices In 2023<\/h2>\n<p>If you want to understand how images work on Google, I would suggest also watching John Mueller\u2019s latest video on SEO for Google Images.<\/p>\n<p><iframe title=\"SEO for Google Images\" width=\"500\" height=\"281\" src=\"https:\/\/www.youtube.com\/embed\/SfC27XgelgE?start=4&#038;feature=oembed\" frameborder=\"0\" allow=\"accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share\" referrerpolicy=\"strict-origin-when-cross-origin\" allowfullscreen><\/iframe><\/p>\n<p>To summarize, here are the key issues highlighted in <a href=\"https:\/\/developers.google.com\/search\/docs\/appearance\/google-images\">Google\u2019s recent update of its documentation for image SEO<\/a>:<\/p>\n<ul>\n<li>Addition of \u201cWhen possible, <strong>use filenames that are short, but descriptive<\/strong>.\u201d, with more emphasis on avoiding generic filenames and removing the need to translate filenames &#8211; in line with <a href=\"https:\/\/lnkd.in\/ebUWqPRp\">John Mueller\u2019s advice<\/a>.<\/li>\n<li><strong>From \u201cchoosing\u201d to \u201cwriting\u201d ALT text<\/strong> &#8211; a small change that could refer to having human-curated ALTs for web accessibility rather than automated and \u201cchosen\u201d ALTs for the benefit of search engines<em> (one of the reasons we are focusing on this area).<\/em><\/li>\n<li>Replaced the <strong>example.jpg<\/strong> with a descriptive filename example, <strong>maine-coon-nap-800w.jpg.<\/strong><\/li>\n<\/ul>\n<p>The credit for spotting this update fully goes to @roxanastingu (head of SEO at Alamy).\u00a0<\/p>\n<p><blockquote class=\"twitter-tweet\" data-width=\"500\" data-dnt=\"true\"><p lang=\"en\" dir=\"ltr\">There have been some *slight* changes to Google&#39;s Image SEO best practices. 
<a href=\"https:\/\/t.co\/YUYDWfSOPR\">https:\/\/t.co\/YUYDWfSOPR<\/a><br>New version on the left, old on the right with commentary in the\ud83e\uddf5 <a href=\"https:\/\/t.co\/CDe8eWrdxm\">pic.twitter.com\/CDe8eWrdxm<\/a><\/p>&mdash; Roxana Stingu (@RoxanaStingu) <a href=\"https:\/\/twitter.com\/RoxanaStingu\/status\/1620738018742239232?ref_src=twsrc%5Etfw\">February 1, 2023<\/a><\/blockquote><script async src=\"https:\/\/platform.twitter.com\/widgets.js\" charset=\"utf-8\"><\/script><\/p>\n<h2><span style=\"font-weight: 400\">What is Agentive SEO<\/span><\/h2>\n<p><span style=\"font-weight: 400\">At WordLift, we build <\/span><b>agentive technologies for digital marketers<\/b><span style=\"font-weight: 400\">. I had the pleasure of meeting Christopher Noessel in San Francisco and learned from him the principles of <\/span><b>agentive technology<\/b><span style=\"font-weight: 400\"><\/span> (Chris has written a terrific book that I recommend you read, called <a href=\"https:\/\/rosenfeldmedia.com\/books\/designing-agentive-technology\/\"><span style=\"font-weight: 400\">Designing Agentive Technology<\/span><\/a><span style=\"font-weight: 400\">). <\/span><\/p>\n<p>One of the most critical aspects of designing agentive tech is to focus on efficient workflows to <strong>augment human intelligence with the power of machines<\/strong> by considering the strengths and limitations of today\u2019s AI.<\/p>\n<p>I have been working on this specific task for several years now. 
I have seen the evolution of deep learning models, vision APIs, and procedures dealing with image and <a class=\"wl-entity-page-link\" title=\"NLP\" href=\"https:\/\/wordlift.io\/blog\/en\/entity\/natural-language-processing\/\" data-id=\"http:\/\/data.wordlift.io\/wl0216\/entity\/natural_language_processing;http:\/\/rdf.freebase.com\/ns\/m.05flf;http:\/\/dbpedia.org\/resource\/Natural_language_processing;http:\/\/be.dbpedia.org\/resource\/\u0410\u043f\u0440\u0430\u0446\u043e\u045e\u043a\u0430_\u043d\u0430\u0442\u0443\u0440\u0430\u043b\u044c\u043d\u0430\u0439_\u043c\u043e\u0432\u044b;http:\/\/ru.dbpedia.org\/resource\/\u041e\u0431\u0440\u0430\u0431\u043e\u0442\u043a\u0430_\u0435\u0441\u0442\u0435\u0441\u0442\u0432\u0435\u043d\u043d\u043e\u0433\u043e_\u044f\u0437\u044b\u043a\u0430;http:\/\/pt.dbpedia.org\/resource\/Processamento_de_linguagem_natural;http:\/\/bg.dbpedia.org\/resource\/\u041e\u0431\u0440\u0430\u0431\u043e\u0442\u043a\u0430_\u043d\u0430_\u0435\u0441\u0442\u0435\u0441\u0442\u0432\u0435\u043d_\u0435\u0437\u0438\u043a;http:\/\/lt.dbpedia.org\/resource\/Nat\u016bralios_kalbos_apdorojimas;http:\/\/fr.dbpedia.org\/resource\/Traitement_automatique_du_langage_naturel;http:\/\/uk.dbpedia.org\/resource\/\u041e\u0431\u0440\u043e\u0431\u043a\u0430_\u043f\u0440\u0438\u0440\u043e\u0434\u043d\u043e\u0457_\u043c\u043e\u0432\u0438;http:\/\/id.dbpedia.org\/resource\/Pemrosesan_bahasa_alami;http:\/\/ca.dbpedia.org\/resource\/Processament_de_llenguatge_natural;http:\/\/sr.dbpedia.org\/resource\/Obrada_prirodnih_jezika;http:\/\/en.dbpedia.org\/resource\/Natural_language_processing;http:\/\/is.dbpedia.org\/resource\/M\u00e1lgreining;http:\/\/it.dbpedia.org\/resource\/Elaborazione_del_linguaggio_naturale;http:\/\/es.dbpedia.org\/resource\/Procesamiento_de_lenguajes_naturales;http:\/\/cs.dbpedia.org\/resource\/Zpracov\u00e1n\u00ed_p\u0159irozen\u00e9ho_jazyka;http:\/\/pl.dbpedia.org\/resource\/Przetwarzanie_j\u0119zyka_naturalnego;http:\/\/ro.dbpedia.org\/resource\/Prelucrarea
_limbajului_natural;http:\/\/da.dbpedia.org\/resource\/Sprogteknologi;http:\/\/tr.dbpedia.org\/resource\/Do\u011fal_dil_i\u015fleme\" >natural language processing<\/a>. I started in 2019 using a pre-trained convolutional neural network (CNN) that extracted the features of the input image. This feature vector was then sent to an RNN\/LSTM network for language generation.<\/p>\n<p>In 2023, with the advent of <a href=\"https:\/\/wordlift.io\/blog\/en\/generative-ai-for-seo\/\">Generative AI technology<\/a>, we can introduce completely new workflows that leverage the power of transformer-based models trained on multimodal content. The technology we use has dramatically improved.<\/p>\n<h2>How To Enrich Image Alt Text On Your Website Using AI<\/h2>\n<h3>Evaluating language-vision models<\/h3>\n<p>Evaluating automatic image captioning systems is not a trivial task. Still, it remains an exciting area of research, as human judgments do not always correlate well with automated metrics.<\/p>\n<p>To find the best system, I worked on a small batch of images (50) and used my judgment (as I worked on a domain I was familiar with) to rank the different models. 
Below is an example of the output provided by a selection of models when analyzing the image on the right.<\/p>\n<p><img decoding=\"async\" class=\"aligncenter wp-image-24216 size-full\" src=\"https:\/\/wordlift.io\/blog\/en\/wp-content\/uploads\/sites\/3\/2022\/03\/Evaluating-language-vision-models_example1.png\" alt=\"Evaluating language-vision models - an example \" width=\"1000\" height=\"353\" srcset=\"https:\/\/wordlift.io\/blog\/en\/wp-content\/uploads\/sites\/3\/2022\/03\/Evaluating-language-vision-models_example1.png 1000w, https:\/\/wordlift.io\/blog\/en\/wp-content\/uploads\/sites\/3\/2022\/03\/Evaluating-language-vision-models_example1-300x106.png 300w, https:\/\/wordlift.io\/blog\/en\/wp-content\/uploads\/sites\/3\/2022\/03\/Evaluating-language-vision-models_example1-768x271.png 768w, https:\/\/wordlift.io\/blog\/en\/wp-content\/uploads\/sites\/3\/2022\/03\/Evaluating-language-vision-models_example1-150x53.png 150w\" sizes=\"(max-width: 1000px) 100vw, 1000px\" \/><\/p>\n<p>I am generally less interested in the model and more interested in <strong>finding the proper framework<\/strong> to work on the different tasks using different models. 
While running these analyses, I found a modular and extensible library by Salesforce called <a href=\"https:\/\/blog.salesforceairesearch.com\/lavis-language-vision-library\/\">LAVIS<\/a> for language-vision AI.<\/p>\n<h4>Introducing LAVIS (short for LAnguage-VISion)<\/h4>\n<p><span style=\"font-weight: 400\">LAVIS is a library for language-vision intelligence written in Python to help AI practitioners create and compare models for multimodal scenarios, such as image captioning and visual inference.\u00a0<\/span><\/p>\n<p><span style=\"font-weight: 400\">LAVIS is easy to use and provides access to over <\/span><b>30 pre-trained and task-specific fine-tuned model checkpoints<\/b><span style=\"font-weight: 400\"> of four popular foundation models: ALBEF, BLIP, CLIP, and ALPRO.\u00a0<\/span><\/p>\n<p><span style=\"font-weight: 400\">While testing different models in LAVIS, I decided to focus on BLIP2, a model that bootstraps vision-language pre-training from a frozen image encoder and a frozen large language model, and that performs strongly on tasks such as visual question answering (VQA).\u00a0<\/span><\/p>\n<p><img decoding=\"async\" class=\"aligncenter wp-image-24217 size-full\" src=\"https:\/\/wordlift.io\/blog\/en\/wp-content\/uploads\/sites\/3\/2022\/03\/BLIP2-model.png\" alt=\"BLIP2 model\" width=\"788\" height=\"447\" srcset=\"https:\/\/wordlift.io\/blog\/en\/wp-content\/uploads\/sites\/3\/2022\/03\/BLIP2-model.png 788w, https:\/\/wordlift.io\/blog\/en\/wp-content\/uploads\/sites\/3\/2022\/03\/BLIP2-model-300x170.png 300w, https:\/\/wordlift.io\/blog\/en\/wp-content\/uploads\/sites\/3\/2022\/03\/BLIP2-model-768x436.png 768w, https:\/\/wordlift.io\/blog\/en\/wp-content\/uploads\/sites\/3\/2022\/03\/BLIP2-model-288x163.png 288w, https:\/\/wordlift.io\/blog\/en\/wp-content\/uploads\/sites\/3\/2022\/03\/BLIP2-model-390x221.png 390w, https:\/\/wordlift.io\/blog\/en\/wp-content\/uploads\/sites\/3\/2022\/03\/BLIP2-model-150x85.png 150w\" sizes=\"(max-width: 788px) 100vw, 788px\" \/><\/p>\n<p><span style=\"font-weight: 400\">These systems combine the ability to extract features from 
images provided by BLIP2 with frozen large language models such as Google\u2019s T5 or Meta\u2019s OPT. A <\/span><b>frozen language model<\/b><span style=\"font-weight: 400\"> is a pre-trained language model whose parameters are fixed and are no longer updated.<\/span><\/p>\n<p><span style=\"font-weight: 400\">In NLP, this term is commonly used to refer to models used for specific tasks such as text classification, <a class=\"wl-entity-page-link\" title=\"Named-entity recognition\" href=\"https:\/\/wordlift.io\/blog\/en\/entity\/named-entity-recognition\/\" data-id=\"http:\/\/data.wordlift.io\/wl0216\/entity\/named-entity_recognition;http:\/\/rdf.freebase.com\/ns\/m.0658pt;http:\/\/yago-knowledge.org\/resource\/Named-entity_recognition;http:\/\/dbpedia.org\/resource\/Named-entity_recognition\" >named entity recognition<\/a>, question answering, etc. These models can be fine-tuned by adding a few layers to the top of the pre-trained network and training these layers on a task-specific dataset. Still, the underlying parameters remain frozen and do not change during training. 
The idea behind this approach is to leverage the knowledge and understanding of language learned by the pre-trained model rather than starting from scratch for each new task.\u00a0<\/span><\/p>\n<h2>Running The Workflow For Automatic Image Captioning<\/h2>\n<p><img decoding=\"async\" class=\"aligncenter wp-image-24218 size-full\" src=\"https:\/\/wordlift.io\/blog\/en\/wp-content\/uploads\/sites\/3\/2022\/03\/Workflow-For-Automatic-Image-Captioning.png\" alt=\"Workflow For Automatic Image Captioning\" width=\"976\" height=\"170\" srcset=\"https:\/\/wordlift.io\/blog\/en\/wp-content\/uploads\/sites\/3\/2022\/03\/Workflow-For-Automatic-Image-Captioning.png 976w, https:\/\/wordlift.io\/blog\/en\/wp-content\/uploads\/sites\/3\/2022\/03\/Workflow-For-Automatic-Image-Captioning-300x52.png 300w, https:\/\/wordlift.io\/blog\/en\/wp-content\/uploads\/sites\/3\/2022\/03\/Workflow-For-Automatic-Image-Captioning-768x134.png 768w, https:\/\/wordlift.io\/blog\/en\/wp-content\/uploads\/sites\/3\/2022\/03\/Workflow-For-Automatic-Image-Captioning-150x26.png 150w\" sizes=\"(max-width: 976px) 100vw, 976px\" \/><\/p>\n<p><span style=\"font-weight: 400\">In this experiment, we proceed by analyzing a selection of the editorial images taken from the homepage of fila.com (not a client of ours; I already used it in the past for the <\/span><a href=\"https:\/\/wordlift.io\/blog\/en\/internal-linking-category-page\/\"><span style=\"font-weight: 400\">e-commerce internal linking<\/span><\/a><span style=\"font-weight: 400\"> analysis).<\/span><\/p>\n<p><span style=\"font-weight: 400\">These images are particularly challenging as they don\u2019t belong to a specific product or category of products but help communicate the brand\u2019s feelings. By improving the alt text, we want to make the homepage of fila.com more accessible to people with visual disabilities. 
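<\/span><\/p>
<p>To make the captioning step concrete, here is a minimal sketch in Python. The function names <code>load_model_and_preprocess<\/code> and <code>generate<\/code> follow the LAVIS documentation; the choice of the BLIP caption checkpoint and the helper names are illustrative assumptions, not the exact code of our Colab.<\/p>

```python
def pick_device():
    """Fall back to the CPU when no GPU (or no torch install) is available."""
    try:
        import torch
        return "cuda" if torch.cuda.is_available() else "cpu"
    except ImportError:
        return "cpu"


def caption_image(image_path, device=None):
    """Generate one candidate alt text for an image with a LAVIS BLIP model."""
    from PIL import Image
    from lavis.models import load_model_and_preprocess

    device = device or pick_device()
    model, vis_processors, _ = load_model_and_preprocess(
        name="blip_caption", model_type="base_coco", is_eval=True, device=device
    )
    # Preprocess the image into the tensor batch the model expects.
    image = vis_processors["eval"](Image.open(image_path).convert("RGB"))
    image = image.unsqueeze(0).to(device)
    # generate() returns a list of caption strings; we keep the first one.
    return model.generate({"image": image})[0]
```

<p>Each generated caption is treated as a draft to be validated by the later steps, never as the final alt text.<\/p>
<p><span style=\"font-weight: 400\">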
<\/span><b>Web<\/b> <b>accessibility deeply interconnects with SEO<\/b><span style=\"font-weight: 400\">.<\/span><\/p>\n<p><span style=\"font-weight: 400\">The code is accessible and described in a <\/span><a href=\"https:\/\/wor.ai\/image-captioning\"><span style=\"font-weight: 400\">Colab Notebook<\/span><\/a><span style=\"font-weight: 400\">. <\/span><\/p>\n<p><span style=\"font-weight: 400\">We proceed as follows:<br \/><\/span><\/p>\n<ol>\n<li>We start with Google Sheets (<a href=\"https:\/\/docs.google.com\/spreadsheets\/d\/1kbcNp5WRm7jC3V2OVXKelPMs-5EMsdqTJEYKpNk-Bu4\/edit?usp=sharing\">here<\/a>), where we store information on the media files we want to analyze. We use the <a href=\"https:\/\/docs.gspread.org\/en\/v5.7.0\/\"><em>gspread<\/em><\/a> library to read from and write back to Google Sheets.<\/li>\n<li>We run the Colab (you will need Colab Pro+ if you want to run the tests on the different options; otherwise, use the simpler model, which you might even be able to run on a CPU).<\/li>\n<li>We run a validation of the data and some minimal data cleaning (the work here is brief, but in production, you will need to get more into the details).\n<ul>\n<li>To ensure the text doesn\u2019t contain any inappropriate content, we use the <a href=\"https:\/\/platform.openai.com\/docs\/guides\/moderation\">OpenAI moderation endpoint<\/a>. You will need to add your OpenAI key.<\/li>\n<li>We also work on rewriting the brand name (from \u201cfila\u201d to \u201cFILA\u201d). 
This is purely an example to show you that once you have the caption, more can be done by leveraging, for example, the information on the page, such as the title and the meta description, or any other editorial rule.<\/li>\n<\/ul>\n<\/li>\n<li>We can now add the descriptions back to Google Sheets, and from there, we will add them to the content management system.<\/li>\n<\/ol>\n<h3>Content moderation<\/h3>\n<p><span style=\"font-weight: 400\">When dealing with brands, we must protect the relationship with the customer. Automating SEO using ChatGPT is fascinating and highly accessible. Still, when starting a project like setting up a model for automating image captioning, I usually receive the following question: \u201c<\/span><b>Is it safe to use AI?<\/b><span style=\"font-weight: 400\">\u201d\u00a0<\/span><\/p>\n<p><span style=\"font-weight: 400\">We must be cautious and protect the client\u2019s website against possible misuse of language. The minimum we can do is work with a <\/span><a href=\"https:\/\/platform.openai.com\/docs\/guides\/moderation\"><span style=\"font-weight: 400\">moderation endpoint<\/span><\/a><span style=\"font-weight: 400\"> like the one provided by OpenAI for free. It uses a GPT-based classifier, is constantly updated, and helps us detect and filter undesired content. 
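<\/span><\/p>
<p>The two validation steps above can be sketched in plain Python. The response shape handled by <code>is_flagged<\/code> mirrors the one documented for OpenAI\u2019s moderation endpoint (<code>{\"results\": [{\"flagged\": ...}]}<\/code>); the helper names and the brand map are illustrative assumptions.<\/p>

```python
import re


def is_flagged(moderation_response):
    """Interpret a moderation API response shaped like
    {"results": [{"flagged": bool, ...}]}: reject if any result is flagged."""
    return any(r.get("flagged", False) for r in moderation_response.get("results", []))


def apply_editorial_rules(caption, brand_map):
    """Post-process a generated caption: fix brand casing, then sentence case."""
    for wrong, right in brand_map.items():
        # Whole-word, case-insensitive rewrite, e.g. "fila" -> "FILA".
        caption = re.sub(rf"\b{re.escape(wrong)}\b", right, caption, flags=re.IGNORECASE)
    caption = caption.strip()
    return caption[:1].upper() + caption[1:]
```

<p>For example, <code>apply_editorial_rules(\"a fila sweatshirt on a model\", {\"fila\": \"FILA\"})<\/code> returns \u201cA FILA sweatshirt on a model\u201d, and a caption is kept only when <code>is_flagged<\/code> returns <code>False<\/code>.<\/p>
<p><span style=\"font-weight: 400\">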
<\/span><\/p>\n<p><img decoding=\"async\" class=\"aligncenter wp-image-24220 size-full\" src=\"https:\/\/wordlift.io\/blog\/en\/wp-content\/uploads\/sites\/3\/2022\/03\/Screen-Cast-2023-02-28-at-4.31.33-PM.gif\" alt=\"\" width=\"748\" height=\"294\" \/><\/p>\n<p>As we can see from the code snippet below, if we send the first generated caption, we expect the moderation endpoint to return \u201cFalse\u201d; when trying instead with a violent sentence like the one below, we expect to receive \u201cTrue.\u201d<\/p>\n<p><img decoding=\"async\" class=\"aligncenter wp-image-24219 size-full\" src=\"https:\/\/wordlift.io\/blog\/en\/wp-content\/uploads\/sites\/3\/2022\/03\/content-moderation-test.png\" alt=\"content moderation test\" width=\"996\" height=\"166\" srcset=\"https:\/\/wordlift.io\/blog\/en\/wp-content\/uploads\/sites\/3\/2022\/03\/content-moderation-test.png 996w, https:\/\/wordlift.io\/blog\/en\/wp-content\/uploads\/sites\/3\/2022\/03\/content-moderation-test-300x50.png 300w, https:\/\/wordlift.io\/blog\/en\/wp-content\/uploads\/sites\/3\/2022\/03\/content-moderation-test-768x128.png 768w, https:\/\/wordlift.io\/blog\/en\/wp-content\/uploads\/sites\/3\/2022\/03\/content-moderation-test-150x25.png 150w\" sizes=\"(max-width: 996px) 100vw, 996px\" \/><\/p>\n<p><span style=\"font-weight: 400\">Keep on reading if you are interested in Visual Question Answering experiments, or simply access <\/span><a href=\"https:\/\/wor.ai\/image-captioning\"><span style=\"font-weight: 400\">the code<\/span><\/a><span style=\"font-weight: 400\"> developed while working for one of the clients of our <\/span><a href=\"https:\/\/wordlift.io\/seo-management-service\/\"><span style=\"font-weight: 400\">SEO management services<\/span><\/a><span style=\"font-weight: 400\">.<\/span><\/p>\n<h2>Get Comfortable With Experiments<\/h2>\n<p><span style=\"font-weight: 400\">Machine learning requires a new mindset different from our traditional programming approach. 
You tend to write less code and <\/span><b>focus most of the attention on the data<\/b><span style=\"font-weight: 400\"> for training the model and the <\/span><b>validation pipeline. <\/b><span style=\"font-weight: 400\">Validation is essential<\/span> <span style=\"font-weight: 400\">to ensure that the AI content is aligned with the brand&#8217;s tone of voice and compliant with SEO and content guidelines.\u00a0<\/span><\/p>\n<h2>Don\u2019t Settle For Less Than The Best Model<\/h2>\n<p>Rarely in our industry can we safely opt for the trendiest model or the most popular API. If you are building a product that thousands of people will benefit from, setting up your own pipeline for training an ML model is always the recommended path. Using LAVIS, I could quickly evaluate the results from BLIP, BLIP2-OPT, and BLIP2-T5 out of the box. Below you can find the percentage of accurate captions generated by each model.<\/p>\n<p>As you can see, based on the human judgment of our SEO team, we generated a suitable caption 71.6% of the time. This percentage dramatically increased as we introduced some basic validation rules (like the rewriting of the brand name from \u201cfila\u201d to \u201cFILA\u201d). 
These simple adjustments and the fine-tuning of the model typically help us bring the percentage of success above 90%.<\/p>\n<p><img decoding=\"async\" class=\"aligncenter wp-image-24221 size-full\" src=\"https:\/\/wordlift.io\/blog\/en\/wp-content\/uploads\/sites\/3\/2022\/03\/percentage-of-accurate-captions-generated-by-each-model.jpg\" alt=\"percentage of accurate captions generated by each model\" width=\"855\" height=\"510\" srcset=\"https:\/\/wordlift.io\/blog\/en\/wp-content\/uploads\/sites\/3\/2022\/03\/percentage-of-accurate-captions-generated-by-each-model.jpg 855w, https:\/\/wordlift.io\/blog\/en\/wp-content\/uploads\/sites\/3\/2022\/03\/percentage-of-accurate-captions-generated-by-each-model-300x179.jpg 300w, https:\/\/wordlift.io\/blog\/en\/wp-content\/uploads\/sites\/3\/2022\/03\/percentage-of-accurate-captions-generated-by-each-model-768x458.jpg 768w, https:\/\/wordlift.io\/blog\/en\/wp-content\/uploads\/sites\/3\/2022\/03\/percentage-of-accurate-captions-generated-by-each-model-578x346.jpg 578w, https:\/\/wordlift.io\/blog\/en\/wp-content\/uploads\/sites\/3\/2022\/03\/percentage-of-accurate-captions-generated-by-each-model-150x89.jpg 150w\" sizes=\"(max-width: 855px) 100vw, 855px\" \/><\/p>\n<h3>Visual Question Answering (VQA)<\/h3>\n<p>Using LAVIS, we can also experiment with more advanced use cases like VQA: a computer vision task where, given a text-based question about an image, the system infers the answer. 
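<\/p>
<p>With LAVIS, such a query can be sketched as follows. The <code>blip_vqa<\/code> checkpoint and the <code>predict_answers<\/code> call follow the LAVIS examples; the prompt wrapper and helper names are assumptions for illustration.<\/p>

```python
def build_vqa_prompt(question):
    """Prompt wrapper in the style used by BLIP-2 VQA examples (an assumption)."""
    return f"Question: {question} Answer:"


def answer_question(image_path, question, device="cpu"):
    """Ask a free-form question about an image using a LAVIS BLIP VQA model."""
    from PIL import Image
    from lavis.models import load_model_and_preprocess

    model, vis_processors, txt_processors = load_model_and_preprocess(
        name="blip_vqa", model_type="vqav2", is_eval=True, device=device
    )
    image = vis_processors["eval"](Image.open(image_path).convert("RGB"))
    image = image.unsqueeze(0).to(device)
    question = txt_processors["eval"](question)
    # predict_answers() returns a list with one answer string per image.
    return model.predict_answers(
        samples={"image": image, "text_input": question},
        inference_method="generate",
    )[0]
```

<p>A question such as \u201cwhat logo is on the sweatshirt?\u201d would return a short free-form answer.<\/p>
<p>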
Let\u2019s review it in action using one of the sample images.<\/p>\n<p><img decoding=\"async\" class=\"aligncenter wp-image-24222 size-full\" src=\"https:\/\/wordlift.io\/blog\/en\/wp-content\/uploads\/sites\/3\/2022\/03\/Visual-Question-Answering-VQA.jpg\" alt=\"Visual Question Answering (VQA)\" width=\"986\" height=\"350\" srcset=\"https:\/\/wordlift.io\/blog\/en\/wp-content\/uploads\/sites\/3\/2022\/03\/Visual-Question-Answering-VQA.jpg 986w, https:\/\/wordlift.io\/blog\/en\/wp-content\/uploads\/sites\/3\/2022\/03\/Visual-Question-Answering-VQA-300x106.jpg 300w, https:\/\/wordlift.io\/blog\/en\/wp-content\/uploads\/sites\/3\/2022\/03\/Visual-Question-Answering-VQA-768x273.jpg 768w, https:\/\/wordlift.io\/blog\/en\/wp-content\/uploads\/sites\/3\/2022\/03\/Visual-Question-Answering-VQA-150x53.jpg 150w\" sizes=\"(max-width: 986px) 100vw, 986px\" \/><\/p>\n<p>As we can see, the model recognizes and highlights the FILA logo (at least one of the two) in the image.<\/p>\n<h2>Conclusions<\/h2>\n<p><span style=\"font-weight: 400\">Experimenting in ML is essential in today\u2019s <\/span><a href=\"https:\/\/wordlift.io\/blog\/en\/seo-automation\/\"><span style=\"font-weight: 400\">SEO automation workflows<\/span><\/a><span style=\"font-weight: 400\">. 
Many resources, including pre-trained machine learning models and frameworks like LAVIS, can encode knowledge to help us in SEO tasks.<\/span><\/p>\n<p><span style=\"font-weight: 400\">Below we can appreciate the evolution of the technology and how an image of\u00a0 Bill Slawski (whom I miss a lot \ud83c\udf39) alongside a young Neil Patel is captioned now and how it was captioned a few years back.\u00a0<\/span><\/p>\n<p><img decoding=\"async\" class=\"aligncenter wp-image-24223 size-full\" src=\"https:\/\/wordlift.io\/blog\/en\/wp-content\/uploads\/sites\/3\/2022\/03\/image-captioning-evolution-bill-slawski-and-neil-patel.jpg\" alt=\"image captioning evolution - an example with a picture with Bill Slawski and Neil Patel\" width=\"985\" height=\"562\" srcset=\"https:\/\/wordlift.io\/blog\/en\/wp-content\/uploads\/sites\/3\/2022\/03\/image-captioning-evolution-bill-slawski-and-neil-patel.jpg 985w, https:\/\/wordlift.io\/blog\/en\/wp-content\/uploads\/sites\/3\/2022\/03\/image-captioning-evolution-bill-slawski-and-neil-patel-300x171.jpg 300w, https:\/\/wordlift.io\/blog\/en\/wp-content\/uploads\/sites\/3\/2022\/03\/image-captioning-evolution-bill-slawski-and-neil-patel-768x438.jpg 768w, https:\/\/wordlift.io\/blog\/en\/wp-content\/uploads\/sites\/3\/2022\/03\/image-captioning-evolution-bill-slawski-and-neil-patel-288x163.jpg 288w, https:\/\/wordlift.io\/blog\/en\/wp-content\/uploads\/sites\/3\/2022\/03\/image-captioning-evolution-bill-slawski-and-neil-patel-150x86.jpg 150w\" sizes=\"(max-width: 985px) 100vw, 985px\" \/><\/p>\n<h2>How I Worked A Few Years Ago<\/h2>\n<p><span style=\"font-weight: 400\">Here follows how this workflow was originally implemented back in 2019. 
I left the text untouched as a form of <\/span><b>AI-powered SEO archeology <\/b><span style=\"font-weight: 400\">to help us study the evolution of the techniques.\u00a0<\/span><\/p>\n<h3>Report from the first experiments on automatic image captioning using CNN and LSTM<\/h3>\n<p><span style=\"font-weight: 400\">Armed with passion and enthusiasm I set up a model for image captioning roughly following the architecture outlined in the article \u201c<\/span><a href=\"https:\/\/www.analyticsvidhya.com\/blog\/2018\/04\/solving-an-image-captioning-task-using-deep-learning\/\"><span style=\"font-weight: 400\">Automatic Image Captioning using Deep Learning (CNN and LSTM) in PyTorch<\/span><\/a><span style=\"font-weight: 400\">\u201c that is based on the results published in the <\/span><a href=\"https:\/\/arxiv.org\/abs\/1411.4555\"><span style=\"font-weight: 400\">\u201cShow and Tell: A Neural Image Caption Generator\u201d<\/span><\/a><span style=\"font-weight: 400\"> paper by Vinyals et al., 2014.<\/span><\/p>\n<p><span style=\"font-weight: 400\">The implementation is based on a combination of two different networks:<\/span><\/p>\n<ul>\n<li>A pre-trained <a href=\"https:\/\/arxiv.org\/abs\/1512.03385\"><strong>resnet-152<\/strong><\/a> model that acts as an <strong>encoder<\/strong>. It transforms the image into a vector of features that is sent to the decoder<\/li>\n<li>A <strong>decoder<\/strong> that uses an LSTM network (LSTM stands for Long short-term memory, and it is a Recurrent Neural Network) to compose the phrase that describes the featured vector received from the encoder. LSTM, I learned along the way, is used by Google and Alexa for speech recognition. 
Google also uses it in Google Assistant and in Google Translate.<\/li>\n<\/ul>\n<p><img decoding=\"async\" class=\"aligncenter wp-image-24224 size-full\" src=\"https:\/\/wordlift.io\/blog\/en\/wp-content\/uploads\/sites\/3\/2022\/03\/first-experiments-on-automatic-image-captioning-using-CNN-and-LSTM.png\" alt=\"first experiments on automatic image captioning using CNN and LSTM\" width=\"978\" height=\"474\" srcset=\"https:\/\/wordlift.io\/blog\/en\/wp-content\/uploads\/sites\/3\/2022\/03\/first-experiments-on-automatic-image-captioning-using-CNN-and-LSTM.png 978w, https:\/\/wordlift.io\/blog\/en\/wp-content\/uploads\/sites\/3\/2022\/03\/first-experiments-on-automatic-image-captioning-using-CNN-and-LSTM-300x145.png 300w, https:\/\/wordlift.io\/blog\/en\/wp-content\/uploads\/sites\/3\/2022\/03\/first-experiments-on-automatic-image-captioning-using-CNN-and-LSTM-768x372.png 768w, https:\/\/wordlift.io\/blog\/en\/wp-content\/uploads\/sites\/3\/2022\/03\/first-experiments-on-automatic-image-captioning-using-CNN-and-LSTM-150x73.png 150w\" sizes=\"(max-width: 978px) 100vw, 978px\" \/><\/p>\n<p><span style=\"font-weight: 400\">One of the main datasets used for training in image captioning is called <\/span><a href=\"http:\/\/cocodataset.org\/#home\"><span style=\"font-weight: 400\">COCO<\/span><\/a><span style=\"font-weight: 400\"> and is made of a vast number of images; each image has 5 different captions that describe it. I quickly realized that training the model on my laptop would have required almost 17 days of non-stop with the CPU running at full throttle. 
I had to be realistic, so I downloaded the available pre-trained model.<\/span><\/p>\n<p><span style=\"font-weight: 400\">RNNs, for sure, are not hardware-friendly and use enormous resources for training.<\/span><\/p>\n<p><span style=\"font-weight: 400\">Needless to say, <\/span><b>I remained speechless<\/b><span style=\"font-weight: 400\"> as soon as everything was in place, and I was ready to <\/span><b>make the model talk<\/b><span style=\"font-weight: 400\"> for the first time. When I provided the image below, the result was encouraging.<\/span><\/p>\n<p><img decoding=\"async\" class=\"aligncenter size-full wp-image-10299\" src=\"https:\/\/wordlift.io\/blog\/en\/wp-content\/uploads\/sites\/3\/2019\/02\/example.png\" alt=\"\" width=\"331\" height=\"247\" srcset=\"https:\/\/wordlift.io\/blog\/en\/wp-content\/uploads\/sites\/3\/2019\/02\/example.png 331w, https:\/\/wordlift.io\/blog\/en\/wp-content\/uploads\/sites\/3\/2019\/02\/example-300x224.png 300w, https:\/\/wordlift.io\/blog\/en\/wp-content\/uploads\/sites\/3\/2019\/02\/example-150x112.png 150w\" sizes=\"(max-width: 331px) 100vw, 331px\" \/><\/p>\n<p><img decoding=\"async\" class=\"aligncenter size-full wp-image-10301\" src=\"https:\/\/wordlift.io\/blog\/en\/wp-content\/uploads\/sites\/3\/2019\/02\/image-giraffe.png\" alt=\"\" width=\"922\" height=\"29\" srcset=\"https:\/\/wordlift.io\/blog\/en\/wp-content\/uploads\/sites\/3\/2019\/02\/image-giraffe.png 922w, https:\/\/wordlift.io\/blog\/en\/wp-content\/uploads\/sites\/3\/2019\/02\/image-giraffe-300x9.png 300w, https:\/\/wordlift.io\/blog\/en\/wp-content\/uploads\/sites\/3\/2019\/02\/image-giraffe-768x24.png 768w, https:\/\/wordlift.io\/blog\/en\/wp-content\/uploads\/sites\/3\/2019\/02\/image-giraffe-150x5.png 150w\" sizes=\"(max-width: 922px) 100vw, 922px\" \/><\/p>\n<p><span style=\"font-weight: 400\">Unfortunately, as I moved forward with the experiments and from the giraffes into more mundane scenery (the team in the office), the results were bizarre, 
to use a euphemism, and far from being usable in our competitive SEO landscape.<\/span><\/p>\n<p><img decoding=\"async\" class=\"aligncenter size-full wp-image-10303\" src=\"https:\/\/wordlift.io\/blog\/en\/wp-content\/uploads\/sites\/3\/2019\/02\/image-13.png\" alt=\"\" width=\"746\" height=\"788\" srcset=\"https:\/\/wordlift.io\/blog\/en\/wp-content\/uploads\/sites\/3\/2019\/02\/image-13.png 746w, https:\/\/wordlift.io\/blog\/en\/wp-content\/uploads\/sites\/3\/2019\/02\/image-13-284x300.png 284w, https:\/\/wordlift.io\/blog\/en\/wp-content\/uploads\/sites\/3\/2019\/02\/image-13-150x158.png 150w\" sizes=\"(max-width: 746px) 100vw, 746px\" \/><\/p>\n<p><img decoding=\"async\" class=\"aligncenter size-full wp-image-10304\" src=\"https:\/\/wordlift.io\/blog\/en\/wp-content\/uploads\/sites\/3\/2019\/02\/Pasted_image_at_2019-02-05__3_01_PM.png\" alt=\"\" width=\"800\" height=\"483\" srcset=\"https:\/\/wordlift.io\/blog\/en\/wp-content\/uploads\/sites\/3\/2019\/02\/Pasted_image_at_2019-02-05__3_01_PM.png 800w, https:\/\/wordlift.io\/blog\/en\/wp-content\/uploads\/sites\/3\/2019\/02\/Pasted_image_at_2019-02-05__3_01_PM-300x181.png 300w, https:\/\/wordlift.io\/blog\/en\/wp-content\/uploads\/sites\/3\/2019\/02\/Pasted_image_at_2019-02-05__3_01_PM-768x464.png 768w, https:\/\/wordlift.io\/blog\/en\/wp-content\/uploads\/sites\/3\/2019\/02\/Pasted_image_at_2019-02-05__3_01_PM-150x91.png 150w\" sizes=\"(max-width: 800px) 100vw, 800px\" \/><\/p>\n<p><img decoding=\"async\" class=\"aligncenter size-full wp-image-10305\" src=\"https:\/\/wordlift.io\/blog\/en\/wp-content\/uploads\/sites\/3\/2019\/02\/image-16.png\" alt=\"\" width=\"824\" height=\"30\" srcset=\"https:\/\/wordlift.io\/blog\/en\/wp-content\/uploads\/sites\/3\/2019\/02\/image-16.png 824w, https:\/\/wordlift.io\/blog\/en\/wp-content\/uploads\/sites\/3\/2019\/02\/image-16-300x11.png 300w, https:\/\/wordlift.io\/blog\/en\/wp-content\/uploads\/sites\/3\/2019\/02\/image-16-768x28.png 768w, 
https:\/\/wordlift.io\/blog\/en\/wp-content\/uploads\/sites\/3\/2019\/02\/image-16-150x5.png 150w\" sizes=\"(max-width: 824px) 100vw, 824px\" \/><\/p>\n<h2>Last but not least: Image SEO Resolution<\/h2>\n<p><b>Another important aspect of images in SEO is resolution<\/b><span style=\"font-weight: 400\">. Large images, in multiple aspect ratios (1:1, 4:3, and 16:9), are needed by Google to present content in carousels, tabs (rich results on multiple devices), and Google Discover. This is done using <\/span><b>structured data <\/b><span style=\"font-weight: 400\">and following some <\/span><a href=\"https:\/\/developers.google.com\/search\/docs\/advanced\/guidelines\/google-images#supported-image-formats\"><span style=\"font-weight: 400\">important recommendations<\/span><\/a><span style=\"font-weight: 400\">.\u00a0<\/span><\/p>\n<p><span style=\"font-weight: 400\">WordLift automatically creates the three versions required by Google for each image, as long as you have at least 1,200 pixels on the smaller side of the image. 
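Concretely, the multiple aspect ratios are exposed to Google through the `image` property of the page's structured data, supplied as an array of URLs. A minimal sketch of the pattern documented by Google, with placeholder URLs:

```json
{
  "@context": "https://schema.org",
  "@type": "Article",
  "headline": "Example article",
  "image": [
    "https://example.com/photo-1x1.jpg",
    "https://example.com/photo-4x3.jpg",
    "https://example.com/photo-16x9.jpg"
  ]
}
```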
Since this isn&#8217;t always possible, we&#8217;ve trained a model that can <\/span><b>enlarge and enhance the images on your website<\/b><span style=\"font-weight: 400\"> using the Super-Resolution technique.\u00a0<\/span><\/p>\n<p><span style=\"font-weight: 400\">This is the <\/span><a href=\"https:\/\/wordlift.io\/blog\/en\/ai-powered-image-upscaler\/\"><b>AI-powered Image Upscaler<\/b><\/a><span style=\"font-weight: 400\">; to learn more about it, how to use it on your images, and what results you can get, read our article.\u00a0<\/span><\/p>\n<h2>References\u00a0<\/h2>\n<ul>\n<li><a href=\"https:\/\/developers.google.com\/search\/docs\/appearance\/google-images\">Google image SEO best practices<\/a><\/li>\n<li><a href=\"https:\/\/rosenfeldmedia.com\/books\/designing-agentive-technology\/\">Designing Agentive Technology<\/a> &#8211; Christopher Noessel<\/li>\n<li><a href=\"https:\/\/blog.salesforceairesearch.com\/lavis-language-vision-library\/\">Meet LAVIS: A One-stop Library for Language-Vision AI Research and Applications<\/a> &#8211; Dongxu Li, Junnan Li, Steven Hoi, Donald Rose<\/li>\n<li><a href=\"https:\/\/www.analyticsvidhya.com\/blog\/2018\/04\/solving-an-image-captioning-task-using-deep-learning\/\">Automatic Image Captioning using Deep Learning (CNN and LSTM) in PyTorch<\/a> &#8211; JalFaizy Shaikh<\/li>\n<li><a href=\"https:\/\/arxiv.org\/abs\/1411.4555\">Show and Tell: A Neural Image Caption Generator<\/a> &#8211; Oriol Vinyals, Alexander Toshev, Samy Bengio, Dumitru Erhan &#8211; Cornell University<\/li>\n<li><a href=\"https:\/\/arxiv.org\/abs\/1512.03385\">Deep Residual Learning for Image Recognition<\/a> &#8211; Kaiming He, Xiangyu Zhang, Shaoqing Ren, Jian Sun &#8211; Cornell University<\/li>\n<li><a href=\"https:\/\/vision.cornell.edu\/se3\/wp-content\/uploads\/2018\/03\/1501.pdf\">Learning to Evaluate Image Captioning<\/a> &#8211; Yin Cui, Guandao Yang, Andreas Veit, Xun Huang, Serge Belongie 
&#8211;\u00a0Cornell University<\/li>\n<li><a href=\"https:\/\/arxiv.org\/abs\/2301.12597\">BLIP-2: Bootstrapping Language-Image Pre-training with Frozen Image Encoders and Large Language Models<\/a> &#8211; Junnan Li, Dongxu Li, Silvio Savarese, Steven Hoi &#8211; Cornell University<\/li>\n<\/ul>\n<p>To learn more about <a href=\"https:\/\/wordlift.io\/blog\/en\/web-stories\/seo-image-optimization\/\"><strong>SEO image optimization<\/strong><\/a>, see our latest web story.<\/p>\n\n\n\n","protected":false},"excerpt":{"rendered":"<p>Images contribute to improve SEO and the user experience on a website. Learn how we used neural networks to describe the content of images.<\/p>\n","protected":false},"author":6,"featured_media":24225,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"_acf_changed":false,"wl_entities_gutenberg":"","_wlpage_enable":"","footnotes":""},"categories":[8],"tags":[],"wl_entity_type":[30,3303],"coauthors":[],"class_list":["post-10291","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-seo","wl_entity_type-article","wl_entity_type-faq-page"],"acf":[],"yoast_head":"<!-- This site is optimized with the Yoast SEO plugin v22.6 - https:\/\/yoast.com\/wordpress\/plugins\/seo\/ -->\n<title>Image SEO: optimizing images using machine learning - WordLift Blog<\/title>\n<meta name=\"description\" content=\"Images contribute to improve SEO and the user experience on a website. 
Learn how we used neural networks to describe the content of images.\" \/>\n<meta name=\"robots\" content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" href=\"https:\/\/wordlift.io\/blog\/en\/image-seo-using-ai\/\" \/>\n<meta property=\"og:locale\" content=\"en_US\" \/>\n<meta property=\"og:type\" content=\"article\" \/>\n<meta property=\"og:title\" content=\"Image SEO: optimizing images using machine learning\" \/>\n<meta property=\"og:description\" content=\"Images contribute to improve SEO and the user experience on a website. Learn how we used neural networks to describe the content of images.\" \/>\n<meta property=\"og:url\" content=\"https:\/\/wordlift.io\/blog\/en\/image-seo-using-ai\/\" \/>\n<meta property=\"og:site_name\" content=\"WordLift Blog\" \/>\n<meta property=\"article:published_time\" content=\"2023-03-01T11:16:16+00:00\" \/>\n<meta property=\"article:modified_time\" content=\"2023-03-01T11:35:56+00:00\" \/>\n<meta property=\"og:image\" content=\"https:\/\/wordlift.io\/blog\/en\/wp-content\/uploads\/sites\/3\/2022\/03\/automatic-image-captioning.jpg\" \/>\n\t<meta property=\"og:image:width\" content=\"1200\" \/>\n\t<meta property=\"og:image:height\" content=\"1200\" \/>\n\t<meta property=\"og:image:type\" content=\"image\/jpeg\" \/>\n<meta name=\"author\" content=\"Andrea Volpini\" \/>\n<meta name=\"twitter:card\" content=\"summary_large_image\" \/>\n<meta name=\"twitter:title\" content=\"Image SEO: optimizing images using machine learning\" \/>\n<meta name=\"twitter:description\" content=\"Images contribute to improve SEO and the user experience on a website. 
Learn how we used neural networks to describe the content of images.\" \/>\n<meta name=\"twitter:image\" content=\"https:\/\/wordlift.io\/blog\/en\/wp-content\/uploads\/sites\/3\/2022\/03\/automatic-image-captioning.jpg\" \/>\n<meta name=\"twitter:label1\" content=\"Written by\" \/>\n\t<meta name=\"twitter:data1\" content=\"Andrea Volpini\" \/>\n\t<meta name=\"twitter:label2\" content=\"Est. reading time\" \/>\n\t<meta name=\"twitter:data2\" content=\"14 minutes\" \/>\n<script type=\"application\/ld+json\" class=\"yoast-schema-graph\">{\"@context\":\"https:\/\/schema.org\",\"@graph\":[{\"@type\":\"Article\",\"@id\":\"https:\/\/wordlift.io\/blog\/en\/image-seo-using-ai\/#article\",\"isPartOf\":{\"@id\":\"https:\/\/wordlift.io\/blog\/en\/image-seo-using-ai\/\"},\"author\":{\"name\":\"Andrea Volpini\",\"@id\":\"https:\/\/wordlift.io\/blog\/en\/#\/schema\/person\/574352082cc71dab8d164410f1cabe0a\"},\"headline\":\"Image SEO: optimizing images using machine learning\",\"datePublished\":\"2023-03-01T11:16:16+00:00\",\"dateModified\":\"2023-03-01T11:35:56+00:00\",\"mainEntityOfPage\":{\"@id\":\"https:\/\/wordlift.io\/blog\/en\/image-seo-using-ai\/\"},\"wordCount\":2542,\"commentCount\":0,\"publisher\":{\"@id\":\"https:\/\/wordlift.io\/blog\/en\/#organization\"},\"image\":{\"@id\":\"https:\/\/wordlift.io\/blog\/en\/image-seo-using-ai\/#primaryimage\"},\"thumbnailUrl\":\"https:\/\/wordlift.io\/blog\/en\/wp-content\/uploads\/sites\/3\/2022\/03\/automatic-image-captioning.jpg\",\"articleSection\":[\"seo\"],\"inLanguage\":\"en-US\",\"potentialAction\":[{\"@type\":\"CommentAction\",\"name\":\"Comment\",\"target\":[\"https:\/\/wordlift.io\/blog\/en\/image-seo-using-ai\/#respond\"]}]},{\"@type\":\"WebPage\",\"@id\":\"https:\/\/wordlift.io\/blog\/en\/image-seo-using-ai\/\",\"url\":\"https:\/\/wordlift.io\/blog\/en\/image-seo-using-ai\/\",\"name\":\"Image SEO: optimizing images using machine learning - WordLift 
Blog\",\"isPartOf\":{\"@id\":\"https:\/\/wordlift.io\/blog\/en\/#website\"},\"primaryImageOfPage\":{\"@id\":\"https:\/\/wordlift.io\/blog\/en\/image-seo-using-ai\/#primaryimage\"},\"image\":{\"@id\":\"https:\/\/wordlift.io\/blog\/en\/image-seo-using-ai\/#primaryimage\"},\"thumbnailUrl\":\"https:\/\/wordlift.io\/blog\/en\/wp-content\/uploads\/sites\/3\/2022\/03\/automatic-image-captioning.jpg\",\"datePublished\":\"2023-03-01T11:16:16+00:00\",\"dateModified\":\"2023-03-01T11:35:56+00:00\",\"description\":\"Images contribute to improve SEO and the user experience on a website. Learn how we used neural networks to describe the content of images.\",\"breadcrumb\":{\"@id\":\"https:\/\/wordlift.io\/blog\/en\/image-seo-using-ai\/#breadcrumb\"},\"inLanguage\":\"en-US\",\"potentialAction\":[{\"@type\":\"ReadAction\",\"target\":[\"https:\/\/wordlift.io\/blog\/en\/image-seo-using-ai\/\"]}]},{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\/\/wordlift.io\/blog\/en\/image-seo-using-ai\/#primaryimage\",\"url\":\"https:\/\/wordlift.io\/blog\/en\/wp-content\/uploads\/sites\/3\/2022\/03\/automatic-image-captioning.jpg\",\"contentUrl\":\"https:\/\/wordlift.io\/blog\/en\/wp-content\/uploads\/sites\/3\/2022\/03\/automatic-image-captioning.jpg\",\"width\":1200,\"height\":1200,\"caption\":\"Image SEO using AI\"},{\"@type\":\"BreadcrumbList\",\"@id\":\"https:\/\/wordlift.io\/blog\/en\/image-seo-using-ai\/#breadcrumb\",\"itemListElement\":[{\"@type\":\"ListItem\",\"position\":1,\"name\":\"Blog\",\"item\":\"https:\/\/wordlift.io\/blog\/en\/\"},{\"@type\":\"ListItem\",\"position\":2,\"name\":\"Image SEO: optimizing images using machine learning\"}]},{\"@type\":\"WebSite\",\"@id\":\"https:\/\/wordlift.io\/blog\/en\/#website\",\"url\":\"https:\/\/wordlift.io\/blog\/en\/\",\"name\":\"WordLift Blog\",\"description\":\"AI-Powered 
SEO\",\"publisher\":{\"@id\":\"https:\/\/wordlift.io\/blog\/en\/#organization\"},\"potentialAction\":[{\"@type\":\"SearchAction\",\"target\":{\"@type\":\"EntryPoint\",\"urlTemplate\":\"https:\/\/wordlift.io\/blog\/en\/?s={search_term_string}\"},\"query-input\":\"required name=search_term_string\"}],\"inLanguage\":\"en-US\"},{\"@type\":\"Organization\",\"@id\":\"https:\/\/wordlift.io\/blog\/en\/#organization\",\"name\":\"WordLift\",\"url\":\"https:\/\/wordlift.io\/blog\/en\/\",\"logo\":{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\/\/wordlift.io\/blog\/en\/#\/schema\/logo\/image\/\",\"url\":\"https:\/\/mk0wordliftblog7j5te.kinstacdn.com\/wp-content\/uploads\/sites\/3\/2017\/04\/logo-1.png\",\"contentUrl\":\"https:\/\/mk0wordliftblog7j5te.kinstacdn.com\/wp-content\/uploads\/sites\/3\/2017\/04\/logo-1.png\",\"width\":152,\"height\":40,\"caption\":\"WordLift\"},\"image\":{\"@id\":\"https:\/\/wordlift.io\/blog\/en\/#\/schema\/logo\/image\/\"}},{\"@type\":\"Person\",\"@id\":\"https:\/\/wordlift.io\/blog\/en\/#\/schema\/person\/574352082cc71dab8d164410f1cabe0a\",\"name\":\"Andrea Volpini\",\"image\":{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\/\/wordlift.io\/blog\/en\/#\/schema\/person\/image\/466a1652833e48ca11c81b363eba7c25\",\"url\":\"https:\/\/secure.gravatar.com\/avatar\/6b9d3d311b50a8749201fe4b318907a8?s=96&d=mm&r=pg\",\"contentUrl\":\"https:\/\/secure.gravatar.com\/avatar\/6b9d3d311b50a8749201fe4b318907a8?s=96&d=mm&r=pg\",\"caption\":\"Andrea Volpini\"}}]}<\/script>\n<!-- \/ Yoast SEO plugin. -->","yoast_head_json":{"title":"Image SEO: optimizing images using machine learning - WordLift Blog","description":"Images contribute to improve SEO and the user experience on a website. 
Learn how we used neural networks to describe the content of images.","robots":{"index":"index","follow":"follow","max-snippet":"max-snippet:-1","max-image-preview":"max-image-preview:large","max-video-preview":"max-video-preview:-1"},"canonical":"https:\/\/wordlift.io\/blog\/en\/image-seo-using-ai\/","og_locale":"en_US","og_type":"article","og_title":"Image SEO: optimizing images using machine learning","og_description":"Images contribute to improve SEO and the user experience on a website. Learn how we used neural networks to describe the content of images.","og_url":"https:\/\/wordlift.io\/blog\/en\/image-seo-using-ai\/","og_site_name":"WordLift Blog","article_published_time":"2023-03-01T11:16:16+00:00","article_modified_time":"2023-03-01T11:35:56+00:00","og_image":[{"width":1200,"height":1200,"url":"https:\/\/wordlift.io\/blog\/en\/wp-content\/uploads\/sites\/3\/2022\/03\/automatic-image-captioning.jpg","type":"image\/jpeg"}],"author":"Andrea Volpini","twitter_card":"summary_large_image","twitter_title":"Image SEO: optimizing images using machine learning","twitter_description":"Images contribute to improve SEO and the user experience on a website. Learn how we used neural networks to describe the content of images.","twitter_image":"https:\/\/wordlift.io\/blog\/en\/wp-content\/uploads\/sites\/3\/2022\/03\/automatic-image-captioning.jpg","twitter_misc":{"Written by":"Andrea Volpini","Est. 
reading time":"14 minutes"},"schema":{"@context":"https:\/\/schema.org","@graph":[{"@type":"Article","@id":"https:\/\/wordlift.io\/blog\/en\/image-seo-using-ai\/#article","isPartOf":{"@id":"https:\/\/wordlift.io\/blog\/en\/image-seo-using-ai\/"},"author":{"name":"Andrea Volpini","@id":"https:\/\/wordlift.io\/blog\/en\/#\/schema\/person\/574352082cc71dab8d164410f1cabe0a"},"headline":"Image SEO: optimizing images using machine learning","datePublished":"2023-03-01T11:16:16+00:00","dateModified":"2023-03-01T11:35:56+00:00","mainEntityOfPage":{"@id":"https:\/\/wordlift.io\/blog\/en\/image-seo-using-ai\/"},"wordCount":2542,"commentCount":0,"publisher":{"@id":"https:\/\/wordlift.io\/blog\/en\/#organization"},"image":{"@id":"https:\/\/wordlift.io\/blog\/en\/image-seo-using-ai\/#primaryimage"},"thumbnailUrl":"https:\/\/wordlift.io\/blog\/en\/wp-content\/uploads\/sites\/3\/2022\/03\/automatic-image-captioning.jpg","articleSection":["seo"],"inLanguage":"en-US","potentialAction":[{"@type":"CommentAction","name":"Comment","target":["https:\/\/wordlift.io\/blog\/en\/image-seo-using-ai\/#respond"]}]},{"@type":"WebPage","@id":"https:\/\/wordlift.io\/blog\/en\/image-seo-using-ai\/","url":"https:\/\/wordlift.io\/blog\/en\/image-seo-using-ai\/","name":"Image SEO: optimizing images using machine learning - WordLift Blog","isPartOf":{"@id":"https:\/\/wordlift.io\/blog\/en\/#website"},"primaryImageOfPage":{"@id":"https:\/\/wordlift.io\/blog\/en\/image-seo-using-ai\/#primaryimage"},"image":{"@id":"https:\/\/wordlift.io\/blog\/en\/image-seo-using-ai\/#primaryimage"},"thumbnailUrl":"https:\/\/wordlift.io\/blog\/en\/wp-content\/uploads\/sites\/3\/2022\/03\/automatic-image-captioning.jpg","datePublished":"2023-03-01T11:16:16+00:00","dateModified":"2023-03-01T11:35:56+00:00","description":"Images contribute to improve SEO and the user experience on a website. 
Learn how we used neural networks to describe the content of images.","breadcrumb":{"@id":"https:\/\/wordlift.io\/blog\/en\/image-seo-using-ai\/#breadcrumb"},"inLanguage":"en-US","potentialAction":[{"@type":"ReadAction","target":["https:\/\/wordlift.io\/blog\/en\/image-seo-using-ai\/"]}]},{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/wordlift.io\/blog\/en\/image-seo-using-ai\/#primaryimage","url":"https:\/\/wordlift.io\/blog\/en\/wp-content\/uploads\/sites\/3\/2022\/03\/automatic-image-captioning.jpg","contentUrl":"https:\/\/wordlift.io\/blog\/en\/wp-content\/uploads\/sites\/3\/2022\/03\/automatic-image-captioning.jpg","width":1200,"height":1200,"caption":"Image SEO using AI"},{"@type":"BreadcrumbList","@id":"https:\/\/wordlift.io\/blog\/en\/image-seo-using-ai\/#breadcrumb","itemListElement":[{"@type":"ListItem","position":1,"name":"Blog","item":"https:\/\/wordlift.io\/blog\/en\/"},{"@type":"ListItem","position":2,"name":"Image SEO: optimizing images using machine learning"}]},{"@type":"WebSite","@id":"https:\/\/wordlift.io\/blog\/en\/#website","url":"https:\/\/wordlift.io\/blog\/en\/","name":"WordLift Blog","description":"AI-Powered SEO","publisher":{"@id":"https:\/\/wordlift.io\/blog\/en\/#organization"},"potentialAction":[{"@type":"SearchAction","target":{"@type":"EntryPoint","urlTemplate":"https:\/\/wordlift.io\/blog\/en\/?s={search_term_string}"},"query-input":"required 
name=search_term_string"}],"inLanguage":"en-US"},{"@type":"Organization","@id":"https:\/\/wordlift.io\/blog\/en\/#organization","name":"WordLift","url":"https:\/\/wordlift.io\/blog\/en\/","logo":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/wordlift.io\/blog\/en\/#\/schema\/logo\/image\/","url":"https:\/\/mk0wordliftblog7j5te.kinstacdn.com\/wp-content\/uploads\/sites\/3\/2017\/04\/logo-1.png","contentUrl":"https:\/\/mk0wordliftblog7j5te.kinstacdn.com\/wp-content\/uploads\/sites\/3\/2017\/04\/logo-1.png","width":152,"height":40,"caption":"WordLift"},"image":{"@id":"https:\/\/wordlift.io\/blog\/en\/#\/schema\/logo\/image\/"}},{"@type":"Person","@id":"https:\/\/wordlift.io\/blog\/en\/#\/schema\/person\/574352082cc71dab8d164410f1cabe0a","name":"Andrea Volpini","image":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/wordlift.io\/blog\/en\/#\/schema\/person\/image\/466a1652833e48ca11c81b363eba7c25","url":"https:\/\/secure.gravatar.com\/avatar\/6b9d3d311b50a8749201fe4b318907a8?s=96&d=mm&r=pg","contentUrl":"https:\/\/secure.gravatar.com\/avatar\/6b9d3d311b50a8749201fe4b318907a8?s=96&d=mm&r=pg","caption":"Andrea 
Volpini"}}]}},"_wl_alt_label":[],"wl:entity_url":"http:\/\/data.wordlift.io\/wl0216\/post\/image_seo__optimizing_images_using_machine_learning","_links":{"self":[{"href":"https:\/\/wordlift.io\/blog\/en\/wp-json\/wp\/v2\/posts\/10291"}],"collection":[{"href":"https:\/\/wordlift.io\/blog\/en\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/wordlift.io\/blog\/en\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/wordlift.io\/blog\/en\/wp-json\/wp\/v2\/users\/6"}],"replies":[{"embeddable":true,"href":"https:\/\/wordlift.io\/blog\/en\/wp-json\/wp\/v2\/comments?post=10291"}],"version-history":[{"count":12,"href":"https:\/\/wordlift.io\/blog\/en\/wp-json\/wp\/v2\/posts\/10291\/revisions"}],"predecessor-version":[{"id":24238,"href":"https:\/\/wordlift.io\/blog\/en\/wp-json\/wp\/v2\/posts\/10291\/revisions\/24238"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/wordlift.io\/blog\/en\/wp-json\/wp\/v2\/media\/24225"}],"wp:attachment":[{"href":"https:\/\/wordlift.io\/blog\/en\/wp-json\/wp\/v2\/media?parent=10291"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/wordlift.io\/blog\/en\/wp-json\/wp\/v2\/categories?post=10291"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/wordlift.io\/blog\/en\/wp-json\/wp\/v2\/tags?post=10291"},{"taxonomy":"wl_entity_type","embeddable":true,"href":"https:\/\/wordlift.io\/blog\/en\/wp-json\/wp\/v2\/wl_entity_type?post=10291"},{"taxonomy":"author","embeddable":true,"href":"https:\/\/wordlift.io\/blog\/en\/wp-json\/wp\/v2\/coauthors?post=10291"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}