{"id":20925,"date":"2024-05-11T05:07:42","date_gmt":"2024-05-11T03:07:42","guid":{"rendered":"https:\/\/wordlift.io\/blog\/en\/?p=20925"},"modified":"2024-05-15T11:26:56","modified_gmt":"2024-05-15T09:26:56","slug":"web-scraping-for-seo","status":"publish","type":"post","link":"https:\/\/wordlift.io\/blog\/en\/web-scraping-for-seo\/","title":{"rendered":"Web Scraping for SEO"},"content":{"rendered":"\n<p>Web scraping is the magical act of extracting information from a web page. You can do it on one page or millions of pages. There are multiple reasons why scraping is essential in SEO:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>We might use it for auditing a website<\/li>\n\n\n\n<li>We might need it in the context of <a href=\"https:\/\/wordlift.io\/blog\/en\/web-stories\/programmatic-seo\/\">programmatic SEO<\/a>&nbsp;<\/li>\n\n\n\n<li>We could use it for providing context to our web analytics<\/li>\n<\/ul>\n\n\n\n<p>Here at WordLift, we primarily focus on structured data and improving the data quality of content knowledge graphs. We depend on crawling to cope with missing and messy data on various use cases.&nbsp;<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Extracting Structured Data from Web Pages using Large Language Models<\/h2>\n\n\n\n<p>Recently, I\u2019ve been exploring the potential of&nbsp;<strong>OpenAI function calling<\/strong>&nbsp;for&nbsp;<strong>extracting <a class=\"wl-entity-page-link\" title=\"What Is Structured Data And How to Implement It\" href=\"https:\/\/wordlift.io\/blog\/en\/entity\/structured-data\/\" data-id=\"http:\/\/data.wordlift.io\/wl0216\/entity\/what_is_structured_data_;http:\/\/www.wikidata.org\/entity\/Q26813700\" >structured data<\/a> from web pages<\/strong>. 
This could be a game-changer for those who, like us, are actively looking to combine Large Language Models (#LLMs) with <a class=\"wl-entity-page-link\" title=\"Knowledge-Graph\" href=\"https:\/\/wordlift.io\/blog\/en\/entity\/knowledge-graph\/\" data-id=\"http:\/\/data.wordlift.io\/wl0216\/entity\/knowledge_graph;https:\/\/www.wikidata.org\/wiki\/Q33002955\" >Knowledge Graphs<\/a> (#KGs).<\/p>\n\n\n\n<p>Why is this exciting? Because the integration of LLMs with KGs is fast becoming a hot topic in tech, and developing a unified framework that can enrich both LLMs and KGs simultaneously is important.<\/p>\n\n\n\n<p>By using this&nbsp;<a href=\"https:\/\/wor.ai\/extract-entities\" target=\"_blank\" rel=\"noreferrer noopener\">Colab Notebook<\/a>, you can extract entity attributes from a list of URLs &#8211; even from pages rendered with JavaScript! In this implementation, I used the&nbsp;<a href=\"https:\/\/wordlift.io\/blog\/en\/schema-markup-for-hotels-bb-and-resorts-a-complete-guide\/\" target=\"_blank\" rel=\"noreferrer noopener\">schema for LodgingBusiness<\/a>&nbsp;(hotels, B&amp;Bs and resorts).<\/p>\n\n\n\n<p>A few lessons learned from this exploration:<\/p>\n\n\n\n<ol class=\"wp-block-list\">\n<li>We can seamlessly extract data from webpages using LLMs.<\/li>\n\n\n\n<li>It\u2019s wise to continue using existing scraping techniques where possible. For instance, BeautifulSoup is excellent for scraping titles and meta descriptions.<\/li>\n\n\n\n<li>Using LLMs is slow and expensive, so optimizing the process is key.<\/li>\n\n\n\n<li>After extraction, it\u2019s crucial to thoroughly check and validate the data to ensure its accuracy and reliability. 
Data integrity is paramount!<\/li>\n<\/ol>\n\n\n\n<p><strong>ScrapeGraphAI &#8211; the New Frontier in Web Scraping<\/strong><\/p>\n\n\n\n<figure class=\"wp-block-image size-full\"><img decoding=\"async\" width=\"1280\" height=\"276\" src=\"https:\/\/wordlift.io\/blog\/en\/wp-content\/uploads\/sites\/3\/2024\/05\/download-2024-05-10T184409.188.png\" alt=\"\" class=\"wp-image-27024\" srcset=\"https:\/\/wordlift.io\/blog\/en\/wp-content\/uploads\/sites\/3\/2024\/05\/download-2024-05-10T184409.188.png 1280w, https:\/\/wordlift.io\/blog\/en\/wp-content\/uploads\/sites\/3\/2024\/05\/download-2024-05-10T184409.188-300x65.png 300w, https:\/\/wordlift.io\/blog\/en\/wp-content\/uploads\/sites\/3\/2024\/05\/download-2024-05-10T184409.188-1024x221.png 1024w, https:\/\/wordlift.io\/blog\/en\/wp-content\/uploads\/sites\/3\/2024\/05\/download-2024-05-10T184409.188-768x166.png 768w, https:\/\/wordlift.io\/blog\/en\/wp-content\/uploads\/sites\/3\/2024\/05\/download-2024-05-10T184409.188-150x32.png 150w\" sizes=\"(max-width: 1280px) 100vw, 1280px\" \/><\/figure>\n\n\n\n<p>I have recently discovered a new fantastic library for AI scraping called <a href=\"https:\/\/github.com\/VinciGit00\/Scrapegraph-ai\"><strong>ScrapeGraphAI<\/strong><\/a>. This Python library uses LLM and <strong>direct graph logic<\/strong> to create scraping pipelines for websites and any type of document (XML, HTML, JSON, etc.).  
<\/p>\n\n\n\n<p>This library &#8211; at first glance &#8211; has proven to be powerful, adapting seamlessly to various web pages, which prompted me to update the <strong><a class=\"wl-entity-page-link\"  href=\"https:\/\/wordlift.io\/blog\/en\/entity\/streamlit-seo-automation\/\" data-id=\"http:\/\/data.wordlift.io\/wl0216\/entity\/streamlit;http:\/\/www.wikidata.org\/entity\/Q107384634\" >Streamlit<\/a> web application<\/strong> that you can now immediately use.&nbsp;<\/p>\n\n\n\n<p class=\"has-text-align-center\"><strong><a href=\"https:\/\/wordlift.io\/web-scraping-for-seo-free-tool\/\">Jump to the web application here [now using ScrapeGraphAI] \ud83c\udf88<\/a><\/strong><\/p>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"here-is-how-the-scraping-app-works\">Here is how the scraping app works&nbsp;<\/h2>\n\n\n\n<ol class=\"wp-block-list\">\n<li><strong>Input your OpenAI API key<\/strong> to enable the AI processing.<\/li>\n\n\n\n<li><strong>Provide the URL<\/strong> of the web page you want to crawl.<\/li>\n\n\n\n<li><strong>Enter your scraping instructions<\/strong> in the form of a user prompt. 
This could include details like the title, price, or SKU, formatted in a way that guides the AI to understand what data to extract.<\/li>\n\n\n\n<li><strong>Hit \u201cCrawl\u201d<\/strong> and let ScrapeGraphAI analyze the page based on your instructions.<\/li>\n\n\n\n<li>Voil\u00e0, <strong>the work is done<\/strong>, and you can now <strong>download a CSV<\/strong> containing the requested attributes for the page.<\/li>\n<\/ol>\n\n\n\n<figure class=\"wp-block-image size-full\"><img decoding=\"async\" width=\"2160\" height=\"1292\" src=\"https:\/\/wordlift.io\/blog\/en\/wp-content\/uploads\/sites\/3\/2024\/05\/web-scraping-seo.jpg\" alt=\"\" class=\"wp-image-27025\" srcset=\"https:\/\/wordlift.io\/blog\/en\/wp-content\/uploads\/sites\/3\/2024\/05\/web-scraping-seo.jpg 2160w, https:\/\/wordlift.io\/blog\/en\/wp-content\/uploads\/sites\/3\/2024\/05\/web-scraping-seo-300x179.jpg 300w, https:\/\/wordlift.io\/blog\/en\/wp-content\/uploads\/sites\/3\/2024\/05\/web-scraping-seo-1024x613.jpg 1024w, https:\/\/wordlift.io\/blog\/en\/wp-content\/uploads\/sites\/3\/2024\/05\/web-scraping-seo-768x459.jpg 768w, https:\/\/wordlift.io\/blog\/en\/wp-content\/uploads\/sites\/3\/2024\/05\/web-scraping-seo-1536x919.jpg 1536w, https:\/\/wordlift.io\/blog\/en\/wp-content\/uploads\/sites\/3\/2024\/05\/web-scraping-seo-2048x1225.jpg 2048w, https:\/\/wordlift.io\/blog\/en\/wp-content\/uploads\/sites\/3\/2024\/05\/web-scraping-seo-578x346.jpg 578w, https:\/\/wordlift.io\/blog\/en\/wp-content\/uploads\/sites\/3\/2024\/05\/web-scraping-seo-150x90.jpg 150w\" sizes=\"(max-width: 2160px) 100vw, 2160px\" \/><\/figure>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"existing-limitations\">Existing limitations<\/h2>\n\n\n\n<p>This is <strong>a demo web app<\/strong>. The UI is a bit clunky when you start refining rules, and in general, it is limited to crawling <strong>only a few URLs<\/strong>. 
If you are looking for something that scales, I would recommend <a href=\"https:\/\/advertools.readthedocs.io\/\" target=\"_blank\" rel=\"noreferrer noopener\"><strong>Advertools<\/strong><\/a>, a well-known Python library developed by the mythical <strong><a href=\"https:\/\/g.co\/kgs\/9yohzX\" target=\"_blank\" rel=\"noreferrer noopener\">Elias Dabbas<\/a><\/strong>. <\/p>\n\n\n\n<p>To see how you can use it, watch this webinar, in which Elias Dabbas and Doreid Haddad show <strong>how to build a Knowledge Graph using Advertools and WordLift<\/strong>. <\/p>\n\n\n\n<figure class=\"wp-block-embed aligncenter is-type-video is-provider-youtube wp-block-embed-youtube wp-embed-aspect-16-9 wp-has-aspect-ratio\"><div class=\"wp-block-embed__wrapper\">\n<iframe title=\"Build a Knowledge Graph with Advertools and WordLift\" width=\"500\" height=\"281\" src=\"https:\/\/www.youtube.com\/embed\/EwlhuUcYJMI?feature=oembed\" frameborder=\"0\" allow=\"accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share\" referrerpolicy=\"strict-origin-when-cross-origin\" allowfullscreen><\/iframe>\n<\/div><\/figure>\n\n\n\n<p><\/p>\n\n\n\n<h3 class=\"wp-block-heading\" id=\"is-web-scraping-illegal\">Is web scraping illegal?<\/h3>\n\n\n\n<p>No, web scraping is generally legal; commercial search engines depend on it. However, there are some considerations to be made:<\/p>\n\n\n\n<ol class=\"wp-block-list\">\n<li>Some websites might have terms and conditions that do not allow scraping;<\/li>\n\n\n\n<li>Technically speaking, scraping is a task that consumes a significant amount of bandwidth and computational resources. We should do it only when it is needed. Google itself is reviewing its indexing policies to be more environmentally friendly; we should do the same.<\/li>\n\n\n\n<li>How we use the extracted data makes a huge difference. 
We want to be respectful of others&#8217; content and aware of potential copyright infringement.&nbsp;<\/li>\n<\/ol>\n\n\n\n<p>You can find more useful information on this topic <a href=\"https:\/\/www.scraperapi.com\/blog\/is-web-scraping-legal\/\" target=\"_blank\" rel=\"noreferrer noopener\">here<\/a>.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\" id=\"how-can-we-scrape-information\">How can we scrape information?&nbsp;<\/h3>\n\n\n\n<p>This thread by Antoine Eripret walks through the free and paid options:<\/p>\n\n\n\n<figure class=\"wp-block-embed is-type-rich is-provider-twitter wp-block-embed-twitter\"><div class=\"wp-block-embed__wrapper\">\n<blockquote class=\"twitter-tweet\" data-width=\"500\" data-dnt=\"true\"><p lang=\"en\" dir=\"ltr\">\ud83d\udc38 How to scrape content? \ud83d\udc38<br><br>Today, I&#39;ll walk you through all free and paid options you have to extract content from a webpage. <br><br>Part of our job is to extract content to use it for our analysis. So it can be useful to know how to do it. <a href=\"https:\/\/t.co\/uPH2cb2eT3\">pic.twitter.com\/uPH2cb2eT3<\/a><\/p>&mdash; Antoine Eripret (@antoineripret) <a href=\"https:\/\/twitter.com\/antoineripret\/status\/1494288488556109826?ref_src=twsrc%5Etfw\">February 17, 2022<\/a><\/blockquote><script async src=\"https:\/\/platform.twitter.com\/widgets.js\" charset=\"utf-8\"><\/script>\n<\/div><figcaption class=\"wp-element-caption\"><br><br><\/figcaption><\/figure>\n\n\n\n\n\n\n\n<p><\/p>\n","protected":false},"excerpt":{"rendered":"<p>Web scraping is the magical act of extracting information from a web page. 
I found ScrapeGraphAI very powerful and I built a simple Streamlit web application that you can immediately use.<\/p>\n","protected":false},"author":6,"featured_media":27032,"comment_status":"closed","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"_acf_changed":false,"wl_entities_gutenberg":"","_wlpage_enable":"","footnotes":""},"categories":[28,8],"tags":[],"wl_entity_type":[30,3303],"coauthors":[4226],"class_list":["post-20925","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-world-summit-ai","category-seo","wl_entity_type-article","wl_entity_type-faq-page"],"acf":[],"yoast_head":"<!-- This site is optimized with the Yoast SEO plugin v22.6 - https:\/\/yoast.com\/wordpress\/plugins\/seo\/ -->\n<title>Web Scraping for SEO - WordLift Blog<\/title>\n<meta name=\"description\" content=\"Web scraping is the magical act of extracting information from web pages. ScrapeGraphAI is powerful and I built a Streamlit web application.\" \/>\n<meta name=\"robots\" content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" href=\"https:\/\/wordlift.io\/blog\/en\/web-scraping-for-seo\/\" \/>\n<meta property=\"og:locale\" content=\"en_US\" \/>\n<meta property=\"og:type\" content=\"article\" \/>\n<meta property=\"og:title\" content=\"Web scraping for SEO using ScrapeGraphAI\" \/>\n<meta property=\"og:description\" content=\"Web scraping is the magical act of extracting information from a web page. 
I found ScrapeGraphAI very powerful and I built a simple Streamlit web application that you can immediately use.\" \/>\n<meta property=\"og:url\" content=\"https:\/\/wordlift.io\/blog\/en\/web-scraping-for-seo\/\" \/>\n<meta property=\"og:site_name\" content=\"WordLift Blog\" \/>\n<meta property=\"article:published_time\" content=\"2024-05-11T03:07:42+00:00\" \/>\n<meta property=\"article:modified_time\" content=\"2024-05-15T09:26:56+00:00\" \/>\n<meta property=\"og:image\" content=\"https:\/\/wordlift.io\/blog\/en\/wp-content\/uploads\/sites\/3\/2023\/06\/Blog-Covers-86.jpg\" \/>\n\t<meta property=\"og:image:width\" content=\"1200\" \/>\n\t<meta property=\"og:image:height\" content=\"1200\" \/>\n\t<meta property=\"og:image:type\" content=\"image\/jpeg\" \/>\n<meta name=\"author\" content=\"Andrea Volpini\" \/>\n<meta name=\"twitter:card\" content=\"summary_large_image\" \/>\n<meta name=\"twitter:title\" content=\"Web scraping for SEO using ScrapeGraphAI\" \/>\n<meta name=\"twitter:description\" content=\"Web scraping is the magical act of extracting information from a web page. I found ScrapeGraphAI very powerful and I built a simple Streamlit web application that you can immediately use.\" \/>\n<meta name=\"twitter:image\" content=\"https:\/\/wordlift.io\/blog\/en\/wp-content\/uploads\/sites\/3\/2023\/06\/Blog-Covers-86.jpg\" \/>\n<meta name=\"twitter:label1\" content=\"Written by\" \/>\n\t<meta name=\"twitter:data1\" content=\"Andrea Volpini\" \/>\n\t<meta name=\"twitter:label2\" content=\"Est. 
reading time\" \/>\n\t<meta name=\"twitter:data2\" content=\"4 minutes\" \/>\n<script type=\"application\/ld+json\" class=\"yoast-schema-graph\">{\"@context\":\"https:\/\/schema.org\",\"@graph\":[{\"@type\":\"Article\",\"@id\":\"https:\/\/wordlift.io\/blog\/en\/web-scraping-for-seo\/#article\",\"isPartOf\":{\"@id\":\"https:\/\/wordlift.io\/blog\/en\/web-scraping-for-seo\/\"},\"author\":{\"name\":\"Andrea Volpini\",\"@id\":\"https:\/\/wordlift.io\/blog\/en\/#\/schema\/person\/574352082cc71dab8d164410f1cabe0a\"},\"headline\":\"Web Scraping for SEO\",\"datePublished\":\"2024-05-11T03:07:42+00:00\",\"dateModified\":\"2024-05-15T09:26:56+00:00\",\"mainEntityOfPage\":{\"@id\":\"https:\/\/wordlift.io\/blog\/en\/web-scraping-for-seo\/\"},\"wordCount\":719,\"publisher\":{\"@id\":\"https:\/\/wordlift.io\/blog\/en\/#organization\"},\"image\":{\"@id\":\"https:\/\/wordlift.io\/blog\/en\/web-scraping-for-seo\/#primaryimage\"},\"thumbnailUrl\":\"https:\/\/wordlift.io\/blog\/en\/wp-content\/uploads\/sites\/3\/2023\/06\/Blog-Covers-2024-05-10T190403.506.jpg\",\"articleSection\":[\"AI &amp; Machine Learning\",\"seo\"],\"inLanguage\":\"en-US\"},{\"@type\":\"WebPage\",\"@id\":\"https:\/\/wordlift.io\/blog\/en\/web-scraping-for-seo\/\",\"url\":\"https:\/\/wordlift.io\/blog\/en\/web-scraping-for-seo\/\",\"name\":\"Web Scraping for SEO - WordLift Blog\",\"isPartOf\":{\"@id\":\"https:\/\/wordlift.io\/blog\/en\/#website\"},\"primaryImageOfPage\":{\"@id\":\"https:\/\/wordlift.io\/blog\/en\/web-scraping-for-seo\/#primaryimage\"},\"image\":{\"@id\":\"https:\/\/wordlift.io\/blog\/en\/web-scraping-for-seo\/#primaryimage\"},\"thumbnailUrl\":\"https:\/\/wordlift.io\/blog\/en\/wp-content\/uploads\/sites\/3\/2023\/06\/Blog-Covers-2024-05-10T190403.506.jpg\",\"datePublished\":\"2024-05-11T03:07:42+00:00\",\"dateModified\":\"2024-05-15T09:26:56+00:00\",\"description\":\"Web scraping is the magical act of extracting information from web pages. 
ScrapeGraphAI is powerful and I built a Streamlit web application.\",\"breadcrumb\":{\"@id\":\"https:\/\/wordlift.io\/blog\/en\/web-scraping-for-seo\/#breadcrumb\"},\"inLanguage\":\"en-US\",\"potentialAction\":[{\"@type\":\"ReadAction\",\"target\":[\"https:\/\/wordlift.io\/blog\/en\/web-scraping-for-seo\/\"]}]},{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\/\/wordlift.io\/blog\/en\/web-scraping-for-seo\/#primaryimage\",\"url\":\"https:\/\/wordlift.io\/blog\/en\/wp-content\/uploads\/sites\/3\/2023\/06\/Blog-Covers-2024-05-10T190403.506.jpg\",\"contentUrl\":\"https:\/\/wordlift.io\/blog\/en\/wp-content\/uploads\/sites\/3\/2023\/06\/Blog-Covers-2024-05-10T190403.506.jpg\",\"width\":1200,\"height\":1200,\"caption\":\"Web Scraping for SEO\"},{\"@type\":\"BreadcrumbList\",\"@id\":\"https:\/\/wordlift.io\/blog\/en\/web-scraping-for-seo\/#breadcrumb\",\"itemListElement\":[{\"@type\":\"ListItem\",\"position\":1,\"name\":\"Blog\",\"item\":\"https:\/\/wordlift.io\/blog\/en\/\"},{\"@type\":\"ListItem\",\"position\":2,\"name\":\"Web Scraping for SEO\"}]},{\"@type\":\"WebSite\",\"@id\":\"https:\/\/wordlift.io\/blog\/en\/#website\",\"url\":\"https:\/\/wordlift.io\/blog\/en\/\",\"name\":\"WordLift Blog\",\"description\":\"AI-Powered SEO\",\"publisher\":{\"@id\":\"https:\/\/wordlift.io\/blog\/en\/#organization\"},\"potentialAction\":[{\"@type\":\"SearchAction\",\"target\":{\"@type\":\"EntryPoint\",\"urlTemplate\":\"https:\/\/wordlift.io\/blog\/en\/?s={search_term_string}\"},\"query-input\":\"required 
name=search_term_string\"}],\"inLanguage\":\"en-US\"},{\"@type\":\"Organization\",\"@id\":\"https:\/\/wordlift.io\/blog\/en\/#organization\",\"name\":\"WordLift\",\"url\":\"https:\/\/wordlift.io\/blog\/en\/\",\"logo\":{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\/\/wordlift.io\/blog\/en\/#\/schema\/logo\/image\/\",\"url\":\"https:\/\/mk0wordliftblog7j5te.kinstacdn.com\/wp-content\/uploads\/sites\/3\/2017\/04\/logo-1.png\",\"contentUrl\":\"https:\/\/mk0wordliftblog7j5te.kinstacdn.com\/wp-content\/uploads\/sites\/3\/2017\/04\/logo-1.png\",\"width\":152,\"height\":40,\"caption\":\"WordLift\"},\"image\":{\"@id\":\"https:\/\/wordlift.io\/blog\/en\/#\/schema\/logo\/image\/\"}},{\"@type\":\"Person\",\"@id\":\"https:\/\/wordlift.io\/blog\/en\/#\/schema\/person\/574352082cc71dab8d164410f1cabe0a\",\"name\":\"Andrea Volpini\",\"image\":{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\/\/wordlift.io\/blog\/en\/#\/schema\/person\/image\/466a1652833e48ca11c81b363eba7c25\",\"url\":\"https:\/\/secure.gravatar.com\/avatar\/6b9d3d311b50a8749201fe4b318907a8?s=96&d=mm&r=pg\",\"contentUrl\":\"https:\/\/secure.gravatar.com\/avatar\/6b9d3d311b50a8749201fe4b318907a8?s=96&d=mm&r=pg\",\"caption\":\"Andrea Volpini\"}}]}<\/script>\n<!-- \/ Yoast SEO plugin. -->","yoast_head_json":{"title":"Web Scraping for SEO - WordLift Blog","description":"Web scraping is the magical act of extracting information from web pages. ScrapeGraphAI is powerful and I built a Streamlit web application.","robots":{"index":"index","follow":"follow","max-snippet":"max-snippet:-1","max-image-preview":"max-image-preview:large","max-video-preview":"max-video-preview:-1"},"canonical":"https:\/\/wordlift.io\/blog\/en\/web-scraping-for-seo\/","og_locale":"en_US","og_type":"article","og_title":"Web scraping for SEO using ScrapeGraphAI","og_description":"Web scraping is the magical act of extracting information from a web page. 
I found ScrapeGraphAI very powerful and I built a simple Streamlit web application that you can immediately use.","og_url":"https:\/\/wordlift.io\/blog\/en\/web-scraping-for-seo\/","og_site_name":"WordLift Blog","article_published_time":"2024-05-11T03:07:42+00:00","article_modified_time":"2024-05-15T09:26:56+00:00","og_image":[{"width":1200,"height":1200,"url":"https:\/\/wordlift.io\/blog\/en\/wp-content\/uploads\/sites\/3\/2023\/06\/Blog-Covers-86.jpg","type":"image\/jpeg"}],"author":"Andrea Volpini","twitter_card":"summary_large_image","twitter_title":"Web scraping for SEO using ScrapeGraphAI","twitter_description":"Web scraping is the magical act of extracting information from a web page. I found ScrapeGraphAI very powerful and I built a simple Streamlit web application that you can immediately use.","twitter_image":"https:\/\/wordlift.io\/blog\/en\/wp-content\/uploads\/sites\/3\/2023\/06\/Blog-Covers-86.jpg","twitter_misc":{"Written by":"Andrea Volpini","Est. reading time":"4 minutes"},"schema":{"@context":"https:\/\/schema.org","@graph":[{"@type":"Article","@id":"https:\/\/wordlift.io\/blog\/en\/web-scraping-for-seo\/#article","isPartOf":{"@id":"https:\/\/wordlift.io\/blog\/en\/web-scraping-for-seo\/"},"author":{"name":"Andrea Volpini","@id":"https:\/\/wordlift.io\/blog\/en\/#\/schema\/person\/574352082cc71dab8d164410f1cabe0a"},"headline":"Web Scraping for SEO","datePublished":"2024-05-11T03:07:42+00:00","dateModified":"2024-05-15T09:26:56+00:00","mainEntityOfPage":{"@id":"https:\/\/wordlift.io\/blog\/en\/web-scraping-for-seo\/"},"wordCount":719,"publisher":{"@id":"https:\/\/wordlift.io\/blog\/en\/#organization"},"image":{"@id":"https:\/\/wordlift.io\/blog\/en\/web-scraping-for-seo\/#primaryimage"},"thumbnailUrl":"https:\/\/wordlift.io\/blog\/en\/wp-content\/uploads\/sites\/3\/2023\/06\/Blog-Covers-2024-05-10T190403.506.jpg","articleSection":["AI &amp; Machine 
Learning","seo"],"inLanguage":"en-US"},{"@type":"WebPage","@id":"https:\/\/wordlift.io\/blog\/en\/web-scraping-for-seo\/","url":"https:\/\/wordlift.io\/blog\/en\/web-scraping-for-seo\/","name":"Web Scraping for SEO - WordLift Blog","isPartOf":{"@id":"https:\/\/wordlift.io\/blog\/en\/#website"},"primaryImageOfPage":{"@id":"https:\/\/wordlift.io\/blog\/en\/web-scraping-for-seo\/#primaryimage"},"image":{"@id":"https:\/\/wordlift.io\/blog\/en\/web-scraping-for-seo\/#primaryimage"},"thumbnailUrl":"https:\/\/wordlift.io\/blog\/en\/wp-content\/uploads\/sites\/3\/2023\/06\/Blog-Covers-2024-05-10T190403.506.jpg","datePublished":"2024-05-11T03:07:42+00:00","dateModified":"2024-05-15T09:26:56+00:00","description":"Web scraping is the magical act of extracting information from web pages. ScrapeGraphAI is powerful and I built a Streamlit web application.","breadcrumb":{"@id":"https:\/\/wordlift.io\/blog\/en\/web-scraping-for-seo\/#breadcrumb"},"inLanguage":"en-US","potentialAction":[{"@type":"ReadAction","target":["https:\/\/wordlift.io\/blog\/en\/web-scraping-for-seo\/"]}]},{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/wordlift.io\/blog\/en\/web-scraping-for-seo\/#primaryimage","url":"https:\/\/wordlift.io\/blog\/en\/wp-content\/uploads\/sites\/3\/2023\/06\/Blog-Covers-2024-05-10T190403.506.jpg","contentUrl":"https:\/\/wordlift.io\/blog\/en\/wp-content\/uploads\/sites\/3\/2023\/06\/Blog-Covers-2024-05-10T190403.506.jpg","width":1200,"height":1200,"caption":"Web Scraping for SEO"},{"@type":"BreadcrumbList","@id":"https:\/\/wordlift.io\/blog\/en\/web-scraping-for-seo\/#breadcrumb","itemListElement":[{"@type":"ListItem","position":1,"name":"Blog","item":"https:\/\/wordlift.io\/blog\/en\/"},{"@type":"ListItem","position":2,"name":"Web Scraping for SEO"}]},{"@type":"WebSite","@id":"https:\/\/wordlift.io\/blog\/en\/#website","url":"https:\/\/wordlift.io\/blog\/en\/","name":"WordLift Blog","description":"AI-Powered 
SEO","publisher":{"@id":"https:\/\/wordlift.io\/blog\/en\/#organization"},"potentialAction":[{"@type":"SearchAction","target":{"@type":"EntryPoint","urlTemplate":"https:\/\/wordlift.io\/blog\/en\/?s={search_term_string}"},"query-input":"required name=search_term_string"}],"inLanguage":"en-US"},{"@type":"Organization","@id":"https:\/\/wordlift.io\/blog\/en\/#organization","name":"WordLift","url":"https:\/\/wordlift.io\/blog\/en\/","logo":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/wordlift.io\/blog\/en\/#\/schema\/logo\/image\/","url":"https:\/\/mk0wordliftblog7j5te.kinstacdn.com\/wp-content\/uploads\/sites\/3\/2017\/04\/logo-1.png","contentUrl":"https:\/\/mk0wordliftblog7j5te.kinstacdn.com\/wp-content\/uploads\/sites\/3\/2017\/04\/logo-1.png","width":152,"height":40,"caption":"WordLift"},"image":{"@id":"https:\/\/wordlift.io\/blog\/en\/#\/schema\/logo\/image\/"}},{"@type":"Person","@id":"https:\/\/wordlift.io\/blog\/en\/#\/schema\/person\/574352082cc71dab8d164410f1cabe0a","name":"Andrea Volpini","image":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/wordlift.io\/blog\/en\/#\/schema\/person\/image\/466a1652833e48ca11c81b363eba7c25","url":"https:\/\/secure.gravatar.com\/avatar\/6b9d3d311b50a8749201fe4b318907a8?s=96&d=mm&r=pg","contentUrl":"https:\/\/secure.gravatar.com\/avatar\/6b9d3d311b50a8749201fe4b318907a8?s=96&d=mm&r=pg","caption":"Andrea 
Volpini"}}]}},"_wl_alt_label":[],"wl:entity_url":"http:\/\/data.wordlift.io\/wl0216\/post\/web-scraping-in-seo-20925","_links":{"self":[{"href":"https:\/\/wordlift.io\/blog\/en\/wp-json\/wp\/v2\/posts\/20925"}],"collection":[{"href":"https:\/\/wordlift.io\/blog\/en\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/wordlift.io\/blog\/en\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/wordlift.io\/blog\/en\/wp-json\/wp\/v2\/users\/6"}],"replies":[{"embeddable":true,"href":"https:\/\/wordlift.io\/blog\/en\/wp-json\/wp\/v2\/comments?post=20925"}],"version-history":[{"count":24,"href":"https:\/\/wordlift.io\/blog\/en\/wp-json\/wp\/v2\/posts\/20925\/revisions"}],"predecessor-version":[{"id":27036,"href":"https:\/\/wordlift.io\/blog\/en\/wp-json\/wp\/v2\/posts\/20925\/revisions\/27036"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/wordlift.io\/blog\/en\/wp-json\/wp\/v2\/media\/27032"}],"wp:attachment":[{"href":"https:\/\/wordlift.io\/blog\/en\/wp-json\/wp\/v2\/media?parent=20925"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/wordlift.io\/blog\/en\/wp-json\/wp\/v2\/categories?post=20925"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/wordlift.io\/blog\/en\/wp-json\/wp\/v2\/tags?post=20925"},{"taxonomy":"wl_entity_type","embeddable":true,"href":"https:\/\/wordlift.io\/blog\/en\/wp-json\/wp\/v2\/wl_entity_type?post=20925"},{"taxonomy":"author","embeddable":true,"href":"https:\/\/wordlift.io\/blog\/en\/wp-json\/wp\/v2\/coauthors?post=20925"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}